The 12 Questions That Decide Your AI Architecture

A cheatsheet to avoid costly rework in agent systems.

Most AI projects fail before implementation begins. Teams select architectures based on trends rather than requirements, choose frameworks without evaluating alternatives, and skip the scoping conversations that determine success.

I’m Louis-Francois, co-founder and CTO at Towards AI. In this article, I’ll walk through our decision-making process using two real builds: a single-agent system for marketing content generation and a multi-agent pipeline for article writing. Both projects required different architectural choices based on their constraints, and both delivered working systems. I’ll also share a cheatsheet we made for you to use to understand when and what to build in this new agentic era.

By the end, you’ll know which questions to ask to design AI agent systems and prevent architectural rework mid-project.

Let’s start with the first question that changes everything. It’s not about agents, models, or tools. It’s about scope.

Understanding what the client actually needs

The first question is simpler than you might expect: what does the client actually want?

That sounds obvious, but here’s what happens in practice. A request like “AI-powered marketing content generation” isn’t a deliverable. You still need to pin down what success looks like: do they want a production feature, an integration, a prototype, or a handoff their team will productionize? And you need the hidden requirements early: demo cadence, documentation, and how much you’re expected to explain your design choices.

That’s exactly what happened in our CRM project. The initial ask sounded like a chatbot product, but the real deliverable was proof-of-concept Python code across a set of scenarios, plus weekly demos and clear documentation so their team could implement it later.

And even for our internal article writing system, the same scoping discipline mattered. We didn’t need a polished UI. We needed low cost, fast iteration, trustworthy outputs, and an easy way to add human feedback because we were running it from an IDE and optimizing for speed and quality, not product packaging.

Matching architecture to task shape

Once you’ve nailed the scope, you earn the right to talk architecture. The rule is simple: task shape dictates structure. Don’t start from “multi-agent or not.” Start from how the work actually unfolds.

Here’s the decision framework. If your tasks are predictable and linear (step A, then B, then C, with consistent reasoning throughout), use a workflow or a single agent. If the tasks are divergent but stay within one domain, where the reasoning style is similar even if outputs vary, a single agent with specialized tools is usually enough. Multi-agent systems earn their complexity when the work contains fundamentally different modes that don’t mix well in one loop, especially when one phase needs exploration and another needs strict constraints. As a practical ceiling: if you’re within about fifteen well-scoped tools, a single agent is still manageable. Beyond that, consider splitting by domain.

You can see this contrast in our two builds. For the CRM marketing system, the work was sequential, and the reasoning stayed consistent across outputs. Whether it’s an email, SMS, or push notification, the agent follows the same logic: understand the brief, stick to the format, respect constraints. That made a single agent with a well-organized toolset the cleanest starting point.

For the article writing system, the task shape splits into two different modes. Research is exploratory: search the web, evaluate sources, decide if you have enough, pivot based on what you find. Writing is constrained: follow style guides, enforce formatting rules, and maintain a consistent voice. So we split them into two agents with a simple handoff, and we didn’t add an orchestrator because the actual usage pattern was naturally sequential.
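To make the heuristic concrete, here’s a minimal sketch of how you might encode it in code. Everything here is illustrative: the field names, the fifteen-tool ceiling as a hard cutoff, and the returned labels are assumptions, not a prescriptive API.

```python
from dataclasses import dataclass

@dataclass
class TaskProfile:
    is_linear: bool        # does the work follow a predictable A -> B -> C path?
    single_domain: bool    # does the reasoning style stay consistent across outputs?
    mixed_modes: bool      # e.g. one phase is exploratory, another tightly constrained
    tool_count: int        # number of well-scoped tools the system needs

def recommend_architecture(task: TaskProfile) -> str:
    """Rough heuristic mirroring the task-shape framework above."""
    if task.mixed_modes or task.tool_count > 15:
        return "multi-agent: split by domain or mode, keep handoffs simple"
    if task.is_linear:
        return "workflow or single agent"
    if task.single_domain:
        return "single agent with specialized tools"
    return "single agent; revisit if the modes start to diverge"

# The article writing system mixes exploratory research with constrained writing:
print(recommend_architecture(TaskProfile(
    is_linear=False, single_domain=False, mixed_modes=True, tool_count=10,
)))
```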

Keeping agents thin and tools heavy

Even with the right architecture, you can still build the wrong system if the agent does too much. So the next question is: how should you structure the work between your agent and its tools?

A common mistake is putting implementation logic inside the agent’s reasoning loop. When that happens, the agent spends tokens doing low-level work instead of making high-level decisions.

The guideline we follow is: thin agent, heavy tools. The agent reasons, plans, and decides which tool to call. Tools execute the actual work.

This separation matters for three reasons. First, debugging: when something breaks, you know immediately whether it’s a reasoning problem or an execution problem. Second, reusability: well-designed tools can be shared across agents or projects. Third, maintainability: other developers can add new tools without touching the agent’s orchestration logic.

What makes a good tool? Each tool should do one job well, return structured outputs, and handle its own error cases. If a tool fails, it should return specific feedback the agent can act on, not vague text. And whenever you can enforce rules deterministically in code, do that in the tool rather than asking the LLM to remember constraints.

That’s what we did in the CRM marketing system. The agent orchestrated three categories of tools: retrieval tools for customer data and documentation, generation tools for creating content, and validation tools for checking character limits and template syntax. The agent decided what to generate and when to validate. The tools handled the mechanics.
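To make the “thin agent, heavy tools” split concrete, here’s a minimal sketch of a validation tool in that style. The names, the 160-character limit, and the return shape are hypothetical, not the client’s actual code; the point is the structured, actionable feedback.

```python
from dataclasses import dataclass

SMS_CHAR_LIMIT = 160  # assumed limit, purely for illustration

@dataclass
class ValidationResult:
    passed: bool
    feedback: str  # specific, actionable feedback the agent can use to regenerate

def validate_sms(text: str, char_limit: int = SMS_CHAR_LIMIT) -> ValidationResult:
    """Deterministic check enforced in code so the LLM doesn't have to remember the rule."""
    if not text.strip():
        return ValidationResult(False, "Message is empty.")
    overflow = len(text) - char_limit
    if overflow > 0:
        return ValidationResult(False, f"Too long by {overflow} characters (limit is {char_limit}).")
    return ValidationResult(True, "OK")
```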

In the article writing system, the research agent relied on tools for web search, scraping, transcription, and extracting code from repos. The writer agent relied on tools for formatting, diagrams, and structure. Both agents stayed focused on reasoning while the tools handled the complexity.

Selecting an orchestration framework

Once tools are in place, the next decision is whether you need a framework to run the loop or whether that’s overkill. Even a single agent needs orchestration. You need loops for planning, tool selection, iteration, error handling, and context management. The real decision is whether to build that yourself or use an existing framework.

Here’s how to think about the options. If you need complex state management (checkpointing, branching execution paths, resuming from failures), LangGraph is built for that. If you need role-based multi-agent coordination with defined handoffs, CrewAI fits. If you need a straightforward agent loop with tool calling, LangChain or a lightweight custom implementation works. And if the overhead of any framework isn’t worth it, build from scratch. The rule is: don’t pay for features you don’t need.

Orchestration also looks different depending on whether you’re building a single-agent or multi-agent system. For a single agent, orchestration is about the internal loop: how it plans, selects tools, handles errors, and manages context. For multi-agent systems, you add a coordination layer on top: how agents hand off to each other, whether they share state or stay isolated, and who decides when one agent is done and another should start. That coordination layer is where complexity accumulates, so avoid it unless the task genuinely requires it.

In the CRM marketing system, we evaluated all the main options, but the flow was straightforward: user sends a request, agent plans, calls tools, generates content, validates, and returns the result. A simple agent loop covered everything we needed. Building from scratch was tempting, but a lightweight framework gave us reliable patterns for tool integration and error handling.

In the article writing system, we had two agents but kept coordination minimal: a simple handoff, no orchestrator. Research worked as a repeatable recipe without heavy state. Writing benefited from state management because we wanted to save versions after each revision so users could roll back. The research agent outputs a file, and the writer agent reads it. Each agent managed its own orchestration internally.
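To show how little coordination a simple handoff needs, here’s a sketch of that file-based pattern. The file name and the agents’ run methods are assumptions for illustration; the real systems wrap their own orchestration inside each agent.

```python
from pathlib import Path

NOTES_FILE = Path("research_notes.md")  # assumed handoff artifact

def run_pipeline(topic: str, research_agent, writer_agent) -> str:
    """Sequential handoff: the research agent writes a notes file, the writer reads it."""
    notes = research_agent.run(topic)  # exploratory phase: search, evaluate sources, iterate
    NOTES_FILE.write_text(notes, encoding="utf-8")

    # Constrained phase: the writer only sees the notes file, not the researcher's internal state.
    return writer_agent.run(NOTES_FILE.read_text(encoding="utf-8"))
```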

Selecting models by task difficulty

Alright! Architecture, tools, orchestration. Now comes the part everyone jumps to too early: model choice.

The real answer is that it depends on the task. Don’t default to the largest model everywhere, and don’t use the cheapest model everywhere just to save costs. Match the model’s capability to the step difficulty.

In practice, you can group steps into tiers. Planning, evaluation, and judgment tasks usually benefit from stronger models because they require consistent reasoning. Narrow execution steps (generating short-form text, cleaning scraped pages, simple transformations) can often run on cheaper models as long as quality stays acceptable. Test the cheaper model first. Upgrade only when quality demands it.

In the CRM marketing system, we used stronger models for orchestration and evaluation, and cheaper models for routine generation, like SMS and short emails. In the article writing system, we used stronger models for source selection and writing, and cheaper models for cleanup steps.
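One way to keep this explicit is a small step-to-tier map that the pipeline consults before every model call. The model identifiers below are placeholders, not recommendations.

```python
# Model identifiers are placeholders; swap in whatever provider and models you actually use.
MODEL_TIERS = {
    "strong": "provider/large-reasoning-model",
    "cheap": "provider/small-fast-model",
}

STEP_MODELS = {
    "plan_campaign": "strong",      # planning and judgment need consistent reasoning
    "evaluate_output": "strong",    # LLM-as-judge style evaluation
    "generate_sms": "cheap",        # narrow, short-form generation
    "clean_scraped_page": "cheap",  # simple transformation
}

def model_for(step: str) -> str:
    """Default to the cheaper tier; upgrade a step only when quality demands it."""
    return MODEL_TIERS[STEP_MODELS.get(step, "cheap")]
```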

Determining whether you need RAG

Once the model is chosen, there’s one more piece people treat like a default: retrieval. Which brings us to the next question: do you even need RAG?

Retrieval-Augmented Generation is powerful, but retrieval is not always the right tool. The real question is: do you need external data at generation time, and if so, what type of data is it?

Here’s the decision tree. If you have large amounts of unstructured text (documentation, examples, policies, past outputs) and you need to pull relevant snippets at runtime, that’s a retrieval problem. Use embeddings and vector search. If you have structured records (customer data, product catalogs, transaction histories), that’s a query problem. Use SQL or an API to fetch exactly what you need. And if your reference material fits in the model’s context window and you need consistency across documents, just load it directly.

In the CRM marketing system, we needed both approaches. For unstructured sources like documentation and campaign examples, we used retrieval with embedding search. For structured data like customer records and product information, we used SQL queries.

In the article writing system, neither agent used RAG. Research outputs were written into a notes file and passed directly to the writer. For style guidelines and example articles, we had a small curated set (maybe fifty thousand tokens total) that fit comfortably in the context window. We loaded it directly instead of building a retrieval pipeline.
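The same decision tree fits in a few lines of routing logic. The inputs and return strings are illustrative; the point is that retrieval is one branch, not the default.

```python
def data_access_strategy(is_structured: bool, fits_in_context: bool) -> str:
    """Decide how the system should access reference data at generation time."""
    if is_structured:
        return "query: fetch exact records with SQL or an API"
    if fits_in_context:
        return "direct load: put the full reference material in the prompt"
    return "retrieval: chunk, embed, and search at runtime"

# Examples from the two builds described above:
print(data_access_strategy(is_structured=True, fits_in_context=False))   # CRM customer records -> SQL
print(data_access_strategy(is_structured=False, fits_in_context=True))   # ~50k-token style guide -> direct load
print(data_access_strategy(is_structured=False, fits_in_context=False))  # docs and campaign examples -> retrieval
```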

Building validation loops

At this point, the system can run, but “it runs” is not the same as “it’s reliable.” Here’s where a lot of teams stop. They’ve got their architecture, their models, their data pipeline. But they skip the part that actually makes the system reliable.

So the real question becomes: how do you make output quality non-negotiable? You cannot hope the LLM gets it right on the first try. If outputs matter, you need explicit checks and a structured way to fix failures.

The rule is: build generate-validate-fix loops with actionable feedback. Validation should be a gate with specific failure reasons that the system can act on, not a vague quality score.

Here’s what that looks like in practice. First, check hard constraints: length limits, syntax validity, required fields, and format compliance. These checks are fast and deterministic. Then layer on softer checks: tone adherence, style consistency, factual accuracy. These often require an LLM-as-judge approach with clear rubrics.

When something fails, don’t retry blindly. Pass specific feedback back to the agent: “Too long by fifteen characters,” “Syntax error on line three,” “Tone is too formal for this audience.” The agent regenerates using that feedback. Loop until checks pass or you hit a retry limit.

A second guideline: plan your human-in-the-loop checkpoints deliberately. Decide upfront where a person should review outputs before the system moves forward, especially before expensive steps or irreversible actions. Design it in from the start.

In the CRM marketing system, we validated character limits for SMS, template syntax for their proprietary format, and tone against brand guidelines. Each validation tool returned specific feedback that the agent could use to fix problems.

In the article writing system, validation was more granular. We checked each section for structure, narrative flow, citations, grammar rules, formatting requirements, and vocabulary constraints. We also built in human checkpoints: the research agent pauses to ask if the user wants more sources, and the writer saves state after each revision so users can roll back.
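Here’s a minimal sketch of that generate-validate-fix loop, assuming each check returns an actionable failure message or None when it passes. The function names and retry limit are placeholders for whatever model calls and validators you wire in.

```python
MAX_ATTEMPTS = 3  # retry limit so the loop always terminates

def generate_validate_fix(request, generate, hard_checks, soft_checks):
    """Generate, validate, and feed specific failure reasons back into the next attempt."""
    feedback = []
    for _ in range(MAX_ATTEMPTS):
        draft = generate(request, feedback)  # feedback from the previous attempt guides regeneration

        # Hard constraints first: fast, deterministic, cheap to run.
        feedback = [msg for check in hard_checks if (msg := check(draft))]
        if not feedback:
            # Softer checks (tone, style, factual accuracy) only once the hard gates pass.
            feedback = [msg for check in soft_checks if (msg := check(draft))]
        if not feedback:
            return draft

    raise RuntimeError(f"Output failed validation after {MAX_ATTEMPTS} attempts: {feedback}")
```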

The Decision Framework: Twelve Questions

So if you step back, you’ll notice a pattern: we weren’t making “AI choices.” We were making trade-offs. These were some of the key questions we asked on these projects. There are more, of course, but these are the big ones that shaped our architecture and implementation, and these walkthroughs should give you a solid framework to start your own projects.

To make it easy for you, we have distilled our learnings from these and other projects into a checklist of questions you can ask yourself at the start of any AI project to decide on the right architecture and tools.

I’ll group them into four categories.

First, understanding the task.

Q1: Is your task shape sequential or branching? Sequential tasks fit workflows; branching tasks need agents.

Q2: Is your reasoning exploratory or deterministic? Exploratory reasoning needs flexibility, deterministic reasoning needs constraints.

Second, system design.

Q3: How many tools do you need? If it’s more than about fifteen, consider splitting into multiple agents or splitting tools by domain.

Q4: Do you need internal or proprietary data? That’s your RAG decision.

Q5: Do you need persistent state? If yes, you need a framework like LangGraph. If not, a simple script is enough.

Third, quality and constraints.

Q6: Do your outputs need validation loops or quality gates?

Q7: How much human-in-the-loop do you need, and where should those checkpoints be for reviewing plans, research, or drafts?

Q8: Do you have evaluation data? If yes, build automated evals. If no, start collecting examples and human ratings before over-engineering.

Fourth, operational constraints.

Q9: What are your latency tolerances? Tight latency means fewer agent hops, smaller models, and more workflows. Loose latency means you can afford deeper reasoning, multi-agent coordination, and more validation.

Q10: What’s your budget per task? Low budget means cheaper models, fewer tool calls, and more caching. A higher budget means stronger reasoning models and more reflection.

Q11: How will you handle observability? Decide where logs and traces live. Use a tool like Opik or, at a minimum, structured logging for every run. No observability means you’re flying blind.

Q12: Can your problem be decomposed cleanly into distinct competencies? If yes, multi-agent or multi-workflow. If no, single agent or single workflow.

Same twelve questions. Different answers. Different architectures.

And as promised, we have turned this content and more into a complete cheatsheet to help you decide what to build in each scenario, from simple workflows to multi-agent systems. Just download it for free on my website: links.louisbouchard.ai.

Applying the Framework

So how do you actually use this framework?

Don’t treat it like a form where you fill in all twelve answers in order. Use it as a thinking tool. Start with a few questions, follow where the answers lead, and revisit them as the project evolves. You might begin with a single agent and later realize you need to split it. You might start with a workflow and later realize you need more flexibility. That’s normal. These aren’t rigid rules; they’re guidelines that help you make trade-offs deliberately.

And one habit matters more than most teams expect: document your decisions. Don’t just document what you chose, document why. When someone asks, “Why are we using this framework instead of that one?” you should be able to answer in terms of task shape, state needs, tool complexity, quality requirements, and operational constraints. That documentation also helps new team members onboard and helps you remember the reasoning behind the build.

If you want to go deeper into how we build these systems, we teach all of this in our courses at Towards AI. The point is not just theory; it’s hands-on work where you build the systems, deploy them, and learn the trade-offs by actually running into them.