Multi-agent is becoming the new overengineering

You’re not building agents. You’re building workflows (and that’s fine)

SECTION 1: WHAT PEOPLE CONFUSE

It’s almost 2026, and terms like workflows, agents, tools, and multi-agent systems are still getting mixed up. The confusion goes beyond terminology; it leads to overengineered solutions. Fortunately, we’ve created a cheatsheet to help you figure out what you actually need to build, which I’ll share shortly. But first, let’s clarify two things that trip people up.

First, not every LLM application is an agent. The key difference is autonomy. In a workflow, you control the flow: you decide the steps and their order. In an agent, the model controls the flow; it decides what to do next based on the goal you give it. If you can write down the exact sequence of steps in advance, you’re building a workflow, not an agent.

Second, and this is where many people get confused, tools are not agents. A tool is a capability: a calculator, a database query, a web browser, a validator, an API call. An agent is the decision maker that chooses which tools to use and when. So if someone tells you they built a “multi-agent system” but it’s actually one model calling ten different APIs, that’s not multi-agent at all. That’s a single agent with ten tools. This distinction shapes how you architect, debug, and scale your system, and it drives the core choice: a workflow, a single agent with tools, or multiple agents.
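Here’s a minimal sketch of that distinction in Python. The call_llm helper is a hypothetical placeholder for whatever model provider you use; the only point is who controls the flow. In the workflow, your code does. In the agent, the model is handed a goal and a catalog of tools and decides the next step itself, and note that ten tools wired to one decision maker is still a single agent.

```python
def call_llm(prompt: str) -> str:
    """Placeholder: swap in your actual model provider call."""
    raise NotImplementedError

# Workflow: the steps and their order are fixed in code. The model fills in
# content, but it never decides what happens next.
def summarize_then_translate(text: str) -> str:
    summary = call_llm(f"Summarize this text:\n{text}")
    return call_llm(f"Translate this summary into French:\n{summary}")

# Agent: the model picks the next step. Tools are just capabilities it can use.
TOOLS = {
    "search_docs": "Search internal documentation for a query.",
    "query_database": "Run a read-only database query.",
    "send_email": "Send an email to a customer.",
}

def agent_decides_next_step(goal: str, observations: str) -> str:
    prompt = (
        f"Goal: {goal}\n"
        f"What has happened so far: {observations}\n"
        f"Available tools: {TOOLS}\n"
        "Which tool should be used next, and with what input? Or answer DONE."
    )
    return call_llm(prompt)
```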

SECTION 2: THE SPECTRUM (THE MENTAL MODEL)

Picture the architecture choice as a slider: a spectrum of complexity where your goal is to stay as far left as possible while still solving your problem.

Level one is workflows, where you chain multiple LLM calls together. Level two is a single agent with tools, where the model makes decisions about what to do next. Level three is multi-agent systems, where you have multiple decision makers that need to coordinate with each other. The core principle is this: move right on this spectrum only when you absolutely have to. Each step to the right increases your costs, your latency, and your debugging complexity. More LLM calls mean more tokens, more traces to follow, and more places where things can go wrong. The best AI systems are the simplest ones that reliably solve the problem. That usually means starting with workflows.

SECTION 3: WHEN A WORKFLOW IS THE RIGHT ANSWER

Workflows are the right answer when your steps are known and stable. If the process is largely the same each time, regardless of input, a workflow is almost always the best choice because it is predictable, easy to test, easy to debug, and much cheaper than agent-based approaches. You can write unit tests for each step, trace exactly what happened when something goes wrong, and you’re not burning tokens on having the model figure out what to do next.

Consider a support ticket system. A ticket comes in. You classify it, route it to the right team, draft a response from templates and context, validate it against policy, and then send it. Each of these steps might involve an LLM call, but the model doesn’t need to decide whether to classify before routing; that’s always the order. This is a workflow, and building it as an agent would add overhead without adding capability.
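As a sketch, here is that ticket flow written as a plain workflow. The call_llm and send_reply helpers are hypothetical placeholders standing in for your model provider and ticketing system; every step is just a function called in a fixed order, so each one can be unit-tested and traced on its own.

```python
def call_llm(prompt: str) -> str:
    """Placeholder: replace with your model provider."""
    raise NotImplementedError

def send_reply(ticket_id: str, text: str) -> None:
    """Placeholder: replace with your ticketing system's API."""
    raise NotImplementedError

TEAMS = ("billing", "technical", "account")

def handle_ticket(ticket_id: str, body: str) -> None:
    # 1. Classify: an LLM call, but the ordering is decided by code, not the model.
    category = call_llm(f"Classify this ticket into one of {TEAMS}: {body}").strip().lower()

    # 2. Route deterministically, with a safe default.
    team = category if category in TEAMS else "technical"

    # 3. Draft a response from templates and ticket context.
    draft = call_llm(f"Draft a reply for the {team} team using our templates:\n{body}")

    # 4. Validate against policy, and repair once if needed.
    verdict = call_llm(f"Does this reply violate policy? Answer PASS or FAIL.\n{draft}")
    if "FAIL" in verdict.upper():
        draft = call_llm(f"Rewrite this reply so it complies with policy:\n{draft}")

    # 5. Send.
    send_reply(ticket_id, draft)
```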

SECTION 4: WHEN A SINGLE AGENT + TOOLS WINS

Sometimes the order of work is not fixed, and you genuinely can’t write down the steps in advance. Not because the task is impossibly complex, but because the path changes depending on what you discover along the way. Maybe the first API call fails, and you need to try an alternative. Maybe the data you retrieve is incomplete, and you need to ask for clarification. This is what agents handle well. But here’s the rule: start with one agent. A single agent with tools works best when tasks are tightly coupled and mostly sequential, when global context matters because step one affects step five, when you need fewer tools, and when budget or latency constraints push you toward minimal overhead.
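A sketch of that loop, again with a hypothetical call_llm placeholder and made-up tool names. The key difference from the workflow above is that results, including failures, are fed back into the context, so the model can change course: try a backup API, or ask the user for clarification.

```python
def call_llm(prompt: str) -> str:
    """Placeholder: replace with your model provider."""
    raise NotImplementedError

def primary_api(query: str) -> str:
    raise NotImplementedError("wire up the real integration")

def backup_api(query: str) -> str:
    raise NotImplementedError("wire up the real integration")

def ask_user(question: str) -> str:
    raise NotImplementedError("wire up the real integration")

TOOLS = {"primary_api": primary_api, "backup_api": backup_api, "ask_user": ask_user}

def run_agent(goal: str, max_steps: int = 8) -> str:
    transcript = f"Goal: {goal}\nTools: {list(TOOLS)}\n"
    for _ in range(max_steps):
        decision = call_llm(
            transcript + "Reply with 'TOOL <name> <input>' or 'DONE <final answer>'."
        )
        if decision.startswith("DONE"):
            return decision[len("DONE"):].strip()
        name, _, arg = decision.removeprefix("TOOL").strip().partition(" ")
        try:
            result = TOOLS[name](arg)
        except Exception as exc:  # a failed call is information, not a crash
            result = f"ERROR: {exc}"
        # Feed the observation back so the next decision can change course,
        # e.g. fall back to backup_api or ask the user for clarification.
        transcript += f"Called {name}({arg!r}) -> {result}\n"
    return "Stopped after max_steps without a final answer."
```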

Think of a marketing platform company that wants AI-assisted content generation for emails, SMS, and promotional messages. Their initial spec called for a multi-agent setup with a long list of specialized agents. An orchestrator. Request analysis. Content generation. Structure generation. Syntactic validation. Semantic validation. Spam prevention. Optimization. Security. Scoring. HTML normalization. Migration agents. Even comparison and analysis agents. On paper, it looked clean: specialists doing specialist work.

But a single agent works much better here because the tasks are tightly coupled and sequential. Template choice affects content, personalization depends on both the content and the contact data, and validation depends on the final output. Splitting that across multiple decision makers creates information silos and handoff errors because each agent only sees part of the picture. They also didn’t need parallelism; the flow was plan, generate, validate, and fix if needed.

You can still get specialists without extra agents. A tool can have its own system prompt and even use a different model specialized for its task. The validation tool can run its own LLM with instructions to catch errors. The personalization tool can have a validator and field reference system, so the model pulls correct syntax instead of inventing it. The SMS generation tool can treat character limits and spam-word detection as deterministic engineering constraints, not prompting problems. You keep one brain to maintain context and make final decisions, and the result is a system that is faster to build, cheaper to run, and easier to debug, with the same capabilities but without the coordination overhead.
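A sketch of what “specialists as tools” can look like, assuming a hypothetical call_llm(prompt, model=...) helper; the tool names, model names, limits, and spam words are illustrative. Each tool carries its own focused instructions and can use a different (often smaller) model, while deterministic constraints live in plain code rather than prompts.

```python
def call_llm(prompt: str, model: str = "general-model") -> str:
    """Placeholder: replace with your model provider; model names are illustrative."""
    raise NotImplementedError

def validate_content_tool(draft: str) -> str:
    # A specialist tool with its own system prompt and its own (smaller) model.
    system = "You are a strict validator. List policy, tone, or formatting problems, or reply OK."
    return call_llm(f"{system}\n\n{draft}", model="small-fast-model")

def personalize_tool(draft: str, contact: dict) -> str:
    # Hand the model the exact merge-field syntax instead of letting it invent one.
    fields = ", ".join(f"{{{{{k}}}}}" for k in contact)
    return call_llm(f"Insert only these merge fields where appropriate: {fields}\n\n{draft}")

MAX_SMS_CHARS = 160
SPAM_WORDS = {"winner", "act now", "100% free"}

def sms_guardrails_tool(text: str) -> str:
    # Character limits and spam words are deterministic engineering checks, not prompts.
    problems = []
    if len(text) > MAX_SMS_CHARS:
        problems.append(f"too long ({len(text)} chars, limit {MAX_SMS_CHARS})")
    problems += [f"spam word: {w!r}" for w in SPAM_WORDS if w in text.lower()]
    return "; ".join(problems) or "OK"
```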

SECTION 5: THE TOOL COUNT PROBLEM

As the tool list grows, tool selection gets harder. This is one of the main ways agent systems quietly break down, and one of the clearest signals that splitting into multiple agents might be worth it. Every tool you give an agent has a name, a description, and a schema that the model needs in context to use correctly, so the more tools you add, the more of your context budget you burn before the agent even starts thinking about your actual task. System instructions, few-shot examples, retrieved documents, and conversation history take up space, too.

That’s why a single agent tends to work best when your tasks need fewer than 10 to 20 tools. Past that threshold, tool selection degrades because the agent has to choose among too many options in a context that’s already packed. And context management can only trim history and retrieved content; it can’t reduce the tool schema load. The only approach that actually reduces how many tool definitions the model sees per call is splitting across agents: if one agent sees only email tools and another sees only validation tools, each call stays smaller and tool selection gets easier. This is often the real force pushing teams toward multi-agent architectures. Once you’re splitting tools across agents to keep calls small, you’re in multi-agent territory.
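A back-of-the-envelope sketch of that context cost, using made-up tool definitions and a crude ~4 characters per token heuristic. The exact numbers don’t matter; the point is that every tool’s name, description, and schema gets serialized into every call, and splitting the tool sets across agents is what shrinks that load.

```python
import json

def tool_schema(name: str, description: str) -> dict:
    # Shape loosely follows common tool-calling APIs: name, description, JSON schema.
    return {
        "name": name,
        "description": description,
        "parameters": {"type": "object", "properties": {"input": {"type": "string"}}},
    }

EMAIL_TOOLS = [
    tool_schema(f"email_step_{i}", "Generates or edits one part of an email campaign.")
    for i in range(12)
]
VALIDATION_TOOLS = [
    tool_schema(f"check_{i}", "Validates the draft against one rule or policy.")
    for i in range(10)
]

def rough_tokens(tools: list[dict]) -> int:
    return len(json.dumps(tools)) // 4  # crude heuristic, good enough for comparison

print("single agent, all 22 tools:", rough_tokens(EMAIL_TOOLS + VALIDATION_TOOLS), "schema tokens per call")
print("email agent only:          ", rough_tokens(EMAIL_TOOLS), "schema tokens per call")
print("validation agent only:     ", rough_tokens(VALIDATION_TOOLS), "schema tokens per call")
```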

SECTION 6: WHEN MULTI-AGENT IS ACTUALLY CORRECT

Multiple agents are justified for a few specific reasons. The first is true parallelism: if tasks are genuinely independent and you need them running simultaneously, multiple agents help. The second is context overload: when instructions, tools, retrieval, and history pack the context to the point where performance degrades, specialist agents operating in smaller, focused contexts can be the right move. The third is modularity or external integration: connecting with third-party agent systems you don’t control, or with reusable, self-contained components. The fourth is hard separation requirements, like security boundaries, compliance isolation, or sensitive data handling.

When we built our article generator for technical content, we started out building a single agent for research and writing, but had to pivot because the research phase is exploratory and dynamic while the writing phase is constrained and deterministic. Research needs flexibility and broad tool access across web search, YouTube transcription, GitHub scraping, and document processing. Writing needs focused constraints, consistent style enforcement, and iterative refinement against fixed rubrics. So we ended up with a multi-agent system built from two distinct agents: a research agent and a writer agent.

In the multi-agent system, the research agent searches, reads, pivots based on what it finds, searches again, and iterates based on human feedback about which directions to pursue. The writer agent follows style guides and formatting rules, with validation loops checking each section against specific criteria. The agents communicate through explicit artifacts: the research agent produces a structured research.md file that the writer agent consumes as context. There is no complex runtime orchestration, just a sequential handoff with a clear contract between them, and each agent has its own optimized context without the bloat of carrying the other’s tools and instructions.

If you do go multi-agent, the pattern that usually works isn’t everyone talking to everyone; it’s the orchestrator-worker pattern. One orchestrator maintains the main context, delegates specific tasks to worker agents, and then synthesizes the results. This prevents the information silos that kill multi-agent systems. Multi-agent systems can simplify individual contexts and enable parallelization and specialization, but they increase coordination costs: more token usage, added latency, more failure points, and handoff complexity. Only accept those costs when you’ve hit a real constraint that simpler architectures can’t solve.
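Here is roughly what the research-to-writer handoff described above looks like in code: a sketch with placeholder agents, where the only contract between the two is the research.md artifact. The file path and function names are illustrative, not our actual implementation.

```python
from pathlib import Path

def run_research_agent(topic: str) -> str:
    # Placeholder: in the real system this agent has broad tool access
    # (web search, YouTube transcription, GitHub scraping, document processing)
    # and iterates with human feedback before producing structured notes.
    return f"# Research notes: {topic}\n\n- (findings would go here)\n"

def run_writer_agent(research_notes: str, style_guide: str) -> str:
    # Placeholder: in the real system this agent drafts each section and loops
    # through validation against fixed rubrics until the checks pass.
    return f"Draft article based on notes ({len(research_notes)} chars) and the style guide."

def generate_article(topic: str, style_guide: str) -> str:
    notes_path = Path("research.md")

    # Stage 1: the research agent writes its output to an explicit artifact.
    notes_path.write_text(run_research_agent(topic), encoding="utf-8")

    # Stage 2: the writer agent consumes the artifact as context. It never sees
    # the research agent's tools or instructions, so its context stays small.
    return run_writer_agent(notes_path.read_text(encoding="utf-8"), style_guide)
```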

And as promised, we’ve turned this content and more into a complete cheatsheet covering what to build in each scenario, from simple workflows to multi-agent systems. Just download it for free on my website: links.louisbouchard.ai.

Thanks for reading. I’ll see you next time.