Why Single Agents Have Limits
A well-scoped AI agent does one thing reliably: qualifies leads, handles customer support, processes documents. The narrower the scope, the more reliable the behaviour. We've watched that pattern hold up across every project we've shipped.
The limit shows up when a business process needs multiple capabilities working together. A customer contacts your company. The message might be a support issue, a sales opportunity, a billing question, or a product return, and the right response in each case is completely different. A single agent trying to handle all four ends up either so broad that it becomes unreliable, or so tightly constrained that it can't handle most of what comes in.
Multi-agent systems are how you get out of that bind. Instead of one agent trying to do everything, you have multiple specialised agents each doing one thing well, with a coordinating layer that routes work to the right place and assembles the results. The trade-off is more moving parts to maintain, which we'll get into.
The Core Architecture: Coordinator and Specialists
Every multi-agent system has two types of agents:
The coordinator (orchestrator). Receives incoming requests, figures out what type of task it is, dispatches to the appropriate specialist, and assembles the result. The coordinator doesn't do the work — it manages the workflow.
Specialist agents. Each one is optimised for a specific task: customer support, lead qualification, order processing, document retrieval. They receive structured inputs from the coordinator, do their task reliably, and return structured outputs.
This separation produces a system where:
- Each specialist can be optimised, tested, and improved independently
- New capabilities can be added by building a new specialist and updating the routing
- A failure in one specialist can be contained instead of cascading to the others
- The system scales by adding more specialists or running them in parallel
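Here is a minimal sketch of the coordinator/specialist split in plain Python, with no framework. The `Task` and `Result` shapes, the two specialist functions and `keyword_classify` are hypothetical stand-ins. In a real system the classifier and the specialists would call an LLM or business tools.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    """Structured input the coordinator hands to a specialist (hypothetical shape)."""
    category: str
    message: str


@dataclass
class Result:
    """Structured output every specialist returns (hypothetical shape)."""
    handled_by: str
    reply: str
    data: dict = field(default_factory=dict)


# Specialists: each does one narrow job. Real ones would call an LLM or tools.
def support_specialist(task: Task) -> Result:
    return Result("support", f"Support ticket opened for: {task.message}")


def billing_specialist(task: Task) -> Result:
    return Result("billing", "Billing will review your invoice.")


class Coordinator:
    """Classifies, dispatches and returns; it never does the specialist work itself."""

    def __init__(self, classify: Callable[[str], str]):
        self.classify = classify
        self.specialists: dict[str, Callable[[Task], Result]] = {}

    def register(self, category: str, specialist: Callable[[Task], Result]) -> None:
        # Adding a capability = building a specialist and registering a route.
        self.specialists[category] = specialist

    def handle(self, message: str) -> Result:
        category = self.classify(message)
        specialist = self.specialists.get(category)
        if specialist is None:
            # No specialist for this category: hand off rather than guess.
            return Result("human", "Passing you to a member of the team.")
        return specialist(Task(category, message))


def keyword_classify(message: str) -> str:
    # Hypothetical stand-in for an LLM classifier.
    return "BILLING" if "invoice" in message.lower() else "SUPPORT"


coordinator = Coordinator(keyword_classify)
coordinator.register("SUPPORT", support_specialist)
coordinator.register("BILLING", billing_specialist)
print(coordinator.handle("Where is my invoice?").handled_by)  # billing
```

In this sketch, adding a returns specialist means writing one function and making one `register` call. Nothing else in the coordinator changes.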
When to Build Multi-Agent vs Single Agent
Use a single agent when:
- The workflow is genuinely one type of task (all customer support, all lead qualification)
- The volume is manageable by one agent
- The edge cases are predictable and few
Use a multi-agent system when:
- A single entry point needs to handle genuinely different types of requests that require different capabilities
- A complex workflow has sequential steps where different expertise is needed at each step
- Different parts of the workflow have different reliability requirements or tool access needs
- You need parallel processing — multiple tasks happening simultaneously rather than sequentially
If you can solve it with a single agent and tighter prompts, do that first. Multi-agent systems are more powerful, and they are also more expensive to maintain.
The Routing Layer
The coordinator's routing decision is the most critical component. Get routing wrong and the whole system feels broken even when every specialist is doing its job perfectly.
Classification-Based Routing
The coordinator uses an LLM to classify the incoming request into one of a defined set of categories. Each category maps to a specialist agent.
```python
ROUTING_PROMPT = """
Classify this customer message into exactly one category:
- SUPPORT: Technical issues, product problems, how-to questions
- BILLING: Payment, invoices, subscription, charges
- SALES: Pricing, upgrades, new features, purchasing
- RETURNS: Refunds, returns, exchanges
- OTHER: Anything that doesn't fit the above
Message: {message}
Respond with only the category name.
"""
```
Classification routing works well when categories are distinct. It struggles when requests fall into multiple categories at once, or when users phrase things ambiguously — which they do, constantly.
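The model doesn't always follow "respond with only the category name", so it is worth validating its reply before routing on it. `parse_category` below is a hypothetical helper and assumes the raw model output arrives as a string. It tolerates case and punctuation noise. A reply naming zero categories, or several, goes to OTHER instead of being guessed.

```python
VALID_CATEGORIES = {"SUPPORT", "BILLING", "SALES", "RETURNS", "OTHER"}


def parse_category(raw_output: str) -> str:
    """Normalise the classifier's reply to exactly one known category.

    Models add punctuation, change case, or answer with a sentence.
    Anything that does not map cleanly to one category falls back to OTHER.
    """
    cleaned = raw_output.strip().strip(".!\"'`").upper()
    if cleaned in VALID_CATEGORIES:
        return cleaned
    # Accept a reply that mentions exactly one valid category, e.g. "Category: BILLING".
    words = {w.strip(".,:;!\"'`()") for w in cleaned.split()}
    matches = words & VALID_CATEGORIES
    if len(matches) == 1:
        return matches.pop()
    # Ambiguous ("SUPPORT or BILLING") or unrecognised: don't guess.
    return "OTHER"
```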
Keyword and Rule-Based Routing
For high-confidence routing on specific triggers, rule-based routing is faster and more reliable than LLM classification. "Order #12345" routes to the order status agent. An email from a known partner domain routes to the relevant agent.
In practice, most production multi-agent systems we build use both: rule-based routing for high-confidence cases, LLM classification for everything else.
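Here is a sketch of that hybrid. It assumes that both the rules and the classifier return a destination agent name. The order-number pattern, the partner domain and the agent names are illustrative placeholders. `llm_classify` stands in for whatever LLM call you use.

```python
import re
from typing import Callable, Optional

# Illustrative high-confidence triggers, not a recommended rule set.
ORDER_NUMBER = re.compile(r"\border\s*#?\d{4,}\b", re.IGNORECASE)
PARTNER_DOMAINS = {"partner.example.com"}


def rule_route(message: str, sender_email: str) -> Optional[str]:
    """Return a destination when a deterministic rule fires, else None."""
    if ORDER_NUMBER.search(message):
        return "order_status_agent"
    domain = sender_email.rsplit("@", 1)[-1].lower()
    if domain in PARTNER_DOMAINS:
        return "partner_agent"
    return None


def hybrid_route(message: str, sender_email: str,
                 llm_classify: Callable[[str], str]) -> str:
    # Cheap, predictable rules first; the LLM only sees what they miss.
    destination = rule_route(message, sender_email)
    if destination is not None:
        return destination
    # Hypothetical: llm_classify returns an agent name as well.
    return llm_classify(message)
```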
Confidence Thresholds
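When the classifier is uncertain, the system should not route to whichever specialist won by 0.51 vs 0.49. This requires the classifier to produce a confidence score alongside its label, for example from token log-probabilities where the model API exposes them. Low-confidence classifications go to a generalised handler or escalate to a human rather than making a potentially wrong routing decision. This single rule prevents a lot of bad outcomes.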
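Here is a minimal sketch of the threshold check. It assumes the classifier gives you a probability per category. The 0.8 threshold and 0.2 margin are placeholder values, not recommendations, and should be tuned against labelled traffic.

```python
from typing import Dict

CONFIDENCE_THRESHOLD = 0.8  # placeholder: minimum score for the top category
MIN_MARGIN = 0.2            # placeholder: required gap between the top two


def route_with_confidence(scores: Dict[str, float]) -> str:
    """Pick a category only when the classifier is clearly sure.

    `scores` maps category -> probability. How you obtain these depends on
    your model API (an assumption of this sketch).
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_category, top_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if top_score < CONFIDENCE_THRESHOLD or top_score - runner_up < MIN_MARGIN:
        return "ESCALATE"  # generalised handler or human review
    return top_category
```

The 0.51 vs 0.49 case fails both checks and escalates. A confident 0.92 for BILLING routes straight to the billing specialist.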
Sequential vs Parallel Execution
Sequential Pipelines
Some workflows are inherently sequential: step 2 depends on the output of step 1.
A document processing pipeline might work sequentially:
- Extraction agent: Extract structured data from the document
- Validation agent: Check the extracted data against business rules
- Routing agent: Determine which department the validated data should go to
- Notification agent: Send the appropriate notifications
Each agent receives the previous agent's output, processes it, and passes to the next. The coordinator manages the sequence and handles failures at each step.
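Here is a sketch of a coordinator running that pipeline in order. The four agents are replaced by hypothetical stand-ins. The important part is in `run_pipeline`: when a step fails, the error names that step. The coordinator can then retry, fall back, or escalate from a known point instead of passing bad data downstream.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[dict], dict]]


class PipelineError(Exception):
    """Raised when a step fails; records which step so the coordinator can react."""

    def __init__(self, step: str, cause: Exception):
        super().__init__(f"step '{step}' failed: {cause}")
        self.step = step
        self.cause = cause


def run_pipeline(steps: List[Step], payload: dict) -> dict:
    """Run steps in order; each step receives the previous step's output."""
    for name, step in steps:
        try:
            payload = step(payload)
        except Exception as exc:  # stop here: later steps depend on this output
            raise PipelineError(name, exc) from exc
    return payload


# Hypothetical stand-ins for the extraction, validation, routing and notification agents.
def extract(doc: dict) -> dict:
    return {"invoice_total": float(doc["text"].split("total:")[1])}


def validate(data: dict) -> dict:
    if data["invoice_total"] <= 0:
        raise ValueError("total must be positive")
    return {**data, "valid": True}


def route(data: dict) -> dict:
    return {**data, "department": "finance"}


def notify(data: dict) -> dict:
    return {**data, "notified": [data["department"]]}


PIPELINE = [("extract", extract), ("validate", validate),
            ("route", route), ("notify", notify)]
result = run_pipeline(PIPELINE, {"text": "invoice total: 120.50"})
print(result["department"])  # finance
```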
Parallel Execution
When multiple tasks can run simultaneously without dependencies, parallel execution cuts latency dramatically. The total wait approaches the duration of the slowest task rather than the sum of all of them.
A research workflow might run in parallel:
- Web search agent: Searches for recent news on the topic
- Database agent: Retrieves internal records related to the topic
- Document agent: Searches the knowledge base for relevant content
All three run at the same time. The coordinator waits for all of them, then passes their combined output to a synthesis agent that assembles the final response.
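LangGraph handles parallel execution natively: nodes that fan out from the same node run concurrently, and the next node can wait for all of them. Basic LangChain chains run their steps in sequence.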
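The same fan-out can be sketched without a framework using Python's `asyncio`. The three branch functions are hypothetical stand-ins that just sleep. The knowledge-base branch deliberately hangs to show a per-branch timeout. `asyncio.wait_for` cancels that branch, and the coordinator gets `None` for that source instead of waiting indefinitely.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional


# Hypothetical research branches; real ones would call search APIs, a database, etc.
async def web_search(topic: str) -> str:
    await asyncio.sleep(0.05)
    return f"news about {topic}"


async def database_lookup(topic: str) -> str:
    await asyncio.sleep(0.05)
    return f"records about {topic}"


async def knowledge_base(topic: str) -> str:
    await asyncio.sleep(5)  # simulate a branch that hangs
    return f"docs about {topic}"


async def run_branch(fn: Callable[[str], Awaitable[str]], topic: str,
                     timeout: float) -> Optional[str]:
    try:
        return await asyncio.wait_for(fn(topic), timeout)
    except asyncio.TimeoutError:
        return None  # record the gap instead of failing the whole request


async def research(topic: str, timeout: float = 0.5) -> Dict[str, Optional[str]]:
    branches = {"web": web_search, "database": database_lookup, "kb": knowledge_base}
    # All branches start together, so total latency is bounded by the slowest
    # branch (or the timeout), not the sum of all three.
    outputs = await asyncio.gather(
        *(run_branch(fn, topic, timeout) for fn in branches.values())
    )
    return dict(zip(branches, outputs))


results = asyncio.run(research("pricing"))
print(results)  # {'web': 'news about pricing', 'database': 'records about pricing', 'kb': None}
```

A synthesis agent would then receive `results` and can tell the user which source was unavailable.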
State Management
Multi-agent systems need careful state management. Each agent needs to know:
- What the original request was
- What previous agents have done
- What context is relevant to its task
- What it should return
Shared State Object
Pass a structured state object through the system that each agent reads from and writes to:
```python
from typing import Optional, TypedDict


class AgentState(TypedDict):
    original_message: str
    customer_id: str
    classification: str
    classification_confidence: float
    support_result: Optional[dict]
    billing_result: Optional[dict]
    final_response: Optional[str]
    escalation_required: bool
    escalation_reason: Optional[str]
```
Every agent receives this state, does its work, and returns an updated version. The coordinator reads the state to make routing decisions.
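One way to stop agents overwriting each other's work is to have each agent return only the fields it owns and let the coordinator merge them. LangGraph nodes follow the same pattern: a node returns a dict of updated keys, and the graph merges those keys into the state. The plain-Python sketch below applies the pattern to the `AgentState` above. `billing_agent` and `apply_update` are hypothetical names.

```python
from typing import Optional, TypedDict


class AgentState(TypedDict):
    original_message: str
    customer_id: str
    classification: str
    classification_confidence: float
    support_result: Optional[dict]
    billing_result: Optional[dict]
    final_response: Optional[str]
    escalation_required: bool
    escalation_reason: Optional[str]


def billing_agent(state: AgentState) -> dict:
    """Read what it needs from the state; return only the fields it owns."""
    # Hypothetical lookup; a real agent would call a billing API here.
    result = {"customer_id": state["customer_id"], "open_invoices": 1}
    return {"billing_result": result}


def apply_update(state: AgentState, update: dict) -> AgentState:
    # Merge without mutating, so each step's input stays inspectable in logs.
    return {**state, **update}  # type: ignore[return-value]


state: AgentState = {
    "original_message": "Why was I charged twice?",
    "customer_id": "c-42",
    "classification": "BILLING",
    "classification_confidence": 0.93,
    "support_result": None,
    "billing_result": None,
    "final_response": None,
    "escalation_required": False,
    "escalation_reason": None,
}
state = apply_update(state, billing_agent(state))
print(state["billing_result"])  # {'customer_id': 'c-42', 'open_invoices': 1}
```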
Memory Across Turns
For conversational multi-agent systems, each turn needs access to the conversation history. Store conversation history separately from the task state:
- Task state: The structured data flowing through the current workflow
- Conversation memory: The full history of the conversation for context
Mixing the two is one of the most common architectural mistakes we see. It looks fine at small scale and gets impossible to debug at any meaningful volume.
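Here is a minimal sketch of keeping the two apart, using hypothetical `ConversationMemory` and `TaskState` classes. The keyword check inside `handle_turn` is a stand-in for the real coordinator. Agents read from memory for context but write only to a task state that is created fresh on every turn.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConversationMemory:
    """Lives across turns: the full transcript for one conversation."""
    turns: List[dict] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.turns.append({"role": role, "text": text})

    def recent(self, n: int = 6) -> List[dict]:
        return self.turns[-n:]


@dataclass
class TaskState:
    """Lives for one turn: structured data for the current workflow only."""
    message: str
    classification: Optional[str] = None
    result: Optional[dict] = None


def handle_turn(memory: ConversationMemory, message: str) -> TaskState:
    memory.add("user", message)
    task = TaskState(message=message)  # fresh every turn; never reused
    # Hypothetical stand-in for routing; agents read memory, write only to `task`.
    task.classification = "BILLING" if "charge" in message.lower() else "SUPPORT"
    task.result = {"context_turns": len(memory.recent())}
    memory.add("assistant", f"Routed to {task.classification}")
    return task


memory = ConversationMemory()
first = handle_turn(memory, "The app won't load")
second = handle_turn(memory, "Also, why this charge?")
print(second.classification, len(memory.turns))  # BILLING 4
```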
Error Handling and Fallbacks
Production multi-agent systems fail in ways single agents don't. A specialist might fail. The coordinator might misclassify. A parallel branch might time out while others complete. Every multi-agent system needs explicit error handling — assuming it'll work is how you end up with bizarre customer experiences nobody can reproduce.
Specialist failure: If the routed specialist fails, fall back to a generalised handler and log the failure. Never surface a technical error to the end user.
Timeout handling: Parallel branches have individual timeouts. If one branch times out, complete with the results that are available and note the gap.
Misclassification recovery: If a specialist receives a query it can't handle, it returns a structured signal indicating misclassification. The coordinator reroutes or escalates.
Circuit breakers: If a specialist fails repeatedly, stop routing to it and alert the operations team. A broken specialist producing errors at scale is worse than no specialist at all — at least with no specialist, the coordinator escalates cleanly.
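The specialist-failure and circuit-breaker rules can be combined in one wrapper. This sketch makes simple assumptions: failures are counted per specialist in-process, and the thresholds are placeholders. The `logger.error` call marks where a real system would alert the operations team. `CircuitBreaker`, `call_specialist` and the demo functions are hypothetical names.

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("coordinator")


class CircuitBreaker:
    """Stop calling a specialist after `max_failures` consecutive errors.

    After `reset_after` seconds one call is let through again; a success
    closes the breaker, a failure re-opens it.
    """

    def __init__(self, max_failures: int = 3, reset_after: float = 60.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        return time.monotonic() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()
            logger.error("circuit opened after %d failures", self.failures)  # alert ops here


def call_specialist(specialist: Callable[[str], str], breaker: CircuitBreaker,
                    fallback: Callable[[str], str], message: str) -> str:
    if not breaker.allow():
        return fallback(message)  # broken specialist: skip straight to the fallback
    try:
        reply = specialist(message)
    except Exception:
        logger.exception("specialist failed")  # log the detail; never show it to the user
        breaker.record_failure()
        return fallback(message)
    breaker.record_success()
    return reply


def flaky_billing(message: str) -> str:
    raise RuntimeError("billing API unavailable")  # simulated outage


def generalised_handler(message: str) -> str:
    return "Thanks, a member of the team will follow up shortly."


breaker = CircuitBreaker(max_failures=2)
replies = [call_specialist(flaky_billing, breaker, generalised_handler, "refund?")
           for _ in range(3)]
print(breaker.allow())  # False: the third call went straight to the fallback
```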
LangGraph: The Right Tool for Multi-Agent Coordination
LangGraph's graph-based execution model is designed for multi-agent coordination. Nodes are agents or processing steps. Edges are routing decisions. The graph executor handles parallel execution, state management, and conditional routing.
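```python
from langgraph.graph import StateGraph, START, END

# classify_request, handle_support, handle_billing, synthesise_response and
# route_to_specialist are defined elsewhere. Each node function takes the
# AgentState and returns a dict of the fields it updates.
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("classifier", classify_request)
workflow.add_node("support_agent", handle_support)
workflow.add_node("billing_agent", handle_billing)
workflow.add_node("synthesiser", synthesise_response)

# Every request enters through the classifier
workflow.add_edge(START, "classifier")

# Add routing: route_to_specialist must return one of these keys.
# SALES and RETURNS specialists would be added the same way.
workflow.add_conditional_edges(
    "classifier",
    route_to_specialist,
    {
        "SUPPORT": "support_agent",
        "BILLING": "billing_agent",
        "OTHER": END,
    },
)
workflow.add_edge("support_agent", "synthesiser")
workflow.add_edge("billing_agent", "synthesiser")
workflow.add_edge("synthesiser", END)

app = workflow.compile()
```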
LangGraph handles the execution graph, parallel branches, state passing, and conditional routing — the scaffolding that would otherwise need significant custom engineering. It's not the only way to build multi-agent systems, but in our work it's been the path of least resistance for anything beyond two or three agents.
Testing Multi-Agent Systems
Multi-agent systems are harder to test than single agents because failures can happen at the routing layer, within any specialist, or at the assembly layer. Bugs in one place look like bugs in another.
Test each layer independently:
- Test the classifier on a diverse set of inputs to verify routing accuracy
- Test each specialist independently with the range of inputs it might see
- Test the full system end-to-end with integration test cases
- Test error handling by deliberately failing individual components
Document the expected routing decision for every test case. Routing changes are the most common source of regression in multi-agent systems — and the hardest to spot without explicit checks.
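Here is a sketch of pinning expected routing decisions, so that a prompt or model change that moves a case to a different specialist fails loudly. `keyword_classifier` is a hypothetical stand-in that lets the example run offline. In practice you would pass your real classifier and build the cases from your own traffic.

```python
from typing import Callable, List, Tuple

# Each case pins the routing decision we expect (illustrative messages).
ROUTING_CASES: List[Tuple[str, str]] = [
    ("My card was charged twice this month", "BILLING"),
    ("The export button does nothing", "SUPPORT"),
    ("Can I upgrade to the team plan?", "SALES"),
    ("I want to send this back for a refund", "RETURNS"),
]


def check_routing(classify: Callable[[str], str],
                  cases: List[Tuple[str, str]]) -> List[str]:
    """Return a readable line for every case whose route changed."""
    failures = []
    for message, expected in cases:
        actual = classify(message)
        if actual != expected:
            failures.append(f"{message!r}: expected {expected}, got {actual}")
    return failures


def keyword_classifier(message: str) -> str:
    # Hypothetical stand-in for the real LLM classifier.
    text = message.lower()
    if "refund" in text or "send this back" in text:
        return "RETURNS"
    if "charged" in text or "invoice" in text:
        return "BILLING"
    if "upgrade" in text or "plan" in text:
        return "SALES"
    return "SUPPORT"


failures = check_routing(keyword_classifier, ROUTING_CASES)
assert not failures, "\n".join(failures)
```

A wording change in the routing prompt that flips one of these cases shows up as a named failure, not as a vague drop in quality.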
When Multi-Agent Systems Are Overkill
Not every complex use case needs a multi-agent system. Before reaching for the architectural complexity, verify:
- Is the routing genuinely necessary, or can a single agent handle the variety of inputs with better prompt engineering?
- Is the parallel execution performance gain actually worth the complexity?
- Does the team have the capacity to maintain multiple agents instead of one?
A well-designed single agent with clear scope and good escalation handling is more maintainable than a complex multi-agent system with unclear routing logic. We've talked clients out of multi-agent architectures more than once. Build multi-agent when the single-agent approach has clearly hit a ceiling — not before.
Talk to us about your architecture — we build and maintain multi-agent systems in production and can help you assess honestly whether you actually need one.