Multi-Agent Orchestration in LangGraph: Supervisor vs Swarm, Tradeoffs and Architecture
Build multi-agent systems in LangGraph using supervisor and swarm patterns. Compare routing accuracy, latency, and real production tradeoffs, with implementation details and failure modes you’ll actually hit.
Mar 25, 2026

A customer opens a support chat: "I want to upgrade my plan, but first I need help fixing my SSO, it's been broken since last Tuesday. Also, can you waive the setup fee?"
That message touches three domains: billing, technical support, and account management. A single-agent system either needs every tool and prompt crammed into one context window, or it punts the message to a human. Neither scales.
Multi-agent orchestration splits the work across specialist agents, each with its own tools, prompts, and expertise. The question is how you route between them. LangGraph gives you two patterns: Supervisor and Swarm. We built both for the same customer service scenario and measured them head to head.
The supervisor is more accurate because routing is its only job, a dedicated LLM call with a focused prompt. The swarm is faster because it skips the intermediary. The right choice depends on whether your bottleneck is latency or misroutes.
The Two Multi-Agent Orchestration Patterns
Supervisor: A central orchestrator receives every message, classifies intent, and routes to the appropriate specialist. After the specialist responds, control returns to the supervisor, which decides whether to route again or end.
```
                        ┌→ [Billing Agent] ─┐
[User] → [Supervisor] ──┼→ [Tech Support] ──┼→ [Supervisor] → [Response]
                        └→ [Account Mgmt] ──┘
```
Swarm: No central orchestrator. The first agent receives the message. If it can handle the request, it responds directly. If not, it hands off to the appropriate specialist using a Command — no return trip through a supervisor.
```
[User] → [Triage Agent] ──→ [Billing Agent] ──→ [Response]
               │                    │
               ├──→ [Tech Support] ─┘
               └──→ [Account Mgmt]
```
The structural difference: in supervisor mode, every agent interaction passes through the supervisor (2 LLM calls per domain). In swarm mode, agents route directly to each other (1 LLM call per domain after the first). For a request that spans two domains, that's 4 LLM calls vs. 2.
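The back-of-envelope arithmetic can be made explicit. A minimal sketch that counts one LLM call per routing decision and one per specialist turn, ignoring each specialist's internal tool-calling loop (the function name is ours, not part of either graph):

```python
def llm_calls(domains: int, pattern: str) -> int:
    """Rough LLM call count for a request spanning `domains` domains.

    Counts one call per routing decision plus one per specialist turn;
    ignores each specialist's internal tool-calling loop.
    """
    if pattern == "supervisor":
        # Every specialist turn is preceded by a supervisor routing call.
        return 2 * domains
    if pattern == "swarm":
        # After the first turn, agents hand off directly: one call per domain.
        return domains
    raise ValueError(f"unknown pattern: {pattern}")

print(llm_calls(2, "supervisor"))  # 4
print(llm_calls(2, "swarm"))       # 2
```

The gap widens linearly with the number of domains a request touches, which is why the swarm's latency advantage compounds on multi-domain requests.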
Pattern 1: The Supervisor
The supervisor is a dedicated routing node that uses structured output to decide which specialist handles the next step. Build it from scratch with StateGraph — you want to see every routing decision in your traces.
State
```python
import operator
from typing import Annotated

from langgraph.graph import MessagesState


class CustomerServiceState(MessagesState):
    current_agent: str
    resolution_notes: Annotated[list[str], operator.add]
```
current_agent tracks which specialist is active. resolution_notes uses the operator.add reducer so multiple agents can append without clobbering — same pattern as parallel sub-agents.
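To see why the reducer matters, here's what the merge effectively does when two nodes both return a resolution_notes update — a minimal illustration of the reducer semantics, not LangGraph's actual internals:

```python
import operator

# Updates returned by two different specialist nodes in the same run:
update_a = {"resolution_notes": ["Billing: applied 10% discount"]}
update_b = {"resolution_notes": ["Tech Support: regenerated SAML cert"]}

# Without a reducer, last write wins and update_a's note is lost.
# With operator.add as the reducer, the lists are concatenated:
merged = operator.add(update_a["resolution_notes"], update_b["resolution_notes"])
print(merged)
# ['Billing: applied 10% discount', 'Tech Support: regenerated SAML cert']
```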
Specialist Agents
Three specialist agents, each with a focused system prompt and domain-specific tools. Each is built with create_agent from langchain.agents and is a self-contained tool-calling loop.
```python
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool
from langchain.agents import create_agent
from langsmith import traceable

llm = ChatAnthropic(model="claude-sonnet-4-5-20250929", temperature=0)


@tool
def lookup_billing_info(customer_id: str) -> str:
    """Look up billing information for a customer."""
    return (
        f"Customer {customer_id}: Enterprise plan, $2,400/mo, "
        f"next billing date 2026-03-01, payment method: invoice."
    )


@tool
def apply_discount(customer_id: str, discount_percent: int) -> str:
    """Apply a discount to a customer's account."""
    return f"Applied {discount_percent}% discount to customer {customer_id}."


@tool
def diagnose_sso(customer_id: str, error_code: str) -> str:
    """Diagnose SSO integration issues."""
    return (
        f"SSO diagnosis for {customer_id}: Error {error_code} indicates "
        f"SAML certificate expiration. Certificate expired 2026-02-04. "
        f"Resolution: regenerate SAML certificate in IdP and re-upload."
    )


@tool
def check_system_status(service: str) -> str:
    """Check the status of a service."""
    return f"Service {service}: operational, 99.97% uptime last 30 days."


@tool
def lookup_account_details(customer_id: str) -> str:
    """Look up account details and plan information."""
    return (
        f"Customer {customer_id}: Enterprise plan since 2024-06, "
        f"5 seats, primary contact: jane@example.com, "
        f"account manager: Sarah Chen."
    )


@tool
def update_plan(customer_id: str, new_plan: str) -> str:
    """Update a customer's plan."""
    return f"Plan updated for {customer_id}: now on {new_plan}."


billing_agent = create_agent(
    llm,
    tools=[lookup_billing_info, apply_discount],
    system_prompt="You are a billing specialist. Help customers with invoices, "
    "payments, discounts, and plan pricing. Be precise with numbers. "
    "Customer ID is 'C-1042' unless otherwise specified.",
)

tech_agent = create_agent(
    llm,
    tools=[diagnose_sso, check_system_status],
    system_prompt="You are a technical support specialist. Help customers diagnose "
    "and resolve technical issues. Provide specific remediation steps. "
    "Customer ID is 'C-1042' unless otherwise specified.",
)

account_agent = create_agent(
    llm,
    tools=[lookup_account_details, update_plan],
    system_prompt="You are an account management specialist. Help customers with "
    "plan changes, upgrades, and account administration. "
    "Customer ID is 'C-1042' unless otherwise specified.",
)
```
The Supervisor Node
The supervisor uses structured output to pick the next agent. No regex parsing, no string matching: the LLM returns a Pydantic model.
```python
from pydantic import BaseModel, Field
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage


class RoutingDecision(BaseModel):
    next_agent: str = Field(
        description="The next agent to handle the request: "
        "'billing', 'tech_support', 'account', or 'DONE'"
    )
    reasoning: str = Field(description="Why this agent was chosen")


routing_llm = llm.with_structured_output(RoutingDecision)


@traceable(name="supervisor", run_type="chain")
def supervisor(state: CustomerServiceState) -> dict:
    response = routing_llm.invoke([
        SystemMessage(
            content="You are a customer service supervisor. Analyze the "
            "conversation and decide which specialist should handle "
            "the next part of the request.\n\n"
            "Available agents:\n"
            "- billing: invoices, payments, discounts, pricing\n"
            "- tech_support: technical issues, SSO, integrations, bugs\n"
            "- account: plan changes, upgrades, account administration\n"
            "- DONE: the customer's request has been fully addressed\n\n"
            "If multiple domains are involved, handle them one at a time. "
            "Route to the most urgent or blocking issue first."
        ),
        *state["messages"],
    ])
    return {"current_agent": response.next_agent}
```
Specialist Wrappers
Each specialist node invokes its agent and appends a resolution note for the audit trail.
```python
@traceable(name="billing_node", run_type="chain")
def billing_node(state: CustomerServiceState) -> dict:
    result = billing_agent.invoke({"messages": state["messages"]})
    return {
        "messages": result["messages"][-1:],
        "resolution_notes": [f"Billing: {result['messages'][-1].content[:200]}"],
    }


@traceable(name="tech_support_node", run_type="chain")
def tech_support_node(state: CustomerServiceState) -> dict:
    result = tech_agent.invoke({"messages": state["messages"]})
    return {
        "messages": result["messages"][-1:],
        "resolution_notes": [f"Tech Support: {result['messages'][-1].content[:200]}"],
    }


@traceable(name="account_node", run_type="chain")
def account_node(state: CustomerServiceState) -> dict:
    result = account_agent.invoke({"messages": state["messages"]})
    return {
        "messages": result["messages"][-1:],
        "resolution_notes": [f"Account: {result['messages'][-1].content[:200]}"],
    }
```
Graph Assembly
The routing function reads current_agent from state and directs traffic. After each specialist finishes, control returns to the supervisor for the next routing decision.
```python
from langgraph.graph import StateGraph, START, END


def route_to_agent(state: CustomerServiceState) -> str:
    agent = state.get("current_agent", "DONE")
    if agent == "DONE":
        return "end"
    return agent


builder = StateGraph(CustomerServiceState)
builder.add_node("supervisor", supervisor)
builder.add_node("billing", billing_node)
builder.add_node("tech_support", tech_support_node)
builder.add_node("account", account_node)

builder.add_edge(START, "supervisor")
builder.add_conditional_edges(
    "supervisor",
    route_to_agent,
    {
        "billing": "billing",
        "tech_support": "tech_support",
        "account": "account",
        "end": END,
    },
)
builder.add_edge("billing", "supervisor")
builder.add_edge("tech_support", "supervisor")
builder.add_edge("account", "supervisor")

supervisor_graph = builder.compile()
```
Run it:
```python
result = supervisor_graph.invoke({
    "messages": [HumanMessage(
        content="I want to upgrade my plan, but first I need help fixing "
        "my SSO — it's been broken since last Tuesday. "
        "Also, can you waive the setup fee?"
    )],
    "current_agent": "",
    "resolution_notes": [],
})

for msg in result["messages"]:
    if isinstance(msg, AIMessage):
        print(f"Agent: {msg.content[:150]}...")
        print()
```
Pattern 2: The Swarm
The swarm eliminates the supervisor. Agents hand off directly to each other using Command objects returned from handoff tools. When an agent calls a handoff tool, Command(goto=..., graph=Command.PARENT) tells LangGraph to navigate to a different node in the parent graph.
Handoff Tools
Each agent gets handoff tools for the agents it can transfer to. The tool returns a Command that updates the parent graph's state and redirects execution.
```python
from langgraph.types import Command


def make_handoff_tool(target_agent: str, description: str):
    """Factory that creates a handoff tool for transferring to another agent."""

    @tool(f"transfer_to_{target_agent}")
    def handoff(reason: str) -> Command:
        """Transfer the conversation to another specialist agent."""
        return Command(
            goto=target_agent,
            update={"current_agent": target_agent},
            graph=Command.PARENT,
        )

    # Overwriting __doc__ after decoration has no effect on the tool's
    # schema; set the StructuredTool's description attribute instead.
    handoff.description = description
    return handoff


transfer_to_billing = make_handoff_tool(
    "billing",
    "Transfer to the billing specialist for invoices, payments, or discounts.",
)
transfer_to_tech = make_handoff_tool(
    "tech_support",
    "Transfer to technical support for SSO, integrations, or system issues.",
)
transfer_to_account = make_handoff_tool(
    "account",
    "Transfer to account management for plan changes or upgrades.",
)
```
Specialist Agents with Handoffs
Each specialist gets its own tools plus handoff tools for the other specialists. The triage agent has no domain tools — it only routes.
```python
triage_agent = create_agent(
    llm,
    tools=[transfer_to_billing, transfer_to_tech, transfer_to_account],
    system_prompt="You are a customer service triage agent. Analyze the customer's "
    "request and transfer to the appropriate specialist. "
    "Do not try to answer questions yourself — always transfer. "
    "If multiple issues exist, transfer to the most urgent one first.",
)

billing_swarm_agent = create_agent(
    llm,
    tools=[lookup_billing_info, apply_discount, transfer_to_tech, transfer_to_account],
    system_prompt="You are a billing specialist. Help with invoices, payments, and "
    "discounts. If the customer has unresolved issues outside your "
    "domain, transfer to the appropriate specialist. "
    "Customer ID is 'C-1042' unless otherwise specified.",
)

tech_swarm_agent = create_agent(
    llm,
    tools=[diagnose_sso, check_system_status, transfer_to_billing, transfer_to_account],
    system_prompt="You are a technical support specialist. Help with technical "
    "issues, SSO, and integrations. If the customer has unresolved "
    "issues outside your domain, transfer to the appropriate specialist. "
    "Customer ID is 'C-1042' unless otherwise specified.",
)

account_swarm_agent = create_agent(
    llm,
    tools=[lookup_account_details, update_plan, transfer_to_billing, transfer_to_tech],
    system_prompt="You are an account management specialist. Help with plan changes "
    "and upgrades. If the customer has unresolved issues outside your "
    "domain, transfer to the appropriate specialist. "
    "Customer ID is 'C-1042' unless otherwise specified.",
)
```
Swarm Graph Assembly
The swarm graph wires each agent as a node with conditional routing. The key difference from the supervisor: after each agent runs, we check if it returned a Command (handoff) or an AIMessage without tool calls (done).
```python
from typing import Literal


@traceable(name="triage_node", run_type="chain")
def triage_node(state: CustomerServiceState) -> dict:
    # Handoffs out of triage happen via Command.PARENT from inside the
    # agent's handoff tools, so this node needs no conditional edges.
    result = triage_agent.invoke({"messages": state["messages"]})
    return result


@traceable(name="billing_swarm_node", run_type="chain")
def billing_swarm_node(state: CustomerServiceState) -> dict:
    result = billing_swarm_agent.invoke({"messages": state["messages"]})
    return {
        "messages": result["messages"][-1:],
        "resolution_notes": [f"Billing: {result['messages'][-1].content[:200]}"],
    }


@traceable(name="tech_swarm_node", run_type="chain")
def tech_swarm_node(state: CustomerServiceState) -> dict:
    result = tech_swarm_agent.invoke({"messages": state["messages"]})
    return {
        "messages": result["messages"][-1:],
        "resolution_notes": [f"Tech Support: {result['messages'][-1].content[:200]}"],
    }


@traceable(name="account_swarm_node", run_type="chain")
def account_swarm_node(state: CustomerServiceState) -> dict:
    result = account_swarm_agent.invoke({"messages": state["messages"]})
    return {
        "messages": result["messages"][-1:],
        "resolution_notes": [f"Account: {result['messages'][-1].content[:200]}"],
    }


def route_after_agent(
    state: CustomerServiceState,
) -> Literal["billing", "tech_support", "account", "__end__"]:
    messages = state.get("messages", [])
    if messages:
        last_msg = messages[-1]
        if isinstance(last_msg, AIMessage) and not last_msg.tool_calls:
            return "__end__"
    current = state.get("current_agent", "")
    if current in ("billing", "tech_support", "account"):
        return current
    return "__end__"


swarm_builder = StateGraph(CustomerServiceState)
swarm_builder.add_node("triage", triage_node)
swarm_builder.add_node("billing", billing_swarm_node)
swarm_builder.add_node("tech_support", tech_swarm_node)
swarm_builder.add_node("account", account_swarm_node)

swarm_builder.add_edge(START, "triage")
swarm_builder.add_conditional_edges(
    "billing", route_after_agent,
    ["billing", "tech_support", "account", END],
)
swarm_builder.add_conditional_edges(
    "tech_support", route_after_agent,
    ["billing", "tech_support", "account", END],
)
swarm_builder.add_conditional_edges(
    "account", route_after_agent,
    ["billing", "tech_support", "account", END],
)

swarm_graph = swarm_builder.compile()
```
Run it:
```python
result = swarm_graph.invoke({
    "messages": [HumanMessage(
        content="I want to upgrade my plan, but first I need help fixing "
        "my SSO — it's been broken since last Tuesday. "
        "Also, can you waive the setup fee?"
    )],
    "current_agent": "",
    "resolution_notes": [],
})

for msg in result["messages"]:
    if isinstance(msg, AIMessage):
        print(f"Agent: {msg.content[:150]}...")
        print()
```
Production Failures in Multi-Agent Supervisor Systems
Both patterns have their own failure modes. Some are shared, some are pattern-specific.
1. Routing Loops (Supervisor). The supervisor routes to billing, billing responds, the supervisor routes to billing again — the same question loops indefinitely. This happens when the supervisor's routing prompt doesn't account for "this agent already handled it." LangSmith traces show the loop clearly: the same supervisor → billing → supervisor → billing pattern repeating. Fix: include resolution notes in the supervisor's context so it can see what's already been addressed:
```python
@traceable(name="supervisor_with_history", run_type="chain")
def supervisor_with_history(state: CustomerServiceState) -> dict:
    notes = "\n".join(state.get("resolution_notes", []))
    history_context = f"\n\nAlready resolved:\n{notes}" if notes else ""
    response = routing_llm.invoke([
        SystemMessage(
            content="You are a customer service supervisor. Analyze the "
            "conversation and decide which specialist should handle "
            "the next part of the request.\n\n"
            "Available agents:\n"
            "- billing: invoices, payments, discounts, pricing\n"
            "- tech_support: technical issues, SSO, integrations, bugs\n"
            "- account: plan changes, upgrades, account administration\n"
            "- DONE: the customer's request has been fully addressed\n\n"
            "Do NOT re-route to an agent that has already handled its "
            "portion of the request." + history_context
        ),
        *state["messages"],
    ])
    return {"current_agent": response.next_agent}
```
2. Context Loss on Handoff (Swarm). Agent A resolves part of the issue and hands off to Agent B. Agent B sees the original message but has no context about what Agent A already did. The customer gets asked the same questions again or, worse, contradictory advice. Fix: propagate resolution context through the handoff. The Command.update should include what was done, not just where to go.
3. Supervisor Bottleneck. At scale, the supervisor becomes a latency bottleneck and a cost center. Every single interaction requires a routing LLM call — even when the intent is obvious ("I need to change my password" doesn't need a routing decision). Fix: add a fast-path classifier using a smaller model or keyword matching for unambiguous intents:
```python
FAST_PATH = {
    "password": "tech_support",
    "invoice": "billing",
    "upgrade": "account",
    "downgrade": "account",
}


def fast_path_or_supervisor(state: CustomerServiceState) -> dict:
    last_msg = state["messages"][-1].content.lower()
    for keyword, agent in FAST_PATH.items():
        if keyword in last_msg:
            return {"current_agent": agent}
    return supervisor(state)
```
4. Swarm Ping-Pong. In the swarm pattern, Agent A doesn't know the answer and hands off to Agent B. Agent B also doesn't know and hands off back to Agent A. The conversation bounces between agents until a recursion limit kills it. Fix: track handoff count in state and set a hard limit:
```python
class CustomerServiceState(MessagesState):
    current_agent: str
    resolution_notes: Annotated[list[str], operator.add]
    handoff_count: int
```
Check the count before each handoff. After 3, force escalation to a human or fall back to a general-purpose agent.
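A minimal sketch of that guard as a plain function (the "human_escalation" route is a placeholder, not a node in the graphs above):

```python
MAX_HANDOFFS = 3


def next_hop(state: dict, target: str) -> dict:
    """Decide the state update for a requested handoff, enforcing the cap."""
    count = state.get("handoff_count", 0)
    if count >= MAX_HANDOFFS:
        # Stop the ping-pong: route to a human instead of another agent.
        return {"current_agent": "human_escalation", "handoff_count": count}
    return {"current_agent": target, "handoff_count": count + 1}


print(next_hop({"handoff_count": 3}, "billing"))
# {'current_agent': 'human_escalation', 'handoff_count': 3}
```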
5. Lost Messages During Handoff (Swarm). The handoff tool returns a Command with graph=Command.PARENT, but the messages from the specialist agent's internal tool-calling loop don't propagate to the parent graph. The customer sees the handoff but loses the specialist's response. Fix: ensure your Command.update includes the relevant messages. LLMs expect tool calls to be paired with ToolMessage responses — if you break that pairing, the next agent will see malformed conversation history and may error or hallucinate.
Observability
Multi-agent systems are hard to debug without per-agent tracing. The @traceable decorator on every node gives you isolated spans in LangSmith. Tag traces with the pattern type for A/B comparison:
```python
from langsmith import tracing_context

with tracing_context(
    metadata={"pattern": "supervisor", "agents_available": 3},
    tags=["production", "multi-agent-v1"],
):
    result = supervisor_graph.invoke({
        "messages": [HumanMessage(content="Fix my SSO and waive the setup fee.")],
        "current_agent": "",
        "resolution_notes": [],
    })
```
The three things to watch in LangSmith:
- Routing accuracy — open the `supervisor` span and check whether the chosen agent matches the actual domain. Log misroutes as negative feedback.
- Handoff chains — in the swarm, trace the full `triage → tech → billing` path. If it's longer than 3 hops, you have a routing problem.
- Token waste on re-routing — the supervisor pattern doubles your token spend on routing calls. Track total tokens per pattern and compare.
Evals
Two evaluators: routing correctness (did the right agent handle the request?) and resolution completeness (were all parts of a multi-domain request addressed?).
```python
from langsmith import Client

ls_client = Client()

dataset = ls_client.create_dataset(
    dataset_name="multi-agent-routing-evals",
    description="Multi-agent routing and resolution evaluation dataset",
)

ls_client.create_examples(
    dataset_id=dataset.id,
    inputs=[
        {"question": "I need to change my payment method to a credit card."},
        {"question": "My SSO integration is returning error code SAML-401."},
        {"question": "I want to upgrade to Enterprise and also fix my broken SSO."},
        {"question": "Can you tell me who my account manager is?"},
    ],
    outputs=[
        {"expected_agents": ["billing"], "must_mention": ["payment", "credit card"]},
        {"expected_agents": ["tech_support"], "must_mention": ["SSO", "SAML"]},
        {"expected_agents": ["tech_support", "account"], "must_mention": ["SSO", "upgrade"]},
        {"expected_agents": ["account"], "must_mention": ["account manager"]},
    ],
)
```

```python
from langsmith import evaluate
from openevals.llm import create_llm_as_judge

ROUTING_QUALITY_PROMPT = """\
Customer query: {inputs[question]}
Expected domains: {reference_outputs[expected_agents]}
Agent response: {outputs[final_response]}
Resolution notes: {outputs[resolution_notes]}

Rate 0.0-1.0 on whether the correct specialist agents handled the request
and the response fully addressed the customer's needs.
Return ONLY: {{"score": <float>, "reasoning": "<explanation>"}}"""

routing_judge = create_llm_as_judge(
    prompt=ROUTING_QUALITY_PROMPT,
    model="anthropic:claude-sonnet-4-5-20250929",
    feedback_key="routing_quality",
)


def resolution_coverage(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    """Did the agents address all parts of the customer's request?"""
    text = outputs.get("final_response", "").lower()
    notes = " ".join(outputs.get("resolution_notes", [])).lower()
    combined = text + " " + notes
    must_mention = reference_outputs.get("must_mention", [])
    hits = sum(1 for t in must_mention if t.lower() in combined)
    return {
        "key": "resolution_coverage",
        "score": hits / len(must_mention) if must_mention else 1.0,
    }


def agent_routing_accuracy(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    """Were the correct agents invoked?"""
    notes = " ".join(outputs.get("resolution_notes", [])).lower()
    expected = reference_outputs.get("expected_agents", [])
    hits = sum(1 for agent in expected if agent.lower() in notes)
    return {
        "key": "routing_accuracy",
        "score": hits / len(expected) if expected else 1.0,
    }


def supervisor_target(inputs: dict) -> dict:
    result = supervisor_graph.invoke({
        "messages": [HumanMessage(content=inputs["question"])],
        "current_agent": "",
        "resolution_notes": [],
    })
    return {
        "final_response": result["messages"][-1].content,
        "resolution_notes": result.get("resolution_notes", []),
    }


def swarm_target(inputs: dict) -> dict:
    result = swarm_graph.invoke({
        "messages": [HumanMessage(content=inputs["question"])],
        "current_agent": "",
        "resolution_notes": [],
    })
    return {
        "final_response": result["messages"][-1].content,
        "resolution_notes": result.get("resolution_notes", []),
    }


supervisor_results = evaluate(
    supervisor_target,
    data="multi-agent-routing-evals",
    evaluators=[routing_judge, resolution_coverage, agent_routing_accuracy],
    experiment_prefix="supervisor-v1",
    max_concurrency=2,
)

swarm_results = evaluate(
    swarm_target,
    data="multi-agent-routing-evals",
    evaluators=[routing_judge, resolution_coverage, agent_routing_accuracy],
    experiment_prefix="swarm-v1",
    max_concurrency=2,
)
```
Run both eval suites on every PR. The routing_accuracy evaluator is the canary — if it drops, your routing prompt or handoff logic regressed. Compare supervisor-v1 and swarm-v1 side-by-side in LangSmith to make the pattern decision with data instead of intuition.
When to Use This
Use the Supervisor pattern when:
- Routing accuracy is more important than latency
- You need a centralized audit trail of every routing decision
- Your domain boundaries are ambiguous (e.g., "billing" vs. "account" overlap)
- You're iterating on routing logic and want to change it in one place
Use the Swarm pattern when:
- Latency is your primary constraint
- Domain boundaries are clear and agents rarely misroute
- Requests often span multiple domains (the latency savings compound)
- You want agents to maintain conversational context through handoffs
Skip multi-agent orchestration when:
- You have fewer than 3 distinct domains
- Most queries are single-domain (a specialized single agent is simpler)
- You don't have per-agent evals — without them, you're debugging in production
The Bottom Line
The supervisor pattern is easier to reason about: one routing node, clear control flow, every decision visible in traces. The swarm is faster: no intermediary, direct agent-to-agent handoffs, fewer LLM calls.
Start with the supervisor. It's simpler to build, simpler to debug, and the routing accuracy advantage matters more than the latency penalty in most early deployments. Graduate to swarm when you have data showing latency is the bottleneck and your agents rarely misroute. Write the routing_accuracy eval before you write the second agent. And track handoff count — a multi-agent system without a recursion guard is a production incident waiting to happen.