If you’re anywhere near the AI application development world, whether you’re an engineer, a product manager, or a CEO, I’m sure you’ve heard about multi-agent systems. But why? Why is there so much hype around multi-agent systems? Are they really that good? If so, what are they good at? And do I need one in my business or app?
In my case, it was all about scalability. Let me explain. I’ve been working for the last couple of months on a chatbot-based application, and even though it’s been getting better with regular prompt engineering + evals, we still had an issue: the prompt was getting bigger and bigger, and further prompt improvements were returning less benefit over time. We also wanted the chatbot to get better at certain specific things, which meant adding even more text to an already long prompt.
So, we decided to start looking into multi-agent architectures, with the hope that it would help us build agents that are really good at one thing, without having astronomically long prompts. After implementing our first architecture with this approach, I want to share some learnings with you.
Learning #1: names (swarm, supervisor, etc.) are just names. Build whatever you want!
The two most popular multi-agent architectures you will hear about are: Supervisor and Swarm.
In the first one, you have a single agent (the supervisor) that receives the user’s input and decides whether to respond directly to the user or to hand off the conversation to a subagent. Once the subagent is done (which can mean generating a response or doing things like searching the web), the conversation goes back to the supervisor, which generates the final response.
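The supervisor loop can be sketched in a few lines of plain Python. This is a minimal illustration, not a real framework: the `research_agent`, the keyword-based routing, and all the strings are hypothetical stand-ins for what would normally be LLM calls.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    role: str      # "user" or "agent"
    content: str

@dataclass
class Conversation:
    messages: list = field(default_factory=list)

def research_agent(conv: Conversation) -> str:
    # Hypothetical subagent: in a real system this would call an LLM
    # or search the web.
    return "research results for: " + conv.messages[-1].content

SUBAGENTS = {"research": research_agent}

def supervisor(conv: Conversation) -> str:
    """Decide whether to answer directly or hand off to a subagent."""
    last = conv.messages[-1].content
    if "search" in last.lower():   # stand-in for an LLM routing decision
        draft = SUBAGENTS["research"](conv)
        conv.messages.append(Message("agent", draft))
        # Control returns to the supervisor, which writes the final reply.
        return "final answer based on: " + draft
    return "direct answer to: " + last

conv = Conversation([Message("user", "please search for multi-agent papers")])
print(supervisor(conv))
```

The key property is that every turn starts and ends at the supervisor; subagents never talk to the user directly.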
On the other hand, swarm architectures consist of a group of independent agents that are aware of each other and work together as a whole. When a user sends a message, it goes to the default agent, which decides whether to respond itself or hand off to a different agent. If it hands the message off, that agent generates the response and sends it to the user. The next time a message comes in, it goes to the most recently active agent, and the process begins again.
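Here is a toy sketch of that handoff-plus-memory behavior. The agent names (`billing`, `support`), the `"handoff:"` convention, and the keyword check are all invented for illustration; a real swarm would make these decisions with LLM calls.

```python
def billing_agent(msg: str):
    if "refund" in msg:
        return ("handoff:support", None)   # hand off to another agent
    return ("billing", "billing reply to: " + msg)

def support_agent(msg: str):
    return ("support", "support reply to: " + msg)

AGENTS = {"billing": billing_agent, "support": support_agent}

class Swarm:
    def __init__(self, default_agent: str = "billing"):
        self.active = default_agent        # messages go to the latest active agent

    def send(self, msg: str) -> str:
        name = self.active
        while True:
            result, reply = AGENTS[name](msg)
            if result.startswith("handoff:"):
                name = result.split(":", 1)[1]
                continue
            self.active = name             # remember who answered, for next time
            return reply

swarm = Swarm()
print(swarm.send("refund please"))   # billing hands off to support
print(swarm.active)                  # the next message will go to "support"
```

Unlike the supervisor setup, whichever agent answers talks to the user directly, and the swarm remembers it as the entry point for the next turn.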
So: supervisor or swarm? Which is right for me? The answer is: why not both?
At the end of the day, names don't matter. What matters is developing an application that suits your use case and adds business value. In my case, I needed output that was both consistent and checked before reaching the user, which pointed toward using a supervisor. But I also needed the application to be fast, which is where a swarm shines, since multiple agents can run in parallel with minimal latency.
To satisfy both requirements, I built a swarm of two regular agents managed by a supervisor agent. The supervisor coordinates the agents and ensures a final quality check without adding significant delay. This setup gives me fast parallel responses while still providing a reliable review step before the output reaches the user.
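The shape of that hybrid can be sketched like this, assuming two illustrative agents and a placeholder `quality_check` standing in for the supervisor's review pass:

```python
from concurrent.futures import ThreadPoolExecutor

def agent_a(msg: str) -> str:
    return "A: " + msg          # hypothetical specialist agent

def agent_b(msg: str) -> str:
    return "B: " + msg          # hypothetical specialist agent

def quality_check(drafts: list) -> bool:
    # Stand-in for the supervisor's review step (e.g., an LLM grading pass).
    return all(drafts)

def supervised_swarm(msg: str) -> str:
    # Run both agents in parallel to keep latency low...
    with ThreadPoolExecutor() as pool:
        drafts = list(pool.map(lambda agent: agent(msg), [agent_a, agent_b]))
    # ...then let the supervisor review and combine before anything
    # reaches the user.
    if not quality_check(drafts):
        return "fallback answer"
    return " | ".join(drafts)

print(supervised_swarm("hello"))   # "A: hello | B: hello"
```

The point is that the review step sits on the single merged result, so it adds one cheap check rather than one check per agent call.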
Learning #2: evals are key, especially in multi-agent architectures.
When you have an app with only one LLM that’s generating responses and you make changes to the prompt and want to see if the responses improved, you can simply chat with your application and see how it performs. This works, but isn’t scalable.
As your model evolves, you’ll want it to pick up new skills without forgetting the old ones. Early on, you might be able to manually track whether it’s doing the right thing, but that doesn’t scale. Over time, it becomes harder to remember what “good performance” even looked like. That’s where evals come in: they’re an automated way to measure and monitor how well your model is doing across both new and existing capabilities.
This is even more true when you start adding more than one agent to the equation. You start talking to the chatbot and all of a sudden you don’t even know which agent is generating the response you’re seeing!
Evals allow you to:
- Measure the overall system performance and behavior
- Test and improve each agent in an isolated way
- Fine-tune the handoffs, ensuring each agent is invoked exactly when it should be
- Be able to iterate quickly with confidence
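Even a tiny eval harness covers most of the bullets above: run a fixed set of cases through the system, check which agent handled each one, and score the run. Everything here (`my_system`, the cases, the routing check) is an illustrative assumption, not a real eval framework.

```python
def my_system(question: str) -> dict:
    # Stand-in for the real multi-agent app. It also reports which agent
    # answered, which is exactly what you need to debug handoffs.
    if "invoice" in question:
        return {"agent": "billing", "answer": "invoice help"}
    return {"agent": "general", "answer": "general help"}

CASES = [
    {"question": "where is my invoice?", "expect_agent": "billing"},
    {"question": "hi there",             "expect_agent": "general"},
]

def run_evals(system, cases):
    results = []
    for case in cases:
        out = system(case["question"])
        results.append({
            "question": case["question"],
            # Did the message reach the right agent (handoff correctness)?
            "routed_ok": out["agent"] == case["expect_agent"],
            # Did the system actually produce an answer?
            "answered": bool(out["answer"]),
        })
    passed = sum(r["routed_ok"] and r["answered"] for r in results)
    return passed, len(results)

print(run_evals(my_system, CASES))   # (2, 2)
```

Because each case records which agent responded, the same harness measures both overall behavior and per-agent performance, and rerunning it after every prompt change is what makes fast iteration safe.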
Learning #3: Let your use case drive the architecture—not the other way around
This is similar to what I mentioned in Learning #1, and honestly, it’s not a new challenge in software. It’s been happening forever. What’s different with AI is the sheer speed of change. Things are moving so fast that it often feels like we’re constantly scrambling to keep up with the latest development.
It’s easy to get swept up in the hype of new tools, frameworks, or architectures. We’ve all been there. I’ve been there too, which is exactly why I decided to write this post.
If I could go back just three weeks and give myself some advice, it would be this: Before implementing anything new, take the time to deeply understand what you’re actually trying to achieve. What’s the problem? Why does it matter? How will solving it improve the system?
Once you have that clarity, you can start exploring your options. But now you’ll have a clear goal to guide you, which makes it much easier to filter out distractions and focus only on what fits your needs.
Conclusion
AI development is moving pretty quickly, and it does feel like we’re trying to surf a tsunami. The whole industry is still experimenting and learning about these technologies; no one has it all figured out.
In my experience, it’s best to focus on fundamentals:
- Understand your use case deeply
- Evals, evals, evals
- Don’t get hung up on labels—build what fits your problem and adds the most business value
When you focus on the building blocks instead of the hype, you learn faster, adapt quicker, and end up with a system that actually works for your product.
Curious how we design multi-agent systems that actually hold up in production?
Learn more at https://focused.io/langchain