LangGraph for Enterprise Agent Development: Why We Built Our Entire Practice on It
Why Focused chose LangGraph as the foundation for enterprise AI agent development. Deterministic workflows, Deep Agents, observability with LangSmith, and eval-driven development in production.
Apr 13, 2026
I've spent the last two years building agentic AI systems for enterprises: Coinbase, insurance companies, patent firms, healthcare startups. Every engagement starts with a team that's excited about agents but unsure how to ship them. They've seen the demos and built the prototypes, but now they need something that meets the reliability demands of production at scale.
We chose LangGraph. After shipping agents into production across a dozen industries, it's the only framework we've found that treats the hardest problems as first-class concerns. And our LangChain partnership came after that conviction, not before: we believe their stack will become a default for enterprise agents.
I get asked "why?" often enough that it's worth breaking down why we're all in on LangChain.
Deterministic workflows, autonomous AI agents, oh my!
Most of an agent's work should (and can) be deterministic. I know that sounds backwards. The whole appeal of an agent is autonomy: the LLM reasons and acts on its own.
Then you ship it, and within a week someone's asking why it sent a customer the wrong email. Or why it charged a credit card twice. Or why it hallucinated a policy that doesn't exist. When an agent processes an insurance claim, certain steps have to happen in a specific order with specific validations. And frankly most enterprise SOPs are deterministic workflows.
LangGraph lets you draw that graph explicitly. Deterministic workflows for the parts that need to be reliable. Conditional branching for the parts that benefit from LLM reasoning. And autonomous tool selection for genuinely open-ended work where you want the model to figure out the best path.
No other OSS agent framework gives you this flexibility. LangGraph handles the orchestration graph (the "what happens when" logic), and Deep Agents handle the open-ended thought work. A claims processing agent can follow a strict graph for intake, validation, and payout calculation, but spawn a Deep Agent to analyze an ambiguous photo of damage or interpret a claimant's free-text description. The deterministic parts stay deterministic. The parts that need intelligence get intelligence.
Most frameworks force you to pick one or the other. You get a rigid workflow engine or a fully autonomous agent. LangGraph gives you both in the same runtime, and that matters when you're building systems that touch production money and production customers.
Deep Agents: the harness for complex AI agent workflows
Deep Agents is the most interesting thing LangChain has shipped. It's an agent harness built on LangGraph that handles what engineers were hand-rolling for every project: task decomposition, context management through file systems, sub-agent spawning, auto-summarization when context windows get long, etc.
Before Deep Agents, most teams at Focused were building their own version of this. Custom planning loops. Bespoke memory management. Hand-tuned summarization chains. It worked, but it meant the first month of the engagement was rebuilding plumbing.
Now we use LangGraph for the deterministic orchestration layer and Deep Agents when we need an agent that can think through a multi-step problem. The combination is powerful. A supervisor graph routes work to specialized nodes, and some of those nodes are Deep Agents that can plan, use tools, spawn sub-agents, and manage their own context without blowing up the parent's context window.
The file system abstraction is worth calling out. Instead of cramming everything into the context window, Deep Agents offload working state to a virtual filesystem. In-memory for fast iteration, durable storage for production. The agent reads and writes files like a developer would, and auto-summarization keeps conversation history manageable during long-running tasks. Context engineering taken to its logical conclusion.
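To make the idea concrete, here's a toy illustration of that offloading pattern. This is not the Deep Agents API, just the underlying concept: bulky working state lives in a virtual filesystem (a dict in memory here, durable storage in production), and the context window carries only pointers and summaries:

```python
# Toy illustration of context offloading -- NOT the Deep Agents API.
# Bulky state goes to a virtual filesystem; the prompt keeps a pointer.
class VirtualFS:
    def __init__(self) -> None:
        self._files: dict[str, str] = {}

    def write(self, path: str, content: str) -> None:
        self._files[path] = content

    def read(self, path: str) -> str:
        return self._files[path]

    def ls(self) -> list[str]:
        return sorted(self._files)


fs = VirtualFS()

# Instead of pasting fifty pages of policy text into the prompt...
fs.write("research/policy_notes.md", "section 4.2: water damage excluded ...")
fs.write("plan.md", "1. read policy notes\n2. compare to claim\n3. draft decision")

# ...the context window carries only a pointer and a summary.
context = f"Working files: {fs.ls()}. See plan.md for next steps."
```

The agent reads and writes these paths through tools, exactly like a developer working in a repo, which is why long-running tasks stay within the context budget.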
Agent observability with LangSmith is the linchpin
If you can't observe your agent, you can't trust your agent. I keep saying this because teams keep skipping it. Observability has to be a deployment blocker: a feature isn't done if it's not observable.
Agentic AI systems are non-deterministic. The same input can produce different tool call sequences, different reasoning paths, different outputs. When something goes wrong... and it will... you need to understand exactly what happened. Which tools did the agent call? What was in the context at each step? Where did the reasoning diverge from what you expected?
LangSmith is built for this. It traces every step of a LangGraph execution. Every node transition, every tool call, every LLM invocation with full input and output. When a customer reports a weird agent behavior, I can pull up the trace and see that at step 7, the agent received ambiguous search results, chose tool X with parameters Y, got back Z, and that's where things went sideways.
The tight integration between LangGraph and LangSmith is by design. You set an environment variable and everything is traced. No instrumentation code, no custom logging, no stitching together disparate monitoring tools. The simplicity is key to getting teams to observe, evaluate and deploy every agent.
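In practice, the whole setup is a few environment variables (the project name below is a hypothetical example; older SDK versions use the `LANGCHAIN_TRACING_V2` / `LANGCHAIN_API_KEY` names):

```shell
# Enable LangSmith tracing for every LangGraph run in this process.
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="your-api-key-here"
export LANGSMITH_PROJECT="claims-agent-prod"   # hypothetical project name
```

That's the entire instrumentation story: no wrappers, no decorators, no agent code changes.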
We wrote about why observability matters for AI systems after O11yDay NYC. The short version: agents without tracing are black boxes, and enterprises don't deploy black boxes into production. At least not for long.
AI agent evaluation is the engineering discipline nobody wants to do
Test-driven development was a niche practice for 20 years before it became mainstream. Eval-driven development is in the same spot right now for AI agent development.
Most teams shipping agents today don't have eval suites. They're moving fast, trusting vibes, and hoping the demo quality holds in production. It won't.
LangSmith treats evaluation as a first-class concern rather than an afterthought. Offline evals let you benchmark against curated datasets before shipping. Online evals monitor production quality on live traffic. The two feed into each other: production anomalies become test cases, test cases validate fixes, and monitoring confirms the fix works in the wild. Remember Red -> Green -> Refactor? It's that cycle all over again.
What I appreciate about LangSmith's approach is that it acknowledges reality. LLM outputs are non-deterministic, so you can't assert exact correctness the way you would with traditional unit tests. Instead, you build evaluation rubrics: does the agent call the right tool? Does it use the right parameters? Is the output factually grounded? You measure quality on a spectrum, and you track whether it's improving or degrading over time.
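A rubric like that can be expressed as plain code. This is a schematic sketch, not the LangSmith SDK; the trace shape, field names, and checks are hypothetical, but each check scores one dimension of quality instead of asserting a single exact answer:

```python
# Schematic rubric evaluator -- hypothetical trace shape, not the LangSmith SDK.
# Each check scores one dimension of agent quality on its own.
def evaluate_trace(trace: dict) -> dict:
    scores = {}
    # Did the agent pick the tool the rubric expects for this task?
    scores["right_tool"] = trace["tool_called"] == trace["expected_tool"]
    # Did it pass all the parameters the task requires?
    required = set(trace["expected_params"])
    scores["right_params"] = required.issubset(trace["params"].keys())
    # Is every claim in the output backed by a retrieved source?
    scores["grounded"] = all(c["source"] is not None for c in trace["claims"])
    return scores


trace = {
    "tool_called": "policy_lookup",
    "expected_tool": "policy_lookup",
    "params": {"policy_id": "P-9", "section": "4.2"},
    "expected_params": ["policy_id"],
    "claims": [{"text": "water damage excluded", "source": "policy P-9 s4.2"}],
}
scores = evaluate_trace(trace)
```

Tracked over time, per-dimension scores like these tell you whether a prompt change or model swap made the agent better or quietly made it worse.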
We've started every agent engagement with evals since late 2024. The teams that invest in them early ship faster and with more confidence. Evals anchor the agent on user outcomes – increasingly important as development pace accelerates. The teams that skip them spend their time firefighting production issues they could have caught before deployment.
LangGraph as a hedge against foundation model lock-in
The foundation model landscape changes every three months. GPT-4 was king, then Claude 3.5, then Gemini had a moment, then DeepSeek came out of nowhere, then Claude 4. And that was just in the last year...
If your agent architecture is coupled to a specific model's API, you're rewriting code every quarter. LangGraph and LangChain abstract the model layer. Swap providers by changing a config value. Run different models for different tasks in the same graph. Use a cheap fast model for routing and a big model for complex reasoning.
We've swapped models mid-project because a new release outperformed what we were using. With LangChain's abstraction layer, that's a config change and an eval run to validate. Without it, that's a rewrite.
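The pattern is simple: model choice lives in config, not in call sites. With LangChain the construction side is a single `init_chat_model("provider:model")` call; here's a framework-free sketch of the config side, with illustrative tier names and model IDs:

```python
# Framework-free sketch: model choice lives in config, not in code.
# Tier names and model IDs are illustrative, not a recommendation.
MODEL_CONFIG = {
    "router": "anthropic:claude-3-5-haiku-latest",    # cheap, fast routing
    "reasoner": "anthropic:claude-sonnet-4-0",        # big model for hard steps
}


def model_for(task: str) -> str:
    # Swapping providers is a one-line config change, e.g.
    # "router": "openai:gpt-4o-mini" -- no call sites change.
    return MODEL_CONFIG[task]
```

Run your eval suite against the new config value, compare scores, and the swap is either validated or rolled back in an afternoon.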
For enterprises making multi-year bets on agent infrastructure, model portability is risk management.
LangGraph is already deployed in the enterprise
Klarna, Uber, JP Morgan, and others are running LangGraph in production. The LangGraph Platform handles the specific challenges of stateful, long-running agent workflows: durable execution that survives failures, horizontal scaling, managed persistence.
When I'm talking to a CTO about agent infrastructure, "already running at Fortune 500 companies" matters more than any feature list. These are production systems handling live traffic and live transactions.
We've built production agents at Coinbase, shipped an AI therapist with LangGraph, architected an agentic patent validation system, and built a multi-modal claims agent. In every case, LangGraph was the orchestration backbone.
Why Focused chose LangGraph for enterprise agent development
We've evaluated the alternatives: CrewAI, AutoGen, raw API calls with custom orchestration. All of them.
We went deep on LangGraph because it matches how we think about enterprise software: observability, testing, deterministic guarantees where they matter, and flexibility to evolve as the technology moves. The broader LangChain ecosystem provides all of that as integrated concerns, not bolt-ons.
Every custom AI agent we build at Focused follows the same pattern: deterministic LangGraph workflows for the predictable parts, Deep Agents for the parts that need genuine reasoning, LangSmith for observability and evals, and a model-agnostic architecture that lets us adapt as the landscape shifts.
The enterprise agent space is early. The teams who invest in engineering discipline (evals, observability, deterministic-where-possible architectures) will be the ones whose agents survive contact with production. Two years in, I'm more convinced of that than ever.
