Building a Production AI Therapist with LangGraph

The Challenge

A mental health startup set out to build an AI therapist that actually feels therapeutic. Their goal was not a chatbot that answers questions, but a system capable of delivering structured therapy sessions grounded in real clinical practice.

The team had already begun development with an offshore vendor, but the early prototype fell short. The system behaved like a typical AI chatbot: responses felt robotic, the conversation flow lacked structure, and sessions did not resemble real therapy.

For the startup’s clinical team, that was a problem. They had spent years defining what good therapy looks like, and they needed an AI system that reflected those principles: one that could build trust, guide conversations intentionally, and incorporate techniques from cognitive behavioral therapy and humanistic therapy.

They also needed confidence that the system could safely evolve. The existing prototype had no evaluation framework in place to measure whether outputs aligned with clinical expectations.

The company partnered with Focused to redesign the system architecture, improve the conversational experience, and bring the application to production.

Rebuilding the System as a Multi-Agent Architecture

Focused rebuilt the system using LangGraph to orchestrate a structured multi-agent architecture.

At the center of the graph sits a supervisor agent called V, which manages the overall therapy session. V determines how the conversation progresses. It decides when to ask reflective questions, when to introduce a therapeutic exercise, and when to pause and allow the user space to process.

Instead of relying on a single monolithic prompt, the system is composed of multiple specialized nodes. Each node handles a specific responsibility within the therapy workflow, and the nodes work together within the graph to produce the final response.

This architecture allowed the team to break complex conversational behavior into manageable components. Prompts define how the language model makes decisions, while code handles tool usage and session logic. Together they create a system that behaves more like structured software than a simple chat interface.
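The split between a supervisor that decides and nodes that act can be illustrated with a minimal sketch in plain Python. This is not LangGraph's API and not the production code: the names SessionState, supervisor_route, and the three node functions are hypothetical, and a simple keyword rule stands in for the prompt-driven decision the real supervisor would make with a language model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SessionState:
    """Hypothetical state object handed from the supervisor to a node."""
    user_message: str
    draft: str = ""
    trace: List[str] = field(default_factory=list)


def supervisor_route(state: SessionState) -> str:
    """Stand-in for the supervisor's decision (an LLM call in a real system)."""
    text = state.user_message.lower()
    if any(cue in text for cue in ("panic", "anxious", "can't breathe")):
        return "exercise"      # introduce a therapeutic exercise
    if len(text.split()) < 4:
        return "hold_space"    # pause and give the user room to process
    return "reflect"           # ask a reflective question


def reflect(state: SessionState) -> SessionState:
    state.draft = "What do you think is behind that feeling?"
    return state


def exercise(state: SessionState) -> SessionState:
    state.draft = "Would you like to try a short breathing exercise together?"
    return state


def hold_space(state: SessionState) -> SessionState:
    state.draft = "Take your time. I'm here."
    return state


# Each node owns one responsibility; the supervisor only picks which one runs.
NODES: Dict[str, Callable[[SessionState], SessionState]] = {
    "reflect": reflect,
    "exercise": exercise,
    "hold_space": hold_space,
}


def run_turn(state: SessionState) -> SessionState:
    choice = supervisor_route(state)
    state.trace.append(choice)  # keep a record of the routing decision
    return NODES[choice](state)
```

Calling run_turn(SessionState(user_message="I don't know")) routes to the hold_space node in this sketch. Because routing and node behavior live in separate functions, either can be changed and tested without touching the other.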

Humanizing the Conversation

One of the key design challenges was making the system feel natural and empathetic.

To address this, Focused implemented humanizer nodes at the final stage of the graph. Before a response reaches the user, it passes through a rewrite layer that adjusts tone, pacing, and language so the message feels more human.

Separating the decision of what to say from how to say it gave the team precise control over the conversational experience. Clinical logic could evolve independently from the stylistic layer that makes responses feel personal and supportive.
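As a rough illustration of that separation, the sketch below keeps the clinical decision and the stylistic rewrite in two functions. All names here (ClinicalDecision, decide, humanize) are hypothetical, and the string manipulation is only a placeholder for the model-driven rewrite a real humanizer node would perform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClinicalDecision:
    """Hypothetical output of the clinical step: WHAT to say."""
    technique: str  # e.g. "reflective_question"
    message: str    # clinically correct, possibly stiff wording


def decide(user_message: str) -> ClinicalDecision:
    """Placeholder clinical logic; a real system would use prompts and session state."""
    return ClinicalDecision(
        technique="reflective_question",
        message="Identify the thought that preceded the feeling.",
    )


def humanize(decision: ClinicalDecision) -> str:
    """Placeholder style layer: changes HOW it is said, never the technique."""
    softened = decision.message[0].lower() + decision.message[1:]
    return f"When you're ready, could you try to {softened.rstrip('.')}?"


decision = decide("I got really upset after the meeting.")
reply = humanize(decision)
```

The decision object is frozen, so the style layer cannot alter the chosen technique. That is the property that lets the two layers evolve independently.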

The result is a system that maintains structure while still sounding natural in conversation.

Modeling Real Therapy Sessions

To help the agent behave more like a therapist, the startup’s clinical team provided transcripts of real therapy sessions that had been cleared for use with the appropriate permissions.

Focused used these transcripts to model how therapy sessions actually unfold. The conversations helped shape the structure of the agent graph and informed how the supervisor agent navigates different conversational states.

Rather than relying purely on generic prompting, the system's design reflects the pacing and structure that experienced clinicians use when guiding sessions.

Therapeutic Tools Within the Session

The architecture also allows the agent to call tools during conversations.

For example, if a user appears to be experiencing anxiety or panic, the system can offer a guided breathing exercise such as Box Breathing. The agent walks the user through the technique step by step and may follow up later in the session to check whether the exercise helped.

This ability to introduce structured interventions and revisit them later mirrors how human therapists track progress during a session.

The LangGraph architecture makes these tools explicit components of the system rather than ad hoc responses generated by the model.
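A minimal sketch of what an explicit tool with a later follow-up could look like is shown below, in plain Python rather than LangGraph's tool interface. Box Breathing is commonly taught as four equal phases (inhale, hold, exhale, hold), often four seconds each. The function and class names, the default counts, and the follow-up wording are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The four equal phases of one Box Breathing round.
BOX_PHASES = ("Breathe in", "Hold", "Breathe out", "Hold")


def box_breathing_steps(seconds: int = 4, rounds: int = 2) -> List[str]:
    """Return step-by-step instructions the agent can deliver one at a time."""
    steps = []
    for round_number in range(1, rounds + 1):
        for phase in BOX_PHASES:
            steps.append(f"Round {round_number}: {phase} for {seconds} seconds")
    return steps


@dataclass
class SessionLog:
    """Hypothetical record of tools used, so the agent can check in later."""
    pending_follow_ups: List[str] = field(default_factory=list)

    def record_tool_use(self, tool_name: str) -> None:
        self.pending_follow_ups.append(tool_name)

    def next_follow_up(self) -> Optional[str]:
        """Return a check-in prompt for the oldest unreviewed tool, if any."""
        if not self.pending_follow_ups:
            return None
        tool_name = self.pending_follow_ups.pop(0)
        return f"Earlier we tried {tool_name}. How are you feeling now?"


log = SessionLog()
steps = box_breathing_steps()
log.record_tool_use("Box Breathing")
```

Because the exercise is ordinary code, its steps are deterministic and testable, and the session log gives the supervisor something concrete to revisit later in the conversation.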

Building Clinical Evaluations

Standard LLM evaluations typically focus on general qualities such as fluency and coherence. Those metrics are not sufficient when building an AI therapist.

Focused implemented custom eval pipelines using LangSmith to measure whether responses aligned with the startup’s clinical framework.

These evaluations allowed the team to assess qualities such as:

• Appropriate use of therapeutic techniques
• Empathy and supportive tone
• Proper introduction of therapeutic tools
• Safety and clinical alignment of responses

This evaluation layer allows prompt changes, architecture adjustments, and model upgrades to be tested against clinical criteria before reaching users.
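One way to picture such a gate is the sketch below, written in plain Python and not using the LangSmith SDK. The criterion names mirror the list above, but the thresholds, the 0-to-1 scoring scale, and the function names are assumptions. In practice the per-example scores would come from evaluators such as clinician review or a model-based grader.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical minimum average score per criterion (scores range from 0 to 1).
# A threshold of 1.0 on safety means every evaluated example must pass.
THRESHOLDS: Dict[str, float] = {
    "technique_use": 0.75,
    "empathy": 0.75,
    "tool_introduction": 0.75,
    "safety": 1.0,
}


def aggregate(results: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each criterion across all evaluated examples."""
    return {name: mean(r[name] for r in results) for name in THRESHOLDS}


def failing_criteria(results: List[Dict[str, float]]) -> List[str]:
    """Return the criteria whose average falls below its threshold."""
    averages = aggregate(results)
    return [name for name, minimum in THRESHOLDS.items() if averages[name] < minimum]


# Made-up scores for two candidate versions of a prompt.
baseline = [
    {"technique_use": 1.0, "empathy": 0.75, "tool_introduction": 1.0, "safety": 1.0},
    {"technique_use": 0.5, "empathy": 1.0, "tool_introduction": 0.5, "safety": 1.0},
]
regression = [
    {"technique_use": 1.0, "empathy": 0.75, "tool_introduction": 1.0, "safety": 1.0},
    {"technique_use": 1.0, "empathy": 1.0, "tool_introduction": 1.0, "safety": 0.5},
]

release_ok = not failing_criteria(baseline)   # no criterion falls short
blocked_by = failing_criteria(regression)     # safety drops below 1.0
```

A change is released only when no criterion falls below its threshold, which turns "aligned with the clinical framework" into a check that can run on every prompt or model update.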

Infrastructure Optimization

Focused also redesigned the system’s infrastructure.

The previous architecture ran on infrastructure that cost roughly $20,000 per month. Focused migrated inference to Groq, which serves the system's Llama models with low latency.

The result was a dramatic improvement in both performance and cost efficiency. Infrastructure spend dropped to approximately $1,000 per month, a 95 percent reduction, while maintaining fast response times.
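The reduction figure follows directly from the two monthly costs. The short check below uses the approximate figures quoted above; the function name is only illustrative.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) * 100 / before


monthly_before = 20_000  # approximate previous monthly spend, in dollars
monthly_after = 1_000    # approximate monthly spend after migration, in dollars

reduction = percent_reduction(monthly_before, monthly_after)  # 95.0
monthly_savings = monthly_before - monthly_after              # 19000
```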

The Outcome

With the new architecture in place, the company moved from a stalled prototype to a production-ready AI therapist.

The system now delivers structured therapy sessions guided by a supervisor agent, supported by specialized conversational nodes, and evaluated against clinical standards.

Most importantly, the experience feels far more human than the original prototype.

As one of the company’s leaders put it:

“This app wouldn’t have gone to production without Focused.”

Focused on delivery

Focused on results

Focused on partnership