
LangChain Under the Hood: 5 Features We Rely On Daily

Discover 5 powerful LangChain features we rely on daily at Focused to build production-ready AI apps and how LangGraph ties them all together.

Aug 19, 2025

By Jordan Kamm


LangChain has become the go-to framework for building LLM applications, but what makes it so powerful? At Focused, we’ve put a lot of agents and AI-driven applications into production using the tools that LangChain provides. Let’s go over five core features of LangChain that we rely on daily at Focused and then take a look at how you can put these concepts together to build a production-ready LLM-powered application with LangGraph.

We’ll build a very simple research bot as we work through the rest of the blog. 

1. Prompt Management with Prompt Templates

LangChain’s prompt templates provide a simple way to interact with LLM APIs. They support various message types, message history, and role-based prompting. They are great for managing prompts programmatically to keep them reusable and to keep your LLM integrations maintainable. Here’s an example of a basic prompt template.

    # imports and env setup omitted for brevity 
research_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a research assistant. Provide some key points and concepts about the requested topic."),
    ("human", "{input}")
])

llm = ChatOpenAI(model="gpt-4")

response = llm.invoke(research_prompt.format_messages(input="What is langchain?"))
  

Prompt templates reduce complexity by allowing the developer to define general prompts that can be customized with variables so they can be reused across different use cases. They are also portable across the LLM abstractions that LangChain provides, so you don’t have to redefine your prompt to swap out your model.
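
To illustrate that portability, here’s a minimal sketch that reuses the exact same prompt with a different provider (assuming the langchain-anthropic package is installed and the model name is valid for your account):

# Hypothetical provider swap: the prompt template is untouched
from langchain_anthropic import ChatAnthropic

claude = ChatAnthropic(model="claude-3-5-sonnet-latest")
response = claude.invoke(research_prompt.format_messages(input="What is LangChain?"))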

Pro Tip: Few-shot prompt templates like the FewShotChatMessagePromptTemplate can take it a step further by providing the model a set of examples that the LLM can use to guide its response generation for new inputs.
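
A minimal sketch of what that could look like for our research assistant (the example content here is made up):

from langchain_core.prompts import FewShotChatMessagePromptTemplate

# Hypothetical examples showing the model the shape of answer we want
examples = [
    {"input": "What is a vector store?", "output": "- Stores embeddings\n- Supports similarity search\n- Powers RAG workflows"},
]

example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}"),
])

few_shot_examples = FewShotChatMessagePromptTemplate(examples=examples, example_prompt=example_prompt)

research_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a research assistant. Provide some key points and concepts about the requested topic."),
    few_shot_examples,
    ("human", "{input}")
])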

2. Persistent Conversation Context with Message History

LangChain provides out-of-the-box tools for managing message history. Providing an LLM with historical messages allows it to retain conversational context throughout a session as it processes new messages. 

Using an InMemoryChatMessageHistory object is a great way to enable short-term memory, which provides thread-level persistence and allows for tracking multi-turn conversations. With the use of a keyed memory store, we can track session-specific message history for multiple distinct user sessions. 

Let’s modify our chat prompt template to include message history.

    # imports and env setup omitted for brevity
memory_store = {}

def get_session_history(session_id: str):
    if session_id not in memory_store:
        memory_store[session_id] = InMemoryChatMessageHistory()
    return memory_store[session_id]

research_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a research assistant. Provide some key points and concepts about the requested topic."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

llm = ChatOpenAI(model="gpt-4")

runnable_with_history = RunnableWithMessageHistory(
    research_prompt | llm,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history"
)

response1 = runnable_with_history.invoke(
    {"input": "Help me learn more about LangChain"},
    config={"configurable": {"session_id": "user-1"}}
)

response2 = runnable_with_history.invoke(
    {"input": "What did I just ask you?"},
    config={"configurable": {"session_id": "user-1"}}
)

  

Note the addition of the MessagesPlaceholder in the research_prompt initialization. It reserves a spot in our context window where the message history saved in our memory store gets inserted.

We also introduce the RunnableWithMessageHistory wrapper, which automatically manages inserting historical messages into the context window, as well as the input mapping to your prompt invocation.

Without memory, chatbots and agents would have no way to keep track of message progression over time and wouldn’t be able to build upon previous context. It’s an essential feature for anything other than one-shot prompting.

Pro Tip: Long-term memory can be achieved by introducing a persistent datastore to track history across conversations and even across deployments of your chatbot or agent.
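
One simple way to do that is to swap the in-memory store for a persistent chat history implementation. Here’s a sketch using the file-backed history from langchain-community (any of the database-backed implementations would work the same way):

import os
from langchain_community.chat_message_histories import FileChatMessageHistory

os.makedirs("history", exist_ok=True)

# Each session gets its own JSON file, so history survives restarts
def get_session_history(session_id: str):
    return FileChatMessageHistory(f"history/{session_id}.json")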

Pro Tip: LLM context windows have a limited size, which means that context can be lost if you exceed the available context window. If your use case involves long-lasting conversations or large context windows, consider using an LLM to periodically summarize message history to preserve historical context while limiting the size of the context window necessary to continue the conversation.
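
A rough sketch of that idea, building on the get_session_history helper above (the threshold and helper name here are arbitrary):

from langchain_core.messages import SystemMessage

def compact_history(session_id: str, max_messages: int = 10):
    # Hypothetical helper: collapse older turns into a single summary message
    history = get_session_history(session_id)
    if len(history.messages) <= max_messages:
        return
    older, recent = history.messages[:-4], history.messages[-4:]
    summary = llm.invoke(
        [("system", "Condense this conversation into a short summary."), *older]
    )
    history.clear()
    history.add_message(SystemMessage(content=f"Conversation so far: {summary.content}"))
    for message in recent:
        history.add_message(message)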

3. Chain Composition with LCEL (LangChain Expression Language)

LCEL is LangChain's current standard for building runnable “chains”. Chains are composed of runnable components chained together with the pipe operator (|). 

We actually already used a simple LCEL chain to provide the runnable when constructing our RunnableWithMessageHistory in the previous section. Let’s add a summary step to our example using LCEL to create a more complex chain.

    # imports and env setup omitted for brevity
memory_store = {}

def get_session_history(session_id: str):
    if session_id not in memory_store:
        memory_store[session_id] = InMemoryChatMessageHistory()
    return memory_store[session_id]

research_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a research assistant. Provide some key points and concepts about the requested topic."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

summary_prompt = ChatPromptTemplate.from_messages([
    ("human", "Summarize this information in a few sentences: {input}.")
])

llm = ChatOpenAI(model="gpt-4")

lcel_chain = (
    research_prompt
    | llm
    | summary_prompt
    | llm
)

runnable_with_history = RunnableWithMessageHistory(
    lcel_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history"
)

response = runnable_with_history.invoke(
    {"input": "Help me learn more about LangChain"},
    config={"configurable": {"session_id": "user-1"}}
)

  

LCEL chains automatically implement the Runnable interface, which gives you .invoke(), .batch(), and .stream() methods without having to write them yourself. You can run runnables side by side with RunnableParallel, push multiple inputs through a chain in a single call with .batch(), or run chains asynchronously with .ainvoke(), .abatch(), and .astream().

All of these runnable components can be chained together, and with proper input/output mapping, you can build some very complex chains. It sort of makes you feel like Xzibit.
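
For example, here’s a small sketch that fans one input out to two chains at once and pushes several inputs through the research chain in a single call:

from langchain_core.runnables import RunnableParallel

# Run two prompt-plus-model chains against the same input in parallel
fanout = RunnableParallel(
    research=research_prompt | llm,
    summary=summary_prompt | llm,
)
both = fanout.invoke({"input": "What is LCEL?", "history": []})

# Push several inputs through one chain in a single call (use .ainvoke/.abatch/.astream for async)
responses = (research_prompt | llm).batch([
    {"input": "What is LCEL?", "history": []},
    {"input": "What is LangGraph?", "history": []},
])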


Pro Tip: LCEL provides a simple interface for orchestrating complex workflows, but for production applications it’s recommended to keep orchestration chains simple and use LangGraph to compose complex workflows that require state management, branching, cycles, and/or multiple agents working together. Define LCEL chains that accomplish simple tasks as individual nodes in your graph.

4. Easy Retrieval-Augmented Generation (RAG) with Vector Store Abstractions

Vector stores use an embedding model to store data in such a way that relevant data can be retrieved using a similarity search for an arbitrary input. This unlocks an incredibly effective way to find content that LLMs can use to augment their responses with information that may not have been a part of their training data. LangChain provides abstractions that make it easy to integrate with a variety of vector stores which makes building RAG workflows a breeze.

If you’ve been following along with the code, you may have noticed that you haven’t gotten great responses for your inquiries about LangChain from your chatbot. That’s because, depending on the model you’re using, it may not have much information about LangChain available to it from its training dataset. Let’s add some fetchable context for it to use to augment its responses.

    # imports and env setup omitted for brevity
memory_store = {}

def get_session_history(session_id: str):
    if session_id not in memory_store:
        memory_store[session_id] = InMemoryChatMessageHistory()
    return memory_store[session_id]

def initialize_docs():
    docs = []
    # The directory "docs" in this example contains a set of markdown files containing the information in this blog separated by feature, but you could use any datasource 
    for file in os.listdir("docs"):
        with open(os.path.join("docs", file), "r") as f:
            docs.append(Document(page_content=f.read()))
    return docs

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

research_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a research assistant. Provide some key points and concepts about the requested topic. Use the following context for your research: {context}"),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

summary_prompt = ChatPromptTemplate.from_messages([
    ("human", "Summarize this information in a few sentences: {input}.")
])

llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vectorstore = Chroma.from_documents(initialize_docs(), embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

lcel_chain = (
    RunnablePassthrough().assign(context=itemgetter("input") | retriever | format_docs)
    | research_prompt
    | llm
    | summary_prompt
    | llm
)

runnable_with_history = RunnableWithMessageHistory(
    lcel_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history"
)

response = runnable_with_history.invoke(
    {"input": "Help me learn more about LangChain"},
    config={"configurable": {"session_id": "user-1"}}
)

  

With a few utility functions to seed our vector database with context data, we can easily configure the vector store as a retriever and use it in our LCEL chain. The retrieved content then needs to be passed into our prompt template as the context variable.

To do this, we use a RunnablePassthrough step whose assign call pipes the user input through the retriever, formats the retrieved documents, and injects the resulting text into the prompt template’s context variable.

With the abstractions LangChain provides, RAG functionality can be achieved easily without the need for fussing over the nuances of integrating with one vector store vs another or planning an intensive migration if, or more likely when, you decide that you want to try out the hot new vector database.

Pro Tip: LangChain has out-of-the-box integrations with tons of popular vector databases. Chances are, they support the vector store you’d like to use. Check here for more information on which vector stores they already support and documentation on how to integrate them.
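
For example, swapping the Chroma store above for FAISS is a two-line change (assuming the faiss-cpu and langchain-community packages are installed):

from langchain_community.vectorstores import FAISS

# Same documents, same embeddings, different backing store
vectorstore = FAISS.from_documents(initialize_docs(), embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})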

5. Structured Data from Unstructured Responses with Output Parsers

Output parsers are a great way to introduce some structure to your non-deterministic workflows. They work seamlessly with LCEL chains and provide robust structured data extraction with modern validation.

For the sake of visibility in the final output, let’s make our research summarizer output its summary with a defined structure that includes the research subject, a brief summary, and three next steps for learning more about the subject.

    # imports and env setup omitted for brevity
class ResearchSummary(BaseModel):
    subject: str = Field(description="research subject in one word")
    summary: str = Field(description="brief summary")
    next_steps: str = Field(description="3 next steps to learn more about the subject")

memory_store = {}

def get_session_history(session_id: str):
    if session_id not in memory_store:
        memory_store[session_id] = InMemoryChatMessageHistory()
    return memory_store[session_id]

def initialize_docs():
    docs = []
    # The directory "docs" in this example contains a set of markdown files containing the information in this blog separated by feature
    for file in os.listdir("docs"):
        with open(os.path.join("docs", file), "r") as f:
            docs.append(Document(page_content=f.read()))
    return docs

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

research_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a research assistant. Provide some key points and concepts about the requested topic. Use the following context for your research: {context}"),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

summary_prompt = ChatPromptTemplate.from_messages([
    ("system", "Summarize research and extract structured data."),
    ("human", "Summarize this information in a few sentences: {input}.\n\n{format_instructions}")
])

llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vectorstore = Chroma.from_documents(initialize_docs(), embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
parser = PydanticOutputParser(pydantic_object=ResearchSummary)

lcel_chain = (
    RunnablePassthrough().assign(context=itemgetter("input") | retriever | format_docs)
    | research_prompt
    | llm
    | summary_prompt.partial(format_instructions=parser.get_format_instructions())
    | llm
)

runnable_with_history = RunnableWithMessageHistory(
    lcel_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history"
)

response = runnable_with_history.invoke(
    {"input": "Help me learn more about LangChain"},
    config={"configurable": {"session_id": "user-1"}}
)

  

While this example demonstrates output parsers in a way you can see in the result, their bigger payoff comes from enforcing a consistent structure at the interface between two nodes in your graph or two components in your chain. That consistency can be the difference between wrestling with non-deterministic results and getting consistent scores in your evals and, more importantly, consistent results for your production users.

Pro Tip: Using output parsers to enforce structured data output is great for categorization workflows and breaking down context to pass on to an additional LLM step or node in a LangGraph workflow.
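
As a sketch of that idea (the categories and names here are hypothetical):

from typing import Literal

class RouteDecision(BaseModel):
    category: Literal["research", "news", "other"] = Field(description="best-fit category for the request")

route_parser = PydanticOutputParser(pydantic_object=RouteDecision)
route_prompt = ChatPromptTemplate.from_messages([
    ("human", "Classify this request: {input}\n\n{format_instructions}")
]).partial(format_instructions=route_parser.get_format_instructions())

router = route_prompt | llm | route_parser
decision = router.invoke({"input": "What's new with LangChain this week?"})
# decision.category could drive a conditional edge in a LangGraph workflow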

Bonus: LLMs That Take Action with Tool Calling

LangChain's tool system allows LLMs to call external functions and APIs, transforming them from text generators into action-taking agents that can interact with external systems. In LangChain, tools are declared with the @tool decorator, which associates a function with a schema defining its name, description, and expected arguments. We can then bind the tool to an LLM so that it has the tool at its disposal. Given a request, the LLM decides whether it needs to call one of the tools in its toolbox to satisfy the request and how the call(s) should be structured according to the tool schema.

Let’s supercharge our research assistant with the ability to search for the latest LangChain news.

    # imports and env setup omitted for brevity
class ResearchSummary(BaseModel):
    subject: str = Field(description="research subject in one word")
    summary: str = Field(description="brief summary")
    next_steps: str = Field(description="3 next steps to learn more about the subject")
    news: dict[str, str] = Field(description="3 news articles in the following format: {'title': 'url'}")

memory_store = {}

def get_session_history(session_id: str):
    if session_id not in memory_store:
        memory_store[session_id] = InMemoryChatMessageHistory()
    return memory_store[session_id]

def initialize_docs():
    docs = []
    # The directory "docs" in this example contains a set of markdown files containing the information in this blog separated by feature
    for file in os.listdir("docs"):
        with open(os.path.join("docs", file), "r") as f:
            docs.append(Document(page_content=f.read()))
    return docs

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

@tool
def search_web() -> list[dict]:
    """Search the web for current information about a topic."""
    response = requests.get(
        "https://api.search.brave.com/res/v1/web/search",
        headers={
            "X-Subscription-Token": BRAVE_SEARCH_API_KEY,
        },
        params={
            "q": "latest news about langchain",
            "count": 5,
            "country": "us",
            "search_lang": "en",
        },
    )
    # Return the top three results as {"title": ..., "url": ...} pairs to match the ResearchSummary.news field
    return [{"title": result["title"], "url": result["url"]} for result in response.json()["web"]["results"][:3]]

def execute_tool_calls(tool_calls):
    for tool_call in tool_calls or []:
        if tool_call["name"] == "search_web":
            # search_web is a BaseTool after the @tool decorator, so we invoke it rather than calling it directly
            return search_web.invoke(tool_call["args"])
    return None

research_prompt = ChatPromptTemplate.from_messages([
    ("system", """
        You are a research assistant. Provide some key points and concepts about the requested topic. 
        Use any tools available to you to get the latest information about the topic. 
        Additionally, use the following context for your research: {context}
    """),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}")
])

summary_prompt = ChatPromptTemplate.from_messages([
    ("system", "Summarize research and extract structured data."),
    ("human", "Summarize this information in a few sentences: {input}.\n\n{format_instructions}")
])

llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vectorstore = Chroma.from_documents(initialize_docs(), embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
parser = PydanticOutputParser(pydantic_object=ResearchSummary)

lcel_chain = (
    RunnablePassthrough().assign(context=itemgetter("input") | retriever | format_docs)
    | research_prompt
    | llm.bind_tools([search_web])
    | RunnableLambda(lambda x: execute_tool_calls(x.tool_calls) if x.tool_calls else x)
    | summary_prompt.partial(format_instructions=parser.get_format_instructions())
    | llm
)

runnable_with_history = RunnableWithMessageHistory(
    lcel_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history"
)

response = runnable_with_history.invoke(
    {"input": "Help me learn more about LangChain"},
    config={"configurable": {"session_id": "user-1"}}
)
  

The important thing to keep in mind when using tools to extend an LLM’s functionality is that binding tools only allows the LLM to determine what tool it needs to call, if any, and how it should structure the call to the selected tool. 

The tool calls that the LLM generates need to then be invoked, resulting in a ToolMessage that contains the result of the tool execution. This ToolMessage can be included in the context window to inform the final response generation. One way this can be done is shown above with the new RunnableLambda step in our LCEL chain. 

This step takes the tool calls generated by the research step and executes them so that the results can be passed to the summary step, which will attempt to include them in the final response output according to the ResearchSummary schema class.
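
For reference, the more canonical pattern is to feed the tool results back to the model as ToolMessages before generating the final answer. A rough sketch, assuming a messages list already built from the research prompt:

from langchain_core.messages import ToolMessage

model_with_tools = llm.bind_tools([search_web])
ai_message = model_with_tools.invoke(messages)  # `messages` is assumed to come from research_prompt

if ai_message.tool_calls:
    tool_messages = [
        ToolMessage(content=str(search_web.invoke(call["args"])), tool_call_id=call["id"])
        for call in ai_message.tool_calls
        if call["name"] == "search_web"
    ]
    # The model sees its own tool calls plus their results when producing the final answer
    final_answer = llm.invoke([*messages, ai_message, *tool_messages])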

In under 110 lines of code (including the omitted imports) we were able to use the core features of LangChain to build an AI research assistant to help us learn about LangChain. It has the ability to retrieve additional context from a datastore to inform its responses and search the web for the latest buzz around LangChain 🦜⛓️. 

It’s not perfect, and there are lots of improvements to be made, but I hope this helped to illustrate just how easy it is to prototype and iterate on AI chatbots and agentic workflows with LangChain.

Pro Tip: Tools fit neatly into nodes in a graph built with LangGraph.
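
In fact, LangGraph ships a prebuilt ToolNode that executes any tool calls found on the last message in the graph state. A minimal sketch, assuming the standard messages-based state rather than the custom state we define below:

from langgraph.graph import MessagesState, StateGraph
from langgraph.prebuilt import ToolNode

# ToolNode runs the tool calls on the most recent AIMessage and appends the resulting ToolMessages
graph = StateGraph(MessagesState)
graph.add_node("tools", ToolNode([search_web]))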

Pro Tip: Provide a list of tools with their names and a description of how and when to use them in the prompt to the LLM.

Putting It All Together with LangGraph

While LangChain provides the building blocks, LangGraph is the tool for building production-level applications and agents that orchestrate complex workflows with state management, conditional logic, cycles, and human-in-the-loop patterns. 

Here's what our simple research assistant might look like, implemented as a graph with LangGraph. We can reuse all of the utility functions, tool definitions, and prompts from our LangChain-only implementation, but instead of modeling the workflow as an LCEL chain, we model it as a graph with distinct nodes, edges, and conditional edges.

    # imports and env setup omitted for brevity 

# ---------------------------
# Graph state and nodes
# ---------------------------
class GraphState(TypedDict):
    input: str
    history: List[BaseMessage]
    context: str
    research_message: Any
    tool_results: Any
    summary: str

def retrieve_context_node(state: GraphState) -> dict:
    query_text = state.get("input", "")
    retrieved_docs = retriever.invoke(query_text) if query_text else []
    return {"context": format_docs(retrieved_docs)}


def research_node(state: GraphState) -> dict:
    model_with_tools = llm.bind_tools([search_web])
    chain = research_prompt | model_with_tools
    ai_message = chain.invoke(
        {
            "input": state["input"],
            "history": state.get("history", []),
            "context": state.get("context", ""),
        }
    )

    updated_history = list(state.get("history", []))
    updated_history.append(HumanMessage(content=state["input"]))
    # The AI message can include tool calls
    updated_history.append(ai_message)

    return {"research_message": ai_message, "history": updated_history}


def tools_node(state: GraphState) -> dict:
    ai_message = state.get("research_message")
    result = execute_tool_calls(getattr(ai_message, "tool_calls", None))
    return {"tool_results": result}


def summarize_node(state: GraphState) -> dict:
    summarization_input = state.get("tool_results")
    if summarization_input is None:
        research_msg = state.get("research_message")
        summarization_input = getattr(research_msg, "content", "")

    chain = summary_prompt.partial(
        format_instructions=parser.get_format_instructions()
    ) | llm

    response = chain.invoke({"input": summarization_input})

    updated_history = list(state.get("history", []))
    updated_history.append(response)

    return {"summary": getattr(response, "content", str(response)), "history": updated_history}


def route_from_research(state: GraphState) -> str:
    ai_message = state.get("research_message")
    has_tool_calls = bool(getattr(ai_message, "tool_calls", None))
    return "tools" if has_tool_calls else "summarize"


# ---------------------------
# Build the graph
# ---------------------------
workflow = StateGraph(GraphState)

workflow.add_node("retrieve", retrieve_context_node)
workflow.add_node("research", research_node)
workflow.add_node("tools", tools_node)
workflow.add_node("summarize", summarize_node)

workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "research")
workflow.add_conditional_edges("research", route_from_research, {"tools": "tools", "summarize": "summarize"})
workflow.add_edge("tools", "summarize")
workflow.add_edge("summarize", END)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)


if __name__ == "__main__":
    result = app.invoke(
        {"input": "Help me learn more about LangChain", "history": []},
        config={"configurable": {"thread_id": "user-1"}},
    )
    print(result.get("summary", ""))
  

This workflow uses tools for research, chat templates for summarization, state management for coordination, and conditional logic for flow control - all the LangChain features working together in a stateful graph.

Pro Tip: Use LangGraph when developing production applications that require state management, branching, cycles, multiple agents working together, and/or human-in-the-loop functionality. At Focused, the majority of our work is done with LangGraph. 
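
As one example of the human-in-the-loop piece, here’s a sketch that pauses the graph above before the tools node so a human can review the pending tool calls, then resumes from the checkpoint (this relies on the checkpointer we already configured):

# Interrupt before the tools node so a human can approve the pending tool calls
app = workflow.compile(checkpointer=memory, interrupt_before=["tools"])

config = {"configurable": {"thread_id": "user-1"}}
app.invoke({"input": "Help me learn more about LangChain", "history": []}, config=config)

# ... a human inspects app.get_state(config).values["research_message"].tool_calls here ...

app.invoke(None, config=config)  # passing None resumes execution from the interrupt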

Why These Features Matter

These features offer more than just convenience. They offer reliability and maintainability. There’s enough uncertainty and non-determinism to deal with when working with LLMs. Having a proven toolkit for development that abstracts away a lot of the complexity of managing integrations, state management, historical context, and composition allows developers to focus on modeling the behavior of their chatbot or agent and get to value delivery faster.

Remember, while it’s essential to understand the building blocks that LangChain provides, the real power comes from combining these concepts with LangGraph to deliver production-ready AI-driven workflows and agents specialized in tackling complex tasks. Happy LangGraph-ing!
