LangGraph Patterns That Scale In Production: Tool Calling for Context Engineering

Static prompts don’t scale. This post breaks down a LangGraph pattern using tool calling for context engineering so agents pull only the context they need, when they need it.

Jan 7, 2026

By Agustin Rodriguez

Prompting plays a central role in any agentic system. It shapes the tone, governs the agent’s behavior, and defines how it responds to edge cases.

One of the hardest parts of prompting is context engineering: giving the agent only the information it needs, exactly when it needs it. The goal is to provide enough context for the agent to act correctly, but not so much that it becomes distracted or confused (hallucination risk). This becomes increasingly important as the agent takes on more use cases, each one adding pressure and complexity to the prompt.

So how do we hit the sweet spot between prompting and tool calling? Sometimes it might seem easier to just add a new paragraph to the prompt to handle a new case, but as we’ve already seen, that pattern doesn’t scale.

Tools are what turn a plain LLM call into a real agent. They’re the cornerstone of agentic systems because they give models the ability to make decisions based on the specific problem they’re trying to solve.

So why don’t we leverage them to let the agent pull exactly the information it needs, only when it needs it?

Tool calling for context engineering

This is one lesson I learned recently: make sure the system prompt only contains essential instructions on how the agent should behave (response guidelines, available tools, etc.) and information that needs to be in context every single time. Any piece of data that won’t be used in every request should only be injected into context on demand.

Let’s walk through a simple example: an agent that helps users with questions about flights, hotels, and bookings.

Imagine the agent needs to be able to answer simple questions about 20 different airlines: what’s their customer service phone number, website, possible destinations, etc.

One way to do it would be to create a block of text with all that data and just put it inside the prompt, right? Well, that could work for a few airlines with a few pieces of data each. But think about hallucinations: would you really trust an LLM not to get confused if you paste 20 near-identical blocks of text that look like this?

Airline name: LLM Airlines
Address: 1234 W LangChain Rd
Website: https://www.llmairlines.com
Phone number: XXX-XXXX-XXXX
...

The risk of hallucinations is just too high, and companies usually don’t want to risk silly mistakes like providing a wrong phone number in a customer-facing bot. And the problem only gets worse as we scale the chatbot and add 5 or 10 more airlines, which is exactly the kind of growth a scalable design should handle.

Aside from hallucinations, the biggest issue with this approach is that we’re including potentially useless data in the agent’s context. But what if, instead of carrying a huge block of text we don’t even need on every interaction, we build a tool that provides the data for a specific airline on demand? That way, the agent pulls that data only when it needs it and keeps the rest out of its context window.

The code can be really simple too. Just decide how you want to store the data: locally as JSON, CSV, or XML files, or even as a table in a database. Then create a tool that does the query and give the agent instructions on how and when to use it.

Note: You can think of this as a form of RAG, since you’re augmenting the agent’s context based on its needs.
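
To make that concrete, the airline_data dictionary imported in the snippet below could look roughly like this; the entries and fields are made up for illustration:

# chatbot/airline_data.py (illustrative contents)
airline_data = {
    "llm_airlines": {
        "name": "LLM Airlines",
        "address": "1234 W LangChain Rd",
        "website": "https://www.llmairlines.com",
        "phone_number": "XXX-XXXX-XXXX",
        "destinations": ["ORD", "DEN", "SFO"],
    },
    # ...one entry per airline, keyed by a short, stable airline ID
}

And the tool that queries it on demand: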

import json
from langchain_core.tools import tool
from chatbot.airline_data import airline_data


@tool(parse_docstring=True)
def airline_info_fetcher(airline_id: str) -> str:
    """Fetch general information (website, phone number, destinations, etc.) for one airline.

    Args:
        airline_id: Identifier of the airline to look up.
    """
    try:
        if airline_id in airline_data:
            # Return the record as pretty-printed JSON so the LLM can read it easily.
            return json.dumps(airline_data[airline_id], indent=2)
        else:
            return f"Airline with ID '{airline_id}' not found. Available IDs: {list(airline_data.keys())}"
    except Exception as e:
        return f"Error loading airline_data: {str(e)}"


In this case, we’re storing the data as a Python dictionary for simplicity and returning JSON to the LLM. But again, the data source can vary depending on the use case.
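
To see the pattern end to end, here’s a minimal sketch of wiring the tool into a LangGraph agent with a lean system prompt. It assumes a recent langgraph release where create_react_agent accepts a prompt argument, and it uses an OpenAI chat model; the model choice and prompt wording are just placeholders for your own setup.

from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# Lean system prompt: behavior and tool-usage guidance only, no airline data embedded.
SYSTEM_PROMPT = (
    "You are a travel assistant that helps users with flights, hotels, and bookings. "
    "When a user asks about a specific airline (contact info, website, destinations), "
    "call airline_info_fetcher with that airline's ID instead of answering from memory."
)

# airline_info_fetcher is the tool defined above.
agent = create_react_agent(
    ChatOpenAI(model="gpt-4o"),
    tools=[airline_info_fetcher],
    prompt=SYSTEM_PROMPT,
)

result = agent.invoke(
    {"messages": [("user", "What's the customer service number for LLM Airlines?")]}
)
print(result["messages"][-1].content)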

Note: the agent needs to be accurate with airline IDs to fetch the correct data. So you need to provide a map of IDs and then evaluate that the agent uses the correct one every single time. A good tip here: LLMs read and interpret the docstrings of the tools we give them, so the ID map can live in the docstring itself.
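
For illustration, here’s a minimal sketch of that idea, with the ID map embedded directly in the docstring so the model sees the valid IDs whenever it considers the tool; the IDs listed are made up for the example:

import json

from langchain_core.tools import tool
from chatbot.airline_data import airline_data


@tool(parse_docstring=True)
def airline_info_fetcher(airline_id: str) -> str:
    """Fetch general information for a single airline.

    Valid airline IDs (illustrative):
        llm_airlines -> LLM Airlines
        acme_air     -> Acme Air

    Args:
        airline_id: One of the airline IDs listed above.
    """
    # Same lookup as before; unknown IDs return a message the LLM can recover from.
    if airline_id in airline_data:
        return json.dumps(airline_data[airline_id], indent=2)
    return f"Airline with ID '{airline_id}' not found. Available IDs: {list(airline_data.keys())}"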

Bonus Track: Evaluation

As a final point, it would feel strange to end a post like this without talking about evals, since they are always the cherry on top and what helps us ensure everything above works as expected.

Now, we’ve only demonstrated the implementation of one tool. But agentic systems usually have more than one agent, each with more than one tool. So a recommended eval here is a dataset of inputs or scenarios for the agent to handle, along with the tool it should call in each case. This becomes your north star when tuning the prompt for correct tool calling.

At a bare minimum, you want to ensure the agent calls the correct tool. But if it’s relevant to your use case, you can also add the params the agent should use to the dataset. The airline information tool we built earlier is a good example of this, because it won’t work as expected if the agent passes the wrong airline ID.
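
As a rough illustration, here’s a minimal sketch of that kind of eval. It reuses the agent from the earlier sketch and simply checks the first tool call against the expected tool name and arguments; flight_search is a hypothetical second tool, the airline ID is made up, and in practice you’d likely run this through your eval tooling of choice rather than a bare loop.

# Each case pairs a user input with the tool (and, optionally, the args) the agent should call.
dataset = [
    {
        "input": "What's the customer service number for LLM Airlines?",
        "expected_tool": "airline_info_fetcher",
        "expected_args": {"airline_id": "llm_airlines"},  # hypothetical ID
    },
    {
        "input": "Find me a flight from Chicago to Denver next Friday",
        "expected_tool": "flight_search",  # hypothetical second tool
        "expected_args": None,  # skip the args check for this case
    },
]


def first_tool_call(result):
    """Return the first tool call the agent made, if any."""
    for message in result["messages"]:
        if getattr(message, "tool_calls", None):
            return message.tool_calls[0]
    return None


for case in dataset:
    result = agent.invoke({"messages": [("user", case["input"])]})
    call = first_tool_call(result)
    tool_ok = call is not None and call["name"] == case["expected_tool"]
    args_ok = case["expected_args"] is None or (tool_ok and call["args"] == case["expected_args"])
    print(f"{case['input']!r}: correct_tool={tool_ok}, correct_args={args_ok}")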

Final thoughts

Ultimately, the main takeaway of this article is understanding that context is a resource to be queried, not a constant to be embedded. By leveraging the Tool Calling for Context Engineering pattern (and rigorously evaluating that your agents select the correct tools and parameters), you move beyond the limitations of static prompting. This approach transforms your system from a simple LLM wrapper into a dynamic, adaptable agent, one capable of managing complexity, mitigating hallucination risk, and reliably growing alongside real-world use cases.
