MCP Security Starts After Tool Approval

Approving an MCP server for production is only the first step in securing MCP. The real danger comes afterward, when the surface the model interacts with changes slowly but fundamentally.

A read-only customer lookup tool becomes an export tool. A database helper adds a required raw-SQL parameter. A local file search tool starts calling an external API. Same server. Same connection. Same green check from the original approval screen.

The agent does not remember the approval granted last Tuesday. All it sees at runtime is the current tool description, the current schema, the current return shape, and the current affordances (i.e., what the tool allows the model to do). It then acts accordingly.

This is the runtime security problem that MCP is walking into.

Tool metadata is runtime authority

The MCP tools specification explicitly describes tools as model-controlled, i.e., the language model can automatically discover and use tools based on context and prompts. The official MCP server tools specification defines tools/list and tools/call for clients, as well as notifications/tools/list_changed, which servers send to notify clients when their list of available tools changes.
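For orientation, here is a minimal sketch of the three message shapes named above, written as plain Python dictionaries in JSON-RPC 2.0 form. The customer_lookup tool and its schema are hypothetical examples, not taken from any real server.

```python
import json


def list_tools_request(request_id: int) -> dict:
    """Client -> server: ask for the tools the server currently exposes."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def call_tool_request(request_id: int, name: str, arguments: dict) -> dict:
    """Client -> server: invoke one tool by name with model-chosen arguments."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Server -> client notification. It carries no "id", so no reply is expected.
LIST_CHANGED_NOTIFICATION = {
    "jsonrpc": "2.0",
    "method": "notifications/tools/list_changed",
}

# Hypothetical tools/list result. Every field here is read by the model or host.
EXAMPLE_LIST_RESULT = {
    "tools": [
        {
            "name": "customer_lookup",
            "description": "Look up a customer record by ID.",
            "inputSchema": {
                "type": "object",
                "properties": {"customer_id": {"type": "string"}},
                "required": ["customer_id"],
            },
        }
    ]
}

if __name__ == "__main__":
    request = call_tool_request(2, "customer_lookup", {"customer_id": "c-42"})
    print(json.dumps(request, indent=2))
```

Nothing in these shapes carries the original approval. Whatever the server returns from tools/list on a given day is what the model works with.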

That small protocol shape matters. A tool definition is no longer something a team documents next to an API. It is the API's documentation brought to life, teaching the model which actions the API supports.

The description tells the model the intent of the tool. The input schema tells the model how to phrase a request to the tool. The annotations tell the host and user interface how the tool behaves. Finally, the response from tools/call becomes evidence for the agent's next step. With a framework like LangChain, a team can, for example, use MultiServerMCPClient.get_tools() to fetch all the MCP tools for a client and then pass them to create_agent(), as explained in the current LangChain MCP guide. That convenience has a consequence: a changed tool surface now maps directly to changed executable behavior of the agent.

I like MCP because it makes the work of integrating different things look similar. As Focused argued in a recent article, MCP turns integration into runtime work. A shared shape for integration work does not automatically make that runtime safe, but it does make it worth securing.

Approval is a snapshot

Admission-time controls answer a narrow question: is it safe to connect to this server right now? They cover the server's identity, trust roots, signatures, the registry entry, scopes, and the consent decisions made by a human administrator. These are good decisions to make.

The fresher question is this: at the time of the call, long after the admission-time checks have passed, does the tool that is about to run still fall within the capability surface the team approved for that server?

This issue came up recently in an MCP community discussion (May 2026) about runtime tool drift detection after admission-time security. Admission-time checks verify the server's identity, that it sits inside the correct trust boundary, and so on. After the connection is made, however, a tool on that server could change from read-only to fully mutating, or start returning new PII data classes, while the server's identity stays the same.

OWASP calls this type of attack a "rug pull": after an MCP server has been approved, the server changes its tool definitions. The OWASP MCP Security Cheat Sheet identifies the problem and also lists the broader MCP attack surface, including tool poisoning, tool shadowing, confused deputy, over-scoped tokens, replay attacks, and even sandbox escapes. The cheat sheet suggests hashing or pinning tool definitions and alerting on changes to tool descriptions, parameter names, parameter types, and return schemas. That is the basis for sane MCP security practice for this category of problem. Treat the entire tool definition as security-sensitive material.
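A minimal sketch of pinning a tool definition by hash, using only the Python standard library. Which fields belong to the hashed surface is an assumption made here for illustration, and the example tool is hypothetical.

```python
import hashlib
import json

# Assumption: these fields make up the security-relevant surface of a tool.
SURFACE_FIELDS = ("name", "description", "inputSchema", "outputSchema", "annotations")


def surface_hash(tool: dict) -> str:
    """Hash a canonical JSON form of the tool surface.

    sort_keys makes the hash independent of dictionary key order. List order
    (for example the "required" array) still affects the hash.
    """
    surface = {field: tool.get(field) for field in SURFACE_FIELDS}
    canonical = json.dumps(
        surface, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical tool as it looked at approval time.
approved_tool = {
    "name": "customer_lookup",
    "description": "Look up a customer record by ID.",
    "inputSchema": {
        "type": "object",
        "properties": {"customer_id": {"type": "string"}},
        "required": ["customer_id"],
    },
}
pinned = surface_hash(approved_tool)

# The same tool later, with a description that now steers toward export.
live_tool = dict(approved_tool, description="Look up or export customer records.")
drifted = surface_hash(live_tool) != pinned

if __name__ == "__main__":
    print("drift detected:", drifted)
```

A hash only says that something changed. Deciding how much the change matters needs a structured comparison of the fields themselves.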

[Figure: Architecture diagram showing admission-time MCP tool approval compared with runtime drift verification before a tool call. Caption: Approval is a snapshot. Runtime security checks the tool that actually ran.]

The runtime has to carry a capability manifest

An approved capability manifest is a useful primitive for runtime governance before the tool call. It is not just paperwork to be approved before a server is deployed to the environment. It is an artifact that the runtime can compare with the actual server and tool surface before executing any tool call.

The practical details of what goes into the manifest can be worked out over time, but a simple, approved capability manifest for each server's tools is required from the start. A simple way to look at it is as a list of fields, each corresponding to a point of drift. The description field steers the model. The input and output schema fields open up new paths. The effects field changes the nature of a lookup. The sensitive data classes field turns a metadata call into a personal data call. A human must approve all of these fields, but not every change deserves the same response. A new optional description field should not require a page to the CISO. A new required input parameter, external destination, sensitive data class, or declared effect should quarantine the tool until its surface has been reconciled with the approved one.
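One possible shape for such a manifest, as a sketch only. The field names are assumptions chosen to mirror the drift points listed above. Effects, data classes, and external destinations do not come from the MCP tool definition itself, so here they are declared by whoever reviews the tool.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityManifest:
    """Approved surface for one tool (hypothetical field names)."""

    tool_name: str
    description_sha256: str
    required_params: frozenset
    optional_params: frozenset
    effects: frozenset                # declared by the reviewer, e.g. {"read"}
    data_classes: frozenset           # declared by the reviewer
    external_destinations: frozenset  # declared by the reviewer
    approved_by: str


def build_manifest(tool, *, effects, data_classes, external_destinations, approved_by):
    """Derive the schema-based fields from a tool definition and attach
    the reviewer's declarations."""
    schema = tool.get("inputSchema", {})
    properties = set(schema.get("properties", {}))
    required = set(schema.get("required", []))
    description = tool.get("description", "")
    return CapabilityManifest(
        tool_name=tool["name"],
        description_sha256=hashlib.sha256(description.encode("utf-8")).hexdigest(),
        required_params=frozenset(required),
        optional_params=frozenset(properties - required),
        effects=frozenset(effects),
        data_classes=frozenset(data_classes),
        external_destinations=frozenset(external_destinations),
        approved_by=approved_by,
    )


# Hypothetical tool and reviewer declarations.
example_tool = {
    "name": "customer_lookup",
    "description": "Look up a customer record by ID.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string"},
            "include_orders": {"type": "boolean"},
        },
        "required": ["customer_id"],
    },
}
example_manifest = build_manifest(
    example_tool,
    effects={"read"},
    data_classes={"customer_metadata"},
    external_destinations=set(),
    approved_by="security-review",
)
```

Making the dataclass frozen is a deliberate choice in this sketch: an approved manifest should be replaced through a new approval, not edited in place.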

The live surface of each managed tool is then compared against the server's manifest before execution, starting as early as the moment a new tool definition is added to the managed tool set. Does the live surface match the approved manifest, or is it a different shape that needs to be classified?

Classify it. A cosmetic change to a tool's name will likely draw little interest from the CISO or the security team. A new optional string field, however, may require review by the approval team. A new required parameter, external destination, or sensitive data class, or an increase in the declared effects of a tool that currently acts as, say, a read-only metadata lookup feeding a log or dashboard, requires quarantining the tool until a human has reconciled its surface.
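The classification step might be sketched as follows. The severity levels and the dictionary keys are assumptions for illustration. Here a display title change counts as cosmetic, while a description change is sent to review because the description steers the model. This sketch scores only additions, not removals.

```python
from enum import IntEnum


class Severity(IntEnum):
    NONE = 0
    COSMETIC = 1
    REVIEW = 2
    QUARANTINE = 3


# Fields where any added value widens what the tool can do or touch.
WIDENING_FIELDS = ("required_params", "effects", "data_classes", "external_destinations")


def added(approved: dict, live: dict, field: str) -> set:
    """Values present in the live surface but not in the approved one."""
    return set(live.get(field, ())) - set(approved.get(field, ()))


def classify_drift(approved: dict, live: dict) -> Severity:
    """Return the highest severity among all detected differences."""
    severity = Severity.NONE
    if live.get("title") != approved.get("title"):
        severity = max(severity, Severity.COSMETIC)
    if live.get("description") != approved.get("description"):
        severity = max(severity, Severity.REVIEW)
    if added(approved, live, "optional_params"):
        severity = max(severity, Severity.REVIEW)
    for field in WIDENING_FIELDS:
        if added(approved, live, field):
            severity = max(severity, Severity.QUARANTINE)
    return severity


# Hypothetical approved surface for a read-only lookup.
approved_surface = {
    "title": "Customer lookup",
    "description": "Look up a customer record by ID.",
    "required_params": ["customer_id"],
    "optional_params": [],
    "effects": ["read"],
    "data_classes": ["customer_metadata"],
    "external_destinations": [],
}

if __name__ == "__main__":
    exported = dict(approved_surface, effects=["read", "export"])
    print(classify_drift(approved_surface, exported).name)
```

Taking the maximum over all differences means one high-severity change cannot hide behind several harmless ones.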

This is where runtime governance before the tool call becomes more than policy. The runtime will either allow the call, deny it, prompt a human for input, or isolate the tool until the server's surface has been reconciled with the approved one.
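A sketch of that enforcement point, assuming a drift severity label has already been computed for the tool. The policy table, the severity labels, and the guarded_call helper are all hypothetical names.

```python
from enum import Enum
from typing import Any, Callable, Optional, Set, Tuple


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    PROMPT = "prompt"
    ISOLATE = "isolate"


# Hypothetical policy table: drift severity label -> runtime decision.
POLICY = {
    "none": Decision.ALLOW,
    "cosmetic": Decision.ALLOW,
    "review": Decision.PROMPT,
    "quarantine": Decision.ISOLATE,
}


def guarded_call(
    tool_name: str,
    severity: str,
    call: Callable[[], Any],
    isolated: Set[str],
) -> Tuple[Decision, Optional[Any]]:
    """Decide before executing. The tool runs only on ALLOW."""
    if tool_name in isolated:
        # Already isolated: stay closed until a human reconciles the surface.
        return Decision.DENY, None
    # Unknown severity labels fail closed.
    decision = POLICY.get(severity, Decision.DENY)
    if decision is Decision.ISOLATE:
        isolated.add(tool_name)
    if decision is not Decision.ALLOW:
        return decision, None
    return decision, call()


if __name__ == "__main__":
    isolated_tools: Set[str] = set()
    print(guarded_call("customer_lookup", "none", lambda: "record", isolated_tools))
    print(guarded_call("customer_lookup", "quarantine", lambda: "record", isolated_tools))
    print(guarded_call("customer_lookup", "none", lambda: "record", isolated_tools))
```

The important property is ordering: the decision is made before the callable runs, so a quarantined tool never executes, and it stays denied on later calls even if the drift signal disappears.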

Auditing tool capabilities before deployment is still relevant. In Auditing MCP Servers for Over-Privileged Tool Capabilities (the “mcp-sec-audit” paper), the authors describe a protocol-aware toolkit for auditing the code and metadata of MCP server capabilities. The paper includes static audit rules as well as a description of dynamic sandboxing using Docker and eBPF. In contrast to runtime drift detection, this kind of audit is meant to run before a server enters the environment where it will be used. Runtime drift detection then takes over as the audit's findings go stale.

List changes are not enough

There is already a signal in the current MCP specification: notifications/tools/list_changed. It is useful because it lets a server tell the client that its list of available tools has changed. Treat it as exactly that: a signal, not a security control. The server decides whether and when to send it, so a server that changes a tool quietly gives the client nothing to react to.
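One way to use the notification as a signal without depending on it is sketched below. The ToolListCache class, its parameters, and the 30-second default are assumptions for illustration. The notification only invalidates the cached list early. A maximum age forces a refresh even if no notification ever arrives.

```python
import time
from typing import Callable, List, Optional


class ToolListCache:
    """Cache of the server's tool list that never trusts silence for long."""

    def __init__(
        self,
        fetch: Callable[[], List[dict]],
        max_age_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # hypothetical wrapper around tools/list
        self._max_age_s = max_age_s
        self._clock = clock
        self._tools: Optional[List[dict]] = None
        self._fetched_at = 0.0

    def on_list_changed(self) -> None:
        """Handler for notifications/tools/list_changed: invalidate only."""
        self._tools = None

    def current(self) -> List[dict]:
        """Return the tool list, refetching if invalidated or too old."""
        expired = self._clock() - self._fetched_at > self._max_age_s
        if self._tools is None or expired:
            self._tools = self._fetch()
            self._fetched_at = self._clock()
        return self._tools


if __name__ == "__main__":
    cache = ToolListCache(lambda: [{"name": "customer_lookup"}])
    print(cache.current())
    cache.on_list_changed()
    print(cache.current())
```

Whatever current() returns still has to be compared against the approved surface before a call. The cache only controls how fresh the comparison input is.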

Similar ideas have already come up around lazy MCP tool loading. Loading the description of every approved tool into the context window has been argued to hurt model focus. The security cost is worse: each approved but unused tool sitting in the context can be prompted into authority. Instead, load the smallest set of tools relevant to the task, and bind each call to those tools to the current evidence.
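A small sketch of that selection step, assuming the runtime already knows which tool names a task needs and holds a pinned hash per approved tool. The select_tools helper and the example names are hypothetical.

```python
from typing import Dict, List, Set


def select_tools(
    live_tools: List[dict],
    pinned_hashes: Dict[str, str],
    needed: Set[str],
) -> List[dict]:
    """Expose only tools that are both needed for this task and approved.

    Each selected tool travels with its approved hash, so the later call can
    be checked against the evidence that justified exposing it.
    """
    selected = []
    for tool in live_tools:
        name = tool["name"]
        if name in needed and name in pinned_hashes:
            selected.append({"tool": tool, "approved_hash": pinned_hashes[name]})
    return selected


if __name__ == "__main__":
    live = [
        {"name": "customer_lookup"},
        {"name": "customer_export"},
        {"name": "file_search"},
    ]
    pinned = {"customer_lookup": "hash-a", "file_search": "hash-b"}
    for entry in select_tools(live, pinned, needed={"customer_lookup", "customer_export"}):
        print(entry["tool"]["name"], entry["approved_hash"])
```

In this example customer_export is needed but never approved, and file_search is approved but not needed, so neither reaches the model.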

Per-call receipts beat trust vibes

A security team cannot debug an incident from a screenshot of an approval dialog (as cool as that may be). They need the actual call record.

I have also been looking at per-call signed records. The GitHub discussion points strongly in that direction: input, outcome, effects, authorization decision, risk determination, and a hash chain back to the decision made for that call. A drift detector adds recomputable evidence: approved surface hash, current surface hash, classified delta, policy decision, and observed effect.

[Figure: Data-flow diagram showing user intent, policy decision, MCP tool call, side effects, and a signed call record feeding traces and evals. Caption: The runtime should leave a receipt for the decision, the call, and the effect.]

We at Focused use the term side-effect ledger for this reason. “Log” is too weak a word for what is needed here. A log simply reports that something happened. A receipt records the decisions made, operation keys, tool versions, and the effects of those decisions, plus how to undo any harm caused by an incorrect decision.
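A minimal sketch of a hash-chained receipt ledger along these lines. The field names are assumptions. Signing is left out to keep the example short, so this shows only the chaining and recomputation part. A real deployment would also sign each receipt with a key the tool server cannot reach.

```python
import hashlib
import json
from typing import List, Optional


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON form, so anyone can recompute it."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_receipt(
    ledger: List[dict],
    *,
    tool: str,
    operation_key: str,
    approved_hash: str,
    live_hash: str,
    decision: str,
    effect: str,
    undo: Optional[str],
) -> dict:
    """Append one per-call receipt linked to the previous receipt's digest."""
    body = {
        "tool": tool,
        "operation_key": operation_key,
        "approved_hash": approved_hash,
        "live_hash": live_hash,
        "decision": decision,
        "effect": effect,
        "undo": undo,
        "prev": ledger[-1]["digest"] if ledger else None,
    }
    receipt = dict(body, digest=_digest(body))
    ledger.append(receipt)
    return receipt


def verify(ledger: List[dict]) -> bool:
    """Recompute every digest and check each link back to its predecessor."""
    prev = None
    for receipt in ledger:
        body = {k: v for k, v in receipt.items() if k != "digest"}
        if body.get("prev") != prev or _digest(body) != receipt.get("digest"):
            return False
        prev = receipt["digest"]
    return True


if __name__ == "__main__":
    ledger: List[dict] = []
    append_receipt(
        ledger,
        tool="customer_lookup",
        operation_key="op-001",
        approved_hash="hash-a",
        live_hash="hash-a",
        decision="allow",
        effect="read 1 record",
        undo=None,
    )
    print("ledger intact:", verify(ledger))
```

Because each receipt includes the previous digest, editing or removing an earlier record breaks verification for everything after it.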

This leads directly to the next point: trace evidence across the MCP boundary is required. An agent trace that stops at tools/call hides the information production teams need to investigate: the MCP call, the downstream API span, the policy decision, and the returned data class. All of it must sit on the same investigation path.

The Honeycomb production feedback framing also makes sense here. As Austin Parker writes in Honeycomb's AI agents and production feedback Q&A, the term drift can mean two things: agents that deviate from their intended functionality, and agents that silently produce wrong output that still looks correct. To detect this kind of drift, the observability platform needs to observe the agent's interactions with the outside world. With MCP tool drift, that outside-world interaction itself has changed shape.

MCP security belongs in the runtime

The double-edged sword of MCP is blunt about the ownership line. The protocol standardizes access. The runtime owns enforcement.

A production MCP runtime should do five boring things:

Bind each approved tool to a capability manifest.
Diff live tool definitions against that manifest.
Score drift by effect, data class, external reach, schema shape, and credential scope.
Quarantine high-severity drift before the model can use the tool.
Write per-call receipts into traces, incident evidence, and eval inputs.

Basic, but important. MCP moves the integration surface closer to the model, so security moves closer to the call.

I'm not envisioning a one-and-done approve button (nice as it might be with a prettier modal dialog). What I'm envisioning is something that works at runtime and can describe what an agent was allowed to do, what a tool claimed to be able to do, what happened (i.e., what changed), and who approved the change.

That is MCP security after tool approval.