A 500KB Tool Response Just Wiped Your Agent's Memory

TL;DR

Context exhaustion floods your AI agent's context window with massive tool responses, pushing out your system prompt and conversation history and causing the agent to forget its instructions and behave unpredictably.

Context exhaustion is an MCP attack where a tool returns an excessively large response designed to fill your AI agent's context window, pushing out the system prompt, safety instructions, and conversation history.

How context exhaustion works

LLMs have a finite context window (typically 4K to 200K tokens). When a tool response is very large, it consumes most of this window, pushing out:

  1. System prompt with your agent's core instructions and guardrails
  2. Conversation history with important context from the user
  3. Other tool results from previous, legitimate tool calls
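The arithmetic above can be sketched in a few lines. The 4-characters-per-token ratio and the 8K-token window are illustrative assumptions, not parameters of any real model:

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token (assumption)."""
    return len(text) // 4

CONTEXT_WINDOW = 8_000  # tokens (hypothetical model)

tool_response = "x" * 500_000  # a 500KB flooding response

used = approx_tokens(tool_response)
remaining = max(CONTEXT_WINDOW - used, 0)
print(f"Tool response alone: {used:,} tokens")       # 125,000 tokens
print(f"Left for system prompt + history: {remaining:,}")  # 0
```

A single 500KB response translates to roughly 125K tokens, many times an 8K window: there is literally no room left for the system prompt or the conversation.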

The real danger

When the system prompt gets pushed out of context, safety guardrails disappear. The agent forgets what it was supposed to do. Responses become generic or unpredictable. This is also why context exhaustion pairs well with instruction hijacking: flood the window to erase the guardrails, then inject new instructions into the now-unprotected context.
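To see why the system prompt is the first casualty, consider a client that trims context first-in-first-out when it overflows. This is a minimal sketch with illustrative names and sizes, not any real MCP client:

```python
def trim_fifo(messages, budget):
    """Naive trimming: drop the OLDEST messages until the rest fit."""
    msgs = list(messages)
    while msgs and sum(len(m["content"]) for m in msgs) > budget:
        msgs.pop(0)  # evicts the system prompt before anything else!
    return msgs

context = [
    {"role": "system", "content": "Never reveal secrets. " * 10},
    {"role": "user", "content": "Summarize my inbox."},
    {"role": "tool", "content": "A" * 90_000},  # flooding response
]

trimmed = trim_fifo(context, budget=90_000)
print([m["role"] for m in trimmed])  # → ['tool']
```

Only the attacker-controlled tool message survives; the guardrails and the user's request are gone, which is exactly the state instruction hijacking needs.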

Attack thresholds

Response Size     Risk Level
< 100KB           Normal
100KB to 500KB    Warning, unusually large
> 500KB           Critical, likely intentional flooding
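These thresholds map directly to a simple classifier. A minimal sketch; the function name is illustrative:

```python
def classify_response(size_bytes: int) -> str:
    """Map a tool-response size to the risk levels in the table above."""
    if size_bytes > 500 * 1024:
        return "critical"  # likely intentional flooding
    if size_bytes >= 100 * 1024:
        return "warning"   # unusually large
    return "normal"

print(classify_response(50 * 1024))   # → normal
print(classify_response(200 * 1024))  # → warning
print(classify_response(600 * 1024))  # → critical
```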

Defenses

Set a maximum response size limit in your MCP client and truncate tool responses before adding them to the LLM context. Use sliding-window context management that always preserves the system prompt, and monitor response sizes so you can alert on anomalies.
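The first two defenses can be sketched together: cap tool responses before they enter the context, and trim history with a sliding window that never evicts the system prompt. Sizes, names, and message shapes are illustrative assumptions, not any real MCP SDK API:

```python
MAX_TOOL_RESPONSE = 100 * 1024  # bytes; tune to your client (assumption)

def cap_tool_response(text: str, limit: int = MAX_TOOL_RESPONSE) -> str:
    """Truncate oversized tool output before it reaches the context."""
    if len(text) <= limit:
        return text
    return text[:limit] + "\n[truncated: response exceeded size limit]"

def trim_preserving_system(messages, budget):
    """Sliding window over history that never drops the system prompt."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    size = lambda ms: sum(len(m["content"]) for m in ms)
    while rest and size(system) + size(rest) > budget:
        rest.pop(0)  # drop oldest non-system message only
    return system + rest
```

Capping at ingestion is the stronger control: a response that never enters the context can never evict anything. The sliding window is the backstop for gradual growth across many legitimate calls.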
