Context Engineering for AI Agents: Practical Lessons from Building Manus
⏱️ Estimated reading time: 12 min
Introduction
One of the hardest choices in AI agent development is deciding between model training and context engineering. Through the hands-on experience shared by Yichao ‘Peak’ Ji of the Manus AI team, we can explore context optimization strategies validated in production environments.
Having witnessed the shift from the BERT era to GPT-3, the Manus team arrived at their current optimization strategy through an empirical approach they call “Stochastic Graduate Descent,” rebuilding their framework four times in the process.
KV-Cache-Centric Design: The Key Metric for Production Agents
Why Does KV-Cache Hit Rate Matter?
For a production-stage AI agent, the KV-cache hit rate is the most important performance metric, because it directly affects latency and cost.
A typical agent task flow:
- Receive user input
- Select an action based on current context
- Execute the action in the environment and generate an observation
- Add the action-observation pair to the context
- Repeat until the task is complete
Manus real-world data: Average input-to-output token ratio of 100:1
Real Impact on Cost Optimization
Cost comparison using Claude Sonnet as reference:
- Cached input tokens: $0.30/MTok
- Uncached tokens: $3.00/MTok
- 10x cost difference
KV-Cache Optimization Strategies
1. Prompt Prefix Stabilization
# Invalidates the cache
system_prompt = f"""
Current time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
You are an AI agent...
"""
# Cache-friendly pattern
system_prompt = """
You are an AI agent...
Current task: {task_description}
"""
2. Append-Only Context Design
class AgentContext:
def __init__(self):
self.actions = []
self.observations = []
def add_step(self, action, observation):
# Append only, no modification
self.actions.append(action)
self.observations.append(observation)
def serialize(self):
# Guarantee deterministic serialization
return json.dumps({
'actions': self.actions,
'observations': self.observations
}, sort_keys=True)
3. Explicit Cache Breakpoint Management
def create_cache_breakpoints(context):
breakpoints = [
'system_prompt_end',
'tools_definition_end',
'conversation_start'
]
for bp in breakpoints:
context.add_cache_marker(bp)
Dynamic Tool Management: Masking vs. Removal
The Tool Explosion Problem
With the adoption of MCP (Model Context Protocol), the number of tools is growing exponentially. Adding tools indiscriminately degrades the agent’s decision-making quality.
Manus’s Solution: Context-Aware State Machine
Reasons to mask tools rather than remove them:
- KV-cache preservation: Tool definitions sit at the front of the context, so changing them invalidates the entire cache
- Reference integrity: If a tool referenced by a previous action disappears, it causes model confusion
Function Call Control Implementation
class FunctionCallController:
def __init__(self):
self.call_modes = {
'auto': '<|im_start|>assistant',
'required': '<|im_start|>assistant<tool_call>',
'specified': '<|im_start|>assistant<tool_call>{"name": "browser_'
}
def constrain_tools(self, context, allowed_prefixes):
"""Allow only tools with specific prefixes"""
# Control via token logit masking
pass
Tool Naming Conventions
# Use consistent prefixes
BROWSER_TOOLS = [
'browser_navigate',
'browser_click',
'browser_extract'
]
SHELL_TOOLS = [
'shell_execute',
'shell_cd',
'shell_env'
]
Using the File System as Context
Context Window Limitations
Despite modern LLMs offering 128K+ token windows, real-world agent scenarios face these problems:
- Massive observation data: Unstructured data such as web pages and PDFs
- Performance degradation: Model performance decline in long contexts
- High cost: Token costs remain high even with prefix caching
Manus’s File System Approach
class FileSystemContext:
def __init__(self, sandbox_path):
self.sandbox_path = sandbox_path
self.context_files = {}
def store_observation(self, content, context_type):
"""Store large observations as files"""
file_path = f"{self.sandbox_path}/observations/{uuid4()}.{context_type}"
with open(file_path, 'w') as f:
f.write(content)
return {
'type': 'file_reference',
'path': file_path,
'summary': self.generate_summary(content)
}
def restore_observation(self, file_ref):
"""Restore content from file when needed"""
with open(file_ref['path'], 'r') as f:
return f.read()
Recoverable Compression Strategy
def compress_context(self, context):
"""Compression without information loss"""
compressed = []
for item in context:
if item['type'] == 'web_page':
# Preserve URL for recovery
compressed.append({
'type': 'web_page_ref',
'url': item['url'],
'summary': item['summary']
})
elif item['type'] == 'document':
# Preserve file path
compressed.append({
'type': 'document_ref',
'path': item['path'],
'key_points': item['key_points']
})
return compressed
Maintaining Goals via Attention Manipulation
Todo List-Based Attention Control
Why Manus creates and updates a todo.md file for complex tasks:
- Goal re-confirmation: Repeatedly place goals at the end of the context
- Solving the lost-in-the-middle problem: Prevent forgetting of critical information in long contexts
- Natural language-based attention bias: Implemented without special architectural changes
Concrete Implementation Example
# todo.md example
## Current Task: Website Performance Optimization
### Completed
- [x] Measure current page load time
- [x] Analyze image file sizes
- [x] Check CSS/JS file compression status
### In Progress
- [ ] Implement caching strategy
- [ ] Apply image optimization
### Next Steps
- [ ] Run performance tests
- [ ] Write results report
Average Task Complexity
Manus real-world data: Average of 50 tool calls required per task
Error Utilization Strategy: Learn From Failures Instead of Hiding Them
The Trap of Removing Failures
Common error handling approaches:
- Remove failed actions
- Reset state and retry
- Add randomness by adjusting “temperature”
Manus’s Error Utilization Strategy
class ErrorLearningContext:
def handle_failed_action(self, action, error):
"""Keep failed actions in context"""
error_context = {
'action': action,
'error': error,
'timestamp': datetime.now(),
'learning_signal': True
}
# Keep failure information in context
self.context.append(error_context)
# Reduce the probability of the model repeating similar mistakes
self.update_action_priors(action, negative_signal=True)
The Importance of Error Recovery
Error recovery is a core indicator of truly agentic behavior. Most academic research and benchmarks today focus only on success under ideal conditions, failing to reflect the complexity of real production environments.
Avoiding the Few-Shot Prompting Trap
The Risk of Pattern Imitation
The language model’s strong imitation ability can actually become a problem in agent systems:
# Problematic uniform pattern
context_examples = [
"Action: analyze file, Observation: 20 items found",
"Action: analyze file, Observation: 15 items found",
"Action: analyze file, Observation: 25 items found"
]
# The model ends up always choosing "analyze file"
Manus’s Diversity-Promoting Strategy
class ContextDiversifier:
def add_structural_variation(self, action, observation):
"""Introduce structural variation"""
templates = [
"Executed: {action} -> Result: {observation}",
"Action: {action}\nObservation: {observation}",
"{action} completed. {observation}"
]
template = random.choice(templates)
return template.format(action=action, observation=observation)
def add_semantic_noise(self, text):
"""Diversify expression while preserving meaning"""
synonyms = {
'analyze': ['review', 'inspect', 'check'],
'execute': ['perform', 'run', 'process']
}
# Random synonym substitution
return self.apply_synonyms(text, synonyms)
Practical Application Guide
Development Phase Checklist
1. Design Phase
- Design a KV-cache-friendly prompt structure
- Define tool naming conventions (consistent prefixes)
- Plan a file system-based memory architecture
2. Implementation Phase
- Implement an append-only context structure
- Implement tool control via token logit masking
- Implement a recoverable compression strategy
3. Optimization Phase
- Monitor KV-cache hit rate
- Introduce context diversity mechanisms
- Build an error learning system
Performance Monitoring Metrics
class AgentMetrics:
def __init__(self):
self.metrics = {
'kv_cache_hit_rate': 0.0,
'avg_context_length': 0,
'error_recovery_rate': 0.0,
'task_completion_rate': 0.0,
'cost_per_interaction': 0.0
}
def calculate_efficiency_score(self):
"""Calculate a composite efficiency score"""
weights = {
'kv_cache_hit_rate': 0.3,
'error_recovery_rate': 0.25,
'task_completion_rate': 0.25,
'cost_efficiency': 0.2
}
return sum(self.metrics[k] * weights[k] for k in weights)
Future Outlook: State Space Models and Agents
The Potential of SSMs
An interesting perspective presented by the Manus team: if State Space Models (SSMs) master file-based memory, it may be possible to build efficient agents without Transformer’s full attention.
Potential advantages of SSM-based agents:
- Speed: Faster inference than Transformers
- Efficiency: Optimized memory usage
- Scalability: Ability to handle long sequences
This suggests the potential to be the true successor to the Neural Turing Machine.
Conclusion
The lessons from the Manus team’s four framework rebuilds are clear: context engineering is central to AI agent development and requires a systematic approach.
Summary of Core Principles
- KV-cache first: Optimizing hit rate directly impacts performance and cost
- Masking over removal: Ensures stability in tool management
- File system utilization: External memory for unlimited context
- Learn from failures: Use errors as learning signals rather than hiding them
- Maintain diversity: Structural variation to avoid pattern traps
Advice for Practitioners
If you are an AI agent development team, use these questions to audit your current system:
- Are you measuring KV-cache hit rate?
- Does adding a tool invalidate the entire cache?
- How are you handling failed actions?
- Is your context too uniform?
Context engineering is still a developing field. But based on principles validated in real production environments like Manus, you can build more efficient and robust AI agents.
The future of agents will be built on carefully designed context. Start now.
References:
- Manus AI original blog
- Research on KV-Cache optimization
- MCP (Model Context Protocol) documentation