Agentic Loop
You type a message. Claude thinks. Tools execute. Code appears. But what's actually happening between your terminal and Anthropic's servers?
After intercepting thousands of API calls, I can show you exactly how Claude Code's agent loop works—and how to capture it yourself.
The Mental Model
Most people think Claude Code works like this:
You → Claude → Response
The reality is far more interesting:
You → Claude → Tool Call → Tool Result → Claude → Tool Call → Tool Result → Claude → Response
A single "list files" command triggers 15 API calls. A codebase exploration spawns 94 requests. Let's see why.
Setting Up Traffic Capture
First, we need to see what's happening. Create this proxy:
```javascript
// proxy.js
const http = require('http');
const https = require('https');
const fs = require('fs');

const PROXY_PORT = 8888;
let counter = 0;

if (!fs.existsSync('captures')) fs.mkdirSync('captures');

http.createServer((req, res) => {
  const id = String(++counter).padStart(4, '0');
  let body = '';
  req.on('data', chunk => body += chunk);
  req.on('end', () => {
    // Save request
    fs.writeFileSync(`captures/${id}_request.json`, JSON.stringify({
      url: req.url,
      method: req.method,
      body: body ? JSON.parse(body) : null
    }, null, 2));

    // Forward to Anthropic
    const proxy = https.request({
      hostname: 'api.anthropic.com',
      port: 443,
      path: req.url,
      method: req.method,
      headers: { ...req.headers, host: 'api.anthropic.com' }
    }, proxyRes => {
      let responseBody = '';
      res.writeHead(proxyRes.statusCode, proxyRes.headers);
      proxyRes.on('data', chunk => {
        responseBody += chunk; // accumulates raw SSE text for streaming responses
        res.write(chunk);      // pass each chunk through to the client untouched
      });
      proxyRes.on('end', () => {
        fs.writeFileSync(`captures/${id}_response.json`, JSON.stringify({
          status: proxyRes.statusCode,
          body: responseBody
        }, null, 2));
        res.end();
      });
    });

    if (body) proxy.write(body);
    proxy.end();
  });
}).listen(PROXY_PORT, () => {
  console.log(`Proxy on http://localhost:${PROXY_PORT}`);
});
```
Run it:

```shell
node proxy.js
```

Then, in another terminal, point Claude Code at it:

```shell
ANTHROPIC_BASE_URL=http://localhost:8888 claude "list files in current directory"
```
Now check your captures/ folder. You'll see something like this:
captures/
├── 0001_request.json # Warmup
├── 0001_response.json
├── 0002_request.json # Main request
├── 0002_response.json
├── 0003_request.json # Token counting
...
├── 0014_request.json # Haiku parsing
├── 0015_request.json # Continuation
└── 0015_response.json
The Agent Loop Anatomy
Phase 1: Warmup (Request #1)
Before anything else, Claude Code sends a tiny probe to verify your key and quota:
{
"model": "claude-haiku-4-5-20251001",
"max_tokens": 1,
"messages": [{ "role": "user", "content": "." }]
}
Why? This tiny request (max 1 token) verifies your API key works before spending compute on the real request. If you're rate-limited, you find out here—not after processing a 100KB request.
Phase 2: Main Request (Request #2)
Now the real work begins:
{
"model": "claude-opus-4-5-20251101",
"max_tokens": 16000,
"system": [
{ "text": "You are Claude Code, Anthropic's official CLI...", "cache_control": {"type": "ephemeral"} }
],
"messages": [
{ "role": "user", "content": "list files in current directory" }
],
"tools": [
{ "name": "Bash", "description": "...", "input_schema": {...} },
{ "name": "Read", "description": "...", "input_schema": {...} },
// ... 28 more tools
],
"stream": true
}
Key observations:
- Model: Opus for reasoning (the expensive one)
- System prompt: 13,000 characters of instructions
- Tools: 30 tool definitions (~17,000 tokens)
- Cache control: "ephemeral" means cache for ~5 minutes
- Streaming: Always true for real-time output
Phase 3: Tool Use Response
Claude doesn't just respond with text. It responds with a tool call:
event: content_block_start
data: {"content_block":{"type":"tool_use","id":"toolu_abc123","name":"Bash"}}
event: content_block_delta
data: {"delta":{"type":"input_json_delta","partial_json":"{\"command\""}}
event: content_block_delta
data: {"delta":{"type":"input_json_delta","partial_json":": \"ls -la\""}}
event: content_block_delta
data: {"delta":{"type":"input_json_delta","partial_json":"}"}}
event: content_block_stop
event: message_delta
data: {"delta":{"stop_reason":"tool_use"}}
Critical detail: The stop_reason is tool_use, not end_turn. This tells Claude Code: "I'm not done—execute this tool and give me the result."
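On the client side, those `input_json_delta` fragments have to be concatenated and parsed only after `content_block_stop` arrives; parsing any earlier fails on incomplete JSON. A minimal sketch of that assembly, with event shapes taken from the trace above and reduced to just the fields used:

```javascript
// Sketch: assemble a tool call from streamed SSE events. Handles only the
// tool_use-related events shown in the trace, not the full event set.
function assembleToolCall(events) {
  let call = null;
  let json = '';
  for (const ev of events) {
    if (ev.type === 'content_block_start' && ev.content_block.type === 'tool_use') {
      call = { id: ev.content_block.id, name: ev.content_block.name };
    } else if (ev.type === 'content_block_delta' && ev.delta.type === 'input_json_delta') {
      json += ev.delta.partial_json; // accumulate fragments verbatim
    } else if (ev.type === 'content_block_stop' && call) {
      call.input = JSON.parse(json); // only valid once the block is complete
    }
  }
  return call;
}
```

Fed the three deltas from the trace above, this yields `{ id: "toolu_abc123", name: "Bash", input: { command: "ls -la" } }`.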
Phase 4: Token Counting (Requests #3-13)
While you're watching Claude think, 11 parallel requests fire off:
{
"url": "/v1/messages/count_tokens",
"body": {
"model": "claude-opus-4-5-20251101",
"messages": [...]
}
}
These run in parallel to count tokens for:
- The conversation so far
- Potential next messages
- Cache optimization calculations
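The fan-out itself is unremarkable Node: each payload is posted independently and `Promise.all` gathers the counts. A sketch, with `post` standing in for whatever HTTP helper actually performs the request (that name and signature are my assumption, not Claude Code's internals):

```javascript
// Sketch: fire all count_tokens payloads concurrently and wait for every
// result. `post` is a stand-in: any (path, body) => Promise function works.
async function countAllTokens(post, payloads) {
  return Promise.all(
    payloads.map(body => post('/v1/messages/count_tokens', body))
  );
}
```

Because the requests are independent, eleven of them cost roughly one round trip of wall-clock time instead of eleven.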
Phase 5: Tool Execution
Claude Code runs the Bash command locally:
ls -la
Output:
total 16
drwxr-xr-x 5 user staff 160 Jan 26 10:00 .
drwxr-xr-x 3 user staff 96 Jan 26 09:00 ..
-rw-r--r-- 1 user staff 1234 Jan 26 10:00 proxy.js
drwxr-xr-x 2 user staff 64 Jan 26 10:05 captures
Phase 6: Haiku Parsing (Request #14)
Here's a surprise: the Bash output goes to Haiku, not Opus:
{
"model": "claude-haiku-4-5-20251001",
"messages": [{
"role": "user",
"content": "Command: ls -la\nOutput: total 16\ndrwxr-xr-x..."
}],
"system": "Extract file paths from this command output. Return them in <filepaths> tags."
}
Why Haiku?
- Opus costs ~15x more than Haiku
- Parsing command output is a simple task
- Haiku extracts file paths, Opus does the reasoning
This multi-model routing is invisible to users but saves significant cost.
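If you want to consume that Haiku response yourself, extracting the paths is a small parser. The `<filepaths>` tag format comes from the system prompt above; newline-separated paths inside the tags is my assumption, so adjust the split if real responses differ:

```javascript
// Sketch: pull paths out of the <filepaths> tags the Haiku system prompt
// asks for. Assumes newline-separated paths inside a single tag pair.
function extractFilepaths(text) {
  const match = text.match(/<filepaths>([\s\S]*?)<\/filepaths>/);
  if (!match) return [];
  return match[1].split('\n').map(s => s.trim()).filter(Boolean);
}
```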
Phase 7: Continuation (Request #15)
Now the tool result goes back to Opus:
{
"model": "claude-opus-4-5-20251101",
"messages": [
{ "role": "user", "content": "list files in current directory" },
{ "role": "assistant", "content": [
{ "type": "tool_use", "id": "toolu_abc123", "name": "Bash", "input": {"command": "ls -la"} }
]},
{ "role": "user", "content": [
{ "type": "tool_result", "tool_use_id": "toolu_abc123", "content": "total 16\ndrwxr-xr-x..." }
]}
]
}
The response now has stop_reason: "end_turn":
Here are the files in your current directory:
- `proxy.js` - Your proxy server
- `captures/` - Directory with captured traffic
The Loop Visualized
┌─────────────────────────────────────────────────────────────────┐
│ AGENT LOOP │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ │
│ │ User │ │
│ │ Input │ │
│ └────┬─────┘ │
│ │ │
│ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Warmup │───▶│ Main │───▶│ Token │ (parallel) │
│ │ (Haiku) │ │ (Opus) │ │ Counting │ │
│ └──────────┘ └────┬─────┘ └──────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────┐ │
│ │ stop_reason? │ │
│ └────────┬───────┘ │
│ │ │
│ ┌─────────────┼─────────────┐ │
│ ▼ ▼ ▼ │
│ "end_turn" "tool_use" "max_tokens" │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ [DONE] ┌──────────┐ [Continue] │
│ │ Execute │ │
│ │ Tool │ │
│ └────┬─────┘ │
│ │ │
│ ▼ │
│ ┌──────────┐ │
│ │ Haiku │ (parse output) │
│ │ Parsing │ │
│ └────┬─────┘ │
│ │ │
│ ▼ │
│ ┌──────────┐ │
│ │ Send │ │
│ │ Result │──────────┐ │
│ └──────────┘ │ │
│ │ │
│ ┌───────────────┘ │
│ │ │
│ ▼ │
│ Back to Opus │
│ (Loop continues) │
│ │
└─────────────────────────────────────────────────────────────────┘
Multiple Tool Calls
Claude can call multiple tools in one response:
{
"content": [
{ "type": "text", "text": "Let me check both files." },
{ "type": "tool_use", "id": "toolu_1", "name": "Read", "input": {"file_path": "/app/config.json"} },
{ "type": "tool_use", "id": "toolu_2", "name": "Read", "input": {"file_path": "/app/package.json"} }
]
}
Claude Code executes these in parallel and returns both results in a single user message:
{
"role": "user",
"content": [
{ "type": "tool_result", "tool_use_id": "toolu_1", "content": "{\"port\": 3000}" },
{ "type": "tool_result", "tool_use_id": "toolu_2", "content": "{\"name\": \"my-app\"}" }
]
}
Sub-Agent Loops
When Claude spawns a sub-agent via the Task tool, an entirely separate loop begins:
Main Agent (Opus)
│
├── Request #2: Calls Task tool
│
│ Sub-Agent (Haiku) - NEW LOOP
│ │
│ ├── Request #16: Initial exploration
│ ├── Request #17: Tool execution
│ ├── Request #18: More tools
│ │ ... (20+ requests)
│ └── Request #38: Final response
│
├── Request #39: Receives sub-agent result
│
└── Request #40: Final response to user
The sub-agent has:
- Different system prompt (read-only, specialized)
- Restricted tools (no Edit, Write, Task)
- Independent caching (builds its own cache)
- Own conversation context (can't see parent's history)
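Deriving that restricted toolset is straightforward filtering. A sketch, assuming tool definitions shaped like the `tools` array in the main request; the blocked names follow the bullets above:

```javascript
// Sketch: derive a sub-agent's read-only toolset from the full list.
// Tool definition shape mirrors the "tools" array in the main request.
const SUB_AGENT_BLOCKED = new Set(['Edit', 'Write', 'Task']);

function subAgentTools(allTools) {
  return allTools.filter(tool => !SUB_AGENT_BLOCKED.has(tool.name));
}
```

Blocking `Task` is the interesting entry: it prevents sub-agents from recursively spawning their own sub-agents.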
Cache Economics
Here's why the agent loop is economically viable:
| Request | New Tokens | Cached Tokens | Cache Rate |
|---|---|---|---|
| #2 (Main) | 3 | 24,055 | 99.9% |
| #15 (Continuation) | 1,200 | 24,055 | 95.2% |
| #16 (Sub-agent start) | 13,709 | 0 | 0% |
| #20 (Sub-agent cont.) | 517 | 15,272 | 96.7% |
The first request creates the cache. Every subsequent request in the session hits 95%+ cache rate. This means:
- System prompt (13KB): Cached after first use
- Tool definitions (17K tokens): Cached
- Conversation history: Incrementally cached
Cost implication: A 20-turn conversation costs roughly the same as 2-3 turns without caching.
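You can sanity-check the table yourself: the cache rate is just cached tokens divided by total input tokens, truncated to one decimal place:

```javascript
// Cache rate = cached / (new + cached), as a percentage truncated to one
// decimal place to match the table above.
function cacheRate(newTokens, cachedTokens) {
  const total = newTokens + cachedTokens;
  if (total === 0) return 0;
  return Math.floor((cachedTokens / total) * 1000) / 10;
}
```

For example, `cacheRate(517, 15272)` reproduces the 96.7% from request #20.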
Stop Reasons Explained
The stop_reason field controls the loop:
| Stop Reason | Meaning | Action |
|---|---|---|
| `end_turn` | Claude is done | Display response, end loop |
| `tool_use` | Claude needs tool output | Execute tool, continue loop |
| `max_tokens` | Hit token limit | Continue with a new request |
| `stop_sequence` | Hit a custom stop sequence | Depends on configuration |
Building Your Own Loop
Here's a minimal agent loop implementation:
```javascript
// Assumes an instantiated client plus SYSTEM_PROMPT, TOOL_DEFINITIONS,
// and an executeTool(name, input) helper defined elsewhere:
//   const Anthropic = require('@anthropic-ai/sdk');
//   const anthropic = new Anthropic();
async function agentLoop(userMessage) {
  const messages = [{ role: 'user', content: userMessage }];

  while (true) {
    const response = await anthropic.messages.create({
      model: 'claude-opus-4-5-20251101',
      max_tokens: 16000,
      system: SYSTEM_PROMPT,
      tools: TOOL_DEFINITIONS,
      messages: messages
    });

    // Add assistant response to history
    messages.push({ role: 'assistant', content: response.content });

    // Check stop reason
    if (response.stop_reason === 'end_turn') {
      return response.content; // Done!
    }

    if (response.stop_reason === 'tool_use') {
      // Find tool calls
      const toolCalls = response.content.filter(c => c.type === 'tool_use');

      // Execute tools in parallel
      const results = await Promise.all(
        toolCalls.map(async (tool) => ({
          type: 'tool_result',
          tool_use_id: tool.id,
          content: await executeTool(tool.name, tool.input)
        }))
      );

      // Add results to history
      messages.push({ role: 'user', content: results });
    }
    // Any other stop_reason (e.g. max_tokens) falls through: the loop
    // re-requests with the trailing assistant message, continuing the turn.
  }
}
```
Key Takeaways
- It's a loop, not a single call: One user message can trigger 20+ API requests
- Multi-model routing: Opus reasons, Haiku parses—invisible to users
- Caching is critical: 95%+ cache hit rates make this economically viable
- Stop reasons control flow: `tool_use` continues, `end_turn` stops
- Sub-agents are nested loops: Completely independent execution contexts
- Parallel execution: Token counting and tool execution happen concurrently
Try It Yourself
```shell
# Clone the proxy
git clone https://github.com/your-repo/claude-code-reversal
cd claude-code-reversal

# Start capturing
node tools/proxy.js

# In another terminal
ANTHROPIC_BASE_URL=http://localhost:8888 claude "explore this codebase"

# Analyze the captures
node tools/trace-conversation.js
```
Watch the agent loop in action. See every tool call, every continuation, every cache hit.
The magic isn't magic—it's just a well-designed loop.