The Agent Loop: How Claude Code Actually Works


You type a message. Claude thinks. Tools execute. Code appears. But what's actually happening between your terminal and Anthropic's servers?

After intercepting thousands of API calls, I can show you exactly how Claude Code's agent loop works—and how to capture it yourself.


The Mental Model

Most people think Claude Code works like this:

You → Claude → Response

The reality is far more interesting:

You → Claude → Tool Call → Tool Result → Claude → Tool Call → Tool Result → Claude → Response

A single "list files" command triggers 15 API calls. A codebase exploration spawns 94 requests. Let's see why.


Setting Up Traffic Capture

First, we need to see what's happening. Create this proxy:

// proxy.js
const http = require('http');
const https = require('https');
const fs = require('fs');

const PROXY_PORT = 8888;
let counter = 0;

if (!fs.existsSync('captures')) fs.mkdirSync('captures');

http.createServer((req, res) => {
  const id = String(++counter).padStart(4, '0');
  let body = '';

  req.on('data', chunk => body += chunk);
  req.on('end', () => {
    // Save request
    fs.writeFileSync(`captures/${id}_request.json`, JSON.stringify({
      url: req.url,
      method: req.method,
      body: body ? JSON.parse(body) : null
    }, null, 2));

    // Forward to Anthropic
    const proxy = https.request({
      hostname: 'api.anthropic.com',
      port: 443,
      path: req.url,
      method: req.method,
      headers: { ...req.headers, host: 'api.anthropic.com' }
    }, proxyRes => {
      let responseBody = '';
      res.writeHead(proxyRes.statusCode, proxyRes.headers);

      proxyRes.on('data', chunk => {
        responseBody += chunk;
        res.write(chunk);
      });

      proxyRes.on('end', () => {
        fs.writeFileSync(`captures/${id}_response.json`, JSON.stringify({
          status: proxyRes.statusCode,
          body: responseBody
        }, null, 2));
        res.end();
      });
    });

    // Don't crash the capture session if the upstream request fails
    proxy.on('error', err => {
      if (!res.headersSent) res.writeHead(502);
      res.end(JSON.stringify({ error: err.message }));
    });

    if (body) proxy.write(body);
    proxy.end();
  });
}).listen(PROXY_PORT, () => {
  console.log(`Proxy on http://localhost:${PROXY_PORT}`);
});

Run it:

node proxy.js

In another terminal:

ANTHROPIC_BASE_URL=http://localhost:8888 claude "list files in current directory"

Now check your captures/ folder. You'll see something like this:

captures/
├── 0001_request.json   # Warmup
├── 0001_response.json
├── 0002_request.json   # Main request
├── 0002_response.json
├── 0003_request.json   # Token counting
...
├── 0014_request.json   # Haiku parsing
├── 0015_request.json   # Continuation
└── 0015_response.json
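Each numbered pair can be stitched back together for analysis. A minimal sketch, assuming the `NNNN_request.json` / `NNNN_response.json` naming the proxy above produces (the `pairCaptures` helper is hypothetical, not part of the CLI):

```javascript
// Pair capture files by their four-digit ID so each request can be
// matched with its response.
function pairCaptures(filenames) {
  const pairs = {};
  for (const name of filenames) {
    const m = name.match(/^(\d{4})_(request|response)\.json$/);
    if (!m) continue; // skip anything the proxy didn't write
    const [, id, kind] = m;
    (pairs[id] ||= {})[kind] = name;
  }
  return pairs;
}

// In practice you'd feed it fs.readdirSync('captures'):
const pairs = pairCaptures(['0001_request.json', '0001_response.json', '0002_request.json']);
console.log(pairs['0001']); // { request: '0001_request.json', response: '0001_response.json' }
```

A missing `response` half usually means the stream was still in flight when you looked.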

The Agent Loop Anatomy

Phase 1: Warmup (Request #1)

Before anything else, Claude Code checks your quota:

{
  "model": "claude-haiku-4-5-20251001",
  "max_tokens": 1,
  "messages": [{ "role": "user", "content": "." }]
}

Why? This tiny request (max 1 token) verifies your API key works before spending compute on the real request. If you're rate-limited, you find out here—not after processing a 100KB request.

Phase 2: Main Request (Request #2)

Now the real work begins:

{
  "model": "claude-opus-4-5-20251101",
  "max_tokens": 16000,
  "system": [
    { "text": "You are Claude Code, Anthropic's official CLI...", "cache_control": {"type": "ephemeral"} }
  ],
  "messages": [
    { "role": "user", "content": "list files in current directory" }
  ],
  "tools": [
    { "name": "Bash", "description": "...", "input_schema": {...} },
    { "name": "Read", "description": "...", "input_schema": {...} },
    // ... 28 more tools
  ],
  "stream": true
}

Key observations:

  • Model: Opus for reasoning (the expensive one)
  • System prompt: 13,000 characters of instructions
  • Tools: 30 tool definitions (~17,000 tokens)
  • Cache control: "ephemeral" means cache for ~5 minutes
  • Streaming: Always true for real-time output
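Putting those observations together, the main request can be sketched as a builder function. This is an illustrative reconstruction, not Claude Code's actual source; `systemText` and `tools` stand in for the real 13KB prompt and 30 tool definitions:

```javascript
// Assemble the main Opus request. cache_control on the system block marks
// everything up to that point (system prompt + tools) as cacheable.
function buildMainRequest(userMessage, systemText, tools) {
  return {
    model: 'claude-opus-4-5-20251101',
    max_tokens: 16000,
    system: [{ type: 'text', text: systemText, cache_control: { type: 'ephemeral' } }],
    messages: [{ role: 'user', content: userMessage }],
    tools,
    stream: true // always stream, for real-time output
  };
}

const req = buildMainRequest('list files in current directory', 'You are Claude Code...', []);
console.log(req.system[0].cache_control.type); // ephemeral
```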

Phase 3: Tool Use Response

Claude doesn't just respond with text. It responds with a tool call:

event: content_block_start
data: {"content_block":{"type":"tool_use","id":"toolu_abc123","name":"Bash"}}

event: content_block_delta
data: {"delta":{"type":"input_json_delta","partial_json":"{\"command\""}}

event: content_block_delta
data: {"delta":{"type":"input_json_delta","partial_json":": \"ls -la\""}}

event: content_block_delta
data: {"delta":{"type":"input_json_delta","partial_json":"}"}}

event: content_block_stop

event: message_delta
data: {"delta":{"stop_reason":"tool_use"}}

Critical detail: The stop_reason is tool_use, not end_turn. This tells Claude Code: "I'm not done—execute this tool and give me the result."
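On the client side, those `input_json_delta` fragments are concatenated in order and parsed once the block stops. A minimal sketch of that reassembly:

```javascript
// Rebuild a tool call's input from streamed JSON fragments. Each
// input_json_delta carries an arbitrary slice of the final JSON string;
// only the concatenation of all fragments is guaranteed to parse.
function assembleToolInput(deltas) {
  const raw = deltas.map(d => d.partial_json).join('');
  return JSON.parse(raw);
}

// The three deltas from the stream above:
const deltas = [
  { partial_json: '{"command"' },
  { partial_json: ': "ls -la"' },
  { partial_json: '}' }
];
console.log(assembleToolInput(deltas)); // { command: 'ls -la' }
```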

Phase 4: Token Counting (Requests #3-13)

While you're watching Claude think, 11 parallel requests fire off:

{
  "url": "/v1/messages/count_tokens",
  "body": {
    "model": "claude-opus-4-5-20251101",
    "messages": [...]
  }
}

These run in parallel to count tokens for:

  • The conversation so far
  • Potential next messages
  • Cache optimization calculations

Phase 5: Tool Execution

Claude Code runs the Bash command locally:

ls -la

Output:

total 16
drwxr-xr-x  5 user  staff   160 Jan 26 10:00 .
drwxr-xr-x  3 user  staff    96 Jan 26 09:00 ..
-rw-r--r--  1 user  staff  1234 Jan 26 10:00 proxy.js
drwxr-xr-x  2 user  staff    64 Jan 26 10:05 captures

Phase 6: Haiku Parsing (Request #14)

Here's a surprise: the Bash output goes to Haiku, not Opus:

{
  "model": "claude-haiku-4-5-20251001",
  "messages": [{
    "role": "user",
    "content": "Command: ls -la\nOutput: total 16\ndrwxr-xr-x..."
  }],
  "system": "Extract file paths from this command output. Return them in <filepaths> tags."
}

Why Haiku?

  • Opus costs ~15x more than Haiku
  • Parsing command output is a simple task
  • Haiku extracts file paths, Opus does the reasoning

This multi-model routing is invisible to users but saves significant cost.
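A back-of-envelope sketch of the savings, using only the ~15x price ratio cited above (the per-token rate here is an illustrative placeholder, not a published price):

```javascript
// Cost of parsing one tool output on Haiku vs. sending the same tokens
// to Opus. opusMultiplier = 15 reflects the ~15x price ratio.
function routingSavings(parseTokens, haikuRatePerToken, opusMultiplier = 15) {
  const haikuCost = parseTokens * haikuRatePerToken;
  const opusCost = haikuCost * opusMultiplier;
  return { haikuCost, opusCost, saved: opusCost - haikuCost };
}

// Every parsing call routed to Haiku costs ~1/15 of what Opus would:
console.log(routingSavings(1000, 0.000001));
```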

Phase 7: Continuation (Request #15)

Now the tool result goes back to Opus:

{
  "model": "claude-opus-4-5-20251101",
  "messages": [
    { "role": "user", "content": "list files in current directory" },
    { "role": "assistant", "content": [
      { "type": "tool_use", "id": "toolu_abc123", "name": "Bash", "input": {"command": "ls -la"} }
    ]},
    { "role": "user", "content": [
      { "type": "tool_result", "tool_use_id": "toolu_abc123", "content": "total 16\ndrwxr-xr-x..." }
    ]}
  ]
}

The response now has stop_reason: "end_turn":

Here are the files in your current directory:

- `proxy.js` - Your proxy server
- `captures/` - Directory with captured traffic

The Loop Visualized

┌─────────────────────────────────────────────────────────────────┐
│                        AGENT LOOP                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────┐                                                   │
│  │  User    │                                                   │
│  │  Input   │                                                   │
│  └────┬─────┘                                                   │
│       │                                                         │
│       ▼                                                         │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐                  │
│  │ Warmup   │───▶│  Main    │───▶│  Token   │ (parallel)       │
│  │ (Haiku)  │    │ (Opus)   │    │ Counting │                  │
│  └──────────┘    └────┬─────┘    └──────────┘                  │
│                       │                                         │
│                       ▼                                         │
│              ┌────────────────┐                                 │
│              │ stop_reason?   │                                 │
│              └────────┬───────┘                                 │
│                       │                                         │
│         ┌─────────────┼─────────────┐                          │
│         ▼             ▼             ▼                          │
│    "end_turn"    "tool_use"    "max_tokens"                    │
│         │             │             │                          │
│         ▼             ▼             ▼                          │
│      [DONE]     ┌──────────┐   [Continue]                      │
│                 │ Execute  │                                    │
│                 │  Tool    │                                    │
│                 └────┬─────┘                                    │
│                      │                                          │
│                      ▼                                          │
│                 ┌──────────┐                                    │
│                 │ Haiku    │ (parse output)                     │
│                 │ Parsing  │                                    │
│                 └────┬─────┘                                    │
│                      │                                          │
│                      ▼                                          │
│                 ┌──────────┐                                    │
│                 │  Send    │                                    │
│                 │ Result   │──────────┐                        │
│                 └──────────┘          │                        │
│                                       │                        │
│                       ┌───────────────┘                        │
│                       │                                         │
│                       ▼                                         │
│                  Back to Opus                                   │
│                  (Loop continues)                               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Multiple Tool Calls

Claude can call multiple tools in one response:

{
  "content": [
    { "type": "text", "text": "Let me check both files." },
    { "type": "tool_use", "id": "toolu_1", "name": "Read", "input": {"file_path": "/app/config.json"} },
    { "type": "tool_use", "id": "toolu_2", "name": "Read", "input": {"file_path": "/app/package.json"} }
  ]
}

Claude Code executes these in parallel and returns multiple results:

{
  "role": "user",
  "content": [
    { "type": "tool_result", "tool_use_id": "toolu_1", "content": "{\"port\": 3000}" },
    { "type": "tool_result", "tool_use_id": "toolu_2", "content": "{\"name\": \"my-app\"}" }
  ]
}
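Assembling that multi-result message is mechanical: filter the `tool_use` blocks and map each one to a `tool_result` carrying the matching id. A sketch (the `runners` map of local tool implementations is hypothetical):

```javascript
// Turn a multi-tool assistant response into the single user message of
// tool_result blocks shown above. Text blocks are ignored; every
// tool_use id must reappear as a tool_use_id.
function buildToolResults(content, runners) {
  return {
    role: 'user',
    content: content
      .filter(block => block.type === 'tool_use')
      .map(block => ({
        type: 'tool_result',
        tool_use_id: block.id,
        content: runners[block.name](block.input)
      }))
  };
}

const response = [
  { type: 'text', text: 'Let me check both files.' },
  { type: 'tool_use', id: 'toolu_1', name: 'Read', input: { file_path: '/app/config.json' } }
];
const msg = buildToolResults(response, { Read: input => `contents of ${input.file_path}` });
console.log(msg.content.length); // 1
```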

Sub-Agent Loops

When Claude spawns a sub-agent via the Task tool, an entirely separate loop begins:

Main Agent (Opus)
    │
    ├── Request #2: Calls Task tool
    │
    │   Sub-Agent (Haiku) - NEW LOOP
    │   │
    │   ├── Request #16: Initial exploration
    │   ├── Request #17: Tool execution
    │   ├── Request #18: More tools
    │   │   ... (20+ requests)
    │   └── Request #38: Final response
    │
    ├── Request #39: Receives sub-agent result
    │
    └── Request #40: Final response to user

The sub-agent has:

  • Different system prompt (read-only, specialized)
  • Restricted tools (no Edit, Write, Task)
  • Independent caching (builds its own cache)
  • Own conversation context (can't see parent's history)
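The restricted toolset can be sketched as a simple filter over the main agent's tool definitions. The blocked names follow the list above; the `TOOLS` array is a stand-in for the full definitions:

```javascript
// Sub-agents are read-only: strip out the tools that mutate state or
// would allow recursive sub-agent spawning.
const BLOCKED_FOR_SUBAGENT = new Set(['Edit', 'Write', 'Task']);

function subAgentTools(tools) {
  return tools.filter(t => !BLOCKED_FOR_SUBAGENT.has(t.name));
}

const TOOLS = [{ name: 'Bash' }, { name: 'Read' }, { name: 'Edit' }, { name: 'Task' }];
console.log(subAgentTools(TOOLS).map(t => t.name)); // [ 'Bash', 'Read' ]
```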

Cache Economics

Here's why the agent loop is economically viable:

Request                  New Tokens   Cached Tokens   Cache Rate
#2 (Main)                         3          24,055        99.9%
#15 (Continuation)            1,200          24,055        95.2%
#16 (Sub-agent start)        13,709               0           0%
#20 (Sub-agent cont.)           517          15,272        96.7%

The first request creates the cache. Nearly every subsequent request in the session hits a 95%+ cache rate (sub-agents start cold, as row #16 shows, then build their own). This means:

  • System prompt (13KB): Cached after first use
  • Tool definitions (17K tokens): Cached
  • Conversation history: Incrementally cached

Cost implication: A 20-turn conversation costs roughly the same as 2-3 turns without caching.
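The arithmetic behind that claim, assuming cache reads are billed at roughly 10% of the base input rate (Anthropic's published prompt-caching discount; exact per-model prices vary):

```javascript
// Effective cost of one request with prompt caching, in arbitrary rate
// units. The 0.1 multiplier assumes cache reads cost ~10% of normal
// input tokens.
function requestCost(newTokens, cachedTokens, rate) {
  return newTokens * rate + cachedTokens * rate * 0.1;
}

// Request #15 from the table: 1,200 new tokens, 24,055 served from cache
const withCache = requestCost(1200, 24055, 1);
const withoutCache = requestCost(1200 + 24055, 0, 1);
console.log((withCache / withoutCache).toFixed(2)); // pays ~14% of the uncached price
```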


Stop Reasons Explained

The stop_reason field controls the loop:

Stop Reason     Meaning                    Action
end_turn        Claude is done             Display response, end loop
tool_use        Claude needs tool output   Execute tool, continue loop
max_tokens      Hit token limit            Continue with new request
stop_sequence   Hit custom stop            Depends on configuration
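A client's dispatch on `stop_reason` can be sketched as a single switch following the table above (the action strings here are just labels for clarity):

```javascript
// Map a stop_reason to the next step in the loop.
function nextAction(stopReason) {
  switch (stopReason) {
    case 'end_turn':      return 'display-and-stop';
    case 'tool_use':      return 'execute-tools-and-continue';
    case 'max_tokens':    return 'continue-with-new-request';
    case 'stop_sequence': return 'handle-per-configuration';
    default:              return 'unknown';
  }
}

console.log(nextAction('tool_use')); // execute-tools-and-continue
```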

Building Your Own Loop

Here's a minimal agent loop implementation:

async function agentLoop(userMessage) {
  let messages = [{ role: 'user', content: userMessage }];

  while (true) {
    const response = await anthropic.messages.create({
      model: 'claude-opus-4-5-20251101',
      max_tokens: 16000,
      system: SYSTEM_PROMPT,
      tools: TOOL_DEFINITIONS,
      messages: messages
    });

    // Add assistant response to history
    messages.push({ role: 'assistant', content: response.content });

    // Check stop reason
    if (response.stop_reason === 'end_turn') {
      return response.content; // Done!
    }

    if (response.stop_reason === 'tool_use') {
      // Find tool calls
      const toolCalls = response.content.filter(c => c.type === 'tool_use');

      // Execute tools in parallel
      const results = await Promise.all(
        toolCalls.map(async (tool) => ({
          type: 'tool_result',
          tool_use_id: tool.id,
          content: await executeTool(tool.name, tool.input)
        }))
      );

      // Add results to history
      messages.push({ role: 'user', content: results });
    }

    // 'max_tokens' falls through: re-sending with the partial assistant
    // message last asks the model to continue where it stopped.
  }
}

Key Takeaways

  1. It's a loop, not a single call: One user message can trigger 20+ API requests
  2. Multi-model routing: Opus reasons, Haiku parses—invisible to users
  3. Caching is critical: 95%+ cache hit rates make this economically viable
  4. Stop reasons control flow: tool_use continues, end_turn stops
  5. Sub-agents are nested loops: Completely independent execution contexts
  6. Parallel execution: Token counting and tool execution happen concurrently

Try It Yourself

# Clone the proxy
git clone https://github.com/your-repo/claude-code-reversal
cd claude-code-reversal

# Start capturing
node tools/proxy.js

# In another terminal
ANTHROPIC_BASE_URL=http://localhost:8888 claude "explore this codebase"

# Analyze the captures
node tools/trace-conversation.js

Watch the agent loop in action. See every tool call, every continuation, every cache hit.

The magic isn't magic—it's just a well-designed loop.