The Claude Code Rabbit Hole
I wanted to understand how Claude Code actually works. What started as a simple question turned into a deep dive that revealed Anthropic's entire playbook for building AI agents.
Here's the rabbit hole. See how far you want to go.
Is This Legal?
Yes. Everything here uses documented, public features:
- ANTHROPIC_BASE_URL is a documented environment variable for enterprise proxy setups
- We're inspecting our own API traffic, like using browser DevTools on a website
- No binary patching, no decompilation, no bypassing security measures
- The system prompt and tool schemas are sent in plaintext to your machine on every request
- This is equivalent to reading HTTP headers: data that's already yours
Anthropic designed Claude Code to work with custom proxies. We're just... using that feature.
What we're NOT doing:
- Redistributing Anthropic's code
- Bypassing authentication or rate limits
- Accessing other users' data
- Violating the Terms of Service
This is educational exploration of a system's public interface. Let's go.
Level 1: The Entry Point
Claude Code respects an environment variable for custom API routing:
export ANTHROPIC_BASE_URL=http://localhost:8888
Point it at a simple proxy that logs requests:
const http = require('http');

http.createServer((req, res) => {
  let body = '';
  req.on('data', chunk => body += chunk);
  req.on('end', () => {
    console.log('REQUEST:', JSON.parse(body));
    // Forward to api.anthropic.com and relay the response
    // (full working proxy in "Try It Yourself" at the end)
  });
}).listen(8888);
Run Claude Code through it:
ANTHROPIC_BASE_URL=http://localhost:8888 claude "what is 2+2?"
First surprise: One simple question generates 14 API calls.
| # | Endpoint | Size / Purpose |
|---|---|---|
| 1 | /v1/messages | 2KB (Haiku warmup) |
| 2 | /v1/messages | 113KB (main request) |
| 3-12 | /v1/messages/count_tokens | Token counting |
| 13-14 | /api/event_logging/batch | Telemetry |
113KB for a single request? What's in there?
Rabbit hole opens.
Level 2: The System Prompt
Inside request #2, there's a system field. Extract it:
const systemPrompt = request.body.system.map(s => s.text).join('\n');
console.log(systemPrompt.length); // ~13,000 characters
13,000 characters of instructions. This is the complete behavioral specification for Claude Code.
Highlights:
Anti-Sycophancy (They Actually Wrote This Down)
Prioritize technical accuracy and truthfulness over validating
the user's beliefs. It is best for the user if Claude honestly
applies the same rigorous standards to all ideas and disagrees
when necessary, even if it may not be what the user wants to hear.
Avoid using over-the-top validation or excessive praise such as
"You're absolutely right" or similar phrases.
They're explicitly fighting AI yes-man syndrome. In the prompt. For every request.
Banned: Time Estimates
Never give time estimates or predictions for how long tasks will take.
Avoid phrases like "this will take me a few minutes" or "quick fix."
AI time estimates would be unreliable. So they removed the capability entirely.
The Task Management Mandate
Use these tools VERY frequently to ensure that you are tracking your
tasks and giving the user visibility into your progress.
It is critical that you mark todos as completed as soon as you are
done with a task. Do not batch up multiple tasks before marking
them as completed.
Wait—what tools?
Level 3: The 26 Tools
Same request, tools field. 26 complete tool definitions with JSON schemas:
{
  "name": "Bash",
  "description": "Execute a bash command...",
  "input_schema": {
    "type": "object",
    "properties": {
      "command": { "type": "string" },
      "timeout": { "type": "number" }
    },
    "required": ["command"]
  }
}
The full arsenal:
| Category | Tools |
|---|---|
| Files | Read, Write, Edit, Glob, Grep |
| Execution | Bash, Task, TaskOutput, TaskStop |
| Planning | EnterPlanMode, ExitPlanMode |
| Tasks | TodoWrite |
| Web | WebFetch, WebSearch |
| Interaction | AskUserQuestion, Skill |
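If you've captured request_2.json through the proxy, you can pull this table out of the traffic yourself. A minimal sketch (the helper function is mine, but the body shape matches what's on the wire):

```javascript
// Summarize tool definitions from a captured /v1/messages request body.
function summarizeTools(requestBody) {
  return (requestBody.tools || []).map(t => ({
    name: t.name,
    required: (t.input_schema && t.input_schema.required) || [],
  }));
}

// Shape mirrors the captured traffic (two tools shown for brevity):
const captured = {
  tools: [
    { name: 'Bash', input_schema: { type: 'object', required: ['command'] } },
    { name: 'Read', input_schema: { type: 'object', required: ['file_path'] } },
  ],
};
console.log(summarizeTools(captured));
```

Swap `captured` for `JSON.parse(fs.readFileSync('request_2.json'))` to run it against your own capture.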
Some of these are straightforward. But EnterPlanMode? Task with a subagent_type parameter?
Rabbit hole deepens.
Level 4: The Planning System
The EnterPlanMode tool description is 2,000+ characters. Key excerpt:
Use this tool proactively when you're about to start a non-trivial
implementation task. Getting user sign-off on your approach before
writing code prevents wasted effort and ensures alignment.
When to use it:
- New feature implementation
- Multiple valid approaches exist
- Architectural decisions required
- Changes touch 3+ files
- Requirements are unclear
The workflow:
1. Agent calls EnterPlanMode
2. Agent explores codebase (read-only)
3. Agent writes plan to file
4. Agent calls ExitPlanMode with required permissions:
{
  "allowedPrompts": [
    { "tool": "Bash", "prompt": "run tests" },
    { "tool": "Bash", "prompt": "install dependencies" }
  ]
}
5. User reviews and approves plan
6. Agent executes with pre-approved permissions only
This is approval-before-execution. The agent declares what it needs upfront. User signs off. Then it runs.
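A toy sketch of what approval-before-execution implies: once the user signs off on allowedPrompts, anything outside that list gets blocked. The matching logic here is my guess for illustration; the real client's enforcement rules aren't visible in the traffic:

```javascript
// Hypothetical permission gate: only tool calls covered by the approved
// plan's allowedPrompts run without asking the user again.
function isPreApproved(allowedPrompts, toolCall) {
  return allowedPrompts.some(p =>
    p.tool === toolCall.tool &&
    toolCall.description.toLowerCase().includes(p.prompt.toLowerCase())
  );
}

const approved = [
  { tool: 'Bash', prompt: 'run tests' },
  { tool: 'Bash', prompt: 'install dependencies' },
];

console.log(isPreApproved(approved, { tool: 'Bash', description: 'run tests with jest' })); // true
console.log(isPreApproved(approved, { tool: 'Bash', description: 'delete the repo' }));     // false
```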
But what's this Task tool with subagent_type?
Level 5: Multi-Agent Orchestration
The Task tool description is 4,000+ characters. It spawns sub-agents:
{
  "name": "Task",
  "input_schema": {
    "properties": {
      "subagent_type": { "type": "string" },
      "model": { "enum": ["sonnet", "opus", "haiku"] },
      "prompt": { "type": "string" },
      "run_in_background": { "type": "boolean" },
      "resume": { "type": "string" }
    }
  }
}
Built-in agent types:
| Agent | Purpose | Tools |
|---|---|---|
| Explore | Fast codebase search | Read-only |
| Plan | Architecture design | Read-only |
| Bash | Command execution | Bash only |
| general-purpose | Complex research | All |
| code-reviewer | Find bugs, security issues | Analysis |
| silent-failure-hunter | Find inadequate error handling | All |
| pr-test-analyzer | Review test coverage | All |
| code-architect | Design feature blueprints | Analysis |
Features:
- Model selection per agent — Use Haiku for fast searches, Opus for planning
- Background execution — Agents run async, check results later
- Agent resumption — Continue a previous agent with full context
- Parallel spawning — Multiple agents at once
Claude Code isn't one agent. It's an orchestrator that spawns specialists.
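To make that concrete, here's roughly what two parallel Task invocations look like as tool inputs. The field names come from the captured schema; the wrapper function and example prompts are mine:

```javascript
// Build a Task tool input matching the captured input_schema.
function taskInput(subagentType, model, prompt, background = false) {
  return {
    subagent_type: subagentType,
    model,                        // "sonnet" | "opus" | "haiku" per the enum
    prompt,
    run_in_background: background,
  };
}

// Two specialists spawned in parallel: a cheap searcher and a reviewer.
const batch = [
  taskInput('Explore', 'haiku', 'Find every call site of parseConfig()'),
  taskInput('code-reviewer', 'sonnet', 'Review the diff in src/auth/', true),
];
console.log(batch.map(t => `${t.subagent_type} on ${t.model}`));
```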
But wait—what's this code-reviewer doing automatically?
Level 6: Self-Validation
From the code-reviewer agent description:
Use this agent proactively after writing or modifying code,
especially before committing changes or creating pull requests.
And silent-failure-hunter:
Use this agent when reviewing code that involves error handling,
catch blocks, fallback logic, or any code that could potentially
suppress errors.
The pattern:
Write code
↓
Spawn code-reviewer agent
↓
Fix issues found
↓
Spawn silent-failure-hunter
↓
Fix error handling issues
↓
Present to user
Claude Code reviews its own work before showing you.
This explains a lot. But why are some requests going to different models?
Level 7: Multi-Model Routing
Watching the traffic, I noticed some requests go to claude-opus-4-5-20251101, others to claude-haiku.
Pattern identified:
After running a Bash command, the output goes to Haiku with:
System: Extract any file paths from this command output.
Format as: <filepaths>path1\npath2</filepaths>
| Model | Purpose | Cost |
|---|---|---|
| Opus | Reasoning, decisions | $15/1M tokens |
| Haiku | Parsing, extraction | $0.80/1M tokens |
~95% cost savings on parsing tasks.
Opus thinks. Haiku extracts. Then Opus continues with structured data.
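The routing decision itself is simple to sketch. This is my reconstruction of the observed pattern, not Anthropic's actual code, and the task names and model strings are illustrative: mechanical extraction goes to the cheap model, everything else to the expensive one.

```javascript
// Route a step to a model tier based on whether it's mechanical extraction
// or open-ended reasoning (pattern reconstructed from observed traffic).
const EXTRACTION_TASKS = new Set(['extract_file_paths', 'parse_output', 'summarize_diff']);

function pickModel(task) {
  return EXTRACTION_TASKS.has(task) ? 'claude-haiku' : 'claude-opus';
}

console.log(pickModel('extract_file_paths')); // claude-haiku (~$0.80/1M tokens)
console.log(pickModel('design_refactor'));    // claude-opus  (~$15/1M tokens)
```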
But that 113KB request—surely that costs a fortune?
Level 8: The Caching Secret
Token usage from a captured response:
{
"input_tokens": 1,
"cache_read_input_tokens": 23604
}
1 new token. 23,604 cached.
That's a 99.99% cache hit rate.
How:
{
  "system": [
    {
      "type": "text",
      "text": "[13KB system prompt]",
      "cache_control": { "type": "ephemeral" }
    }
  ]
}
The system prompt and tool definitions are sent every request but cached server-side. Only new conversation content costs tokens.
The pattern: Static content first (cacheable). Variable content last (new tokens).
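The ordering rule falls out directly when you build the request body: everything static goes first and carries cache_control, the fresh turn goes last. A sketch (the helper and model string are mine; the field shapes match the captured request):

```javascript
// Assemble a /v1/messages body so the static prefix is cacheable.
function buildRequest(systemPrompt, tools, messages) {
  return {
    model: 'claude-opus',
    system: [
      // Static and identical on every request -> the server can cache it.
      { type: 'text', text: systemPrompt, cache_control: { type: 'ephemeral' } },
    ],
    tools,    // also static across requests
    messages, // only this part changes turn to turn
  };
}

const body = buildRequest('[13KB system prompt]', [], [
  { role: 'user', content: 'what is 2+2?' },
]);
console.log(body.system[0].cache_control.type); // ephemeral
```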
Level 9: Human-in-the-Loop
The AskUserQuestion tool isn't free-form. It's structured:
{
  "questions": [{
    "question": "Which auth method should we use?",
    "header": "Auth",
    "options": [
      { "label": "JWT", "description": "Stateless, good for APIs" },
      { "label": "Sessions", "description": "Traditional, server-side state" },
      { "label": "OAuth", "description": "Third-party auth delegation" }
    ],
    "multiSelect": false
  }]
}
Constraints:
- 1-4 questions at a time
- 2-4 options per question
- Each option has label + description
- Automatic "Other" option for custom input
This is structured human oversight. Not "what do you think?" but "pick from these analyzed options."
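The constraints are easy to encode. Here's a small validator I wrote against them; the limits come from the tool description, the function itself is mine:

```javascript
// Validate an AskUserQuestion payload against the documented constraints:
// 1-4 questions, 2-4 options each, every option labelled and described.
function validateQuestions(payload) {
  const qs = payload.questions || [];
  if (qs.length < 1 || qs.length > 4) return false;
  return qs.every(q =>
    q.options.length >= 2 && q.options.length <= 4 &&
    q.options.every(o => o.label && o.description)
  );
}

const ok = validateQuestions({
  questions: [{
    question: 'Which auth method should we use?',
    header: 'Auth',
    options: [
      { label: 'JWT', description: 'Stateless, good for APIs' },
      { label: 'Sessions', description: 'Traditional, server-side state' },
    ],
    multiSelect: false,
  }],
});
console.log(ok); // true
```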
Level 10: The Task Tracking System
The TodoWrite tool maintains a live task list:
{
"todos": [
{ "content": "Research codebase", "status": "completed", "activeForm": "Researching codebase" },
{ "content": "Implement feature", "status": "in_progress", "activeForm": "Implementing feature" },
{ "content": "Run tests", "status": "pending", "activeForm": "Running tests" }
]
}
Rules from the prompt:
Exactly ONE task must be in_progress at any time (not less, not more).
ONLY mark a task as completed when you have FULLY accomplished it.
If you encounter errors, blockers, or cannot finish, keep the task
as in_progress.
Two forms per task:
- content: "Run tests" (what appears in the list)
- activeForm: "Running tests" (shown in the spinner)
This is progress visibility. User always knows current state.
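The "exactly ONE in_progress" rule is a checkable invariant. A sketch (the checker is mine; the todo shape matches the captured payload):

```javascript
// Check the TodoWrite invariant: exactly one task in_progress at a time.
function checkTodos(todos) {
  return todos.filter(t => t.status === 'in_progress').length === 1;
}

const todos = [
  { content: 'Research codebase', status: 'completed', activeForm: 'Researching codebase' },
  { content: 'Implement feature', status: 'in_progress', activeForm: 'Implementing feature' },
  { content: 'Run tests', status: 'pending', activeForm: 'Running tests' },
];
console.log(checkTodos(todos)); // true
```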
The Complete Picture
┌────────────────────────────────────────────────────────────────┐
│ CLAUDE CODE ARCHITECTURE │
├────────────────────────────────────────────────────────────────┤
│ │
│ PLANNING API LAYER ORCHESTRATION │
│ ┌──────────┐ ┌──────────────────┐ ┌────────────────┐ │
│ │ Enter │ │ Opus (reason) │ │ Sub-Agents │ │
│ │ Plan │────▶│ Haiku (parse) │──▶│ - Explore │ │
│ │ Mode │ │ 99% cached │ │ - Review │ │
│ │ │ └──────────────────┘ │ - Architect │ │
│ │ Exit │ │ └────────────────┘ │
│ │ Plan │ ▼ │
│ │ Mode │ ┌──────────────────┐ ┌────────────────┐ │
│ └──────────┘ │ Local Client │ │ Task Tracking │ │
│ │ - Bash │ │ - pending │ │
│ HUMAN APPROVAL │ - Read/Write │ │ - in_progress │ │
│ ┌──────────┐ │ - Edit │ │ - completed │ │
│ │ Ask │◀────│ - Glob/Grep │ └────────────────┘ │
│ │ User │ └──────────────────┘ │
│ │ Question │ │
│ └──────────┘ VALIDATION │
│ ┌──────────────────┐ │
│ │ Code Reviewer │ │
│ │ Failure Hunter │ │
│ │ Test Analyzer │ │
│ └──────────────────┘ │
└────────────────────────────────────────────────────────────────┘
What I Learned
This isn't just a coding assistant. It's a reference architecture for production AI agents:
1. Plan Before Execute
Enter planning mode. Explore. Design. Get approval. Then run.
2. Structured Human Oversight
Don't ask open questions. Present 2-4 analyzed options. Let humans pick.
3. One Task at a Time
Track everything. Work on one. Mark complete immediately.
4. Spawn Specialists
Delegate to purpose-built sub-agents with appropriate tools and models.
5. Validate Your Own Work
Review code. Hunt failures. Check coverage. Before the human sees it.
6. Right-Size Models
Opus for thinking. Haiku for parsing. Match capability to task.
7. Cache Aggressively
Static content first. 99% cache hits are achievable.
8. Be Honest, Not Agreeable
Disagree when necessary. No fake time estimates. Truth over validation.
Try It Yourself
# Create a simple logging proxy
cat > proxy.js << 'EOF'
const http = require('http');
const https = require('https');
const fs = require('fs');

let n = 0;
http.createServer((req, res) => {
  let body = '';
  req.on('data', c => body += c);
  req.on('end', () => {
    fs.writeFileSync(`request_${++n}.json`, body);
    const proxy = https.request({
      hostname: 'api.anthropic.com',
      path: req.url,
      method: req.method,
      headers: { ...req.headers, host: 'api.anthropic.com' }
    }, pRes => {
      res.writeHead(pRes.statusCode, pRes.headers);
      pRes.pipe(res);
    });
    proxy.write(body);
    proxy.end();
  });
}).listen(8888, () => console.log('Proxy on :8888'));
EOF
node proxy.js &
# Run Claude Code through it
ANTHROPIC_BASE_URL=http://localhost:8888 claude "hello"
# Inspect what was captured
cat request_2.json | jq '.system[0].text' | head -100
cat request_2.json | jq '.tools[].name'
How Deep Did You Go?
- Level 1-2: You understand the entry point and system prompt
- Level 3-4: You see the tool architecture and planning system
- Level 5-6: You grasp multi-agent orchestration and self-validation
- Level 7-8: You understand the economics (multi-model, caching)
- Level 9-10: You see the full human-in-the-loop and task tracking system
The rabbit hole goes as deep as you want.
Everything above is in plaintext, in your own API traffic, using documented features.
Anthropic built a production agent architecture. They just didn't write a blog post about it.
Now you have the map.
This exploration uses only documented features and inspects the author's own API traffic. No terms of service were violated. For educational purposes.