The Claude Code Rabbit Hole
I wanted to understand how Claude Code actually works. What started as a simple question turned into a deep dive that revealed Anthropic's entire playbook for building AI agents.
Here's the rabbit hole. See how far you want to go.
Is This Legal?
Yes. Everything here uses documented, public features:
- ANTHROPIC_BASE_URL is a documented environment variable for enterprise proxy setups
- We're inspecting our own API traffic, like using browser DevTools on a website
- No binary patching, no decompilation, no bypassing security measures
- The system prompt and tool schemas are sent in plaintext to your machine on every request
- This is equivalent to reading HTTP headers: data that's already yours
Anthropic designed Claude Code to work with custom proxies. We're just... using that feature.
What we're NOT doing:
- Redistributing Anthropic's code
- Bypassing authentication or rate limits
- Accessing other users' data
- Violating the Terms of Service
This is educational exploration of a system's public interface. Let's go.
Level 1: The Entry Point
Claude Code respects an environment variable for custom API routing:
export ANTHROPIC_BASE_URL=http://localhost:8888
Point it at a simple proxy that logs requests:
const http = require('http');

http.createServer((req, res) => {
  let body = '';
  req.on('data', chunk => body += chunk);
  req.on('end', () => {
    console.log('REQUEST:', JSON.parse(body));
    // Forward to api.anthropic.com and relay the response
    // (full working proxy in "Try It Yourself" at the end)
  });
}).listen(8888);
Run Claude Code through it:
ANTHROPIC_BASE_URL=http://localhost:8888 claude "what is 2+2?"
First surprise: One simple question generates 14 API calls.
| # | Endpoint | Size / Purpose |
|---|---|---|
| 1 | /v1/messages | 2KB (Haiku warmup) |
| 2 | /v1/messages | 113KB (main request) |
| 3-12 | /v1/messages/count_tokens | Token counting |
| 13-14 | /api/event_logging/batch | Telemetry |
113KB for a single request? What's in there?
Rabbit hole opens.
Level 2: The System Prompt
Inside request #2, there's a system field. Extract it:
const systemPrompt = request.body.system.map(s => s.text).join('\n');
console.log(systemPrompt.length); // ~13,000 characters
13,000 characters of instructions. This is the complete behavioral specification for Claude Code.
Highlights:
Anti-Sycophancy (They Actually Wrote This Down)
Prioritize technical accuracy and truthfulness over validating
the user's beliefs. It is best for the user if Claude honestly
applies the same rigorous standards to all ideas and disagrees
when necessary, even if it may not be what the user wants to hear.
Avoid using over-the-top validation or excessive praise such as
"You're absolutely right" or similar phrases.
They're explicitly fighting AI yes-man syndrome. In the prompt. For every request.
Banned: Time Estimates
Never give time estimates or predictions for how long tasks will take.
Avoid phrases like "this will take me a few minutes" or "quick fix."
AI time estimates would be unreliable. So they removed the capability entirely.
The Task Management Mandate
Use these tools VERY frequently to ensure that you are tracking your
tasks and giving the user visibility into your progress.
It is critical that you mark todos as completed as soon as you are
done with a task. Do not batch up multiple tasks before marking
them as completed.
Wait—what tools?
Level 3: The 26 Tools
Same request, tools field. 26 complete tool definitions with JSON schemas:
{
  "name": "Bash",
  "description": "Execute a bash command...",
  "input_schema": {
    "type": "object",
    "properties": {
      "command": { "type": "string" },
      "timeout": { "type": "number" }
    },
    "required": ["command"]
  }
}
The full arsenal:
| Category | Tools |
|---|---|
| Files | Read, Write, Edit, Glob, Grep |
| Execution | Bash, Task, TaskOutput, TaskStop |
| Planning | EnterPlanMode, ExitPlanMode |
| Tasks | TodoWrite |
| Web | WebFetch, WebSearch |
| Interaction | AskUserQuestion, Skill |
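If you've captured request_2.json through the proxy, you can pull this table out of the traffic yourself. A minimal sketch (the helper function is mine, but the body shape matches what's on the wire):

```javascript
// Summarize tool definitions from a captured /v1/messages request body.
function summarizeTools(requestBody) {
  return (requestBody.tools || []).map(t => ({
    name: t.name,
    required: (t.input_schema && t.input_schema.required) || [],
  }));
}

// Shape mirrors the captured traffic (two tools shown for brevity):
const captured = {
  tools: [
    { name: 'Bash', input_schema: { type: 'object', required: ['command'] } },
    { name: 'Read', input_schema: { type: 'object', required: ['file_path'] } },
  ],
};
console.log(summarizeTools(captured));
```

Swap `captured` for `JSON.parse(fs.readFileSync('request_2.json'))` to run it against your own capture.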
Some of these are straightforward. But EnterPlanMode? Task with a subagent_type parameter?
Rabbit hole deepens.
Level 4: The Planning System
The EnterPlanMode tool description is 2,000+ characters. Key excerpt:
Use this tool proactively when you're about to start a non-trivial
implementation task. Getting user sign-off on your approach before
writing code prevents wasted effort and ensures alignment.
When to use it:
- New feature implementation
- Multiple valid approaches exist
- Architectural decisions required
- Changes touch 3+ files
- Requirements are unclear
The workflow:
1. Agent calls EnterPlanMode
2. Agent explores codebase (read-only)
3. Agent writes plan to file
4. Agent calls ExitPlanMode with required permissions:
{
  "allowedPrompts": [
    { "tool": "Bash", "prompt": "run tests" },
    { "tool": "Bash", "prompt": "install dependencies" }
  ]
}
5. User reviews and approves plan
6. Agent executes with pre-approved permissions only
This is approval-before-execution. The agent declares what it needs upfront. User signs off. Then it runs.
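A toy sketch of what approval-before-execution implies: once the user signs off on allowedPrompts, anything outside that list gets blocked. The matching logic here is my guess for illustration; the real client's enforcement rules aren't visible in the traffic:

```javascript
// Hypothetical permission gate: only tool calls covered by the approved
// plan's allowedPrompts run without asking the user again.
function isPreApproved(allowedPrompts, toolCall) {
  return allowedPrompts.some(p =>
    p.tool === toolCall.tool &&
    toolCall.description.toLowerCase().includes(p.prompt.toLowerCase())
  );
}

const approved = [
  { tool: 'Bash', prompt: 'run tests' },
  { tool: 'Bash', prompt: 'install dependencies' },
];

console.log(isPreApproved(approved, { tool: 'Bash', description: 'run tests with jest' })); // true
console.log(isPreApproved(approved, { tool: 'Bash', description: 'delete the repo' }));     // false
```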
But what's this Task tool with subagent_type?
Level 5: Multi-Agent Orchestration
The Task tool description is 4,000+ characters. It spawns sub-agents:
{
  "name": "Task",
  "input_schema": {
    "properties": {
      "subagent_type": { "type": "string" },
      "model": { "enum": ["sonnet", "opus", "haiku"] },
      "prompt": { "type": "string" },
      "run_in_background": { "type": "boolean" },
      "resume": { "type": "string" }
    }
  }
}
Built-in agent types:
| Agent | Purpose | Tools |
|---|---|---|
| Explore | Fast codebase search | Read-only |
| Plan | Architecture design | Read-only |
| Bash | Command execution | Bash only |
| general-purpose | Complex research | All |
| code-reviewer | Find bugs, security issues | Analysis |
| silent-failure-hunter | Find inadequate error handling | All |
| pr-test-analyzer | Review test coverage | All |
| code-architect | Design feature blueprints | Analysis |
Features:
- Model selection per agent — Use Haiku for fast searches, Opus for planning
- Background execution — Agents run async, check results later
- Agent resumption — Continue a previous agent with full context
- Parallel spawning — Multiple agents at once
Claude Code isn't one agent. It's an orchestrator that spawns specialists.
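To make that concrete, here's roughly what two parallel Task invocations look like as tool inputs. The field names come from the captured schema; the wrapper function and example prompts are mine:

```javascript
// Build a Task tool input matching the captured input_schema.
function taskInput(subagentType, model, prompt, background = false) {
  return {
    subagent_type: subagentType,
    model,                        // "sonnet" | "opus" | "haiku" per the enum
    prompt,
    run_in_background: background,
  };
}

// Two specialists spawned in parallel: a cheap searcher and a reviewer.
const batch = [
  taskInput('Explore', 'haiku', 'Find every call site of parseConfig()'),
  taskInput('code-reviewer', 'sonnet', 'Review the diff in src/auth/', true),
];
console.log(batch.map(t => `${t.subagent_type} on ${t.model}`));
```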
But wait—what's this code-reviewer doing automatically?
Level 6: Self-Validation
From the code-reviewer agent description:
Use this agent proactively after writing or modifying code,
especially before committing changes or creating pull requests.
And silent-failure-hunter:
Use this agent when reviewing code that involves error handling,
catch blocks, fallback logic, or any code that could potentially
suppress errors.
The pattern:
Write code
↓
Spawn code-reviewer agent
↓
Fix issues found
↓
Spawn silent-failure-hunter
↓
Fix error handling issues
↓
Present to user
Claude Code reviews its own work before showing you.
This explains a lot. But why are some requests going to different models?
Level 7: Multi-Model Routing
Watching the traffic, I noticed some requests go to claude-opus-4-5-20251101, others to claude-haiku.
Pattern identified:
After running a Bash command, the output goes to Haiku with:
System: Extract any file paths from this command output.
Format as: <filepaths>path1\npath2</filepaths>
| Model | Purpose | Cost |
|---|---|---|
| Opus | Reasoning, decisions | $15/1M tokens |
| Haiku | Parsing, extraction | $0.80/1M tokens |
~95% cost savings on parsing tasks.
Opus thinks. Haiku extracts. Then Opus continues with structured data.
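The routing decision itself is simple to sketch. This is my reconstruction of the observed pattern, not Anthropic's actual code, and the task names and model strings are illustrative: mechanical extraction goes to the cheap model, everything else to the expensive one.

```javascript
// Route a step to a model tier based on whether it's mechanical extraction
// or open-ended reasoning (pattern reconstructed from observed traffic).
const EXTRACTION_TASKS = new Set(['extract_file_paths', 'parse_output', 'summarize_diff']);

function pickModel(task) {
  return EXTRACTION_TASKS.has(task) ? 'claude-haiku' : 'claude-opus';
}

console.log(pickModel('extract_file_paths')); // claude-haiku (~$0.80/1M tokens)
console.log(pickModel('design_refactor'));    // claude-opus  (~$15/1M tokens)
```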
But that 113KB request—surely that costs a fortune?
Level 8: The Caching Secret
Token usage from a captured response:
{
"input_tokens": 1,
"cache_read_input_tokens": 23604
}
1 new token. 23,604 cached.
That's a 99.99% cache hit rate.
How:
{
  "system": [
    {
      "type": "text",
      "text": "[13KB system prompt]",
      "cache_control": { "type": "ephemeral" }
    }
  ]
}
The system prompt and tool definitions are sent every request but cached server-side. Only new conversation content costs tokens.
The pattern: Static content first (cacheable). Variable content last (new tokens).
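The ordering rule falls out directly when you build the request body: everything static goes first and carries cache_control, the fresh turn goes last. A sketch (the helper and model string are mine; the field shapes match the captured request):

```javascript
// Assemble a /v1/messages body so the static prefix is cacheable.
function buildRequest(systemPrompt, tools, messages) {
  return {
    model: 'claude-opus',
    system: [
      // Static and identical on every request -> the server can cache it.
      { type: 'text', text: systemPrompt, cache_control: { type: 'ephemeral' } },
    ],
    tools,    // also static across requests
    messages, // only this part changes turn to turn
  };
}

const body = buildRequest('[13KB system prompt]', [], [
  { role: 'user', content: 'what is 2+2?' },
]);
console.log(body.system[0].cache_control.type); // ephemeral
```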
Level 9: Human-in-the-Loop
The AskUserQuestion tool isn't free-form. It's structured:
{
  "questions": [{
    "question": "Which auth method should we use?",
    "header": "Auth",
    "options": [
      { "label": "JWT", "description": "Stateless, good for APIs" },
      { "label": "Sessions", "description": "Traditional, server-side state" },
      { "label": "OAuth", "description": "Third-party auth delegation" }
    ],
    "multiSelect": false
  }]
}
Constraints:
- 1-4 questions at a time
- 2-4 options per question
- Each option has label + description
- Automatic "Other" option for custom input
This is structured human oversight. Not "what do you think?" but "pick from these analyzed options."
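The constraints are easy to encode. Here's a small validator I wrote against them; the limits come from the tool description, the function itself is mine:

```javascript
// Validate an AskUserQuestion payload against the documented constraints:
// 1-4 questions, 2-4 options each, every option labelled and described.
function validateQuestions(payload) {
  const qs = payload.questions || [];
  if (qs.length < 1 || qs.length > 4) return false;
  return qs.every(q =>
    q.options.length >= 2 && q.options.length <= 4 &&
    q.options.every(o => o.label && o.description)
  );
}

const ok = validateQuestions({
  questions: [{
    question: 'Which auth method should we use?',
    header: 'Auth',
    options: [
      { label: 'JWT', description: 'Stateless, good for APIs' },
      { label: 'Sessions', description: 'Traditional, server-side state' },
    ],
    multiSelect: false,
  }],
});
console.log(ok); // true
```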
Level 10: The Task Tracking System
The TodoWrite tool maintains a live task list:
{
"todos": [
{ "content": "Research codebase", "status": "completed", "activeForm": "Researching codebase" },
{ "content": "Implement feature", "status": "in_progress", "activeForm": "Implementing feature" },
{ "content": "Run tests", "status": "pending", "activeForm": "Running tests" }
]
}
Rules from the prompt:
Exactly ONE task must be in_progress at any time (not less, not more).
ONLY mark a task as completed when you have FULLY accomplished it.
If you encounter errors, blockers, or cannot finish, keep the task
as in_progress.
Two forms per task:
- content: "Run tests" (what appears in the list)
- activeForm: "Running tests" (shown in the spinner)
This is progress visibility. User always knows current state.
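The "exactly ONE in_progress" rule is a checkable invariant. A sketch (the checker is mine; the todo shape matches the captured payload):

```javascript
// Check the TodoWrite invariant: exactly one task in_progress at a time.
function checkTodos(todos) {
  return todos.filter(t => t.status === 'in_progress').length === 1;
}

const todos = [
  { content: 'Research codebase', status: 'completed', activeForm: 'Researching codebase' },
  { content: 'Implement feature', status: 'in_progress', activeForm: 'Implementing feature' },
  { content: 'Run tests', status: 'pending', activeForm: 'Running tests' },
];
console.log(checkTodos(todos)); // true
```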
The Complete Picture
┌────────────────────────────────────────────────────────────────┐
│ CLAUDE CODE ARCHITECTURE │
├────────────────────────────────────────────────────────────────┤
│ │
│ PLANNING API LAYER ORCHESTRATION │
│ ┌──────────┐ ┌──────────────────┐ ┌────────────────┐ │
│ │ Enter │ │ Opus (reason) │ │ Sub-Agents │ │
│ │ Plan │────▶│ Haiku (parse) │──▶│ - Explore │ │
│ │ Mode │ │ 99% cached │ │ - Review │ │
│ │ │ └──────────────────┘ │ - Architect │ │
│ │ Exit │ │ └────────────────┘ │
│ │ Plan │ ▼ │
│ │ Mode │ ┌──────────────────┐ ┌────────────────┐ │
│ └──────────┘ │ Local Client │ │ Task Tracking │ │
│ │ - Bash │ │ - pending │ │
│ HUMAN APPROVAL │ - Read/Write │ │ - in_progress │ │
│ ┌──────────┐ │ - Edit │ │ - completed │ │
│ │ Ask │◀────│ - Glob/Grep │ └────────────────┘ │
│ │ User │ └──────────────────┘ │
│ │ Question │ │
│ └──────────┘ VALIDATION │
│ ┌──────────────────┐ │
│ │ Code Reviewer │ │
│ │ Failure Hunter │ │
│ │ Test Analyzer │ │
│ └──────────────────┘ │
└────────────────────────────────────────────────────────────────┘
What I Learned
This isn't just a coding assistant. It's a reference architecture for production AI agents:
1. Plan Before Execute
Enter planning mode. Explore. Design. Get approval. Then run.
2. Structured Human Oversight
Don't ask open questions. Present 2-4 analyzed options. Let humans pick.
3. One Task at a Time
Track everything. Work on one. Mark complete immediately.
4. Spawn Specialists
Delegate to purpose-built sub-agents with appropriate tools and models.
5. Validate Your Own Work
Review code. Hunt failures. Check coverage. Before the human sees it.
6. Right-Size Models
Opus for thinking. Haiku for parsing. Match capability to task.
7. Cache Aggressively
Static content first. 99% cache hits are achievable.
8. Be Honest, Not Agreeable
Disagree when necessary. No fake time estimates. Truth over validation.
Try It Yourself
# Create a simple logging proxy
cat > proxy.js << 'EOF'
const http = require('http');
const https = require('https');
const fs = require('fs');

let n = 0;
http.createServer((req, res) => {
  let body = '';
  req.on('data', c => body += c);
  req.on('end', () => {
    fs.writeFileSync(`request_${++n}.json`, body);
    const proxy = https.request({
      hostname: 'api.anthropic.com',
      path: req.url,
      method: req.method,
      headers: { ...req.headers, host: 'api.anthropic.com' }
    }, pRes => {
      res.writeHead(pRes.statusCode, pRes.headers);
      pRes.pipe(res);
    });
    proxy.write(body);
    proxy.end();
  });
}).listen(8888, () => console.log('Proxy on :8888'));
EOF
node proxy.js &
# Run Claude Code through it
ANTHROPIC_BASE_URL=http://localhost:8888 claude "hello"
# Inspect what was captured
cat request_2.json | jq '.system[0].text' | head -100
cat request_2.json | jq '.tools[].name'
How Deep Did You Go?
- Level 1-2: You understand the entry point and system prompt
- Level 3-4: You see the tool architecture and planning system
- Level 5-6: You grasp multi-agent orchestration and self-validation
- Level 7-8: You understand the economics (multi-model, caching)
- Level 9-10: You see the full human-in-the-loop and task tracking system
The rabbit hole goes as deep as you want.
Everything above is in plaintext, in your own API traffic, using documented features.
Anthropic built a production agent architecture. They just didn't write a blog post about it.
Now you have the map.
This exploration uses only documented features and inspects the author's own API traffic. No terms of service were violated. For educational purposes.