Inside Agentic Coding
Reproducing Claude Code's core flow in 920 lines
Something You Use Every Day
$ claude "fix this bug"
Claude read 5 files
Claude edited 2 files
Claude ran the tests
Claude confirmed tests pass
✓ Bug fixed
What actually happens in between?
Goal for Today
Walk through the core flow of an agentic coding tool
and examine a minimal working implementation
Starting from running code, not slides full of abstractions.
Pocket Code
Pocket Code — deepseek-chat @ https://api.deepseek.com
Type /help for commands, /exit to quit.
> check bug.py and fix it
[thinking] Let me read the file first...
[tool] read_file({"path":"bug.py"})
[result] def add(a, b): return a - b # bug here
[thinking] Found it — subtraction instead of addition...
[tool] edit_file({"path":"bug.py","old_string":"a - b","new_string":"a + b"})
[confirm] edit_file: bug.py (replace 5 chars) (y/n) y
[result] OK
[thinking] Fixed. Let me verify...
[tool] run_command({"command":"python bug.py"})
[confirm] run_command: python bug.py (y/n) y
[result] {"exitCode":0,"output":"3\n","timedOut":false}
[answer] Bug fixed. add(1, 2) now correctly returns 3.
⌨️
Switch to terminal — live run
Have Pocket Code ready with a buggy file
Part 2
What Just Happened
The Agentic Loop
while (true)
  messages[] → LLM API  chatCompletion()
      ↓
  tool_calls?   No → final answer, loop ends
      ↓ Yes
  Confirm?      permissions.ts
      ↓
  Execute tool  tools.ts
      ↓
  Result → messages[]
      ↻
Message Flow
// Round 1
→ { role: "user", content: "fix bug.py" }
← { role: "assistant", tool_calls: [read_file("bug.py")] }
→ { role: "tool", content: "def add(a,b): return a-b" }
// Round 2
← { role: "assistant", tool_calls: [edit_file(...)] }
→ { role: "tool", content: "OK" }
// Round 3
← { role: "assistant", tool_calls: [run_command(...)] }
→ { role: "tool", content: '{"exitCode":0,...}' }
// Round 4 — no tool_calls, loop ends
← { role: "assistant", content: "Bug fixed." }
Chatbot vs Agent
Chatbot
- User says → LLM replies
- Single turn
- Text generation only
- No interaction with the outside
Agent
- User says → LLM decides what to do
- Loops until done
- Can call tools
- Read files, write code, run commands
At the API level, the difference is the tools parameter + a loop.
Real products, of course, involve far more than that.
Part 3
Layer by Layer
7 files, 920 lines, outside-in
Entry — index.ts
140 lines
- Parse CLI args (--model, --base-url)
- Load POCKET.md as system prompt
- Initialize MCP servers
- Start the REPL, handle slash commands
Analogy to Claude Code: POCKET.md is CLAUDE.md
index.ts — REPL
initReadline();
while (true) {
  const input = await question("> ");
  const trimmed = input.trim();
  if (!trimmed) continue;

  // Slash commands: /model, /clear, /help, /exit
  if (trimmed.startsWith("/")) {
    handleSlashCommand(trimmed, agent, config);
    continue;
  }

  // Regular input → hand off to the Agent
  await agent.run(trimmed);
}
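The slash-command dispatch itself is simple. A sketch of what `handleSlashCommand` might look like (the `Agent`/`Config` shapes here are assumptions; index.ts's real types aren't shown on the slide):

```typescript
// Hypothetical sketch of handleSlashCommand for the REPL above.
type Config = { model: string };
type Agent = { messages: unknown[] };

function handleSlashCommand(cmd: string, agent: Agent, config: Config): void {
  const [name, ...rest] = cmd.slice(1).split(" ");
  switch (name) {
    case "model":                         // /model deepseek-chat → switch models
      if (rest[0]) config.model = rest[0];
      console.log(`model: ${config.model}`);
      break;
    case "clear":                         // /clear → reset conversation history
      agent.messages.length = 0;
      break;
    case "help":
      console.log("commands: /model /clear /help /exit");
      break;
    case "exit":
      process.exit(0);
    default:
      console.log(`Unknown command: /${name}`);
  }
}
```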
Core — agent.ts
123 lines
The key file
The agent loop lives here
agent.ts — the core loop (~30 lines)
async run(userInput: string) {
  this.messages.push({ role: "user", content: userInput });

  while (true) {
    // 1. Call the LLM
    const reply = await chatCompletion(
      this.config, this.messages, this.tools
    );

    // 2. No tool_calls → final answer, done
    if (!reply.tool_calls?.length) {
      printAnswer(reply.content);
      this.messages.push(reply);
      return;
    }

    // 3. Has tool_calls → execute each one
    printThinking(reply.content);                 // [thinking]
    this.messages.push(reply);

    for (const tc of reply.tool_calls) {
      const name = tc.function.name;
      const args = JSON.parse(tc.function.arguments);
      printToolCall(name, args);                  // [tool]

      let result = "Denied by user.";
      if (!needsConfirmation.has(name) || await confirm(name, args)) { // [confirm]
        result = await executeTool(name, args);
      }
      printResult(result);                        // [result]

      // Every tool_call_id needs a matching tool message, even when denied
      this.messages.push({
        role: "tool", tool_call_id: tc.id, content: result
      });
    }
    // ↻ Back to while(true) — LLM sees tool results, decides next step
  }
}
Why while(true)?
How many rounds does it take to fix a bug?
Round 1: read_file → read the code
Round 2: edit_file → apply the fix
Round 3: run_command → run tests
Round 4: tests fail → error fed back to LLM
Round 5: edit_file → try again
Round 6: run_command → tests pass
Round 7: final answer
The LLM decides how many rounds it needs.
You don't write if/else to orchestrate steps — the LLM is the orchestrator.
LLM Wrapper — llm.ts
138 lines
- OpenAI-compatible format — one POST request
- Send messages + tools, receive message
- Retry on 429, 120s timeout
- Switch models by changing the base URL
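Those bullets fit in one function. A sketch under stated assumptions: `chatCompletion`'s real signature isn't shown on the slides, and the backoff schedule, attempt count, and response handling here are guesses, not the actual llm.ts:

```typescript
// Sketch of llm.ts: one POST to an OpenAI-compatible endpoint,
// with retry on 429 and a 120s timeout. Names are assumptions.
type Message = { role: string; content?: string | null; tool_calls?: unknown[] };
type Config = { baseUrl: string; apiKey: string; model: string };

function buildRequestBody(model: string, messages: Message[], tools: unknown[]): string {
  return JSON.stringify({ model, messages, tools });
}

async function chatCompletion(
  config: Config, messages: Message[], tools: unknown[],
): Promise<Message> {
  for (let attempt = 0; attempt < 3; attempt++) {
    const res = await fetch(`${config.baseUrl}/chat/completions`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${config.apiKey}`,
      },
      body: buildRequestBody(config.model, messages, tools),
      signal: AbortSignal.timeout(120_000), // 120s timeout (Node 18+)
    });
    if (res.status === 429) {               // rate limited: back off and retry
      await new Promise((r) => setTimeout(r, 2 ** attempt * 1000));
      continue;
    }
    if (!res.ok) throw new Error(`LLM API error ${res.status}`);
    const data = (await res.json()) as any;
    return data.choices[0].message as Message; // OpenAI-compatible response shape
  }
  throw new Error("LLM API error 429: retries exhausted");
}
```

Because the format is OpenAI-compatible, switching models really is just a different `baseUrl` and `model` string.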
llm.ts — the request body
{
  "model": "deepseek-chat",
  "messages": [
    { "role": "system", "content": "You are a coding assistant..." },
    { "role": "user", "content": "fix bug.py" },
    { "role": "assistant", "tool_calls": [...] },
    { "role": "tool", "content": "def add(a,b): return a-b" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "read_file",
        "description": "Read file contents (max 2000 lines)",
        "parameters": {
          "type": "object",
          "properties": { "path": { "type": "string" } },
          "required": ["path"]
        }
      }
    }
  ]
}
This is function calling.
The LLM doesn't execute tools — it tells you what to call. You execute.
Tool System — tools.ts
249 lines
Largest file
- read_file (Auto)
- list_dir (Auto)
- search_files (Auto)
- ask_user (Auto)
- write_file (Confirm)
- edit_file (Confirm)
- run_command (Confirm)
Basic rule: read-only operations run automatically, writes and commands need human confirmation.
Claude Code's permission model is more granular, but the starting point is the same.
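That Auto/Confirm split boils down to a single set, the `needsConfirmation` referenced in agent.ts (the variable name comes from the slides; where it's defined and its exact membership are assumptions):

```typescript
// Tools whose effects reach outside the process need a y/n prompt;
// read-only tools run automatically.
export const needsConfirmation = new Set([
  "write_file",
  "edit_file",
  "run_command",
]);
```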
tools.ts — definitions + execution
// 1. JSON Schema tells the LLM what tools are available
export const toolDefs: ToolDef[] = [
{
type: "function",
function: {
name: "read_file",
description: "Read file contents (max 2000 lines)",
parameters: {
type: "object",
properties: { path: { type: "string" } },
required: ["path"],
},
},
},
// ... 6 more tools
];
// 2. A switch dispatches execution
export async function executeTool(name, args) {
switch (name) {
case "read_file": return readFileTool(args.path);
case "write_file": return writeFileTool(args.path, args.content);
case "run_command": return runCommandTool(args.command);
// ...
}
}
Output Truncation
// What if the LLM reads a 100K-line log file?
read_file → max 2000 lines
search_files → max 100 matches
run_command → max 2000 lines + 30s timeout
[truncated, showing first 2000 of 98473 lines]
No truncation → context overflow → agent loop breaks
Every agent framework has to deal with this
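The truncation rule above fits in one helper. A minimal sketch (`truncateLines` is a hypothetical name, not shown on the slides; the marker format copies the one above):

```typescript
// Cap tool output before it goes back into messages[], so one huge file
// can't blow past the model's context window.
function truncateLines(text: string, maxLines: number): string {
  const lines = text.split("\n");
  if (lines.length <= maxLines) return text;
  return (
    lines.slice(0, maxLines).join("\n") +
    `\n[truncated, showing first ${maxLines} of ${lines.length} lines]`
  );
}
```

Each tool wraps its raw output in this before returning, e.g. `truncateLines(fileContents, 2000)`.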
Permissions — permissions.ts
20 lines
Smallest file
export async function confirm(toolName, args) {
  // write_file: demo/index.html (2847 chars)
  // run_command: npm test
  const summary = summarize(toolName, args);
  const answer = await question(`[confirm] ${summary} (y/n) `);
  return answer.trim().toLowerCase() === "y";
}
Human in the loop.
The AI can't silently rm -rf /
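The `summarize` helper in `confirm` isn't shown on the slide. A plausible sketch, reverse-engineered from the prompt strings in the transcript (hypothetical implementation, not the real permissions.ts):

```typescript
// Hypothetical summarize(): one human-readable line per tool call,
// so the user knows exactly what they are approving.
function summarize(toolName: string, args: Record<string, any>): string {
  switch (toolName) {
    case "write_file":
      return `write_file: ${args.path} (${args.content.length} chars)`;
    case "edit_file":
      return `edit_file: ${args.path} (replace ${args.old_string.length} chars)`;
    case "run_command":
      return `run_command: ${args.command}`;
    default:
      return `${toolName}: ${JSON.stringify(args)}`;
  }
}
```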
MCP — mcp.ts
191 lines
Model Context Protocol — connect external tools to the agent
MCP Lifecycle
initialize → initialized → tools/list → Running → Shutdown
// pocket.json — declare MCP servers
{
  "mcpServers": {
    "weather": {
      "command": "node",
      "args": ["./my-weather-server.js"]
    }
  }
}
MCP tools and built-in tools are registered together.
The LLM doesn't care where a tool comes from — it only sees the JSON Schema.
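On the wire, the lifecycle above is JSON-RPC 2.0 over the server's stdin/stdout. A sketch of the messages per the MCP spec's stdio transport (fields abbreviated; the `get_weather` tool is a hypothetical example matching the weather server config):

```json
// client → server
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"pocket-code","version":"0.1"}}}
// client → server (notification, no id, no response expected)
{"jsonrpc":"2.0","method":"notifications/initialized"}
// client → server
{"jsonrpc":"2.0","id":2,"method":"tools/list"}
// server → client: tool definitions in the same JSON Schema shape the LLM sees
{"jsonrpc":"2.0","id":2,"result":{"tools":[{"name":"get_weather","inputSchema":{...}}]}}
```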
Terminal UI — ui.ts
59 lines
[thinking] Let me read this file...
[tool] read_file({"path":"bug.py"})
[confirm] edit_file: bug.py (replace 5 chars) (y/n)
[result] OK
[error] LLM API error 429
[answer] Bug fixed.
⠹ Thinking...
Each color = one phase of the agent loop
This is how Pocket Code makes every step visible
Part 4
Comparison & Reflection
Pocket Code vs Production
What Pocket Code has
- Agentic loop
- 7 built-in tools
- Permission confirmation
- MCP support
- Multi-model switching
- Project instruction file
What Claude Code adds
- Streaming output
- Context compression / overflow
- Sandbox isolation
- Parallel sub-agents
- File rollback / checkpoints
- Memory system
Where Are the Gaps?
- Context management — conversation too long? Compression, summarization, sliding window
- Streaming — users don't want to stare at a blank screen for 30 seconds
- Security — y/n isn't enough: sandbox, path restrictions, command allowlists
- Fault tolerance — broke a file? Checkpoints + rollback
- Parallelism — multiple sub-agents reading files and running tasks concurrently
These gaps span both engineering effort and design philosophy.
Takeaway
3 core concepts
1. LLM as decision maker — it chooses which tool to call, not your hardcoded logic
2. Tools as capabilities — JSON Schema defines what's available, results feed back to the LLM
3. The loop as the backbone — while(true) lets the agent decide when it's done
In One Line
Agent = LLM + Tools + Loop
This is the minimal form. Production builds a lot more on top.