Build a Production AI Agent with the Claude Agent SDK
Stop wiring agents together with LangChain and duct tape. The Claude Agent SDK gives you tool use, subagents, file system access, and hooks in a few dozen lines. Here's a full working example.
Why the Agent SDK Exists
For most of 2024 and 2025, building an LLM agent meant gluing together LangChain, a vector store, a prompt template framework, a custom tool runner, and a retry loop. Everyone did it slightly differently. Everyone’s agent was slightly broken in slightly different ways.
Anthropic released the Claude Agent SDK (originally the engine behind Claude Code) to replace that whole stack with one opinionated library. It ships with tool use, subagents, file system operations, hooks, permission prompts, and a streaming event loop — all production-tested, because it’s literally the same code that powers Claude Code’s terminal agent.
This tutorial builds a real agent: a repo-auditor that clones a Git repository, scans it for security issues, and writes a Markdown report. You’ll end up with ~100 lines of Python that would have taken 500 lines in a custom framework.
Prerequisites
- Python 3.10+
- An Anthropic API key (`export ANTHROPIC_API_KEY=...`)
- The SDK: `pip install claude-agent-sdk`
- Git installed on the machine running the agent
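If you want a fast way to confirm the list above before running anything, here is a minimal stdlib-only preflight check. `check_prerequisites` is a hypothetical helper, not part of the SDK. It only checks what the list names: the Python version, that the API key variable is set, and that `git` is on `PATH`.

```python
import os
import shutil
import sys


def check_prerequisites():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    if not os.environ.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    if shutil.which("git") is None:
        problems.append("git is not on PATH")
    return problems


# Print anything missing; an empty result prints nothing
for issue in check_prerequisites():
    print(f"missing: {issue}")
```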
Step 1: The Minimal Loop
Here’s the smallest possible working Agent SDK call:
```python
import anyio
from claude_agent_sdk import query, ClaudeAgentOptions


async def main():
    options = ClaudeAgentOptions(
        system_prompt="You are a concise coding assistant.",
        allowed_tools=["Bash", "Read", "Glob", "Grep"],
    )
    # query() yields each message (assistant text, tool calls, final result) as it streams in
    async for message in query(
        prompt="List the top-level Python files in this directory and describe each.",
        options=options,
    ):
        print(message)


anyio.run(main)
```
Run this in a Python project and you’ll see Claude stream its tool calls: a Glob for *.py, Read on each file, then a final text response. No boilerplate. No tool-registration dance.
The SDK handles:
- Sending the prompt with proper tool definitions
- Executing tool calls locally, within the limits you set with allowed_tools, permission settings, and hooks
- Feeding results back to the model
- Looping until Claude decides it’s done
- Streaming each event to your async generator
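To make those steps concrete, here is a toy, dependency-free version of the same cycle: call the model, run any tools it asks for, feed the results back, and stop when it asks for none. This is a sketch for intuition only. `run_agent_loop`, `fake_model`, and the message shapes are hypothetical and are not the SDK's internals. A scripted function stands in for Claude so the loop runs offline.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class ModelReply:
    text: str = ""
    tool_calls: list = field(default_factory=list)


def run_agent_loop(model, tools, prompt, max_turns=10):
    """Toy agent loop: call the model, execute requested tools, feed results back."""
    history = [("user", prompt)]
    for _ in range(max_turns):
        reply = model(history)
        history.append(("assistant", reply))
        if not reply.tool_calls:          # the model decided it is done
            return reply.text, history
        for call in reply.tool_calls:     # execute each requested tool
            result = tools[call.name](**call.args)
            history.append(("tool", call.name, result))
    raise RuntimeError(f"no final answer within {max_turns} turns")


# Hypothetical scripted "model": asks for a Glob first, then answers from the result.
def fake_model(history):
    if not any(entry[0] == "tool" for entry in history):
        return ModelReply(tool_calls=[ToolCall("Glob", {"pattern": "*.py"})])
    files = history[-1][2]
    return ModelReply(text=f"Found {len(files)} Python files: {', '.join(files)}")


fake_tools = {"Glob": lambda pattern: ["app.py", "utils.py"]}
answer, _ = run_agent_loop(fake_model, fake_tools, "List the Python files.")
print(answer)
```

The `max_turns` guard is the toy equivalent of the SDK option of the same name. Without it, a model that never stops requesting tools would loop forever.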
Step 2: Defining Custom Tools
Built-in tools cover filesystems, shells, and basic web fetching. For anything else you define your own:
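```python
import subprocess
import tempfile

from claude_agent_sdk import tool, create_sdk_mcp_server


# Decorator arguments: tool name, the description Claude sees, and the input schema
@tool("clone_repo", "Clone a git repository into a temp directory.", {"url": str})
async def clone_repo(args):
    tmp = tempfile.mkdtemp()
    # Shallow clone: only the latest commit is needed for scanning
    subprocess.run(["git", "clone", "--depth", "1", args["url"], tmp], check=True)
    # Tool results use the MCP content format
    return {
        "content": [
            {"type": "text", "text": f"Cloned to {tmp}"}
        ]
    }


# Bundle the tool into an in-process MCP server
repo_server = create_sdk_mcp_server(
    name="repo-tools",
    version="1.0.0",
    tools=[clone_repo],
)
```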
The SDK exposes tools via the Model Context Protocol (MCP) under the hood, so any MCP server you’ve built for Claude Code also works here with zero changes. That’s a genuinely useful property — tools move freely between your IDE, your production agent, and your command line.
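Because these tools live on an MCP server, Claude addresses them by a prefixed name, and that prefixed name is what goes into allowed_tools. Step 4 uses `mcp__repo-tools__clone_repo`, which is the server key followed by the tool name. The two helpers below are hypothetical conveniences that build those names so a typo in the server key doesn't silently hide a tool.

```python
def mcp_tool_name(server: str, tool: str) -> str:
    """Build the allowed_tools entry for a tool exposed by an MCP server."""
    return f"mcp__{server}__{tool}"


def mcp_tool_names(server: str, tools: list) -> list:
    """Prefix every tool name on one server."""
    return [mcp_tool_name(server, t) for t in tools]


allowed_tools = ["Read", "Glob", "Grep"] + mcp_tool_names("repo-tools", ["clone_repo"])
print(allowed_tools)
```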
Step 3: Permission Hooks and Guardrails
Production agents need guardrails. The SDK’s hook system lets you intercept every tool call before it executes:
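```python
import re

from claude_agent_sdk import ClaudeAgentOptions, HookMatcher

# Naive blocklist: easy to bypass, so treat it as a first line of defense only.
# Regexes catch variants a plain substring check misses (e.g. "curl https://x | sh").
BLOCKED_PATTERNS = [
    r"\brm\s+-rf\b",
    r"\bsudo\b",
    r"\bcurl\b[^|]*\|\s*(ba)?sh\b",   # curl ... | sh (or | bash)
    r":\(\)\s*\{\s*:\|:&\s*\};:",     # fork bomb
]


async def block_destructive(input_data, tool_use_id, context):
    cmd = input_data.get("tool_input", {}).get("command", "")
    if any(re.search(p, cmd) for p in BLOCKED_PATTERNS):
        # Deny the call; Claude sees the reason and can choose another approach
        return {
            "hookSpecificOutput": {
                "hookEventName": "PreToolUse",
                "permissionDecision": "deny",
                "permissionDecisionReason": f"Blocked dangerous command: {cmd}",
            }
        }
    return {}  # empty dict: let the call proceed


options = ClaudeAgentOptions(
    allowed_tools=["Bash", "Read", "Glob", "Grep", "Write"],
    hooks={
        "PreToolUse": [
            HookMatcher(matcher="Bash", hooks=[block_destructive]),
        ],
    },
)
```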
The hook runs before any Bash tool call and can deny it with a reason that Claude sees. Claude will then typically try a different approach. Hooks also fire on PostToolUse, UserPromptSubmit, and other lifecycle events — enough to implement audit logging, rate limiting, output filtering, or cost caps.
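Rate limiting is one example. Here is a minimal sketch of a hook factory that caps how many tool calls a session may make. It reuses the `(input_data, tool_use_id, context)` signature and the deny payload from `block_destructive`. `make_call_limiter` and the cap value are hypothetical. The demo calls the hook directly with a fake input so it runs without the SDK. With the SDK you would register it the same way as the blocklist, for example `HookMatcher(matcher="Bash", hooks=[make_call_limiter(50)])`.

```python
import asyncio


def make_call_limiter(max_calls: int):
    """Return a PreToolUse-style hook that denies tool calls after max_calls."""
    count = 0

    async def limit_calls(input_data, tool_use_id, context):
        nonlocal count
        count += 1
        if count > max_calls:
            return {
                "hookSpecificOutput": {
                    "hookEventName": "PreToolUse",
                    "permissionDecision": "deny",
                    "permissionDecisionReason": (
                        f"Tool-call budget of {max_calls} exhausted; "
                        "summarize what you have so far."
                    ),
                }
            }
        return {}  # allow

    return limit_calls


async def demo():
    hook = make_call_limiter(max_calls=2)
    call = {"tool_name": "Bash", "tool_input": {"command": "ls"}}
    return [await hook(call, None, None) for _ in range(3)]


results = asyncio.run(demo())
print([r.get("hookSpecificOutput", {}).get("permissionDecision", "allow") for r in results])
```

The counter lives in the closure, so build a fresh limiter for each `query()` run. Otherwise, one run's calls count against the next run's budget.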
Step 4: The Repo Auditor
Putting it together. This agent clones a repo, scans for a few classes of issue, and writes a Markdown report:
```python
import subprocess
import sys
import tempfile

import anyio
from claude_agent_sdk import (
    query, ClaudeAgentOptions, tool,
    create_sdk_mcp_server, HookMatcher,
    AssistantMessage, TextBlock,
)


@tool("clone_repo", "Clone a git repository to a fresh temp directory.", {"url": str})
async def clone_repo(args):
    tmp = tempfile.mkdtemp(prefix="audit-")
    subprocess.run(
        ["git", "clone", "--depth", "1", args["url"], tmp],
        check=True, capture_output=True,
    )
    # Return the clone path so Claude can point Glob/Grep/Read at it
    return {"content": [{"type": "text", "text": tmp}]}


repo_server = create_sdk_mcp_server(
    name="repo-tools", version="1.0.0", tools=[clone_repo],
)


# PreToolUse hook that logs every tool call and never blocks
async def log_tools(input_data, tool_use_id, context):
    print(f"[tool] {input_data.get('tool_name')} {input_data.get('tool_input')}")
    return {}


SYSTEM = """You are a security auditor. When given a repo URL:
1. Clone it with clone_repo
2. Read the directory structure with Glob
3. Grep for hardcoded secrets (API keys, tokens, passwords)
4. Grep for dangerous patterns (eval, exec, os.system, SQL string concat)
5. Read the top 5 most concerning files and assess them
6. Write a Markdown report to ./audit-report.md with: summary, findings table, recommendations
Be concise. Rank issues by severity: critical, warning, suggestion."""


async def main(repo_url: str):
    options = ClaudeAgentOptions(
        system_prompt=SYSTEM,
        allowed_tools=[
            "Bash", "Read", "Glob", "Grep", "Write",
            # MCP tools are named mcp__<server>__<tool>
            "mcp__repo-tools__clone_repo",
        ],
        mcp_servers={"repo-tools": repo_server},
        hooks={
            "PreToolUse": [HookMatcher(matcher="*", hooks=[log_tools])],
        },
        max_turns=40,  # hard stop on the agent loop
    )
    prompt = f"Audit this repository: {repo_url}"
    async for message in query(prompt=prompt, options=options):
        # Assistant text arrives as TextBlocks inside AssistantMessage.content
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, TextBlock):
                    print(block.text)


if __name__ == "__main__":
    anyio.run(main, sys.argv[1])
```
Run it:
```bash
python auditor.py https://github.com/some/repo
```
You’ll watch Claude clone the repo, grep for secrets, read suspicious files, then drop a Markdown report in your working directory. The SDK’s event stream drives the whole loop, and your own code stays well under 100 lines.
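The report is only as good as the agent's Grep queries. A quick deterministic pass over the same clone lets you sanity-check it, for example by confirming that a flagged secret actually exists. This is a hypothetical helper, not part of the SDK. The patterns are illustrative, far smaller than a real secret scanner's rule set, and for brevity it only scans `.py` files.

```python
import re
from pathlib import Path

# Illustrative patterns only; real secret scanners use far larger rule sets.
PATTERNS = {
    "critical": {
        "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
        "hardcoded password": re.compile(r"(?i)password\s*=\s*['\"][^'\"]+['\"]"),
    },
    "warning": {
        "eval/exec call": re.compile(r"\b(eval|exec)\s*\("),
        "os.system call": re.compile(r"\bos\.system\s*\("),
    },
}
SEVERITY_ORDER = {"critical": 0, "warning": 1}


def scan_repo(root: str) -> list:
    """Return (severity, label, path, line_no) tuples, most severe first."""
    findings = []
    for path in Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        for line_no, line in enumerate(text.splitlines(), start=1):
            for severity, rules in PATTERNS.items():
                for label, pattern in rules.items():
                    if pattern.search(line):
                        findings.append((severity, label, str(path), line_no))
    return sorted(findings, key=lambda f: (SEVERITY_ORDER[f[0]], f[2], f[3]))
```

Run `scan_repo()` on the temp directory that `clone_repo` returned, then compare its hits against the findings table in `audit-report.md`.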
Step 5: Subagents for Parallelism
One Claude instance auditing one repo is fine. Auditing 50 repos sequentially is slow. The SDK’s subagent mechanism lets the main agent spawn parallel workers:
```python
from claude_agent_sdk import ClaudeAgentOptions, AgentDefinition

options = ClaudeAgentOptions(
    system_prompt="You are an orchestrator. For each repo, spawn a subagent to audit it in parallel.",
    allowed_tools=["Bash", "Task"],  # Task is the built-in subagent spawner
    mcp_servers={"repo-tools": repo_server},  # subagents need clone_repo too
    agents={
        "auditor": AgentDefinition(
            description="Audits a single repository for security issues.",
            prompt=SYSTEM,
            tools=["Bash", "Read", "Glob", "Grep", "Write", "mcp__repo-tools__clone_repo"],
        )
    },
)
```
The Task tool spawns a fresh auditor subagent with its own context window. The parent gets only the summary back, which keeps context usage sane. This is the same pattern Claude Code uses when it spawns itself recursively for research tasks.
Step 6: Cost Control
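The Agent SDK reports each run's total cost on the final ResultMessage (total_cost_usd). Because that number only arrives when a run ends, max_turns is what bounds a single run, and a running total is what stops a batch. A simple cost gate:
```python
from claude_agent_sdk import query, ResultMessage

MAX_USD = 1.00


async def run_batch(prompts, options):
    total_cost = 0.0
    for prompt in prompts:
        async for message in query(prompt=prompt, options=options):
            # The final ResultMessage carries the run's cost in USD (may be None)
            if isinstance(message, ResultMessage) and message.total_cost_usd:
                total_cost += message.total_cost_usd
        if total_cost > MAX_USD:
            print(f"Cost cap hit: ${total_cost:.2f}")
            break
    return total_cost
```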
For production agents this is non-optional. Any loop where the LLM controls its own continuation must have a hard budget.
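When a cron job or queue worker fires many runs, it helps to pull the budget into a small object that refuses to start new work past the cap. `CostBudget` and `BudgetExceeded` are hypothetical helpers. You would feed `record()` with the cost the SDK reports for each finished run. The list of costs below is made up to exercise the logic.

```python
class BudgetExceeded(RuntimeError):
    pass


class CostBudget:
    """Hypothetical helper: accumulate per-run costs and refuse new work past a cap."""

    def __init__(self, max_usd: float):
        self.max_usd = max_usd
        self.spent_usd = 0.0

    def record(self, cost_usd):
        # None means the cost wasn't reported; skip it rather than crash
        if cost_usd is not None:
            self.spent_usd += cost_usd

    def check(self):
        if self.spent_usd >= self.max_usd:
            raise BudgetExceeded(f"spent ${self.spent_usd:.2f} of ${self.max_usd:.2f}")

    @property
    def remaining_usd(self):
        return max(0.0, self.max_usd - self.spent_usd)


budget = CostBudget(max_usd=1.00)
for run_cost in [0.40, None, 0.35, 0.30, 0.10]:   # made-up per-run costs
    try:
        budget.check()          # refuse to start a run once the cap is reached
    except BudgetExceeded as err:
        print(f"stopping: {err}")
        break
    budget.record(run_cost)     # in practice: the cost reported for that run
```

Checking before each run, rather than after, means the cap can be overshot by at most one run's cost. Pair it with max_turns to keep that single run bounded too.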
When Not To Use the Agent SDK
A few honest limitations:
- It’s Claude-only. If you need multi-provider routing, stick with a higher-level framework.
- It’s biased toward filesystem/shell workflows. Pure conversational agents don’t benefit as much.
- It's async-only. The examples use anyio, and sync-only codebases need a thin wrapper around the async iterator.
For any workload that does involve filesystem operations, shell commands, or multi-step tool use, the Agent SDK is the shortest path from prompt to production in 2026. It’s the reference implementation of how Anthropic thinks agents should work, and it shows.
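On the async-only point, here is a minimal wrapper for sync code that is fine blocking until the agent finishes. `asyncio.run()` drives an async generator to completion and returns a list. `collect_sync` and `fake_query` are hypothetical. With the real SDK you would pass `query` and its keyword arguments instead of the stand-in.

```python
import asyncio


def collect_sync(agen_factory, *args, **kwargs):
    """Drain an async generator from synchronous code and return its items."""
    async def drain():
        return [item async for item in agen_factory(*args, **kwargs)]
    return asyncio.run(drain())


# Hypothetical async generator standing in for query(prompt=..., options=...).
async def fake_query(prompt):
    for word in prompt.split():
        await asyncio.sleep(0)
        yield word


messages = collect_sync(fake_query, prompt="audit this repo")
print(messages)
```

`asyncio.run()` raises if an event loop is already running in the thread (for example, inside Jupyter). In that case, await the generator directly instead.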
Next Steps
- Wire your agent into a cron job or queue worker for batch runs
- Route simple questions to Sonnet instead of Opus by setting `model` in ClaudeAgentOptions per request
- Write a custom MCP server exposing your company’s internal APIs
- Read the full `claude-agent-sdk-python` source: it’s small and worth understanding
The abstraction is thin enough that you can learn it in an afternoon and bend it to nearly any shape. After years of agent frameworks that felt like cathedrals, an opinionated library built on top of MCP is a refreshing baseline.
Debugging Checklist
When your agent misbehaves, work through this list before blaming the model:
- Is the tool description clear? Vague tool descriptions are the number one cause of agents that “refuse to use the tool.” Rewrite the description so a human reading it cold would know exactly when to call it.
- Are tool inputs validated? If your tool accepts `{"url": str}` but silently fails on a non-URL, the agent will keep calling it with garbage and never understand why.
- Is the system prompt scoped? Agents that fail open-endedly usually had open-ended system prompts. Be specific about what done looks like.
- Are error messages actionable? Claude reads tool errors. “Error: 500” is useless. “Error: database not reachable, retry in 10s or use cached_data tool” is useful.
- Is max_turns too low? Complex workflows need headroom. Start at 40 and raise only if needed.
- Are you logging tool calls? Every production agent should log every tool call with inputs, outputs, and timing. Without this you can’t debug anything.
Following this list turns most agent failures from mysterious to trivial within minutes. Write it on a sticky note.
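The first two checklist items work together: validate the input, and when it is wrong, say exactly how to fix it. Here is a minimal sketch for the `{"url": str}` input of `clone_repo`. `validate_repo_url`, `tool_error`, and the `ALLOWED_HOSTS` policy are hypothetical. Adjust the hosts to whatever your agent should actually be allowed to clone.

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"github.com", "gitlab.com"}  # hypothetical policy


def validate_repo_url(url):
    """Return None if url looks clonable, else an error message Claude can act on."""
    if not isinstance(url, str) or not url.strip():
        return "Error: 'url' is empty. Pass a full HTTPS URL like https://github.com/owner/repo."
    parsed = urlparse(url.strip())
    if parsed.scheme != "https":
        return f"Error: '{url}' is not an HTTPS URL. Use the https:// form of the repository URL."
    if parsed.hostname not in ALLOWED_HOSTS:
        return (f"Error: host '{parsed.hostname}' is not allowed. "
                f"Only {', '.join(sorted(ALLOWED_HOSTS))} repositories can be audited.")
    if len([p for p in parsed.path.split("/") if p]) < 2:
        return f"Error: '{url}' has no owner/repo path. Expected https://{parsed.hostname}/owner/repo."
    return None


def tool_error(message):
    # Text result in the same MCP content shape clone_repo returns, so Claude reads the error
    return {"content": [{"type": "text", "text": message}]}
```

Inside `clone_repo`, you would call `validate_repo_url(args["url"])` first and return `tool_error(...)` when it yields a message, before ever shelling out to git.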