Free Guide
Stop Burning Through Claude Code Tokens
API users pay per token. Every bloated prompt, full-file read, and agent loop adds up fast. This guide covers the exact techniques that cut token usage 30-60% without slowing down your workflow.
How Much Does Claude Code Cost Per Token?
Costs vary by model tier. Claude Haiku 3.5 is the cheapest at $0.80 per million input tokens and $4.00 per million output tokens. Sonnet 3.7 runs $3.00 input / $15.00 output per million tokens. Opus 4 is the most expensive tier at $15.00 input / $75.00 output per million tokens. Output tokens cost five times as much as input tokens on all three models, so your verbose prompts are cheaper than Claude's verbose answers.
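To make the per-token arithmetic concrete, here is a minimal sketch that prices a single request using the Haiku 3.5 and Sonnet 3.7 figures quoted above. The dictionary keys are illustrative labels rather than official API model IDs, and the 20,000-in / 2,000-out turn is a made-up example.

```python
# Prices in USD per million tokens, taken from the figures quoted in the text.
# The keys are illustrative labels, not official API model identifiers.
PRICES = {
    "haiku-3.5": {"input": 0.80, "output": 4.00},
    "sonnet-3.7": {"input": 3.00, "output": 15.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at per-million-token pricing."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# A hypothetical turn: 20,000 tokens of context in, 2,000 tokens out.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 20_000, 2_000):.4f}")
```

On these numbers the same turn costs about 2.4 cents on Haiku and 9 cents on Sonnet, which is why model choice and context size matter more than any single prompt.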
The Batch API cuts costs by 50% for non-urgent workloads. If you run nightly analysis jobs or bulk content tasks, batching alone can halve your monthly bill without changing a single prompt.
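As a rough illustration of that discount, the sketch below estimates a monthly bill when only part of the spend can move to batch. The `batchable_share` parameter and the dollar amounts are hypothetical. It assumes interactive sessions stay at standard pricing because they need immediate responses.

```python
BATCH_DISCOUNT = 0.50  # the 50% Batch API discount quoted in the text


def monthly_bill(standard_spend: float, batchable_share: float) -> float:
    """Monthly USD bill when `batchable_share` (0 to 1) of spend moves to batch."""
    if not 0.0 <= batchable_share <= 1.0:
        raise ValueError("batchable_share must be between 0 and 1")
    batched = standard_spend * batchable_share * (1 - BATCH_DISCOUNT)
    interactive = standard_spend * (1 - batchable_share)
    return batched + interactive


# Hypothetical $400/month spend with different shares of non-urgent work.
for share in (0.25, 0.50, 1.00):
    print(f"{share:.0%} batchable: ${monthly_bill(400, share):.2f}")
```

The bill only halves when every request is non-urgent. With a quarter of the workload batchable, the saving is closer to 12%.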
Why Is Claude Code Using So Many Tokens?
Context window bloat is the biggest culprit. Every message in a session gets re-sent with each new turn — so a 50-turn conversation can be processing the same early context 50 times. Add full-file reads when you only needed 10 lines, agent loops that re-examine the same files, and a CLAUDE.md with no guardrails, and token usage compounds fast.
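The re-sending effect is easy to underestimate, so here is a small sketch of the arithmetic. It assumes each turn adds the same number of new tokens and that nothing is cached or compacted. Both are simplifications chosen for illustration.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed when every turn re-sends all earlier turns.

    Turn k sends k * tokens_per_turn tokens, so the total is the sum 1..turns.
    """
    return tokens_per_turn * turns * (turns + 1) // 2


# Hypothetical session: 50 turns, each adding 1,000 new tokens.
turns, per_turn = 50, 1_000
print(f"new content:  {turns * per_turn:,} tokens")
print(f"billed input: {cumulative_input_tokens(turns, per_turn):,} tokens")
```

Under these assumptions, 50,000 tokens of actual content turn into 1,275,000 billed input tokens, because the total grows with the square of the turn count.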
Common token drains
- Reading entire files to find one function
- No .claudeignore, so Claude reads node_modules and dist
- Long sessions without /compact
- Agent loops that re-read already-processed files
- Vague prompts that trigger verbose reasoning
- System prompts not cached
What actually matters
- Context size at each turn — not just session total
- File read scope — lines vs full file
- Prompt specificity — vague = more tokens to compensate
- Ignored directories — define them or Claude reads everything
- Tool call frequency in agent mode
- Caching eligibility of repeated context
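To show why caching eligibility is on that list, the sketch below compares sending the same prefix repeatedly with and without prompt caching. The two multipliers are assumptions about how cache writes and reads are billed relative to normal input, so check current pricing before relying on them. The prefix size and call count are hypothetical.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # assumed: the first write costs 25% more than base input
CACHE_READ_MULTIPLIER = 0.10   # assumed: later reads cost 10% of base input


def prefix_cost(prefix_tokens: int, calls: int, price_per_mtok: float, cached: bool) -> float:
    """USD cost of sending the same prefix on `calls` consecutive requests."""
    base = prefix_tokens * price_per_mtok / 1_000_000
    if not cached or calls == 0:
        return base * calls
    # One cache write, then (calls - 1) cache reads, assuming the cache never expires.
    return base * (CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (calls - 1))


# Hypothetical: a 10,000-token system prompt sent 20 times at $3.00 per million tokens.
print(f"uncached: ${prefix_cost(10_000, 20, 3.00, cached=False):.4f}")
print(f"cached:   ${prefix_cost(10_000, 20, 3.00, cached=True):.4f}")
```

The saving grows with the number of calls that reuse the prefix. A prefix that changes on every turn gains nothing and pays the write premium each time.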
The core fix
How Do I Reduce Claude Code Token Usage?
Four levers make the biggest difference: the /compact command compresses conversation context mid-session without losing key decisions. A .claudeignore file stops Claude from reading build directories and lock files. Targeted CLAUDE.md instructions tell Claude to use grep and glob instead of reading full files. And specific prompts — "read lines 45-60 of auth.ts" instead of "look at auth.ts" — eliminate exploratory reads.
The free guide below has copy-paste configs for all four, plus a prompt pattern reference with before/after examples. Most API users cut their token usage 30-60% within the first week of applying these.
/compact command
Compresses session context while preserving key decisions. Use when context exceeds 50% capacity.
.claudeignore file
Prevents Claude from reading node_modules, dist, .git, and other directories that have no code value.
Targeted CLAUDE.md rules
Explicit instructions: use grep/glob before reading files. Avoid re-reading files already in context.
Specific prompt patterns
Line-range reads, targeted searches, and exact edits vs rewrites. Five before/after examples in the guide.
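The gap between "look at auth.ts" and "read lines 45-60 of auth.ts" can be sketched as follows. This is not how Claude Code reads files internally. It only illustrates the size difference, using a generated stand-in file and a rough four-characters-per-token rule of thumb instead of a real tokenizer.

```python
from itertools import islice
from pathlib import Path
import tempfile

CHARS_PER_TOKEN = 4  # rough rule of thumb, not a real tokenizer


def read_line_range(path: Path, start: int, end: int) -> str:
    """Return lines start..end (1-indexed, inclusive); reading stops at `end`."""
    with path.open(encoding="utf-8") as handle:
        return "".join(islice(handle, start - 1, end))


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


# Build a stand-in 500-line file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "auth.ts"
    lines = [f"const value{i} = {i};\n" for i in range(1, 501)]
    source.write_text("".join(lines), encoding="utf-8")
    full = source.read_text(encoding="utf-8")
    snippet = read_line_range(source, 45, 60)

print(estimate_tokens(full), "tokens (estimated) for the full file")
print(estimate_tokens(snippet), "tokens (estimated) for lines 45-60")
```

The exact numbers depend on the file, but a 16-line slice of a 500-line file is roughly 3% of the content, and every later turn re-sends whichever version ended up in context.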
What Is the /compact Command in Claude Code?
/compact asks Claude to summarize the current conversation into a compressed representation — preserving key decisions, file paths, and implementation context while removing the raw back-and-forth. It dramatically reduces the token cost of subsequent turns without starting over. Think of it as a mid-session memory consolidation.
Use /compact when your context bar is over 50%, when you've finished a large feature and are starting another, or when Claude starts hallucinating earlier context. Use /clear when you're starting a completely fresh topic — it wipes the slate. Never use /clear mid-implementation with uncommitted work. The guide includes a decision tree for when to use each.
# When to run /compact vs /clear
Use /compact when:
- Context > 50% capacity
- Finished major feature, starting next one
- Context has grown with lots of back-and-forth
- Want to preserve key decisions and file context
Use /clear when:
- Completely new topic or project
- Context is polluted with irrelevant info
- Starting fresh debugging session
NEVER /clear when:
- Mid-implementation with uncommitted code
- Waiting for test results
- In the middle of a multi-step task
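The checklist above can be read as a small decision function. The sketch below is one possible encoding, not part of Claude Code. The `SessionState` fields and the `suggest_command` helper are hypothetical names, and the 50% threshold comes from the checklist.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    context_used: float      # fraction of the context window in use, 0.0 to 1.0
    new_topic: bool          # switching to an unrelated topic or project
    uncommitted_work: bool   # mid-implementation with uncommitted code
    awaiting_results: bool   # waiting on tests or in a multi-step task


def suggest_command(state: SessionState) -> str:
    """Map the checklist to a suggestion: '/clear', '/compact', or 'continue'."""
    in_flight = state.uncommitted_work or state.awaiting_results
    # Never clear while work is in flight, even when the topic is changing.
    if state.new_topic and not in_flight:
        return "/clear"
    if state.context_used > 0.5:
        return "/compact"
    return "continue"


print(suggest_command(SessionState(0.7, False, True, False)))
print(suggest_command(SessionState(0.2, True, False, False)))
```

The ordering matters: the in-flight check guards /clear first, so a topic switch with uncommitted code falls through to /compact or to carrying on.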
Does Claude Code Use More Tokens Than Regular Claude?
Yes, significantly. Claude Code adds overhead that the chat interface doesn't have: tool call formatting, file read operations (each file read is a separate token block), multi-step agent workflows that accumulate context across turns, and the system prompt injected by Claude Code itself. A typical multi-file coding task uses 3-5× more tokens than an equivalent conversation in Claude.ai.
This is expected and acceptable — the value is higher too. But it means API costs scale quickly with complex agentic tasks. The optimization techniques in this guide are specifically designed for this context: tool-heavy, file-heavy, multi-turn workflows where default behavior wastes tokens at every step.
Subscription vs API: Which Is Cheaper for Claude Code?
For heavy users (6+ hours of Claude Code per day), a Max subscription at $100-200/mo is almost always cheaper than the API. API pricing is better for occasional or burst usage — a few hours a day or project-based work. The break-even point depends on your model mix and session complexity.
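A quick way to find your own break-even point is to divide the subscription price by what an hour of API usage costs you. The sketch below does that. The $1.50 hourly figure and the 22 working days are placeholders, so substitute numbers measured from your own usage.

```python
def break_even_hours(subscription_per_month: float, api_cost_per_hour: float,
                     days_per_month: int = 22) -> float:
    """Hours per day at which monthly API spend equals the subscription price."""
    if api_cost_per_hour <= 0 or days_per_month <= 0:
        raise ValueError("hourly cost and days per month must be positive")
    return subscription_per_month / (api_cost_per_hour * days_per_month)


HOURLY_API_SPEND = 1.50  # hypothetical; measure your own from the usage dashboard
for plan in (100, 200):
    hours = break_even_hours(plan, HOURLY_API_SPEND)
    print(f"${plan}/mo breaks even at {hours:.1f} hours per day")
```

If you use Claude Code for more hours per day than the figure this returns, the subscription is the cheaper option at your current hourly spend.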
This guide focuses exclusively on API cost reduction. If you're on a Pro or Max subscription managing weekly credit limits, the techniques overlap but the priority order differs — see the managing subscription credits guide for that workflow.
Video walkthrough
Watch the token optimization workflow in action.
Full walkthrough: .claudeignore setup, /compact vs /clear decisions, CLAUDE.md token rules, and prompt patterns. About 12 minutes.
How Do I Track My Claude Code Token Usage?
The Anthropic API dashboard shows token usage by model, day, and API key. You can set monthly spend limits and usage alerts directly in the dashboard. For per-session tracking, run /cost inside Claude Code to see the current session's token usage and cost. You can also add a CLAUDE.md instruction that logs token counts at the end of each session; the free guide below includes this config.
The most actionable monitoring approach is comparing usage before and after applying .claudeignore and targeted prompts. Most users see a noticeable drop in the first week. The guide walks through setting up baseline tracking so you can measure the improvement.
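A before-and-after comparison needs nothing more than two lists of daily totals. The sketch below shows one way to compute the change. The figures are invented placeholders standing in for numbers you would copy from the usage dashboard.

```python
from statistics import mean


def percent_reduction(baseline: list[int], after: list[int]) -> float:
    """Percent drop in mean daily tokens from the baseline period to the later one."""
    before_avg = mean(baseline)
    return (before_avg - mean(after)) / before_avg * 100


# Placeholder daily token totals, not real measurements.
baseline_week = [2_100_000, 1_900_000, 2_300_000, 2_000_000, 1_700_000]
after_week = [1_300_000, 1_200_000, 1_500_000, 1_100_000, 900_000]

print(f"reduction: {percent_reduction(baseline_week, after_week):.1f}%")
```

Comparing weekly means rather than single days smooths out the fact that some days simply involve more work than others.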
Free download
Get the Token Optimization Kit
CLAUDE.md token rules, .claudeignore template, /compact decision guide, prompt caching config, cost comparison table, and 5 prompt patterns with before/after examples. All copy-paste ready.
The full system
AI Workflow Audit — $750
The guide gets you the core fixes. The audit covers your entire Claude Code setup — CLAUDE.md review and rewrite, MCP server audit, prompt pattern diagnosis, and a prioritized action list delivered as a 75-minute Google Meet walkthrough with copy-paste configs.
Most setups I audit waste 30-60% of their token budget on context bloat, full-file reads that should be targeted searches, and MCP servers running more calls than necessary. The guide covers the obvious wins. The audit finds the rest.
CLAUDE.md review + rewrite
Token-focused rules added: grep-before-read, line-range reads, compact triggers, ignore patterns. Your config rebuilt from the ground up.
MCP server stack audit
Each server reviewed for call frequency and token cost. Identify which servers add value and which are creating unnecessary overhead.
Prompt pattern diagnosis
Your most-used prompts reviewed for specificity and token efficiency. Before/after rewrites included for the top 5 patterns.
Caching configuration
Identify which parts of your system prompt and context blocks are eligible for prompt caching. Set up correctly, caching cuts the cost of repeated context by up to 90%, because cached reads are billed at a fraction of the normal input price.
Agent workflow review
Multi-step agent workflows diagnosed for redundant file reads, re-processing, and loops that could be collapsed into fewer turns.
10-min follow-up Loom
Within 14 days, send any question about your setup. I record a screen-share walkthrough answering it.
75-minute Google Meet. Copy-paste deliverables. No upsells.
Guarantee: if the audit does not surface at least 5 actionable improvements to your Claude setup, full refund. Same day.
FAQ
Common questions
How much does Claude Code cost on the API?
How many tokens does a Claude Code session use?
What happens when I hit my API usage limit?
Does /compact lose important context?
What is a .claudeignore file and how does it save tokens?
How do MCP servers affect token consumption?
Can I set workspace spend limits?
Is the API cheaper than Claude Max for heavy usage?