Free Guide

Stop Burning Through Claude Code Tokens

API users pay per token. Every bloated prompt, full-file read, and agent loop adds up fast. This guide covers the exact techniques that cut token usage 30-60% without slowing down your workflow.

How Much Does Claude Code Cost Per Token?

Costs vary by model tier. Claude Haiku 3.5 is the cheapest at $0.80 per million input tokens and $4.00 per million output tokens. Sonnet 3.7 runs around $3.00 input / $15.00 output per million tokens. Opus 4 is the most expensive tier. Output tokens cost significantly more than input tokens across all models — your verbose prompts are cheaper than Claude's verbose answers.

The Batch API cuts costs by 50% for non-urgent workloads. If you run nightly analysis jobs or bulk content tasks, batching alone can halve your monthly bill without changing a single prompt.
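The arithmetic is simple. A rough sketch using the Sonnet per-million rates quoted above (actual pricing may change, so verify against the current price list):

```python
# Rough cost sketch for a recurring job on Sonnet, using the
# per-million-token rates quoted above (assumed, not guaranteed).
INPUT_PER_M = 3.00     # $ per 1M input tokens (Sonnet)
OUTPUT_PER_M = 15.00   # $ per 1M output tokens (Sonnet)
BATCH_DISCOUNT = 0.50  # Batch API: 50% off both input and output

def monthly_cost(input_tokens_m, output_tokens_m, batched=False):
    """Dollar cost for a month's usage, given token counts in millions."""
    cost = input_tokens_m * INPUT_PER_M + output_tokens_m * OUTPUT_PER_M
    return cost * (1 - BATCH_DISCOUNT) if batched else cost

# Example: 20M input / 4M output tokens per month
print(monthly_cost(20, 4))                # on-demand: 120.0
print(monthly_cost(20, 4, batched=True))  # batched:    60.0
```

Same prompts, same outputs, half the bill — the only change is submitting the work through the Batch API instead of on-demand.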

$0.80 — Haiku input / 1M tokens
$3.00 — Sonnet input / 1M tokens
50% — Batch API discount
5× — Output vs input cost ratio

Why Is Claude Code Using So Many Tokens?

Context window bloat is the biggest culprit. Every message in a session gets re-sent with each new turn — so a 50-turn conversation can be processing the same early context 50 times. Add full-file reads when you only needed 10 lines, agent loops that re-examine the same files, and a CLAUDE.md with no guardrails, and token usage compounds fast.

Common token drains

  • Reading entire files to find one function
  • No .claudeignore — reads node_modules and dist
  • Long sessions without /compact
  • Agent loops that re-read already-processed files
  • Vague prompts that trigger verbose reasoning
  • System prompts not cached

What actually matters

  • Context size at each turn — not just session total
  • File read scope — lines vs full file
  • Prompt specificity — vague = more tokens to compensate
  • Ignored directories — define them or Claude reads everything
  • Tool call frequency in agent mode
  • Caching eligibility of repeated context

The core fix

How Do I Reduce Claude Code Token Usage?

Four levers make the biggest difference. The /compact command compresses conversation context mid-session without losing key decisions. A .claudeignore file stops Claude from reading build directories and lock files. Targeted CLAUDE.md instructions tell Claude to use grep and glob instead of reading full files. And specific prompts — "read lines 45-60 of auth.ts" instead of "look at auth.ts" — eliminate exploratory reads.

The free guide below has copy-paste configs for all four, plus a prompt pattern reference with before/after examples. Most API users cut their token usage 30-60% within the first week of applying these.

/compact command

Compresses session context while preserving key decisions. Use when context exceeds 50% capacity.

.claudeignore file

Prevents Claude from reading node_modules, dist, .git, and other directories that have no code value.
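A minimal starting point — patterns follow .gitignore-style syntax, so adjust the list for your stack:

```
node_modules/
dist/
build/
.git/
coverage/
*.min.js
package-lock.json
*.lock
.env
```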

Targeted CLAUDE.md rules

Explicit instructions: use grep/glob before reading files. Avoid re-reading files already in context.
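A sketch of what those rules can look like inside CLAUDE.md (wording is illustrative, not canonical — tune it to your project):

```markdown
## Token rules
- Use grep/glob to locate code before reading any file.
- Read specific line ranges, not entire files.
- Never re-read a file that is already in context.
- Do not read node_modules, dist, or lock files.
- Keep explanations brief unless asked to elaborate.
```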

Specific prompt patterns

Line-range reads, targeted searches, and exact edits vs rewrites. Five before/after examples in the guide.
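One before/after pair in that style (the file name, line range, and function name are illustrative):

```
# Before — triggers a full-file read plus exploratory searches:
"Look at auth.ts and fix the login bug"

# After — one targeted read, one targeted edit:
"Read lines 45-60 of auth.ts. The validateToken call ignores
token expiry. Add an expiry check and return 401 when expired."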

What Is the /compact Command in Claude Code?

/compact asks Claude to summarize the current conversation into a compressed representation — preserving key decisions, file paths, and implementation context while removing the raw back-and-forth. It dramatically reduces the token cost of subsequent turns without starting over. Think of it as a mid-session memory consolidation.

Use /compact when your context bar is over 50%, when you've finished a large feature and are starting another, or when Claude starts hallucinating earlier context. Use /clear when you're starting a completely fresh topic — it wipes the slate. Never use /clear mid-implementation with uncommitted work. The guide includes a decision tree for when to use each.

# When to run /compact vs /clear

/compact — use when:
  - Context > 50% capacity
  - Finished major feature, starting next one
  - Context has grown with lots of back-and-forth
  - Want to preserve key decisions and file context

/clear — use when:
  - Completely new topic or project
  - Context is polluted with irrelevant info
  - Starting fresh debugging session

NEVER /clear when:
  - Mid-implementation with uncommitted code
  - Waiting for test results
  - In the middle of a multi-step task

Does Claude Code Use More Tokens Than Regular Claude?

Yes, significantly. Claude Code adds overhead that the chat interface doesn't have: tool call formatting, file read operations (each file read is a separate token block), multi-step agent workflows that accumulate context across turns, and the system prompt injected by Claude Code itself. A typical multi-file coding task uses 3-5× more tokens than an equivalent conversation in Claude.ai.

This is expected and acceptable — the value is higher too. But it means API costs scale quickly with complex agentic tasks. The optimization techniques in this guide are specifically designed for this context: tool-heavy, file-heavy, multi-turn workflows where default behavior wastes tokens at every step.

Subscription vs API: Which Is Cheaper for Claude Code?

For heavy users (6+ hours of Claude Code per day), a Max subscription at $100-200/mo is almost always cheaper than the API. API pricing is better for occasional or burst usage — a few hours a day or project-based work. The break-even point depends on your model mix and session complexity.

This guide focuses exclusively on API cost reduction. If you're on a Pro or Max subscription managing weekly credit limits, the techniques overlap but the priority order differs — see the managing subscription credits guide for that workflow.

Video walkthrough

Watch the token optimization workflow in action.

Full walkthrough: .claudeignore setup, /compact vs /clear decisions, CLAUDE.md token rules, and prompt patterns. About 12 minutes.

How Do I Track My Claude Code Token Usage?

The Anthropic API dashboard shows token usage by model, day, and API key. You can set monthly spend limits and usage alerts directly in the dashboard. For per-session tracking, Claude Code shows token usage in the session output. You can also add a CLAUDE.md instruction that logs token counts at the end of each session — the free guide below includes this config.
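For baseline tracking, a small helper like this turns raw per-session token counts into dollar figures you can compare week over week. The rates and the session numbers are assumptions for illustration — use your own dashboard figures:

```python
# Convert per-session token counts into cost, using the Sonnet
# per-million rates quoted earlier (assumed; check current pricing).
RATES = {"input": 3.00, "output": 15.00}  # $ per 1M tokens

def session_cost(input_tokens, output_tokens):
    """Dollar cost of one session given raw token counts."""
    return (input_tokens * RATES["input"]
            + output_tokens * RATES["output"]) / 1_000_000

# A week of sessions: (input_tokens, output_tokens)
week = [(180_000, 22_000), (95_000, 14_000), (310_000, 40_000)]
total = sum(session_cost(i, o) for i, o in week)
print(round(total, 2))
```

Run it against a week of sessions before applying .claudeignore and targeted prompts, then again after, and the 30-60% claim becomes a number you measured yourself.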

The most actionable monitoring approach is comparing usage before and after applying .claudeignore and targeted prompts. Most users see a noticeable drop in the first week. The guide walks through setting up baseline tracking so you can measure the improvement.

Free download

Get the Token Optimization Kit

CLAUDE.md token rules, .claudeignore template, /compact decision guide, prompt caching config, cost comparison table, and 5 prompt patterns with before/after examples. All copy-paste ready.

The full system

AI Workflow Audit — $750

The guide gets you the core fixes. The audit covers your entire Claude Code setup — CLAUDE.md review and rewrite, MCP server audit, prompt pattern diagnosis, and a prioritized action list delivered as a 75-minute Google Meet walkthrough with copy-paste configs.

Most setups I audit waste 30-60% of their token budget on context bloat, full-file reads that should be targeted searches, and MCP servers running more calls than necessary. The guide covers the obvious wins. The audit finds the rest.

CLAUDE.md review + rewrite

Token-focused rules added: grep-before-read, line-range reads, compact triggers, ignore patterns. Your config rebuilt from the ground up.

MCP server stack audit

Each server reviewed for call frequency and token cost. Identify which servers add value and which are creating unnecessary overhead.

Prompt pattern diagnosis

Your most-used prompts reviewed for specificity and token efficiency. Before/after rewrites included for the top 5 patterns.

Caching configuration

Identify which parts of your system prompt and context blocks are eligible for prompt caching. Set up correctly, caching cuts repeated costs by 90%.
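The 90% figure comes from cache-read pricing: cached input tokens are billed at roughly 10% of the base input rate, with a one-time write premium (about 25% over base) when the cache entry is created. Those multipliers are assumptions — verify them against the current pricing docs — but the break-even arithmetic looks like this:

```python
# Rough prompt-caching arithmetic for a reused system prompt.
# Assumed multipliers: cache write ≈ 1.25x base input price,
# cache read ≈ 0.10x base input price (verify against current docs).
BASE = 3.00        # $ per 1M input tokens (Sonnet)
WRITE_MULT = 1.25  # one-time premium to create the cache entry
READ_MULT = 0.10   # every subsequent cache hit

def uncached_cost(prompt_tokens_m, turns):
    """Re-sending the same prompt at full price on every turn."""
    return prompt_tokens_m * BASE * turns

def cached_cost(prompt_tokens_m, turns):
    """One cache write, then cheap cache reads on every later turn."""
    write = prompt_tokens_m * BASE * WRITE_MULT
    reads = prompt_tokens_m * BASE * READ_MULT * (turns - 1)
    return write + reads

# 50K-token (0.05M) system prompt reused across 40 turns:
print(uncached_cost(0.05, 40))  # ~6.00
print(cached_cost(0.05, 40))    # ~0.77
```

The longer the session and the larger the stable prefix, the closer the savings get to that 90% ceiling; short sessions with tiny prompts may not clear the write premium.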

Agent workflow review

Multi-step agent workflows diagnosed for redundant file reads, re-processing, and loops that could be collapsed into fewer turns.

10-min follow-up Loom

Within 14 days, send any question about your setup. I record a screen-share walkthrough answering it.

Book Your AI Workflow Audit — $750

75-minute Google Meet. Copy-paste deliverables. No upsells.

Guarantee: if the audit does not surface at least 5 actionable improvements to your Claude setup, full refund. Same day.

FAQ

Common questions

How much does Claude Code cost on the API?
It varies by model. Haiku 3.5 is the cheapest at around $0.80 per million input tokens. Sonnet 3.7 runs about $3.00 input / $15.00 output per million. Opus 4 is the highest tier. The Batch API offers a 50% discount for non-urgent workloads.
How many tokens does a Claude Code session use?
A typical session uses 50K-500K tokens depending on task complexity, file sizes, and session length. Multi-file refactors and agentic loops at the high end. Short focused tasks at the low end. Applying the techniques in this guide routinely cuts usage 30-60%.
What happens when I hit my API usage limit?
Requests fail with rate limit errors until the limit resets. You can request higher rate limits from Anthropic, and you can set monthly spend limits in the API dashboard to avoid surprise bills.
Does /compact lose important context?
It summarizes, so some raw detail gets compressed. Key decisions, file paths, and implementation context are preserved. The risk is low if you use /compact when the session is mostly resolved rather than mid-task. The guide includes a decision tree for when to compact vs clear.
What is a .claudeignore file and how does it save tokens?
A .claudeignore file works like .gitignore but for Claude Code. It tells Claude to skip reading specified directories and file patterns. Without it, Claude may read node_modules, dist, .git, and lock files — none of which contain useful code. The guide has a complete template.
How do MCP servers affect token consumption?
Each MCP tool call uses tokens for the request, the tool schema description, and the response. More MCP servers in your config means more tokens spent on tool definitions even when you don't call them. Audit which servers you actually use — the audit does this systematically.
Can I set workspace spend limits?
Yes. The Anthropic API dashboard lets you set monthly spend limits and usage alerts per API key. You can also set per-key rate limits to cap a single project or team member.
Is the API cheaper than Claude Max for heavy usage?
Depends on volume. Claude Max at $100-200/mo is typically better for 6+ hours of Claude Code per day. The API is cheaper for occasional or burst usage. This guide covers API optimization. For subscription credit management, see the managing subscription credits guide.
Got a quick question?

Sep usually replies within a few hours

Or email us at sep@zioadvertising.com