Claude Code /compact — Save Context Tokens

Claude Code compact usually refers to using Claude Code in a more token-efficient way so your coding session keeps the important context and drops or compresses the rest; if you are trying to save context tokens, the practical goal is to reduce what Claude has to carry forward on each turn. c-ai.chat is an independent guide to Claude, not Anthropic, and this page explains how compact-style workflows work, when to use them, and what trade-offs to expect. For the bigger picture, see our Claude Code guide.

The short answer
How it works
What you’d actually do with it
Vs. the alternatives
Other questions readers ask
The honest take

The short answer

Claude Code compact is a shorthand for trimming, summarising, or otherwise reducing the amount of conversation and code context Claude Code needs to keep in memory during a coding session. It is for developers who hit context limits, pay attention to token cost, or want Claude Code to stay focused on the current task instead of dragging along every previous file, error log, and dead-end idea.

What it does · reduces carried context so sessions stay efficient
Where it runs · in Claude Code workflows tied to Claude models and the Claude ecosystem
What it costs · no separate compact fee; usage depends on your Claude plan or API token use
Who it’s for · developers working on longer code sessions, larger repos, or multi-step debugging

There is no separate public Claude pricing line item called “compact.” The cost question is really about the model and access path you use. On the API side, Claude Sonnet 4.6 is priced at $3 per million input tokens and $15 per million output tokens, Claude Haiku 4.5 at $1 and $5, and Claude Opus 4.7 at $5 and $25. Prompt caching can cut cached input cost by 90%, which matters when you keep reusing the same codebase context. You can review official plan details on Claude pricing and broader options in our Claude pricing guide.

How it works

The basic mechanism is simple: Claude Code performs better when it sees the right context, not all context. In a long session, token usage grows because the model may need earlier instructions, file excerpts, diffs, stack traces, command output, and previous decisions. A compact workflow reduces that load by keeping the parts that still matter and compressing or discarding the rest.

In practice, developers do this in a few ways. They restate the current objective in one sentence, provide only the files or functions that matter, ask Claude to summarise previous work into a short handoff note, or start a fresh thread with a compact brief instead of continuing a sprawling session. If you are using API-based tooling, prompt caching can also help because repeated project context does not need to be paid for at the full rate every time. For related model and access details, see our Claude API overview and Claude features page.

Compact does not mean “remove all detail.” The goal is selective retention. Keep stable facts such as architecture, coding standards, open bugs, and the exact task. Drop bulky logs, repeated explanations, and code that Claude no longer needs to reason about the next step.

State the current task

Start with one tight instruction such as Refactor the auth middleware to support role-based checks without changing the public API.
Attach only the relevant context

Pass the specific files, functions, error messages, or test failures involved now. Avoid dumping the whole repository if only three files matter.
Ask for a compressed handoff

Use a prompt like Summarise what we changed, what is still broken, and which files matter in under 200 words. That summary becomes your next-session context.
Start fresh when the thread gets noisy

Open a clean session and paste the compact handoff instead of carrying every previous exchange forward.

This approach matters more as context windows get larger, not less. Claude supports large context on certain models, including up to 1,000,000 tokens on Claude Opus 4.7, Opus 4.6, and Sonnet 4.6 at standard rates according to Anthropic’s published pricing and model documentation. A big context window is useful, but bigger windows also make it easier to be sloppy. Compacting keeps the model attentive and your spend more predictable.

90% off

cached input tokens with prompt caching

If you are building your own Claude-powered coding workflow, this cost optimisation is often more important than chasing the largest model. A clean reusable project prompt plus caching is usually better than sending a bloated prompt on every call.

What you’d actually do with it

Here are concrete cases where a compact workflow helps. These examples are less about a hidden feature and more about how skilled users keep Claude Code useful across longer sessions.

1. Compress a debugging session before it sprawls

Say you have spent 30 minutes pasting stack traces, trying two fixes, and discussing edge cases. Instead of continuing in the same overloaded thread, ask for a compact handoff:

Summarise this debugging session for a fresh Claude Code run.
Include:
- root cause we suspect
- fixes already attempted
- files involved
- exact failing test
- next best step

Keep it under 180 words.

You then start a clean session with that summary plus the failing test output. The benefit is not only lower token use. It also reduces the chance that Claude keeps anchoring on abandoned hypotheses.

2. Refactor one module without dragging in the whole repo

Many developers overshare context. If the task is isolated, make the scope explicit:

Work only on these files:
- src/auth/middleware.ts
- src/auth/policies.ts
- tests/auth.middleware.spec.ts

Ignore unrelated services unless a dependency is required.
Goal: add role hierarchy checks and keep current route signatures unchanged.

This is compacting by boundary-setting. You are telling Claude what not to think about. That often improves answer quality as much as it lowers context growth.

3. Create a reusable project brief for repeat work

If you work on the same application every day, create a stable short brief and reuse it:

Project brief:
- Next.js app with TypeScript
- PostgreSQL via Prisma
- Stripe subscriptions
- We prefer small pure functions and explicit types
- Write tests with Vitest
- Do not change public API routes without asking

Current task:
Fix duplicate webhook handling in billing sync.

This keeps each new Claude Code session compact from the start. If you are using API workflows, repeated stable context can pair well with caching.

4. Estimate token cost for a longer coding loop

Suppose you repeatedly send a 200,000-token project context and get back 20,000 tokens of output on Claude Sonnet 4.6. The raw cost without caching would be straightforward to estimate from Anthropic’s published token pricing.

Worked example

Five long Sonnet 4.6 coding turns without caching

Input per turn200,000 tokens

Output per turn20,000 tokens

Five-turn input total1,000,000 tokens

Five-turn output total100,000 tokens

Input cost at $3/M$3.00

Output cost at $15/M$1.50

Total$4.50

A compact workflow can lower repeated input, and prompt caching can reduce cached input cost much further.

The point is not that $4.50 is always expensive. The point is that repeated unnecessary context compounds quickly in real engineering work.

5. Hand off a session to yourself or a teammate

Compact summaries are useful even outside solo prompting. Ask Claude Code for a strict handoff note:

Create a handoff note for the next engineer.
Format:
1. Goal
2. Current status
3. Files changed
4. Open questions
5. Next command to run

Keep it concise and factual.

This is helpful on Team or Enterprise setups where several people may touch the same workspace. Anthropic’s team plans add shared workspace and admin features, but clear compact handoffs still matter more than tooling alone.

Pick when

Your session is getting long and repetitive
You want lower token use
You need Claude to focus on the current bug or refactor
You plan to restart with a clean prompt

Skip when

The task depends on broad cross-repo context
You have not yet identified which files matter
The omitted history contains decisions Claude still needs
You are compressing so aggressively that key constraints disappear

Vs. the alternatives

People searching for “claude code compact” are often comparing Claude-based coding workflows with tools like Cursor, GitHub Copilot, or Sourcegraph Cody. The real comparison is not just model quality. It is how well each option manages context, how much control you get, and whether you want a chat-first workflow or a heavily integrated editor experience.

Option	Where it fits	Strengths	Trade-offs
Claude Code with compact workflow	Developers who want deliberate control over context	Strong reasoning, flexible summaries, works well for handoffs and long tasks	Requires more prompt discipline; compacting is partly a user habit, not magic
Cursor	Editor-centric coding with strong inline assistance	Tight IDE workflow, convenient codebase operations	May feel more opinionated; context handling is less transparent to some users
GitHub Copilot	Fast completions and common coding help inside popular editors	Low friction, broad adoption, good for short iterative work	Less suited to long structured reasoning unless paired with careful prompting
Sourcegraph Cody	Code search and repository-aware assistance	Useful for understanding larger codebases	Value depends heavily on your code search and repo workflow
Direct Claude API workflow	Teams building custom coding agents or internal tools	Fine control over prompts, caching, batching, and model choice	More engineering work to build and maintain

The honest trade-off is this: Claude-style compact workflows reward explicit thinking. If you want the assistant to infer everything from a giant repository plus a vague request, a more IDE-opinionated tool may feel easier. If you want to control what context is in play and why, Claude is often a better fit.

The honest take

Claude Code compact is not a magic button that fixes bad prompts. It is a practical way of working: keep the current objective clear, keep the relevant files close, compress the history, and restart cleanly when a session gets noisy. If you regularly work on multi-step coding tasks, this habit can improve focus, reduce token waste, and make Claude more consistent from one session to the next.

For most developers, the best setup is simple: use Claude Sonnet 4.6 as the default, move to Opus 4.7 when the reasoning load is higher, and treat compact summaries as part of your normal workflow. If you need official access, use claude.ai for the product and Anthropic’s documentation for the technical details; if you want independent explanations, comparisons, and context, keep this page alongside our Claude Code guide.

Want the official product? — Open Claude directly, or compare workflows first in our independent guide.

Try Claude →

Independent guide. Not affiliated with Anthropic. For the official Claude product, visit claude.ai.

Last updated: 2026-05-12

Plans & pricing
Anthropic claude.com Official

Retrieved 2026-05-06
Models overview
Anthropic platform.claude.com Official

Retrieved 2026-05-06
Anthropic news
Anthropic anthropic.com Official

Retrieved 2026-05-06
Claude support center
Anthropic support.anthropic.com Official

Retrieved 2026-05-06
Anthropic Trust Center
Anthropic trust.anthropic.com Official

Retrieved 2026-05-06

The short answer

How it works

State the current task

Attach only the relevant context

Ask for a compressed handoff

Start fresh when the thread gets noisy

What you’d actually do with it

1. Compress a debugging session before it sprawls

2. Refactor one module without dragging in the whole repo

3. Create a reusable project brief for repeat work

4. Estimate token cost for a longer coding loop

5. Hand off a session to yourself or a teammate

Pick when

Skip when

Vs. the alternatives

Other questions readers ask

The honest take

Related Reading