Claude Token Cost — Per-Token Pricing

Claude token cost is usage-based API pricing: you pay per million input and output tokens, with current headline rates of $5/$25 for Opus 4.7, $3/$15 for Sonnet 4.6, and $1/$5 for Haiku 4.5. This guide is from c-ai.chat, an independent reference site, and it breaks down the prices, how token billing works, when costs rise, and where discounts apply.

What it does at a glance

Claude token cost is the amount Anthropic charges for API usage, based on how many tokens you send in and how many tokens Claude sends back. Rates differ by model, and prompt caching and the Batch API offer further savings.

  • Opus 4.7 · $5/M input, $25/M output
  • Sonnet 4.6 · $3/M input, $15/M output
  • Haiku 4.5 · $1/M input, $5/M output
  • Discounts · 90% off cached input, 50% off Batch API

If you are comparing model tiers first, see our Claude models guide. If you are deciding between chat subscriptions and API billing, the short version is simple: plans like Free, Pro, Max, Team, and Enterprise cover product access in Claude apps, while token cost applies to developer usage through the Claude API.

Model | Input tokens | Output tokens | Typical use
Claude Opus 4.7 | $5 per million | $25 per million | Highest-quality reasoning, coding, complex long-context work
Claude Sonnet 4.6 | $3 per million | $15 per million | Default choice for most production apps
Claude Haiku 4.5 | $1 per million | $5 per million | Fast, cheap classification, extraction, and lightweight chat

There is also a practical split between API costs and product plans at claude.com/pricing. Free is $0/month. Pro is $20/month or $17/month annual. Max starts from $100/month. Team Standard is $25/seat/month or $20/seat/month annual, and Team Premium is $125/seat/month or $100/seat/month annual. Enterprise starts with a $20/seat base plus usage at API rates. Those subscription prices do not replace token billing for API use.

How it works

Tokens are the units Claude uses to process text. When your app sends a prompt, system instructions, tool definitions, conversation history, or attached text into the model, those count as input tokens. When Claude replies, that reply counts as output tokens. Your bill is the sum of the two: input tokens charged at the input rate of the model you picked, plus output tokens charged at its output rate.
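The billing rule above can be sketched in a few lines of Python. This is a minimal illustration that uses the headline rates quoted in this article. The dictionary keys are hypothetical labels chosen for readability, not official API model identifiers, and the function name is an assumption as well.

```python
# Headline per-million-token rates quoted in this article (USD).
# The keys are illustrative labels, not official API model identifiers.
RATES_PER_MILLION = {
    "opus-4.7": {"input": 5.00, "output": 25.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "haiku-4.5": {"input": 1.00, "output": 5.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD of a single request.

    Input and output tokens are priced separately, each at its own
    per-million rate, and the two amounts are added together.
    """
    rates = RATES_PER_MILLION[model]
    input_cost = input_tokens / 1_000_000 * rates["input"]
    output_cost = output_tokens / 1_000_000 * rates["output"]
    return input_cost + output_cost
```

Calling `request_cost("sonnet-4.6", 100_000, 10_000)` gives roughly 0.45, which is $0.30 of input plus $0.15 of output. Real invoices may differ, since this sketch ignores discounts, retries, and any rate changes.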

Costs vary so much for two main reasons. First, output is priced five times higher than input on all three models listed here. Second, long conversations silently add up because every turn can include the previous context again. That means a cheap-looking prompt can become expensive if you keep appending large documents, allow very long answers, or run the same heavy prompt at scale. For teams building production systems, prompt design matters almost as much as model choice. If you are working with code-centric workflows, our Claude Code guide explains where token-heavy development loops appear.
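To see how resent history inflates input usage, here is a small sketch. It assumes the application resends the entire conversation on every turn, with no truncation, summarisation, or caching, and it ignores system prompts. The function name and the turn sizes in the usage note are hypothetical.

```python
def billed_input_per_turn(turns: list[tuple[int, int]]) -> list[int]:
    """Return the input tokens billed on each turn of a conversation.

    `turns` holds (user_tokens, assistant_tokens) pairs, one per turn.
    Assumption: the full prior history is resent with every new message.
    """
    history = 0  # tokens accumulated from all earlier turns
    billed = []
    for user_tokens, assistant_tokens in turns:
        input_tokens = history + user_tokens  # old context plus new message
        billed.append(input_tokens)
        # The reply joins the history that the next turn will resend.
        history = input_tokens + assistant_tokens
    return billed
```

With three identical turns of 200 user tokens and 500 reply tokens, `billed_input_per_turn([(200, 500)] * 3)` returns `[200, 900, 1600]`. The user typed only 600 tokens in total, but 2,700 input tokens were billed.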

Anthropic offers two important cost levers. Prompt caching cuts the price of cached input tokens by 90%, which helps when your application keeps reusing large, stable instructions or documents. The Batch API cuts both input and output prices by 50% when you can tolerate asynchronous processing rather than immediate responses. Long context up to 1,000,000 tokens is available on Opus 4.7, Opus 4.6, and Sonnet 4.6 at standard rates. A larger context window does not make usage cheaper, though. It just gives you room to send more, which can raise the final bill if you are not careful.
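The two levers can be compared with a rough sketch like the one below. It applies only the discounts stated in this article: 90% off cached input tokens and 50% off both directions for batch requests. It does not model any charge for writing to the cache, and it keeps the two discounts separate because this guide does not cover whether they combine. Function and parameter names are illustrative.

```python
CACHE_DISCOUNT = 0.90  # 90% off cached input tokens, as stated in this article
BATCH_DISCOUNT = 0.50  # 50% off input and output via the Batch API


def cached_request_cost(input_tokens: int, cached_tokens: int,
                        output_tokens: int, input_rate: float,
                        output_rate: float) -> float:
    """Cost in USD when `cached_tokens` of the input are billed at the cached rate.

    Rates are USD per million tokens. Cache-write pricing is not modelled.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    fresh_cost = fresh_tokens / 1_000_000 * input_rate
    cached_cost = cached_tokens / 1_000_000 * input_rate * (1 - CACHE_DISCOUNT)
    output_cost = output_tokens / 1_000_000 * output_rate
    return fresh_cost + cached_cost + output_cost


def batch_request_cost(input_tokens: int, output_tokens: int,
                       input_rate: float, output_rate: float) -> float:
    """Cost in USD of the same request sent through the Batch API."""
    standard = (input_tokens / 1_000_000 * input_rate
                + output_tokens / 1_000_000 * output_rate)
    return standard * (1 - BATCH_DISCOUNT)
```

For a Sonnet 4.6 request with 100,000 input tokens (80,000 of them cached) and 10,000 output tokens, `cached_request_cost(100_000, 80_000, 10_000, 3.0, 15.0)` comes to about $0.234, and `batch_request_cost(100_000, 10_000, 3.0, 15.0)` to about $0.225, against $0.45 at standard rates.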

Worked example

Estimating one Sonnet 4.6 request

Input: 100,000 tokens at $3/M = $0.30
Output: 10,000 tokens at $15/M = $0.15
Total = $0.45

A single large request can still be affordable, but repeated high-volume use is where model choice and prompt trimming start to matter.

For feature-level context beyond pricing, see our overview of Claude features. The billing model stays the same even when you add tools, large prompts, or longer reasoning chains: more tokens in or out means higher cost.

When this feature actually helps

Understanding Claude token cost helps most when you need to predict spend before shipping an app, choose the right model for a workload, or explain to a team why two similar prompts can have very different bills.

  • Budgeting an internal AI tool: If your company wants document Q&A, support drafting, or analyst workflows, token pricing tells you whether Sonnet 4.6 is enough or Opus 4.7 is worth the premium.
  • Choosing between quality and speed: Haiku 4.5 is often the low-cost fit for classification, tagging, routing, and extraction tasks where top-end reasoning is not necessary.
  • Controlling long-context costs: If you send large files or transcript history, knowing the input rate stops you from treating context as free.
  • Optimising repeated prompts: Teams with stable system prompts, policy blocks, or reusable reference docs can save heavily with prompt caching.
  • Estimating unit economics: Founders and product teams can translate tokens into cost per chat, cost per report, or cost per user per month.

Pick when

  • You need a clear cost model before launching an API workflow
  • You want to compare Opus 4.7, Sonnet 4.6, and Haiku 4.5 on budget, not just quality
  • Your prompts are large, repeated, or sent at high volume
  • You need to explain spend drivers to non-technical stakeholders

Skip when

  • You only use the consumer Claude app and do not touch the API
  • Your usage is tiny enough that a rough monthly estimate is sufficient
  • You assume model subscriptions and API billing are the same thing
  • You need an exact invoice forecast without testing real prompt lengths

A common pattern is to start with Sonnet 4.6 as the default, move simple bulk jobs to Haiku 4.5, and reserve Opus 4.7 for requests where better reasoning changes the outcome enough to justify the higher output price. That is usually more efficient than putting every request through the most expensive model.
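A rough way to check whether this tiering pays off is to compute a blended cost per request. The sketch below uses the headline rates from this article and assumes, for simplicity, that every request has the same token counts regardless of which model handles it. The model labels, function names, and traffic shares are hypothetical.

```python
# Headline per-million-token rates quoted in this article (USD).
# The keys are illustrative labels, not official API model identifiers.
RATES_PER_MILLION = {
    "opus-4.7": {"input": 5.00, "output": 25.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "haiku-4.5": {"input": 1.00, "output": 5.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request on the given model."""
    rates = RATES_PER_MILLION[model]
    return (input_tokens / 1_000_000 * rates["input"]
            + output_tokens / 1_000_000 * rates["output"])


def blended_cost_per_request(traffic_share: dict[str, float],
                             input_tokens: int, output_tokens: int) -> float:
    """Average cost per request when traffic is split across models.

    `traffic_share` maps a model label to its fraction of all requests.
    """
    if abs(sum(traffic_share.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * request_cost(model, input_tokens, output_tokens)
               for model, share in traffic_share.items())
```

With 2,000 input and 500 output tokens per request, sending 60% of traffic to Haiku 4.5, 30% to Sonnet 4.6, and 10% to Opus 4.7 averages about $0.009 per request, compared with about $0.0225 if everything went through Opus 4.7. The split itself is invented for illustration and would need to come from your own workload.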

What it can’t do

Token pricing is useful, but it does not tell you your real bill by itself. It cannot predict exact costs without knowing prompt length, response length, retry rates, tool calls, concurrency, and how much repeated context your application sends on each request. It also does not tell you whether a cheaper model will actually perform well enough for your task.

  • It is not flat-rate: there is no single per-message cost because each request can be a different size.
  • Long chats can mislead you: conversation history often gets resent, so cost can rise even when the newest user message is short.
  • Output can dominate spend: output costs five times the input rate on every model listed here, so verbose answers add up fastest on Opus 4.7 and Sonnet 4.6, where the absolute rates are highest.
  • Prompt caching is conditional: you only save when the same input is actually cacheable and reused.
  • Batch API is not instant: the 50% discount trades speed for lower price.
  • Subscriptions do not eliminate API charges: paying for Pro, Max, Team, or Enterprise access does not make token billing disappear for API workloads.

Other questions readers ask

For official reference, Anthropic publishes plan pricing at claude.com/pricing, model information at platform.claude.com, API pricing at platform.claude.com/docs, and service availability on status.claude.com.

The honest take

Claude token cost is straightforward once you separate two things: app subscriptions and API billing. For API use, Anthropic charges per million tokens, with Haiku 4.5 as the cheapest option, Sonnet 4.6 as the practical default for many workloads, and Opus 4.7 as the premium choice when higher-quality reasoning or coding output is worth the extra spend. The biggest pricing trap is not the headline rate. It is unnecessary context, long outputs, and repeated prompts that quietly multiply usage.

If you only need Claude through the official app, a monthly plan may be the more relevant question. If you are building with the API, token accounting matters immediately. Start with a small real-world test set, measure average input and output lengths, then optimise with prompt caching or Batch API where it fits. That gives you a usable cost model instead of guesswork.
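The measure-then-estimate approach described above might look like the following sketch. It assumes you have already recorded input and output token counts for a handful of real requests. The sample numbers in the usage note, the function name, and the projected request volume are all hypothetical.

```python
from statistics import mean


def estimate_from_samples(samples: list[tuple[int, int]],
                          input_rate: float, output_rate: float,
                          expected_requests: int) -> dict[str, float]:
    """Project spend from measured (input_tokens, output_tokens) pairs.

    Rates are USD per million tokens. The projection assumes future
    traffic resembles the test set, which should be checked in practice.
    """
    avg_input = mean(s[0] for s in samples)
    avg_output = mean(s[1] for s in samples)
    cost_per_request = (avg_input / 1_000_000 * input_rate
                        + avg_output / 1_000_000 * output_rate)
    return {
        "avg_input_tokens": avg_input,
        "avg_output_tokens": avg_output,
        "cost_per_request": cost_per_request,
        "projected_cost": cost_per_request * expected_requests,
    }
```

For example, three measured requests of `(1200, 300)`, `(1800, 500)`, and `(3000, 400)` tokens average 2,000 input and 400 output tokens. At Sonnet 4.6 rates that is about $0.012 per request, or about $120 for 10,000 requests, before any caching or batch discount.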

Independent guide. Not affiliated with Anthropic. For the official Claude product, visit claude.ai.

Last updated: 2026-05-12