Claude 3.5 Sonnet API Pricing

Claude 3.5 Sonnet is no longer listed as a current tier in Anthropic’s API pricing. The closest current replacement is Claude Sonnet 4.6 at $3 per million input tokens and $15 per million output tokens. This guide explains how that maps to older searches, how billing works, and where costs usually change in practice. For a broader overview, see our Claude pricing guide or the main Claude API hub.

The short answer
How it works
What it costs
Limits and gotchas
Other questions readers ask
The honest take

The short answer

If you searched for claude 3.5 sonnet api pricing, the practical answer is this: Anthropic’s current general-purpose default is Claude Sonnet 4.6, priced at $3/M input tokens and $15/M output tokens through the API on platform.claude.com. Older searches for Claude 3.5 Sonnet usually mean “what does the mid-tier Claude model cost in the API?”, and today that price point is Sonnet 4.6, not a separate 3.5 listing on the current pricing page.

Anthropic bills API usage by tokens processed, not by request. Input tokens are what you send in the prompt and conversation history. Output tokens are what Claude generates back. If you are comparing model families, the current lineup is simple: Haiku 4.5 is cheapest, Sonnet 4.6 is the balanced default, and Opus 4.7 is the premium option. If you are new to the ecosystem, our independent Claude guide, Claude features overview, and Claude Code guide give the broader context.

Current model	Position	Input price	Output price
Claude Haiku 4.5	Fastest and cheapest	$1/M tokens	$5/M tokens
Claude Sonnet 4.6	Recommended default	$3/M tokens	$15/M tokens
Claude Opus 4.7	Flagship	$5/M tokens	$25/M tokens

Worked example

Simple Sonnet 4.6 API cost

300,000 input tokens: $0.90
40,000 output tokens: $0.60
Total: $1.50

For many app workloads, output tokens drive cost faster than input.

cost = (input_tokens / 1_000_000 * 3) + (output_tokens / 1_000_000 * 15)
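The formula above can be wrapped in a small helper. This is a minimal sketch that assumes the Sonnet 4.6 rates quoted in this guide. The function name `sonnet_cost` and the constant names are illustrative and are not part of any Anthropic SDK.

```python
# Rates quoted in this guide for Claude Sonnet 4.6 (USD per million tokens).
# Hypothetical constant names; confirm the numbers on the official pricing page.
SONNET_INPUT_PER_M = 3.00
SONNET_OUTPUT_PER_M = 15.00


def sonnet_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost at the base Sonnet 4.6 rates."""
    input_cost = input_tokens / 1_000_000 * SONNET_INPUT_PER_M
    output_cost = output_tokens / 1_000_000 * SONNET_OUTPUT_PER_M
    return input_cost + output_cost


# Worked example from this section: 300,000 input and 40,000 output tokens.
print(f"${sonnet_cost(300_000, 40_000):.2f}")  # prints "$1.50"
```

Keeping input and output as separate terms makes it easy to see which side of the request dominates the bill.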

How it works

Anthropic’s API pricing works on a token-metered model. Each request includes prompt content, system instructions, tool schemas, conversation history, and any other text you send. That becomes input tokens. Claude then returns generated text, structured output, or tool calls, which become output tokens. The API meter tracks both sides separately, so the same request pattern can be cheap or expensive depending on prompt length and answer length.
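To make the input and output split concrete, the sketch below adds up the pieces of a single request. It assumes you already know the token count of each piece. The class and field names are hypothetical, and the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RequestTokens:
    """Token counts for one request. All field names are hypothetical."""
    system: int        # system instructions
    tools: int         # tool schemas and definitions
    history: int       # prior conversation turns resent with the request
    user_message: int  # the new prompt content
    output: int        # tokens generated in the response

    @property
    def input_total(self) -> int:
        # Everything sent to the model is metered as input.
        return self.system + self.tools + self.history + self.user_message


# Illustrative numbers only.
req = RequestTokens(system=1_200, tools=2_500, history=6_000,
                    user_message=300, output=800)
print(req.input_total)  # 10000
print(req.output)       # 800
```

In this made-up request the new user message is only 300 of the 10,000 input tokens. The rest is context that rides along with every call.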

In practice, developers estimate cost by multiplying total input and output token counts by the model’s listed per-million-token rate from the official pricing documentation, then adjusting for optimisations like prompt caching or the Batch API. Anthropic also documents model capabilities and availability in its models overview. If you need implementation details, the official API docs at docs.claude.com and platform.claude.com are the primary references.

Choose the model

Pick claude-sonnet-4-6 for the usual balance of cost and quality, claude-haiku-4-5 for low-latency budget work, or claude-opus-4-7 when answer quality matters more than price.

Count both token directions

Estimate prompt size, conversation history, tool definitions, and expected completion length. A short prompt with a long answer can still cost more on output than input.

Apply the model rate

Use the listed per-million-token rates from Anthropic. The basic formula is input spend plus output spend, measured separately.

Reduce waste

Use prompt caching when the same long instructions repeat, and use Batch API when your workload is not latency-sensitive.

The key billing mistake is assuming “one API call” has one fixed price. It does not. The token mix matters more than the request count.
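The point about token mix can be checked with a rate lookup. This is a rough sketch using the rates and model names listed in this guide. The `RATES` dictionary and `estimate_cost` function are illustrative names, not an official API.

```python
# USD per million tokens (input, output), as listed in this guide.
# The dictionary layout is an assumption made for this example.
RATES = {
    "claude-haiku-4-5": (1.00, 5.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-7": (5.00, 25.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Input spend plus output spend, measured separately."""
    input_rate, output_rate = RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Same request count (1,000 calls), same prompts, different answer lengths.
short_answers = estimate_cost("claude-sonnet-4-6", 1_000 * 2_000, 1_000 * 200)
long_answers = estimate_cost("claude-sonnet-4-6", 1_000 * 2_000, 1_000 * 2_000)
print(f"${short_answers:.2f} vs ${long_answers:.2f}")  # prints "$9.00 vs $36.00"
```

Both runs make the same number of calls, yet the second costs four times as much because of longer completions.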

What it costs

Bar chart of Claude API pricing — current model lineup.

The current Anthropic API pricing relevant to this query is straightforward. Claude Sonnet 4.6 costs $3 per million input tokens and $15 per million output tokens. If you landed here looking for Claude 3.5 Sonnet, treat Sonnet 4.6 as the current equivalent reference point on the active pricing page at claude.com/pricing.

Model	Input	Output	Best fit
Claude Haiku 4.5	$1/M tokens	$5/M tokens	High-volume, speed-sensitive tasks
Claude Sonnet 4.6	$3/M tokens	$15/M tokens	General app default
Claude Opus 4.7	$5/M tokens	$25/M tokens	Highest-quality reasoning and output

That table covers the base rates. Two official discounts matter a lot in real workloads. Prompt caching gives 90% off cached input tokens. That is useful when you repeatedly send the same large instructions, policy blocks, documents, or tool definitions. Batch API gives 50% off both input and output, which is often the cheapest route for asynchronous jobs such as nightly enrichment, backfills, or content classification.
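The two discounts can be compared with a quick calculation. The sketch below assumes Sonnet 4.6 base rates and the discount percentages quoted above. It deliberately ignores any cost of writing to the cache and does not model combining the two discounts, so treat it as a first approximation and check the official documentation for the exact rules. All names are illustrative.

```python
INPUT_RATE = 3.00    # Sonnet 4.6, USD per million input tokens (from this guide)
OUTPUT_RATE = 15.00  # Sonnet 4.6, USD per million output tokens (from this guide)
CACHED_INPUT_MULTIPLIER = 0.10  # 90% off cached input tokens
BATCH_MULTIPLIER = 0.50         # 50% off both input and output


def base_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000


def cost_with_caching(fresh_input: int, cached_input: int, output_tokens: int) -> float:
    """Cost when part of the input is served from the prompt cache.

    Simplification: ignores any cost of writing to the cache.
    """
    cached_rate = INPUT_RATE * CACHED_INPUT_MULTIPLIER
    total = (fresh_input * INPUT_RATE
             + cached_input * cached_rate
             + output_tokens * OUTPUT_RATE)
    return total / 1_000_000


def cost_with_batch(input_tokens: int, output_tokens: int) -> float:
    """Cost when the whole job runs through the Batch API."""
    return base_cost(input_tokens, output_tokens) * BATCH_MULTIPLIER


# 1,000,000 input tokens (800,000 of them repeated context), 100,000 output.
print(f"base:    ${base_cost(1_000_000, 100_000):.2f}")
print(f"caching: ${cost_with_caching(200_000, 800_000, 100_000):.2f}")
print(f"batch:   ${cost_with_batch(1_000_000, 100_000):.2f}")
```

Caching only touches the repeated part of the input, while batch halves the whole bill, so which one wins depends on how much of the prompt repeats and whether the job can wait.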

90% off

cached input tokens with prompt caching

Anthropic also supports long context on the higher-capability models at standard rates. According to the current pricing documentation, Opus 4.7, Opus 4.6, and Sonnet 4.6 support up to 1,000,000 tokens of context. Long context is useful, but it can raise cost quickly because every extra token you send is billable input unless it is served from cache.
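To see how quickly long context adds up, the sketch below prices uncached input at the Sonnet 4.6 rate for a few prompt sizes. The request volume of 500 per day is a made-up figure, and the function name is illustrative.

```python
SONNET_INPUT_PER_M = 3.00  # USD per million input tokens, from this guide


def daily_input_cost(tokens_per_request: int, requests_per_day: int) -> float:
    """Uncached input spend per day at the standard Sonnet 4.6 rate."""
    total_tokens = tokens_per_request * requests_per_day
    return total_tokens / 1_000_000 * SONNET_INPUT_PER_M


# Hypothetical workload: 500 requests per day at three prompt sizes.
for tokens in (10_000, 100_000, 1_000_000):
    cost = daily_input_cost(tokens, requests_per_day=500)
    print(f"{tokens:>9,} tokens per request: ${cost:,.2f} per day")
```

At the full 1,000,000-token window, each uncached request costs $3 in input alone before any output is generated.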

Pick Sonnet 4.6 when

You want the default balance of quality, speed, and spend
You need stronger output than Haiku 4.5 without paying Opus 4.7 rates
Your app mixes chat, summarisation, extraction, and coding tasks

Skip Sonnet 4.6 when

Latency and cost matter more than answer quality
Your workload is simple enough for Haiku 4.5
You need the highest-end reasoning and can justify Opus 4.7 pricing

For non-API users, Claude also has consumer and team plans on the product side. The current subscription lineup includes Free at $0/month, Pro at $20/month or $17/month annual, Max from $100/month, Team Standard at $25/seat/month or $20/seat/month annual, Team Premium at $125/seat/month or $100/seat/month annual, and Enterprise at $20/seat base plus usage at API rates. Those plans live on the app side, while this page focuses on API billing. If you need the app-plan breakdown, use our pricing guide.

A few budgeting rules help. First, watch output length: at the listed rates, an output token costs five times as much as an input token on all three current models. Second, avoid resending large static instructions on every turn if prompt caching can cover them. Third, test with real production prompts, not toy examples. Small prompt changes can have a bigger cost effect than switching models.

Limits and gotchas

Cost-optimisation discounts (prompt caching + Batch API).

Developers usually get surprised by limits, availability details, and billing edge cases rather than the list price itself. These are the points to watch before you estimate spend or ship an integration.

Rate limits are account-specific. Anthropic can apply tiered rate limits by account, model, and usage pattern, so your effective throughput may differ from another developer’s setup.
Model access can depend on account status. Not every account gets every model immediately. Check your workspace in platform.claude.com before planning around a specific model name.
Older model names may disappear from current pricing pages. That is why searches for Claude 3.5 Sonnet can be confusing. Current pricing pages list active models, not every historical alias or release.
Long context is powerful but expensive. A 1,000,000-token window does not mean every request should be huge. Large prompts can dominate cost and latency.
Prompt history compounds input spend. In chat-style apps, each turn may resend prior context. If you do not trim conversation history, costs rise quietly.
Tool definitions count as input. Developers often forget that large schemas, instructions, and tool manifests add billable tokens.
Batch API is cheaper, not faster. The 50% discount is attractive, but it is designed for asynchronous workflows, not user-facing responses.
Prompt caching helps repeated context only. If each request is mostly new text, the caching discount may not move the total much.
Regional, compliance, and enterprise controls vary by plan. Features like regional data residency, SCIM, audit logs, role-based access, and spend controls are associated with enterprise offerings, not the default consumer setup.
Status incidents do happen. If latency or availability matters, monitor status.claude.com instead of assuming a cost issue is always an application bug.
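The note above about prompt history compounding input spend is easy to underestimate, so here is a simplified model of it. It assumes every turn adds the same number of tokens to the conversation and that each request resends all retained prior turns plus the new one. Real conversations are less uniform, and the function and parameter names are hypothetical.

```python
from typing import Optional

SONNET_INPUT_PER_M = 3.00  # USD per million input tokens, from this guide


def cumulative_input_tokens(turns: int, tokens_per_turn: int,
                            max_history_turns: Optional[int] = None) -> int:
    """Total input tokens billed across a conversation.

    Each request sends the retained prior turns plus the current turn.
    If max_history_turns is set, older turns are trimmed before sending.
    """
    total = 0
    for turn in range(1, turns + 1):
        prior = turn - 1
        if max_history_turns is not None:
            prior = min(prior, max_history_turns)
        total += (prior + 1) * tokens_per_turn
    return total


untrimmed = cumulative_input_tokens(turns=20, tokens_per_turn=500)
trimmed = cumulative_input_tokens(turns=20, tokens_per_turn=500, max_history_turns=4)
print(untrimmed, trimmed)  # 105000 45000
print(f"${untrimmed / 1_000_000 * SONNET_INPUT_PER_M:.3f} vs "
      f"${trimmed / 1_000_000 * SONNET_INPUT_PER_M:.3f}")
```

Under these assumptions, a 20-turn chat bills 105,000 input tokens even though only 10,000 tokens of new content were written. Keeping just the last four turns cuts that to 45,000.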

Another common gotcha is mixing up Claude app subscriptions with API usage. Paying for Pro or Max on claude.ai does not mean you get unlimited API tokens included. The API is billed separately through the developer platform. That distinction causes a lot of pricing confusion.

The honest take

If your search is really “what should I budget for Claude 3.5 Sonnet in the API?”, use Claude Sonnet 4.6 at $3/M input and $15/M output as the current answer. That is the active Sonnet-tier reference on Anthropic’s pricing pages. For most production apps, it is the sensible middle ground between Haiku’s low cost and Opus’s premium pricing.

The bigger pricing issue is rarely the list rate. It is prompt design, repeated context, and output length. Teams that cache repeated prompts, trim conversation history, and reserve Opus for the few tasks that need it usually control spend well. Teams that send huge prompts on every turn usually do not.
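As a rough illustration of reserving Opus for the tasks that need it, the sketch below computes an average cost per request for a given traffic split across models. The split, token counts, and all names are made up for the example. The rates are the ones listed in this guide.

```python
import math

# USD per million tokens (input, output), as listed in this guide.
RATES = {
    "claude-haiku-4-5": (1.00, 5.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-7": (5.00, 25.00),
}


def blended_cost_per_request(traffic_share: dict, input_tokens: int,
                             output_tokens: int) -> float:
    """Average cost per request when traffic is split across models.

    traffic_share maps model name to a fraction; fractions must sum to 1.
    """
    if not math.isclose(sum(traffic_share.values()), 1.0):
        raise ValueError("traffic shares must sum to 1")
    total = 0.0
    for model, share in traffic_share.items():
        input_rate, output_rate = RATES[model]
        per_request = (input_tokens * input_rate
                       + output_tokens * output_rate) / 1_000_000
        total += share * per_request
    return total


# Hypothetical workload: 2,000 input and 500 output tokens per request.
all_opus = blended_cost_per_request({"claude-opus-4-7": 1.0}, 2_000, 500)
routed = blended_cost_per_request(
    {"claude-haiku-4-5": 0.70, "claude-sonnet-4-6": 0.25, "claude-opus-4-7": 0.05},
    2_000, 500,
)
print(f"all Opus: ${all_opus:.5f}  routed: ${routed:.5f}")
```

With this invented split, the routed setup averages roughly a third of the all-Opus cost per request. Whether such a split preserves answer quality is something only testing on real traffic can show.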

Independent guide. Not affiliated with Anthropic. For the official Claude product, visit claude.ai.

Last updated: 2026-05-10

This article is part of the Claude API for developers hub on c-ai.chat.

Plans & pricing. Anthropic, claude.com (official). Retrieved 2026-05-06.
Models overview. Anthropic, platform.claude.com (official). Retrieved 2026-05-06.
Anthropic news. Anthropic, anthropic.com (official). Retrieved 2026-05-06.
Claude support center. Anthropic, support.anthropic.com (official). Retrieved 2026-05-06.
Anthropic Trust Center. Anthropic, trust.anthropic.com (official). Retrieved 2026-05-06.
