Claude AI API — Pricing, Keys & Documentation


The Claude API is Anthropic’s developer interface for calling Claude from your own software; this independent Claude AI guide explains API keys, pricing, first requests, SDKs, limits, and cost controls.


c-ai.chat is not Anthropic and does not impersonate claude.ai. Use claude.ai for the official chat product and platform.claude.com for developer access. If you want subscription prices instead of token-metered API billing, see Claude pricing.


What the Claude API is

The Claude API is a hosted model interface from Anthropic. Your application sends structured requests, chooses a model, sets limits such as maximum output tokens, and receives text or streamed responses.

  • Access: use an Anthropic API key.
  • Billing: pay per million input and output tokens.
  • Models: choose from Opus, Sonnet, and Haiku tiers.
  • Common uses: apps, agents, document analysis, coding workflows, and support tools.

Official setup: use Anthropic’s API quickstart when you create a key or run code. Use this guide for planning, model choice, and cost controls.

Getting an API key

To use the Claude API, create or sign in to the Anthropic Console, generate an API key, add billing when required, and store the key as a server-side secret.

  1. Open the developer platform

    Go to platform.claude.com and sign in with the account you want to use for API billing and workspace management. The developer platform is separate from normal chat use on claude.ai.

  2. Set up your workspace

    Confirm the organization or workspace, review billing requirements, and check whether your team needs an admin owner before developers create keys.

  3. Create an API key

    Open the API keys area, create a new key, and name it by environment, such as prod-support-bot or dev-evals. Clear names make rotation and audits easier.

  4. Store the key as a secret

    Save it in a secret manager or environment variable named ANTHROPIC_API_KEY. Do not paste it into client-side JavaScript, mobile apps, public notebooks, or screenshots.

  5. Test with a small request

    Send a short Messages API request, confirm that authentication works, then add logging, retry handling, and usage monitoring before production traffic.
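Steps 4 and 5 can be sketched in Python as a small startup check: read the key from ANTHROPIC_API_KEY, fail fast with a clear error if it is missing, and redact it before it can reach a log line. The helper names load_api_key and redact are hypothetical and not part of Anthropic's SDK; treat this as a minimal sketch.

```python
import os


def load_api_key(var_name: str = "ANTHROPIC_API_KEY") -> str:
    """Read the API key from the environment and fail fast if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # A clear startup error is easier to debug than a 401 on the first request.
        raise RuntimeError(
            f"{var_name} is not set; load it from your secret manager or environment."
        )
    return key


def redact(key: str, visible: int = 4) -> str:
    """Return a log-safe form of a key that shows only its last few characters."""
    if len(key) <= visible:
        return "****"
    return "****" + key[-visible:]
```

Call load_api_key() once at process start and log only redact(key) if you need to confirm which key a deployment picked up.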

The safest pattern is to keep Claude API calls behind your own backend. Browser and mobile clients should call your server, not Anthropic directly. Any key shipped to a user device should be treated as exposed.

For the official flow, use Anthropic’s quickstart documentation. For regulated or enterprise data, review Anthropic’s trust center and your own contract terms before sending production data.

API pricing

Claude API pricing is metered per million tokens, with separate rates for input tokens and output tokens. Long prompts, long conversation history, and large retrieved documents cost more than short requests.

Claude Opus 4.7

$5/M input tokens · $25/M output tokens

For flagship reasoning, difficult analysis, and long-context work.

  • Flagship model
  • 1,000,000-token context
  • Use when quality matters more than cost

Claude Opus 4.6

$5/M input tokens · $25/M output tokens

For production systems pinned to the previous Opus generation.

  • Previous flagship
  • Useful for compatibility testing
  • Keep only when evaluations justify it

Claude Haiku 4.5

$1/M input tokens · $5/M output tokens

For fast, lower-cost tasks at higher volume.

  • Classification and extraction
  • Routing and lightweight drafting
  • Cost-sensitive production paths

Use the official Claude pricing page and platform pricing docs before committing budgets. Your bill depends on input tokens, output tokens, cached input, batch jobs, and any enterprise agreement.

Do not confuse API pricing with chat subscriptions. Free is $0. Pro is $20/mo, or $17/mo with annual billing. Max starts at $100/mo. Team Standard is $25/seat/mo, or $20/seat/mo with annual billing. Team Premium is $125/seat/mo, or $100/seat/mo with annual billing. Enterprise uses a $20/seat base plus API rates.

| Model | Input rate | Output rate | Notable detail | Typical API choice |
|---|---|---|---|---|
| Claude Opus 4.7 | $5/M tokens | $25/M tokens | 1,000,000-token context | Hard reasoning, complex code review, high-stakes analysis, and long-context synthesis. |
| Claude Opus 4.6 | $5/M tokens | $25/M tokens | Previous flagship | Pinned systems that depend on the previous Opus generation. |
| Claude Sonnet 4.6 | $3/M tokens | $15/M tokens | 1,000,000-token context; 128K max output | Production assistants, agents, document workflows, and developer tools. |
| Claude Haiku 4.5 | $1/M tokens | $5/M tokens | Lowest listed input and output rates | Fast extraction, triage, routing, summarisation, and high-volume support tasks. |

The practical pricing question is not only “Which model is cheapest?” It is “How many tokens will this workflow send and receive?” A support bot that includes a long policy manual in every request can cost more than expected. A document workflow that sends a large contract once, caches the stable part, and asks several small questions can be much cheaper.
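The arithmetic behind that question is simple enough to script. This sketch uses the per-million-token rates from the table above at standard rates (no caching, no batch discount); the dictionary keys are hypothetical labels, not API model IDs, and the rates should be re-checked against the official pricing page.

```python
# USD per million tokens (input, output), copied from the pricing table above.
RATES_PER_MILLION = {
    "opus-4.7": (5.00, 25.00),
    "opus-4.6": (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
    "haiku-4.5": (1.00, 5.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at standard (uncached, non-batch) rates."""
    input_rate, output_rate = RATES_PER_MILLION[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
```

For example, a hypothetical support bot that sends a 40,000-token policy manual plus a 500-token question and receives a 300-token answer on Sonnet 4.6 costs request_cost("sonnet-4.6", 40_500, 300), about $0.126 per call. Trimming the prompt to 2,500 input tokens brings the same call to about $0.012, a tenfold difference before any caching.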

For model selection, start with Sonnet 4.6, measure results, then move down to Haiku 4.5 where quality remains acceptable or up to Opus 4.7 where the task needs stronger reasoning. The Claude models guide compares model families and common trade-offs.

Cost optimisation: caching + batch

The two largest Claude API cost levers are prompt caching and the Batch API. Prompt caching reduces repeated input cost when the same long context appears across requests. The Batch API reduces cost for asynchronous work that does not need an immediate response.

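90% off cached input tokens with prompt caching: cache reads are billed at roughly one tenth of the normal input rate, while the request that first writes a prefix to the cache costs somewhat more than normal input. The savings come from reuse.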

Prompt caching works best when the beginning of the request stays stable. Common examples include a long system prompt, a style guide, a product catalog excerpt, a legal template, or retrieved documentation that many users ask about.

Do not treat caching as a substitute for prompt design. If your prompt includes irrelevant documents, stale chat history, or copied data that the model does not need, trim first. Then cache the stable part that remains useful.
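A minimal sketch of a cache-aware request, assuming the Messages API's cache_control marker on a system content block: the stable context goes first and is marked for caching, and the changing user question stays outside the cached prefix. The function name build_cached_request is hypothetical; it only builds keyword arguments and makes no network call.

```python
def build_cached_request(model: str, stable_context: str, user_question: str,
                         max_tokens: int = 500) -> dict:
    """Build Messages API keyword arguments with the stable prefix marked for caching."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        # Stable material first, in the system prompt, with a cache breakpoint after it.
        "system": [
            {
                "type": "text",
                "text": stable_context,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # The part that changes on every call stays outside the cached prefix.
        "messages": [{"role": "user", "content": user_question}],
    }
```

With the official Python SDK you would pass the result as client.messages.create(**build_cached_request(...)) and check message.usage.cache_creation_input_tokens and message.usage.cache_read_input_tokens to confirm that later calls hit the cache. Prefixes shorter than a model-specific minimum are not cached, so confirm the threshold in Anthropic's prompt caching documentation.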

Use caching when

  • The same long context appears in many requests.
  • You run assistants over stable documentation or policy text.
  • You use large system prompts, examples, or tool instructions.
  • You can separate stable context from changing user input.

Skip caching when

  • Each request contains different source material.
  • The prompt is already short.
  • You change the cached section on nearly every call.
  • Your main cost is output length rather than input length.
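To decide whether caching pays off, compare the prefix cost with and without it. This sketch assumes one cache write followed by cache hits on every later call within the cache lifetime, a write multiplier of 1.25 and a read multiplier of 0.10 on the base input rate. Those multipliers reflect Anthropic's published cache pricing at the time of writing for the default cache duration, but treat them as assumptions and confirm them on the pricing page. Output tokens are not affected, which is why caching does little when output dominates.

```python
def prefix_cost(prefix_tokens: int, requests: int, input_rate_per_million: float,
                cached: bool, write_multiplier: float = 1.25,
                read_multiplier: float = 0.10) -> float:
    """USD cost of sending the same prefix on `requests` calls, with or without caching.

    Assumes one cache write on the first call and a cache hit on every later call.
    """
    if requests <= 0:
        return 0.0
    per_token = input_rate_per_million / 1_000_000
    if not cached:
        return prefix_tokens * requests * per_token
    write = prefix_tokens * per_token * write_multiplier
    reads = prefix_tokens * (requests - 1) * per_token * read_multiplier
    return write + reads
```

With a hypothetical 50,000-token prefix reused across 100 Sonnet 4.6 requests at $3/M input, the uncached prefix costs $15.00 and the cached version about $1.67 under these assumptions. If the prefix changes on nearly every call, each request pays the write premium and caching costs more than it saves.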

The Batch API is for non-interactive jobs: evaluations, backfills, large content reviews, scheduled extraction, classification, or summarising a queue of documents. Your system submits work, checks status, and processes results when the batch completes.

Use batch processing when a delayed result is acceptable. Do not use it for live chat, autocomplete, support conversations, or anything where a user expects a response in seconds. For those cases, use the standard Messages API and control cost with model choice, prompt length, output caps, and caching.
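The submit, poll, and process flow starts with one request entry per item, each carrying its own custom_id so results can be matched back to inputs. This sketch assumes the Message Batches request shape (custom_id plus params holding ordinary Messages API arguments); build_batch_requests and the summarisation prompt are hypothetical, and the function only builds data.

```python
def build_batch_requests(model: str, documents: dict[str, str],
                         max_tokens: int = 200) -> list[dict]:
    """Turn {doc_id: text} into Message Batches request entries, one per document."""
    requests = []
    for doc_id, text in documents.items():
        requests.append({
            # custom_id ties each asynchronous result back to its source document.
            "custom_id": doc_id,
            # params holds the same arguments a normal Messages API call would take.
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [
                    {"role": "user", "content": f"Summarise in two sentences:\n\n{text}"}
                ],
            },
        })
    return requests
```

In recent versions of the Python SDK, you would submit the list with client.messages.batches.create(requests=...), poll client.messages.batches.retrieve(batch_id) until processing_status is "ended", then iterate client.messages.batches.results(batch_id) and match each entry by custom_id. Check the Batch API documentation for the current ID format rules and batch size limits.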

| Cost lever | Best fit | What it saves | Main trade-off |
|---|---|---|---|
| Prompt caching | Repeated long prompts or repeated reference context | 90% off cached input tokens | You need stable prompt sections and cache-aware request design. |
| Batch API | Offline jobs, evaluations, queues, content processing | 50% off input and output tokens | Results are asynchronous, so it is not suitable for live user interaction. |
| Model routing | Mixed workloads with easy and hard tasks | Uses Haiku for simple work, Sonnet for default work, and Opus only when needed | You need evaluation data to avoid routing hard tasks to a weaker model. |
| Output caps | Tasks with predictable response size | Limits expensive output tokens | Too small a cap can truncate useful answers. |
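Model routing and output caps can live in one small lookup. In this sketch the task labels, the environment variable names (CLAUDE_HAIKU_MODEL, CLAUDE_SONNET_MODEL, CLAUDE_OPUS_MODEL), and the max_tokens caps are all hypothetical; real tiers should come from your own evaluation data, and model IDs from Anthropic's model docs.

```python
import os

# Hypothetical task labels; replace with categories backed by your evaluations.
SIMPLE_TASKS = {"classify", "extract", "route"}
HARD_TASKS = {"code_review", "contract_analysis"}

# Per tier: (environment variable holding the model ID, max_tokens cap).
TIERS = {
    "simple": ("CLAUDE_HAIKU_MODEL", 100),
    "default": ("CLAUDE_SONNET_MODEL", 800),
    "hard": ("CLAUDE_OPUS_MODEL", 2000),
}


def pick_model(task: str) -> tuple[str, int]:
    """Route a task to (model_id, max_tokens): Haiku if simple, Opus if hard, else Sonnet."""
    if task in SIMPLE_TASKS:
        tier = "simple"
    elif task in HARD_TASKS:
        tier = "hard"
    else:
        tier = "default"  # unknown tasks fall back to the Sonnet default
    env_var, max_tokens = TIERS[tier]
    return os.environ[env_var], max_tokens
```

Defaulting unknown tasks to Sonnet keeps a new, unevaluated workflow from silently landing on the weakest model.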

For official implementation details, see Anthropic’s prompt caching documentation and Batch API documentation.

Your first API call

The quickest Claude API test is a Messages API request with one user message, a model, and a maximum output length. Use a small prompt first so you can confirm authentication, headers, response parsing, and billing before adding tools or large context.

Worked example

One prompt, one Claude response

  • Endpoint: /v1/messages
  • Starter model: Claude Sonnet 4.6
  • Secret: ANTHROPIC_API_KEY
  • Goal: Return a short product brief

Set ANTHROPIC_VERSION and CLAUDE_MODEL from Anthropic’s official docs before running the snippets.

export ANTHROPIC_API_KEY="sk-ant-replace-with-your-key"
export ANTHROPIC_VERSION="replace-with-current-api-version-from-docs"
export CLAUDE_MODEL="replace-with-current-sonnet-model-id"

curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: $ANTHROPIC_VERSION" \
  -d '{
    "model": "'"$CLAUDE_MODEL"'",
    "max_tokens": 300,
    "messages": [
      {
        "role": "user",
        "content": "Write a two-sentence product brief for a password manager used by freelancers."
      }
    ]
  }'

If authentication succeeds, the response contains an assistant message with one or more content blocks. In a quick test, you can print the first text block. In production, inspect block types and handle errors, refusals, timeouts, and retries deliberately.

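import os
from anthropic import Anthropic

# Read the key from the environment; never hard-code it in source.
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# One user turn, a model ID taken from the environment, and a cap on output tokens.
message = client.messages.create(
    model=os.environ["CLAUDE_MODEL"],
    max_tokens=300,
    messages=[
        {
            "role": "user",
            "content": "Write a two-sentence product brief for a password manager used by freelancers.",
        }
    ],
)

# The reply is a list of content blocks; print only the text blocks.
for block in message.content:
    if block.type == "text":
        print(block.text)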

For a real application, add four pieces before launch: server-side key storage, request logging without sensitive data, retry logic for transient failures, and a usage budget. If the response will be shown to a user, streaming can improve perceived latency because text appears as Claude generates it.
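Streaming with the official Python SDK can be sketched with client.messages.stream(...), which is used as a context manager exposing a text_stream iterator. The wrapper name stream_reply is hypothetical; it takes the client as a parameter so it can be exercised without a network call.

```python
def stream_reply(client, model: str, prompt: str, max_tokens: int = 300) -> str:
    """Print text as Claude generates it and return the full reply."""
    parts = []
    with client.messages.stream(
        model=model,
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        # Each item is a text fragment; flush so users see it immediately.
        for text in stream.text_stream:
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)
```

In a web backend you would forward each fragment to the browser (for example over server-sent events) instead of printing it, while the API key stays on the server.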

The same request structure can later support system instructions, tool use, streaming, structured output patterns, and longer context. The Claude features guide explains major capabilities from a product perspective.
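As an illustration of that structure, this sketch adds a system instruction and one tool definition to the same kind of request, then turns Claude's tool_use blocks into tool_result content for the next user turn. The tool name, schema, system text, and helper names are hypothetical; the field names (tools, input_schema, tool_use, tool_result, tool_use_id) follow Anthropic's tool use documentation, which you should check for the full loop.

```python
# Hypothetical tool; the name, description, and schema are illustrations only.
ORDER_TOOL = {
    "name": "get_order_status",
    "description": "Look up the shipping status of an order by its ID.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}


def build_support_request(model: str, question: str, max_tokens: int = 400) -> dict:
    """Messages API kwargs with a system instruction and one tool definition."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": "You are a concise support assistant. Use tools for order lookups.",
        "tools": [ORDER_TOOL],
        "messages": [{"role": "user", "content": question}],
    }


def tool_results_message(content_blocks, run_tool) -> dict:
    """Run each tool_use block Claude returned and package results as the next user turn."""
    results = []
    for block in content_blocks:
        if block.type == "tool_use":
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,  # links the result to Claude's request
                "content": run_tool(block.name, block.input),
            })
    return {"role": "user", "content": results}
```

To continue, append Claude's reply as an assistant message, append the dictionary from tool_results_message as the next user message, and call the API again with the same tools.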

SDKs and tooling

Most developers should use an official SDK unless they have a reason to call raw HTTP. SDKs reduce boilerplate around authentication, request construction, streaming, and typed responses while still mapping closely to the API.

  • Python: use the official Anthropic Python package for scripts, data workflows, backend services, evaluations, and notebooks.
  • TypeScript and JavaScript: use the official SDK for Node.js backends, serverless functions, and full-stack apps that keep API calls on the server.
  • Raw Anthropic API: call the HTTPS endpoint directly from any language that can send JSON and handle streaming.
  • Amazon Bedrock: access Claude through AWS identity, networking, billing, and regional controls when your organization standardizes on AWS.
  • Google Vertex AI: access Claude through Google Cloud projects, IAM, quotas, and regional deployment patterns.

Anthropic documents API behavior at docs.claude.com and in the platform model docs. If your company uses cloud-provider procurement or private networking, compare the direct Anthropic API with Claude on Amazon Bedrock and Claude on Google Vertex AI.

RouteAuthenticationBilling pathBest for
Direct Anthropic APIAnthropic API keyAnthropic platform billingTeams that want direct access to Anthropic’s API surface and docs.
Amazon BedrockAWS IAMAWS account billingOrganizations already running production AI workloads inside AWS.
Google Vertex AIGoogle Cloud IAMGoogle Cloud billingOrganizations using Google Cloud governance, projects, and regional controls.

Claude Code is different from the API. Claude Code is a coding tool for working in repositories and terminals. The API is for building your own software on Claude.

Community libraries can help, but treat them carefully. Check whether they support the Messages API, streaming, tool use, prompt caching, batch processing, and the model names you plan to use. Pin dependency versions and keep your own integration tests.

Rate limits and quotas

Claude API limits vary by workspace, model, usage tier, cloud route, and approval status. Check the official rate limits documentation and your Console limits page before sizing production traffic.

  • Requests per minute: do not assume high volume until your actual workspace limits confirm it.
  • Tokens per minute: token limits often become the real cap. Ten requests with 100,000 input tokens each can consume 1,000,000 input tokens before output is counted.
  • Output limits: cap max_tokens to the smallest useful response size. Long answers cost more and can slow user-facing workflows.
  • Concurrency: start batch-style workers conservatively, such as 5–20 concurrent tasks, then tune based on observed latency and 429 responses.
  • Retries: retry 429 and transient server errors with exponential backoff, jitter, and a maximum of 2–3 attempts. Do not retry a failing request in a tight loop.
  • Timeouts: use streaming for interactive responses and set longer server timeouts, such as 60–120 seconds, for large generations or long-context analysis.
  • Daily budget controls: set internal spend alerts before launch. A bug that repeats a long prompt can create a large bill faster than a normal chat workflow.
  • Status checks: if errors rise across all requests, check Claude status before assuming your code is broken.
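The retry rule from the list above, sketched as a generic wrapper: exponential backoff with full jitter and a hard attempt limit. TransientError is a hypothetical stand-in for whatever your client raises on 429 and 5xx responses; the official Python SDK also retries some errors itself (configurable via max_retries on the Anthropic client), so avoid stacking both layers without intent.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for 429 and transient 5xx errors from your HTTP client or SDK."""


def call_with_retries(fn, max_attempts: int = 3, base_delay: float = 1.0,
                      max_delay: float = 30.0, sleep=time.sleep):
    """Call fn(), retrying TransientError with exponential backoff and full jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up instead of retrying in a tight loop
            # Delay cap doubles each attempt (1s, 2s, 4s, ...); a random wait
            # below the cap keeps many clients from retrying in lockstep.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

The sleep parameter exists so tests can record delays instead of waiting; in production leave it as time.sleep.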

For production systems, add a queue between your application and Claude. A queue lets you smooth traffic spikes, apply per-user budgets, pause nonessential jobs, and keep retry storms from turning a short outage into a larger incident.
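A minimal asyncio sketch of that queue: a fixed pool of workers caps concurrency, and a per-user daily token budget is checked before each model call. UserBudget, run_jobs, the estimated-token field, and the call_model coroutine are all hypothetical placeholders for your own client code and accounting.

```python
import asyncio
from collections import defaultdict


class UserBudget:
    """Hypothetical per-user daily token budget; reset it on your own schedule."""

    def __init__(self, daily_tokens: int):
        self.daily_tokens = daily_tokens
        self.used = defaultdict(int)

    def try_spend(self, user: str, tokens: int) -> bool:
        if self.used[user] + tokens > self.daily_tokens:
            return False
        self.used[user] += tokens
        return True


async def worker(queue: asyncio.Queue, budget: UserBudget, call_model, results: list):
    """Pull jobs, skip users over budget, call the model, and record the outcome."""
    while True:
        user, prompt, est_tokens = await queue.get()
        try:
            reply = await call_model(prompt) if budget.try_spend(user, est_tokens) else None
            results.append((user, reply))  # None means rejected for budget
        except Exception as exc:  # keep the worker alive if one job fails
            results.append((user, exc))
        finally:
            queue.task_done()


async def run_jobs(jobs, budget: UserBudget, call_model, concurrency: int = 5) -> list:
    """Process (user, prompt, estimated_tokens) jobs with at most `concurrency` in flight."""
    queue: asyncio.Queue = asyncio.Queue()
    for job in jobs:
        queue.put_nowait(job)
    results: list = []
    workers = [asyncio.create_task(worker(queue, budget, call_model, results))
               for _ in range(concurrency)]
    await queue.join()
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Pausing nonessential jobs then becomes a matter of not enqueuing them, and a retry storm is bounded by the worker count rather than by incoming traffic.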

Log token usage by feature, not only by customer. The expensive path is often a specific workflow: long conversation memory, repeated document context, verbose outputs, or a prompt chain that calls the model several times for one user action.
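One way to log spend by feature without storing prompt text is to record only the usage counts each response carries (SDK responses expose usage.input_tokens and usage.output_tokens). UsageLedger and the feature names are hypothetical; ranking by estimated cost rather than raw tokens matters because output tokens are priced several times higher than input.

```python
from collections import defaultdict


class UsageLedger:
    """Aggregate token usage per feature; stores counts only, never prompt text."""

    def __init__(self):
        self.totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0, "calls": 0})

    def record(self, feature: str, usage) -> None:
        entry = self.totals[feature]
        entry["input_tokens"] += usage.input_tokens
        entry["output_tokens"] += usage.output_tokens
        entry["calls"] += 1

    def top_features(self, input_rate: float, output_rate: float, n: int = 3):
        """Features ranked by estimated USD cost, given per-million-token rates."""
        def cost(entry):
            return (entry["input_tokens"] * input_rate
                    + entry["output_tokens"] * output_rate) / 1_000_000
        ranked = sorted(((feature, cost(e)) for feature, e in self.totals.items()),
                        key=lambda item: item[1], reverse=True)
        return ranked[:n]
```

After each call, ledger.record("support_chat", message.usage) is enough; a periodic top_features report then points at the workflow that actually drives spend.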

Honest take

The Claude API is a strong choice if you need high-quality writing, analysis, coding help, tool-using assistants, or long-context document workflows inside your own product. Sonnet 4.6 is the sensible default for most teams. Haiku 4.5 is the cost-control option for simpler high-volume work. Opus 4.7 is the higher-cost choice when task difficulty justifies it.

It is not right for every case. If you only want personal chat, use claude.ai. If you need deterministic output, add validation, tests, schemas, and fallbacks. If you compare providers, use the same prompts, latency targets, and token budgets; our Claude resources page lists comparison and implementation aids.

FAQ

Is the Claude API the same as claude.ai?

No. claude.ai is Anthropic’s official chat product for end users. The Claude API is for developers who want to call Claude from their own apps, services, agents, scripts, and internal tools.

How do I get a Claude API key?

Sign in through platform.claude.com, create or select a workspace, open the API keys area, and generate a key. Store it in a secret manager or environment variable. Do not put it in public client code.

How much does the Claude API cost?

The main public rates are Claude Opus 4.7 at $5/M input tokens and $25/M output tokens, Claude Sonnet 4.6 at $3/M input tokens and $15/M output tokens, and Claude Haiku 4.5 at $1/M input tokens and $5/M output tokens. Opus 4.6 is also priced at $5/M input tokens and $25/M output tokens.

Does a paid Claude chat plan include API usage?

Treat chat subscriptions and API billing as separate unless your Anthropic contract says otherwise. A chat plan is for using Claude through Anthropic’s product interface. API usage is metered by tokens through the developer platform.

Which Claude API model should I start with?

Start with Claude Sonnet 4.6 for most production use cases. Move to Haiku 4.5 when speed and cost matter more than depth. Move to Opus 4.7 when the task needs stronger reasoning, complex analysis, or long-context synthesis.

How can I reduce Claude API costs?

Use prompt caching for repeated long context, the Batch API for asynchronous jobs, smaller models for easier tasks, strict output limits, and shorter prompts. Measure token usage by feature so you can find the workflow that actually drives spend.

Can I use the Claude API from Python or TypeScript?

Yes. Anthropic provides official SDKs for Python and TypeScript or JavaScript. You can also call the HTTP API directly from other languages if you send the required headers and JSON body.

Where do I check Claude API outages?

Use status.claude.com for current service status. Your own app should still handle timeouts, rate limits, and transient errors gracefully.

Independent guide. Not affiliated with Anthropic. For the official Claude product, visit claude.ai.

Last updated: 2026-05-14