API & Developers

Claude Batch API — Bulk Processing

8 min read This article cites 5 primary sources

The Claude Batch API is Anthropic’s asynchronous bulk-processing option for submitting many Claude Messages API requests at once, checking status later, and receiving results at 50% off standard API token pricing; this independent c-ai.chat guide explains when to use it alongside our Claude API docs guide.

Claude Batch API — Bulk Processing — hero illustration.
Claude Batch API — Bulk Processing

50% off

Batch API input and output tokens

The short answer

Abstract API request-response illustration
Abstract API request-response illustration

The Claude Batch API lets developers send a large group of Messages API requests in one job, then retrieve the results after Anthropic processes them.

Use it when latency does not matter. It fits back-office work: classifying support tickets, scoring leads, extracting fields from documents, running model evaluations, generating product metadata, or applying the same prompt pattern to thousands of records.

Do not use it for real-time chat, live assistants, streaming responses, or user-facing workflows that need an answer in seconds. The trade-off is simple: you accept asynchronous processing for lower token cost.

Use the Batch API when

You have many similar requests, stable record IDs, and no user waiting for an immediate reply.

Use the standard API when

You need low latency, streaming, tool use in a live product flow, or tight control over response timing.

Minimal code shape

A batch is a list of Messages API requests with your own IDs

{
  "requests": [
    {
      "custom_id": "ticket-001",
      "params": {
        "model": "claude-sonnet-4-6",
        "max_tokens": 300,
        "messages": [
          {
            "role": "user",
            "content": "Classify this support ticket as billing, bug, or account."
          }
        ]
      }
    },
    {
      "custom_id": "ticket-002",
      "params": {
        "model": "claude-sonnet-4-6",
        "max_tokens": 300,
        "messages": [
          {
            "role": "user",
            "content": "Classify this second support ticket."
          }
        ]
      }
    }
  ]
}

Use Anthropic’s current platform documentation as the source of truth before shipping.

The important field is custom_id. It lets you match each result back to your source record after the batch finishes. Without a stable ID, bulk results become hard to reconcile.

How it works

Bar chart of Claude API pricing — current model lineup.
Bar chart of Claude API pricing — current model lineup.

The Batch API wraps normal Claude Messages API calls inside an asynchronous job. Each item includes the same kind of parameters you would send to the Messages API: model, messages, max output tokens, and other supported options.

Treat batches as jobs. Your app should create a batch, store the returned batch ID, check for completion, retrieve results, and update your database using the original custom_id values. Do not assume results arrive in the same order as inputs.

  1. Prepare records

    Normalize your rows, documents, tickets, or evaluation cases. Assign a stable custom_id to each item.

  2. Build Messages requests

    Create one Claude Messages API request per item. Keep prompts consistent, set the model, choose a sensible max_tokens, and include only the context each item needs.

  3. Create the batch

    Submit the batch through Anthropic’s API using the current endpoint or SDK method from the platform docs.

  4. Track status

    Store the batch ID and check whether the job is queued, running, complete, canceled, expired, or failed.

  5. Process results

    Map each result back to your database using custom_id. Save successful outputs and per-item errors.

A good batch workflow is boring. It logs inputs, records the model used, stores the prompt version, and keeps failed items for retry. This matters because batch workloads often touch large datasets. A silent prompt bug can affect thousands of rows.

If your application needs immediate responses, use the standard API flow described in our Claude API reference. If you are comparing product capabilities rather than building with code, use our Claude features guide and Claude models guide.

What it costs

Abstract API metering / pricing illustration
Abstract API metering / pricing illustration

The Batch API applies a 50% discount to both input and output tokens for eligible batch processing. Standard API prices differ by model. Output tokens are usually the expensive part, so cap outputs where you can.

ModelBest fitStandard input priceStandard output priceBatch API effective priceCapability note
Claude Opus 4.7Hard reasoning and long-context analysis$5/M tokens$25/M tokens$2.50/M input, $12.50/M output1M context
Claude Sonnet 4.6Default choice for most batch jobs$3/M tokens$15/M tokens$1.50/M input, $7.50/M output1M context; 128K max output
Claude Haiku 4.5Simple, high-volume work$1/M tokens$5/M tokens$0.50/M input, $2.50/M outputFastest and cheapest listed option

Haiku 4.5

$1/$5 per M tokens before batch discount

Use for simple classification, tagging, and extraction at high volume.

Sonnet 4.6

$3/$15 per M tokens before batch discount

Use as the default for quality-sensitive batch work.

Opus 4.7

$5/$25 per M tokens before batch discount

Use when harder reasoning or long-context analysis justifies the higher rate.

Prompt caching is separate. It can reduce cached input token cost by 90% when requests reuse the same large prompt prefix, instructions, schema, or reference material. Batching reduces the cost of asynchronous work. Caching reduces repeated input cost.

Worked example

Estimating a Sonnet 4.6 batch

Input tokens10M at $3/M before discount
Output tokens2M at $15/M before discount
Batch discount50% off both directions
Estimated total$30

The same workload outside the Batch API would be $60 before other optimisations.

The calculation above is only a token estimate. Real bills depend on exact token counts, model choice, cache hits, retries, failed requests, and account limits. See our Claude pricing guide for plan-level context and compare it with Anthropic’s official Claude pricing page.

Limits and gotchas

Cost-optimisation discounts (prompt caching + Batch API).
Cost-optimisation discounts (prompt caching + Batch API).

The Batch API is simple in concept, but bulk processing exposes mistakes quickly. Validate a small sample before sending a large job.

  • Rate limits still matter. Batch processing has capacity and account-level constraints. Your account may be limited by request volume, token volume, or usage tier.
  • It is asynchronous. Do not use it for live chat, streaming UI, support copilots, or anything where a human is waiting.
  • Model availability can vary. Confirm that the model you want is supported for your account and environment before building around it.
  • Compliance needs may affect deployment. Enterprise controls, regional data residency, HIPAA-ready options, and related settings depend on contract and configuration. Check Anthropic Trust and your account terms.
  • Large batches need careful sizing. Anthropic documents current request-count and file-size limits in its platform docs. Build your pipeline so it can split work into smaller batches.
  • One bad item should not break your process. Store per-item errors and retry only the failed records after fixing the cause.
  • Output order is not your data model. Always join results using custom_id, not array position.
  • Expired or canceled jobs need handling. Treat these as normal states in your job table.
  • Cost can still grow quickly. A 50% discount helps, but millions of long outputs can still cost more than expected.

Common errors include malformed JSON, invalid model names, missing authentication, unsupported parameters, batches that exceed documented limits, quota issues, and per-item validation failures. Log request metadata without storing sensitive user data unnecessarily. If Anthropic has an incident, check Claude status before debugging your own queue.

Pick it when

  • You have many similar API requests.
  • The job can finish later.
  • You can map results by stable IDs.
  • You care more about unit cost than response latency.

Skip it when

  • A user is waiting in a live interface.
  • You need token streaming.
  • Your requests require manual review before each call.
  • You have not tested the prompt on a small sample.

Security review is part of the job. Batch workloads often include customer records, sales data, support messages, or documents. Minimise what you send, redact where possible, and use your normal data-retention and access-control process. For account support, use Anthropic’s official support resources.

FAQ

Another practical pattern is to run smaller batches by task type, model, or customer segment. This makes errors and costs easier to isolate. It also makes retries safer.

The honest take

The Claude Batch API is worth using when you have high-volume, non-urgent work and a clean way to match each result back to a source record. The 50% discount is meaningful for repeatable jobs such as enrichment, extraction, classification, and evaluations.

The trade-off is operational. You need job tracking, retries, validation, and monitoring. Do not use it just because it is cheaper. Use it when the product requirement fits asynchronous processing. If a person is waiting, use the standard Claude API. If a dataset is waiting, batch it.

Building with Claude? Test prompts in the official product, then move production workloads to the API when you are ready.

Try Claude →

Independent guide. Not affiliated with Anthropic. For the official Claude product, visit claude.ai.

Last updated: 2026-05-12