Claude on Vertex AI means using Anthropic’s Claude models through Google Cloud’s Vertex AI instead of calling Anthropic directly. This guide explains how it works, what it costs, and where it differs from the native Claude API. c-ai.chat is an independent guide and is not affiliated with Anthropic. For a broader overview of Claude, start with our Claude guide.

The short answer

Yes, you can use Claude with Vertex AI, but you are accessing Anthropic models through Google Cloud’s managed AI platform rather than through the native Claude API at platform.claude.com. For many teams that already run on Google Cloud, this helps with procurement, IAM, and consolidated billing; for developers who want the newest Anthropic features first, the direct Claude API is often the simpler reference point.
The practical difference is not the model family itself, but the control plane around it. You authenticate, deploy, and monitor through Vertex AI conventions, while model behavior and core Claude capabilities still come from Anthropic. If you are deciding between access routes, it also helps to compare the official Claude pricing structure with your cloud procurement requirements.
Worked example
Minimal request flow for Claude on Vertex AI
The request pattern is familiar, but the endpoint, auth, quotas, and console workflow are Google Cloud-specific.
{
  "model": "claude-sonnet",
  "messages": [
    {"role": "user", "content": "Summarise this document in 5 bullet points."}
  ]
}
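As a rough sketch of how that payload changes shape on Vertex AI, the snippet below builds a request without sending it. It assumes the publisher-model endpoint pattern and the `anthropic_version` body field described in Anthropic's Vertex AI documentation. The project ID, region, and model ID are placeholders, and `build_vertex_request` is a hypothetical helper, so confirm each value against the current docs and your model card before relying on it.

```python
import json

def build_vertex_request(project_id: str, region: str, model_id: str,
                         prompt: str, max_tokens: int = 512) -> tuple[str, str]:
    """Return (url, json_body) for a Claude request routed through Vertex AI.

    Assumption: the model is named in the URL rather than in the body, and the
    body carries an `anthropic_version` field. Verify both against current docs.
    """
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{region}/"
        f"publishers/anthropic/models/{model_id}:rawPredict"
    )
    body = {
        "anthropic_version": "vertex-2023-10-16",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, json.dumps(body)

# Placeholder values only: substitute your own project, region, and model ID.
url, body = build_vertex_request(
    project_id="my-project",
    region="us-east5",
    model_id="claude-sonnet",
    prompt="Summarise this document in 5 bullet points.",
)
print(url)
print(body)
```

Sending the request would additionally need a Google Cloud access token in an `Authorization: Bearer` header, typically obtained through a service account rather than an Anthropic API key.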
How it works

At a high level, Claude on Vertex AI is a hosted integration layer: Google Cloud exposes selected Anthropic models inside Vertex AI, and your application calls them with Google Cloud authentication and project-level controls. The model family remains Claude, but your operational surface area changes. That includes service accounts, quotas, logging, regional availability, and whatever controls your organisation already applies to Vertex AI workloads.
Developers should still keep Anthropic’s own model and API references close by, because those docs explain model capabilities, context windows, and feature behavior more clearly than cloud marketplace summaries. Anthropic’s official starting points are the models overview, the pricing documentation, and the broader product and trust pages on Anthropic’s own site. If you plan to build coding workflows around Claude, our Claude Code guide covers the direct product side better than Vertex-specific material.
- Enable the model in your Google Cloud environment. Confirm that the Claude model you want is actually available in Vertex AI for your project and region. Availability can differ by account setup and rollout timing.
- Authenticate with Google Cloud credentials. Use the IAM path your team already trusts, typically a service account or managed workload identity, rather than an Anthropic API key stored in your app.
- Send a request in the format Vertex AI expects. Your payload still contains Claude-style instructions and content, but the endpoint wrapper, request schema details, and SDK calls follow Google Cloud conventions.
- Handle output, quotas, and retries. Treat errors such as quota exhaustion, unsupported parameters, or regional unavailability as infrastructure issues as much as model issues. Retry logic and fallback model selection matter.
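The retry and fallback step can be sketched as follows. This is one possible pattern, not a Vertex AI or Anthropic API: `RetryableError`, `FatalError`, `call_with_fallback`, and the model names are all hypothetical, and `send` stands in for whatever function actually issues your request.

```python
import time
from typing import Callable, Optional, Sequence

class RetryableError(Exception):
    """Hypothetical marker for transient failures such as quota exhaustion."""

class FatalError(Exception):
    """Hypothetical marker for failures a retry will not fix, such as bad auth."""

def call_with_fallback(
    send: Callable[[str], str],
    models: Sequence[str],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each model in order, retrying transient errors with exponential backoff."""
    last_error: Optional[Exception] = None
    for model in models:
        for attempt in range(max_attempts):
            try:
                return send(model)
            except RetryableError as err:
                last_error = err
                # Wait 0.5s, 1s, 2s, ... between attempts on the same model;
                # skip the wait once this model's attempts are used up.
                if attempt < max_attempts - 1:
                    sleep(base_delay * (2 ** attempt))
            # FatalError is deliberately not caught, so it propagates at once.
    raise RuntimeError(f"all models failed: {last_error}")

def fake_send(model: str) -> str:
    # Stand-in for a real call: the primary model is always over quota.
    if model == "primary-model":
        raise RetryableError("quota exhausted")
    return f"response from {model}"

print(call_with_fallback(fake_send, ["primary-model", "fallback-model"],
                         sleep=lambda _: None))
# → response from fallback-model
```

Keeping fatal errors out of the retry loop matters: retrying an authentication failure or a disabled API only delays the real fix.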
That split between model and infrastructure matters in production. If a feature appears in Anthropic’s direct platform first, it may not appear in Vertex AI at the same time. If your team needs exact parity with new Claude capabilities, check both the Anthropic documentation and the specific Vertex AI model card before you commit to an architecture. For a general product-level feature overview, see our Claude features guide.
What it costs

Claude pricing is fundamentally token-based, and Anthropic’s official API rates are the cleanest baseline for estimating costs even if you later purchase through another channel. The current standard rates listed by Anthropic are Claude Opus 4.7 at $5 per million input tokens and $25 per million output tokens, Claude Sonnet 4.6 at $3 per million input tokens and $15 per million output tokens, and Claude Haiku 4.5 at $1 per million input tokens and $5 per million output tokens.
| Model | Best fit | Input price | Output price |
|---|---|---|---|
| Claude Opus 4.7 | Highest capability, complex reasoning | $5/M tokens | $25/M tokens |
| Claude Sonnet 4.6 | Default choice for most apps | $3/M tokens | $15/M tokens |
| Claude Haiku 4.5 | Fast, lower-cost workloads | $1/M tokens | $5/M tokens |
If your Vertex AI deployment mirrors these economics closely, your main cost drivers are still prompt size, output length, and model selection. Sonnet 4.6 is the usual default because it balances quality and spend better than Opus for everyday production tasks. Haiku 4.5 fits classification, extraction, routing, and other high-volume jobs where speed matters more than top-end reasoning.
Anthropic also documents two major cost levers that developers should understand even when evaluating third-party access paths. Prompt caching cuts cached input token cost by 90%, which matters for long system prompts, repeated context, and agent workflows. The Batch API cuts both input and output pricing by 50% for asynchronous jobs. If your Vertex AI setup does not expose these options in the same way, direct API usage may be cheaper for some workloads.
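To see how much these two levers move the numbers, here is a small sketch using the Sonnet 4.6 input rate from the table above. The function names and the 80% cached fraction are illustrative assumptions, and the calculation is simplified: it ignores any extra charge for writing to the cache and does not assume the two discounts combine.

```python
BASE_INPUT_RATE = 3.00  # USD per million input tokens (Sonnet 4.6, from this guide)
CACHED_DISCOUNT = 0.90  # cached input tokens cost 90% less
BATCH_DISCOUNT = 0.50   # the Batch API halves input and output pricing

def blended_input_rate(base_rate: float, cached_fraction: float) -> float:
    """Average input rate when `cached_fraction` of input tokens come from cache.

    Simplification: ignores any extra charge for writing to the cache.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached_rate = base_rate * (1 - CACHED_DISCOUNT)
    return base_rate * (1 - cached_fraction) + cached_rate * cached_fraction

def batch_rate(base_rate: float) -> float:
    """Rate for the same tokens when sent through the Batch API."""
    return base_rate * (1 - BATCH_DISCOUNT)

# Hypothetical workload where 80% of input tokens are repeated context.
print(round(blended_input_rate(BASE_INPUT_RATE, 0.8), 2))  # → 0.84
print(round(batch_rate(BASE_INPUT_RATE), 2))               # → 1.5
```

Under these assumptions, heavy caching takes the effective input rate well below the batch rate, which is why repeated long prompts are the first thing to check for on any access path.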
Worked example
Simple Sonnet 4.6 cost estimate
Output-heavy apps get expensive faster than prompt-heavy ones because output tokens cost five times as much as input tokens at the rates above.
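A minimal sketch of such an estimate, using the rates from the pricing table in this guide. The workload figures (10,000 requests, 2,000 input and 500 output tokens each) are hypothetical, and the dictionary keys are informal labels, not official model IDs.

```python
# USD per million tokens, copied from the pricing table in this guide.
RATES = {
    "opus-4.7": {"input": 5.0, "output": 25.0},
    "sonnet-4.6": {"input": 3.0, "output": 15.0},
    "haiku-4.5": {"input": 1.0, "output": 5.0},
}

def estimate_cost(model: str, requests: int,
                  input_tokens: int, output_tokens: int) -> dict:
    """Estimate spend for `requests` calls with the given per-request token counts."""
    rate = RATES[model]
    input_cost = requests * input_tokens * rate["input"] / 1_000_000
    output_cost = requests * output_tokens * rate["output"] / 1_000_000
    return {"input": input_cost, "output": output_cost,
            "total": input_cost + output_cost}

# Hypothetical workload: 10,000 requests, 2,000 input and 500 output tokens each.
print(estimate_cost("sonnet-4.6", 10_000, 2_000, 500))
# → {'input': 60.0, 'output': 75.0, 'total': 135.0}
```

In this example output makes up a fifth of the tokens but more than half of the bill, so trimming response length often saves more than trimming the prompt.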
Separate from API pricing, Anthropic also sells end-user Claude plans on claude.com/pricing: Free at $0/month, Pro at $20/month or $17/month annual, Max from $100/month, Team Standard at $25/seat/month or $20/seat/month annual, Team Premium at $125/seat/month or $100/seat/month annual, and Enterprise from a $20/seat base plus usage at API rates. Those plans matter if your organisation wants both app-building access and the consumer or workspace product.
Limits and gotchas

Most confusion around Claude on Vertex AI comes from assuming that every Claude capability is exposed everywhere at the same time. In practice, developers run into rollout timing, cloud-specific quotas, and feature mismatches. Work through these checks before a production launch.
- Model availability can differ. A Claude model listed by Anthropic may not yet be available in Vertex AI for your account, project, or region.
- Rate limits are not always identical. Google Cloud quotas, project caps, and service-level limits may apply on top of any model-level usage constraints.
- Region support matters. Data location, residency requirements, and regional endpoint support can affect whether you can deploy a given model at all.
- Feature parity is not guaranteed. New Claude capabilities can appear first in Anthropic’s own platform and only later in third-party cloud integrations.
- Request schemas can differ. Even if the underlying model is Claude, parameter names, SDK ergonomics, and error formats may follow Vertex AI conventions rather than Anthropic’s.
- Long-context assumptions can break. Anthropic documents long context up to 1,000,000 tokens for Opus 4.7, Opus 4.6, and Sonnet 4.6 at standard rates, but you still need to confirm support in your chosen deployment path.
- Common errors are operational, not model-related. Authentication failures, quota exhaustion, unsupported region, or disabled APIs are frequent first issues.
- Status checks live in different places. Anthropic’s service health is published at status.claude.com, but your cloud-side issue may still be specific to Google Cloud plumbing.
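One way to act on this list is a first-pass triage that sorts a failed call into one of the categories above before anyone blames the model. The mapping below is a hypothetical starting point based on general HTTP status conventions. The actual codes and error bodies returned by Vertex AI may differ, so check them against the errors you observe.

```python
# Hypothetical mapping from HTTP status codes to the checklist items above.
# Real services may use different codes; treat this as a starting point only.
TRIAGE = {
    400: "request: unsupported parameter or schema mismatch",
    401: "auth: missing or expired credentials",
    403: "auth: IAM permission missing or API disabled",
    404: "availability: model not enabled for this project or region",
    429: "quota: rate limit or quota exhausted",
}

def triage(status: int) -> str:
    """Return a likely cause category for a failed request."""
    if status in TRIAGE:
        return TRIAGE[status]
    if 500 <= status < 600:
        return "service: upstream problem, check the relevant status pages"
    return "unknown: inspect the response body"

for code in (403, 429, 503):
    print(code, "->", triage(code))
```

Logging the category alongside the raw error makes it easier to see whether failures cluster around configuration, capacity, or the service itself.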
Pick when
- Your company standardises on Google Cloud IAM and billing
- You need procurement through an existing cloud vendor
- You want model access inside a broader Vertex AI stack
Skip when
- You need the clearest docs and fastest access to new Claude features
- You want one source of truth for Anthropic-native parameters
- You are a small team that does not benefit from Google Cloud overhead
The honest take
Claude on Vertex AI is a sensible enterprise route, not automatically the best developer route. If your organisation already runs on Google Cloud, needs IAM and procurement alignment, or wants Claude inside an existing Vertex AI stack, it can be the right choice. If you want the clearest model docs, the fastest access to Anthropic-native capabilities, and fewer moving parts, the direct Claude API is usually easier.
The key is to treat Vertex AI as a delivery channel, not a different model family. Check model availability, feature parity, quotas, and cost controls before you commit. If you are still comparing routes, our Claude API guide and Claude features overview are the next useful references.
Independent guide. Not affiliated with Anthropic. For the official Claude product, visit claude.ai.
Last updated: 2026-05-10
This article is part of the Claude API for developers hub on c-ai.chat.





