Region Cost Map - Where to Run AI Workloads by Region

Meet Sven Larsen, a cloud architect designing a multi-region AI platform. His question: "AWS Bedrock costs differ by region. Where should I run inference for our European customers without breaking GDPR?"

🔥 eu-west-1 is 15% more expensive than us-east-1, but routing through the US is GDPR-risky.

The story

Same model, different prices by region. AWS Bedrock charges 5-25% more in Europe than in the US. Azure OpenAI charges differently in Sweden vs East US vs Australia. Google Cloud has 3-tier pricing (Tier 1 cheap, Tier 3 premium). The naive 'pick the cheapest region' approach breaks GDPR/data residency for many workloads.

Sven's challenge: serve European customers with GDPR-compliant residency while minimizing cost. eu-west-1 (Ireland) is the cheapest GDPR-compliant region (15% premium over us-east-1). eu-central-1 (Frankfurt) carries an 18% premium. UK regions (London) carry a 20% premium post-Brexit.

Three regional strategies. (1) Single region, residency-compliant - simple, may overpay. (2) Multi-region routing by user - cheapest-compliant per user. (3) Region-by-workload - sensitive workloads in residency, non-sensitive in cheapest. Most teams converge on (3).
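Strategy (3) can be sketched as a small routing function. This is a minimal sketch under stated assumptions, not a production router: the `Request` type, its `residency_bound` flag, and the two region constants are hypothetical names chosen for illustration. The region IDs are the ones used in Sven's example.

```python
from dataclasses import dataclass

# Hypothetical region choices for illustration, taken from Sven's example.
CHEAPEST_REGION = "us-east-1"    # cheapest overall
RESIDENCY_REGION = "eu-west-1"   # cheapest GDPR-compliant option


@dataclass
class Request:
    user_id: str
    residency_bound: bool  # True if this request's data must stay in residency


def pick_region(req: Request) -> str:
    """Region-by-workload: sensitive traffic stays in residency, the rest goes cheapest."""
    return RESIDENCY_REGION if req.residency_bound else CHEAPEST_REGION


if __name__ == "__main__":
    print(pick_region(Request("eu-customer", residency_bound=True)))
    print(pick_region(Request("internal-batch", residency_bound=False)))
```

How a request gets flagged as residency-bound is the hard part and is left out here. It depends on your own data classification.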

About this calculator: Region Cost Map - Where to Run AI Workloads by Region

Same model, dramatically different costs by region. AWS Bedrock us-east vs eu-west, Azure OpenAI Sweden vs East US, Google Cloud us-central1 vs asia-southeast1. Pick the cheapest compliant region for your workload.

Inputs you control

Input	Impact on result	Range	Typical
Monthly inference requests	Total LLM calls per month.	10K – 100M	1,000,000
Region premium vs cheapest (%)	How much more expensive your compliant region is vs cheapest. EU regions: 15-25% premium. Asia-Pacific: 5-15%. UK: 20-30%.	0 – 50	15
% of traffic that must stay in residency	What fraction of users / data falls under residency requirements (GDPR, APPI, etc). Internal users + non-PII data may be flexible.	0 – 100	60

Outputs computed for you

Output	How inputs affect it
Monthly cost ($)	computed from inputs
Annual cost ($)	monthlyUsd × 12

Ready to run the numbers?

Open the full calculator — pick a model, enter your tokens, see per-call, daily, monthly, and annual cost.

🚀 Open the full calculator →

Reading your result

Single-region cost = volume × per-request price × (1 + premium). Sven: 1M req × $0.014/req × 1.15 = ~$16K/mo single-region in EU.

Multi-region savings = volume × non-residency share × premium × per-request price. Sven: 1M × 40% × 15% × $0.014 = ~$840/mo savings (about 5% of the bill) by routing the 40% of traffic that is not residency-bound to the US.
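The two formulas can be checked with a few lines of Python. This is a sketch of the arithmetic as stated here, not the calculator's actual implementation. The function and parameter names are hypothetical, and the $0.014 per-request price is simply the figure from Sven's example.

```python
def single_region_monthly_cost(requests: int, price_per_request: float,
                               premium_pct: float) -> float:
    """volume x per-request price x (1 + premium)."""
    return requests * price_per_request * (1 + premium_pct / 100)


def multi_region_monthly_savings(requests: int, price_per_request: float,
                                 premium_pct: float, residency_pct: float) -> float:
    """Premium avoided on the share of traffic that may leave the residency region."""
    flexible_share = 1 - residency_pct / 100
    return requests * flexible_share * (premium_pct / 100) * price_per_request


if __name__ == "__main__":
    # Sven's inputs: 1M requests, $0.014/request, 15% premium, 60% residency-bound.
    monthly = single_region_monthly_cost(1_000_000, 0.014, 15)
    savings = multi_region_monthly_savings(1_000_000, 0.014, 15, 60)
    print(f"single-region: ${monthly:,.0f}/mo, ${monthly * 12:,.0f}/yr")
    print(f"multi-region savings: ${savings:,.0f}/mo ({savings / monthly:.1%} of bill)")
```

With Sven's inputs this gives roughly $16.1K/mo single-region and $840/mo in savings. The annual figure is the monthly cost × 12, as in the outputs table.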

Operational complexity scales with regions. 1 region: simple. 2 regions: noticeable. 3+ regions: real platform investment. Pick the smallest set of regions that hits compliance.

Latency benefits compound with cost benefits. EU users hitting an EU region get a 50-100ms latency benefit. If your UX is latency-sensitive, multi-region is mandatory regardless of cost.

What "good" looks like:

Single-region default: Cheapest US region for non-residency, single EU region for residency
Multi-region budget: 1M+ monthly requests AND 30%+ residency-bound traffic
Vendor regional spreads: AWS 5-25%, Azure 0-20%, Google 0-30% (Tier 3)
Watch out: Some 'global' models route through specific regions silently (check the docs)
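The "Multi-region budget" thresholds above can be written as a quick screening check. This is one reading of that rule, not logic taken from the calculator. The function and constant names are hypothetical.

```python
# Thresholds from the "Multi-region budget" line above.
MIN_MONTHLY_REQUESTS = 1_000_000
MIN_RESIDENCY_PCT = 30


def multi_region_worth_budgeting(monthly_requests: int, residency_pct: float) -> bool:
    """True when both volume and residency-bound share clear the stated thresholds."""
    return (monthly_requests >= MIN_MONTHLY_REQUESTS
            and residency_pct >= MIN_RESIDENCY_PCT)


if __name__ == "__main__":
    print(multi_region_worth_budgeting(1_000_000, 60))  # Sven's inputs
    print(multi_region_worth_budgeting(500_000, 60))    # below the volume threshold
```

The check is only a first filter. At 100% residency-bound traffic there is nothing left to route elsewhere, so a single compliant region remains the answer even though both thresholds are met.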

Vendors with explicit regional pricing

Verified 20 hours ago

1. GPT-5 Mini: $0.250 in · $2.00 out
2. Command: $1.00 in · $2.00 out
3. devstral-2: $0.400 in · $2.00 out

Three real scenarios

Same calculator, three different team sizes.

Pure US workload. Single region (us-east-1 or us-west-2). Pick by latency to your users. ~$3.5K/mo.

Healthy range: $3.5K/mo, single region

Inputs used:

monthlyVolumeRequests: 500,000
avgInputTokens: 2,000
avgOutputTokens: 400
regionPremiumPct: 0
complianceConstraintRatio: 0
cheapestRegionPricePer1MUsd: 7

Trade-offs

Cost isn't the only dimension.

Best fit for "cost":

Single region simplest: Pick cheapest compliant
Multi-region savings 5-15% at scale: Worth complexity above 1M req/mo
Routing logic is real engineering: Account for it

Multi-region is rarely cost-justified below 1M requests/month. Above that, you're leaving 5-15% on the table by staying single-region.

Use cases

Pre-loaded scenarios for the most common applications.

100% European customer data. eu-west-1 (Ireland) is the cheapest GDPR-compliant region. Single region for simplicity. ~$32K/mo. Don't try to save 10% by routing to the US: GDPR fines dwarf the savings.

Healthy range: $32K/mo single EU region

Inputs used:

monthlyVolumeRequests: 2,000,000
avgInputTokens: 2,000
avgOutputTokens: 400
regionPremiumPct: 15
complianceConstraintRatio: 100
cheapestRegionPricePer1MUsd: 7

What this calculator can't tell you

Honest limitations — every model is wrong; some are useful. Where this one falls short:

Vendor regional pricing changes frequently - re-check quarterly.
Doesn't model network egress costs (cross-region data transfer).
Doesn't model regional capacity availability (some new models US-only at launch).
Latency benefit isn't quantified - depends on user geographic distribution.

For these, use: Cost Calculator for full-bill projection. Vendor Concentration Risk for multi-region hedging.

Where to go next

Project full bill at chosen region →

Validate the regional cost in full context.

Hedge vendor + region together →

Multi-vendor multi-region for resilience.

Self-host breakeven by region →

Compute capacity differs by region too.

Methodology

Source: /ai-cost-economics
Extraction: Regional pricing pulled monthly from AWS, Azure, Google Cloud documentation.
Editorial gate: 8-layer defense — see aicost.ai/ai-cost-economics
Last verified: 6/4/2026, 8:00:00 PM

Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.

3 years of pricing history

Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.

📖 Data sources & methodology 161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

All prices are USD per 1 million tokens, current as of 2026-06-05.
Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
Batch API discounts are 50% off standard rates across providers that offer Batch mode.
Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
Long-context pricing tiers apply when input exceeds model threshold.
Embedding prices are input-only (no output tokens generated).
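Because residency surcharges are not included in the base rates, they have to be applied on top. The sketch below shows one way to do that using the 1.1x multipliers listed above. The function names and the $2.00 base rate are hypothetical, and Google's regional tiers are deliberately not modeled because no multiplier is given for them here.

```python
# Residency surcharge multipliers quoted in the methodology list above.
# Google uses regional tiers instead of a flat multiplier, so it is omitted.
RESIDENCY_MULTIPLIER = {"Anthropic": 1.1, "OpenAI": 1.1}


def rate_with_residency(base_rate_per_1m: float, vendor: str) -> float:
    """USD per 1M tokens including the vendor's residency surcharge."""
    if vendor not in RESIDENCY_MULTIPLIER:
        raise ValueError(f"no residency multiplier listed for {vendor!r}")
    return base_rate_per_1m * RESIDENCY_MULTIPLIER[vendor]


def token_cost_usd(tokens: int, rate_per_1m: float) -> float:
    """Cost of a token count at a USD-per-1M-tokens rate."""
    return tokens / 1_000_000 * rate_per_1m


if __name__ == "__main__":
    base = 2.00  # hypothetical base rate, USD per 1M tokens
    uplifted = rate_with_residency(base, "OpenAI")
    print(f"{uplifted:.2f} USD per 1M tokens with residency surcharge")
    print(f"{token_cost_usd(500_000, uplifted):.2f} USD for 500K tokens")
```

Raising an error for unlisted vendors is a design choice: silently falling back to 1.0 would understate cost for any vendor that does charge a surcharge.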

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

Vendor	Last verified	URL	Snapshot coverage
Anthropic	2026-06-05	https://www.anthropic.com/pricing	Daily snapshot since Sep 2023 · 578 days captured
Anthropic Docs	2026-06-05	https://platform.claude.com/docs/en/about-claude/pricing	Daily snapshot since Sep 2023 · 578 days captured
OpenAI	2026-06-05	https://openai.com/api/pricing/	Daily snapshot since Sep 2023 · 579 days captured
Google AI	2026-06-05	https://ai.google.dev/gemini-api/docs/pricing	Daily snapshot since Dec 2023 · 554 days captured
Google Vertex	2026-06-05	https://cloud.google.com/vertex-ai/generative-ai/pricing	Daily snapshot since Dec 2023 · 554 days captured
DeepSeek	2026-06-05	https://api-docs.deepseek.com/quick_start/pricing	Daily snapshot since May 2024 · 493 days captured
xAI	2026-06-05	https://x.ai/api	Daily snapshot since Nov 2024 · 411 days captured
Mistral	2026-06-05	https://mistral.ai/pricing	Daily snapshot since Dec 2023 · 552 days captured
Cohere	2026-06-05	https://cohere.com/pricing	Daily snapshot since Sep 2023 · 578 days captured
Voyage AI	2026-06-05	https://docs.voyageai.com/docs/pricing	(none listed)

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model	Field	Why it’s inferred
Anthropic — Claude Sonnet 4.6	`cachedInput`	Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5	`cachedInput`	Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5	`batchInput`	Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5	`batchOutput`	Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5	`cachedInput`	Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini	`cachedInput`	Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2	`cachedInput`	Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.5 Pro	`cachedInput`	Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.2 Pro	`cachedInput`	Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.1	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.1	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Nano	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5 Nano	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Nano	`batchOutput`	Derived at 50% of output.
Google — Gemini 3 Flash	`cachedInput`	Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`cachedInput`	Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy)	`cachedInput`	Extrapolated at 25% of base.
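The derivations in the table follow a simple pattern, sketched below. This is an illustration of the stated conventions, not the site's extraction code. The function name, the override map, and the example rates are hypothetical. The field names and ratios come from the table.

```python
DEFAULT_CACHED_RATIO = 0.10  # cached input = 10% of base input (90% off)
BATCH_RATIO = 0.50           # Batch API = 50% of base (50% off)

# Models the table lists with a 25% cached-input ratio instead of 10%.
CACHED_RATIO_OVERRIDES = {"Gemini 2.0 Flash": 0.25, "Grok 4 (legacy)": 0.25}


def infer_fields(model: str, input_rate: float, output_rate: float) -> dict:
    """Derive the inferred fields from published base rates (USD per 1M tokens)."""
    cached_ratio = CACHED_RATIO_OVERRIDES.get(model, DEFAULT_CACHED_RATIO)
    return {
        "cachedInput": input_rate * cached_ratio,
        "batchInput": input_rate * BATCH_RATIO,
        "batchOutput": output_rate * BATCH_RATIO,
    }


if __name__ == "__main__":
    # Hypothetical base rates, for illustration only.
    print(infer_fields("Example Model", input_rate=2.00, output_rate=8.00))
    print(infer_fields("Gemini 2.0 Flash", input_rate=2.00, output_rate=8.00))
```

The sketch returns all three fields for every model. The table marks only some of them as inferred per model, so a real pipeline would apply the derivation only where the vendor publishes no value.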

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for full audit trail. Read the full methodology →
