
Region Cost Map - Where to Run AI Workloads by Region

Meet Sven Larsen, a cloud architect designing a multi-region AI platform. "AWS Bedrock costs differ by region. Where should I run inference for our European customers without breaking GDPR?"

🔥 eu-west-1 is 15% more expensive than us-east-1, but routing European traffic through the US is a GDPR risk.

The story

Same model, different prices by region. AWS Bedrock charges 5-25% more in Europe than in the US. Azure OpenAI prices differ between Sweden, East US, and Australia. Google Cloud uses three pricing tiers (Tier 1 cheapest, Tier 3 premium). The naive 'pick the cheapest region' approach breaks GDPR and other data-residency requirements for many workloads.

Sven's challenge: serve European customers with GDPR-compliant data residency while minimizing cost. Ireland (eu-west-1) is the cheapest GDPR-compliant region, at a 15% premium over us-east-1. Frankfurt (eu-central-1) carries an 18% premium, and UK regions (London) a 20% premium post-Brexit.

Three regional strategies:
  1. Single region, residency-compliant: simple, but may overpay.
  2. Multi-region routing by user: the cheapest compliant region for each user.
  3. Region-by-workload: sensitive workloads run in the residency region, non-sensitive workloads in the cheapest region.

Most teams converge on (3).
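Strategy (3) comes down to a routing decision per request. A minimal Python sketch, assuming each request already carries a flag saying whether it is residency-bound; the `InferenceRequest` type, the `residency_bound` field, and the region constants are hypothetical, and classifying traffic under GDPR, APPI, etc. remains a compliance decision the code does not make.

```python
from dataclasses import dataclass

# Hypothetical region choices taken from Sven's example; swap in your own.
RESIDENCY_REGION = "eu-west-1"  # cheapest GDPR-compliant region in the example
CHEAPEST_REGION = "us-east-1"   # cheapest region overall in the example


@dataclass(frozen=True)
class InferenceRequest:
    """Hypothetical request envelope; only the residency flag matters here."""
    workload: str
    residency_bound: bool  # True if the user/data must stay in-region


def route(request: InferenceRequest) -> str:
    """Region-by-workload: residency-bound traffic stays in the compliant
    region; everything else goes to the cheapest region."""
    if request.residency_bound:
        return RESIDENCY_REGION
    return CHEAPEST_REGION


if __name__ == "__main__":
    requests = [
        InferenceRequest("eu-customer-chat", residency_bound=True),
        InferenceRequest("internal-summaries", residency_bound=False),
    ]
    for r in requests:
        print(r.workload, "->", route(r))
```

The sketch assumes the flag is always set. In practice, treating an unknown classification as residency-bound is the safer failure mode, since the cost of a wrong guess is a compliance breach rather than a 15% premium.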

About this calculator: Region Cost Map - Where to Run AI Workloads by Region

Same model, dramatically different costs by region. AWS Bedrock us-east vs eu-west, Azure OpenAI Sweden vs East US, Google Cloud us-central1 vs asia-southeast1. Pick the cheapest compliant region for your workload.

Inputs you control

  • Monthly inference requests (range 10K-100M, typical 1,000,000): total LLM calls per month.
  • Region premium vs cheapest, in % (range 0-50, typical 15): how much more expensive your compliant region is than the cheapest region. EU regions: 15-25% premium. Asia-Pacific: 5-15%. UK: 20-30%.
  • % of traffic that must stay in residency (range 0-100, typical 60): the fraction of users/data that falls under residency requirements (GDPR, APPI, etc.). Internal users and non-PII data may be flexible.

Outputs computed for you

  • Monthly cost ($): computed from the inputs
  • Annual cost ($): monthlyUsd × 12



Reading your result

Single-region cost = volume × per-request cost × (1 + premium). Sven: 1M req × $0.014/req × 1.15 = ~$16K/mo for a single EU region.

Multi-region savings = volume × per-request cost × non-residency share × premium. Sven: 1M × $0.014 × 40% × 15% = ~$840/mo savings (about 5% of the bill) by routing the 40% of traffic that isn't residency-bound to the US.
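Both formulas, plus the annual figure (monthly × 12), as a small Python sketch. The $0.014/request rate and the 15% premium are Sven's example numbers, not published prices, and the function names are illustrative.

```python
def single_region_cost(volume: int, per_request_usd: float, premium: float) -> float:
    """Monthly cost when all traffic runs in the compliant (premium) region."""
    return volume * per_request_usd * (1 + premium)


def multi_region_savings(volume: int, per_request_usd: float,
                         premium: float, residency_share: float) -> float:
    """Monthly savings from moving the non-residency share of traffic
    to the cheapest region, where it no longer pays the premium."""
    non_residency_share = 1 - residency_share
    return volume * per_request_usd * non_residency_share * premium


if __name__ == "__main__":
    # Sven: 1M requests/month, $0.014/request, 15% premium, 60% residency-bound
    monthly = single_region_cost(1_000_000, 0.014, 0.15)
    savings = multi_region_savings(1_000_000, 0.014, 0.15, 0.60)
    print(f"single-region: ${monthly:,.0f}/mo, ${monthly * 12:,.0f}/yr")
    # single-region: $16,100/mo, $193,200/yr
    print(f"multi-region savings: ${savings:,.0f}/mo ({savings / monthly:.1%} of bill)")
    # multi-region savings: $840/mo (5.2% of bill)
```

The same function reproduces the all-EU use case on this page: 2M requests at a 15% premium comes to about $32.2K/mo.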

Operational complexity scales with the number of regions. One region: simple. Two regions: noticeable overhead. Three or more: a real platform investment. Pick the smallest set of regions that meets compliance.

Latency benefits compound with cost benefits. EU users served from an EU region typically see 50-100 ms lower latency. If your UX is latency-sensitive, multi-region is mandatory regardless of cost.

What "good" looks like:
  • Default: the cheapest US region for non-residency traffic, and a single EU region for residency-bound traffic
  • Budget for multi-region when you have 1M+ monthly requests AND 30%+ residency-bound traffic
  • Vendor regional spreads: AWS 5-25%, Azure 0-20%, Google 0-30% (Tier 3)
  • Watch out: some 'global' models silently route through specific regions (check the vendor docs)
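The rules of thumb in this section (1M+ requests/month and 30%+ residency-bound traffic, with latency-sensitive UX overriding cost) can be encoded as a quick screening check. A minimal sketch; the thresholds are this page's heuristics rather than hard limits, and `worth_multi_region` is a hypothetical helper name.

```python
MIN_MONTHLY_REQUESTS = 1_000_000  # below this, multi-region rarely pays off
MIN_RESIDENCY_SHARE = 0.30        # share of traffic that is residency-bound


def worth_multi_region(monthly_requests: int, residency_share: float,
                       latency_sensitive: bool = False) -> bool:
    """Screening heuristic: True if multi-region routing deserves a budget."""
    if latency_sensitive:
        # Latency-sensitive UX makes multi-region mandatory regardless of cost.
        return True
    return (monthly_requests >= MIN_MONTHLY_REQUESTS
            and residency_share >= MIN_RESIDENCY_SHARE)


if __name__ == "__main__":
    print(worth_multi_region(1_000_000, 0.60))                       # Sven
    print(worth_multi_region(500_000, 0.0))                          # pure US workload
    print(worth_multi_region(200_000, 0.0, latency_sensitive=True))  # latency-bound UX
```

A screening check like this only says whether the question is worth a closer look; the cost functions and the compliance review still decide the actual region set.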

Vendors with explicit regional pricing

Verified 20 hours ago (USD per 1M tokens)
  1. GPT-5 Mini: $0.250 in · $2.00 out
  2. Command: $1.00 in · $2.00 out
  3. devstral-2: $0.400 in · $2.00 out

Three real scenarios

Same calculator, three different team sizes.

Pure US workload. Single region (us-east-1 or us-west-2); pick by latency to your users. ~$7K/mo (500K req × $0.014/req, no regional premium).

Healthy range: ~$7K/mo, single region

Inputs used:
  • monthlyVolumeRequests: 500,000
  • avgInputTokens: 2,000
  • avgOutputTokens: 400
  • regionPremiumPct: 0
  • complianceConstraintRatio: 0
  • cheapestRegionPricePer1MUsd: 7

Trade-offs

Cost isn't the only dimension.

Best fit for "cost":

  1. Single region: simplest. Pick the cheapest compliant region.
  2. Multi-region: 5-15% savings at scale; worth the complexity above 1M req/mo.
  3. Routing logic is real engineering: account for it.

Multi-region is rarely cost-justified below 1M requests/month. Above that, you're leaving 5-15% on the table by staying single-region.

Use cases

Pre-loaded scenarios for the most common applications.

100% European customer data. Ireland (eu-west-1) is the cheapest GDPR-compliant region; run a single region for simplicity. ~$32K/mo. Don't try to recover the 15% regional premium by routing to the US: GDPR fines dwarf the savings.

Healthy range: $32K/mo single EU region

Inputs used:
  • monthlyVolumeRequests: 2,000,000
  • avgInputTokens: 2,000
  • avgOutputTokens: 400
  • regionPremiumPct: 15
  • complianceConstraintRatio: 100
  • cheapestRegionPricePer1MUsd: 7

What this calculator can't tell you

Honest limitations: every model is wrong; some are useful. Where this one falls short:

For these, use the Cost Calculator for full-bill projections and Vendor Concentration Risk for multi-region hedging.

Where to go next

  • Project the full bill at your chosen region: validate the regional cost in full context.
  • Hedge vendor and region together: multi-vendor, multi-region for resilience.
  • Self-host breakeven by region: compute capacity differs by region too.

Methodology

Source: /ai-cost-economics
Extraction: regional pricing pulled monthly from AWS, Azure, and Google Cloud documentation.
Editorial gate: 8-layer defense (see aicost.ai/ai-cost-economics)
Last verified: 6/4/2026, 8:00:00 PM

Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.

3 years of pricing history

Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.

📖 Data sources & methodology

161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

  • All prices are USD per 1 million tokens, current as of 2026-06-05.
  • Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
  • Batch API discounts are 50% off standard rates across providers that offer Batch mode.
  • Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
  • Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
  • Long-context pricing tiers apply when the input exceeds the model's long-context threshold.
  • Embedding prices are input-only (no output tokens generated).
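Because residency surcharges are not in the base rates, they have to be applied on top. A minimal sketch of per-request cost from base USD-per-1M rates, with an optional Batch discount and a residency multiplier such as the 1.1x listed above. Treating the two adjustments as multiplicative is an assumption; check each vendor's terms for how they actually stack. The GPT-5 Mini rates are taken from the regional-pricing list on this page.

```python
BATCH_DISCOUNT = 0.50  # Batch API: 50% off standard rates


def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_1m: float, output_per_1m: float,
                     batch: bool = False,
                     residency_multiplier: float = 1.0) -> float:
    """Cost of one request from USD-per-1M-token base rates.

    Assumption: the Batch discount and the residency surcharge both
    apply multiplicatively to the base price.
    """
    cost = (input_tokens * input_per_1m + output_tokens * output_per_1m) / 1_000_000
    if batch:
        cost *= 1 - BATCH_DISCOUNT
    return cost * residency_multiplier


if __name__ == "__main__":
    # GPT-5 Mini base rates: $0.25 in, $2.00 out per 1M tokens
    base = request_cost_usd(2_000, 400, 0.25, 2.00)
    eu = request_cost_usd(2_000, 400, 0.25, 2.00, residency_multiplier=1.1)
    print(f"base ${base:.5f}, with 1.1x residency ${eu:.5f}")
```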

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.
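The last-verified rule is a simple fallback. A sketch, assuming the successful rows from aicost_pricing_snapshots and aicost_crawler_runs have already been loaded as lists of dates; the real table schemas aren't documented here, so this models them as plain in-memory lists.

```python
from datetime import date
from typing import Optional


def last_verified(snapshot_dates: list[date],
                  crawler_run_dates: list[date]) -> Optional[date]:
    """Most recent successful daily snapshot; if none exists yet, fall back
    to the latest successful crawler run. None means unverified."""
    if snapshot_dates:
        return max(snapshot_dates)
    if crawler_run_dates:
        return max(crawler_run_dates)
    return None


if __name__ == "__main__":
    snapshots = [date(2026, 6, 4), date(2026, 6, 5)]
    crawls = [date(2026, 6, 5)]
    print(last_verified(snapshots, crawls))
    print(last_verified([], crawls))
    print(last_verified([], []))
```

Note the rule prefers any snapshot over a crawler run, even an older one; that matches the wording above but is worth confirming if the two sources can diverge.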

  • Anthropic · https://www.anthropic.com/pricing · verified 2026-06-05 · daily snapshots since Sep 2023 (578 days captured)
  • Anthropic Docs · https://platform.claude.com/docs/en/about-claude/pricing · verified 2026-06-05 · daily snapshots since Sep 2023 (578 days captured)
  • OpenAI · https://openai.com/api/pricing/ · verified 2026-06-05 · daily snapshots since Sep 2023 (579 days captured)
  • Google AI · https://ai.google.dev/gemini-api/docs/pricing · verified 2026-06-05 · daily snapshots since Dec 2023 (554 days captured)
  • Google Vertex · https://cloud.google.com/vertex-ai/generative-ai/pricing · verified 2026-06-05 · daily snapshots since Dec 2023 (554 days captured)
  • DeepSeek · https://api-docs.deepseek.com/quick_start/pricing · verified 2026-06-05 · daily snapshots since May 2024 (493 days captured)
  • xAI · https://x.ai/api · verified 2026-06-05 · daily snapshots since Nov 2024 (411 days captured)
  • Mistral · https://mistral.ai/pricing · verified 2026-06-05 · daily snapshots since Dec 2023 (552 days captured)
  • Cohere · https://cohere.com/pricing · verified 2026-06-05 · daily snapshots since Sep 2023 (578 days captured)

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).
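Filling the gaps with these conventions is mechanical. A minimal sketch, assuming a model's published prices sit in a dict keyed by field name (a hypothetical shape). Derived fields are added only when the vendor hasn't published them and are marked with *; `cache_fraction` is a parameter because a few entries below use 25% rather than 10%. The model and its prices are made up for illustration.

```python
CACHE_FRACTION = 0.10  # cached input = 10% of base (90% off)
BATCH_FRACTION = 0.50  # Batch API = 50% of base (50% off)


def fill_inferred(published: dict[str, float],
                  cache_fraction: float = CACHE_FRACTION) -> dict[str, str]:
    """Return display prices per field; inferred values carry a '*'."""
    display = {field: f"${price:.3f}" for field, price in published.items()}
    derived = {
        "cachedInput": published["input"] * cache_fraction,
        "batchInput": published["input"] * BATCH_FRACTION,
        "batchOutput": published["output"] * BATCH_FRACTION,
    }
    for field, price in derived.items():
        if field not in published:  # vendor-published values always win
            display[field] = f"${price:.3f}*"
    return display


if __name__ == "__main__":
    # Hypothetical model: input, output and batchInput published per 1M tokens
    print(fill_inferred({"input": 1.00, "output": 4.00, "batchInput": 0.50}))
```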

Vendor · model · field: why it's inferred
  • Anthropic · Claude Sonnet 4.6 · cachedInput: derived at 10% of the input rate; Anthropic publishes a 90% cache-hit discount on this tier.
  • Anthropic · Claude Sonnet 4.5 · cachedInput: derived at 10% of the input rate; same 90% cache-hit convention as Sonnet 4.6.
  • Anthropic · Claude Sonnet 4.5 · batchInput: derived at 50% of standard input; Anthropic documents a uniform 50% Batch discount.
  • Anthropic · Claude Sonnet 4.5 · batchOutput: derived at 50% of standard output; Anthropic documents a uniform 50% Batch discount.
  • Anthropic · Claude Haiku 4.5 · cachedInput: derived at 10% of the input rate; Anthropic 90% cache-hit discount convention.
  • OpenAI · GPT-5.4 Mini · cachedInput: derived at 10% of input; OpenAI documents an automatic 90% discount on cache hits across the GPT-5.x tier.
  • OpenAI · GPT-5.4 Nano · cachedInput: derived at 10% of input; OpenAI 90% cache-hit convention.
  • OpenAI · GPT-5.4 Nano · batchInput: derived at 50% of input; OpenAI Batch API uniform 50% discount.
  • OpenAI · GPT-5.4 Nano · batchOutput: derived at 50% of output; OpenAI Batch API uniform 50% discount.
  • OpenAI · GPT-5.4 Pro · cachedInput: derived at 10% of input; OpenAI 90% cache-hit convention.
  • OpenAI · GPT-5.4 Pro · batchInput: derived at 50% of input; OpenAI Batch API uniform 50% discount.
  • OpenAI · GPT-5.4 Pro · batchOutput: derived at 50% of output; OpenAI Batch API uniform 50% discount.
  • OpenAI · GPT-5.2 · cachedInput: derived at 10% of input; no residency uplift.
  • OpenAI · GPT-5.2 · batchInput: derived at 50% of input.
  • OpenAI · GPT-5.2 · batchOutput: derived at 50% of output.
  • OpenAI · GPT-5 · cachedInput: derived at 10% of input.
  • OpenAI · GPT-5 · batchInput: derived at 50% of input.
  • OpenAI · GPT-5 · batchOutput: derived at 50% of output.
  • OpenAI · GPT-5.5 Pro · cachedInput: derived at 10% of input; OpenAI does not publish a cached rate for *-pro models, so the family convention is used.
  • OpenAI · GPT-5.5 Pro · batchInput: derived at 50% of input.
  • OpenAI · GPT-5.5 Pro · batchOutput: derived at 50% of output.
  • OpenAI · GPT-5.2 Pro · cachedInput: derived at 10% of input; pro-tier convention.
  • OpenAI · GPT-5.2 Pro · batchInput: derived at 50% of input.
  • OpenAI · GPT-5.2 Pro · batchOutput: derived at 50% of output.
  • OpenAI · GPT-5.1 · batchInput: derived at 50% of input.
  • OpenAI · GPT-5.1 · batchOutput: derived at 50% of output.
  • OpenAI · GPT-5 Pro · batchInput: derived at 50% of input.
  • OpenAI · GPT-5 Pro · batchOutput: derived at 50% of output.
  • OpenAI · GPT-5 Nano · cachedInput: derived at 10% of input.
  • OpenAI · GPT-5 Nano · batchInput: derived at 50% of input.
  • OpenAI · GPT-5 Nano · batchOutput: derived at 50% of output.
  • Google · Gemini 3 Flash · cachedInput: derived at 10% of input; Google caching discount convention (~90%).
  • Google · Gemini 3.1 Flash-Lite · cachedInput: derived at 10% of input; Google caching convention.
  • Google · Gemini 3.1 Flash-Lite · batchInput: derived at 50% of input; Google Batch API uniform 50% discount.
  • Google · Gemini 3.1 Flash-Lite · batchOutput: derived at 50% of output; Google Batch API uniform 50% discount.
  • Google · Gemini 2.5 Pro · cachedInput: derived at 10% of input.
  • Google · Gemini 2.5 Flash · cachedInput: derived at 10% of input.
  • Google · Gemini 2.5 Flash-Lite · cachedInput: derived at 10% of input; Google caching convention.
  • Google · Gemini 2.5 Flash-Lite · batchInput: derived at 50% of input; Google Batch API uniform 50% discount.
  • Google · Gemini 2.5 Flash-Lite · batchOutput: derived at 50% of output; Google Batch API uniform 50% discount.
  • Google · Gemini 2.0 Flash · cachedInput: derived at 25% of input, per Google 2.0-family caching rates.
  • Google · Gemini 2.0 Flash · batchInput: derived at 50% of input; Google Batch API uniform 50% discount.
  • Google · Gemini 2.0 Flash · batchOutput: derived at 50% of output; Google Batch API uniform 50% discount.
  • Google · Gemini 2.0 Flash-Lite · cachedInput: derived at 10% of input; Google caching convention.
  • Google · Gemini 2.0 Flash-Lite · batchInput: derived at 50% of input; Google Batch API uniform 50% discount.
  • Google · Gemini 2.0 Flash-Lite · batchOutput: derived at 50% of output; Google Batch API uniform 50% discount.
  • xAI · Grok 4 (legacy) · cachedInput: extrapolated at 25% of base.

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for full audit trail. Read the full methodology →