Region Cost Map: Where to Run AI Workloads by Region
Meet Sven Larsen, a cloud architect designing a multi-region AI platform. "AWS Bedrock costs differ by region. Where should I run inference for our European customers without breaking GDPR?"
🔥 eu-west-1 is 15% more expensive than us-east-1, but routing through the US is GDPR-risky.
Same model, different prices by region. AWS Bedrock charges 5-25% more in Europe than in the US. Azure OpenAI prices differ between Sweden, East US, and Australia. Google Cloud has 3-tier pricing (Tier 1 cheap, Tier 3 premium). The naive "pick the cheapest region" approach violates GDPR or data-residency requirements for many workloads.
Sven's challenge: serve European customers with GDPR-compliant data residency while minimizing cost. eu-west-1 (Ireland) is the cheapest GDPR-compliant region, at a 15% premium over us-east-1. eu-central-1 (Frankfurt) carries an 18% premium, and UK regions (London) a 20% premium post-Brexit.
Three regional strategies. (1) Single region, residency-compliant: simple, but may overpay. (2) Multi-region routing by user: each user is served from the cheapest compliant region. (3) Region by workload: sensitive workloads run in the residency region, non-sensitive ones in the cheapest region. Most teams converge on (3).
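A minimal Python sketch of strategy (3), region by workload. It assumes each request is already tagged with its residency requirement; the premiums are the figures quoted above, while `ALLOWED_REGIONS`, the `"EU"` label, and `pick_region` are hypothetical names, not part of any cloud SDK. A production router would also weigh latency and per-region model availability.

```python
from typing import Optional

# Premium over us-east-1, using the figures quoted in this guide (illustrative, not live prices).
REGION_PREMIUM = {"us-east-1": 0.00, "eu-west-1": 0.15, "eu-central-1": 0.18}

# Hypothetical policy: which regions each residency requirement allows.
# None means the workload carries no residency constraint.
ALLOWED_REGIONS = {
    None: {"us-east-1", "eu-west-1", "eu-central-1"},
    "EU": {"eu-west-1", "eu-central-1"},
}


def pick_region(residency: Optional[str]) -> str:
    """Return the cheapest region that satisfies the workload's residency requirement."""
    if residency not in ALLOWED_REGIONS:
        # Fail closed: never fall back to an arbitrary, possibly non-compliant region.
        raise ValueError(f"no region policy for residency {residency!r}")
    allowed = ALLOWED_REGIONS[residency]
    return min(allowed, key=lambda region: REGION_PREMIUM[region])


print(pick_region("EU"))  # eu-west-1 (sensitive EU workload)
print(pick_region(None))  # us-east-1 (no residency constraint)
```

Failing closed on an unknown residency label is deliberate: a misrouted sensitive request can cost far more than the premium it would have saved.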
Same model, dramatically different costs by region. AWS Bedrock us-east vs eu-west, Azure OpenAI Sweden vs East US, Google Cloud us-central1 vs asia-southeast1. Pick the cheapest compliant region for your workload.
Single-region cost = volume × per-request cost × (1 + regional premium). Sven: 1M requests × $0.014/request × 1.15 ≈ $16K/mo for a single EU region.
Multi-region savings = volume × non-residency share × per-request cost × premium. Sven: 1M × 40% × $0.014 × 15% ≈ $840/mo in savings (about 5% of the bill) by routing the 40% of traffic that has no residency requirement to the US.
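The two cost formulas, sketched in Python with Sven's inputs. This assumes us-east-1 is the zero-premium base region and that per-request cost is flat across traffic; the function names are illustrative.

```python
def single_region_cost(volume: float, per_request: float, premium: float) -> float:
    """Monthly cost with all traffic in one region: volume x per-request x (1 + premium)."""
    return volume * per_request * (1 + premium)


def multi_region_savings(volume: float, non_residency_share: float,
                         per_request: float, premium: float) -> float:
    """Monthly savings from moving traffic with no residency requirement to the base-price region."""
    return volume * non_residency_share * per_request * premium


# Sven's inputs: 1M requests/month, $0.014/request, 15% EU premium, 40% non-residency traffic.
eu_only = single_region_cost(1_000_000, 0.014, 0.15)
savings = multi_region_savings(1_000_000, 0.40, 0.014, 0.15)
print(f"EU-only: ${eu_only:,.0f}/mo; savings: ${savings:,.0f}/mo ({savings / eu_only:.1%} of the bill)")
```

With these inputs it prints about $16,100/mo for EU-only and $840/mo in savings, roughly 5.2% of the bill.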
Operational complexity scales with the number of regions. One region: simple. Two regions: noticeable overhead. Three or more: a real platform investment. Pick the smallest set of regions that meets your compliance requirements.
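One way to pick the smallest set of regions, assuming you can list which residency jurisdictions each candidate region satisfies. The coverage map is a hypothetical example, the premiums reuse this guide's figures, and tie-breaking by summed premium is a crude cost proxy rather than a real bill. Brute force over combinations is fine for the handful of regions most teams consider.

```python
from itertools import combinations
from typing import Optional, Set, Tuple

# Premium over us-east-1, from the figures quoted in this guide.
REGION_PREMIUM = {
    "us-east-1": 0.00,
    "eu-west-1": 0.15,       # Ireland
    "eu-central-1": 0.18,    # Frankfurt
    "eu-west-2": 0.20,       # London
    "ap-northeast-1": 0.10,  # Tokyo
}

# Hypothetical: residency jurisdictions each region can satisfy.
REGION_COVERAGE = {
    "us-east-1": {"US"},
    "eu-west-1": {"EU"},
    "eu-central-1": {"EU"},
    "eu-west-2": {"UK"},
    "ap-northeast-1": {"JP"},
}


def smallest_region_set(required: Set[str]) -> Optional[Tuple[str, ...]]:
    """Fewest regions whose combined coverage includes every required jurisdiction.

    Among equally small sets, prefer the lowest summed premium (a crude cost proxy).
    Returns None if no combination covers the requirement.
    """
    regions = sorted(REGION_COVERAGE)
    for size in range(1, len(regions) + 1):
        candidates = [
            combo
            for combo in combinations(regions, size)
            if set(required) <= set().union(*(REGION_COVERAGE[r] for r in combo))
        ]
        if candidates:
            return min(candidates, key=lambda combo: sum(REGION_PREMIUM[r] for r in combo))
    return None


print(smallest_region_set({"US", "EU"}))  # ('eu-west-1', 'us-east-1')
```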
Latency benefits compound with cost benefits. EU users served from an EU region gain 50-100 ms of latency. If your UX is latency-sensitive, multi-region is mandatory regardless of cost.
Same calculator, three different team sizes:
Pure US workload. Single region (us-east-1 or us-west-2). Pick by latency to your users. ~$3.5K/mo.
Healthy range: $3.5K/mo, single region
EU residency for 60% of traffic (regulated EU customers and their data), US for the other 40% (US customers and non-PII data). Saves ~$800/mo over EU-only. Worth the multi-region complexity at this scale.
Healthy range: $15K/mo with $1K savings vs single-EU
Global SaaS across US, EU, and APAC. Residency-bound traffic stays in its region; non-residency traffic goes to the cheapest region. ~$80K/mo, with $10-12K savings vs single-region-everywhere.
Healthy range: $70-90K/mo with regional optimization
Cost isn't the only dimension. These constraints change the recommendation:
Multi-region is rarely cost-justified below 1M requests/month. Above that, you're leaving 5-15% on the table by staying single-region.
Regional pricing differences don't affect model behavior. Quality is identical, assuming the same model and version.
Region selection IS compliance for AI workloads. Get this right or risk fines that dwarf any cost savings.
Some "global" API endpoints silently route through specific regions. Read the vendor docs; don't assume.
Latency benefit of regional placement is real and often the dominant reason to multi-region (not cost).
Some models (particularly newer launches) are US-only at first. Plan multi-region rollout knowing this.
Multi-region multiplies observability complexity. Plan for it: per-region cost dashboards, latency tracking, eval pipelines.
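The 1M requests/month rule of thumb can be sanity-checked as a break-even: multi-region pays off once monthly savings exceed the extra region's operating overhead. This guide gives no overhead figure, so `monthly_overhead` is a hypothetical input you would estimate yourself (on-call, per-region dashboards, eval pipelines), and `break_even_volume` is an illustrative name.

```python
def break_even_volume(monthly_overhead: float, non_residency_share: float,
                      per_request: float, premium: float) -> float:
    """Requests/month at which multi-region savings cover the extra region's overhead.

    Savings per request = non_residency_share x per_request x premium,
    so break-even volume = monthly_overhead / savings per request.
    """
    savings_per_request = non_residency_share * per_request * premium
    if savings_per_request <= 0:
        raise ValueError("no per-request savings; multi-region never breaks even on cost")
    return monthly_overhead / savings_per_request


print(f"{break_even_volume(840, 0.40, 0.014, 0.15):,.0f} requests/month")
```

For example, if a second region added about $840/mo of overhead (a made-up figure), Sven's inputs ($0.014/request, 15% premium, 40% non-residency traffic) would break even at 1M requests/month.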
Pre-loaded scenarios for the most common applications:
100% European customer data. eu-west-1 (Ireland), the cheapest GDPR-compliant region. Single region for simplicity. ~$32K/mo. Don't try to save 10% by routing to the US; GDPR fines dwarf the savings.
Healthy range: $32K/mo single EU region
Japanese customer data. AWS ap-northeast-1 (Tokyo). 10% premium. Latency benefit for Japanese users. Single region, simple.
Healthy range: $6-8K/mo ap-northeast-1
Internal-only or non-PII workload. Pick the cheapest region (us-east-1 for AWS, East US for Azure). No residency constraints. ~$20K/mo.
Healthy range: $20K/mo in the cheapest region
Voice agent: latency matters more than cost. Run in the region nearest each user and accept the 5% premium. ~$700/mo.
Healthy range: Multi-region for latency
Honest limitations — every model is wrong; some are useful. Where this one falls short:
For these cases, use the Cost Calculator for full-bill projections and Vendor Concentration Risk for multi-region hedging.
Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years at Fortune 100 companies (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.
Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.
The last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs).
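The fallback rule for the last-verified date, as a small Python function. It assumes the caller has already pulled the dates of successful rows from aicost_pricing_snapshots and aicost_crawler_runs (that query is not shown), and `last_verified` is a hypothetical name.

```python
from datetime import date
from typing import Optional, Sequence


def last_verified(snapshot_dates: Sequence[date],
                  crawler_run_dates: Sequence[date]) -> Optional[date]:
    """Most recent successful daily snapshot; else latest successful crawler run; else None.

    Both inputs are assumed to contain dates of successful rows only.
    """
    if snapshot_dates:
        return max(snapshot_dates)
    if crawler_run_dates:
        return max(crawler_run_dates)
    return None
```

A snapshot always wins, even when a newer crawler run exists; the crawler date is used only when no snapshot exists yet.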
10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.
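Inferred fields: these values are not published directly by the vendor for the listed model; they are derived from industry conventions. Typical conventions: cached input = 10% of base input (90% off), Batch API = 50% of base (50% off). Exceptions are noted in the table.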
| Vendor / Model | Field | Why it’s inferred |
|---|---|---|
| Anthropic Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate; Anthropic publishes a 90% cache-hit discount on this tier. |
| Anthropic Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6. |
| Anthropic Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input; Anthropic documents a uniform 50% Batch discount. |
| Anthropic Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output; Anthropic documents a uniform 50% Batch discount. |
| Anthropic Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate; Anthropic 90% cache-hit discount convention. |
| OpenAI GPT-5.4 Mini | cachedInput | Derived at 10% of input; OpenAI documents an automatic 90% discount on cache hits across the GPT-5.x tier. |
| OpenAI GPT-5.4 Nano | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention. |
| OpenAI GPT-5.4 Nano | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount. |
| OpenAI GPT-5.4 Nano | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount. |
| OpenAI GPT-5.4 Pro | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention. |
| OpenAI GPT-5.4 Pro | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount. |
| OpenAI GPT-5.4 Pro | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount. |
| OpenAI GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift. |
| OpenAI GPT-5.2 | batchInput | Derived at 50% of input. |
| OpenAI GPT-5.2 | batchOutput | Derived at 50% of output. |
| OpenAI GPT-5 | cachedInput | Derived at 10% of input. |
| OpenAI GPT-5 | batchInput | Derived at 50% of input. |
| OpenAI GPT-5 | batchOutput | Derived at 50% of output. |
| OpenAI GPT-5.5 Pro | cachedInput | Derived at 10% of input; OpenAI does not publish a cached rate for `*-pro` models, so the family convention is used. |
| OpenAI GPT-5.5 Pro | batchInput | Derived at 50% of input. |
| OpenAI GPT-5.5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI GPT-5.2 Pro | cachedInput | Derived at 10% of input; pro-tier convention. |
| OpenAI GPT-5.2 Pro | batchInput | Derived at 50% of input. |
| OpenAI GPT-5.2 Pro | batchOutput | Derived at 50% of output. |
| OpenAI GPT-5.1 | batchInput | Derived at 50% of input. |
| OpenAI GPT-5.1 | batchOutput | Derived at 50% of output. |
| OpenAI GPT-5 Pro | batchInput | Derived at 50% of input. |
| OpenAI GPT-5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI GPT-5 Nano | cachedInput | Derived at 10% of input. |
| OpenAI GPT-5 Nano | batchInput | Derived at 50% of input. |
| OpenAI GPT-5 Nano | batchOutput | Derived at 50% of output. |
| Google Gemini 3 Flash | cachedInput | Derived at 10% of input; Google caching-discount convention (~90%). |
| Google Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google Gemini 2.5 Pro | cachedInput | Derived at 10% of input. |
| Google Gemini 2.5 Flash | cachedInput | Derived at 10% of input. |
| Google Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google Gemini 2.0 Flash | cachedInput | Derived at 25% of input, per Google's Gemini 2.0 family caching rates. |
| Google Gemini 2.0 Flash | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google Gemini 2.0 Flash | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| xAI Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base input. |
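How the inferred fields follow from the two conventions, as a Python sketch. Rates are assumed to be in the same unit as the published base price (for example, per million tokens); `derive_rates` is a hypothetical helper, and `cached_fraction` is exposed so rows that use 25% instead of 10% (Gemini 2.0 Flash, Grok 4) can override it.

```python
# Conventions quoted above: cached input = 10% of base input, Batch API = 50% of base.
CACHED_INPUT_FRACTION = 0.10
BATCH_FRACTION = 0.50


def derive_rates(input_rate: float, output_rate: float,
                 cached_fraction: float = CACHED_INPUT_FRACTION) -> dict:
    """Fill in inferred price fields from a model's published base input/output rates."""
    return {
        "cachedInput": input_rate * cached_fraction,
        "batchInput": input_rate * BATCH_FRACTION,
        "batchOutput": output_rate * BATCH_FRACTION,
    }


# Hypothetical base rates, not any vendor's actual prices.
print(derive_rates(2.0, 8.0))
print(derive_rates(4.0, 16.0, cached_fraction=0.25))  # a 25%-convention row
```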
Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots, and every change is logged to aicost_price_changelog with old and new values for a full audit trail.
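A sketch of how old and new values for a changelog entry could be produced by diffing two daily snapshots. The snapshot shape (a dict keyed by hypothetical `(model, field)` pairs) and `price_changes` are assumptions, not the actual aicost_price_changelog schema.

```python
from typing import Dict, List, Optional, Tuple

PriceKey = Tuple[str, str]  # hypothetical (model, field), e.g. ("model-a", "input")
Change = Tuple[PriceKey, Optional[float], Optional[float]]


def price_changes(old: Dict[PriceKey, float], new: Dict[PriceKey, float]) -> List[Change]:
    """Return (key, old_value, new_value) for every price added, removed, or changed.

    None marks a value absent on one side (a newly listed or delisted price).
    """
    changes = []
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes.append((key, before, after))
    return changes
```

Unchanged prices produce no entry, so the changelog grows only when a vendor actually moves a price.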