Scale Projection - What Happens to Your Bill at 10×, 100×?

Meet Diana Park. Head of Product at a 30-person Series A SaaS. "We're at 5K AI requests/day. Board wants to 100× to 500K/day. What does that bill look like - and when does it break?"

🔥 Investor deck assumes linear cost scaling. The CFO knows that's not how it works.

The story

AI cost is not linear at scale. Three things break the line: vendor rate limits force tier upgrades, premium tiers price differently than developer tiers, and at consumer scale you start renegotiating everything.

Diana's product currently handles 5K requests/day and costs $400/mo. The naive projection says 100× = $40K/mo. Reality at 500K req/day involves batching half the workload (50% discount on that half), negotiating volume pricing (15-30% off list), routing easy queries to cheap models (40-60% savings on those), and hitting throughput limits that force multi-vendor failover.

The actual bill at 100× volume is usually 30-60× today's cost (roughly $12-24K/mo on a $400 base) - but only if you do the optimization work. If you scale naively without these levers, you hit $40-80K/mo and start questioning whether AI is profitable.

This calc shows the naive linear projection AND the optimized version, so you can see what's possible and what work it takes to get there.
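The arithmetic behind the story can be sketched in a few lines. This is a minimal illustration, not the calculator's actual model: it assumes the levers compose multiplicatively, takes midpoints of the ranges quoted above, and uses a hypothetical 50% share of "easy" queries. All function and parameter names are made up for this example.

```python
def naive_projection(current_monthly_usd: float, scale: float) -> float:
    """Linear projection: cost grows in proportion to volume."""
    return current_monthly_usd * scale


def optimized_projection(
    current_monthly_usd: float,
    scale: float,
    batch_share: float = 0.5,      # half the workload is batched (from the story)
    batch_discount: float = 0.5,   # 50% off the batched half (from the story)
    routed_share: float = 0.5,     # ASSUMPTION: share of queries that are "easy"
    routing_savings: float = 0.5,  # midpoint of the quoted 40-60%
    volume_discount: float = 0.2,  # within the quoted 15-30% off list
) -> float:
    """Apply each lever as a multiplier on the naive bill (an assumption)."""
    cost = naive_projection(current_monthly_usd, scale)
    cost *= 1 - batch_share * batch_discount
    cost *= 1 - routed_share * routing_savings
    cost *= 1 - volume_discount
    return cost


naive = naive_projection(400, 100)
optimized = optimized_projection(400, 100)
print(f"naive:     ${naive:,.0f}/mo")
print(f"optimized: ${optimized:,.0f}/mo ({optimized / 400:.0f}x current)")
```

With these assumed values the optimized bill lands at about 45× today's cost, inside the 30-60× band. Change any share or discount and the result moves accordingly.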

📊 CALCULATOR AT A GLANCE

🎛 Inputs you control

Each input shapes the cost. Click an input on the calculator to set it — explanations below match the live calculator field by field.

▸ Primary model — The model whose per-token price drives the projection.

How to choose: Choose your production model; the chart can overlay cheaper alternatives.

▸ Workload preset — A starting template that fills typical token + volume values.

How to choose: Pick the closest preset then fine-tune, or choose Custom.

▸ Input tokens/request — Average prompt size per request, in tokens.

How to choose: About 750 words is ~1,000 tokens; include retrieved context.

▸ Output tokens/request — Average completion size per request, in tokens.

How to choose: Measure real responses; output is usually priced higher than input.

▸ Current requests/day — Today’s daily request volume — the baseline the projection scales from.

How to choose: Use real traffic; the scale factor multiplies this baseline.

▸ Scale factor — How many times current usage you want to project to.

How to choose: Set to your growth target (e.g. 10x); watch for inflection points where unit cost changes.
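As a rough sketch of how these inputs might combine, assuming prices are quoted per 1 million tokens and a flat 30-day month (both assumptions, as are the function names): the example uses the GPT-5 Mini rates listed on this page ($0.25 in, $2.00 out) and hypothetical token counts.

```python
def words_to_tokens(words: int) -> int:
    """Rule of thumb from this guide: ~750 words is ~1,000 tokens."""
    return round(words * 1000 / 750)


def monthly_cost_usd(
    input_tokens: int,
    output_tokens: int,
    requests_per_day: float,
    input_price_per_m: float,
    output_price_per_m: float,
    scale_factor: float = 1.0,
    days_per_month: int = 30,  # ASSUMPTION: flat 30-day month
) -> float:
    """Per-request token cost, multiplied up to a monthly bill."""
    per_request = (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000
    return per_request * requests_per_day * scale_factor * days_per_month


# Hypothetical workload: 1,500 input tokens and 400 output tokens per request.
baseline = monthly_cost_usd(1500, 400, 5000, 0.25, 2.00)
at_100x = monthly_cost_usd(1500, 400, 5000, 0.25, 2.00, scale_factor=100)
print(f"baseline: ${baseline:,.2f}/mo, linear 100x: ${at_100x:,.2f}/mo")
```

Because output tokens are priced higher than input tokens here, the 400 output tokens account for more of the per-request cost than the 1,500 input tokens.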

About this calculator: Scale Projection - What Happens to Your Bill at 10×, 100×?

Most AI bills aren't linear at scale. Find the cliffs - rate limits, tier jumps, latency walls - before they find you. Live pricing, real benchmarks, vendor stress-test.

Inputs you control

Input	Impact on result	Range	Typical
Current monthly AI spend ($)	What you're spending right now. Pull from your invoice. If you're pre-launch, use Cost Calculator first.	50 – 50K	400
Target scale (× current)	How many times your current usage. 10× = next-stage product growth. 100× = enterprise customer + general-availability launch. 1000× = the consumer-scale dream.	2 – 1K	100
Optimization stage (0-3)	0 = naive scaling, no optimization. 1 = batching where possible. 2 = + multi-model routing. 3 = + volume discounts + caching + token reduction. Most teams stop at 1; the spread between 1 and 3 is enormous.	0 – 3	0

Outputs computed for you · model: `scale_projection`

Output	How inputs affect it
Monthly cost ($)	computed from inputs
Annual cost ($)	monthlyUsd × 12

Reading your result

Linear case is the upper bound. If you do nothing differently - same model, same prompts, same tier - you'll hit the linear projection. Most teams don't, but it's the worst-case planning anchor.

Optimized case is the floor. With every lever pulled (batching + routing + caching + volume discount), most workloads land 50-70% below linear. Your CFO should see both lines on the same chart.

Watch for cliffs. Some vendors have abrupt tier jumps (e.g., a move from a $5/$15 'developer' tier to a $7/$20 'enterprise' tier at 10M tokens/month). Others have rate-limit walls that force you onto secondary vendors. The calc shows where tier jumps hit; rate-limit walls are not modeled.
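To see why a tier jump is a cliff rather than a slope, here is a small sketch using the example tiers above. It assumes the whole month's volume is repriced once total tokens reach the threshold. That is an assumption for illustration, since vendors differ on how tiers apply, and the names are hypothetical.

```python
DEVELOPER_TIER = (5.0, 15.0)    # $ per 1M input / output tokens (example above)
ENTERPRISE_TIER = (7.0, 20.0)   # $ per 1M input / output tokens (example above)
TIER_THRESHOLD = 10_000_000     # total tokens per month (example above)


def tiered_monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """ASSUMPTION: all tokens are billed at the tier the monthly total falls in."""
    total = input_tokens + output_tokens
    in_price, out_price = (
        ENTERPRISE_TIER if total >= TIER_THRESHOLD else DEVELOPER_TIER
    )
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


just_below = tiered_monthly_cost(7_900_000, 2_000_000)  # 9.9M tokens
just_above = tiered_monthly_cost(8_000_000, 2_000_000)  # 10.0M tokens
print(f"9.9M tokens:  ${just_below:.2f}")
print(f"10.0M tokens: ${just_above:.2f}")
```

Under this assumption, about 1% more volume produces a bill that is roughly 38% higher, which is the kind of discontinuity a linear projection misses.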

Read the breakeven ratio. Optimized monthly cost ÷ revenue at 100× usage. If this is <5%, AI is not your cost problem. If >25%, you have margin compression incoming.
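The breakeven check is a single division plus two thresholds. A minimal sketch follows; the revenue figure is hypothetical and the label for the middle band is an assumption, since the guide only names the two ends.

```python
def breakeven_ratio(optimized_monthly_cost: float, monthly_revenue: float) -> float:
    """Optimized monthly AI cost as a fraction of revenue at the same scale."""
    if monthly_revenue <= 0:
        raise ValueError("monthly_revenue must be positive")
    return optimized_monthly_cost / monthly_revenue


def read_ratio(ratio: float) -> str:
    if ratio < 0.05:
        return "AI is not your cost problem"
    if ratio > 0.25:
        return "margin compression incoming"
    return "in between: monitor"  # ASSUMPTION: the guide does not label this band


# Hypothetical: $18K/mo optimized AI cost against $600K/mo revenue at 100x.
ratio = breakeven_ratio(18_000, 600_000)
print(f"{ratio:.1%}: {read_ratio(ratio)}")
```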

What "good" looks like:

Naive 100× scaling: usually $30-50K from a $400 base
Stage 1 (batching): 60-75% of naive
Stage 2 (+ routing): 35-55% of naive
Stage 3 (+ caching + volume): 20-35% of naive - the ambitious target
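These bands translate directly into dollar ranges. The sketch below takes the pure linear figure as the naive anchor and applies the stage percentages above. The table and function names are illustrative and are not the `scale_projection` model itself; the annual figure uses monthly × 12.

```python
# Share of the naive bill that remains at each optimization stage (low, high),
# taken from the "good" bands listed above.
STAGE_SHARE_OF_NAIVE = {
    0: (1.00, 1.00),
    1: (0.60, 0.75),
    2: (0.35, 0.55),
    3: (0.20, 0.35),
}


def projected_range(current_monthly_usd: float, scale: float, stage: int):
    """Return the (low, high) monthly cost for a given optimization stage."""
    naive = current_monthly_usd * scale
    low, high = STAGE_SHARE_OF_NAIVE[stage]
    return naive * low, naive * high


for stage in range(4):
    low, high = projected_range(400, 100, stage)
    print(
        f"stage {stage}: ${low:,.0f}-${high:,.0f}/mo "
        f"(${low * 12:,.0f}-${high * 12:,.0f}/yr)"
    )
```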

Vendors with the best volume economics

Verified 20 hours ago. Prices are USD per 1 million tokens.

1. GPT-5 Mini: $0.250 in · $2.00 out
2. Command: $1.00 in · $2.00 out
3. devstral-2: $0.400 in · $2.00 out

Three real scenarios

Same calculator, three different team sizes. Click a tab to see how the numbers shift.

$2,000 / month ≈ $24,000 / year

Near-term growth - usually no time for optimization. Plan for ~5-6× your current cost (roughly linear), since you may add complexity (more features = more tokens per request) but also start to pick up small caching wins.

Healthy range: $1,800-2,500/mo (modest optimization expected)

See inputs used

currentMonthlyUsd: 400
scaleFactor: 5
modelTier: balanced
optimizationStage: 0

Trade-offs

Cost isn't the only dimension. Click any constraint — see how recommendations change.

Best fit for "cost":

Multi-vendor routing: 40-60% savings at scale
Batch processing: 50% off non-realtime workloads
Prompt caching: 30-50% off input-heavy bills

At 100× scale, optimization isn't optional - it's a margin lever. Teams that scale without optimizing typically see AI costs become the largest line item by month 6 of consumer launch.

Use cases

Pre-loaded scenarios for the most common applications. Click a tab to see realistic numbers — then the "Try this scenario" button to load it into the calculator above.

$50,000 / month ≈ $600,000 / year

Going from internal beta ($100/mo) to consumer scale (1000×). Mandatory: cheap-tier models with verification, multi-vendor routing, aggressive caching, batch overnight workloads, 30-40% volume discount negotiated. Linear would be $100K/mo; optimized usually $20-30K.

Healthy range: $15-30K/mo (vs $100K naive)

See inputs used

currentMonthlyUsd: 100
scaleFactor: 1,000
modelTier: cheap
optimizationStage: 3

What this calculator can't tell you

Honest limitations — every model is wrong; some are useful. Where this one falls short:

Optimization stages 1-3 are heuristic averages - your actual savings depend on your workload mix and engineering investment.
Doesn't model rate-limit cliffs (some vendors throttle at specific QPS thresholds, which forces multi-vendor failover).
Doesn't account for AI capability improvements over the scaling timeline (Sonnet 4.6 today != Sonnet 5 in 18 months, and pricing usually drops).
Doesn't model the human cost of optimization (1 SRE + AI engineer at consumer scale).
Volume discount assumptions are public-list-based; actual enterprise negotiations vary.

For these, use: Prompt Cache ROI for caching math. Multi-Model Router for routing savings. Batch vs Realtime for batch.

Where to go next

12-month projection with breach alerts →

Month-by-month forecast given growth + optimization timeline.

Single-vendor exposure at scale →

What's your blast radius if your primary vendor changes pricing 50%?

Full TCO with optimization roadmap →

7-step wizard including optimization headcount + timeline.

Methodology

Source: /ai-cost-economics
Extraction: Optimization stages calibrated against 12 case studies (anonymized) of teams scaling from $500/mo to $50K+/mo.
Editorial gate: 8-layer defense — see aicost.ai/ai-cost-economics
Last verified: 6/4/2026, 8:00:00 PM

Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.

3 years of pricing history

Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.

📖 Data sources & methodology 161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

All prices are USD per 1 million tokens, current as of 2026-06-05.
Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
Batch API discounts are 50% off standard rates across providers that offer Batch mode.
Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
Long-context pricing tiers apply when input exceeds model threshold.
Embedding prices are input-only (no output tokens generated).
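The discount rules above can be combined into an effective input price. The sketch below assumes cached tokens are billed at 10% of the base rate (the 90% end of the typical range) and that a residency surcharge, when it applies, multiplies the blended rate. Both are illustrative assumptions, and the function name is hypothetical.

```python
def effective_input_price(
    base_per_m: float,
    cache_hit_rate: float = 0.0,
    cached_multiplier: float = 0.10,    # ASSUMPTION: cached input at 10% of base
    residency_multiplier: float = 1.0,  # e.g. 1.1 where a regional surcharge applies
) -> float:
    """Blend cached and uncached input rates into one $/1M-token figure."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    blended = base_per_m * (
        (1 - cache_hit_rate) + cache_hit_rate * cached_multiplier
    )
    return blended * residency_multiplier


def batch_price(standard_per_m: float, batch_multiplier: float = 0.50) -> float:
    """Batch mode at 50% of the standard rate, per the methodology above."""
    return standard_per_m * batch_multiplier


# Example: $0.25/1M base input with a hypothetical 60% cache hit rate.
print(f"${effective_input_price(0.25, cache_hit_rate=0.6):.3f} per 1M input tokens")
print(f"${batch_price(2.00):.2f} per 1M batch output tokens")
```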

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

Source	Last verified	URL	Snapshot coverage
Anthropic	2026-06-05	https://www.anthropic.com/pricing	Daily snapshot since Sep 2023 · 578 days captured
Anthropic Docs	2026-06-05	https://platform.claude.com/docs/en/about-claude/pricing	Daily snapshot since Sep 2023 · 578 days captured
OpenAI	2026-06-05	https://openai.com/api/pricing/	Daily snapshot since Sep 2023 · 579 days captured
Google AI	2026-06-05	https://ai.google.dev/gemini-api/docs/pricing	Daily snapshot since Dec 2023 · 554 days captured
Google Vertex	2026-06-05	https://cloud.google.com/vertex-ai/generative-ai/pricing	Daily snapshot since Dec 2023 · 554 days captured
DeepSeek	2026-06-05	https://api-docs.deepseek.com/quick_start/pricing	Daily snapshot since May 2024 · 493 days captured
xAI	2026-06-05	https://x.ai/api	Daily snapshot since Nov 2024 · 411 days captured
Mistral	2026-06-05	https://mistral.ai/pricing	Daily snapshot since Dec 2023 · 552 days captured
Cohere	2026-06-05	https://cohere.com/pricing	Daily snapshot since Sep 2023 · 578 days captured
Voyage AI	2026-06-05	https://docs.voyageai.com/docs/pricing	(not listed)

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model	Field	Why it’s inferred
Anthropic — Claude Sonnet 4.6	`cachedInput`	Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5	`cachedInput`	Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5	`batchInput`	Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5	`batchOutput`	Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5	`cachedInput`	Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini	`cachedInput`	Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2	`cachedInput`	Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.5 Pro	`cachedInput`	Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.2 Pro	`cachedInput`	Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.1	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.1	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Nano	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5 Nano	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Nano	`batchOutput`	Derived at 50% of output.
Google — Gemini 3 Flash	`cachedInput`	Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`cachedInput`	Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy)	`cachedInput`	Extrapolated at 25% of base.
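The derivation rule behind most rows in this table can be sketched as follows. The field names match the table, but the function, the `Price` type, and the sample prices are hypothetical. Per-model exceptions such as the 25% cached rate for Gemini 2.0 Flash would be passed in as a different ratio.

```python
from dataclasses import dataclass


@dataclass
class Price:
    value: float            # USD per 1M tokens
    inferred: bool = False  # True when derived by convention, not vendor-published

    def label(self) -> str:
        """Render the price, marking inferred values with * as the tables do."""
        return f"${self.value:.3f}" + ("*" if self.inferred else "")


def fill_inferred(
    published: dict[str, float],
    cached_ratio: float = 0.10,  # convention: cached input = 10% of base
    batch_ratio: float = 0.50,   # convention: Batch API = 50% of base
) -> dict[str, Price]:
    """Keep published fields as-is; derive missing ones and flag them as inferred."""
    prices = {name: Price(value) for name, value in published.items()}
    derived = {
        "cachedInput": published["input"] * cached_ratio,
        "batchInput": published["input"] * batch_ratio,
        "batchOutput": published["output"] * batch_ratio,
    }
    for name, value in derived.items():
        if name not in prices:
            prices[name] = Price(value, inferred=True)
    return prices


# Illustrative values only: a model that publishes input, output and a batch input rate.
row = fill_inferred({"input": 1.00, "output": 2.00, "batchInput": 0.50})
for name, price in row.items():
    print(f"{name}: {price.label()}")
```

Published fields are never overwritten, so only the values filled in by convention carry the `*` marker.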

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for full audit trail. Read the full methodology →
