Batch vs Realtime - How Much of Your AI Bill Is Discountable?

Meet Tariq Hassan. Engineering Manager at a 50-person SaaS. "AWS sales said batch saves 50%. Sounds great - but how much of my AI workload can actually run in batch mode?"

🔥 Need to deliver 30% AI cost reduction this quarter.

The story

Batch pricing is real money, and it is underused. OpenAI, Anthropic, Google, and Mistral all offer a ~50% discount on batch (non-realtime, async) workloads. The question isn't 'is batch cheaper' (yes, by 50%). It's 'what fraction of your workload is actually batch-eligible?'

Tariq's bill is $8K/mo. He assumes 'most of it is interactive' and dismisses batch. Reality check: classifications, summarizations, embeddings, content moderation, daily reports, weekly digests - typically 25-45% of a SaaS's AI workload doesn't need a realtime response. The user never sees the request happen.

The math is simple but the audit takes work. Walk through every AI call type. For each: is the user blocked waiting? If no → batch-eligible. If yes → realtime. Apply 50% discount to the 'no' bucket and recompute the bill.
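The audit described above can be sketched in a few lines. This is a minimal illustration, not the calculator's own code: the `Workload` type, the workload names, and the per-workload dollar figures are all hypothetical, chosen only so that they add up to an $8,000 bill. The share is weighted by spend rather than by request count, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    monthly_usd: float
    user_blocked: bool  # True if a user is waiting on the response


def audit(workloads, discount_pct=50):
    """Split spend into realtime vs batch-eligible and recompute the bill."""
    total = sum(w.monthly_usd for w in workloads)
    # "Is the user blocked waiting? If no -> batch-eligible."
    eligible = sum(w.monthly_usd for w in workloads if not w.user_blocked)
    savings = eligible * discount_pct / 100
    return {
        "total_usd": total,
        "eligible_usd": eligible,
        "eligible_pct": 100 * eligible / total if total else 0.0,
        "savings_usd": savings,
        "new_bill_usd": total - savings,
    }


# Hypothetical breakdown of an $8,000/mo bill (illustrative numbers only).
workloads = [
    Workload("chat assistant", 5200, user_blocked=True),
    Workload("ticket classification", 900, user_blocked=False),
    Workload("nightly embeddings", 700, user_blocked=False),
    Workload("content moderation", 600, user_blocked=False),
    Workload("weekly digests", 600, user_blocked=False),
]

report = audit(workloads)
print(report)
```

With these made-up inputs the batch-eligible share is 35% of spend and the savings are $1,400/mo. The point of the per-workload list is that the eligible percentage falls out of the audit instead of being guessed up front.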

This guide walks through Tariq's audit, identifies common batch-eligible workloads, and shows how to migrate without breaking UX.

📊 CALCULATOR AT A GLANCE

🎛 Inputs you control

Each input shapes the cost. Click an input on the calculator to set it — explanations below match the live calculator field by field.

▸ Model — The model running this workload. The 50% batch discount is roughly uniform across providers, so absolute savings scale with the model's price — the pricier the model, the more batch is worth.

How to choose: Pick what you actually run. OpenAI, Anthropic, Google and Mistral all offer ~50% off batch; DeepSeek goes further (60%+); a few models have no batch tier and show greyed-out in the comparison.

▸ Monthly requests — Total API requests per month. Scales the absolute dollars; the percentage savings depends only on how much of the workload is batch-eligible.

How to choose: Pull from your dashboard or requests/day × 30. At 1M+/mo the batch tier's higher rate limits often matter as much as the discount itself.

▸ Input tokens / request — Average input tokens per request — applied to both the realtime and batch portions equally.

How to choose: Use your real average prompt size. This is a planning estimate; the batch discount applies the same regardless of token mix.

▸ Output tokens / request — Average output tokens per request. Output is usually the larger cost component, so it amplifies the dollar value of moving work to batch.

How to choose: Set to your real average completion length. Longer generations make each call pricier, which raises the payoff from batching the eligible share.

▸ Batch-tolerant portion — The fraction of requests that can wait (up to ~24h) for a response — i.e. nothing is blocking a user in real time. This single number drives the whole result.

How to choose: Audit your call types: if the user is NOT waiting on it, it is batch-eligible. Classifications, summarization, embeddings, content moderation, daily/weekly reports usually qualify. Most SaaS workloads land at 25-45%. Interactive chat and anything a user stares at stays realtime.

📊 Outputs computed for you

What you'll see after the calculator runs. Each card explains how to read the number.

▸ All realtime — Monthly cost if every request runs at full realtime price.

How to read: Your starting point — the number the batch mix is measured against.

▸ Your mix — Monthly cost with your batch-tolerant share moved to the discounted batch tier.

How to read: The split bar shows the dollar distribution between the batch and realtime portions (not the request count).

▸ Monthly savings — All-realtime minus your mix — the headline monthly and annual dollars saved.

How to read: Equals monthly bill × batch-eligible % × discount %. Doubling the batch-eligible share roughly doubles this.

▸ Headroom — The savings you would capture if every batch-eligible request actually ran in batch mode.

How to read: Gap between this and your current savings = untapped batch headroom. Worth an audit if the gap is large.
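As a rough sketch of how these cards relate to the inputs, the snippet below computes the all-realtime cost, the mixed cost, and the savings from token counts. The function names and the example volume (1M requests, 1,000 input and 500 output tokens each) are assumptions for illustration. The prices are the GPT-5 Mini list rates shown on this page, in USD per 1M tokens. As the input notes say, the same token profile is applied to the batch and realtime portions.

```python
def monthly_cost(requests, in_tokens, out_tokens, in_price, out_price):
    """Monthly USD cost; prices are USD per 1M tokens."""
    per_request_micro_usd = in_tokens * in_price + out_tokens * out_price
    return requests * per_request_micro_usd / 1_000_000


def batch_mix(requests, in_tokens, out_tokens, in_price, out_price,
              batch_portion_pct, batch_discount_pct=50):
    """Return the calculator-style outputs for a given batch-tolerant share."""
    all_realtime = monthly_cost(requests, in_tokens, out_tokens,
                                in_price, out_price)
    # Spend that can wait, priced at realtime rates before the discount.
    batch_share = all_realtime * batch_portion_pct / 100
    realtime_part = all_realtime - batch_share
    batch_part = batch_share * (100 - batch_discount_pct) / 100
    your_mix = realtime_part + batch_part
    savings = all_realtime - your_mix
    return {
        "all_realtime": all_realtime,
        "realtime_part": realtime_part,  # dollars, not request count
        "batch_part": batch_part,        # dollars, not request count
        "your_mix": your_mix,
        "monthly_savings": savings,
        "annual_savings": savings * 12,
    }


# Hypothetical workload on GPT-5 Mini list prices ($0.25 in, $2.00 out).
result = batch_mix(
    requests=1_000_000, in_tokens=1_000, out_tokens=500,
    in_price=0.25, out_price=2.00, batch_portion_pct=35,
)
for key, value in result.items():
    print(f"{key}: ${value:,.2f}")
```

Under these assumptions the all-realtime bill is $1,250.00 and the mix is $1,031.25, so the savings are 17.5% of the bill: 35% eligible times the 50% discount.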

About this calculator: Batch vs Realtime - How Much of Your AI Bill Is Discountable?

Most vendors offer 50% off batch processing. The question isn't 'is batch cheaper' - it's 'what fraction of your workload is actually batch-eligible?'

Inputs you control

Input	Impact on result	Range	Typical
Current monthly AI spend ($)	From your invoice. The number we're trying to cut.	100 – 100K	8000
Estimated batch-eligible % of workload	What fraction of your AI calls don't need a realtime response. Most teams: 25-45%. Audit your call types - see use cases below.	0 – 100	35
Batch discount (% off list)	Standard across major vendors: 50% off batch. Some vendors (DeepSeek) offer more aggressive batch discounts.	0 – 60	50

Outputs computed for you · model: `batch`

Output	How inputs affect it
Monthly cost ($)	currentMonthlyUsd × (1 − batchEligiblePct/100 × batchDiscountPct/100)
Annual cost ($)	monthlyUsd × 12
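The page does not spell out the code behind the `batch` model, so the following is a reconstruction under that caveat. The function name `batch_model` and the `monthlySavingsUsd` key are hypothetical; the input names mirror the ones listed under "See inputs used". The formula was chosen because it reproduces the scenario figures quoted on this page.

```python
def batch_model(current_monthly_usd, batch_eligible_pct, batch_discount_pct):
    """Generic batch model: discount only the batch-eligible share of the bill."""
    for pct in (batch_eligible_pct, batch_discount_pct):
        if not 0 <= pct <= 100:
            raise ValueError("percentages must be between 0 and 100")
    # Both inputs are percentages, hence the division by 100 * 100.
    savings = current_monthly_usd * batch_eligible_pct * batch_discount_pct / 10_000
    monthly = current_monthly_usd - savings
    return {
        "monthlyUsd": monthly,
        "annualUsd": monthly * 12,
        "monthlySavingsUsd": savings,
    }


# Scenario inputs quoted on this page.
chatbot = batch_model(5_000, 10, 50)      # customer-facing chatbot
embeddings = batch_model(1_500, 100, 50)  # nightly re-embedding
tariq = batch_model(8_000, 35, 50)        # Tariq's audit

print(chatbot)
print(embeddings)
print(tariq)
```

The chatbot inputs give $4,750/month and $57,000/year, the embedding inputs give $750/month and $9,000/year, and Tariq's inputs give $1,400/month in savings, matching the figures in the text.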

* Output uses the generic compute model; for precise numbers, use the full calculator.

Reading your result

Read the savings number. Monthly bill × batch-eligible % × discount % = monthly savings. For Tariq: $8K × 35% × 50% = $1,400/mo savings = $16.8K/year.

Watch the migration cost. Each batch-eligible workload needs code changes (queue submission, polling for results, error handling). Budget 1-3 days of engineering per workload type. Tariq has 4 workload types → ~10 days = ~$8K eng cost. Payback: about 6 months ($8K ÷ $1,400/mo ≈ 5.7).
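The payback arithmetic can be sketched as below. The function names are hypothetical, and the $800/day engineering rate is an assumption back-derived from "~10 days = ~$8K"; substitute your own loaded rate.

```python
def migration_cost(workload_types, days_per_type, day_rate_usd):
    """One-off engineering cost of moving workload types to batch."""
    return workload_types * days_per_type * day_rate_usd


def payback_months(migration_cost_usd, monthly_savings_usd):
    """Months until cumulative savings cover the migration cost."""
    if monthly_savings_usd <= 0:
        return float("inf")  # never pays back
    return migration_cost_usd / monthly_savings_usd


# Tariq: 4 workload types, ~2.5 days each (within the 1-3 day budget),
# at an assumed $800/day, against $1,400/mo savings.
tariq_cost = migration_cost(4, 2.5, 800)
tariq_payback = payback_months(tariq_cost, 1_400)
print(f"Tariq: ${tariq_cost:,.0f} migration, {tariq_payback:.1f} months payback")

# Chatbot scenario from this page: $4-6K migration vs $250/mo savings.
low = payback_months(4_000, 250)
high = payback_months(6_000, 250)
print(f"Chatbot: {low:.0f}-{high:.0f} months payback")
```

Tariq's case lands at about 5.7 months. The chatbot scenario lands at 16 to 24 months, which is why that scenario is marked as a skip.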

The bigger savings: vendor competition. Once your workloads are batch-capable, you can shop the batch tier across vendors. DeepSeek batch is significantly cheaper than OpenAI batch for similar quality. Saves another 20-40% on the batch portion.
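To put a rough dollar figure on the vendor-shopping step, the sketch below assumes that "another 20-40% on the batch portion" applies to the batch spend after the 50% discount. That reading, and the function name, are assumptions rather than something the calculator states.

```python
def vendor_shopping_savings(monthly_usd, eligible_pct, discount_pct, vendor_cut_pct):
    """Extra monthly savings from moving the already-discounted batch spend
    to a cheaper vendor's batch tier."""
    # Batch spend that remains after the batch discount is applied.
    batch_spend = monthly_usd * eligible_pct * (100 - discount_pct) / 10_000
    return batch_spend * vendor_cut_pct / 100


# Tariq's numbers: $8K bill, 35% eligible, 50% batch discount.
low = vendor_shopping_savings(8_000, 35, 50, 20)
high = vendor_shopping_savings(8_000, 35, 50, 40)
print(f"Extra savings from vendor shopping: ${low:,.0f}-${high:,.0f}/mo")
```

For Tariq that is roughly $280 to $560 a month on top of the $1,400 from the batch discount itself, assuming output quality holds up on the cheaper vendor.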

Don't over-batch. Some workloads look batch-eligible but aren't: anything where user retention hinges on speed (autocomplete, voice, instant feedback). Misclassifying these costs more in UX than the savings are worth.

What "good" looks like:

Strong batch fit: 40%+ of workload eligible - e.g., content sites, BI tools, enterprise analytics
Moderate fit: 25-40% - typical SaaS with mix of interactive + background
Limited fit: 10-25% - chat-heavy, voice, real-time agents
Not worth it: <10% - engineering cost > savings
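The tiers above can be expressed as a small lookup. The ranges in the list share their endpoints (25% and 40% each appear in two tiers), so this sketch assigns a boundary value to the higher tier; that choice and the function name are assumptions.

```python
def batch_fit(eligible_pct):
    """Map a batch-eligible percentage to the fit tiers listed above."""
    if not 0 <= eligible_pct <= 100:
        raise ValueError("eligible_pct must be between 0 and 100")
    if eligible_pct >= 40:
        return "strong"        # content sites, BI tools, enterprise analytics
    if eligible_pct >= 25:
        return "moderate"      # typical SaaS, interactive + background mix
    if eligible_pct >= 10:
        return "limited"       # chat-heavy, voice, real-time agents
    return "not worth it"      # engineering cost exceeds savings


for pct in (100, 35, 10, 5):
    print(pct, batch_fit(pct))
```

Tariq's 35% lands in the moderate tier, and the chatbot scenario's 10% sits at the bottom edge of limited.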

Top vendors with batch pricing tiers

Verified 20 hours ago

1. GPT-5 Mini: $0.250 in · $2.00 out
2. Command: $1.00 in · $2.00 out
3. devstral-2: $0.400 in · $2.00 out

Three real scenarios

Same calculator, three different team sizes. Click a tab to see how the numbers shift.

$4,750 / month ≈ $57,000 / year

Customer-facing chatbot. 90% interactive, 10% background (content moderation, archival summarization). Savings $250/mo vs migration cost $4-6K eng. Payback >12 months. Skip - invest in caching/routing instead.

Healthy range: Savings ~$250/mo - probably not worth migration

Inputs used:

currentMonthlyUsd: 5,000
batchEligiblePct: 10
batchDiscountPct: 50

Trade-offs

Cost isn't the only dimension. Click any constraint — see how recommendations change.

Best fit for "cost":

Migrate batch-eligible first: 50% off the migrated portion
Combine with cheaper vendor for batch: stack 30-50% more savings
Don't migrate marginal workloads: engineering cost > savings

Batch migration ROI is dominated by the eligibility audit. Get that right, and the math works. Get it wrong, you ship batch infra that captures 5% savings on 80% of your bill - and nobody can tell why.

Use cases

Pre-loaded scenarios for the most common applications. Click a tab to see realistic numbers — then the "Try this scenario" button to load it into the calculator above.

$750.00 / month ≈ $9,000 / year

Re-embed new docs nightly. Pure batch - nobody waits for it. Vendor choice: cheapest batch tier wins. OpenAI text-embedding-3 batch is hard to beat.

Healthy range: 100% batchable - cuts cost in half

Inputs used:

currentMonthlyUsd: 1,500
batchEligiblePct: 100
batchDiscountPct: 50

What this calculator can't tell you

Honest limitations — every model is wrong; some are useful. Where this one falls short:

Batch eligibility is workload-by-workload - generalizing 'X% batchable' hides the per-workload audit work.
Doesn't model batch latency SLA breaches (some vendors guarantee <24hr; bursts may take longer).
Migration eng cost is heuristic - actual depends on existing architecture cleanliness.
Batch discounts vary by vendor (DeepSeek 60%+, others 50%, some don't offer).

For these, use: Cost Calculator for per-workload pricing. Multi-Model Router for routing layer. Prompt Cache ROI for additional optimization.

Where to go next

Cost-out each workload separately →

Once you've audited eligibility, price each workload.

Route by complexity within each tier →

Batch tier + cheap model = stacked savings.

Project bill at 10× - does batch fit grow? →

Some workloads become batch-eligible only at scale (justifies infra).

Methodology

Source: https://docs.anthropic.com/en/api/messages-batch
Extraction: Batch discount % verified against vendor pricing pages, daily auto-fetch.
Editorial gate: 8-layer defense — see aicost.ai/ai-cost-economics
Last verified: 6/4/2026, 8:00:00 PM

Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.

3 years of pricing history

Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.

📖 Data sources & methodology 161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

All prices are USD per 1 million tokens, current as of 2026-06-05.
Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
Batch API discounts are 50% off standard rates across providers that offer Batch mode.
Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
Long-context pricing tiers apply when input exceeds model threshold.
Embedding prices are input-only (no output tokens generated).

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

Vendor	Last verified	Source URL	Snapshot history
Anthropic	2026-06-05	https://www.anthropic.com/pricing	Daily snapshot since Sep 2023 · 578 days captured
Anthropic Docs	2026-06-05	https://platform.claude.com/docs/en/about-claude/pricing	Daily snapshot since Sep 2023 · 578 days captured
OpenAI	2026-06-05	https://openai.com/api/pricing/	Daily snapshot since Sep 2023 · 579 days captured
Google AI	2026-06-05	https://ai.google.dev/gemini-api/docs/pricing	Daily snapshot since Dec 2023 · 554 days captured
Google Vertex	2026-06-05	https://cloud.google.com/vertex-ai/generative-ai/pricing	Daily snapshot since Dec 2023 · 554 days captured
DeepSeek	2026-06-05	https://api-docs.deepseek.com/quick_start/pricing	Daily snapshot since May 2024 · 493 days captured
xAI	2026-06-05	https://x.ai/api	Daily snapshot since Nov 2024 · 411 days captured
Mistral	2026-06-05	https://mistral.ai/pricing	Daily snapshot since Dec 2023 · 552 days captured
Cohere	2026-06-05	https://cohere.com/pricing	Daily snapshot since Sep 2023 · 578 days captured
Voyage AI	2026-06-05	https://docs.voyageai.com/docs/pricing	

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model	Field	Why it’s inferred
Anthropic — Claude Sonnet 4.6	`cachedInput`	Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5	`cachedInput`	Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5	`batchInput`	Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5	`batchOutput`	Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5	`cachedInput`	Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini	`cachedInput`	Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2	`cachedInput`	Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.5 Pro	`cachedInput`	Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.2 Pro	`cachedInput`	Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.1	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.1	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Nano	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5 Nano	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Nano	`batchOutput`	Derived at 50% of output.
Google — Gemini 3 Flash	`cachedInput`	Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`cachedInput`	Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy)	`cachedInput`	Extrapolated at 25% of base.

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for full audit trail. Read the full methodology →
