Batch vs Realtime · for async workloads

Every major provider offers 50% off via batch API. Can you use it?

OpenAI, Anthropic, Google, Mistral, and Bedrock all offer ~50% off for async requests with a 24-hour SLA. The discount is flat - the real question is what fraction of your workload can tolerate the latency.

Pricing verified: 2026-06-03 · 37/106 models support batch · 50% discount · 24h SLA

What this calculator does

Every major provider offers 50% off for batch API requests with a 24-hour SLA. See how much you would save — based on what fraction of your workload can wait.

Why use it

The discount is uniform (50%) across OpenAI, Anthropic, Google, Mistral, Bedrock — no vendor shopping needed
Your only real decision is: what % of my workload is actually async-tolerant?
Batch API has much higher rate limits than realtime — often the only way to handle million-row backfills
Savings stack with routing and caching (multiplicatively, not additively): all three together can hit 80%+ total

📖 Read the full guide →

Here are the calculator's inputs, its outputs, and ways to use the results for your AI workloads.

📥 Inputs you provide

Model: Pick the model you run
Monthly requests: Total monthly call volume
Input tokens / request: Average input size
Output tokens / request: Average output size
Batch-tolerant portion: Share of work that can wait

📤 Outputs you get

All realtime: Cost with everything realtime
Your mix: Cost with the batch share moved over
Monthly savings: Dollars saved per month
Headroom: Savings if everything eligible moved
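
As a rough illustration of how these inputs map to these outputs, here is a minimal Python sketch. The per-token prices are placeholders, not real provider pricing, and "headroom" is read here as the extra savings still available if the whole workload moved to batch; the calculator's exact definition is not stated on this page, so treat that as an assumption.

```python
from dataclasses import dataclass

BATCH_DISCOUNT = 0.50  # flat batch discount quoted on this page


@dataclass
class Workload:
    monthly_requests: int
    input_tokens_per_request: float
    output_tokens_per_request: float
    batch_share: float            # 0.0-1.0, share of requests that can wait
    input_price_per_mtok: float   # USD per 1M input tokens (placeholder value)
    output_price_per_mtok: float  # USD per 1M output tokens (placeholder value)


def estimate(w: Workload) -> dict:
    # Realtime cost of one average request, in USD.
    per_request = (
        w.input_tokens_per_request * w.input_price_per_mtok
        + w.output_tokens_per_request * w.output_price_per_mtok
    ) / 1_000_000
    all_realtime = w.monthly_requests * per_request
    all_batch = all_realtime * (1 - BATCH_DISCOUNT)
    # Only the batch-tolerant share gets the discount.
    your_mix = all_realtime * (1 - BATCH_DISCOUNT * w.batch_share)
    return {
        "all_realtime": all_realtime,
        "your_mix": your_mix,
        "monthly_savings": all_realtime - your_mix,
        # Assumed meaning: what is still on the table beyond the current mix.
        "headroom": your_mix - all_batch,
    }


example = Workload(1_000_000, 1_000, 500, 0.70, 1.00, 4.00)
result = estimate(example)
for name, value in result.items():
    print(f"{name}: ${value:,.2f}")
```

With these placeholder numbers the sketch gives $3,000 all realtime, $1,950 for the mix, $1,050 saved per month, and $450 of headroom.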

🎯 Use your results to

⏱️

Decide what can wait

Classify each call type as user-blocking or not; the non-blocking share is your batch-eligible bucket
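
One way to turn that classification into a number for the slider is sketched below. The call names and volumes are made up for illustration, and the share is weighted by request count to match the slider's "% of requests" definition; if your call types differ a lot in token size, weighting by cost would likely be more accurate.

```python
# Hypothetical call inventory: (name, monthly requests, user_blocking)
call_types = [
    ("chat_reply", 300_000, True),
    ("nightly_report", 50_000, False),
    ("content_moderation", 400_000, False),
    ("embedding_backfill", 250_000, False),
]


def batch_eligible_share(calls):
    """Share of monthly requests that are not user-blocking."""
    total = sum(requests for _, requests, _ in calls)
    if total == 0:
        return 0.0
    eligible = sum(requests for _, requests, blocking in calls if not blocking)
    return eligible / total


share = batch_eligible_share(call_types)
print(f"Batch-eligible share: {share:.0%}")
```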

💰

Quantify the cut

Real monthly and annual dollars from a 50% discount on the eligible portion

🆚

Compare across models

Same workload across every model — see if switching provider alongside batch saves more

🔌

Integrate with your agents

MCP available so agentic workflows can pull batch economics programmatically

👇 Now try the calculator below with your own AI workloads

Enter every field

The model dropdown flags which support batch (1 currently does not — o1 reasoning series). Monthly requests and input/output tokens scale absolute savings — the % savings depends only on your batch-tolerant mix. Note that batch API has separate rate limits from realtime — often 10x higher — which matters for large backfills.

Calibrate your batch %

The hero slider is the single most important control. Use presets if you're unsure. The latency tier picker below the slider helps you self-classify: if your workload tolerates 24h latency (🟢), it's batch-friendly; if it needs <1s (🔴), keep it realtime. Mixed workloads are the norm — most teams have a long tail of async work (analytics, moderation, enrichment) even if their main product is realtime.
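
The self-classification can be sketched as a small function. This page only states the two ends of the scale (24h tolerance is batch-friendly, under 1s is realtime-only); everything in between is lumped into "maybe" here purely as a placeholder, since the picker's actual 🟡 boundaries are not given.

```python
DAY_SECONDS = 24 * 60 * 60


def latency_tier(tolerable_latency_s: float) -> str:
    """Map the latency a workload can tolerate to a tier label."""
    if tolerable_latency_s >= DAY_SECONDS:
        return "batch-friendly"   # 🟢 can wait for the full 24h SLA
    if tolerable_latency_s < 1:
        return "realtime-only"    # 🔴 needs a sub-second response
    return "maybe"                # 🟡 assumed catch-all for everything in between


for seconds in (0.5, 3600, DAY_SECONDS):
    print(seconds, latency_tier(seconds))
```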

Interpret all panels

Top 3 cards show all-realtime vs your mix vs savings. Split bar shows dollar distribution between batch and realtime portions (not request distribution). Cross-model comparison table runs your workload across all models — reveals if switching provider alongside enabling batch would save more. No-batch models (e.g., o1) show greyed-out rows.
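
To see why the split bar differs from the request split, here is a small sketch. It assumes every request costs the same at realtime prices, which is a simplification; the function name is illustrative.

```python
def dollar_split(batch_share: float, discount: float = 0.50):
    """Return (batch, realtime) shares of total spend for a given request split."""
    batch_dollars = batch_share * (1 - discount)  # discounted portion
    realtime_dollars = 1 - batch_share            # full-price portion
    total = batch_dollars + realtime_dollars
    if total == 0:
        return 0.0, 0.0
    return batch_dollars / total, realtime_dollars / total


batch_pct, realtime_pct = dollar_split(0.70)
print(f"batch {batch_pct:.1%}, realtime {realtime_pct:.1%}")
```

With 70% of requests on batch, batch accounts for only about 53.8% of the dollars, because each batch request costs half as much.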

Next actions

Typical batch completion is 1-4 hours even though the SLA is 24h — design for worst case but expect fast turnaround. Stack with routing (40-70% off) and caching (20-40% off) — all three compose multiplicatively. At 1M+ requests/mo, batch API's higher rate limits often matter more than the discount itself.
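
A quick sketch of what "compose multiplicatively" means in practice. It assumes the three levers are independent and that routing and caching each apply to the whole bill, which will not hold exactly for every workload; the discount ranges are the ones quoted above.

```python
def stacked_savings(batch_share, routing_off, caching_off, batch_off=0.50):
    """Total fraction saved when each lever scales the remaining bill."""
    remaining = (
        (1 - routing_off)
        * (1 - caching_off)
        * (1 - batch_off * batch_share)  # batch only discounts its own share
    )
    return 1 - remaining


# Low and high ends of the quoted ranges, with the whole workload on batch.
low = stacked_savings(1.0, routing_off=0.40, caching_off=0.20)
high = stacked_savings(1.0, routing_off=0.70, caching_off=0.40)
print(f"stacked savings: {low:.0%} to {high:.0%}")
```

Under these assumptions the total lands between 76% and 91%. Adding the low-end percentages instead (40% + 20% + 50%) would give 110%, which is why the discounts cannot simply be summed.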

📊 Calculator at a glance


🎛 CALCULATOR

📦 Your workload

Pick a preset or estimate manually.

Model -

Monthly requests

Input tokens / request

Output tokens / request

Batch-tolerant portion of workload: 70%

% of requests that can wait up to 24 hours for a response. Use the preset above if you're unsure.

Does your latency budget allow batch?

Tap to self-classify. 🟢 = batch-friendly, 🟡 = maybe, 🔴 = realtime-only

📈 RESULTS

All realtime

Your mix

Monthly savings

Split bar: batch portion (50% off) · realtime portion (full price)

💡 Recommendations

📋 Same workload across all models

50% batch discount is uniform across providers - absolute savings scale with model price.

Model	All realtime	Your mix	All batch	Savings now
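
The comparison can be approximated with a short sketch that fills in the same columns. The model names and prices below are invented placeholders, not real provider pricing.

```python
# Hypothetical models: (input, output) price in USD per 1M tokens.
models = {"small-model": (0.25, 1.00), "large-model": (3.00, 15.00)}


def compare(models, monthly_requests, in_tok, out_tok, batch_share, discount=0.50):
    """Return rows of (model, all realtime, your mix, all batch, savings now)."""
    rows = []
    for name, (in_price, out_price) in models.items():
        all_realtime = (
            monthly_requests * (in_tok * in_price + out_tok * out_price) / 1_000_000
        )
        your_mix = all_realtime * (1 - discount * batch_share)
        all_batch = all_realtime * (1 - discount)
        rows.append((name, all_realtime, your_mix, all_batch, all_realtime - your_mix))
    return rows


rows = compare(models, 1_000_000, 1_000, 500, batch_share=0.70)
for name, realtime, mix, batch, savings in rows:
    print(f"{name}: ${realtime:,.2f} -> ${mix:,.2f} (saves ${savings:,.2f}, "
          f"{savings / realtime:.0%}); all batch ${batch:,.2f}")
```

Both placeholder models save the same 35% at a 70% batch share, but the dollar amounts differ ($262.50 vs $3,675.00 per month) because the starting prices differ.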


📋 What now?

Audit your call types — list every AI call and mark each user-blocking or not. The "not" bucket is your batch-eligible share; migrate the easy wins (reports, embeddings, moderation, summaries) first.
Design for the 24h SLA, expect faster — batch typically completes in 1-4 hours, but build for worst-case so nothing user-facing depends on it.
Then stack the other levers — routing (40-70% off) and caching (20-40% off) compose multiplicatively with batch, not additively.
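
For the "design for the 24h SLA" step, a polling loop with an explicit deadline is one possible shape. This is a provider-agnostic sketch: `get_status` is a hypothetical callable you would wrap around your provider's batch status call, and the status strings are illustrative rather than any specific API's values.

```python
import time


def wait_for_batch(get_status, deadline_s=24 * 60 * 60, poll_interval_s=300,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until the batch finishes or the deadline passes.

    get_status: hypothetical callable returning "completed", "failed",
    or "in_progress".
    """
    start = clock()
    while True:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        if clock() - start >= deadline_s:
            return "timed_out"  # caller decides: retry, alert, or fall back
        sleep(poll_interval_s)


# Demo with a fake status source and a no-op sleep so it returns immediately.
statuses = iter(["in_progress", "in_progress", "completed"])
result = wait_for_batch(lambda: next(statuses), sleep=lambda seconds: None)
print(result)
```

Keeping the timeout outcome explicit makes the worst case a handled branch rather than something a user-facing path silently waits on.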

📅 Book a cost-architecture review to apply this to your workload →


Go deeper

The calculator's an estimate. Want the real number?