Batch vs Realtime · for async workloads

Every major provider offers 50% off via batch API. Can you use it?

OpenAI, Anthropic, Google, Mistral, and Bedrock all offer ~50% off for async requests with a 24-hour SLA. The discount is flat, so the real question is what fraction of your workload can tolerate the latency.

Pricing verified: 2026-06-03 · 37/106 models support batch · 50% discount · 24h SLA
What this calculator does

Every major provider offers 50% off for batch API requests with a 24-hour SLA. See how much you would save — based on what fraction of your workload can wait.

Why use it
  • The discount is uniform (50%) across OpenAI, Anthropic, Google, Mistral, Bedrock — no vendor shopping needed
  • Your only real decision is: what % of my workload is actually async-tolerant?
  • Batch API has much higher rate limits than realtime — often the only way to handle million-row backfills
  • Savings stack with routing and caching (each discount applies to the spend the others leave behind); all three together can hit 80%+ total
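A minimal sketch of how stacked discounts combine, assuming each one applies to whatever spend the previous one left. Only the 50% batch rate comes from this page; the 40% routing and 35% caching rates are hypothetical placeholders, and `combined_savings` is an illustrative name, not part of the calculator.

```python
from math import prod

def combined_savings(rates):
    """Total fractional savings when each rate applies to the remaining spend."""
    return 1 - prod(1 - r for r in rates)

# 0.50 = batch discount (from this page).
# 0.40 (routing) and 0.35 (caching) are hypothetical illustration values.
total = combined_savings([0.50, 0.40, 0.35])
print(f"{total:.1%}")  # → 80.5%
```

Summing the same three rates would give 125%, which is impossible; that is why the combination is multiplicative. The sketch also assumes the whole workload is batch-eligible, so a real total will be lower when only part of the traffic can wait.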

These are the inputs, outputs, and how you can use this calculator for your AI workloads.

📥 Inputs you provide
  • Model: Pick the model you run
  • Monthly requests: Total monthly call volume
  • Input tokens / request: Average input size
  • Output tokens / request: Average output size
  • Batch-tolerant portion: Share of work that can wait
📤 Outputs you get
  • All realtime: Cost with everything realtime
  • Your mix: Cost with the batch share moved over
  • Monthly savings: Dollars saved per month
  • Headroom: Savings if everything eligible moved
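The inputs and outputs above can be sketched as a small function. This is an illustration under stated assumptions, not the calculator's actual code: the `Workload` type, the field names, and the example prices are hypothetical, and "Headroom" is interpreted here as the savings if the entire workload moved to batch.

```python
from dataclasses import dataclass

BATCH_DISCOUNT = 0.50  # flat batch discount stated on this page

@dataclass
class Workload:
    monthly_requests: int
    input_tokens_per_request: float
    output_tokens_per_request: float
    batch_fraction: float         # batch-tolerant portion, 0.0 to 1.0
    input_price_per_mtok: float   # USD per 1M input tokens (hypothetical)
    output_price_per_mtok: float  # USD per 1M output tokens (hypothetical)

def estimate(w: Workload) -> dict:
    # Realtime cost of a single request at list price.
    per_request = (
        w.input_tokens_per_request * w.input_price_per_mtok
        + w.output_tokens_per_request * w.output_price_per_mtok
    ) / 1_000_000
    all_realtime = w.monthly_requests * per_request
    # Only the batch-tolerant share receives the discount.
    your_mix = all_realtime * (1 - BATCH_DISCOUNT * w.batch_fraction)
    # Assumption: headroom = savings with 100% of requests on batch.
    headroom = all_realtime * BATCH_DISCOUNT
    return {
        "all_realtime": all_realtime,
        "your_mix": your_mix,
        "monthly_savings": all_realtime - your_mix,
        "headroom": headroom,
    }

# Example with made-up prices of $3 / $15 per 1M input / output tokens.
result = estimate(Workload(1_000_000, 1_000, 500, 0.40, 3.00, 15.00))
print({k: round(v, 2) for k, v in result.items()})
```

With these example numbers the all-realtime cost is about $10,500 per month, the 40% mix costs about $8,400, and the monthly savings are about $2,100.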
🎯 Use your results to
⏱️
Decide what can wait

Classify each call type as user-blocking or not; the non-blocking share is your batch-eligible bucket
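One way to turn that audit into a number, as a rough sketch: list each call type with its monthly volume and whether a user is blocked on it, then take the non-blocking share of requests. The call types, volumes, and the `batch_eligible_share` helper are hypothetical examples, not data from the calculator.

```python
# (call type, monthly requests, user_blocking) -- all values are hypothetical.
CALL_TYPES = [
    ("chat_reply", 400_000, True),
    ("ticket_classification", 300_000, False),
    ("nightly_summary", 200_000, False),
    ("content_moderation", 100_000, False),
]

def batch_eligible_share(call_types):
    """Fraction of monthly requests where no user is waiting on the response."""
    total = sum(volume for _, volume, _ in call_types)
    if total == 0:
        return 0.0
    eligible = sum(volume for _, volume, blocking in call_types if not blocking)
    return eligible / total

share = batch_eligible_share(CALL_TYPES)
print(f"{share:.0%}")  # → 60%
```

This counts requests rather than dollars, matching the calculator's "% of requests" input. If call types differ a lot in token size, weighting by cost instead of volume would give a different share.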

💰
Quantify the cut

Real monthly and annual dollars from a 50% discount on the eligible portion

🆚
Compare across models

Same workload across every model — see if switching provider alongside batch saves more

🔌
Integrate with your agents

MCP available so agentic workflows can pull batch economics programmatically

👇 Now try the calculator below with your own AI workloads

📊 Calculator at a glance
Batch vs Real-time
🎛 CALCULATOR
📦 Your workload

Pick a preset or estimate manually.

Batch-tolerant portion of workload: 70%
The fraction of requests that can wait (up to ~24h) for a response, i.e. nothing is blocking a user in real time. This single number drives the whole result.
How to choose: Audit your call types: if the user is NOT waiting on it, it is batch-eligible. Classifications, summarization, embeddings, content moderation, and daily/weekly reports usually qualify. Most SaaS workloads land at 25-45%. Interactive chat and anything a user stares at stays realtime.
% of requests that can wait up to 24 hours for a response. Use the preset above if you're unsure.
Tap to self-classify. 🟢 = batch-friendly, 🟡 = maybe, 🔴 = realtime-only
📈 RESULTS
All realtime
Your mix
Monthly savings
Batch portion (50% off) · Realtime portion (full price)
💡 Recommendations
    📋 Same workload across all models

    50% batch discount is uniform across providers - absolute savings scale with model price.

    Model · All realtime · Your mix · All batch · Savings now
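As a rough illustration of why absolute savings scale with model price, the columns above can be computed for two models on the same workload. The model names and prices are hypothetical, not real price data, and `compare` is an illustrative helper rather than the calculator's implementation.

```python
BATCH_DISCOUNT = 0.50  # flat batch discount stated on this page

# Hypothetical models with USD prices per 1M tokens: (input, output).
PRICES = {
    "small-model": (0.50, 1.50),
    "large-model": (5.00, 15.00),
}

def compare(prices, monthly_requests, in_tokens, out_tokens, batch_fraction):
    rows = []
    for model, (p_in, p_out) in prices.items():
        all_realtime = monthly_requests * (in_tokens * p_in + out_tokens * p_out) / 1_000_000
        your_mix = all_realtime * (1 - BATCH_DISCOUNT * batch_fraction)
        rows.append({
            "model": model,
            "all_realtime": all_realtime,
            "your_mix": your_mix,
            "all_batch": all_realtime * (1 - BATCH_DISCOUNT),
            "savings_now": all_realtime - your_mix,
        })
    return rows

# Same workload for every model: 1M requests, 1,000 in / 500 out tokens, 40% batch.
for row in compare(PRICES, 1_000_000, 1_000, 500, 0.40):
    print(row["model"], round(row["all_realtime"], 2), round(row["savings_now"], 2))
```

In this example the larger model is priced at 10x the smaller one, so the same 40% batch share saves about $2,500 per month instead of about $250, even though the percentage saved is identical.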
    📋 What now?
    📅 Book a cost-architecture review to apply this to your workload →

    Go deeper

    Our playbooks on cutting this number.

    💾
    Prompt Cache ROI
    Stack caching on top
    🧭
    Multi-Model Router
    Stack routing on top
    🧮
    Cost Calculator
    Baseline sanity check
    🧩
    RAG Pipeline Cost
    Full-stack RAG pricing

    The calculator's an estimate. Want the real number?

    A 5-day Quickscan ($1,500) reviews your actual usage across every pillar — financial, reliability, governance, privacy, MLOps, observability — and returns a concrete savings plan.

    Book a Quickscan →