Buy vs Build

The agentic AI question every team is asking in 2026: should you pay an outcome-priced vendor (Intercom Fin, Sierra, Anthropic Managed Agents), or build it yourself on a token-priced API? This calculator answers with real V2.1 vendor data, three time horizons, and engineering cost included.

Why this matters: Vendors that charge per resolved outcome look expensive at first glance, until you add up your own build-track engineering hours, runtime costs, and tool calls. The break-even depends on your volume, your engineering team's loaded rate, and your achievable success rate. Most teams guess wrong.
📘 How to use this

Buy an outcome-priced vendor, or build on a token-priced API?

This estimates your monthly cost at a given outcome volume both ways — buy = the vendor’s per-outcome rate; build = your API tokens + runtime + tool calls + amortized engineering — then compares them across month 1, 6, and 24 so you can see the break-even.
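The comparison described above can be sketched in a few lines. This is a minimal illustration, not the calculator's actual code: the class, the function names, and every dollar figure are hypothetical, and engineering is passed in as an already-amortized monthly amount.

```python
from dataclasses import dataclass


@dataclass
class BuildInputs:
    """Per-attempt build-track costs in USD. All values are illustrative."""
    api_tokens: float    # input + output token cost per attempt
    runtime: float       # compute/runtime cost per attempt
    tool_calls: float    # web/tool call cost per attempt
    success_rate: float  # share of attempts resolved end-to-end (0-1)


def buy_monthly_cost(outcomes: int, vendor_rate_per_outcome: float) -> float:
    """Buy track: the vendor's per-outcome rate times resolved outcomes."""
    return outcomes * vendor_rate_per_outcome


def build_monthly_cost(outcomes: int, build: BuildInputs,
                       engineering_per_month: float) -> float:
    """Build track: variable cost of every attempt plus amortized engineering."""
    attempts = outcomes / build.success_rate  # failed attempts still cost money
    per_attempt = build.api_tokens + build.runtime + build.tool_calls
    return attempts * per_attempt + engineering_per_month


# Hypothetical numbers, chosen only to show how volume shifts the result.
example = BuildInputs(api_tokens=0.05, runtime=0.01, tool_calls=0.02, success_rate=0.5)
for outcomes in (200, 2_000, 20_000):
    buy = buy_monthly_cost(outcomes, vendor_rate_per_outcome=1.00)
    diy = build_monthly_cost(outcomes, example, engineering_per_month=3_500.0)
    print(f"{outcomes:>6} outcomes: buy ${buy:,.0f} vs build ${diy:,.0f}")
```

With these made-up inputs the buy track is cheaper at the two lower volumes and the build track is cheaper at the highest one, which is the pattern the lists below describe.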

✅ Buying usually wins when…

  • Volume is low or spiky (engineering can’t amortize)
  • You need value in days, not a build cycle
  • You have no ML/platform engineers to spare
  • The vendor’s success rate beats what you’d build

🔧 Building usually wins when…

  • Volume is high and steady (amortizes the build)
  • You need control, data residency, or custom logic
  • The per-outcome vendor rate is high for your volume
  • You already run the infra and have the team
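This is a back-of-napkin, directional estimate: tune the Advanced inputs and success rates to match your numbers. Buy-track rates come from verified 2026 public pricing. Because most vendors bill per seat or per page rather than per outcome, some per-outcome rates shown are representative rather than published prices.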

1. Your workload

What: the agent workload you’re pricing. It sets the per-outcome build defaults and which buy-track vendors apply. How to choose: pick the closest match. Research has no outcome-priced vendor in 2026, so it shows build-only.
Target successful outcomes per month
What: successful outcomes you need per month (resolved tickets, PRs reviewed, documents, etc.). How to choose: use your real historical count. Buy cost scales linearly with this; build cost is dominated by amortized engineering at low volume.

Per-outcome resource estimates (build track)

These determine your build-track API cost. Defaults are reasonable for the use case you picked. Override if you have better data.
% of input that's cached (10× cheaper)
Per outcome
Your model's resolution rate (0-1)
What: share of attempts your own build resolves end-to-end. How to choose: measure on a real sample — a lower rate means more attempts per success, raising build cost.
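To make the effect of the resolution rate concrete, here is a small sketch. It assumes attempts are independent and that a failed attempt costs the same as a successful one. Both are simplifications, and the function names are hypothetical.

```python
def attempts_per_success(resolution_rate: float) -> float:
    """Expected attempts needed for one resolved outcome."""
    if not 0 < resolution_rate <= 1:
        raise ValueError("resolution_rate must be in (0, 1]")
    return 1 / resolution_rate


def cost_per_successful_outcome(cost_per_attempt: float,
                                resolution_rate: float) -> float:
    """Spread the cost of failed attempts over the successful ones."""
    return cost_per_attempt * attempts_per_success(resolution_rate)


# Same per-attempt cost, two hypothetical resolution rates.
for rate in (0.8, 0.5):
    cost = cost_per_successful_outcome(0.08, rate)
    print(f"rate {rate:.0%}: ${cost:.3f} per successful outcome")
```

Halving the resolution rate doubles the variable cost of each success, so this input deserves a measured value rather than a guess.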
Vendor's claimed rate (0-1)
What: the vendor’s resolved-outcome rate (they bill per resolved outcome). How to choose: use the audited rate, not the marketing number — ~50–67% is typical for support agents.

Engineering cost (build track only)

Self-built systems require engineering investment. We amortize the initial build over the time horizon plus monthly maintenance.
Initial build: ~2 weeks for working v1
Monthly maintenance: tweaks, prompt tuning, monitoring
Loaded hourly rate: salary + benefits + overhead
What: fully-loaded hourly cost of the engineer building & maintaining the build track. How to choose: base salary + benefits + overhead, often 1.4–2× base pay.
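The amortization can be sketched as follows. The function names are hypothetical, "~2 weeks" is read as 80 hours (assuming 40-hour weeks), and the 10 maintenance hours and $150 hourly rate are placeholders rather than calculator defaults.

```python
def loaded_hourly_rate(base_hourly_pay: float,
                       overhead_multiplier: float = 1.7) -> float:
    """Fully loaded rate; the text suggests a multiplier of roughly 1.4-2x."""
    return base_hourly_pay * overhead_multiplier


def engineering_monthly_cost(initial_build_hours: float,
                             maintenance_hours_per_month: float,
                             hourly_rate: float,
                             horizon_months: int) -> float:
    """Initial build spread evenly over the horizon, plus monthly maintenance."""
    amortized_build = initial_build_hours * hourly_rate / horizon_months
    maintenance = maintenance_hours_per_month * hourly_rate
    return amortized_build + maintenance


# Illustrative inputs: 80 build hours, 10 maintenance hours, $150/hour loaded.
for horizon in (1, 6, 24):
    monthly = engineering_monthly_cost(80, 10, 150.0, horizon)
    print(f"month {horizon:>2}: ${monthly:,.0f} engineering per month")
```

Over long horizons the maintenance term dominates, because the one-off build cost shrinks as it is spread over more months while maintenance does not.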
📋 Example Workload - change any field to see your actual buy-vs-build verdict
RECOMMENDATION

Track-by-track

Three time horizons

Engineering cost amortizes over time. Build looks expensive at month 1; pay-back curve depends on volume.

Top 5 build-track vendors

Plan | Vendor | Monthly cost (API only)
🧮 What’s in the numbers

Counted

  • Buy: vendor per-outcome rate × your volume (+ any plan minimum)
  • Build: API input/output tokens, runtime, web/tool calls per outcome
  • Build: engineering — initial build amortized over the horizon + monthly maintenance

Not counted

  • Vendor seat/platform add-ons & data-prep / compliance
  • The cost of unresolved outcomes still reaching humans
  • Negotiated discounts & annual commitments
Reading it: the break-even month is where build’s amortizing line drops below the flat buy line. Low or spiky volume favors buy; high steady volume favors build. This is a directional estimate — confirm with real vendor quotes before deciding.
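One way to locate that break-even month is to step through the horizon and stop at the first month where the amortized build cost falls below the flat buy cost. The sketch below assumes constant monthly volume and uses hypothetical names and figures.

```python
from typing import Optional


def break_even_month(buy_monthly: float,
                     build_variable_monthly: float,
                     initial_build_cost: float,
                     monthly_maintenance_cost: float,
                     max_months: int = 24) -> Optional[int]:
    """First horizon (in months) at which build is cheaper per month than buy.

    Returns None if build never drops below buy within max_months.
    """
    for month in range(1, max_months + 1):
        build_monthly = (build_variable_monthly
                         + monthly_maintenance_cost
                         + initial_build_cost / month)  # amortizing term
        if build_monthly < buy_monthly:
            return month
    return None


# Illustrative: build's running costs are below buy, so it eventually pays back.
print(break_even_month(5_000.0, 800.0, 24_000.0, 1_200.0))
# Illustrative: build's running costs alone exceed buy, so it never pays back.
print(break_even_month(990.0, 160.0, 12_000.0, 1_000.0))
```

If build's running cost (variable plus maintenance) is already above the buy cost, no horizon produces a break-even, which is the low-volume case described above.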
📖 Data sources & methodology
161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

  • All prices are USD per 1 million tokens, current as of 2026-06-05.
  • Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
  • Batch API discounts are 50% off standard rates across providers that offer Batch mode.
  • Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
  • Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
  • Long-context pricing tiers apply when the input exceeds the model's long-context threshold.
  • Embedding prices are input-only (no output tokens generated).
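As a rough illustration of how these conventions combine into a per-attempt token cost, consider the sketch below. The function, its parameters, and the prices are hypothetical. Whether batch and cache discounts stack is provider-specific, so applying both together here is an assumption made only to keep the example short.

```python
def token_cost_usd(input_tokens: int,
                   output_tokens: int,
                   input_price_per_m: float,
                   output_price_per_m: float,
                   cached_fraction: float = 0.0,
                   cached_discount: float = 0.90,
                   batch: bool = False,
                   residency_multiplier: float = 1.0) -> float:
    """Token cost of one attempt, with prices quoted in USD per 1M tokens."""
    cached_tokens = input_tokens * cached_fraction
    fresh_tokens = input_tokens - cached_tokens
    cached_price = input_price_per_m * (1 - cached_discount)
    cost = (fresh_tokens * input_price_per_m
            + cached_tokens * cached_price
            + output_tokens * output_price_per_m) / 1_000_000
    if batch:
        cost *= 0.5  # 50% Batch discount (stacking with caching is assumed)
    return cost * residency_multiplier  # e.g. 1.1 for a regional surcharge


# Hypothetical model priced at $3 input / $15 output per 1M tokens.
base = token_cost_usd(10_000, 1_000, 3.0, 15.0)
half_cached = token_cost_usd(10_000, 1_000, 3.0, 15.0, cached_fraction=0.5)
print(f"no caching: ${base:.4f}   50% cached: ${half_cached:.4f}")
```

Because base rates exclude residency surcharges, `residency_multiplier` defaults to 1.0 and has to be set explicitly for regional deployments.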

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.
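The fallback rule for the last-verified date can be expressed as a short function. The real schemas of aicost_pricing_snapshots and aicost_crawler_runs are not shown on this page, so the sketch takes plain lists of dates for successful snapshots and successful crawler runs. That representation is an assumption.

```python
from datetime import date
from typing import Optional, Sequence


def last_verified(successful_snapshots: Sequence[date],
                  successful_crawler_runs: Sequence[date]) -> Optional[date]:
    """Prefer the newest successful snapshot; fall back to the newest crawler run."""
    if successful_snapshots:
        return max(successful_snapshots)
    if successful_crawler_runs:
        return max(successful_crawler_runs)
    return None  # vendor has never been verified


# A vendor with snapshots uses them even if a crawler run also exists.
print(last_verified([date(2026, 6, 4), date(2026, 6, 5)], [date(2026, 6, 1)]))
# A vendor without snapshots falls back to its latest crawler run.
print(last_verified([], [date(2026, 5, 30), date(2026, 6, 1)]))
```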

Source | Last verified | URL | Snapshot history
Anthropic | 2026-06-05 | https://www.anthropic.com/pricing | Daily snapshot since Sep 2023 · 578 days captured
Anthropic Docs | 2026-06-05 | https://platform.claude.com/docs/en/about-claude/pricing | Daily snapshot since Sep 2023 · 578 days captured
OpenAI | 2026-06-05 | https://openai.com/api/pricing/ | Daily snapshot since Sep 2023 · 579 days captured
Google AI | 2026-06-05 | https://ai.google.dev/gemini-api/docs/pricing | Daily snapshot since Dec 2023 · 554 days captured
Google Vertex | 2026-06-05 | https://cloud.google.com/vertex-ai/generative-ai/pricing | Daily snapshot since Dec 2023 · 554 days captured
DeepSeek | 2026-06-05 | https://api-docs.deepseek.com/quick_start/pricing | Daily snapshot since May 2024 · 493 days captured
xAI | 2026-06-05 | https://x.ai/api | Daily snapshot since Nov 2024 · 411 days captured
Mistral | 2026-06-05 | https://mistral.ai/pricing | Daily snapshot since Dec 2023 · 552 days captured
Cohere | 2026-06-05 | https://cohere.com/pricing | Daily snapshot since Sep 2023 · 578 days captured

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model | Field | Why it’s inferred
Anthropic / Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate; Anthropic publishes 90% cache-hit discount on this tier.
Anthropic / Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic / Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input; Anthropic documents uniform 50% Batch discount.
Anthropic / Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output; Anthropic documents uniform 50% Batch discount.
Anthropic / Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate; Anthropic 90% cache-hit discount convention.
OpenAI / GPT-5.4 Mini | cachedInput | Derived at 10% of input; OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI / GPT-5.4 Nano | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Nano | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Nano | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Pro | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift.
OpenAI / GPT-5.2 | batchInput | Derived at 50% of input.
OpenAI / GPT-5.2 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 | cachedInput | Derived at 10% of input.
OpenAI / GPT-5 | batchInput | Derived at 50% of input.
OpenAI / GPT-5 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.5 Pro | cachedInput | Derived at 10% of input. OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI / GPT-5.5 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5.5 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.2 Pro | cachedInput | Derived at 10% of input; pro-tier convention.
OpenAI / GPT-5.2 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5.2 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.1 | batchInput | Derived at 50% of input.
OpenAI / GPT-5.1 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 Nano | cachedInput | Derived at 10% of input.
OpenAI / GPT-5 Nano | batchInput | Derived at 50% of input.
OpenAI / GPT-5 Nano | batchOutput | Derived at 50% of output.
Google / Gemini 3 Flash | cachedInput | Derived at 10% of input; Google caching discount convention ~90%.
Google / Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Pro | cachedInput | Derived at 10% of input.
Google / Gemini 2.5 Flash | cachedInput | Derived at 10% of input.
Google / Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates.
Google / Gemini 2.0 Flash | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
xAI / Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base.
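The derivation rules in this table can be sketched as a fill-in step that also records which fields were inferred, so they can be marked with *. The field names cachedInput, batchInput, and batchOutput come from the table. The "input" and "output" keys, the function name, and the prices are assumptions for illustration.

```python
from typing import Dict, Set, Tuple


def fill_inferred(published: Dict[str, float],
                  cached_fraction: float = 0.10,
                  batch_fraction: float = 0.50) -> Tuple[Dict[str, float], Set[str]]:
    """Fill missing discount fields by convention and report which were inferred."""
    prices = dict(published)  # never mutate the vendor-published values
    inferred: Set[str] = set()
    conventions = {
        "cachedInput": published["input"] * cached_fraction,
        "batchInput": published["input"] * batch_fraction,
        "batchOutput": published["output"] * batch_fraction,
    }
    for field, value in conventions.items():
        if field not in prices:  # only fill what the vendor did not publish
            prices[field] = value
            inferred.add(field)
    return prices, inferred


# Hypothetical model that publishes only base input/output rates (USD per 1M).
prices, inferred = fill_inferred({"input": 2.0, "output": 8.0})
for field in ("input", "output", "cachedInput", "batchInput", "batchOutput"):
    mark = "*" if field in inferred else ""
    print(f"{field}: {prices[field]:.2f}{mark}")
```

Rows that use a different convention, such as the 25% cached rate listed for some models, would pass `cached_fraction=0.25` instead of the default.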

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old and new values for a full audit trail.
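A changelog of this kind can be produced by diffing consecutive snapshots. The sketch below is only an illustration: the actual layout of aicost_price_changelog is not described here, so the flat "model.field" keys and the entry shape are hypothetical.

```python
from typing import Dict, List, Optional


def diff_snapshots(old: Dict[str, float],
                   new: Dict[str, float]) -> List[Dict[str, Optional[object]]]:
    """Return one changelog entry per price that was added, removed, or changed."""
    changes: List[Dict[str, Optional[object]]] = []
    for key in sorted(set(old) | set(new)):
        old_value = old.get(key)  # None means the price did not exist before
        new_value = new.get(key)  # None means the price was removed
        if old_value != new_value:
            changes.append({"key": key, "old": old_value, "new": new_value})
    return changes


# Hypothetical snapshots keyed by "model.field", prices in USD per 1M tokens.
yesterday = {"example-model.input": 2.0, "example-model.output": 8.0}
today = {"example-model.input": 2.0, "example-model.output": 7.5,
         "example-model.cachedInput": 0.2}
for entry in diff_snapshots(yesterday, today):
    print(entry)
```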