Buy vs Build

The agentic AI question every team is asking in 2026: should you pay an outcome-priced vendor (Intercom Fin, Sierra, Anthropic Managed Agents) or build it yourself on a token-priced API? This calculator answers it using V2.1 vendor data, three time horizons, and engineering cost.


Why this matters: vendors that charge per resolved outcome look expensive at first glance, until you add up the build track's engineering hours, runtime costs, and tool calls. The break-even point depends on your volume, your engineering team's loaded rate, and the success rate you can actually achieve, which is why a gut-feel answer is easy to get wrong.

📘 How to use this

Buy an outcome-priced vendor, or build on a token-priced API?

This estimates your monthly cost at a given outcome volume both ways (buy = the vendor’s per-outcome rate; build = your API tokens + runtime + tool calls + amortized engineering), then compares them at months 1, 6, and 24 so you can see the break-even.
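The comparison above can be sketched in a few lines. This is a minimal Python sketch, not the calculator's actual code: it assumes the build track's per-outcome API cost (tokens, runtime, tool calls) is already known, and the class name, function names, and example figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BuildInputs:
    api_cost_per_outcome: float  # USD per outcome: tokens + runtime + tool calls (assumed precomputed)
    build_hours: float           # initial build effort, in engineer-hours
    maintenance_hours: float     # engineer-hours per month
    loaded_rate: float           # fully loaded USD per engineer-hour


def buy_monthly(volume: int, vendor_rate: float) -> float:
    """Buy track: the vendor's per-outcome rate times monthly volume."""
    return volume * vendor_rate


def build_monthly(volume: int, b: BuildInputs, horizon_months: int) -> float:
    """Build track: API cost + initial build spread over the horizon + monthly maintenance."""
    amortized_build = b.build_hours * b.loaded_rate / horizon_months
    maintenance = b.maintenance_hours * b.loaded_rate
    return volume * b.api_cost_per_outcome + amortized_build + maintenance


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    inputs = BuildInputs(api_cost_per_outcome=0.12, build_hours=80,
                         maintenance_hours=10, loaded_rate=120)
    volume, vendor_rate = 5_000, 0.99
    for month in (1, 6, 24):
        buy = buy_monthly(volume, vendor_rate)
        build = build_monthly(volume, inputs, month)
        print(f"month {month:>2}: buy ${buy:,.0f}  build ${build:,.0f}")
```

With these made-up inputs, build costs more than buy at month 1 ($11,400 vs $4,950) and less at months 6 and 24 ($3,400 and $2,200), which is the amortization effect the three horizons are meant to expose.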

✅ Buying usually wins when…

Volume is low or spiky (engineering can’t amortize)
You need value in days, not a build cycle
You have no ML/platform engineers to spare
The vendor’s success rate beats what you’d build

🔧 Building usually wins when…

Volume is high and steady (amortizes the build)
You need control, data residency, or custom logic
The per-outcome vendor rate is high for your volume
You already run the infra and have the team

Back-of-napkin and directional: tune the Advanced inputs and success rates for your numbers. Buy-track rates are verified against 2026 public pricing; some are representative figures, because most vendors bill per seat or per page rather than per outcome.

1. Your workload

Use case

What: the agent workload you’re pricing. It sets the per-outcome build defaults and determines which buy-track vendors apply. How to choose: pick the closest match. The Research use case has no outcome-priced vendor in 2026, so it shows only the build track.

Monthly volume

Target successful outcomes per month

What: successful outcomes you need per month (resolved tickets, PRs reviewed, documents, etc.). How to choose: use your real historical count. Buy cost scales linearly with this; build cost is dominated by amortized engineering at low volume.
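To see why engineering dominates the build track at low volume, here is a minimal sketch that computes engineering's share of build's monthly cost. It assumes the initial build is amortized over a 24-month horizon plus monthly maintenance, as described under Engineering cost; the function name and every figure in the example are hypothetical.

```python
def engineering_share(volume: int, api_cost_per_outcome: float,
                      build_hours: float, maintenance_hours: float,
                      loaded_rate: float, horizon_months: int) -> float:
    """Fraction of monthly build-track cost that is engineering rather than API usage."""
    engineering = (build_hours * loaded_rate / horizon_months
                   + maintenance_hours * loaded_rate)
    api = volume * api_cost_per_outcome
    return engineering / (engineering + api)


# Hypothetical: $0.12/outcome API cost, 80 build hours, 10 maintenance hours/month,
# $120/hour loaded rate, amortized over 24 months.
for volume in (500, 5_000, 100_000):
    share = engineering_share(volume, 0.12, 80, 10, 120, 24)
    print(f"{volume:>7} outcomes/month: engineering is {share:.0%} of build cost")
```

With these placeholders, engineering is roughly 96% of build cost at 500 outcomes per month and about 12% at 100,000, while buy cost simply scales with volume.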

Per-outcome resource estimates (build track)

These determine your build-track API cost. Defaults are reasonable for the use case you picked. Override if you have better data.

Input tokens / outcome

Output tokens / outcome

Cache hit % on input

% of input that's cached (about 10× cheaper on most models)
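A minimal sketch of how the cache-hit percentage can enter the per-outcome token cost, assuming cached input is billed at 10% of the standard input rate (the convention most rows of the inferred-values table use). The function name is hypothetical and the prices are placeholders in USD per million tokens, not quotes for any specific model.

```python
def token_cost_per_outcome(input_tokens: int, output_tokens: int,
                           input_price: float, output_price: float,
                           cache_hit: float, cached_multiplier: float = 0.10) -> float:
    """USD per outcome; prices are USD per million tokens."""
    if not 0.0 <= cache_hit <= 1.0:
        raise ValueError("cache_hit must be between 0 and 1")
    cached = input_tokens * cache_hit
    uncached = input_tokens - cached
    # Cached tokens are billed at a fraction of the normal input price.
    input_cost = (uncached + cached * cached_multiplier) * input_price
    output_cost = output_tokens * output_price
    return (input_cost + output_cost) / 1_000_000


# Placeholder prices: $3/M input, $15/M output; 200k input and 5k output tokens per outcome.
print(token_cost_per_outcome(200_000, 5_000, 3.0, 15.0, cache_hit=0.0))
print(token_cost_per_outcome(200_000, 5_000, 3.0, 15.0, cache_hit=0.6))
```

With these placeholders, a 60% cache hit rate cuts the token cost from about $0.675 to about $0.351 per outcome. Models with a different cached rate (25% in some table rows) would pass a different `cached_multiplier`.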

Agent runtime (hours)

Per outcome

Web searches / outcome

Tool calls / outcome

Build success rate

Your model's resolution rate (0-1)

What: share of attempts your own build resolves end-to-end. How to choose: measure on a real sample — a lower rate means more attempts per success, raising build cost.
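The success-rate effect can be sketched as dividing cost per attempt by the success rate. This minimal sketch assumes every attempt costs the same whether or not it succeeds, which is a simplification; the function names and figures are hypothetical.

```python
import math


def attempts_needed(target_successes: int, success_rate: float) -> int:
    """Attempts required to reach the target number of successful outcomes."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return math.ceil(target_successes / success_rate)


def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Effective cost of one successful outcome when failed attempts are also paid for."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical: $0.35 per attempt, 60% build success rate, 5,000 target successes.
print(attempts_needed(5_000, 0.6))
print(round(cost_per_success(0.35, 0.6), 4))
```

At a 60% success rate, 5,000 successes take 8,334 attempts, and each success effectively costs about $0.58 instead of $0.35.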

Buy success rate

Vendor's claimed rate (0-1)

What: the vendor’s resolved-outcome rate (they bill per resolved outcome). How to choose: use the audited rate, not the marketing number — ~50–67% is typical for support agents.

Engineering cost (build track only)

Self-built systems require engineering investment. We amortize the initial build over the time horizon and add monthly maintenance on top.

Initial build hours

~2 weeks for a working v1

Monthly maintenance hours

Tweaks, prompt tuning, monitoring

Engineer loaded rate / hour

Salary + benefits + overhead

What: fully-loaded hourly cost of the engineer building & maintaining the build track. How to choose: base salary + benefits + overhead, often 1.4–2× base pay.
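A minimal sketch of turning a base salary into the loaded hourly rate, assuming an overhead multiplier in the 1.4× to 2× range given above and 2,080 working hours per year (40 hours × 52 weeks; this is an assumption, so adjust it for paid time off). The function name and salary are hypothetical.

```python
def loaded_hourly_rate(base_salary: float, multiplier: float = 1.4,
                       hours_per_year: float = 2_080) -> float:
    """Fully loaded USD per hour: base salary x overhead multiplier / working hours."""
    if multiplier < 1.0:
        raise ValueError("multiplier should be >= 1 (overhead only adds cost)")
    return base_salary * multiplier / hours_per_year


# Hypothetical $150,000 base salary at both ends of the 1.4x to 2x range.
for m in (1.4, 2.0):
    print(f"{m}x: ${loaded_hourly_rate(150_000, m):.2f}/hour")
```

For a $150,000 base salary this gives roughly $101 to $144 per hour, a wide enough spread to move the break-even month on its own.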

RECOMMENDATION

Track-by-track

Three time horizons

Engineering cost amortizes over time, so build looks expensive at month 1; how quickly it pays back depends on volume.

Top 5 build-track vendors

Plan	Vendor	Monthly cost (API only)

🎯 Use this result to

⚖️ Get a clear verdict: Buy, Build, or Tie, decided by the math on your numbers
📅 Know the break-even month: when building pays for itself versus paying a vendor
📊 Justify it to leadership: side-by-side 3-year TCO with the reasoning shown
🔌 Integrate with your AI agents: MCP access is available for procurement and build planning


🧮 What’s in the numbers

Counted

Buy: vendor per-outcome rate × your volume (+ any plan minimum)
Build: API input/output tokens, runtime, web/tool calls per outcome
Build: engineering — initial build amortized over the horizon + monthly maintenance

Not counted

Vendor seat or platform add-ons, data preparation, and compliance work
The cost of unresolved outcomes still reaching humans
Negotiated discounts & annual commitments

Reading it: the break-even month is where build’s amortizing line drops below the flat buy line. Low or spiky volume favors buy; high, steady volume favors build. This is a directional estimate, so confirm it with real vendor quotes before deciding.
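The break-even month can be found by stepping through horizons until build's amortized monthly cost drops below the flat buy cost. This is a minimal sketch, assuming buy cost and build's API and maintenance costs stay constant month to month; the function name and all figures are hypothetical.

```python
from typing import Optional


def break_even_month(buy_monthly: float, api_monthly: float,
                     maintenance_monthly: float, initial_build_cost: float,
                     max_months: int = 36) -> Optional[int]:
    """First horizon (in months) at which build's amortized monthly cost is below buy.

    Returns None if build never gets cheaper within max_months.
    """
    for month in range(1, max_months + 1):
        build = api_monthly + maintenance_monthly + initial_build_cost / month
        if build < buy_monthly:
            return month
    return None


# Hypothetical: buy $4,950/mo; build API $600/mo, maintenance $1,200/mo, $9,600 initial build.
print(break_even_month(4_950, 600, 1_200, 9_600))
# Low volume (buy $495/mo, API $60/mo): maintenance alone exceeds buy, so build never wins.
print(break_even_month(495, 60, 1_200, 9_600))
```

The first call returns 4 and the second returns None. A closed form also exists: build wins once the month exceeds `initial_build_cost / (buy - api - maintenance)`, and never if that denominator is zero or negative. The loop is kept for readability.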

Vendor	Model	Field	Why it’s inferred
Anthropic	Claude Sonnet 4.6	cachedInput	Derived at 10% of input rate; Anthropic publishes a 90% cache-hit discount on this tier.
Anthropic	Claude Sonnet 4.5	cachedInput	Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic	Claude Sonnet 4.5	batchInput	Derived at 50% of standard input; Anthropic documents a uniform 50% Batch discount.
Anthropic	Claude Sonnet 4.5	batchOutput	Derived at 50% of standard output; Anthropic documents a uniform 50% Batch discount.
Anthropic	Claude Haiku 4.5	cachedInput	Derived at 10% of input rate; Anthropic 90% cache-hit discount convention.
OpenAI	GPT-5.4 Mini	cachedInput	Derived at 10% of input; OpenAI documents an automatic 90% discount on cache hits across the GPT-5.x tier.
OpenAI	GPT-5.4 Nano	cachedInput	Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI	GPT-5.4 Nano	batchInput	Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI	GPT-5.4 Nano	batchOutput	Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI	GPT-5.4 Pro	cachedInput	Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI	GPT-5.4 Pro	batchInput	Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI	GPT-5.4 Pro	batchOutput	Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI	GPT-5.2	cachedInput	Derived at 10% of input; no residency uplift.
OpenAI	GPT-5.2	batchInput	Derived at 50% of input.
OpenAI	GPT-5.2	batchOutput	Derived at 50% of output.
OpenAI	GPT-5	cachedInput	Derived at 10% of input.
OpenAI	GPT-5	batchInput	Derived at 50% of input.
OpenAI	GPT-5	batchOutput	Derived at 50% of output.
OpenAI	GPT-5.5 Pro	cachedInput	Derived at 10% of input; OpenAI does not publish a cached rate for *-pro models, so the family convention is used.
OpenAI	GPT-5.5 Pro	batchInput	Derived at 50% of input.
OpenAI	GPT-5.5 Pro	batchOutput	Derived at 50% of output.
OpenAI	GPT-5.2 Pro	cachedInput	Derived at 10% of input; pro-tier convention.
OpenAI	GPT-5.2 Pro	batchInput	Derived at 50% of input.
OpenAI	GPT-5.2 Pro	batchOutput	Derived at 50% of output.
OpenAI	GPT-5.1	batchInput	Derived at 50% of input.
OpenAI	GPT-5.1	batchOutput	Derived at 50% of output.
OpenAI	GPT-5 Pro	batchInput	Derived at 50% of input.
OpenAI	GPT-5 Pro	batchOutput	Derived at 50% of output.
OpenAI	GPT-5 Nano	cachedInput	Derived at 10% of input.
OpenAI	GPT-5 Nano	batchInput	Derived at 50% of input.
OpenAI	GPT-5 Nano	batchOutput	Derived at 50% of output.
Google	Gemini 3 Flash	cachedInput	Derived at 10% of input; Google caching discount convention (~90%).
Google	Gemini 3.1 Flash-Lite	cachedInput	Derived at 10% of input; Google caching convention.
Google	Gemini 3.1 Flash-Lite	batchInput	Derived at 50% of input; Google Batch API uniform 50% discount.
Google	Gemini 3.1 Flash-Lite	batchOutput	Derived at 50% of output; Google Batch API uniform 50% discount.
Google	Gemini 2.5 Pro	cachedInput	Derived at 10% of input.
Google	Gemini 2.5 Flash	cachedInput	Derived at 10% of input.
Google	Gemini 2.5 Flash-Lite	cachedInput	Derived at 10% of input; Google caching convention.
Google	Gemini 2.5 Flash-Lite	batchInput	Derived at 50% of input; Google Batch API uniform 50% discount.
Google	Gemini 2.5 Flash-Lite	batchOutput	Derived at 50% of output; Google Batch API uniform 50% discount.
Google	Gemini 2.0 Flash	cachedInput	Derived at 25% of input, per Google 2.0 family caching rates.
Google	Gemini 2.0 Flash	batchInput	Derived at 50% of input; Google Batch API uniform 50% discount.
Google	Gemini 2.0 Flash	batchOutput	Derived at 50% of output; Google Batch API uniform 50% discount.
Google	Gemini 2.0 Flash-Lite	cachedInput	Derived at 10% of input; Google caching convention.
Google	Gemini 2.0 Flash-Lite	batchInput	Derived at 50% of input; Google Batch API uniform 50% discount.
Google	Gemini 2.0 Flash-Lite	batchOutput	Derived at 50% of output; Google Batch API uniform 50% discount.
xAI	Grok 4 (legacy)	cachedInput	Extrapolated at 25% of base.
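Most derivations in the inferred-values table follow two conventions: cached input at 10% of the input rate (25% for the Gemini 2.0 Flash and Grok 4 rows) and batch pricing at 50% of the standard rate. A minimal sketch of that arithmetic follows; the function name is hypothetical and the prices are placeholders in USD per million tokens, not any vendor's published rates.

```python
def derive_inferred_rates(input_rate: float, output_rate: float,
                          cached_fraction: float = 0.10,
                          batch_fraction: float = 0.50) -> dict[str, float]:
    """Fill in unpublished rates from published ones (USD per million tokens)."""
    return {
        "cachedInput": input_rate * cached_fraction,
        "batchInput": input_rate * batch_fraction,
        "batchOutput": output_rate * batch_fraction,
    }


# Placeholder published rates: $3.00/M input, $15.00/M output.
print(derive_inferred_rates(3.00, 15.00))
# A row using the 25% cached convention instead of 10%.
print(derive_inferred_rates(1.00, 4.00, cached_fraction=0.25))
```

Keeping the fractions as parameters makes it easy to replace an inferred value once a vendor publishes the real rate.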

Methodology

Primary sources

Inferred values (marked with * in calculator tables)