Guides → Playground & Guide → Agent Loop Cost - Multi-Turn Agent Budget with Runaway Risk

Agent Loop Cost - Multi-Turn Agent Budget with Runaway Risk

Meet Aisha Patel. Staff Engineer building a multi-step research agent. "Each task takes 4-8 LLM calls. What does that actually cost - and what happens when an agent loops forever?"

🔥 One bad prompt last week ran 47 loops before timeout. Bill spike: $340 for one task.

The story

Agent loops compound. Each call has accumulated context from previous calls. Turn 1: 2K tokens. Turn 5: 8K tokens. Turn 10: 20K tokens. By turn 15, you're paying premium for context the model can barely use.

Aisha's research agent typically does 4-8 calls per task. Average task: 18K total input + 3K output. At Sonnet pricing, ~$0.10 per task. 1,000 tasks/day = $3K/mo. Looks fine - until one task ran 47 loops at 200K tokens accumulated context, costing $340.

Two costs to model: typical-case (4-8 turns, well-behaved) and runaway-case (loop until timeout, 30-60 turns). The gap between them is your operational risk. Hard limits and circuit breakers make this manageable.

📊 CALCULATOR AT A GLANCE

🚀 Open the full calculator ✉️ Email [email protected]

About this calculator: Agent Loop Cost - Multi-Turn Agent Budget with Runaway Risk

Multi-turn agents (ReAct, AutoGPT, function-calling) have compounding token costs. Model the per-task cost + runaway risk before deploying. Live pricing across 17 vendors.

Inputs you control

Input	Impact on result	Range	Typical
Average turns per task	Tool-using agents typically do 3-10 turns. Research agents 8-20. Long-running workflows 20+. ReAct-style is heavier than direct prompting.	1 – 30	6
Tokens per turn (avg)	Includes system + tools + accumulated context. Grows each turn - start ~2K, by turn 10 often ~10K+. Use a midpoint estimate.	500 – 50K	3000
Tasks per day	How many full agent tasks (not turns) the system completes daily.	10 – 50K	1000
Runaway probability per task (%)	Fraction of tasks that hit the loop limit (30-60 turns). 0.5% is typical with basic guardrails. 0% means perfect circuit breakers (rare). 2%+ means you have real reliability problems.	0 – 5	0.5

Outputs computed for you · model: `token`

Output	How inputs affect it
Monthly cost ($)	computed from inputs
Annual cost ($)	monthlyUsd × 12

Below: live sliders. Move them to see numbers change in real time. * Output uses the generic compute model — for precise numbers use the full calculator below.

What you're looking at

Each input shapes your cost. Move the slider — see the impact.

Average turns per task 6

Tool-using agents typically do 3-10 turns. Research agents 8-20. Long-running workflows 20+. ReAct-style is heavier than direct prompting.

Estimated: —

Tokens per turn (avg) 3,000

Includes system + tools + accumulated context. Grows each turn - start ~2K, by turn 10 often ~10K+. Use a midpoint estimate.

Estimated: —

Tasks per day 1,000

How many full agent tasks (not turns) the system completes daily.

Estimated: —

Runaway probability per task (%) 0.5

Fraction of tasks that hit the loop limit (30-60 turns). 0.5% is typical with basic guardrails. 0% means perfect circuit breakers (rare). 2%+ means you have real reliability problems.

Estimated: —

Ready to run the numbers?

Open the full calculator — pick a model, enter your tokens, see per-call, daily, monthly, and annual cost.

🚀 Open the full calculator →

Reading your result

Typical-case cost is the baseline. Per-task: tokens × per-turn × turns. Multiply by tasks/day × 30 = monthly. This is the budget you forecast.

Runaway-case cost is the tail risk. 0.5% of tasks hitting 50 turns at 200K accumulated tokens = a few hundred dollars/month in invisible spend. At 2% runaway rate, this becomes the dominant cost line. Most teams don't notice until the bill arrives.

Compare to non-agent alternatives. If this same task could be done with a single 30K-token call (skip ReAct, use long-context model), is the agent worth the 2-3× cost premium? Sometimes yes (better tool use, audit trail). Sometimes no (just looks impressive).

Set hard limits. Max turns + max tokens per task + max cost per task = your circuit breakers. Without these, one bad prompt can blow a daily budget.

What "good" looks like:

Single tool call (3 turns): $0.02-0.05/task on balanced tier
Standard agent (6-8 turns): $0.08-0.15/task - Aisha's case
Research agent (12-20 turns): $0.30-0.80/task
Long workflow (25+ turns): $1.50-5/task - usually merits a single long-context call instead

Best vendors for agentic workloads

Verified 20 hours ago

1

GPT-5 Mini

$0.250 in · $2.00 out ·
2

Command

$1.00 in · $2.00 out ·
3

devstral-2

$0.400 in · $2.00 out ·

Three real scenarios

Same calculator, three different team sizes. Click a tab to see how the numbers shift.

$2,637 / month ≈ $31,642 / year

Simple lookup - user query → API call → format response. 3 turns, 2K tokens each. 5K tasks/day. ~$450/mo. Runaway risk is low (well-defined boundary).

Healthy range: $300-600/mo

See inputs used

turnsPerTask: 3
tokensPerTurn: 2,000
tasksPerDay: 5,000
runawayProbability: 0.1
runawayMaxTurns: 50
runawayMaxTokens: 200,000
modelTier: balanced

Trade-offs

Cost isn't the only dimension. Click any constraint — see how recommendations change.

What matters most to you? Click any dimension — recommendations update.

Best fit for "cost":

Hard turn + token limits Eliminates runaway tail
Cheap-tier sub-agents Route reasoning to premium, tool calls to cheap

Agent cost optimization isn't about model choice - it's about turn count. Cutting turns 6→4 is a 33% savings. Cutting accumulated context per turn (smart context windowing) is another 30-50%.

Use cases

Pre-loaded scenarios for the most common applications. Click a tab to see realistic numbers — then the "Try this scenario" button to load it into the calculator above.

$703.15 / month ≈ $8,438 / year

Code agents have higher runaway risk than most - they can get stuck in lint-fix loops. Hard turn limit at 30 + token limit at 200K is essential.

Healthy range: $200-700/mo

See inputs used

turnsPerTask: 8
tokensPerTurn: 5,000
tasksPerDay: 200
runawayProbability: 1
runawayMaxTurns: 100
runawayMaxTokens: 500,000
modelTier: balanced

What this calculator can't tell you

Honest limitations — every model is wrong; some are useful. Where this one falls short:

Token-per-turn growth isn't modeled - actual context grows turn-by-turn (2K → 4K → 8K → ...). Calc uses average.
Doesn't model tool execution cost (browser opens, code runs) - that's separate infra.
Runaway probability is heuristic - real number depends on your prompts + guardrails.
Premium-for-plan / cheap-for-execute split isn't modeled - assumes uniform tier per task.

For these, use: Agentic Workflow Cost for full pipeline. Multi-Model Router for tiered routing.

Where to go next

Full agentic workflow (Claude Code, Cursor) →

Tasks × tokens × cache + runaway buffer. Production agents.

Route reasoning vs execution to different tiers →

Premium for planner, cheap for tool calls. 50%+ savings.

Long-context alternative analysis →

Sometimes a single 30K-token call beats 8 turns of 5K context.

Methodology

Source: https://platform.claude.com/docs/en/build-with-claude/agents
Extraction: Per-turn token accumulation modeled against 12 production agent traces.
Editorial gate: 8-layer defense — see aicost.ai/ai-cost-economics
Last verified: 6/4/2026, 8:00:00 PM

Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.

3 years of pricing history

Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.

View 3-year history for →

📖 Data sources & methodology 161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

All prices are USD per 1 million tokens, current as of 2026-06-05.
Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
Batch API discounts are 50% off standard rates across providers that offer Batch mode.
Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
Long-context pricing tiers apply when input exceeds model threshold.
Embedding prices are input-only (no output tokens generated).

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

Anthropic

2026-06-05

https://www.anthropic.com/pricing

Daily snapshot since Sep 2023 · 578 days captured

Anthropic Docs

2026-06-05

https://platform.claude.com/docs/en/about-claude/pricing

Daily snapshot since Sep 2023 · 578 days captured

OpenAI

2026-06-05

https://openai.com/api/pricing/

Daily snapshot since Sep 2023 · 579 days captured

Google AI

2026-06-05

https://ai.google.dev/gemini-api/docs/pricing

Daily snapshot since Dec 2023 · 554 days captured

Google Vertex

2026-06-05

https://cloud.google.com/vertex-ai/generative-ai/pricing

Daily snapshot since Dec 2023 · 554 days captured

DeepSeek

2026-06-05

https://api-docs.deepseek.com/quick_start/pricing

Daily snapshot since May 2024 · 493 days captured

xAI

2026-06-05

https://x.ai/api

Daily snapshot since Nov 2024 · 411 days captured

Mistral

2026-06-05

https://mistral.ai/pricing

Daily snapshot since Dec 2023 · 552 days captured

Cohere

2026-06-05

https://cohere.com/pricing

Daily snapshot since Sep 2023 · 578 days captured

Voyage AI

2026-06-05

https://docs.voyageai.com/docs/pricing

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model	Field	Why it’s inferred
Anthropic — Claude Sonnet 4.6	`cachedInput`	Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5	`cachedInput`	Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5	`batchInput`	Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5	`batchOutput`	Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5	`cachedInput`	Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini	`cachedInput`	Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2	`cachedInput`	Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.5 Pro	`cachedInput`	Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.2 Pro	`cachedInput`	Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.1	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.1	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Nano	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5 Nano	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Nano	`batchOutput`	Derived at 50% of output.
Google — Gemini 3 Flash	`cachedInput`	Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`cachedInput`	Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy)	`cachedInput`	Extrapolated at 25% of base.

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for full audit trail. Read the full methodology →

Agent Loop Cost - Multi-Turn Agent Budget with Runaway Risk

The story

About this calculator: Agent Loop Cost - Multi-Turn Agent Budget with Runaway Risk

Inputs you control

Outputs computed for you · model: `token`

What you're looking at

Ready to run the numbers?

Reading your result

Best vendors for agentic workloads

Three real scenarios

Trade-offs

Best fit for "cost":

Best fit for "hallucination":

Best fit for "compliance":

Best fit for "privacy":

Best fit for "latency":

Best fit for "vendor lock-in":

Best fit for "mlops overhead":

Use cases

What this calculator can't tell you

Where to go next

Methodology

3 years of pricing history

Methodology

Primary sources

Inferred values (marked with * in calculator tables)

The story

About this calculator: Agent Loop Cost - Multi-Turn Agent Budget with Runaway Risk

Inputs you control

Outputs computed for you · model: token

What you're looking at

Ready to run the numbers?

Reading your result

Best vendors for agentic workloads

Three real scenarios

Trade-offs

Best fit for "cost":

Best fit for "hallucination":

Best fit for "compliance":

Best fit for "privacy":

Best fit for "latency":

Best fit for "vendor lock-in":

Best fit for "mlops overhead":

Use cases

What this calculator can't tell you

Where to go next

Methodology

3 years of pricing history

Methodology

Primary sources

Inferred values (marked with * in calculator tables)

Outputs computed for you · model: `token`