Agent Loop Cost - Multi-Turn Agent Budget with Runaway Risk
Meet Aisha Patel, a Staff Engineer building a multi-step research agent. "Each task takes 4-8 LLM calls. What does that actually cost - and what happens when an agent loops forever?"
🔥 One bad prompt last week ran 47 loops before timeout. Bill spike: $340 for one task.
Agent loops compound. Each call has accumulated context from previous calls. Turn 1: 2K tokens. Turn 5: 8K tokens. Turn 10: 20K tokens. By turn 15, you're paying premium for context the model can barely use.
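The compounding can be sketched by summing the input tokens billed on every turn, since each call resends the accumulated context. This is a minimal sketch that assumes context grows by a fixed amount per turn; `base_tokens` and `growth_per_turn` are hypothetical parameters, and real agents (including the turn sizes quoted above) rarely grow exactly linearly.

```python
def cumulative_input_tokens(turns: int, base_tokens: int, growth_per_turn: int) -> int:
    """Total input tokens billed across a task when context grows each turn."""
    total = 0
    for turn in range(turns):
        # Assumed linear growth: turn 0 sends base_tokens, each later turn sends more.
        context = base_tokens + turn * growth_per_turn
        total += context
    return total


for n in (1, 5, 10, 15):
    print(n, cumulative_input_tokens(n, base_tokens=2_000, growth_per_turn=2_000))
```

Under these assumptions, going from 5 turns to 15 turns raises billed input from 30K to 240K tokens: three times the turns, eight times the input.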
Aisha's research agent typically does 4-8 calls per task. Average task: 18K total input + 3K output. At Sonnet pricing, ~$0.10 per task. 1,000 tasks/day = $3K/mo. Looks fine - until one task ran 47 loops at 200K tokens accumulated context, costing $340.
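The typical-case arithmetic above can be reproduced in a few lines. The per-million-token prices below are an assumption (illustrative Sonnet-class list prices of $3 input and $15 output); check current vendor pricing before relying on them.

```python
# Assumed list prices in USD per million tokens (illustrative only).
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one task, given its total input and output tokens."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


per_task = task_cost(18_000, 3_000)   # 18K input + 3K output
monthly = per_task * 1_000 * 30       # 1,000 tasks/day for 30 days
print(f"per task: ${per_task:.3f}, monthly: ${monthly:,.0f}")
```

With these assumed prices the result is roughly $0.10 per task and about $3K per month, matching the figures in the text.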
Two costs to model: typical-case (4-8 turns, well-behaved) and runaway-case (loop until timeout, 30-60 turns). The gap between them is your operational risk. Hard limits and circuit breakers make this manageable.
Multi-turn agents (ReAct, AutoGPT, function-calling) have compounding token costs. Model the per-task cost + runaway risk before deploying. Live pricing across 17 vendors.
Typical-case cost is the baseline. Per task: tokens per turn × turns, priced at the model's per-token rate. Multiply by tasks/day × 30 for the monthly figure. This is the budget you forecast.
Runaway-case cost is the tail risk. 0.5% of tasks hitting 50 turns at 200K accumulated tokens = a few hundred dollars/month in invisible spend. At 2% runaway rate, this becomes the dominant cost line. Most teams don't notice until the bill arrives.
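The tail can be estimated as an expected value: tasks per month × runaway rate × cost of one runaway task. The sketch below uses hypothetical inputs (1,000 tasks/day, $0.10 per typical task, $3.00 per runaway task) purely to show the shape of the calculation; substitute your own measured numbers.

```python
def monthly_runaway_cost(tasks_per_day: int, runaway_rate: float,
                         cost_per_runaway_usd: float, days: int = 30) -> float:
    """Expected monthly spend on tasks that loop until a limit or timeout."""
    return tasks_per_day * days * runaway_rate * cost_per_runaway_usd


# Hypothetical baseline: 1,000 tasks/day at $0.10 per well-behaved task.
typical_monthly = 1_000 * 30 * 0.10

for rate in (0.005, 0.02):
    tail = monthly_runaway_cost(1_000, rate, cost_per_runaway_usd=3.00)
    share = tail / (typical_monthly + tail)
    print(f"runaway rate {rate:.1%}: tail ${tail:,.0f}/mo, {share:.0%} of total spend")
```

The tail scales linearly with both the runaway rate and the per-runaway cost, so a turn or token cap that lowers the cost of one runaway task reduces the whole line proportionally.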
Compare to non-agent alternatives. If this same task could be done with a single 30K-token call (skip ReAct, use long-context model), is the agent worth the 2-3× cost premium? Sometimes yes (better tool use, audit trail). Sometimes no (just looks impressive).
Set hard limits. Max turns + max tokens per task + max cost per task = your circuit breakers. Without these, one bad prompt can blow a daily budget.
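A circuit breaker along these lines could look like the sketch below. `TaskBudget` and `BudgetExceeded` are hypothetical names, the default limits are illustrative, and the loop body stands in for a real LLM call.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a task hits one of its hard limits."""


@dataclass
class TaskBudget:
    max_turns: int = 15          # hard turn limit (illustrative default)
    max_tokens: int = 50_000     # hard cumulative token limit (illustrative default)
    max_cost_usd: float = 1.00   # hypothetical per-task cost ceiling
    turns: int = 0
    tokens: int = 0
    cost_usd: float = 0.0

    def check(self) -> None:
        """Call before every LLM request; raises once any limit is reached."""
        if self.turns >= self.max_turns:
            raise BudgetExceeded(f"turn limit reached ({self.turns})")
        if self.tokens >= self.max_tokens:
            raise BudgetExceeded(f"token limit reached ({self.tokens})")
        if self.cost_usd >= self.max_cost_usd:
            raise BudgetExceeded(f"cost limit reached (${self.cost_usd:.2f})")

    def record(self, tokens_used: int, cost_usd: float) -> None:
        """Call after every LLM response with that turn's usage."""
        self.turns += 1
        self.tokens += tokens_used
        self.cost_usd += cost_usd


def run_stuck_agent(budget: TaskBudget) -> str:
    """Simulates an agent that never finishes; the budget stops it."""
    try:
        while True:
            budget.check()
            budget.record(tokens_used=2_000, cost_usd=0.01)  # stand-in for a real call
    except BudgetExceeded as exc:
        return str(exc)


print(run_stuck_agent(TaskBudget()))
```

Checking before each request rather than after means the task never starts a call it cannot afford; the last recorded turn may still overshoot the token or cost limit by up to one turn's usage.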
The same calculator at three different team sizes shows how the numbers shift.
Simple lookup - user query → API call → format response. 3 turns, 2K tokens each. 5K tasks/day. ~$450/mo. Runaway risk is low (well-defined boundary).
Healthy range: $300-600/mo
Aisha's actual config. ~$500/mo expected case + $100/mo runaway tail. Total ~$600/mo. Runaway insurance: hard turn limit at 15, hard token limit at 50K - would cut tail to <$10/mo with no impact on legitimate tasks.
Healthy range: $400-800/mo typical + $50-150/mo runaway tail
Deep research agent - 20 turns of 8K tokens each = 160K total per task. 100 tasks/day on premium tier. ~$2.5K/mo with significant tail risk. Worth comparing to single-call long-context alternatives.
Healthy range: $1.5-4K/mo (consider premium-tier alternatives)
Cost isn't the only dimension. Each constraint below changes the recommendation.
Agent cost optimization isn't about model choice - it's about turn count. Cutting turns 6→4 is a 33% savings. Cutting accumulated context per turn (smart context windowing) is another 30-50%.
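One simple form of context windowing keeps the system prompt plus only the most recent messages that fit a token budget. The sketch below is one possible approach, not a prescribed method; `estimate_tokens` uses a rough 4-characters-per-token heuristic as an assumption, and a real implementation would use the vendor's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (~4 characters per token); an assumption, not a real tokenizer."""
    return max(1, len(text) // 4)


def window_context(system_prompt: str, history: list[str], max_tokens: int) -> list[str]:
    """Keep the system prompt plus the newest messages that fit within max_tokens."""
    remaining = max_tokens - estimate_tokens(system_prompt)
    kept: list[str] = []
    for message in reversed(history):      # walk from newest to oldest
        cost = estimate_tokens(message)
        if cost > remaining:
            break                          # older messages are dropped
        kept.append(message)
        remaining -= cost
    return [system_prompt] + list(reversed(kept))


history = ["a" * 400, "b" * 400, "c" * 400]   # three messages of ~100 tokens each
trimmed = window_context("s" * 40, history, max_tokens=250)
print(len(trimmed))
```

Dropping old turns outright is the bluntest option; summarizing them instead trades an extra call for better recall of earlier steps.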
Multi-agent patterns (planner + executor) let you use Opus for reasoning, Haiku for tool calls. Saves 50-70% with minimal quality loss.
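The saving from a planner/executor split depends on how many tokens stay on the premium tier. A minimal sketch, assuming hypothetical blended prices of $15 per million tokens for the premium model and $1 for the cheap one, with 30% of tokens handled by the planner:

```python
def blended_price(premium_price: float, cheap_price: float, premium_share: float) -> float:
    """Average price per million tokens when only premium_share of tokens use the premium tier."""
    return premium_share * premium_price + (1 - premium_share) * cheap_price


# All three inputs are hypothetical placeholders, not vendor quotes.
premium, cheap, planner_share = 15.00, 1.00, 0.30
blended = blended_price(premium, cheap, planner_share)
saving = 1 - blended / premium
print(f"blended: ${blended:.2f}/M tokens, saving vs all-premium: {saving:.0%}")
```

The saving shrinks quickly as the planner's share of tokens grows, so it is worth measuring that share on real traffic before committing to the split.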
Agent loops in regulated industries need turn-by-turn logging. This adds to your AI spend (logs are cheap, but the eng work is real).
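Turn-by-turn logging can be as simple as one JSON line per LLM call. The sketch below assumes a hypothetical record schema (`task_id`, `turn`, `model`, token counts, optional `tool_name`); actual audit requirements will dictate the fields and the storage backend.

```python
import io
import json
from datetime import datetime, timezone
from typing import Optional, TextIO


def log_turn(stream: TextIO, task_id: str, turn: int, model: str,
             input_tokens: int, output_tokens: int,
             tool_name: Optional[str] = None) -> dict:
    """Append one JSON line describing a single agent turn."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "task_id": task_id,
        "turn": turn,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "tool_name": tool_name,
    }
    stream.write(json.dumps(record) + "\n")
    return record


# In production the stream would be a file or log shipper; StringIO keeps the example self-contained.
buffer = io.StringIO()
log_turn(buffer, task_id="task-001", turn=1, model="example-model",
         input_tokens=2_000, output_tokens=300, tool_name="search")
print(buffer.getvalue().strip())
```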
Agent loops accumulate sensitive context. Enterprise tier with no-train must apply at every turn, not just the entrypoint.
Agents are slow. 6 turns × 2sec = 12sec. Set user expectations or build async UX. Parallel sub-agents can cut wall-clock time but multiply cost.
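The wall-clock difference between sequential turns and parallel sub-agents can be illustrated with `asyncio`. In this sketch `sub_agent` is a hypothetical stand-in that sleeps instead of calling a model, and the delay is scaled down to 0.05 seconds so the example runs quickly.

```python
import asyncio
import time


async def sub_agent(name: str, seconds: float) -> str:
    """Stand-in for one LLM call; sleeps instead of hitting an API."""
    await asyncio.sleep(seconds)
    return name


async def run_sequential(n: int, seconds: float) -> list:
    """Each call waits for the previous one, like turns in a single loop."""
    return [await sub_agent(f"agent-{i}", seconds) for i in range(n)]


async def run_parallel(n: int, seconds: float) -> list:
    """All calls are in flight at once; every one of them is still billed."""
    return list(await asyncio.gather(*(sub_agent(f"agent-{i}", seconds) for i in range(n))))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


seq_result, seq_elapsed = timed(run_sequential(6, 0.05))
par_result, par_elapsed = timed(run_parallel(6, 0.05))
print(f"sequential: {seq_elapsed:.2f}s, parallel: {par_elapsed:.2f}s")
```

Parallelism only helps when the sub-tasks are independent; turns that depend on the previous turn's output stay sequential regardless.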
OpenAI tool format ≠ Anthropic tool format. Switching vendors means rewriting tool definitions + retesting. Plan for it.
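As an illustration of the difference, an OpenAI Chat Completions function tool nests its schema under `function.parameters`, while an Anthropic Messages API tool puts it at the top level under `input_schema`. The converter below is a minimal sketch for tool definitions only; `openai_tool_to_anthropic` is a hypothetical helper, and it does not cover tool-call responses or tool results, which also differ between vendors.

```python
def openai_tool_to_anthropic(tool: dict) -> dict:
    """Convert one OpenAI Chat Completions function tool definition to Anthropic's shape."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # Both vendors use JSON Schema here; only the key name and nesting differ.
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }


openai_tool = {
    "type": "function",
    "function": {
        "name": "search_papers",  # hypothetical tool for illustration
        "description": "Search a paper index by keyword.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

print(openai_tool_to_anthropic(openai_tool))
```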
Agents fail subtly - wrong tool call, slightly off plan. Without an eval pipeline you discover issues months late through complaints. Budget MLOps headcount.
Pre-loaded scenarios for the most common applications, with realistic numbers.
Code agents have higher runaway risk than most - they can get stuck in lint-fix loops. Hard turn limit at 30 + token limit at 200K is essential.
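Hard limits catch a stuck agent eventually; a repeated-action check can catch it sooner. The sketch below flags a task when the same (tool, target) pair recurs several times within a recent window. `is_stuck` and its thresholds are hypothetical, and real loops may need fuzzier matching than exact equality.

```python
from collections import Counter


def is_stuck(actions: list, window: int = 6, max_repeats: int = 3) -> bool:
    """Return True if any identical action appears max_repeats times in the last `window` actions."""
    recent = actions[-window:]
    counts = Counter(recent)
    return any(count >= max_repeats for count in counts.values())


# A lint-fix loop: the agent keeps linting and editing the same file.
looping = [("lint", "a.py"), ("edit", "a.py")] * 3
progressing = [("lint", "a.py"), ("edit", "a.py"), ("lint", "b.py"), ("edit", "b.py"), ("test", "all")]

print(is_stuck(looping), is_stuck(progressing))
```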
Healthy range: $200-700/mo
Browser agents (Playwright + LLM) are turn-heavy and runaway-prone (page-load loops, infinite-scroll traps). The higher runaway probability in this scenario reflects that.
Healthy range: $1.2-2.5K/mo
Data agents that write+run+iterate Python. 10 turns, 7K tokens (data context grows). Lower runaway risk than browsers, higher than simple tools.
Healthy range: $500-1.2K/mo
Honest limitations — every model is wrong; some are useful. Where this one falls short:
For these, use: Agentic Workflow Cost for full pipeline. Multi-Model Router for tiered routing.
Tasks × tokens × cache + runaway buffer. Production agents.
Route reasoning vs execution to different tiers: premium for planner, cheap for tool calls. 50%+ savings.
Long-context alternative analysis: sometimes a single 30K-token call beats 8 turns of 5K context.
Author: Subu Vdaygiri, Founder & CEO of CloudIntelligence.ai. 17 years Fortune 100 (Ingram Micro, Siemens). Wharton CTO program · Kellogg CPO program · 10× AWS+Azure certified.
Why this matters: pricing for major vendors has dropped 40-90% in the last 24 months. A budget set 12 months ago is probably wrong by 30%+.
Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.
Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).
| Vendor / Model | Field | Why it’s inferred |
|---|---|---|
| Anthropic - Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate - Anthropic publishes 90% cache-hit discount on this tier. |
| Anthropic - Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6. |
| Anthropic - Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input - Anthropic documents uniform 50% Batch discount. |
| Anthropic - Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output - Anthropic documents uniform 50% Batch discount. |
| Anthropic - Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate - Anthropic 90% cache-hit discount convention. |
| OpenAI - GPT-5.4 Mini | cachedInput | Derived at 10% of input - OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier. |
| OpenAI - GPT-5.4 Nano | cachedInput | Derived at 10% of input - OpenAI 90% cache-hit convention. |
| OpenAI - GPT-5.4 Nano | batchInput | Derived at 50% of input - OpenAI Batch API uniform 50% discount. |
| OpenAI - GPT-5.4 Nano | batchOutput | Derived at 50% of output - OpenAI Batch API uniform 50% discount. |
| OpenAI - GPT-5.4 Pro | cachedInput | Derived at 10% of input - OpenAI 90% cache-hit convention. |
| OpenAI - GPT-5.4 Pro | batchInput | Derived at 50% of input - OpenAI Batch API uniform 50% discount. |
| OpenAI - GPT-5.4 Pro | batchOutput | Derived at 50% of output - OpenAI Batch API uniform 50% discount. |
| OpenAI - GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift. |
| OpenAI - GPT-5.2 | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5.2 | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5 | cachedInput | Derived at 10% of input. |
| OpenAI - GPT-5 | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5 | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5.5 Pro | cachedInput | Derived at 10% of input - OpenAI does not publish a cached rate for *-pro models; using the family convention. |
| OpenAI - GPT-5.5 Pro | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5.5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5.2 Pro | cachedInput | Derived at 10% of input - pro-tier convention. |
| OpenAI - GPT-5.2 Pro | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5.2 Pro | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5.1 | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5.1 | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5 Pro | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI - GPT-5 Nano | cachedInput | Derived at 10% of input. |
| OpenAI - GPT-5 Nano | batchInput | Derived at 50% of input. |
| OpenAI - GPT-5 Nano | batchOutput | Derived at 50% of output. |
| Google - Gemini 3 Flash | cachedInput | Derived at 10% of input - Google caching discount convention ~90%. |
| Google - Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input - Google caching convention. |
| Google - Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input - Google Batch API uniform 50% discount. |
| Google - Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output - Google Batch API uniform 50% discount. |
| Google - Gemini 2.5 Pro | cachedInput | Derived at 10% of input. |
| Google - Gemini 2.5 Flash | cachedInput | Derived at 10% of input. |
| Google - Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input - Google caching convention. |
| Google - Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input - Google Batch API uniform 50% discount. |
| Google - Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output - Google Batch API uniform 50% discount. |
| Google - Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates. |
| Google - Gemini 2.0 Flash | batchInput | Derived at 50% of input - Google Batch API uniform 50% discount. |
| Google - Gemini 2.0 Flash | batchOutput | Derived at 50% of output - Google Batch API uniform 50% discount. |
| Google - Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input - Google caching convention. |
| Google - Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input - Google Batch API uniform 50% discount. |
| Google - Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output - Google Batch API uniform 50% discount. |
| xAI - Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base. |
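
The derivation conventions stated above the table (cached input at 10% of base, Batch at 50% of base) amount to a small calculation. The sketch below is an illustration, not the site's actual code; `derive_rates` is a hypothetical helper and the base rates in the example are placeholders. The 25% cache fraction shows how exceptions such as Gemini 2.0 Flash or Grok 4 (legacy) could be handled.

```python
def derive_rates(input_rate: float, output_rate: float,
                 cache_fraction: float = 0.10, batch_fraction: float = 0.50) -> dict:
    """Infer cached and batch rates (USD per million tokens) from published base rates."""
    return {
        "cachedInput": input_rate * cache_fraction,
        "batchInput": input_rate * batch_fraction,
        "batchOutput": output_rate * batch_fraction,
    }


# Placeholder base rates, not taken from any vendor price sheet.
default_rates = derive_rates(input_rate=3.00, output_rate=15.00)
exception_rates = derive_rates(input_rate=3.00, output_rate=15.00, cache_fraction=0.25)

for field, value in default_rates.items():
    print(f"{field}: ${value:.2f}")
```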

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for full audit trail.