A guide for SMB founders, developers, finance leads, and journalists trying to understand the cost layer of the AI stack. We monitor 17 vendors continuously — here's what's changing, who it hits hardest, and how to navigate it.
Last week, our crawlers detected a 25% across-the-board price cut on Google's gemini-2.0-flash-lite: input, output, batch input, and batch output all dropped in unison. The change wasn't announced. There was no Google blog post, no press release, no notification email. The pricing page just quietly updated, and developers using that model saw their bill shrink the next morning.
This is what AI cost economics looks like in 2026. Prices change without warning. Models get renamed mid-cycle. New billing dimensions appear — context caching storage rates, hybrid seat-plus-overage plans, agentic workflow surcharges — and disappear from documentation just as quickly. The four companies that dominate frontier AI — Anthropic, OpenAI, Google, and Microsoft — are each restructuring pricing in different directions, on different timelines, with different vocabularies for the same concepts.
For most software-as-a-service categories, this would be unusual. For AI, it's now the baseline.
This is what the platform sees, in real time, across the major vendors. Each row links through to the affected vendor's full guide. Nothing here is announced — these are pricing-page deltas our crawlers detect, validate through three layers of sanity checks, and surface for human review before they hit our published guides.
| Model | Billing dimension | Price change |
|---|---|---|
| eleven-flash-v2-5 | Per-minute audio | $0.18 → $0.17 |
| mistral-large-3 | Input tokens | $2 → $0.5 |
| mistral-large-3 | Output tokens | $6 → $1.5 |
| imagine-api-image | Per-image (standard) | |
| gemini-3-pro-image | Batch output | $60 → $6 |
| gemini-2-5-flash-native-audio-preview-12-2025 | Input tokens | $3 → $0.5 |
| gemini-2-5-flash-native-audio-preview-12-2025 | Output tokens | $12 → $2 |
| gemini-3-1-flash-live-preview | Input tokens | $3 → $0.75 |
| gemini-3-1-flash-live-preview | Output tokens | $12 → $4.5 |
| gemini-3-1-flash-image | Per-image (standard) | $0.067 → $0.045 |
📡 Live data from our crawler. Want the full change history? View crawler health dashboard →
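To put the changelog rows above on a common scale, each old → new pair can be turned into a signed percentage change. The sketch below is only illustrative: the function names `percent_change` and `is_uniform_cut` and the 0.5-point tolerance are assumptions made here, not part of the crawler. The imagine-api-image row is left out because no prices are shown for it.

```python
# Price deltas copied from the changelog rows above (USD, old -> new).
# imagine-api-image is omitted: its row shows no prices.
deltas = [
    ("eleven-flash-v2-5", "Per-minute audio", 0.18, 0.17),
    ("mistral-large-3", "Input tokens", 2.0, 0.5),
    ("mistral-large-3", "Output tokens", 6.0, 1.5),
    ("gemini-3-pro-image", "Batch output", 60.0, 6.0),
    ("gemini-2-5-flash-native-audio-preview-12-2025", "Input tokens", 3.0, 0.5),
    ("gemini-2-5-flash-native-audio-preview-12-2025", "Output tokens", 12.0, 2.0),
    ("gemini-3-1-flash-live-preview", "Input tokens", 3.0, 0.75),
    ("gemini-3-1-flash-live-preview", "Output tokens", 12.0, 4.5),
    ("gemini-3-1-flash-image", "Per-image (standard)", 0.067, 0.045),
]


def percent_change(old: float, new: float) -> float:
    """Signed percentage change from old to new (negative means a price cut)."""
    if old <= 0:
        raise ValueError("old price must be positive")
    return (new - old) / old * 100.0


def is_uniform_cut(changes, tolerance: float = 0.5) -> bool:
    """True if every (old, new) pair is a cut of the same size.

    `tolerance` is in percentage points and is an assumed threshold.
    """
    pcts = [percent_change(old, new) for old, new in changes]
    if not pcts:
        return False
    return all(p < 0 for p in pcts) and max(pcts) - min(pcts) <= tolerance


for model, dimension, old, new in deltas:
    print(f"{model:48s} {dimension:22s} {percent_change(old, new):+7.1f}%")
```

Read this way, the two mistral-large-3 rows move together (both are 75% cuts), while the gemini-3-1-flash-live-preview input and output rows do not (75% versus 62.5%), which is the kind of difference an "in unison" check would flag.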
The last three years of AI pricing were not normal. They were a venture-capital-funded marketing campaign. Token prices fell roughly 80% from late 2022 through 2024 as vendors competed for developer mindshare. Frontier models were sold below the marginal cost of inference. Free tiers were generous. The implicit message to developers was: don't worry about cost, build something interesting.
That era is ending — and the math behind why is straightforward.
Anthropic, OpenAI, Google, and Microsoft have collectively committed to more than $1 trillion in AI infrastructure spending between 2024 and 2027. Microsoft's Azure capex alone is on pace to exceed $80 billion this fiscal year. Google's TPU buildout, OpenAI's Stargate partnership, Anthropic's compute commitments to Bedrock and Vertex — these are physical data centers, GPUs, power contracts, and cooling infrastructure that have to earn a return.
You don't recover a trillion-dollar capex bill by giving away tokens. The subsidies were always temporary. What's arriving now is the actual cost structure of running this technology — and unlike SaaS, where margins compress predictably as scale grows, AI workloads have variable costs that scale with usage in ways most buyers haven't priced in.
The result, for buyers: pricing has become both more expensive AND more complex at the same time. SaaS got cheaper as it got more complex. Cloud got cheaper as it got more complex. AI is going the other way.
Meanwhile, MIT NANDA's State of AI in Business 2025 found that 95% of enterprise generative-AI pilots fail to deliver measurable financial returns. Not because the models don't work — but because nobody priced the workload correctly before deploying it, and nobody updated the cost model when the vendor changed pricing three months later.
Enterprises have FinOps teams. They have procurement professionals. They have engineers whose job is to read pricing pages carefully. When Anthropic restructures its prompt caching to introduce 1-hour cache tiers separate from 5-minute tiers, an enterprise team notices within a day and updates internal cost models. When Microsoft Copilot pricing fragments into seven different tiers — Family, Personal, Business Basic, Business Standard, Business Premium, Apps for Business, E3/E5 add-ons — an enterprise has someone whose Tuesday meeting is to map all of that against existing contracts.
Small and medium businesses have none of this.
A typical SMB story goes like this: in 2023, one developer chose OpenAI because the docs were good. By 2024, the company had three engineers using a mix of OpenAI for chat, Anthropic for long-context tasks, and Google for embeddings. In 2025, they added an AI-coding assistant on individual subscriptions, plus a vector database, plus an observability tool. By 2026, the monthly AI bill is somewhere between $4,000 and $40,000, the founder has no idea which line items are essential, and the engineering team can't tell you whether a workload migration would save money or cost more, because the pricing changed twice in the last quarter.
This is the shadow IT problem in a new form. Procurement teams don't see the AI spend because much of it goes on personal cards reimbursed monthly. Finance teams see the line items but can't evaluate whether the vendor mix is rational. Engineering teams know the technical tradeoffs but rarely the cost ones. Nobody in the org has the full picture.
Each major vendor has restructured pricing in distinct ways over the last six to twelve months. No SMB engineering team has time to track all four in real time. Most don't realize the pricing changed until the credit card statement arrives.
| Vendor | Plan | Seat price | Overage starts at | Notes |
|---|---|---|---|---|
| Anthropic | Claude Pro | $17/mo | When seat credits exhaust | API credits roll over within billing period; team plan adds shared usage pools |
| GitHub Copilot | Pro / Pro+ | $19/mo | 300 / 1,500 premium requests | Pro+ tier ($39) covers Sonnet 4.6 + GPT-5 access |
| Cursor | Pro | $20/mo | 500 fast requests/month | Slow requests unlimited; fast tier for premium models |
| Microsoft Copilot | Business / Enterprise | $30/mo | Usage-based, beyond seat | 7-tier pricing (Family/Personal/Business/E3/E5). Quarterly manual audit. |
What no individual vendor will show you: this same comparison. Vendors won't put a competitor's seat price next to theirs. We will. Run the numbers for your team: Overage Forecaster →
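The seat-plus-overage structure in the table can be sketched as a small cost function. This is a rough model under stated assumptions, not the Overage Forecaster itself: the `SeatPlan` class, the per-request `overage_rate`, and the rule that allowances are per seat and not pooled are all hypothetical, and the table does not publish overage rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeatPlan:
    name: str
    seat_price: float        # USD per seat per month (from the table)
    included_requests: int   # metered requests bundled into each seat
    overage_rate: float      # USD per extra request: HYPOTHETICAL placeholder


def monthly_seat_cost(plan: SeatPlan, seats: int, requests_per_seat: int) -> float:
    """Seat fees plus overage, assuming allowances are per seat and not pooled."""
    extra = max(0, requests_per_seat - plan.included_requests)
    return seats * (plan.seat_price + extra * plan.overage_rate)


# A $20/mo seat with 500 included requests, as in the Cursor row;
# the $0.04 overage rate is invented for illustration only.
example = SeatPlan("Example plan", seat_price=20.0, included_requests=500, overage_rate=0.04)

print(monthly_seat_cost(example, seats=5, requests_per_seat=400))  # under the allowance
print(monthly_seat_cost(example, seats=5, requests_per_seat=800))  # 300 extra per seat
```

The point of the sketch is the shape of the curve: spend is flat until the allowance runs out, then grows linearly with usage, so the same team can see very different bills from month to month on an unchanged seat count.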
The most expensive misconception in AI deployment is that cost equals tokens. The actual cost stack of a production AI workload looks like this:
For a typical RAG pipeline or agentic workflow, the token cost is often less than half of the total operating cost. The other half is the surrounding infrastructure — and most of that infrastructure has its own pricing complexity, often opaque, often consumption-based, often poorly aligned with the workload you actually run.
This is why we built our 7-step TCO Wizard — it walks through every cost dimension, not just compute. Persona-tuned (CFO/CTO/Founder/PM) executive synthesis at the end.
A standard chat completion in 2024 used roughly 500 to 2,000 tokens of input and produced a similar amount of output. A user might run 50 of these per day. Daily cost: pennies.
A modern agentic workflow — Claude Code editing a multi-file project, an autonomous research agent traversing a knowledge base, a multi-step reasoning chain handling a customer support escalation — consumes 50,000 to 200,000 tokens per task. A team running 100 such tasks per day is looking at $15,000 to $60,000 per month, invisibly accumulating, often charged to whichever credit card a developer signed up with first.
This is not a hypothetical. The shift from "AI as feature" to "AI as workforce" — even at small scale — changes the cost equation by an order of magnitude. Most cost models built in 2023 are wrong by 2026.
| Model | Cost per workflow | Monthly cost at 100 workflows/day | Guide |
|---|---|---|---|
| Anthropic Claude Sonnet 4.6 | $0.45 | $1,350 | Guide → |
| OpenAI GPT-5 | $0.50 | $1,500 | Guide → |
| Google Gemini 3 Flash | $0.18 | $540 | Guide → |
| DeepSeek V4 | $0.08 | $240 | Guide → |
| Anthropic Haiku 4.5 | $0.10 | $300 | Guide → |
Illustrative figures based on published rates. Real workflow cost depends on input/output split, cache hit rate, batch eligibility — quantify for your scenario with the Agent Loop Cost calculator (architect view) or Agentic Workflow Cost (consumer view).
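The monthly column in the table follows from simple multiplication. A minimal sketch, assuming a 30-day month (which is what the table's own figures imply, since $0.45 × 100 × 30 = $1,350); the function name `project_monthly` is illustrative:

```python
# Per-workflow costs copied from the table above (USD, illustrative figures).
per_workflow_usd = {
    "Anthropic Claude Sonnet 4.6": 0.45,
    "OpenAI GPT-5": 0.50,
    "Google Gemini 3 Flash": 0.18,
    "DeepSeek V4": 0.08,
    "Anthropic Haiku 4.5": 0.10,
}


def project_monthly(per_workflow: float,
                    workflows_per_day: int = 100,
                    days_per_month: int = 30) -> float:
    """Monthly spend for a fixed daily workflow count (30-day month assumed)."""
    return per_workflow * workflows_per_day * days_per_month


for model, cost in per_workflow_usd.items():
    print(f"{model:30s} ${project_monthly(cost):>8,.0f}/month")
```

Changing `workflows_per_day` shows how quickly the gap between models widens: the ratio between the cheapest and most expensive row stays fixed, but the absolute monthly difference scales linearly with volume.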
We built aicost.ai because nobody else was doing this work credibly for the small-to-medium segment. Our coverage is structured around how decisions actually get made:
Browse by category. Each tool is built around a specific decision SMBs face — not vendor-by-vendor pricing, but use-case-by-use-case math. Most tools have a Quick mode (2-4 fields) and Advanced mode (full control). Mobile defaults to Quick.
For journalists evaluating us as a source, this section matters. AI pricing tracking is full of low-quality automated scrapers that publish whatever a language model extracts. We built three layers of defense against that failure mode:
When our crawler detects a vendor change, it appears in the changelog with full provenance. When it appears in a published guide, a human has signed off. We treat published pricing claims with the same care a financial publication treats earnings figures. Per-calculator methodology lives at /methodology.
When SMBs achieve real ROI on AI, they buy more AI. The economics are aligned. Vendors don't actually want pricing to be opaque — opacity drives adoption down and churn up. But the rate of pricing change has outpaced the rate at which vendors can document it clearly, and the gap that's opened up is what aicost.ai fills.
The MIT NANDA finding — 95% of enterprise generative-AI pilots fail to deliver measurable financial returns — is not a problem of AI capability. The models work. The pilots fail because nobody priced the workload correctly before deploying it, and nobody updated the cost model when the vendor changed pricing three months later. Closing that gap — making AI projects predictably profitable — is good for buyers, good for vendors, and good for the trillion-dollar infrastructure investment that's making this technology possible at all.
We're betting the next two years of AI adoption will be defined by which buyers can navigate the cost layer competently, and which can't. We built aicost.ai because we want the SMB segment to be in the first group.
aicost.ai is a product of CloudIntelligence.ai LLC, founded by Subu Vdaygiri, a 17-year veteran of cloud product management at Ingram Micro and Siemens Corporate Research, with executive programs at Wharton (CTO) and Kellogg (CPO) and 10× AWS and Azure certifications. CloudIntelligence.ai's portfolio also includes ToolsInfo.com (115K+ AI tools across 39 verticals), AIPapers.ai (3M+ research papers with semantic search), and AINewsCycle (curated AI industry briefings). aicost.ai launched in early 2026 to bring transparency to AI cost economics.
For press inquiries, methodology questions, or data partnerships, contact us at [email protected].
Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.
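The fallback rule for the last-verified date can be expressed in a few lines. This is a sketch only: the schemas of aicost_pricing_snapshots and aicost_crawler_runs are not described here, so the function takes plain lists of timestamps that are assumed to have been filtered to successful rows already, and the name `last_verified` is hypothetical.

```python
from datetime import datetime
from typing import Optional, Sequence


def last_verified(snapshot_times: Sequence[datetime],
                  crawler_run_times: Sequence[datetime]) -> Optional[datetime]:
    """Prefer the newest successful snapshot; otherwise the newest successful run.

    Both inputs are assumed to contain only successful entries.
    """
    if snapshot_times:
        return max(snapshot_times)
    if crawler_run_times:
        return max(crawler_run_times)
    return None  # the vendor has never been verified


# Example timestamps are made up for illustration.
snapshots = [datetime(2026, 1, 10), datetime(2026, 1, 12)]
runs = [datetime(2026, 1, 13)]

print(last_verified(snapshots, runs))  # snapshot wins even though a run is newer
print(last_verified([], runs))         # no snapshot yet, so the run is used
```

Note the consequence of the rule as written: a snapshot always takes precedence, so a newer crawler run does not move the last-verified date once at least one snapshot exists.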
Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).
| Vendor / Model | Field | Why it’s inferred |
|---|---|---|
| Anthropic / Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate; Anthropic publishes 90% cache-hit discount on this tier. |
| Anthropic / Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6. |
| Anthropic / Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input; Anthropic documents uniform 50% Batch discount. |
| Anthropic / Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output; Anthropic documents uniform 50% Batch discount. |
| Anthropic / Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate; Anthropic 90% cache-hit discount convention. |
| OpenAI / GPT-5.4 Mini | cachedInput | Derived at 10% of input; OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier. |
| OpenAI / GPT-5.4 Nano | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention. |
| OpenAI / GPT-5.4 Nano | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount. |
| OpenAI / GPT-5.4 Nano | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount. |
| OpenAI / GPT-5.4 Pro | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention. |
| OpenAI / GPT-5.4 Pro | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount. |
| OpenAI / GPT-5.4 Pro | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount. |
| OpenAI / GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift. |
| OpenAI / GPT-5.2 | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5.2 | batchOutput | Derived at 50% of output. |
| OpenAI / GPT-5 | cachedInput | Derived at 10% of input. |
| OpenAI / GPT-5 | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5 | batchOutput | Derived at 50% of output. |
| OpenAI / GPT-5.5 Pro | cachedInput | Derived at 10% of input using the family convention; OpenAI does not publish a cached rate for *-pro models. |
| OpenAI / GPT-5.5 Pro | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5.5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI / GPT-5.2 Pro | cachedInput | Derived at 10% of input; pro-tier convention. |
| OpenAI / GPT-5.2 Pro | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5.2 Pro | batchOutput | Derived at 50% of output. |
| OpenAI / GPT-5.1 | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5.1 | batchOutput | Derived at 50% of output. |
| OpenAI / GPT-5 Pro | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI / GPT-5 Nano | cachedInput | Derived at 10% of input. |
| OpenAI / GPT-5 Nano | batchInput | Derived at 50% of input. |
| OpenAI / GPT-5 Nano | batchOutput | Derived at 50% of output. |
| Google / Gemini 3 Flash | cachedInput | Derived at 10% of input; Google caching discount convention ~90%. |
| Google / Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google / Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google / Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google / Gemini 2.5 Pro | cachedInput | Derived at 10% of input. |
| Google / Gemini 2.5 Flash | cachedInput | Derived at 10% of input. |
| Google / Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google / Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google / Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google / Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates. |
| Google / Gemini 2.0 Flash | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google / Gemini 2.0 Flash | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google / Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google / Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google / Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| xAI / Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base. |
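The derivation rules in the table reduce to two multipliers, with a per-model override where the table says so. The sketch below assumes rates are quoted in USD per million tokens and uses made-up base rates; `infer_rates` and its parameter names are illustrative, not the site's actual code.

```python
# Conventions stated above: cached input = 10% of base, Batch API = 50% of base.
DEFAULT_CACHED_FRACTION = 0.10
DEFAULT_BATCH_FRACTION = 0.50


def infer_rates(input_rate: float,
                output_rate: float,
                cached_fraction: float = DEFAULT_CACHED_FRACTION,
                batch_fraction: float = DEFAULT_BATCH_FRACTION) -> dict:
    """Fill in the three inferred fields from published input/output rates."""
    return {
        "cachedInput": input_rate * cached_fraction,
        "batchInput": input_rate * batch_fraction,
        "batchOutput": output_rate * batch_fraction,
    }


# HYPOTHETICAL base rates ($1.00 input, $4.00 output per million tokens).
print(infer_rates(1.00, 4.00))

# The table lists two exceptions that use 25% for cached input:
# Gemini 2.0 Flash and Grok 4 (legacy).
print(infer_rates(1.00, 4.00, cached_fraction=0.25))
```

Keeping the fractions as explicit parameters makes the exceptions visible at the call site, which matters because an inferred field silently computed with the wrong convention looks exactly like a published price.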
Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old and new values for a full audit trail. Read the full methodology →