Published May 2, 2026 · Updated daily · ~10 min read

The new economics of AI: why pricing got hard, and what to do about it

A guide for SMB founders, developers, finance leads, and journalists trying to understand the cost layer of the AI stack. We monitor 17 vendors continuously — here's what's changing, who it hits hardest, and how to navigate it.

17 vendors monitored daily · 40+ calculators by use case · 3-layer editorial gate on data

Last week, our crawlers detected a 25% across-the-board price cut on Google's gemini-2.0-flash-lite — input, output, batch input, batch output, all dropped in unison. The change wasn't announced. There was no Google blog post, no press release, no notification email. The pricing page just quietly updated, and developers using that model saw their bill shrink the next morning.

This is what AI cost economics looks like in 2026. Prices change without warning. Models get renamed mid-cycle. New billing dimensions appear — context caching storage rates, hybrid seat-plus-overage plans, agentic workflow surcharges — and disappear from documentation just as quickly. The four companies that dominate frontier AI — Anthropic, OpenAI, Google, and Microsoft — are each restructuring pricing in different directions, on different timelines, with different vocabularies for the same concepts.

For most software-as-a-service categories, this would be unusual. For AI, it's now the baseline.

📊 What we've detected — last 14 days

This is what the platform sees, in real time, across the major vendors. Each row links through to the affected vendor's full guide. Nothing here is announced — these are pricing-page deltas our crawlers detect, validate through three layers of sanity checks, and surface for human review before they hit our published guides.

  • Jun 5 · ▼ 6% cut · ElevenLabs · eleven-flash-v2-5 · Per-minute audio ($0.18 → $0.17). Affects voice agents, transcription, realtime chat. See ElevenLabs guide →
  • Jun 5 · ▼ 75% cut · Mistral · mistral-large-3 · Input tokens ($2 → $0.5). Affects every API call; input is the largest cost driver in RAG. See Mistral guide →
  • Jun 5 · ▼ 75% cut · Mistral · mistral-large-3 · Output tokens ($6 → $1.5). Affects every generation: chat, completion, agentic workflows. See Mistral guide →
  • Jun 5 · NEW · xAI · imagine-api-image · Per-image (standard). Affects image-generation pipelines. See xAI guide →
  • Jun 5 · ▼ 90% cut · Google · gemini-3-pro-image · Batch output ($60 → $6). Affects nightly/async workloads: backfills, embeddings, summarization. See Google guide →
  • Jun 5 · ▼ 83% cut · Google · gemini-2-5-flash-native-audio-preview-12-2025 · Input tokens ($3 → $0.5). Affects every API call; input is the largest cost driver in RAG. See Google guide →
  • Jun 5 · ▼ 83% cut · Google · gemini-2-5-flash-native-audio-preview-12-2025 · Output tokens ($12 → $2). Affects every generation: chat, completion, agentic workflows. See Google guide →
  • Jun 5 · ▼ 75% cut · Google · gemini-3-1-flash-live-preview · Input tokens ($3 → $0.75). Affects every API call; input is the largest cost driver in RAG. See Google guide →
  • Jun 5 · ▼ 63% cut · Google · gemini-3-1-flash-live-preview · Output tokens ($12 → $4.5). Affects every generation: chat, completion, agentic workflows. See Google guide →
  • Jun 5 · ▼ 33% cut · Google · gemini-3-1-flash-image · Per-image (standard) ($0.067 → $0.045). Affects image-generation pipelines. See Google guide →
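
As a rough illustration of how the percentage labels above follow from the old and new prices, here is a minimal Python sketch. The function names are hypothetical and this is not the crawler's actual code. It assumes labels are rounded half up to the nearest whole percent, which is consistent with the rows shown (for example, $12 → $4.5 is a 62.5% drop, displayed as 63%).

```python
import math


def percent_change(old_price: float, new_price: float) -> float:
    """Signed percentage change from old_price to new_price (negative = cut)."""
    if old_price <= 0:
        raise ValueError("old_price must be positive")
    return (new_price - old_price) / old_price * 100.0


def change_label(old_price: float, new_price: float) -> str:
    """Human-readable label, rounded half up to a whole percent (assumed convention)."""
    pct = percent_change(old_price, new_price)
    rounded = math.floor(abs(pct) + 0.5)  # half-up; Python's round() would give 62 for 62.5
    if pct < 0:
        return f"{rounded}% cut"
    if pct > 0:
        return f"{rounded}% increase"
    return "no change"
```

For instance, `change_label(2, 0.5)` returns `"75% cut"`, matching the Mistral input-token row.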

📡 Live data from our crawler. Want the full change history? View crawler health dashboard →

From subsidies to real-world economics

The last three years of AI pricing were not normal. They were a venture-capital-funded marketing campaign. Token prices fell roughly 80% from late 2022 through 2024 as vendors competed for developer mindshare. Frontier models were sold below the marginal cost of inference. Free tiers were generous. The implicit message to developers was: don't worry about cost, build something interesting.

That era is ending — and the math behind why is straightforward.

Anthropic, OpenAI, Google, and Microsoft have collectively committed to more than $1 trillion in AI infrastructure spending between 2024 and 2027. Microsoft's Azure capex alone is on pace to exceed $80 billion this fiscal year. Google's TPU buildout, OpenAI's Stargate partnership, Anthropic's compute commitments to Bedrock and Vertex — these are physical data centers, GPUs, power contracts, and cooling infrastructure that have to earn a return.

You don't recover a trillion-dollar capex bill by giving away tokens. The subsidies were always temporary. What's arriving now is the actual cost structure of running this technology. Unlike SaaS, where unit costs fall predictably as scale grows, AI workloads have variable costs that scale with usage in ways most buyers haven't priced in.

The result, for buyers: pricing has become both more expensive AND more complex at the same time. SaaS got cheaper as it got more complex. Cloud got cheaper as it got more complex. AI is going the other way.

Meanwhile, MIT NANDA's State of AI in Business 2025 found that 95% of enterprise generative-AI pilots fail to deliver measurable financial returns. Not because the models don't work — but because nobody priced the workload correctly before deploying it, and nobody updated the cost model when the vendor changed pricing three months later.

Why SMBs are hit hardest

Enterprises have FinOps teams. They have procurement professionals. They have engineers whose job is to read pricing pages carefully. When Anthropic restructures its prompt caching to introduce 1-hour cache tiers separate from 5-minute tiers, an enterprise team notices within a day and updates internal cost models. When Microsoft Copilot pricing fragments into seven different tiers — Family, Personal, Business Basic, Business Standard, Business Premium, Apps for Business, E3/E5 add-ons — an enterprise has someone whose Tuesday meeting is to map all of that against existing contracts.

Small and medium businesses have none of this.

A typical SMB story goes like this: in 2023, one developer chose OpenAI because the docs were good. By 2024, the company had three engineers using a mix of OpenAI for chat, Anthropic for long-context tasks, and Google for embeddings. In 2025, they added an AI-coding assistant on individual subscriptions, plus a vector database, plus an observability tool. By 2026, the monthly AI bill is somewhere between $4,000 and $40,000, the founder has no idea which line items are essential, and the engineering team can't tell you whether a workload migration would save money or cost more, because the pricing changed twice in the last quarter.

This is the shadow IT problem in a new form. Procurement teams don't see the AI spend because much of it goes on personal cards reimbursed monthly. Finance teams see the line items but can't evaluate whether the vendor mix is rational. Engineering teams know the technical tradeoffs but rarely the cost ones. Nobody in the org has the full picture.

The four-vendor reality

Each major vendor has restructured pricing in distinct ways over the last six to twelve months. No SMB engineering team has time to track all four in real time. Most don't realize the pricing changed until the credit card statement arrives.

Hybrid plans across the four major coding-AI vendors

| Vendor | Plan | Seat | Overage starts at | Notes |
| --- | --- | --- | --- | --- |
| Anthropic | Claude Pro | $17/mo | When seat credits exhaust | API credits roll over within billing period; team plan adds shared usage pools |
| GitHub | Copilot Pro / Pro+ | $19/mo | 300 / 1,500 premium requests | Pro+ tier ($39) covers Sonnet 4.6 + GPT-5 access |
| Cursor | Pro | $20/mo | 500 fast requests/month | Slow requests unlimited; fast tier for premium models |
| Microsoft | Copilot Business / Enterprise | $30/mo | Usage-based, beyond seat | 7-tier pricing (Family/Personal/Business/E3/E5). Quarterly manual audit. |

What no individual vendor will show you: this same comparison. Vendors won't put a competitor's seat price next to theirs. We will. Run the numbers for your team: Overage Forecaster →

Costs aren't just tokens

The most expensive misconception in AI deployment is that cost equals tokens. The actual cost stack of a production AI workload is much broader.

For a typical RAG pipeline or agentic workflow, the token cost is often less than half of the total operating cost. The rest is the surrounding infrastructure, and most of that infrastructure has its own pricing complexity: often opaque, often consumption-based, often poorly aligned with the workload you actually run.

This is why we built our 7-step TCO Wizard: it walks through every cost dimension, not just compute, and ends with an executive synthesis tuned to your persona (CFO, CTO, Founder, or PM).

The agentic shift changes the math

A standard chat completion in 2024 used roughly 500 to 2,000 tokens of input and produced a similar amount of output. A user might run 50 of these per day. Daily cost: pennies.

A modern agentic workflow — Claude Code editing a multi-file project, an autonomous research agent traversing a knowledge base, a multi-step reasoning chain handling a customer support escalation — consumes 50,000 to 200,000 tokens per task. A team running 100 such tasks per day is looking at $15,000 to $60,000 per month, invisibly accumulating, often charged to whichever credit card a developer signed up with first.

This is not a hypothetical. The shift from "AI as feature" to "AI as workforce" — even at small scale — changes the cost equation by an order of magnitude. Most cost models built in 2023 are wrong by 2026.

Per-workflow cost across major vendors (~100K tokens/task)

| Model | Per workflow | 100/day → monthly |
| --- | --- | --- |
| Anthropic Claude Sonnet 4.6 | $0.45 | $1,350 |
| OpenAI GPT-5 | $0.50 | $1,500 |
| Google Gemini 3 Flash | $0.18 | $540 |
| DeepSeek V4 | $0.08 | $240 |
| Anthropic Haiku 4.5 | $0.10 | $300 |

Illustrative figures based on published rates. Real workflow cost depends on input/output split, cache hit rate, batch eligibility — quantify for your scenario with the Agent Loop Cost calculator (architect view) or Agentic Workflow Cost (consumer view).
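
The monthly column in the table above is simple arithmetic: per-workflow cost × 100 tasks per day × 30 days. The Python sketch below reproduces it, assuming a 30-day month (which is what the table's figures imply). The `per_workflow_cost` helper and the rates in the usage note are hypothetical placeholders, not published prices.

```python
def per_workflow_cost(input_tokens: int, output_tokens: int,
                      input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Cost of one workflow in USD, given rates in USD per 1M tokens."""
    return (input_tokens / 1_000_000 * input_rate_per_m
            + output_tokens / 1_000_000 * output_rate_per_m)


def monthly_cost(per_workflow_usd: float, tasks_per_day: int = 100,
                 days_per_month: int = 30) -> float:
    """Monthly spend in USD for a fixed daily task volume (30-day month assumed)."""
    return round(per_workflow_usd * tasks_per_day * days_per_month, 2)


# Per-workflow figures copied from the table above.
TABLE = {
    "Anthropic Claude Sonnet 4.6": 0.45,
    "OpenAI GPT-5": 0.50,
    "Google Gemini 3 Flash": 0.18,
    "DeepSeek V4": 0.08,
    "Anthropic Haiku 4.5": 0.10,
}

MONTHLY = {model: monthly_cost(cost) for model, cost in TABLE.items()}
```

With hypothetical rates of $3 per 1M input tokens and $15 per 1M output tokens, a 90,000-in / 10,000-out workflow comes to about $0.42. Changing the input/output split moves that figure quickly, which is why the split matters more than the headline token count.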

How we help — concretely

We built aicost.ai because nobody else was doing this work credibly for the small-to-medium segment. Our coverage is structured around how decisions actually get made:

All our tools, organized by what you're trying to accomplish

Browse by category. Each tool is built around a specific decision SMBs face: not vendor-by-vendor pricing, but use-case-by-use-case math. Most tools have a Quick mode (2-4 fields) and an Advanced mode (full control). Mobile defaults to Quick.

💼 Plan & seat economics

Pick the right tier, forecast overage before it bites

🔌 API & token economics

Optimize per-call cost on production workloads

🤖 Agentic & multi-step

Modern AI workloads consume 10× the tokens — model accordingly

🔍 RAG & vector search

Storage + embeddings + retrieval + generation — the full stack

🎬 Voice, image, video

Multimodal pricing — different units, different math

📊 Strategy & ROI

Build the business case before you ship

📐 Methodology — how we ensure the data is right

For journalists evaluating us as a source, this section matters. AI pricing tracking is full of low-quality automated scrapers that publish whatever a language model extracts. We built three layers of defense against that failure mode:

  1. Extraction-level rules. Our extraction prompts are tuned for the specific failure patterns each vendor introduces — Google's three different cache rates, Anthropic's cache write versus cache read distinction, the difference between per-token and per-minute pricing for audio models.
  2. Code-level sanity invariants. Even if extraction returns something implausible, our code rejects values that violate basic pricing logic. A cached input rate must be cheaper than the base input rate; a batch rate must be cheaper than the realtime rate; a premium-tier rate must be more expensive than the standard rate. Violations get flagged, not published.
  3. Editorial gate. Our published pricing data lives in a human-curated source file that the crawler reads but never writes. Every detected change accumulates in a changelog; a human reviews each row before it lands in user-facing guides. This is slower than fully automated systems. It's also why our data is correct.
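
The invariants in step 2 can be sketched as a small validation function. This is a minimal Python illustration, not the production implementation: the `RateCard` fields and the `find_violations` name are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RateCard:
    """Hypothetical per-model rates in USD per 1M tokens; None means not published."""
    input: float
    output: float
    cached_input: Optional[float] = None
    batch_input: Optional[float] = None
    batch_output: Optional[float] = None
    premium_input: Optional[float] = None


def find_violations(card: RateCard) -> List[str]:
    """Return the list of broken invariants; an empty list means the card passes."""
    problems: List[str] = []
    if card.cached_input is not None and not card.cached_input < card.input:
        problems.append("cached input must be cheaper than base input")
    if card.batch_input is not None and not card.batch_input < card.input:
        problems.append("batch input must be cheaper than realtime input")
    if card.batch_output is not None and not card.batch_output < card.output:
        problems.append("batch output must be cheaper than realtime output")
    if card.premium_input is not None and not card.premium_input > card.input:
        problems.append("premium input must be more expensive than standard input")
    return problems
```

A card with a non-empty result would be flagged for review rather than published. Returning every violation, instead of stopping at the first, gives the human reviewer the full picture in one pass.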

When our crawler detects a vendor change, it appears in the changelog with full provenance. When it appears in a published guide, a human has signed off. We treat published pricing claims with the same care a financial publication treats earnings figures. Per-calculator methodology lives at /methodology.

The optimistic thread

When SMBs achieve real ROI on AI, they buy more AI. The economics are aligned. Vendors don't actually want pricing to be opaque — opacity drives adoption down and churn up. But the rate of pricing change has outpaced the rate at which vendors can document it clearly, and the gap that's opened up is what aicost.ai fills.

The MIT NANDA finding cited earlier, that 95% of enterprise generative-AI pilots fail to deliver measurable financial returns, is not a problem of AI capability. The models work. The pilots fail on cost modeling: workloads are priced incorrectly at launch, and cost models go stale when vendors change pricing. Closing that gap, and making AI projects predictably profitable, is good for buyers, good for vendors, and good for the trillion-dollar infrastructure investment that's making this technology possible at all.

We're betting the next two years of AI adoption will be defined by which buyers can navigate the cost layer competently, and which can't. We built aicost.ai because we want the SMB segment to be in the first group.

Start with the question you actually have

Every visitor lands here for a different reason. Skip ahead to whatever you came for.

Browse vendor guides · All 45+ calculators · Full TCO Wizard · Live changelog

About

aicost.ai is a product of CloudIntelligence.ai LLC, founded by Subu Vdaygiri — a 17-year veteran of cloud product management at Ingram Micro and Siemens Corporate Research, with executive programs at Wharton (CTO) and Kellogg (CPO) and 10× AWS and Azure certifications. CloudIntelligence.ai's portfolio also includes ToolsInfo.com (115K+ AI tools across 39 verticals), AIPapers.ai (3M+ research papers with semantic search), and AINewsCycle (curated AI industry briefings). aicost.ai launched in early 2026 to bring transparency to AI cost economics.

For press inquiries, methodology questions, or data partnerships, contact us at [email protected].

📖 Data sources & methodology 161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

  • All prices are USD per 1 million tokens, current as of 2026-06-05.
  • Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
  • Batch API discounts are 50% off standard rates across providers that offer Batch mode.
  • Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
  • Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
  • Long-context pricing tiers apply when input exceeds model threshold.
  • Embedding prices are input-only (no output tokens generated).
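
To show how these conventions combine into an estimate, here is a minimal Python sketch of an input-token cost calculation. The function name, its parameters, and the example rate are hypothetical. The 10% cached fraction and the 1.1x residency multiplier are taken from the bullets above; whether a surcharge and a cache discount stack this way for a given vendor is an assumption to verify against that vendor's documentation.

```python
def estimate_input_cost(tokens: int, base_rate_per_m: float,
                        cache_hit_rate: float = 0.0,
                        cached_fraction: float = 0.10,
                        residency_multiplier: float = 1.0) -> float:
    """Estimated input cost in USD.

    base_rate_per_m      -- USD per 1M input tokens (base rate, no surcharge)
    cache_hit_rate       -- share of input tokens served from cache, 0.0 to 1.0
    cached_fraction      -- cached rate as a fraction of base (0.10 = 90% off)
    residency_multiplier -- e.g. 1.1 for a regional data-residency surcharge
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    millions = tokens / 1_000_000
    uncached = millions * (1.0 - cache_hit_rate) * base_rate_per_m
    cached = millions * cache_hit_rate * base_rate_per_m * cached_fraction
    return (uncached + cached) * residency_multiplier
```

For example, 10M input tokens at a hypothetical $3 per 1M, with half of them served from cache and a 1.1x residency multiplier, comes to about $18.15, versus $30 at the undiscounted base rate.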

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

| Source | Last verified | URL | Snapshot history |
| --- | --- | --- | --- |
| Anthropic | 2026-06-05 | https://www.anthropic.com/pricing | Daily snapshot since Sep 2023 · 578 days captured |
| Anthropic Docs | 2026-06-05 | https://platform.claude.com/docs/en/about-claude/pricing | Daily snapshot since Sep 2023 · 578 days captured |
| OpenAI | 2026-06-05 | https://openai.com/api/pricing/ | Daily snapshot since Sep 2023 · 579 days captured |
| Google AI | 2026-06-05 | https://ai.google.dev/gemini-api/docs/pricing | Daily snapshot since Dec 2023 · 554 days captured |
| Google Vertex | 2026-06-05 | https://cloud.google.com/vertex-ai/generative-ai/pricing | Daily snapshot since Dec 2023 · 554 days captured |
| DeepSeek | 2026-06-05 | https://api-docs.deepseek.com/quick_start/pricing | Daily snapshot since May 2024 · 493 days captured |
| xAI | 2026-06-05 | https://x.ai/api | Daily snapshot since Nov 2024 · 411 days captured |
| Mistral | 2026-06-05 | https://mistral.ai/pricing | Daily snapshot since Dec 2023 · 552 days captured |
| Cohere | 2026-06-05 | https://cohere.com/pricing | Daily snapshot since Sep 2023 · 578 days captured |
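
The last-verified rule described above (prefer the newest daily snapshot, fall back to the newest successful crawler run) can be sketched as follows. The function and argument names are hypothetical; only the two table names, aicost_pricing_snapshots and aicost_crawler_runs, come from the text, and the dates are assumed to have already been fetched from them.

```python
from datetime import date
from typing import Optional, Sequence


def last_verified(snapshot_dates: Sequence[date],
                  crawler_run_dates: Sequence[date]) -> Optional[date]:
    """Pick the last-verified date for one vendor.

    snapshot_dates    -- dates of successful rows in aicost_pricing_snapshots
    crawler_run_dates -- dates of successful rows in aicost_crawler_runs
    """
    if snapshot_dates:
        # A snapshot always wins, even if a crawler run is more recent.
        return max(snapshot_dates)
    if crawler_run_dates:
        return max(crawler_run_dates)
    return None  # vendor not yet verified
```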

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

| Vendor | Model | Field | Why it’s inferred |
| --- | --- | --- | --- |
| Anthropic | Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate; Anthropic publishes 90% cache-hit discount on this tier. |
| Anthropic | Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6. |
| Anthropic | Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input; Anthropic documents uniform 50% Batch discount. |
| Anthropic | Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output; Anthropic documents uniform 50% Batch discount. |
| Anthropic | Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate; Anthropic 90% cache-hit discount convention. |
| OpenAI | GPT-5.4 Mini | cachedInput | Derived at 10% of input; OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier. |
| OpenAI | GPT-5.4 Nano | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention. |
| OpenAI | GPT-5.4 Nano | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount. |
| OpenAI | GPT-5.4 Nano | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount. |
| OpenAI | GPT-5.4 Pro | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention. |
| OpenAI | GPT-5.4 Pro | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount. |
| OpenAI | GPT-5.4 Pro | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount. |
| OpenAI | GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift. |
| OpenAI | GPT-5.2 | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5.2 | batchOutput | Derived at 50% of output. |
| OpenAI | GPT-5 | cachedInput | Derived at 10% of input. |
| OpenAI | GPT-5 | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5 | batchOutput | Derived at 50% of output. |
| OpenAI | GPT-5.5 Pro | cachedInput | Derived at 10% of input using the family convention; OpenAI does not publish a cached rate for *-pro models. |
| OpenAI | GPT-5.5 Pro | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5.5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI | GPT-5.2 Pro | cachedInput | Derived at 10% of input; pro-tier convention. |
| OpenAI | GPT-5.2 Pro | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5.2 Pro | batchOutput | Derived at 50% of output. |
| OpenAI | GPT-5.1 | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5.1 | batchOutput | Derived at 50% of output. |
| OpenAI | GPT-5 Pro | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5 Pro | batchOutput | Derived at 50% of output. |
| OpenAI | GPT-5 Nano | cachedInput | Derived at 10% of input. |
| OpenAI | GPT-5 Nano | batchInput | Derived at 50% of input. |
| OpenAI | GPT-5 Nano | batchOutput | Derived at 50% of output. |
| Google | Gemini 3 Flash | cachedInput | Derived at 10% of input; Google caching discount convention ~90%. |
| Google | Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google | Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google | Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google | Gemini 2.5 Pro | cachedInput | Derived at 10% of input. |
| Google | Gemini 2.5 Flash | cachedInput | Derived at 10% of input. |
| Google | Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google | Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google | Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google | Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates. |
| Google | Gemini 2.0 Flash | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google | Gemini 2.0 Flash | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| Google | Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention. |
| Google | Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount. |
| Google | Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount. |
| xAI | Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base. |
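
The derivation conventions in the table above can be expressed as a small lookup. This is a minimal Python sketch under stated assumptions: the function name and dictionary layout are hypothetical, the example rates are placeholders rather than published prices, and it fills in all three fields even though the table infers only some fields for some models.

```python
from typing import Dict

DEFAULT_CACHED_FRACTION = 0.10  # cached input = 10% of base (90% off)
BATCH_FRACTION = 0.50           # Batch API = 50% of base (50% off)

# Models the table lists at 25% of input instead of the 10% default.
CACHED_FRACTION_OVERRIDES: Dict[str, float] = {
    "Gemini 2.0 Flash": 0.25,
    "Grok 4 (legacy)": 0.25,
}


def infer_rates(model: str, input_rate: float, output_rate: float) -> Dict[str, float]:
    """Derive the starred (*) fields from published base rates (USD per 1M tokens)."""
    cached_fraction = CACHED_FRACTION_OVERRIDES.get(model, DEFAULT_CACHED_FRACTION)
    return {
        "cachedInput": round(input_rate * cached_fraction, 6),
        "batchInput": round(input_rate * BATCH_FRACTION, 6),
        "batchOutput": round(output_rate * BATCH_FRACTION, 6),
    }
```

With placeholder base rates of $1 input and $4 output per 1M tokens, a model on the default convention gets a cached input rate of $0.10, while `"Gemini 2.0 Flash"` gets $0.25. Keeping the exceptions in one override table makes it easy to see which models depart from the default.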

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old & new values for full audit trail. Read the full methodology →