AI Cost Consulting

42 free calculators. 42 free guides. And experts when you need them.

Most people self-serve and never need to hire us; that's the whole point of our 42 free calculators and 42 free guides. For everyone else, we have three engagement brands at three sizes each.

Free 15-min call. No credit card. We'll tell you which tier fits — or that the calculators are enough.

42 Free calculators
42 Free expert guides
12 Vendor pricing tracked daily
$0 All free, all forever

The next step nobody else does

Most AI cost consultants tell you which AI vendor is cheapest. We do that, then we go further:

"Switch from OpenAI to Claude — save $400/mo. Then add Zapier + GoHighLevel to automate invoice reconciliation — save $1,200/mo in labor. Net: $1,600/mo."

We pair AI cost analysis with ToolsInfo's catalog of 115K+ SMB workflow tools (operated by the same team). You get a plan that cuts AI cost AND labor cost — most consultants can only do one.

Available in: Optimize SMB ($499) · Optimize Enterprise ($5,000) · Forecast SMB+ ($799+)

aicost.ai (cost intelligence) + toolsinfo.com (115K+ tools)

How it works

  1. Pick a brand and size. Or book the free 15-min discovery if you're unsure.
  2. Pay via Stripe. Fixed-fee pricing, no sales theater. Custom tier = call to scope.
  3. Schedule your kickoff call. Stripe redirects to our scheduler. Pick a time that works.
  4. Get your deliverable. Email-delivered PDF report + recommendations within stated timeline.

FAQ

Why do you offer 42 free calculators if you also sell consulting?

Most people self-serve and never need us — that’s the whole point of having 42 calculators and 42 guides. The paid tiers exist for people who don’t have time to learn what tokens are, or whose stakes are high enough that being wrong costs more than the engagement fee.

How do I know which tier I need?

Book the free 15-min discovery call. We’ll look at your situation and tell you honestly — sometimes the right answer is "the calculators are enough, you don’t need to hire us."

What’s the difference between Clarity and Optimize?

Clarity = diagnosis only. We tell you where the money goes. Optimize = diagnosis + specific recommendations with dollar savings, plus the ToolsInfo workflow pairing where it fits. Most SMBs want Optimize.

When do I pay?

You pay via Stripe BEFORE we begin work — that’s how we keep prices low and avoid sales theater. After payment, you’re redirected to schedule your kickoff call via our calendar.

What’s the ToolsInfo connection?

CloudIntelligence.ai also operates ToolsInfo.com, which has 115K+ SMB workflow tools cataloged. When we engage on Optimize+ tiers, we don’t just tell you which AI vendor is cheapest — we also identify which workflows you can automate using ToolsInfo (invoice reconciliation, lead capture, scheduling, etc.). You save AI cost AND labor cost. No other consultant pairs these.

Do you sign NDAs?

Yes. Mutual NDA before any sensitive data exchange. Standard or your template — both work.

Can I get a refund?

Within 48 hours of payment, before kickoff call: full refund. After kickoff: no refunds, but you always receive the deliverable.

Who actually does the work?

Subu Vdaygiri (founder · 17+ yrs Fortune 100 cloud / AI · Wharton CTO + Kellogg CPO · 10× AWS + Azure certified) leads every engagement. Hanvish Vdaygiri (UCI Data Science + Pure Math; built AIPapers.ai vector search on 4.2M papers) supports on data-heavy work.

Who actually does the work

Subu Vdaygiri

Founder & Principal

  • 17+ yrs Fortune 100 (Ingram Micro / CloudBlue, Siemens Corporate Research)
  • Scaled Azure product portfolio to $500M ARR
  • Kellogg CPO Program · Wharton CTO Program
  • 10× AWS + Azure certified
  • Multi-cloud architecture, FinOps, data lake, compliance

Hanvish Vdaygiri

Data & AI Engineer

  • UCI · Dual BS Data Science + Pure Math (June 2026)
  • Minors in Computer Science + Informatics
  • Built live ML pipelines on 4.2M paper vector DB (AIPapers.ai)
  • AWS infrastructure, Python, SQL, LanceDB, production ML
  • Leads cost anomaly detection & ML modeling work

CloudIntelligence.ai LLC — NVIDIA Inception member. Operates ToolsInfo.com (115K+ tools), AIPapers.ai (3M+ papers), AICost.ai (cost intelligence).

Ready to engage?

Pick a brand, pick a size, pay, schedule. Or talk to us first — free.

Data sources & methodology

161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

  • All prices are USD per 1 million tokens, current as of 2026-06-05.
  • Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
  • Batch API discounts are 50% off standard rates across providers that offer Batch mode.
  • Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
  • Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
  • Long-context pricing tiers apply when the input exceeds a model's long-context threshold.
  • Embedding prices are input-only (no output tokens generated).
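
As a rough illustration of how the conventions above combine, here is a minimal Python sketch. The `Rates` class, the `estimate_cost` function, and the example rates are all hypothetical and are not taken from the calculators. It assumes the batch discount applies to the whole request (providers may differ on how batch and caching interact), takes the residency surcharge as an optional multiplier because base rates exclude it, and does not model long-context tiers.

```python
from dataclasses import dataclass
from typing import Optional

TOKENS_PER_PRICE_UNIT = 1_000_000  # prices are quoted in USD per 1 million tokens


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens. Values passed in are placeholders, not vendor prices."""
    input: float
    output: float = 0.0                   # embeddings are input-only
    cached_input: Optional[float] = None  # None means no cached rate is known


def estimate_cost(rates, input_tokens, output_tokens=0, cached_tokens=0,
                  batch=False, residency_multiplier=1.0):
    """Estimate the USD cost of one workload under the conventions above."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    # Fall back to the full input rate when no cached rate is known.
    cached_rate = rates.input if rates.cached_input is None else rates.cached_input
    fresh_tokens = input_tokens - cached_tokens
    cost = (fresh_tokens * rates.input
            + cached_tokens * cached_rate
            + output_tokens * rates.output) / TOKENS_PER_PRICE_UNIT
    if batch:
        cost *= 0.5  # assumption: the 50% Batch discount covers the whole request
    return cost * residency_multiplier  # e.g. 1.1 for a 1.1x residency surcharge


example = Rates(input=2.00, output=10.00, cached_input=0.20)  # made-up rates
print(round(estimate_cost(example, 1_000_000, 500_000), 2))              # standard
print(round(estimate_cost(example, 1_000_000, 500_000, batch=True), 2))  # batch
```

Passing `residency_multiplier=1.1` models a 1.1x regional surcharge; leaving it at 1.0 matches the base rates listed here.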

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.
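
The fallback rule for the last-verified date can be sketched as below. This is an illustration only: the function name and its inputs are hypothetical, and it assumes the caller has already pulled the dates of successful rows from aicost_pricing_snapshots and aicost_crawler_runs, whose actual schemas are not described here.

```python
from datetime import date
from typing import Iterable, Optional


def last_verified(snapshot_dates: Iterable[date],
                  crawler_run_dates: Iterable[date]) -> Optional[date]:
    """Return the last-verified date for one vendor source.

    Both arguments hold dates of *successful* runs only.
    """
    snapshots = list(snapshot_dates)
    if snapshots:
        # A daily snapshot exists, so it takes precedence over crawler runs.
        return max(snapshots)
    runs = list(crawler_run_dates)
    # No snapshot yet: fall back to the latest crawler run, if there is one.
    return max(runs) if runs else None


print(last_verified([date(2026, 6, 4), date(2026, 6, 5)], [date(2026, 6, 6)]))
print(last_verified([], [date(2026, 6, 1)]))
```

Note that a snapshot date wins even when a later crawler run exists, which is how the rule above reads; a source with neither is returned as `None` and would count as unverified.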

Source | Last verified | URL | Snapshot history
Anthropic | 2026-06-05 | https://www.anthropic.com/pricing | Daily snapshot since Sep 2023 · 578 days captured
Anthropic Docs | 2026-06-05 | https://platform.claude.com/docs/en/about-claude/pricing | Daily snapshot since Sep 2023 · 578 days captured
OpenAI | 2026-06-05 | https://openai.com/api/pricing/ | Daily snapshot since Sep 2023 · 579 days captured
Google AI | 2026-06-05 | https://ai.google.dev/gemini-api/docs/pricing | Daily snapshot since Dec 2023 · 554 days captured
Google Vertex | 2026-06-05 | https://cloud.google.com/vertex-ai/generative-ai/pricing | Daily snapshot since Dec 2023 · 554 days captured
DeepSeek | 2026-06-05 | https://api-docs.deepseek.com/quick_start/pricing | Daily snapshot since May 2024 · 493 days captured
xAI | 2026-06-05 | https://x.ai/api | Daily snapshot since Nov 2024 · 411 days captured
Mistral | 2026-06-05 | https://mistral.ai/pricing | Daily snapshot since Dec 2023 · 552 days captured
Cohere | 2026-06-05 | https://cohere.com/pricing | Daily snapshot since Sep 2023 · 578 days captured

Inferred values (marked with * in calculator tables)

These values are derived from industry conventions, not published directly by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model | Field | Why it’s inferred
Anthropic / Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate; Anthropic publishes 90% cache-hit discount on this tier.
Anthropic / Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic / Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input; Anthropic documents uniform 50% Batch discount.
Anthropic / Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output; Anthropic documents uniform 50% Batch discount.
Anthropic / Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate; Anthropic 90% cache-hit discount convention.
OpenAI / GPT-5.4 Mini | cachedInput | Derived at 10% of input; OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI / GPT-5.4 Nano | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Nano | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Nano | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Pro | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift.
OpenAI / GPT-5.2 | batchInput | Derived at 50% of input.
OpenAI / GPT-5.2 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 | cachedInput | Derived at 10% of input.
OpenAI / GPT-5 | batchInput | Derived at 50% of input.
OpenAI / GPT-5 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.5 Pro | cachedInput | Derived at 10% of input: OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI / GPT-5.5 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5.5 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.2 Pro | cachedInput | Derived at 10% of input; pro-tier convention.
OpenAI / GPT-5.2 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5.2 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.1 | batchInput | Derived at 50% of input.
OpenAI / GPT-5.1 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 Nano | cachedInput | Derived at 10% of input.
OpenAI / GPT-5 Nano | batchInput | Derived at 50% of input.
OpenAI / GPT-5 Nano | batchOutput | Derived at 50% of output.
Google / Gemini 3 Flash | cachedInput | Derived at 10% of input; Google caching discount convention ~90%.
Google / Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Pro | cachedInput | Derived at 10% of input.
Google / Gemini 2.5 Flash | cachedInput | Derived at 10% of input.
Google / Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates.
Google / Gemini 2.0 Flash | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
xAI / Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base.
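
The derivation convention behind the rows above can be sketched in a few lines of Python. This is a hedged illustration, not the calculators' actual code: `fill_inferred` and `label` are hypothetical names, the `input` and `output` keys are assumed, and only the field names cachedInput, batchInput, and batchOutput come from the table. Published values are kept as-is, missing ones are derived, and derived ones are flagged so they can be marked with *.

```python
from typing import Dict, Optional, Set, Tuple

CACHED_FRACTION = 0.10  # cached input = 10% of base (90% off)
BATCH_FRACTION = 0.50   # Batch API = 50% of base (50% off)


def fill_inferred(published: Dict[str, Optional[float]],
                  cached_fraction: float = CACHED_FRACTION
                  ) -> Tuple[Dict[str, float], Set[str]]:
    """Fill missing discount fields and report which ones were inferred.

    `published` needs "input" and "output" (USD per 1M tokens) and may
    carry vendor-published "cachedInput", "batchInput", "batchOutput".
    """
    derived = {
        "cachedInput": published["input"] * cached_fraction,
        "batchInput": published["input"] * BATCH_FRACTION,
        "batchOutput": published["output"] * BATCH_FRACTION,
    }
    rates = {"input": published["input"], "output": published["output"]}
    inferred: Set[str] = set()
    for field, value in derived.items():
        if published.get(field) is None:
            rates[field] = value        # not published: use the convention
            inferred.add(field)
        else:
            rates[field] = published[field]  # vendor-published: keep as-is
    return rates, inferred


def label(field: str, value: float, inferred: Set[str]) -> str:
    """Format a price, appending * when the value was inferred."""
    return f"{value:.4f}{'*' if field in inferred else ''}"


# Made-up rates: cachedInput is "published", the batch fields are not.
rates, inferred = fill_inferred({"input": 2.0, "output": 8.0, "cachedInput": 0.5})
for field in ("cachedInput", "batchInput", "batchOutput"):
    print(field, label(field, rates[field], inferred))
```

Rows that use a different ratio, such as the 25% cached rate listed for Gemini 2.0 Flash, would pass `cached_fraction=0.25` in this sketch.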

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old and new values for a full audit trail.
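
One way to picture the changelog step is a diff between two consecutive daily snapshots. The sketch below is an assumption about the general shape only: `diff_snapshots`, the `(model, field)` keys, and the record layout are hypothetical, and the real aicost_price_changelog schema is not documented here.

```python
from typing import Dict, List, Optional, Tuple

Snapshot = Dict[Tuple[str, str], float]  # (model, field) -> USD per 1M tokens


def diff_snapshots(previous: Snapshot, current: Snapshot) -> List[dict]:
    """List every price that differs between two snapshots.

    A key missing on one side is reported with None for that side, so
    newly added and removed prices are logged as well as changed ones.
    """
    changes = []
    for key in sorted(previous.keys() | current.keys()):
        old: Optional[float] = previous.get(key)
        new: Optional[float] = current.get(key)
        if old != new:
            model, field = key
            changes.append({"model": model, "field": field, "old": old, "new": new})
    return changes


# Made-up model names and prices, for illustration only.
yesterday = {("model-a", "input"): 3.0, ("model-a", "output"): 15.0}
today = {("model-a", "input"): 2.5, ("model-a", "output"): 15.0,
         ("model-b", "input"): 1.0}
for change in diff_snapshots(yesterday, today):
    print(change)
```

Keeping both the old and new value in each record is what makes the log usable as an audit trail: any past price can be reconstructed by replaying the entries.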