Token Estimator

Paste text → see what it costs across every model

Quick token count + dollar estimate before you send a prompt. Pick your content type for accuracy.

Pricing verified: 2026-06-05 · 63 models ranked · Client-side: your text never leaves your browser
What this calculator does

Paste any text and see approximately how many tokens it becomes across major tokenizers (GPT, Claude, Gemini). A cost projection against the current model catalog is included.

Why use it
  • Words ≠ tokens — same text tokenizes 10-20% differently across vendors, breaking naive cost estimates
  • Avoid the classic budget blowup: assuming 1 word = 1 token, then discovering the actual count is about 1.3× higher
  • Pre-flight prompt size before you commit to a model with a tight context window
  • Calibrate cost-calculator inputs with real numbers instead of guesses
Who uses this:
  • Vibe Coder (relevance: High): Pre-flight your prompt size before discovering "wait, that hit the context limit"
  • Small Business (relevance: High): Translate "I have these emails / docs / chats" into defensible token counts for the AI budget
  • Enterprise (relevance: High): RFP and procurement need real token shape numbers, not guesses

Below are the inputs you provide, the outputs you get, and ways to use this calculator for your AI workloads.

📥 Inputs you provide
  • Content type: Tunes char-per-token expectation
  • Sample text: The text to tokenize
  • Expected output tokens: Response size estimate
📤 Outputs you get
  • Input token count: Tokens for the pasted text
  • Chars per token ratio: Density of your text
  • Cost across models: What this text costs per call
🎯 Use your results to
🔢 Convert text → tokens: Stop guessing. Paste real samples and get measured counts.
📋 Calibrate other calcs: Plug actual numbers into Cost Calculator instead of placeholder guesses.
🔍 Catch bloated prompts: See if your prompt is bigger than expected before it hits production.
🪜 Pre-flight context fit: Verify that your prompt + retrieval + history fits in your chosen model's context window.
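
As a rough illustration of the context-fit check, the sketch below adds up the estimated token counts for each part of a request and compares the total with a context window. The class name, function name, and the 16,000-token window are hypothetical, and counting the expected output against the window is an assumption that holds for many models but should be checked per vendor.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Estimated token counts for each part of one request."""
    prompt: int
    retrieval: int
    history: int
    expected_output: int


def context_headroom(budget: ContextBudget, context_window: int) -> int:
    """Return the tokens left in the window; a negative value means it does not fit."""
    used = (
        budget.prompt
        + budget.retrieval
        + budget.history
        + budget.expected_output  # assumption: output shares the same window
    )
    return context_window - used


# Illustrative numbers only; the window size is a placeholder, not a real model limit.
budget = ContextBudget(prompt=1_200, retrieval=6_000, history=3_500, expected_output=800)
headroom = context_headroom(budget, context_window=16_000)
print("fits" if headroom >= 0 else "too large", headroom)  # prints: fits 4500
```

Leaving some headroom is usually sensible, because the counts fed into this check are themselves estimates.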

👇 Now try the calculator below with your own AI workloads

🎛 CALCULATOR
📝 Paste your text
The actual content you want to count tokens for. Paste real samples; synthetic tests often diverge from production text.
How to choose: For prompt templates, paste a fully-rendered example with realistic variable values. For repeating workloads, paste 2-3 different examples and average the counts.
Read the full guide →

Prompt, document chunk, or any content you'd send to an AI API.

English prose: ~4 chars per token (GPT-5.x tokenizer).
Characters: 0 · Words: 0 · Est. tokens: 0
Note: This is an estimate. Each provider uses a slightly different tokenizer.
Expected output tokens: output is usually priced 3-5x higher than input, so the response size matters for an accurate cost estimate.
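
The estimate behind the counters above can be approximated as characters divided by a chars-per-token ratio. The sketch below is a minimal heuristic under that assumption, not the calculator's actual code: only the ~4 chars per token figure for English prose comes from this page, while the function name and the ratio for code are hypothetical placeholders.

```python
import math

# Assumed chars-per-token ratios by content type.
# Only the English prose value (~4) is taken from the page;
# the "code" value is an illustrative placeholder.
CHARS_PER_TOKEN = {
    "english_prose": 4.0,
    "code": 3.0,  # hypothetical
}


def estimate_tokens(text: str, content_type: str = "english_prose") -> int:
    """Heuristic token estimate: character count divided by the ratio, rounded up."""
    ratio = CHARS_PER_TOKEN[content_type]
    return math.ceil(len(text) / ratio)


sample = "Paste any text and see how many tokens it becomes."
print("characters:", len(sample))
print("words:", len(sample.split()))
print("est. tokens:", estimate_tokens(sample))
```

A real tokenizer will give a different count for the same text, which is why the result should be treated as an estimate.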
📈 RESULTS
💵 Cost across all models

Enter text to see per-request cost ranked cheapest first.

Model Per request Per 1K requests Per 1M requests
Paste some text above to compare costs.
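
The three columns in the table above follow from per-million-token prices by simple arithmetic. The sketch below shows one way to compute and rank them; the model names and prices are made up for illustration and are not taken from the catalog.

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """USD for one request, given prices in USD per 1 million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical catalog: name -> (input price, output price), USD per 1M tokens.
MODELS = {
    "model-a": (3.00, 15.00),
    "model-b": (0.25, 1.25),
}


def rank_models(input_tokens: int, output_tokens: int, models: dict) -> list:
    """Return (name, per request, per 1K, per 1M) rows, cheapest first."""
    rows = []
    for name, (in_price, out_price) in models.items():
        per_request = cost_per_request(input_tokens, output_tokens, in_price, out_price)
        rows.append((name, per_request, per_request * 1_000, per_request * 1_000_000))
    return sorted(rows, key=lambda row: row[1])


for name, per_req, per_1k, per_1m in rank_models(1_500, 500, MODELS):
    print(f"{name}: ${per_req:.4f} / request, ${per_1k:.2f} / 1K, ${per_1m:,.0f} / 1M")
```

With 1,500 input and 500 output tokens, the cheaper hypothetical model costs about $0.001 per request and the other about $0.012, so the ranking puts `model-b` first.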
Full cost calculator → Compare all models →
📋 What now?
Need help cutting your AI bill? 💼 Talk to a CloudIntelligence advisor →
Now that you have your token count…

What this means + what to do next

💡 What to consider beyond this token count for full TCO
  • Per-call retry overhead (typically 3-15% on top of base token cost)
  • Conversation history accumulation in multi-turn workloads
  • System-prompt overhead repeated on every call (often 200-1000 tokens you're paying for forever)
  • Output bloat — verbose models cost more than the input estimate suggests
Rule of thumb: Multiply the single-call estimate by 1.15-1.3× to account for retries, system-prompt overhead, and output variance in real production traffic.
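
To make the rule of thumb concrete, the sketch below turns a base per-call cost into a low/high production range and separately prices a system prompt that is repeated on every call. The function names, call volume, and price are illustrative assumptions.

```python
def production_range(base_cost: float, low: float = 1.15, high: float = 1.30) -> tuple:
    """Apply the 1.15-1.3x rule of thumb to a base cost estimate."""
    return base_cost * low, base_cost * high


def system_prompt_cost(system_tokens: int, calls: int, input_price_per_m: float) -> float:
    """USD spent re-sending the same system prompt on every call."""
    return system_tokens * calls * input_price_per_m / 1_000_000


# Illustrative inputs: $0.012 per call, 100,000 calls per month,
# a 500-token system prompt, and a hypothetical $3.00 per 1M input tokens.
low, high = production_range(0.012 * 100_000)
print(f"monthly range: ${low:,.0f} to ${high:,.0f}")
print(f"system prompt share: ${system_prompt_cost(500, 100_000, 3.00):,.2f}")
```

In this example the system prompt alone accounts for $150 per month, which is the kind of overhead the list above warns about.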
Quantify the hidden costs:
  • Get the actual $/month at the token counts you measured here → Cost Calculator
  • For multi-turn workloads, this single-call count vastly underestimates total agent cost → Agent Loop Cost
  • If your token count is bigger than expected, often 20-40% is reducible without quality loss → Token Reduction Analyzer
$ How this fits your overall ROI

This is a measurement tool — ROI conversations happen downstream:

  • Is my prompt 30% bigger than it needs to be? (Run reduction analysis.)
  • How much of my token budget is overhead vs unique input per call?
  • Would a smaller model handle this prompt with acceptable quality?
Bridge to ROI:
  • Quantify savings from compression, context-window pruning, structured output → Token Reduction Analyzer
  • If your prompt has a stable prefix, caching can cut effective cost 50-80% → Prompt Cache ROI
  • If some queries are simpler than others, routing cuts cost without quality loss → Multi Model Router
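
The caching claim in the list above can be sanity-checked with a small model of blended input cost. This is a simplified sketch: it assumes cached tokens are billed at 10% of the base input rate and ignores any cache-write surcharge or cache expiry rules a provider may apply. The function name and the example fractions are hypothetical.

```python
def effective_input_cost_ratio(prefix_fraction: float, hit_rate: float,
                               cached_price_ratio: float = 0.10) -> float:
    """Blended input cost as a fraction of the uncached cost.

    prefix_fraction: share of input tokens that sit in the stable, cacheable prefix
    hit_rate: share of calls where that prefix is served from cache
    cached_price_ratio: cached token price relative to base (assumed 10%)
    """
    cached_share = prefix_fraction * hit_rate
    return (1.0 - cached_share) + cached_share * cached_price_ratio


# Illustrative case: 80% of the prompt is a stable prefix, 90% cache hit rate.
ratio = effective_input_cost_ratio(prefix_fraction=0.8, hit_rate=0.9)
print(f"input cost is {ratio:.1%} of uncached, saving {1 - ratio:.1%}")
```

Under these assumptions the input cost drops to roughly 35% of the uncached figure, which falls inside the 50-80% savings range quoted above. Output tokens are not affected by input caching.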
Doing something different?

If you don't have sample text yet, these calcs work from estimates:

  • You can estimate token shape from memory (typical chat ~1500 in, ~500 out) → Cost Calculator
  • You're planning multiple apps without specific prompts yet → Budget Planner
  • You want to see how this cost grows at 10× and 100× volume → Scale Projection

Go deeper

Our playbooks on cutting this number.

🧮 Full Cost Calculator: Add requests/day → monthly cost
🔍 AI Model Finder: Compare every model side-by-side
💾 Prompt Caching: The 50-90% discount most miss
📉 Token Volatility: Hedge your AI unit costs

The calculator's an estimate. Want the real number?

A 5-day Quickscan ($1,500) reviews your actual usage across every pillar — financial, reliability, governance, privacy, MLOps, observability — and returns a concrete savings plan.

Book a Quickscan →
📖 Data sources & methodology
161 text models · 9 embeddings · 24 vision · 41 audio · 8 vector DBs across 10 vendor pages · last verified 2026-06-05

Methodology

  • All prices are USD per 1 million tokens, current as of 2026-06-05.
  • Vendor-published values have no mark. Inferred/extrapolated values are marked with * and listed below.
  • Batch API discounts are 50% off standard rates across providers that offer Batch mode.
  • Prompt caching discounts vary by provider (typically 80-90% off cached input tokens).
  • Regional data-residency surcharges (Anthropic 1.1x, OpenAI 1.1x, Google regional tiers) are NOT included in base rates.
  • Long-context pricing tiers apply when input exceeds the model's threshold.
  • Embedding prices are input-only (no output tokens generated).
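
As a worked example of the conventions above, the sketch below converts a per-million-token price into a dollar amount and optionally applies the Batch discount and a residency surcharge. The function name and the example price are hypothetical, and stacking the two modifiers multiplicatively is an assumption; vendors may combine them differently.

```python
def token_cost(tokens: int, price_per_m: float, batch: bool = False,
               residency_multiplier: float = 1.0) -> float:
    """USD for `tokens` at `price_per_m` (USD per 1M tokens).

    batch: apply the 50% Batch API discount described in the methodology
    residency_multiplier: e.g. 1.1 for a regional data-residency surcharge,
        which base rates do NOT include
    """
    price = price_per_m
    if batch:
        price *= 0.5
    price *= residency_multiplier  # assumption: modifiers stack multiplicatively
    return tokens / 1_000_000 * price


# Illustrative: 2M input tokens at a hypothetical $3.00 per 1M.
print(f"standard:  ${token_cost(2_000_000, 3.00):.2f}")
print(f"batch:     ${token_cost(2_000_000, 3.00, batch=True):.2f}")
print(f"residency: ${token_cost(2_000_000, 3.00, residency_multiplier=1.1):.2f}")
```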

Primary sources

Last-verified date is the most recent successful daily snapshot (aicost_pricing_snapshots) or, when no snapshot exists yet, the latest successful crawler run (aicost_crawler_runs). 10 of 10 vendors are currently verified. Aggregator services (TokenCost, AI Pricing Guru, etc.) are not listed.

Source | Last verified | URL | Snapshot history
Anthropic | 2026-06-05 | https://www.anthropic.com/pricing | Daily snapshot since Sep 2023 · 578 days captured
Anthropic Docs | 2026-06-05 | https://platform.claude.com/docs/en/about-claude/pricing | Daily snapshot since Sep 2023 · 578 days captured
OpenAI | 2026-06-05 | https://openai.com/api/pricing/ | Daily snapshot since Sep 2023 · 579 days captured
Google AI | 2026-06-05 | https://ai.google.dev/gemini-api/docs/pricing | Daily snapshot since Dec 2023 · 554 days captured
Google Vertex | 2026-06-05 | https://cloud.google.com/vertex-ai/generative-ai/pricing | Daily snapshot since Dec 2023 · 554 days captured
DeepSeek | 2026-06-05 | https://api-docs.deepseek.com/quick_start/pricing | Daily snapshot since May 2024 · 493 days captured
xAI | 2026-06-05 | https://x.ai/api | Daily snapshot since Nov 2024 · 411 days captured
Mistral | 2026-06-05 | https://mistral.ai/pricing | Daily snapshot since Dec 2023 · 552 days captured
Cohere | 2026-06-05 | https://cohere.com/pricing | Daily snapshot since Sep 2023 · 578 days captured

Inferred values (marked with * in calculator tables)

Derived from industry conventions, not directly published by the vendor. Typical conventions: cached input = 10% of base (90% off), Batch API = 50% of base (50% off).

Vendor / Model | Field | Why it’s inferred
Anthropic / Claude Sonnet 4.6 | cachedInput | Derived at 10% of input rate; Anthropic publishes 90% cache-hit discount on this tier.
Anthropic / Claude Sonnet 4.5 | cachedInput | Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic / Claude Sonnet 4.5 | batchInput | Derived at 50% of standard input; Anthropic documents uniform 50% Batch discount.
Anthropic / Claude Sonnet 4.5 | batchOutput | Derived at 50% of standard output; Anthropic documents uniform 50% Batch discount.
Anthropic / Claude Haiku 4.5 | cachedInput | Derived at 10% of input rate; Anthropic 90% cache-hit discount convention.
OpenAI / GPT-5.4 Mini | cachedInput | Derived at 10% of input; OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI / GPT-5.4 Nano | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Nano | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Nano | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro | cachedInput | Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Pro | batchInput | Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro | batchOutput | Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.2 | cachedInput | Derived at 10% of input; no residency uplift.
OpenAI / GPT-5.2 | batchInput | Derived at 50% of input.
OpenAI / GPT-5.2 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 | cachedInput | Derived at 10% of input.
OpenAI / GPT-5 | batchInput | Derived at 50% of input.
OpenAI / GPT-5 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.5 Pro | cachedInput | Derived at 10% of input: OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI / GPT-5.5 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5.5 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.2 Pro | cachedInput | Derived at 10% of input; pro-tier convention.
OpenAI / GPT-5.2 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5.2 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5.1 | batchInput | Derived at 50% of input.
OpenAI / GPT-5.1 | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 Pro | batchInput | Derived at 50% of input.
OpenAI / GPT-5 Pro | batchOutput | Derived at 50% of output.
OpenAI / GPT-5 Nano | cachedInput | Derived at 10% of input.
OpenAI / GPT-5 Nano | batchInput | Derived at 50% of input.
OpenAI / GPT-5 Nano | batchOutput | Derived at 50% of output.
Google / Gemini 3 Flash | cachedInput | Derived at 10% of input; Google caching discount convention ~90%.
Google / Gemini 3.1 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 3.1 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 3.1 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Pro | cachedInput | Derived at 10% of input.
Google / Gemini 2.5 Flash | cachedInput | Derived at 10% of input.
Google / Gemini 2.5 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 2.5 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash | cachedInput | Derived at 25% of input per Google 2.0 family caching rates.
Google / Gemini 2.0 Flash | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite | cachedInput | Derived at 10% of input; Google caching convention.
Google / Gemini 2.0 Flash-Lite | batchInput | Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite | batchOutput | Derived at 50% of output; Google Batch API uniform 50% discount.
xAI / Grok 4 (legacy) | cachedInput | Extrapolated at 25% of base.
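
The derivation convention behind the inferred values above can be sketched as a small fill-in routine: keep any rate the vendor publishes, derive the rest from the base input and output rates, and flag derived values so they can be marked with *. This is an illustrative sketch, not the site's pipeline; the function names and example prices are hypothetical, and the field names mirror the Field column.

```python
from typing import Optional


def fill_inferred(input_price: float, output_price: float,
                  cached_input: Optional[float] = None,
                  batch_input: Optional[float] = None,
                  batch_output: Optional[float] = None,
                  cached_ratio: float = 0.10) -> dict:
    """Return {field: (price, inferred_flag)} using the stated conventions.

    cached_ratio defaults to 10% of input; a few rows in the table use 25%.
    """
    published = {
        "cachedInput": cached_input,
        "batchInput": batch_input,
        "batchOutput": batch_output,
    }
    derived = {
        "cachedInput": input_price * cached_ratio,  # 10% of base by default
        "batchInput": input_price * 0.5,            # 50% Batch convention
        "batchOutput": output_price * 0.5,
    }
    result = {}
    for field, value in published.items():
        if value is None:
            result[field] = (derived[field], True)   # inferred, shown with *
        else:
            result[field] = (value, False)           # vendor-published
    return result


def format_price(price: float, inferred: bool) -> str:
    """Render a price, appending * when the value is inferred."""
    return f"${price:.2f}" + ("*" if inferred else "")


# Hypothetical model: vendor publishes a cached rate but no Batch rates.
for field, (price, inferred) in fill_inferred(3.00, 15.00, cached_input=0.30).items():
    print(field, format_price(price, inferred))
```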

Pricing is cross-verified against the LiteLLM community registry when available. Daily snapshots are kept in aicost_pricing_snapshots; every change is logged to aicost_price_changelog with old and new values for a full audit trail. Read the full methodology →