Token Estimator

Paste text → see what it costs across every model

Quick token count + dollar estimate before you send a prompt. Pick your content type for accuracy.

Pricing verified: 2026-06-05 · 63 models ranked · Client-side: your text never leaves your browser

What this calculator does

Paste any text and see approximately how many tokens it becomes across major tokenizers (GPT, Claude, Gemini), plus cost projections across the current model catalog.

Why use it

Words ≠ tokens: the same text can tokenize 10-20% differently across vendors, which breaks naive cost estimates
Avoid the classic budget blowup: assuming 1 word = 1 token, then discovering the actual count is about 1.3× higher
Pre-flight prompt size before you commit to a model with a tight context window
Calibrate cost-calculator inputs with real numbers instead of guesses
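As a rough illustration of the budget blowup described above, this sketch compares a naive 1-word-=-1-token count with the ~1.3 tokens-per-word figure. The 1.3 factor is the rule of thumb from the list, not the output of any real tokenizer. The $3-per-million price and the 1M-call volume in the usage lines are hypothetical placeholders.

```python
# Compare a naive "1 word = 1 token" estimate with the ~1.3 tokens-per-word
# rule of thumb quoted above. Both are heuristics, not real tokenizer output.
NAIVE_TOKENS_PER_WORD = 1.0
TYPICAL_TOKENS_PER_WORD = 1.3  # assumption: the "1.3x actual" figure from the text


def word_based_estimates(text: str) -> tuple[int, int]:
    """Return (naive, adjusted) token estimates based on whitespace-split words."""
    words = len(text.split())
    return round(words * NAIVE_TOKENS_PER_WORD), round(words * TYPICAL_TOKENS_PER_WORD)


def unbudgeted_cost(naive_tokens: int, actual_tokens: int,
                    usd_per_million_tokens: float, calls: int) -> float:
    """Dollars spent beyond the naive budget across `calls` requests."""
    extra_tokens = max(actual_tokens - naive_tokens, 0)
    return extra_tokens * calls * usd_per_million_tokens / 1_000_000


sample = "Summarize the attached incident report in three bullet points."
naive, adjusted = word_based_estimates(sample)
# Hypothetical price ($3 per 1M input tokens) and volume (1M calls).
print(naive, adjusted, unbudgeted_cost(naive, adjusted, 3.0, 1_000_000))  # prints: 9 12 9.0
```

Three extra tokens per call look harmless. At volume, though, the gap becomes real money, and it grows for content that tokenizes more densely than prose.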


Who uses this:

Vibe Coder: High · Small Business: High · Enterprise: High

Here are the inputs you provide, the outputs you get, and how to use the results for your AI workloads.

📥 Inputs you provide

Content type: tunes the chars-per-token expectation
Sample text: the text to tokenize
Expected output tokens: your estimate of the response size

📤 Outputs you get

Input token count: tokens for the pasted text
Chars-per-token ratio: how dense your text is
Cost across models: what this text costs per call

🎯 Use your results to

🔢 Convert text → tokens

Stop guessing: paste real samples and get counts based on your actual text.

📋 Calibrate other calcs

Plug actual numbers into Cost Calculator instead of placeholder guesses.

🔍 Catch bloated prompts

See whether your prompt is bigger than expected before it hits production.

🪜 Pre-flight context fit

Verify that your prompt + retrieval + history fits in your chosen model's context window.
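Here is a minimal sketch of the pre-flight context check, assuming you already have token estimates for each part of the call. Two assumptions need flagging. First, it reserves room for the response, because many providers count output against the same context window; check your model's documentation. Second, the 10% safety margin is an illustrative buffer for cross-vendor tokenizer differences, not a published figure. `ContextBudget` and `fits` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ContextBudget:
    """Estimated token counts for one API call (hypothetical helper)."""
    system_prompt: int
    user_prompt: int
    retrieval: int
    history: int
    max_output: int  # room reserved for the response

    def total(self) -> int:
        return (self.system_prompt + self.user_prompt + self.retrieval
                + self.history + self.max_output)


def fits(budget: ContextBudget, context_window: int, safety_margin: float = 0.10) -> bool:
    """True if the call fits in the window while leaving `safety_margin` headroom."""
    if not 0.0 <= safety_margin < 1.0:
        raise ValueError("safety_margin must be in [0, 1)")
    usable = int(context_window * (1.0 - safety_margin))
    return budget.total() <= usable


call = ContextBudget(system_prompt=1_000, user_prompt=500, retrieval=6_000,
                     history=2_000, max_output=1_000)
print(call.total(), fits(call, 128_000), fits(call, 8_192))  # prints: 10500 True False
```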

👇 Now try the calculator below with your own AI workloads

Paste real samples

Use actual production text: system prompts, real user queries, real retrieval chunks. Synthetic test text often tokenizes differently because it lacks the irregular patterns (typos, mixed case, emoji, code fragments) that real input has. If you have a prompt template with variable slots, paste a fully rendered example, not the template.
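This sketch shows why the rendered example matters. It fills a hypothetical template with Python's standard `string.Template`, then compares a character-based estimate for the bare template against the filled version. The template text, the slot names, and the ~4 chars/token ratio are assumptions chosen for illustration.

```python
import math
from string import Template

# Hypothetical prompt template; $-prefixed names are the variable slots.
PROMPT = Template(
    "You are a support assistant for $product.\n"
    "Relevant docs:\n$docs\n"
    "Customer question: $question"
)


def rough_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Character-ratio estimate (assumed ~4 chars/token for English prose)."""
    return math.ceil(len(text) / chars_per_token)


def render(**slots: str) -> str:
    """Fill every slot; substitute() raises KeyError if a slot is missing."""
    return PROMPT.substitute(**slots)


rendered = render(
    product="Acme Billing",
    docs="Refunds are issued to the original payment method within 5-7 business days. " * 20,
    question="Why hasn't my refund shown up yet?",
)
# The bare template badly understates what is actually sent per call.
print(rough_tokens(PROMPT.template), rough_tokens(rendered))
```

Using `substitute` rather than `safe_substitute` is deliberate. A forgotten slot fails loudly instead of leaving a `$placeholder` in the text you measure.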

Pick the right content type

English prose: ~4 chars/token. Code: ~3 chars/token (more punctuation and more whitespace tokens). JSON: ~3 chars/token (curly braces, quotes, and colons each consume tokens). Mixed: ~3.5 chars/token. The displayed ratio updates as you toggle, so you can see the cost density of your text. Set this correctly: the content type doesn't change how a provider's tokenizer works, but it does change the chars-per-token ratio this estimate is based on.
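Here is a minimal sketch of a chars-per-token estimate using the ratios listed above. It assumes the estimate is simply character count divided by the selected ratio, rounded up. That illustrates the heuristic, but it is not necessarily this calculator's exact formula, and it cannot replace a provider's real tokenizer.

```python
import math

# Approximate characters per token by content type (ratios from the text above).
CHARS_PER_TOKEN = {
    "prose": 4.0,
    "code": 3.0,
    "json": 3.0,
    "mixed": 3.5,
}


def estimate_tokens(text: str, content_type: str = "prose") -> int:
    """Estimate tokens as len(text) / ratio, rounded up to a whole token."""
    try:
        ratio = CHARS_PER_TOKEN[content_type]
    except KeyError:
        raise ValueError(f"unknown content type: {content_type!r}") from None
    return math.ceil(len(text) / ratio)


snippet = '{"user_id": 42, "plan": "pro", "seats": 12}'
for kind in CHARS_PER_TOKEN:
    print(kind, estimate_tokens(snippet, kind))
```

The same snippet yields a higher estimate when labeled as JSON than as prose. That difference flows straight into the cost columns, which is why picking the right content type matters.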

Interpret the count

The big number is the total token count for the input you pasted. Cost projections show this text's cost across ~10 major models at current pricing. If your prompt has variable parts, account for their typical size: paste a fully rendered example, or add the variables' typical token count to the total. The output side uses your "expected output tokens" field, so set it to a realistic value (typical chat reply: 150-400 tokens; code generation: 500-1500).
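The per-request, per-1K, and per-1M columns follow from simple arithmetic on per-million-token prices. This sketch assumes prices are quoted in USD per 1M tokens, with separate input and output rates. The model names and prices are hypothetical placeholders, not catalog values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    name: str
    input_per_mtok: float   # USD per 1M input tokens
    output_per_mtok: float  # USD per 1M output tokens


def cost_per_request(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """USD for one call: each side billed at its own per-million rate."""
    return (input_tokens * price.input_per_mtok
            + output_tokens * price.output_per_mtok) / 1_000_000


def cost_table(prices: list[ModelPrice], input_tokens: int,
               output_tokens: int) -> list[tuple[str, float, float, float]]:
    """Rows of (model, per request, per 1K, per 1M requests), cheapest first."""
    rows = []
    for p in prices:
        per_call = cost_per_request(p, input_tokens, output_tokens)
        rows.append((p.name, per_call, per_call * 1_000, per_call * 1_000_000))
    return sorted(rows, key=lambda row: row[1])


# Hypothetical models and prices, for illustration only.
catalog = [
    ModelPrice("large-model", input_per_mtok=3.00, output_per_mtok=15.00),
    ModelPrice("small-model", input_per_mtok=0.10, output_per_mtok=0.40),
]
for row in cost_table(catalog, input_tokens=2_000, output_tokens=300):
    print("{}: ${:.6f} | ${:.2f} | ${:,.0f}".format(*row))
```

With these placeholder prices, the 300 output tokens are only 13% of the tokens. Even so, they account for over 40% of the large model's per-call cost, which is why the expected-output field matters.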

Plug into downstream calcs

Copy the token count into Cost Calculator as input tokens, set output tokens from your "what response am I expecting" estimate, and compute the actual monthly cost. If the number is bigger than expected, pair it with Token Reduction Analyzer: often 20-40% of prompt tokens are reducible without quality loss.
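Projecting to a monthly bill means multiplying per-request cost by volume. This sketch assumes a constant daily volume over a 30-day month and linear pricing, with no volume discounts, caching, or batch rates. The `input_reduction` parameter is a hypothetical knob for modeling a trimmed prompt, and it applies only to input tokens.

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 input_per_mtok: float, output_per_mtok: float,
                 requests_per_day: int, days: int = 30,
                 input_reduction: float = 0.0) -> float:
    """Projected USD per month; input_reduction trims only the input side."""
    if not 0.0 <= input_reduction < 1.0:
        raise ValueError("input_reduction must be in [0, 1)")
    effective_input = input_tokens * (1.0 - input_reduction)
    per_request = (effective_input * input_per_mtok
                   + output_tokens * output_per_mtok) / 1_000_000
    return per_request * requests_per_day * days


# Hypothetical workload: 2,000 in / 300 out at $3 / $15 per 1M tokens.
base = dict(input_tokens=2_000, output_tokens=300,
            input_per_mtok=3.0, output_per_mtok=15.0)
for volume in (10_000, 100_000, 1_000_000):  # 1x, 10x, 100x requests per day
    full = monthly_cost(**base, requests_per_day=volume)
    trimmed = monthly_cost(**base, requests_per_day=volume, input_reduction=0.30)
    print(f"{volume:>9,}/day: ${full:,.0f} vs ${trimmed:,.0f} with 30% fewer input tokens")
```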

📊 Calculator at a glance


🎛 CALCULATOR

📝 Paste your text

Prompt, document chunk, or any content you'd send to an AI API.

Content type (affects token ratio)

English prose · Code · JSON / structured · Mixed

English prose: ~4 chars per token (GPT-5.x tokenizer).


Characters · Words · Est. tokens

Note: This is an estimate. Each provider uses a slightly different tokenizer.

Expected output tokens (assumed for the cost calculation). Output is usually priced 3-5× higher than input, so this matters for an accurate cost.

📈 RESULTS

💵 Cost across all models

Enter text to see per-request cost ranked cheapest first.

Model	Per request	Per 1K requests	Per 1M requests
Paste some text above to compare costs.


📋 What now?

Right-size your prompts — trim what's wasteful, keep what matters
Pick the cheapest model: the per-model table is ranked cheapest first for your exact text
Project it to scale — per-request × your volume = the real monthly bill

Need help cutting your AI bill? 💼 Talk to a CloudIntelligence advisor →

Now that you have your token count…

What this means + what to do next

💡 What to consider beyond this token count for full TCO

Per-call retry overhead (typically 3-15% on top of base token cost)
Conversation history accumulation in multi-turn workloads
System-prompt overhead repeated on every call (often 200-1000 tokens you're paying for forever)
Output bloat — verbose models cost more than the input estimate suggests
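This sketch shows why multi-turn history and a repeated system prompt dominate. It totals billed input tokens when the client resends the system prompt and the full conversation on every turn. That resend-everything pattern is an assumption: it is common with stateless chat APIs, but caching or history truncation changes the picture. The per-turn token sizes are hypothetical.

```python
def conversation_input_tokens(system_tokens: int, user_tokens: int,
                              assistant_tokens: int, turns: int) -> int:
    """Billed input tokens across `turns` calls, resending all prior turns each time."""
    total = 0
    history = 0  # tokens from earlier user + assistant messages
    for _ in range(turns):
        total += system_tokens + history + user_tokens
        history += user_tokens + assistant_tokens
    return total


def closed_form_tokens(system_tokens: int, user_tokens: int,
                       assistant_tokens: int, turns: int) -> int:
    """Same total as a formula: n*(s+u) + (u+a)*n*(n-1)/2, quadratic in n."""
    n = turns
    return n * (system_tokens + user_tokens) + (user_tokens + assistant_tokens) * n * (n - 1) // 2


sys_t, user_t, asst_t = 800, 150, 400  # hypothetical per-message sizes
for n in (1, 5, 20):
    total = conversation_input_tokens(sys_t, user_t, asst_t, n)
    single_call_view = n * (sys_t + user_t)  # what multiplying one measured call implies
    print(n, total, single_call_view, f"system prompt share: {n * sys_t / total:.0%}")
```

The gap between `total` and `single_call_view` grows with the square of the turn count. This is the "single-call count vastly underestimates" effect that Agent Loop Cost is meant to quantify.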

Rule of thumb: Multiply by 1.15-1.3× to account for retries, system-prompt overhead, and output variance in real production traffic.
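Applying the rule of thumb is a one-line calculation. This sketch returns both the low and high ends, so you budget against a range rather than a point estimate. The 1.15 and 1.3 defaults come from the text; the $3,150 base is a hypothetical monthly figure.

```python
def production_cost_range(base_cost: float, low: float = 1.15,
                          high: float = 1.30) -> tuple[float, float]:
    """Apply the 1.15-1.3x rule-of-thumb multiplier to a measured base cost."""
    if base_cost < 0 or not 1.0 <= low <= high:
        raise ValueError("need base_cost >= 0 and 1 <= low <= high")
    return base_cost * low, base_cost * high


low, high = production_cost_range(3150.0)  # e.g. a monthly figure from Cost Calculator
print(f"${low:,.0f} to ${high:,.0f} per month")
```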

Quantify the hidden costs:

Get the actual $/month at the token counts you measured here → Cost Calculator
For multi-turn workloads, this single-call count vastly underestimates total agent cost → Agent Loop Cost
If your token count is bigger than expected, often 20-40% is reducible without quality loss → Token Reduction Analyzer

$ How this fits your overall ROI

This is a measurement tool — ROI conversations happen downstream:

Is my prompt 30% bigger than it needs to be? (Run reduction analysis.)
How much of my token budget is overhead vs unique input per call?
Would a smaller model handle this prompt with acceptable quality?

Bridge to ROI:

Quantify savings from compression, context-window pruning, and structured output → Token Reduction Analyzer
If your prompt has a stable prefix, caching can cut effective cost 50-80% → Prompt Cache ROI
If some queries are simpler than others, routing can cut cost without quality loss → Multi Model Router
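The caching and routing bullets reduce to two small formulas. For caching, this sketch assumes a cache hit bills the stable prefix at a fraction of the normal input rate. The 0.10 default mirrors the 10%-of-input convention in the inferred-rates table on this page, but real rates vary by provider, and cache-write surcharges and minimum cacheable lengths are ignored here. For routing, it assumes traffic splits by a fixed share. All prices and shares are hypothetical.

```python
def input_cost_with_cache(prefix_tokens: int, variable_tokens: int,
                          input_per_mtok: float, cached_fraction: float = 0.10,
                          hit_rate: float = 1.0) -> float:
    """USD of input per call when a stable prefix is a cache hit `hit_rate` of the time."""
    if not (0.0 <= cached_fraction <= 1.0 and 0.0 <= hit_rate <= 1.0):
        raise ValueError("fractions must be in [0, 1]")
    # Hits pay the cached rate on the prefix; misses pay the full input rate.
    prefix_multiplier = hit_rate * cached_fraction + (1.0 - hit_rate)
    return (prefix_tokens * input_per_mtok * prefix_multiplier
            + variable_tokens * input_per_mtok) / 1_000_000


def routed_cost(cost_small: float, cost_large: float, share_small: float) -> float:
    """Blended per-request cost when `share_small` of traffic goes to the cheaper model."""
    if not 0.0 <= share_small <= 1.0:
        raise ValueError("share_small must be in [0, 1]")
    return share_small * cost_small + (1.0 - share_small) * cost_large


# Hypothetical: 1,800-token stable system prompt + 200 variable tokens at $3 per 1M input.
uncached = input_cost_with_cache(1_800, 200, 3.0, hit_rate=0.0)
cached = input_cost_with_cache(1_800, 200, 3.0, hit_rate=0.9)
print(f"input savings from caching: {1 - cached / uncached:.0%}")
print(f"blended cost with 70% routed small: ${routed_cost(0.0003, 0.0105, 0.70):.6f}")
```

Savings depend heavily on how much of the prompt is a stable prefix and how often it actually hits the cache. That is why a range like 50-80% is a planning figure rather than a guarantee.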

Doing something different?

If you don't have sample text yet, these calcs work from estimates:

You can estimate token shape from memory (typical chat: ~1500 in, ~500 out) → Cost Calculator
You're planning multiple apps without specific prompts yet → Budget Planner
You want to see how this cost grows at 10× and 100× volume → Scale Projection

Vendor / Model	Field	Why it’s inferred
Anthropic — Claude Sonnet 4.6	`cachedInput`	Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5	`cachedInput`	Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5	`batchInput`	Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5	`batchOutput`	Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5	`cachedInput`	Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini	`cachedInput`	Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2	`cachedInput`	Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.5 Pro	`cachedInput`	Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.2 Pro	`cachedInput`	Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.1	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.1	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Nano	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5 Nano	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Nano	`batchOutput`	Derived at 50% of output.
Google — Gemini 3 Flash	`cachedInput`	Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`cachedInput`	Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy)	`cachedInput`	Extrapolated at 25% of base.
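The rows above follow a simple pattern. When a vendor doesn't publish a field, it is filled as a fixed fraction of a published rate: 10% of input for `cachedInput` (25% in the Gemini 2.0 Flash and Grok 4 rows), and 50% of input or output for the batch fields. This sketch shows that derivation. Published values take precedence, and the inferred keys are returned so they can be flagged with `*`. The function name and the example prices are hypothetical.

```python
def fill_inferred_rates(published: dict[str, float], input_per_mtok: float,
                        output_per_mtok: float, cached_fraction: float = 0.10,
                        batch_fraction: float = 0.50) -> tuple[dict[str, float], set[str]]:
    """Return (rates, inferred_keys); published values always win over derived ones."""
    derived = {
        "cachedInput": input_per_mtok * cached_fraction,
        "batchInput": input_per_mtok * batch_fraction,
        "batchOutput": output_per_mtok * batch_fraction,
    }
    rates = {**derived, **published}  # later keys (published) override derived ones
    inferred = {key for key in derived if key not in published}
    return rates, inferred


# Hypothetical model: $2 in / $8 out per 1M tokens; vendor publishes only a cached rate.
rates, inferred = fill_inferred_rates({"cachedInput": 0.50}, 2.0, 8.0)
for field, value in sorted(rates.items()):
    marker = "*" if field in inferred else ""
    print(f"{field}{marker}: ${value:.2f} per 1M tokens")
```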


Go deeper

The calculator's an estimate. Want the real number?

Methodology

Primary sources

Inferred values (marked with * in calculator tables)