Token Reduction Analyzer

Paste your prompt. See what's wasteful.

Automated analysis of redundancy, verbosity, and low-value tokens. Typical prompts have 30-50% reducible tokens.

Client-side analysis · nothing leaves your browser · Pricing verified: 2026-06-05

What this calculator does

Paste a prompt and see exactly which tokens are wasteful — politeness padding, redundant instructions, over-stuffed few-shot examples — and what trimming them saves at your volume.
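As a rough illustration of what spotting wasteful patterns can look like, here is a minimal sketch. It is not the calculator's actual method: the phrase list, the sentence splitter, and the `find_waste` helper are all assumptions made for this example, and it counts occurrences rather than tokens.

```python
import re

# Hypothetical list of politeness phrases; a real analyzer would use a richer set.
POLITENESS = re.compile(r"\b(please|kindly|thank you|thanks)\b", re.IGNORECASE)


def find_waste(prompt):
    """Return (pattern name, occurrence count) pairs, largest count first."""
    findings = []

    # Politeness padding: count matches of the assumed phrase list.
    hits = POLITENESS.findall(prompt)
    if hits:
        findings.append(("politeness padding", len(hits)))

    # Redundant instructions: count sentences that repeat an earlier one verbatim
    # (case-insensitive). The split on ., ! or ? is deliberately naive.
    sentences = [
        s.strip().lower()
        for s in re.split(r"(?<=[.!?])\s+", prompt)
        if s.strip()
    ]
    repeats = len(sentences) - len(set(sentences))
    if repeats:
        findings.append(("repeated sentence", repeats))

    return sorted(findings, key=lambda item: item[1], reverse=True)


sample = "Please answer briefly. Thank you. Answer in JSON. Answer in JSON."
for name, count in find_waste(sample):
    print(f"{name}: {count}")
```

A heuristic like this only flags candidates; whether a flagged phrase is safe to cut still depends on how the model responds without it.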

Why use it

Most prompts carry 30-50% redundant tokens that add cost without improving output
Token reduction stacks multiplicatively with routing and caching
Analysis runs locally in your browser — your prompt is never sent anywhere
See the dollar impact at your request volume across every model
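To make "stacks multiplicatively" concrete, here is a minimal sketch. The three percentages are illustrative assumptions, not figures from the calculator, and `combined_savings` is a hypothetical helper. Each technique removes a share of whatever cost is left after the previous one, so the savings compound rather than add.

```python
from math import prod


def combined_savings(reductions):
    """Return the overall fractional saving when reductions stack.

    Each entry is the fraction of the remaining cost that one technique
    removes, so the surviving cost is the product of the (1 - r) factors.
    """
    remaining = prod(1.0 - r for r in reductions)
    return 1.0 - remaining


# Assumed example figures: 40% token cut, 30% from routing, 50% from caching.
total = combined_savings([0.40, 0.30, 0.50])
print(f"Combined saving: {total:.0%}")
```

With these assumed inputs the surviving cost is 0.60 × 0.70 × 0.50 = 0.21 of the original, so the combined saving is 79%, not the 120% that simple addition would suggest.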

Here are the inputs you provide, the outputs you get, and how to use the results for your AI workloads.

📥 Inputs you provide

Your prompt: The text to analyze
Model: Model used to price savings
Requests per day: Daily call volume for this prompt

📤 Outputs you get

After optimization: Token count once cuts are applied
Potential monthly savings: Dollars saved per month
Token reduction: Share of input you can cut
Findings: Ranked waste patterns
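The arithmetic linking these inputs and outputs can be sketched as follows. This is an assumption about how such a calculation might work, not the calculator's own code: it prices input tokens only, assumes a 30-day month, and uses a placeholder price of $3.00 per million tokens. The function names are hypothetical.

```python
def monthly_cost(tokens_per_request, requests_per_day, price_per_million, days=30):
    """Monthly input-token cost in dollars (assumes a 30-day month by default)."""
    total_tokens = tokens_per_request * requests_per_day * days
    return total_tokens * price_per_million / 1_000_000


def savings_report(current_tokens, optimized_tokens, requests_per_day, price_per_million):
    """Derive the headline outputs from the token counts, volume, and price."""
    current = monthly_cost(current_tokens, requests_per_day, price_per_million)
    optimized = monthly_cost(optimized_tokens, requests_per_day, price_per_million)
    return {
        "token_reduction": 1 - optimized_tokens / current_tokens,
        "current_monthly": current,
        "optimized_monthly": optimized,
        "monthly_savings": current - optimized,
        "annual_savings": (current - optimized) * 12,
    }


# Placeholder inputs: a 1,200-token prompt trimmed to 780 tokens,
# 10,000 requests per day, $3.00 per million input tokens.
report = savings_report(1200, 780, 10_000, 3.00)
for key, value in report.items():
    print(f"{key}: {value:.2f}")
```

Swapping in a different per-million price gives a before-and-after comparison for another model.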

🎯 Use your results to

✂️ Trim the waste: Cut padding, redundancy, and over-stuffed examples; 30-50% reduction is common
📐 Restructure for clarity: Consolidate instructions and prune few-shot down to what actually helps
💰 See the dollar impact: Token cuts become real monthly and annual dollars at your volume
🔌 Build it into your pipeline: Run reductions as a step in your prompt build; MCP available for agents

👇 Now try the calculator below with your own AI workloads

📊 Calculator at a glance

🎛 CALCULATOR

📝 Paste your prompt

System prompt, user template, or any repeated AI input. Analysis runs locally.

Model (for cost calc)

Requests per day

📈 RESULTS

Potential monthly savings

Current tokens

After optimization

Current monthly cost

Optimized monthly

🔍 Findings

📊 Cost across models - before & after optimization

What you save on each model if you apply all suggestions.

Model	Current / day	Optimized / day	Monthly savings	Annual savings

📋 What now?

Apply the high-savings findings first — cut politeness padding, redundant instructions, and over-stuffed few-shot examples; 30-50% reduction is typical without touching quality.
Re-test quality after trimming — run your eval set on the leaner prompt before shipping; aggressive cuts can drop accuracy on edge cases.
Then cache what's left — a tight, stable prefix caches better, so prompt caching compounds the savings on top of the token cut.
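The re-test step can be automated as a simple gate before shipping. The sketch below is one possible shape, under stated assumptions: `run_prompt` stands in for whatever function calls your model, the eval set is a list of (question, expected answer) pairs scored by exact match, and the 1-point tolerance is an arbitrary example threshold.

```python
def accuracy(run_prompt, prompt, eval_set):
    """Share of eval cases where the model output exactly matches the expected answer."""
    correct = sum(
        1 for question, expected in eval_set
        if run_prompt(prompt, question) == expected
    )
    return correct / len(eval_set)


def safe_to_ship(run_prompt, original, trimmed, eval_set, max_drop=0.01):
    """True if the trimmed prompt loses no more than max_drop accuracy."""
    before = accuracy(run_prompt, original, eval_set)
    after = accuracy(run_prompt, trimmed, eval_set)
    return before - after <= max_drop


# Stand-in for a real model call, used only to make the example runnable:
# it uppercases the question when the prompt mentions "uppercase".
def fake_model(prompt, question):
    return question.upper() if "uppercase" in prompt else question


cases = [("a", "A"), ("b", "B")]
print(safe_to_ship(fake_model, "Please reply in uppercase, thank you.", "Reply in uppercase.", cases))
print(safe_to_ship(fake_model, "Please reply in uppercase, thank you.", "Reply.", cases))
```

Exact match is the simplest scorer; workloads with free-form output would need a different comparison.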

📅 Book a prompt-optimization session to apply this to your workload →

Vendor / Model	Field	Why it’s inferred
Anthropic / Claude Sonnet 4.6	cachedInput	Derived at 10% of input rate; Anthropic publishes 90% cache-hit discount on this tier.
Anthropic / Claude Sonnet 4.5	cachedInput	Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic / Claude Sonnet 4.5	batchInput	Derived at 50% of standard input; Anthropic documents uniform 50% Batch discount.
Anthropic / Claude Sonnet 4.5	batchOutput	Derived at 50% of standard output; Anthropic documents uniform 50% Batch discount.
Anthropic / Claude Haiku 4.5	cachedInput	Derived at 10% of input rate; Anthropic 90% cache-hit discount convention.
OpenAI / GPT-5.4 Mini	cachedInput	Derived at 10% of input; OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI / GPT-5.4 Nano	cachedInput	Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Nano	batchInput	Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Nano	batchOutput	Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro	cachedInput	Derived at 10% of input; OpenAI 90% cache-hit convention.
OpenAI / GPT-5.4 Pro	batchInput	Derived at 50% of input; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.4 Pro	batchOutput	Derived at 50% of output; OpenAI Batch API uniform 50% discount.
OpenAI / GPT-5.2	cachedInput	Derived at 10% of input; no residency uplift.
OpenAI / GPT-5.2	batchInput	Derived at 50% of input.
OpenAI / GPT-5.2	batchOutput	Derived at 50% of output.
OpenAI / GPT-5	cachedInput	Derived at 10% of input.
OpenAI / GPT-5	batchInput	Derived at 50% of input.
OpenAI / GPT-5	batchOutput	Derived at 50% of output.
OpenAI / GPT-5.5 Pro	cachedInput	Derived at 10% of input. OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI / GPT-5.5 Pro	batchInput	Derived at 50% of input.
OpenAI / GPT-5.5 Pro	batchOutput	Derived at 50% of output.
OpenAI / GPT-5.2 Pro	cachedInput	Derived at 10% of input; pro-tier convention.
OpenAI / GPT-5.2 Pro	batchInput	Derived at 50% of input.
OpenAI / GPT-5.2 Pro	batchOutput	Derived at 50% of output.
OpenAI / GPT-5.1	batchInput	Derived at 50% of input.
OpenAI / GPT-5.1	batchOutput	Derived at 50% of output.
OpenAI / GPT-5 Pro	batchInput	Derived at 50% of input.
OpenAI / GPT-5 Pro	batchOutput	Derived at 50% of output.
OpenAI / GPT-5 Nano	cachedInput	Derived at 10% of input.
OpenAI / GPT-5 Nano	batchInput	Derived at 50% of input.
OpenAI / GPT-5 Nano	batchOutput	Derived at 50% of output.
Google / Gemini 3 Flash	cachedInput	Derived at 10% of input; Google caching discount convention ~90%.
Google / Gemini 3.1 Flash-Lite	cachedInput	Derived at 10% of input; Google caching convention.
Google / Gemini 3.1 Flash-Lite	batchInput	Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 3.1 Flash-Lite	batchOutput	Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Pro	cachedInput	Derived at 10% of input.
Google / Gemini 2.5 Flash	cachedInput	Derived at 10% of input.
Google / Gemini 2.5 Flash-Lite	cachedInput	Derived at 10% of input; Google caching convention.
Google / Gemini 2.5 Flash-Lite	batchInput	Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.5 Flash-Lite	batchOutput	Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash	cachedInput	Derived at 25% of input per Google 2.0 family caching rates.
Google / Gemini 2.0 Flash	batchInput	Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash	batchOutput	Derived at 50% of output; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite	cachedInput	Derived at 10% of input; Google caching convention.
Google / Gemini 2.0 Flash-Lite	batchInput	Derived at 50% of input; Google Batch API uniform 50% discount.
Google / Gemini 2.0 Flash-Lite	batchOutput	Derived at 50% of output; Google Batch API uniform 50% discount.
xAI / Grok 4 (legacy)	cachedInput	Extrapolated at 25% of base.
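The derivation rules in the entries above reduce to a few multipliers, which can be sketched as follows. This is an illustration only: the prices are placeholders rather than real rates, `infer_rates` is a hypothetical helper, and it derives all three fields for every model even though the list above marks only the fields that are actually inferred for each one.

```python
# Multipliers taken from the entries above.
DEFAULT_CACHED_FRACTION = 0.10   # 90% cache-hit discount convention
BATCH_FRACTION = 0.50            # uniform 50% batch discount

# Models whose cachedInput entry uses 25% of the base rate instead of 10%.
CACHED_FRACTION_OVERRIDES = {
    "Gemini 2.0 Flash": 0.25,
    "Grok 4 (legacy)": 0.25,
}


def infer_rates(model, input_price, output_price):
    """Derive cached and batch prices (per million tokens) from the base prices."""
    cached_fraction = CACHED_FRACTION_OVERRIDES.get(model, DEFAULT_CACHED_FRACTION)
    return {
        "cachedInput": input_price * cached_fraction,
        "batchInput": input_price * BATCH_FRACTION,
        "batchOutput": output_price * BATCH_FRACTION,
    }


# Placeholder base prices, not real published rates.
print(infer_rates("Claude Sonnet 4.5", input_price=2.00, output_price=10.00))
print(infer_rates("Gemini 2.0 Flash", input_price=2.00, output_price=10.00))
```

Keeping the exceptions in one override table makes it easy to audit which models depart from the 10% convention.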

Go deeper

The calculator's an estimate. Want the real number?

Methodology

Primary sources

Inferred values (marked with * in calculator tables)