Embedding Cost · for RAG builders

What does your RAG setup cost to build + run?

Indexing, re-indexing, query-side embeddings, vector storage. Compare 9 embedding models side-by-side.

Pricing verified: 2026-06-05 · 9 embedding models

📅 Schedule a meeting via AvatarVA

What this calculator does

Compare 9 embedding models on indexing + query-time cost for your RAG corpus.

Why use it

Embedding choice affects not just cost but retrieval quality — see both at once
Separate indexing cost (one-time) from query cost (recurring) — most people conflate them
Compare OpenAI, Voyage, Cohere, Gemini, BGE, Nomic side-by-side

Enter key values

Docs, tokens/doc, queries/month. Pick an embedding model.

Adjust re-indexing

How often do you re-embed the full corpus? Default: 2x/year.

Read 3-phase cost

Indexing / re-indexing / query-side costs broken out separately.

Pick the right model

The lowest $/1M tokens isn't always the winner: a bigger model may need fewer chunks to reach the same recall.

Enter corpus stats

Docs = total documents in your RAG corpus. Tokens/doc = average document length in tokens (1500 typical). Queries/month = retrieval calls (user queries that hit the vector DB). Query tokens = avg user query length. These drive indexing (one-time) vs query-side (recurring) cost.
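As a rough sketch of how these four inputs turn into cost, the snippet below separates the one-time indexing bill from the recurring query bill. The corpus numbers and the price of $0.02 per 1M tokens are placeholder assumptions for illustration, not figures taken from the calculator; substitute your provider's current rate. It also ignores chunk overlap, which would raise the indexed token count.

```python
def embedding_cost(tokens: float, price_per_million: float) -> float:
    """Dollars to embed `tokens` tokens at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million

# Hypothetical workload (assumed values, not calculator defaults).
docs = 100_000
tokens_per_doc = 1_500
queries_per_month = 50_000
tokens_per_query = 30
price_per_million = 0.02  # assumed $/1M tokens

# One-time: every document token is embedded once at indexing.
indexing_cost = embedding_cost(docs * tokens_per_doc, price_per_million)

# Recurring: every query is embedded before it hits the vector DB.
monthly_query_cost = embedding_cost(
    queries_per_month * tokens_per_query, price_per_million
)

print(f"One-time indexing: ${indexing_cost:.2f}")
print(f"Monthly queries:   ${monthly_query_cost:.2f}")
```

With these assumed numbers the corpus holds 150M tokens while a month of queries is only 1.5M tokens, which is why the two costs are worth reading separately.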

Pick embedding + re-indexing cadence

Embedding model choice matters for quality and cost. Voyage and Cohere are retrieval-tuned (better recall). OpenAI 3-small is cheap and general-purpose. BGE and Nomic are open-source. Re-indexing cadence depends on how often your corpus drifts: 1-2x/year for stable knowledge bases, quarterly or more for fast-changing content. Dimension truncation (OpenAI 3-large supports it) shrinks storage in proportion to the dimensions dropped, roughly 50% if you keep half of them, with minor quality loss.
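To make the truncation saving concrete, here is a minimal arithmetic sketch. It assumes vectors are stored as 32-bit floats and uses a hypothetical count of one million vectors; 3072 is the default output size of OpenAI's text-embedding-3-large, and 1536 is an illustrative truncation target.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as float32

def raw_vector_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw size of the vectors alone, before any index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32

num_vectors = 1_000_000  # hypothetical corpus size in chunks

full = raw_vector_bytes(num_vectors, 3072)
truncated = raw_vector_bytes(num_vectors, 1536)
savings = 1 - truncated / full

print(f"Full:      {full / 1e9:.3f} GB")
print(f"Truncated: {truncated / 1e9:.3f} GB")
print(f"Saved:     {savings:.0%}")
```

Raw storage is linear in the dimension count, so the saving equals the fraction of dimensions dropped. The quality cost of truncating is not modeled here and should be checked on your own retrieval benchmark.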

Interpret 3-phase cost

Indexing cost (first-time) is usually negligible unless your corpus is huge (10M+ docs). Re-indexing is amortized monthly, which shows the recurring cost of keeping the index fresh. Query-side cost scales with monthly queries and is often the dominant cost for high-traffic RAG. The alternatives table compares all 9 models at your workload.
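One way to reproduce the three-phase split by hand is sketched below. It assumes each re-index re-embeds the full corpus and spreads the yearly re-indexing bill evenly over 12 months; the function name, the `PhaseCosts` type, and all input values are illustrative, not the calculator's internals.

```python
from dataclasses import dataclass

@dataclass
class PhaseCosts:
    one_time_indexing: float
    monthly_reindexing: float
    monthly_query: float

    @property
    def monthly_total(self) -> float:
        # The one-time indexing spike is excluded from the recurring total.
        return self.monthly_reindexing + self.monthly_query

def three_phase_cost(
    corpus_tokens: float,
    reindexes_per_year: float,
    monthly_query_tokens: float,
    price_per_million: float,
) -> PhaseCosts:
    indexing = corpus_tokens / 1_000_000 * price_per_million
    # Assumption: every re-index costs the same as the first full index.
    monthly_reindexing = indexing * reindexes_per_year / 12
    monthly_query = monthly_query_tokens / 1_000_000 * price_per_million
    return PhaseCosts(indexing, monthly_reindexing, monthly_query)

# Hypothetical workload: 150M corpus tokens, re-indexed 2x/year,
# 1.5M query tokens per month, at an assumed $0.02 per 1M tokens.
costs = three_phase_cost(150_000_000, 2, 1_500_000, 0.02)
print(f"One-time indexing:  ${costs.one_time_indexing:.2f}")
print(f"Monthly re-indexing: ${costs.monthly_reindexing:.2f}")
print(f"Monthly query:       ${costs.monthly_query:.2f}")
print(f"Monthly total:       ${costs.monthly_total:.2f}")
```

In this low-traffic example the amortized re-indexing is larger than the query cost; raising `monthly_query_tokens` shows where query-side spend takes over.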

Match model to workload

For retrieval quality benchmarks, see the source citations at the bottom of the calculator. Pair this with Vector DB Cost for the storage side, and RAG Pipeline for the full end-to-end view. Many teams over-pay on embeddings: switching from OpenAI 3-small to Voyage-3-lite keeps cost similar and can improve retrieval.

📊 Calculator at a glance

🎛 CALCULATOR

📚 Your document corpus

What goes into the vector database.

Number of documents

Avg tokens / doc: each doc gets chunked into smaller pieces for embedding.

Re-indexing frequency: how often content changes and needs re-embedding.

🔍 Query patterns

Queries / month

Avg tokens / query

🧬 Your pick

Embedding model

Use Batch API for indexing (50% off where available): of the providers listed, only OpenAI publishes Batch pricing for embeddings. Indexing jobs are typically batch-eligible.

📈 RESULTS

Total monthly embedding cost

🏗 One-time indexing

🔄 Monthly re-indexing

🔍 Monthly query cost

💾 Vector storage estimate

Total vectors (chunks): -

Dimensions: -

Raw vector size: -

With index overhead (~1.3x): -

Storage note: Pinecone, Weaviate, Qdrant, pgvector - pricing varies widely. Typical: $0.025-$0.30 per GB/mo. Our vector DB cost guide breaks down each option.
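The storage rows above can be approximated with the sketch below. It assumes a fixed chunk size of 500 tokens with no overlap, float32 values, 1 GB = 10^9 bytes, and 1536 dimensions (the default for OpenAI's text-embedding-3-small); the ~1.3x overhead and the $0.025-$0.30 per GB/mo range come from the text above. The function name and inputs are hypothetical.

```python
import math

def storage_estimate(
    docs: int,
    tokens_per_doc: int,
    chunk_tokens: int,
    dimensions: int,
    bytes_per_value: int = 4,     # assumption: float32
    index_overhead: float = 1.3,  # ~1.3x, as quoted above
) -> tuple[int, float, float]:
    """Return (total_vectors, raw_gb, indexed_gb)."""
    chunks_per_doc = math.ceil(tokens_per_doc / chunk_tokens)
    total_vectors = docs * chunks_per_doc
    raw_gb = total_vectors * dimensions * bytes_per_value / 1e9
    return total_vectors, raw_gb, raw_gb * index_overhead

vectors, raw_gb, indexed_gb = storage_estimate(
    docs=100_000, tokens_per_doc=1_500, chunk_tokens=500, dimensions=1536
)

# Typical price range quoted above, in $ per GB per month.
low, high = indexed_gb * 0.025, indexed_gb * 0.30

print(f"Total vectors: {vectors:,}")
print(f"Raw size:      {raw_gb:.2f} GB")
print(f"With overhead: {indexed_gb:.2f} GB")
print(f"Storage cost:  ${low:.2f} to ${high:.2f} per month")
```

Managed vector databases often bill on pods, replicas, or read/write units rather than pure GB, so treat the dollar range as an order-of-magnitude check only.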

💡 Recommendations

📊 Cost across every embedding model

Same corpus + queries, different embedding provider. Current selection highlighted.

Model	Dimensions	Max input	Indexing cost	Monthly cost	Annual cost


🎯 Use this result to

🧬 Plan embedding budget — Index is a one-time spike. Queries are recurring. See the split before picking a provider.
🔄 Justify reindex cadence — Moving from quarterly to monthly reindexing (4 vs 12 runs/year) multiplies annual re-indexing cost 3x. Math decides the schedule.
🆚 Compare 9 embedding models — OpenAI, Cohere, Voyage, Mistral side-by-side. Pricing varies 10x across providers.
🔌 Integrate with your AI agents — MCP available for agentic workflow integration. Cost-aware embedding pipelines.


📋 What now?

Index once, query forever — indexing is a near one-time cost; the recurring spend is query-side embeddings, so optimize there first when queries/month is high.
Right-size the model — a cheaper or smaller-dimension model (e.g. OpenAI 3-small or a truncated 3-large) often keeps recall while cutting both $/1M and vector storage.
Batch the indexing job — if a re-index can wait hours, the Batch tier is ~50% off on providers that publish it.
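The effect of the Batch tier on an indexing job can be sketched as a simple discount on indexing tokens. The 50% figure is the one quoted above; the token count, the $0.02 per 1M price, and the function name are assumptions for illustration. The discount is applied to indexing only, since that is the part described as batch-eligible.

```python
def indexing_cost(
    corpus_tokens: float,
    price_per_million: float,
    batch_discount: float = 0.0,
) -> float:
    """Indexing cost in dollars, optionally with a Batch-tier discount."""
    if not 0.0 <= batch_discount < 1.0:
        raise ValueError("batch_discount must be in [0, 1)")
    return corpus_tokens / 1_000_000 * price_per_million * (1 - batch_discount)

corpus_tokens = 150_000_000  # hypothetical corpus
price = 0.02                 # assumed $/1M tokens

standard = indexing_cost(corpus_tokens, price)
batched = indexing_cost(corpus_tokens, price, batch_discount=0.5)

print(f"Standard indexing: ${standard:.2f}")
print(f"Batched indexing:  ${batched:.2f}")
```

The same discount carries through to every re-index run, so it matters more as re-indexing frequency goes up.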



Go deeper

The calculator's an estimate. Want the real number?