Embedding Cost · for RAG builders
What does your RAG setup cost to build + run?
Indexing, re-indexing, query-side embeddings, vector storage. Compare 9 embedding models side-by-side.
Pricing verified: 2026-06-05
What this calculator does
Compare 9 embedding models on indexing + query-time cost for your RAG corpus.
Why use it
- Embedding choice affects not just cost but retrieval quality — see both at once
- Separate indexing cost (one-time) from query cost (recurring) — most people conflate them
- Compare OpenAI, Voyage, Cohere, Gemini, BGE, Nomic side-by-side
1. Enter key values: docs, tokens/doc, queries/month. Pick an embedding model.
2. Adjust re-indexing: how often do you re-embed the full corpus? Default: 2x/year.
3. Read the 3-phase cost: indexing, re-indexing, and query-side costs are broken out separately.
4. Pick the right model: the lowest $/1M tokens isn't always the winner; bigger models may need fewer chunks for the same recall.
1. Enter corpus stats
- Docs: total documents in your RAG corpus.
- Tokens/doc: average document length in tokens (1,500 is typical).
- Queries/month: retrieval calls, i.e. user queries that hit the vector DB.
- Query tokens: average user query length.
Docs and tokens/doc drive the one-time indexing cost; queries/month and query tokens drive the recurring query-side cost.
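As a rough sketch of how these four inputs turn into the two cost buckets, assuming a flat price per 1M tokens for both documents and queries. The function name `embedding_costs` and the price used below are illustrative placeholders, not this calculator's internals or any provider's quoted rate.

```python
def embedding_costs(docs, tokens_per_doc, queries_per_month, query_tokens,
                    price_per_million):
    """Return (one_time_indexing_usd, monthly_query_usd).

    Assumes one flat price per 1M tokens for documents and queries alike.
    """
    indexing_tokens = docs * tokens_per_doc
    query_tokens_per_month = queries_per_month * query_tokens
    indexing_usd = indexing_tokens / 1_000_000 * price_per_million
    query_usd = query_tokens_per_month / 1_000_000 * price_per_million
    return indexing_usd, query_usd


# Hypothetical workload; 0.02 USD per 1M tokens is a placeholder price.
indexing, monthly_query = embedding_costs(
    docs=100_000, tokens_per_doc=1_500,
    queries_per_month=50_000, query_tokens=30,
    price_per_million=0.02,
)
print(f"indexing: ${indexing:.2f}, queries: ${monthly_query:.2f}/month")
```

With these placeholder numbers, indexing comes to $3.00 once and queries to $0.03 per month; the balance shifts toward the query side as queries/month grows.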
2. Pick embedding + re-indexing cadence
Embedding model choice matters for both quality and cost. Voyage and Cohere are retrieval-tuned (better recall). OpenAI 3-small is cheap and general-purpose. BGE and Nomic are open-source. Re-indexing cadence depends on how often your corpus drifts: 1-2x/year for stable knowledge bases, quarterly or more for fast-changing content. Dimension truncation (OpenAI 3-large supports it) can cut vector storage ~50% with minor quality loss.
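On the client side, dimension truncation amounts to keeping a prefix of the vector and re-normalizing it. A minimal sketch, assuming the model was trained so that vector prefixes stay useful (not every embedding model is); `truncate_embedding` and `storage_saving` are hypothetical helper names, and 3072 to 1536 dimensions is only an example size.

```python
import math


def truncate_embedding(vector, dims):
    """Keep the first `dims` components and rescale to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        return head  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]


def storage_saving(full_dims, kept_dims):
    """Fraction of raw vector storage saved by truncation."""
    return 1 - kept_dims / full_dims


full = [0.5, 0.5, 0.5, 0.5]          # toy 4-dimensional unit vector
short = truncate_embedding(full, 2)  # keep half the dimensions
print(len(short), storage_saving(3072, 1536))  # 2 0.5
```

Re-normalizing matters when the vector DB uses dot-product similarity on unit vectors; the quality impact of truncation should be checked on your own retrieval set.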
3. Interpret 3-phase cost
Indexing cost (first-time) is usually negligible unless your corpus is huge (10M+ docs). Re-indexing cost is amortized monthly to show the recurring cost of keeping the index fresh. Query-side cost scales with monthly queries and is often dominant for high-traffic RAG. The alternatives table compares all 9 models at your workload.
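The three phases can be sketched as follows. This assumes each re-index re-embeds the full corpus at the same price as the first indexing run, and that the monthly total covers only the recurring phases; both are assumptions for illustration, and `monthly_breakdown` is a hypothetical name.

```python
def monthly_breakdown(indexing_usd, reindexes_per_year, monthly_query_usd):
    """Split cost into one-time indexing, amortized re-indexing, and queries.

    Assumes every re-index costs the same as the initial indexing run.
    """
    monthly_reindex = indexing_usd * reindexes_per_year / 12
    return {
        "one_time_indexing": indexing_usd,
        "monthly_reindex": monthly_reindex,
        "monthly_query": monthly_query_usd,
        # Assumption: the one-time indexing run is excluded from this total.
        "monthly_total": monthly_reindex + monthly_query_usd,
    }


# Hypothetical inputs: $3.00 to index, 2 re-indexes per year, $0.03 queries.
for phase, usd in monthly_breakdown(3.00, 2, 0.03).items():
    print(f"{phase}: ${usd:.2f}")
```

Changing `reindexes_per_year` from 4 to 12 triples `monthly_reindex`, which is why cadence is worth settling before comparing providers.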
4. Match model to workload
For retrieval quality benchmarks, see the source citations at the bottom of the calculator. Pair this with Vector DB Cost for the storage side and RAG Pipeline for the full end-to-end view. Most teams over-pay on embeddings for the quality they get: switching from OpenAI 3-small to Voyage-3-lite keeps cost similar with better retrieval.
🎛 CALCULATOR
📚 Your document corpus
What goes into the vector database.
Each doc gets chunked into smaller pieces for embedding.
How often content changes and needs re-embedding.
🔍 Query patterns
🧬 Your pick
📈 RESULTS
- Total monthly embedding cost
- One-time indexing
- Monthly re-indexing
- Monthly query cost
💾 Vector storage estimate
- Total vectors (chunks)
- Dimensions
- Raw vector size
- With index overhead (~1.3x)
Storage note: pricing for Pinecone, Weaviate, Qdrant, and pgvector varies widely, typically $0.025-$0.30 per GB/month. Our vector DB cost guide breaks down each option.
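A back-of-the-envelope version of the storage estimate, assuming vectors are stored as 4-byte float32 values and 1 GB = 1e9 bytes. Neither assumption is stated by the calculator, the function names are hypothetical, and the corpus numbers are placeholders.

```python
BYTES_PER_FLOAT32 = 4   # assumption: vectors stored as float32
INDEX_OVERHEAD = 1.3    # the ~1.3x index overhead factor


def storage_estimate(docs, chunks_per_doc, dims):
    """Return (total_vectors, raw_gb, indexed_gb), with 1 GB = 1e9 bytes."""
    vectors = docs * chunks_per_doc
    raw_gb = vectors * dims * BYTES_PER_FLOAT32 / 1e9
    return vectors, raw_gb, raw_gb * INDEX_OVERHEAD


def monthly_storage_range(indexed_gb, low=0.025, high=0.30):
    """Monthly USD range at the typical per-GB prices in the storage note."""
    return indexed_gb * low, indexed_gb * high


# Hypothetical corpus: 100k docs, 3 chunks each, 1536-dimensional vectors.
vectors, raw_gb, indexed_gb = storage_estimate(100_000, 3, 1536)
low, high = monthly_storage_range(indexed_gb)
print(f"{vectors:,} vectors, {raw_gb:.2f} GB raw, {indexed_gb:.2f} GB indexed")
print(f"storage: ${low:.2f} to ${high:.2f} per month")
```

For this placeholder corpus the result is about 2.40 GB with overhead, or roughly $0.06 to $0.72 per month across the quoted price range.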
💡 Recommendations
📊 Cost across every embedding model
Same corpus + queries, different embedding provider. Current selection highlighted.
| Model | Dimensions | Max input | Indexing cost | Monthly cost | Annual cost |
|---|---|---|---|---|---|
🎯 Use this result to
- 🧬 Plan embedding budget: indexing is a one-time spike, queries are recurring. See the split before picking a provider.
- 🔄 Justify reindex cadence: monthly re-indexing costs 3x as much per year as quarterly (12 runs vs 4). Math decides the schedule.
- 🆚 Compare 8 embedding providers: OpenAI, Cohere, Voyage, Mistral side-by-side. Pricing varies 10x across providers.
- 🔌 Integrate with your AI agents: MCP available for agentic workflow integration. Cost-aware embedding pipelines.
📋 What now?
- Index once, query forever — indexing is a near one-time cost; the recurring spend is query-side embeddings, so optimize there first when queries/month is high.
- Right-size the model — a cheaper or smaller-dimension model (e.g. OpenAI 3-small or a truncated 3-large) often keeps recall while cutting both $/1M and vector storage.
- Batch the indexing job — if a re-index can wait hours, the Batch tier is ~50% off on providers that publish it.
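To see what the batch tip is worth, here is a small sketch that applies a discount to the re-indexing share only, since queries are answered in real time and cannot wait for a batch job. The 50% default mirrors the approximate figure above; `with_batch_discount` is a hypothetical name, and actual discounts depend on the provider.

```python
def with_batch_discount(monthly_reindex_usd, monthly_query_usd,
                        batch_discount=0.5):
    """Monthly total when only re-indexing runs on a discounted batch tier.

    Queries stay at full price because they are served in real time.
    """
    if not 0 <= batch_discount <= 1:
        raise ValueError("batch_discount must be between 0 and 1")
    return monthly_reindex_usd * (1 - batch_discount) + monthly_query_usd


# Hypothetical: $0.50/month amortized re-indexing, $0.03/month queries.
print(f"${with_batch_discount(0.50, 0.03):.2f} per month")  # $0.28 per month
```

The saving is bounded by the re-indexing share of the bill, so it matters most for large corpora that are re-embedded often.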