Google pricing, complete breakdown

Verified 2026-05-25

Verified 2026-05-26, cross-checked against Google pricing page, litellm, openrouter

Google's model lineup spans from the flagship Gemini 3.1 Pro at $2.00 per million input tokens to the highly efficient Gemini 2.5 Flash-Lite at just $0.10 per million. Developers can leverage the high-speed Gemini 3 Flash for $0.50/M input or the ultra-low-cost Gemini 3.1 Flash-Lite at $0.25/M input. These models support massive context windows up to 2 million tokens, with significant discounts for cached inputs. This guide helps you navigate the trade-offs between per-token API costs and fixed-rate consumer subscriptions.

Gemini 2.5 Flash-Lite offers the lowest entry point at just $0.10 per million input tokens.

How Google's pricing universe works

Updated 2026-05-25

Google manages a dual-track ecosystem to capture both high-volume developer traffic and steady consumer recurring revenue. The API track is designed for builders who need granular, metered control over token usage and programmatic access to frontier models. Conversely, the subscription track targets end-users who value predictable monthly costs and integrated features like Google Flow credits and Gmail AI tools. This multi-modal approach allows Google to monetize the same underlying models through different margins and customer acquisition costs depending on the user's technical needs.

API (per-token, metered)

For: Developers, technical teams, startups building products on top of Gemini

Pay only for tokens consumed
Full model lineup including batch, caching, long context
Programmatic via SDKs

When to use: When integrating Google into your own product or running variable batch workloads

Best for: Builders with metered or unpredictable usage

Consumer subscriptions (Plus, Pro, Ultra tiers)

For: Individuals using Google directly for writing, coding, research, analysis

Fixed monthly fee
Generous usage caps
Web/desktop/mobile apps
Often includes newer models first

When to use: When using Google as a daily-driver AI assistant rather than building on it

Best for: Solo professionals, knowledge workers, vibe coders

Business/Team plans

For: Teams of 5-200 needing shared workspaces, admin controls, SSO

Per-seat billing
Centralized billing
Admin & audit controls
Sometimes shared usage pools

When to use: When deploying Google across a team that does NOT need API integration

Best for: Mid-size organizations adopting AI for internal productivity

Enterprise (custom contract)

For: Large organizations with procurement requirements, compliance needs, or volume-discount leverage

Custom pricing and limits
SLAs
DPAs and BAAs
Dedicated support
Sometimes private cloud / VPC

When to use: When per-seat or per-token pricing exceeds ~$50K/year, or when compliance/contractual needs require it

Best for: Enterprises with procurement-led adoption

Cloud marketplaces (Google Vertex AI)

For: Organizations with existing cloud commits or strict data-residency requirements

Same models, slightly different pricing (often parity or small premium)
Counts toward existing cloud spend commits
Stays within cloud's data-protection boundary

When to use: When you already burn down Google Cloud commits and prefer single-bill

Best for: Cloud-committed enterprises

Which one should you pick? If you are building a custom application, use the API for metered billing. For personal research and daily assistance, choose Google AI Pro ($19.99/mo) or the high-usage Ultra 20x ($199.99/mo) for maximum Gemini Pro access. Teams should look toward Enterprise contracts or Vertex AI on Google Cloud for consolidated billing and compliance.

⭐ Most popular Google products by user type

Skip the deep dive. These are the natural fits for each user type.

⭐

Casual Workspace User

Google AI Plus

$10/mo

Natural fit for users who want AI assistance inside Docs and Gmail without the cost of a full storage bundle.

See full profile ↓

⭐

Power User & Creative

Google AI Pro

$20/mo

Designed for individuals who need flagship model performance paired with 2TB of Google One cloud storage.

+ For higher daily caps

See full profile ↓

⭐

Heavy Daily Researcher

Google AI Ultra 5x

$50/mo

Consider this if you frequently hit usage limits on the Pro tier; provides 5x the standard prompt capacity.

See full profile ↓

⭐

Extreme Power User

Google AI Ultra 20x

$150/mo

Built for users whose entire workflow relies on Gemini; offers the highest available consumer-tier throughput.

See full profile ↓

⭐

Software Developer

Gemini API / Vertex AI

Pay-as-you-go

The standard choice for building apps; features context caching and the industry's largest 2M+ token window.

See full profile ↓

🎁 Current promos and time-sensitive deals

What's active right now. Auto-hides expired items.

📅 What changed in the last 30 days

Populated from aicost_price_changelog. Hides automatically when no recent events.

· gemini-2-5-flash-lite-preview-09-2025 added to catalog
New model added: Gemini 2.5 Flash-Lite Preview — inserted by aicost-merge-new-models

· google-gemini-3-1-flash-tts-preview added to catalog
New model added: Gemini 3.1 Flash TTS Preview — inserted by aicost-merge-new-models

· google-gemini-3-1-flash-live-preview added to catalog
New model added: Gemini 3.1 Flash Live Preview — inserted by aicost-merge-new-models

· gemini-3-1-flash-lite-preview added to catalog
New model added: Gemini 3.1 Flash-Lite Preview — inserted by aicost-merge-new-models

· gemini-3-flash-preview added to catalog
New model added: Gemini 3 Flash Preview — inserted by aicost-merge-new-models

· gemini-3-5-flash added to catalog
New model added: Gemini 3.5 Flash — inserted by aicost-merge-new-models

· gemini-3-1-pro-preview added to catalog
New model added: Gemini 3.1 Pro Preview — inserted by aicost-merge-new-models

· gemini-3-1-pro-preview-customtools added to catalog
New model added: Gemini 3.1 Pro Preview (CustomTools) — inserted by aicost-merge-new-models

· gemini-3-1-flash-live-preview output price updated
12.000000 → 4.500000

· gemini-3-1-flash-live-preview input price updated
3.000000 → 0.750000

· gemini-2-5-flash-preview-tts input price updated
0.500000 → 1.000000

· gemini-2-5-flash-preview-tts output price updated
10.000000 → 20.000000

· gemini-2-5-flash-preview-tts batch input price updated
0.250000 → 0.500000

· gemini-2-5-flash-preview-tts batch output price updated
5.000000 → 10.000000

· gemini-3-1-flash-live-preview input price updated
3.000000 → 0.750000

· gemini-3-1-flash-live-preview output price updated
12.000000 → 4.500000

· gemini-3-1-flash-live-preview input price updated
3.000000 → 0.750000

· gemini-3-1-flash-live-preview output price updated
12.000000 → 4.500000

· gemini-3-1-flash-lite · longContextInput changed
Now 0.500000

· gemini-2-5-flash-native-audio-preview-12-2025 input price updated
3.000000 → 0.500000

· gemini-2-5-flash-native-audio-preview-12-2025 output price updated
12.000000 → 2.000000

· gemini-3-1-flash-live-preview input price updated
3.000000 → 0.750000

· gemini-2-5-pro-preview-tts · costPer1MChars changed
20.000000 → 10.000000

· gemini-2-5-flash-lite · longContextInput changed
Now 0.200000

· gemini-2-5-flash-lite · longContextThreshold changed
Now 200000.000000

· veo-3-1-lite-generate-preview · costPerSecond changed
0.100000 → 0.050000

· veo-3-1-fast-generate-preview · costPerSecond changed
0.250000 → 0.100000

· gemini-2-5-flash-lite-preview-09-2025 · longContextInput changed
Now 0.200000

· gemini-2-5-flash-lite-preview-09-2025 · longContextThreshold changed
Now 200000.000000

· gemini-3-1-flash-live-preview output price updated
12.000000 → 4.500000

⏳ Major upcoming changes

Known pricing model changes within the next 90 days

Verified 2026-05-25

Google Cloud Partner Network 2026 Tier Transition

Effective 2026-06-01 ⚠️ Already in effect (4 days ago)

What changes Full implementation of the three-tier system (Select, Premier, Diamond) with updated ACV thresholds ($250k to $20M).

What stays Existing reseller margins for Select tier partners remain at 8-12%.

Who's affected Google Cloud Partners and resellers.

Action required Partners must verify certification counts (up to 200 for Diamond) to maintain tier status.

Source: https://www.crn.com.au/news/google-clouds-new-three-tiering-system-brings-mixed-reviews-from-partners-604313

Every Google product, profiled

Updated 2026-05-25

For each product, what it's for, who picks it, what to watch out for, pros and cons, and what we tell our consulting clients.

consumer entry paid

Google AI Plus

$7.99/mo · $95.88/yr

Monthly billing only; includes basic Workspace integration

Target users

students, casual users, solo professionals

Typical uses

Drafting emails in Gmail
Summarizing long Google Docs
Basic creative writing and brainstorming
Simple data organization in Sheets

Why pick it

Designed for users who want AI integrated into their daily Google Workspace apps without the high cost of flagship models.

Key features

Gemini integration in Gmail and Docs
Access to Gemini Flash models
Standard response speeds
Basic image generation capabilities

⚠ Marketing gimmicks to watch

Model Gating

This tier often excludes the highest-performing 'Ultra' or 'Pro' models, limiting users to faster but less capable 'Flash' versions.

Impact: Expect lower reasoning quality for complex coding or logic tasks compared to the Pro tier.

Workspace Lock-in

The value is heavily tied to using Google's ecosystem; if you use Outlook or Word, the core features are inaccessible.

Impact: Evaluate if you actually use Google Docs/Gmail enough to justify the monthly fee over free alternatives.

Pros

Affordable entry point for integrated AI
Seamless workflow within Google Workspace
No complex setup required

Cons

No annual discount available
Limited to lower-tier models
No extra cloud storage included

Insider view

This is essentially a 'convenience tax' for people who live in Google Docs. If you don't need the sidebar integration, the free version of Gemini often provides similar model performance.

Max bang for buck

Use this if you spend more than 2 hours a day drafting emails or reports in Google Workspace; the time saved on formatting and drafting pays for the $8 quickly.

🔒 Training-on-your-data policy

Consumer data in Workspace may be used to improve services unless specifically opted out in settings. Refer to Google's Privacy Policy.

🔄 Migration path

Upgrade when:

You need higher reasoning capabilities for complex tasks or require more Google One storage.

Downgrade when:

You find yourself using the standalone Gemini web interface more than the Workspace sidebar.

Switch vendor when:

You migrate your primary workflow to Microsoft 365 or require the coding-specific features of Claude.

Scenario	Monthly	Annual	Notes
Standard individual use	$7.99	$95.88	Calculated as 12 months at the standard rate.

consumer flagship

Google AI Pro

$19.99/mo · $199.99/yr

Includes 2TB Google One storage bundle

Target users

power users, creative professionals, developers

Typical uses

Advanced coding and debugging
Complex multi-step reasoning tasks
High-resolution AI image generation
Analyzing large datasets in Sheets

Why pick it

The best balance of model power and ecosystem benefits, including significant cloud storage and flagship model access.

Key features

Access to Gemini 3 Pro and Ultra models
2TB Google One storage included
Priority access during peak times
Advanced Gemini in Workspace features
1M+ token context window support

⚠ Marketing gimmicks to watch

The Storage Bundle Trap

Google bundles 2TB of storage to justify the $20 price point, making it harder to compare directly with ChatGPT Plus.

Impact: If you already pay for storage, this is a deal; if you don't need storage, you're overpaying for the AI component.

Sticker Price vs. Monthly

The $199.99 annual price is marketed as a discount, but requires a full year commitment upfront.

Impact: Saves approximately $40 per year compared to monthly billing, but reduces flexibility to switch vendors.

Pros

Best-in-class context window for long documents
Excellent value if you already need cloud storage
Fastest response times for flagship models

Cons

Expensive if you don't use the 2TB storage
Workspace integration can sometimes be buggy
Rate limits still exist despite 'Pro' branding

Insider view

This is Google's direct answer to ChatGPT Plus. It wins on context window size (handling massive PDFs) but sometimes lags in pure creative writing nuance compared to competitors.

Max bang for buck

Buy the annual plan if you are committed to the Google ecosystem; the $16.67 effective monthly rate is one of the cheapest flagship AI subscriptions available.

🔒 Training-on-your-data policy

Data from Google One AI Premium subscribers is generally not used to train models for other users, but check the specific 'Gemini Advanced' privacy toggle.

🔄 Migration path

Upgrade when:

You are a professional developer or researcher hitting daily usage caps on the Pro model.

Downgrade when:

You realize you aren't using the 2TB storage and only use AI for basic tasks.

Switch vendor when:

You need the specific artifact-handling capabilities of Claude or the custom GPT ecosystem of OpenAI.

Scenario	Monthly	Annual	Notes
Annual commitment	$16.67	$199.99	Paid as a single upfront payment.
Month-to-month flexibility	$19.99	$239.88	Total cost if paid monthly for a full year.

consumer power low

Google AI Ultra 5x

$99.99/mo · $1199.8799999999999/yr

High-usage tier for individuals and small teams

Target users

heavy researchers, independent developers, content agencies

Typical uses

Continuous coding assistance
Batch processing of large document sets
High-frequency image/video generation
Extensive RAG-based research

Why pick it

Designed for users who find the standard Pro limits restrictive and need 5x more capacity for flagship models.

Key features

5x higher rate limits for Gemini Ultra
Priority technical support
Early access to experimental 2M+ context windows
All features of the Pro tier included

⚠ Marketing gimmicks to watch

Soft Limits

Even at 5x, limits are often 'soft' and can result in throttling rather than hard cutoffs.

Impact: Performance may degrade during peak hours even if you haven't hit your mathematical limit.

The 'Ultra' Naming Confusion

Google uses 'Ultra' for both the model name and the tier name, making it unclear if you're paying for a better model or just more access.

Impact: You are primarily paying for *volume*, not a fundamentally different model than what's in the Pro tier.

Pros

Significantly reduces 'rate limit' interruptions
Ideal for power users who 'live' in the chat interface
Fastest access to new model iterations

Cons

Very high price jump from Pro tier
No annual discount offered
Lacks true enterprise-grade team management

Insider view

This tier is for the top 1% of users. If you aren't hitting limits on Pro at least twice a week, this is a waste of money. It's a bridge for those who aren't ready for a full Vertex AI enterprise contract.

Max bang for buck

Only subscribe during months of heavy project work; since there is no annual discount, you lose nothing by toggling this on and off.

🔒 Training-on-your-data policy

Standard consumer power-user terms apply; data is generally protected but refer to the specific Gemini Ultra privacy addendum.

🔄 Migration path

Upgrade when:

Your workflow is consistently interrupted by 'try again later' messages on the Pro tier.

Downgrade when:

Your project ends and your usage drops back to standard levels.

Switch vendor when:

You need to manage multiple seats and require centralized billing (move to Vertex AI or ChatGPT Team).

Scenario	Monthly	Annual	Notes
Year-round heavy usage	$99.99	$1,199.88	No annual discount available for this tier.

consumer power high

Google AI Ultra 20x

$199.99/mo · $2399.88/yr

Maximum capacity consumer tier

Target users

enterprise power users, high volume creators, AI first startups

Typical uses

Near-constant AI interaction
Large-scale content production pipelines
Extensive testing of long-context prompts
Replacing multiple specialized AI tools

Why pick it

The ultimate consumer-facing tier for those who require near-unlimited access to Google's most powerful models without moving to the API.

Key features

20x higher rate limits than Pro
Highest priority in the model queue
Direct feedback channel to product teams
Full Workspace integration with max capacity

⚠ Marketing gimmicks to watch

Diminishing Returns

Most human users cannot physically prompt fast enough to utilize a 20x limit compared to a 5x limit.

Impact: You may be paying for capacity you literally cannot use unless you are automating the UI (which may violate TOS).

Feature Parity

Despite the 10x price increase over Pro, the feature set is nearly identical; you are paying almost exclusively for the quota.

Impact: Check your usage logs before committing to this high monthly spend.

Pros

Virtually eliminates rate-limiting for human users
Best possible latency during global traffic spikes
Access to the absolute cutting edge of Google AI

Cons

Extremely expensive for a consumer account
No team/seat management features
No annual billing option

Insider view

This is a 'whale' tier. It's designed for people whose time is worth so much that a 30-second rate limit lockout costs more than the $200 subscription. For everyone else, the API is more cost-effective.

Max bang for buck

If you are using this much, you should likely be using the API with Context Caching to save 90% on repetitive tasks.

🔒 Training-on-your-data policy

Refer to Google's high-tier consumer privacy policy; generally more restrictive on data usage than entry tiers.

🔄 Migration path

Upgrade when:

You are a 'super-user' whose business relies on the Gemini UI and you hit 5x limits daily.

Downgrade when:

You realize the API is a more efficient way to handle your high volume.

Switch vendor when:

You need enterprise-grade security, SSO, and administrative controls (move to Vertex AI).

Scenario	Monthly	Annual	Notes
Maximum consumer spend	$199.99	$2,399.88	The highest possible cost for a non-enterprise Google AI account.

developer api

Gemini API / Vertex AI

Per-token (see API rates above)

Pay-as-you-go based on token usage; context caching and batch discounts available

Target users

developers, enterprise architects, data scientists

Typical uses

Building custom AI applications
Large-scale data classification
Automated customer support bots
Long-context document analysis (up to 2M tokens)

Why pick it

Provides the most granular control over model behavior, costs, and data privacy for developers and businesses.

Key features

Access to Gemini 3.1 Pro and Flash models
Context Caching (90% discount on input tokens)
Batch API (50% discount for non-real-time tasks)
Provisioned Throughput for guaranteed capacity
Vertex AI enterprise security and IAM

⚠ Marketing gimmicks to watch

Explicit Cache Deletion Trap

Users are billed for the full TTL (Time to Live) of a cache even if it is deleted early.

Impact: To save money, shorten the TTL rather than deleting the cache manually if you want to stop billing early.

The 'Free Tier' Rate Limits

The free API tier often uses your data for training and has extremely low rate limits.

Impact: Always use the 'Pay-as-you-go' tier for production to ensure data privacy and reliable uptime.

Pros

Most cost-effective way to handle massive contexts
Enterprise-grade data privacy (Vertex AI)
Highly flexible pricing with Batch and Caching

Cons

Requires technical expertise to implement
Billing can be unpredictable without caps
Provisioned Throughput requires significant commitments

Insider view

Google's API is currently the 'value leader' for long-context tasks. Their Context Caching is a game-changer for RAG applications, making it 10x cheaper than competitors for repetitive large-scale queries.

Max bang for buck

Use the Batch API for any task that doesn't need an immediate response to save 50% instantly. Enable Context Caching for any prompt that reuses the same 32k+ tokens.

🔒 Training-on-your-data policy

By default, data processed via Vertex AI or the paid Gemini API tier is NOT used to train Google's foundation models. Policy: https://cloud.google.com/vertex-ai/docs/generative-ai/data-governance

🔄 Migration path

Upgrade when:

You need guaranteed latency and throughput for a high-traffic production app (move to Provisioned Throughput).

Downgrade when:

Your usage is so low that the $20/mo Pro subscription's 'unlimited' chat is cheaper than your token bill.

Switch vendor when:

You require specific model behaviors found only in Claude (coding) or GPT-4o (multimodal latency).

Scenario	Monthly	Annual	Notes
Small startup (10M Pro tokens/mo)	$140	$1,680	Assumes 10M input ($20) and 10M output ($120) on Gemini 3.1 Pro.
High-volume Flash user (100M tokens/mo)	$350	$4,200	Assumes 100M input ($50) and 100M output ($300) on Gemini 3 Flash.

All Google products at a glance

Scroll up to the product profile for full detail

Product	Price	Best for	Headline feature	Yearly estimate
Google AI Plus	$10/mo	Basic Workspace AI	Docs/Gmail Integration	$120
Google AI Pro	$20/mo	Individual Flagship Use	2TB Google One Storage	$240
Google AI Ultra 5x	$50/mo	High-volume individuals	5x Usage Limits	$600
Google AI Ultra 20x	$150/mo	Extreme usage/Small teams	20x Usage Limits	$1,800
Gemini API	Usage-based	App Development	2M+ Context Window	Variable

Google vs the field

Same-tier comparison across top 5 vendors

Comparison tier	Anthropic	OpenAI	Google	xAI	Verdict
Consumer Flagship	Claude Pro $20/mo	ChatGPT Plus $20/mo	Google AI Pro $20/mo	Grok Premium+ $16/mo	Google is the only vendor bundling 2TB of cloud storage at this price point.
Developer API (High-IQ)	Claude Opus 4.7 $5.00 / 1M tokens	GPT-5.4 $2.50 / 1M tokens	Gemini 1.5 Pro $1.25 / 1M tokens	Grok 4.20 $2.00 / 1M tokens	Gemini 1.5 Pro offers significantly lower input costs and a larger context window than Opus or GPT-5.4.
Entry-Level Paid	N/A N/A	N/A N/A	Google AI Plus $10/mo	Grok Basic $7/mo	Google and xAI are the only major vendors offering a sub-$20 tier for individual users.

🌳 Which Google product fits you?

3 questions, 1 recommendation

Are you building an application or using a chat interface?

Do you need 2TB of cloud storage and flagship model access?

How many prompts do you send per day?

Recommended

google gemini api

Best for developers who need pay-as-you-go flexibility and the 2M+ token context window.

See full profile ↑

Recommended

google ai plus

The most affordable way to get Gemini integrated directly into your Google Workspace apps.

See full profile ↑

Recommended

google ai pro

The standard flagship experience for individuals, including the 2TB Google One storage benefit.

See full profile ↑

Recommended

google ai ultra 5x + google ai ultra 20x

Recommended for power users who find themselves throttled by standard flagship usage limits.

See full profile ↑

Recommended

google ai ultra 20x

The highest-capacity consumer tier available for users with extreme throughput requirements.

See full profile ↑

How Google pricing has moved

Analysis of price trajectory based on recent changelog events and live rates.

→ Gemini 3.1 Pro — $2 / $12 per MTok now

No notable changes detected for gemini-3-1-pro in the recent changelog window.

→ Gemini 3 Flash — $0.5 / $3 per MTok now

No notable changes detected for gemini-3-flash in the recent changelog window.

→ Gemini 3.1 Flash-Lite — $0.25 / $1.5 per MTok now

No notable changes detected for gemini-3-1-flash-lite in the recent changelog window.

→ Gemini 2.5 Pro — $1.25 / $10 per MTok now

No notable changes detected for gemini-2-5-pro in the recent changelog window.

→ Gemini 2.5 Flash — $0.3 / $2.5 per MTok now

No notable changes detected for gemini-2-5-flash in the recent changelog window.

→ Gemini 2.5 Flash-Lite — $0.1 / $0.4 per MTok now

No notable changes detected for gemini-2-5-flash-lite in the recent changelog window.

Overall: Google is aggressively lowering the cost of intelligence for its flagship Pro models while introducing more granular billing for long-context windows via context caching ($0.20/M tokens).

API or subscription: which is cheaper for you?

Cross-over math at current rates

google-ai-plus ($7.99/mo) vs gemini-3-1-flash-lite API ($0.25/$1.5 per MTok)

Break-even: ~23,674 messages/month (avg ~600 tokens each)

At an average of 450 input and 150 output tokens per message, the API costs $0.0003375 per interaction. You must send over 23,000 messages to justify the $7.99 subscription on cost alone.

👉 API is significantly cheaper for casual users; Subscription is only for those needing the integrated UI/Workspace features.

google-ai-pro ($19.99/mo) vs gemini-3-1-pro API ($2/$12 per MTok)

Break-even: ~7,404 messages/month (avg ~600 tokens each)

With Gemini 3.1 Pro API rates, each 600-token message costs approximately $0.0027. The $19.99 subscription breaks even at roughly 247 messages per day.

👉 Power users chatting daily should choose the subscription; developers or light users will save 80%+ via API.

google-ai-ultra-20x ($199.99/mo) vs gemini-3-1-pro API ($2/$12 per MTok)

Break-even: ~74,070 messages/month (avg ~600 tokens each)

This high-tier subscription targets extreme volume. At $0.0027 per API message, you would need to process 74,000+ messages monthly to make the $199.99 flat fee cheaper than the API.

👉 Only viable for heavy research teams or automated workflows that require the consumer-facing interface over API integration.

Rule of thumb

Google's API pricing for Flash models is aggressively low, making the API the default choice for cost-efficiency unless you require the Gemini UI, 2TB of Drive storage, or Workspace integration.

Current pricing (all production models)

Pricing verified 2026-05-25

Model	Input $/M	Output $/M	Cached $/M	Context
Gemini 3.1 Pro `gemini-3-1-pro`	$2	$12	$0.20	2,000,000
Gemini 3 Flash `gemini-3-flash`	$0.50	$3	$0.050	1,000,000
Gemini 3.1 Flash-Lite `gemini-3-1-flash-lite`	$0.25	$1.5	$0.025	1,000,000
Gemini 2.5 Pro `gemini-2-5-pro`	$1.25	$10	$0.13	2,000,000
Gemini 2.5 Flash `gemini-2-5-flash`	$0.30	$2.5	$0.030	1,000,000
Gemini 2.5 Flash-Lite `gemini-2-5-flash-lite`	$0.10	$0.40	$0.010	1,000,000

Pricing verified as of 2026-05-26. Caching discounts apply to repeated input tokens. Batch pricing typically offers a 50% discount on standard rates.

Full rate breakdown (all variants)

Verified 2026-05-25

Variants beyond standard API: batch (async, 50% off), cached read (0.1x), cache writes (1.25x or 2x base), long-context tier (~2x above threshold).

Gemini 3.1 Pro `gemini-3-1-pro`

High-reasoning flagship for complex multi-modal agentic workflows

Primary useComplex reasoning, multi-step coding agents, and massive context analysis.

Who picks itTeams building production-grade agents requiring deep logic and 2M context.

Vs other Google modelsPriced at $2/$12, it is the premium tier compared to Flash's $0.5/$3, offering superior reasoning depth.

When to useUse for logic-heavy tasks where 2M context is required; step down to Flash for latency-sensitive pipelines.

Equivalents at other vendors

openai

GPT-5.4 Matches the high-reasoning tier with a slightly higher $2.5/$15 price point.

anthropic

Claude Opus 4.7 Competes in reasoning depth but at a significantly higher $5/$25 cost.

xai

Grok 4.20 (non-reasoning) Offers identical $2 input pricing with a more aggressive $6 output rate.

Gemini 3.1 Pro `gemini-3-1-pro`

Variant	Input $/M	Output $/M	Notes
Standard	$2	$12	Default per-token API rate
Batch API	$1	$6	Async batch processing, results within 24 hours, typically 50% off
Cached read	$0.20	$12	Cached prompt input (~0.1x base); output rate unchanged
Long context (>200,000 tokens)	$4	$18	Higher rate applies above 200,000 tokens

Gemini 3 Flash `gemini-3-flash`

Balanced speed and intelligence for high-throughput applications

Primary useReal-time chat, summarization, and document processing at scale.

Who picks itDevelopers scaling applications that need better-than-lite reasoning without pro costs.

Vs other Google modelsAt $0.5/$3, it offers a 4x price reduction from Pro while maintaining a 1M context window.

When to useBest for high-volume tasks needing moderate reasoning; use Pro if logic fails or Lite if cost is the only factor.

Equivalents at other vendors

openai

GPT-5.4 Mini Similar mid-tier positioning but with higher $0.75/$4.5 pricing.

cohere

Command R 08-2024 Matches the $0.5 input rate while offering a cheaper $1.5 output rate.

xai

Grok 4.3 Higher $1.25 input cost but comparable $2.5 output for similar speed tiers.

Gemini 3 Flash `gemini-3-flash`

Variant	Input $/M	Output $/M	Notes
Standard	$0.50	$3	Default per-token API rate
Batch API	$0.25	$1.5	Async batch processing, results within 24 hours, typically 50% off
Cached read	$0.050	$3	Cached prompt input (~0.1x base); output rate unchanged

Gemini 3.1 Flash-Lite `gemini-3-1-flash-lite`

Ultra-low-cost efficiency for high-frequency simple tasks

Primary useHigh-volume classification, basic extraction, and simple chat interactions.

Who picks itStartups and enterprises optimizing for unit economics in simple automation.

Vs other Google modelsAt $0.25/$1.5, it is 50% cheaper than standard Flash for high-frequency, low-complexity calls.

When to useUse for high-velocity, low-logic tasks; switch to Flash if accuracy on nuanced instructions drops.

Equivalents at other vendors

openai

GPT-5.4 Nano Direct price competitor for lightweight tasks at $0.2/$1.25.

deepseek

DeepSeek V3.2 (chat) Similar $0.28 input cost but significantly lower $0.42 output rate.

xai

Grok 4.1 Fast (non-reasoning) Aggressive $0.2/$0.5 pricing targeting the same high-speed, low-cost segment.

Gemini 3.1 Flash-Lite `gemini-3-1-flash-lite`

Variant	Input $/M	Output $/M	Notes
Standard	$0.25	$1.5	Default per-token API rate
Batch API	$0.13	$0.75	Async batch processing, results within 24 hours, typically 50% off
Cached read	$0.025	$1.5	Cached prompt input (~0.1x base); output rate unchanged

Gemini 2.5 Pro `gemini-2-5-pro`

Legacy high-capacity reasoning with massive 2M context window

Primary useLong-document analysis and stable reasoning for existing production pipelines.

Who picks itTeams maintaining 2.5-series integrations who require deep context windows.

Vs other Google modelsPriced at $1.25/$10, it is cheaper than 3.1 Pro ($2/$12) but lacks the latest architectural improvements.

When to useUse for cost-effective long-context reasoning; upgrade to 3.1 Pro for better instruction following.

Equivalents at other vendors

openai

GPT-5 Identical $1.25/$10 pricing for similar reasoning capability.

anthropic

Claude Opus 4.5 Higher reasoning tier but at a much higher $5/$25 price point.

cohere

Command R+ Matches the $10 output price but with a higher $2.5 input cost.

Gemini 2.5 Pro `gemini-2-5-pro`

Variant	Input $/M	Output $/M	Notes
Standard	$1.25	$10	Default per-token API rate
Batch API	$0.63	$5	Async batch processing, results within 24 hours, typically 50% off
Cached read	$0.13	$10	Cached prompt input (~0.1x base); output rate unchanged
Long context (>200,000 tokens)	$2.5	$15	Higher rate applies above 200,000 tokens

Gemini 2.5 Flash `gemini-2-5-flash`

Cost-effective speed for legacy 2.5-series deployments

Primary useFast response generation and medium-complexity data extraction.

Who picks itDevelopers optimizing 2.5-series apps for speed and moderate cost.

Vs other Google modelsAt $0.3/$2.5, it sits between 2.5 Pro and 2.5 Flash-Lite in both price and capability.

When to useUse for low-latency tasks in 2.5-series environments; move to 3 Flash for better performance.

Equivalents at other vendors

deepseek

deepseek-v4-pro Similar $0.435 input price with a much cheaper $0.87 output rate.

Gemini 2.5 Flash `gemini-2-5-flash`

Variant	Input $/M	Output $/M	Notes
Standard	$0.30	$2.5	Default per-token API rate
Batch API	$0.15	$1.25	Async batch processing, results within 24 hours, typically 50% off
Cached read	$0.030	$2.5	Cached prompt input (~0.1x base); output rate unchanged

Gemini 2.5 Flash-Lite `gemini-2-5-flash-lite`

Absolute lowest cost for basic 1M context tasks

Primary useMassive-scale simple data processing and basic routing.

Who picks itCost-sensitive developers running millions of simple, non-reasoning requests.

Vs other Google modelsThe cheapest in the lineup at $0.1/$0.4, offering 1M context at a fraction of Pro's cost.

When to useUse when cost is the primary constraint for simple tasks; move to 3.1 Flash-Lite for better quality.

Equivalents at other vendors

mistral

Mistral Small 4 Identical $0.1 input price with a slightly cheaper $0.3 output rate.

cohere

Command R Similar price tier at $0.15/$0.6 for lightweight, high-volume tasks.

Gemini 2.5 Flash-Lite `gemini-2-5-flash-lite`

Variant	Input $/M	Output $/M	Notes
Standard	$0.10	$0.40	Default per-token API rate
Batch API	$0.050	$0.20	Async batch processing, results within 24 hours, typically 50% off
Cached read	$0.010	$0.40	Cached prompt input (~0.1x base); output rate unchanged

Subscription plans (consumer + business)

Verified 2026-05-25

Plan	Audience	Monthly	Annual	Per seat	What's included
Google AI Plus Google AI	consumer	$7.99	—	—	Priority New Features Limits: storage gb: 200 · notebooklm size: large · family sharing seats: 5 · gemini usage multiplier: 2x · google flow credits monthly: 200 one.google.com ↗
Google AI Pro Google AI	consumer	$19.99	$199.99/yr (≈ $16.67/mo)	—	Youtube Premium: lite · Priority New Features Limits: storage gb: 5000 · notebooklm size: larger · family sharing seats: 5 · context window tokens: 1000000 · gemini usage multiplier: 4x · google flow credits monthly: 1000 one.google.com ↗
Google AI Ultra 5x Google AI	consumer	$99.99	—	—	Early Access · Youtube Premium: individual · Priority New Features Limits: storage gb: 20000 · notebooklm size: largest · family sharing seats: 5 · context window tokens: 1000000 · gemini usage multiplier: 5x_pro · google flow credits monthly: 10000 one.google.com ↗
Google AI Ultra 20x Google AI	consumer	$199.99	—	—	Early Access · Youtube Premium: individual · Priority New Features Limits: storage gb: 30000 · notebooklm size: largest · family sharing seats: 5 · context window tokens: 1000000 · gemini usage multiplier: 20x_pro · google flow credits monthly: 25000 one.google.com ↗

Subscription pricing is separate from per-token API rates above.

What changed in the last 30-90 days

Tracked through 2026-05-25

2026-05-25: Massive expansion of Gemini 3.1 and 3.5 preview models, including Gemini 3.1 Pro, Gemini 3.5 Flash, and specialized Flash-Lite variants. — Developers now have access to a broader range of performance-to-cost ratios for testing next-generation applications.
2026-05-25: Gemini 3.1 Flash Live Preview prices slashed: input dropped 75% to $0.75/M and output dropped 62.5% to $4.50/M. — Real-time audio and text interactions are now significantly more affordable for high-frequency live applications.
2026-05-24: Gemini 2.5 Flash Preview TTS rates doubled across input, output, and batch fields. — Users of the text-to-speech preview will see a 100% increase in operational costs for audio generation.

How buyers think about Google pricing

Updated 2026-05-26

Each scenario below is interactive — tweak the inputs to see how the math changes for your workload.

Cheapest Gemini tier for high-volume tasks

vibe-codersolopreneurdeveloper

The problem: You need to process millions of simple classification or extraction tasks without the bill scaling faster than your revenue. High-performance models are overkill for basic data cleaning or routing.

What to do: Use Gemini 2.5 Flash-Lite for the lowest possible entry point or Gemini 3.1 Flash-Lite for newer architecture at a slight premium.

Processing 10 million input tokens and 2 million output tokens on Gemini 2.5 Flash-Lite costs $1.00 for input (10M x $0.1/M) and $0.80 for output (2M x $0.4/M), totaling $1.80. The same volume on Gemini 3.1 Flash-Lite costs $2.50 for input (10M x $0.25/M) and $3.00 for output (2M x $1.5/M), totaling $5.50 (as of 2026-05-26).

→ Gemini 2.5 Flash-Lite provides a baseline cost of $0.50 per million balanced tokens (as of 2026-05-26).

Quick calc — adjust for your workload

Model Rate: $0/$0 per M Input tokens/request Output tokens/request Requests/month

Per request: — · Monthly: — · Annual: —

Open full calculator with caching, batch, charts →

How far the free AI Studio tier actually goes

vibe-codersolopreneurdeveloper

The problem: You want to prototype without entering credit card details or committing to a Google Cloud project. You need to know when the rate limits will force a migration to a paid plan.

What to do: Leverage the AI Studio free tier for development and testing before switching to Vertex AI for production scaling.

While the free tier offers no-cost access, production workloads requiring more than the standard rate limits must transition to pay-as-you-go. For example, moving a small app that uses 500K input tokens and 100K output tokens daily to Gemini 3 Flash would cost $0.25 for input (0.5M x $0.5/M) and $0.30 for output (0.1M x $3/M), totaling $0.55 per day or roughly $16.50 per month (as of 2026-05-26).

→ Prototyping is free, but production-ready stability on Gemini 3 Flash starts at approximately $0.55 per 600K tokens (as of 2026-05-26).

Quick calc — adjust for your workload

Model Rate: $0/$0 per M Input tokens/request Output tokens/request Requests/month

Per request: — · Monthly: — · Annual: —

Open full calculator with caching, batch, charts →

Using the 1M context window without breaking budget

developersmbenterprise

The problem: Running RAG-heavy workflows or analyzing massive documents can lead to massive input costs if you send the same 1-million-token context with every query.

What to do: Utilize Vertex AI Context Caching to reduce the cost of repetitive input tokens by 90%.

Sending a 1-million-token document to Gemini 3.1 Pro costs $2.00 per request (1M x $2/M). By using context caching, the cost for subsequent requests drops to the cached rate of $0.20 per million tokens (as of 2026-05-26). Over 100 queries, this reduces the input cost from $200 to $21.80, including the initial cache fill.

→ Context caching on Gemini 3.1 Pro reduces repetitive input costs from $2.00 to $0.20 per million tokens (as of 2026-05-26).

Quick calc — adjust for your workload

Model Rate: $0/$0 per M Input tokens/request Output tokens/request Requests/month

Per request: — · Monthly: — · Annual: —

Open full calculator with caching, batch, charts →

Vertex AI vs AI Studio when does each make sense

it-buyerenterprisedeveloper

The problem: You are choosing between the developer-friendly AI Studio and the enterprise-grade Vertex AI platform. You need to know if the extra features justify the potential complexity.

What to do: Choose Vertex AI for workloads that require Google Cloud Enterprise Discount Programs (EDP) or Provisioned Throughput.

Vertex AI allows you to count Gemini usage toward a Customer Annual Spend Commitment (CASC), which can trigger discounts of 35-40% for standard tiers (as of 2026-05-26). For an enterprise spending $10,000 monthly on Gemini 3.1 Pro tokens, an EDP could potentially reduce that bill to $6,000-$6,500 depending on the negotiated sleeve.

→ Vertex AI is the only path to stackable discounts that can reduce token costs by up to 40% (as of 2026-05-26).

Quick calc — adjust for your workload

Model Rate: $0/$0 per M Input tokens/request Output tokens/request Requests/month

Per request: — · Monthly: — · Annual: —

Open full calculator with caching, batch, charts →

When Gemini 3.1 Pro is worth the price over 2.5 Pro

developerit-buyerenterprise

The problem: You need to decide if the intelligence gains in the 3.1 series justify the higher price point compared to the 2.5 series for complex reasoning tasks.

What to do: Use Gemini 2.5 Pro for standard high-intelligence tasks and reserve 3.1 Pro for multi-step reasoning that fails on older models.

Processing 1 million input and 1 million output tokens on Gemini 2.5 Pro costs $11.25 ($1.25 + $10). Upgrading to Gemini 3.1 Pro increases the cost to $14.00 ($2 + $12) per million tokens (as of 2026-05-26). This represents a 24% price increase for the newer architecture.

→ Gemini 3.1 Pro carries a $2.75 premium per million balanced tokens over Gemini 2.5 Pro (as of 2026-05-26).

Quick calc — adjust for your workload

Model Rate: $0/$0 per M Input tokens/request Output tokens/request Requests/month

Per request: — · Monthly: — · Annual: —

Open full calculator with caching, batch, charts →

Gemini Advanced inside Google Workspace

smbenterprise

The problem: You want to provide AI tools to your employees but are unsure whether to buy individual Gemini Advanced subscriptions or use the Workspace AI add-on.

What to do: Compare the cost of the Workspace AI add-on against standalone API usage for internal productivity tools.

While API pricing for Gemini 3.1 Pro is $2 per million input tokens, Workspace users can access Gemini features via a per-seat monthly add-on. For a team of 50, if each user processes the equivalent of 5 million tokens monthly, the API cost would be $500 for input alone (250M x $2/M). Verify with your Google account manager if the Workspace seat cost is lower than your projected API consumption (as of 2026-05-26).

→ High-volume internal users may find Workspace seat pricing more predictable than variable API token billing (as of 2026-05-26).

Quick calc — adjust for your workload

Model Rate: $0/$0 per M Input tokens/request Output tokens/request Requests/month

Per request: — · Monthly: — · Annual: —

Open full calculator with caching, batch, charts →

Volume discounts & partner programs

Researched 2026-05-26

Heads up — these are community-sourced and analyst-reported terms. Specific credit amounts, discount percentages, and program thresholds change frequently. Always verify current terms directly with Google before relying on a specific number. Treat reported figures as ballpark, not contract language.

Google Cloud Partner Network (2026 Tier Structure)

Threshold: Select ($250k ACV); Premier ($2M ACV); Diamond ($20M ACV)

Typical discount (reported): 8–12% typical reseller margin; up to 15-20% for $5M/3yr commits

Benefits:

Diamond tier requires 200 professional certifications and 20 implemented workloads
Access to Partner Marketing Studio and co-marketing funds
Dedicated Google Account Manager for larger partners
Technical training, workshops, and proof-of-concept (PoC) support
AI-driven automated tracking for tier and competency achievements

How to engage: Apply via Google Cloud Partner Advantage portal; transition window for 2026 program began in Q1 2026

Source: crn.com.aucommunity · cited 2026-01-22

Vertex AI Provisioned Throughput (PT)

Threshold: Measured in Generative AI Scale Units (GSUs); minimums vary by model

Typical discount (reported): Fixed-cost subscription; break-even typically 12-15% of capacity sustained

Benefits:

Guaranteed throughput and predictable latency for Gemini 3 and Gemini 2.0 models
Commitment terms: 1 week, 1 month, 3 months, or 1 year
Ability to schedule PT change orders up to two weeks in advance
Integration with context caching for further cost reductions
Supports first-party (Gemini, Veo 3) and third-party models (Anthropic in private preview)

How to engage: Purchase via Provisioned Throughput dashboard in Vertex AI console

Source: cloud.google.comvendor_official · cited 2026-02-19

Google Cloud Enterprise Discount Program (EDP) / CASC

Threshold: Typically $150k+ for custom pricing; $1M-$3M for standard EDP tiers

Typical discount (reported): 35-40% initial offer; reportedly up to high 80% for $1B+ contracts

Benefits:

Stackable with Committed Use Discounts (CUDs)
Vertex AI consumption counts toward Customer Annual Spend Commitment (CASC)
Negotiable 'Vertex-specific discount sleeves' within broader cloud contracts
Price protection for 3-5 years subject to specific commitments

How to engage: Direct negotiation with Google Cloud sales account teams

Source: magicmag.aianalyst_report · cited 2026-02-18

Azure AI Foundry Provisioned Throughput Reservations

Threshold: Purchased in Provisioned Throughput Units (PTUs)

Typical discount (reported): Up to 70% compared to hourly pay-as-you-go

Benefits:

1-month or 1-year reservation terms
Guaranteed capacity for high-throughput applications
Model-independent quota requests (apply PTUs across diverse model portfolio)
Self-service provisioning via Azure Portal

How to engage: Navigate to Reservations section in Azure Portal

Source: techcommunity.microsoft.comvendor_official · cited 2025-05-19

Amazon Bedrock Provisioned Throughput

Threshold: Minimum 1 model unit (MU) commitment

Typical discount (reported): Varies by term; 6-month commitments offer deepest discounts

Benefits:

1-month or 6-month commitment options
Guaranteed throughput for foundation models (Titan, Anthropic, Meta Llama)
Required for running custom fine-tuned models
No-commitment hourly option available for maximum flexibility

How to engage: Purchase through AWS Bedrock console under 'Provisioned Throughput'

Source: cloudforecast.ioanalyst_report · cited 2024-10-31

Vertex AI Batch Prediction & Caching Discounts

Threshold: Workload-based (non-real-time)

Typical discount (reported): 50% off for Batch API; 90% off for Context Caching

Benefits:

Batch Prediction: 50% discount on on-demand rates for supported models
Context Caching: 90% discount on input tokens for repetitive RAG contexts
Reduces 'bill-shock' for high-volume summarization or classification tasks

How to engage: Enable via Vertex AI API parameters (e.g., setting 'caching' or using Batch Prediction jobs)

Source: cloudzero.comanalyst_report · cited 2026-05-04

Multi-cloud availability

Researched 2026-05-26

Cloud-marketplace terms change frequently. Model availability dates, pricing parity, and regional features can drift week to week. Verify with each cloud's pricing page (AWS Bedrock, Google Vertex, Azure AI Foundry) before architecting around specifics.

Cloud	Model availability	Price vs vendor-direct	Reasons to pick
AWS Bedrock	Gemma 3 (4B, 12B, 27B) and Gemma 4 (via Hugging Face collection)	varies by deployment (serverless pay-as-you-go or marketplace subscription)	Multi-model flexibility through a unified API Deep integration with AWS infrastructure (S3, IAM, CloudTrail) Enterprise-grade security and compliance (HIPAA, FedRAMP) Access to 100+ serverless models from multiple providers aws.amazon.com ↗
Azure AI Foundry	Gemma 4 (variants including E2B, E4B, 26B A4B, 31B)	reportedly based on managed compute (VM/GPU hourly rates) or serverless pay-as-you-go	Seamless integration with Microsoft 365 and Azure ecosystem Unified platform for building, testing, and managing AI applications Support for multi-model strategies within a single control plane Enterprise-segment offerings with robust governance and safety tools techcommunity.microsoft.com ↗
Together.ai	Gemma 3 27B	reportedly $0.06 per 1M input tokens and $0.12 per 1M output tokens	High-performance optimizations (reportedly 3.5x faster inference) Competitive pricing with a 50% discount for Batch API workloads Pure consumption model with no setup fees or minimum commitments Wide selection of 100+ open-source models computeprices.com ↗
Anyscale	Gemma 7B (via LiteLLM/Anyscale endpoints)	approximately $0.15 per 1M input and $0.15 per 1M output tokens	Multi-cloud execution across AWS, GCP, and Azure Distributed computing powered by the Ray framework Reported cloud cost reductions of up to 50% through optimized GPU utilization Unified Python programming model for the entire ML lifecycle litellm.ai ↗
Google Vertex AI (Vendor-Direct)	Gemini 3.1 Pro, Gemini 3.5 Flash, Gemini 2.5 Pro/Flash, Gemma 4	baseline (e.g., Gemini 3.1 Pro at $2.00/$12.00 per 1M tokens for context <= 200K)	Native access to the full Gemini model family Industry-leading context windows (up to 2 million tokens) Native grounding with Google Search and BigQuery integration Comprehensive MLOps tooling (Pipelines, Experiments, Model Monitoring) cloud.google.com ↗

Free credits & startup programs

Researched 2026-05-26

Program details and credit amounts shift often. Apply directly through each program's official page for current values, eligibility windows, and application requirements.

Google for Startups Cloud Program - AI Track

Reported value: up to $350,000 in credits over 2 years

Eligibility: AI-first startups from Seed to Series A (Series A must be raised within the last 12 months); founded within the last 10 years; not received more than $5,000 in previous Google Cloud credits.

How to apply: Apply through the Google for Startups website or via an approved partner (VC, accelerator, or incubator).

Apply / learn more at cloud.google.com ↗

Google for Startups Cloud Program - Scale Tier

Reported value: up to $200,000 in credits over 2 years

Eligibility: Startups with verified equity funding from pre-seed to Series A (Series A raised within last 12 months); founded within the last 10 years; not received more than $5,000 in previous Google Cloud credits.

How to apply: Submit application on the Google for Startups Cloud Program page; requires verification of institutional funding.

Apply / learn more at cloud.google.com ↗

Google for Startups Cloud Program - Start Tier

Reported value: $2,000 in credits for 12 months

Eligibility: Technology startups not yet funded by an institutional investor; founded within the last 5 years; not received previous credits beyond the free trial.

How to apply: Apply directly on the Google for Startups website.

Apply / learn more at cloud.google.com ↗

Google Cloud Research Credits (Faculty & Postdocs)

Reported value: up to $5,000 in credits

Eligibility: Faculty and postdoctoral researchers at higher education institutions in eligible countries; requires a research proposal and cost estimate.

How to apply: Submit an online application form including a research proposal and Google Cloud billing account details.

Apply / learn more at edu.google.com ↗

Google Cloud Research Credits (PhD Students)

Reported value: up to $1,000 in credits per year

Eligibility: PhD students conducting research at educational institutions; must be used for the described project and not personal use.

How to apply: Apply via the Google for Education research credits application form; can apply once per year.

Apply / learn more at edu.google.com ↗

NVIDIA Inception & Google Cloud Collaboration

Reported value: up to $350,000 in Google Cloud credits

Eligibility: Qualified members of the NVIDIA Inception program focused on AI.

How to apply: Members of NVIDIA Inception can access an accelerated path to the Google for Startups Cloud Program through the NVIDIA member portal.

Apply / learn more at nvidianews.nvidia.com ↗

Y Combinator Summer Grants 2026

Reported value: $90,000 in compute credits (shared across AWS, Azure, and GCP)

Eligibility: Technical college students building AI or technical projects full-time in San Francisco during Summer 2026.

How to apply: Apply via the Y Combinator Summer Grants application page; rolling admissions.

Apply / learn more at ycombinator.com ↗

Google for Startups Accelerator: AI First

Reported value: equity-free support and free Cloud TPU access

Eligibility: Seed to Series A AI-first startups based in North America; must commit CTO or technical leads to program sessions.

How to apply: Apply for specific cohorts via the Google for Startups Accelerator program page.

Apply / learn more at startup.google.com ↗

Pricing gotchas to watch

Researched 2026-05-26

Most gotchas below were surfaced by community reports. Some may have been fixed, changed, or never been the user-facing issue they appeared. Verify against current vendor docs before architecting around a workaround.

Explicit Cache Deletion Billing Trap

When using explicit context caching, users are reportedly billed for the originally specified Time to Live (TTL) duration even if the cache is manually deleted before it expires. For example, a cache created with a 1-hour TTL that is deleted after 15 minutes still incurs the full 1-hour storage charge.

Workaround: Instead of deleting the cache, update the TTL to a shorter duration (e.g., 30 minutes) to adjust the billing period before the cache naturally expires.

Source: developers.google.comvendor_docs · cited 2026-05-26

Implicit Caching Sparse-Traffic Loss

Implicit caching (automatically enabled for Gemini 2.5+) offers no cost-saving guarantee. It is described as an ephemeral optimization layer where data is retained only for a 'defined short retention period' or session lifetime. Sparse traffic patterns often result in cache misses, forcing production users to pay full price for repeated input tokens.

Workaround: For production workloads requiring guaranteed savings, use explicit caching which allows manual TTL management, provided the prompt meets the minimum threshold (typically 1,024 to 4,096 tokens depending on the model).

Source: blog.googleblog_post · cited 2026-05-26

Surprise 'Ghost' Charges from Idle Endpoints

Vertex AI online prediction endpoints charge hourly fees (starting at approximately $0.75 per node-hour) even when idle. Production users have reported 'ghost' charges reaching hundreds of dollars because undeploying a model is reportedly insufficient; the endpoint resource itself must be deleted to stop GPU/compute allocation billing.

Workaround: Implement automated scripts to delete idle endpoints rather than just undeploying models, and use Vertex AI Batch API for non-real-time tasks to avoid 24/7 endpoint costs.

Source: cloud.google.comvendor_docs · cited 2026-05-26

Multimodal Token Counting Discrepancies

Gemini 2.0 image tokenization uses a tiling logic where images larger than 384px are scaled into 768x768 tiles, each costing 258 tokens. However, production users report surprises where a single high-resolution image (e.g., 1920x1080) can result in over 1,800 tokens, significantly higher than a simple 4-tile calculation would suggest.

Workaround: Pre-process and downscale images to 384x384 pixels before sending them to the API to ensure they stay within the minimum 258-token billing tier.

Source: developers.google.comvendor_docs · cited 2026-05-26

Regional Pricing and Language Variance

Vertex AI pricing for Gemini models reportedly varies by region, with non-US regions typically costing 2% to 5% more than us-central1. Additionally, because Vertex AI bills per character rather than per token, Japanese-language deployments can be approximately 3x cheaper per token than English due to higher information density per character.

Workaround: Calculate costs based on character counts for the specific target language rather than token estimates to avoid budget overruns in Latin-script languages.

Source: cloud.google.comvendor_docs · cited 2026-05-26

Legacy Key Auto-Upgrade Billing Risk

A major security-related pricing gotcha involves legacy Google Maps API keys. If these keys reside in a project where Gemini is enabled, they are reportedly 'silently' upgraded to allow Gemini API access. Attackers exploiting these publicly exposed keys have generated unauthorized bills ranging from $10,000 to over $180,000 in a few days.

Workaround: Explicitly restrict all API keys to specific services (e.g., only Maps) and set project-level spend caps, as budget alerts do not stop usage.

Source: trufflesecurity.comblog_post · cited 2026-05-26

Hidden costs (25-40% beyond per-token rates)

Updated 2026-05-26

Idle Vertex AI endpoints charge approximately $0.75 per node-hour even when no requests are processed.
Multimodal tiling for high-resolution images can inflate token counts to 1,800+ tokens per image.
Regional pricing variance adds 2-5% to the base rate for deployments outside of us-central1.
Explicit context caching bills for the full TTL duration even if the cache is deleted early.
Retry overhead from rate limits or network timeouts adds 5-15% to effective monthly costs.
Japanese and other high-density languages can be 3x cheaper per token due to character-based billing.
Unauthorized usage from unrestricted legacy API keys can lead to massive surprise bills.

Typical overhead: 25-40% beyond raw per-token rates.

What it costs to leave Google

Updated 2026-05-26

Migrating away from Google involves moving out of the Vertex AI ecosystem and potentially losing access to the 2-million-token context window. While Gemma models offer an open-weight path for portability, the native grounding integrations with Google Search and BigQuery create significant functional lock-in.

small project (1-5 prompts): 1-2 engineer-days
mid-size (10-50 prompts): 1-3 engineer-weeks
large agentic system: 2-4 engineer-months

Who is this for?

Refreshed 2026-05-26

For vibe coders & solo devs

For rapid prototyping, AI Studio is your best friend because it bypasses the complexity of Google Cloud project setup. You should start with Gemini 3.1 Flash-Lite to keep costs extremely low while benefiting from the latest architecture. If you hit rate limits, moving to Vertex AI is straightforward but requires managing service accounts. Focus on using the free tier for dev and only pay for production traffic.

* Use AI Studio for zero-config API keys.
* Start with Gemini 3.1 Flash-Lite for $0.25/M input tokens.
* Monitor your usage to avoid the 'Legacy Key' billing trap.
* Switch to Vertex AI only when you need enterprise scaling or logging.

For SMBs and growing teams

Small businesses should look at the 50% discount offered by the Vertex AI Batch API for non-real-time tasks like content generation or report summarization. If you have repetitive tasks, context caching is mandatory to avoid paying for the same data twice. Consider the Google for Startups Cloud Program if you are eligible, as it can provide up to $200,000 in credits. This can effectively eliminate your AI spend for the first two years.

* Apply for the Google for Startups Scale Tier for $200,000 in credits.
* Use Batch API to save 50% on high-volume summarization.
* Enable context caching for any prompt over 1,024 tokens.
* Set strict project-level spend caps to prevent budget overruns.

For enterprise buyers

Enterprises should leverage the Enterprise Discount Program (EDP) to ensure Vertex AI spend counts toward broader cloud commitments. If you require guaranteed latency, Provisioned Throughput (PT) is the preferred route, though it requires a commitment of at least one week. For global deployments, be aware that regional pricing can increase costs by 2-5% outside of the US. Use Diamond tier partner benefits if your ACV exceeds $20M for maximum support.

* Negotiate 'Vertex-specific discount sleeves' within your EDP.
* Use Provisioned Throughput for mission-critical, high-traffic apps.
* Deploy in us-central1 to avoid the 2-5% regional price premium.
* Delete idle endpoints entirely to avoid the $0.75 per node-hour 'ghost' charge.

Need help deciding which Google tier or model fits your workload? Book a $19.99 quick call →

Sources verified for this page

Primary: Google pricing page

View all 25 cited insider sources across 16 domains

Google Cloud Partner Network (2026 Tier Structure) (community, verified 2026-01-22)
Vertex AI Provisioned Throughput (PT) (vendor_official, verified 2026-02-19)
Google Cloud Enterprise Discount Program (EDP) / CASC (analyst_report, verified 2026-02-18)
Azure AI Foundry Provisioned Throughput Reservations (vendor_official, verified 2025-05-19)
Amazon Bedrock Provisioned Throughput (analyst_report, verified 2024-10-31)
Vertex AI Batch Prediction & Caching Discounts (analyst_report, verified 2026-05-04)
Explicit Cache Deletion Billing Trap (vendor_docs, verified 2026-05-26)
Implicit Caching Sparse-Traffic Loss (blog_post, verified 2026-05-26)
Surprise 'Ghost' Charges from Idle Endpoints (vendor_docs, verified 2026-05-26)
Multimodal Token Counting Discrepancies (vendor_docs, verified 2026-05-26)
Regional Pricing and Language Variance (vendor_docs, verified 2026-05-26)
Legacy Key Auto-Upgrade Billing Risk (blog_post, verified 2026-05-26)
AWS Bedrock (grounded_research, verified 2026-05-26)
Azure AI Foundry (grounded_research, verified 2026-05-26)
Together.ai (grounded_research, verified 2026-05-26)
Anyscale (grounded_research, verified 2026-05-26)
Google Vertex AI (Vendor-Direct) (grounded_research, verified 2026-05-26)
Google for Startups Cloud Program - AI Track (grounded_research, verified 2026-05-26)
Google for Startups Cloud Program - Scale Tier (grounded_research, verified 2026-05-26)
Google for Startups Cloud Program - Start Tier (grounded_research, verified 2026-05-26)
Google Cloud Research Credits (Faculty & Postdocs) (grounded_research, verified 2026-05-26)
Google Cloud Research Credits (PhD Students) (grounded_research, verified 2026-05-26)
NVIDIA Inception & Google Cloud Collaboration (grounded_research, verified 2026-05-26)
Y Combinator Summer Grants 2026 (grounded_research, verified 2026-05-26)
Google for Startups Accelerator: AI First (grounded_research, verified 2026-05-26)

Generator: gen-v5.0.8-2026-05-25 · Last refreshed: Mon May 25 2026 21:00:26 GMT-0400 (Eastern Daylight Time) · Pricing snapshot: Mon May 25 2026 00:00:00 GMT-0400 (Eastern Daylight Time)

Vendor / Model	Field	Why it’s inferred
Anthropic — Claude Sonnet 4.6	`cachedInput`	Derived at 10% of input rate — Anthropic publishes 90% cache-hit discount on this tier.
Anthropic — Claude Sonnet 4.5	`cachedInput`	Derived at 10% of input rate; same 90% cache-hit convention as Sonnet 4.6.
Anthropic — Claude Sonnet 4.5	`batchInput`	Derived at 50% of standard input — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Sonnet 4.5	`batchOutput`	Derived at 50% of standard output — Anthropic documents uniform 50% Batch discount.
Anthropic — Claude Haiku 4.5	`cachedInput`	Derived at 10% of input rate — Anthropic 90% cache-hit discount convention.
OpenAI — GPT-5.4 Mini	`cachedInput`	Derived at 10% of input — OpenAI documents automatic 90% discount on cache hits across GPT-5.x tier.
OpenAI — GPT-5.4 Nano	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Nano	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Nano	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`cachedInput`	Derived at 10% of input — OpenAI 90% cache-hit convention.
OpenAI — GPT-5.4 Pro	`batchInput`	Derived at 50% of input — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.4 Pro	`batchOutput`	Derived at 50% of output — OpenAI Batch API uniform 50% discount.
OpenAI — GPT-5.2	`cachedInput`	Derived at 10% of input; no residency uplift.
OpenAI — GPT-5.2	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.5 Pro	`cachedInput`	Derived at 10% of input — OpenAI does not publish a cached rate for *-pro models; using the family convention.
OpenAI — GPT-5.5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.2 Pro	`cachedInput`	Derived at 10% of input — pro-tier convention.
OpenAI — GPT-5.2 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.2 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5.1	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5.1	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Pro	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Pro	`batchOutput`	Derived at 50% of output.
OpenAI — GPT-5 Nano	`cachedInput`	Derived at 10% of input.
OpenAI — GPT-5 Nano	`batchInput`	Derived at 50% of input.
OpenAI — GPT-5 Nano	`batchOutput`	Derived at 50% of output.
Google — Gemini 3 Flash	`cachedInput`	Derived at 10% of input — Google caching discount convention ~90%.
Google — Gemini 3.1 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 3.1 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 3.1 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Pro	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash	`cachedInput`	Derived at 10% of input.
Google — Gemini 2.5 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.5 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.5 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`cachedInput`	Derived at 25% of input per Google 2.0 family caching rates.
Google — Gemini 2.0 Flash	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`cachedInput`	Derived at 10% of input — Google caching convention.
Google — Gemini 2.0 Flash-Lite	`batchInput`	Derived at 50% of input — Google Batch API uniform 50% discount.
Google — Gemini 2.0 Flash-Lite	`batchOutput`	Derived at 50% of output — Google Batch API uniform 50% discount.
xAI — Grok 4 (legacy)	`cachedInput`	Extrapolated at 25% of base.

Google pricing, complete breakdown

How Google's pricing universe works

API (per-token, metered)

Consumer subscriptions (Plus, Pro, Ultra tiers)

Business/Team plans

Enterprise (custom contract)

Cloud marketplaces (Google Vertex AI)

⭐ Most popular Google products by user type

🎁 Current promos and time-sensitive deals

📅 What changed in the last 30 days

⏳ Major upcoming changes

Every Google product, profiled

Google AI Plus

Google AI Pro

Google AI Ultra 5x

Google AI Ultra 20x

Gemini API / Vertex AI

All Google products at a glance

Google vs the field

🌳 Which Google product fits you?

How Google pricing has moved

API or subscription: which is cheaper for you?

🧮 Estimate your annual Google cost

Current pricing (all production models)

Full rate breakdown (all variants)

Gemini 3.1 Pro gemini-3-1-pro

Gemini 3.1 Pro gemini-3-1-pro

Gemini 3 Flash gemini-3-flash

Gemini 3 Flash gemini-3-flash

Gemini 3.1 Flash-Lite gemini-3-1-flash-lite

Gemini 3.1 Flash-Lite gemini-3-1-flash-lite

Gemini 2.5 Pro gemini-2-5-pro

Gemini 2.5 Pro gemini-2-5-pro

Gemini 2.5 Flash gemini-2-5-flash

Gemini 2.5 Flash gemini-2-5-flash

Gemini 2.5 Flash-Lite gemini-2-5-flash-lite

Gemini 2.5 Flash-Lite gemini-2-5-flash-lite

Subscription plans (consumer + business)

What changed in the last 30-90 days

How buyers think about Google pricing

Cheapest Gemini tier for high-volume tasks

Quick calc — adjust for your workload

How far the free AI Studio tier actually goes

Quick calc — adjust for your workload

Using the 1M context window without breaking budget

Quick calc — adjust for your workload

Vertex AI vs AI Studio when does each make sense

Quick calc — adjust for your workload

When Gemini 3.1 Pro is worth the price over 2.5 Pro

Quick calc — adjust for your workload

Gemini Advanced inside Google Workspace

Quick calc — adjust for your workload

Volume discounts & partner programs

Google Cloud Partner Network (2026 Tier Structure)

Vertex AI Provisioned Throughput (PT)

Google Cloud Enterprise Discount Program (EDP) / CASC

Azure AI Foundry Provisioned Throughput Reservations

Amazon Bedrock Provisioned Throughput

Vertex AI Batch Prediction & Caching Discounts

Multi-cloud availability

Free credits & startup programs

Google for Startups Cloud Program - AI Track

Google for Startups Cloud Program - Scale Tier

Google for Startups Cloud Program - Start Tier

Google Cloud Research Credits (Faculty & Postdocs)

Google Cloud Research Credits (PhD Students)

NVIDIA Inception & Google Cloud Collaboration

Y Combinator Summer Grants 2026

Google for Startups Accelerator: AI First

Pricing gotchas to watch

Explicit Cache Deletion Billing Trap

Implicit Caching Sparse-Traffic Loss

Surprise 'Ghost' Charges from Idle Endpoints

Multimodal Token Counting Discrepancies

Regional Pricing and Language Variance

Legacy Key Auto-Upgrade Billing Risk

Hidden costs (25-40% beyond per-token rates)

What it costs to leave Google

Who is this for?

For vibe coders & solo devs

Gemini 3.1 Pro `gemini-3-1-pro`

Gemini 3.1 Pro `gemini-3-1-pro`

Gemini 3 Flash `gemini-3-flash`

Gemini 3 Flash `gemini-3-flash`

Gemini 3.1 Flash-Lite `gemini-3-1-flash-lite`

Gemini 3.1 Flash-Lite `gemini-3-1-flash-lite`

Gemini 2.5 Pro `gemini-2-5-pro`

Gemini 2.5 Pro `gemini-2-5-pro`

Gemini 2.5 Flash `gemini-2-5-flash`

Gemini 2.5 Flash `gemini-2-5-flash`

Gemini 2.5 Flash-Lite `gemini-2-5-flash-lite`

Gemini 2.5 Flash-Lite `gemini-2-5-flash-lite`