Live catalog · 4,816 models · 167 providers · synced 1h ago

AI model pricing

Prices via Models.dev
Estimates assume users sending 11k req / mo · 23.1M tokens
Google
Gemini 2.5 Flash Lite
$4.29
/ mo
$0.429 / user · $51.48 / yr
multimodal · voice · video · 1M+ context · lightweight
$0.10 in $0.40 out 1.0M ctx
via Models.dev
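The per-month, per-user, and per-year figures in each card appear to derive from one fixed workload. A minimal sketch, assuming the 23.1M monthly tokens split into ~16.5M input and ~6.6M output across 10 users; the split and seat count are inferred from the listed figures, not documented:

```python
# Assumed workload (inferred): 16.5M input + 6.6M output tokens/mo, 10 seats.
IN_TOKENS_M = 16.5
OUT_TOKENS_M = 6.6
SEATS = 10

def monthly_cost(in_price_per_m: float, out_price_per_m: float) -> float:
    """Total monthly USD cost, given $/M-token input and output prices."""
    return in_price_per_m * IN_TOKENS_M + out_price_per_m * OUT_TOKENS_M

# Gemini 2.5 Flash Lite: $0.10 in / $0.40 out
total = monthly_cost(0.10, 0.40)
print(f"${total:.2f}/mo · ${total / SEATS:.3f}/user · ${total * 12:.2f}/yr")
# → $4.29/mo · $0.429/user · $51.48/yr
```

The same formula reproduces every card in the list (e.g. DeepSeek V4 Flash: 0.14 × 16.5 + 0.28 × 6.6 ≈ $4.16).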
DeepSeek
DeepSeek V4 Flash
cheap & good
$4.16
/ mo
$0.416 / user · $49.90 / yr
1M+ context · lightweight · budget
Best for DeepSeek cheap-and-good — bulk inference

High-throughput pipelines, dataset labeling, internal RAG at scale. Among the cheapest non-trivial models.

Skip when Latency-sensitive workloads; tasks needing the V4 Pro reasoning ceiling.
Self-hostable 16B params
~40 GB VRAM @ FP16 · ~124 GB CPU RAM
H20 ~40 users $1,296/mo
H100 SXM ~40 users $2,520/mo
H200 ~40 users $3,240/mo
B200 ~40 users $5,400/mo
Concurrent chat users · 60s avg turn cycle
$0.14 in $0.28 out 1.0M ctx
via Models.dev
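The VRAM footprints on the self-hostable cards track a simple rule of thumb. A sketch, assuming FP16 weights at 2 bytes per parameter plus roughly 20% overhead for KV cache and activations; the overhead factor is inferred from the listed figures, and the smallest models appear to reserve slightly more:

```python
def fp16_vram_gb(params_b: float, overhead: float = 1.2) -> float:
    """Estimated serving VRAM in GB: 2 bytes/param at FP16, plus ~20% headroom.
    The 1.2 overhead factor is an assumption fit to the larger listed models."""
    return params_b * 2 * overhead

fp16_vram_gb(671)  # ≈ 1610 GB, close to the ~1,611 GB listed for DeepSeek V4 Pro
```

Total parameter count (not active MoE parameters) drives the weight footprint, which is why a 230B MoE with only 10B active still needs ~552 GB.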
Google
Gemma 4 31B
cheap & good
$4.95
/ mo
$0.495 / user · $59.40 / yr
multimodal · long-context · budget
Best for Google open-weights — self-hostable Gemma 4 (latest)

On-prem / air-gapped deployments, sovereign-data scenarios, low-latency private inference. Strong instruction-following at 31B dense; hosted prices shown reflect third-party providers (OpenRouter, Together).

Skip when Multimodal video tasks (use Gemini); complex agentic chains.
Self-hostable 31B params
~75 GB VRAM @ FP16 · ~177 GB CPU RAM
H20 ~20 users $1,296/mo
H100 SXM ~20 users $2,520/mo
H200 ~20 users $3,240/mo
B200 ~20 users $5,400/mo
Concurrent chat users · 60s avg turn cycle
$0.14 in $0.40 out 256K ctx
via Models.dev
Google
Gemini 3.1 Flash Lite Preview
cheap & good
$14.02
/ mo
$1.40 / user · $168.30 / yr
multimodal · voice · video · 1M+ context · lightweight
Best for Newest Gemini cheap-and-fast — multimodal at scale

OCR, image labeling, video summarization at high QPS. Latest Flash-Lite generation; pennies per million tokens.

Skip when Hard reasoning; precise structured output for critical workflows.
Closed weights · API only
$0.25 in $1.50 out 1.0M ctx
via Models.dev
OpenAI
GPT-5 Mini
cheap & good
$17.32
/ mo
$1.73 / user · $207.90 / yr
multimodal · long-context · lightweight · mid-tier
Best for OpenAI cheap-and-good — high-volume classification, summaries

Free-tier traffic, batch summarization, quick lookups, lightweight tool calling. 20–100× cheaper than Pro.

Skip when Multi-hop reasoning, code generation, anything ambiguous.
Closed weights · API only
$0.25 in $2.00 out 400K ctx
via Models.dev
Vercel AI Gateway
MiniMax M2
balanced
$12.04
/ mo
$1.20 / user · $144.54 / yr
long-context · flagship · lightweight · budget
Best for MiniMax — long context (1M), strong CN+EN, voice-adjacent

Million-token context with cheap pricing, voice/audio adjacent products, Chinese-first content workflows.

Skip when Hard reasoning at the absolute frontier; pure English creative ceiling.
Self-hostable 230B params · 10B active MoE
~552 GB VRAM @ FP16 · ~892 GB CPU RAM
H20 ~384 users $7,776/mo
H200 ~256 users $12,960/mo
B200 ~192 users $16,200/mo
H100 SXM ~448 users $17,640/mo
Concurrent chat users · 60s avg turn cycle
$0.27 in $1.15 out 262K ctx
via Models.dev
Vercel AI Gateway
GLM 4.5
cheap & good
$24.42
/ mo
$2.44 / user · $293.04 / yr
mid-tier
Best for Z.ai cheap-and-good — bilingual at low cost

Bilingual chatbots, content generation in Chinese markets, GLM ecosystem. Strong CN performance per dollar.

Skip when English-first tasks at quality ceiling; complex multimodal.
Self-hostable 9B params
~26 GB VRAM @ FP16 · ~103 GB CPU RAM
H20 ~71 users $1,296/mo
H100 SXM ~71 users $2,520/mo
H200 ~71 users $3,240/mo
B200 ~71 users $5,400/mo
Concurrent chat users · 60s avg turn cycle
$0.60 in $2.20 out 131K ctx
via Models.dev
Moonshot AI
Kimi K2.6
flagship
$42.07
/ mo
$4.21 / user · $504.90 / yr
multimodal · video · long-context · mid-tier
Best for Kimi flagship — long-context analysis, document deep-dives

Million-token analysis windows, multi-doc synthesis, agentic loops over large inputs. Strong CN-EN bilingual.

Skip when Short-context conversational use where smaller models win on cost.
Self-hostable 1,000B params · 32B active MoE
~2,400 GB VRAM @ FP16 · ~3,664 GB CPU RAM
25× H20 ~500 users $32,400/mo
18× H200 ~360 users $58,320/mo
13× B200 ~260 users $70,200/mo
30× H100 SXM ~600 users $75,600/mo
Concurrent chat users · 60s avg turn cycle
$0.95 in $4.00 out 262K ctx
via Models.dev
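The GPU counts, concurrent-user estimates, and monthly costs in the self-hosting tables appear to follow one consistent sizing rule. A sketch, assuming per-GPU memory and rental rates inferred from the cards (H20 96 GB / $1,296, H100 SXM 80 GB / $2,520, H200 141 GB / $3,240, B200 192 GB / $5,400) and a throughput budget of ~640 active-parameter-billions × users per GPU at the stated 60s turn cycle; all of these constants are reverse-engineered assumptions:

```python
import math

# Assumed (inferred) per-GPU memory in GB and monthly rental in USD.
GPUS = {"H20": (96, 1296), "H100 SXM": (80, 2520),
        "H200": (141, 3240), "B200": (192, 5400)}

def cluster(vram_gb: float, active_params_b: float, gpu: str):
    """Return (gpu_count, concurrent_users, monthly_cost_usd)."""
    mem, rent = GPUS[gpu]
    n = math.ceil(vram_gb / mem)              # GPUs needed to hold the weights
    users = int(n * 640 / active_params_b)    # ~640 user·B budget per GPU
    return n, users, n * rent

cluster(2400, 32, "H20")  # → (25, 500, 32400)
```

For example, Kimi K2.6 at ~2,400 GB FP16 and 32B active parameters reproduces the 25× H20 / ~500 users / $32,400 row above, and the same rule matches the H200, B200, and H100 SXM rows.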
QiHang
Claude Haiku 4.5
cheap & good
$7.00
/ mo
$0.70 / user · $83.95 / yr
multimodal · long-context · lightweight · budget
Best for Claude cheap-and-good — fast classification, light agents

Routing layer, intent detection, short summaries, real-time UX. Fast and cheap with Claude's instruction-following.

Skip when Anything Sonnet handles substantively better.
$0.14 in $0.71 out 200K ctx
via Models.dev
Vercel AI Gateway
GLM 5.1
flagship
$52.14
/ mo
$5.21 / user · $625.68 / yr
multimodal · long-context · mid-tier
Best for Z.ai flagship — bilingual reasoning, vision

Chinese-first products, GLM ecosystem integration, vision tasks via GLM-V variants. Solid mid-tier alternative.

Skip when Pure English flagship work; tasks where Western frontier models are validated.
Self-hostable 355B params · 32B active MoE
~852 GB VRAM @ FP16 · ~1,342 GB CPU RAM
H20 ~180 users $11,664/mo
H200 ~140 users $22,680/mo
B200 ~100 users $27,000/mo
11× H100 SXM ~220 users $27,720/mo
Concurrent chat users · 60s avg turn cycle
$1.40 in $4.40 out 202K ctx
via Models.dev
Vercel AI Gateway
Qwen3 Max
flagship
$59.40
/ mo
$5.94 / user · $712.80 / yr
long-context · flagship · mid-tier
Best for Qwen flagship — multilingual, function calling, agents

Multilingual products (CN/JP/KR/EN), function-call orchestration, long-context (256K) reasoning, agent frameworks.

Skip when Pure English creative writing; tasks where Claude or GPT are battle-tested.
Self-hostable 235B params · 22B active MoE
~564 GB VRAM @ FP16 · ~910 GB CPU RAM
H20 ~174 users $7,776/mo
H200 ~116 users $12,960/mo
B200 ~87 users $16,200/mo
H100 SXM ~232 users $20,160/mo
Concurrent chat users · 60s avg turn cycle
$1.20 in $6.00 out 262K ctx
via Models.dev
DeepSeek
DeepSeek V4 Pro
flagship
$51.68
/ mo
$5.17 / user · $620.14 / yr
1M+ context · flagship · mid-tier
Best for DeepSeek flagship — math, code, strong open-weights option

Self-hostable in-house, math-heavy reasoning, code synthesis. Excellent quality / cost ratio for non-US deployments.

Skip when Need top-tier creative writing or English-nuance tasks; locked into US compliance.
Self-hostable 671B params · 37B active MoE
~1,611 GB VRAM @ FP16 · ~2,481 GB CPU RAM
17× H20 ~294 users $22,032/mo
12× H200 ~207 users $38,880/mo
B200 ~155 users $48,600/mo
21× H100 SXM ~363 users $52,920/mo
Concurrent chat users · 60s avg turn cycle
$1.74 in $3.48 out 1.0M ctx
via Models.dev
Google
Gemini 3.1 Pro Preview
flagship
$112.20
/ mo
$11.22 / user · $1,346.40 / yr
multimodal · voice · video · 1M+ context · flagship
Best for Google's newest flagship — multimodal, 1M+ context, native tools

Video understanding, document QA over hundreds of pages, image-heavy reasoning. Latest Gemini generation, picks up where 3.0 Pro left off.

Skip when Pure text reasoning where Claude or GPT have stronger track records.
Closed weights · API only
$2.00 in $12.00 out 1.0M ctx
via Models.dev
QiHang
Gemini 3 Pro Preview
balanced
$32.04
/ mo
$3.20 / user · $384.52 / yr
multimodal · voice · video · 1M+ context · flagship
Best for Established Gemini flagship — multimodal, 1M+ context

Production multimodal pipelines that have already validated against this generation. Cheaper here than 3.1 Pro, with a slightly lower ceiling.

Skip when Already on 3.1 Pro; no reason to move back a generation.
$0.57 in $3.43 out 1.0M ctx
via Models.dev
Venice AI
Claude Sonnet 4.6
balanced
$178.20
/ mo
$17.82 / user · $2,138.40 / yr
multimodal · 1M+ context · mid-tier
Best for Claude default — coding agents, RAG, structured analysis

Production coding agents (Cursor / Claude Code style), document QA, structured extraction. Best $/quality in this tier.

Skip when Trivial chat where Haiku is fine; the absolute hardest reasoning.
$3.60 in $18.00 out 1.0M ctx
via Models.dev
OpenAI
GPT-5.5
balanced
$280.50
/ mo
$28.05 / user · $3,366 / yr
multimodal · 1M+ context · premium
Best for OpenAI workhorse — RAG, general chat, structured output

Default GPT for production traffic. Strong at instruction-following and JSON / tool calls without flagship pricing.

Skip when Long-form deep reasoning that justifies Pro markup; latency-critical edge cases.
Closed weights · API only
$5.00 in $30.00 out 1.1M ctx
via Models.dev
Venice AI
Claude Opus 4.7
flagship
$297
/ mo
$29.70 / user · $3,564 / yr
multimodal · 1M+ context · flagship · premium
Best for Claude's newest flagship — long-form writing, careful reasoning

Document analysis at 200K context, nuanced writing, code review, and the hardest problems where care matters more than speed. Latest Opus generation; 3× cheaper than 4.1 at the same quality tier.

Skip when Latency-sensitive APIs; volume traffic where Sonnet would suffice.
$6.00 in $30.00 out 1.0M ctx
via Models.dev
OpenAI
GPT-5.5 Pro
flagship
$1,683
/ mo
$168.30 / user · $20,196 / yr
multimodal · 1M+ context · flagship · premium
Best for OpenAI's top model — hardest reasoning, agentic planning

Multi-step agents, complex code generation, ambiguous reasoning where mistakes are costly. Worth the price ceiling.

Skip when Routine chat, classification, summarization at scale.
Closed weights · API only
$30.00 in $180.00 out 1.1M ctx
via Models.dev
18 models · all loaded