Live catalog · 4,816 models · 167 providers · synced 1h ago

AI model pricing

Prices via Models.dev
Estimates assume users sending 11k req / mo · 23.1M tokens
Google
Gemini 2.5 Flash Lite
$4.29
/ mo
$0.429 / user · $51.48 / yr
multimodal · voice · video · 1M+ context · lightweight
$0.10 in $0.40 out 1.0M ctx
via Models.dev
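The per-month, per-user, and per-year figures in each card appear to derive from one fixed workload. A minimal sketch, assuming the 23.1M monthly tokens split into ~16.5M input and ~6.6M output across 10 users; the split and seat count are inferred from the listed figures, not documented:

```python
# Assumed workload (inferred): 16.5M input + 6.6M output tokens/mo, 10 seats.
IN_TOKENS_M = 16.5
OUT_TOKENS_M = 6.6
SEATS = 10

def monthly_cost(in_price_per_m: float, out_price_per_m: float) -> float:
    """Total monthly USD cost, given $/M-token input and output prices."""
    return in_price_per_m * IN_TOKENS_M + out_price_per_m * OUT_TOKENS_M

# Gemini 2.5 Flash Lite: $0.10 in / $0.40 out
total = monthly_cost(0.10, 0.40)
print(f"${total:.2f}/mo · ${total / SEATS:.3f}/user · ${total * 12:.2f}/yr")
# → $4.29/mo · $0.429/user · $51.48/yr
```

The same formula reproduces every card in the list (e.g. DeepSeek V4 Flash: 0.14 × 16.5 + 0.28 × 6.6 ≈ $4.16).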
DeepSeek
DeepSeek V4 Flash
cheap & good
$4.16
/ mo
$0.416 / user · $49.90 / yr
1M+ context · lightweight · budget
Best for DeepSeek cheap-and-good — bulk inference

High-throughput pipelines, dataset labeling, internal RAG at scale. Among the cheapest non-trivial models.

Skip when Latency-sensitive workloads; tasks needing the V4 Pro reasoning ceiling.
Self-hostable 16B params
~40 GB VRAM @ FP16 · ~124 GB CPU RAM
H20 ~40 users $1,296/mo
H100 SXM ~40 users $2,520/mo
H200 ~40 users $3,240/mo
B200 ~40 users $5,400/mo
Concurrent chat users · 60s avg turn cycle
$0.14 in $0.28 out 1.0M ctx
via Models.dev
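The VRAM footprints on the self-hostable cards track a simple rule of thumb. A sketch, assuming FP16 weights at 2 bytes per parameter plus roughly 20% overhead for KV cache and activations; the overhead factor is inferred from the listed figures, and the smallest models appear to reserve slightly more:

```python
def fp16_vram_gb(params_b: float, overhead: float = 1.2) -> float:
    """Estimated serving VRAM in GB: 2 bytes/param at FP16, plus ~20% headroom.
    The 1.2 overhead factor is an assumption fit to the larger listed models."""
    return params_b * 2 * overhead

fp16_vram_gb(671)  # ≈ 1610 GB, close to the ~1,611 GB listed for DeepSeek V4 Pro
```

Total parameter count (not active MoE parameters) drives the weight footprint, which is why a 230B MoE with only 10B active still needs ~552 GB.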
Google
Gemma 4 31B
cheap & good
$4.95
/ mo
$0.495 / user · $59.40 / yr
multimodal · long-context · budget
Best for Google open-weights — self-hostable Gemma 4 (latest)

On-prem / air-gapped deployments, sovereign-data scenarios, low-latency private inference. Strong instruction-following at 31B dense; hosted prices shown reflect third-party providers (OpenRouter, Together).

Skip when Multimodal video tasks (use Gemini); complex agentic chains.
Self-hostable 31B params
~75 GB VRAM @ FP16 · ~177 GB CPU RAM
H20 ~20 users $1,296/mo
H100 SXM ~20 users $2,520/mo
H200 ~20 users $3,240/mo
B200 ~20 users $5,400/mo
Concurrent chat users · 60s avg turn cycle
$0.14 in $0.40 out 256K ctx
via Models.dev
Google
Gemini 3.1 Flash Lite Preview
cheap & good
$14.02
/ mo
$1.40 / user · $168.30 / yr
multimodal · voice · video · 1M+ context · lightweight
Best for Newest Gemini cheap-and-fast — multimodal at scale

OCR, image labeling, video summarization at high QPS. Latest Flash-Lite generation; pennies per million tokens.

Skip when Hard reasoning; precise structured output for critical workflows.
Closed weights · API only
$0.25 in $1.50 out 1.0M ctx
via Models.dev
OpenAI
GPT-5 Mini
cheap & good
$17.32
/ mo
$1.73 / user · $207.90 / yr
multimodal · long-context · lightweight · mid-tier
Best for OpenAI cheap-and-good — high-volume classification, summaries

Free-tier traffic, batch summarization, quick lookups, lightweight tool calling. 20–100× cheaper than Pro.

Skip when Multi-hop reasoning, code generation, anything ambiguous.
Closed weights · API only
$0.25 in $2.00 out 400K ctx
via Models.dev
Vercel AI Gateway
MiniMax M2
balanced
$12.04
/ mo
$1.20 / user · $144.54 / yr
long-context · flagship · lightweight · budget
Best for MiniMax — long context (1M), strong CN+EN, voice-adjacent

Million-token context with cheap pricing, voice/audio adjacent products, Chinese-first content workflows.

Skip when Hard reasoning at the absolute frontier; pure English creative ceiling.
Self-hostable 230B params · 10B active MoE
~552 GB VRAM @ FP16 · ~892 GB CPU RAM
H20 ~384 users $7,776/mo
H200 ~256 users $12,960/mo
B200 ~192 users $16,200/mo
H100 SXM ~448 users $17,640/mo
Concurrent chat users · 60s avg turn cycle
$0.27 in $1.15 out 262K ctx
via Models.dev
Vercel AI Gateway
GLM 4.5
cheap & good
$24.42
/ mo
$2.44 / user · $293.04 / yr
mid-tier
Best for Z.ai cheap-and-good — bilingual at low cost

Bilingual chatbots, content generation in Chinese markets, GLM ecosystem. Strong CN performance per dollar.

Skip when English-first tasks at quality ceiling; complex multimodal.
Self-hostable 9B params
~26 GB VRAM @ FP16 · ~103 GB CPU RAM
H20 ~71 users $1,296/mo
H100 SXM ~71 users $2,520/mo
H200 ~71 users $3,240/mo
B200 ~71 users $5,400/mo
Concurrent chat users · 60s avg turn cycle
$0.60 in $2.20 out 131K ctx
via Models.dev
Moonshot AI
Kimi K2.6
flagship
$42.07
/ mo
$4.21 / user · $504.90 / yr
multimodal · video · long-context · mid-tier
Best for Kimi flagship — long-context analysis, document deep-dives

Million-token analysis windows, multi-doc synthesis, agentic loops over large inputs. Strong CN-EN bilingual.

Skip when Short-context conversational use where smaller models win on cost.
Self-hostable 1,000B params · 32B active MoE
~2,400 GB VRAM @ FP16 · ~3,664 GB CPU RAM
25× H20 ~500 users $32,400/mo
18× H200 ~360 users $58,320/mo
13× B200 ~260 users $70,200/mo
30× H100 SXM ~600 users $75,600/mo
Concurrent chat users · 60s avg turn cycle
$0.95 in $4.00 out 262K ctx
via Models.dev
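The GPU counts, concurrent-user estimates, and monthly costs in the self-hosting tables appear to follow one consistent sizing rule. A sketch, assuming per-GPU memory and rental rates inferred from the cards (H20 96 GB / $1,296, H100 SXM 80 GB / $2,520, H200 141 GB / $3,240, B200 192 GB / $5,400) and a throughput budget of ~640 active-parameter-billions × users per GPU at the stated 60s turn cycle; all of these constants are reverse-engineered assumptions:

```python
import math

# Assumed (inferred) per-GPU memory in GB and monthly rental in USD.
GPUS = {"H20": (96, 1296), "H100 SXM": (80, 2520),
        "H200": (141, 3240), "B200": (192, 5400)}

def cluster(vram_gb: float, active_params_b: float, gpu: str):
    """Return (gpu_count, concurrent_users, monthly_cost_usd)."""
    mem, rent = GPUS[gpu]
    n = math.ceil(vram_gb / mem)              # GPUs needed to hold the weights
    users = int(n * 640 / active_params_b)    # ~640 user·B budget per GPU
    return n, users, n * rent

cluster(2400, 32, "H20")  # → (25, 500, 32400)
```

For example, Kimi K2.6 at ~2,400 GB FP16 and 32B active parameters reproduces the 25× H20 / ~500 users / $32,400 row above, and the same rule matches the H200, B200, and H100 SXM rows.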
QiHang
Claude Haiku 4.5
cheap & good
$7.00
/ mo
$0.70 / user · $83.95 / yr
multimodal · long-context · lightweight · budget
Best for Claude cheap-and-good — fast classification, light agents

Routing layer, intent detection, short summaries, real-time UX. Fast and cheap with Claude's instruction-following.

Skip when Anything Sonnet handles substantively better.
$0.14 in $0.71 out 200K ctx
via Models.dev
Vercel AI Gateway
GLM 5.1
flagship
$52.14
/ mo
$5.21 / user · $625.68 / yr
multimodal · long-context · mid-tier
Best for Z.ai flagship — bilingual reasoning, vision

Chinese-first products, GLM ecosystem integration, vision tasks via GLM-V variants. Solid mid-tier alternative.

Skip when Pure English flagship work; tasks where Western frontier models are validated.
Self-hostable 355B params · 32B active MoE
~852 GB VRAM @ FP16 · ~1,342 GB CPU RAM
H20 ~180 users $11,664/mo
H200 ~140 users $22,680/mo
B200 ~100 users $27,000/mo
11× H100 SXM ~220 users $27,720/mo
Concurrent chat users · 60s avg turn cycle
$1.40 in $4.40 out 202K ctx
via Models.dev
Vercel AI Gateway
Qwen3 Max
flagship
$59.40
/ mo
$5.94 / user · $712.80 / yr
long-context · flagship · mid-tier
Best for Qwen flagship — multilingual, function calling, agents

Multilingual products (CN/JP/KR/EN), function-call orchestration, long-context (256K) reasoning, agent frameworks.

Skip when Pure English creative writing; tasks where Claude or GPT are battle-tested.
Self-hostable 235B params · 22B active MoE
~564 GB VRAM @ FP16 · ~910 GB CPU RAM
H20 ~174 users $7,776/mo
H200 ~116 users $12,960/mo
B200 ~87 users $16,200/mo
H100 SXM ~232 users $20,160/mo
Concurrent chat users · 60s avg turn cycle
$1.20 in $6.00 out 262K ctx
via Models.dev
DeepSeek
DeepSeek V4 Pro
flagship
$51.68
/ mo
$5.17 / user · $620.14 / yr
1M+ context · flagship · mid-tier
Best for DeepSeek flagship — math, code, strong open-weights option

Self-hostable in-house, math-heavy reasoning, code synthesis. Excellent quality / cost ratio for non-US deployments.

Skip when Need top-tier creative writing or English-nuance tasks; locked into US compliance.
Self-hostable 671B params · 37B active MoE
~1,611 GB VRAM @ FP16 · ~2,481 GB CPU RAM
17× H20 ~294 users $22,032/mo
12× H200 ~207 users $38,880/mo
B200 ~155 users $48,600/mo
21× H100 SXM ~363 users $52,920/mo
Concurrent chat users · 60s avg turn cycle
$1.74 in $3.48 out 1.0M ctx
via Models.dev
Google
Gemini 3.1 Pro Preview
flagship
$112.20
/ mo
$11.22 / user · $1,346.40 / yr
multimodal · voice · video · 1M+ context · flagship
Best for Google's newest flagship — multimodal, 1M+ context, native tools

Video understanding, document QA over hundreds of pages, image-heavy reasoning. Latest Gemini generation, picks up where 3.0 Pro left off.

Skip when Pure text reasoning where Claude or GPT have stronger track records.
Closed weights · API only
$2.00 in $12.00 out 1.0M ctx
via Models.dev
QiHang
Gemini 3 Pro Preview
balanced
$32.04
/ mo
$3.20 / user · $384.52 / yr
multimodal · voice · video · 1M+ context · flagship
Best for Established Gemini flagship — multimodal, 1M+ context

Production multimodal pipelines that have already validated against this generation. Cheaper here than 3.1 Pro, with a slightly lower ceiling.

Skip when Already on 3.1 Pro; no reason to move back a generation.
$0.57 in $3.43 out 1.0M ctx
via Models.dev
Venice AI
Claude Sonnet 4.6
balanced
$178.20
/ mo
$17.82 / user · $2,138.40 / yr
multimodal · 1M+ context · mid-tier
Best for Claude default — coding agents, RAG, structured analysis

Production coding agents (Cursor / Claude Code style), document QA, structured extraction. Best $/quality in this tier.

Skip when Trivial chat where Haiku is fine; the absolute hardest reasoning.
$3.60 in $18.00 out 1.0M ctx
via Models.dev
OpenAI
GPT-5.5
balanced
$280.50
/ mo
$28.05 / user · $3,366 / yr
multimodal · 1M+ context · premium
Best for OpenAI workhorse — RAG, general chat, structured output

Default GPT for production traffic. Strong at instruction-following and JSON / tool calls without flagship pricing.

Skip when Long-form deep reasoning that justifies Pro markup; latency-critical edge cases.
Closed weights · API only
$5.00 in $30.00 out 1.1M ctx
via Models.dev
Venice AI
Claude Opus 4.7
flagship
$297
/ mo
$29.70 / user · $3,564 / yr
multimodal · 1M+ context · flagship · premium
Best for Claude's newest flagship — long-form writing, careful reasoning

Document analysis at 200K context, nuanced writing, code review, and the hardest problems where care matters more than speed. Latest Opus generation; 3× cheaper than 4.1 at the same quality tier.

Skip when Latency-sensitive APIs; volume traffic where Sonnet would suffice.
$6.00 in $30.00 out 1.0M ctx
via Models.dev
OpenAI
GPT-5.5 Pro
flagship
$1,683
/ mo
$168.30 / user · $20,196 / yr
multimodal · 1M+ context · flagship · premium
Best for OpenAI's top model — hardest reasoning, agentic planning

Multi-step agents, complex code generation, ambiguous reasoning where mistakes are costly. Worth the price ceiling.

Skip when Routine chat, classification, summarization at scale.
Closed weights · API only
$30.00 in $180.00 out 1.1M ctx
via Models.dev
18 models · all loaded