AI model pricing
High-throughput pipelines, dataset labeling, internal RAG at scale. Among the cheapest non-trivial models.
On-prem / air-gapped deployments, sovereign-data scenarios, low-latency private inference. Strong instruction-following at 31B dense; hosted prices shown reflect third-party providers (OpenRouter, Together).
OCR, image labeling, video summarization at high QPS. Latest Flash-Lite generation; pennies per million tokens.
Free-tier traffic, batch summarization, quick lookups, lightweight tool calling. 20–100× cheaper than Pro.
Million-token context with cheap pricing, voice/audio-adjacent products, Chinese-first content workflows.
Bilingual chatbots, content generation in Chinese markets, GLM ecosystem. Strong CN performance per dollar.
Million-token analysis windows, multi-doc synthesis, agentic loops over large inputs. Strong CN-EN bilingual.
Routing layer, intent detection, short summaries, real-time UX. Fast and cheap with Claude's instruction-following.
Chinese-first products, GLM ecosystem integration, vision tasks via GLM-V variants. Solid mid-tier alternative.
Multilingual products (CN/JP/KR/EN), function-call orchestration, long-context (256K) reasoning, agent frameworks.
In-house self-hosting, math-heavy reasoning, code synthesis. Excellent quality / cost ratio for non-US deployments.
Video understanding, document QA over hundreds of pages, image-heavy reasoning. Latest Gemini generation; picks up where 3.0 Pro left off.
Production multimodal pipelines that have already validated against this generation. Same price as 3.1 Pro, slightly lower ceiling.
Production coding agents (Cursor / Claude Code style), document QA, structured extraction. Best $/quality in this tier.
Default GPT for production traffic. Strong at instruction-following and JSON / tool calls without flagship pricing.
Document analysis at 200K context, nuanced writing, code review, and the hardest problems where care matters more than speed. Latest Opus generation; 3× cheaper than 4.1 at the same quality tier.
Multi-step agents, complex code generation, ambiguous reasoning where mistakes are costly. Worth the price ceiling.