Should we just use Gemma (or any open-weight model) and skip the frontier models?
No. Use open-weight models as a router lane for low-stakes ops only.
- Gemma 4 31B accuracy drops ~25 pp on Golden Eval financial reasoning vs GPT-5 / DeepSeek V4 Pro.
- Third-party hosting clears at ~$0.14 in / $0.40 out — saves under $200 / month at our scale.
- Wrong thing to optimize · token cost is < 0.1 % of the cost of one bad call to the desk.
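
A minimal sketch of the "router lane" idea, assuming an explicit allowlist of low-stakes ops. The lane labels, the op names, and the `Request` / `pick_lane` helpers are all hypothetical and not taken from our stack. The point is the default: anything not explicitly marked low-stakes goes to the frontier lane.

```python
from dataclasses import dataclass
from enum import Enum


class Stakes(Enum):
    LOW = "low"
    HIGH = "high"


# Hypothetical lane labels; map these to real model endpoints in config.
OPEN_WEIGHT_LANE = "open-weight"
FRONTIER_LANE = "frontier"

# Hypothetical allowlist of low-stakes ops (illustrative names only).
LOW_STAKES_OPS = frozenset({"summarize_log", "tag_ticket", "format_table"})


@dataclass(frozen=True)
class Request:
    op: str      # operation name used for routing
    prompt: str  # payload forwarded to whichever lane is chosen


def classify(req: Request) -> Stakes:
    """Treat an op as low-stakes only if it is on the allowlist."""
    return Stakes.LOW if req.op in LOW_STAKES_OPS else Stakes.HIGH


def pick_lane(req: Request) -> str:
    """Route low-stakes ops to open weights; everything else to frontier."""
    return OPEN_WEIGHT_LANE if classify(req) is Stakes.LOW else FRONTIER_LANE
```

An allowlist (rather than a blocklist) means new or unrecognized ops fall through to the frontier lane, so a routing gap costs tokens instead of a bad call to the desk.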