Auto Just Got Cheaper — Gemini 3.1 Flash-Lite Is The New Default, Gemini 3.5 Flash Joins The Picker
May 20, 2026
Your Auto runs cost roughly half what they did yesterday. Google's new Gemini 3.1 Flash-Lite has taken over as the default model for Auto and the in-conversation background jobs (intent, Q&A, summarisation), at half the per-token cost of the Gemini 3 Flash Preview it replaces — same 1M context, same tools, same multimodal input. Alongside it, Gemini 3.5 Flash lands as a new Workhorse-tier pick with reasoning on by default, ready for harder agentic loops where you want frontier Flash quality at a Workhorse band cost.
What you can do
- Pin Gemini 3.1 Flash-Lite from the model picker: 1M-token context, multimodal input, tool calling, Auto's default, tier: `budget`. Reasoning is off by default for speed; flip it on with `effort` if you want it.
- Pin Gemini 3.5 Flash for the harder jobs: 1M-token context, multimodal, tool calling, reasoning on at `medium` effort, tier: `workhorse`. Same band as Sonnet 4.6, Kimi K2.6, and Grok 4.3.
- Use Auto and pay less by default: every Auto-routed message that would have hit `gemini-3-flash-preview` now hits `gemini-3.1-flash-lite` at half the cost, with the same 1M context and tool support.
- Run Swarm with either model: both 3.1 Flash-Lite and 3.5 Flash are swarm-eligible. 3.1 Flash-Lite is also a worker, so it can carry tool-heavy steps inside multi-agent runs.
- Keep background tasks cheap: intent classification, in-chat Q&A, and conversation summarisation all moved off `gemini-2.5-flash-lite` onto `gemini-3.1-flash-lite`, so the silent infrastructure that powers chat titles, memory, and tool routing got a quality + cost upgrade in one step.
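If it helps to see the two new picks side by side, here is a minimal sketch of how their defaults could be modelled. It is illustrative only, not our actual catalogue format: the `ModelEntry` class, its field names, the `effective_effort` helper, and the `gemini-3.5-flash` ID are all assumptions made for this example. Only `gemini-3.1-flash-lite`, the tier names, and the reasoning defaults come from the notes above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelEntry:
    """Hypothetical catalogue record; field names are illustrative."""
    model_id: str
    tier: str
    context_tokens: int
    # None means reasoning is off unless the user sets an effort.
    default_reasoning_effort: Optional[str]
    swarm_eligible: bool


CATALOGUE = {
    "gemini-3.1-flash-lite": ModelEntry(
        "gemini-3.1-flash-lite", "budget", 1_000_000, None, True
    ),
    # Assumed ID: the notes above do not spell out the 3.5 Flash identifier.
    "gemini-3.5-flash": ModelEntry(
        "gemini-3.5-flash", "workhorse", 1_000_000, "medium", True
    ),
}


def effective_effort(model_id: str, override: Optional[str] = None) -> Optional[str]:
    """Return the user's effort override if given, else the model default."""
    if override is not None:
        return override
    return CATALOGUE[model_id].default_reasoning_effort
```

Under these assumptions, `effective_effort("gemini-3.1-flash-lite")` returns `None` (reasoning off), while passing an explicit effort turns it on for that pinned session.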
Where this shows up
You were on Auto, running a long research session, and watching your token budget tick down faster than you'd like. From your next message on, the same Auto routing decisions land you on Gemini 3.1 Flash-Lite at half the per-token cost — same context window, same tool support, just cheaper.
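To make "half the per-token cost" concrete, here is a rough worked example. The prices are placeholders, not our real rates: `OLD_PRICE` is a made-up figure of 1.00 credit per million tokens, and the sketch ignores any difference between input and output token pricing. Only the 2x ratio comes from this release.

```python
import math

# Hypothetical price in credits per million tokens (placeholder, not a real rate).
OLD_PRICE = 1.00
# The release states the new default costs half as much per token.
NEW_PRICE = OLD_PRICE / 2


def run_cost(tokens: int, price_per_million: float) -> float:
    """Cost of processing `tokens` at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million


def tokens_for_budget(budget: float, price_per_million: float) -> int:
    """How many tokens a fixed budget covers at a flat price."""
    return int(budget / price_per_million * 1_000_000)


tokens = 800_000  # e.g. a long research session
old_cost = run_cost(tokens, OLD_PRICE)
new_cost = run_cost(tokens, NEW_PRICE)

# Halving the price halves the cost of the same run.
assert math.isclose(new_cost, old_cost / 2)
print(f"before: {old_cost:.2f} credits, after: {new_cost:.2f} credits")
```

Read the other way round, the same budget covers twice as many tokens: `tokens_for_budget(1.0, NEW_PRICE)` is double `tokens_for_budget(1.0, OLD_PRICE)`.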
You were holding a hard agentic loop back for Sonnet 4.6 because Flash-tier models weren't quite holding their reasoning across many turns. Pin Gemini 3.5 Flash instead — Workhorse band, reasoning on, 1M context, and the same Swarm + worker eligibility as the heavier reasoners.
You'd noticed background bits (chat title generation, "what did this conversation say" recaps, intent detection) felt a little dated. They were on Gemini 2.5 Flash-Lite. They're now on Gemini 3.1 Flash-Lite — newer model, same budget tier, no price change for those background calls.
Try it
- "Run on Auto: read this 800k-token research dump and pull every market-sizing claim with its source line."
- "Pin Gemini 3.5 Flash. Plan and execute a full refactor of this module across files, keeping your reasoning visible across each step."
- "Swarm three models including Gemini 3.5 Flash on this brief and synthesize the strongest take."
Heads up
- Gemini 3 Flash Preview is retired from the picker. It's marked `Superseded by Gemini 3.1 Flash-Lite`. Sessions previously pinned to it auto-roll-forward to 3.1 Flash-Lite on the next message; you don't need to do anything.
- Reasoning behaviour differs between the two. Gemini 3.1 Flash-Lite ships with reasoning off by default (it's the budget, speed-focused tier). Gemini 3.5 Flash ships with reasoning on at medium effort. If you want Flash-Lite to think harder on a one-off, pin it explicitly and toggle the reasoning effort.
- No plan price changed. This is a model-routing + per-token cost improvement, not a billing change. Your subscription, credit packs, and token budgets are unaffected — they just stretch further on Auto.
- 3.5 Flash costs more per token than 3.1 Flash-Lite. It sits in the Workhorse band on purpose — you reach for it when you need the reasoning, not for default Auto traffic.
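For the curious, the roll-forward for pinned sessions can be pictured as a simple lookup applied on the next message. This is a sketch of the idea under our own assumptions, not the production routing code: the `SUPERSEDED_BY` table and `resolve_model` function are hypothetical names, and only the one mapping listed here is stated in this release.

```python
# Hypothetical table of retired model IDs and their replacements.
SUPERSEDED_BY = {
    "gemini-3-flash-preview": "gemini-3.1-flash-lite",
}


def resolve_model(pinned_id: str) -> str:
    """Follow supersession links until reaching a model that is still live."""
    seen = set()
    current = pinned_id
    while current in SUPERSEDED_BY:
        if current in seen:
            # Guard against a misconfigured table that loops back on itself.
            raise ValueError(f"supersession cycle at {current!r}")
        seen.add(current)
        current = SUPERSEDED_BY[current]
    return current
```

A session pinned to `gemini-3-flash-preview` resolves to `gemini-3.1-flash-lite`, while a session already on a live model is returned unchanged.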