Auto Just Got Cheaper — Gemini 3.1 Flash-Lite Is The New Default, Gemini 3.5 Flash Joins The Picker

May 20, 2026

Your Auto runs now cost roughly half per token what they did yesterday. Google's new Gemini 3.1 Flash-Lite has taken over as the default model for Auto, at half the per-token cost of the Gemini 3 Flash Preview it replaces, with the same 1M context, the same tools, and the same multimodal input. It also takes over the in-conversation background jobs (intent, Q&A, summarisation) from Gemini 2.5 Flash-Lite. Alongside it, Gemini 3.5 Flash lands as a new Workhorse-tier pick with reasoning on by default, ready for harder agentic loops where you want frontier Flash quality at Workhorse-band cost.

What you can do

  • Pin Gemini 3.1 Flash-Lite from the model picker: 1M-token context, multimodal input, tool calling, Auto's default, tier: budget. Reasoning is off by default for speed; set a reasoning effort to turn it on when you want it.
  • Pin Gemini 3.5 Flash for the harder jobs — 1M-token context, multimodal, tool calling, reasoning on at medium effort, tier: workhorse. Same band as Sonnet 4.6, Kimi K2.6, and Grok 4.3.
  • Use Auto and pay less by default — every Auto-routed message that would have hit gemini-3-flash-preview now hits gemini-3.1-flash-lite at half the cost, with the same 1M context and tool support.
  • Run Swarm with either model: both 3.1 Flash-Lite and 3.5 Flash are swarm-eligible. 3.1 Flash-Lite is also worker-eligible, so it can carry tool-heavy steps inside multi-agent runs.
  • Keep background tasks cheap: intent classification, in-chat Q&A, and conversation summarisation all moved off gemini-2.5-flash-lite onto gemini-3.1-flash-lite, so the silent infrastructure that powers chat titles, memory, and tool routing got a quality upgrade without leaving the budget tier.

Where this shows up

You were on Auto, running a long research session, and watching your token budget tick down faster than you'd like. From your next message on, the same Auto routing decisions land you on Gemini 3.1 Flash-Lite at half the per-token cost — same context window, same tool support, just cheaper.

You were holding a hard agentic loop back for Sonnet 4.6 because Flash-tier models weren't quite holding their reasoning across many turns. Pin Gemini 3.5 Flash instead — Workhorse band, reasoning on, 1M context, and the same Swarm + worker eligibility as the heavier reasoners.

You'd noticed background bits (chat title generation, "what did this conversation say" recaps, intent detection) felt a little dated. They were on Gemini 2.5 Flash-Lite. They're now on Gemini 3.1 Flash-Lite — newer model, same budget tier, no price change for those background calls.

Try it

  • "Run on Auto: read this 800k-token research dump and pull every market-sizing claim with its source line."
  • "Pin Gemini 3.5 Flash. Plan and execute a full refactor of this module across files, keeping your reasoning visible across each step."
  • "Swarm three models including Gemini 3.5 Flash on this brief and synthesize the strongest take."

Heads up

  • Gemini 3 Flash Preview is retired from the picker and marked Superseded by Gemini 3.1 Flash-Lite. Sessions previously pinned to it automatically roll forward to 3.1 Flash-Lite on the next message, so you don't need to do anything.
  • Reasoning behaviour differs between the two new models. Gemini 3.1 Flash-Lite ships with reasoning off by default (it's the budget, speed-focused tier). Gemini 3.5 Flash ships with reasoning on at medium effort. If you want Flash-Lite to think harder on a one-off, pin it explicitly and set a reasoning effort.
  • No plan prices changed. This is a model-routing and per-token cost improvement, not a billing change. Your subscription, credit packs, and token budgets are unaffected; they just stretch further on Auto.
  • 3.5 Flash costs more per token than 3.1 Flash-Lite. It sits in the Workhorse band on purpose — you reach for it when you need the reasoning, not for default Auto traffic.

Built for the Alfrada platform.