Gemini 3.1 Pro: Google's first '0.1' bump and the ARC-AGI-2 leap

In one sentence: Google releases Gemini 3.1 Pro, which scores 77.1% on ARC-AGI-2 (more than double Gemini 3 Pro), 80.6% on SWE-Bench Verified, and 94.3% on GPQA Diamond, at the same price as 3 Pro: $2 per million input tokens.



Gemini 3 Pro had only just shipped in January 2026. Five weeks later, Google releases a point-version bump, the first "0.1" in Gemini history, and the new model makes a notable leap on one specific benchmark: ARC-AGI-2.

ARC-AGI-2 is a test built from logic patterns the model has never seen during training (the tasks are generated specifically for that purpose). It measures whether the model is genuinely reasoning or just recognizing things it has seen before. Gemini 3 Pro scored 31.1%; Gemini 3.1 Pro scores 77.1%, roughly 2.5 times as high.

Translation: for tasks that require "figuring out a new rule" (odd debugging, non-standard math problems, never-seen-before design systems), 3.1 Pro should work much better. On SWE-Bench Verified (coding) it scores 80.6%, and on GPQA Diamond (PhD-level science) it scores 94.3%. The price is unchanged at $2 per million input tokens.
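
To put the $2 per million input tokens figure in concrete terms, here is a minimal sketch of an input-cost estimate. The function name `input_cost_usd` and the token counts are illustrative assumptions, not part of any Google SDK, and the sketch covers input tokens only because output pricing is not quoted here.

```python
# Input price quoted in the article; output pricing is not stated, so it is not modeled.
INPUT_PRICE_PER_MILLION_USD = 2.00


def input_cost_usd(input_tokens: int,
                   price_per_million: float = INPUT_PRICE_PER_MILLION_USD) -> float:
    """Estimate the input-side cost in USD for a given number of input tokens.

    Hypothetical helper for illustration only; it ignores output tokens,
    caching discounts, and any tiered pricing that may apply in practice.
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return input_tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    # A 250,000-token prompt costs a quarter of the per-million price.
    print(f"${input_cost_usd(250_000):.2f}")  # $0.50
```

Because the price is the same as for 3 Pro, the same estimate applies to both models on the input side; any real bill would also depend on output tokens.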