AImpact
High Foundation Models · 1 min read

Gemini 3.1 Pro: Google's first '0.1' bump and the ARC-AGI-2 leap

In one sentence: Google releases Gemini 3.1 Pro, which scores 77.1% on ARC-AGI-2 (more than double Gemini 3 Pro), 80.6% on SWE-Bench Verified, and 94.3% on GPQA Diamond, at the same price as 3 Pro: $2/M input tokens.

Verified Official source

Gemini 3 Pro had only just shipped in January 2026. Five weeks later, Google released a point-version bump, the first "0.1" in Gemini history, and with it a notable leap on one specific benchmark: ARC-AGI-2.

ARC-AGI-2 is a benchmark that presents logic patterns the model has never seen during training (the tasks are created specifically for that purpose). It measures whether a model is genuinely reasoning or just recognizing things it has seen before. Gemini 3 Pro scored 31.1%; Gemini 3.1 Pro scores 77.1%, more than double.
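To make "more than double" concrete, here is a quick arithmetic check using only the two ARC-AGI-2 scores quoted above. The variable names are illustrative and not taken from any Google material.

```python
# ARC-AGI-2 scores quoted in the article (percent of tasks solved).
gemini_3_pro = 31.1
gemini_3_1_pro = 77.1

# Relative improvement: how many times higher the new score is.
ratio = gemini_3_1_pro / gemini_3_pro

# Absolute improvement in percentage points (rounded to hide float noise).
point_gain = round(gemini_3_1_pro - gemini_3_pro, 1)

print(f"ratio: {ratio:.2f}x, gain: {point_gain} points")
```

The ratio comes out at roughly 2.48x, a gain of 46 percentage points.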

Translation: for tasks that require "figuring out a new rule," such as odd debugging, non-standard math problems, or never-seen-before design systems, 3.1 Pro works much better. On SWE-Bench Verified (coding) it scores 80.6%, and on GPQA Diamond (PhD-level science) 94.3%. The price is unchanged at $2 per million input tokens.
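As a rough illustration of what the quoted input price means in practice, the sketch below estimates input cost from a token count. It assumes a flat $2 per million input tokens, as stated above; output-token pricing and any volume or context-length tiers are not given in this article, so they are left out. The function name `input_cost_usd` is hypothetical.

```python
def input_cost_usd(input_tokens: int, usd_per_million: float = 2.0) -> float:
    """Estimate input-side cost only, assuming a flat per-million-token rate.

    The 2.0 default is the $2/M input price quoted in the article.
    Output tokens are NOT included because their price is not stated here.
    """
    return input_tokens / 1_000_000 * usd_per_million


# Example: a prompt of 350,000 input tokens.
print(f"${input_cost_usd(350_000):.2f}")
```

Under that assumption, a 350,000-token prompt would cost about $0.70 on the input side.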

Companies

Google, Google DeepMind

Tools

Gemini 3.1 Pro

Tags

Google, DeepMind, Gemini, Gemini 3.1 Pro, Reasoning, ARC-AGI
