s1: 1000 examples and a prompt trick to replicate a reasoning model

In one sentence: a Stanford/UW paper fine-tunes Qwen2.5-32B-Instruct on 1000 curated examples and, with a technique called 'budget forcing', competes with o1-preview on math. Training cost: <$50.

A Stanford and University of Washington team has published s1, a reasoning model that matches o1-preview on some math benchmarks. The point isn't the model, though: it's how they got it.

They took an existing model (Qwen2.5-32B-Instruct) and 1000 high-quality reasoning examples (s1K, distilled from Gemini Thinking), then ran a 26-minute supervised fine-tune on 16 H100s. Cloud cost: under $50.
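
The figures above allow a quick sanity check of the cost claim. The sketch below is only back-of-the-envelope arithmetic: the GPU count, the training time and the $50 ceiling come from the text, while the hourly rate passed to `training_cost` is a hypothetical input, not a quoted cloud price.

```python
# Back-of-the-envelope check of the training cost quoted in the text.
NUM_GPUS = 16           # H100s, from the text
TRAIN_MINUTES = 26      # wall-clock fine-tuning time, from the text
COST_CEILING_USD = 50   # "under $50", from the text

# Total compute consumed by the run, in GPU-hours.
gpu_hours = NUM_GPUS * TRAIN_MINUTES / 60

# Highest price per GPU-hour that still keeps the run under the ceiling.
break_even_rate = COST_CEILING_USD / gpu_hours


def training_cost(rate_usd_per_gpu_hour: float) -> float:
    """Cost of the run at an assumed rental price per GPU-hour."""
    return gpu_hours * rate_usd_per_gpu_hour


print(f"GPU-hours: {gpu_hours:.2f}")
print(f"Break-even rate: ${break_even_rate:.2f} per GPU-hour")
```

The run comes to roughly 6.9 GPU-hours, so the under-$50 figure holds for any rental price below about $7.2 per H100-hour.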

Plus a trick: "budget forcing". To make the model think longer, they suppress the end-of-thinking token at inference time and append the word "Wait" instead. The model keeps reasoning and often corrects itself. That is strong evidence that much of "reasoning" is already inside base models and just needs to be elicited.
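
The control flow of budget forcing can be sketched in a few lines. This is a string-level illustration under stated assumptions, not the paper's implementation: `generate` is a hypothetical callable standing in for a real model call, `END_OF_THINKING` is a placeholder delimiter (the actual token depends on the model), and `toy_model` is a scripted stub whose "self-correction" is hard-coded purely to make the example runnable.

```python
from typing import Callable

END_OF_THINKING = "</think>"  # hypothetical delimiter; the real token is model-specific
CONTINUE_WORD = "Wait"


def budget_forcing(
    generate: Callable[[str], str],
    prompt: str,
    min_extensions: int = 1,
) -> str:
    """Force extra reasoning by swapping early end-of-thinking markers for "Wait".

    `generate(text)` is assumed to return the model's continuation of `text`,
    stopping right after it emits END_OF_THINKING.
    """
    trace = ""
    for _ in range(min_extensions):
        chunk = generate(prompt + trace)
        # Suppress the end-of-thinking marker the model just produced...
        if chunk.endswith(END_OF_THINKING):
            chunk = chunk[: -len(END_OF_THINKING)]
        # ...and append "Wait" so the next call continues the reasoning.
        trace += chunk.rstrip() + "\n" + CONTINUE_WORD
    # Final pass: let the model close its reasoning on its own.
    trace += generate(prompt + trace)
    return trace


def toy_model(text: str) -> str:
    """Scripted stand-in for an LLM: wrong at first, corrected after "Wait"."""
    if text.endswith(CONTINUE_WORD):
        return ", 2 + 2 is 4, not 5." + END_OF_THINKING
    return "2 + 2 = 5." + END_OF_THINKING


if __name__ == "__main__":
    print(budget_forcing(toy_model, "What is 2 + 2?\n", min_extensions=1))
```

With a real model, the same loop would typically run at the token level during decoding, and `min_extensions` would be tuned against a thinking-token budget.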