Chinchilla: the big models were undertrained

In one sentence: DeepMind publishes the Chinchilla paper and shows that, given equal compute, smaller models trained on far more tokens beat oversized undertrained ones.

Until 2022 the race was: "let's build the largest possible model". DeepMind says: you're doing it wrong.

Their paper shows that with a fixed compute budget, you're better off making a smaller model and feeding it many more tokens to read. To prove it, they train Chinchilla (70 billion parameters) on roughly the same compute budget as Gopher, and it beats both Gopher (280 billion) and GPT-3 (175 billion).
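
To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It rests on two assumptions that are not spelled out above: the common approximation that training costs about 6 FLOPs per parameter per token, and the rule of thumb of about 20 training tokens per parameter that is usually read off the Chinchilla results. The function names and constants are made up for illustration, and the 1.4 trillion token figure for Chinchilla comes from the paper, not from this article.

```python
import math

# Assumption: training compute C is approximately 6 * N * D FLOPs,
# where N is the parameter count and D is the number of training tokens.
FLOPS_PER_PARAM_TOKEN = 6.0

# Assumption: roughly 20 training tokens per parameter is compute-optimal.
# This is a rule of thumb, not an exact law.
TOKENS_PER_PARAM = 20.0


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute for a model of `params` parameters
    trained on `tokens` tokens."""
    return FLOPS_PER_PARAM_TOKEN * params * tokens


def compute_optimal_split(flops_budget: float) -> tuple[float, float]:
    """Split a FLOPs budget into (params, tokens) under the two
    assumptions above, by solving 6 * N * (20 * N) = C for N."""
    params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


if __name__ == "__main__":
    # Chinchilla's reported setup: 70 billion parameters, 1.4 trillion tokens.
    budget = training_flops(70e9, 1.4e12)
    params, tokens = compute_optimal_split(budget)
    print(f"budget: {budget:.3g} FLOPs")
    print(f"suggested size: {params / 1e9:.0f}B parameters")
    print(f"suggested data: {tokens / 1e12:.1f}T tokens")
```

Under these assumptions, doubling the model size means doubling the data too, so the compute bill grows about fourfold. That is why, at a fixed budget, a 280-billion-parameter model ends up seeing too few tokens, while a 70-billion-parameter one can afford to read far more.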

It's a quiet revolution: it rewrites the rules every lab will use to decide how big the next models should be. Llama, GPT-4 and the rest all carry this lesson forward.