Until 2022 the race was: "let's build the largest possible model". DeepMind says: you're doing it wrong.
Their paper shows that with a fixed compute budget, you're better off making a smaller model and feeding it many more tokens to read. To prove it they train Chinchilla (70 billion parameters) with the same compute budget as Gopher, and it beats both Gopher (280 billion) and GPT-3 (175 billion).
It's a quiet revolution: it rewrites the rules every lab will use to decide how big the next models should be. Llama, GPT-4 and the rest all carry this lesson forward.
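To make the trade-off concrete, here is a minimal back-of-the-envelope sketch in Python. It rests on assumptions that are not stated in the summary above: the common approximation that training cost is about 6 FLOPs per parameter per token, the often-quoted rule of thumb of roughly 20 tokens per parameter, and the commonly reported figure of about 300 billion training tokens for Gopher. The function names are hypothetical, chosen for illustration only.

```python
import math

# Assumption: training cost C is roughly 6 * N * D FLOPs
# (N = parameters, D = training tokens). A common approximation.
FLOPS_PER_PARAM_TOKEN = 6

# Assumption: rule of thumb of about 20 training tokens per parameter,
# often quoted as the compute-optimal ratio. Not an exact constant.
TOKENS_PER_PARAM = 20


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute for a model of `params` on `tokens`."""
    return FLOPS_PER_PARAM_TOKEN * params * tokens


def compute_optimal(flops: float, tokens_per_param: float = TOKENS_PER_PARAM):
    """Split a fixed compute budget into (params, tokens).

    Solves C = 6 * N * D with D = r * N, which gives N = sqrt(C / (6 * r)).
    """
    params = math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


# Assumption: Gopher used 280B parameters and about 300B training tokens.
gopher_budget = training_flops(280e9, 300e9)
opt_params, opt_tokens = compute_optimal(gopher_budget)

print(f"Budget: {gopher_budget:.2e} FLOPs")
print(f"Suggested size: {opt_params / 1e9:.0f}B parameters")
print(f"Suggested data: {opt_tokens / 1e12:.2f}T tokens")
```

Under these assumptions, Gopher's budget works out to about 65 billion parameters and about 1.3 trillion tokens, which is in the same ballpark as Chinchilla's 70 billion parameters. It is a rough estimate only, not the paper's actual fitting procedure.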
Companies
DeepMind
Tools
Chinchilla
Tags
DeepMind
Chinchilla
Scaling Laws
Compute-Optimal
Research
Sources