RETRO: DeepMind foreshadows RAG with retrieval over 2 trillion tokens

In one sentence: DeepMind publishes RETRO, a 7.5B-parameter model that retrieves relevant passages from a 2T-token database at inference time and matches the performance of models 25x larger.



DeepMind proposes an alternative to "brute scaling": instead of stuffing all knowledge into the model's parameters, leave it outside in a database and have the model look it up when needed.

The model is RETRO (Retrieval-Enhanced Transformer). It has only 7.5 billion parameters, but at inference time it pulls relevant snippets from a database of 2 trillion tokens. The result: on the Pile language-modeling benchmark it performs comparably to GPT-3 and Jurassic-1, models with roughly 25x more parameters.
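To make the "look it up when needed" idea concrete, here is a minimal sketch of chunk-wise retrieval, assuming a toy in-memory database and a hypothetical bag-of-words `embed` function. RETRO splits its input into fixed-size chunks and fetches nearest neighbours for each one; the real system uses frozen BERT embeddings and approximate nearest-neighbour search over trillions of tokens, whereas this sketch uses exact cosine similarity over four sentences. All names here (`DATABASE`, `embed`, `retrieve`, `retrieve_per_chunk`) are illustrative, not DeepMind's code.

```python
import math
from collections import Counter

# Hypothetical toy database; in RETRO this role is played by ~2T tokens of text chunks.
DATABASE = [
    "the eiffel tower is in paris",
    "paris is the capital of france",
    "the great wall is in china",
    "transformers use attention layers",
]


def embed(text: str) -> Counter:
    """Hypothetical stand-in embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def chunk(tokens: list[str], size: int) -> list[list[str]]:
    """Split a token list into consecutive fixed-size chunks (last one may be shorter)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def retrieve(query: str, k: int = 2) -> list[str]:
    """Exact k-nearest-neighbour search by cosine similarity (RETRO uses approximate search)."""
    q = embed(query)
    ranked = sorted(DATABASE, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def retrieve_per_chunk(prompt: str, chunk_size: int = 4, k: int = 2) -> list[tuple[str, list[str]]]:
    """For each chunk of the prompt, return the chunk text and its k retrieved neighbours."""
    tokens = prompt.lower().split()
    results = []
    for c in chunk(tokens, chunk_size):
        text = " ".join(c)
        results.append((text, retrieve(text, k)))
    return results


if __name__ == "__main__":
    for chunk_text, neighbours in retrieve_per_chunk("where is the eiffel tower located today"):
        print(chunk_text, "->", neighbours)
```

In RETRO the retrieved neighbours are then fed to the network through a cross-attention mechanism over the retrieved chunks, which this sketch does not attempt; it only shows why the knowledge can live in an external, swappable database rather than in the weights.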

It is one of the first works to apply retrieval augmentation at the scale of modern LLMs. The approach would go mainstream a couple of years later under the name RAG (Retrieval-Augmented Generation, a term coined by Facebook AI researchers in 2020), and today it is a standard component of many enterprise AI systems.