UL2: Google unifies pretraining paradigms with Mixture-of-Denoisers

In one sentence: Google Research combines three major pretraining objectives into a single 20B-parameter model that outperforms GPT-3 on many benchmarks with roughly one-ninth the parameters.

Training a language model is like choosing a study method: some learn by rereading full text (completion), some cover parts of sentences and try to guess them (fill-in-the-blank), and some read the start of a text and write the ending. Each method builds different skills.

Until 2022, most models used just one of these methods. With UL2, Google asked: why not use all three together?
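To make the idea concrete, here is a toy sketch of how one sentence could be turned into three different kinds of training example, with one kind picked at random each time. It is an illustration only, not UL2's actual implementation: the function names, the `<extra_id_N>` placeholder (borrowed from T5-style span masking), and the fixed span length are all assumptions made for this example.

```python
import random

SENTINEL = "<extra_id_{}>"  # T5-style placeholder name, used here only for illustration


def completion(tokens):
    """Completion: predict each next token from the ones before it."""
    return tokens[:-1], tokens[1:]


def fill_in_the_blank(tokens, spans):
    """Hide each (start, length) span behind a sentinel; the target restores it.

    Spans must be sorted and non-overlapping.
    """
    inputs, targets, pos = [], [], 0
    for i, (start, length) in enumerate(spans):
        mark = SENTINEL.format(i)
        inputs += tokens[pos:start] + [mark]
        targets += [mark] + tokens[start:start + length]
        pos = start + length
    inputs += tokens[pos:]
    return inputs, targets


def write_the_ending(tokens, split):
    """Read the first `split` tokens, then produce the rest."""
    return tokens[:split], tokens[split:]


def mixed_example(tokens, rng):
    """Pick one of the three objectives at random for this training example."""
    name = rng.choice(["completion", "fill_in_the_blank", "write_the_ending"])
    if name == "completion":
        return (name, *completion(tokens))
    if name == "fill_in_the_blank":
        # Hypothetical choice: hide one span of two tokens at a random position.
        start = rng.randrange(len(tokens) - 1)
        return (name, *fill_in_the_blank(tokens, [(start, 2)]))
    split = rng.randrange(1, len(tokens))
    return (name, *write_the_ending(tokens, split))


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "the cat sat on the mat".split()
    for _ in range(3):
        print(mixed_example(sentence, rng))
```

Each call returns the objective's name plus an (input, target) pair, so a single stream of text can feed all three "study methods" within the same training run.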

The result is a 20-billion-parameter model that outperforms GPT-3 on many tests, even though GPT-3 has 175 billion parameters. It is a much smaller model that learned to do different things within a single training run. Google also released it publicly on Hugging Face for anyone to use and modify.