AImpact
Medium Foundation Models · 1 min read

UL2: Google unifies pretraining paradigms with Mixture-of-Denoisers

In one sentence: Google Research combines three major pretraining objectives into a single 20B-parameter model that outperforms GPT-3 on benchmarks such as zero-shot SuperGLUE with roughly one-ninth the parameters.

Training a language model is like choosing a study method: some read a text word by word and predict what comes next (completion), some cover parts of sentences and try to guess the hidden words (fill-in-the-blank), and some read the start of a text and write the ending. Each method builds different skills.

Until 2022, most models used just one of these methods. With UL2, Google asked: why not use all three together?

The result is a 20-billion-parameter model that outperforms GPT-3 on benchmarks such as zero-shot SuperGLUE, even though GPT-3 has 175 billion parameters, nearly nine times as many. It is a much smaller model that learned several different skills within the same training run. Google also released it publicly on Hugging Face for anyone to use and modify.
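
For readers who want to see the idea in code, here is a toy Python sketch of mixing three objectives in one training run. It is an illustration under stated assumptions, not Google's implementation: the function names, span lengths, and span counts are made up for this example, the `<extra_id_N>` sentinels follow the T5 convention, and the `[R]`, `[S]`, `[X]` prefixes stand in for the mode tokens that tell the model which kind of exercise it is doing.

```python
import random


def span_corruption(tokens, span_len, n_spans, rng):
    """Fill-in-the-blank: hide n_spans spans of span_len tokens each.

    Hypothetical helper: the sequence is cut into n_spans chunks and one
    span is hidden at a random offset inside each chunk.
    """
    chunk = len(tokens) // n_spans
    if chunk < span_len:
        raise ValueError("sequence too short for the requested spans")
    inputs, targets = [], []
    for i in range(n_spans):
        lo = i * chunk
        hi = len(tokens) if i == n_spans - 1 else lo + chunk
        start = rng.randrange(lo, hi - span_len + 1)
        sentinel = f"<extra_id_{i}>"
        # The input keeps the visible tokens and a placeholder for the gap.
        inputs += tokens[lo:start] + [sentinel] + tokens[start + span_len:hi]
        # The target holds the hidden tokens, introduced by the same placeholder.
        targets += [sentinel] + tokens[start:start + span_len]
    return inputs, targets


def prefix_lm(tokens, rng):
    """Read the start, write the ending: split the text at a random point."""
    split = rng.randrange(1, len(tokens))
    return tokens[:split], tokens[split:]


def make_example(tokens, rng):
    """Pick one objective at random and tag the input with its mode token."""
    mode = rng.choice(["R", "S", "X"])
    if mode == "R":
        # Regular denoising: one short gap (toy values, not the paper's settings).
        inputs, targets = span_corruption(tokens, span_len=2, n_spans=1, rng=rng)
    elif mode == "X":
        # Extreme denoising: most of the text is hidden.
        inputs, targets = span_corruption(tokens, span_len=4, n_spans=2, rng=rng)
    else:
        # Sequential denoising: continue the text from a prefix.
        inputs, targets = prefix_lm(tokens, rng)
    return [f"[{mode}]"] + inputs, targets


if __name__ == "__main__":
    rng = random.Random(0)
    text = "the quick brown fox jumps over the lazy dog near the river".split()
    for _ in range(3):
        inputs, targets = make_example(text, rng)
        print(" ".join(inputs), "->", " ".join(targets))
```

Each call produces one training pair, so a single batch can contain all three kinds of exercise; the mode token at the front is what lets one model switch between them.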

Companies

Google

Tags

UL2, mixture of denoisers, pretraining, open source, Google
