Training | Advanced | Also known as: MoD · Mixed Denoising Objectives

Mixture of Denoisers

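A pretraining strategy (UL2, Google 2022) that trains a single model on a mixture of denoising objectives: regular span corruption (T5-style masked spans, short and at a low corruption rate), extreme span corruption (long spans or high corruption rates), and sequential denoising, a prefix language modeling objective that resembles left-to-right language modeling. The mixture is meant to combine the strengths of GPT-style and T5-style pretraining. Each training example is prefixed with a mode token that names its objective, so the same token can later be used at inference time to steer the model toward the matching behavior.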
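The mixture described above can be sketched as a small data-preparation routine. This is a toy illustration, not the UL2 implementation: the `[R]`, `[S]`, `[X]` labels follow the denoiser names used in the UL2 paper, but the span lengths, corruption rates, uniform sampling of modes, and every function name here are assumptions chosen for readability.

```python
import random

# Illustrative settings only; these are NOT the paper's exact hyperparameters.
DENOISERS = {
    "[R]": {"kind": "span", "mean_span": 3, "rate": 0.15},  # regular span corruption
    "[X]": {"kind": "span", "mean_span": 8, "rate": 0.50},  # extreme span corruption
    "[S]": {"kind": "prefix"},                              # sequential (prefix LM)
}


def span_corrupt(tokens, mean_span, rate, rng):
    """Mask random spans and return (inputs, targets) in T5 sentinel style."""
    n = len(tokens)  # assumes a non-empty token list
    masked = [False] * n
    budget = max(1, round(n * rate))  # how many tokens to corrupt in total
    while budget > 0:
        length = min(budget, max(1, round(rng.gauss(mean_span, 1))))
        start = rng.randrange(0, n - length + 1)
        for j in range(start, start + length):
            if not masked[j]:
                masked[j] = True
                budget -= 1
    inputs, targets, sentinel_id = [], [], 0
    for j, tok in enumerate(tokens):
        if masked[j]:
            if j == 0 or not masked[j - 1]:
                # A new masked span starts: emit one sentinel on both sides.
                sentinel = f"<extra_id_{sentinel_id}>"
                inputs.append(sentinel)
                targets.append(sentinel)
                sentinel_id += 1
            targets.append(tok)
        else:
            inputs.append(tok)
    return inputs, targets


def prefix_split(tokens, rng):
    """Split into a visible prefix and a continuation (needs >= 2 tokens)."""
    cut = rng.randrange(1, len(tokens))
    return tokens[:cut], tokens[cut:]


def make_example(tokens, rng):
    """Pick a denoiser at random and prepend its mode token to the inputs."""
    mode = rng.choice(list(DENOISERS))
    cfg = DENOISERS[mode]
    if cfg["kind"] == "prefix":
        inputs, targets = prefix_split(tokens, rng)
    else:
        inputs, targets = span_corrupt(tokens, cfg["mean_span"], cfg["rate"], rng)
    return [mode] + inputs, targets


if __name__ == "__main__":
    rng = random.Random(0)
    words = "the quick brown fox jumps over the lazy dog again and again".split()
    for _ in range(3):
        print(make_example(words, rng))
```

Every example carries its mode token as the first input token, which is what lets one set of weights serve all three objectives.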


In practice

A researcher wanting a flexible model for both completion and question answering can use a UL2 or Flan-UL2 checkpoint instead of choosing between a T5-style span-corruption model and a GPT-style language model. The released 20B checkpoints are encoder-decoder models. With the original UL2 checkpoint, a mode token (`[S2S]`, `[NLU]`, or `[NLG]`) must be prepended to the prompt to activate the matching mode. This detail significantly affects output quality and is often omitted, which leads to poor results. Flan-UL2 was trained further so that it no longer requires mode tokens.
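A minimal sketch of the prompt preparation step, in plain Python so it runs without downloading a checkpoint. The helper name `with_mode_token` is hypothetical, and the one-line descriptions of each token reflect the UL2 model card as best understood here; check the card for the checkpoint you actually use.

```python
# Mode tokens named in this entry, with their assumed denoiser associations.
MODE_TOKENS = {
    "[NLU]": "regular span corruption",
    "[S2S]": "sequential / prefix language modeling",
    "[NLG]": "extreme span corruption",
}


def with_mode_token(prompt: str, mode: str) -> str:
    """Return the prompt with the given mode token prepended exactly once."""
    if mode not in MODE_TOKENS:
        raise ValueError(f"unknown mode token: {mode!r}")
    text = prompt.lstrip()
    # Leave prompts that already start with a mode token unchanged.
    if any(text.startswith(token) for token in MODE_TOKENS):
        return text
    return f"{mode} {text}"


if __name__ == "__main__":
    print(with_mode_token("Translate to German: Good morning.", "[S2S]"))
```

The returned string is what would be passed to the tokenizer. Doing this in one helper makes it harder to forget the token on some code paths.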
