AImpact
Training · Advanced · Also known as: MoD, Mixed Denoising Objectives

Mixture of Denoisers

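A pretraining strategy introduced with UL2 (Google, 2022) that trains a single model on a mixture of denoising objectives, with each training example corrupted by one of them: regular span corruption (R-denoiser, T5-style short spans at a low corruption rate), extreme span corruption (X-denoiser, long spans and/or a high corruption rate), and sequential denoising (S-denoiser, essentially prefix language modeling). This unifies the strengths of T5-style and GPT-style pretraining. A mode token prepended to the input (distinct from the sentinel tokens that mark masked spans) tells the model which objective produced the example, and the same token later switches the model into the matching mode at fine-tuning or inference time.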


In practice

A researcher who wants one model for both completion and question answering can use a UL2 or Flan-UL2 checkpoint instead of choosing between T5-style (span corruption) and GPT-style (causal language modeling) pretraining. With the original UL2 checkpoint, the mode token (`[NLU]` for the R-denoiser, `[S2S]` for the S-denoiser, `[NLG]` for the X-denoiser) must be prepended to the prompt to activate the matching mode. This detail significantly affects performance and is often omitted, causing poor results. Flan-UL2 was trained further specifically to drop this requirement, so it does not need mode tokens.
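A minimal sketch of that prompt-preparation step, assuming the R→`[NLU]`, S→`[S2S]`, X→`[NLG]` mapping documented for the public UL2 checkpoint (check the model card of the checkpoint you actually use). `build_ul2_prompt` and `MODE_TOKENS` are hypothetical names, and the single space after the mode token is an assumption about formatting.

```python
# Hypothetical helper for illustration; not part of any library.
# Assumed mode-token mapping, following the public UL2 checkpoint's documentation.
MODE_TOKENS = {"R": "[NLU]", "S": "[S2S]", "X": "[NLG]"}


def build_ul2_prompt(text: str, denoiser: str, flan: bool = False) -> str:
    """Return the prompt to feed a UL2-family checkpoint.

    Original UL2: prepend the mode token for the chosen denoiser ("R", "S" or "X").
    Flan-UL2: return the text unchanged, since that checkpoint no longer needs mode tokens.
    """
    if flan:
        return text
    if denoiser not in MODE_TOKENS:
        raise ValueError(f"unknown denoiser {denoiser!r}; expected one of {sorted(MODE_TOKENS)}")
    return f"{MODE_TOKENS[denoiser]} {text}"


print(build_ul2_prompt("Answer the question: why is the sky blue?", "S"))
# -> [S2S] Answer the question: why is the sky blue?
print(build_ul2_prompt("Answer the question: why is the sky blue?", "S", flan=True))
# -> Answer the question: why is the sky blue?
```

Keeping the mode choice in one function makes it harder to forget the token when switching between fine-tuning and inference code paths.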


Seen in the wild

102 entries mentioning it
  1. Realtime voice AI: sub-second latency and multilingual become the norm
    Medium
  2. MCP at 18 months: the server ecosystem hits critical mass
    High
  3. Robotics foundation model: a new step toward the "GPT of manipulation"
    High
  4. Mistral Small 4: three models (reasoning + vision + coding) fused into one open weight
    High
  5. Nano Banana 2: Google rebuilds its viral image model around consistency and text
    Medium
  6. Gemini 3 Pro and Flash: Google relaunches the frontier challenge
    High
  7. MCP ecosystem 2025: Inspector, UI, registry, and cross-vendor adoption
    High
  8. Claude Haiku 4.5: the small model that matches May's Sonnet 4
    Medium
  9. Runway Gen-4: AI video with consistent characters across multiple scenes
    High
  10. Cline: the open-source VS Code coding agent that splits Plan and Act
    Medium
  11. Apollo Research: frontier models 'scheme' in evals — paper published
    High
  12. Local AI 2025: Ollama, MLX LM, Apple Foundation Models triple the speed
    Medium
  13. GPT-5: OpenAI merges fast and reasoning models into an automatic router
    Landmark
  14. Cursor Agent and Background Agents: from autocomplete to cloud coding agent
    High
  15. Ollama 1.0: first stable release with multimodal, tool calling, and Windows GA
    High
  16. Ollama native vision model support: local VLMs with a one-liner
    Medium
  17. Kimi VL Thinking (Moonshot AI): first open visual model with RL-trained chain-of-thought reasoning
    High
  18. CrossFormer: a single transformer for 20+ robot embodiments with rigorous scaling analysis
    High
  19. Model Cards 2.0: industry convergence on standardized AI safety reports
    Medium
  20. Llama 4: Meta moves to MoE and native multimodal, but the community is unimpressed
    High