AImpact
Models · Intermediate
Also known as: Mixture of Experts · Miscela di esperti

MoE

/em-oh-ee/

An architecture in which parts of the model (typically the feed-forward layers) are split into many sub-networks ('experts'), and a learned router activates only a small share of them for each token.
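
To make the idea of per-token activation concrete, here is a minimal sketch of top-k routing in plain Python. It is a toy illustration, not any specific model's implementation: the name `moe_layer`, the four scaling "experts", and the router scores are all hypothetical, and real models compute the scores with a learned layer and differ in how they normalize and balance them.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token, router_scores, experts, top_k=2):
    """Run one token through only the top_k highest-scoring experts."""
    # Indices of the top_k experts by router score.
    chosen = sorted(range(len(experts)),
                    key=lambda i: router_scores[i],
                    reverse=True)[:top_k]
    # Renormalize the router weights over the chosen experts only.
    weights = softmax([router_scores[i] for i in chosen])
    # Weighted sum of the chosen experts' outputs; the other experts never run.
    output = sum(w * experts[i](token) for w, i in zip(weights, chosen))
    return output, chosen

# Four toy "experts": each just scales its input by a different factor (hypothetical).
experts = [lambda x, f=f: f * x for f in (1.0, 2.0, 3.0, 4.0)]
# Hypothetical router scores for one token; a real model learns these.
router_scores = [0.1, 2.0, 0.3, 2.0]

output, chosen = moe_layer(5.0, router_scores, experts, top_k=2)
print(sorted(chosen), output)  # [1, 3] 15.0
```

Only two of the four experts are evaluated for this token, which is where the compute saving comes from: the parameters of the other experts exist but are not used for this step.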


In practice

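It lets a model hold hundreds of billions of parameters while the compute per token stays closer to that of a much smaller dense model, because only the active experts run. Mixtral and DeepSeek use it, and GPT-4 is widely reported to, though OpenAI has not confirmed this. For API users nothing changes in how the model is called, but it helps explain surprising quality-to-price ratios.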
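
As a rough illustration of the gap between total and active size, the sketch below computes the share of parameters used per token for two models whose total and active counts appear in the entry titles listed on this page. The helper name `active_share` is hypothetical, and the ratio is only a proxy: it says nothing about memory, since all weights usually still have to be loaded.

```python
# (total, active) parameter counts in billions, taken from the entry titles on this page.
models = {
    "Snowflake Arctic": (480, 17),
    "DBRX": (132, 36),
}

def active_share(total_b, active_b):
    # Fraction of the model's parameters that run for a single token.
    return active_b / total_b

shares = {name: active_share(total, active)
          for name, (total, active) in models.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%} of parameters active per token")
# Snowflake Arctic: 3.5% of parameters active per token
# DBRX: 27.3% of parameters active per token
```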

Seen in the wild

18 entries mentioning it
  1. DeepSeek V4 Preview: 1.6T parameters, 1M context, open weight in two sizes
    Landmark
  2. DeepSeek R2: the Chinese lab relaunches its open-weight reasoning model
    High
  3. Llama 4 Scout: 109B multimodal MoE with 10M context and vision SOTA
    High
  4. Qwen 3: Alibaba ships an open-weight family from 0.6B to 235B with native thinking
    High
  5. Llama 4: Meta moves to MoE and native multimodal, but the community is unimpressed
    High
  6. DeepSeek-V3-0324: the quiet update that puts vendor lock-in on notice
    Medium
  7. DeepSeek-V3: GPT-4o Quality at $0.55/M Tokens via MLA and FP8 Pipeline
    High
  8. DeepSeek-V3: China releases a shockingly cheap open frontier model
    Landmark
  9. DeepSeek-Coder-V2: GPT-4 Turbo coding quality with open weights
    High
  10. DeepSeek-V2: Multi-head Latent Attention and the first highly efficient Chinese open MoE
    High
  11. Mixtral 8x22B: Mistral's Apache 2.0 MoE with 39B active parameters
    High
  12. Snowflake Arctic: 480B total / 17B active MoE, enterprise SQL SOTA
    Medium
  13. DBRX: Databricks's 132B-total / 36B-active open MoE
    Medium
  14. Gemini 1.5 Pro: 1 million tokens in context
    High
  15. Mixtral 8x7B: open-source Mixture of Experts that beats GPT-3.5
    Landmark
  16. Grok-1: xAI's chatbot with real-time access to X data
    Medium
  17. Wu Dao 2.0: China announces a 1.75T-parameter model
    Medium
  18. Switch Transformer: Google scales to 1.6T parameters with Mixture of Experts
    High