Models Intermediate Also known as: Mixture of Experts

MoE

/em-oh-ee/

An architecture in which some layers of the model contain many parallel sub-networks ('experts'), and a learned router activates only a small share of them for each token.
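A minimal sketch of per-token expert selection, assuming top-k routing with a softmax over the selected experts' scores. This is one common scheme, not necessarily what any specific model uses. The names `route_token` and `moe_forward`, the toy experts, and the router scores are all hypothetical and exist only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, top_k=2):
    """Pick the top_k highest-scoring experts for one token.

    Returns (expert_index, weight) pairs; weights are a softmax over
    the selected experts' scores only (an assumption of this sketch).
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_forward(token, experts, router_logits, top_k=2):
    """Run only the selected experts and mix their outputs by weight."""
    used = []
    output = 0.0
    for idx, weight in route_token(router_logits, top_k):
        used.append(idx)
        output += weight * experts[idx](token)
    return output, used

# Four toy "experts": each just scales its input. Real experts are
# feed-forward blocks; these are hypothetical stand-ins.
experts = [lambda x, s=s: x * s for s in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical router scores for one token (normally produced by a learned layer).
logits = [0.1, 2.0, -1.0, 2.0]

output, used = moe_forward(10.0, experts, logits, top_k=2)
print(used, output)
```

Only two of the four experts run for this token; the other two cost no compute, which is the source of the savings.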


In practice

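It enables models with hundreds of billions of total parameters whose per-token inference compute is closer to that of a much smaller dense model, because only the active experts run for each token. Mixtral and DeepSeek use it; GPT-4 is widely reported to, though OpenAI has not confirmed its architecture. For API users nothing changes, but it helps explain surprising quality-to-price ratios.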
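To make the gap between total and active size concrete, here is a rough back-of-the-envelope sketch. The parameter counts are taken from the headlines listed on this page (Snowflake Arctic and DBRX); the helper `active_fraction` is a hypothetical name, and treating per-token compute as proportional to active parameters is a simplifying assumption.

```python
def active_fraction(total_b, active_b):
    """Share of parameters used per token, given sizes in billions."""
    return active_b / total_b

# (total, active) in billions of parameters, as quoted in the headlines on this page.
models = {
    "Snowflake Arctic": (480, 17),
    "DBRX": (132, 36),
}

shares = {name: active_fraction(total, active)
          for name, (total, active) in models.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%} of parameters active per token")
```

Under these figures, Arctic uses roughly 3.5% of its parameters per token and DBRX roughly 27%. All parameters still have to be held in memory, so the saving is in compute per token, not in the hardware needed to load the model.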

Related terms

LLM Inference compute

Seen in the wild

20 entries mentioning it

June 13, 2026

Alibaba releases Qwen 3.5: open-weight models from 7B to 235B MoE with 128K context

High
June 10, 2026

Meta releases Llama 4.1: Scout, Maverick, and Behemoth MoE models under Apache 2.0

Landmark
April 24, 2026

DeepSeek V4 Preview: 1.6T parameters, 1M context, open weight in two sizes

Landmark
January 28, 2026

DeepSeek R2: the Chinese lab relaunches its open-weight reasoning model

High
May 28, 2025

Llama 4 Scout: 109B multimodal MoE with 10M context and vision SOTA

High
April 29, 2025

Qwen 3: Alibaba ships an open-weight family from 0.6B to 235B with native thinking

High
April 5, 2025

Llama 4: Meta moves to MoE and native multimodal, but the community is unimpressed

High
March 24, 2025

DeepSeek-V3-0324: the quiet update that puts vendor lock-in on notice

Medium
January 10, 2025

DeepSeek-V3: GPT-4o Quality at $0.55/M Tokens via MLA and FP8 Pipeline

High
December 26, 2024

DeepSeek-V3: China releases a shockingly cheap open frontier model

Landmark
May 28, 2024

DeepSeek-Coder-V2: GPT-4 Turbo coding quality with open weights

High
May 6, 2024

DeepSeek-V2: Multi-head Latent Attention and the first highly efficient Chinese open MoE

High
April 17, 2024

Mixtral 8x22B: Mistral's Apache 2.0 MoE with 39B active parameters

High
April 14, 2024

Snowflake Arctic: 480B total / 17B active MoE, enterprise SQL SOTA

Medium
March 27, 2024

DBRX: Databricks's 132B-total / 36B-active open MoE

Medium
February 15, 2024

Gemini 1.5 Pro: 1 million tokens in context

High
December 11, 2023

Mixtral 8x7B: open-source Mixture of Experts that beats GPT-3.5

Landmark
November 4, 2023

Grok-1: xAI's chatbot with real-time access to X data

Medium
June 1, 2021

Wu Dao 2.0: China announces a 1.75T-parameter model

Medium
January 12, 2021

Switch Transformer: Google scales to 1.6T parameters with Mixture of Experts

High
