In practice
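It enables models with hundreds of billions of parameters whose inference cost is closer to that of a much smaller model, because only a fraction of the parameters is active for each token. Mixtral and DeepSeek use it, and GPT-4 is widely reported (though not officially confirmed) to use it as well. For API users nothing changes, but it explains some surprising quality-to-price ratios.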
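To make the cost argument concrete, here is a minimal sketch in plain Python, assuming a common top-k softmax gating scheme; it is not the routing code of any model named here. The names `top_k_route`, `moe_layer`, `active_fraction`, and the toy experts are hypothetical, chosen for illustration. The total/active figures passed to `active_fraction` are the ones quoted in the DBRX and Snowflake Arctic entries listed on this page.

```python
import math

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    order = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    top = order[:k]
    peak = max(gate_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_layer(x, experts, gate_logits, k=2):
    """Weighted sum of the k selected experts' outputs; the other experts never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))

def active_fraction(total_b, active_b):
    """Share of parameters used per token, with both sizes given in billions."""
    return active_b / total_b

# Hypothetical toy experts: each scales its input and records that it was called.
calls = []

def make_expert(scale):
    def expert(x):
        calls.append(scale)
        return scale * x
    return expert

experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]

# Assumed gate scores for one token; in a real model a learned router produces these.
y = moe_layer(1.0, experts, gate_logits=[0.1, 2.0, -1.0, 2.0], k=2)

print(f"experts run: {len(calls)} of {len(experts)}")
print(f"DBRX active share:   {active_fraction(132, 36):.1%}")
print(f"Arctic active share: {active_fraction(480, 17):.1%}")
```

Only two of the four toy experts execute for the token, which is the whole trick: total parameter count grows with the number of experts, while per-token compute grows only with `k`. All expert weights still have to be held in memory, so the saving is in compute per token rather than in model size.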
Seen in the wild
18 entries mentioning it
- [Landmark] DeepSeek V4 Preview: 1.6T parameters, 1M context, open weight in two sizes
- [High] DeepSeek R2: the Chinese lab relaunches its open-weight reasoning model
- [High] Llama 4 Scout: 109B multimodal MoE with 10M context and vision SOTA
- [High] Qwen 3: Alibaba ships an open-weight family from 0.6B to 235B with native thinking
- [High] Llama 4: Meta moves to MoE and native multimodal, but the community is unimpressed
- [Medium] DeepSeek-V3-0324: the quiet update that puts vendor lock-in on notice
- [High] DeepSeek-V3: GPT-4o Quality at $0.55/M Tokens via MLA and FP8 Pipeline
- [Landmark] DeepSeek-V3: China releases a shockingly cheap open frontier model
- [High] DeepSeek-Coder-V2: GPT-4 Turbo coding quality with open weights
- [High] DeepSeek-V2: Multi-head Latent Attention and the first highly efficient Chinese open MoE
- [High] Mixtral 8x22B: Mistral's Apache 2.0 MoE with 39B active parameters
- [Medium] Snowflake Arctic: 480B total / 17B active MoE, enterprise SQL SOTA
- [Medium] DBRX: Databricks's 132B-total / 36B-active open MoE
- [High] Gemini 1.5 Pro: 1 million tokens in context
- [Landmark] Mixtral 8x7B: open-source Mixture of Experts that beats GPT-3.5
- [Medium] Grok-1: xAI's chatbot with real-time access to X data
- [Medium] Wu Dao 2.0: China announces a 1.75T-parameter model
- [High] Switch Transformer: Google scales to 1.6T parameters with Mixture of Experts