Open Source Models
Mixtral 8x22B: Mistral's Apache 2.0 MoE with 39B active parameters
In one sentence: Mistral releases Mixtral 8x22B under Apache 2.0, a sparse mixture-of-experts (MoE) model with 141B total and 39B active parameters, a 64k-token context window, and an optimized tokenizer, pitched as an open-weight model for production use that outperforms Llama 2 70B on the reported benchmarks.
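As a rough check on the 141B/39B figures, the sketch below assumes a simplified split in which every weight is either shared (attention, embeddings, norms; always used) or belongs to one of 8 experts, of which 2 run per token. Under that assumption the two published totals are enough to solve for both parts. The function name `implied_split` is hypothetical, and because the published counts are rounded, the results are approximate.

```python
# Simplified parameter model (an assumption, not Mistral's published breakdown):
#   total  = shared + num_experts * expert
#   active = shared + top_k       * expert
# Subtracting the two equations isolates the per-expert size.

def implied_split(total_b: float, active_b: float,
                  num_experts: int = 8, top_k: int = 2) -> tuple[float, float]:
    """Return (shared, per_expert) in billions implied by the total/active counts."""
    expert = (total_b - active_b) / (num_experts - top_k)
    shared = active_b - top_k * expert
    return shared, expert


shared, expert = implied_split(141, 39)
print(f"shared ~ {shared:.0f}B, one expert (summed over all layers) ~ {expert:.0f}B")
print(f"naive 8 x 22B = {8 * 22}B vs reported total 141B")
```

Under this model the split comes out to roughly 5B shared plus 17B per expert. Those sum to about 22B, which is one plausible reading of the "22B" in the name: the size of a single dense path through the model (shared weights plus one expert).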
Mistral pulls another surprise: it releases a huge model under a fully free license (Apache 2.0), rare at this scale.
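The name suggests 8 "experts" of 22 billion parameters each, but only the feed-forward blocks are replicated 8 times; the attention and embedding weights are shared, so the total is 141 billion rather than 8 × 22 = 176 billion. For each token, a router picks the 2 most relevant experts in every layer, so the model computes like a 39-billion-parameter model while holding 141 billion in total.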
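The routing step can be sketched in a few lines of Python. It follows the top-2 gating described for the Mixtral architecture (a softmax taken over only the two highest router logits), which is assumed here to apply to 8x22B as well. The function names and the toy experts, which simply scale their input, are hypothetical stand-ins for the real feed-forward blocks, and the router logits are passed in directly rather than computed from a token's hidden state by a learned linear layer.

```python
import math
from typing import Callable

Vector = list[float]
Expert = Callable[[Vector], Vector]


def top_k_route(router_logits: Vector, k: int = 2) -> list[tuple[int, float]]:
    """Pick the k experts with the highest logits and softmax over just those k."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]


def moe_forward(x: Vector, router_logits: Vector,
                experts: list[Expert], k: int = 2) -> Vector:
    """Gate-weighted sum of the selected experts' outputs; unselected experts never run.

    Assumes each expert maps a vector to one of the same size, as an FFN block does.
    """
    out = [0.0] * len(x)
    for i, w in top_k_route(router_logits, k):
        y = experts[i](x)
        out = [o + w * v for o, v in zip(out, y)]
    return out


# Toy usage: 8 hypothetical "experts" that just scale their input by 1..8.
experts: list[Expert] = [lambda x, s=i + 1: [s * v for v in x] for i in range(8)]
logits = [0.1, 2.3, -0.7, 0.4, 1.9, -1.2, 0.0, 0.8]
print(top_k_route(logits))  # experts 1 and 4 have the two highest logits
print(moe_forward([1.0, 2.0], logits, experts))
```

Only the two selected experts are evaluated, which is where the 39B-versus-141B gap comes from: the router scores all 8 experts, but the expensive feed-forward compute scales with k rather than with the number of experts.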
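In practice anyone can download, modify, and use it in commercial products without asking permission or paying fees, as long as they keep the license and copyright notices that Apache 2.0 requires. On the reported benchmarks it beats Llama 2 70B and approaches GPT-3.5.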
Companies
Mistral AI
Tools
Mixtral 8x22B
Tags
Mistral · Mixtral · MoE · Open Source · Apache 2.0
Sources