April 17, 2024 High Open Source Models · 1 min read

Mixtral 8x22B: Mistral's Apache 2.0 MoE with 39B active parameters

In one sentence: Mistral releases Mixtral 8x22B under Apache 2.0, a 141B-total / 39B-active MoE with a 64k context window and an optimized tokenizer, presented as an open-weight model that rivals Llama 2 70B in production.

Verified Official source

Mistral has sprung another surprise: it is releasing a very large model under the fully permissive Apache 2.0 license, which is rare at this scale.

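The name Mixtral 8x22B refers to its 8 "experts" of nominally 22 billion parameters each. For each token, a router activates only the 2 most relevant experts, so the model computes like a 39-billion-parameter model while holding 141 billion parameters in total. Both figures are lower than 8 × 22 and 2 × 22 would suggest because parts of the network, such as the attention layers, are shared across experts rather than duplicated.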
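
The routing idea described above can be illustrated with a minimal sketch. This is a toy in plain Python, not Mistral's implementation: the function names (`top2_route`, `moe_layer`), the toy experts, and the router scores are all hypothetical, and weighting the two chosen experts with a softmax over their scores is assumed from the common sparse mixture-of-experts formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a small list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top2_route(router_logits):
    """Pick the 2 highest-scoring experts and return [(index, weight), ...].

    Assumption: weights come from a softmax over the 2 selected scores only.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:2]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(x, router_logits, experts):
    """Run only the 2 selected experts on x and blend their outputs."""
    out = [0.0] * len(x)
    for index, weight in top2_route(router_logits):
        y = experts[index](x)  # the other 6 experts are never evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Hypothetical toy experts: expert k simply scales its input by (k + 1).
experts = [lambda x, k=k: [(k + 1) * v for v in x] for k in range(8)]

# Hypothetical router scores for one token; experts 1 and 4 score highest.
logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 1.0]

print(top2_route(logits))
print(moe_layer([1.0, 2.0], logits, experts))
```

Only two of the eight expert functions are called per token, which is why the compute cost tracks the active parameter count rather than the total, even though all experts must still be held in memory.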

In practice, anyone can download it, modify it, and use it in commercial products without asking permission or paying license fees, subject to Apache 2.0's attribution and notice requirements. On benchmarks it beats Llama 2 70B and approaches GPT-3.5.

Companies

Mistral AI

Tools

Mixtral 8x22B