High · Multimodal AI · 1 min read
Llama 4 Scout: 109B multimodal MoE with 10M context and vision SOTA
In one sentence
Meta releases Llama 4 Scout, a 109B-parameter MoE model with 17B active parameters, a 10M-token context window, multi-image input support, and state-of-the-art vision benchmark results among open models.
Reading level
Llama 4 Scout is the first Llama model with strong image understanding. It has 109 billion total parameters, but its Mixture of Experts (MoE) architecture activates only 17 billion of them for any given token, so the compute per token is closer to that of a 17B dense model, even though all 109B weights still have to be held in memory. The 10 million token context window is unprecedented: it is large enough for hours of video, hundreds of images, or enormous documents in a single prompt. The model sets new records among open models on visual comprehension benchmarks, bringing Llama into the top tier of multimodal AI.
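To make the "109B total, 17B active" trade-off concrete, here is a rough back-of-the-envelope sketch that uses only the two parameter counts quoted above. The byte-per-parameter precisions (16-bit, 8-bit, 4-bit) are illustrative assumptions, not published deployment settings, and the estimate covers the weights only, leaving out activations and the context cache.

```python
# Back-of-the-envelope MoE arithmetic using only the figures quoted in the summary.
TOTAL_PARAMS = 109e9   # total parameters (from the article)
ACTIVE_PARAMS = 17e9   # parameters active per token (from the article)


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that take part in processing a single token."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"Active per token: {frac:.1%} of all weights")

    # Assumed precisions, for illustration only.
    for label, nbytes in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
        print(f"{label} weights: ~{gb:.1f} GB")
```

The point of the sketch is that compute per token scales with the active parameters, while memory scales with the total, so an MoE model can be cheap to run per token yet still demanding to host.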
Companies
Meta
Tools
Llama 4 Scout, Llama 4 Maverick
Tags
Llama 4, MoE, Long Context, Multi-Image, SOTA, Meta
Sources