Llama 3.2: Meta brings vision and edge to open models

In one sentence: Meta releases Llama 3.2 in four sizes, with 1B and 3B models for edge and mobile devices and 11B and 90B multimodal (vision) models. It is Meta's first serious move into open multimodal and on-device models.



Meta updates the Llama family with two big additions. First, two very small models (1 and 3 billion parameters) designed to run on a phone or a Raspberry Pi. Second, Llama "sees" for the first time: the 11B and 90B versions accept images as input, so you can show them a chart, a receipt, or a photo and ask questions about it.
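To get a rough sense of why the 1B and 3B sizes suit phones and single-board computers while the larger ones do not, here is a minimal back-of-the-envelope sketch. It assumes weight memory is simply parameter count times bytes per parameter; the nominal parameter counts and the precisions (fp16, int8, int4) are illustrative assumptions, not figures published by Meta, and the estimate ignores activations and runtime overhead.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: nominal parameter counts (1e9, 3e9, ...) and the precisions
# below are illustrative, not official figures for Llama 3.2.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}
NOMINAL_PARAMS = {"1B": 1e9, "3B": 3e9, "11B": 11e9, "90B": 90e9}


def weight_gb(n_params: float, precision: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


def footprint_table() -> dict:
    """Map each model size to its estimated weight size per precision."""
    return {
        name: {p: weight_gb(n, p) for p in BYTES_PER_PARAM}
        for name, n in NOMINAL_PARAMS.items()
    }


if __name__ == "__main__":
    for name, row in footprint_table().items():
        cells = ", ".join(f"{p}: {gb:.1f} GB" for p, gb in row.items())
        print(f"{name:>3} -> {cells}")
```

Under these assumptions, a 1B model quantized to 4 bits needs about 0.5 GB for its weights, while a 90B model in fp16 needs about 180 GB, which is why only the two small sizes are plausible on-device candidates.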

For open-source developers this matters: until now, doing vision with an open model meant stitching together community projects (LLaVA, Bunny, etc.) of variable quality. Now there is an official Meta baseline, which Meta presents as competitive with small closed models such as GPT-4o mini and Claude 3 Haiku on image understanding.

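A note: the vision models (11B and 90B) are not available in the EU. Meta's license withholds them from EU-based individuals and companies, citing regulatory uncertainty (the AI Act among other rules). This has opened a debate on how much European regulation is slowing access to open models.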