October 3, 2024 High Multimodal AI · 1 min read

Pixtral 12B: Mistral's first multimodal model with native vision encoder

In one sentence Mistral debuts in multimodal with Pixtral 12B: native vision encoder (not CLIP), multi-image and interleaved text-image, Apache 2.0 license.

Verified Official source

ShareLinkedIn X

Reading level

Pixtral 12B is the first image-understanding model released by Mistral AI. The distinctive feature is that Mistral built their own visual encoder from scratch instead of using CLIP like nearly everyone else. It supports multiple images in the same conversation and images can be freely mixed with text. It's released under Apache 2.0 license, making it completely free for commercial use.

Companies

Mistral AI

Tools

Pixtral 12B