Skip to content
AImpact
IT EN
High Multimodal AI · 1 min read

Pixtral 12B: Mistral's first multimodal model with native vision encoder

In one sentence Mistral debuts in multimodal with Pixtral 12B: native vision encoder (not CLIP), multi-image and interleaved text-image, Apache 2.0 license.

Verified Official source
ShareLinkedInX
Reading level

Pixtral 12B is the first image-understanding model released by Mistral AI. The distinctive feature is that Mistral built their own visual encoder from scratch instead of using CLIP like nearly everyone else. It supports multiple images in the same conversation and images can be freely mixed with text. It's released under Apache 2.0 license, making it completely free for commercial use.

Companies

Mistral AI

Tools

Pixtral 12B

Tags

PixtralMistralNative Vision EncoderMulti-ImageApache 2.0Interleaved

Sources