High Multimodal AI · 1 min read
Pixtral 12B: Mistral's first multimodal model with native vision encoder
In one sentence Mistral debuts in multimodal with Pixtral 12B: native vision encoder (not CLIP), multi-image and interleaved text-image, Apache 2.0 license.
Reading level
Pixtral 12B is the first image-understanding model released by Mistral AI. The distinctive feature is that Mistral built their own visual encoder from scratch instead of using CLIP like nearly everyone else. It supports multiple images in the same conversation and images can be freely mixed with text. It's released under Apache 2.0 license, making it completely free for commercial use.
Companies
Mistral AI
Tools
Pixtral 12B
Tags
PixtralMistralNative Vision EncoderMulti-ImageApache 2.0Interleaved
Sources