March 8, 2024 High Multimodal AI · 1 min read

IDEFICS2: 8B open multimodal model with native PDF and OCR training

In one sentence: Hugging Face releases IDEFICS2, an 8B-parameter Apache 2.0 model natively trained on PDF and OCR data, with better handling of text in images than its predecessors.

Verified Official source

IDEFICS2 is Hugging Face's open-source multimodal model. With just 8 billion parameters, it takes images and text together as input and generates text. Its main innovation is native training on PDF documents and OCR data, which lets it read text inside images far better than its predecessors. It is released under the Apache 2.0 license, which permits commercial use subject to the license's attribution and notice conditions.

Companies

HuggingFace

Tools

IDEFICS2, SigLIP, Mistral