High Multimodal AI · 1 min read

Molmo: the open-weight VLM that beats GPT-4V at pointing

In one sentence: Allen AI has released Molmo, an open-weight VLM with a fully open pipeline that can point precisely at objects in an image, surpassing GPT-4V on visual grounding benchmarks.

Most VLMs can describe what is in an image but cannot indicate exactly where something is. Molmo addresses this gap: ask it to "point to the glass on the table" and it responds with precise coordinates on the image. Allen AI released not only the model but also PixMo, the dataset used to train it, which was built from detailed spoken descriptions collected from human annotators. This "full open pipeline" approach is rare and valuable for research.
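
To make the pointing idea concrete, here is a minimal Python sketch of how a pointing reply could be turned into pixel coordinates. It rests on assumptions this article does not confirm: that the model answers with a tag such as `<point x="50.0" y="25.0" alt="glass">glass</point>`, and that `x` and `y` are percentages of the image width and height. The helper `parse_points` and the sample reply are hypothetical and used only for illustration.

```python
import re

# Assumption (not confirmed by this article): the reply contains tags like
# <point x="50.0" y="25.0" alt="glass">glass</point>, where x and y are
# percentages of the image width and height.
POINT_RE = re.compile(r'<point\s+x="([\d.]+)"\s+y="([\d.]+)"')


def parse_points(text, width, height):
    """Return (x, y) pixel coordinates for every point tag found in text."""
    points = []
    for x_pct, y_pct in POINT_RE.findall(text):
        # Convert percentage coordinates to pixels for the given image size.
        points.append((float(x_pct) / 100 * width, float(y_pct) / 100 * height))
    return points


# Hypothetical reply to "point to the glass on the table".
reply = '<point x="50.0" y="25.0" alt="glass">glass</point>'
pixels = parse_points(reply, width=640, height=480)
```

Expressing coordinates relative to the image size, as assumed here, would let the same reply be mapped onto any resolution of the image. If the real output format differs, only the regular expression and the scaling step would need to change.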

Companies

Allen Institute for AI

Tools

Molmo, Molmo-7B, Molmo-72B, PixMo

Tags

VLM, Open Source, Pointing, Grounding, Open Pipeline
