July 23, 2024 Medium Multimodal AI · 1 min read

SmolVLM: the 256M-2B VLM family for edge devices

In one sentence: HuggingFace releases SmolVLM, a family of vision-language models (VLMs) from 256M to 2B parameters with multi-image, video, and OCR support, licensed under Apache 2.0 and optimized for edge deployment.

Verified Official source

HuggingFace built a family of vision-language models small enough to run on a phone or laptop without an internet connection. SmolVLM comes in three sizes: 256 million, 500 million, and 2 billion parameters. Despite their small size, the models can process multiple images at once, understand video, perform OCR on documents, and answer questions about visual content. The Apache 2.0 license permits free use in commercial products, which could accelerate adoption in IoT and mobile applications.
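
To give a rough sense of why these sizes suit phones and laptops, the sketch below estimates how much memory the weights alone would need at a few common numeric precisions. It is a back-of-the-envelope calculation, not a measurement: the parameter counts are the nominal figures quoted above, the precision choices (fp32, fp16, int8, int4) are assumptions for illustration, and the names `PARAM_COUNTS`, `BYTES_PER_PARAM`, and `weight_memory_gb` are hypothetical helpers rather than part of any SmolVLM tooling.

```python
# Rough weight-memory estimate: parameter count x bytes per parameter.
# This ignores activations, image/video inputs, caches, and runtime overhead,
# so actual memory use on a device will be higher than these figures.

# Nominal sizes as quoted in the article (assumed exact for illustration).
PARAM_COUNTS = {
    "SmolVLM-256M": 256_000_000,
    "SmolVLM-500M": 500_000_000,
    "SmolVLM-2B": 2_000_000_000,
}

# Assumed storage cost per parameter for each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: int, precision: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def footprint_table() -> dict:
    """Map each model name to its estimated weight size per precision."""
    return {
        name: {p: weight_memory_gb(n, p) for p in BYTES_PER_PARAM}
        for name, n in PARAM_COUNTS.items()
    }


if __name__ == "__main__":
    for name, row in footprint_table().items():
        cells = ", ".join(f"{p}: {gb:.2f} GB" for p, gb in row.items())
        print(f"{name}: {cells}")
```

Under these assumptions, the smallest model's weights come to about half a gigabyte at 16-bit precision, while the 2B model needs about 4 GB at 16-bit and about 1 GB if quantized to 4 bits.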

Companies

HuggingFace

Tools

SmolVLM, SmolVLM-256M, SmolVLM-500M, SmolVLM-2B