May 8, 2024 Medium Multimodal AI · 1 min read

Qwen-VL-Chat: the best open VLM in Chinese with bounding boxes

In one sentence: Alibaba releases Qwen-VL-Chat, a 7B VLM with native bounding box output, bilingual Chinese-English OCR, and advanced document layout understanding.



Western vision-language models handled English well but were fragile with Chinese. Alibaba bridged this gap with Qwen-VL-Chat: a 7-billion-parameter model that understands images and text in both languages and can also localize objects in an image by returning bounding-box coordinates. You can ask it "where is the document title?" and it answers with a rectangle around that region. This is especially useful for reading invoices, forms, and scanned documents.
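To make the bounding-box answer concrete, here is a minimal Python sketch of how such a reply might be turned into pixel coordinates. It assumes the reply uses the `<ref>label</ref><box>(x1,y1),(x2,y2)</box>` text format with coordinates normalized to a 0-1000 grid, which is how the Qwen-VL authors describe their output; the `parse_boxes` helper, the `Box` type, and the sample reply are hypothetical and written only for illustration, so check the format against the model version you actually run.

```python
import re
from typing import List, NamedTuple


class Box(NamedTuple):
    """A labeled rectangle in pixel coordinates (hypothetical helper type)."""
    label: str
    x1: int
    y1: int
    x2: int
    y2: int


# Assumed reply format: <ref>label</ref><box>(x1,y1),(x2,y2)</box>
_BOX_RE = re.compile(
    r"<ref>(.*?)</ref>\s*<box>\((\d+),(\d+)\),\((\d+),(\d+)\)</box>"
)


def parse_boxes(reply: str, width: int, height: int, scale: int = 1000) -> List[Box]:
    """Extract labeled boxes from a model reply and rescale them to pixels.

    `scale` is the assumed size of the normalized coordinate grid.
    """
    boxes = []
    for match in _BOX_RE.finditer(reply):
        label = match.group(1).strip()
        x1, y1, x2, y2 = (int(v) for v in match.groups()[1:])
        boxes.append(
            Box(
                label,
                x1 * width // scale,
                y1 * height // scale,
                x2 * width // scale,
                y2 * height // scale,
            )
        )
    return boxes


# Made-up reply for a 2000x1000 pixel scan, for illustration only.
sample_reply = "<ref>document title</ref><box>(100,50),(900,120)</box>"
title_boxes = parse_boxes(sample_reply, width=2000, height=1000)
print(title_boxes[0])
```

The resulting pixel rectangle can then be passed to an image library to crop the title region before running further OCR on it.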

Companies

Alibaba

Tools

Qwen-VL, Qwen-VL-Chat