September 5, 2024 High Multimodal AI

Qwen2-VL: dynamic resolution, computer use, and doc-level OCR at 72B

In one sentence: Alibaba releases Qwen2-VL 72B, featuring dynamic resolution for images of any size, a visual agent capable of computer use, and document-level OCR.




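Qwen2-VL is a vision-language model from Alibaba's Qwen team that handles images of widely varying sizes without forcing them into a fixed input resolution: its dynamic-resolution mechanism converts each image into a number of visual tokens that scales with the image's dimensions. It can read entire documents, perform OCR on PDF pages, and act as a visual agent that operates a computer by looking at the screen, much as a human would. At 72 billion parameters, it was among the most capable open multimodal models available at the time of its release.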
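To make the idea of dynamic resolution concrete, here is a minimal sketch of how an image's size could translate into a visual token count. It assumes a 14-pixel vision patch and a 2x2 merge of neighboring patch tokens, so that each 28x28 pixel region yields one token. These constants and the helper names (`round_to_factor`, `visual_token_count`) are illustrative assumptions, not the model's actual preprocessing code, which also applies minimum and maximum pixel budgets that are omitted here.

```python
# Illustrative sketch only: constants and helpers below are assumptions,
# not Qwen2-VL's real preprocessing pipeline.
PATCH = 14               # assumed side length of one vision patch, in pixels
MERGE = 2                # assumed 2x2 merge of adjacent patch tokens
FACTOR = PATCH * MERGE   # each 28x28 pixel region becomes one visual token


def round_to_factor(x: int, factor: int = FACTOR) -> int:
    """Round x to the nearest multiple of factor, with a floor of one factor."""
    return max(factor, (x + factor // 2) // factor * factor)


def visual_token_count(width: int, height: int) -> int:
    """Estimate how many visual tokens an image of this size would produce."""
    w = round_to_factor(width)
    h = round_to_factor(height)
    return (w // FACTOR) * (h // FACTOR)


if __name__ == "__main__":
    for size in [(224, 224), (1280, 720), (28, 2800)]:
        print(size, "->", visual_token_count(*size), "tokens")
```

Under these assumptions a small thumbnail costs only a few dozen tokens while a full screenshot costs over a thousand, which is the trade-off a fixed-resolution encoder cannot make.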

Companies

Alibaba, Qwen Team

Tools

Qwen2-VL