AImpact
High Multimodal AI · 1 min read

Qwen2-VL: dynamic resolution, computer use, and doc-level OCR at 72B

In one sentence: Alibaba has released Qwen2-VL 72B, featuring dynamic resolution for images of any size, a visual agent capable of computer use, and document-level OCR.

Verified Official source

Qwen2-VL is a multimodal model from Alibaba that analyzes images of any size without cropping or resizing them to a fixed shape: it processes each image at its native resolution. It can read entire documents, perform OCR on PDF pages, and control a computer by watching the screen as a human would. With 72 billion parameters, it is among the most capable open multimodal models released to date.
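To make "native resolution" concrete, here is a minimal sketch of how the number of visual tokens could grow with image size. The numbers are assumptions not stated in this article: a 14-pixel patch size and a 2x2 merge of neighboring patches, so that one visual token covers roughly a 28x28 pixel area. The function name `visual_token_count` is hypothetical and is not part of any Qwen library.

```python
# Illustrative sketch only: estimates how many visual tokens an image
# would produce under dynamic resolution. Constants are assumptions.
PATCH = 14            # assumed vision-encoder patch size, in pixels
MERGE = 2             # assumed 2x2 merge of adjacent patches into one token
UNIT = PATCH * MERGE  # each visual token then covers about 28x28 pixels


def visual_token_count(width: int, height: int) -> int:
    """Estimate the visual token count for an image kept near native size."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Snap each side to the nearest multiple of UNIT, with at least one cell.
    cols = max(1, round(width / UNIT))
    rows = max(1, round(height / UNIT))
    return cols * rows


if __name__ == "__main__":
    for size in [(224, 224), (448, 224), (1288, 728)]:
        print(size, visual_token_count(*size))
```

Under these assumptions, a larger image simply yields more tokens instead of being squeezed into a fixed grid, which is why fine detail such as small text in a scanned page can survive.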

Companies

Alibaba, Qwen Team

Tools

Qwen2-VL

Tags

Qwen2-VL, Dynamic Resolution, Computer Use, OCR, Alibaba, Agent

Sources