High Multimodal AI · 1 min read
InternVL 2.5: a 78B open-source model that beats GPT-4V on OCR and math
In one sentence: Shanghai AI Lab releases InternVL 2.5 with 78B parameters under Apache 2.0, achieving SOTA on MathVista, OCRBench, and ChartQA and surpassing GPT-4V on numerous multimodal benchmarks.
InternVL 2.5 is the most capable open-source vision-language model (VLM) at its release: 78 billion parameters under an Apache 2.0 license, which means it is free for commercial use. It beats GPT-4V on visual math tests, on reading text in images, and on understanding charts and tables. For the first time, an open-weight model surpasses the best proprietary models on several benchmarks at once, showing that open source can compete at the highest level in multimodal AI.
Companies
Shanghai AI Lab
Tools
InternVL 2.5, InternVL2.5-78B
Tags
VLM, SOTA, Math, OCR, Chart Understanding, Open Source
Sources