January 30, 2024 High Multimodal AI · 1 min read

InternVL: 6B-parameter visual encoder on par with GPT-4V

In one sentence: Shanghai AI Lab releases InternVL with an open-source 6B-parameter visual encoder, achieving GPT-4V-comparable performance on multimodal benchmarks.

InternVL was one of the first open-source models to compete seriously with GPT-4V on visual understanding benchmarks. Its main feature is a very large image encoder, InternViT, with 6 billion parameters, far larger than the vision encoders used in typical CLIP models. It was designed to scale the visual side of multimodal models with the same care usually given to the language side, and it later served as the base for a series of more capable successor versions.

Companies

Shanghai AI Laboratory

Tools

InternVL, InternViT