InternVL was one of the first open-source models to seriously compete with GPT-4V on visual understanding benchmarks. Its main feature is a very large image encoder, InternViT, with 6 billion parameters, far larger than the image encoders used in CLIP. It was designed to scale the visual side of multimodal models with the same care usually given to the language side. It later served as the base for a series of more capable successor versions.
Companies
Shanghai AI Laboratory
Tools
InternVL, InternViT
Tags
InternVL, Open Source, Visual Encoder, GPT-4V Comparable, Shanghai AI Lab
Sources