High Multimodal AI · 1 min read
InstructBLIP: visual instruction tuning on 26 datasets beats BLIP-2 and Flamingo in zero-shot tests
In one sentence
Salesforce extends BLIP-2 with visual instruction tuning on data drawn from 26 datasets, reaching state-of-the-art zero-shot results on held-out vision-language benchmarks with an open architecture.
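InstructBLIP is an evolution of BLIP-2 designed to follow natural-language instructions about images. The researchers converted 26 public vision-language datasets into instruction format, trained on 13 of them, and held out the other 13 to measure zero-shot generalization to unseen tasks. Its main architectural change is an instruction-aware Q-Former: the instruction text is fed into the Q-Former alongside its learned queries, so the visual features it extracts are tailored to the task being asked. At release, it achieved state-of-the-art zero-shot results on all 13 held-out datasets, outperforming BLIP-2 and the larger Flamingo models, while remaining open source. The results suggest that the breadth and variety of instruction-tuning data matter as much as architecture.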
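To make "instruction tuning on many datasets" concrete, here is a minimal sketch of the two data-side steps: rewriting raw dataset records as instruction/target pairs, and mixing datasets during training. The template strings, record fields, and function names below are hypothetical illustrations, not InstructBLIP's actual code. The square-root weighting follows the dataset-balancing rule described in the InstructBLIP paper (sample each dataset with probability proportional to the square root of its size), so small datasets are not drowned out by large ones.

```python
import math
import random
from typing import Dict

# Hypothetical instruction templates (illustrative only; InstructBLIP's real templates differ).
TEMPLATES = {
    "vqa": "Question: {question} Short answer:",
    "caption": "A short image description:",
}


def to_instruction(task: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rewrite a raw dataset record as an (image, instruction, target) example.

    Assumes each record has "image" and "target" keys plus any fields its template needs.
    """
    instruction = TEMPLATES[task].format(**record)  # unused keys are ignored by format()
    return {"image": record["image"], "instruction": instruction, "target": record["target"]}


def sqrt_sampling_weights(sizes: Dict[str, int]) -> Dict[str, float]:
    """Return sampling probabilities proportional to sqrt(number of training samples)."""
    roots = {name: math.sqrt(n) for name, n in sizes.items()}
    total = sum(roots.values())
    return {name: r / total for name, r in roots.items()}


def sample_dataset(weights: Dict[str, float], rng: random.Random) -> str:
    """Pick which dataset the next training example comes from."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    example = to_instruction(
        "vqa", {"image": "bus.jpg", "question": "What color is the bus?", "target": "red"}
    )
    print(example["instruction"])
    # Hypothetical dataset sizes: a 4x size gap becomes only a 2x sampling gap.
    weights = sqrt_sampling_weights({"vqa_set": 10000, "caption_set": 2500})
    print(weights)
    print(sample_dataset(weights, random.Random(0)))
```

With plain size-proportional sampling, the larger dataset above would be drawn four times as often; the square root narrows that to two times, which is the kind of balancing that keeps a broad task mix visible to the model.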
Companies
Salesforce
Tools
InstructBLIP, BLIP-2, Q-Former
Tags
InstructBLIP, Instruction Tuning, Visual Reasoning, Salesforce, VQA