AImpact
High Multimodal AI · 1 min read

InstructBLIP: visual instruction tuning across 26 datasets outperforms BLIP-2 and Flamingo in zero-shot evaluation

In one sentence: Salesforce extends BLIP-2 with visual instruction tuning across 26 datasets, outperforming BLIP-2 and the larger Flamingo models on zero-shot vision-language benchmarks with an openly released model.


InstructBLIP builds on BLIP-2 and is designed to follow natural-language instructions about images. Salesforce converted 26 publicly available vision-language datasets into instruction format, training on 13 of them and holding out the other 13 for zero-shot evaluation. At release it reported state-of-the-art zero-shot results on all 13 held-out datasets, outperforming BLIP-2 and the larger Flamingo models, and its code and weights were openly released. The results suggest that the quantity and variety of instruction tuning data matter as much as architecture.
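To make "instruction tuning across many datasets" concrete, here is a minimal sketch of how records from different tasks might be rewritten into a shared (image, instruction, target) format. The templates, the record fields, and the names `to_instruction` and `build_mixture` are hypothetical and chosen for illustration; they are not the templates or code Salesforce used.

```python
import random
from dataclasses import dataclass

# Hypothetical templates, for illustration only; not InstructBLIP's actual wording.
VQA_TEMPLATES = [
    "Question: {question} Short answer:",
    "Given the image, answer the following question: {question}",
]
CAPTION_TEMPLATES = [
    "Describe the image briefly.",
    "Write a short caption for the image.",
]


@dataclass
class InstructionSample:
    image_path: str
    instruction: str
    target: str


def to_instruction(record: dict, rng: random.Random) -> InstructionSample:
    """Rewrite one raw dataset record as an (image, instruction, target) triple."""
    if record["task"] == "vqa":
        # Pick a random phrasing so the model sees varied instructions.
        template = rng.choice(VQA_TEMPLATES)
        instruction = template.format(question=record["question"])
        target = record["answer"]
    elif record["task"] == "caption":
        instruction = rng.choice(CAPTION_TEMPLATES)
        target = record["caption"]
    else:
        raise ValueError(f"unsupported task: {record['task']}")
    return InstructionSample(record["image"], instruction, target)


def build_mixture(records: list[dict], seed: int = 0) -> list[InstructionSample]:
    """Convert records from several datasets into one instruction-format list."""
    rng = random.Random(seed)
    return [to_instruction(r, rng) for r in records]


if __name__ == "__main__":
    records = [
        {"task": "vqa", "image": "a.jpg",
         "question": "What color is the car?", "answer": "red"},
        {"task": "caption", "image": "b.jpg", "caption": "A dog on a beach."},
    ]
    for sample in build_mixture(records):
        print(sample)
```

Sampling among several phrasings per task is one simple way to expose a model to varied instructions; a real pipeline would also need to balance how often each dataset is sampled.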

Companies

Salesforce

Tools

InstructBLIP, BLIP-2, Q-Former

Tags

InstructBLIP, Instruction Tuning, Visual Reasoning, Salesforce, VQA
