May 30, 2023 High Multimodal AI · 1 min read

InstructBLIP: visual instruction tuning on 26 datasets outperforms BLIP-2 and Flamingo

In one sentence: Salesforce extends BLIP-2 with visual instruction tuning across 26 datasets, setting state-of-the-art zero-shot results over BLIP-2 and Flamingo on held-out vision-language benchmarks with an openly released model.

Verified Official source

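InstructBLIP builds on BLIP-2 to follow natural language instructions about images. Its authors gathered 26 public vision-language datasets and converted them into instruction format, training on 13 of them and holding out the other 13 for zero-shot evaluation. The Q-Former from BLIP-2 is made instruction-aware, so the visual features it extracts depend on the instruction given. At release, InstructBLIP reached state-of-the-art zero-shot results on all 13 held-out datasets, outperforming BLIP-2 and the larger Flamingo models, and its code and weights were released publicly. GPT-4V, which shipped months later, was not part of that comparison. The results suggest that the quantity and variety of instruction tuning data matter as much as architecture.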
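To make "converting datasets into instruction format" concrete, here is a minimal sketch of how a VQA or captioning record could be wrapped in an instruction template. The template wording, the `InstructionSample` class, the `to_instruction` helper, and the file names are all hypothetical and written for illustration; they are not the templates or code used by the InstructBLIP authors.

```python
import random
from dataclasses import dataclass

# Hypothetical instruction templates per task (illustrative wording only).
TEMPLATES = {
    "vqa": [
        "Question: {question} Short answer:",
        "Given the image, answer the following question briefly. {question}",
    ],
    "caption": [
        "Write a short description for the image.",
        "Briefly describe the content of the image.",
    ],
}


@dataclass
class InstructionSample:
    image_path: str   # path to the image the instruction refers to
    instruction: str  # natural language instruction fed to the model
    target: str       # text the model is trained to produce


def to_instruction(task, image_path, target, rng, question=None):
    """Wrap one raw dataset record in a randomly chosen template for its task."""
    template = rng.choice(TEMPLATES[task])
    # str.format ignores unused keyword arguments, so templates without
    # a {question} placeholder (captioning) pass through unchanged.
    instruction = template.format(question=question or "")
    return InstructionSample(image_path, instruction, target)


rng = random.Random(0)  # seeded so the template choice is reproducible
samples = [
    to_instruction("vqa", "img_001.jpg", "two", rng,
                   question="How many dogs are there?"),
    to_instruction("caption", "img_002.jpg", "A cat sleeping on a sofa.", rng),
]
for sample in samples:
    print(sample)
```

Sampling among several templates per task is one simple way to add variety to the instructions, which is the property the summary above credits for the model's generalization.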

Companies

Salesforce

Tools

InstructBLIP, BLIP-2, Q-Former