High Multimodal AI · 1 min read

DeepMind Flamingo: the first few-shot visual language model

In one sentence: Flamingo brings few-shot learning to vision, reaching state-of-the-art results on visual question answering (VQA) and image captioning with no task-specific fine-tuning.

Flamingo is a visual language model from DeepMind that processes text and images together. Given only a few examples in its prompt, it can answer questions about images or describe them, with no task-specific fine-tuning. It was the first model to achieve state-of-the-art results on visual benchmarks using just a handful of demonstrations, and it paved the way for modern multimodal assistants.
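
To make "a few examples in the prompt" concrete, here is a minimal sketch of how a few-shot visual question answering prompt might be assembled: solved image-and-text examples are interleaved, followed by the new image and an open question for the model to complete. This is illustrative only and is not DeepMind's code or API; the `<image>` and `<eoc>` markers, the `Example` class, the `build_few_shot_prompt` helper, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical marker strings, chosen for illustration only.
IMAGE_TOKEN = "<image>"    # marks where an image is attached in the text
END_OF_EXAMPLE = "<eoc>"   # marks the end of one solved example


@dataclass
class Example:
    """One solved demonstration: an image, a question, and its answer."""
    image_path: str
    question: str
    answer: str


def build_few_shot_prompt(
    examples: List[Example], query_image: str, query_question: str
) -> Tuple[str, List[str]]:
    """Interleave solved examples with a final unanswered query.

    Returns the prompt text and the image paths in the order in which
    their markers appear in the text.
    """
    parts: List[str] = []
    images: List[str] = []
    for ex in examples:
        images.append(ex.image_path)
        parts.append(
            f"{IMAGE_TOKEN}Question: {ex.question} Answer: {ex.answer}{END_OF_EXAMPLE}"
        )
    # The query has the same shape but leaves the answer open for the model.
    images.append(query_image)
    parts.append(f"{IMAGE_TOKEN}Question: {query_question} Answer:")
    return "".join(parts), images


# Two demonstrations plus one query (file names are made up).
shots = [
    Example("cat.jpg", "What animal is this?", "A cat."),
    Example("bus.jpg", "What colour is the bus?", "Red."),
]
prompt, images = build_few_shot_prompt(shots, "dog.jpg", "What animal is this?")
print(prompt)
```

No model weights change in this setup: the demonstrations only shape the input, which is what distinguishes few-shot prompting from fine-tuning.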

Companies

DeepMind

Tools

Flamingo

Tags

Visual Language Model, Few-Shot Learning, VQA, Image Captioning, DeepMind

Sources