Reading level
After images, video's turn. Meta presents a model that, given a sentence like "a bear playing piano", produces a few-second animated clip.
The elegant idea: you don't have to retrain everything from scratch. You start from a model that already knows how to make still images and teach it how to move them, using untagged videos. The model learns "what" (from text) first and "how it moves" (from video) second.
Results are still short and shaky, but for the first time you see an AI that understands "a dog running" as something through time, not just a drawing. It's the prelude to every later text-to-video: Runway Gen-2, Pika, Sora.
Companies
Meta AI
Tools
Make-A-Video
Tags
MetaMake-A-VideoText-to-VideoDiffusionResearch
Sources