Make-A-Video: Meta unveils the first credible text-to-video model

In one sentence: Meta AI presents Make-A-Video, a system that generates short animated clips from a text description by reusing a pre-existing text-to-image model.



After images, it is video's turn. Meta presents a model that, given a sentence like "a bear playing piano", produces an animated clip a few seconds long.

The elegant idea: you don't have to retrain everything from scratch. You start from a model that already knows how to make still images from text and teach it how to make them move, using unlabeled videos (no captions needed). The model learns "what" (from text paired with images) first and "how it moves" (from video) second.
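To make the "image model first, motion second" idea concrete, here is a toy sketch in plain Python. It is an illustration under stated assumptions, not Meta's code: `spatial_layer` is a hypothetical stand-in for a frozen, pretrained image layer, and `temporal_layer` is a hypothetical 1D convolution over the time axis. The sketch also assumes the temporal layer starts out as the identity, so the video model initially produces exactly what the image model would produce frame by frame, and only the temporal weights would then be trained on video.

```python
from typing import List

Frame = List[float]   # a flattened toy "image"
Video = List[Frame]   # frames ordered in time


def spatial_layer(frame: Frame) -> Frame:
    """Hypothetical stand-in for a frozen, pretrained per-image layer."""
    return [2.0 * x + 1.0 for x in frame]


def temporal_layer(video: Video, kernel: List[float]) -> Video:
    """1D convolution along the time axis, applied to each pixel separately.

    Frames outside the clip boundaries are treated as zeros.
    """
    half = len(kernel) // 2
    n_frames = len(video)
    out: Video = []
    for t in range(n_frames):
        mixed = [0.0] * len(video[t])
        for k, weight in enumerate(kernel):
            src = t + k - half
            if 0 <= src < n_frames:
                for i, value in enumerate(video[src]):
                    mixed[i] += weight * value
        out.append(mixed)
    return out


def video_block(video: Video, temporal_kernel: List[float]) -> Video:
    """Run the frozen image layer on every frame, then mix across time."""
    per_frame = [spatial_layer(frame) for frame in video]
    return temporal_layer(per_frame, temporal_kernel)


# Assumed starting point: an identity kernel, i.e. no mixing between frames.
IDENTITY_KERNEL = [0.0, 1.0, 0.0]
# What training on video might move the kernel towards (made-up values).
SMOOTHING_KERNEL = [0.25, 0.5, 0.25]

clip: Video = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]

at_init = video_block(clip, IDENTITY_KERNEL)
after_training = video_block(clip, SMOOTHING_KERNEL)
```

With the identity kernel, `at_init` equals the image layer applied to each frame on its own, so nothing the image model learned is lost at the start. Once the kernel changes, each output frame depends on its neighbors in time, which is the only part that has to be learned from video.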

The results are still short and shaky, but for the first time you see an AI that treats "a dog running" as something that unfolds over time, not just a still drawing. It's the prelude to the text-to-video systems that followed: Runway Gen-2, Pika, Sora.