Veo 3 at Google I/O: video generation with native synced audio

In one sentence: At Google I/O 2025, DeepMind unveils Veo 3 (video generation with native audio, dialogue, and sound effects), Imagen 4 (more detailed images), and Flow (an AI video tool for creators).

At Google I/O 2025, DeepMind announces a major jump in video generation. Veo 3, the successor to Veo 2, doesn't just generate moving images: it also produces synchronized audio (dialogue, music, sound effects, ambient noise), all in a single render.

For the first time, you can write a prompt such as "a dog barking on a Brooklyn street at sunset, with a car passing by" and get an 8-second video in which the dog barks, the car passes, and the audio matches the scene. OpenAI's Sora, by comparison, generates video but without integrated audio.

Alongside Veo 3 come Imagen 4 (more detailed images, especially when rendering text within images) and Flow (a new creator tool that combines Veo and Imagen in an AI video editing app). All three are integrated into Gemini Advanced and Vertex AI.