Imagen: Google enters text-to-image generation

In one sentence: Google Research unveils Imagen, a text-to-image diffusion model that reads prompts with a frozen T5 text encoder and, in Google's own evaluations, outscores DALL-E 2 on photorealism and faithfulness to the prompt.

A few months after DALL-E 2, Google shows its own image generator: write a sentence, it paints. It's called Imagen.

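The new trick is how it understands the text. Instead of training a text encoder alongside the images, it borrows a large language model (T5-XXL) already pretrained on text alone, keeps it frozen, and uses the model's reading of the sentence to guide the painter. Simple idea, big effect: in Google's experiments, a bigger language model improved the images more than a bigger image model did.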
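To make "a frozen language model guides the painter" concrete, here is a deliberately tiny sketch. Everything in it is an assumption for illustration: the two-word lookup table stands in for T5, each "image" is a single pixel, the denoiser is a four-weight linear model rather than a neural network, and training uses one fixed noise level instead of the many a real diffusion model uses. What it does show is the division of labour: the text encoder is only read, never updated, while the denoiser learns to predict the added noise from the noisy pixel plus the prompt's embedding.

```python
import random

# Hypothetical stand-in for a pretrained language model such as T5.
# This table plays the role of the encoder's weights; training below
# never writes to it, which is what "frozen" means.
FROZEN_EMBEDDINGS = {
    "red": [1.0, 0.0],
    "blue": [0.0, 1.0],
}


def encode(prompt):
    """Frozen text encoder: average the vectors of the words it knows."""
    vecs = [FROZEN_EMBEDDINGS[w] for w in prompt.split() if w in FROZEN_EMBEDDINGS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(2)]


# Assumed toy dataset: each "image" is a single pixel value.
DATA = {"a red square": 1.0, "a blue square": -1.0}

# One fixed noise level (A**2 + S**2 == 1); real diffusion models train on many.
A, S = 0.8, 0.6


def predict_noise(w, x_t, emb):
    """Trainable denoiser: a linear guess of the added noise,
    given the noisy pixel and the text embedding."""
    return w[0] * x_t + w[1] * emb[0] + w[2] * emb[1] + w[3]


def train(steps=20000, lr=0.01, seed=0):
    """SGD on the squared error between predicted and true noise; only w is updated."""
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0, 0.0]
    prompts = list(DATA)
    for _ in range(steps):
        prompt = rng.choice(prompts)
        emb = encode(prompt)               # read from the frozen encoder, never updated
        eps = rng.gauss(0.0, 1.0)          # the noise the model must predict
        x_t = A * DATA[prompt] + S * eps   # noised "image"
        err = predict_noise(w, x_t, emb) - eps
        grads = [err * x_t, err * emb[0], err * emb[1], err]  # gradient of err**2 / 2
        w = [wi - lr * g for wi, g in zip(w, grads)]
    return w


def denoise(w, x_t, prompt):
    """Estimate the clean pixel by removing the predicted noise, guided by the prompt."""
    eps_hat = predict_noise(w, x_t, encode(prompt))
    return (x_t - S * eps_hat) / A


w = train()
print(round(denoise(w, 0.3, "a red square"), 2))   # 1.0
print(round(denoise(w, 0.3, "a blue square"), 2))  # -1.0
```

The same noisy pixel, 0.3, comes back as roughly 1.0 or roughly -1.0 depending only on the prompt: the frozen embedding is what steers the denoiser. In Imagen itself the denoiser is a large U-Net, the embeddings come from T5-XXL, and generation repeats this denoising over many steps starting from pure noise.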

Google doesn't open it to the public, though: no DALL-E-style website, no public demo. Concerns about misuse, such as fake imagery, and about biases and problematic content absorbed from web-scraped training data hold the launch back. The result: everyone talks about it, few can use it, and in the meantime Stable Diffusion and Midjourney win over the audience.