GLIDE: OpenAI shifts from autoregressive generation to text-guided diffusion

In one sentence: OpenAI publishes GLIDE, a text-to-image diffusion model with classifier-free guidance, which becomes the technical foundation for DALL·E 2 and the models that follow.



OpenAI changes its technique for generating images from text. DALL·E 1, from January 2021, generated images autoregressively as a sequence of discrete image "tokens", one token at a time, which was slow and limited to low resolution. With GLIDE it switches to a different approach: diffusion models.

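The idea: start from random noise and progressively "denoise" it toward an image, with each step guided by the text prompt. The resulting images are more photorealistic, and the same model can edit existing images by regenerating a masked region from a prompt (inpainting).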
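The guidance step can be sketched in a few lines. What follows is a minimal toy illustration, not OpenAI's implementation: the classifier-free guidance formula (unconditional prediction plus a scaled difference toward the text-conditional one) is the standard one, but `sample`, its fixed step size, and the way the two noise predictions are faked from a known target vector are assumptions made only so the example runs without a trained network.

```python
import numpy as np


def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Blend the unconditional and text-conditional noise predictions.

    guidance_scale = 0 ignores the prompt, 1 uses the conditional
    prediction as-is, and values above 1 push further toward the prompt.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


def sample(target, guidance_scale=1.0, steps=200, step_size=0.1, seed=0):
    """Toy denoising loop (hypothetical, for illustration only).

    A real model predicts noise with a neural network conditioned on the
    prompt. Here the "predictions" are simply the offset of the current
    sample from a known target (conditional) or from zero (unconditional).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(target.shape)  # start from pure random noise
    uncond_target = np.zeros_like(target)  # stand-in for "no prompt"
    for _ in range(steps):
        eps_uncond = x - uncond_target     # fake unconditional prediction
        eps_cond = x - target              # fake text-conditional prediction
        eps_hat = classifier_free_guidance(eps_uncond, eps_cond, guidance_scale)
        x = x - step_size * eps_hat        # remove a fraction of the noise
    return x


if __name__ == "__main__":
    prompt_target = np.array([1.0, -2.0, 0.5])
    print(sample(prompt_target, guidance_scale=1.0))
    print(sample(prompt_target, guidance_scale=3.0))
```

In this toy setup a scale of 1 settles on the conditional target, while a larger scale overshoots it in the direction away from the unconditional result. That loosely mirrors the trade-off in real models, where stronger guidance follows the prompt more closely at the cost of diversity.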

GLIDE isn't a consumer product; it's a research paper. It will become the technical base of DALL·E 2 a few months later. It also signals that the future of image generators is diffusion, the approach later associated with Stable Diffusion, Midjourney, Imagen, and Sora.