

Reading path

Creator, marketing and content

From the first photorealistic generated images to AI video with synced audio.

If you are a designer, content creator, copywriter or marketer, you need to understand how generative AI is rewriting your workflow. This path covers the key leaps from image (DALL·E 2, Stable Diffusion, Midjourney) to voice (ElevenLabs), video (Sora, Veo 3) and conversational multimodal AI (GPT-4o).

01

Why it matters to you

The first time text-to-image output can pass for a real photograph: the debate over stock photography and creative craft starts here.

April 6, 2022 High Image & Video Gen

DALL·E 2: the quality leap in image generation

OpenAI announces DALL·E 2, a diffusion-based text-to-image model producing photorealistic 1024×1024 images. Access is initially waitlist-only: the beta opens to a million waitlisted users in July 2022, and the waitlist is dropped in September.

02

Why it matters to you

Open weights make image generation free and customizable: creators start fine-tuning their own visual style.

August 22, 2022 Landmark Image & Video Gen

Stable Diffusion: image generation goes open

Stability AI publicly releases the weights and code of a latent diffusion text-to-image model that runs on a consumer GPU. AI image generation leaves the cloud.

03

Why it matters to you

Defines a recognizable, mainstream aesthetic: it permanently changes moodboards, concept art and editorial illustration.

July 12, 2022 High Image & Video Gen

Midjourney opens public beta on Discord

Midjourney opens its public beta with a text-to-image model accessible through a Discord bot. Its strong default aesthetic and active community turn image generation into a mass phenomenon.

04

Why it matters to you

The first long, coherent, cinematic AI video: storyboards and visual pitches will never be the same.

February 15, 2024 Landmark Image & Video Gen

Sora: OpenAI shows cinema-quality AI video

OpenAI announces Sora, a text-to-video model producing 1080p clips up to 60 seconds long with temporal consistency, plausible physics and realistic camera moves. Access is initially limited to red-teamers and a selected group of artists and filmmakers.

05

Why it matters to you

Native multimodality in chat: you go from brief to images, audio and variations in a single session, without switching tools.

May 13, 2024 High Multimodal AI

GPT-4o: text, voice and images in a single model

OpenAI unveils GPT-4o ("omni"), a single model that natively handles text, audio and images, with voice responses averaging about 320 ms and GPT-4 Turbo-level text quality. Its text and image features are available, with usage limits, to ChatGPT free-tier users.

06

Why it matters to you

Generative video finally usable by the public for real projects, not just demos.

December 9, 2024 High Image & Video Gen

Sora Turbo: ten months after the demo, OpenAI ships video gen to the public

OpenAI ships Sora Turbo to ChatGPT Plus and Pro users: videos up to 20 seconds at up to 1080p, plus image-to-video, remix and storyboard tools. It is faster than the model previewed in February, but its output is lower-fidelity than those demo clips.

07

Why it matters to you

Veo 3 raises the bar for video photorealism: it becomes hard to tell an AI-generated ad from a traditionally shot one, with all that implies for your craft.

May 20, 2025 High Image & Video Gen

Veo 3 at Google I/O: video generation with native synced audio

At Google I/O 2025, Google DeepMind unveils Veo 3 (video generation with native audio: dialogue, sound effects and ambient sound), Imagen 4 (more detailed image generation) and Flow (an AI filmmaking tool for creators).