Reading path
Creator, marketing and content
From the first generated image to real-time AI video.
You are a designer, content creator, copywriter or marketer, and you need to understand how generative AI is rewriting your workflow. This path covers the key jumps from image (DALL·E 2, Stable Diffusion, Midjourney) to voice (ElevenLabs), video (Sora, Veo 3) and conversational multimodal models (GPT-4o).
- 01
Why it matters to you
The first time text-to-image output can pass for a real photograph: the debate over stock photography and craft starts here.
High | Image & Video Gen
DALL·E 2: the quality leap in image generation
OpenAI announces DALL·E 2, a diffusion-based text-to-image model producing photorealistic 1024×1024 images. Access is initially waitlist-only, with a wider beta opening in July.
- 02
Why it matters to you
Open weights make image generation free and customizable: creators start fine-tuning their own visual style.
Landmark | Image & Video Gen
Stable Diffusion: image generation goes open
Stability AI publicly releases the weights and code for a text-to-image latent diffusion model that runs on a consumer GPU. AI image generation leaves the cloud.
- 03
Why it matters to you
Defines a recognizable, mainstream aesthetic: it permanently changes moodboards, concept art and editorial illustration.
High | Image & Video Gen
Midjourney opens public beta on Discord
Midjourney opens its public beta with a text-to-image model accessible via a Discord bot. Its strong aesthetic default and community turn image generation into a mass phenomenon.
- 04
Why it matters to you
The first long, coherent, cinematic AI video: storyboards and visual pitches will never be the same.
Landmark | Image & Video Gen
Sora: OpenAI shows cinema-quality AI video
OpenAI announces Sora, a text-to-video model producing 1080p clips up to 60 seconds with temporal consistency, plausible physics, and realistic camera moves. Limited release to red-teamers and selected artists.
- 05
Why it matters to you
Native multimodality in chat: you go from brief to images, audio and variations in a single session, without tool switching.
High | Multimodal AI
GPT-4o: text, voice and images in a single model
OpenAI unveils GPT-4o (omni), a single model that natively handles text, audio, and images with ~320 ms average voice latency and GPT-4-class text quality. It is available to ChatGPT's free tier as well.
- 06
Why it matters to you
Generative video finally usable by the public for real projects, not just demos.
High | Image & Video Gen
Sora Turbo: ten months after the demo, OpenAI ships video gen to the public
OpenAI ships Sora Turbo to ChatGPT Plus/Pro users: videos up to 20 s at up to 1080p, plus image-to-video, remix and storyboard tools. It is faster but less faithful than the version shown in the February Sora demo.
- 07
Why it matters to you
Veo 3 raises the bar for video photorealism: it gets hard to tell an AI ad from a traditional one, with all that implies for your craft.
High | Image & Video Gen
Veo 3 at Google I/O: video generation with native synced audio
At Google I/O 2025, DeepMind unveils Veo 3 (video gen with native audio, dialogue, effects), Imagen 4 (more detailed images), and Flow (AI video tool for creators).