Stable Diffusion 3: Diffusion Transformer architecture and improved text rendering
In one sentence
Stability AI announces SD3, built on a Multimodal Diffusion Transformer (MMDiT) architecture, with text rendering competitive with Imagen 2 and DALL-E 3 and visual quality superior to SDXL.
Stable Diffusion 3 is not an incremental update: it is an architecture change. It abandons the classic UNet of previous models and adopts a Transformer as its backbone, the same family of architecture that powers large language models.
This brings two concrete advantages: text rendered inside images is far more legible and accurate, and overall visual quality (composition, proportions, detail) improves noticeably compared to Stable Diffusion XL.
The model is announced as an early-access preview, with open weights planned for a later release. The community is watching it as a potential new open-source standard.
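To make the move from a UNet to a Transformer more concrete, here is a toy sketch of the general idea: the image latent is cut into patches that become tokens, and those tokens share one sequence with the text tokens, so attention can relate any word to any image region. This is an illustration only, not SD3's implementation. The function names (`patchify`, `joint_attention`), the 4x4 latent, and the fake text embeddings are all hypothetical. The learned projections, multiple heads, and stacked layers of a real model are omitted.

```python
import math

def patchify(latent, patch):
    """Split an H x W grid (list of rows) into flattened patch tokens."""
    height, width = len(latent), len(latent[0])
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([latent[top + dy][left + dx]
                           for dy in range(patch)
                           for dx in range(patch)])
    return tokens

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def joint_attention(text_tokens, image_tokens):
    """Single-head self-attention over the concatenated text + image sequence.

    Assumption for brevity: queries, keys, and values are the raw tokens
    (no learned projection matrices).
    """
    sequence = text_tokens + image_tokens
    dim = len(sequence[0])
    outputs, weights = [], []
    for query in sequence:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in sequence]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(row[j] * sequence[j][d] for j in range(len(sequence)))
                        for d in range(dim)])
    return outputs, weights

# Toy 4x4 single-channel "latent"; 2x2 patches give 4 image tokens of size 4.
latent = [[0.1 * (r * 4 + c) for c in range(4)] for r in range(4)]
image_tokens = patchify(latent, patch=2)

# Two made-up text embeddings with the same dimensionality as the image tokens.
text_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

outputs, weights = joint_attention(text_tokens, image_tokens)
print(len(outputs), len(weights[0]))  # 6 tokens, each attending over all 6
```

Each output token is a weighted mix of every text and image token. A convolutional UNet, by contrast, typically brings text in through separate cross-attention layers.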
Companies
Stability AI
Tools
Stable Diffusion 3, SD3
Tags
Sources