Stable Diffusion 3: Diffusion Transformer architecture and improved text rendering

In one sentence: Stability AI announces Stable Diffusion 3 (SD3), built on a Multi-Modal Diffusion Transformer (MMDiT) architecture, with text rendering competitive with Imagen 2 and DALL-E 3 and visual quality superior to SDXL.



Stable Diffusion 3 is not an incremental update but an architectural change. It drops the convolutional UNet that served as the denoising backbone in previous Stable Diffusion models and replaces it with a Transformer, the same family of architecture that powers large language models.
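To make the "multi-modal" part of MMDiT more concrete, here is a minimal single-head sketch of the joint-attention idea Stability AI describes for SD3: text tokens and image tokens (patches of the compressed latent image) keep separate projection weights, but their sequences are concatenated for a single attention operation, so each modality can attend to the other. This is a hypothetical, heavily simplified illustration, not Stability AI's code: the function names, tiny dimensions and random weights are assumptions, and it omits multiple heads, normalization, timestep conditioning and the per-modality layers that follow attention.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    cols = list(zip(*b))  # columns of b, computed once
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random matrix standing in for learned weights or token embeddings."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def joint_attention(text: Matrix, image: Matrix,
                    text_w: dict[str, Matrix], image_w: dict[str, Matrix]):
    """Hypothetical single-head joint attention with modality-specific Q/K/V weights."""
    # 1. Project each modality with its OWN weights, then concatenate the rows
    #    (list + list) so the joint sequence is [text tokens..., image tokens...].
    q = matmul(text, text_w["q"]) + matmul(image, image_w["q"])
    k = matmul(text, text_w["k"]) + matmul(image, image_w["k"])
    v = matmul(text, text_w["v"]) + matmul(image, image_w["v"])

    # 2. Scaled dot-product attention over the whole joint sequence:
    #    every token (text or image) attends to every other token.
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    out = matmul(weights, v)

    # 3. Split the result back into the two streams.
    n_text = len(text)
    return out[:n_text], out[n_text:]


# Usage: 4 text tokens and 16 image-patch tokens, embedding size 8.
rng = random.Random(0)
dim = 8
text_tokens = random_matrix(4, dim, rng)
image_tokens = random_matrix(16, dim, rng)
text_w = {name: random_matrix(dim, dim, rng) for name in "qkv"}
image_w = {name: random_matrix(dim, dim, rng) for name in "qkv"}
text_out, image_out = joint_attention(text_tokens, image_tokens, text_w, image_w)
print(len(text_out), len(image_out), len(image_out[0]))  # 4 16 8
```

One way to read the design choice: in a UNet with cross-attention, image features query a fixed set of text embeddings, whereas in this joint scheme the text stream is also updated by attending to the image tokens, so both representations evolve together through the network.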

This brings two concrete advantages: text inside generated images is far more legible and accurate, and overall visual quality (composition, proportions, fine detail) improves noticeably over Stable Diffusion XL.

The model is announced as an early-access preview, with open weights planned for a later release. The community is anticipating it as a potential new open-source standard.