Stable Diffusion 2.0: new architecture and OpenCLIP encoder

In one sentence: Stability AI releases SD 2.0 with OpenCLIP replacing CLIP, native 768x768 resolution, a new depth2img model, and improved inpainting. The release is controversial because it breaks compatibility with existing LoRAs and prompts.



About three months after the Stable Diffusion 1.x revolution, Stability AI releases version 2.0 with significant changes under the hood.

The biggest change is the text encoder: the model moves from OpenAI's CLIP to OpenCLIP, an open-source implementation trained on a different dataset. Native resolution also rises from 512x512 to 768x768, images are sharper, and two new models arrive: depth2img (which generates images that preserve a scene's 3D structure) and an improved inpainting model.
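To make the size of the change concrete, here is a minimal sketch comparing the two generations. It assumes the commonly cited figures: CLIP ViT-L/14 with 768-wide token embeddings for SD 1.5, OpenCLIP ViT-H/14 with 1024-wide embeddings for SD 2.0, and a VAE that shrinks each image side by a factor of 8 into 4 latent channels. The `ModelSpec` class and `latent_shape` helper are illustrative names, not part of any library.

```python
from dataclasses import dataclass

# Assumed constants: both generations use a VAE that downsamples
# each image side by 8 and produces 4 latent channels.
VAE_DOWNSAMPLE = 8
LATENT_CHANNELS = 4


@dataclass(frozen=True)
class ModelSpec:
    """Illustrative summary of a model generation (hypothetical helper)."""
    name: str
    text_encoder: str
    text_embed_dim: int     # width of each token embedding fed to the U-Net
    native_resolution: int  # pixels per side


SD15 = ModelSpec("SD 1.5", "CLIP ViT-L/14", 768, 512)
SD20 = ModelSpec("SD 2.0 (768)", "OpenCLIP ViT-H/14", 1024, 768)


def latent_shape(spec: ModelSpec) -> tuple:
    """Shape (channels, height, width) of the latent the U-Net denoises."""
    side = spec.native_resolution // VAE_DOWNSAMPLE
    return (LATENT_CHANNELS, side, side)


for spec in (SD15, SD20):
    print(spec.name, spec.text_encoder, spec.text_embed_dim, latent_shape(spec))
```

Under these assumptions the latent grows from 64x64 to 96x96 per channel, and every token embedding handed to the U-Net is wider, so the two generations do not share weights in the layers that read the prompt.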

Reception is mixed. Many users complain that prompts that worked on SD 1.5 no longer produce the same results, that celebrities were filtered from the training data, and that compatibility with existing LoRAs and fine-tunes is broken. Paradoxically, SD 1.5 remains the community's most-used model throughout 2023.
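One way to see why add-ons trained on 1.x cannot simply be reused is to look at weight shapes. The sketch below assumes a LoRA stores two small matrices, A with shape (rank, in_features) and B with shape (out_features, rank), whose product is added to a base weight. It also assumes a cross-attention key projection of shape (320, 768) in SD 1.5 and (320, 1024) in SD 2.0; those shapes and the function names are illustrative assumptions, not taken from the release notes.

```python
def lora_delta_shape(a_shape: tuple, b_shape: tuple) -> tuple:
    """Shape of the update B @ A, for A=(rank, in) and B=(out, rank)."""
    rank_a, in_features = a_shape
    out_features, rank_b = b_shape
    if rank_a != rank_b:
        raise ValueError("A and B must share the same rank")
    return (out_features, in_features)


def can_merge(weight_shape: tuple, a_shape: tuple, b_shape: tuple) -> bool:
    """A LoRA update can be added to a weight only if the shapes match."""
    return lora_delta_shape(a_shape, b_shape) == weight_shape


# Assumed shapes for one cross-attention key projection (out, in).
SD15_TO_K = (320, 768)   # input width = CLIP ViT-L/14 embedding size
SD20_TO_K = (320, 1024)  # input width = OpenCLIP ViT-H/14 embedding size

# A rank-4 LoRA trained against the SD 1.5 layer.
lora_a = (4, 768)
lora_b = (320, 4)

print(can_merge(SD15_TO_K, lora_a, lora_b))  # True
print(can_merge(SD20_TO_K, lora_a, lora_b))  # False
```

Even where shapes happen to line up, the weights were trained against a different text encoder, so matching dimensions alone would not make a 1.x add-on behave correctly on 2.0.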