ControlNet: structural control for Stable Diffusion without retraining

In one sentence: Zhang et al. introduce ControlNet, an adapter that adds pose, depth, and edge control to Stable Diffusion without modifying the base model's weights.

Stable Diffusion generates images from text, but offers little control over composition. Want a character in a precise pose? Good luck. ControlNet addresses this by adding a visual control layer on top of the existing model, so that generation is conditioned on a reference image as well as on the text prompt.

The user provides a pose skeleton, depth map, or edge image, and the model generates an image that respects that structure. A generated character, for example, closely follows the pose given in the skeleton.

The beauty of ControlNet is that it works as a plug-in: there is no need to retrain Stable Diffusion from scratch, because it attaches to the existing model and leaves the base weights untouched. This opened a season of specialized adapters and changed the workflow of many digital artists.
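
To make the plug-in idea concrete, here is a minimal toy sketch in plain Python. It is not Stable Diffusion code. It assumes the mechanism described in the ControlNet paper, where a trainable copy of the network's blocks is joined to the frozen model through zero-initialized layers, and it replaces real layers with single scalar weights. The names frozen_block, ControlBranch, and controlled_block are hypothetical and exist only for this illustration.

```python
# Toy, scalar-only illustration of attaching an adapter to a frozen model.
# Assumption: real ControlNet uses convolutional blocks; scalars stand in here.


def frozen_block(x, weight=2.0):
    """Stand-in for a pretrained block whose weight is never updated."""
    return weight * x


class ControlBranch:
    """Hypothetical adapter: a copy of the block plus zero-initialized links."""

    def __init__(self, copied_weight=2.0):
        self.copy_weight = copied_weight  # starts as a copy of the frozen weight
        self.zero_in = 0.0   # zero-initialized link that injects the control signal
        self.zero_out = 0.0  # zero-initialized link back into the frozen model

    def __call__(self, x, control):
        hidden = self.copy_weight * (x + self.zero_in * control)
        return self.zero_out * hidden


def controlled_block(x, control, branch):
    """Frozen output plus whatever the adapter contributes."""
    return frozen_block(x) + branch(x, control)


branch = ControlBranch()
x, control = 1.5, 0.7

# Before any training the zero links silence the adapter entirely,
# so the combined model behaves exactly like the frozen one.
before = controlled_block(x, control, branch)

# Pretend training has moved the adapter's links away from zero.
# Only the adapter changed; frozen_block is untouched.
branch.zero_in = 1.0
branch.zero_out = 0.5
after = controlled_block(x, control, branch)

print(before, after)
```

The design point this sketch tries to show is that the adapter starts as a no-op: attaching it cannot degrade the base model, and any influence from the control signal is learned gradually in the adapter's own weights.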