ControlNet: structural control for Stable Diffusion without retraining
In one sentence: Zhang et al. introduce ControlNet, an adapter that adds pose, depth, and edge control to Stable Diffusion without modifying the base model weights.
Stable Diffusion generates images from text, but a prompt alone gives little control over composition. Want a character in a precise pose? Good luck. ControlNet addresses this by adding a visual control layer on top of the existing model.
The user provides a pose skeleton, depth map, or edge image alongside the text prompt, and the model generates an image that respects that structure. A generated character, for example, closely follows the pose given in the skeleton.
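To make "edge image" concrete, here is a minimal sketch of how such a control input could be derived from a picture. It assumes a grayscale image stored as a 2D list and uses a plain gradient threshold; the function name `edge_map` and the threshold value are illustrative, and real workflows typically use a dedicated detector such as Canny instead.

```python
def edge_map(gray, threshold=50):
    """Return a binary edge image (0 or 255) from a 2D list of grayscale values.

    Uses central-difference gradients as a simple stand-in for a real
    edge detector; border pixels are left at 0.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]  # horizontal intensity change
            gy = gray[y + 1][x] - gray[y - 1][x]  # vertical intensity change
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 255
    return edges


# Toy 6x6 image: dark left half, bright right half.
image = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
control = edge_map(image)

# Print the control image: '#' marks an edge pixel.
for row in control:
    print("".join("#" if v else "." for v in row))
```

The resulting white-on-black map is the kind of structural hint that would be passed to the model together with the text prompt.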
The beauty of ControlNet is that it works as a plug-in: there is no need to retrain Stable Diffusion from scratch, because the adapter attaches to the existing model and leaves its weights untouched. This opened a season of specialized adapters and gave digital artists a much more controllable workflow.
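The plug-in behaviour can be illustrated with a toy sketch. The ControlNet paper describes a trainable copy of the model's encoder blocks that is connected to the frozen base through zero-initialized layers, so the adapter changes nothing until it has been trained. The code below is only an analogy under loud assumptions: elementwise weights stand in for convolutions, and every function name and number is hypothetical.

```python
def base_block(x, w_base):
    """Frozen base layer (stand-in for a Stable Diffusion block)."""
    return [w * v for w, v in zip(w_base, x)]


def control_branch(x, cond, w_ctrl, w_zero):
    """Trainable branch: mixes in the condition, then applies the
    zero-initialized projection that connects it to the base."""
    hidden = [w * (v + c) for w, v, c in zip(w_ctrl, x, cond)]
    return [z * h for z, h in zip(w_zero, hidden)]


def controlled_block(x, cond, w_base, w_ctrl, w_zero):
    """Base output plus the control branch's contribution."""
    base = base_block(x, w_base)
    ctrl = control_branch(x, cond, w_ctrl, w_zero)
    return [b + c for b, c in zip(base, ctrl)]


x = [0.5, -1.0, 2.0, 0.25]       # toy features
cond = [1.0, 0.0, -1.0, 0.5]     # toy control signal (e.g. from an edge map)
w_base = [0.2, 0.4, -0.6, 0.8]   # frozen base weights, never updated
w_ctrl = list(w_base)            # trainable copy starts from the base weights
w_zero = [0.0] * 4               # zero-initialized connection

before = base_block(x, w_base)
after = controlled_block(x, cond, w_base, w_ctrl, w_zero)
print(after == before)  # True: an untrained adapter leaves the output unchanged

# Pretend training has moved the connection weights away from zero.
steered = controlled_block(x, cond, w_base, w_ctrl, [0.5] * 4)
print(steered != before)  # True: the condition now influences the output
```

Only `w_ctrl` and the connection weights would be updated during training in this picture; `w_base` is identical before and after, which is why the same base model can host many different adapters.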
Companies
Lvmin Zhang, Stanford University
Tools
ControlNet, Stable Diffusion
Sources