Textual Inversion: inject a custom concept into diffusion models

In one sentence: Researchers from Tel Aviv University and NVIDIA publish Textual Inversion, a method that learns a new text token representing a custom concept from 3-5 images, without modifying the model's weights.



Image generation models understand words like "dog", "sunset", "impressionism". But they don't know your specific cat, your favorite designer lamp, or your company logo.

Textual Inversion solves this elegantly: instead of retraining the whole model, it learns just one new "word", a token that represents your concept. Given 3-5 photos of the subject, it optimizes that token's embedding vector, searching for the point in the text encoder's embedding space (the model's "language map") from which the frozen model best reproduces those photos.
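The idea of "freeze the model, learn one vector" can be illustrated with a toy sketch. This is not the actual Textual Inversion training loop, which optimizes a diffusion denoising loss through a real text encoder. Here a small fixed matrix stands in for the frozen model, and the names FROZEN_WEIGHTS, HIDDEN_CONCEPT, generate, and learn_embedding are all hypothetical, chosen for illustration only.

```python
import random

# Hypothetical frozen "model": a fixed linear map from a 3-dim token
# embedding to 4 image features. Tuples make it immutable on purpose.
FROZEN_WEIGHTS = (
    (1.0, 0.0, 0.5),
    (0.0, 1.0, -0.5),
    (0.5, 0.5, 0.0),
    (-1.0, 0.5, 1.0),
)

def generate(weights, embedding):
    """Run the frozen toy model on one token embedding."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in weights]

def mean_loss(weights, embedding, images):
    """Mean squared error between the model output and every photo."""
    output = generate(weights, embedding)
    return sum(
        sum((o - p) ** 2 for o, p in zip(output, image)) / len(image)
        for image in images
    ) / len(images)

def learn_embedding(weights, images, dim, steps=500, lr=0.1, seed=0):
    """Gradient descent on the embedding only; the weights are never updated."""
    rng = random.Random(seed)
    embedding = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    scale = 2.0 / (len(images) * len(images[0]))
    for _ in range(steps):
        output = generate(weights, embedding)
        grad = [0.0] * dim
        for image in images:
            for row, o, p in zip(weights, output, image):
                for j in range(dim):
                    grad[j] += scale * (o - p) * row[j]
        embedding = [e - lr * g for e, g in zip(embedding, grad)]
    return embedding

# Assumption: four noisy "photos" of one concept, simulated from a hidden vector.
HIDDEN_CONCEPT = [0.8, -0.3, 0.5]
noise = random.Random(1)
photos = [
    [x + noise.uniform(-0.02, 0.02) for x in generate(FROZEN_WEIGHTS, HIDDEN_CONCEPT)]
    for _ in range(4)
]

start = learn_embedding(FROZEN_WEIGHTS, photos, dim=3, steps=0)  # initial guess
learned = learn_embedding(FROZEN_WEIGHTS, photos, dim=3)
loss_before = mean_loss(FROZEN_WEIGHTS, start, photos)
loss_after = mean_loss(FROZEN_WEIGHTS, learned, photos)

print("loss before training:", loss_before)
print("loss after training: ", loss_after)
print("learned embedding:   ", [round(v, 2) for v in learned])
```

The only thing that changes during training is the single embedding vector; in the real method that vector is likewise the only trained parameter, and it is typically just a few hundred to a few thousand numbers.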

Once learned, the token can be used in prompts like any other word: "a photo of [my-cat] on a tropical beach" or "a coffee cup in the style of [my-favorite-artist]". The model stays intact; only its vocabulary is enriched. Because the result is a single embedding vector rather than a fine-tuned copy of the model, Textual Inversion is lighter than DreamBooth and well suited to experimentation.
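What "enriching the vocabulary" means can be sketched as appending one row to the embedding table and one entry to the token lookup. The snippet below is a simplified illustration, not a real tokenizer: it splits on whitespace, uses made-up 2-dimensional vectors, and the names vocab, table, add_token, and encode are hypothetical. Real pipelines use a subword tokenizer and a learned text encoder.

```python
# Assumption: a tiny pretrained vocabulary with made-up 2-dim embeddings.
vocab = {"a": 0, "photo": 1, "of": 2, "on": 3, "tropical": 4, "beach": 5}
table = [
    [0.1, 0.0],
    [0.4, 0.2],
    [0.0, 0.1],
    [0.2, 0.3],
    [0.7, 0.6],
    [0.5, 0.9],
]

def add_token(vocab, table, token, embedding):
    """Append one new token and its learned vector; existing rows are untouched."""
    if token in vocab:
        raise ValueError(f"token already exists: {token}")
    vocab[token] = len(table)
    table.append(list(embedding))

def encode(prompt, vocab, table):
    """Look up the embedding of each whitespace-separated word in the prompt."""
    return [table[vocab[word]] for word in prompt.split()]

rows_before = [list(row) for row in table]  # snapshot of the original table

# Hypothetical vector produced by a Textual Inversion training run.
learned_vector = [0.8, -0.3]
add_token(vocab, table, "[my-cat]", learned_vector)

encoded = encode("a photo of [my-cat] on a tropical beach", vocab, table)
print("prompt length in tokens:", len(encoded))
print("embedding used for [my-cat]:", encoded[3])
print("original rows unchanged:", table[: len(rows_before)] == rows_before)
```

The new token sits alongside the existing words and is looked up the same way, which is why it can be combined freely with any other prompt text.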