AImpact
Safety | Beginner | Also known as: Aggiramento delle protezioni (Italian for "circumventing the safeguards")

Jailbreak

A technique where a user talks the model into ignoring its own safety rules, for example by asking it to pretend to be a character with no restrictions.


In practice

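A jailbreak differs from prompt injection in who makes the attempt: here it is the user, writing directly in the conversation, who tries to get around the rules. If you offer a public LLM service, this means red teaming the system, logging conversations, and running a safety classifier as a second stage over the model's responses.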
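The cascade and logging steps can be sketched roughly as follows. This is a minimal illustration, not a production design: `guarded_reply`, `keyword_classifier`, `Verdict`, and `REFUSAL` are hypothetical names, the keyword matcher is only a stand-in for a real safety classifier, and `generate` is assumed to be any function that takes a prompt and returns the model's text.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("llm_service")

# Hypothetical fallback message returned when a response is blocked.
REFUSAL = "Sorry, I can't help with that."


@dataclass
class Verdict:
    flagged: bool
    reason: str = ""


def keyword_classifier(text: str) -> Verdict:
    """Toy stand-in for a real safety classifier (assumption: phrase matching)."""
    blocked_phrases = ("build a bomb", "steal credentials")
    lowered = text.lower()
    for phrase in blocked_phrases:
        if phrase in lowered:
            return Verdict(True, f"matched {phrase!r}")
    return Verdict(False)


def guarded_reply(
    prompt: str,
    generate: Callable[[str], str],
    classify: Callable[[str], Verdict] = keyword_classifier,
) -> str:
    """Generate a draft, classify it, log the exchange, then return or refuse."""
    draft = generate(prompt)
    verdict = classify(draft)
    # Logging every exchange keeps a record for later review and red teaming.
    logger.info(
        "prompt=%r flagged=%s reason=%s", prompt, verdict.flagged, verdict.reason
    )
    return REFUSAL if verdict.flagged else draft
```

Because the classifier runs on the response rather than the prompt, a role-play framing that gets past the model's own rules can still be caught at the second stage. In a real service, `classify` would call a trained classifier instead of matching phrases.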
