AImpact
Safety · Intermediate · Also known as: Classificatore di sicurezza (Italian), Content filter

Safety classifier

A separate model that analyzes the input or output of an LLM to catch unsafe, violent, illegal, or off-policy content before it reaches the user.


In practice

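It works as a safety net placed in cascade with the main model: if the main model slips, the classifier catches the problem and blocks the response. OpenAI's Moderation endpoint and Meta's Llama Guard are freely available examples. For public-facing services, having one is close to mandatory.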
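
The cascade can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_classifier`, `BLOCKLIST`, `Verdict`, `REFUSAL`, and `guarded_reply` are hypothetical names invented for this example, and the keyword match stands in for a real classifier model such as the ones named above. A production setup would call an actual moderation model instead.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    flagged: bool
    reason: str = ""


# Hypothetical stand-in for a real safety classifier: a simple phrase match.
BLOCKLIST = ("build a bomb", "steal a password")

REFUSAL = "Sorry, I can't help with that."


def toy_classifier(text: str) -> Verdict:
    """Flag the text if it contains any blocklisted phrase (case-insensitive)."""
    lowered = text.lower()
    for phrase in BLOCKLIST:
        if phrase in lowered:
            return Verdict(True, phrase)
    return Verdict(False)


def guarded_reply(
    prompt: str,
    generate: Callable[[str], str],
    classify: Callable[[str], Verdict] = toy_classifier,
) -> str:
    """Run the classifier on the input, then on the main model's output."""
    # Input check: stop before the main model is even called.
    if classify(prompt).flagged:
        return REFUSAL
    answer = generate(prompt)
    # Output check: catch the case where the main model slips.
    if classify(answer).flagged:
        return REFUSAL
    return answer
```

Here `generate` is whatever function calls the main LLM. Checking both the prompt and the answer mirrors the "input or output" wording of the definition; some deployments only check one side.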
