Safety Beginner Also known as: Test avversariale

Red teaming

A practice where a team actively tries to attack a model or an AI system, looking for jailbreaks, security holes, and dangerous uses, in order to find them before release.

ShareLinkedIn X

In practice

AI labs do it in-house and with external experts before shipping a model. If you put AI in production do the same on your product: ask colleagues to break it before customers do. Even one rough hour beats the first public bug.

Seen in the wild

4 entries mentioning it

August 11, 2024

Promptfoo Red Teaming: open source automated red-teaming with CI integration and comparative benchmark

Medium
January 12, 2024

Garak: the open source vulnerability scanner for LLMs

Medium
October 16, 2023

MITRE ATLAS v2: the AI attack taxonomy updated with real case studies

High
July 6, 2022

Red Teaming LLMs with LLMs: the DeepMind paper that changed safety testing

High

← All terms

In practice

Related terms

Seen in the wild