In practice
Extremely hard to detect with standard evaluations: the model looks aligned until someone types the keyword. It affects both proprietary models (insiders) and open-weights downloaded from untrusted sources.
Related terms
Seen in the wild
0 entries mentioning itNo archive entry mentions it explicitly. Appears in broader contexts.