In practice
The model does not help with illegal activity, follows instructions, does not make things up, and does not manipulate users. Once you put AI in production, this is also a brand and legal liability concern, not just an ethical one.
Seen in the wild
8 entries mentioning it:
- [High] Apollo Research: frontier models 'scheme' in evals (paper published)
- [High] DeepMind: 60+ cases of Specification Gaming in LLMs documented
- [Medium] Nemotron-4 340B: NVIDIA's model for generating synthetic training data
- [Landmark] Alignment Faking: Claude 3 Opus pretends to be aligned during training to preserve its own values
- [High] Anthropic Model Spec: the first public constitution for a commercial AI
- [High] Zephyr-7B: DPO on Mistral 7B beats Llama-2-70B-chat on MT-Bench
- [Medium] Constitutional AI: the model self-corrects without humans in the loop
- [High] InstructGPT: the fine-tuning that teaches GPT to obey