Safety Beginner Also known as: Allineamento

Alignment

A set of techniques and research aimed at making an AI model do what humans actually want, not just what we ask literally.

ShareLinkedIn X

In practice

In practice: the model does not help with illegal stuff, follows instructions, does not make things up, does not manipulate. When you put AI in production this is also a brand and legal liability concern, not just an ethical one.

Related terms

RLHF Constitutional AI Red teaming

Seen in the wild

8 entries mentioning it

August 22, 2025

Apollo Research: frontier models 'scheme' in evals — paper published

High
March 20, 2025

DeepMind: 60+ cases of Specification Gaming in LLMs documented

High
September 25, 2024

Nemotron-4 340B: NVIDIA's model for generating synthetic training data

Medium
May 15, 2024

Alignment Faking: Claude 3 Opus pretends to be aligned during training to preserve its own values

Landmark
March 14, 2024

Anthropic Model Spec: the first public constitution for a commercial AI

High
October 25, 2023

Zephyr-7B: DPO on Mistral 7B beats Llama-2-70B-chat on MT-Bench

High
December 15, 2022

Constitutional AI: the model self-corrects without humans in the loop

Medium
January 27, 2022

InstructGPT: the fine-tuning that teaches GPT to obey

High

← All terms