AImpact
Training · Intermediate · Also known as: Reinforcement Learning from AI Feedback

RLAIF

/ar-el-ay-eye-ef/

A variant of RLHF (Reinforcement Learning from Human Feedback) in which the preference judgments over model responses come from another AI model rather than from human annotators, cutting cost and time compared to manual annotation.
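The labeling step described above can be sketched roughly as follows. This is a minimal illustration, not any particular lab's pipeline: `PreferencePair`, `label_with_ai_judge`, and `toy_judge` are hypothetical names, and the word-overlap judge is only a stand-in for a real judge model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PreferencePair:
    """One preference record, the same shape a human annotator would produce."""
    prompt: str
    chosen: str
    rejected: str


def label_with_ai_judge(
    prompt: str,
    response_a: str,
    response_b: str,
    judge: Callable[[str, str], float],
) -> PreferencePair:
    """Ask the judge to score both responses and keep the higher-scoring one."""
    score_a = judge(prompt, response_a)
    score_b = judge(prompt, response_b)
    if score_a >= score_b:  # ties go to response_a
        return PreferencePair(prompt, chosen=response_a, rejected=response_b)
    return PreferencePair(prompt, chosen=response_b, rejected=response_a)


def toy_judge(prompt: str, response: str) -> float:
    """Hypothetical stand-in for a judge model: counts words shared with the prompt."""
    prompt_words = set(prompt.lower().split())
    response_words = set(response.lower().split())
    return float(len(prompt_words & response_words))


pair = label_with_ai_judge(
    "What is the capital of France?",
    "I don't know.",
    "The capital of France is Paris.",
    toy_judge,
)
print(pair.chosen)
```

In a real setup the judge would be a language model prompted with evaluation criteria, and the resulting pairs would feed the same reward-model or preference-optimization step that RLHF uses.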


In practice

It lets alignment training scale to far more preference data than human annotators could label. Anthropic uses it to train Claude as part of Constitutional AI. The main risk is that the trained model inherits and amplifies the judge model's biases, so human oversight is still needed.
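One simple form of that oversight is to have humans re-label a small sample and measure how often the judge agrees with them. The sketch below assumes each label is just the identifier of the preferred response; `agreement_rate` is a hypothetical helper, not part of any library.

```python
from typing import Sequence


def agreement_rate(judge_labels: Sequence[str], human_labels: Sequence[str]) -> float:
    """Fraction of audited examples where the AI judge matches the human label."""
    if len(judge_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not judge_labels:
        raise ValueError("need at least one audited example")
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(judge_labels)


# Four audited comparisons; the judge and the human disagree on the third.
judge_sample = ["a", "b", "a", "a"]
human_sample = ["a", "b", "b", "a"]
rate = agreement_rate(judge_sample, human_sample)
print(rate)
```

A low agreement rate on the audited sample suggests the judge's preferences are drifting from human ones and its labels should be reviewed before further training.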
