Training Intermediate Also known as: Reinforcement Learning from AI Feedback

RLAIF

/ar-el-ay-eye-ef/

A variant of RLHF where responses are judged not by a human but by another AI model, cutting cost and time compared to manual annotation.

ShareLinkedIn X

In practice

It lets alignment training scale to much larger volumes. Anthropic uses it for Claude together with Constitutional AI. The risk is amplifying the judge model's biases, so human oversight is still needed.

Related terms

RLHF Constitutional AI Alignment

Seen in the wild

1 entries mentioning it

December 15, 2022

Constitutional AI: the model self-corrects without humans in the loop

Medium

← All terms