Skip to content
AImpact
IT EN
Training Intermediate Also known as: Reinforcement Learning from Human Feedback

RLHF

/ar-el-aitch-ef/

A training technique where humans rate and rank model responses, and these preferences are used to steer learning toward more helpful and safer answers.

ShareLinkedInX

In practice

It is the step that turned ChatGPT into something useful versus a raw predictive model. For API users RLHF has already been done by the provider. Knowing about it explains why more 'aligned' models sometimes refuse legitimate requests.

Related terms

← All terms