In practice
It is the step that turned ChatGPT from a raw next-word predictor into a useful assistant. For API users, the provider has already done RLHF before the model is served. Knowing about it helps explain why more heavily 'aligned' models sometimes refuse legitimate requests.