In practice
It lets alignment training scale to much larger volumes of feedback than human labeling alone can supply. Anthropic uses it for Claude together with Constitutional AI. The risk is that it amplifies the judge model's biases, so human oversight is still needed.