In practice
It makes evaluation dramatically faster than relying on human judges, but it suffers from systematic biases: it tends to prefer longer answers and answers written in its own style. Its verdicts must therefore be calibrated against a subset of human judgments that serves as an anchor.
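One way to carry out that calibration is sketched below. It assumes a pairwise setup in which both the automated judge and human annotators label each comparison "A" or "B" on a shared anchor subset. The function names, the label scheme, and the sample data are illustrative assumptions, not part of any specific tool. The sketch reports raw agreement, Cohen's kappa (agreement corrected for chance), and how often the chosen answer is also the longer one.

```python
from collections import Counter


def agreement_rate(judge, human):
    """Fraction of anchor items where the judge and the human pick the same label."""
    if len(judge) != len(human) or not judge:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(j == h for j, h in zip(judge, human)) / len(judge)


def cohens_kappa(judge, human):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(judge)
    p_o = agreement_rate(judge, human)
    judge_counts, human_counts = Counter(judge), Counter(human)
    labels = set(judge_counts) | set(human_counts)
    # Expected agreement if both raters labelled independently
    # with their observed label frequencies.
    p_e = sum(judge_counts[l] * human_counts[l] for l in labels) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters always use the same single label
    return (p_o - p_e) / (1 - p_e)


def longer_answer_win_rate(verdicts, len_a, len_b):
    """Share of pairs (with unequal lengths) where the chosen answer is the longer one."""
    wins = total = 0
    for verdict, a, b in zip(verdicts, len_a, len_b):
        if a == b:
            continue  # equal lengths carry no signal about length preference
        total += 1
        longer = "A" if a > b else "B"
        wins += verdict == longer
    if total == 0:
        raise ValueError("no pairs with differing lengths")
    return wins / total


# Hypothetical anchor subset: six pairwise comparisons.
judge_labels = ["A", "A", "B", "A", "B", "A"]
human_labels = ["A", "B", "B", "A", "B", "B"]
lengths_a = [120, 300, 80, 200, 90, 250]   # answer lengths, e.g. in tokens
lengths_b = [100, 150, 160, 180, 200, 100]

if __name__ == "__main__":
    print("agreement:", agreement_rate(judge_labels, human_labels))
    print("kappa:", cohens_kappa(judge_labels, human_labels))
    print("judge picks longer:", longer_answer_win_rate(judge_labels, lengths_a, lengths_b))
    print("human picks longer:", longer_answer_win_rate(human_labels, lengths_a, lengths_b))
```

If the judge picks the longer answer noticeably more often than the human anchors do on the same items, that gap is a rough signal of length bias. A low kappa suggests the judge's verdicts should not yet be trusted without further calibration.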
Related terms
Seen in the wild
0 entries mentioning it
No archive entry mentions it explicitly. It appears only in broader contexts.