Inference Intermediate Also known as: LLM judge · Model-graded eval

LLM-as-judge

/el-el-em as judge/

A technique that uses an LLM (usually a strong one) to score answers, either another model's or its own, against criteria written in natural language.
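As a minimal sketch of the idea, the snippet below builds a grading prompt from natural-language criteria and parses a numeric score out of the judge's reply. Everything named here is an assumption made for illustration: the prompt wording, the 1 to 5 scale, the "Score: N" reply format, and the `call_judge` callable, which stands in for whatever model client is actually used. `fake_judge` is a stub so the example runs without a model.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt wording; real rubrics are usually more detailed.
PROMPT_TEMPLATE = (
    "You are grading an answer.\n"
    "Criteria: {criteria}\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    'Reply with a line of the form "Score: N", where N is an integer from 1 to 5.'
)


def build_judge_prompt(question: str, answer: str, criteria: str) -> str:
    """Fill the grading template with the item to be judged."""
    return PROMPT_TEMPLATE.format(
        criteria=criteria, question=question, answer=answer
    )


def parse_score(reply: str, low: int = 1, high: int = 5) -> Optional[int]:
    """Extract the score, or return None if it is missing or out of range."""
    match = re.search(r"Score:\s*(\d+)", reply)
    if match is None:
        return None
    score = int(match.group(1))
    return score if low <= score <= high else None


def grade(
    question: str,
    answer: str,
    criteria: str,
    call_judge: Callable[[str], str],
) -> Optional[int]:
    """Send the prompt to the judge (any str -> str callable) and parse the reply."""
    return parse_score(call_judge(build_judge_prompt(question, answer, criteria)))


def fake_judge(prompt: str) -> str:
    """Stub standing in for a real model call (assumption, not a real API)."""
    return "The answer is correct and concise.\nScore: 4"


result = grade(
    question="What is the capital of France?",
    answer="Paris.",
    criteria="Factually correct and concise.",
    call_judge=fake_judge,
)
print(result)
```

Returning `None` for an unparseable or out-of-range reply, rather than guessing a score, keeps malformed judge outputs visible so they can be counted or retried.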


In practice

It speeds up evaluation dramatically compared to human judges, but it suffers from biases: the judge tends to prefer longer answers and answers written in its own style. It must be calibrated against a subset of human judgments used as an anchor.
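One possible way to run that calibration, sketched under assumptions: collect the judge's labels and human labels for the same anchor subset, then measure raw agreement and Cohen's kappa (agreement corrected for chance). The choice of Cohen's kappa, the function names, and the sample labels below are illustrative, not something this entry prescribes.

```python
from collections import Counter
from typing import Sequence


def agreement_rate(judge: Sequence[str], human: Sequence[str]) -> float:
    """Fraction of anchor items where judge and human gave the same label."""
    if len(judge) != len(human) or not judge:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(j == h for j, h in zip(judge, human)) / len(judge)


def cohen_kappa(judge: Sequence[str], human: Sequence[str]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    observed = agreement_rate(judge, human)
    n = len(judge)
    judge_counts = Counter(judge)
    human_counts = Counter(human)
    # Chance agreement: probability both raters pick the same label
    # if each labels independently according to its own label frequencies.
    expected = sum(
        judge_counts[label] * human_counts[label]
        for label in set(judge_counts) | set(human_counts)
    ) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # so treat it as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up anchor subset, for illustration only.
judge_labels = ["pass", "pass", "fail", "fail"]
human_labels = ["pass", "fail", "fail", "fail"]

print(agreement_rate(judge_labels, human_labels))  # 0.75
print(cohen_kappa(judge_labels, human_labels))     # 0.5
```

A kappa well below the raw agreement rate, as in this toy data, suggests that part of the apparent agreement comes from skewed label frequencies rather than from the judge tracking human judgment.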

Related terms

RLAIF · Constitutional AI · Alignment

