Inference Intermediate
Also known as: Graduate-Level Google-Proof Q&A

GPQA

/jee-pee-kew-ay/

A benchmark of 448 multiple-choice questions written by PhD-level experts in biology, physics, and chemistry, designed to stay hard even for someone with full Google access.


In practice

It is increasingly used in place of MMLU as a gauge of deep scientific knowledge. Domain-expert humans score around 65%; frontier models in 2025 exceed 70%. It is one of the benchmarks that has not yet saturated.
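The percentages quoted for GPQA are plain accuracies on multiple-choice questions. A minimal sketch of that calculation follows, assuming four answer options per question and answers recorded as option letters. The function name `accuracy` and the toy data are hypothetical and are not real GPQA items or an official scoring script.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the fraction of questions whose predicted option matches the key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Hypothetical toy data: five four-option questions (not real GPQA items).
answers = ["A", "C", "B", "D", "A"]
predictions = ["A", "C", "D", "D", "B"]

score = accuracy(predictions, answers)
chance = 1 / 4  # expected accuracy of random guessing over four options

print(f"accuracy={score:.2f}, chance={chance:.2f}")
```

Under the four-option assumption, random guessing lands near 25%, which is the floor against which the human and model scores above are usually read.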

Related terms

MMLU, Reasoning model, Frontier model

