Inference Intermediate Also known as: Massive Multitask Language Understanding

MMLU

/em-em-el-you/

A benchmark of about 16,000 four-option multiple-choice questions across 57 subjects, from math and law to medicine, used to measure an LLM's general knowledge.


In practice

For years it was the headline benchmark cited in new model announcements. Today it is saturated: frontier models score above 85%, and the field is moving to harder benchmarks like MMLU-Pro and GPQA.
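A score such as "above 85%" is an accuracy: the share of questions where the model's chosen option matches the answer key. The sketch below is a minimal illustration, not an official evaluation harness. The `results` records and the `score` function are hypothetical, and the predictions are made-up toy data. It computes both the overall accuracy and the per-subject average, since reports may use either.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, gold answer letter, model's predicted letter).
# Real evaluations cover thousands of questions; these rows are invented.
results = [
    ("college_mathematics", "B", "B"),
    ("college_mathematics", "D", "A"),
    ("professional_law", "C", "C"),
    ("professional_law", "A", "A"),
    ("clinical_knowledge", "D", "D"),
]


def score(results):
    """Return (micro accuracy, macro accuracy, per-subject accuracy)."""
    per_subject = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
    for subject, gold, pred in results:
        per_subject[subject][0] += int(gold == pred)
        per_subject[subject][1] += 1

    subject_acc = {s: c / t for s, (c, t) in per_subject.items()}

    # Micro average: every question counts equally.
    total_correct = sum(c for c, _ in per_subject.values())
    total_items = sum(t for _, t in per_subject.values())
    micro = total_correct / total_items

    # Macro average: every subject counts equally.
    macro = sum(subject_acc.values()) / len(subject_acc)
    return micro, macro, subject_acc


micro, macro, subject_acc = score(results)
print(f"micro accuracy: {micro:.3f}")
print(f"macro accuracy: {macro:.3f}")
```

With four options per question, random guessing lands near 25%, which is the floor any reported score should be read against.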

Related terms

GPQA, HELM, Foundation model

Seen in the wild

1 entry mentioning it

December 6, 2023

Google Gemini 1.0: natively multimodal in three sizes

Landmark
