Article (Scientific journals)
A benchmark of expert-level academic questions to assess AI capabilities.
Center for AI Safety; Scale AI; HLE Contributors Consortium et al.
2026, In Nature, 649 (8099), p. 1139-1146
 

Files


Full Text
s41586-025-09962-4.pdf
Author postprint (3.47 MB)



Details



Keywords :
Humans; Benchmarking/methods; Benchmarking/standards; Artificial Intelligence/standards; Language; Educational Measurement/methods; Educational Measurement/standards; Multidisciplinary; LLM; AI
Abstract :
[en] Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve more than 90% accuracy on popular benchmarks such as Measuring Massive Multitask Language Understanding [1], limiting informed measurement of state-of-the-art LLM capabilities. Here, in response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be an expert-level closed-ended academic benchmark with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable but cannot be quickly answered by internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a marked gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
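The abstract reports both accuracy and calibration for state-of-the-art LLMs on HLE. As a minimal sketch of how such metrics can be computed, assuming each automatically graded item carries a correctness flag and a self-reported model confidence, the Python below shows per-item accuracy and a binned RMS calibration error; the Result structure, field names and binning scheme are illustrative assumptions, not the paper's published evaluation code.

# Hypothetical sketch of accuracy and calibration scoring on HLE-style items.
from dataclasses import dataclass

@dataclass
class Result:
    correct: bool      # whether the graded answer matched the known solution
    confidence: float  # model's self-reported confidence in [0, 1]

def accuracy(results: list[Result]) -> float:
    # Fraction of items answered correctly.
    return sum(r.correct for r in results) / len(results)

def rms_calibration_error(results: list[Result], n_bins: int = 10) -> float:
    # RMS gap between stated confidence and empirical accuracy,
    # computed over equal-width confidence bins (assumed scheme).
    bins: list[list[Result]] = [[] for _ in range(n_bins)]
    for r in results:
        idx = min(int(r.confidence * n_bins), n_bins - 1)
        bins[idx].append(r)
    total, sq_err = len(results), 0.0
    for b in bins:
        if not b:
            continue
        conf = sum(r.confidence for r in b) / len(b)
        acc = sum(r.correct for r in b) / len(b)
        sq_err += (len(b) / total) * (conf - acc) ** 2
    return sq_err ** 0.5

A well-calibrated but weak model would score low on accuracy yet low on calibration error; the abstract's finding is that frontier LLMs do poorly on both measures for HLE.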
Disciplines :
Computer science
Author, co-author :
Center for AI Safety
Scale AI
HLE Contributors Consortium
KUCHKIN, Vladyslav  ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Physics and Materials Science (DPHYMS)
External co-authors :
no
Language :
English
Title :
A benchmark of expert-level academic questions to assess AI capabilities.
Publication date :
January 2026
Journal title :
Nature
ISSN :
0028-0836
eISSN :
1476-4687
Publisher :
Springer Science and Business Media LLC, England
Volume :
649
Issue :
8099
Pages :
1139 - 1146
Peer reviewed :
Peer Reviewed verified by ORBi
Available on ORBilu :
since 09 February 2026

Statistics

Number of views :
71 (1 by Unilu)
Number of downloads :
158 (0 by Unilu)
Scopus citations® :
2
Scopus citations® without self-citations :
2
OpenCitations :
0
OpenAlex citations :
1
