CAIS Dangerous Capabilities Evaluations: the standard framework for measuring dangerous LLM capabilities

In one sentence: The Center for AI Safety publishes a structured framework for evaluating dangerous LLM capabilities in CBRN, cyberoffense, and autonomy; it has been adopted by UK AISI and integrated into Anthropic's Responsible Scaling Policy.

How do you measure whether an AI model is dangerous enough to be stopped before release? Until a few years ago there was no methodological answer to this question.

The Center for AI Safety developed a structured evaluation framework for so-called dangerous capabilities: a model's ability to assist in synthesizing biological or chemical agents (part of the CBRN category: chemical, biological, radiological, and nuclear), to assist in offensive cyberattacks, and to operate autonomously toward self-assigned goals.

The framework defines specific benchmarks with risk thresholds, standardized test protocols, and a taxonomy of dangerous capabilities that enables comparisons between different models over time.
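As a rough illustration of how benchmarks with risk thresholds and a capability taxonomy might fit together, here is a minimal Python sketch. It is not the framework's actual implementation: the names `Domain`, `BenchmarkResult`, `RISK_THRESHOLDS`, and `flagged_domains`, along with every threshold value and score, are hypothetical and invented purely for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    # The three capability areas named in the text.
    CBRN = "cbrn"
    CYBEROFFENSE = "cyberoffense"
    AUTONOMY = "autonomy"


@dataclass(frozen=True)
class BenchmarkResult:
    model: str
    domain: Domain
    score: float  # assumed: fraction of benchmark tasks solved, in [0, 1]


# Hypothetical thresholds, invented for illustration only.
RISK_THRESHOLDS = {
    Domain.CBRN: 0.20,
    Domain.CYBEROFFENSE: 0.35,
    Domain.AUTONOMY: 0.50,
}


def exceeds_threshold(result: BenchmarkResult) -> bool:
    """Return True if the score meets or exceeds its domain's threshold."""
    return result.score >= RISK_THRESHOLDS[result.domain]


def flagged_domains(results: list[BenchmarkResult]) -> dict[str, list[str]]:
    """Map each model name to the sorted domains where it crosses a threshold."""
    flagged: dict[str, list[str]] = {}
    for r in results:
        flagged.setdefault(r.model, [])
        if exceeds_threshold(r):
            flagged[r.model].append(r.domain.value)
    return {model: sorted(domains) for model, domains in flagged.items()}


if __name__ == "__main__":
    # Made-up scores for two made-up models.
    results = [
        BenchmarkResult("model-a", Domain.CBRN, 0.10),
        BenchmarkResult("model-a", Domain.CYBEROFFENSE, 0.40),
        BenchmarkResult("model-b", Domain.AUTONOMY, 0.50),
    ]
    print(flagged_domains(results))
```

Keeping the taxonomy (`Domain`) and the thresholds in one shared table is what makes results comparable across models and over time: every model is scored against the same categories and the same cut-offs.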

This type of evaluation is now an integral part of the deployment process at major AI labs: Anthropic has integrated it into its Responsible Scaling Policy, and UK AISI uses it as the basis for frontier model evaluations.