CAIS Dangerous Capabilities Evaluations: the standard framework for measuring dangerous LLM capabilities
In one sentence: The Center for AI Safety (CAIS) publishes a structured framework for evaluating dangerous LLM capabilities in CBRN, cyberoffense, and autonomy, which has been adopted by UK AISI and integrated into Anthropic's Responsible Scaling Policy.
How do you measure whether an AI model is dangerous enough to be held back from release? Until a few years ago, there was no methodological answer to this question.
The Center for AI Safety developed a structured evaluation framework for so-called dangerous capabilities: a model's ability to provide uplift toward chemical, biological, radiological, or nuclear (CBRN) weapons, for example by helping synthesize biological or chemical agents; to assist in offensive cyberattacks; and to operate autonomously toward self-assigned goals.
The framework defines specific benchmarks with risk thresholds, standardized test protocols, and a taxonomy of dangerous capabilities, which together enable comparisons across models and over time.
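To make the idea of "benchmarks with risk thresholds" concrete, here is a minimal sketch, not the CAIS framework itself. The three domains come from the text above; the benchmark names, the 0-1 score scale, and the threshold values are hypothetical placeholders. Running the same scoring rule over every model is what makes the results comparable across models and releases.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    # Taxonomy mirroring the three domains named in the text.
    CBRN = "cbrn"
    CYBEROFFENSE = "cyberoffense"
    AUTONOMY = "autonomy"


@dataclass(frozen=True)
class Benchmark:
    name: str          # hypothetical benchmark identifier
    domain: Domain
    threshold: float   # hypothetical risk threshold on a 0-1 score scale


# Illustrative benchmarks and thresholds; these are NOT CAIS's actual values.
BENCHMARKS = [
    Benchmark("bio_protocol_qa", Domain.CBRN, 0.60),
    Benchmark("ctf_challenges", Domain.CYBEROFFENSE, 0.50),
    Benchmark("long_horizon_agent_tasks", Domain.AUTONOMY, 0.40),
]


def flagged_domains(scores: dict[str, float]) -> set[Domain]:
    """Return every domain where at least one benchmark score meets its threshold."""
    return {
        b.domain
        for b in BENCHMARKS
        if scores.get(b.name, 0.0) >= b.threshold
    }


def compare_models(history: dict[str, dict[str, float]]) -> dict[str, set[Domain]]:
    """Apply the same thresholds to each model so results are directly comparable."""
    return {model: flagged_domains(scores) for model, scores in history.items()}


if __name__ == "__main__":
    # Made-up scores for two hypothetical model versions.
    history = {
        "model-v1": {"bio_protocol_qa": 0.42, "ctf_challenges": 0.31,
                     "long_horizon_agent_tasks": 0.12},
        "model-v2": {"bio_protocol_qa": 0.65, "ctf_challenges": 0.38,
                     "long_horizon_agent_tasks": 0.22},
    }
    for model, domains in compare_models(history).items():
        print(model, sorted(d.value for d in domains))
    # prints:
    # model-v1 []
    # model-v2 ['cbrn']
```

A real protocol would also fix prompts, sampling settings, and elicitation effort, since scores are only comparable when those are held constant.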
This type of evaluation is now an integral part of the deployment process at major AI labs: Anthropic has integrated it into its Responsible Scaling Policy, and UK AISI uses it as the basis for frontier model evaluations.
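As a rough illustration of how such evaluations can sit inside a deployment process, the sketch below implements a simple release gate. It assumes a policy in which every crossed capability threshold must be matched by stronger safeguards before release; the `EvalReport` and `Safeguards` classes and the `deployment_decision` function are hypothetical and do not reproduce Anthropic's or UK AISI's actual procedures.

```python
from dataclasses import dataclass, field


@dataclass
class EvalReport:
    model: str
    # Domains (e.g. "cbrn") in which an evaluation crossed its risk threshold.
    crossed: set[str] = field(default_factory=set)


@dataclass
class Safeguards:
    # Domains for which stronger mitigations are already in place (hypothetical).
    covered: set[str] = field(default_factory=set)


def deployment_decision(report: EvalReport, safeguards: Safeguards) -> tuple[bool, set[str]]:
    """Allow release only if every crossed threshold is covered by a safeguard.

    Returns (allowed, uncovered_domains)."""
    uncovered = report.crossed - safeguards.covered
    return (not uncovered, uncovered)


if __name__ == "__main__":
    report = EvalReport("model-v2", {"cbrn"})
    print(deployment_decision(report, Safeguards()))          # (False, {'cbrn'})
    print(deployment_decision(report, Safeguards({"cbrn"})))  # (True, set())
```

Returning the uncovered domains alongside the boolean keeps the gate auditable: a blocked release states exactly which capability areas still need mitigations.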
Companies
CAIS, Anthropic, UK AISI
Tools
—
Tags
Sources