AImpact
Medium AI Security · 1 min read

Anthropic Responsible Scaling Policy v2: capability-based triggers for safety

In one sentence: Anthropic has updated its Responsible Scaling Policy, replacing compute thresholds with specific Capability Thresholds (biorisk, autonomy, cyber) that trigger formal safety measures.


Anthropic, the company behind Claude, is one of the few AI companies that publicly explain how they decide whether a model is "too dangerous" to release. Its framework for this is called the Responsible Scaling Policy (RSP).

The first version (2023) used training compute as a proxy for risk: bigger meant riskier. That works poorly, because a small but specialized model can be as dangerous as a large one.

The new version flips the approach: it evaluates what the model can actually do. For example, if the model can help someone synthesize serious pathogens, a higher safety level is triggered, bringing external audits, restrictions and additional mitigations. What triggers these measures is the capability, not the model's size.
