UK AISI: the first government safety evaluations of GPT-4o and Claude 3.5

In one sentence: The UK government's AI Safety Institute publishes the first independent safety evaluation results for GPT-4o and Claude 3.5 Sonnet, using the WMDP benchmark, in the first governmental audit of frontier models.

Until 2024, safety evaluations of large AI models were conducted only by the same companies that developed them. The British government created the AI Safety Institute to perform these evaluations independently, as an external oversight body.

The published results cover OpenAI's GPT-4o and Anthropic's Claude 3.5 Sonnet, tested for capabilities in three areas: weapons of mass destruction (using the WMDP benchmark), cyber offense, and psychological manipulation. The scores are compared with the companies' own internal evaluations.

It is a historic moment: for the first time, a government body has pre-release access to frontier models and publishes independent evaluations of them. This sets a precedent for technical AI regulation at the global level.