OpenAI Safety Evaluations Hub: public dashboard for tracking model safety over time
In one sentence: OpenAI has launched a public dashboard with comparative safety scores for each model version, using standardized evals for CBRN, cyberoffense, and persuasion, with comparisons across GPT-4o, o1, o3, and previous versions.
Large language models have typically been released with safety evaluations described in prose in technical reports, which makes them difficult to compare across versions or across companies. OpenAI decided to change this with a public, quantitative dashboard.
The OpenAI Safety Evaluations Hub publishes numerical scores for the main safety dimensions of each model version: resistance to CBRN misuse (chemical, biological, radiological, and nuclear weapons), cyberoffense (assistance with cyberattacks), and persuasion (the ability to influence opinions in a manipulative way).
The most important feature is longitudinal comparability: readers can see how scores change from GPT-4 to GPT-4o to o1 to o3, because the same measurement system is applied consistently over time.
This creates documented, public pressure: if a later version shows worse safety scores than the previous one, the fact is verifiable by anyone.
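The kind of check this enables can be sketched in a few lines of Python. This is a minimal illustration, not the Hub's actual method or data format: the model names, the score values, the function `find_regressions`, and the convention that a higher score means a safer model are all assumptions made up for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical placeholder data, NOT real scores from the Hub.
# Assumption: higher score = safer model on that dimension.
VERSIONS: List[str] = ["model-a", "model-b", "model-c"]  # release order
SCORES: Dict[str, Dict[str, float]] = {
    "model-a": {"cbrn": 0.90, "cyberoffense": 0.85, "persuasion": 0.80},
    "model-b": {"cbrn": 0.92, "cyberoffense": 0.80, "persuasion": 0.83},
    "model-c": {"cbrn": 0.95, "cyberoffense": 0.88, "persuasion": 0.83},
}


def find_regressions(
    versions: List[str],
    scores: Dict[str, Dict[str, float]],
    tolerance: float = 0.0,
) -> List[Tuple[str, str, str, float]]:
    """Return (previous, current, dimension, delta) for every drop in score.

    A drop counts only if it exceeds `tolerance`, so small measurement
    noise can be ignored if desired.
    """
    regressions = []
    # Compare each version with the one released immediately before it.
    for prev, curr in zip(versions, versions[1:]):
        for dim, prev_score in scores[prev].items():
            curr_score = scores[curr].get(dim)
            if curr_score is None:
                continue  # dimension not reported for the newer version
            if curr_score < prev_score - tolerance:
                delta = round(curr_score - prev_score, 4)
                regressions.append((prev, curr, dim, delta))
    return regressions


if __name__ == "__main__":
    for prev, curr, dim, delta in find_regressions(VERSIONS, SCORES):
        print(f"{dim}: {prev} -> {curr} changed by {delta}")
```

With the placeholder data above, the only flagged pair is the `cyberoffense` drop from `model-a` to `model-b`. The comparison is meaningful only because every version is scored with the same evals, which is the property the dashboard is meant to guarantee.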
Companies
OpenAI
Tools
OpenAI Safety Evaluations Hub