CyberSecEval 2: Meta's LLM cybersecurity benchmark
In one sentence Meta publishes CyberSecEval 2: 7000+ test cases for evaluating LLM security across insecure code generation, cyberattack assistance, prompt injection, and vulnerability exploitation. Enables quantitative comparison of security posture across models.
How do you measure whether an AI model is more or less dangerous from a cybersecurity standpoint? Simply asking "write malware" and seeing if it refuses is not enough. The real risks are subtler: does the model help find vulnerabilities in software? Does it generate code with security flaws without warning you? Can it be tricked into executing dangerous commands?
Meta developed CyberSecEval to answer these questions systematically. The second version, published in 2024, covers over 7,000 test scenarios across multiple security areas: insecure code generation (buffer overflows, SQL injection, weak authentication), cyberattack assistance, prompt injection resistance, and ability to exploit known vulnerabilities.
The benchmark was used to test Meta's own Llama models but is fully open-source: anyone can use it to evaluate any model. Results showed significant differences between models in the likelihood of generating code with security flaws — a concrete risk for the millions of developers using AI to write code.
CyberSecEval became part of Meta's Purple Llama project, an ecosystem of open-source tools for responsible AI model safety.
Companies
Meta
Tools
—
Tags
Sources