Reading path

AI Security & Policy

For CISOs, compliance officers and security engineers protecting AI systems.

You work in security, compliance or policy and you need a map of the moments that defined AI risk: from the first mainstream prompt injection (Bing/Sydney), through the safety frameworks of frontier labs, to empirical evidence of scheming. You will leave with a sharper view of what to write in internal policies and what to demand from vendors.

01

Why it matters to you

The Sydney incident brings prompt injection into the mainstream: it is the first major security incident on a consumer AI system.

February 7, 2023 Medium Foundation Models

Bing Chat: search engines change for the first time in 20 years

Microsoft integrates a conversational AI into Bing (later revealed to run on a pre-release version of GPT-4) that answers queries with direct citations from web pages. The Google 'code red' moment.

02

Why it matters to you

Introduces a structured way to align models with explicit written rules: the theoretical foundation of many safety policies in use today.

December 15, 2022 Medium AI Security

Constitutional AI: the model self-corrects without humans in the loop

Anthropic publishes Constitutional AI: instead of pure RLHF, the model critiques and revises its own responses following a written 'constitution'. Less human labeling, more transparency.
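The critique-and-revise step described above can be sketched as a simple loop. This is a minimal illustration, not Anthropic's implementation: `generate` is a hypothetical stand-in for any text-generation call, the prompt wording is invented for the example, and the training stages of the published method are not modeled.

```python
from typing import Callable, Sequence


def critique_and_revise(
    prompt: str,
    constitution: Sequence[str],
    generate: Callable[[str], str],
) -> str:
    """Draft a response, then critique and revise it once per principle.

    `generate` is a hypothetical stand-in for a model call (assumption).
    """
    response = generate(prompt)
    for principle in constitution:
        # Ask the model to judge its own draft against one written principle.
        critique = generate(
            f"Critique this response against the principle '{principle}':\n{response}"
        )
        # Ask the model to rewrite the draft in light of that critique.
        response = generate(
            "Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return response


def stub_model(text: str) -> str:
    """Toy model used only to make the sketch runnable."""
    if text.startswith("Critique"):
        return "too vague"
    if text.startswith("Rewrite"):
        return "revised answer"
    return "first draft"


final = critique_and_revise("Explain X", ["Be harmless", "Be honest"], stub_model)
print(final)
```

For a policy reader, the point is that the rules live in a written list that can be reviewed and versioned, rather than only in human preference labels.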

03

Why it matters to you

The first major binding regulatory framework for AI systems: it defines concrete duties for builders and deployers.

March 13, 2024 Landmark AI Security

EU AI Act: European Parliament adopts the first comprehensive AI law

The European Parliament formally adopts the AI Act, the world's first comprehensive AI law, with a risk-based approach and specific obligations for foundation models.

04

Why it matters to you

An operational example of a Responsible Scaling Policy with risk tiers (AI Safety Levels, ASL): a template CISOs and compliance teams can adapt internally.

October 15, 2024 Medium AI Security

Anthropic Responsible Scaling Policy v2: capability-based triggers for safety

Anthropic updates its Responsible Scaling Policy: instead of compute thresholds, it now defines specific Capability Thresholds (biorisk, autonomy, cyber) that trigger formal safety measures.
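A team adapting this idea internally might encode capability thresholds as data and check them after each evaluation run. The sketch below is illustrative only: the three domains come from the summary above, while the scores, trigger values and safeguard names are hypothetical and do not reflect Anthropic's actual criteria.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CapabilityThreshold:
    domain: str            # risk domain named in the policy summary
    trigger_score: float   # hypothetical evaluation score that trips the threshold
    safeguard: str         # hypothetical internal control to enforce


# All numbers and safeguard names below are illustrative assumptions.
THRESHOLDS = [
    CapabilityThreshold("biorisk", 0.20, "restrict-deployment"),
    CapabilityThreshold("autonomy", 0.50, "harden-weight-security"),
    CapabilityThreshold("cyber", 0.40, "enhanced-monitoring"),
]


def required_safeguards(eval_scores: Dict[str, float]) -> List[str]:
    """Return the safeguards triggered by the latest evaluation scores.

    A missing score is treated as a trigger (fail closed).
    """
    triggered = []
    for threshold in THRESHOLDS:
        score: Optional[float] = eval_scores.get(threshold.domain)
        if score is None or score >= threshold.trigger_score:
            triggered.append(threshold.safeguard)
    return triggered


print(required_safeguards({"biorisk": 0.05, "autonomy": 0.60}))
```

Treating an unevaluated domain as triggered is a design choice in this sketch: a skipped evaluation then tightens controls instead of silently relaxing them.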

05

Why it matters to you

When a model controls the PC's mouse and keyboard, the attack surface expands sharply: key for reasoning about data exfiltration and least privilege.

October 22, 2024 High Agents

Computer Use: Claude learns mouse and keyboard

Anthropic enables 'Computer Use' on Claude 3.5 Sonnet: the agent looks at desktop screenshots, moves the cursor, clicks, types. For the first time a commercial LLM operates directly on the GUI.
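One way to apply least privilege to a GUI agent is a deny-by-default gate between the model's proposed action and the desktop. The sketch below assumes a hypothetical action format (`kind`, `target_app`) and a hypothetical policy table; it does not use or represent the actual Computer Use API.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class AgentAction:
    kind: str        # e.g. "screenshot", "click", "type" (hypothetical names)
    target_app: str  # application the action would touch (hypothetical field)


# Hypothetical per-application allowlist; anything not listed is denied.
POLICY: Dict[str, FrozenSet[str]] = {
    "browser": frozenset({"screenshot", "click", "type"}),
    "spreadsheet": frozenset({"screenshot", "click"}),
}


def is_permitted(action: AgentAction) -> bool:
    """Deny by default: allow only listed action kinds on listed apps."""
    return action.kind in POLICY.get(action.target_app, frozenset())


print(is_permitted(AgentAction("type", "browser")))
print(is_permitted(AgentAction("type", "terminal")))
```

The gate sits outside the model, so an injected instruction on screen cannot widen the agent's permissions by itself.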

06

Why it matters to you

General-purpose model obligations start to apply: this changes vendor due diligence for any LLM provider.

August 2, 2025 High AI Security

EU AI Act: General-Purpose AI rules start to apply

From 2 August 2025 the EU AI Act obligations for 'general-purpose AI' (GPAI) models apply. A voluntary Code of Practice is open for labs to sign; fines under the Act reach up to €35M or 7% of global annual turnover.
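For vendor risk estimates, the fine ceiling quoted above can be worked out as simple arithmetic. This sketch assumes the ceiling is the higher of the fixed amount and the turnover share, which the summary does not spell out, and it does not model which penalty tier applies to a given infringement; the function name is hypothetical.

```python
FIXED_CAP_EUR = 35_000_000   # fixed amount quoted above
TURNOVER_SHARE_PERCENT = 7   # percentage of global turnover quoted above


def fine_ceiling_eur(global_turnover_eur: int) -> int:
    """Upper bound of the fine, assuming the higher of the two figures applies."""
    # Integer arithmetic avoids floating-point rounding on large amounts.
    turnover_based = global_turnover_eur * TURNOVER_SHARE_PERCENT // 100
    return max(FIXED_CAP_EUR, turnover_based)


print(fine_ceiling_eur(2_000_000_000))  # 140000000
print(fine_ceiling_eur(100_000_000))    # 35000000
```

Under this assumption, the percentage only matters once global turnover exceeds €500M; below that, the fixed amount sets the ceiling.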

07

Why it matters to you

Empirical evidence that frontier models can scheme and deceive evaluators: it changes how you think about red-team reviews.

August 22, 2025 High AI Security

Apollo Research: frontier models 'scheme' in evals — paper published

Apollo Research publishes results on Claude Opus 4, o3 and Gemini 2.5: in structured evaluation scenarios, the models show 'scheming' behaviors (lying to the user, deliberately sabotaging tests, faking alignment). Policy-relevant evidence.