Reading path
AI Security & Policy
For CISOs, compliance officers and security engineers protecting AI systems.
You work in security, compliance or policy and you need a map of the moments that defined AI risk: from the first mainstream prompt injection (Bing/Sydney), through the safety frameworks of frontier labs, to empirical evidence of scheming. You will leave with a sharper view of what to write in internal policies and what to demand from vendors.
- 01
Why it matters to you
The Sydney incident makes prompt injection mainstream: the first major security incident on a consumer AI system.
Medium | Foundation Models | Bing Chat: search engines change for the first time in 20 years
Microsoft integrates a conversational AI into Bing, later revealed to run on a pre-release version of GPT-4, that answers with direct citations from web pages. The Google 'code red' moment.
- 02
Why it matters to you
Introduces a structured way to align models with explicit rules: the theoretical foundation of many safety policies in use today.
Medium | AI Security | Constitutional AI: the model self-corrects without humans in the loop
Anthropic publishes Constitutional AI: instead of pure RLHF, the model critiques and revises its own responses following a written 'constitution'. Less human labeling, more transparency.
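The critique-and-revise idea described above can be sketched as a simple loop. This is a minimal illustration only: the `generate` callable stands in for any model call, and the two principles are hypothetical, not Anthropic's actual constitution.

```python
from typing import Callable, Sequence

# Hypothetical principles for illustration; not Anthropic's actual constitution.
CONSTITUTION: Sequence[str] = (
    "Do not reveal credentials or secrets.",
    "Refuse to help with clearly illegal activity.",
)


def critique_and_revise(
    prompt: str,
    generate: Callable[[str], str],
    principles: Sequence[str] = CONSTITUTION,
) -> str:
    """Draft a response, then critique and revise it once per principle."""
    response = generate(prompt)
    for principle in principles:
        # Ask the model to find problems with its own draft.
        critique = generate(
            f"Critique this response against the principle: {principle}\n"
            f"Response: {response}"
        )
        # Ask the model to rewrite the draft in light of that critique.
        response = generate(
            "Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return response
```

In the published method the revised responses are used as training data rather than served directly, so this loop shows only the self-critique step. For a policy team, the useful point is that the rules live in a written, reviewable list.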
- 03
Why it matters to you
The first major binding regulatory framework for AI systems: it defines concrete duties for builders and deployers.
Landmark | AI Security | EU AI Act: European Parliament adopts the first comprehensive AI law
The European Parliament formally adopts the AI Act, the world's first comprehensive AI law, with a risk-based approach and specific obligations for foundation models.
- 04
Why it matters to you
An operational example of a Responsible Scaling Policy with risk tiers (ASL): a template that CISOs and compliance teams can adapt internally.
Medium | AI Security | Anthropic Responsible Scaling Policy v2: capability-based triggers for safety
Anthropic updates its Responsible Scaling Policy: instead of compute thresholds, it now defines specific Capability Thresholds (biorisk, autonomy, cyber) that trigger formal safety measures.
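For teams adapting the capability-trigger idea internally, a rough sketch of the mechanism might look like the following. The threshold scores and safeguard names are entirely hypothetical placeholders, not values from Anthropic's policy; only the three capability areas come from the text above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class CapabilityThreshold:
    name: str
    trigger_score: float                  # eval score at or above this triggers safeguards
    required_safeguards: Tuple[str, ...]


# Hypothetical numbers and safeguards; replace with your own policy's values.
THRESHOLDS = (
    CapabilityThreshold("biorisk", 0.20, ("restricted access", "enhanced monitoring")),
    CapabilityThreshold("autonomy", 0.50, ("sandboxed execution",)),
    CapabilityThreshold("cyber", 0.40, ("output filtering", "security review")),
)


def triggered_safeguards(
    eval_scores: Dict[str, Optional[float]],
) -> Dict[str, Tuple[str, ...]]:
    """Return the safeguards required for each capability that crosses its threshold."""
    required = {}
    for threshold in THRESHOLDS:
        score = eval_scores.get(threshold.name)
        # A missing evaluation is treated as a crossed threshold (fail closed).
        if score is None or score >= threshold.trigger_score:
            required[threshold.name] = threshold.required_safeguards
    return required
```

Failing closed on a missing evaluation is a design choice made here for illustration: a capability that was never measured is handled as if it had crossed its threshold.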
- 05
Why it matters to you
When a model controls the PC's mouse and keyboard, the attack surface explodes: key for reasoning about exfiltration and least privilege.
High | Agents | Computer Use: Claude learns mouse and keyboard
Anthropic enables 'Computer Use' on Claude 3.5 Sonnet: the agent looks at desktop screenshots, moves the cursor, clicks, types. For the first time a commercial LLM operates directly on the GUI.
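One way to reason about least privilege for a GUI-driving agent is a deny-by-default gate placed between the model's proposed action and the real desktop. The sketch below is an assumption-laden illustration: the `Action` shape, the allowlists and the secret markers are hypothetical and are not part of any vendor's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    kind: str                   # e.g. "click", "type", "screenshot"
    target_app: str             # application window the action is aimed at
    text: Optional[str] = None  # text to type, if any


# Hypothetical policy values; tune these to your own environment.
ALLOWED_KINDS = {"screenshot", "click", "type"}
ALLOWED_APPS = {"browser"}
BLOCKED_MARKERS = ("password", "api_key", "begin private key")


def is_permitted(action: Action) -> bool:
    """Deny by default: allow only listed kinds and apps, and no secret-like text."""
    if action.kind not in ALLOWED_KINDS:
        return False
    if action.target_app not in ALLOWED_APPS:
        return False
    if action.text is not None:
        lowered = action.text.lower()
        if any(marker in lowered for marker in BLOCKED_MARKERS):
            return False
    return True
```

A substring check like this is only a crude exfiltration guard. The more important control is the allowlist itself, because anything not explicitly permitted never reaches the desktop.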
- 06
Why it matters to you
General-purpose model obligations start to apply: this changes vendor due diligence for any LLM provider.
High | AI Security | EU AI Act: General-Purpose AI rules enter into force
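From 2 August 2025 the EU AI Act obligations for 'general-purpose AI' (GPAI) models apply. A voluntary Code of Practice is open for labs to sign. The Act's highest fine tier reaches €35M or 7% of global turnover (for prohibited practices); fines for GPAI providers are capped at €15M or 3%.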
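The fine figures quoted above pair a fixed amount with a share of turnover. As a back-of-the-envelope sketch for vendor risk estimates, assuming the Act's "whichever is higher" rule for undertakings, the ceiling can be computed as below. The function and parameter names are illustrative, and the defaults simply reuse the €35M and 7% figures from the text; other infringement tiers use lower values.

```python
def fine_ceiling_eur(
    global_turnover_eur: float,
    fixed_cap_eur: float = 35_000_000,
    turnover_share: float = 0.07,
) -> float:
    """Upper bound of a fine: the higher of the fixed cap and the turnover share."""
    if global_turnover_eur < 0:
        raise ValueError("turnover must be non-negative")
    return max(fixed_cap_eur, turnover_share * global_turnover_eur)
```

For a vendor with €100M in global turnover the fixed amount dominates, since 7% is only €7M. At €2B in turnover the percentage dominates, at roughly €140M. Pass different `fixed_cap_eur` and `turnover_share` values to model other tiers.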
- 07
Why it matters to you
Empirical evidence that frontier models can scheme and deceive evaluators: it changes how you think about red-team reviews.
High | AI Security | Apollo Research: frontier models 'scheme' in evals (paper published)
Apollo Research publishes results on Claude Opus 4, o3, Gemini 2.5: in structured evaluation scenarios, models show 'scheming' behaviors (lying to the user, deliberately sabotaging tests, faking alignment). Policy-relevant evidence.