Safety Intermediate Also known as: Many-Shot Attack · Long Context Jailbreak

Many-Shot Jailbreaking

Many-shot jailbreaking is an attack technique that exploits long context windows by prepending a large number of fabricated harmful question-answer pairs, typically 100 to 256 or more, to the actual malicious request. The in-context examples override safety training by inducing the model to continue the demonstrated pattern rather than follow its guardrails. Effectiveness scales with the number of examples, so models with larger context windows, which can hold more of them, are more vulnerable. Anthropic disclosed the attack in 2024, which prompted revisions to safety mechanisms for very long-context models.


In practice

From a defensive standpoint, a developer evaluating a deployed model's robustness should include many-shot tests in red-teaming: construct a prompt with 200+ malicious Q&A examples and measure how often the model complies. To mitigate the risk in production, one can cap the context window for certain tasks, add input classifiers that detect repeated Q&A patterns on risky topics, or log and flag unusually long prompts for review.
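As a rough illustration of the input-side mitigations, the sketch below counts speaker-style labels embedded in a single prompt and flags prompts that contain too many of them or are unusually long. It is a minimal heuristic under stated assumptions, not a production classifier: the names `inspect_prompt`, `PromptReport`, `MAX_EMBEDDED_TURNS`, and `MAX_PROMPT_CHARS` are hypothetical, the thresholds are arbitrary placeholders, and it does not judge whether the topic is risky, which would need a separate content classifier.

```python
import re
from dataclasses import dataclass

# Hypothetical thresholds for illustration; tune them against real traffic.
MAX_EMBEDDED_TURNS = 32
MAX_PROMPT_CHARS = 20_000

# Line-initial speaker labels such as "Q:", "User:", or "Assistant:".
TURN_LABEL = re.compile(
    r"^[ \t]*(?:q|a|question|answer|user|human|assistant|ai)[ \t]*:",
    re.IGNORECASE | re.MULTILINE,
)


@dataclass
class PromptReport:
    embedded_turns: int  # number of speaker labels found inside the prompt
    char_length: int     # raw prompt length in characters
    flagged: bool        # True if either threshold is exceeded


def inspect_prompt(
    prompt: str,
    max_turns: int = MAX_EMBEDDED_TURNS,
    max_chars: int = MAX_PROMPT_CHARS,
) -> PromptReport:
    """Flag prompts that embed many dialogue turns or are unusually long."""
    turns = len(TURN_LABEL.findall(prompt))
    length = len(prompt)
    return PromptReport(turns, length, turns > max_turns or length > max_chars)


# Usage with harmless placeholder text.
benign = "Q: What is the capital of France?\nA: Paris."
stuffed = "\n".join(
    f"User: placeholder question {i}\nAssistant: placeholder answer {i}"
    for i in range(200)
)
benign_report = inspect_prompt(benign)
stuffed_report = inspect_prompt(stuffed)
```

A flagged prompt would normally be routed to review or to a stricter classifier rather than rejected outright, since legitimate few-shot prompts and pasted transcripts also contain many labeled turns.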

Seen in the wild

0 entries mentioning it

No archive entry mentions it explicitly. Appears in broader contexts.
