AImpact
Medium AI Security · 1 min read

Rebuff: three-layer prompt injection defense with canary tokens

In one sentence: Rebuff is an open-source framework by ProtectAI that defends against prompt injection with three layers: fast heuristics, a semantic LLM check, and canary tokens that detect exfiltration.


Defending against prompt injection is hard because no perfect filter exists. Rebuff takes a layered approach: multiple defense layers with different characteristics, so bypassing one does not automatically mean bypassing all the others.

The first layer uses fast rules and heuristics to block the most common injection patterns with minimal latency. The second layer uses an LLM to evaluate semantically whether the text contains a manipulation attempt. The third layer inserts a secret "canary token" into the prompt: if that token appears in the model's output, the prompt's contents have leaked, which signals that an attack has successfully exfiltrated information from the context.
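To make the ordering of the first two layers concrete, here is a minimal sketch in Python. It is not Rebuff's API: the pattern list, the function names `heuristic_check` and `screen_input`, the `llm_judge` callback, and the 0.5 threshold are all assumptions chosen for illustration. The idea it shows is simply that the cheap check runs first and the slower LLM check runs only when the cheap one finds nothing.

```python
import re
from typing import Callable

# Hypothetical patterns for illustration only; not Rebuff's actual rule set.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(the\s+)?(previous|prior|above)\s+instructions", re.IGNORECASE),
    re.compile(r"disregard\s+(your|the)\s+(system\s+)?prompt", re.IGNORECASE),
    re.compile(r"reveal\s+(your|the)\s+(system\s+)?prompt", re.IGNORECASE),
]


def heuristic_check(user_input: str) -> bool:
    """Layer 1: return True if any known injection pattern matches."""
    return any(p.search(user_input) for p in SUSPICIOUS_PATTERNS)


def screen_input(
    user_input: str,
    llm_judge: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not a documented default
) -> str:
    """Run the cheap check first; call the LLM judge only if it passes.

    `llm_judge` is a caller-supplied function that returns a score in
    [0, 1], where higher means "more likely a manipulation attempt".
    """
    if heuristic_check(user_input):
        return "blocked_by_heuristics"
    if llm_judge(user_input) >= threshold:
        return "blocked_by_llm_check"
    return "allowed"
```

In a real deployment `llm_judge` would wrap a call to a model; in tests it can be a stub such as `lambda text: 0.0`. Keeping the layers as separate functions means an input that slips past the regular expressions still has to get past the semantic check.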

This third layer is particularly interesting because it does not try to prevent the attack. Instead it detects a leak once it has happened, which makes it possible to respond to the incident and to collect telemetry for improving the other defenses.
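The canary mechanism can be sketched in a few lines of Python. This is an illustrative version, not Rebuff's implementation: the function names `add_canary` and `canary_leaked`, the 8-byte token length, and the comment-style wrapper around the token are assumptions.

```python
import secrets


def add_canary(prompt_template: str) -> tuple[str, str]:
    """Prepend a random secret token to the prompt.

    Returns the guarded prompt and the token, so the caller can later
    check the model's output for it.
    """
    canary = secrets.token_hex(8)  # 16 hex characters, unpredictable per call
    guarded = f"<!-- {canary} -->\n{prompt_template}"
    return guarded, canary


def canary_leaked(model_output: str, canary: str) -> bool:
    """Return True if the secret token appears in the model's output."""
    return canary in model_output
```

A caller would send the guarded prompt to the model, then run `canary_leaked` on the response before returning it to the user. A positive result does not undo the leak, but it gives a clear signal to withhold the response and log the input that triggered it. Note that an exact substring match, as used here, would miss a token that the model outputs in an altered form (for example reversed or encoded).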

Companies

ProtectAI

Tools

Rebuff

Tags

Rebuff, Prompt Injection, Defense, Open Source, Canary Token, ProtectAI
