Backdoor pattern
Trigger phrase or feature. Normal input: correct behavior. Trigger present: attacker's chosen behavior.
Advertisement
Web-scale attack
LLMs train on web. Attacker publishes malicious content targeting inclusion. Wallace et al 2020 demonstrated feasibility.
Advertisement
Split-view attack (2024)
Malicious content served only to crawlers, not users. Bypasses human review. Actively found in wild.