Backdoor pattern

Trigger phrase or feature. Normal input: correct behavior. Trigger present: attacker's chosen behavior.

Advertisement

Web-scale attack

LLMs train on web. Attacker publishes malicious content targeting inclusion. Wallace et al 2020 demonstrated feasibility.

Advertisement

Split-view attack (2024)

Malicious content served only to crawlers, not users. Bypasses human review. Actively found in wild.