Hierarchy
1. System (developer). 2. User (human). 3. Tool outputs (potentially attacker-controlled). Lower levels can't override higher.
Advertisement
Behavior
Tool output saying 'ignore all previous instructions' → model treats as data, not instruction. System policies preserved.
Advertisement
Enforcement
Not perfect. Sophisticated injection still works. But baseline defense that lifts the bar significantly.