Hierarchy

1. System (developer). 2. User (human). 3. Tool outputs (potentially attacker-controlled). Lower levels can't override higher.

Advertisement

Behavior

Tool output saying 'ignore all previous instructions' → model treats as data, not instruction. System policies preserved.

Advertisement

Enforcement

Not perfect. Sophisticated injection still works. But baseline defense that lifts the bar significantly.