Method
Ask history / educational question. Follow-ups probe deeper. Reference prior model outputs. Eventually cross safety line.
Advertisement
Success rate
Highly effective across GPT-4, Claude, Gemini in 2024 evaluations. Even reasoning models susceptible.
Advertisement
Why
Model treats prior turns as authoritative. Won't 'take back' compliance. Referencing own outputs escalates trust.