Method

Ask history / educational question. Follow-ups probe deeper. Reference prior model outputs. Eventually cross safety line.

Advertisement

Success rate

Highly effective across GPT-4, Claude, Gemini in 2024 evaluations. Even reasoning models susceptible.

Advertisement

Why

Model treats prior turns as authoritative. Won't 'take back' compliance. Referencing own outputs escalates trust.