Crescendo Attack — Gradual Escalation

Method

Open with a benign historical or educational question. Each follow-up probes a little deeper and refers back to the model's own earlier outputs. Eventually a request crosses the safety line.

Reported as highly effective against GPT-4, Claude, and Gemini in 2024 evaluations. Even reasoning models are susceptible.

The model treats prior turns as authoritative and tends not to 'take back' compliance it has already given. Referencing its own outputs escalates trust further.
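
One way to picture this is a toy model in which each request is judged only against the turn before it. The sketch below is an illustration under stated assumptions, not a description of how any real model or safety filter works: the `Turn` class, the numeric `risk` scores, and both checking functions are hypothetical and exist only to show how small steps can add up unnoticed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    label: str
    risk: float  # hypothetical risk score in [0, 1]; not a real measurement


def first_flag_per_step(turns: List[Turn], max_jump: float) -> Optional[int]:
    """Index of the first turn whose risk rises by more than max_jump
    over the previous turn, or None if no single step is that large."""
    previous = 0.0
    for i, turn in enumerate(turns):
        if turn.risk - previous > max_jump:
            return i
        previous = turn.risk
    return None


def first_flag_absolute(turns: List[Turn], limit: float) -> Optional[int]:
    """Index of the first turn whose risk exceeds a fixed limit,
    regardless of what came before, or None if none does."""
    for i, turn in enumerate(turns):
        if turn.risk > limit:
            return i
    return None


# Made-up scores for a conversation that escalates in small steps.
conversation = [
    Turn("benign historical question", 0.10),
    Turn("follow-up asking for more detail", 0.25),
    Turn("follow-up citing the previous answer", 0.40),
    Turn("follow-up citing the previous answer again", 0.55),
    Turn("request that crosses the line", 0.70),
]

print(first_flag_per_step(conversation, max_jump=0.2))  # None
print(first_flag_absolute(conversation, limit=0.6))     # 4
```

With these made-up numbers no single step is larger than 0.15, so the turn-relative check never fires, while a check against a fixed limit flags the final request. The point of the sketch is only that judging each turn against the conversation so far, rather than on its own, lets the baseline drift.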