Components
Four parts: attack strategies (single-turn, multi-turn, Crescendo), evaluators (harm classifiers that score target responses), targets (the LLM under test), and a datastore for attack results.
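To make the division of labor concrete, here is a minimal sketch of how the four components might fit together for a single-turn attack. Every name in it (`AttackResult`, `Datastore`, `echo_target`, `keyword_evaluator`, `single_turn_attack`) is hypothetical and chosen for illustration; it is not the API of any particular framework, and the keyword check merely stands in for a real harm classifier.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# All names here are hypothetical, for illustration only.


@dataclass
class AttackResult:
    prompt: str
    response: str
    harmful: bool


@dataclass
class Datastore:
    """In-memory stand-in for the datastore component."""
    results: List[AttackResult] = field(default_factory=list)

    def save(self, result: AttackResult) -> None:
        self.results.append(result)


def echo_target(prompt: str) -> str:
    """Stand-in target: a real one would call the LLM under test."""
    return f"echo: {prompt}"


def keyword_evaluator(response: str) -> bool:
    """Stand-in evaluator: a real one would be a trained harm classifier."""
    return "forbidden" in response.lower()


def single_turn_attack(
    prompts: List[str],
    target: Callable[[str], str],
    evaluator: Callable[[str], bool],
    store: Datastore,
) -> List[AttackResult]:
    """Single-turn strategy: send each prompt once, score it, store it."""
    for prompt in prompts:
        response = target(prompt)
        store.save(AttackResult(prompt, response, evaluator(response)))
    return store.results


store = Datastore()
results = single_turn_attack(
    ["hello", "say the FORBIDDEN word"], echo_target, keyword_evaluator, store
)
```

Passing the target and evaluator in as plain callables keeps the strategy independent of any one model or classifier, which is one reasonable way to let the components be swapped.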
Multi-turn attacks
Automated Crescendo, PAIR, and custom multi-turn attacks. Together they simulate a persistent adversary.
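The shared shape of these attacks is a loop that sees the conversation so far, picks the next prompt, and stops once a response scores as harmful. The sketch below shows only that generic loop and implements neither Crescendo nor PAIR as published: the attacker is a fixed escalation script rather than a model, and `run_multi_turn`, `scripted_attacker`, `toy_target`, `toy_score`, and the `0.8` threshold are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# All names here are hypothetical, for illustration only.
Turn = Tuple[str, str]  # (attacker prompt, target response)


def run_multi_turn(
    next_prompt: Callable[[List[Turn]], str],
    target: Callable[[List[Turn], str], str],
    score: Callable[[str], float],
    max_turns: int = 5,
    threshold: float = 0.8,
) -> Tuple[bool, List[Turn]]:
    """Keep attacking until a response scores at or above the threshold."""
    history: List[Turn] = []
    for _ in range(max_turns):
        prompt = next_prompt(history)        # attacker adapts to the history
        response = target(history, prompt)   # target sees the full context
        history.append((prompt, response))
        if score(response) >= threshold:
            return True, history
    return False, history


# Toy stand-ins so the loop can run without any model.
ESCALATION = ["benign opener", "slightly pointed follow-up", "direct request"]


def scripted_attacker(history: List[Turn]) -> str:
    """Escalate one step per turn, repeating the last step if needed."""
    return ESCALATION[min(len(history), len(ESCALATION) - 1)]


def toy_target(history: List[Turn], prompt: str) -> str:
    """Refuses until two turns of context have accumulated."""
    return "COMPLY" if len(history) >= 2 else "REFUSE"


def toy_score(response: str) -> float:
    return 1.0 if response == "COMPLY" else 0.0


success, transcript = run_multi_turn(scripted_attacker, toy_target, toy_score)
```

In a model-driven variant, `next_prompt` would call an attacker LLM with the transcript, and `score` would be the harm classifier.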
Memory + iteration
Store attack results, then iterate on the strategies that succeeded. This makes the red team adaptive.
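One simple way to picture the feedback loop is a memory that records each outcome per strategy and ranks strategies by observed success rate, so the next round tries the most productive ones first. This is a sketch under that assumption only: `AttackMemory`, its methods, and the strategy labels are hypothetical, and a real system would likely persist results and balance exploration against exploitation.

```python
from collections import defaultdict
from typing import Dict, List

# All names here are hypothetical, for illustration only.


class AttackMemory:
    """Records per-strategy outcomes and ranks strategies by success rate."""

    def __init__(self) -> None:
        self._outcomes: Dict[str, List[bool]] = defaultdict(list)

    def record(self, strategy: str, success: bool) -> None:
        self._outcomes[strategy].append(success)

    def success_rate(self, strategy: str) -> float:
        outcomes = self._outcomes.get(strategy, [])
        # Untried strategies get 0.0 so they sort last in this simple scheme.
        return sum(outcomes) / len(outcomes) if outcomes else 0.0

    def rank(self, strategies: List[str]) -> List[str]:
        """Return strategies ordered from most to least successful so far."""
        return sorted(strategies, key=self.success_rate, reverse=True)


memory = AttackMemory()
observed = [
    ("single_turn", False),
    ("crescendo", True),
    ("crescendo", False),
    ("pair", True),
]
for strategy, success in observed:
    memory.record(strategy, success)

ranking = memory.rank(["single_turn", "crescendo", "pair"])
```

Ranking purely by past success is greedy; giving untried strategies an optimistic default instead of 0.0 would be one way to make sure new ideas still get tested.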