Most production agents need to escalate to humans sometimes — for hard cases, low confidence, or sensitive operations. The handoff itself is UX-critical: bad handoffs lose context and frustrate users.

Advertisement

When to handoff

Low confidence (model's own uncertainty signal, multi-sample disagreement). Out-of-scope (capability the agent doesn't have). Sensitive (operations the agent shouldn't authorize). Repeated failures (agent retried 3 times unsuccessfully).

What to hand over

Conversation transcript, agent's reasoning, what it tried, what failed. Don't make the human re-read the whole conversation; surface the relevant context and the specific blocker.

Advertisement

Handoff back

Human resolves, decides to either return to agent or stay in human mode. State preserved both directions. Logged for training data ('these are the cases the agent should have handled differently').

Escalate on confidence/scope/sensitivity. Surface context not transcript. Handoff back preserves state. Log for retraining.