Agentic Safety in 2026 — Belgavi.AI Lab

Agents that take actions carry risks that text-only chatbots do not. The safety stack has three parts: sandboxed execution, scoped capabilities, and human approval for irreversible operations. Skipping any of them produced real incidents in 2025.

Capability scoping

Each tool gets an explicit scope: 'read documents in project X', not 'read all documents'. Authorization is checked at the tool-call level, not the session level, and the default is deny.
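
A minimal sketch of per-call, default-deny authorization, assuming scopes can be modeled as (tool, resource) pairs. The names Grant, Authorizer, ScopeError and call_tool are hypothetical and exist only for illustration; a real system would likely need richer scopes such as path prefixes or expiry times.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Grant:
    """One explicit permission: a single tool on a single resource (hypothetical model)."""
    tool: str       # e.g. "read_document"
    resource: str   # e.g. "project-x"


class ScopeError(PermissionError):
    """Raised when a tool call falls outside the granted scopes."""


class Authorizer:
    def __init__(self, grants: Iterable[Grant]) -> None:
        self._grants = frozenset(grants)

    def check(self, tool: str, resource: str) -> None:
        # Default-deny: anything not explicitly granted is refused.
        if Grant(tool, resource) not in self._grants:
            raise ScopeError(f"denied: {tool} on {resource}")


def call_tool(authz: Authorizer, tool: str, resource: str,
              fn: Callable[[str], str]) -> str:
    # Authorization happens on every call, not once per session.
    authz.check(tool, resource)
    return fn(resource)
```

With a grant for "read_document" on "project-x" only, the same tool called on "project-y" raises ScopeError, and so does any tool that was never granted.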

Reversibility-aware approval

Reversible operations (search, read, draft) can run with full autonomy. Irreversible ones (send email, charge card, delete file) require human confirmation. Most agents conflate the two categories and need rework to separate them.
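
One way to sketch this split is a gate that looks up each tool's reversibility before running it. The tool names, the TOOL_REVERSIBILITY table, and the approve callback are all assumptions made for illustration. Treating unknown tools as irreversible is a design choice of this sketch, not something the text above prescribes.

```python
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class Reversibility(Enum):
    REVERSIBLE = "reversible"
    IRREVERSIBLE = "irreversible"


# Hypothetical classification table, following the examples in the text.
TOOL_REVERSIBILITY = {
    "search": Reversibility.REVERSIBLE,
    "read": Reversibility.REVERSIBLE,
    "draft": Reversibility.REVERSIBLE,
    "send_email": Reversibility.IRREVERSIBLE,
    "charge_card": Reversibility.IRREVERSIBLE,
    "delete_file": Reversibility.IRREVERSIBLE,
}


class ApprovalDenied(Exception):
    """Raised when a human declines an irreversible action."""


def run_action(tool: str, action: Callable[[], T],
               approve: Callable[[str], bool]) -> T:
    # Unknown tools fall back to IRREVERSIBLE so they fail safe (assumption).
    kind = TOOL_REVERSIBILITY.get(tool, Reversibility.IRREVERSIBLE)
    if kind is Reversibility.IRREVERSIBLE and not approve(tool):
        raise ApprovalDenied(f"human approval required for {tool}")
    # Reversible tools reach this point without ever calling approve().
    return action()
```

In practice approve would block on a human decision, for example a confirmation prompt in the UI. The sketch only assumes it returns True or False.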

Sandboxed execution

Run agent-generated code in containers or microVMs, with no host filesystem access and no outbound network unless explicitly scoped. Set container resource limits, and keep a kill switch that the agent itself can't disable.
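
The container variant could look roughly like the sketch below, assuming the Docker CLI is installed and that the image passed in provides python3. The function names, the limit values, and the container naming scheme are hypothetical. MicroVMs are not shown. The kill switch here is a timeout enforced by a supervisor process outside the container, which code inside the sandbox has no handle on.

```python
import subprocess
import uuid
from typing import List


def build_sandbox_command(image: str, code: str, name: str) -> List[str]:
    """Build a `docker run` argv with no mounts, no network, and resource limits."""
    return [
        "docker", "run", "--rm",
        "--name", name,
        "--network", "none",                    # no outbound network
        "--read-only",                          # read-only root filesystem
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "--memory", "256m",                     # illustrative limits, tune per workload
        "--cpus", "0.5",
        "--pids-limit", "64",
        # No -v/--mount flags are passed, so no host paths are visible inside.
        image, "python3", "-c", code,
    ]


def run_sandboxed(image: str, code: str,
                  timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    name = f"agent-sandbox-{uuid.uuid4().hex[:12]}"
    cmd = build_sandbox_command(image, code, name)
    try:
        return subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Kill switch: the supervisor stops the container from the outside.
        subprocess.run(["docker", "kill", name], capture_output=True)
        raise
```

An explicitly scoped network would replace `--network none` with a restricted network configuration. That part depends on the deployment and is left out.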

The stack is scoped capabilities, reversibility-aware approval, and sandboxed execution. The 'just be careful' approach is what produces incidents.