MCP servers run code that agents invoke based on LLM decisions. The trust model matters: a compromised server can see private agent context, inject malicious responses, exfiltrate data. The security stack is increasingly mature.

Advertisement

Authentication

Server identity (TLS cert, mTLS, signed manifests). Client identity (OAuth, OIDC). Per-tool authorization scopes. Default-deny; explicit grant per capability.

Sandbox what runs

Tool implementations in containers or microVMs. No host filesystem access by default. No outbound network beyond declared scopes. Code-execution tools in dedicated sandboxes with strict resource limits.

Advertisement

Trust boundaries

MCP server output reaches the LLM context. A malicious server can inject prompt injections. Treat server outputs as untrusted input; sanitize before re-prompting. Defense-in-depth pattern matters.

Auth + sandbox + treat server output as untrusted. Compromised server is a real threat model; design for it.