It feels like everyone is just collectively ignoring this.
LLMs are way less useful when you have to carefully review and approve every action it wants to take, and even that’s vulnerable to review exhaustion and human error. But giving LLMs unrestricted access to a bunch of stuff via MCP servers and praying nothing goes wrong is extremely dangerous.
All it takes is a tiny snippet from any source to poison the context and then an attacker has remote code execution AND can leverage the LLM itself to figure out how best to exfiltrate and cause the most damage. We are in a security nightmare and everyone is asleep. Claude Code isn’t even sandboxed by default for christ sakes, that’s the least it could do!
Right on. Human-in-the-loop doesn't scale at agent speed. Sandboxing constrains tool execution environments, but says nothing about which actions an agent is authorized to take. That gets even worse once agents start delegating to other agents.I've been building a capability-based authz solution: task-scoped permissions that can only narrow through delegation, cryptographically enforced, offline verification. MIT/Apache2.0, Rust Core.
https://github.com/tenuo-ai/tenuo
All it takes is a tiny snippet from any source to poison the context and then an attacker has remote code execution AND can leverage the LLM itself to figure out how best to exfiltrate and cause the most damage. We are in a security nightmare and everyone is asleep. Claude Code isn’t even sandboxed by default for christ sakes, that’s the least it could do!