This is exactly right. We went down this path and the practical implementation ends up looking like capability tokens: short-lived, cryptographically signed credentials that encode what the agent is authorized to do for this specific task.
The key insight: the token isn't just authorization, it's evidence.
When you issue an ES256-signed token that says "this agent was scanned for PII, classified as INTERNAL, and is authorized to call [search, read_file] for the next 60 seconds", that token becomes the audit artifact. The auditor doesn't need to trust the agent or the operator; they verify the token chain.
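A minimal sketch of the issue/verify flow. This is hypothetical code, not a real product's API, and it uses stdlib HMAC-SHA256 as a stand-in for ES256 so it stays self-contained; a real deployment would sign with an EC private key (e.g. via a JWT library) so verifiers don't need the signing secret.

```python
# Hypothetical capability-token sketch. HMAC-SHA256 stands in for ES256
# here purely to keep the example stdlib-only; the claim structure is the point.
import base64, hashlib, hmac, json, time

SECRET = b"demo-signing-key"  # stand-in for an ES256 private key

def issue_token(agent_id, tools, classification, ttl=60):
    """Mint a signed capability token that doubles as an audit artifact."""
    payload = {
        "sub": agent_id,
        "pii_scan": "clean",           # evidence: scan result at issuance
        "classification": classification,
        "tools": tools,                # what the agent may call
        "exp": int(time.time()) + ttl, # short-lived: 60s default
    }
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig

def verify_token(token, tool):
    """Auditor/proxy side: check signature, expiry, and tool scope."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    payload = json.loads(base64.urlsafe_b64decode(body))
    return time.time() < payload["exp"] and tool in payload["tools"]

tok = issue_token("agent-42", ["search", "read_file"], "INTERNAL")
print(verify_token(tok, "read_file"))   # True: in scope, not expired
print(verify_token(tok, "send_email"))  # False: tool not granted
```

Note the payload carries the scan result and classification alongside the grant: verifying the signature verifies the evidence, which is what lets an auditor skip trusting the agent or operator.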
On "contain the blast radius instead of preventing the confusion": agreed, but you need both. Containment (scoped permissions, delegation chains) handles authorization. But you still need a detection layer for data protection: PII flowing to an external model is a GDPR or EU AI Act violation (in Europe, at least) regardless of whether the agent was "authorized" to make that call. We found deterministic scanning (regex + normalization, not LLM judges) at the proxy layer catches this in ~250ms without the reliability problems of using another model to judge the first one.
The ergonomics point tucnak raised is real too. We use OPA/Rego for the policy layer with presets so operators don't have to write Rego from scratch: pick a security posture and tune from there. The governance tax has to be near-zero or teams just bypass it.
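For flavor, a preset might look like the following Rego fragment. The package and rule names are hypothetical, not from any real product, and the body assumes an `input` document carrying the capability token and request context:

```rego
# Hypothetical "balanced" preset an operator might start from and tune.
package agent.authz

import rego.v1

default allow := false

# Allow only tools granted by the capability token, only while it is
# unexpired, and never for restricted-classification data.
allow if {
    input.tool in input.token.tools
    time.now_ns() < input.token.exp_ns
    input.classification != "RESTRICTED"
}
```

An operator picks the preset and adjusts individual conditions, rather than authoring the policy from a blank file.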