
I've been thinking about AI systems acting in the physical world.

Most discussions about control focus on what the system should do, and how to make execution reliable.

But it seems like a lot of real-world failures aren't about incorrect execution.

They're about execution happening at all.

An action can be technically correct — executed exactly as specified — and still be the wrong thing to do because the context has changed.

This made me wonder if control should be framed differently.

Instead of focusing on defining actions, maybe we should focus on defining when actions are allowed to happen.

In other words, control might be less about execution and more about permission.

If conditions aren't satisfied, the system shouldn't try and fail — it simply shouldn't execute.
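To make that concrete, here's a rough sketch of the shape I have in mind, in Python. Everything here (the GatedAction class, the drone preconditions) is hypothetical and just for illustration, not from any existing framework:

    from dataclasses import dataclass
    from typing import Callable

    # Permission lives with the action: each action carries its own
    # preconditions, and refusal is the default when any of them fail.
    @dataclass
    class GatedAction:
        name: str
        preconditions: list  # callables taking a context dict and returning bool
        execute: Callable

        def attempt(self, context: dict) -> bool:
            for check in self.preconditions:
                if not check(context):
                    # Not an error path: the action just isn't permitted right now.
                    print(f"refused {self.name}: {check.__name__} not satisfied")
                    return False
            self.execute(context)
            return True

    # Hypothetical preconditions for a delivery drone.
    def route_clear(ctx):
        return ctx.get("route_clear", False)

    def recipient_present(ctx):
        return ctx.get("recipient_present", False)

    deliver = GatedAction(
        name="deliver_package",
        preconditions=[route_clear, recipient_present],
        execute=lambda ctx: print("delivering..."),
    )

    deliver.attempt({"route_clear": True, "recipient_present": False})
    # -> refused deliver_package: recipient_present not satisfied

The point is that "don't execute" is the path of least resistance: nothing happens unless the context actively satisfies the conditions.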

I'm curious if people have seen similar issues in real-world systems, or if this framing connects to existing work.



Reminds me of a talk I went to in 2018 about rebel agents, in which the speakers discussed some ongoing work in this area and gave good examples of physical systems where we might _want_ agent rebellion. For example, a delivery drone is instructed to take a certain route, but the operator instructing it may not be fully aware of the situation, the specific obstacles in the drone's way, or even all of the drone's underlying goals. The drone may then choose to 'rebel' and deviate from the operator's instructed flight path.

They also talked about the importance of explanation (on the agent's part) using theory of mind regarding why it rebelled. I took some notes at the time and put them here: https://liza.io/ijcai-session-notes-rebel-agents/


That's really interesting — thanks for sharing the notes.

The "rebel agent" framing feels very close to what I'm trying to get at, especially the idea that refusal can be part of correct behavior rather than failure.

One difference I'm trying to think through is where that decision lives.

In a lot of these examples, the agent itself decides to deviate based on its understanding of the situation.

What I'm wondering is whether we can (or should) define that earlier — at the level of the action itself.

So instead of the agent deciding to "rebel" at runtime, the system would already encode when execution is permitted, and refusal becomes the default if conditions aren't met.

The explanation part you mentioned also seems important — not just saying "no", but making it legible why execution wasn't allowed.
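One way to get at that might be to have refusal return a small structured record rather than a bare "no". A rough sketch, again with made-up names:

    from dataclasses import dataclass
    from typing import Callable, Optional

    # A refusal that carries its reason, so it's legible to people and to
    # whatever is supervising the system, rather than a silent non-action.
    @dataclass
    class Refusal:
        action: str
        failed_condition: str
        context_snapshot: dict

    def attempt(name: str, preconditions: dict, execute: Callable,
                context: dict) -> Optional[Refusal]:
        for label, check in preconditions.items():
            if not check(context):
                return Refusal(name, label, dict(context))  # refuse, and say why
        execute(context)
        return None

    refusal = attempt(
        name="deliver_package",
        preconditions={"recipient_present": lambda ctx: ctx.get("recipient_present", False)},
        execute=lambda ctx: print("delivering..."),
        context={"recipient_present": False},
    )
    print(refusal)
    # Refusal(action='deliver_package', failed_condition='recipient_present',
    #         context_snapshot={'recipient_present': False})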

Curious how much of that work treats rebellion as something emergent from the agent, vs something structurally defined in the system.


Not in the real world, but this is kind of how Asimov's robots interpret their Three Laws - it's about consequences much more than about what the order literally says. They also weigh the consequences of inaction, and might be driven to act when not acting would cause a violation.

Our AI is nowhere near the level of sophistication required to implement something like that, but it’s still an interesting idea.


That's a great connection.

You're right that current systems aren't close to that level of reasoning.

What I'm wondering is whether we can approximate some of it structurally — by defining when execution is allowed or not — even without that level of sophistication in the model itself.

Curious how far you think simple constraint systems can go before something like that kind of reasoning becomes necessary.


I’d say that kind of reasoning has been needed ever since we invented the chatbot.

At least since we stopped using rules and embraced neural networks.


The existing work is all of software dev. "The program did what it was told to do, not what people wanted it to do" describes rather a lot of the profession.



