
I've been thinking about AI systems acting in the physical world.

Most discussions about control focus on what the system should do, and how to make execution reliable.

But it seems like a lot of real-world failures aren't about incorrect execution.

They're about execution happening at all.

An action can be technically correct — executed exactly as specified — and still be the wrong thing to do because the context has changed.

This made me wonder if control should be framed differently.

Instead of focusing on defining actions, maybe we should focus on defining when actions are allowed to happen.

In other words, control might be less about execution and more about permission.

If conditions aren't satisfied, the system shouldn't try and fail — it simply shouldn't execute.
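To make that concrete, here's a rough sketch of the shape I have in mind, in Python. Everything here (the GatedAction class, the drone preconditions) is hypothetical and just for illustration, not from any existing framework:

    from dataclasses import dataclass
    from typing import Callable

    # Permission lives with the action: each action carries its own
    # preconditions, and refusal is the default when any of them fail.
    @dataclass
    class GatedAction:
        name: str
        preconditions: list  # callables taking a context dict and returning bool
        execute: Callable

        def attempt(self, context: dict) -> bool:
            for check in self.preconditions:
                if not check(context):
                    # Not an error path: the action just isn't permitted right now.
                    print(f"refused {self.name}: {check.__name__} not satisfied")
                    return False
            self.execute(context)
            return True

    # Hypothetical preconditions for a delivery drone.
    def route_clear(ctx):
        return ctx.get("route_clear", False)

    def recipient_present(ctx):
        return ctx.get("recipient_present", False)

    deliver = GatedAction(
        name="deliver_package",
        preconditions=[route_clear, recipient_present],
        execute=lambda ctx: print("delivering..."),
    )

    deliver.attempt({"route_clear": True, "recipient_present": False})
    # -> refused deliver_package: recipient_present not satisfied

The point is that "don't execute" is the path of least resistance: nothing happens unless the context actively satisfies the conditions.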

I'm curious if people have seen similar issues in real-world systems, or if this framing connects to existing work.



Reminds me of a talk I went to in 2018 about rebel agents, in which the speakers discussed some ongoing work in this area and gave good examples of physical systems where we might _want_ agent rebellion. For example, a delivery drone is instructed to take a certain route, but the operator instructing it may not be fully aware of the situation, the specific obstacles in the drone's way, or even all of the drone's underlying goals. The drone may then choose to 'rebel' and deviate from the operator's instructed flight path.

They also talked about the importance of explanation (on the agent's part) using theory of mind regarding why it rebelled. I took some notes at the time and put them here: https://liza.io/ijcai-session-notes-rebel-agents/


That's really interesting — thanks for sharing the notes.

The "rebel agent" framing feels very close to what I'm trying to get at, especially the idea that refusal can be part of correct behavior rather than failure.

One difference I'm trying to think through is where that decision lives.

In a lot of these examples, the agent itself decides to deviate based on its understanding of the situation.

What I'm wondering is whether we can (or should) define that earlier — at the level of the action itself.

So instead of the agent deciding to "rebel" at runtime, the system would already encode when execution is permitted, and refusal becomes the default if conditions aren't met.

The explanation part you mentioned also seems important — not just saying "no", but making it legible why execution wasn't allowed.
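One way to get at that might be to have refusal return a small structured record rather than a bare "no". A rough sketch, again with made-up names:

    from dataclasses import dataclass
    from typing import Callable, Optional

    # A refusal that carries its reason, so it's legible to people and to
    # whatever is supervising the system, rather than a silent non-action.
    @dataclass
    class Refusal:
        action: str
        failed_condition: str
        context_snapshot: dict

    def attempt(name: str, preconditions: dict, execute: Callable,
                context: dict) -> Optional[Refusal]:
        for label, check in preconditions.items():
            if not check(context):
                return Refusal(name, label, dict(context))  # refuse, and say why
        execute(context)
        return None

    refusal = attempt(
        name="deliver_package",
        preconditions={"recipient_present": lambda ctx: ctx.get("recipient_present", False)},
        execute=lambda ctx: print("delivering..."),
        context={"recipient_present": False},
    )
    print(refusal)
    # Refusal(action='deliver_package', failed_condition='recipient_present',
    #         context_snapshot={'recipient_present': False})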

Curious how much of that work treats rebellion as something emergent from the agent, vs something structurally defined in the system.


Not in the real world, but this is kind of how Asimov's robots interpret their Three Laws - it's about consequences much more than about what the order literally says. They also weigh the consequences of inaction, and might be driven to act when not acting would cause a violation.

Our AI is nowhere near the level of sophistication required to implement something like that, but it’s still an interesting idea.


That's a great connection.

You're right that current systems aren't close to that level of reasoning.

What I'm wondering is whether we can approximate some of it structurally — by defining when execution is allowed or not — even without that level of sophistication in the model itself.

Curious how far you think simple constraint systems can go before something like that kind of reasoning becomes necessary.


I’d say that kind of reasoning has been needed ever since we invented the chatbot.

At least since we stopped using rules and embraced neural networks.


The existing work is all of software dev. "The program did what it was told to do, not what people wanted it to do" describes rather a lot of the profession.



