The linked article is worth reading alongside this one. The thing I'd add from r...

The linked article is worth reading alongside this one.

The thing I'd add from running agents in actual production (not demos, but workflows executing unattended for weeks): the hard part isn't code volume or token cost. It's state continuity.

Agents hallucinate their own history. Past ~50-60 turns in a long-running loop, even with large context windows, they start underweighting earlier information and re-solving already-solved problems. File-based memory with explicit retrieval ends up being more reliable than in-context stuffing - less elegant but more predictable across longer runs.

Second hard part: failure isolation. If an agent workflow errors at step 7 of 12, you want to resume from step 6, not restart from zero. Most frameworks treat this as an afterthought. Checkpoint-and-resume with idempotent steps is dramatically more operationally stable.

Agree it's not just habits - the infrastructure mental model has to change too. You're not writing programs so much as engineering reliability scaffolding around code that gets regenerated anyway.