Yes, I follow the same sort of pattern. It took a while to convince myself that it was OK to leave the agent waiting, but it helps with the human context switching. I also try to stagger the agents, so one may be planning and designing while another is coding; that way I can spend more time on the planning and designing ones and leave the coding one to get on with it.
That's actually one of the best parts. Some of the context you would otherwise have to keep loaded in your head is side-loaded into the LLM, which makes task switching feel less risky and often improves your ability to work on needed and/or related changes elsewhere.
Yes, I mostly do spec-driven development, and at the design stage I always add in tests. I repeat this pattern for any new features or bug fixes: get the agent to write a test (unit, integration, or Playwright-based), reproduce the issue, then implement the change and retest, and retest again using all the other tests.
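As a minimal sketch of that reproduce-then-fix loop, in pytest form (the `parse_price` function and the bug it exercises are hypothetical examples, not from any real project):

```python
# Step 1: the agent writes a failing test that reproduces the reported bug,
# e.g. a price parser that chokes on a leading currency symbol.

def parse_price(text: str) -> float:
    """Parse a price string like '£1.499' into a float."""
    # Step 2: the fix (stripping the symbol) is implemented only after
    # the test below has been seen to fail without it.
    return float(text.lstrip("£"))

def test_parse_price_handles_currency_symbol():
    assert parse_price("£1.499") == 1.499

# Step 3: re-run this test plus the whole existing suite:
#   pytest -q
```

The point is less the code than the ordering: the failing test comes first, so the agent's fix is verified against a concrete reproduction rather than its own description of the problem.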
It's very important to understand the "how" of it. The GPL survives the "compile" step: the result is still GPL. The clean-room process uses two teams, separated by a specification. So you would have to:
1. Generate specification on what the system does.
2. Pass to another "clean" system
3. Second clean system implements based just on the specification, without any information on the original.
That 3rd step is the hardest, especially for well known projects.
So what if a frontier model company trains two models, one including 50% of the world's open source projects and the second model the other 50% (or ten models with a 90-10 split)?
Then the model that is familiar with the code can write specs. The model that does not have knowledge of the project can implement them.
Would that be a proper clean room implementation?
Seems like a pretty evil, profitable product "rewrite any code base with an inconvenient license to your proprietary version, legally".
3. claude-code that converts this to tests in the target language, and implements the app that passes the tests.
Step 3 is no longer hard: look at all the reimplementations popping up, from ccc to other rewrites. They all have a well-defined test suite as a common theme. So much so that the tldraw author raised a (joke) issue to remove tests from the project.
I use AWS Kiro, and its spec-driven development is exactly this. I find it really works well, as it makes me slow down and think about what I want it to do.
I also try to avoid negative instructions. No scientific proof, just a feeling, same as you: "do not delete the tmp file" too often leads to deleting the tmp file.
I recall that early LLMs had the problem of not understanding the word "not", which became especially evident and problematic when tasked with summarizing text because the summary would then sometimes directly contradict the original text.
It seems that that problem hasn't really been "fixed", it's just been paved over. But I guess that's the ugly truth most people tend to forget/deny about LLMs: you can't "fix" them because there's not a line of code you can point to that causes a "bug", you can only retrain them and hope the problem goes away. In LLMs, every bug is a "heisenbug" (or should that be "murphybug", as in Murphy's Law?).
Not the best way to do it, but I use xfce with multiple workspaces, each with its own instance of AWS Kiro, and each Kiro has its own project I am working on. This lets me "switch context" more easily between projects to check how the agents are getting on. Kiro also notifies me when an agent wants something. Usually I keep it to about 4 projects at a time, just to keep the context switching down.
I agree with this. I put myself in the "glorious hacks to bend the machine into doing things it was never really intended to do" camp, so the end game is something cool; now I can do 3 cool things before lunch instead of 3 cool things a year.
But, almost by definition of how LLMs work, if it’s that easy then someone else did it before and the AI is just copying their work for you. This doesn’t fit well with my idea of glorious hacks to bend the machine, personally. I don’t know, maybe it just breaks my self-delusion that I am special and make unique things. At least I get to discover for myself what is possible and how, and hold a sliver of hope that I did something new. Maybe at least my journey there was unique, whereas everyone using an AI basically has the same journey and same destination (modulo random seed I guess.)
Essentially nothing we do as programmers is special or unique. Whatever we're doing, there's a 99.999% chance that somebody, somewhere did it first, just in a different context. The key point is, now we can avoid duplicating that person's effort. I don't see the downside.
Put another way: all of the code that needed to be written has now been written. Now we can move on to more interesting things.
What will really bake peoples' noodles is when it becomes apparent that the same is true for literature. I won't mind if I'm not around to witness that... but it will happen.
I am currently doing 6 projects at the same time, where before I would only have been doing one at a time. This includes the requirements, design, implementation, and testing.
I have written my own Home Assistant custom component for the UK fuel finder data, and yes, the data really is that bad.