If you have users, this only works if you have managed to encode nearly every user-observable behavior into your test suite.
I’ve never seen this done even with LLMs. Not even close. And even if you did it, the test suite is almost definitely more complex than the code and will suffer from all the same maintainability problems.
For one, you don't let random devs hop on and off projects without code reviews, which is what people who say they don't care about the code should be doing.
And two, agents are clearly worse at reasoning through code changes than humans are.
And the team lead with 7 developers isn’t going to be doing code reviews of all the code. At most he is going to be reviewing those critical paths.
I could care less about the implementation behind the vibe coded admin website that will only be used by a dozen people. I care about the authorization.
Even for the ETL job, I cared only about the performance characteristics, concurrency, logging, and correctness of the results.
>And the team lead with 7 developers isn’t going to be doing code reviews of all the code. At most he is going to be reviewing those critical paths.
Why would the team lead need to review all 7 developers? If you're regularly swapping out every single developer on a team, you're gonna have problems.
>I could care less about the implementation behind the vibe coded admin website that will only be used by a dozen people. I care about the authorization.
If you only have 12 users, sure, do whatever you want. If you don't have users, nothing is hard.
It was 12 users who monitored and managed the ETL job. If I had 1 million users, what difference would the front-end code make as long as the backend architecture was secure, scalable, etc.? If the login is taking 2 minutes, I can guarantee you it’s not because the developer failed to write SOLID code…
There you go arguing with strawmen again. I don’t give a single flying flip about SOLID, or Clean Code, or GoF. People who read Clean Code as their first programming book and made that their identity have been the bane of my existence as a programmer.
It’s not about how long something takes, although that is an observable behavior. It’s about how 1 million users over time will develop ways of using your product that you never thought about, much less documented or tested.
Perhaps you’ve heard the phrase “The purpose of the system is what it does”?
The system is not the spec or the tests. An agent is only reasoning about how to add a new feature, and the only thing preventing it from changing observable behavior is the tests. So if an agent is changing untested behavior, it’s changing the purpose of the system.
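A minimal Python sketch of that point (often called Hyrum's Law): the test checks only the documented contract ("returns the active users"), so a refactor can pass every test while changing an untested behavior, here the result order, that a user's workflow quietly relies on. All function names and data are hypothetical.

```python
# Hypothetical example: the test pins down WHICH users are returned,
# but not the incidental ORDER they come back in.

def active_users_v1(users):
    # Original implementation: preserves input order, by accident rather than by spec.
    return [u["name"] for u in users if u["active"]]

def active_users_v2(users):
    # "Refactored" implementation: same set of users, now sorted by name.
    return sorted(u["name"] for u in users if u["active"])

USERS = [
    {"name": "zoe", "active": True},   # registered first
    {"name": "adam", "active": True},
    {"name": "mia", "active": False},
]

def contract_test(impl):
    # The only behavior the test suite checks: membership of the result.
    return set(impl(USERS)) == {"zoe", "adam"}

def downstream_report(impl):
    # A user-side workflow that depends on the untested ordering:
    # it treats the first entry as the earliest-registered active user.
    return impl(USERS)[0]

for impl in (active_users_v1, active_users_v2):
    print(impl.__name__, contract_test(impl), downstream_report(impl))
```

Both versions pass `contract_test`, but `downstream_report` returns a different user after the change, which is exactly the kind of break no test would flag.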
That’s not exactly a great argument for depending on undefined behavior. Should I as a developer depend on “undefined behavior” in C (yes, undefined behavior is an explicitly defined term in C)?
On a user-facing note, I did a project where I threw stats into DDB just for my own observation, knowing very well it was the worst database for that since it doesn’t do aggregation-type queries (sum, average, etc.). I didn’t document it and I didn’t talk about it, and yet the developer on their side used it, even though I had specifically documented that he should subscribe to the SNS topic I emit events to and ingest the data into his own Oracle database.
No maintainer of a C# or Java library, for instance, is going to promise that private methods a developer got access to via reflection won’t change.
I’m solely responsible for public documented interfaces and behaviors.
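To make that contract boundary concrete, here is a hypothetical Python sketch. Python has no real privacy, only the leading-underscore convention, so `getattr` stands in for C#/Java reflection. The class and method names are made up; the public `fetch` behaves identically in both versions while the private helper is renamed.

```python
# Hypothetical library code, version 1.
class ReportClientV1:
    def fetch(self, report_id):
        # Public, documented entry point.
        return self._build_url(report_id)

    def _build_url(self, report_id):
        # Private helper: the leading underscore means "not part of the contract".
        return f"/reports/{report_id}"

# Version 2: the maintainer renames the private helper; the public API is unchanged.
class ReportClientV2:
    def fetch(self, report_id):
        return self._url_for(report_id)

    def _url_for(self, report_id):
        return f"/reports/{report_id}"

def consumer_using_private(client):
    # A consumer reaching past the public API, much like reflection in C# or Java.
    return getattr(client, "_build_url")(42)

for cls in (ReportClientV1, ReportClientV2):
    client = cls()
    assert client.fetch(42) == "/reports/42"  # documented behavior holds in both versions
    try:
        print(cls.__name__, consumer_using_private(client))
    except AttributeError as err:
        print(cls.__name__, "broke:", err)
```

The documented call keeps working across versions; only the consumer that depended on a private detail breaks.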
Oh, and that gets back to an earlier point: how do I know that my systems will be maintainable? For the most part I design my systems to do a “thing” with clearly defined entry points, extension points, exit points, and interfaces. In the case I’m referring to above, it was a search system built on “agents” (some RAG-based, one using a Postgres database with similarity search) and an orchestrator. You extend the system by adding a new Lambda, registering it, and prioritizing that agent’s results with my vibe-coded GUI.
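A rough in-process sketch of that extension pattern, assuming plain Python callables stand in for the Lambdas and that "prioritizing" means ordering merged results by a per-agent priority number. `Orchestrator`, `register`, and the agent names are hypothetical, not the actual system.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass(order=True)
class RegisteredAgent:
    priority: int  # lower number = results ranked first; the only field used for ordering
    name: str = field(compare=False)
    search: Callable[[str], List[str]] = field(compare=False)

class Orchestrator:
    def __init__(self):
        self._agents: List[RegisteredAgent] = []

    def register(self, name, priority, search):
        # Extension point: new agents plug in here without changing query().
        self._agents.append(RegisteredAgent(priority, name, search))

    def query(self, text):
        # Fan out to every agent, merge results in priority order, drop duplicates.
        seen, merged = set(), []
        for agent in sorted(self._agents):
            for hit in agent.search(text):
                if hit not in seen:
                    seen.add(hit)
                    merged.append((agent.name, hit))
        return merged

# Usage: two hypothetical agents registered with different priorities.
orch = Orchestrator()
orch.register("similarity", priority=2, search=lambda q: [f"doc-sim:{q}", "doc-shared"])
orch.register("rag", priority=1, search=lambda q: ["doc-shared", f"doc-rag:{q}"])
print(orch.query("invoices"))
```

The orchestrator only knows the registration interface, so adding an agent never requires reading or editing the others.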
Apple, for instance, is famous for not caring if you used private APIs and they broke in a new version.
This is a topic I happen to know a little about. You as a programmer should probably avoid UB for the most part, but the key point here is that programmers don’t follow this rule.
A while back a study found that SQLite, PostgreSQL, GCC, LLVM, Python, OpenSSL, and Firefox all contained code that relied on signed overflow. Basically, even though the C spec says it’s UB, almost every CPU you’ll run into uses two’s complement, so it naturally wraps around.
When compiler authors tried to optimize aggressively and broke everything, they had to roll that back and/or release flags that let users keep the old behavior.
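Python integers don't overflow, so this is only a simulation: it models a 32-bit signed `int` to show why the common C idiom `a + 1 < a` works as an overflow check on two's-complement hardware, yet can be folded to `false` by an optimizer that assumes signed overflow never happens. GCC and Clang's `-fwrapv`, which defines signed overflow as wrapping, is one example of the kind of flag mentioned above. The helper names are hypothetical.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

def wrap_int32(x: int) -> int:
    """Reduce x to a 32-bit two's-complement value, as a hardware add would."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x & 0x80000000 else x

def overflow_check_on_hardware(a: int) -> bool:
    # `a + 1 < a` evaluated with wrapping arithmetic (what the CPU does,
    # and what -fwrapv guarantees).
    return wrap_int32(a + 1) < a

def overflow_check_as_optimizer_sees_it(a: int) -> bool:
    # Because signed overflow is UB, the compiler may reason with unbounded
    # integers, where `a + 1 < a` is always false, and delete the check.
    return a + 1 < a

for a in (0, INT32_MAX):
    print(a, overflow_check_on_hardware(a), overflow_check_as_optimizer_sees_it(a))
```

The two functions disagree only at `INT32_MAX`, which is precisely the case the original check existed to catch.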
This kind of stuff happens all the time. The C spec is nearly worthless paper because what matters is what the compilers implement not what the spec tells them to implement. If you spend time talking to LLVM folks, breaking the world because they changed some unspecified behavior is one of their top concerns.
And these are programmers who know how to read specs.
Imagine you’re working on software used by nearly every major movie studio. You think those users have ever read the spec for the software they are using? They don’t care about UB, they don’t even know the concept exists.
It doesn’t matter how well-tested I think my software is. Even very simple software will have unspecified and untested behavior. Give the software a little time and some users, and they will start exploiting that behavior. If I unleashed some agents on our code base to implement well-architected features without reviewing their output, and could somehow magically ensure that they didn’t break any workflow that we had documented, tested, or even knew about as an organization, the head of NBCUniversal would be on the phone with my boss’s boss’s boss’s boss demanding we change it back to the way it was within 24 hours.
Users depend on what the system does, not what you as a designer think it does. The purpose of a system is what it does. Not what it says it does.
We’ve been having this argument since the waterfall days. The code is the spec. We aren’t architects drawing blueprints. The code is the blueprint. If it was that easy to design systems like this all code would already be generated from UML graphs and flowcharts like we’ve been able to do for decades.
Back in my C days, I wrote C code that had to work both on PCs I had access to and on mainframes with ancient compilers that I never got a chance to test on. Some were little-endian and some big-endian. We had a custom makefile that tried to warn about non-portable behavior.
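The usual portable approach is to fix the byte order of serialized data explicitly instead of relying on the host's. A small Python sketch using the standard `struct` module, where `>` selects big-endian and `=` selects the host's native order; the function names are hypothetical.

```python
import struct
import sys

def encode_u32_portable(value: int) -> bytes:
    # Always write big-endian ("network") order, regardless of the host CPU.
    return struct.pack(">I", value)

def decode_u32_portable(data: bytes) -> int:
    return struct.unpack(">I", data)[0]

def decode_u32_native(data: bytes) -> int:
    # Non-portable: "=" uses the host's byte order, so results differ across machines.
    return struct.unpack("=I", data)[0]

wire = encode_u32_portable(258)
print(wire.hex(), decode_u32_portable(wire))
print(sys.byteorder, decode_u32_native(wire))
```

The portable pair round-trips identically everywhere; the native decoder gives 258 only on big-endian hosts.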
But are you really arguing that I shouldn’t feel free to change private methods because some developer somewhere might use reflection to access them, or that I shouldn’t change the schema of a SQLite database deeply embedded in a library folder somewhere?
and be upset when weird things happen when I upgrade my compiler?
What do you think Apple would do in that situation? They have multiple times over the past 3 decades said tough noogies if you didn’t do things using the documented APIs.
Jeff Bezos mandated documented interfaces with the “API mandate” in 2002.
You can change whatever you want, but if you make an internal change without signaling that it’s a breaking change and it breaks a significant number of your important users’ workflows, you’re gonna have a bad time.
But that’s mostly irrelevant because most software isn’t written to be used by developers who should know better than to rely on undocumented behavior.
As for Amazon, the API mandate gets violated all the time.
And it’s funny that you should mention them, because they just started requiring a code review from a senior engineer for all merges after issues with vibe coding.
So do you know from working at Amazon that they aren’t microservice-focused (yes, I worked at AWS), or that they break it all the time?
You keep saying you can’t break users’ workflows. But that doesn’t jibe with reality. In B2B, the user isn’t the customer, and B2B businesses break users’ workflows all the time. I know people complain about how often AWS changes the console UI, and you hear the same gripes from users of consumer software. How many people cancel their SaaS contracts because of a UI change if the features remain?
Photoshop users complain (or did when I followed it closely) all of the time when Adobe broke their automations via AppleScript. They kept buying it.
But the point is that you specifically said that you can’t treat a system as a set of black boxes with well-defined interfaces. You damn sure better believe that’s how I did every implementation I started from scratch with a team at product companies. It’s the only way you can keep a system manageable during ramp-up.
And this is also part of the subject of Stevey’s “Platform Rant”.
It’s the reason you can’t fathom that you don’t have to worry about spooky action at a distance when you enforce modularity at the system level.
And even for customers, Apple has a long history of breaking backward compatibility, and while Microsoft worships at the altar of backward compatibility, major versions of Office have been breaking users’ UI muscle memory since the 80s.
If an end user’s workflow depends on mucking with the backend database (more of an issue with desktop software) or on an undocumented feature, it’s the same.
Developers have been doing that for years - changing the UI.
You seem to have had a very specific career that consisted mostly of building something new and moving on before you had any idea how it held up long term. I’ve heard enough to be pretty confident that despite a 30-year career you don’t actually have much experience in anything other than greenfield projects. This explains the weird overconfidence you have in a methodology with absolutely no track record.
There’s a difference between breaking some user’s workflow every now and again and doing it every time you add a feature or fix a bug.
1. You have claimed that Amazon doesn’t do microservices and doesn’t follow the mandate even though you haven’t worked there (and I have), and I cited a famous letter from an ex-Google/ex-Amazon person who talked about the difference
2. I gave you plenty of well known B2B and B2C companies that “break user workflows” all of the time in new versions
3. I asked you whether you should go out of your way not to change undocumented behavior, and gave you examples both in C (officially undefined behavior) and in managed languages like C# and Java.
Your concern about “breaking user workflows” because they relied on undocumented behavior is not shared by any major B2B or B2C company. Hell, even the concern about breaking documented user workflows isn’t shared. The buyer, “the business”, is just going to tell the users to suck it up and get used to it.
Again, I’ve got a proven track record: multiple companies have hired me, including one trying to hire me back (well, the acquirer of the startup wanting me back after I left before it got acquired). That’s existence proof that my architectural decisions stood the test of time over the almost four years after I left.
As someone who can talk just as well about the intricacies of C as about “how to create a sustainable development department”, do I really sound like I’m bullshitting?
I don’t trust the technical chops of anyone who has never stuck around long enough to see how their architecture changed and developed with use.
I’ve worked with plenty of expert beginners who sound exactly like you. In addition to the work history, your argument style screams overconfident bullshitter who reads the first line in an email and skips the rest.
You read me saying that companies routinely violate their technical guidelines, skip the rest, and jump directly to the conclusion that I’m arguing microservices don’t exist, because that scores you a point in your mind and keeps you from having to think about possibly being wrong about something.