>> Our Mars robots are awesome, but they take years to accomplish what astronauts could do in days.
What? The unmanned space program has been beyond the edges of our solar system. Meanwhile humans have been day tourists in space. I don't know how you can come to the conclusion that "humans > robots" when humans have never even been close to the surface of Mars.
>> Even just a tiny temporarily occupied Mars science outpost would be a tremendous boost to our understanding of the planet
How many robots could we land with the equivalent resources, or telescope satellites, or autonomous probes?
I spent about a week doing an "experiment" greenfield app. I saw 4 types of issues:
0. It runs way too fast and far ahead. You need to slow it down, force planning only, and have it explicitly present a multi-step plan (i.e. numbered) and say "we'll do #1 first, then do the rest in future steps".
take-away:
This is likely solved with experience and changing how I work - or maybe caring less? The problem is the model can produce much faster than you can consume, but it runs down dead ends that destroy YOUR context. I think if you were running a bunch of autonomous agents this would be less noticeable, but impact 1-3 negatively and get very expensive.
1. lots of "just plain wrong" details. You catch this developing or testing because it doesn't work, or you know from experience it's wrong just by looking at it. Or you've already corrected it and need to point out the previous context.
take-away:
If you were vibe coding you'd solve all these eventually. Addressing this with "MORE AI" would probably help (i.e. AI to play/validate, etc.).
2. Serious runtime issues that are not necessarily bugs. Examples: it made a lot of client-side API endpoints public that didn't even need to exist, or at least needed to be scoped to the current auth. It missed basic filtering and SQL clauses that constrained data. It hardcoded important data (but not necessarily secrets) like ports, etc. It made assumptions that worked fine in development but could be big issues in public.
take-away:
AI starts to build traps here. Vibe coders are in big trouble because everything works but that's not really the end goal. Problems could range from 3am downtime call-outs to getting your infrastructure owned or data breaches. More serious: experienced devs who go all-in on autonomous coding might be three months from their last manual code review and be in the same position as a vibe coder. You'd need a week or more to onboard and figure out what was going on, and fix it, which is probably too late.
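The missing-auth-scoping / missing-WHERE-clause trap in #2 is easy to show concretely. A minimal sketch, with an entirely hypothetical schema and function names (not from the actual project):

```python
import sqlite3

def setup():
    # Toy in-memory table standing in for real multi-tenant data.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE invoices (id INTEGER, user_id INTEGER, total REAL)")
    db.executemany("INSERT INTO invoices VALUES (?, ?, ?)",
                   [(1, 1, 9.99), (2, 2, 150.0), (3, 1, 42.0)])
    return db

# The kind of thing the agent generated: "works" in development,
# but returns every user's rows to whoever calls it.
def list_invoices_unscoped(db):
    return db.execute("SELECT id, total FROM invoices").fetchall()

# The fix: constrain every query to the current auth context.
def list_invoices(db, current_user_id):
    return db.execute(
        "SELECT id, total FROM invoices WHERE user_id = ?",
        (current_user_id,),
    ).fetchall()
```

Both versions pass a casual click-through test, which is exactly why this class of problem survives vibe coding.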
3. It made (at least) one huge architectural mistake (this is a pretty simple project so I'm not sure there's space for more). I saw it coming but kept going in the spirit of my experiment.
take-away:
TBD. I'm going to try to use AI to refactor this, but it is non-trivial. It could take as long as the initial app did to fix. If you followed the current pro-AI narrative you'd only notice it when your app started to intermittently fail - or when you got your cloud provider's bill.
I'm a product manager, and a lot of the things I see people do wrong come from not having any product management experience. It takes quite a bit of work to develop a really good theory of what should be in your functional spec. Edge cases come up all the time in real software engineering, and often handling all those cases is spread across multiple engineers. A good product manager has a view of all of it, expects many of those issues from the agent, and plans for coaching it through them.
I'm an engineer and I totally agree. Engineers + LLMs exacerbate the timeless problem of not understanding the reality behind the problem. Validating solutions against reality is hard and LLMs just hallucinate their way around unknowns.
I think that's an incredibly reductionist and sarcastic take. I'm also in Product, but was an engineer for over a decade prior. I find that having strong structured functional specifications and a good holistic understanding of the solution you're trying to build goes a long way with AI tooling. Just like any software project, eliminating false starts and getting a clear set of requirements up front can minimize engineering time required to complete something, as long as things don't change in the middle. When your cycle time is an afternoon instead of two quarters, that type of up front investment pays off much better.
I still think AI tooling is lacking, but you can get significantly better results by structuring your plans appropriately.
I understand that you are serious. I am also serious here.
Have you built anything purely with LLM which is novel and is used by people who expect that their data is managed securely, and the application is well maintained so they can trust it?
I have been writing specifications, rfcs, adrs, conducting architecture reviews, code reviews and what not for quite a bit of time now. Also I’ve driven cross organisational product initiatives etc. I’m experimenting with openspec with my team now on a brownfield project and have some good results.
Having said all that, I believe that if you treat the English-language spec and your PM oversight as the sole QA pillars over a stochastic transformer model, you are making a mistake.
The issue is that validation needs human presence, and that is the limiting factor - common knowledge, but it's part of the "physics". Also, maintenance gets really tricky if the codebase has warts in it - which it will. I get much easier-to-understand architecture out of an LLM-driven code generation process if I follow it and course-correct / update the spec process based on learnings.
Example: yesterday I introduced a batch job and realized during the implementation phase that some refactoring was needed so the error boundary from the main backend could be reused in the batch application. This was unplanned and definitely not a functional requirement - it could be documented as non-functional. There was a gap between the agent's knowledge and mine even though the error handling pattern is well documented in the repository. Of course this can be documented better next time if we update the openspec writing process, but having these gaps is inevitable unless formal and semi-formal definitions are introduced - and even then there needs to be someone with "fresh eyes" in the loop.
I think it's just sarcasm coming from the stereotypical HN attitude that Product Managers only get in the way of the real work of engineering. Ignore it; they're basically proving your point.
it's past the end stage; we are already in business. It's just something I am not an expert in, have used in the past (by having real ops engineers build it for me), and now I have something that gives us insight into our production stack, alerts, etc., that isn't janky and covers my goals. So... yeah, that is valuable and improves my business.
receptionist as a service has been a thing for like... forever. You are never going to solve the problem of accurately estimating and quoting with AI or an answering service, so pay for someone to answer the phone and take down the details; have a mechanic or trained service rep review and estimate. Cheap code that doesn't solve the problem is not cheap.
Yes, of course. The bot can request information and the customer can provide it if they feel like it, and then someone qualified can call them back when they have their hands free.
But there's no bot, per se, needed at all. An answering machine from 1993 can do this same information-gathering job. :)
So update the device from 1993's new-fangled digital answering machine to 2009's Google Voice, and have it do the transcription from voicemail to text.
Someone will still have to call Bill back about his Honda (which is actually the Kia he bought for his daughter -- Bill is not a very technical guy these days[1] and he confuses such concepts regularly) in order to get any trading of money for services done.
It doesn't take an LLM to get there, and Bill would probably prefer to avoid being frustrated by the bot's insistent nature.
Look, you're pushing at an open door.
I think LLMs applied like this are just a layer of complexity that is mostly replacing lower-level programming solutions that could do the same thing.
The transcription + callback loop is honestly underrated. Most of the value here is just capturing intent accurately ("Honda" vs "Kia" aside) so the mechanic can prioritize callbacks. A dumb voicemail-to-text pipeline handles that fine. The LLM layer adds complexity without solving the actual bottleneck, which is someone qualified picking up the phone.
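For what it's worth, the "dumb" pipeline really is tiny. A minimal sketch, where `transcribe` stands in for whatever speech-to-text service you'd plug in (Google Voice was doing that step back in 2009):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Callback:
    caller: str
    number: str
    transcript: str

class CallbackQueue:
    """Voicemail -> text -> a list for a human. No LLM, no triage."""

    def __init__(self, transcribe: Callable[[bytes], str]):
        self.transcribe = transcribe
        self.pending: list[Callback] = []

    def on_voicemail(self, caller: str, number: str, audio: bytes) -> None:
        # Capture intent accurately and attach caller metadata;
        # prioritization stays with the qualified human reading the list.
        self.pending.append(Callback(caller, number, self.transcribe(audio)))
```

Everything past this point - deciding who gets called back first - is the part that actually needs a mechanic.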
But I'm not sure that a bot can be trusted to make good decisions about priority, either. So even if it makes good decisions based on context (which it can increasingly-often do, but does not always do), it lacks the context that is necessary to form the basis of good decisions.
Suppose a message comes into the box with this form: "This is Wendy, can you call me? My car is making that noise again."
The bot might deprioritize that call because it lacks actionable contextual information. "My job as a bot is to get more jobs into the shop. This call does not have enough data to do that, so I'll shove it to the bottom of the list of callbacks, behind more-actionable jobs."
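To make that failure mode concrete, here is a toy scorer - entirely hypothetical, standing in for the bot's triage logic - that ranks transcripts by "actionable" keywords and buries exactly the call that matters:

```python
# Hypothetical keyword-based triage: more "actionable" words = higher priority.
ACTIONABLE = {"brake", "oil", "quote", "appointment", "estimate", "tire"}

def bot_priority(transcript: str) -> int:
    words = {w.strip(".,?!").lower() for w in transcript.split()}
    return len(words & ACTIONABLE)

calls = [
    "Need a quote for brake pads and an oil change appointment.",
    "This is Wendy, can you call me? My car is making that noise again.",
]

# Wendy's message contains no keywords, so it scores zero and lands
# at the bottom of the callback list - the context that makes it
# urgent exists only in the mechanic's head, not in the transcript.
ranked = sorted(calls, key=bot_priority, reverse=True)
```

Any scoring scheme that only sees the transcript has the same blind spot; the keywords here are just the simplest way to show it.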
But the mechanic? The mechanic knows Wendy's Ford very well, and he also knows Wendy. She's been a good customer for over a decade. The mechanic also knows the noise, and that Wendy has 3 little kids and that she's vacationing 900 miles away on a road trip with those kids in that Ford. The context is all there inside the mechanic's brain to combine and mean that this might be the highest-priority call he gets all week.
Wendy may not have actively relayed any urgency in her message, but the urgency is real and she needs to be called back right away. She needs answers about what to do (keep driving and look into it when she gets back? pull over immediately and get a tow to a decent local shop? maybe she even needs help finding such a shop?) pretty much immediately. Not because it means more business today, but because it means more business for years.
The mechanic can spot this from a list of transcripts in an instant and give her a ring back Right Now. The bot is NFG at this.
The addition of the bot only adds noise to the process, and that noise only works to Wendy's detriment. When the bot adds detrimental noise to Wendy's situation, it also adds detriment to the shop's longevity.
The presence of the bot -- even as a prioritizing sorting mechanism -- asymptotically shifts the state from an excellent shop that knows their customers very well to a bot-driven customer-averse hellscape.
(And no, the answer isn't to make the bot into an all-knowing oracle that actively gets fed all context. The documentation burden would be more expensive, time-wise (and thus money-wise) than hiring a competent human receptionist who answers the phone, handles the front door traffic, and absorbs context from their surroundings. A person who chatted with Wendy last Thursday right before she left for her trip is always going to be superior to a bot.)
ah yes, those fat cat ranchers might have to get off their golden thrones and do some hard work for a change. You should maybe look into the business as both a rancher and the food supply chain. A big benefit is that ranchers are far better partners and stewards of the land than developers and other industries (like oil and gas).
If you think ranching hasn't changed in 2,000 years you know nothing about it. First, what we see in Canada and the US is most similar to Spanish open grazing of ~200 years ago, not some sort of neolithic practice from several thousand years ago. Then the obvious game changer was barbed wire, and now intensive industrialization such as feed lots, genetic selection and artificial insemination, GPS tracking, and data-based herd management. Public grazing is such a minor part of the picture now. The technology you call for is IMO the worst development: factory meat and massive consolidation.
doesn't look like much; they seem to use Electron for almost everything in this space. If they had faith in MAUI, something (VS Code, Teams, Outlook, ... calculator?) would use it.
and even worse, in Edge!