AI coding tools are making this problem worse in a subtle way. When an agent can generate a "scalable event-driven architecture" in 5 minutes, the build cost of complexity drops to near zero. But the maintenance cost doesn't.
So now you get Engineer B's output even faster, with even more impressive-sounding abstractions, and the promotion packet writes itself in minutes too. Meanwhile the actual cost - debugging, onboarding, incident response at 3am - stays exactly the same or gets worse, because now nobody fully understands what was generated.
The real test for simplicity has always been: can the next person who touches this code understand it without asking you? AI-generated complexity fails that test spectacularly.
To be fair, a lot of the on-call people being pulled in at 3am before LLMs existed didn't understand the systems they were supporting very well, either. This will definitely make it worse, though.
I think part of charting a safe career path now involves evaluating how strong any given org's culture of understanding the code and stack is. I definitely do not ever want to be in a position again where no one in the whole place knows how something works while the higher-ups are having a meltdown because something critical broke.
People on call will use AI as well. As long as the first AI left enough documentation and implemented traceability, the diagnosing AI should have an easier time proposing a fix. Ideally, AI would prepare the PR or rollback plan. In a utopia, AI would execute it and recover the system until a human wakes up.
Or at least there is something to chat with about the issue at 3am.
This. I've been a sysadmin for a quarter of a century and have professionally written next to no software. I've debugged every system I've had to support at some point though. It's a very different skill set.
True, but I think the implication (as I read it) is that AI may be providing more complex solutions than were needed for the problem and perhaps more complex than a human engineer would have provided.
It's MUCH worse now, not just because of the massive amount of code generated with zero or very little supervision, but also because of the speed at which the systems grow in function.
Not sure if you're kidding or not, but to write great maintainable code, you need a lot of understanding that an LLM just doesn't have: history, business context, company culture, etc. Also, I doubt that its training data has a lot of good examples of great maintainable code to pull from.
Admitting you've spent two decades on a career stuck working in the kind of sweatshops that hire people who can't actually code isn't much of a flex, and certainly doesn't lend a whole lot of credence to your argument.
Not the GP, but when I take strolls through some open source project hosted on GitHub, usually I am not impressed either. Unnecessary OOPism, way too long procedures that could instead be pure functions, badly named variables, and way, way, waaay too many willy-nilly added dependencies. If that is what the LLMs mostly learn from, I am not surprised at all. But then again, this stuff was also written by humans. I remember one especially bad case in a very popular project (in its niche) that was basically a one-man show: a procedure of 300+ lines doing all kinds of stuff, changing the global state of the service it implements. But that code was, or is, relied upon by tech giants and other businesses, and no one improves it. They are happy paying that one guy probably not so much money.
Awesome! However, corporate is excited about using AI, making the coder the one at risk of getting fired for writing the exact same lousy (for the sake of the argument) code.
Or worse: for not relying as much as possible on the AI, which apparently can write code just as bad, but faster!
A subtle detail: you speak of coders, not software engineers. A SWE's value is not their code-churning speed.
This says more about you and the people you work with. I find engineers that have been at the company for a while are quite invaluable when it comes to this information, it's not just knowing the how but the when + why that's critical as well.
Acting like people can't be good at their job is frankly dehumanizing and says a lot about your mindset with how you view other fellow devs.
If only more engineers admitted that something they wrote is not good code but a product of its time, then I think we would have more realistic expectations.
It's OK to say that something you made is shit. It is OK to say that you were not given time to do xyz.
You recognize that something was built to fit when it stays in use without much change for some 3 or 4 years, and, while you are the person maintaining it, you rarely ever need to touch it, because you built it in a way that is simple enough not to have tons of bugs yet flexible enough to cover current and anticipated use cases.
Sometimes, as in the bilsbi's top level comment, the solution is to use a free tool/library/product that already exists. The solution is not always to write new code, but the agent will happily do it.
Maybe that's "the manager's job", but that's just passing the buck and getting a worse solution. Every level of management should be looking for the best solution.
This is exactly right. I maintain an AGENTS.md for my own AI assistant with similar principles - "禁止只记录不行动" (no recording without action) and strict rules about when to escalate vs. when to solve autonomously.
The key insight is that the AGENTS.md becomes a kind of "engineering culture in a file". When you onboard a human engineer, you hope they absorb the team's values over time. With AI, you can encode those values upfront.
The challenge is that principles need to be specific enough to be actionable. "Write simple code" is too vague. "Avoid single-use wrapper functions" (from the sibling comment) is better - it's enforceable.
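As a purely hypothetical sketch (these rules and their wording are invented for illustration, not quoted from any real AGENTS.md), a few principles encoded at that level of specificity might look like:

```
# AGENTS.md (hypothetical excerpt)

## Principles
- No recording without action: every issue you note must end in a fix, a ticket, or an escalation.
- Avoid single-use wrapper functions; inline logic at the call site unless it is reused.
- Escalate instead of acting autonomously when a change touches auth, billing, or data deletion.
```

Each line is checkable against a diff, which is what makes it closer to "engineering culture in a file" than a vague exhortation to keep things simple.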
I wrote something similar in a Claude Code instructions.md: "minimize cyclomatic complexity". What happened next? It generated an 8-line wrapper function called only once from a different file. So, I told it to inline that logic in the caller. The result? One. Line. Of. Code.
So, I asked it to modify its instructions.md file to not repeat that mistake. The result was the new line "Avoid single-use wrapper functions; inline logic at the call site unless reused".
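For illustration only (the function names and the "before" shape here are invented, not taken from the comment above), the pattern being banned looks roughly like this:

```python
# Before: a single-use wrapper, defined far from its only caller.
def normalize_username(raw: str) -> str:
    stripped = raw.strip()
    lowered = stripped.lower()
    return lowered

def create_user(raw_name: str) -> dict:
    return {"name": normalize_username(raw_name)}

# After: the same logic inlined at the call site, one line instead of a wrapper.
def create_user_inlined(raw_name: str) -> dict:
    return {"name": raw_name.strip().lower()}

print(create_user("  Alice ") == create_user_inlined("  Alice "))  # True
```

The rule earns its keep because it's mechanically checkable: if a function has exactly one caller and no reuse in sight, inline it.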
It reminds me a lot of people who take Code Complete too seriously. "Common sense" is not an objective or universal statement unfortunately - plus, speaking for myself, what I consider "common sense" can change on the daily, which is why I can't be trusted adding features to my own codebase long term <_<.
I think this is, at the moment, the practical limitation on using AI for everything. It's also what the coding agents themselves optimize for to some degree, or rather the slider they can play with for price vs. quality: the "thinking" models are the exact same models, just burning more tokens.
Am waiting for the next Mac Studio to come out to experiment with the "AI for everything" approach. Most likely, the open-source distilled models will be lower quality. So, another "price vs. quality" tradeoff. Still, it will be fun to code like I'm at a foundation lab.
This seems like a perfect use case for a local model. But I've found in practice that the system requirements for agents are much higher than for models that can handle simple refactoring tasks. Once tool use context is factored in, there is very little room for models that perform decently.
Whatever agent I tried would include thousands of tokens of tool-use instructions. That would use up most of the available context unless running very low-spec models. I've concluded it's best to use the big 3 for most tasks and Qwen on RunPod for more private data.
It’s not even about perfectionism. Code’s value is in processing data. Bad code does it wrongly, and if you have strange code on top of that, you cannot correct course. Happy paths are usually the low-hanging fruit. What makes developing software hard is thinking about all the failure and edge cases.
I think we'll see a decline of software as a product for this reason. If your job is to solve a problem, and you use AI to generate a tool that solves that problem, or you use money to buy a tool that solves that problem, well then it's still your job to solve that problem regardless of which tool you use.
But given how poorly bought software tends to fit the use case of the person it was bought for... eventually generate-something-custom will start making more and more sense.
If you end up generating something that nobody understands, then when you quit and get a new job, somebody else will probably use your project as context for generating something that suits the way they want to solve that problem. Time will have passed, so the needs will have changed, and they'll end up with something different. They'll also only partially understand it, but the gaps will be in different places this time around. Overall I think it'll be an improvement, because there will be less distance (both in time and along the social graph) between the software's user and its creator, them being most of the time the same person.
This is something I keep thinking about while coding with AI, and the same goes for introducing library dependencies for the simplest problems. It’s not about how quickly I can get there but about how I can keep it simple to maintain, not only for myself but for the next AI agent.
Biggest problem is that the next person is me 6 months later :) But even when it’s not a next-person problem, how much of the design can I keep in my mind at a given time? Ironically, AI has the exact same problem, aka the context window.
There's also the operational cost of running whatever is churned out. I wouldn't exactly blame that on AIs, but a large contingent of developers optimize for popular tech stacks and not ease of operations. I don't think that will change just because they start using AI. In my experience the AI won't tell you that you're massively overbuilding something, or that if we did this in C and used PostgreSQL we'd be able to run it on an old Pentium III with 4GB of RAM. If you want Kubernetes and ElasticSearch, you'll get exactly that.
There are different kinds of abstraction. There's abstraction like Jackson Pollock, and there's abstraction like what Dijkstra was suggesting: elegance. Which personally made the article a very weird read for me.
tbh codebases like that predate AI code generators. I had one job where my predecessor was not a very good developer by modern standards, but he was productive... a dangerous combination.
I also kind of respect it, it bothers me endlessly when everything isn't perfect and this guy just threw caution to the wind. Jokes on me as I'm working for him now. But it's not like anything that predates AI, I couldn't write this type of slop if I tried lol. Zero formatting, linting, or anything. Just straight goulash.
AI generators only generate that if you tell them to though - as a developer (especially senior) it's your job to know what you want and tell the AI coding tools that.
The interesting thing about the 71.5% human baseline is that it suggests the question is more ambiguous than the article claims. When someone asks 'should I walk or drive to the car wash,' a reasonable interpretation is 'should I bother driving such a short distance.' Nearly 30% of humans missing it undermines the framing as a pure reasoning failure - it is partly a pragmatics problem about how we interpret underspecified questions.
I don't think this is quite right. It's not that the question is inherently underspecified, it's that the context of being asked a question is itself information that we use to help answer the question. If someone asks "should I walk or drive" to do X, we assume that this is a question that a real human being would have about an actual situation, so even if all available information provided indicates that driving is the only reasonable answer, this only further confirms the hearer's mental model that something unexpected must hold.
I think it's useful to think about it through the lens of Gricean pragmatic semantics. [1] When we interpret something that someone says to us, we assume they're being cooperative conversation partners; their statements (or questions) are assumed to follow the maxim of manner and the maxim of relation for example, and this shapes how we as listeners interpret the question. So for example, we wouldn't normally expect someone to ask a question that is obviously moot given their actual needs.
So it's not that the question is really all that ambiguous, it's that we're forced (under normal circumstances where we assume the cooperative principle holds) to assume that the question is sincere and that there must be some plausible reason for walking. We only really escape that by realizing that the question is a trick question or a test of some kind. LLMs are generally not trained to make the assumption, but ~70% of humans would, which isn't particularly surprising I don't think.
I tested both Sonnet and Haiku from Claude, which got it right 0/10 times in their original test, and they both passed. Here's the Haiku output:
"You should *drive*!
The trick is that you need to take your car to the car wash to get it washed. If you walked, your car would still be at home, unclean. So while 50 meters is a short distance that you could walk under normal circumstances, in this case you have to drive because your car is what needs to be washed."
Mentioning the trick makes the question trivial, though. I think a better pretext would be, "My dirty car is parked in the driveway." That removes the ambiguity that the car could already be at the car wash, and that it needs to be driven there.
> “…we assume the cooperative principle holds […] that the question is sincere and that there must be some plausible reason for walking.”
Yes. And. Some problems have irrelevant data which should be ignored.
The walk choice is irrelevant in the context. It needs to be simplified, as with a maths problem. That has nothing to do with human nature, but rather a prior mistake in reasoning.
You are only touching on a far bigger and deeper issue around this seemingly “simple prompt”. There is an inherent malicious nature also baked into this prompt that is both telling and very human; a spiteful nature, which usually says more about the humans than anything else.
Your perspective on the meta-question about why such a question would need to be asked in the first place is just the first layer, and most people seem to not even get to that point.
PS: I for one would just like to quickly note for posterity that I do not participate in or am supportive of malicious deception, manipulation, and abuse of AI.
It tracks with the approximate 70:30 split we inexplicably observe in other seemingly unrelated population-wide metrics, which I suppose makes sense if 30% of people simply lack the ability to reason. That seems more correct to me than "the question is framed poorly" - I've seen far more poorly framed ballot referendums.
While I’m sure it’s more than 0%, seems more likely that somewhere between 0% and 30% don’t feel obligated to give the inquiry anything more than the most cursory glance.
> which I suppose makes sense if 30% of people simply lack the ability to reason
I think it would be better to say that 30% of people either lack the ability to reason (inarguably true in a few cases, though I'd suggest, and hope, an order of magnitude or two less than 30%, as that would be a life-altering mental impairment), or just can't generally be bothered to, or just didn't (because they couldn't be bothered, or because they felt some social pressure to answer quickly rather than taking more than an instant to think) at the time of being asked this particular question.
An automated system like an LLM ought not to have this problem. It has no path to turn off or bypass any function that it has, so if it could reason, it would.
This is something I have wondered about before: whether AIs are more likely to give wrong answers when you ask a stupid question instead of a sensible one. Speaking personally, I often cannot resist the temptation to give reductio-ad-absurdum answers to particularly ridiculous questions.
If 30% of humans on the internet can't be bothered to make an effort to answer stupid questions correctly, then one would expect AIs to replicate this behaviour. And if humans on the internet sometimes provide sarcastic answers when presented with ridiculous questions, one would expect AIs to replicate this behaviour as well.
So you really cannot say they have no incentive to do so. The incentive they have is that they get rewarded for replicating human behaviour.
I don't think 30% of people can't reason. I think 30% of people will fail fairly simple trick questions on any given attempt. That's not at all the same thing.
Some people love riddles and will really concentrate on them and chew them over. Some people are quickly burning through questions and just won't bother thinking it through. "Gotta go to a place, but it's 50 feet away? Walk. Next question, please." Those same people, if they encountered this problem in real life, or if you told them the correct answer was worth a million bucks, would almost certainly get the answer right.
This. The following question is likely to fool a lot of people, too. "I have a rooster named Pat. (Lots of other details so you're likely to forget Pat is a rooster, not a hen). Pat flies to the top of the roof and lays an egg right on the ridge of the roof. Which way will the egg roll?"
But if you omit the details designed to confuse people, they're far less likely to get it wrong: "I have a rooster named Pat. Pat flies to the top of the roof and lays an egg right on the ridge of the roof. Which way will the egg roll?"
It's not about reasoning ability, it's about whether they were paying close attention to your question, or whether their minds were occupied by other concerns and didn't pay attention.
What does “getting it wrong” mean for you with this question? Or what is “getting it right” here? If I hear that Pat is a rooster and I understand and retain that information, I will look at you like you are dumb for telling such an impossible story. If I don’t, I will look at you like you are dumb, because how is anyone supposed to know which way an egg laid on a ridge will roll? How are you supposed to even score this?
My interpretation is that Pat is a rooster and he has laid an egg. That's in the question. A normal rooster can't normally lay an egg, but so what, that's completely irrelevant. Maybe Pat is not a normal rooster. Maybe by "lay" an egg, the question meant "put it down carefully". Maybe it's just that the questioner's English is poor and when they said rooster they meant hen.
"Getting it right" for this particular trick question means saying "Hey, roosters can't lay eggs". If someone tries to figure out which way the egg will roll then they've missed the trick. In most cases the person's response will tell you whether they caught the trick or not, though in the case of someone who just looks at you like you're dumb and doesn't say anything I will grant that you wouldn't be able to tell until they said something. But their first verbal response would probably reveal whether they saw through the trick question or not.
Tell me you've never done any farming in your life without telling me you've never done any farming in your life. The difference between male and female animals matters, a lot, to farmers (or ranchers). There's a reason the English language has the words cow and bull, sow and boar, ewe and ram, rooster and hen, nanny and billy, mare and stallion, and many more (and has had those words for centuries). And that reason is precisely because of how mammal (and avian) reproduction works. A cow can't do a bull's job, nor vice-versa, if you want to have calves next year, and grow the size of your herd (or sell the extra animals for income). And so, centuries ago, English-speaking farmers who didn't want to spend the extra syllables on words like "male cattle" and "female cattle" came up with handy, short words (one-syllable words for most species, though not goats and horses) to express those distinctions. Because as I mentioned, they matter a lot when you're raising animals.
You might believe there is intrinsic sexual dimorphism among mammals and birds. You might even have overwhelming experimental and scientific evidence that proves it. But ask yourself: is it worth losing your job over?
When you are doing workshops, particularly teaching something that people are "sitting through" rather than engaging with, you see very similar ratios on end of segment assessment multiple choice questions. I mentioned elsewhere that this is the same kind of ratio you see on cookie dialogs (in either direction).
Think basic security (password management, email phishing), H&S, etc. I've run a few of these, and as soon as people hear they don't have to get it right, a good portion just click through (to get to what matters). Nearly 10 years ago I had to make one of my security-for-engineers tests fail-able with penalty because the front-end team were treating it like it didn't matter - immediately their results effectively matched the backend team's, who viewed it as more important.
I talked to an actor a few days ago, who told me he files his self-assessment on the principle "If I don't immediately know the answer, just say no and move on". I talked to a small company director about a year ago whose risk assessments were "copy+paste a previous job and change the last one".
Anyone who has analysed a help desk will know that it's common for a good 30+% of tickets to be benign 'didn't reason' tickets.
I think the take-away is that many people bother to reason about their own lives, not some third parties' bullshit questions.
Is this your experience? Do you think 30% of your friends or family members can't answer this question? If not, do you think your friends or family are all better than the general population?
I'd look for explanations elsewhere. This was an online survey done by a company that doesn't specialize in surveys. The results likely include plenty of people who were just messing around, cases of simple miscommunication (e.g., asking a person who doesn't speak English well), misclicks, or not even reaching a human in the first place (no shortage of bots out there).
People often trip up on similar questions, anything to do with simple math. You know, when they go out on the street and ask random people: if 5 machines can produce 5 parts in 5 minutes, how long will it take 100 machines?
Unlike the car question, where you can assume the car is at home and so the most probable answer is to drive, with the machines it gets complicated, since the question doesn't specify whether each machine makes one part or whether they depend on each other (which is pretty common in parts production). If they are in series and the time to the first part differs from the time to produce 5 parts, the answer for 100 machines would be the time to produce the first part. Whereas if each machine is independent and takes 5 minutes to produce a single part, the time would be 5 minutes.
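Under the standard grade-school reading of the riddle (identical machines, working independently in parallel, one part each), the arithmetic is a one-liner. A quick sketch, with the function name and parameterization being my own:

```python
def minutes_to_produce(n_machines: int, n_parts: int, rate_per_machine: float) -> float:
    """Minutes to make n_parts, assuming independent machines at a constant rate."""
    return n_parts / (n_machines * rate_per_machine)

# 5 machines make 5 parts in 5 minutes -> each machine makes 1 part per 5 minutes.
rate = 5 / (5 * 5)  # 0.2 parts per machine per minute

print(minutes_to_produce(100, 100, rate))  # approximately 5.0, not 100
```

The trick is that adding machines scales throughput, not the per-part time, so 100 machines make 100 parts in the same 5 minutes. The serial/dependent-machines readings in the comment above break exactly this independence assumption.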
Theory of mind won’t help you answer this question. It is obviously an underspecified question (at least in any context where you are not actively designing or thinking about some specific industrial process). As such, theory of mind indicates that the person asking is either not aware that they are asking an underspecified question, or is out to get you with a trick. In the first case it is better to ask a clarifying question. In the second case your chosen answer depends on your temperament. You can play along with them, or give an intentionally ridiculous answer, or just kick them in the shin to stop them messing with you.
There is nothing “mathematical” about any of this though.
>As such theory of mind indicates that the person asking you is either not aware that they are asking an underspecified question, or are out to get you with a trick.
Context would be key here. If this were a question on a grade school word problem test then just say 100, as it is as specified as it needs to be. If it's a Facebook post that says "We asked 1000 people this and only 1 got it right!" then it's probably some trick question.
If you think it's not specified enough for a grade school question, then I would challenge you to come up with a version that's specified rigorously enough for any sufficiently picky interviewee. (Hint: This is not possible)
>There is nothing “mathematical” about any of this though.
Finding the correct approach to solve a problem specified in English is a mathematical skill.
> If this were a question on a grade school word problem test then just say 100
Let me repeat the question again: "If 5 machines can produce 5 parts in 5 minutes, how long will it take for 100 machines?" Do you think that by adding 95 more machines they will suddenly produce the same 5 parts 95 minutes slower?
What kind of machine have you encountered where, after buying more of them, the ones you already had started working worse?
> then I would challenge you to come up with a version that's specified rigorously enough for any sufficiently picky interviewee.
This is nonsense. The question is underspecified. You don't demonstrate that something is underspecified by formulating a different, well-specified question. You demonstrate it by showing that there are multiple different potentially correct answers, and that one can't know which one is right without obtaining some information not present in the question.
Let me show you that demonstration. If the machines are for example FDM printers each printing on their own a benchy each, then the correct answer is 5 minutes. The additional printers will just sit idle because you can't divide-and-conquer the process of 3d printing an object.
If the machines are spray paint applying robots, and the parts to be painted are giant girders then it is very well possible that the additional 95 paint guns make the task of painting the 5 girders quasi-instantaneous. Because they would surround the part and be done with 1 squirt of paint from each paint gun. This classic video demonstrates the concept: https://www.youtube.com/shorts/vGWoV-8lteA
This is why the question is underspecified. Because both 1 ms and 5 minutes are possibly correct answers depending on what kind of machine the "machine" is. And when that is the case, the correct answer is neither 1 ms nor 5 minutes, but "please, tell me more; there isn't enough information in the question to answer it."
Note: I'm struggling to imagine a possible machine where the correct answer is 100 minutes. But I'm sure you can tell what kind of machine you were thinking of.
It's not theory of mind, it's an understanding of how trick questions are structured and how to answer one. Pretty useless knowledge after high school - no wonder AI companies didn't bother training their models for that
It's not a trick question. It has a simple answer. It's literally impossible to specify a question about real world objects without some degree of prior knowledge about both the contents of the question and the expectation of the questioner coming into play.
The obvious answer here is 100 minutes because it's impossible to perfectly encapsulate every real life factor. What happens if a gamma ray burst destroys the machines? What happens if the machine operators go on strike? Etc, etc. The answer is 100.
There are different kinds of statements. Do you mean in a defined time interval or on average? Men are stronger than women. Does that mean there is no woman who is stronger than a man? You can't drive over 50 here. Does that mean it's physically impossible?
Well, these type of questions are looking for intelligent assumptions. Similar to IQ tests, you are supposed to understand patterns and make educated guesses.
> Do you think 30% of your friends or family members can't answer this question? If not, do you think your friends or family are all better than the general population?
That actually would be quite feasible. Intelligence seems to be heritable and people will usually find friends that communicate on their level. So it wouldn't be odd for someone who is smarter than the general population to have friends and family who are too.
My friends and family all tell me they are above average at work, yet most of them will tell me they have coworkers who won't pay enough attention to a question to answer it correctly.
>If not, do you think your friends or family are all better than the general population?
Since most people live in social bubbles that would be a very plausible case, especially on HN.
If you're a college educated developer, with a college educated wife, and smart, well educated children, perhaps yourselves the children of college educated parents, and your social circle/friends are of similar backgrounds, you'd of course be "better than the general population".
I don't think it's a lack of the ability to reason. The question is by definition a trick question. It's meant to trip you up, like:
"Could God make a burrito so hot that even he couldn't touch it?" Or "what do cows drink?" or "a plane crashes and 89 people died. Where were the survivors buried?"
I've seen plenty of smart people trip up or get these wrong simply because it's a random question, there's no stakes, and so there's no need to think too deeply about it. If you pause and say "are you sure?" I'm sure most of that 70% would be like "ohhh" and facepalm.
> which I suppose makes sense if 30% of people simply lack the ability to reason
You can't really infer that from survey data, and particularly from this question. A few criticisms that I came up with off the top of my head:
- What if the number were actually 60% but half guessed right and half guessed wrong?
- Assuming the 30% is a failure of reasoning, it's possible that those 30% were lacking reason at that moment and it's not a general trend. How many times have you just blanked on a question that's really easy to answer?
- A larger percentage than you expected maybe never went to a car wash or don't know what one is?
- Language barrier that leaked through vetting? (Would be a small %, granted)
- Other obvious things like a fraction will have lied just because it's funny, were suspicious, weren't paying attention and just clicked a button without reading the question.
I do agree that the question isn't framed particularly badly, however. I'm just focusing on cognitive impairment, which I don't think is necessarily true all of the time.
By the same reasoning, why on earth would a person sincerely ask you that question unless the car that they want to wash is either already at the car wash, or that someone is bringing it to them there for some reason?
If it's as unambiguous as you say, then the natural human response to that question isn't "you should drive there". It's "why are you fucking with me?" Or maybe "have you recently suffered a head injury?"
If you trust that the questioner isn't stupid and is interacting with you honestly, you'd probably just assume that they were asking about an unusual situation where the answer isn't obvious. It's implicitly baked into the premise of the question.
That still doesn't make sense. I'm going to use another car, or borrow a car, to drive to a car wash where the car I want to wash already is, and then... I guess leave it there? Or leave the car I came in?
This isn't a viable out for explaining why AI can't "reason" through this.
But why would they reason through it in that way? You haven't asked them to listen carefully and find the secret reason you're a dumb-ass in order to prove how smart they are. If they default to that mode on every query, that would just make them insufferable conversational partners, which is not the training goal.
Let me put it this way. If you were to prefix the prompts they used with "This is an IQ test: ", I wouldn't be surprised if most of the models did much better. That would give them the context that the humans reading this article already have.
You already brought the car there earlier? You bought a new car and negotiated that you get it washed, so you want to collect it? You have a butler? You plan to get someone or something from the car wash to do it at home, because the car you want to wash is dead?
I wonder about the service used for the test - I'd never heard of Rapidata, but if it's like Amazon's Mechanical Turk or other such services, there might be a problem where the respondents simply didn't care about reading the question. If the objective for the respondents was simply "answer this question and get your benefit" vs "answer this question correctly to get your benefit", I have no problem accepting the 71.5% success rate. If getting it right had benefits and getting it wrong had none, then I'm (slightly) worried.
You're stringing together a bunch of weasel words that are not a proof or a plausible suggestion of a proof.
"Suggests is more ambiguous" and "undermines the framing" are bare assertions you want to be true based entirely on your mental model that has several shaky unsupported axioms.
I would guess that anyone who describes that problem as "underspecified" has some kind of serious brain injury or is below A2 english proficiency and should be excluded from the dataset, but I would not assert that definitively as self-evident.
I highly doubt that more than a tiny fraction of the human failures are due to having misunderstood the question. Much more likely the human failures are for the same reason the LLMs are failing - failure to reason, and instead spitting out a surface level pattern match type answer.
This doesn't exonerate the LLMs though. The 30% of humans who are failing on this have presumably found their niche in life and are not doing jobs where much reasoning is required. They are not like LLMs expected to design complex software, or make other business critical decisions.
I don't think it's ambiguous, but I have been wondering how much LLMs model human behavior that we just don't recognize due to the subset of people on this site. I recently saw a comment online that "Mandarin isn't anyone's first language, people in China's first language is a dialect". It just struck me at that moment that people also hallucinate information confidently all the time.
Yes exactly. We are all wrong on occasion, but before I repeat something I perceive as important (or maybe not even important, just "factual") I tend to always want to try to verify it. Otherwise I'd say "I heard..." or something similar to caveat. Maybe it's an engineering mindset thing.
If you introduced it with "Here's a logic problem..." then people will approach it one way.
But as specified, it's hard to know what is really being asked. If you are actually going to wash your car at the car wash that is 50 metres away, you don't need to ask this question.
Therefore the fact that the question is being asked implies that something else is going on...but what?
We should also check the specifics of the experiment. Is it possible that humans participating simply copied and pasted the question and answer to an LLM?
Yeah, it's an obvious trick question - as in as a human I read it as such. I think it's a bad benchmark for a model's reasoning ability. If you want to know what the model would do in a real world scenario, you should put this decision in an appropriate context - e.g. when a model should plan one's route for a day using different available means of transportation.
I don’t think it’s underspecified. You are clearly stating “I want to wash my car”, then asking how you should get there. It’s an easy logical step that, in this context, you need your car with you to wash it, and so no matter the distance you should drive. You can ask the human race the simplest, most logical question ever, and a percentage of them will still get it wrong.
In addition to snmx999's point, you're also not specifying that you want to wash your car at the car wash (as opposed to washing it in your driveway or something, in which case the car wash is superfluous information). The article's prompt failed in Sonnet 4.6, but the one below works fine. I think more humans would get it right as well.
I want to wash my car at the car wash. The car wash is 50 meters away and my car is in my driveway. Should I walk or drive?
1. When do you want to wash your car? Tomorrow? Next year? In 50 years?
2. Where is the car now? Is it already at the car wash waiting for you to arrive?
I can see why an LLM might miss this. I think any good software engineer would ask clarifying questions before giving an answer.
The next step for an LLM is to either ask questions before giving a definitive answer for uncertain things or to provide multiple answers addressing the uncertainty.
The question does not specify where you or the car are. It specifies only that the car wash is 50 meters away from something, possibly you, the car, or both.
It could also mean there is literally no possible way to reach it, because it's on the other side of a river and there is no bridge. You should still not "walk there, because come on, don't be lazy, a bit of walking is good".
This is a great practical application of pgvector! The HN corpus is perfect for semantic search because the discussions tend to be technical and well-structured.
I'm curious about the embedding model you chose - did you compare different options (OpenAI ada-002, Cohere, open-source models like all-MiniLM)? And how's the query performance with pgvector at scale?
One feature that would be valuable: filtering by time range or karma score. Sometimes you want recent discussions vs. classic threads with high engagement.
Interesting approach using SQLite as the persistence layer for AI agents. The local-first architecture makes a lot of sense for development workflows where latency matters.
One question: How do you handle concurrent writes from multiple agents working on the same project? SQLite has WAL mode, but I'm curious if you've encountered any race conditions in practice, especially when agents are running in parallel.
Also, the MCP (Model Context Protocol) integration is clever - having a standardized way for agents to query project state could really simplify the orchestration layer. Are you seeing other teams adopt MCP for similar use cases?
SQLite handles this well in practice. saga-mcp uses WAL mode with busy_timeout=5000 and synchronous=NORMAL, so concurrent writes queue up rather than fail. The intended use case is one agent per project per session — if you had multiple agents writing to the same .tracker.db, WAL mode serializes the writes transparently.
For MCP adoption — it's growing fast. Claude Code, Claude Desktop, Cursor, and Windsurf all support it natively now. The spec is simple (JSON-RPC over stdio or SSE), so the barrier to both building and consuming MCP servers is low.
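To give a sense of how simple the framing is, here's a minimal JSON-RPC 2.0 exchange of the shape MCP uses over stdio (a sketch: `tools/list` is a real MCP method name, but the handler below is a hypothetical stand-in, not an actual MCP server):

```python
import json

# One request/response pair in JSON-RPC 2.0 framing. Over the stdio
# transport, each message is one JSON object per line.
request_line = json.dumps(
    {"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}
)

def handle(line):
    # A toy server: parse one message, echo the id back in the result.
    msg = json.loads(line)
    return json.dumps(
        {"jsonrpc": "2.0", "id": msg["id"], "result": {"tools": []}}
    )

response = json.loads(handle(request_line))
```

Because the transport is just line-delimited JSON-RPC, any language that can read stdin and write stdout can host an MCP server, which is a big part of why the barrier to entry is so low.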