Hacker News | kif's comments

Great point — this is the smoking gun

Is there going to be a new ShieldGemma based on Gemma 4?

Anecdotally, when Claude was returning 500 errors a few days ago, its automatic retries would never succeed, but cancelling and retrying manually worked most of the time.
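For what it's worth, "cancel and retry manually" is roughly equivalent to a retry loop that starts each attempt from scratch instead of resuming a failed one. A minimal sketch (the `RuntimeError` here is just a stand-in for an HTTP 500; nothing about Anthropic's actual client is assumed):

```python
import time

def retry_fresh(make_request, attempts=3, backoff=1.0):
    """Call make_request() up to `attempts` times, starting from
    scratch on each attempt -- the moral equivalent of cancelling
    and retrying by hand rather than resuming a failed attempt."""
    last_error = None
    for attempt in range(attempts):
        try:
            return make_request()  # fresh call, no carried-over state
        except RuntimeError as exc:  # stand-in for a 500 response
            last_error = exc
            time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise last_error
```

Whether the built-in retries actually reuse broken connection state is speculation on my part; this just illustrates the difference in shape.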

But that's the problem: something that can be so reliable at times can also fail miserably at others. I've seen this in myself and colleagues of mine, where LLM use leads to faster burnout and higher cognitive load. You're not just coding anymore; you're thinking about what needs to be done, and then reviewing it as if someone else wrote the code.

LLMs are great for rapid prototyping, boilerplate, that kind of thing. I myself use them daily. But the number of mistakes Claude makes is not negligible in my experience.


> I've seen this in myself and colleagues of mine, where LLM use leads to faster burnout and higher cognitive load.

This needs more attention. There's a lot of inhumanity in the modern workplace and modern economy, and that needs to be addressed.

AI is being dumped into the society of 2026, which is about extracting as much wealth as possible for the already-wealthy shareholder class. Any wealth, comfort, or security anyone else gets is basically a glitch that "should" be fixed.

AI is an attempt to fix the glitch of having a well-compensated and comfortable knowledge worker class (which includes software engineers). They'd rather have what few they need running hot and burning out, and a mass of idle people ready to take their place for bottom-dollar.


This is a fair observation, and I think it actually reinforces the argument. The burnout you're describing comes from treating AI output as "your code that happens to need review." It's not. It's a hypothesis. Once you reframe it that way, the workflow shifts: you invest more in tests, validation scenarios, acceptance criteria, clear specs. Less time writing code, more time defining what correct looks like. That's not extra work on top of engineering. That is the engineering now. The teams I've seen adapt best are the ones that made this shift explicit: the deliverable isn't the code, it's the proof that the code is right.
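To make "defining what correct looks like" concrete: write the acceptance tests before prompting, then treat the model's output as a hypothesis that must pass them. An entirely hypothetical example (the `slugify` helper and its behaviour are illustrative assumptions, not anyone's real spec; the body shown is a placeholder standing in for AI-generated code under review):

```python
# Acceptance criteria written *first*, as tests. The implementation
# below is a placeholder for the AI-generated hypothesis being vetted.

def slugify(title: str) -> str:
    # Placeholder implementation under review.
    return "-".join(title.lower().split())

def test_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

def test_collapses_internal_whitespace():
    assert slugify("a   b") == "a-b"

def test_strips_leading_and_trailing_space():
    assert slugify("  padded  ") == "padded"
```

The point isn't the tests themselves; it's that they exist before the code does, so the review question becomes "does the hypothesis pass?" rather than "do I trust this line?".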


This is a fair point. The cognitive load is real. Reviewing AI output is a different kind of exhausting than writing code yourself.

Even when the output is "guided," I don't trust it. I still review every single line. Every statement. I need to understand what the hell is going on before it goes anywhere. That's non-negotiable. I think it gets better as you build tighter feedback loops and better testing around it, but I won't pretend it's effortless.


You are correct, but this is not a new role. AI effectively makes all of us tech leads.


Prototyping is a perfectly fine use of LLMs: it's easier to evaluate something closer to finished than something that is not.

But that won't generate the returns model producers need :) That's the issue, so they will keep pushing nonsense.


It is curious how people immediately go on the defensive, trying to explain how what Claude said is in fact correct.


I have as much respect for Claude as any other LLM product. Which is to say, approximately none. But if I needed a spark plug I'd walk over and buy a spark plug.

Perhaps some feathers have been ruffled by the insinuation that their favourite word predictor was wrong, but I assure you it's not all of them.


Walking or driving both work, but walking is better for your health. 200m is an easy walking distance; my 93-year-old father still walks 6km (30× that 200m) every morning.


I’m pretty sure Claude would eagerly say that if that was the reason.

The other day Claude Code said to me, “Small nitpick — the use of so-and-so is great”. Which is something no human would say.


In my opinion there is a problem when said robot relies on piracy to learn how to do stuff.

If you are going to use my work without permission to build such a robot, then said robot shouldn’t exist.

On the other hand a jack of all trades robot is very different from all the advancements we have had so far. If the robot can do anything, in the best case scenario we have billions of people with lots of free time. And that doesn’t seem like a great thing to me. Doubt that’s ever gonna happen, but still.


This honestly doesn’t surprise me. We have reached a point where it’s becoming clearer and clearer that AGI is nowhere to be seen, whereas advances in LLMs’ ability to ‘reason’ have slowed (almost?) to a halt.


But if you ask an AI hype person, they’ll say we’re almost there, we just need a few more gigawatts of compute!


I hate to say this, but I think the LLM story is going to go the same way as Tesla's stock: everyone knows it's completely detached from fundamentals and driven by momentum and hype, but nobody wants to do the right thing.


In my book, chat-based AGI was reached years ago, when I could no longer reliably distinguish computer from human.

Solving problems that humanity couldn't solve would be super-AGI or something like that. That, indeed, isn't there yet.


Beating the Turing test is not AGI, but it is beating the Turing test, and that was impressive enough when it happened.


So you were impressed by ELIZA, right? Because that's what first "beat the Turing test".

Which, actually, is not a real thing, nor has it ever really been meaningful.

Trolls on IRC "beat the Turing test" with bots that barely had any functionality.


I wasn't alive back then, but I was absolutely impressed by it the first time I heard about it. I don't know how that is supposed to be a gotcha.


We're not even solving problems that humanity can solve. There have been several times where I've posed models a geometry problem that was novel but possible for me to solve on my own, and the LLMs have fallen flat every time. I'm no mathematician, these are not complex problems, but they're well beyond any AI, even when guided. Instead, they're left to me, my trusty whiteboard, and a non-negligible amount of manual brute-force shuffling of terms until it comes out right.

They're good at the Turing test. But that only marks them as indistinguishable from humans in casual conversation. They are fantastic at that. And a few other things, to be clear. Quick comprehension of an entire codebase for fast queries is horribly useful. But they are a long way from human-level general intelligence.


I'm pretty sure there are billions of people on Earth unable to solve your geometry problem. That doesn't make them less human. It's not a benchmark. You should pick something almost any human can do, not a select few; that's the bar. Casual conversation is one example of something almost any human can do.


Any human could do it, given the training. Humans largely choosing not to specialize in this way doesn't make them less human, nor did I imply that. Humans have the capacity for it, LLMs fall short universally.


What do you mean, reliably distinguish a computer from a human? I haven't been surprised even once yet. I always find out eventually when I'm talking to an AI. It's usually easy: they get into loops, forget conversation context, fail to make connections between obvious things, and do make connections between less obvious things. Etc.

Of course they can sound very human like, but you know you shouldn't be that naive these days.

Also, you should of course not judge based on a few words.


Hence the pivot into ads, shop-in-chat and, umm... adult content.


He also said he got scared when trying out GPT-5, thinking “What have we done?”.

He’s in the habit of lying, so it would be remiss to take his word for it.


I think it’s fair to say you need another kind of domain experience to explain Trump.

