
My mildly amusing anecdote is that, whenever Claude Code produces something particularly egregious, I often find it sufficient to reply with just "wtf?" for it to present a much more sensible version of the code (which often needs further refinement, but that's another story...)

But we don't evolve IL or assembly code as the system evolves. We regenerate it from scratch every time.

It is therefore not important whether some intermediate version of that low-level code was completely impossible to understand.

It is not so with LLM-written high-level code. More often than not, it does need to be understood and maintained by someone or something.

These days, I mainly focus on two things in LLM code reviews:

1. Making sure unit tests have good coverage of expected behaviours.

2. Making sure the model is making sound architectural decisions, to avoid accumulating tech debt that'll need to be paid back later. It's very hard to check this with unit tests.


We get stuck reviewing the output assembly when it's broken, and that does happen from time to time. The reason that doesn't happen often is that generation of assembly follows strict rules, which people have tried their best to test. That's not the behavior we're going to get out of an LLM.

Yes, prompts aren't analogous to higher-level code; they're analogous to wizards or something like that, which were always rightly viewed with suspicion.

Part of it is observability bias: longer, more widespread outages are more likely to draw significant attention. This doesn't mean that there aren't also shorter, smaller-scope outages; it's just that we're much less likely to know about them.

For example, if there's a problem that gets caught at the 1% stage of a staged rollout, we're probably not going to find ourselves discussing it on HN.
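To make the mechanism concrete, here's a minimal sketch of the kind of canary gate involved. Everything in it (the stage fractions, the 10x error threshold, the callables) is hypothetical, purely for illustration of why small-stage failures stay quiet:

    # Hypothetical staged-rollout ("canary") gate, for illustration only.
    from typing import Callable

    STAGES = [0.01, 0.05, 0.25, 1.00]  # fraction of traffic at each stage

    def staged_rollout(set_traffic: Callable[[float], None],
                       error_rate: Callable[[], float],
                       baseline: float = 0.001) -> bool:
        for fraction in STAGES:
            set_traffic(fraction)
            if error_rate() > 10 * baseline:
                set_traffic(0.0)   # roll back before wide exposure
                return False       # caught at the 1% stage: no headline outage
        return True                # fully rolled out

A failure caught at the first stage touches 1% of traffic for a few minutes, which is exactly the kind of incident that never makes the news.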


And how is it going, in terms of finding those limits? It would be very interesting to hear about areas where the actual experience turned out to be wildly different from your expectations, in either direction.

This looks cool, but what I'd really like is a self-hosted version that I could use to auto-subtitle videos I already have locally. This would help my language learning a great deal.

If any of you have already figured out a tool/workflow for this, I'd love to learn from your experience.


This thread prompted me to look into this. It seems that all I need is a thin wrapper around whisper-ctranslate2. So I wrote one and am playing with it right now.

I'm finding language auto-detection to be a bit wonky (for example, it repeatedly identified Ladykracher audio as English instead of German). I ended up having to force a language instead. The only show in my library where this approach doesn't work is Parlement[1], but I can live with that.
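In case it's useful to anyone, here's roughly what the wrapper looks like. This is a minimal sketch: the directory, file extension, model size and language are placeholders for my setup, and the flags are the standard whisper-ctranslate2 ones.

    #!/usr/bin/env python3
    # Thin wrapper: subtitle every video in a directory via whisper-ctranslate2.
    import pathlib
    import subprocess
    import sys

    video_dir = pathlib.Path(sys.argv[1] if len(sys.argv) > 1 else ".")

    for video in sorted(video_dir.glob("*.mkv")):
        srt = video.with_suffix(".srt")
        if srt.exists():
            continue  # already subtitled
        subprocess.run(
            ["whisper-ctranslate2", str(video),
             "--model", "medium",
             "--language", "de",  # force the language; auto-detection was wonky
             "--output_format", "srt",
             "--output_dir", str(video_dir)],
            check=True,
        )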

On the whole this is looking quite promising. Thanks for the idea.

[1] https://en.wikipedia.org/wiki/Parlement_(TV_series)


Another potential factor at play is the accuracy of delivery. It is generally easier to accurately deliver one quick dose than daily doses over multiple weeks (due to patient positioning errors, the patient losing weight, soft tissues moving around, etc.).

The 42 -> 137 also jumped out at me. On the face of it, the associated improvement sure does sound like overfitting to the eval set.

Would love to hear more, if you are happy sharing!

I have a Metabo vacuum (ASA 30 H PC) and absolutely love it. What's bad about the ergonomics of yours?

I have a Sebo. The primary thing I dislike is the weight. You'd think something this heavy would have some kind of performance advantage, but it doesn't. I've seen battery-powered shit from Walmart suck harder than this machine does.

Matt Levine put it really well: "We will create God and then ask it for money."

The answer is easy: more corruption. No AI God needed.
