Hacker News | cesaref's comments

I'm interested in the implications for the open source movement, specifically around security concerns. Anyone know if there has been a study of how well Claude Code works on closed-source (but decompiled) code?

I’ve had Claude Code diagnose bugs in a compiler we wrote together by using gdb and objdump to examine binaries it produces. We don’t have DWARF support yet so it is just examining the binary. That’s not security work, but it’s adjacent to the sorts of skills you’re talking about. The binaries are way smaller than real programs, though.

Definitely not my wheelhouse, but I would expect it to be considerably worse.

Simply because source code contains names that were intended to communicate meaning in a way the LLM is specifically trained to understand: identifier names drawn from human natural language, chosen to scan well when interspersed with the programming language's grammar, plus comments and so on. Decompiled code has lost most of that, at least once debugging information has been stripped (and the comments definitely are gone). Ghidra et al. can only do so much to restore the kind of semantic content an LLM is looking for.


I've cut-and-pasted some assembly code into the free version of ChatGPT to reverse engineer some old binaries and its ability to find meaning was just scary.

Yesterday, I had Claude decompile and fix the firmware for my new Samsung ViewFinity S8. There was a really annoying pop-up banner on each wake which you can't turn off, and Samsung clearly didn't care. I was about to return it, then thought: hmm, why not :) Not one-shotted, took several tries (lucky none of them bricked it, haha). Also I guess the warranty is voided, but idc :)

> Claude Code works on closed source (but decompiled) source

Very likely not nearly as well, unless there are many open source libraries in use and/or the language+patterns used are extremely popular. The really huge win for something like the Linux kernel and other popular OSS is that the source appears in the training data, a lot. And many versions. So providing the source again and saying "find X" is primarily bringing into focus things it's already seen during training, with little novelty beyond the updates that happened after knowledge cutoff.

Giving it a closed source project containing a lot of novel code means it only has the language and its "intuition" to work from, which is a far greater ask.


I’m not a security researcher, but I know a few and I think universally they’d disagree with this take.

The LLMs know about every previously disclosed security vulnerability class and can use that to pattern match. And they can do it against compiled, and in some cases obfuscated, code as easily as against source.

I think the security engineers out there are terrified that the balance of power has shifted too far towards finding closed-source vulnerabilities, because getting patches deployed will still take so long. Not that the LLMs are somehow hampered by novel codebases.


> The llms know about every previous disclosed security vulnerability class and can use that to pattern match

Do the reports include patterns that could be matched against decompiled code, though? As easily as they would against proper source? I find it a bit hard to believe.


Many vulnerabilities aren't just pattern matching though; deep understanding of the context in the particular codebase is also needed. And a novel codebase means more attention than usual will be spent grepping and keeping the context in focus, which makes it easier to miss things than if enough of the context were already encoded in the model weights.

Same thing applies to humans: the better someone knows a codebase, the better they will be at resolving issues, etc.


Almost all vulnerabilities are either direct applications of known patterns, incremental extensions of them, or chains of multiple such steps.

It would be much more interesting/efficient if the LLM had tokens for machine instructions, so that extracting instructions would happen at the tokenization phase rather than by calling objdump.

But I guess I'm not the first one to have that idea. Any references to research papers would be welcome.
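To make the idea concrete, here is a toy sketch of my own (not from any paper): instead of shelling out to objdump for text, a tokenizer could map raw opcode bytes directly to dedicated vocabulary tokens. Real ISAs are variable-length and far messier than this; the sketch only handles a few single-byte x86-64 opcodes.

```python
# Hypothetical instruction-aware tokenizer: opcode bytes become vocabulary
# tokens, so the model never sees objdump's text rendering at all.
SINGLE_BYTE_OPCODES = {
    0x90: "<NOP>",
    0xC3: "<RET>",
    0x55: "<PUSH_RBP>",
    0x5D: "<POP_RBP>",
}

def tokenize_machine_code(code: bytes) -> list[str]:
    """Map each recognised opcode byte to an instruction token; fall back
    to a raw-byte token for anything we don't decode."""
    return [SINGLE_BYTE_OPCODES.get(b, f"<BYTE_{b:02X}>") for b in code]

# A trivial prologue/epilogue: push rbp; nop; pop rbp; ret
print(tokenize_machine_code(b"\x55\x90\x5d\xc3"))
# → ['<PUSH_RBP>', '<NOP>', '<POP_RBP>', '<RET>']
```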


As an experiment, I just now took a random section of a few hundred bytes (as a hexdump) from the /bin/ls executable and pasted them into ChatGPT.

I don't know if it's correct, but it speculated that it's part of a command line processor: https://chatgpt.com/share/69d19e4f-ff2c-83e8-bc55-3f7f5207c3...

Now imagine how much more it could have derived if I had given it the full executable, with all the strings, pointers to those strings and whatnot.
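For anyone wanting to repeat the experiment, a slice like that is easy to produce; this sketch (offset and length are my own arbitrary choices, and I dump the Python interpreter binary rather than /bin/ls so it works anywhere) formats a few hundred bytes as a paste-ready hexdump:

```python
import sys

def hexdump(path: str, offset: int, length: int) -> str:
    """Read `length` bytes at `offset` from a file and format them as a
    classic 16-bytes-per-line hexdump."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    lines = []
    for i in range(0, len(data), 16):
        chunk = data[i:i + 16]
        lines.append(f"{offset + i:08x}  " + " ".join(f"{b:02x}" for b in chunk))
    return "\n".join(lines)

# Dump 256 bytes from an arbitrary offset of the current interpreter binary.
print(hexdump(sys.executable, 0x1000, 256))
```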

I've done some minor reverse engineering of old test equipment binaries in the past, and LLMs are incredible at figuring out what the code is doing, way better than just reading Ghidra's decompiled output.


On the subject of the weights and measures used to check that a pint is a pint: I remember the father of a friend of mine at university was responsible for weights and measures for Staffordshire. I think he was the undersheriff or something like that, and the official pint was part of the collection.

This would have been in the late 80s. I've no idea if it was still in use, but I've a feeling the law hadn't necessarily moved on, so I guess the official measure could have been required if challenged in court.


The older Tektronix TDS540 series did this, though at much lower rates, as was common in those days. Internally there are differential feeds from the very beautiful hybrid ceramic input boards to 4 ADCs, with some clever switching so that a single input can be sampled by all 4 ADCs with suitable time offsets, giving 4x the sample rate of running all 4 inputs.

The calibration procedure on the scope fiddles with the time alignment to get the different ADCs correctly offset so that the combined signal is correct.

The hybrid ceramic input boards in their metal cases are a thing of beauty, fragile (don't ask how I know), but beautiful.


Yup, a lot of scopes actually did this internally and some still do. It's part of why some scopes lose half their bandwidth when you go from 2 channels to 4: they split the digitizers (some go the other direction and run multiple channels on one very fast ADC). It's just very, very difficult to keep it working external to the box, mainly because of line drift.


Just out of interest, why aren't they cross-compiling for RISC-V? I thought that was common practice when targeting lower-performance hardware. It seems odd to me that the build cycle on the target hardware is a metric that matters.


Please skim the thread :) We've already discussed it twice. Fedora "mandates" native builds.

Build time on target hardware matters when you're re-building an entire Linux distribution (25000+ packages) every six months.


I failed to find this on my skim, my bad :(

Interesting that it's mandated as native. I'm really not sure of the logic behind this (I've worked in the embedded world, where cross-compiling is not only normal but the only choice). I'll do some digging and see if I can find the thought process behind it.


Nobody misses the terrible optical mouse with the blue metal mouse mat though! You are right, the keyboards were great.


I do. At my first job it looked so technical! Sure they were a bit rubbish but at least you didn’t have to pick all the hair and gank off the rollers and balls all the time.


Oh yes I remember that one. Wasn't it the type 4?

The type 5 was a better mouse (with ball though) but as I remember the keyboard was a little worse.


The way I read it, the prefix to the > indicates which file descriptor to redirect, and the default when no file descriptor is given is stdout.

So, >foo is the same as 1>foo

If you want to get really into the weeds: appending to a file descriptor makes no sense (or maybe truncating to a file descriptor makes no sense is what I mean), so there is no 2>>&1; bash actually rejects it as a syntax error rather than creating a file called 1. Why this is the case probably goes back to decisions made 50 years ago in sh, though I'd be surprised if it were codified anywhere, or relied upon in scripts.
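The equivalence is easy to check mechanically. This little sketch (file names are my own illustration) runs both spellings through the shell and compares the results; it assumes a POSIX shell:

```python
import pathlib
import subprocess
import tempfile

def same_redirection() -> bool:
    """Run 'echo hi' with both '>out' and '1>out' spellings and check that
    the two files end up identical."""
    with tempfile.TemporaryDirectory() as tmp:
        d = pathlib.Path(tmp)
        subprocess.run("echo hi >a.txt", shell=True, cwd=d, check=True)
        subprocess.run("echo hi 1>b.txt", shell=True, cwd=d, check=True)
        return (d / "a.txt").read_text() == (d / "b.txt").read_text()

print(same_redirection())  # prints True on any POSIX shell
```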


Except it has a taste. I think finding something with just the colour and no taste would be a better example of this.


Go find that unicorn. Make your first $100M.

Also: that's a completely different issue to "describing its presence as 'artificial'". Needs a new thread.


Most of the effort when writing a compiler is handling incorrect code, and reporting sensible error messages. Compiling known good code is a great start though.


Just out of interest, what's the current 'state of the art' for a chip that is hardened to survive launch and any length of time in orbit?


Depends on a lot of factors. LEO has high drag, but good radiation shielding, so if you've got a low enough orbit you can use most embedded hardware but need to compensate with bigger thrusters and bigger fuel tanks if you want it to survive "any length of time" without burning up from atmospheric drag.


This has reminded me that in System 7, the code for drawing windows was a system resource (resource forks contained all sorts of things: code, icons, text dictionaries, etc.). Anyhow, if you dropped an updated window resource with the correct resource ID into your System file, you could change this default behaviour. A friend of mine wrote a round window for a clock app, copied it into the System with ResEdit, and a reboot later, all windows were round.

It was a very flexible and hackable system, very fragile, and no security whatsoever, but lots of fun!

