If you look there, you'll see I tried to be as transparent about this process as possible. When I first started, I simply didn't know any better than to use AI to fact-check my data, which was a really bad idea and led to the horrendous outcome you've seen there. I am not trying to hide anything; I made a mistake. If you could give the article a re-read and tell me where I might have gone wrong, I would be really happy. I actually want this to be a good and useful educational resource, not AI slop.
> The entire VSCode workbench - editor, terminal, extensions, themes, keybindings — ported to run on a native shell.
but also
> Many workbench features are stubbed or partially implemented
So which is it? The README needs to be clearer about what is aspirational, what is done, and what is out of scope. Right now it just looks like an LLM soup.
Last year I was working on a tail-call interpreter (https://github.com/anematode/b-jvm/blob/main/vm/interpreter2...) and found a similar regression on WASM when transforming it from a switch-dispatch loop to tail calls. SpiderMonkey did the best with almost no regression, while V8 and JSC totally crapped out – same finding as the blog post. Because I was targeting both native and WASM I wrote a convoluted macro system that would do a switch-dispatch on WASM and tail calls on native.
Ultimately, because V8's register allocation couldn't handle the switch-loop and was spilling everything, I basically manually outlined all the bytecodes whose implementations were too bloated. But V8 would still inline those implementations and shoot itself in the foot, so I wrote a wasm-opt pass to indirect them through a __funcref table, which prevented inlining.
One trick, to get a little more perf out of the WASM tail-call version, is to use a typed __funcref table. This was really horrible to set up and I actually had to write a wasm-opt pass for this, but basically, if you just naively do a tail call of a "function pointer" (which in WASM is usually an index into some global table), the VM has to check for the validity of the pointer as well as a matching signature. With a __funcref table you can guarantee that the function is valid, avoiding all these annoying checks.
Based on looking at V8's JITed code, there seemed to be a lot of overhead with stack overflow checking, actually. The function prologues and epilogues were just as bloated in the tail-call case. I'll upload some screenshots if I can find them.
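For readers who want to see the shape of the "switch on WASM, something else on native" split described above, here is a minimal sketch in Rust. It is not the b-jvm macro system: the opcodes, `Insn`, `Vm`, and every function name are made up for illustration. Stable Rust has no guaranteed tail calls, so the non-WASM path below is an outlined handler table driven by a loop, standing in for a real tail-call interpreter. The `#[inline(never)]` attributes only approximate the "manually outlined" handlers and have no effect on what a WASM engine's JIT later decides to inline.

```rust
// Hypothetical bytecode for illustration; none of these names come from b-jvm.
const PUSH: u8 = 0;
const ADD: u8 = 1;
const HALT: u8 = 2;

#[derive(Clone, Copy)]
pub struct Insn {
    pub op: u8,
    pub operand: i64,
}

struct Vm {
    stack: Vec<i64>,
    pc: usize,
}

fn add_top_two(vm: &mut Vm) {
    let b = vm.stack.pop().expect("stack underflow");
    let a = vm.stack.pop().expect("stack underflow");
    vm.stack.push(a + b);
}

// Strategy A: one loop with a single match, the "switch-dispatch" shape.
pub fn run_switch(code: &[Insn]) -> i64 {
    let mut vm = Vm { stack: Vec::new(), pc: 0 };
    loop {
        let insn = code[vm.pc];
        vm.pc += 1;
        match insn.op {
            PUSH => vm.stack.push(insn.operand),
            ADD => add_top_two(&mut vm),
            HALT => return vm.stack.pop().expect("empty stack at HALT"),
            other => panic!("unknown opcode {}", other),
        }
    }
}

// Strategy B: each opcode is its own outlined function, reached through a
// table of function pointers. A handler returns false to stop the loop.
type Handler = fn(&mut Vm, i64) -> bool;

#[inline(never)]
fn op_push(vm: &mut Vm, operand: i64) -> bool {
    vm.stack.push(operand);
    true
}

#[inline(never)]
fn op_add(vm: &mut Vm, _operand: i64) -> bool {
    add_top_two(vm);
    true
}

#[inline(never)]
fn op_halt(_vm: &mut Vm, _operand: i64) -> bool {
    false
}

// Indexed by opcode value, so the order must match PUSH, ADD, HALT.
const HANDLERS: [Handler; 3] = [op_push, op_add, op_halt];

pub fn run_table(code: &[Insn]) -> i64 {
    let mut vm = Vm { stack: Vec::new(), pc: 0 };
    loop {
        let insn = code[vm.pc];
        vm.pc += 1;
        let handler = HANDLERS[insn.op as usize];
        if !handler(&mut vm, insn.operand) {
            return vm.stack.pop().expect("empty stack at HALT");
        }
    }
}

// Pick the strategy at compile time, per target.
#[cfg(target_arch = "wasm32")]
pub fn run(code: &[Insn]) -> i64 {
    run_switch(code)
}

#[cfg(not(target_arch = "wasm32"))]
pub fn run(code: &[Insn]) -> i64 {
    run_table(code)
}
```

Feeding `run` the program `PUSH 2, PUSH 3, ADD, HALT` returns 5 under either strategy. The only point of the sketch is that both dispatch styles can share handler bodies and be selected per target.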
Looks like a very sophisticated operation, and I feel for the maintainer who had his machine compromised.
The next incarnation of this, I worry, is that the malware hibernates somehow (e.g., if (Date.now() < 1776188434046) { exit(); }) to maximize the damage.
I mean the compromised machine registers itself on the command server and occasionally checks for workloads.
The attacker then decides on their next actions. Depending on the machine they compromised, they'll either try to spread and make a broad attack (like this time), or they may go more in-depth and try to exfiltrate data or spread internally if, e.g., a build node has been compromised.
> But then the clean room implementations started showing up. People had taken Anthropic’s source code and rewritten Claude Code from scratch in other languages like Python and Rust.
Seems like the phrase "clean room" is the new "nonplussed"... how does this make any sense?
Heya, post author here. I think I was just wrong about this assertion. I got into a discussion with a copyright lawyer over on Bluesky[^1] after I wrote this and came away reasonably convinced that this wouldn’t be a valid example of a clean room implementation.
I think it means you write a spec from the implementation. Then you write a new implementation from the spec. You might go so far as to do the second part in a "clean" room.
Heh, the original being entirely vibed had me thinking of an interesting problem: if you used the same model to generate a specification, then reset the state and passed that specification back to it for implementation, the resulting code would by design be very close to the original. With enough luck (or engineering), you could even get the same exact files in some cases.
Does this still count as clean-room? Or what if the model wasn't the same exact one, but one trained the same way on the same input material, which Anthropic never owned?
This is going to be a decade of very interesting, and probably often hypocritical lawsuits.
In a typical clean-room design, the person writing the new implementation is not supposed to have any knowledge of the original; they should only have knowledge of the specification.
If one person writes the spec from the implementation and then also writes the new implementation, it is not clean-room design.
I believe the argument is that LLMs are stateless. So if the session writing the code isn't the same session that wrote the spec, it's effectively a clean room implementation.
There are other details of course (is the old code in the training data?) but I'm not trying to weigh in on the argument one way or the other.
Ya, I tend to believe that (most) human VR will be obsoleted well before human software engineering. Software engineering is a lot more squishy and has many more opportunities to go off the rails. Once a goal is established, the output of VR agents is verifiable.
Definitely. As an extreme but fun example... in one project I had a massive hash map (~700 GB or so) that was concurrently read from and written to by 256 threads. The entries themselves were only 16 bytes, so I could use atomic cmpxchg, but the problem I hit was that even with 1 GB huge pages, I was running out of dTLB entries. So I assigned each thread to a subregion of the hash table, then used channels between each pair of threads to handle the reads and writes (and restructured the program a bit to allow this). Since the dTLB budget is per core, this allowed me to get essentially 0 dTLB misses, and ultimately sped up the program by ~2x.
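A rough sketch of the "each thread owns a subregion, everyone else goes through a channel" idea, in Rust. This is a simplification under stated assumptions, not the commenter's program: `ShardedMap`, `Request`, and the `key % shards` routing are hypothetical, each shard is an ordinary `HashMap` rather than a slice of one huge table, and the owner threads only serve requests instead of also doing work and messaging each other pairwise. It does no thread pinning and no huge-page setup, so it shows the ownership pattern only and will not reproduce any dTLB effect by itself.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Messages sent to the thread that owns a shard (hypothetical protocol).
enum Request {
    Put { key: u64, value: u64 },
    Get { key: u64, reply: Sender<Option<u64>> },
}

pub struct ShardedMap {
    owners: Vec<Sender<Request>>,
    handles: Vec<thread::JoinHandle<()>>,
}

impl ShardedMap {
    pub fn new(shards: usize) -> Self {
        assert!(shards > 0, "need at least one shard");
        let mut owners = Vec::with_capacity(shards);
        let mut handles = Vec::with_capacity(shards);
        for _ in 0..shards {
            let (tx, rx) = channel::<Request>();
            handles.push(thread::spawn(move || {
                // Only this thread ever touches this shard's memory.
                let mut shard: HashMap<u64, u64> = HashMap::new();
                // The loop ends once every sender has been dropped.
                for req in rx {
                    match req {
                        Request::Put { key, value } => {
                            shard.insert(key, value);
                        }
                        Request::Get { key, reply } => {
                            // Ignore the error if the caller went away.
                            let _ = reply.send(shard.get(&key).copied());
                        }
                    }
                }
            }));
            owners.push(tx);
        }
        ShardedMap { owners, handles }
    }

    // Route a key to its owning thread. Modulo routing is an assumption.
    fn owner(&self, key: u64) -> &Sender<Request> {
        &self.owners[(key % self.owners.len() as u64) as usize]
    }

    pub fn put(&self, key: u64, value: u64) {
        self.owner(key)
            .send(Request::Put { key, value })
            .expect("owner thread exited");
    }

    pub fn get(&self, key: u64) -> Option<u64> {
        let (reply, rx) = channel();
        self.owner(key)
            .send(Request::Get { key, reply })
            .expect("owner thread exited");
        rx.recv().expect("owner thread exited")
    }

    // Drop the senders so the owner loops finish, then join the threads.
    pub fn shutdown(self) {
        drop(self.owners);
        for handle in self.handles {
            handle.join().expect("owner thread panicked");
        }
    }
}
```

Because a `put` and a later `get` for the same key travel over the same channel from the same sender, the owner sees them in order, so no atomics are needed on the entries themselves. The trade is a message round trip per read in exchange for each core touching only its own region.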
For a while I've been annoyed that esbuild, which is written in Go, eschews these APIs to detect changes in watch mode and instead continually polls the filesystem: https://github.com/evanw/esbuild/issues/1527#issuecomment-90.... It actually consumes quite a bit of battery, so I might fork it and apply this post's implementation!
It contains many factual errors.