Hacker News | edunteman's comments

I’ve been a big fan of “what’s the thinnest this could be” interpretations of sandboxes. This is a great example of that. On the other end of the spectrum there’s just-bash from the Vercel folks.


Exactly: they skip the OS, we make it free to clone.


The llm detector in my brain went off too


Every paragraph in the article is exactly what an LLM produces


Your repo was actually a major point of reference! Thank you for open sourcing it. Ironically, when I first got into Zig I built a similar generator for Python bridging, which your project reminded me of: https://github.com/erik-dunteman/zigpy

The ultimate reason for not using a bindings generator was primarily to deeply understand NAPI.


Great to hear I could help :) Yeah, no worries - I totally understand :)


Correct: your PATH resolves to your local tools as if it were unprotected bash, but syscalls are filtered/virtualized


From a utilitarian perspective, can we swap this in instead of E2B or some other provider, since this doesn't require n microVM kernels and rootfs images hanging around?


Exactly, that'd be the intention. For compute-heavy or long-running jobs you'd still probably want a dedicated VM like on E2B, but for quick stuff, bVisor.


Hell yeah, love to hear it! Happy to answer any questions or issues you run into


The part that most resonates with me is the lingering feeling of "oh, but it must be my fault for underspecifying," which blocks the outright belief that models are just still sloppy at certain things.


Good question, I imagine you’d need to set up an ngrok endpoint to tunnel to local LLMs.

In those cases perhaps an open source (maybe even local) version would make more sense. For our hosted version we’d need to charge something, given storage requirements to run such a service, but especially for local models that feels wrong. I’ve been considering open source for this reason.


I’d love your opinion here!

Right now, we assume first call is correct, and will eagerly take the first match we find while traversing the tree.

One of the worst things that could currently happen is we cache a bad run, and now instead of occasional failures you’re given 100% failures.

A few approaches we've considered:

- Maintain a staging tree, and only promote to live if multiple sibling nodes (messages) look similar enough. The decision to promote could be via templating, regex, fuzzy matching, semantic similarity, or an LLM judge.

- Add some feedback APIs for a client to score end-to-end runs, so that a path could develop a reputation.
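To make the tradeoff concrete, here is a minimal sketch of the eager-match behavior described above: a tree of messages where lookup greedily takes the first child that clears a similarity threshold. The node structure and the use of `difflib` for fuzzy matching are my own illustrative assumptions, not the actual implementation.

```python
# Hypothetical sketch of an eagerly-matching cache tree.
# Node layout and the fuzzy-match heuristic are illustrative assumptions.
from difflib import SequenceMatcher

class Node:
    def __init__(self, message):
        self.message = message   # the cached message at this step
        self.children = []       # next steps observed in prior runs

    def match(self, incoming, threshold=0.9):
        # Eagerly return the FIRST child similar enough to the incoming
        # message; a later, better match is never considered.
        for child in self.children:
            ratio = SequenceMatcher(None, child.message, incoming).ratio()
            if ratio >= threshold:
                return child
        return None  # cache miss: fall through to the real model

root = Node("")
cached = Node("list files in /tmp")
root.children.append(cached)

assert root.match("list files in /tmp") is cached  # hit
assert root.match("delete everything") is None     # miss
```

This also shows the failure mode mentioned above: if a bad run is ever inserted as a child, every sufficiently similar future request will eagerly match it, which is what the staging-tree idea guards against.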


I'd assume RL would be baked into the request structure. I'm surprised the OAI spec doesn't include it, but I suppose you could hijack a conversation flow to do so.


Very, very common approach!

Wrote more on that here: https://blog.butter.dev/the-messy-world-of-deterministic-age...


What a great overview!

I'd love your thoughts on my addition, autolearn.dev: Voyager behind MCP.

The proxy format is exactly what I needed!

Thanks


Awesome to hear you've done something similar. JSON artifacts from runs seem to be a common approach for building this in house, similar to what we did with muscle-mem. Detecting cache misses is hard without seeing what the model sees, which is part of what inspired this proxy direction.

Thanks for the nice words!

