Hacker News

If you want Guidance acceleration speedups (and token healing), you currently have to run an open model locally, though we are working on a remote server solution as well. I expect APIs will add more generation control over time, but right now commercial endpoints like OpenAI's are supported only through multiple calls.
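To illustrate the token-healing idea mentioned above: a prompt often ends on a token that is a prefix of a longer token the model would naturally emit (e.g. ":" when "://" exists in the vocabulary). Healing backs up one token and constrains the next generated token to start with the removed text. This is a toy sketch with a made-up vocabulary and greedy tokenizer, not the library's actual implementation:

```python
# Toy sketch of token healing (illustrative only; not Guidance's real code).
TOY_VOCAB = ["http", ":", "//", "://", "example", ".com", " "]

def tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match tokenizer over the toy vocabulary."""
    tokens, i = [], 0
    while i < len(text):
        match = max((t for t in vocab if text.startswith(t, i)),
                    key=len, default=None)
        if match is None:
            raise ValueError(f"untokenizable text at position {i}")
        tokens.append(match)
        i += len(match)
    return tokens

def heal_prompt(prompt_tokens, vocab=TOY_VOCAB):
    """Drop the trailing token; return (shortened prompt, allowed next tokens)."""
    *head, last = prompt_tokens
    allowed = [t for t in vocab if t.startswith(last)]
    return head, allowed

tokens = tokenize("http:")           # ['http', ':']
head, allowed = heal_prompt(tokens)  # head=['http'], allowed includes '://'
# Generation resumes from ['http'] but masked to tokens starting with ':',
# so the model can pick '://' instead of being locked into the short ':'.
```

The key point is that the constraint is applied as a logit mask over the vocabulary, which is only possible with direct access to the model's output distribution, hence the local-model requirement.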

We manage the KV-cache in a session-based way that allows the LLM to take just one forward pass through the whole program, generating only the tokens it needs to.
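The session-based reuse can be sketched as follows: each call compares the new token sequence against the cached prefix and only "computes" the positions that differ. This is a minimal illustrative model of the bookkeeping (the class name and counting are hypothetical), not the actual Guidance implementation, which works on real transformer key/value tensors:

```python
# Minimal sketch of session-style KV-cache reuse (illustrative only).
# A "forward pass" is simulated by counting which token positions
# actually need fresh computation versus reuse from the cached prefix.

class KVCacheSession:
    def __init__(self):
        self.cached_tokens = []  # prefix whose keys/values are already computed
        self.computed = 0        # total positions computed across the session

    def forward(self, tokens):
        """Process `tokens`, reusing the longest shared prefix with the cache."""
        shared = 0
        for cached, new in zip(self.cached_tokens, tokens):
            if cached != new:
                break
            shared += 1
        fresh = tokens[shared:]       # only these positions are computed
        self.computed += len(fresh)
        self.cached_tokens = list(tokens)
        return fresh

session = KVCacheSession()
session.forward(["You", "are", "a", "helpful"])         # 4 fresh positions
session.forward(["You", "are", "a", "helpful", "bot"])  # only "bot" is fresh
```

Because template text and prior generations stay in the cache, interleaved prompt/generation programs cost roughly one pass over the full sequence rather than one pass per call.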


