
> LLMs are practically stateless

This isn't true of any practical implementation: within a conversation, the KV cache is the state. (There is indeed no state across conversations, but that's irrelevant to this discussion.)

You can drop it after each response, but doing so increases the number of tokens you need to process by a lot in multi-turn conversations.
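
A minimal sketch of what reuse buys you, using Hugging Face transformers with gpt2 purely as an example model: with past_key_values carried over, each turn only feeds the new tokens through the model; drop the cache and every turn re-runs the entire conversation from scratch.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Turn 1: run the whole prompt once and keep the per-layer K/V tensors.
    ids = tok("User: hi\nAssistant:", return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, use_cache=True)
    cache = out.past_key_values  # this is the conversation state

    # Turn 2: with the cache, only the new tokens go through the model.
    new_ids = tok(" hello!\nUser: how are you?", return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(new_ids, past_key_values=cache, use_cache=True)
    # Without `cache`, turn 2 would have to re-process all prior tokens too.

The cache is just tensors tied to one specific model's weights and layout, which is also why it can't follow a conversation from one provider to another.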

And my point was that keeping the KV cache for the duration of the conversation isn't possible if you switch between providers mid-conversation.


