
With speculative decoding, however, you can use additional models to speed up generation.
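For anyone unfamiliar with the idea, here is a rough sketch of the greedy variant, not any particular library's implementation. The two "models" are hypothetical toy functions (`target_model`, `draft_model`) made up for illustration; a real setup would pair a large model with a much smaller one and verify all drafted positions in a single batched forward pass.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token prefix to the next token (greedy).
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    """Hypothetical large model: next token is the prefix sum modulo 7."""
    return sum(prefix) % 7


def draft_model(prefix: List[int]) -> int:
    """Hypothetical small model: agrees with the target except when the
    prefix length is a multiple of 4, where it guesses wrong on purpose."""
    guess = sum(prefix) % 7
    return guess if len(prefix) % 4 else (guess + 1) % 7


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns the tokens and the number of
    verification rounds (each would be one target forward pass in practice)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(out) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target checks every proposed position. This loop stands in
        #    for a single batched forward pass over all k positions.
        target_passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # fix the first mismatch, drop the rest
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target's next token comes free.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], target_passes


prompt = [1, 2, 3]
spec_tokens, passes = speculative_decode(target_model, draft_model, prompt, n_new=12)
print(spec_tokens == greedy_decode(target_model, prompt, 12), passes)
```

In this toy run the output is identical to decoding with the target alone; the difference is that the target is consulted in 4 rounds instead of 12, at the price of the extra draft-model calls.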



Yes, because speculation has NEVER bitten us in the ass before, right? *coughs in Spectre*

Speculative decoding just runs more hardware to get a prediction faster. Essentially, it sets more money on fire if you're being billed per token.



