With speculative decoding you can use more models to speed up the generation how... | Hacker News

Hacker Newsnew | past | comments | ask | show | jobs | submit

		grumpoholic 8 days ago \| parent \| context \| favorite \| on: No, it doesn't cost Anthropic $5k per Claude Code ... With speculative decoding you can use more models to speed up the generation however.

		help

salawat 5 days ago [–]

Yes, because speculation has NEVER bitten us in the ass before, right? Coughs in Spectre

Speculative decoding is just running more hardware to get a faster prediction. Essentially, setting more money on fire if you're being billed per token.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact