
No, not in milliseconds if you have a longish context. Prefill is very compute-heavy compared to decoding.
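A back-of-the-envelope roofline calculation makes the asymmetry concrete. The sketch below uses assumed, illustrative numbers (a hypothetical 7B-parameter fp16 model on roughly A100-class hardware), not measurements, and ignores attention FLOPs and KV-cache traffic:

    # Roofline sketch: why prefill is compute-bound and decode is
    # memory-bandwidth-bound. All figures below are assumptions.
    PARAMS = 7e9          # hypothetical 7B-parameter model
    BYTES_PER_PARAM = 2   # fp16 weights
    PEAK_FLOPS = 1e15     # assumed ~A100-class fp16 throughput, FLOP/s
    PEAK_BW = 2e12        # assumed ~2 TB/s HBM bandwidth, B/s

    def step_time(n_tokens):
        # A transformer does ~2 FLOPs per parameter per token, but the
        # weights only have to be streamed from HBM once per forward
        # pass, so a big prefill batch amortizes the memory traffic
        # that dominates single-token decode steps.
        compute_time = 2 * PARAMS * n_tokens / PEAK_FLOPS
        memory_time = PARAMS * BYTES_PER_PARAM / PEAK_BW
        return max(compute_time, memory_time)  # the binding constraint

    print(step_time(1))     # one decode step: ~7 ms, memory-bound
    print(step_time(4096))  # 4096-token prefill: ~57 ms, compute-bound,
                            # i.e. ~14 us per token, ~500x cheaper

Per token, prefill comes out hundreds of times cheaper here because the weight traffic that dominates decode is paid once per pass rather than once per token.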


Depends how you're defining it. There can be a lot of context to ingest, so prefill is a lot of compute in absolute terms. But it's also much more memory-efficient, since the prompt tokens are processed in a batch: the weights are streamed once and amortized across tokens, which is why prefill tends to be compute-bound, and you can throw a lot of hardware at a compute-bound problem. Generation, on the other hand, can be significantly more expensive in wall-clock time: it's slower per token, and within a single request you can't batch across output tokens (short of speculative decoding with a draft model).
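On the draft-model point: speculative decoding is the standard trick for recovering some parallelism during generation. A cheap draft model proposes a few tokens serially, and the big model verifies all of them at once. Below is a minimal sketch of the greedy variant, with toy deterministic functions standing in for both models; everything here, including the proposal length K, is an illustrative assumption:

    # Greedy speculative decoding sketch. The "models" are toy
    # stand-ins; in a real system the K verification checks share a
    # single batched forward pass of the target model, which is where
    # the speedup comes from.
    VOCAB = 100
    K = 4  # tokens the draft proposes per round (assumed setting)

    def target_next(seq):
        # Toy stand-in for the big model's greedy next token.
        return (sum(seq) * 31 + 7) % VOCAB

    def draft_next(seq):
        # Toy stand-in for a small draft model: usually agrees with
        # the target, occasionally wrong.
        tok = target_next(seq)
        return (tok + 1) % VOCAB if len(seq) % 5 == 0 else tok

    def speculative_decode(prompt, n_tokens):
        seq = list(prompt)
        while len(seq) < len(prompt) + n_tokens:
            # Draft runs serially, but each step is cheap.
            proposals = []
            for _ in range(K):
                proposals.append(draft_next(seq + proposals))
            # Target verifies the proposals; accept the longest prefix
            # where its greedy choice matches the draft's.
            accepted = 0
            for i in range(K):
                if target_next(seq + proposals[:i]) == proposals[i]:
                    accepted += 1
                else:
                    break
            seq += proposals[:accepted]
            # The target's own next token comes for free from the
            # verification pass, whether or not a proposal failed.
            seq.append(target_next(seq))
        return seq[:len(prompt) + n_tokens]

    print(speculative_decode([1, 2, 3], 12))

Because accepted tokens always match the target's greedy choice, the output is identical to what the target model alone would produce; the draft only changes how many serial passes of the big model are needed.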



