Hacker News

Nope, there are no tricks unless there have been major architectural shifts I missed. The rot doesn't come from inference tricks that try to bring down the quadratic cost of attention over the KV cache. Task performance problems are generally a training problem: the longer the context, the fewer long examples you have to train on. So the real tricks are in how you train the model to behave well at long context. If I'm not mistaken, most of that relies on synthetically generated data, which explains the rot.
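To make the "quadratic cost" concrete, here is a minimal NumPy sketch of full self-attention (not any lab's actual implementation): the score matrix for a length-n sequence has shape (n, n), so compute and memory for scores grow quadratically with context length.

```python
import numpy as np

def naive_attention(q, k, v):
    # Full self-attention over n positions: the score matrix is (n, n),
    # so doubling the context length roughly quadruples this step's cost.
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return weights @ v                                 # (n, d)

n, d = 8, 4
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(q, k, v)
```

The per-token KV cache itself grows only linearly; it is attending over the whole cache at every step that makes the total work quadratic in context length.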



A quick Google search reveals terms such as "sparse attention" that are used to avoid quadratic runtime.

I don't know if Anthropic has revealed such details, since AI research is getting more and more secretive, but the architectural tricks definitely exist.


Then you need to dig a little deeper. No one just applies sparse attention at inference time to a model that wasn't trained for it. It's applied at training time, because otherwise task performance degrades too much.
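As a hedged illustration of the point above, here is one common sparse-attention pattern, a causal sliding window, sketched in NumPy (the window size and the whole setup are illustrative assumptions, not any specific model's recipe). The same mask would be used during training so the model learns under the sparsity pattern it will see at inference.

```python
import numpy as np

def sliding_window_attention(q, k, v, window=4):
    # Sparse (sliding-window) attention: each query position i attends
    # only to keys in [i - window + 1, i]. Cost per query is O(window)
    # instead of O(n), so total cost is linear in sequence length.
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                      # (n, n) for clarity;
    i = np.arange(n)[:, None]                          # real kernels never
    j = np.arange(n)[None, :]                          # materialize this
    mask = (j > i) | (j < i - window + 1)              # causal + local
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                                 # (n, d)

n, d = 8, 4
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = sliding_window_attention(q, k, v, window=3)
```

A model trained with full attention and then masked like this at inference would see key/value statistics it never learned to handle, which is why the mask has to be present during training.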



