A quick Google search turns up terms such as "sparse attention", a family of techniques used to avoid quadratic attention runtime.
I don't know whether Anthropic has revealed such details, since AI research is getting more and more secretive, but the architectural tricks definitely exist.
Then you need to dig a bit deeper. No one applies sparse attention at inference time to a model that wasn't trained for it. It has to be applied at training time, because otherwise task performance degrades too much.
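To make the idea concrete, here is a minimal toy sketch of one common sparse-attention variant, causal sliding-window (local) attention, where each query attends only to the last `window` positions. This is an illustration of the general technique, not any particular lab's implementation; the function name and parameters are made up for the example. Per-query cost drops from O(n) to O(window), so the whole layer is O(n·window) instead of O(n²).

```python
import numpy as np

def sliding_window_attention(q, k, v, window=4):
    """Toy causal sliding-window attention (hypothetical helper).

    Each query position i attends only to keys in
    [i - window + 1, i], so work per query is O(window)
    rather than O(n)."""
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo = max(0, i - window + 1)
        # Scaled dot-product scores over the local window only.
        scores = q[i] @ k[lo:i + 1].T / np.sqrt(d)
        # Numerically stable softmax over the window.
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ v[lo:i + 1]
    return out

rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = sliding_window_attention(q, k, v, window=4)
print(out.shape)  # → (16, 8)
```

Note that position 0 can only attend to itself, so its output is exactly `v[0]` — which is the kind of behavioral change that a model must learn to work with during training, not have bolted on at inference time.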