
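Ok, what FlashAttention changes is space complexity: extra memory goes from O(N^2) to O(N) in the sequence length N, because the full N x N attention matrix is never materialized. Time complexity is still ~O(N^2), as with standard self-attention.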

In other words, it optimizes practical runtime through I/O reduction (fewer reads and writes to slow GPU memory) without altering the asymptotic time complexity.
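To make the space vs. time point concrete, here is a rough NumPy sketch of the blockwise "online softmax" idea. It is not the actual FlashAttention kernel: the function names and the `block` parameter are made up for illustration, it tiles only the keys/values (not the queries), and it runs on the CPU with no fused GPU kernel. It still computes every one of the N^2 scores, but only holds an N x block slice of them at a time.

```python
import numpy as np

def naive_attention(q, k, v):
    # Materializes the full N x N score matrix: O(N^2) extra memory.
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    return (p / p.sum(axis=-1, keepdims=True)) @ v

def tiled_attention(q, k, v, block=32):
    # Hypothetical illustration: walk over K/V in blocks and keep a running
    # softmax, so only an N x block slice of scores exists at any time.
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, v.shape[1]))   # unnormalized output accumulator
    row_max = np.full(n, -np.inf)     # running max of scores per query row
    row_sum = np.zeros(n)             # running softmax denominator per row
    for start in range(0, k.shape[0], block):
        kb = k[start:start + block]
        vb = v[start:start + block]
        s = (q @ kb.T) * scale                   # N x block scores
        new_max = np.maximum(row_max, s.max(axis=1))
        correction = np.exp(row_max - new_max)   # rescales earlier partial results
        p = np.exp(s - new_max[:, None])
        row_sum = row_sum * correction + p.sum(axis=1)
        out = out * correction[:, None] + p @ vb
        row_max = new_max
    return out / row_sum[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((100, 16)) for _ in range(3))
print(np.allclose(naive_attention(q, k, v), tiled_attention(q, k, v)))
```

The loop still touches all N x N query/key pairs, so the work stays quadratic; what shrinks is the intermediate storage, which for a fixed block size grows linearly with N.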


