Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

You could first calculate the distance of the first n bits (eg: 64, one popcountll) as a first pass, then calculate the full distance for candidates over a threshold from the first pass. It makes it approximate, but depending on the application it can be worth it.


I was thinking of something similar — instead of just two passes, couldn’t you also store different quantized values? If you have thousands of documents, you could narrow it down to a handful with a few bit-wise Hamming comparisons before doing a full cosine similarity on just the rest. If you hand more than one bitmap stored, you’d have fewer comparisons at each step too.

Would this work?




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: