
I was getting interesting results until I noticed a pattern: "Wayback Machine" was the only connecting dot between really different things.

That adds noise. Articles now automatically use the Wayback Machine for "archived" links, which generates many paths that do not really connect topics; they exist only because the text "Wayback Machine" is part of the link text.

It may be an interesting exercise to find outliers like that and compute paths without those nodes.
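
A rough sketch of that exercise, assuming the link graph fits in memory as a dict mapping each page title to the titles it links to. The function names, the toy graph, and the incoming-link threshold are all made up for illustration; this is not how the project itself stores or searches links.

```python
from collections import Counter, deque


def incoming_link_counts(graph):
    """Count how many distinct pages link to each page."""
    counts = Counter()
    for source, targets in graph.items():
        for target in set(targets):  # ignore repeated links on one page
            counts[target] += 1
    return counts


def find_hubs(graph, threshold):
    """Return pages with at least `threshold` incoming links (hypothetical cutoff)."""
    counts = incoming_link_counts(graph)
    return {page for page, n in counts.items() if n >= threshold}


def shortest_path(graph, start, goal, excluded=frozenset()):
    """Breadth-first search that never passes through an excluded page."""
    if start == goal:
        return [start]
    parents = {start: None}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for neighbor in graph.get(page, ()):
            if neighbor in parents:
                continue
            if neighbor in excluded and neighbor != goal:
                continue  # skip hub pages unless they are the destination
            parents[neighbor] = page
            if neighbor == goal:
                # Walk the parent pointers back to the start.
                path, node = [], goal
                while node is not None:
                    path.append(node)
                    node = parents[node]
                return path[::-1]
            queue.append(neighbor)
    return None  # no route without the excluded pages


# Toy data, invented for this example only.
toy_graph = {
    "Cat": ["Wayback Machine", "Mammal"],
    "Mammal": ["Animal", "Wayback Machine"],
    "Animal": ["Volcano"],
    "Volcano": ["Wayback Machine"],
    "Wayback Machine": ["Volcano", "Cat"],
}

hubs = find_hubs(toy_graph, threshold=3)
noisy_path = shortest_path(toy_graph, "Cat", "Volcano")
clean_path = shortest_path(toy_graph, "Cat", "Volcano", excluded=hubs)
```

On the toy graph, the unfiltered search goes straight through the archive page, while the filtered one has to take the longer route through pages that actually relate to each other. On real data the threshold would need tuning, since a raw incoming-link cutoff would also drop legitimately central pages.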



I considered this and may eventually add an option to ignore those kinds of pages, but I ultimately felt the current mode stays truer to my goal for the project, which is to traverse the links as any human would be able to. By the way, the two pages with the most incoming links are "Geographic coordinate system" (1,047,096 incoming links) and "International Standard Book Number" (955,957 incoming links).


Indeed, the Wikipedia pages "Geographic coordinate system" and "International Standard Book Number" have the highest PageRank. See: https://www.nayuki.io/page/computing-wikipedias-internal-pag...
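
For anyone curious how a PageRank score relates to raw incoming-link counts, here is a minimal power-iteration sketch. It assumes the same kind of in-memory dict graph, uses the conventional damping factor of 0.85, and spreads the rank of pages with no outgoing links evenly over all pages. The function name and toy graph are invented, and none of this is taken from the linked article's implementation.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Approximate PageRank by repeated redistribution of rank along links."""
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages without outgoing links is shared by everyone.
        dangling = sum(rank[page] for page in pages if not graph.get(page))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}
        for page, targets in graph.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Toy data, invented for this example only.
toy_links = {
    "A": ["Hub"],
    "B": ["Hub"],
    "C": ["Hub"],
    "Hub": ["A"],
}

scores = pagerank(toy_links)
top_page = max(scores, key=scores.get)
```

In this toy graph the page with the most incoming links also ends up with the highest score, but the two measures can diverge: "A" has a single incoming link yet scores far above "B" and "C", because that one link comes from the highly ranked hub.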



