I never actually touch any of the source HTML. I think that would simply take way too long and would probably result in some very high bandwidth charges. I use three tables from a public dump of Wikipedia's database, which unfortunately don't record where on the page each link occurs. Check out the first section of my README[1] for more information.
Ohhhhh, yeah that changes things. I didn't realize you hadn't generated the pagelinks database yourself from the master Wikipedia dump that they keep available for download. In that case, yes, this seems unsolvable without either 1) petitioning them to change the code on their side to exclude these erroneous links, or 2) building your own pagelinks database from the master dump and refreshing it periodically (I forget how often Wikipedia updates the master downloads). While 1) is clearly the preferable option, it is unlikely to happen, unless maybe they are willing to add some columns with more detail about each link: which sections, tags, classes, etc. it is part of. That might be more palatable for Wikipedia, since it is just a code change on their end that supplies more information, rather than something that could affect many users by excluding the links altogether.
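For what it's worth, option 2) could start from something like the rough sketch below. It assumes you already have a page's raw wikitext as a string (pulling it out of the dump is a separate step), and the names `extract_links`, `LINK_RE`, `HEADING_RE` and `SKIPPED_PREFIXES` are made up for illustration, not taken from sdow. It records which section each internal link appears in, which is the kind of extra column discussed above.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Anchor|label]].
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
# Matches section headings such as "== History ==".
HEADING_RE = re.compile(r"^(={2,6})\s*(.*?)\s*\1\s*$")
# Assumption: links into these namespaces are not article-to-article links.
SKIPPED_PREFIXES = ("file:", "image:", "category:")


def extract_links(wikitext):
    """Return (section, target) pairs for each internal link in the wikitext."""
    section = "(lead)"  # text before the first heading
    links = []
    for line in wikitext.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            section = heading.group(2)
            continue
        for match in LINK_RE.finditer(line):
            target = match.group(1).strip()
            if target.lower().startswith(SKIPPED_PREFIXES):
                continue
            links.append((section, target))
    return links


sample = (
    "Intro with [[Alpha]] and [[Beta|b]].\n"
    "== History ==\n"
    "See [[Gamma#Early|g]] and [[File:X.png|thumb]].\n"
)
print(extract_links(sample))
```

A regex like this will miss links that span multiple lines and says nothing about tags or classes, so a real wikitext parser (mwparserfromhell, for example) would likely be the better base for anything beyond a quick experiment.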
[1] https://github.com/jwngr/sdow#data-source