I wonder if anyone at Google considered just contributing the DB and the domain to the Internet Archive.

After stripping any past statistical data from each entry, it shouldn't be that much data per URL...



It’s a security issue to have the goo.gl domain redirect to arbitrary pages. Attackers can find goo.gl links that point to expired domains, register those domains, and then they have a goo.gl link to use in phishing attacks.
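
A minimal sketch of how one might check for such a dangling target, assuming a hypothetical shortlink (a DNS failure here is only a hint, not proof, that the target domain is unregistered):

  import socket
  from urllib.parse import urlparse

  import requests

  def target_is_dangling(short_url: str) -> bool:
      # Ask for the redirect without following it, so we see the raw target.
      resp = requests.head(short_url, allow_redirects=False, timeout=10)
      location = resp.headers.get("Location")
      if resp.status_code not in (301, 302) or not location:
          return False  # nothing to inspect
      host = urlparse(location).hostname
      try:
          socket.gethostbyname(host)  # does the target domain still resolve?
          return False
      except socket.gaierror:
          return True  # resolution failed: the domain may be free to register

  print(target_is_dangling("https://goo.gl/example"))  # hypothetical link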

Giving the domain to a 3rd party is not going to happen.


So it was already a security issue for goo.gl to exist in the first place?

The point is whether Google considered any options other than keeping it running or burning everything to the ground. Google could also keep the domain and have links land on an intermediary Internet Archive page first.


URL targets are effectively secrets, as they are expected to be non-public.

It would be a serious breach of trust for them to publish the database. It likely includes links to non-public YT video URLs, for example.


The database is being reverse engineered and published anyway, per the article.


I think the Archive is just rehydrating shortened links in webpages that have been archived. I doubt they're discovering previously unknown URLs.


No, they really are trying to enumerate all 230 billion possible shortlinks; that’s why they need so many people to help crawl everything.


Got a source? I don’t see details one way or the other.


From the article:

> there are about 230 billion* links that need visiting

> * Thanks to arkiver on the Archive Team IRC for correcting this number.

Also, when running the Warrior project you could see it iterating through the range. I don't have any logs handy since the project is finished, but they looked a bit like:

  https://goo.gl/gEdpoS: 404 Not Found
  https://goo.gl/gEdpoT: 404 Not Found
  https://goo.gl/gEdpoU: 302 Found -> https://...
  https://goo.gl/gEdpoV: 404 Not Found
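
For illustration, a minimal sketch of that kind of keyspace walk, assuming case-sensitive base-62 codes (the alphabet ordering is my assumption, chosen so it reproduces the S -> T -> U sequence above):

  import string

  # Assumed 62-character alphabet: digits, then uppercase, then lowercase.
  ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

  def code_to_int(code: str) -> int:
      n = 0
      for ch in code:
          n = n * 62 + ALPHABET.index(ch)
      return n

  def int_to_code(n: int, width: int = 6) -> str:
      chars = []
      for _ in range(width):
          n, r = divmod(n, 62)
          chars.append(ALPHABET[r])
      return "".join(reversed(chars))

  # Walk four consecutive codes, like the log excerpt above.
  start = code_to_int("gEdpoS")
  for n in range(start, start + 4):
      print(f"https://goo.gl/{int_to_code(n)}")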


Google won't do this IMO because that's all user data. A user submitted the pairing between the key and the value. If they released an entire database of user data, they would run afoul of data privacy regulations.

They could possibly provide a GCP service where you make an authenticated request to look up the value of a given goo.gl key. That would mitigate phishing concerns, eliminate the pressure of running a productionized legacy service, and allow them to use quotas etc. to tamp down on abuse. But that would also be covered by the same data privacy regulations, and I don't know what they say about such a thing.
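
Purely as illustration, such a lookup might look like the sketch below; the endpoint, auth scheme, and response field are all invented, since no such API exists:

  import requests

  def resolve_googl_key(key: str, token: str) -> str | None:
      # Hypothetical endpoint and response shape, invented for illustration.
      resp = requests.get(
          "https://shorturl.googleapis.com/v1/resolve",
          params={"key": key},
          headers={"Authorization": f"Bearer {token}"},
          timeout=10,
      )
      if resp.status_code == 404:
          return None  # unknown or purged short code
      resp.raise_for_status()
      return resp.json()["longUrl"]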



