nomilk's comments

Seems odd this was flagged. Seems like a genuine article, and didn't appear to cause a flamewar. Possible false positive?

cc: @dang


tl;dr: The proportion of Reddit comments containing emdashes has more than doubled since ChatGPT's release!

From the data: for several years before ChatGPT's release, the proportion of Reddit comments containing emdashes was about 0.13-0.17%. In the few years since ChatGPT came out, it has been between 0.17% and 0.41%.
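For anyone wanting to reproduce the metric on their own sample, here is a minimal sketch of how such a proportion could be computed. It is not the method used in the linked analysis; the function name `emdash_share` and the choice to count a comment once no matter how many emdashes it contains are assumptions for illustration.

```python
EM_DASH = "\u2014"  # the emdash character, written as an escape


def emdash_share(comments):
    """Return the percentage of comments containing at least one emdash.

    `comments` is any iterable of strings. A comment counts once even if
    it contains several emdashes (an assumption, not the original method).
    """
    comments = list(comments)
    if not comments:
        return 0.0
    hits = sum(1 for text in comments if EM_DASH in text)
    return 100.0 * hits / len(comments)


if __name__ == "__main__":
    sample = ["plain text", "a pause \u2014 then more", "no dash here", "ok"]
    print(f"{emdash_share(sample):.2f}%")
```

On the figures quoted above, going from the pre-release high of 0.17% to the post-release high of 0.41% is a factor of about 2.4.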


Click the % button in the top right

I noticed it shortly after commenting, and completely rewrote my comment accordingly. Excellent research! If you were to do a write-up on how you did this analysis, that would be very interesting (as the number of comments involved is large).

The "dev notes" link in the top right goes to https://intervolz.com/emdash-observer-writeup/
I downloaded torrents of Reddit comments, and processed them in Go, written using AI assistance. Then Intervolz did this thing and wrote it up.

Wait, why would people make torrents of reddit comments?

The misleading aspect is that the AI generated content was in first person, so any reasonable reader would falsely attribute the statement to the person involved, when in fact it was concocted entirely by Meta's AI.

From Wikipedia:

> The Sega Channel was an online game service developed by Sega for the Sega Genesis video game console, serving as a content delivery system. Launched on December 12, 1994, the Sega Channel was provided to the public by TCI and Time Warner Cable through cable television services by way of coaxial cable. It was a pay to play service, through which customers could access Genesis games online, play game demos, and get cheat codes. Lasting until July 31, 1998, the Sega Channel operated three years after the release of Sega's next generation console, the Sega Saturn. Though criticized for its poorly timed launch and costly subscription fee, the Sega Channel has been praised for its innovations in downloadable content and impact on online game services.

https://en.wikipedia.org/wiki/Sega_Channel


> I found (it) not actionable

Tangential, but I found 'Have I Been Pwned' useless too because you can't enter your email and find leaked passwords associated with the address; instead you have to enter each password (and repeat for every password you want to check).

I know there's an explanation that the raw password is not being sent and instead being hashed locally and only part of the hash is sent. But I don't know how to verify that and it feels wild to type passwords into a random website. (if anyone knows how to verify HIBP does only what it says it does [rather than blindly trust and hope for the best], would love to read more about it)


I always thought that it could be reasonably simple to have a safe alternative. Have people enter a SHA256 of their password instead, and match against a database of other hashes.

Almost everyone interested in checking for password leaks knows how to generate SHA256 of a string. And those who don't shouldn't put their passwords on the internet.

Or even better, generate a hash for every password in the database, package these hashes together with a simple search script and let people download it. That way, you are not sending any information anywhere, and no one can exploit the passwords, because a hash is a one-way function.

Then again, that download could be really large. I admit I have no idea how much storage that would take. But it's just text, so easily compressible. And with some smart indexing, it should be possible to keep most of it compressed and only unpack a relatively small portion to find a complete match.

Then again, I have virtually no background in cryptography, could be something horribly wrong with this.
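The offline idea above can be sketched in a few lines. This is only an illustration under assumptions: the helper names `build_index` and `is_leaked` are made up here, the index is kept in memory as a sorted list, and no real leak dataset is involved. Each SHA-256 digest is 32 bytes in raw form, so storing raw bytes rather than hex text halves the size before any compression.

```python
import bisect
import hashlib


def digest(password: str) -> bytes:
    """SHA-256 of the password as 32 raw bytes."""
    return hashlib.sha256(password.encode("utf-8")).digest()


def build_index(leaked_passwords):
    """Build a sorted, de-duplicated list of digests (the downloadable part)."""
    return sorted({digest(p) for p in leaked_passwords})


def is_leaked(index, password: str) -> bool:
    """Binary-search the sorted digests; nothing leaves the local machine."""
    target = digest(password)
    pos = bisect.bisect_left(index, target)
    return pos < len(index) and index[pos] == target


if __name__ == "__main__":
    index = build_index(["123456", "password", "qwerty"])
    print(is_leaked(index, "password"))
    print(is_leaked(index, "a much longer passphrase"))
```

Because the digests are sorted, the same binary search also works against a file on disk read in fixed 32-byte records, which is one way to get the "only unpack a small portion" behaviour.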


That's already what is happening...

When you do a check on https://haveibeenpwned.com/Passwords the password itself is never sent to the server. Instead the password is hashed locally with SHA-1, only the first five characters of the hash are sent, and the list for that hash range is downloaded, which contains all the matching hash suffixes and the number of occurrences of each.

The server doesn't receive the password, neither in plain text nor as a full hash.


They meant you submit the checksum instead of your password. Replace "Password to check" with "Checksum to check"

It would be easy enough to add this as a "secret" feature:

* user submits password
* gets hashed client side
* server compares it against stored hashes
* server also re-hashes the stored hash, and compares it against the hash received from the client

This would effectively mean that either entering the password, or the password hash would correctly match, since when entering the hash you are effectively "double" hashing the password which gets compared to the double hashed password on the server.

The upside is that users who don't understand hashing or don't feel like opening a sha256 tool wouldn't have to change their behavior or even be confused by a dialog explaining why they should hash the input, while advanced users could find out about the feature via another channel (e.g. hackernews).

The downside would be that it adds an extra hash step to every comparison on the server. It's hard to know how expensive this would be for them.
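Here is a rough sketch of the "password or hash" matching described above. It is hypothetical and not how HIBP works: the class `LeakIndex` and the function `client_submit` are invented for illustration, SHA-256 hex digests in lowercase are assumed throughout, and the double hashes are computed once up front instead of on every comparison.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Lowercase hex SHA-256 of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def client_submit(user_input: str) -> str:
    """Client side: always hash whatever was typed, password or hash."""
    return sha256_hex(user_input)


class LeakIndex:
    """Server side: stores single hashes plus precomputed double hashes."""

    def __init__(self, leaked_passwords):
        self.single = {sha256_hex(p) for p in leaked_passwords}
        # Hash of the stored hash, so a user who typed a hash still matches.
        self.double = {sha256_hex(h) for h in self.single}

    def contains(self, client_hash: str) -> bool:
        return client_hash in self.single or client_hash in self.double


if __name__ == "__main__":
    index = LeakIndex(["hunter2"])
    print(index.contains(client_submit("hunter2")))              # typed the password
    print(index.contains(client_submit(sha256_hex("hunter2"))))  # typed its hash
```

Precomputing the second set trades storage for time: lookups stay a set membership test, at the cost of roughly doubling the stored hashes. A user entering a hash would need to use the same lowercase hex form for the match to work.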


Care to explain how you can tell what scripts gp was sent for the page https://haveibeenpwned.com/Passwords and what scripts he will be sent on future visits?

Well of course a hostile actor could use this incredibly accessible resource to test a bunch of emails and find their passwords.

Though perhaps there could be a service where you enter in an email address and it sends an email to that address containing the passwords. That would be a slightly more complicated server to set up though


I'm 99% sure this is exactly what HIBP used to do, and changed their processes. I'm unsure if this was due to government pressure or what.

OK, I would pay for this service.

It doesn't use any information that's not already exposed.

It reveals the extent of my problem to me.


> (if anyone knows how to verify HIBP does only what it says it does [rather than blindly trust and hope for the best], would love to read more about it)

I recall HIBP documents their hashing protocol so that it should be possible to have a non-web client you can trust more.

https://haveibeenpwned.com/API/v3#PwnedPasswords


There's an API[0] that takes a prefix of a hash.

I don't know how to verify what the website does, but I think that in a few minutes I'll be able to put together a CURL call that does what we're hoping the website does.

[0]https://haveibeenpwned.com/API/v3#PwnedPasswords
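As an alternative to a curl call, here is a small sketch of a non-web client for the range API in [0]. It assumes the documented endpoint `https://api.pwnedpasswords.com/range/{prefix}`, which takes the first five hex characters of the SHA-1 hash and returns lines of `SUFFIX:COUNT`. The function names and the User-Agent string are made up for illustration. Only the 5-character prefix leaves the machine, and the suffix comparison happens locally.

```python
import hashlib
import urllib.request

# Endpoint as documented for the Pwned Passwords range search.
RANGE_URL = "https://api.pwnedpasswords.com/range/{prefix}"


def split_hash(password: str):
    """Return (first 5 hex chars, remaining 35) of the uppercase SHA-1."""
    full = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return full[:5], full[5:]


def count_in_range(range_body: str, suffix: str) -> int:
    """Look for our suffix in a 'SUFFIX:COUNT' response body."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def pwned_count(password: str) -> int:
    """Query the range endpoint; only the hash prefix is sent."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL.format(prefix=prefix),
        headers={"User-Agent": "hibp-range-sketch"},  # arbitrary label
    )
    with urllib.request.urlopen(request) as response:
        body = response.read().decode("utf-8")
    return count_in_range(body, suffix)


if __name__ == "__main__":
    print(pwned_count("password"))
```

Since the script is short and uses only the standard library, it can be read in full before trusting it with a real password, which is the point of not relying on the website's JavaScript.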


Bitwarden's web vault has a reports feature which allows you to check this in bulk.

tl;dr LLMs hallucinate

No. Per Oxford Languages…

> (of an artificial intelligence program or tool)

> produce a response that appears to be accurate or plausible but that contains inaccurate or misleading information.

The examples cited in the article were, in my opinion, neither accurate nor plausible.

In this case, I would say, LLMs lie.


I feel lying implies intent, and that the "appears to be accurate or plausible" only means on a superficial level.

That’s one definition. I can’t quote Oxford without a login, so in this case I have to use M-W:

> to create a false or misleading impression

https://www.merriam-webster.com/dictionary/lie?src=search-di...

The verb(2) form, definition 2.


This thread sounds like a community challenge.

> The exposed secrets aren't theoretical test tokens or placeholders: they include active credentials.

How do security researchers verify this?


Delivery apps like Grab and Uber Eats are even worse since they have even more perverse incentives (minimising delivery time and maximising 'sponsored' listings).

Other than being willing to scroll a lot, I haven't found any great ways to find new restaurants when using delivery apps, and I'm sure I use them far less because of the tedium involved. I think scraping listings and re-doing the algorithm yourself (as per post) is perhaps the best approach. E.g. Just being able to rank by user rating and filter for no less than 200 reviews and within 5km would be an outstanding improvement on the status quo, which is always the 50 closest restaurants to the delivery address - what a coincidence! - with a few 'sponsored' listings thrown in.
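The re-ranking mentioned above is simple once the listings are scraped. A minimal sketch, assuming each listing has already been reduced to a name, rating, review count and distance; the `Listing` fields and the `rank` function are hypothetical and do not correspond to any delivery app's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    name: str
    rating: float        # average user rating
    review_count: int    # number of reviews behind that rating
    distance_km: float   # distance from the delivery address
    sponsored: bool = False  # ignored by the ranking on purpose


def rank(listings, min_reviews=200, max_km=5.0):
    """Keep well-reviewed nearby places, best rating first.

    Ties on rating are broken by review count, highest first.
    """
    eligible = [
        item for item in listings
        if item.review_count >= min_reviews and item.distance_km <= max_km
    ]
    return sorted(eligible, key=lambda item: (-item.rating, -item.review_count))


if __name__ == "__main__":
    scraped = [
        Listing("Noodle Bar", 4.8, 150, 1.0),
        Listing("Curry House", 4.5, 900, 3.2),
        Listing("Taco Stand", 4.7, 250, 4.9),
        Listing("Far Away Grill", 4.9, 2000, 7.5),
    ]
    for item in rank(scraped):
        print(item.name, item.rating, item.review_count)
```

The thresholds are the ones from the comment (at least 200 reviews, within 5km) and are plain parameters, so they are easy to adjust.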


> Bypassing the Mouse.. I use Vimium in the browser.

Vimium seems great for navigation.

Is there any way to get vim keybindings inside text boxes? (I looked at 'wasavi' chrome extension which hasn't been updated in 8 years [0] and the website's down [1])

[0] https://github.com/akahuku/wasavi

[1] http://appsweets.net/wasavi/


You could use real vim in there with ghosttext, but it's not a native integration, you'd have a separate editor window

Another upside (if your editor is properly set up to not lose data) is that a page crash will never lose your precious, long, carefully crafted comment, since it will persist in the editor.


After a miserable 30 min tussling with python dependencies, finally got it working. But it's very cool! Thank you. This message sent from neovim! :)

I've heard of https://github.com/glacambre/firenvim but I've also heard you might run into issues clashing with browser keybinds
