The biggest savings for a service like Gmail are going to come from deduplication - e.g. if you can recognize that a newsletter went out to a thousand subscribers and store them all as deltas from a "canonical" copy - congratulations, that's close to 1000:1 compression, better than you could achieve with any general-purpose compressor. Similarly, if you can recognize that an email is an Amazon shipping confirmation or a Facebook message notification or some other commonly repeated "form letter", you can achieve huge savings by factoring out the elements they all share, like images or stylesheets.
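Roughly the kind of thing I mean, as a toy Python sketch (made-up helper names, and obviously not what Gmail actually does): compress each recipient's copy with the canonical copy as a preset dictionary, so a near-duplicate shrinks to little more than its differences:

    import zlib

    def compress_against(canonical: bytes, message: bytes) -> bytes:
        # Use the canonical copy as a preset dictionary, so matches against it
        # become cheap back-references and only the differences cost real bytes.
        c = zlib.compressobj(level=9, zdict=canonical)
        return c.compress(message) + c.flush()

    def decompress_against(canonical: bytes, delta: bytes) -> bytes:
        d = zlib.decompressobj(zdict=canonical)
        return d.decompress(delta) + d.flush()

    canonical = b"Dear subscriber,\nHere is this week's newsletter...\n" * 50
    copy = canonical.replace(b"subscriber", b"alice@example.com", 1)

    delta = compress_against(canonical, copy)
    assert decompress_against(canonical, delta) == copy
    print(len(copy), "->", len(delta))  # the stored delta is a small fraction of the copy

You still store the canonical copy once; each mailbox then only carries a pointer to the base plus its small delta.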
I kind of doubt they would do this, to be honest. Every near-copy of a message is going to have small differences in at least the envelope (and possibly in encoding, depending on the server), and may fall under different guarantees or jurisdictions. It would only take one mistake to leak data from one person to another, all to save a few gigabytes over an account's lifetime. Doesn't really seem worth it, does it?
That's why you'd use a base and a delta. The parent post was talking about a general-purpose compression algorithm; my question was different.
In line with the original comment, I was asking about specialized "codecs" for Gmail.
Humans do not read the same email many times, which makes it a good target for compression: you rarely pay the decompression cost. I believe machines do read the same email many times, but that could be architected around.
These and other email-specific redundancies ought to be covered by any specialized compression scheme. Note also that a lot of standard compression is deduplication; fundamentally the two are not that different.
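A toy illustration of that last point (Python; exact numbers will vary): DEFLATE already "dedupes" any repeat it can see within its 32 KB window, so a second copy of a body costs almost nothing; dedup systems just do the same thing at a coarser granularity and across a whole corpus rather than a single window.

    import os, zlib

    body = os.urandom(8 * 1024)               # stand-in for an incompressible message body
    one = len(zlib.compress(body, 9))
    two = len(zlib.compress(body + body, 9))  # the repeat falls inside DEFLATE's 32 KB window
    print(one, two)                           # `two` is only slightly larger than `one`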
Given that one needs to support deletes, this will end up looking like a garbage-collected deduplication file system.
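Something along these lines (a hypothetical Python sketch, not any particular system): chunks are content-addressed and shared across messages, so deleting a message can only drop references, and space is reclaimed by a separate GC pass once nothing points at a chunk anymore:

    import hashlib
    from collections import Counter

    chunks = {}            # chunk hash -> chunk bytes (stored once, shared)
    refcount = Counter()   # chunk hash -> number of messages referencing it
    messages = {}          # message id -> list of chunk hashes ("recipe")

    def put(msg_id, data, chunk_size=4096):
        recipe = []
        for i in range(0, len(data), chunk_size):
            piece = data[i:i + chunk_size]
            h = hashlib.sha256(piece).hexdigest()
            chunks.setdefault(h, piece)   # duplicate chunks cost nothing extra
            refcount[h] += 1
            recipe.append(h)
        messages[msg_id] = recipe

    def get(msg_id):
        return b"".join(chunks[h] for h in messages[msg_id])

    def delete(msg_id):
        # Deleting a message only drops references; shared chunks must survive.
        for h in messages.pop(msg_id):
            refcount[h] -= 1

    def gc():
        # Reclaim space only for chunks no surviving message references.
        for h in [h for h, n in refcount.items() if n == 0]:
            del chunks[h]
            del refcount[h]

Whether reclamation is refcount-based like this or mark-and-sweep is a detail; the point is that a delete can't simply free bytes that other messages may still share.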