The biggest savings for a service like Gmail are going to come from deduplication - e.g. if you can recognize that a newsletter went out to a thousand subscribers and store them all as deltas from a "canonical" copy - congratulations, that's close to 1000:1 compression, better than you could achieve with any general-purpose compressor. Similarly, if you can recognize that an email is an Amazon shipping confirmation or a Facebook message notification or some other commonly repeated "form letter", you can achieve huge savings by factoring out the elements they all share, like images or stylesheets.
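Roughly the kind of thing I mean, as a toy Python sketch (made-up helper names, and obviously not what Gmail actually does): compress each recipient's copy with the canonical copy as a preset dictionary, so a near-duplicate shrinks to little more than its differences:

    import zlib

    def compress_against(canonical: bytes, message: bytes) -> bytes:
        # Use the canonical copy as a preset dictionary, so matches against it
        # become cheap back-references and only the differences cost real bytes.
        c = zlib.compressobj(level=9, zdict=canonical)
        return c.compress(message) + c.flush()

    def decompress_against(canonical: bytes, delta: bytes) -> bytes:
        d = zlib.decompressobj(zdict=canonical)
        return d.decompress(delta) + d.flush()

    canonical = b"Dear subscriber,\nHere is this week's newsletter...\n" * 50
    copy = canonical.replace(b"subscriber", b"alice@example.com", 1)

    delta = compress_against(canonical, copy)
    assert decompress_against(canonical, delta) == copy
    print(len(copy), "->", len(delta))  # the stored delta is a small fraction of the copy

You still store the canonical copy once; each mailbox then only carries a pointer to the base plus its small delta.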
I kind of doubt they would do this, to be honest. Every near-copy of a message is going to have small differences in at least the envelope (and possibly in encoding, depending on the server), and may fall under different guarantees or jurisdictions. It would only take one mistake to leak data from one person to another, all to save a few gigabytes over an account's lifetime. Doesn't really seem worth it, does it?
That's why you'd use a base and a delta. The parent post was talking about a general-purpose compression algorithm; my question was different.
In line with the original comment, I was asking about specialized "codecs" for Gmail.
Humans do not read the same email many times, which makes it a good target for compression: you rarely pay the decompression cost. I believe machines do read the same email many times, but that could be architected around.
These and other email-specific redundancies ought to be covered by any specialized compression scheme. Note also that a lot of standard compression is deduplication; fundamentally the two are not that different.
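A toy illustration of that last point (Python; exact numbers will vary): DEFLATE already "dedupes" any repeat it can see within its 32 KB window, so a second copy of a body costs almost nothing; dedup systems just do the same thing at a coarser granularity and across a whole corpus rather than a single window.

    import os, zlib

    body = os.urandom(8 * 1024)               # stand-in for an incompressible message body
    one = len(zlib.compress(body, 9))
    two = len(zlib.compress(body + body, 9))  # the repeat falls inside DEFLATE's 32 KB window
    print(one, two)                           # `two` is only slightly larger than `one`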
Given that one needs to support deletes, this will end up looking like a garbage-collected deduplication file system.
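Something along these lines (a hypothetical Python sketch, not any particular system): chunks are content-addressed and shared across messages, so deleting a message can only drop references, and space is reclaimed by a separate GC pass once nothing points at a chunk anymore:

    import hashlib
    from collections import Counter

    chunks = {}            # chunk hash -> chunk bytes (stored once, shared)
    refcount = Counter()   # chunk hash -> number of messages referencing it
    messages = {}          # message id -> list of chunk hashes ("recipe")

    def put(msg_id, data, chunk_size=4096):
        recipe = []
        for i in range(0, len(data), chunk_size):
            piece = data[i:i + chunk_size]
            h = hashlib.sha256(piece).hexdigest()
            chunks.setdefault(h, piece)   # duplicate chunks cost nothing extra
            refcount[h] += 1
            recipe.append(h)
        messages[msg_id] = recipe

    def get(msg_id):
        return b"".join(chunks[h] for h in messages[msg_id])

    def delete(msg_id):
        # Deleting a message only drops references; shared chunks must survive.
        for h in messages.pop(msg_id):
            refcount[h] -= 1

    def gc():
        # Reclaim space only for chunks no surviving message references.
        for h in [h for h, n in refcount.items() if n == 0]:
            del chunks[h]
            del refcount[h]

Whether reclamation is refcount-based like this or mark-and-sweep is a detail; the point is that a delete can't simply free bytes that other messages may still share.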