It's good, but is it "the future" when it's extra work?
Consider that you could hand-code an algorithm to recognize cats in images, but we would rather let the machine just figure it out for itself. We're kind of averse to manual work and complexity wherever we can brute-force or heuristic our way out of the problem. For the 80% of situations where piping it into zstd keeps you within budget (bandwidth, storage, CPU time, whatever your constraint is), it's not really worth about 5000% more effort to squeeze out three times the speed and a third less size.
It really is considerably better, but I wonder how many people will actually do it. Fewer users means less implicit marketing from seeing it everywhere, as happens with the other tools, which means even fewer people will know to do it, and so on.
This seems very cool. Was going to suggest submitting it, but I see there was a fairly popular thread 5 months ago for anyone interested: https://news.ycombinator.com/item?id=45492803
The biggest savings for a service like GMail are going to be based around deduplication - e.g. if you can recognize that a newsletter went out to a thousand subscribers and store those all as deltas from a "canonical" copy - congratulations, that's >1000:1 compression, better than you could achieve with any general-purpose compression. Similarly, if you can recognize that an email is an Amazon shipping confirmation or a Facebook message notification or some other commonly repeated "form letter", you can achieve huge savings by factoring out all the common elements in them, like images or stylesheets.
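A minimal sketch of the delta idea, using zlib's preset-dictionary support with the "canonical" copy as the dictionary (the newsletter contents and names here are made up for illustration):

```python
import zlib

# Hypothetical sketch: store each subscriber's near-identical newsletter
# as a delta against one "canonical" copy, via zlib's preset dictionary.
canonical = (b"Weekly newsletter\n"
             b"Here are this week's stories, offers, and updates.\n" * 40)

def delta_compress(msg: bytes, base: bytes) -> bytes:
    # Compress msg with base as a preset dictionary, so shared runs
    # become back-references into base instead of literal bytes.
    c = zlib.compressobj(level=9, zdict=base)
    return c.compress(msg) + c.flush()

def delta_decompress(blob: bytes, base: bytes) -> bytes:
    # Decompression needs the exact same dictionary to resolve the refs.
    d = zlib.decompressobj(zdict=base)
    return d.decompress(blob) + d.flush()

# Each copy differs only in the greeting line.
for name in (b"Alice", b"Bob"):
    msg = b"Hello " + name + b",\n" + canonical
    blob = delta_compress(msg, canonical)
    assert delta_decompress(blob, canonical) == msg
    print(len(msg), "->", len(blob))
```

Real systems would use something like zstd's dictionary mode or a binary-diff format instead, but the principle is the same: per-copy storage cost shrinks to roughly the size of the differences.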
I kind of doubt they would do this to be honest. Every near-copy of a message is going to have small differences in at least the envelope (not sure if encoding differences are also possible depending on the server), and possibly going to be under different guarantees or jurisdictions. And it would just take one mistake to screw things up and leak data from one person to another. All for saving a few gigabytes over an account's lifetime. Doesn't really seem worth it, does it?
That's why you'd use a base and a delta. The PP was talking about a general compression algorithm; my question was different.
In line with the original comment, I was asking about specialized "codecs" for gmail.
Humans do not read the same email many times. That makes it a good target for compression. I believe machines do read the same email many times, but that could be architected around.
These and other email-specific redundancies ought to be covered by any specialized compression scheme. Also note that a lot of standard compression is deduplication; fundamentally, they are not that different.
Given that one needs to support deletes, this will end up looking like a garbage-collected, deduplicating file system.
Looks similar to OpenZL ( https://openzl.org/ )
"OpenZL takes a description of your data and builds from it a specialized compressor optimized for your specific format."
Honestly, OpenZL looks even cooler! It would be great to have it integrated with Parquet and Avro encoders. If I understand correctly, the compressed files should be decompressible with standard tools.
3-bit is a bit ridiculous. From that page I'm unclear whether the current model is 3-bit or 4-bit.
If it's 4-bit… well, NVIDIA showed that a well-organized model can perform almost as well as at 8-bit.
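For reference, the basic 4-bit trick under discussion is roughly this (a toy sketch of symmetric per-tensor quantization, not any vendor's actual scheme):

```python
# Toy sketch of symmetric 4-bit weight quantization, to illustrate the
# precision trade-off being discussed; not NVIDIA's actual method.
def quantize_4bit(weights):
    # Map the largest magnitude onto the int4 maximum (7), keep one
    # float scale per tensor, and round everything else to that grid.
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.91, -0.42, 0.07, -1.30, 0.55]
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)  # each value lands within scale/2 of the original
```

The whole question of 3-bit vs. 4-bit vs. 8-bit is how coarse that grid can get before rounding error starts visibly hurting model quality.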