Hacker News | 3abiton's comments

What did Apple do?

They grabbed up as much RAM as they could, nearly no questions asked, at above-market rates in some cases, ramping up perceived demand and significantly decreasing supply.

There is a possibility the IRGC is trolling, given it's April Fools' Day today.

Kill their Supreme Leader and 40 other leaders, destroy their Navy and Air Force, give them 30 days of B-1 and B-2 bombings night and day, and they still decide it's worth it to joke on April Fools'? :-) I have to give it to them...

I guess this is supposed to be funny but I wouldn’t take that chance.

This is the bet of many of the big AI companies, and why they're heavily subsidizing the calls. With the latest crackdowns by the US gov, it seems Anthropic is starting to reduce those subsidies, given their edge in the game. I am starting to consider local models more seriously beyond just testing, but nowadays the RAM/GPU market is inflated.

Local models just don't seem that useful for me for these particular tasks yet - the most recent versions of Codex and Claude Opus are the first time I've found them to be particularly useful in a "real engineering" context that isn't just vibe coding.

Google's TurboQuant might help address this, but it also might just widen the gap even further.

I am far on the skeptic edge when it comes to the generative AI side of ML tools, though, so do weigh my opinion accordingly.


TurboQuant is totally irrelevant compared to current quantization methods. It has been thoroughly tested by people who build inference engines for local models. It's all talk, no actual meat to it.

Do you have any reading on this? I find it hard to believe something announced a week ago has been “thoroughly tested”.

Their paper TurboQuant (TQ) is not new per se. It was released last year and is largely a rehash of ideas published a year prior (RaBitQ). There is also [a bit of drama](https://openreview.net/forum?id=tO3ASKZlok) there that boils down to what seems like a bit of malpractice by Google's researchers. TQ claims a few things: better compression quality and speed, and better KV cache handling. Currently the KV cache consumes a large share of resources on top of the model itself. Many people have applied different quantization strategies to it, but the quality degradation is too apparent. Enter Attention Rotation. This seems to have genuinely helped KV cache compression, as per [llama.cpp's latest tests](https://github.com/ggml-org/llama.cpp/pull/21038). On the other hand, [ik_llama.cpp](https://www.reddit.com/r/LocalLLaMA/comments/1s7nq6b/technic...) ran tests on the quality of turboquant-3 compared to IQ4-quantized models, and the quality degradation is much worse. So it's 2 things: KV compression -> good. TurboQuant quantization -> not good.

This is pretty much my case right now. BM25 is so useful in many cases, and having it with Postgres is neat!
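For anyone who hasn't looked under the hood, BM25 is a small formula. A minimal pure-Python sketch (my own illustration, not what the Postgres extensions actually run) scoring tokenized docs against a query:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Classic Okapi BM25: score each tokenized doc against query_terms."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                      # document frequency per term
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf, dl, s = Counter(d), len(d), 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * dl / avgdl))
        scores.append(s)
    return scores

docs = [
    "full text search in postgres".split(),
    "bm25 ranking beats plain tf idf for search".split(),
    "cooking pasta at home".split(),
]
print(bm25_scores(["bm25", "search"], docs))
```

The `k1` and `b` defaults here are the commonly cited ones; the term-frequency saturation and length normalization are what make it beat plain tf-idf for ranking.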

Tbh, I think distillation is happening both ways. And at this stage, "quality" is stagnating; the main edge is the tooling. The harness of CC seems to be the best so far, and I wonder whether this leak will level the usability playing field.

This highly depends on the detection mechanism.

It's funny you mention that. The only difference is that sometimes you need a functionality without doing the plumbing. At the end of the day, if you're getting the output you need, the process doesn't matter. It's an interesting analogy, but it only works if the inspector is another expert dev.

When I have such a moment and take a step back, there's usually a strong hint that there's a meta-problem behind those instances. And while you have to choose when to take the time to solve such a problem, it's usually worth it.

I wish I could always take the time to do things right, but in reality, time is an extremely scarce resource. That's when these AI agents help the most.

It will still be heavily reliant on expert human input and interaction. Knuth is an expert and knows how to guide.

It would be funny/sad if this ended up being a clawdbot thingy, or whatever it is called now.

It's similar, but the bot is piloting a human.

It is amazing how you can order so many small sensors from AliExpress, at around 1-2€ each, and have them delivered in a week or two. I am not sure we will have this for long.
