Hacker News | ottah's comments

How is that quote in any way demonstrative of this being written by an LLM? You do know that LLMs were trained on the internet and every digitized text they could get their hands on? You are jumping at shadows; calm down already.

Ah yes, let's destroy the accessible web. We'll all pluck out our eyes to spite them.

This is still unacceptable.

Feels very pseudo academic.

I'm not sure we can say it's accelerating. The techniques adversarial actors use have always been changing, and when they shift tactics it can take a while before an adequate defense is adopted. We're still dealing with SQL injection in the OWASP Top Ten. What would indicate an acceleration is the most security-oriented organizations continuously failing to defend against new attacks. If we start hearing about JPMorgan and Google getting popped every month or two, we're in trouble.

The acceleration is in the decrease of the cost to produce misinformation.

Misinformation in pure text form has always been cheapest, and it's even cheaper now that text generation is basically a solved problem. Photos have been more expensive: it used to take time and skill with a photo editor to produce a believable image of an event that never happened. That cost is now very low; it's mostly a matter of prompting skill. Fake videos were considerably harder still, especially coupled with speech. Just a few years ago I could assume any video I saw was either real or a time-consuming, deliberate fake.

We've now entered a time where fake videos of famous people take actual effort to tell apart, and can be produced for a low cost - something accessible to an individual, not a big corporation. We can have an entirely fake video of Trump, or another world leader, giving a speech and it will look like the real thing, with the audiovisual "tells" of it being fake getting harder to notice every few months.


> The acceleration is in the decrease of the cost to produce misinformation.

So it's a spam issue. Spam is annoying, but normally it's possible to fight; on these topics, however, we have built structures that disable the very mechanisms that allow us to fight spam. That's worrying.

The fact that someone can instruct their computer to astroturf their flight-tracking app on some forum for nerds is irrelevant: people have been instructing "marketing agencies" to astroturf their brand of caffeinated sugar water on TV, radio, and press for decades, if not centuries. For a very long time the "traditional media" was aware that its ability to sell astroturfing capacity hinged on its general trustworthiness. Then the internets rose to prominence, and traditional media followed by selling more and more of their capacity to astroturfers. Now we have a worrying situation where the internets might be spammed by astroturfers a bit too much, but the backup is already broken. That's truly frightening.

Welcome to the post-truth world, where objective references outside of your own village cannot exist.


It's an algorithm issue. When people hold a media-consumption device in front of their face all day and the algorithms are gamed, it's literally a brainwashing device.

It is not an algorithm issue. It would still be a huge problem with zero algorithmic social media.

Possibly this just isn't the generation of hardware to solve this problem in? We're, what, three or four years in at most, and barely two into AI-assisted development being practical. I wouldn't want to be the first mover here, and I don't know if this is a good point in history to try to solve the problem. Everything we're doing right now with AI, we will likely not be doing in five years. If I were running a company like Apple, I'd just sit on the problem until the technology stabilizes and matures.

If I were running a company like Apple, I'd have been working with Khronos to kill CUDA since yesterday. There are multiple trillions of dollars that could be Apple's if they sign CUDA drivers on macOS, or create a CUDA-compatible layer. Instead, Apple is spinning its wheels and promoting nothingburger technology like the NPU and MPS.

It's not like Apple's GPU designs are world-class anyways, they're basically neck-and-neck with AMD for raster efficiency. Except unlike AMD, Apple has all the resources in the world to compete with Nvidia and simply chooses to sit on their ass.


CUDA is not the real issue: AMD's HIP offers source-level compatibility with CUDA code, and ZLUDA even provides raw binary compatibility. Nvidia GPUs really are quite good, and the projected advantages of going multi-vendor just aren't worth the hassle given the amount of architecture specificity GPUs are going to have.

Okay, then don't kill CUDA, just sign CUDA drivers on macOS instead and quit pretending like MPS is a world-class solution. There are trillions on the table, this is not an unsolvable issue.

Admittedly, my use of CUDA and Metal is fairly surface-level. But I have had great success using LLMs to convert whole Gaussian-splatting CUDA codebases to Metal. It's not ideal for maintainability and not 1:1, but if CUDA was a moat for NVIDIA, I believe LLMs have dealt a blow to it.

You can convert CUDA codebases to Vulkan and DirectX code, for all the good it does you. You're still constrained by the architecture of the GPU, and Apple Silicon GPUs pre-M5 are all raster-optimized. The hardware is the moat.

Apple technically hasn't supported the professional GPGPU workflow for over a decade. macOS doesn't support CUDA anymore, Apple abandoned OpenCL on all of their platforms and Metal is a bare-minimum effort equivalent to what Windows, Android and Linux get for free. Dedicated matmul hardware is what Apple should have added to the M1 instead of wasting silicon on sluggish, rinky-dink NPUs. The M5 is a day late and a dollar short.

According to reports, even Apple can't quite justify using Apple Silicon for bulk compute: https://9to5mac.com/2026/03/02/some-apple-ai-servers-are-rep...


I mean, by any reasonable standard it still is. Almost any computer can run an LLM; it's just a matter of how fast, and 0.4 tok/s (and that's the peak, before the first token) is not really considered running. It's a demo, but practically speaking entirely useless.
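To put that rate in perspective, a quick back-of-envelope calculation (the 0.4 tok/s figure is from the demo above; the 500-token reply length is my own illustrative assumption):

```python
# What 0.4 tokens/second means in practice.
# 0.4 tok/s is the demo's reported peak rate; a 500-token reply
# is an assumed length for a modest chat answer.
peak_rate_tok_per_s = 0.4
reply_tokens = 500

seconds = reply_tokens / peak_rate_tok_per_s
print(f"{seconds:.0f} s ≈ {seconds / 60:.0f} minutes per reply")  # 1250 s ≈ 21 minutes
```

Twenty minutes per answer, at the peak rate, before any context slowdown: that's the gap between "runs" and "usable".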

Devil's advocate: this actually shows how promising TinyML and EdgeML capabilities are. SoCs comparable to the A19 Pro are highly likely to be commodified in the next 3-5 years, in the same manner that SoCs comparable to the A13 already are.

That's actually pretty cool, but I'd hate to freeze a model's weights into silicon without an incredibly specific and broad use case.

Depends on cost IMO - if I could buy a Kimi K2.5 chip for a couple of hundred dollars today I would probably do it.

I mean if it was small enough to fit in an iPhone why not? Every year you would fabricate the new chip with the best model. They do it already with the camera pipeline chips.

Sounds like just the sort of thing FPGAs were made for.

The $$$ would probably make my eyes bleed tho.


Current FPGAs would have terrible performance. We need some new architecture combining ASIC-level LLM performance with sparse reconfiguration support, maybe.

Wouldn't it be the opposite of freezing weights?

Probably 15 to 20 years, if ever. This phone is only running this model in the technical sense of running, not in a practical sense. Ignore the 0.4 tok/s; that's nothing. What really makes this example bullshit is that there is no way the phone has enough RAM to hold any reasonable amount of context for that model. Context requirements are not insignificant, and as the context grows, the output will get even slower.

Realistically you need 300+ GB/s of fast memory attached to the accelerator, with enough capacity to fully hold at least a 4-bit quant. That's at least 380 GB of memory. You can gimmick a demo like this with an SSD, but the SSD is just not fast enough to meet the minimum specs for anything more than showing off a neat trick on Twitter.

The only hope for a handheld execution of a practical, and capable AI model is both an algorithmic breakthrough that does way more with less, and custom silicon designed for running that type of model. The transformer architecture is neat, but it's just not up for that task, and I doubt anyone's really going to want to build silicon for it.
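A rough sketch of why bandwidth is the binding constraint: a memory-bandwidth-bound decoder has to stream every active weight byte once per token, so the ceiling is roughly bandwidth divided by active bytes. All the numbers below are illustrative assumptions, not measurements of any particular phone or model:

```python
# Upper bound on decode speed for a bandwidth-bound LLM:
# each generated token requires streaming the active weights once.
def max_tokens_per_second(bandwidth_bytes_per_s: float,
                          active_param_bytes: float) -> float:
    return bandwidth_bytes_per_s / active_param_bytes

# Assumed MoE shape: ~32B active parameters at 4-bit ≈ 16 GB per token.
active_bytes = 16e9

fast_ram = max_tokens_per_second(300e9, active_bytes)  # unified memory at 300 GB/s
nvme_ssd = max_tokens_per_second(7e9, active_bytes)    # fast NVMe at ~7 GB/s

print(f"{fast_ram:.1f} tok/s from RAM")  # ~18.8 tok/s
print(f"{nvme_ssd:.2f} tok/s from SSD")  # ~0.44 tok/s
```

Under these assumed numbers, streaming from an SSD lands right around the fraction-of-a-token-per-second regime seen in the demo, while proper high-bandwidth memory would be tens of tokens per second.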


> Realistically you need +300GB/s fast access memory to the accelerator, with enough memory to fully hold at least greater than 4bit quants.

The latest M5 MacBook Pros start at 307 GB/s of memory bandwidth, the 32-core-GPU M5 Max gets 460 GB/s, and the 40-core M5 Max gets 614 GB/s. The CPU, GPU, and Neural Engine all share that memory.

The A19/A19 Pro in the current iPhone 17 line is essentially the same processor (minus the laptop and desktop features that aren’t needed for a phone), so it would seem we're not that far off from being able to run sophisticated AI models on a phone.


Agree with the first part. But I can run GPT OSS 20b, a highly capable model, on my laptop with 32 GB of RAM at speeds that for all practical intents are as fast as GPT-5.4, and it's good enough for 90%+ of non-technical use cases.

As such I can't agree with "The only hope for a handheld execution of a practical, and capable AI model is both an algorithmic breakthrough": we are much closer than 15-20 years to getting these on a phone.


With this work you can run a medium-sized model like GPT OSS 20b at native speed even while keeping those 32GB RAM almost fully available for other uses - the model seamlessly starts to slow down as RAM requirements increase elsewhere in the system and the fs cache has to evict more expert layers, and reaches full speed again as the RAM is freed up. It adds a key measure of flexibility to the existing AI local inference picture.
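A minimal sketch of the mechanism described above, using `numpy.memmap`: the weights file is mapped rather than read into memory, so the OS page cache decides which pages stay resident and evicts them under memory pressure, which is what lets inference degrade gracefully instead of OOMing. The file name and sizes here are made up for the demo; a real expert layer would be gigabytes:

```python
import os
import tempfile
import numpy as np

# Write a small stand-in "weights" file (1M float32 values
# keeps the demo quick; a real expert layer would be much larger).
path = os.path.join(tempfile.mkdtemp(), "expert0.bin")
np.arange(1_000_000, dtype=np.float32).tofile(path)

# Map it read-only instead of loading it: pages are faulted in
# on first touch and can be evicted by the OS when RAM is tight.
weights = np.memmap(path, dtype=np.float32, mode="r")

print(weights[:3])  # touching entries pulls their pages into the cache
print(float(np.sum(weights, dtype=np.float64)))  # full pass streams the file
```

The same trick applies per expert layer in an MoE model: hot experts stay cached, cold ones fall back to storage, and speed tracks how much of the model the page cache can hold.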

KV-cache is still quite small compared to the weights. It can stay in memory for reasonable context length, or be streamed to storage as a last resort. This actually doesn't impact performance too much, since we were already limited by having to stream in the much larger weights.
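For scale, the KV-cache footprint of a generic GQA transformer can be estimated as 2 (K and V) × layers × KV heads × head dimension × bytes per element × context length. The hyperparameters below are hypothetical, not those of any particular model:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    # K and V each store one head_dim vector per layer,
    # per KV head, per cached token position.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len

# Hypothetical large-model shape: 60 layers, 8 KV heads (GQA),
# head_dim 128, fp16 cache entries, 32k-token context.
size = kv_cache_bytes(60, 8, 128, 32_768)
print(f"{size / 1e9:.1f} GB")  # ~8.1 GB
```

Single-digit gigabytes is real memory, but as the comment above notes, it is small next to hundreds of gigabytes of weights, so it can stay resident while the weights are streamed.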

This should be the top comment

That's just a massively oversized server for the number of GPUs. It's not like they're doing anything special, either; I can buy an appropriately sized Supermicro chassis myself and throw some cards in it. They're really not adding enough value to justify overpaying for anything.

The major selling point of the tinyboxes is that you're able to run them in your office without any hassle.

I used to own a Dell PowerEdge for my home office, but those fans, even on the minimal setting, kept me up at night.

