
Crazy to think that my first personal computer's entire storage (was 160MB IIRC?) could fit into the L3 of a single consumer CPU!

It's probably not possible architecturally, but it would be amusing to see an entire early 90's OS running entirely in the CPU's cache.




Context: Early in the firmware boot process the memory controller isn't configured yet so the firmware uses the cache as RAM. In this mode cache lines are never evicted since there's no memory to evict them to.


I remember from the talk about Wii/Wii U hacking that they intentionally kept the early boot code in cache so that the memory couldn't be sniffed or modified on the RAM bus, which was external to the CPU and thus glitchable.


There may be server workloads for which the L3 cache is sufficient. It would be interesting if it made sense to build boards with just the CPU and no memory at scale.

I imagine that for such a workload you could always solder on a small memory chip to avoid wasting L3 on unused memory and needing a non-standard boot process, so probably not.


Most definitely. I work in finance, and optimizing workloads to fit entirely in cache (and not use any memory allocations after initialization) is the de facto standard for writing high-perf / low-latency code.

Lots of optimizations happening to make a trading model as small as possible.
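
A minimal sketch of the "allocate once at init, then stay inside a cache budget" pattern, in Python for readability only. The names `PriceWindow` and `CACHE_BUDGET_BYTES` are made up for illustration, and the 32 KiB budget is an assumption (roughly an L1 data cache). Python itself still allocates temporaries, so real low-latency code would do this in C++ or similar.

```python
from array import array

# Hypothetical budget, roughly the size of an L1 data cache.
CACHE_BUDGET_BYTES = 32 * 1024


class PriceWindow:
    """Fixed-capacity ring buffer of doubles, allocated once at construction."""

    def __init__(self, capacity):
        # Single up-front allocation; push() never grows the buffer.
        self.buf = array("d", [0.0]) * capacity
        self.capacity = capacity
        self.head = 0      # next slot to write (also the oldest slot once full)
        self.count = 0     # number of valid entries
        self.total = 0.0   # running sum of valid entries

    def footprint_bytes(self):
        return self.buf.itemsize * self.capacity

    def push(self, price):
        if self.count == self.capacity:
            self.total -= self.buf[self.head]  # drop the oldest value
        else:
            self.count += 1
        self.buf[self.head] = price
        self.total += price
        self.head = (self.head + 1) % self.capacity

    def mean(self):
        return self.total / self.count if self.count else 0.0


window = PriceWindow(1024)
# Check the working set against the budget once, at startup.
assert window.footprint_bytes() <= CACHE_BUDGET_BYTES
for p in (100.0, 101.0, 102.0):
    window.push(p)
print(window.mean())  # 101.0
```

The point is that the footprint is known and checked at startup, and the hot path only overwrites memory it already owns.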


In my case it began with 16K (yes, 16*1024 bytes) and 90K (yes, 90*1024 bytes) 5.25" floppy disks (although the floppies were a few months after the computer). Eventually upgraded to 48K RAM and 180K double density floppy disks. The computer: Atari 800.


I'll see your Atari 800 and raise you my Atari 2600 with its whopping 128 bytes of RAM. Bytes with a B. I can kinda sorta call it a computer because you could buy a BASIC cartridge for it (I didn't and stand by that decision - it was pretty bad).


I thought the Timex Sinclair 1000 with 2 Kbytes of RAM was bad.

The membrane keyboard wasn't great (the lack of a space bar was a weird choice) but it did work. We had programs on cassette and did get the 16 Kbyte memory expansion.

https://en.wikipedia.org/wiki/Timex_Sinclair_1000

I didn't realize the Atari 2600 had BASIC, always thought of it as a game console.


You can buy this bad boy [ATtiny11] with no RAM, only registers.

https://ww1.microchip.com/downloads/en/DeviceDoc/1006S.pdf


> it would be amusing to see an entire early 90's OS running entirely in the CPU's cache.

There are actually already two running (MINIX and UEFI), and it's the opposite of amusing - https://www.zdnet.com/article/minix-intels-hidden-in-chip-op...


KolibriOS would fit in there, even with the data in memory. You cannot load it into the cache directly, but when the cache capacity is larger than all the data you read there should be no cache eviction and the OS and all data should end up in the cache more or less entirely. In other words it should be really, really fast, which KolibriOS already is to begin with.


Unless you lay everything out contiguously in memory, you'll still get cache eviction due to associativity and depending on the eviction strategy of the CPU. But certainly DOS or even early Windows 95 could conceivably just run out of the cache.
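
To make the associativity point concrete, here is a toy simulation of a set-associative cache with LRU replacement. The geometry (64 sets, 4 ways, 64-byte lines) and the function name `count_evictions` are assumptions for illustration. Real L3 caches index on physical addresses, often hash the index, and rarely use pure LRU, so treat this as a sketch of the effect rather than a model of any particular CPU.

```python
from collections import OrderedDict


def count_evictions(addresses, num_sets, ways, line_size=64):
    """Simulate a set-associative LRU cache and return how many lines get evicted."""
    sets = [OrderedDict() for _ in range(num_sets)]
    evictions = 0
    for addr in addresses:
        line = addr // line_size
        index = line % num_sets   # which set the line maps to
        tag = line // num_sets    # identifies the line within that set
        s = sets[index]
        if tag in s:
            s.move_to_end(tag)    # hit: mark as most recently used
        else:
            if len(s) >= ways:
                s.popitem(last=False)  # set is full: evict least recently used
                evictions += 1
            s[tag] = True
    return evictions


NUM_SETS, WAYS, LINE = 64, 4, 64   # toy cache: 64 * 4 * 64 bytes = 16 KiB
capacity_lines = NUM_SETS * WAYS   # 256 lines

# Contiguous layout: exactly one cache's worth of lines, touched twice.
contiguous = [i * LINE for i in range(capacity_lines)] * 2

# Scattered layout: only 8 lines, but the stride makes them all land in set 0.
strided = [i * NUM_SETS * LINE for i in range(8)] * 2

print(count_evictions(contiguous, NUM_SETS, WAYS))  # 0
print(count_evictions(strided, NUM_SETS, WAYS))     # 12
```

The contiguous working set fills the whole cache with no evictions, while the strided one is 32 times smaller and still thrashes, because 8 lines are competing for the 4 ways of a single set.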


Windows 95 only needed 4MB RAM and 50 MB disk, so that's certainly doable. The trick is to have a hypervisor spread that allocation across cache lines.


Yeah, cache eviction is the reason I was assuming it is "probably not possible architecturally", but I also figured there could be features beyond my knowledge that might make it possible.

Edit: Also this 192MB of L3 is spread across two Zen CCDs, so it's not as simple as "throw it all in L3" either, because any given core would only have access to half of that.
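
Back-of-the-envelope check using only the numbers quoted in this thread (192MB of L3 over two CCDs, and the 4MB RAM / 50MB disk Windows 95 figures from upthread). It assumes the L3 is split evenly between the CCDs and ignores associativity entirely:

```python
MB = 1024 * 1024

total_l3 = 192 * MB            # L3 figure quoted in the thread
ccds = 2                       # assumption: L3 is split evenly across CCDs
l3_per_ccd = total_l3 // ccds

win95_ram = 4 * MB             # Windows 95 figures quoted upthread
win95_disk = 50 * MB
win95_total = win95_ram + win95_disk

print(l3_per_ccd // MB)            # 96
print(win95_total // MB)           # 54
print(win95_total <= l3_per_ccd)   # True
```

So on capacity alone, RAM plus disk would still fit in one CCD's half. Whether it could actually stay resident is a separate question.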


Well, yeah, reality strikes again. All you need is an exploit in the microcode to gain access to AMD's equivalent to the ME and now you can just map the cache as memory directly. Maybe. Can microcode do this or is there still hardware that cannot be overcome by the black magic of CPU microcode?


I thought there was an MSR buried deep somewhere that enables "Cache as RAM" mode and basically maps the cache into the memory address space or something like that.

Lol, a quick Google search leads me to a LinkedIn post with all the gory technical details?

https://www.linkedin.com/pulse/understanding-x86-cpu-cache-m...


My first PC had a 20MB HDD with 512KB of RAM. So yeah, that could fit into cache 10 times now.


Maybe in 50 years the cache of CPUs and GPUs will be 1TB. Enough to run multiple LLMs (an entire model run for each task). Having robots like in the movies would need LLMs much, much faster than what we see today.


Doubtful that we will still have this computer architecture by then.


You had ~160,000 times more storage than I did for my first personal computer.


Commodore PET for me - 8 KB of RAM and all the data you could store and read back from a TDK 120 cassette tape . . .

* https://en.wikipedia.org/wiki/Commodore_PET

Same time as the Trash-80 and BBC micro were making inroads.


IIRC some relatively strange CPUs could run with unbacked cache.


Intel's platforms, at the very least, use cache-as-RAM during the boot phase before the DDR interface can be trained and started up. https://github.com/coreboot/coreboot/blob/main/src/soc/intel...


I wonder how much faster DOS would boot, especially with floppy seek times...


Instantly.

If you run a VM on a CPU like this, using a baremetal hypervisor, you can get very close to "everything in cache".


You can get close with a VM, but there's overhead in device emulation that slows things down.

Consider a VM where that kind of stuff has been removed, like the Firecracker hypervisor used for AWS Lambda. You're talking milliseconds.


My first computer's whole RAM (128K) could fit in the L1 of a single core.


My first PC had a 40MB HDD and 8MB RAM :D


640K ought to be enough for anybody.



