"Less is best" is not a new realization. The concept exists across contexts. Music described as "overplayed". Prose described as verbose.
We just went through an era of compute that chanted "break down your monoliths". The NPM ecosystem is lots of small packages composed together; the Unix philosophy of small composable utilities is another example.
So models will improve as they are compressed: skeletonized down to opcodes, down to geometric models to render, including geometry for text, since the bytecode patterns for it provide the simplest model for recreating the most outputs. Compress the useless semantics out of the machine's operational state and leave the user to apply labels at the presentation layer.
Small models aren't more deterministic than large ones. Determinism comes from temperature and sampling settings, not parameter count. A 7B model at temp 0.7 is just as stochastic as a 405B model.
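The point about temperature can be made concrete. A minimal sketch (toy logits, hypothetical `sample` helper, not any real inference library): at temperature 0 decoding is greedy argmax and fully repeatable, while at temperature 0.7 the draw is stochastic, and nothing about that depends on how many parameters produced the logits.

```python
import math
import random

def sample(logits, temperature):
    """Pick a token index from raw next-token scores."""
    # Temperature 0 means greedy decoding: always the argmax, deterministic.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise: scale by temperature, softmax, and draw -> stochastic,
    # regardless of whether a 7B or 405B model produced the logits.
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]

logits = [2.0, 1.0, 0.5]                 # toy scores for a 3-token vocabulary
greedy = [sample(logits, 0) for _ in range(5)]     # identical every run
sampled = [sample(logits, 0.7) for _ in range(5)]  # can vary run to run
```

Parameter count changes *which* distribution you sample from, not *whether* you sample.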
The "no moat" memo you linked was about open source catching up to closed models through fine-tuning, not about small models outperforming large ones.
I'm also not sure what "skeletonized down to opcodes" or "geometry for text as bytecode patterns" means in the context of neural networks. Model compression is a real field (quantization, distillation, pruning) but none of it works the way you're describing here.
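For contrast with the "opcodes" framing, here is what one real compression technique actually does. A minimal sketch of symmetric per-tensor int8 quantization (helper names are mine): floats are mapped to small integers via a scale factor, trading a bounded amount of precision for memory.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization of a list of floats."""
    # One scale for the whole tensor; largest magnitude maps to +/-127.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the int8 values."""
    return [x * scale for x in q]

w = [0.81, -1.27, 0.05, 0.33]        # toy weight tensor
q, s = quantize_int8(w)              # small ints in [-127, 127] plus one scale
w_hat = dequantize(q, s)             # reconstruction error is at most scale/2
```

Quantization, distillation, and pruning all work like this: shrink the representation of the *same* learned function, with a measurable accuracy cost. None of it resembles reducing a network to opcodes or rendering geometry.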
> "Small models" will always outperform as they are deterministic (or closer to it).
Your whole comment feels, pardon me, like LARPing. No, small models do not outperform large ones unless fine-tuned. I say that as someone who uses small models 95% of the time versus cloud ones.
The means to do so, whether code or the delivery of a product, are all eventually deprecated and thrown away. You eventually age into uselessness and die.
Suddenly having an epiphany that it's not about code but product? Way too late in the game, HN... you're just trying to look like you've got it figured out and bring deep fucking value to humanity right as "idea to product without an intermediary code layer" is about to ship[1]. You already missed your window.
You still don't get the change that's needed, and that's happening, due to automation; few of us want to carry you on our shoulders and sing songs about you.
Hop off the hedonic treadmill and get some help.
[1] I'm working on idea-to-binary at my day job, which will flood the market with options and drown yours out.
This was realized in 2023 already: https://newsletter.semianalysis.com/p/google-we-have-no-moat...
"Less is best" is not a new realization. The concept exists across contexts. Music described as "overplayed". Prose described as verbose.
We just went through an era of compute that chanted "break down your monoliths". NPM ecosystem being lots of small little packages to compose together. Unix philosophy of small composable utilities is another example.
So models will improve as they are compressed, skeletonized down to opcodes, geometric models to render, including geometry for text as the bytecode patterns for such will provide the simplest model for recreating the most outputs. Compressing out useless semantics from the state of the machines operations and leaving the user to apply labels at the presentation layer.