
Every company is subject to constraints. A bigger budget is not an infinite budget. And there is no tradeoff between efficiency and raw power. An optimization that lets you build a similarly powerful model for less money also lets you build a more powerful model for the same amount of money.
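To make that concrete, here's a toy back-of-the-envelope sketch in Python. The power-law relationship between compute and capability, and every number in it, are illustrative assumptions, not real figures:

    # Toy sketch: assume capability follows a power law in effective
    # training compute. The exponent and budgets are made up.
    def capability(effective_compute: float, exponent: float = 0.3) -> float:
        return effective_compute ** exponent

    budget = 100.0   # arbitrary compute units
    speedup = 2.0    # an optimization that doubles effective compute per dollar

    # Option A: match the old model's capability at half the spend.
    assert capability((budget / speedup) * speedup) == capability(budget)

    # Option B: spend the full budget and train a stronger model.
    print(capability(budget * speedup) > capability(budget))  # True

Either way the optimization is pure upside, which is the sense in which there's no tradeoff.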

Honestly, I wonder what you think closed LLM companies do R&D on if not optimizations. And the nature of research is that most ideas that sound good turn out to be duds, so they already need an established process for testing many ideas quickly. Now if somebody publishes a new idea they haven't tried yet, setting up an experiment to try it out is just a routine task... But they aren't going to tell anybody the results, just quietly integrate it if it works.

I concede we can't be sure what they do, since it's proprietary, aside from leaks, which give us a sense of their philosophy.

It's clear to me that the economics push the likes of OpenAI and Anthropic to focus on raw power over optimisations. I never meant they wouldn't optimise anything, but they hit diminishing returns earlier than a company like Alibaba, or even Mistral.

The Chinese models were trained under compute scarcity, so for them optimisation wasn't a "routine" task; it was optimise or nothing.

A year or two later, those optimisations brought their models roughly on par with the raw-power models from the US providers.

Now, despite papers being published, a design is rather sticky; it's not as simple as plugging in an optimisation another lab came up with. It depends on the optimisation: perhaps a multi-head attention tweak wasn't that big of a deal to add in, but MoE would have been far less easy, since it changes the whole feed-forward parameterisation. See the sketch below.
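As a purely illustrative PyTorch-style sketch (the module names, shapes, and routing scheme are my assumptions, not anyone's actual architecture): an attention tweak leaves the rest of the block alone, whereas swapping a dense FFN for MoE replaces one weight matrix with several experts plus a router, so pretrained dense weights don't map onto the new parameterisation.

    import torch
    import torch.nn as nn

    class DenseFFN(nn.Module):
        """Standard dense feed-forward block."""
        def __init__(self, d_model: int, d_ff: int):
            super().__init__()
            self.up = nn.Linear(d_model, d_ff)
            self.down = nn.Linear(d_ff, d_model)

        def forward(self, x):
            return self.down(torch.relu(self.up(x)))

    class MoEFFN(nn.Module):
        """MoE replacement: several experts plus a learned router."""
        def __init__(self, d_model: int, d_ff: int, n_experts: int = 4, top_k: int = 1):
            super().__init__()
            # The router has no counterpart in a dense checkpoint.
            self.router = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(DenseFFN(d_model, d_ff) for _ in range(n_experts))
            self.top_k = top_k

        def forward(self, x):
            # x: (tokens, d_model). Route each token to its top-k experts.
            weights = torch.softmax(self.router(x), dim=-1)      # (tokens, n_experts)
            topw, topi = weights.topk(self.top_k, dim=-1)        # (tokens, top_k)
            out = torch.zeros_like(x)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = topi[:, k] == e
                    if mask.any():
                        out[mask] += topw[mask, k].unsqueeze(-1) * expert(x[mask])
            return out

The router weights in particular have to be trained from scratch, which is part of what makes an existing design "sticky".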



