Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I concede we can't be sure what they do since it's proprietary. Aside leaks which give us a sense of the philosophy.

It's clear to me the economics would make the likes of OpenAI and Anthropic's focus on raw power over optimisations. I never meant they wouldn't optimise anything, but it's earlier diminishing returns vs for a company like Alibaba, or even Mistral.

The Chinese models were trained in the context of compute scarcity. So it isn't the same for them as "routine" optimisations, it's optimisations or nothing.

A year or two later those optimisations allowed their models to be somewhat on par with raw power models from the US providers.

Now despite papers being published, a design is rather sticky, it's not as simple as plugging an optimisations another lab came up with. It depends what the optimisation, perhaps multi head wasn't that big of a deal to add in, MoE would have been less so easy.

 help



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: