It isn't going to replace cloud LLMs since cloud LLMs will always be faster in throughput and smarter. Cloud and local LLMs will grow together, not replace each other.
I'm not convinced that local LLMs use less electricity either. Per token, at the same level of intelligence, cloud LLMs should run circles around local LLMs in efficiency. If they don't, what are we paying hundreds of billions of dollars for?
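The per-token efficiency argument can be made concrete with a back-of-envelope calculation: energy per token is just device power divided by decode throughput, and batched cloud serving amortizes power across many users. The numbers below are illustrative assumptions, not measurements.

```python
# Back-of-envelope: energy per generated token = device power / throughput.
# All figures are illustrative assumptions, not benchmarks of real hardware.

def joules_per_token(power_watts: float, tokens_per_second: float) -> float:
    """Energy cost of one generated token, in joules."""
    return power_watts / tokens_per_second

# Hypothetical local setup: one consumer GPU drawing 350 W at 30 tok/s
# serving a single user.
local = joules_per_token(350, 30)

# Hypothetical cloud setup: an 8-GPU server drawing 6 kW, batch-serving
# many users at an aggregate 2,000 tok/s.
cloud = joules_per_token(6000, 2000)

print(f"local: {local:.1f} J/token, cloud: {cloud:.1f} J/token")
# local: 11.7 J/token, cloud: 3.0 J/token
```

Under these assumed numbers the cloud comes out a few times more efficient per token, mostly because batching keeps the hardware saturated; the real gap depends heavily on batch size, model, and quantization.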
I think local LLMs will continue to grow, and there will be a "ChatGPT moment" for them when good-enough models meet good-enough hardware. We're not there yet, though.
Note, this is why I'm big on investing in chip manufacturing companies. Not only are they completely maxed out due to cloud LLMs, but soon they will be doubly maxed out replacing the chips in local computers with ones suited for AI inference. This is a massive transition and will fuel another chip manufacturing boom.
It works really well for "You're a helpful assistant / Hi / Hello there, how may I help you today?" Anything else (especially in a non-English language) and you will see the limitations yourself. Just try it.
It's a $4,000 GPU with 32GB of VRAM and needs a 1,000 watt PSU. It's not realistic for the masses.
If it has something like 80GB of VRAM, it'll cost $10k.
The actual local LLM chip is Apple Silicon starting at the M5 generation with matmul acceleration in the GPU. You can run a good model using an M5 Max 128GB system. Good prompt processing and token generation speeds. Good enough for many things. Apple accidentally stumbled upon a huge advantage in local LLMs through unified memory architecture.
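The unified-memory advantage can be sketched numerically: single-stream decode is bandwidth-bound, since every generated token has to read all the model weights once, so memory bandwidth divided by model size gives a rough speed ceiling. The bandwidth and model-size figures below are illustrative assumptions, not Apple specs.

```python
# Rough upper bound on single-stream decode speed for a dense model:
# each token reads every weight once, so speed <= bandwidth / model size.
# Figures are illustrative assumptions, not measured Apple Silicon specs.

def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Memory-bandwidth ceiling on decode speed, in tokens/second."""
    return bandwidth_gb_s / model_size_gb

# Hypothetical unified-memory machine: ~500 GB/s of bandwidth running a
# 70B-parameter model quantized to 4 bits (~40 GB of weights).
ceiling = max_tokens_per_second(500, 40)
print(f"~{ceiling:.1f} tok/s ceiling")
# ~12.5 tok/s ceiling
```

This is why a large unified memory pool matters more than raw compute for local token generation: it lets big models fit entirely in fast memory, and the bandwidth to that pool sets the speed limit.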
Still not for the masses, not cheap, and not great, though. It's going to take years to slowly bring local LLMs to ordinary consumer computers.
Crazy thing to say without other contextual information. It obviously depends on a number of factors. Do you have an apples-to-apples comparison at hand?
Looking at the downvotes, I feel good about the SDE future in 3-5 years. We will have a swamp of "vibe-experts" who won't be able to pay 100K a month to CC. Meanwhile, people who still remember how to code in Vim will (slowly) get back to pre-COVID TC levels.
What is CC and TC? I have not heard these abbreviations (except for CC to mean credit card or carbon copy, neither of which is what I think you mean here).
Yea I get that there will always be demand for local waifus. I never said local LLMs won't be a thing. I even said it will be a huge thing. Just won't replace cloud.