I would recommend getting an API account on Fireworks; it's ZDR and typically the fastest provider.
Otherwise, check the list of providers on OpenRouter to see the pricing and quantisation, then sign up directly rather than going through a router. Make sure to compare caching prices, not just the input/output API prices.
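To make the caching point concrete, here's a rough sketch of how much cached input reads change the effective cost of an agentic session. All prices below are made-up placeholders, not any provider's actual rates; look the real numbers up before signing up anywhere.

```python
def cost_per_request(input_tokens, cached_fraction, output_tokens,
                     price_in, price_cached_in, price_out):
    """Cost in dollars for one request; prices are $ per million tokens."""
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    return (uncached * price_in
            + cached * price_cached_in
            + output_tokens * price_out) / 1_000_000

# Agentic sessions resend a long prefix every turn, so most input tokens can
# be cache hits. With a 10x discount on cached reads (hypothetical numbers),
# the effective cost drops a lot:
full = cost_per_request(100_000, 0.0, 2_000, 0.60, 0.06, 2.50)
cached = cost_per_request(100_000, 0.9, 2_000, 0.60, 0.06, 2.50)
print(f"no cache: ${full:.4f}, 90% cache hits: ${cached:.4f}")
```

The point is that for coding agents the caching rate dominates the bill, which is why the headline input/output prices alone are misleading.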
GLM 5 is a frontier model; Kimi 2.5 is similar, with vision support; Minimax M2.7 is a very capable model focused on tool calling.
If you need server-side web search, you could use the Z AI API directly (again ZDR), or Friendli AI, or just install a search MCP.
For the harness, opencode is the usual choice; it has subagents and parallel tool calling. Or just use Claude Code by pointing it at the Anthropic-compatible APIs of various providers like Fireworks.
If you still want to use APIs, I like OpenRouter because I can use the same credits across various models, so I'm not stuck with a single family of models. (You can even use the proprietary models on OpenRouter, but they're eye-wateringly expensive.)
Otherwise you should look into running e.g. Qwen3.5-35B-A3B or Qwen3.5-27B on your own computer. They're not Opus-level, but from what I've heard they're capable for smaller tasks. llama.cpp works well for inference; it runs on both CPU and GPU, and can even split a model across both if you want.
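As a sketch of what local inference looks like in practice: llama.cpp ships a `llama-server` binary that exposes an OpenAI-compatible endpoint, which you can hit with nothing but the standard library. The model filename and the default port 8080 below are assumptions you'd adapt.

```python
import json
from urllib import request

# Sketch: query a local llama.cpp server through its OpenAI-compatible
# chat endpoint. Start the server first with something like
#   llama-server -m Qwen3.5-27B.gguf -ngl 99
# (-ngl controls how many layers are offloaded to the GPU; lower it to
# split the work with the CPU). Filename and port are assumptions.

def build_chat_request(prompt, url="http://localhost:8080/v1/chat/completions"):
    """Build a chat-completion HTTP request for a local llama-server."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# With the server running, you'd send it like this:
# with request.urlopen(build_chat_request("Summarize this diff")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint speaks the OpenAI wire format, most harnesses can be pointed at it by just overriding the base URL.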
Not counting training models as part of your gross margin is just creative accounting. It's an inherent part of being able to provide the service for OpenAI, Anthropic etc.
Even so, their subscriptions are significantly cheaper than the equivalent token pricing via the API. So at some point they will need to get rid of subscriptions or increase subscription prices dramatically... And that's assuming their current token pricing is actually profitable, which it probably isn't.
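A back-of-the-envelope sketch of that gap. Every number here is a hypothetical placeholder, not any vendor's real price or usage data; the point is the shape of the math, not the figures.

```python
def monthly_api_cost(sessions, input_tok, output_tok, price_in, price_out):
    """API cost in dollars per month; prices are $ per million tokens."""
    return sessions * (input_tok * price_in + output_tok * price_out) / 1_000_000

subscription = 100.0   # flat monthly fee (hypothetical)
api = monthly_api_cost(sessions=300, input_tok=150_000, output_tok=5_000,
                       price_in=3.0, price_out=15.0)
# A heavy user burns far more in tokens than the flat fee brings in, so
# either the token prices carry a huge margin or the subscription is
# being sold below cost.
print(f"subscription: ${subscription:.0f}, same usage via API: ${api:.0f}")
```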
Lastly, I would not trust one word that comes out of an executive of an AI company (or any other large company, for that matter).
I tried figuring out the reference with Gemini, and it said this:
The immediate reply to that comment is: "On the internet, no one knows you're an editor." This is a direct play on the famous 1993 New Yorker cartoon: "On the Internet, nobody knows you're a dog." By setting the anecdote in 1987 (a few years before the World Wide Web was publicly available), the commenter is implying that back in the analog days, if a dog wanted to be a writer or an editor, they couldn't hide behind a screen—they had to sit in a smoky London pub and do business face-to-face.
Which makes a lot of sense, actually. I would imagine that's how the person replying to you interpreted your comment.
No I don't own them.
Some people here: https://www.mydealz.de/share-deal-from-app/2753917 mention as a downside that you have to connect to the manufacturer's cloud to get the live production data from the inverter.
Otherwise it is truly "plug everything together and it works".
Whatever I used Sonnet 4.6 for, including Claude Code and Claude Chat, it made so many mistakes and such awkward assumptions that I can't fathom what it's supposed to be good at. The mistakes were blatant. Plan mode, several passes, a couple grand in API costs... just disappointing at every task in every session over the past few weeks. Opus 4.6 has been better: still quite a few unexpected, silly mistakes and a few subtle but critical ones, but it produced workable increments and code reviews. Still vastly subpar to GPT-5.x in chat mode (with and without identical customization).
They don't change the prices; they just modify the amount of compute allocated: slower speeds, fewer tokens. They can tune everything in the background to optimize costs and returns, and the user never realizes anything has changed.
Sometimes they'll announce the changes, and they'll even try to spin it as improving services or increasing value.
Local AI capabilities are improving at a rapid pace. At some point soon we'll have an RWKV or a 4B LLM that performs at GPT-5 level, with reasoning and all the bells and whistles, and hopefully that'll shake out most of the deceptive and shady tactics the big platforms are using.
> They don't change the prices; they just modify the amount of compute allocated: slower speeds, fewer tokens. They can tune everything in the background to optimize costs and returns, and the user never realizes anything has changed.
I can't imagine that this is the way it will go... Tokens haven't been getting cheaper for flagship models, have they? You already see something closer to their real cost if you compare e.g. the Claude subscriptions to their actual token pricing.
> Local AI capabilities are improving at a rapid pace. At some point soon we'll have an RWKV or a 4B LLM that performs at GPT-5 level, with reasoning and all the bells and whistles, and hopefully that'll shake out most of the deceptive and shady tactics the big platforms are using.
Maybe, but LLMs are a scale game, and data centers will always be more capable than your local device. So you will always be getting a worse version locally. Or do you think LLMs in data centers will stop getting better and local LLMs will somehow catch up?
Well, I think in the context of the parent comment, separating out housing would risk overstating its effect on purchasing power, because increases in housing costs are already captured by inflation (since we're talking about real median income, which is already inflation-adjusted).
I agree that housing affordability is a major problem, and that looking at it independently could help you quantify whether housing specifically has become more unaffordable. But that's a different question than whether the median person's overall purchasing power has declined (considering housing, healthcare, food, etc.).
Yes, housing prices are captured by inflation, but since they are so different from region to region and city to city, just looking at a country-wide median income does not say much.
Housing prices and healthcare are captured by the "real" part of "real median income". Inequality is not, but as long as everybody is getting richer, I'm less concerned about inequality.
Inequality is still important even if everybody is getting richer (I disagree that everybody is getting richer, but that's a different discussion), because wealth and power concentrated in the hands of a few means that a few have outsized influence on politics and society, which very much erodes the foundations of how a democracy is supposed to work.
Furthermore, there are many studies showing that more wealth inequality results in more consumption in a society, which is not good for many reasons.