
This is a hilarious comparison given Amsterdam's own history with regard to immigration. Not even just historically, but contemporarily too: Just Eat, probably the largest employer of bargain-bucket labour across Europe today, is headquartered in Amsterdam.

It was snippy and unclear; after the edit, it's that and weak. I'd move that you just delete. Not sure literal slaves are comparable to a company that pays bargain-basement salaries.

It is hilarious, because it is blinded by our own self-imposed optics. It has been our policy to import droves of immigrant workers who have little hope but to take up gig economy jobs, often illegally, and remain fixed at the same (or worse) level of economic status as the day they arrived in the country. Yes, in Dubai they simply confiscate passports. At least they're honest about it

The kafala system confiscates your passport. You can't quit, can't switch employers, can't leave the country. People die in labor camps building these vanity projects. The UN classifies it as modern slavery.

A Just Eat rider in Amsterdam can quit tomorrow and sue their employer. Those aren't the same thing. You can criticize Europe's treatment of immigrant workers without pretending the difference is just honesty.


My answer to this is simply rolling back to the pro plan for interactive usage in the coming month, and forcefully cutting myself over to one of the alternative Chinese models to just get over the hump and normalise API pricing at a sensible rate with sensible semantics.

Dealing with Claude going into stupid mode 15 times a day, constant HTTP errors, etc. just isn't really worth it for all it does. I can't see myself justifying $200/mo. on any replacement tool either, the output just doesn't warrant it.

I think we all jumped on the AI mothership with our eyes closed and it's time to dial some nuance back into things. Most of the time I'm just using Opus as a bulk code autocomplete that really doesn't take much smarts comparatively speaking. But when I do lean on it for actual fiddly bug fixing or ideation, I'm regularly left disappointed and working by hand anyway. I'd prefer to set my expectations (and willingness to pay) a little lower just to get a consistent slightly dumb agent rather than an overpriced one that continually lets me down. I don't think that's a problem fixed by trying to swap in another heavily marketed cure-all like Gemini or Codex, it's solved by adjusting expectations.

In terms of pricing, $200 buys an absolute ton of GLM or Minimax, so much that I'd doubt my own usage is going to get anywhere close to $200 going by ccusage output. Minimax generating a single output stream at its max throughput 24/7 only comes to about $90/mo.
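The arithmetic behind that figure can be sketched as a back-of-envelope calculation. The throughput and per-token price below are illustrative assumptions, not any provider's actual rate card:

```python
# Back-of-envelope API cost estimate: one output stream generating
# continuously at a fixed throughput, billed per million output tokens.
# All numbers below are illustrative assumptions, not real pricing.

def monthly_cost(tokens_per_sec: float, usd_per_mtok: float,
                 hours_per_month: float = 720.0) -> float:
    """Cost of a single output stream running flat-out for a month."""
    tokens = tokens_per_sec * 3600 * hours_per_month
    return tokens / 1_000_000 * usd_per_mtok

# e.g. ~30 tok/s at a hypothetical $1.15/Mtok output price:
cost = monthly_cost(30, 1.15)
print(f"${cost:.0f}/mo")  # roughly $89/mo at these assumed numbers
```

Plug in your own numbers from ccusage output; the point is just that even 24/7 generation at modest throughput lands well under a $200 subscription at these kinds of rates.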


I put in probably thousands of Claude session hours a month, aggregated across work + personal.

I must be missing something or supremely lucky because I feel like I’ve never hit these “stupid” moments.

If I do, it’s probably because I forgot to switch off of haiku for some tiny side thing I was doing before going back to planning.


There are 720 hours in a month. You'd have to be running 3 sessions in parallel continuously to be doing thousands of session-hours in a month. Are individual people really doing this?!

Our developers work office hours, but would frequently have 10 plus sessions open. Massive parallelism is one of the benefits of agentic coding.

Similar usage here. But I have encountered these moments, and I chalk it up to the random nature of LLMs. Back in the Sonnet 3.5 days, it would happen every other day. I even built a 'you are absolutely right' tracker back then to measure it. With Opus 4.6, maybe once or twice a month.

Yes, subjectively there do seem to be moments where the quality of the output drops significantly - usually during US peak hours.

It's possible that it's simply paranoia, but moments where Opus starts acting like Haiku seem to correlate with periods of higher latency and HTTP errors. Don't like reporting this because it's so hand-wavy and conspiratorial, but it's difficult not to think they're internally using extraordinary measures of some sort to manage capacity.

But even when Opus is running healthy, it still doesn't address the underlying issue that these models can only do so much. I have had Opus build out a bunch of apps, but I'm still finding my time absorbed as soon as it comes to anything genuinely exceeding "CRUD level difficulty". Asking it to fix a subtle visual alignment issue, make a small change to a completely novel algorithm, or just fix a tiny bug without having to watch for "Oh, this means I should rewrite module <X>" simply isn't possible while still being able to stand over the work.

It's not to say I don't get a massive benefit from these tools, I just think it's possible to be asking too much of them, and that's maybe the real problem to solve.


Most people hate reading. Therefore they don't know how to write. Therefore they can't prompt properly. Not to mention so many "enemies of logic" cults being so strong nowadays.

I literally hit my 5 hour window limit in 1.5 hours every single day now.

2 weeks ago, I had only hit my limit a single time and that was when I had multiple agents doing codebase audits.


Are you monitoring the size of your context windows? As they grow, so does the cost of every operation performed in that state.
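The effect is easy to underestimate: if every turn resends the full accumulated history, cumulative billed input tokens grow roughly quadratically with turn count. A toy sketch, with made-up turn sizes and deliberately ignoring prompt caching and compaction:

```python
# Why long sessions burn usage faster: each new request is billed on the
# whole accumulated context, so total input tokens grow roughly
# quadratically with turn count. Numbers are illustrative only.

def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed across a session, assuming every turn
    resends the full history (no caching or compaction)."""
    total = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn   # history grows every turn
        total += context             # each request pays for all of it
    return total

print(cumulative_input_tokens(10, 2_000))  # 110,000
print(cumulative_input_tokens(50, 2_000))  # 2,550,000 (5x turns, ~23x tokens)
```

Real clients mitigate this with caching and compaction, but the shape of the curve is why one long-running session can cost far more quota than several short ones.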

Anthropic had a special extra usage promotion going on during non-peak hours that ended recently.

They didn’t do a great job of explaining it. I wonder how many people got used to the 2X limits and now think Anthropic has done something bad by going back to normal


They also reduced the peak time limits, so it's not just the promotion.

Naw, it's not that. This is business-day usage for all of it.

Irrelevant. I had at least ten times more usage than at any time

Could it also have anything to do with Anthropic being deliberately opaque about usage in general?

I've been using Codex extensively, 5.4 at "Extra High", and have yet to hit a limit, on the $20 plan.

It very much depends on the workloads. If you inspect existing code (that somebody else wrote over the years) usage runs out quickly. If you are building your own greenfield stuff the sky is the limit.

> If you inspect existing code (that somebody else wrote over the years) usage runs out quickly.

That's EXACTLY and ALL I've been doing!

Using Codex and Claude both side by side to view my Godot components framework open source project (link in profile)

Claude has been... ugh... bad, to put it mildly, on the same content and the same prompts.


They've been running a "double credits" promo for several weeks, which expired on the first of this month.

I think my next steps are: 1) try out openai $20/month. I've heard they're much more generous. 2) try out open router free models. I don't need geniuses, so long as I can see the thinking (something that Claude code obfuscates by default) I should be good. I've heard good things about the CLIO harness and want to try openrouter+clio

I'm taking a bet on local models to do the non-genius work. Gemma 4 (released yesterday) has been designed to run on laptops / edge devices... and so far is running pretty well for me.

How’s Gemma 4 been?

Edge models are good for their purpose, but putting them in an agentic flow with current ollama quants on a Mac Mini, I see a high tool-use error rate and output hallucination.

For JSON to text formatting it works well on a one-round basis. So I think you should realistically have an evaluation ready to go so you can use it on these models. I currently judge them myself but people often use a smart LLM as judge.

Today, writing an eval harness with Claude is a 5-minute job. Do it yourself so you can explore as the quants on Gemma get better.
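A minimal harness along those lines can be sketched as follows. `generate` and `judge` are placeholder callables for whatever model backend and judging method (yourself, a rubric, or a stronger LLM as judge) you wire in; nothing here is a real API:

```python
# Minimal eval-harness skeleton for small local models: run each prompt
# through the model under test, then score each output with a judge
# function. Both backends are injected, so the same harness works for
# human judging, rule-based checks, or LLM-as-judge.
from typing import Callable

def run_eval(prompts: list[str],
             generate: Callable[[str], str],
             judge: Callable[[str, str], bool]) -> float:
    """Return the pass rate of `generate` over `prompts` as judged."""
    passed = sum(judge(p, generate(p)) for p in prompts)
    return passed / len(prompts)

# Toy demonstration with stub backends (hypothetical, for illustration):
prompts = ["2+2?", "capital of France?"]
stub_model = lambda p: {"2+2?": "4", "capital of France?": "Paris"}[p]
stub_judge = lambda p, out: out in ("4", "Paris")
print(run_eval(prompts, stub_model, stub_judge))  # 1.0
```

Swap the stubs for calls to your local model and your judge of choice, and rerun the same prompt set as new quants land to track whether they actually improve.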


Word on the street is that Opus is much much larger of a model than GPT-5.4 and that’s why the rate limits on Codex are so much more generous. But I guess you could also just switch to Sonnet or Haiku in Claude Code?

Openrouter free models have 50 requests per day limit + data collection. As per their doc.

You can charge $10 on the account and get unlimited requests. I abused this last week with the Nemotron Super to test out some stuff and made probably over 10,000 requests over a couple of days, and didn't get blocked or anything; expect 5xx errors and slowdowns though.

I tried out GPT 5.4 xhigh and it did meaningfully worse with the same prompt as Opus 4.6. Like, obvious mistakes.

FWIW I run this eval every week on a set of known prompts, and I believe the in-group differences are bigger than the out-group ones.

That is I get more variance between opus 4.6 and itself than I do between the sota models.

I don’t have the budget for statistical significance, but I’m convinced people claiming broad differences are just vibing, or there are times when agent features make a big difference.


> I think we all jumped on the AI mothership with our eyes closed

Oh no, there's plenty of us willing to say we told you so.

What's more interesting to me is what it's going to look like if big companies start removing "AI usage" from their performance metrics and cease compelling us to use it. More than anything else, that's been the dumbest thing to happen with this whole craze.


Please don't use grossly offensive terms in this forum. That sort of language is not welcome here.

Oops, fixed

Since when are you a moderator?

Since when are you a meta-moderator? ;)

Every service is being sold at a deep discount chasing market share, but it's not lasting forever.

Speaking only personally of course, I'm completely over the chat idiom in almost every way. Where is all this future demand coming from? By the time Android lands a God mode ultimate voice assistant it's pretty much guaranteed I will be well beyond the point where I'd want to use it. The whole thing is starting to remind me of 3G video calling where the networks thought it'd change everything, and by the end of it with all the infrastructure in place, the average user has made something like 0.001 3G-native video calls over the lifetime of their usage.

Would really love some path forward where the AI parts only poke out as single fields in traditional user interfaces and we can forget this whole episode


I agree with you and the GP post, even though I am an LLM enthusiast.

My primary interest is using small edge models to perform specific engineering tasks. In this pursuit I do like to use gemini-cli or Antigravity with Claude a few times a week as coding assistants, but I am using relatively few tokens to do this.

I also waste a lot of time, but this is fun time: experimenting with open source coding agents with local models just to see what kinds of results I can get. This is mostly a waste of time, but I enjoy it.

My other favorite use pattern: once or twice a week I like to use the iOS Gemini app in voice mode, and once a month also use video input. I really like this, but it is not life changing.

Externalities matter: I never use frontier LLM-based AI without thinking of energy, data center, and environmental costs.


I don't understand this perspective. I can't imagine a point where I won't want to ask "what's the weather like?", "please turn off the lights", or "what is the airspeed of an unladen swallow?" Likewise with chatting through directing it to build something or solve a problem; voice and typing will each have their place.

And video calling did take off: plenty of people use FaceTime, and almost everybody working in an office uses some form of video calls. Criticizing the early attempts at video calling because they hadn't taken off yet misses the point (I remember them being advertised on "video phones" with 56k modems); of course someone was going to have the idea and implement it before it was quite practical.


> I can't imaging a point where I won't want to ask "what's the weather like?" "please turn off the lights"

To help with understanding that perspective: I cannot imagine a scenario where I would ask a device connected to the internet to turn off the lights. I literally never wanted this. A physical switch is 100% non-negotiable for me. I feel the same way about non-mechanical car doors.

Perhaps due to that outlook I was always puzzled about the entire idea of an "assistant". It's interesting for me to see, that there are people out there who actually want that "assistant".


The switch is a necessity.

Ever end up cooking or something when the phone/doorbell rings and you want to pause the music? Have your hands full and wanted to open a door? Hear the weather and then the news as you brew coffee or put your shoes on (without interaction with a bright screen)?

You should save some money and keep some privacy doing it your way :)


Have you never... asked a person a question? to do something for you? to pass the salt? what time it was?

Maybe you're a little strange but it cannot be that much of a stretch for you to consider using speech to ask for things.

Not wanting to hide things behind Internet connected computers is fine, being unable to imagine wanting to use your voice to ask for things is a little silly.


Not OP, but for me it comes down to "asking a person" ≠ "asking a device". Besides, just to be pedantic, one of the things you've described is not something an LLM would be able to do, and for the second one... that's what watches and clocks are for. You don't need a datacenter running somewhere in the world or a beefy PC to take a glance at the time. If you think you do, I personally wouldn't call others "a little strange" if I were you.

You don't watch Iron Man and want a JARVIS? Current systems are pretty far away from that, but that's the overall draw.

I don't watch superhero stuff. But even with a more classical example of Space Odyssey 2001 - a talking computer has never been something I found even remotely interesting. It took me months to give LLMs a serious try due to this.

I guess everybody's different. I personally like the idea of being able to, say, ask what events I have on my calendar for the day while I'm getting dressed, and be able to get a summary and then engage in followup conversation about it. Or have a little reminder that says, it's time to leave in a few minutes, would you like to turn on your car's climate control? It's not to replace my normal computer usage with a voice interface, but to add new capabilities.

Are you using the Chinese models through their individual services or via an intermediary layer?

I am not the person you are responding to but I have tried both: using OpenRouter and also giving a Chinese company $5 on my credit card to buy tokens. If I know what model I want to experiment with, I much prefer to just pay $5 and have plenty of tokens to experiment. On a yearly basis, this is a very tiny expense for the benefits of getting plenty of tokens to experiment with.

This is what I did, downgraded to pro and pay for opencode zen for the open models. I like the combo of the two

Oh, https://opencode.ai/zen looks good. I like pay as you go plans since I usually don’t use many tokens compared to vibe coders.

I regret paying Google for a one-year AI subscription last spring (although it was a deep discount over the regular $20/month cost) because it has kept me from experimenting with many vendors (but it was a fantastic deal financially).

I just put a reminder on my calendar to try OpenCode zen when my subscription ends.


> I think we all jumped on the AI mothership with our eyes closed and it's time to dial some nuance back into things.

I’m kind of confused by these takes from HN readers. I could see LinkedIn bros getting reality checked when they finally discover that LLMs aren’t magic, but I’m confused about how a developer could go all-in on AI and not immediately realize the limitations of the output.


It has indeed been baffling. As I dig deeper into what developers are doing with AI, it's basically like what I did customizing and tweaking emacs when I was younger (and fine, I'll admit I still do it sometimes). They are having so much fun playing with these new tools that they aren't really noticing how little the new tools are actually helping them.

> immediately realize the limitations of the output.

I'm "all-in" on AI code generation. I very much realise their limitations; it's like any tool really. I do think they're magic, you just need to learn how to wield the power.


250 ms f/4 ISO 512000 in case anyone was wondering. I wonder if they applied any denoise, it looks great for such high ISO

Saw a comment here yesterday referencing the Attention Is All You Need paper title in a tongue in cheek way. Kinda fun to imagine the friend/romance angle is just a bunch of socially awkward folk at OpenAI misinterpreting the original paper

Another instinctual reaction here. This specific formulation pops out of AI all the time, there might as well have been an emdash in the title

That etched 3 colour look to this very day remains the peak of modern aesthetic for me. Thought it was so cool and sophisticated when I was 12

Agreed it is amazing

I've read so much trump spam recently that on reading this my first thought was that you misspelled winning hehe

Planet announced last week there will be a 14 day delay on all commercial satellite imagery from the middle east. It shocks me how transparent we are about information war and voluntarily lying to ourselves at particular moments


CPU compute is infinity times less expensive and much easier to work with in general


Less expensive how? The reason GPUs are used is because they are more efficient. You CAN run matmul on CPUs for sure, but it's going to be much slower and take a ton more electricity. So to claim it's "less expensive" is weird.


In situations where you have spare CPU power but not spare GPU power, because your GPU(s) and VRAM are allocated to other tasks, you might prefer to use what you have rather than pay for an upgrade (even if that means the task will run more slowly).

If you are wanting to run this on a server to pipe the generated speech to a remote user (live, or generating it to send at some other appropriate moment) and your server resources don't have GPUs, then you either have to change your infrastructure, use CPU, or not bother.

Renting GPU access on cloud systems can be more expensive than CPU, especially if you only need GPU processing for specific occasional tasks. Spinning up a VM to serve a request and then pulling it down is rarely as quick as cloud providers like to suggest in advertising, so you end up keeping things alive longer than absolutely needed, meaning the spot-pricing rates quoted are lower than what you end up paying.


This is far too simplistic, you can't discuss perf per watt unless you're talking about a job running at any decent level of utilisation. Numbers like that only matter for larger scale high utilisation services, meanwhile Intel boxes mastered the art of power efficient idle modes decades ago while almost any contemporary GPU still isn't even remotely close, and you can pick up 32 core boxes like that for pennies on the dollar.

Even if utilisation weren't a metric, "efficient" can be interpreted in so many ways as to be pointless to try and apply in the general case. I consider any model I can foist into a Lambda function "efficient" because of secondary concerns you simply cannot meaningfully address with GPU hardware at present (elasticity and manageability for example). That it burns more energy per unit output is almost meaningless to consider for any kind of workload where Lambda would be applicable.

It's the same for any edge-deployed software where "does it run on CPU?" translates to "does the general purpose user have a snowball's chance in hell of running it?", having to depend on 4GB of CUDA libraries to run a utility fundamentally changes the nature and applicability of any piece of software

A few years ago we had smaller cuts of Whisper running at something like 0.5x realtime on CPU, and people struggled along anyway. Now we have Nvidia's speech model family comfortably exceeding 2x real time on older processors with a far improved word error rate. Which would you prefer to deploy to an edge device? Which improves the total number of addressable users? Turns out we never needed GPUs for this problem in the first place; the model architecture mattered all along, as did the question, "does it run on CPU?"

It's not even clear cut when discussing raw achievable performance. With a CPU-friendly speech model living in a Lambda, no GPU configuration will come close to the achievable peak throughput for the same level of investment. Got a year-long audio recording to process once a year? Slice it up and Lambda will happily chew through it at 500 or 1000x real time
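The slicing step can be sketched as plain boundary arithmetic. The chunk length, the overlap, and the idea of one chunk per Lambda invocation are all assumptions for illustration; the transcription call itself is whatever CPU-friendly model you deploy and is omitted here:

```python
# Sketch of the fan-out idea: slice one long recording into overlapping
# chunks so each can be transcribed independently (e.g. one chunk per
# Lambda invocation). Only the boundary arithmetic is shown.

def chunk_bounds(total_sec: float, chunk_sec: float = 300.0,
                 overlap_sec: float = 5.0) -> list[tuple[float, float]]:
    """Split [0, total_sec) into chunks with a small overlap, so words
    cut at a boundary are recoverable when merging the transcripts."""
    bounds, start = [], 0.0
    while start < total_sec:
        end = min(start + chunk_sec, total_sec)
        bounds.append((start, end))
        start = end - overlap_sec if end < total_sec else end
    return bounds

# One hour of audio with five-minute chunks and 5 s of overlap:
print(len(chunk_bounds(3600)))  # 13 chunks
```

Each (start, end) pair becomes one independent invocation, which is where the 500-1000x real-time aggregate throughput comes from: it is pure parallelism, not per-core speed.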


GPUs are a near monopoly. There are at least a handful of big players in the CPU space. Competition alone makes the latter space a lot cheaper.

Also, for inference (and not training) there are other ways to efficiently do matmuls besides the GPU. You might want to look up Apple's undocumented AMX CPU ISA, and also this thing that vendors call the "Neural Engine" in their marketing (capabilities and the term's specific meaning varies broadly from vendor to vendor).

For small 1-3B parameter transformers like TADA, both these options are much more energy efficient, compared to GPU inference.


Yep same, I'd sooner starve than cut my Anthropic sub


If tomorrow Claude pricing changes and they start charging real API costs like 2000+ USD, and there is another service, "NotReallyClaude", that is a bit less good but 200 USD, then what would you do?


Man, they really got good at hitting that dopamine button, huh?


Don't forget Wine ships a faithful notepad.exe reimplementation. It should run just fine on Windows

edit: just checked the version that ships with Steam on Linux, yep, works great in a VM

