tornikeo's comments

Good lord. Reading all these comments makes me feel so much better about dumping Anthropic the first time their Opus started becoming dumber (circa a month ago). It feels like most people in this thread are somehow bound to Claude, even though it is already fully enshittified.

Given that they haven’t even gone public yet, doesn’t that seem like putting the cart before the horse a bit? And if they’re already enshittifying, it won’t be long until the other players start doing so as well. Have we passed peak LLM intelligence, and are we now watching it degrade as they fail to roll these new advanced models out to their increasing user base? Are the finances not adding up?

Lots of questions.


It's quite possible there's some tacit collusion going on - it benefits both OAI and Anthropic to make moves that serve them both if they both intend to go public.

Oof. I know of a startup that recently Show HN'd here, agentmail.to, that is NOT having a good time right now. I don't know what all these new startups having moats thinner than Durex are thinking -- like, what's the plan if someone does what you do, faster and cheaper?

I'm building something similar (Dead Simple Email - same category, different pricing structure). The moat criticism is fair and worth being honest about. The defensible part isn't the feature set, it's infrastructure and price. We run our own mail servers rather than reselling SES, which gives us direct control over deliverability and costs. That's what lets us charge $29/mo for 100 inboxes where AgentMail is at $200. Whether that's a real moat or just a head start is a legitimate question. Email deliverability is genuinely hard to get right at scale, but I can't say with confidence they won't eventually just absorb this. Building fast and staying cheap is the only real answer I have to that.

“Each agent gets its own identity from a single domain.“ That, too, on the edge, along with the futuristic CF Dev primitives.

I had the same thought when I read this part. The $6M investment in Agent Mail is in serious trouble right now.


> new startups having moats thinner than Durex are thinking

Haha, great visual. Really illustrative of what these AI startups and bootstrapped indie developers are dealing with (and, if I had to guess, why most of them don't go anywhere).


> We raised $6M in Seed Funding

Well, that part was impressive. It looks like they focused on receiving emails, which is probably even worse, as I expect OpenAI/Anthropic to add such an ability directly to agents if it really is useful.


> It looks like they focused on receiving emails

That's wild. $6M for an MCP server for SMTP?


Classic "is this a feature or a product?" problem. You're going to have a bad time if you spend all your effort on a feature with nothing to set it apart.

Write an angry blog post about how big business is using its power to kill their _totally_ unique original idea that nobody could possibly copy in an hour?

The plan is to have exited by then. These people are mostly just grifters.

Forgive my senses, but this writing feels like a low-effort Claude response. What's the point of adding responses like this to a Show HN post? I don't think you're fooling anyone.

They're trying to build up new accounts with karma to astroturf products/services.

I swore to not be burned by Google ever again after TensorFlow. This looks cool, and I will give this to my Codex to chew on and explain if it fits (or could fit) what I am building right now -- msx.dev -- and then move on. I don't trust Google with maintaining the tools I rely on.

nice plug

I'm VERY curious about your case. What kind of switching costs do you guys have? I'm working at a very young startup that is still not locked into any AI provider's harness -- what causes switching costs? Just the subscription leftovers, or something else?

subscription leftovers are noise. the real switching cost is the harness glue.

prompts. tool calling quirks. evals. auth. retries. all the weird failure modes your team already paid to learn.
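concretely, it's the shim layer that accretes around one vendor. a toy Python sketch (ProviderAdapter and everything in it is hypothetical, not any real SDK -- just the shape of the glue):

```python
import time

class ProviderAdapter:
    """Hypothetical shim: the 'harness glue' that accretes around one vendor."""
    def __init__(self, call_fn, max_retries=3, base_delay=0.01):
        self.call_fn = call_fn          # vendor-specific completion call
        self.max_retries = max_retries  # retry policy your team tuned the hard way
        self.base_delay = base_delay    # backoff constants learned by getting paged

    def complete(self, prompt):
        last_exc = None
        for attempt in range(self.max_retries):
            try:
                return self.call_fn(prompt)
            except TimeoutError as exc:    # each provider fails in its own ways
                last_exc = exc
                time.sleep(self.base_delay * 2 ** attempt)  # exponential backoff
        raise RuntimeError("provider unavailable") from last_exc

# usage: a flaky fake provider that times out once, then answers
calls = {"n": 0}
def flaky(prompt):
    calls["n"] += 1
    if calls["n"] < 2:
        raise TimeoutError
    return f"echo: {prompt}"

adapter = ProviderAdapter(flaky)
assert adapter.complete("hi") == "echo: hi"
assert calls["n"] == 2  # one failure absorbed by the retry loop
```

none of this is hard to write. what's expensive is re-learning which failure modes each vendor has, which is why it pins you down.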


My fork of claude code's leaked source was DMCA'd. I republished it on codeberg. There it will stay for future reference.

Edit: DMCA happened on github. Moved to codeberg since it's not beholden to US DMCA law.


What's the legal theory here? Is the argument that leaked source code loses copyright protection, or simply that Codeberg is outside US jurisdiction and therefore harder to enforce against?


They will file an appeal within 7 days and then it needs to be decided by a court. My guess is that since AI-generated stuff isn’t copyrightable, they will lose.


Are there any DMCA-resistant code hosting services I can use to host the leaked claude code source?

My fork of claude code's leaked source was DMCA'd on github at https://github.com/tornikeo/claude-code

This PR I opened contains the original code https://github.com/anthropics/claude-code/pull/41611 but I don't know how long that will last either.


Anna's Archive, no?


Sigh. Don't make me tap the sign [1]

[1] http://www.incompleteideas.net/IncIdeas/BitterLesson.html


Doesn't seem relevant here. TurboQuant isn't a domain-specific technique like the BL is talking about, it's a general optimisation for transformers that helps leverage computation more effectively.


On paper. There's huge financial incentive to quantize the crap out of a good model to save cash after you've hooked in subscriptions.


And there’s an incentive to publish evidence of this to discourage it, do you have any?


Models aren't just the big bags of floats you imagine them to be. Those bags are there, but there's a whole layer of runtimes, caches, timers, load balancers, classifiers/sanitizers, etc. around them, all of which have tunable parameters that affect the user-perceptible output.


There really always is a man behind the curtain eh?



It's still engineering. Even magic alien tech from outer space would end up with an interface layer to manage it :).

ETA: reminds me of biology, too. In life, it turns out the simpler some functional component looks, the more stupidly overcomplicated it is when you look at it under a microscope.


There's this[1]. Model providers have a strong incentive to switch (a part of) their inference fleet to quantized models during peak loads. From a systems perspective, it's just another lever. Better to have slightly nerfed models than complete downtime.

[1]: https://marginlab.ai/trackers/claude-code/
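For intuition on what that lever trades away: here's per-tensor symmetric int8 quantization in toy form. (A sketch only -- real inference stacks quantize per-channel with far more care, and whether any provider actually flips this switch at peak load is exactly the open question.)

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8: map floats onto the integer range [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # one scale for the tensor
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; precision lost is at most half a step."""
    return [v * scale for v in q]

weights = [0.8113, -0.254, 0.0021, -1.02, 0.33]   # stand-in model weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
errors = [abs(a - b) for a, b in zip(weights, restored)]

# every weight comes back within half a quantization step (scale / 2),
# but you store 1 byte per weight instead of 2-4 -- that's the cash saved
assert max(errors) <= scale / 2 + 1e-12
```

The per-weight error looks tiny, but it compounds across billions of weights and dozens of layers, which is why quantized variants can feel subtly "nerfed" rather than obviously broken.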


So - as the charts say - no statistical difference?

Isn't this link an argument against the point you are making?


The chart doesn't cover the 4.6 release, which was in the late-December/early-January time frame. So it's hard to tell from existing data.


That isn't true. The whole point is to pick up statistically significant variations quickly, and with the volume of tests they are doing there is plenty of data.

If you turn on the 95% CI bands you can see there is plenty of statistical significance.
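For a sense of why volume matters here: at thousands of test runs, even a few-point drop in pass rate separates the bands. A toy check using the normal-approximation (Wald) interval -- an assumption on my part; I don't know which interval marginlab actually computes:

```python
import math

def wald_ci(successes, trials, z=1.96):
    """95% normal-approximation CI for a binomial pass rate (Wald interval)."""
    p = successes / trials
    half = z * math.sqrt(p * (1 - p) / trials)  # standard error times z
    return max(0.0, p - half), min(1.0, p + half)

# 1000 daily runs: a drop from 92% to 88% pass rate is clearly visible
lo_a, hi_a = wald_ci(920, 1000)
lo_b, hi_b = wald_ci(880, 1000)
assert lo_a > hi_b  # bands don't overlap -> statistically distinguishable
```

With only a few dozen runs the same 4-point drop would disappear into the bands, which is why the test volume is what makes the tracker meaningful.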


Unless you and I are looking at different web pages… it only goes back to February, not December or January.


Anybody with more than five years in the tech industry has seen this done in all domains time and again. What evidence do you have that AI is different? That's the extraordinary claim in this case...


Or just change the reasoning levels.


Beliefs are not rooted in facts. Beliefs are a part of you, and people aren't all that happy to say "this LLM is better than me"


I'm very happy to say calculators are far better than me at calculations (to a given precision). I'm happy to admit computers are so much better than me in so many aspects. And I have no problem saying LLMs are very helpful tools able to generate output so much better than mine in almost every field of knowledge.

Yet, whenever I ask one to do something novel or creative, it falls very short. But humans are ingenious beasts and I'm sure sooner or later they will design an architecture able to be creative - I just doubt it will be Transformer-based, given the results so far.


But the question isn't whether you can get LLMs to do something novel, it's whether anyone can get them to do something novel. Apparently someone can, and the fact that you can't doesn't mean LLMs aren't good for that.


Novel is a tricky word. In this case, the LLM produced a Python program that was similar to other programs in its corpus, and this Python program generated examples of hypergraphs that hadn't been seen before.

That's a new result, but I don't know about novel. The technique was the same as earlier work in this vein. And it seems like not much computational power was needed at all. (The article mentions that an undergrad left a laptop running overnight to produce one of the previous results; that's absolute peanuts compared to most computational research.)


I have never seen a human produce a Python program that wasn't similar to other programs they'd seen.


So? I certainly have.


Truly novel? All art is derivative.


If all art is derivative then the earlier statement is a tautology.

People still call things other people do novel. There's clear social proof that humans do things that other humans consider novel. Otherwise the word would probably not exist.

Just today I wrote a Python program that did not resemble anything I'd written before, nor had I seen anything similar. I had to reason it out myself. That passes the test that the original comment set.


Your threshold for "resemble" is obviously quite high, which is fair, but assuming that you're an encultured programmer, your Python code resembles other people's Python code. It might be doing something novel, but that thing it's doing is interacting with, in response to, or otherwise relative to existing concepts you learned or saw elsewhere. All art is derivative; we can do things other people haven't done before, but all of it derives from the works of others in some way.

Anyway, I've coded all kinds of wacky shit with claude that I guarantee nobody has implemented before, if only because they're stupid and tedious ideas. They can't all be winners, but they were novel, and yet claude code implemented them as confidently as if they were yet another note taking app. They have no problem handling novel ideas, and although the novel ideas in this case were my own, its easy to see how finding new ideas could be automated by exploring the combinatorial space of existing ideas.


I'm not talking about wacky. My barrier for novel is 1) new capabilities 2) useful, and 3) end-to-end tested.

For example, what I referred to having written is a dynamic storage solution for n-dimensional grids, one that can grow arbitrarily in any direction and is locally dense (organized into spatially indexed blocks of contiguous data).

I had never considered this problem before, and I certainly had never seen a solution before (even though there may well be one).

I worked it out on paper, considering how integer lattices can be partitioned and indexed, and then I transformed that into a design which I then implemented. Working purely from the design, not considering existing solutions.
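Roughly the shape of it, if it helps (a toy Python sketch of the idea, nowhere near my real implementation):

```python
class BlockGrid:
    """Toy sketch: an unbounded n-D integer grid stored as dense blocks
    keyed by block coordinates, so it grows in any direction on demand."""
    def __init__(self, ndim=2, block=4, fill=0):
        self.ndim, self.block, self.fill = ndim, block, fill
        self.blocks = {}  # block-coordinate tuple -> flat dense list

    def _locate(self, coords):
        # floor division partitions the integer lattice into uniform blocks,
        # so negative coordinates work without any re-centering
        bkey = tuple(c // self.block for c in coords)
        flat = 0
        for c in coords:                      # row-major offset within the block
            flat = flat * self.block + (c % self.block)
        return bkey, flat

    def set(self, coords, value):
        bkey, flat = self._locate(coords)
        if bkey not in self.blocks:           # allocate a dense block lazily
            self.blocks[bkey] = [self.fill] * self.block ** self.ndim
        self.blocks[bkey][flat] = value

    def get(self, coords):
        bkey, flat = self._locate(coords)
        blk = self.blocks.get(bkey)
        return self.fill if blk is None else blk[flat]

g = BlockGrid(ndim=2, block=4)
g.set((-5, 10), "a")                # grows in any direction, including negative
assert g.get((-5, 10)) == "a"
assert g.get((0, 0)) == 0           # untouched cells read as the fill value
assert len(g.blocks) == 1           # only one dense block allocated so far
```

The point being: locally dense (contiguous lists per block), globally sparse (a dict of blocks), and unbounded in every axis.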


When it comes to LLMs doing novel things, is it just the infinite monkey theorem[0] playing out at an accelerated rate, helped along by the key presses not being truly random?

Surely if we tell the LLM to do enough stuff, something will look novel, but how much confirmation bias is at play? Tens of millions of people are using AI and the biggest complaint is hallucinations. From the LLM's perspective, is there any difference between a novel solution and a hallucination, other than the dumb luck of the hallucination being right?

[0] https://en.wikipedia.org/wiki/Infinite_monkey_theorem


This argument doesn't go the way you want it to go. Billions of people exist, but maybe a few tens of thousands produce novel knowledge. That's a much worse rate than LLMs.


I’m not sure how we equate the number of humans to AI to determine a success rate.

We also can’t ignore that it was humans who thought up this problem to give to the AI. Thinking has two parts: asking and answering questions. The AI needed the human to formulate and ask the question to start. AI isn’t just dropping random discoveries on us that we haven’t even thought of, at least not that I’ve seen.


To have a proper discussion we would have to define the word "novel", and that's a challenge in itself. In any case, millions of people tried to ask LLMs to do something creative and the results were bland. Hence my conclusion that LLMs aren't good for that. But I'm also open to them being an element of a longer chain that could demonstrate some creativity - we'll see.


It's not possible to know something without believing it to be true. https://en.wikipedia.org/wiki/Belief#/media/File:Classical_d...


This is objectively wrong. If that were the case, every scientist performing a test would always have had their expectations and beliefs proven true. And if you try to disprove something because you believe it to be wrong, you would never be proven wrong.


re-read your post - it's just a bunch of nonsense, no actual reasoning in there

