how would you empirically disprove that it doesn't have understanding? i can pro...

bigstrat2003 · 2026-03-10T00:05:34 1773101134

> how would you empirically disprove that it doesn't have understanding?

The complete failure of Claude to play Pokemon, something a small child can do with zero prior instruction. The "how many r's are in strawberry" question. The "should I drive or walk to the car wash" question. The fact that right now, today all models are very frequently turning out code that uses APIs that don't exist, syntax that doesn't exist, or basic logic failures.

The cold hard reality is that LLMs have been constantly showing us they don't understand a thing since... forever. Anyone who thinks they do have understanding hasn't been paying attention.

> i can prove that it does have understanding because it behaves exactly like a human with understanding does.

First, no it doesn't. See my previous examples that wouldn't have posed a challenge for any human with a pulse (or a pulse and basic programming knowledge, in the case of the programming examples). But even if it were true, it would prove nothing. There's a reason that in math class, teachers make kids show their work. It's actually fairly common to generate a correct result by incorrect means.

simianwords · 2026-03-10T06:36:55 1773124615

> The complete failure of Claude to play Pokemon, something a small child can do with zero prior instruction

cherry picking, because gemini and gpt have beaten it. claude doesn't have a good vision setup

> The "how many r's are in strawberry" question

models have been able to do this since 2024

> The "should I drive or walk to the car wash" question

the SOTA models get it right with reasoning

> fact that right now, today all models are very frequently turning out code that uses APIs that don't exist, syntax that doesn't exist, or basic logic failures.

not when you use a harness. even humans can't write code that works on the first attempt.

bigfishrunning · 2026-03-09T20:24:48 1773087888

We don't need to come up with a new board game. How about a board game that has been written about extensively for hundreds of years?

LLMs can't consistently win at chess https://www.nicowesterdale.com/blog/why-llms-cant-play-chess

Now, some of the best chess engines in the world are Neural Networks, but general purpose LLMs are consistently bad at chess.

As for "LLMs don't have understanding", that is axiomatically true by the nature of how they're implemented. A bunch of matrix multiplies resulting in a high-dimensional array of tokens does not think; this has been written about extensively. They are really good at generating language that looks plausible; some of that plausible-looking language happens to be true.

simianwords · 2026-03-09T20:29:00 1773088140

false, chess ELO is pretty good

https://maxim-saplin.github.io/llm_chess/

let's not cherry pick and actually look at benchmarks please. i would say even ~1000 elo means that it can reason better than the average human.

bigfishrunning · 2026-03-09T20:36:08 1773088568

If you look at the "workflow" section of that page, they had to add a bunch of scaffolding around telling the model what moves are legal -- an llm can't keep enough context to know how to play chess; only to choose an advantageous move from a given list. But feel free to "cherry pick".
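To make the scaffolding concrete, here is a minimal sketch of that kind of wrapper: the harness supplies the legal moves, the model only picks from the list, and a few invalid replies are tolerated before the game is forfeited. This is an illustration under assumptions, not the benchmark's actual code; `choose_move`, `pick_move`, and the default of three tolerated mistakes are hypothetical names and values.

```python
from typing import Callable, Optional, Sequence


def choose_move(
    pick_move: Callable[[Sequence[str]], str],  # hypothetical stand-in for the LLM call
    legal_moves: Sequence[str],
    max_mistakes: int = 3,  # assumed tolerance for invalid replies
) -> Optional[str]:
    """Ask the model for a move until it returns a legal one.

    Returns the chosen move, or None once the model has produced
    max_mistakes invalid replies.
    """
    mistakes = 0
    while mistakes < max_mistakes:
        move = pick_move(legal_moves).strip()
        if move in legal_moves:
            return move
        mistakes += 1  # invalid reply: count it and ask again
    return None
```

Note that all of the rules knowledge lives in whatever produces `legal_moves`; the model's job shrinks to ranking the candidates.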

simianwords · 2026-03-09T20:38:09 1773088689

why do you think this shows that it can't reason?

simianwords · 2026-03-09T22:44:50 1773096290

i ran the benchmark without the valid moves tool and without the three mistakes grace, and gpt-5.4 holds up well. it can achieve 1000 ELO, which is much higher than my own.

this clearly tells me that GPT is good at chess, at least better than a normal person who has played ~30-40 games in their life.
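for scale, the standard Elo model turns a rating gap into an expected score (win = 1, draw = 0.5, loss = 0). a minimal sketch; the function name is mine, and the 800 rating for a casual player in the usage note is a made-up number for illustration, not a measurement.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
```

with those assumed numbers, `expected_score(1000, 800)` comes out to about 0.76, so a 200-point gap means the higher-rated side is expected to take roughly three quarters of the points.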