That last benchmark seemed like an impressive leg up against Opus until I saw th...

conradkay · 2026-03-05T19:57:02 1772740622

Sonnet was pretty close to (or better than) Opus in a lot of benchmarks, so I don't think it's a big deal

jitl · 2026-03-05T20:03:55 1772741035

0123456789ABCDE · 2026-03-05T20:54:15 1772744055

maybe gp's use of the word "lots" is unwarranted

https://artificialanalysis.ai indicates that Sonnet 4.6 beats Opus 4.6 on GDPval-AA, Terminal-Bench Hard, AA Long Context Reasoning, and IFBench.

conradkay · 2026-03-06T03:47:32 1772768852

I was basing it off my recollection of this:

basically 9/13 are very close

osti · 2026-03-05T20:00:05 1772740805

Only that one number is for Sonnet.

0123456789ABCDE · 2026-03-05T20:46:17 1772743577

except for the webarena-verified