Hacker News

Opus 4.6 was getting this wrong only last week.


Oh wow, Sonnet still isn't handling it well:

Opus 4.6: Drive (https://claude.ai/share/d57fef01-df32-41f2-b1dc-07de7916bdc7)

Opus 4.5: Drive (https://claude.ai/chat/a590cac1-100a-490b-b0a2-df6676e1ae99)

Opus 3.0: Walk (https://claude.ai/chat/372c144c-d6eb-43f5-b7ea-fd4c51c681db)

Sonnet 4.6: Walk (https://claude.ai/share/1f2a80f3-4741-40a5-8a05-7349ea1a17e5)

Sonnet 4.5: Walk (https://claude.ai/share/905afeb6-ffc9-4b4b-a9ee-4481e5cfd527)

Favorite answer, using my default custom instructions: "Drive. Walking there means... leaving your car at home? Walk it there on a leash? Walk if you want the exercise, but you're bringing the car either way."


That's because these were run without thinking enabled. Of course the results are disappointing.


It seems entirely fair to evaluate a product based on the baseline that the company itself offers.



