
No, I think the "reasoning" step really does make a difference here.

There's more than just next token prediction going on. Those reasoning chains of thought have undergone their own reinforcement learning training against a different category of samples.

They've seen countless examples of how a reasoning chain would look for calculating a mortgage, or searching a flight, or debugging a Python program.

So I don't think it is accurate to describe the eventual result as "just next token prediction". It is next token prediction that has been informed by a chain of thought, which was in turn shaped by a different, specially chosen set of examples.
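As a minimal sketch of the kind of arithmetic a "mortgage" reasoning chain would walk through: the standard fixed-rate amortization formula, M = P·r·(1+r)^n / ((1+r)^n − 1), where P is the principal, r the monthly rate, and n the number of payments. The function name and figures below are illustrative, not from the thread.

```python
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed-rate mortgage payment via the standard amortization formula."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # total number of monthly payments
    if r == 0:
        return principal / n      # zero-interest edge case
    factor = (1 + r) ** n
    return principal * r * factor / (factor - 1)

# e.g. a $300,000 loan at 6% annual interest over 30 years
payment = monthly_payment(300_000, 0.06, 30)
print(f"{payment:.2f}")
```

A reasoning trace for this task would typically spell out each intermediate step (convert the annual rate to monthly, count the payments, apply the formula) before stating the final figure, which is exactly the structure the parent comment is describing.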



Do you believe it's possible to produce a given set of model weights with an infinitely large number of different training examples?

If not, why not? Explain.

If so, how does your argument address the fact that this implies any given "reasoning" model can be trained without giving it a single example of something you would consider "reasoning"? (in fact, a "reasoning" model may be produced by random chance?)


> an infinitely large number of different training examples

Infinity is problematic because it's impossible to process an infinite amount of data in a finite amount of time.


I'm afraid I don't understand your question.



