> Why is the instruction tuning pushing such a noticeable style shift?
Gwern Branwen has been covering this: https://gwern.net/doc/reinforcement-learning/preference-lear....