He's burning Claude tokens to slightly improve his tiny and not very capable LLM? It's fun, I bet, but wake me up when it leads to a research breakthrough.
nanochat is super capable, the d34 (2.2b) variant is competitive with qwens of that size. Andrej is I assume building out the improvements in preparation for bigger training runs. We desperately need a truly open model, so i think this is incredibly important.