Hacker News

And it's a 20 GB card with the ability to do FP16, right? So theoretically lots of the models can actually be 40 GB?


Well, technically they can be 40 GB FP32 models, but translating a model trained in FP32 to FP16 is not trivial (trust me, we're working on this right now for a model). And remember that training a model requires a lot more memory than just the model parameters, because you need to store the gradients as well.
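A quick back-of-envelope sketch of the memory math. The model size (10B parameters) is a hypothetical example, and this assumes plain SGD, so only weights plus one gradient per weight; optimizers like Adam need extra per-parameter state on top of this, and activations are ignored entirely:

```python
# Rough memory math for fitting a model on a 20 GB card.
# Hypothetical 10B-parameter model; plain SGD (weights + gradients only).

GB = 1024 ** 3

def param_bytes(n_params: int, bytes_per_param: int) -> int:
    """Memory for the parameters alone (inference)."""
    return n_params * bytes_per_param

def training_bytes(n_params: int, bytes_per_param: int) -> int:
    """Parameters plus one gradient per parameter (SGD training)."""
    return 2 * n_params * bytes_per_param

n = 10_000_000_000

fp32_inference = param_bytes(n, 4) / GB     # ~37 GB: doesn't fit in 20 GB
fp16_inference = param_bytes(n, 2) / GB     # ~19 GB: fits
fp16_training = training_bytes(n, 2) / GB   # ~37 GB: gradients blow the budget

print(f"fp32 inference: {fp32_inference:.1f} GB")
print(f"fp16 inference: {fp16_inference:.1f} GB")
print(f"fp16 training:  {fp16_training:.1f} GB")
```

So a model whose FP32 weights alone are ~40 GB can squeeze onto a 20 GB card at FP16 for inference, but training it there is a different story.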



