Well, technically they can be 40 GB fp32 models, but converting a model trained in fp32 to fp16 is not trivial (trust me, we’re working on exactly that for a model right now). And remember that training requires much more memory than just the model parameters, because you also need to store a gradient for every parameter (and, depending on the optimizer, additional per-parameter state on top of that).
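As a rough back-of-envelope sketch of why training costs more than inference (assuming plain SGD here; an optimizer like Adam would add roughly two more full-size copies for its moment estimates):

```python
def training_memory_gb(num_params: int, bytes_per_param: int = 4) -> float:
    """Rough lower bound for fp32 training: weights + gradients.
    Ignores activations, optimizer state, and framework overhead."""
    params = num_params * bytes_per_param   # the weights themselves
    grads = num_params * bytes_per_param    # one gradient per weight
    return (params + grads) / 1e9

# A 40 GB fp32 model has ~10 billion parameters (4 bytes each),
# so even this lower bound doubles the footprint:
n = 10_000_000_000
print(f"inference: {n * 4 / 1e9:.0f} GB, training: {training_memory_gb(n):.0f} GB")
# → inference: 40 GB, training: 80 GB
```

In practice the real number is higher still, since activations and optimizer state usually dominate beyond this floor.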