
Last time I played with it, 70B models were much larger than 24GB unless heavily quantized.


It mentioned Q4, but after searching around a bit, it looks like a 70B model at Q4 needs 35GB or so. So Strix Halo is 2.2x faster than a 4090 only when the 4090 is paging to system RAM.

Not so impressive 8-(.
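
For anyone who wants to check the 35GB figure, here is a rough back-of-envelope sketch. It assumes exactly 4 bits per weight and counts only the weights; the function name is just something I made up for illustration. Real Q4 formats store scales alongside the weights, so they usually land somewhat above this, and the KV cache and activations add more on top.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes).

    Assumption: every parameter takes exactly bits_per_weight bits.
    KV cache, activations, and quantization metadata are ignored.
    """
    return params_billion * bits_per_weight / 8


VRAM_4090_GB = 24  # memory on an RTX 4090

q4_gb = weight_footprint_gb(70, 4)     # 70B parameters at 4 bits each
fp16_gb = weight_footprint_gb(70, 16)  # same model, unquantized fp16

print(f"70B @ Q4:   {q4_gb:.0f} GB")
print(f"70B @ fp16: {fp16_gb:.0f} GB")
print(f"Q4 fits in {VRAM_4090_GB} GB of VRAM: {q4_gb <= VRAM_4090_GB}")
```

Even at an idealized 4 bits per weight, the weights alone are about 11GB over what a 4090 can hold, so part of the model has to spill to system RAM.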



