
If you don't care about the details of how those model servers work, then something that abstracts away the whole process, like LM Studio or Ollama, is all you need.

However, if you want to get into the weeds of how this actually works, I recommend reading up on model quantization and on libraries like ggml[1] that implement it for you.
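
To give a rough idea of what quantization means in practice, here is a minimal Python sketch of block-wise 8-bit quantization: each block of weights is stored as small integers plus one shared scale. This is my own illustration, not ggml's code or file format; the function names and the block size of 32 are assumptions picked for the example.

```python
def quantize_block(values):
    """Map a block of floats to int8-range codes plus one float scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale per block; fall back to 1.0 so an all-zero block is safe.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [round(v / scale) for v in values]  # each code lies in [-127, 127]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from the codes and the block scale."""
    return [scale * c for c in codes]


def quantize(weights, block_size=32):
    """Split weights into blocks and quantize each one independently."""
    return [
        quantize_block(weights[i:i + block_size])
        for i in range(0, len(weights), block_size)
    ]


def dequantize(blocks):
    """Concatenate the dequantized blocks back into one flat list."""
    out = []
    for scale, codes in blocks:
        out.extend(dequantize_block(scale, codes))
    return out


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0]
    blocks = quantize(weights)
    print(blocks)
    print(dequantize(blocks))
```

The round trip is lossy: each restored weight can be off by up to about half the block's scale. That is the tradeoff you accept for storing roughly one byte per weight instead of two or four.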

[1] https://github.com/ggerganov/ggml


