Can you elaborate a bit more why you chose llama.cpp? From their docs:
> "The main goal of llama.cpp is to run the LLaMA model using 4-bit integer quantization on a MacBook"
I'm not interested in running an LLM on embedded devices; the NN I want to deploy is really tiny and is used for audio signal processing. It extracts information from a small chunk of audio, and the code needs to run on various embedded ARM chips.
What would be the advantage of using llama.cpp for this?