Is there a reliable guide somewhere to setting up local AI for coding (please don’t say ‘just Google it’ - that just results in a morass of AI slop/SEO pages with out of date, non-self-consistent, incorrect or impossible instructions).
I’d like to be able to use a local model (which one?) to power Copilot in vscode, and run coding agent(s) (not general purpose OpenClaw-like agents) on my M2 MacBook. I know it’ll be slow.
I suspect this is actually fairly easy to set up - if you know how.
You're probably not going to get anything working well as an agent on an M2 MacBook, but smaller models do surprisingly well for focused autocomplete. Maybe the Qwen3.5 9B model would run decently on your system?
I tried the Zed editor and it picked up Ollama with almost no fiddling, so that has allowed me to run Qwen3.5:9B just by tweaking the ollama settings (which had a few dumb defaults, I thought, like assuming I wanted to run 3 LLMs in parallel, initially disabling Flash Attention, and having a very short context window...).
Having a second pair of "eyes" to read a log error and dig into relevant code is super handy for getting ideas flowing.
For LM Studio under server settings you can start a local server that has an OpenAI-compatible API. You'd need to point Copilot to that. I don't use Copilot so not sure of the exact steps there
Personally I'd start with llamafile [0] then move to compiling your own llama.cpp.
It's not as bad as you might think to compile llama.cpp for your target architecture and spin up an OpenAI compatible API endpoint. It even downloads the models for you.
I’d like to be able to use a local model (which one?) to power Copilot in vscode, and run coding agent(s) (not general purpose OpenClaw-like agents) on my M2 MacBook. I know it’ll be slow.
I suspect this is actually fairly easy to set up - if you know how.