Is there a reliable guide somewhere to setting up local AI for coding (please do...

AstroBen · 2026-03-13T17:19:55 1773422395

Ollama or LM Studio are very simple to setup.

You're probably not going to get anything working well as an agent on an M2 MacBook, but smaller models do surprisingly well for focused autocomplete. Maybe the Qwen3.5 9B model would run decently on your system?

jrmg · 2026-03-13T17:50:30 1773424230

Right - setting up LM studio is not hard. But how do I connect LM Studio to Copilot, or set up an agent?

NortySpock · 2026-03-13T18:06:49 1773425209

I tried the Zed editor and it picked up Ollama with almost no fiddling, so that has allowed me to run Qwen3.5:9B just by tweaking the ollama settings (which had a few dumb defaults, I thought, like assuming I wanted to run 3 LLMs in parallel, initially disabling Flash Attention, and having a very short context window...).

Having a second pair of "eyes" to read a log error and dig into relevant code is super handy for getting ideas flowing.

brcmthrowaway · 2026-03-13T17:55:42 1773424542

Basically LM Studio has a server that serves models over HTTP (localhost). Configure/enable the server and connect OpenCode to it.

Try this article https://advanced-stack.com/fields-notes/qwen35-opencode-lm-s...

I'm looking for an alternative to OpenCode though, I can barely see the UI.

AstroBen · 2026-03-13T18:09:14 1773425354

Codex also supports configuring an alternative API for the model, you could try that: https://unsloth.ai/docs/basics/codex#openai-codex-cli-tutori...

AstroBen · 2026-03-13T18:03:22 1773425002

It looks like Copilot has direct support for Ollama if you're willing to set that up: https://docs.ollama.com/integrations/vscode

For LM Studio under server settings you can start a local server that has an OpenAI-compatible API. You'd need to point Copilot to that. I don't use Copilot so not sure of the exact steps there

kristianp · 2026-03-14T03:40:17 1773459617

https://github.com/ggml-org/llama.cpp/releases - has mac binaries

https://unsloth.ai/docs/models/qwen3.5 - running locally guide for the Qwen 3.5 family of models, which have a range of different sizes.

randusername · 2026-03-13T21:06:10 1773435970

Personally I'd start with llamafile [0] then move to compiling your own llama.cpp.

It's not as bad as you might think to compile llama.cpp for your target architecture and spin up an OpenAI compatible API endpoint. It even downloads the models for you.

[0]: https://github.com/mozilla-ai/llamafile

thexa4 · 2026-03-13T23:28:47 1773444527

I've created a llama.cpp integration with Copilot in vscode. The extension readme contains setup instructions: https://marketplace.visualstudio.com/items?itemName=delft-so...

chatmasta · 2026-03-13T18:41:54 1773427314

Any time I google something on this topic, the results are useful but also out of date, because this space is moving so absurdly fast.