More

suprjami · 2026-03-26T11:17:27 1774523847

Dual 3060s run 24B Q6 and 32B Q4 at ~15 tok/sec. That's fast enough to be usable.

Add a third one and you can run Qwen 3.5 27B Q6 with 128k ctx. For less than the price of a 3090.

suprjami · 2026-03-07T09:58:35 1772877515

Look up "USB RF remote" on eBay. There are two common ones you'll see everywhere. I have one for my Kodi system.

suprjami · 2026-03-06T23:44:19 1772840659

Please anyone make a new Slack. 4Gb RAM for a slow chat client with a bad interface is just so slovenly it should be illegal.

suprjami · 2026-02-28T22:11:35 1772316695

Unsloth Dynamic. Don't bother with anything else.

rahimnathwani · 2026-03-01T15:34:36 1772379276

For anyone else trying to run this on a Mac with 32GB unified RAM, this is what worked for me:

First, make sure enough memory is allocated to the gpu:

  sudo sysctl -w iogpu.wired_limit_mb=24000

Then run llama.cpp but reduce RAM needs by limiting the context window and turning off vision support. (And turn off reasoning for now as it's not needed for simple queries.)

  llama-server \
    -hf unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q4_K_XL \
    --jinja \
    --no-mmproj \
    --no-warmup \
    -np 1 \
    -c 8192 \
    -b 512 \
    --chat-template-kwargs '{"enable_thinking": false}'

You can also enable/disable thinking on a per-request basis:

  curl 'http://localhost:8080/v1/chat/completions' \
  --data-raw '{"messages":[{"role":"user","content":"hello"}],"stream":false,"return_progress":false,"reasoning_format":"auto","temperature":0.8,"max_tokens":-1,"dynatemp_range":0,"dynatemp_exponent":1,"top_k":40,"top_p":0.95,"min_p":0.05,"xtc_probability":0,"xtc_threshold":0.1,"typ_p":1,"repeat_last_n":64,"repeat_penalty":1,"presence_penalty":0,"frequency_penalty":0,"dry_multiplier":0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":-1,"samplers":["penalties","dry","top_n_sigma","top_k","typ_p","top_p","min_p","xtc","temperature"],"chat_template_kwargs": { "enable_thinking": true }}'|jq .

If anyone has any better suggestions, please comment :)

suprjami · 2026-03-02T04:59:03 1772427543

Shouldn't you be using MLX because it's optimised for Apple Silicon?

Many user benchmarks report up to 30% better memory usage and up to 50% higher token generation speed:

https://reddit.com/r/LocalLLaMA/comments/1fz6z79/lm_studio_s...

As the post says, LM Studio has an MLX backend which makes it easy to use.

If you still want to stick with llama-server and GGUF, look at llama-swap which allows you to run one frontend which provides a list of models and dynamically starts a llama-server process with the right model:

https://github.com/mostlygeek/llama-swap

(actually you could run any OpenAI-compatible server process with llama-swap)

rahimnathwani · 2026-03-02T05:04:44 1772427884

I didn't know about llama-swap until yesterday. Apparently you can set it up such that it gives different 'model' choices which are the same model with different parameters. So, e.g. you can have 'thinking high', 'thinking medium' and 'no reasoning' versions of the same model, but only one copy of the model weights would be loaded into llama server's RAM.

Regarding mlx, I haven't tried it with this model. Does it work with unsloth dynamic quantization? I looked at mlx-community and found this one, but I'm not sure how it was quantized. The weights are about the same size as unsloth's 4-bit XL model: https://huggingface.co/mlx-community/Qwen3.5-35B-A3B-4bit/tr...

suprjami · 2026-03-02T08:40:51 1772440851

Yes that's right. The config is described by the developer here:

https://www.reddit.com/r/LocalLLaMA/comments/1rhohqk/comment...

And is in the sample config too:

https://github.com/mostlygeek/llama-swap/blob/main/config.ex...

iiuc MLX quants are not GGUFs for llama.cpp. They are a different file format which you use with the MLX inference server. LM Studio abstracts all that away so you can just pick an MLX quant and it does all the hard work for you. I don't have a Mac so I have not looked into this in detail.

BoredomIsFun · 2026-03-01T10:21:50 1772360510

FYI UD quants of 3.5-35BA3B are broken, use bartowski or AesSedai ones.

regularfry · 2026-03-01T14:15:42 1772374542

They've uploaded the fix. If those are still broken something bad has happened.

rahimnathwani · 2026-02-28T22:36:08 1772318168

UD-Q4_K_XL?

suprjami · 2026-02-28T22:01:50 1772316110

The cheapest option is two 3060 12G cards. You'll be able to fit the Q4 of the 27B or 35B with an okay context window.

If you want to spend twice as much for more speed, get a 3090/4090/5090.

If you want long context, get two of them.

If you have enough spare cash to buy a car, get an RTX Ada with 96G VRAM.

barrkel · 2026-02-28T22:34:33 1772318073

Rtx 6000 pro Blackwell, not ada, for 96GB.

suprjami · 2026-03-01T13:17:22 1772371042

Ah thanks.

The names are so good and not repetitious.

No not the RTX 6000. No not the A6000...

chr15m · 2026-03-01T03:36:12 1772336172

Thanks this is a great summary of the tradeoffs!

suprjami · 2026-02-26T20:20:17 1772137217

Big deal, so does every other company.

If you're lonely just upload a few AI keywords to a repo. You'll get emails forever.

suprjami · 2026-02-24T05:59:51 1771912791

At 8 years old I was able to expertly dismantle many radios.

Was still a few years away from reassembly.

nananana9 · 2026-02-24T06:19:52 1771913992

At 8 years I recycled filesystem directories. I didn't know you can create new folders, so when I needed one I grabbed a random one from C:\Windows, moved it to my desktop and deleted its contents.

desertrider12 · 2026-02-25T04:51:38 1771995098

That’s funny. When I was little I found “format” in my mp3 player’s settings. Thought it would customize the UI or something, but instead I ended up with no music for the rest of the road trip.

jdthedisciple · 2026-02-24T06:25:07 1771914307

Makes total sense, it used to be called "Recycle Bin" after all!

whamlastxmas · 2026-02-24T16:10:31 1771949431

I wonder if Microsoft did focus group testing and found understandably computer illiterate people were concerned about "trashing" files meant they were somehow permanently using up HDD space

geoffbp · 2026-02-24T06:34:33 1771914873

Worked ok til it was a system dir and the system wouldn’t boot anymore? :)

jdefr89 · 2026-02-24T18:01:06 1771956066

No better way to learn System32 folder was essential is Windows than by destroying your family computer by removing it.

energy123 · 2026-02-24T08:42:54 1771922574

I deleted the files from there to free up disk space

fragmede · 2026-02-24T10:06:05 1771927565

I don't need autoexec.bat or config.sys! it's got some garbage in there that I don't understand, so it must not be important.

nephihaha · 2026-02-24T09:55:19 1771926919

I was doing that at three or four and was reminded of it constantly for the next ten years or more. (I actually raised the subject on my mother's death bed.)

fuzzfactor · 2026-02-24T10:47:10 1771930030

When I was a boy all we had were high-voltage vacuum tube electronics, it was fun.

suprjami · 2026-02-22T20:39:24 1771792764

Next step is to skip the bread and eat Nutella from the jar with a spoon.

michaelteter · 2026-02-22T20:54:07 1771793647

I find dipping a cashew in is even better. The cashew becomes your scooper. The combination is divine.

suprjami · 2026-02-21T21:20:27 1771708827

It feels to me there are plenty of people running these because "just trust the AI bro" who are one hallucination away from having their entire bank account emptied.

hrpnk · 2026-02-22T16:01:08 1771776068

Exactly, I've seen people who bought a Mac Mini and ended up running claw against a claude subscription. Completely misunderstand the point of local models. Plus, there was more hype about running claw way cheaper on Raspberry Pi which cost the stock price of Raspberry maker to skyrocket.

Some of the comments here show that technical people set these things up for non-technical people, which is just one step away from a misstep. Time will show whether this is similar in behavior to the "I can run it" mindset that people had with local models before. A small dopamine hit to see "it can be done" in order to end up a cloud service in the long run.

suprjami · 2026-02-18T09:09:05 1771405745

OpenXcom adds a whole heap of wonderful conveniences to UFO/X-Com. It's probably my favourite open source game engine clone thing.

https://openxcom.org/

snvzz · 2026-02-18T11:13:35 1771413215

Dolls / Girls Frontline 2: Exilium[0][1][2] is a modern take on the XCOM concept.

Free (but gacha.)

0. https://gf2exilium.sunborngame.com/main

1. https://gf2.haoplay.com/jp/

2. https://store.steampowered.com/app/3308670/GIRLS_FRONTLINE_2...

suprjami · 2026-02-18T11:21:40 1771413700

> gacha

Hard pass.

snvzz · 2026-02-19T03:26:12 1771471572

Spending money is actually entirely optional.

I personally haven't, because it would take the fun out of the game.

suprjami · 2026-02-19T11:54:58 1771502098

Yeah I know. I lost 2 years to Azur Lane. That was enough.

snvzz · 2026-02-19T16:14:40 1771517680

Love them shipgirls, eh?

Right now, I am caught up with gfl2, and having a blast with Arknights: Endfield. The factory must grow!

In a few weeks, I'll probably be working on my projects and not touching any games at all, as I was just a few weeks ago.

Two years is indeed a bit too much. Got to do something else when it stops being fun. I had to learn that lesson with a few months of DOTA2; it can turn into a job, except that it produces nothing of value.

lionkor · 2026-02-18T15:54:49 1771430089

> my favourite open source game engine clone thing

For me that's OpenMW

https://openmw.org/