
> I have an M3 Ultra with 256GB of memory,

I'm sorry, but spending this kind of money is just plain stupid when you could have built yourself a dual-3090 workstation that would have been better for pretty much everything, including local models.

Hell, even a single 3090 can now run Gemma 3 27B QAT very fast.
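
If you want to try it yourself, here's a minimal sketch using llama-cpp-python (a real wrapper around llama.cpp, but the GGUF filename below is a placeholder; point it at wherever you saved the QAT Q4_0 release):

    # Minimal sketch: run Gemma 3 27B QAT fully offloaded to a single GPU.
    # Assumes llama-cpp-python built with CUDA support; the model path is
    # a placeholder for your local copy of the QAT GGUF.
    from llama_cpp import Llama

    llm = Llama(
        model_path="gemma-3-27b-it-qat-q4_0.gguf",  # hypothetical local file
        n_gpu_layers=-1,  # offload every layer to the 3090
        n_ctx=8192,
    )
    out = llm("Summarize the tradeoffs of unified memory.", max_tokens=256)
    print(out["choices"][0]["text"])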




Are you aware that your 3090s have nowhere close to 256GB of VRAM? Or maybe you're not aware that Macs have unified memory (it works as both RAM and VRAM).

Are you aware that having RAM doesn't matter when your tokens/second is slow as shit?

You don't need to run large models; Gemma 27B QAT fits on one GPU and is quite good. Other models like Qwen3 are great for coding.

A 3090 gets 100+ tokens/second for Qwen, very close to what you'd see from a cloud-based model.

An M3 Ultra gets ~30.

Congrats, you played yourself.


Did I? Not only are you comparing apples to oranges, you're also quoting misleading numbers.

A 3090 gets 20-30 tokens/second for dense ~30B models (QwQ 32B, Gemma 3 27B Q4), similar to an M3 Ultra. If you're talking about Qwen3-Coder 30B (an MoE), then both the 3090 and the M3 Ultra are around ~70 tok/s.
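
Anyone can check these numbers on their own hardware instead of arguing. A rough tokens/sec probe with llama-cpp-python (model path is a placeholder, and it's a single run, so treat the result as a ballpark):

    # Rough decode-speed probe. Assumes llama-cpp-python with GPU (or Metal)
    # support; the model path is a placeholder for whatever GGUF you test.
    # Note: elapsed time includes prompt processing, which is negligible
    # for a short prompt like this one.
    import time
    from llama_cpp import Llama

    llm = Llama(model_path="model.gguf", n_gpu_layers=-1, n_ctx=4096)

    start = time.time()
    out = llm("Write a short essay about GPUs.", max_tokens=512)
    elapsed = time.time() - start

    generated = out["usage"]["completion_tokens"]
    print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")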

But even if you were right about the speed - which you are not - speed is pointless if you need a large model that won't fit into your VRAM.


> a dual 3090 workstation that would have been better for pretty much everything

Doesn't run macOS


Except if you live in a region where electricity is quite expensive :/


