Anyone compare to ollama? I had good success with latest ollama with ROCm 7.4 on...

RealFloridaMan · 2026-04-02T14:44:49 1775141089

It is optimized for compatibility across different APIs and has hardware-specific builds for AMD GPUs and NPUs. It’s run by AMD.

Under the hood they are both running llama.cpp, but this has specific builds for different GPUs. Not sure if the 9070 is one; I am running it on a 370 and a 395 APU.

martin-adams · 2026-04-02T15:52:12 1775145132

I just compared this on my MacBook (M1 Max, 64GB RAM) with the following:

Model: qwen3.59b
Prompt: "Hey, tell me a story about going to space"

Ollama completed in about 1:44
Lemonade completed in about 1:14

So it seems faster in this very limited test.
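To put a number on that, here is a minimal sketch that converts the two wall-clock times above into seconds and works out the ratio. The helper names (`parse_mss`, `speedup`, `time_call`) are hypothetical and exist only for illustration; `time_call` is a generic stopwatch you could wrap around whatever client call you use for either server, not part of Ollama or Lemonade.

```python
import time
from typing import Callable


def parse_mss(text: str) -> int:
    """Convert a 'm:ss' wall-clock string such as '1:44' into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


def time_call(fn: Callable[[], object], repeats: int = 3) -> float:
    """Hypothetical helper: best wall-clock time of fn() over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Times reported in the comment above.
ollama_s = parse_mss("1:44")    # 104 seconds
lemonade_s = parse_mss("1:14")  # 74 seconds

ratio = speedup(ollama_s, lemonade_s)
saved = 1 - lemonade_s / ollama_s
print(f"Lemonade: {ratio:.2f}x faster, {saved:.0%} less wall-clock time")
# prints "Lemonade: 1.41x faster, 29% less wall-clock time"
```

Keep in mind that a single run like this likely includes model load time, and the two servers may not generate the same number of tokens for the same prompt, so tokens per second over several runs would be a fairer comparison.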

nezhar · 2026-04-02T18:00:34 1775152834

I'm also curious about this one, and I want to compare it to vLLM.

iugtmkbdfil834 · 2026-04-02T12:35:54 1775133354

Seconded. Currently on ollama for local inference, but I am curious how it compares.

LumielGR · 2026-04-02T14:11:43 1775139103

Lemonade is using llama.cpp for text and vision with a nightly ROCm build. It can also load and serve multiple LLMs at the same time, create images, use whisper.cpp, use TTS models, use the NPU (e.g. Strix Halo amdxdna2), and more!

metalliqaz · 2026-04-02T14:48:09 1775141289

Better than Vulkan?

cpburns2009 · 2026-04-02T14:58:18 1775141898

In my experience using llama.cpp (which ollama uses internally) on a Strix Halo, whether ROCm or Vulkan performs better really depends on the model, and the difference is usually within 10%. I have access to an RX 7900 XT that I should compare against, though.

metalliqaz · 2026-04-02T15:38:32 1775144312

Perhaps I should just google it, but I'm under the impression that ollama uses llama.cpp internally, not the other way around.

Thanks for that data point. I should experiment with ROCm.

naasking · 2026-04-02T17:36:57 1775151417

From what I understand, ROCm is a lot buggier and has some performance regressions on a lot of GPUs in the 7.x series. Vulkan performance for LLMs is apparently not far behind ROCm and is far more stable and predictable at this time.

cpburns2009 · 2026-04-02T16:15:51 1775146551

I meant ollama uses llama.cpp internally. Sorry for the confusion.

0x457 · 2026-04-02T17:30:26 1775151026

For me, Vulkan performs better on integrated GPUs, but ROCm (MIGraphX) performs better on the 7900 XTX.

nijave · 2026-04-03T12:42:57 1775220177

As I understand it, it depends on your GPU and ROCm version, but they're similar-ish.

hrmtst93837 · 2026-04-02T16:06:07 1775145967

[flagged]

metalliqaz · 2026-04-02T16:42:57 1775148177

I was talking about ROCm vs Vulkan. On AMD GPUs, Vulkan has been commonly recognized as the faster API for some time. Both have been slower than CUDA due to most of the hosting projects focusing entirely on Nvidia. Parent post seemed to indicate that newer ROCm releases are better.

naasking · 2026-04-02T17:39:43 1775151583

Yes, Vulkan is currently faster due to some ROCm regressions: https://github.com/ROCm/ROCm/issues/5805#issuecomment-414161...

ROCm should be faster in the end, if they ever fix those issues.