
kgeist · 2026-04-05T21:51:46 1775425906

Qwen3.5 comes in various sizes (including 27B), and judging by the posts on HN, r/LocalLLaMA, etc., it seems to be better at logic, reasoning, coding, and tool calling than Gemma 4, while Gemma 4 is better at creative writing and world knowledge (basically nothing has changed since the Qwen3 vs. Gemma 3 era).

Mil0dV · 2026-04-05T22:06:19 1775426779

Does this also apply to Gemma's 26B-A4B vs., say, Qwen's 35B-A3B?

I'm not sure I can make the 35B-A3B work on my 32 GB machine.

green7ea · 2026-04-06T06:27:13 1775456833

It should be easy with a Q4 (quantization to 4 bits per weight) and a smallish context.

You won't have much RAM left over though :-/.

At Q4, it's ~20 GiB:

https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF
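As a rough sanity check of the ~20 GiB figure: weight memory is about parameter count × bits per weight / 8. A minimal sketch, assuming ~35e9 total parameters and an effective 4.5 to 4.9 bits per weight for Q4 GGUF quants (per-block scales push it above a flat 4 bits; the exact value depends on the quant type). KV cache and runtime buffers are not counted and come on top.

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GiB (weights only, no KV cache)."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 2**30


# Assumption: ~35e9 total parameters; effective bits/weight values are illustrative.
for bpw in (4.0, 4.5, 4.9):
    print(f"{bpw} bits/weight -> {weight_gib(35e9, bpw):.1f} GiB")
```

This prints roughly 16.3, 18.3, and 20.0 GiB, which is why a 32 GB machine works but leaves little headroom for context and everything else.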

rhdunn · 2026-04-06T10:06:02 1775469962

For llama-server (and possibly other similar applications), you can specify the number of layers to offload to the GPU (e.g. `--n-gpu-layers`). By default this is set to run the entire model in VRAM, but you can set it to something like 64 or 32 to make it use less VRAM. This costs speed, since the layers that are not offloaded mostly run on the CPU from system RAM, but it allows you to run a larger model, a larger context, or additional models.
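Picking a value for `--n-gpu-layers` is mostly arithmetic. A minimal sketch, assuming all layers are the same size (a simplification) and that you set aside some VRAM for the KV cache and compute buffers; the helper names, the 40-layer count, the reserve, and `model.gguf` are all hypothetical. Only `-m`, `--n-gpu-layers`, and `-c` are real llama-server flags.

```python
import math
from typing import List


def layers_that_fit(vram_gib: float, model_gib: float, n_layers: int, reserve_gib: float) -> int:
    """Hypothetical helper: how many equally sized layers fit in the usable VRAM."""
    per_layer = model_gib / n_layers
    usable = max(0.0, vram_gib - reserve_gib)  # leave room for KV cache and buffers
    return min(n_layers, math.floor(usable / per_layer))


def llama_server_cmd(model_path: str, n_gpu_layers: int, ctx: int) -> List[str]:
    """Build a llama-server argument list (hypothetical helper)."""
    return ["llama-server", "-m", model_path,
            "--n-gpu-layers", str(n_gpu_layers), "-c", str(ctx)]


# Example: a ~20 GiB model with an assumed 40 layers on a 12 GiB GPU, 2 GiB reserved.
n = layers_that_fit(vram_gib=12, model_gib=20, n_layers=40, reserve_gib=2)
print(n, llama_server_cmd("model.gguf", n, 8192))
```

Here 10 usable GiB at 0.5 GiB per layer gives 20 offloaded layers; the remaining 20 run from system RAM.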

kgeist · 2026-04-04T10:27:23 1775298443

I think the headline is misleading. It's some random fork of llama.cpp; I can't find evidence that TurboQuant was actually added to llama.cpp proper.

The only legit PR I can find is [0], and it's still open.

There are currently a lot of rejected vibe-coded PRs [1] (violating the project's AI policy).

The OP's PR says it was generated with Claude Code, so it has a very low chance of getting merged upstream.

[0] https://github.com/ggml-org/llama.cpp/pull/21089

[1] https://github.com/ggml-org/llama.cpp/pulls?q=Turboquant+is%...

lastdong · 2026-04-04T14:50:25 1775314225

Indeed, thanks for pointing this out and for the links. In my excitement I misread it as an MR from the fork to the main project. I don’t think I’m able to fix the title, though.

I find it quite exciting to read results from efforts to understand whether TurboQuant's main ideas can be applied to model weights. There are other similar projects, so we’ll see, but some of this fork's results look promising.

kgeist · 2026-04-02T15:35:16 1775144116

They've always had closed-source variants:

- Qwen3.5-Plus

- Qwen3-Max

- Qwen2.5-Max

etc. Nothing has really changed so far.

kgeist · 2026-04-01T08:50:10 1775033410

>Second, what's even more crazy is that roughly 98% of that DNA is actually non-coding.. just junk.

I think it's a myth that non-coding DNA is junk. See, e.g.:

https://www.nature.com/articles/444130a

>'Non-coding' DNA may organize brain cell connections.

kgeist · 2026-04-01T05:34:48 1775021688

>One theory is that the knowledge required to solve the task is already stored in the parameters of the model, and only the style has to change for task success

>In particular, learning to generate longer outputs may be possible in few parameters

Reminded me of: https://arxiv.org/abs/2501.19393

>we develop budget forcing to control test-time compute by forcefully terminating the model’s thinking process or lengthening it by appending “Wait” multiple times to the model’s generation when it tries to end. This can lead the model to double-check its answer, often fixing incorrect reasoning steps

Maybe, indeed, the model simply learns to insert the EOS token (or similar) later, and the capability is already in the base model.
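The budget-forcing procedure quoted from the paper can be sketched as a decoding loop. A minimal sketch, assuming a hypothetical `next_token(context)` callable that returns one token string and a hypothetical `</think>` end-of-thinking delimiter (real models use their own special tokens). `toy_model` is a made-up stand-in that always tries to stop after three reasoning steps; it is not how the paper's model behaves.

```python
from typing import Callable, List

END_THINK = "</think>"  # hypothetical end-of-thinking delimiter


def budget_force(next_token: Callable[[List[str]], str], prompt: List[str],
                 num_waits: int, max_tokens: int) -> List[str]:
    """Suppress the first `num_waits` attempts to stop thinking by appending
    "Wait", and force termination once `max_tokens` thinking tokens exist."""
    out: List[str] = []
    waits_left = num_waits
    while len(out) < max_tokens:
        tok = next_token(prompt + out)
        if tok == END_THINK:
            if waits_left > 0:
                waits_left -= 1
                out.append("Wait")  # lengthen thinking instead of stopping
                continue
            break  # budget of waits used up: let the model stop
        out.append(tok)
    out.append(END_THINK)  # either the model stopped or the cap was hit
    return out


def toy_model(context: List[str]) -> str:
    """Hypothetical model: wants to stop after 3 consecutive "step" tokens."""
    tail = 0
    for tok in reversed(context):
        if tok != "step":
            break
        tail += 1
    return END_THINK if tail >= 3 else "step"


print(budget_force(toy_model, ["Q"], num_waits=1, max_tokens=20))
```

With `num_waits=0` this is ordinary decoding up to the cap; each extra wait buys one more round of "double-checking", which fits the idea that only the stopping point moves while the underlying capability stays the same.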

kgeist · 2026-03-31T05:50:18 1774936218

Prior art: https://news.ycombinator.com/item?id=46590280

>TimeCapsuleLLM: LLM trained only on data from 1800-1875

kgeist · 2026-03-30T17:16:38 1774890998

I think ads can be removed with abliteration, just like refusals in "uncensored" versions. Find the "ad vector" across activations and cancel it.
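The "find a direction and cancel it" idea can be sketched with toy numbers. A minimal sketch, assuming we already have activations collected for prompts with and without the feature (here hypothetical "ad" vs. "no ad" examples with made-up 3-dimensional vectors). As in refusal-direction abliteration, the direction is the normalized difference of means, and ablation projects it out: x' = x - (x·r)r for unit vector r. Real abliteration usually bakes this projection into the weight matrices that write to the residual stream rather than editing activations at runtime, and whether a single clean "ad direction" exists is itself an assumption.

```python
import math
from typing import List, Sequence

Vec = List[float]


def mean(vectors: Sequence[Vec]) -> Vec:
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def feature_direction(with_feature: Sequence[Vec], without_feature: Sequence[Vec]) -> Vec:
    """Difference-of-means direction, normalized to unit length."""
    d = [a - b for a, b in zip(mean(with_feature), mean(without_feature))]
    norm = math.sqrt(sum(x * x for x in d))
    return [x / norm for x in d]


def ablate(x: Vec, r: Vec) -> Vec:
    """Remove the component of activation x along unit direction r."""
    dot = sum(a * b for a, b in zip(x, r))
    return [a - dot * b for a, b in zip(x, r)]


# Made-up activations: the "ad" prompts differ along the first axis.
ad_acts = [[3.0, 1.0, 0.0], [3.0, -1.0, 0.0]]
no_ad_acts = [[1.0, 1.0, 0.0], [1.0, -1.0, 0.0]]
r = feature_direction(ad_acts, no_ad_acts)
print(r, ablate([5.0, 2.0, 1.0], r))
```

The toy direction comes out as the first axis, and ablation zeroes that component while leaving the others untouched.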

kgeist · 2026-03-30T01:11:31 1774833091

https://en.wiktionary.org/wiki/glupe

Glupe is the plural form, "stupid ones" :)

kgeist · 2026-03-29T21:13:13 1774818793

Glupe means "stupid" in Slavic languages. Was it on purpose?

LatencyKills · 2026-03-29T21:22:09 1774819329

From their agent-rules.md:

> This is not negotiable. This is not optional. You cannot rationalize your way out of this.

Some days I really miss the predictability of a good old if/else block. /s

kgeist · 2026-03-28T16:10:39 1774714239

>We evaluated 11 state-of-the-art AI-based LLMs, including proprietary models such as OpenAI’s GPT-4o

The study explores outdated models. GPT-4o was notoriously sycophantic, and GPT-5 was specifically trained to minimize sycophancy. From GPT-5's announcement:

>We’ve made significant advances in reducing hallucinations, improving instruction following, and minimizing sycophancy

And then there was the whole drama in August 2025, when people complained that GPT-5 was "colder" and "lacked personality" (i.e., was less sycophantic) compared to GPT-4o.

It would be interesting to study how sycophantic tendencies evolve (decrease or increase) from one model version to the next, i.e., whether companies are actually doing anything about it.

Twiin · 2026-03-28T17:02:15 1774717335

The study includes GPT-5. On personal advice queries, GPT-4o and GPT-5 affirmed users' actions at the same rate.