What language is this article talking where compilers don't optimize multiplicat...

orthoxerox · 2026-03-23T09:13:35 1774257215

Well, Sawyer started writing Transport Tycoon in 1992, when free or affordable C compilers were not as widely available. Turbo C was never known for optimizations. GCC 1.40 was good enough for Linus, but I guess Chris was already a good assembly programmer.

fweimer · 2026-03-25T20:28:03 1774470483

Turbo C++ 3.0 from 1992 already does this optimization, according to a quick experiment here: https://turboc.pages.dev/

       0: 55                    push   %bp
       1: 8b ec                 mov    %sp,%bp
       3: 8b 46 04              mov    0x4(%bp),%ax
       6: c1 e0 02              shl    $0x2,%ax
       9: eb 00                 jmp    0xb
       b: 5d                    pop    %bp
       c: c3                    ret

That's for:

    int
    main (int argc, char **argv)
    {
       return argc * 4;
    }

Disassembled with contemporary binutils (not sure whether Turbo C++ came with a disassembler).

shakow · 2026-03-22T21:49:10 1774216150

That's what I would have thought as well, but it looks like on x86 both clang and gcc use variations of LEA. If they're doing it this way, I'm pretty sure it must be faster, because even if you change the ×4 to a <<2, it still generates a LEA.

https://godbolt.org/z/EKj58dx9T
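
For anyone who wants to reproduce this locally, here is a minimal sketch of the two source forms being compared. The function names are made up for illustration, and `unsigned` is an assumption on my part so that the shift is well defined for every input. Compiling with something like `gcc -O2 -S` on x86-64 should let you check whether both functions end up as the same instruction.

```c
#include <assert.h>

/* Hypothetical names, for illustration only. */
unsigned times4_mul(unsigned x) { return x * 4u; }  /* written as a multiply */
unsigned times4_shl(unsigned x) { return x << 2; }  /* written as a shift */

/* Sanity check that both forms agree, including when the result wraps. */
void check_times4(void)
{
    unsigned samples[] = { 0u, 1u, 7u, 1000u, 0x40000000u, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(times4_mul(samples[i]) == times4_shl(samples[i]));
}
```

Since the two functions are semantically identical for unsigned operands, the compiler is free to pick whichever instruction it considers cheapest for either one.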

shaggie76 · 2026-03-22T22:59:50 1774220390

Not only is LEA more flexible, I believe it's also preferred to SHL even for simple operations because it doesn't modify the flags register, which can make it easier to schedule.

fweimer · 2026-03-23T07:52:34 1774252354

It's more about the non-destructive destination part, which can avoid a move. Compilers tend to prefer SHL/SAL over LEA because its encoding is shorter: https://godbolt.org/z/9Tsq3hKnY

Cold_Miserable · 2026-03-23T03:55:11 1774238111

shlx doesn't alter the flags register.

fweimer · 2026-03-23T07:54:42 1774252482

SHLX does not support an immediate operand. Non-destructive shifts with immediate operands only arrive with APX, where they are among the most commonly used instructions (besides paired pushes/pops).

adrian_b · 2026-03-22T22:41:35 1774219295

They use LEA for multiplying by small constants up to 9 (not only by powers of two, but also by 3, 5 and 9; even more values could be achieved with two LEAs, but it may not be worthwhile).

For multiplying by powers of two greater than or equal to 16, they use shift left, because LEA can no longer be used.
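
To make the pattern concrete, here is a rough sketch in C. The helper names are hypothetical. Each of the first three functions is written as a single "base + index * scale" expression, which is the shape one x86 LEA can compute when the scale is 1, 2, 4 or 8. Whether a given compiler actually emits LEA for them depends on the target and flags, so treat this as an illustration of the arithmetic rather than a guarantee about codegen.

```c
#include <assert.h>

/* Hypothetical helpers: x + x*scale, with scale in {2, 4, 8}. */
unsigned mul3(unsigned x) { return x + x * 2u; }
unsigned mul5(unsigned x) { return x + x * 4u; }
unsigned mul9(unsigned x) { return x + x * 8u; }

/* 16 is not a valid LEA scale factor, so a plain shift is the natural form. */
unsigned mul16(unsigned x) { return x << 4; }

/* Check each decomposition against an ordinary multiplication. */
void check_small_constants(void)
{
    for (unsigned x = 0; x < 1000u; x++) {
        assert(mul3(x) == x * 3u);
        assert(mul5(x) == x * 5u);
        assert(mul9(x) == x * 9u);
        assert(mul16(x) == x * 16u);
    }
}
```

In practice you would just write `x * 9` and let the compiler choose; the decomposed forms only show why 3, 5 and 9 are the "cheap" odd multipliers.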

Validark · 2026-03-23T06:57:13 1774249033

Using an lea is better when you want to put the result in a different register than the source and/or you don't want to modify the flags register. shlx also avoids modifying flags, but you can't shift by an immediate, so you need to load the constant into a register beforehand. In terms of speed, all these options are basically equivalent, although with very slightly different costs to instruction caches and the register renaming in the scheduler. In terms of execution, a shift is always 1 cycle on modern hardware.

cjbgkagh · 2026-03-22T21:24:06 1774214646

It was written in assembly, so it goes through an assembler instead of a compiler.

rawling · 2026-03-22T22:19:07 1774217947

I assume GP is talking about the bit in the article that goes

> RCT does this trick all the time, and even in its OpenRCT2 version, this syntax hasn’t been changed, since compilers won’t do this optimization for you.

cjbgkagh · 2026-03-22T23:03:48 1774220628

That makes more sense. I second their sentiment: modern compilers will do this. I guess the trick is knowing to use numbers that have these options.

bombcar · 2026-03-22T23:16:15 1774221375

There was a recent article on HN about which compiler optimizations would occur and which wouldn't, and it was surprising in two ways. First, the compiler would make some optimizations that you might not expect. Second, it would not make others that you would expect, because in some obscure calling method they wouldn't work. Fixing that path would usually get the expected optimization.