RISC-V does not have the pitfalls of experimental ISAs from 45 years ago, but it...

camel-cdr · 2026-03-11T21:37:42 1773265062

OK, look.

Since my previous attempt to measure the impact of trap on signed overflow didn't seem to have moved your position one bit, I thought I'd give it a go in the most representative way I could think of:

I built the same version of clang on x86, aarch64 and RISC-V systems using clang. Then I built another version with the `-ftrapv` flag enabled and compared the compile times of programs compiled with these clang builds running on real hardware:

    runtime:         x86         | aarch64                    | RISC-V (RVA23)
                     Zen1        |  A78          A55*         |  X100         A100  !!! all cores clocked to about 2.2GHz, Zen1 can reach almost 4GHz
    clang A:         3.609±0.078 |  4.209±0.050   9.390±0.029 |  5.465±0.070  11.559±0.020
    clang-ftrapv A:  3.613±0.118 |  4.290±0.050   9.418±0.056 |  5.448±0.060  11.579±0.030
    clang B:         8.948±0.100 | 10.983±0.188  22.827±0.016 | 13.556±0.016  28.682±0.023
    clang-ftrapv B:  8.960±0.125 | 11.099±0.294  22.802±0.039 | 13.511±0.018  28.741±0.050

As you can see, once again the overhead of -ftrapv is quite low.

Surprisingly, the -ftrapv overhead seems the highest on the Cortex-A78. My guess is that this is because clang generates a separate brk with a unique immediate for every overflow check, while on RISC-V it always branches to one unimp per function.
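
As a rough mental model of what `-ftrapv` asks for (a sketch only, not clang's actual lowering): each signed addition behaves as if it went through a checked helper. The name `add_trapv` is hypothetical, and `abort()` stands in for the trap instruction (`brk`, `unimp`, ...) mentioned above. `__builtin_add_overflow` is the GCC/Clang builtin.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a + b, or terminates the program if the
 * signed addition overflows. abort() stands in for a trap instruction. */
int add_trapv(int a, int b) {
    int sum;
    if (__builtin_add_overflow(a, b, &sum)) {
        abort(); /* overflow path: rarely taken, usually placed out of line */
    }
    return sum;
}
```

The overflow path is cold, which is consistent with the small runtime differences in the table.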

Please tell me if you have a better suggestion for measuring the real world impact.

Or heck, give me some artificial worst case code. That would also be an interesting data point.

Notes:

* The format is mean±variance

* Spacemit X100 is a Cortex-A76 like OoO RISC-V core and A100 an in-order RISC-V core.

* I tried to clock all of the cores to the same frequency of about 2.2GHz. *Except for the A55, which ran at 1.8GHz, but I linearly scaled the results.

* Program A was the chibicc (8K loc) compiler and program B microjs (30K loc).

    binary size:
                  x86        aarch64    RISC-V
    clang:        212807768  216633784  195231816
    clang-ftrapv: 212859280  216737608  195419512
    increase:     0.024%     0.048%     0.096%
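
The "increase" row can be recomputed from the two size rows. A minimal sketch, assuming the increase is defined as 100 * (after - before) / before; `size_increase_percent` is a hypothetical helper name.

```c
#include <stdint.h>

/* Relative size increase in percent: 100 * (after - before) / before.
 * The byte counts are converted to double first so that a shrinking
 * binary gives a negative result instead of an unsigned wraparound. */
double size_increase_percent(uint64_t before, uint64_t after) {
    return 100.0 * ((double)after - (double)before) / (double)before;
}
```

Calling it with the byte counts from the table, e.g. `size_increase_percent(212807768u, 212859280u)` for x86, gives the value for that column.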

purplesyringa · 2026-03-12T05:40:26 1773294026

I suspect that LLVM is optimized for compiling with `-ftrapv`, perhaps for cheap sanitizing or maybe just due to design decisions like using unsigned integers everywhere (please correct me if I'm wrong). I'm personally interested in how RISC-V behaves on computational tasks where computing carry is a known bottleneck, like long addition. Maybe looking at libgmp could be interesting, though I suspect absolute numbers will not be meaningful, and there's no baseline to compare them to.
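
For context, this is roughly what long addition looks like on an ISA without a carry flag: the carry is recovered with comparisons. A minimal sketch in portable C; the function name `limb_add` and the least-significant-limb-first layout are assumptions for illustration, not taken from libgmp.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n 64-bit limbs, least significant limb first.
 * Returns the carry out of the most significant limb (0 or 1). */
uint64_t limb_add(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n) {
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + b[i];   /* wraps modulo 2^64 */
        uint64_t c1 = s < a[i];     /* carry out of a[i] + b[i] */
        uint64_t t = s + carry;
        uint64_t c2 = t < s;        /* carry out of adding the incoming carry */
        r[i] = t;
        carry = c1 | c2;            /* at most one of c1, c2 can be set */
    }
    return carry;
}
```

Each limb depends on the carry of the previous one, which is the serial dependency chain that makes this a bottleneck.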

camel-cdr · 2026-03-14T09:59:56 1773482396

LLVM mostly uses size_t like most C/C++ programs, which either use size_t or int for everything, both of which are handled well by RISC-V.

> Maybe looking at libgmp could be interesting, though I suspect absolute numbers will not be meaningful, and there's no baseline to compare them to.

Realistically, nobody cares about BigInt addition performance, considering there is no GMP implementation using SIMD, or even any using dependency breaking to get beyond 64-bit per cycle.

I whipped up a quick AVX-512 implementation that was 2x faster than libgmp on Zen4 (which has 256-bit SIMD ALUs). On RISC-V you'd just use RVV to do BigInt stuff.

purplesyringa · 2026-03-16T21:38:04 1773697084

"nobody cares about BigInt addition performance" is an odd claim to make when half of the world's cryptography is based on ECC.

hackyhacky · 2026-03-10T22:04:35 1773180275

> On the other hand, detecting integer overflow in software is extremely expensive, increasing both the program size and the execution time considerably,

Most languages don't care about integer overflow. Your typical C program will happily wrap around.

If I really want to detect overflow, I can do this:

    add t0, a0, a1
    blt t0, a0, overflow

Which is one more instruction, which is not great, not terrible.

sitharus · 2026-03-10T23:47:01 1773186421

Because the other commenter wasn’t posting the actual answer, I went to find the documentation about checking for integer overflow and it’s right here https://docs.riscv.org/reference/isa/unpriv/rv32.html#2-1-4-...

And what did I find? Yep that code is right from the manual for unsigned integer overflow.

For signed addition if you know one of the signs (eg it’s a compile time constant) the manual says

  addi t0, t1, +imm
  blt t0, t1, overflow

But the general case for signed addition if you need to check for overflow and don’t have knowledge of the signs

  add t0, t1, t2
  slti t3, t2, 0
  slt t4, t0, t1
  bne t3, t4, overflow
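
The same general-case check written in C, as a sketch: the sum has overflowed exactly when "b is negative" disagrees with "the sum is smaller than a". The name `signed_add_overflows` is hypothetical. The wrapping add goes through `uint64_t`; converting the result back to `int64_t` is implementation-defined before C23, and this sketch assumes the two's-complement wraparound that GCC and Clang document.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the add / slti / slt / bne sequence above.
 * Stores the wrapped sum in *sum and returns true on signed overflow. */
bool signed_add_overflows(int64_t a, int64_t b, int64_t *sum) {
    int64_t s = (int64_t)((uint64_t)a + (uint64_t)b); /* add  t0, t1, t2 */
    bool b_negative = b < 0;                          /* slti t3, t2, 0  */
    bool sum_less = s < a;                            /* slt  t4, t0, t1 */
    *sum = s;
    return b_negative != sum_less;                    /* bne  t3, t4     */
}
```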

From what I’ve read most native compiled code doesn’t really check for overflows in optimised builds, but this is more of an issue for JavaScript et al where they may detect the overflow and switch the underlying type? I’m definitely no expert on this.

sitharus · 2026-03-11T01:41:23 1773193283

A bit more reading shows there's a three instruction general case version for 32-bit additions on the 64-bit RISC-V ISA. I'm not familiar with RISC-V assembly and they didn't provide an example, but I _think_ it's as easy as this since 64-bit add wouldn't match the 32-bit overflowed add.

  add t0, t1, t2
  addw t3, t1, t2
  bne t0, t3, overflow
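
In C terms, that idea amounts to comparing the exact 64-bit sum with the 32-bit wrapped sum, as in the sketch below. It assumes both inputs are already sign-extended 32-bit values, and the name `add32_overflows` is hypothetical. The narrowing conversion relies on the two's-complement wraparound that GCC and Clang provide.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit signed overflow check using 64-bit arithmetic. */
bool add32_overflows(int32_t a, int32_t b) {
    int64_t wide = (int64_t)a + (int64_t)b;               /* like add: exact sum */
    int32_t narrow = (int32_t)((uint32_t)a + (uint32_t)b); /* like addw: wrapped to 32 bits */
    return wide != (int64_t)narrow;                       /* like bne: differ on overflow */
}
```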

userbinator · 2026-03-11T04:11:57 1773202317

Contrast with x86:

    add eax, ecx
    jo overflow

rwmj · 2026-03-11T10:26:29 1773224789

Neither x86-64 nor RISC-V is implemented by running each single instruction. They both recognize patterns in the code and translate those into micro-ops. On high performance chips like Rivos's (now Meta's) I doubt there'd be any difference in the amount of work done.

Code size is a benefit for x86-64 however - no one is arguing that - but you have to trade that against the difficulty of instruction decoding.

userbinator · 2026-03-12T00:35:27 1773275727

I thought the main distinction of RISC-V (and MIPS before it, along with RISCs in general) is that the instructions are themselves of equivalent complexity (or lack thereof) as x86 uops. E.g. x86 can add a register to memory, which splits into 3 load / add / store uops, but a RISC would execute those 3 instructions directly.

sitharus · 2026-03-12T03:52:34 1773287554

The main distinction now is RISC-descended designs use a load-modify-store instruction set with all ALU functions being register-register, and consequently have a lot more (visible) registers than CISC-descended ISAs (mostly just x86 really).

Historically RISC instructions were 1:1 with CPU operations, in theory allowing the compiler to better optimise logic, but this isn't really true anymore. High performance ARM CPUs use µOPs and macro-op fusion, though not to the extent of x86 CPUs.

This document from ARM has some details on how they use micro-ops, https://developer.arm.com/documentation/102160/latest

snvzz · 2026-03-12T03:37:24 1773286644

>Code size is a benefit for x86-64 however

Except it isn't. Code isn't one single pattern repeating again and again; on large enough bodies of code, RISC-V is the most dense, and it's not even close.

userbinator · 2026-03-12T05:31:19 1773293479

Decades of demoscene productions beg to differ. That just means compilers are awful, as they usually are.[1] x86 has far more optimisation opportunities than any RISC.

[1] https://news.ycombinator.com/item?id=15720923

snvzz · 2026-03-12T08:12:56 1773303176

In absence of better data, we have to compare compiler output.

userbinator · 2026-03-13T01:44:43 1773366283

Here is your "better data": https://web.eece.maine.edu/~vweaver/papers/iccd09/ll_documen...

sitharus · 2026-03-13T04:29:59 1773376199

If I recall my lectures correctly (they were 20-odd years ago now):

CISC ISAs were historically designed for humans writing assembly so they have single instructions with complex behaviour and consequently very high instruction density.

RISC was designed to eliminate the complex decoding logic and replace it with compiler logic, using higher throughput from the much reduced decoding logic (or in some cases no decoding at all) to offset the increased number of instructions. Also the transistors that were used for decoding could be used for additional ALUs to increase parallelism.

So RISC by its nature is more verbose.

Does the tradeoff still make sense? Depends who you ask.

snvzz · 2026-03-13T12:18:32 1773404312

From 2017; it predates RISC-V's first ratified spec.

Currently, RISC-V holds the crown of code density in both 64 and 32 bit.

On 32bit, thumb2 is a little behind. On 64bit, x86-64 is not even close, and ARMv8/v9 are even worse.

userbinator · 2026-03-13T19:21:20 1773429680

You've shown absolutely zero evidence.

"Maybe if I keep repeating it, it'll be true."

snvzz · 2026-03-14T09:08:52 1773479332

I am sure you are capable of running a compiler and/or running `size` on Ubuntu binaries.

adrian_b · 2026-03-10T22:44:26 1773182666

That is not the correct way to test for integer overflow.

The correct sequence of instructions is given in the RISC-V documentation and it needs more instructions.

"Integer overflow" means "overflow in operations with signed integers". It does not mean "overflow in operations with non-negative integers". The latter is normally referred to as "carry".

The 2 instructions given above detect carry, not overflow.

Carry is needed for multi-word operations, and these are also painful on RISC-V, but overflow detection is required much more frequently, i.e. it is needed at any arithmetic operation, unless it can be proven by static program analysis that overflow is impossible at that operation.
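
To make the carry/overflow distinction concrete, here is a small sketch on 8-bit values (the `add8` helper and `add8_flags` struct are hypothetical names for illustration). 0x7F + 0x01 overflows as a signed addition without producing a carry, while 0xFF + 0x01 produces a carry without signed overflow.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t result; /* low 8 bits of the sum */
    bool carry;     /* unsigned result did not fit in 8 bits */
    bool overflow;  /* signed result did not fit in 8 bits */
} add8_flags;

add8_flags add8(uint8_t a, uint8_t b) {
    unsigned wide = (unsigned)a + (unsigned)b;
    uint8_t r = (uint8_t)wide;
    add8_flags f;
    f.result = r;
    f.carry = wide > 0xFF;
    /* Signed overflow: both operands have a sign bit that differs from
     * the sign bit of the result. */
    f.overflow = ((a ^ r) & (b ^ r) & 0x80) != 0;
    return f;
}
```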

brohee · 2026-03-11T12:37:44 1773232664

It's one more instruction only if you don't fuse those instructions in the decoder stage, but as the pattern is the one expected to be generated by compilers, implementations that care about performance are expected to fuse them.

refulgentis · 2026-03-10T22:32:02 1773181922

I have no idea or practical experience with anything this low-level, so idk how much the following matters, it's just someone from the crowd offering unvarnished impressions:

It's easy to believe you're replying to something that has an element of hyperbole.

It's hard to believe "just do 2x as many instructions" and "ehhh who cares [i.e. your typical C program doesn't check for overflow]", coupled to a seemingly self-conscious repetition of a quip from the television series Chernobyl that is meant to reference sticking your head in the sand, retire the issue from discussion.

adrian_b · 2026-03-10T22:46:29 1773182789

There was no hyperbole in what I have said.

The sequence of instructions given above is incorrect, it does not detect integer overflow (i.e. signed integer overflow). It detects carry, which is something else.

The correct sequence, which can be found in the official RISC-V documentation, requires more instructions.

Not checking for overflow in C programs is a serious mistake. All decent C compilers have compilation options for enabling checking for overflow. Such options should always be used, with the exception of the functions that have been analyzed carefully by the programmer and the conclusion has been that integer overflow cannot happen.

For example with operations involving counters or indices, overflow cannot normally happen, so in such places overflow checking may be disabled.

adgjlsfhk1 · 2026-03-10T22:20:42 1773181242

> On the other hand, detecting integer overflow in software is extremely expensive

this just isn't true. both addition and multiplication can check for overflow in <2 instructions.

nine_k · 2026-03-10T23:36:12 1773185772

Fewer than two is exactly one instruction. Which?

adgjlsfhk1 · 2026-03-11T00:19:05 1773188345

dammit I meant <=2. https://godbolt.org/z/4WxeW58Pc sltu or snez for add/multiply respectively.
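
For the unsigned add case, the single extra comparison looks like this in C. This is a sketch with a hypothetical function name, not the code behind the godbolt link.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stores the wrapped sum in *sum and returns true if a + b did not fit
 * in 64 bits. Unsigned arithmetic wraps by definition in C, so the sum
 * is smaller than an operand exactly when a carry out occurred. */
bool uadd_overflows(uint64_t a, uint64_t b, uint64_t *sum) {
    *sum = a + b;
    return *sum < a; /* the one comparison the comment maps to sltu */
}
```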

kbolino · 2026-03-11T15:45:32 1773243932

This result is misleading.

First, the code claims to be returning "unsigned long" from each of these functions, but the value will only ever be 0 or 1 (see [1]). The code is actually throwing away the result and just returning whether overflow occurred. If we take unsigned long *c as another argument to the function, so that we actually keep the result, we end up having to issue an extra instruction for multiplication (see [2]; I'm ignoring the sd instruction since it is simply there to dereference the *c pointer and wouldn't exist if the function got inlined).

Second, this is just unsigned overflow detection. If we do signed overflow detection, now we're up to 5 instructions for add and mul (see [3]). Considering that this is the bigger challenge, it compares quite unfavorably to architectures where this is just 2 instructions: the operation itself and a branch against a condition flag.

[1]: https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins...

[2]: https://godbolt.org/z/7rWWv57nx

[3]: https://godbolt.org/z/PnzKaz4x5
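
A sketch of the "keep the result" shape described in the first point, using the GCC/Clang builtin from [1]. The wrapper name `checked_mul` is hypothetical and this is not the exact code behind the links.

```c
#include <stdbool.h>

/* Stores the (possibly wrapped) product in *product and returns true if
 * the signed multiplication overflowed. The builtin's return value is
 * only the overflow flag; the product goes through the pointer. */
bool checked_mul(long a, long b, long *product) {
    return __builtin_mul_overflow(a, b, product);
}
```

Compiling a wrapper like this for each target is one way to compare the instruction counts discussed above.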

adgjlsfhk1 · 2026-03-11T16:17:48 1773245868

That's fair. The good news is that for signed overflow, you can claw back to the cost of unsigned overflow if you know the sign of either argument (which is fairly common).
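
A sketch of that cheaper case in C, assuming the second operand is known to be non-negative (for example a positive compile-time constant). The function name is hypothetical, and the conversion back to `int64_t` relies on the two's-complement wraparound that GCC and Clang provide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Precondition: b >= 0. Under that assumption the wrapped sum is smaller
 * than a exactly when the signed addition overflowed, so one signed
 * comparison is enough. */
bool add_nonneg_overflows(int64_t a, int64_t b, int64_t *sum) {
    *sum = (int64_t)((uint64_t)a + (uint64_t)b); /* wrapping add */
    return *sum < a;
}
```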

kbolino · 2026-03-11T16:21:54 1773246114

Yeah, it's not the end of the world, and as others mentioned, a good implementation can recognize the instruction pattern and optimize for it.

It's just a bizarre design choice. I understand wanting to get rid of condition flags, but not replacing them with nothing at all.

EDIT: It seems the same choice was made by MIPS, which is a clear inspiration for RISC-V.

adgjlsfhk1 · 2026-03-11T16:43:42 1773247422

The argument is that there are actually 4 distinct forms of replacement:

1. 64 bit signed math is a lot less overflow vulnerable than the 16/32 bit math that was extremely common 20 years ago

2. For the BigInt use-case, the RISC-V design is pretty sensible since you want the top bits, not just presence of overflow

3. You can do integer operations on the FPU (using the inexact flag for detecting if rounding occurred).

4. Adding overflow detecting instructions can easily be done in an extension in the future if desired.
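
Point 2 can be sketched in C as follows: the upper 64 bits of a 64x64-bit product are what a BigInt multiply needs, and testing them against zero also answers "did the unsigned multiply overflow". This assumes the `unsigned __int128` extension that GCC and Clang offer on 64-bit targets; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper 64 bits of the full 128-bit product of a and b. */
uint64_t mul_hi(uint64_t a, uint64_t b) {
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* The 64-bit product overflowed exactly when the upper half is nonzero. */
bool umul_overflows(uint64_t a, uint64_t b) {
    return mul_hi(a, b) != 0;
}
```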

kbolino · 2026-03-11T17:42:50 1773250970

I think in the case of MIPS, at least, the decision logic was simply: condition flags behave like an implicit register, making the use of that register explicit would complicate the instruction encoding, and that complication would be for little benefit since most compilers ignore flags anyway, except for situations which could be replaced with direct tests on the result(s).

adrian_b · 2026-03-10T22:51:26 1773183086

[flagged]

burntoutgray · 2026-03-11T00:47:59 1773190079

+1 -- misinformation is best corrected quickly. If not, AI will propagate it and many will believe the erroneous information. I guess that would be viral hallucinations.

bigstrat2003 · 2026-03-11T16:15:46 1773245746

One can quickly correct misinformation without being rude. It's not hard, and does not lessen the impact of the correction to do so. There's no reason to tolerate the kind of rudeness the parent post exhibits.