Why is the optimizer not allowed to fold in "constant context"? (C++)

__builtin_is_constant_evaluated is the builtin used to implement std::is_constant_evaluated in the standard library on clang and gcc.
Code that is not valid in a constant context is also often harder for the optimizer to constant-fold.
For example:
int f(int i) {
    if (__builtin_is_constant_evaluated())
        return 1;
    else {
        int* ptr = new int(1);
        int i = *ptr;
        delete ptr;
        return i;
    }
}
is emitted by gcc -O3 as:
f(int):
        sub     rsp, 8
        mov     edi, 4
        call    operator new(unsigned long)
        mov     esi, 4
        mov     rdi, rax
        call    operator delete(void*, unsigned long)
        mov     eax, 1
        add     rsp, 8
        ret
So the optimizer used __builtin_is_constant_evaluated() == 0.
Clang does fold this to a constant, but only because its optimizer can remove the unneeded dynamic allocation, not because it assumed __builtin_is_constant_evaluated() == 1.
I am aware that this would make the return value of __builtin_is_constant_evaluated() implementation-defined, because optimizations vary from one compiler to another. But is_constant_evaluated should already be used only when both paths have the same observable behavior.
Why does the optimizer not assume __builtin_is_constant_evaluated() == 1 and fall back to __builtin_is_constant_evaluated() == 0 only if it wasn't able to fold?

Per [meta.const.eval]:
constexpr bool is_constant_evaluated() noexcept;
Returns: true if and only if evaluation of the call occurs within the evaluation of an expression or conversion that is manifestly
constant-evaluated ([expr.const]).
f can never be invoked in a constant-evaluated expression or conversion, so std::is_constant_evaluated() returns false. This is decided by the compiler and has nothing to do with the optimizer.
Of course, if the optimizer can prove that the two branches are equivalent, it may constant-fold the call. But that is an optimization after all, beyond the scope of the C++ language itself.
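As a small self-contained illustration of that wording (a minimal sketch, not taken from the question; needs -std=c++20):

#include <type_traits>

constexpr int g() {
    return std::is_constant_evaluated() ? 1 : 2;
}

constexpr int a = g();   // manifestly constant-evaluated: a == 1
int b() { return g(); }  // ordinary call: the language says false here, so the
                         // optimizer may fold this to 2, but never to 1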
But why is it this way? The proposal that introduced std::is_constant_evaluated is P0595. It explains the idea well:
constexpr double power(double b, int x) {
    if (std::is_constant_evaluated() && x >= 0) {
        // A constant-evaluation context: Use a
        // constexpr-friendly algorithm.
        double r = 1.0, p = b;
        unsigned u = (unsigned)x;
        while (u != 0) {
            if (u & 1) r *= p;
            u /= 2;
            p *= p;
        }
        return r;
    } else {
        // Let the code generator figure it out.
        return std::pow(b, (double)x);
    }
}

// ...

double thousand() {
    return power(10.0, 3);  // (3)
}
[...]
Call (3) is a core constant expression, but an implementation is not
required to evaluate it at compile time. We therefore specify that it
causes std::is_constant_evaluated() to produce false. It's
tempting to leave it unspecified whether true or false is produced
in that case, but that raises significant semantic concerns: The
answer could then become inconsistent across various stages of the
compilation. For example:
int *p, *invalid;

constexpr bool is_valid() {
    return std::is_constant_evaluated() ? true : p != invalid;
}

constexpr int get() { return is_valid() ? *p : abort(); }
This example tries to count on the fact that constexpr evaluation
detects undefined behavior to avoid the non-constexpr-friendly call to
abort() at compile time. However, if std::is_constant_evaluated()
can return true, we now end up with a situation where an important
run-time check is by-passed.

Related

Curious missed optimization of recursive constexpr function by Clang

Today I wanted to test how Clang would transform a recursive power-of-two function, and noticed that even with a known exponent the recursion is not optimized away, even when using constexpr.
#include <array>

constexpr unsigned int pow2_recursive(unsigned int exp) {
    if (exp == 0) return 1;
    return 2 * pow2_recursive(exp - 1);
}

unsigned int pow2_5() {
    return pow2_recursive(5);
}
pow2_5 is compiled as a call to pow2_recursive.
pow2_5():                            # @pow2_5()
        mov     edi, 5
        jmp     pow2_recursive(unsigned int)  # TAILCALL
However, when I use the result in a context that requires it to be known at compile time, it will correctly compute the result at compile time.
unsigned int pow2_5_arr() {
    std::array<int, pow2_recursive(5)> a;
    return a.size();
}
is compiled to
pow2_5_arr():                        # @pow2_5_arr()
        mov     eax, 32
        ret
Here is the link to the full example in Godbolt: https://godbolt.org/z/fcKef1
So, am I missing something here? Is there something that can change the result at runtime and a reason, that pow2_5 cannot be optimized in the same way as pow2_5_arr?
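(For what it is worth, routing the value through a constexpr variable is another context that forces the computation; a minimal sketch reusing pow2_recursive from above:)

unsigned int pow2_5_forced() {
    // Initializing a constexpr variable requires a constant expression,
    // so pow2_recursive(5) must be evaluated at compile time.
    constexpr unsigned int r = pow2_recursive(5);
    return r;
}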

When is a constexpr evaluated at compile time?

What assurances do I have that a core constant expression (as in [expr.const].2) possibly containing constexpr function calls will actually be evaluated at compile time and on which conditions does this depend?
The introduction of constexpr implicitly promises runtime performance improvements by moving computations into the translation stage (compile time).
However, the standard does not (and presumably cannot) mandate what code a compiler produces. (See [expr.const] and [dcl.constexpr]).
These two points appear to be at odds with each other.
Under which circumstances can one rely on the compiler resolving a core constant expression (which might contain an arbitrarily complicated computation) at compile time rather than deferring it to runtime?
At least under -O0, gcc appears to actually emit code and a call for a constexpr function. Under -O1 and up it doesn't.
Do we have to resort to trickery such as this, which forces the evaluation through the template system:
template <auto V>
struct compile_time_h { static constexpr auto value = V; };

template <auto V>
inline constexpr auto compile_time = compile_time_h<V>::value;

constexpr int f(int x) { return x; }

int main() {
    for (int x = 0; x < compile_time<f(42)>; ++x) {}
}
When a constexpr function is called and the output is assigned to a constexpr variable, it will always be run at compile time.
Here's a minimal example:
// Compile with -std=c++14 or later
constexpr int fib(int n) {
    int f0 = 0;
    int f1 = 1;
    for (int i = 0; i < n; i++) {
        int hold = f0 + f1;
        f0 = f1;
        f1 = hold;
    }
    return f0;
}

int main() {
    constexpr int blarg = fib(10);
    return blarg;
}
When compiled at -O0, gcc outputs the following assembly for main:
main:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], 55
        mov     eax, 55
        pop     rbp
        ret
Despite all optimization being turned off, there's never any call to fib in the main function itself.
This applies all the way back to C++11; however, in C++11 the fib function would have to be rewritten to use recursion in order to avoid mutable local variables.
Why does the compiler include the assembly for fib in the executable sometimes? A constexpr function can be used at runtime, and when invoked at runtime it will behave like a regular function.
Used properly, constexpr can provide some performance benefits in specific cases, but the push to make everything constexpr is more about writing code that the compiler can check for Undefined Behavior.
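A minimal sketch of that checking (the names here are made up for illustration): constant evaluation must diagnose undefined behavior, so the same call that silently misbehaves at run time is rejected outright at compile time.

constexpr int deref(const int* p) { return *p; }

// constexpr int at_compile_time = deref(nullptr);  // would not compile: the null
//                                                  // dereference is detected during
//                                                  // constant evaluation
int at_run_time() { return deref(nullptr); }        // compiles, but is undefined
                                                    // behavior when executed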
What's an example of constexpr providing performance benefits? When implementing a function like std::visit, you need to create a lookup table of function pointers. Creating the lookup table every time std::visit is called would be costly, and assigning the lookup table to a static local variable would still result in measurable overhead because the program has to check if that variable's been initialized every time the function is run.
Thankfully, you can make the lookup table constexpr, and the compiler will actually inline the lookup table into the assembly code for the function, so that the contents of the lookup table are significantly more likely to be inside the instruction cache when std::visit is run.
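A rough sketch of that idea (the handler functions and table below are made up for illustration and are not std::visit's actual internals):

#include <array>
#include <cstddef>

int handle_int(const void* p)    { return *static_cast<const int*>(p); }
int handle_double(const void* p) { return static_cast<int>(*static_cast<const double*>(p)); }

using handler = int (*)(const void*);

// Built entirely at compile time; no hidden "has this been initialized yet?"
// check is needed when dispatch() runs.
constexpr std::array<handler, 2> table{ &handle_int, &handle_double };

int dispatch(std::size_t index, const void* value) {
    return table[index](value);
}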
Does C++20 provide any mechanisms for guaranteeing that something runs at compiletime?
If a function is consteval, then the standard specifies that every call to the function must produce a compile-time constant.
This can be trivially used to force the compile-time evaluation of any constexpr function:
template<class T>
consteval T run_at_compiletime(T value) {
    return value;
}
Anything given as a parameter to run_at_compiletime must be evaluated at compile-time:
constexpr int fib(int n) {
    int f0 = 0;
    int f1 = 1;
    for (int i = 0; i < n; i++) {
        int hold = f0 + f1;
        f0 = f1;
        f1 = hold;
    }
    return f0;
}

int main() {
    // fib(10) will definitely run at compile time
    return run_at_compiletime(fib(10));
}
Never; the C++ standard permits almost the entire compilation to occur at "runtime". Some diagnostics have to be done at compile time, but nothing prevents insanity on the part of the compiler.
Your binary could be a copy of the compiler with your source code appended, and C++ wouldn't say the compiler did anything wrong.
What you are looking at is a QoI (Quality of Implementation) issue.
In practice, constexpr variables tend to be compile time computed, and template parameters are always compile time computed.
consteval can also be used to mark functions that must produce compile-time constants.

Is a standards conforming C++ compiler allowed to optimize away branching on <= 0 for unsigned integers?

Consider this code:
void foo(size_t value)
{
    if (value > 0) { ... }  // A
    if (value <= 0) { ... } // B
}
Since an unsigned cannot be negative, could a standards conforming C++ compiler optimize away the B statement? Or would it just choose to compare to 0?
Well, it clearly cannot optimise away the B statement altogether—the condition body does execute when value is 0.
Since value cannot, by any means, be < 0, the compiler can of course transform B into if (value == 0) { ... }. Furthermore, if it can prove (remember that the standard mandates strict aliasing rules!) that value is not changed by statement A, it can legally transform the entire function like this:
void foo(size_t value)
{
    if (value > 0) { ... }  // A
    else { ... }            // B
}
Or, if it happens to know that the target architecture likes == better, into this:
void foo(size_t value)
{
    if (value == 0) { ... } // B
    else { ... }            // A
}
If the code is correctly written, B cannot be optimized away, because value can be zero, though the particular comparison used can be replaced with an equivalent one as shown in Angew's answer. But if the statements in B invoke undefined behavior, all bets are off. For ease of reference, let's rewrite foo as
void foo(size_t value)
{
    if (value > 0) bar();  // A
    if (value <= 0) baz(); // B
}
If the compiler can determine that baz() invokes undefined behavior, then it can treat it as unreachable. From that, it can then deduce that value > 0, and optimize foo into
void foo(size_t value)
{
    bar();
}
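For instance, a baz with unconditional undefined behavior (hypothetical, purely to make the idea concrete) would license exactly that transformation:

#include <cstddef>

void bar();

void baz() {
    int* p = nullptr;
    *p = 42;  // undefined behavior on every execution
}

void foo(std::size_t value)
{
    if (value > 0) bar();   // A
    if (value <= 0) baz();  // B: a compiler that proves baz() always has
                            //    undefined behavior may treat this branch
                            //    as unreachable and drop the check
}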
Since the compound statement must be executed if the unsigned value == 0, a conforming compiler cannot optimize away if (value <= 0) { /* ... */ }.
An optimizing compiler will probably consider several things here:
Both statements are mutually exclusive
There is no code in between both of them.
value cannot be smaller than zero
There are several possible "outcomes" here, each consisting of one comparison and one conditional jump instruction.
I suspect test R,R is "more optimal" than cmp R, 0, but in general there is not much of a difference.
The resulting code can be (where Code A and Code B contain a ret):
Using cmp:

    cmp <value>, 0

A)
    je equal
    // Code A
equal:
    // Code B

B)
    jne nequal
    // Code B
nequal:
    // Code A

C)
    jg great
    // Code B
great:
    // Code A

D)
    jbe smoe
    // Code A
smoe:
    // Code B

Using test:

    test <value>, <value>

A)
    je equal
    // Code A
equal:
    // Code B

B)
    jne nequal
    // Code B
nequal:
    // Code A

Confused about the function return value

#include <iostream>
using namespace std;

int Fun(int x)
{
    int sum = 1;
    if (x > 1)
        sum = x * Fun(x - 1);
    else
        return sum;
}

int main()
{
    cout << Fun(1) << endl;
    cout << Fun(2) << endl;
    cout << Fun(3) << endl;
    cout << Fun(4) << endl;
    cout << Fun(5) << endl;
}
This function is supposed to compute the factorial of an integer. In the x > 1 branch there is no return statement, so the function should not return the correct answer.
But when Fun(4) and some other examples are tested, the right answers come out unexpectedly. Why?
The assembly code of this function (for the call Fun(4)) is:
0x004017E5 push %ebp
0x004017E6 mov %esp,%ebp
0x004017E8 sub $0x28,%esp
0x004017EB movl $0x1,-0xc(%ebp)
0x004017F2 cmpl $0x1,0x8(%ebp)
0x004017F6 jle 0x40180d <Fun(int)+40>
0x004017F8 mov 0x8(%ebp),%eax
0x004017FB dec %eax
0x004017FC mov %eax,(%esp)
0x004017FF call 0x4017e5 <Fun(int)>
0x00401804 imul 0x8(%ebp),%eax
0x00401808 mov %eax,-0xc(%ebp)
0x0040180B jmp 0x401810 <Fun(int)+43>
0x0040180D mov -0xc(%ebp),%eax
0x00401810 leave
0x00401811 ret
Maybe this is the reason: the value of sum is saved in register eax, and the return value is stored in eax too, so Fun returns the correct result.
Usually, the EAX register is used to store the return value, and it is also used for other work as well.
So whatever was loaded into that register just before the function returns becomes the return value, even if you didn't intend it to.
You can use the -S option to generate assembly code and see what happens to EAX right before the ret instruction.
When your program takes the if branch, no return statement finishes the function. The number you got is the result of undefined behavior.
int Fun(int x)
{
    int sum = 1.0;
    if (x > 1)
        sum = x * Fun(x - 1);
    else
        return sum;
    return x; // return something here
}
Just remove the else from your code:
int Fun(int x)
{
    int sum = 1;
    if (x > 1)
        sum = x * Fun(x - 1);
    return sum;
}
The code you have has a couple of errors:
you have an int being assigned the value 1.0 (which will be implicitly converted); not an error as such, but inelegant.
you have a return statement inside a conditional, so you only ever get a return when that condition is false (the else branch).
If you fix the issue with the return by removing the else, then all will be fine :)
As to why it works with 4 as an input, that is down to chance or some property of your environment; the code you posted should not be able to work, because when calculating the factorial of a positive int there will always be a call where x = 1 and no return is generated.
As an aside, for so straightforward a function you might consider the ternary operator and write something more concise/terse like:

int factorial(int x) { return (x > 1) ? (x * factorial(x - 1)) : 1; }

This is the function I use for my factorials and have had in my library for the last 30 or so years (since my C days) :)
From the C standard:
Flowing off the end of a function is equivalent to a return with no
value; this results in undefined behavior in a value-returning
function.
Your situation is the same as this one:
#include <stdio.h>

int fun1(int x)
{
    int sum = 1;
    if (x > 1)
        sum++;
    else
        return sum;
}

int main()
{
    int b = fun1(3);
    printf("%d\n", b);
    return 0;
}
It prints 2 on my machine.
This is calling-convention and architecture dependent. The return value here is the result of the last expression evaluation, which happens to be left in the eax register.
As stated in the comment this is undefined behaviour. With g++ I get the following warning.
warning: control reaches end of non-void function [-Wreturn-type]
On Visual C++, the warning is promoted to an error by default
error C4716: 'Fun' : must return a value
When I disabled the warning and ran the resulting executable, Fun(4) gave me 1861810763.
So why might it work under g++? During compilation conditional statements are turned into tests and jumps (or gotos). The function has to return something, and the simplest possible code for the compiler to produce is along the following lines.
int Fun(int x)
{
    int sum = 1.0;
    if (!(x > 1))
        goto RETURN;
    sum = x * Fun(x - 1);
RETURN:
    return sum;
}
This is consistent with your disassembly.
Of course you can't rely on undefined behaviour, as illustrated by the behaviour in Visual C++. Many shops have a policy to treat warnings as errors for this reason (also as suggested in a comment).

Detecting signed overflow in C/C++

At first glance, this question may seem like a duplicate of How to detect integer overflow?, however it is actually significantly different.
I've found that while detecting an unsigned integer overflow is pretty trivial, detecting a signed overflow in C/C++ is actually more difficult than most people think.
The most obvious, yet naive, way to do it would be something like:
int add(int lhs, int rhs)
{
    int sum = lhs + rhs;
    if ((lhs >= 0 && sum < rhs) || (lhs < 0 && sum > rhs)) {
        /* an overflow has occurred */
        abort();
    }
    return sum;
}
The problem with this is that according to the C standard, signed integer overflow is undefined behavior. In other words, according to the standard, as soon as you even cause a signed overflow, your program is just as invalid as if you dereferenced a null pointer. So you can't cause undefined behavior, and then try to detect the overflow after the fact, as in the above post-condition check example.
Even though the above check is likely to work on many compilers, you can't count on it. In fact, because the C standard says signed integer overflow is undefined, some compilers (like GCC) will optimize away the above check when optimization flags are set, because the compiler assumes a signed overflow is impossible. This totally breaks the attempt to check for overflow.
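A smaller illustration of the same effect (typical behaviour of gcc and clang at -O2, so treat it as an expectation rather than a guarantee):

// Because the compiler may assume x + 1 never overflows, this whole function
// is typically folded to "return true;" under optimization.
bool will_not_wrap(int x) {
    return x + 1 > x;
}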
So, another possible way to check for overflow would be:
int add(int lhs, int rhs)
{
    if (lhs >= 0 && rhs >= 0) {
        if (INT_MAX - lhs <= rhs) {
            /* overflow has occurred */
            abort();
        }
    }
    else if (lhs < 0 && rhs < 0) {
        if (lhs <= INT_MIN - rhs) {
            /* overflow has occurred */
            abort();
        }
    }
    return lhs + rhs;
}
This seems more promising, since we don't actually add the two integers together until we make sure in advance that performing such an add will not result in overflow. Thus, we don't cause any undefined behavior.
However, this solution is unfortunately a lot less efficient than the initial solution, since you have to perform a subtract operation just to test if your addition operation will work. And even if you don't care about this (small) performance hit, I'm still not entirely convinced this solution is adequate. The expression lhs <= INT_MIN - rhs seems exactly like the sort of expression the compiler might optimize away, thinking that signed overflow is impossible.
So is there a better solution here? Something that is guaranteed to 1) not cause undefined behavior, and 2) not provide the compiler with an opportunity to optimize away overflow checks? I was thinking there might be some way to do it by casting both operands to unsigned, and performing checks by rolling your own two's-complement arithmetic, but I'm not really sure how to do that.
No, your second version isn't correct, but you are close. If you set

int half = INT_MAX / 2;
int half1 = half + 1;

then the result of the addition half + half1 is INT_MAX (INT_MAX is always an odd number), so this is valid input. But in your routine you will have INT_MAX - half == half1 and you would abort. A false positive.
This error can be repaired by putting < instead of <= in both checks.
But your code also isn't optimal. The following would do:
int add(int lhs, int rhs)
{
    if (lhs >= 0) {
        if (INT_MAX - lhs < rhs) {
            /* would overflow */
            abort();
        }
    }
    else {
        if (rhs < INT_MIN - lhs) {
            /* would overflow */
            abort();
        }
    }
    return lhs + rhs;
}
To see that this is valid, you have to symbolically add lhs on both sides of the inequalities, and this gives you exactly the arithmetical conditions that your result is out of bounds.
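Spelled out (note that neither subtraction can itself overflow, because lhs is on the safe side of zero in each branch):

lhs >= 0:  INT_MAX - lhs < rhs   <=>   INT_MAX < lhs + rhs   (overflow above)
lhs <  0:  rhs < INT_MIN - lhs   <=>   lhs + rhs < INT_MIN   (overflow below)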
Your approach with subtraction is correct and well-defined. A compiler cannot optimize it away.
Another correct approach, if you have a larger integer type available, is to perform the arithmetic in the larger type and then check that the result fits in the smaller type when converting it back:
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

int sum(int a, int b)
{
    long long c;
    assert(LLONG_MAX > INT_MAX);
    c = (long long)a + b;
    if (c < INT_MIN || c > INT_MAX) abort();
    return c;
}
A good compiler should convert the entire addition and if statement into an int-sized addition and a single conditional jump-on-overflow and never actually perform the larger addition.
Edit: As Stephen pointed out, I'm having trouble getting a (not-so-good) compiler, gcc, to generate the sane asm. The code it generates is not terribly slow, but certainly suboptimal. If anyone knows variants on this code that will get gcc to do the right thing, I'd love to see them.
For the gcc case, from the gcc 5.0 release notes we can see it now provides __builtin_add_overflow for checking overflow in addition:
A new set of built-in functions for arithmetics with overflow checking has been added: __builtin_add_overflow, __builtin_sub_overflow and __builtin_mul_overflow and for compatibility with clang also other variants. These builtins have two integral arguments (which don't need to have the same type), the arguments are extended to infinite precision signed type, +, - or * is performed on those, and the result is stored in an integer variable pointed to by the last argument. If the stored value is equal to the infinite precision result, the built-in functions return false, otherwise true. The type of the integer variable that will hold the result can be different from the types of the first two arguments.
For example:
__builtin_add_overflow( rhs, lhs, &result )
We can see from the gcc document Built-in Functions to Perform Arithmetic with Overflow Checking that:
[...]these built-in functions have fully defined behavior for all argument values.
clang also provides a set of checked arithmetic builtins:
Clang provides a set of builtins that implement checked arithmetic for security critical applications in a manner that is fast and easily expressable in C.
In this case the builtin would be:
__builtin_sadd_overflow( rhs, lhs, &result )
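A minimal self-contained sketch of how the typed builtin can be wrapped (assuming gcc 5+ or clang; abort is from <cstdlib>):

#include <cstdlib>

int add_checked(int lhs, int rhs)
{
    int result;
    // Returns true if the mathematically exact sum does not fit in an int.
    if (__builtin_sadd_overflow(lhs, rhs, &result))
        abort();
    return result;
}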
The fastest possible way is to use the GCC builtin:
int add(int lhs, int rhs) {
    int sum;
    if (__builtin_add_overflow(lhs, rhs, &sum))
        abort();
    return sum;
}
On x86, GCC compiles this into:
        mov     %edi, %eax
        add     %esi, %eax
        jo      call_abort
        ret
call_abort:
        call    abort
which uses the processor's built-in overflow detection.
If you're not OK with using GCC builtins, the next fastest way is to use bit operations on the sign bits. Signed overflow in addition occurs when:
the two operands have the same sign, and
the result has a different sign than the operands.
The sign bit of ~(lhs ^ rhs) is on iff the operands have the same sign, and the sign bit of lhs ^ sum is on iff the result has a different sign than the operands. So you can do the addition in unsigned form to avoid undefined behavior, and then use the sign bit of ~(lhs ^ rhs) & (lhs ^ sum):
int add(int lhs, int rhs) {
    unsigned sum = (unsigned) lhs + (unsigned) rhs;
    if ((~(lhs ^ rhs) & (lhs ^ sum)) & 0x80000000)
        abort();
    return (int) sum;
}
This compiles into:
        lea     (%rsi,%rdi), %eax
        xor     %edi, %esi
        not     %esi
        xor     %eax, %edi
        test    %edi, %esi
        js      call_abort
        ret
call_abort:
        call    abort
which is quite a lot faster than casting to a 64-bit type on a 32-bit machine (with gcc):
        push    %ebx
        mov     12(%esp), %ecx
        mov     8(%esp), %eax
        mov     %ecx, %ebx
        sar     $31, %ebx
        clt
        add     %ecx, %eax
        adc     %ebx, %edx
        mov     %eax, %ecx
        add     $-2147483648, %ecx
        mov     %edx, %ebx
        adc     $0, %ebx
        cmp     $0, %ebx
        ja      call_abort
        pop     %ebx
        ret
call_abort:
        call    abort
IMHO, the easiest way to deal with overflow-sensitive C++ code is to use SafeInt<T>. This is a cross-platform C++ template, originally hosted on CodePlex, which provides the safety guarantees that you desire here.
https://github.com/dcleblanc/SafeInt
I find it very intuitive to use, as it provides many of the same usage patterns as normal numerical operations and reports overflows and underflows via exceptions.
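Roughly like this (a sketch from memory of the SafeInt usage pattern, so check the header for the exact operator set and exception policy):

#include <cstdlib>       // abort
#include "SafeInt.hpp"   // from the SafeInt project linked above

int add(int lhs, int rhs)
{
    try {
        SafeInt<int> sum(lhs);
        sum += rhs;   // the overloaded operator checks for signed overflow
        return sum;   // converts back to int
    } catch (...) {   // the default policy reports overflow by throwing
        abort();
    }
}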
If you use inline assembler you can check the overflow flag. Another possibility is to use a safeint datatype. I recommend reading this paper on Integer Security.
The obvious solution is to convert to unsigned, to get the well-defined unsigned overflow behavior:
int add(int lhs, int rhs)
{
    int sum = (unsigned)lhs + (unsigned)rhs;
    if ((lhs >= 0 && sum < rhs) || (lhs < 0 && sum > rhs)) {
        /* an overflow has occurred */
        abort();
    }
    return sum;
}
This replaces the undefined signed overflow behavior with the implementation-defined conversion of out-of-range values between signed and unsigned, so you need to check your compiler's documentation to know exactly what will happen, but it should at least be well defined, and should do the right thing on any twos-complement machine that doesn't raise signals on conversions, which is pretty much every machine and C compiler built in the last 20 years.
Your fundamental problem is that lhs + rhs doesn't do the right thing. But if you're willing to assume a two's complement machine, we can fix that. Suppose you have a function to_int_modular that converts unsigned to int in a way that is guaranteed to be the inverse of conversion from int to unsigned, and it optimizes away to nothing at run time. (See below for how to implement it.)
If you use it to fix the undefined behavior in your original attempt, and also rewrite the conditional to avoid the redundant test of lhs >= 0 and lhs < 0, then you get
int add(int lhs, int rhs)
{
    int sum = to_int_modular((unsigned)lhs + rhs);
    if (lhs >= 0) {
        if (sum < rhs)
            abort();
    } else {
        if (sum > rhs)
            abort();
    }
    return sum;
}
which should outperform the current top-voted answer, since it has a similar structure but requires fewer arithmetic operations.
(Reorganizing the if shouldn't be necessary, but in tests on godbolt, ICC and MSVC do eliminate the redundant test on their own, but GCC and Clang surprisingly don't.)
If you prefer to compute the result in a wider size and then bounds check, one way to do the bounds check is
long long sum = (long long)lhs + rhs;
if ((int)sum != sum)
    abort();
... except that the behavior is undefined on overflow. But you can fix that with the same helper function:
if (to_int_modular(sum) != sum)
This will probably outperform the current accepted answer on compilers that aren't smart enough to optimize it to a test of the overflow flag.
Unfortunately, testing (visual inspection on godbolt) suggests that GCC, ICC and MSVC do better with the code above than with the code in the accepted answer, but Clang does better with the code in the accepted answer. As usual, nothing is easy.
This approach can only work on architectures where the ranges of int and unsigned are equally large, and the specific implementations below also depend on its being two's complement. Machines not meeting those specs are vanishingly rare, but I'll check for them anyway:
static_assert(INT_MIN + INT_MAX == -1 && UINT_MAX + INT_MIN == INT_MAX);
One way to implement to_int_modular is
#include <string.h>  // memcpy

inline int to_int_modular(unsigned u) {
    int i;
    memcpy(&i, &u, sizeof(i));
    return i;
}
All major x64 compilers have no trouble optimizing that to nothing, but when optimizations are disabled, MSVC and ICC generate a call to memcpy, which may be a bit slow if you use this function a lot. This implementation also depends on details of the representation of unsigned and int that probably aren't guaranteed by the standard.
Another way is this:
inline int to_int_modular(unsigned u) {
    return u <= INT_MAX ? (int)u : (int)(u - INT_MIN) + INT_MIN;
}
All major x64 compilers optimize that to nothing except ICC, which makes an utter mess of it and every variation that I could think of. ICX does fine, and it appears that Intel is abandoning ICC and moving to ICX, so maybe this problem will fix itself.
You may have better luck converting to 64-bit integers and testing similar conditions like that. For example:
#include <stdint.h>

...

int64_t sum = (int64_t)lhs + (int64_t)rhs;
if (sum < INT_MIN || sum > INT_MAX) {
    // Overflow occurred!
}
else {
    return sum;
}
You may want to take a closer look at how sign extension will work here, but I think it is correct.
How about:
int sum(int n1, int n2)
{
    int result;
    if (n1 >= 0)
    {
        result = (n1 - INT_MAX) + n2; /* Can't overflow */
        if (result > 0) return INT_MAX; else return (result + INT_MAX);
    }
    else
    {
        result = (n1 - INT_MIN) + n2; /* Can't overflow */
        if (0 > result) return INT_MIN; else return (result + INT_MIN);
    }
}
I think that should work for any legitimate INT_MIN and INT_MAX (symmetrical or not); the function as shown clips, but it should be obvious how to get other behaviors.
In case of adding two long values, portable code can split the long value into low and high int parts (or into short parts in case long has the same size as int):
static_assert(sizeof(long) == 2 * sizeof(int), "");
long a, b;
int ai[2] = { int(a), int(a >> (8 * sizeof(int))) };
int bi[2] = { int(b), int(b >> (8 * sizeof(int))) };
... use the 'long' type to add the elements of 'ai' and 'bi'
Using inline assembly is the fastest way if targeting a particular CPU:
long a, b;
bool overflow;

#ifdef __amd64__
asm (
    "addq %2, %0; seto %1"
    : "+r" (a), "=ro" (overflow)
    : "ro" (b)
);
#else
#error "unsupported CPU"
#endif

if (overflow) ...
// The result is stored in variable 'a'
For me, the simplest check would be checking the signs of the operands and of the result.
Let's examine the sum: overflow can occur in either direction, + or -, only when both operands have the same sign. And, obviously, overflow has happened when the sign of the result is not the same as the sign of the operands.
So, a check like this will be enough:
int a, b, sum;
sum = a + b;
if (((a ^ ~b) & (a ^ sum)) & 0x80000000)
    detect_overflow();
Edit: as Nils suggested, this is the correct if condition:
((((unsigned int)a ^ ~(unsigned int)b) & ((unsigned int)a ^ (unsigned int)sum)) & 0x80000000)
And since when does the instruction

add eax, ebx

lead to undefined behavior? There is no such thing in the Intel x86 instruction set reference.
I think that this works:
int add(int lhs, int rhs) {
    volatile int sum = lhs + rhs;
    if (lhs != (sum - rhs)) {
        /* overflow */
        //errno = ERANGE;
        abort();
    }
    return sum;
}
Using volatile keeps the compiler from optimizing away the test because it thinks that sum may have changed between the addition and the subtraction.
Using gcc 4.4.3 for x86_64, the assembly for this code does do the addition, the subtraction, and the test, though it stores everything on the stack and performs unneeded stack operations. I even tried register volatile int sum = but the assembly was the same.
For a version with only int sum = (no volatile or register) the function did not do the test and did the addition using only one lea instruction (lea is Load Effective Address and is often used to do addition without touching the flags register).
Your version is larger code and has a lot more jumps, but I don't know which would be better.