Compiler warning (or static analysis) for subtraction of unsigned integers? - c++

Consider the following program:
#include <iostream>

int main()
{
    unsigned int a = 3;
    unsigned int b = 7;
    std::cout << (a - b) << std::endl; // underflow here!
    return 0;
}
In the line starting with std::cout an underflow is happening because a is less than b, so a-b is less than 0, but since a and b are unsigned, so is a-b.
Is there a compiler flag (for G++) that gives me a warning when I try to calculate the difference of two unsigned integers?
Now, one could argue that an overflow/underflow can happen in any calculation using any operator. But I think it is more dangerous to apply operator - to unsigned ints because with unsigned integers this error may happen with quite low (to me: "more common") numbers.
A (static analysis) tool that finds such things would also be great but I much prefer a compiler flag and warning.

GCC does not (afaict) support it, but Clang's UBSanitizer has the following option [emphasis mine]:
-fsanitize=unsigned-integer-overflow: Unsigned integer overflow, where the result of an unsigned integer computation cannot be represented in its type. Unlike signed integer overflow, this is not undefined behavior, but it is often unintentional. This sanitizer does not check for lossy implicit conversions performed before such a computation
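When building with the sanitizer is not an option, the same check can be made explicit in the code. This is an added example, not from the question or the answer above; the names wrapping_sub, checked_sub and sub_overflows are mine, and __builtin_sub_overflow is the GCC/Clang builtin.

```cpp
#include <climits>
#include <optional>

// Raw unsigned subtraction is well defined: it wraps modulo 2^N, so 3u - 7u
// yields UINT_MAX - 3 rather than -4. This is the value the sanitizer flags.
unsigned int wrapping_sub(unsigned int a, unsigned int b)
{
    return a - b;
}

// Checked subtraction: refuses to produce a wrapped result. For unsigned
// operands the precondition is simply a >= b, so no builtin is required.
std::optional<unsigned int> checked_sub(unsigned int a, unsigned int b)
{
    if (a < b) {
        return std::nullopt;   // the true difference would be negative
    }
    return a - b;
}

// The same check expressed with the GCC/Clang builtin, which also covers
// signed operands and mixed widths. Returns true when the result wrapped.
bool sub_overflows(unsigned int a, unsigned int b)
{
    unsigned int out;
    return __builtin_sub_overflow(a, b, &out);
}
```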


Adding int type to uint64_t c++

I have a question regarding conversion of integers:
#include <iostream>
#include <cstdint>
using namespace std;

int main()
{
    int N,R,W,H,D;
    uint64_t sum = 0;
    uint64_t sum_2 = 0;
    cin >> W >> H >> D;
    sum += static_cast<uint64_t>(W) * H * D * 100;
    sum_2 += W * H * D * 100;
    cout << sum << endl;
    cout << sum_2 << endl;
    return 0;
}
I thought that sum should be equal to sum_2, because the uint64_t type is bigger than the int type and during arithmetic operations the compiler chooses the bigger type (which is uint64_t). So by my understanding, sum_2 must have uint64_t type. But it has int type.
Can you explain to me why sum_2 was converted to int? Why didn't it stay uint64_t?
Undefined behavior signed-integer overflow/underflow, and well-defined behavior unsigned-integer overflow/underflow, in C and C++
If I enter 200, 300, and 358 for W, H, and D, I get the following output, which makes perfect sense for my gcc compiler on a 64-bit Linux machine:
2148000000
18446744071562584320
Why does this make perfect sense?
Well, the default type is int, which is int32_t for the gcc compiler on a 64-bit Linux machine, and its max value is 2^32/2-1 = 2147483647, and its min value is -2147483648. The line sum_2 += W * H * D * 100; does int arithmetic since that's the type of each variable there, 100 included, and no explicit cast is used. So, after doing int arithmetic, it then implicitly casts the int result into a uint64_t as it stores the result into the uint64_t sum_2 variable. The int arithmetic on the right-hand side prior to that point, however, results in 2148000000, which has undefined behavior signed integer overflow over the top of the max int value and back down to the min int value and up again.
Even though according to the C and C++ standards, signed integer overflow or underflow is undefined behavior, in the gcc compiler, I know that signed integer overflow happens to roll over to negative values if it is not optimized out. This, by default, is still "undefined behavior", and a bug, however, and must not be relied upon by default. See notes below for details and information on how to make this well-defined behavior via a gcc extension. Anyway, 2148000000 - 2147483647 = 516353 up-counts, the first of which causes roll-over. The first count up rolls over to the min int32_t value of -2147483648, and the next (516353 - 1 = 516352) counts go up to -2147483648 + 516352 = -2146967296. So, the result of W * H * D * 100 for the inputs above is now -2146967296, based on undefined behavior. Next, that value is implicitly cast from an int (int32_t in this case) to a uint64_t in order to store it from an int (int32_t in this case) into the uint64_t sum_2 variable, resulting in well-defined behavior unsigned integer underflow. You start with -2146967296. The first down-count underflows down to uint64_t max, which is 2^64-1 = 18446744073709551615. Now subtract the remaining 2146967296 - 1 = 2146967295 counts from that and you get 18446744073709551615 - 2146967295 = 18446744071562584320, just as shown above!
Voila! With a little compiler and hardware architecture understanding, and some expected but undefined behavior, the result is perfectly explainable and makes sense!
To easily see the negative value, add this to your code:
int sum_3 = W*H*D*100;
cout << sum_3 << endl; // output: -2146967296
Notes
Never intentionally leave undefined behavior in your code. That is known as a bug. You do not have to write ISO C++, however! If you can find compiler documentation indicating a certain behavior is well-defined, that's ok, so long as you know you are writing in the g++ language and not the C++ language, and don't expect your code to work the same across compilers. Here is an example where I do that: Using Unions for "type punning" is fine in C, and fine in gcc's C++ as well (as a gcc [g++] extension). I'm generally okay with relying on compiler extensions like this. Just be aware of what you're doing is all.
@user17732522 makes a great point in the comments here:
"in the gcc compiler, I know that signed integer overflow happens to roll over to negative values.": That is not correct by-default. By-default GCC assumes that signed overflow does not happen and applies optimizations based on that. There is the -fwrapv and/or -fno-strict-overflow flag to enforce wrapping behavior. See https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Code-Gen-Options.html#Code-Gen-Options.
Take a look at that link above (or even better, this one, to always point to the latest gcc documentation instead of the documentation for just one version: https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#Code-Gen-Options). Even though signed-integer overflow and underflow is undefined behavior (a bug!) according to the C and C++ standards, gcc allows, by extension, to make it well-defined behavior (not a bug!) so long as you use the proper gcc build flags. Using -fwrapv makes signed-integer overflow/underflow well-defined behavior as a gcc extension. Additionally, -fwrapv-pointer allows pointers to safely overflow and underflow when used in pointer arithmetic, and -fno-strict-overflow applies both -fwrapv and -fwrapv-pointer. The relevant documentation is here: https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#Code-Gen-Options (emphasis added):
These machine-independent options control the interface conventions used in code generation.
Most of them have both positive and negative forms; the negative form of -ffoo is -fno-foo.
...
-fwrapv
This option instructs the compiler to assume that signed arithmetic overflow of addition, subtraction and multiplication wraps around using twos-complement representation. This flag enables some optimizations and disables others. The options -ftrapv and -fwrapv override each other, so using -ftrapv -fwrapv on the command-line results in -fwrapv being effective. Note that only active options override, so using -ftrapv -fwrapv -fno-wrapv on the command-line results in -ftrapv being effective.
-fwrapv-pointer
This option instructs the compiler to assume that pointer arithmetic overflow on addition and subtraction wraps around using twos-complement representation. This flag disables some optimizations which assume pointer overflow is invalid.
-fstrict-overflow
This option implies -fno-wrapv -fno-wrapv-pointer and when negated [as -fno-strict-overflow] implies -fwrapv -fwrapv-pointer.
So, relying on signed-integer overflow or underflow withOUT using the proper gcc extension flags above is undefined behavior, and therefore a bug, and can not be safely relied upon! It may be optimized out by the compiler and not work reliably as intended without the gcc extension flags above.
My test code
Here is my total code I used for some quick checks to write this answer. I ran it with the gcc/g++ compiler on a 64-bit Linux machine. I did not use the -fwrapv or -fno-strict-overflow flags, so all signed integer overflow or underflow demonstrated below is undefined behavior, a bug, and cannot be relied upon safely without those gcc extension flags. The fact that it works is circumstantial, as the compiler could, by default, choose to optimize out the overflows in unexpected ways.
If you run this on an 8-bit microcontroller such as an Arduino Uno, you'd get different results since an int is a 2-byte int16_t by default, instead! But, now that you understand the principles, you could figure out the expected result. (Also, I think 64-bit values don't exist on that architecture, so they become 32-bit values).
#include <iostream>
#include <cstdint>
using namespace std;

int main()
{
    int N,R,W,H,D;
    uint64_t sum = 0;
    uint64_t sum_2 = 0;
    // cin >> W >> H >> D;
    W = 200;
    H = 300;
    D = 358;
    sum += static_cast<uint64_t>(W) * H * D * 100;
    sum_2 += W * H * D * 100;
    cout << sum << endl;
    cout << sum_2 << endl;
    int sum_3 = W*H*D*100;
    cout << sum_3 << endl;
    sum_2 = -1; // underflow to uint64_t max
    cout << sum_2 << endl;
    sum_2 = 18446744073709551615ULL - 2146967295;
    cout << sum_2 << endl;
    return 0;
}
Just a short version of @Gabriel Staples' good answer.
"and during arithmetic operations compiler chooses bigger type(which is uint64_t)"
There is no uint64_t in W * H * D * 100, just four int. After this multiplication, the int product (which overflowed and is UB) is assigned to a uint64_t.
Instead, use 100LLU * W * H * D to perform a wider unsigned multiplication.
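The arithmetic in the long answer above can be reproduced without committing the undefined behaviour it describes. This is an added example, not code from either answer; it assumes the question's inputs (W = 200, H = 300, D = 358), a 32-bit two's-complement int, and the three function names are mine.

```cpp
#include <cstdint>

// Constructed inputs, taken from the question.
constexpr int64_t W = 200, H = 300, D = 358;

// The mathematically correct product, as obtained by widening the first
// operand before multiplying (the question's static_cast<uint64_t>(W) form).
uint64_t widened_product()
{
    return static_cast<uint64_t>(W) * H * D * 100;   // 2148000000
}

// What a 32-bit int holds after the overflowing int multiplication when the
// hardware wraps (gcc -fwrapv semantics). Computed here in 64-bit arithmetic:
// reduce modulo 2^32, then map the upper half of that range to negatives.
int64_t wrapped_int32_product()
{
    uint32_t low32 = static_cast<uint32_t>(W * H * D * 100);  // int64 math, no UB
    int64_t v = low32;
    if (v >= (int64_t{1} << 31)) {
        v -= (int64_t{1} << 32);
    }
    return v;                                                 // -2146967296
}

// Converting that negative value to uint64_t is well defined: it is reduced
// modulo 2^64, giving the 18446744071562584320 the question printed.
uint64_t converted_to_u64()
{
    return static_cast<uint64_t>(wrapped_int32_product());
}
```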

Why does this c++ boolean return false [duplicate]

Had been going through this code:
#include<cstdio>
#define TOTAL_ELEMENTS (sizeof(array) / sizeof(array[0]))
int array[] = {1,2,3,4,5,6,7};

int main()
{
    signed int d;
    // sizeof yields size_t, so %zu is the matching conversion (the original used %d)
    printf("Total Elements in the array are => %zu\n",TOTAL_ELEMENTS);
    for(d=-1;d <= (TOTAL_ELEMENTS-2);d++)
        printf("%d\n",array[d+1]);
    return 0;
}
Now obviously it does not get into the for loop.
What's the reason?
The reason is that in C++ you're getting an implicit promotion. Even though d is declared as signed, when you compare it to (TOTAL_ELEMENTS-2) (which is unsigned due to sizeof), d gets promoted to unsigned. C++ has very specific rules which basically state that the unsigned value of d will then be the congruent unsigned value mod numeric_limits<unsigned>::max(). In this case, that comes out to the largest possible unsigned number which is clearly larger than the size of the array on the other side of the comparison.
Note that some compilers like g++ (with -Wall) can be told to warn about such comparisons so you can make sure that the code looks correct at compile time.
The program looks like it should throw a compile error. You're using "array" even before its definition. Switch the first two lines and it should be okay.
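The promotion described in the answer, shown side by side with the corrected comparison. This is an added example, not code from the question or the answers; the array is renamed elements and the two function names are mine.

```cpp
#include <cstddef>

// Constructed copy of the question's array.
const int elements[] = {1, 2, 3, 4, 5, 6, 7};
const std::size_t TOTAL_ELEMENTS = sizeof(elements) / sizeof(elements[0]);

// The question's condition as written: d is int, TOTAL_ELEMENTS - 2 is size_t,
// so d is converted to size_t before the comparison. -1 becomes SIZE_MAX, the
// test is false on the very first check and the body never runs.
int iterations_with_mixed_comparison()
{
    int count = 0;
    for (int d = -1; d <= (TOTAL_ELEMENTS - 2); d++) {   // g++ -Wall: -Wsign-compare
        count++;
    }
    return count;                                        // 0
}

// Fixed: convert the unsigned size to a signed type first, so the comparison
// happens in the domain the programmer intended. ptrdiff_t can hold the
// length of any object that can actually exist.
int iterations_with_signed_comparison()
{
    int count = 0;
    std::ptrdiff_t n = static_cast<std::ptrdiff_t>(TOTAL_ELEMENTS);
    for (std::ptrdiff_t d = -1; d <= n - 2; d++) {
        count++;
    }
    return count;                                        // 7
}
```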

C++ underflow and overflow

Hi, I am new here, so please let me know if anything is wrong and I will try to do better next time.
I am trying to understand how underflow and overflow work in C++. My understanding is that if a variable's range is exceeded it will start from the other end of the range. Thus if the minimum of short is -32768 and we subtract 1 from it, the new value should be SHRT_MAX (32767).
Here is my code:
#include <iostream>   // the original used the pre-standard <iostream.h>
#include <climits>

int main ( void )
{
    int testpositive = INT_MIN ;
    short testnegative = SHRT_MIN ;
    std::cout << SHRT_MIN << "\n";
    std::cout << testnegative-1 << "\n";   // short is promoted to int: -32769, no wrap
    std::cout << INT_MIN << "\n";
    std::cout << testpositive-1 << "\n";   // signed overflow: undefined behaviour
    std::cout << testpositive-2;
    // the original called getch() from <conio.h> here only to pause the console
    return 0;
}
The exact behavior on overflow/underflow is only specified for unsigned types.
Unsigned integers shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
Source: Draft N3690 §3.9.1 sentence 4
This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.
Source: Draft N3690 Note 47 for §3.9.1
For normal signed integer types instead the C++ standard simply says than anything can happen.
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined
Source: Draft N3690 §5 sentence 4
If we're talking about x86 processor (or most other modern processors) indeed the behavior is exactly what you describe and for the CPU there is no difference between a signed value or an unsigned value (there are signed and unsigned operations, but the value themselves are just bits).
Note that compilers can assume (and most modern optimizing compilers actually DO assume) that no signed integer overflow can occur in a correct program and for example in code like:
int do_something();
int do_something_else();

void foo() {
    int x = do_something();
    int y = x + 1;
    if (x < y) {
        do_something();
    } else {
        do_something_else();
    }
}
a compiler is free to skip the test and the else branch in the generated code completely because in a valid program a signed int x is always less than x+1 (as signed overflow cannot be considered valid behavior).
If you replace int with unsigned int however the compiler must generate code for the test and for the else branch because for unsigned types it's possible that x > x+1.
For example clang compiles the code for foo to
foo():                       # @foo()
        push    rax
        call    do_something()
        pop     rax
        jmp     do_something()   # TAILCALL
where you can see that the code just calls do_something twice (except for the strange handling of rax) and no mention of do_something_else is actually present. More or less the same code is generated by gcc.
Signed overflows are undefined behavior in C++.
For example:
INT_MIN - 1
-INT_MIN
are expressions that invoke undefined behavior.
SHRT_MIN - 1 and -SHRT_MIN are not undefined behavior in an environment with 16-bit short and 32-bit int because with integer promotions the operand is promoted to int first. In an environment with 16-bit short and int, these expressions are also undefined behavior.
Typically yes. But since this is C++, and C++ is regulated by the C++ standard, you must know that overflows are undefined behavior.
Although what you stated probably applies on most platforms, it's in no way guaranteed, so don't rely on it.
The new value need not be SHRT_MAX; it is undefined.

How disastrous is integer overflow in C++?

I was just wondering how disastrous integer overflow really is. Take the following example program:
#include <iostream>

int main()
{
    int a = 46341;
    int b = a * a;
    std::cout << "hello world\n";
}
Since a * a overflows on 32 bit platforms, and integer overflow triggers undefined behavior, do I have any guarantees at all that hello world will actually appear on my screen?
I removed the "signed" part from my question based on the following standard quotes:
(§5/5 C++03, §5/4 C++11) If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.
(§3.9.1/4) Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer. This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.
As pointed out by @Xeo in the comments (I actually brought it up in the C++ chat first):
Undefined behavior really means it and it can hit you when you least expect it.
The best example of this is here: Why does integer overflow on x86 with GCC cause an infinite loop?
On x86, signed integer overflow is just a simple wrap-around. So normally, you'd expect the same thing to happen in C or C++. However, the compiler can intervene - and use undefined behavior as an opportunity to optimize.
In the example taken from that question:
#include <iostream>
using namespace std;

int main(){
    int i = 0x10000000;
    int c = 0;
    do{
        c++;
        i += i;
        cout << i << endl;
    }while (i > 0);
    cout << c << endl;
    return 0;
}
When compiled with GCC, GCC optimizes out the loop test and makes this an infinite loop.
You may trigger some hardware safety feature. So no, you don't have any guarantee.
Edit:
Note that gcc has the -ftrapv option (but it doesn't seem to work for me).
There are two views about undefined behavior. There is the view that it is there to cater for strange hardware and other special cases, but that usually it should behave sanely. And there is the view that anything can happen. And depending on the UB source, some hold different opinions.
While the UB about overflow has probably been introduced for taking into account hardware which trap or saturate on overflow and the difference of result between representation, and so one can argue for the first view in this case, people writing optimizers hold very dearly the view that if the standard doesn't guarantee something, really anything can happen and they try to use every piece of liberty to generate machine code which runs more rapidly, even if the result doesn't make sense anymore.
So when you see an undefined behavior, assume that anything can happen, however reasonable a given behavior may seem.
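Both the x < x + 1 test in the foo() example earlier on this page and the doubling loop above become well defined once the overflow check is performed before the overflowing operation, not after it. This is an added example, not code from any of the answers; it assumes the usual 32-bit two's-complement int, and the three function names are mine.

```cpp
#include <climits>

// Portable precheck: would a + b leave the int range? The question is
// answered without ever forming the overflowing sum, so it is well defined.
bool add_would_overflow(int a, int b)
{
    if (b > 0) {
        return a > INT_MAX - b;
    }
    return a < INT_MIN - b;
}

// The foo() test, rewritten so the compiler cannot assume it away:
// x < x + 1 is only evaluated when x + 1 is representable.
bool less_than_successor(int x)
{
    if (add_would_overflow(x, 1)) {
        return false;          // INT_MAX has no successor in int
    }
    return x < x + 1;          // always true once we get here
}

// The doubling loop from the linked question, with the check moved in front
// of i += i. Returns how many doublings fit before the next one would
// overflow; the original loop's behaviour past that point was UB.
int doublings_before_overflow(int start)
{
    int i = start;
    int c = 0;
    while (i > 0 && !add_would_overflow(i, i)) {
        i += i;
        c++;
    }
    return c;                  // 2 for 0x10000000, 30 for 1
}
```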
