I was just wondering how disastrous integer overflow really is. Take the following example program:
#include <iostream>

int main()
{
    int a = 46341;
    int b = a * a;
    std::cout << "hello world\n";
}
Since a * a overflows on 32 bit platforms, and integer overflow triggers undefined behavior, do I have any guarantees at all that hello world will actually appear on my screen?
I removed the "signed" part from my question based on the following standard quotes:
(§5/5 C++03, §5/4 C++11) If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.
(§3.9.1/4) Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer. This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.
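A minimal sketch illustrating the difference the two quotes describe (assuming a 32-bit int):

#include <climits>
#include <iostream>

int main()
{
    unsigned int u = UINT_MAX;
    std::cout << u + 1u << "\n";   // guaranteed to print 0: unsigned arithmetic wraps modulo 2^n

    // int s = INT_MAX;
    // int t = s + 1;              // signed overflow: undefined behavior, no guarantees at all
    return 0;
}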
As pointed out by @Xeo in the comments (I actually brought it up in the C++ chat first):
Undefined behavior really means it and it can hit you when you least expect it.
The best example of this is here: Why does integer overflow on x86 with GCC cause an infinite loop?
On x86, signed integer overflow is just a simple wrap-around. So normally, you'd expect the same thing to happen in C or C++. However, the compiler can intervene - and use undefined behavior as an opportunity to optimize.
In the example taken from that question:
#include <iostream>
using namespace std;

int main() {
    int i = 0x10000000;
    int c = 0;
    do {
        c++;
        i += i;
        cout << i << endl;
    } while (i > 0);
    cout << c << endl;
    return 0;
}
When compiled with GCC, GCC optimizes out the loop test and makes this an infinite loop.
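For contrast, here is a sketch of the same loop rewritten with unsigned arithmetic (my rewrite, not from the linked question). Unsigned wrap-around is well defined, so the compiler has to keep the loop test:

#include <iostream>
using namespace std;

int main() {
    unsigned int i = 0x10000000u;
    unsigned int c = 0;
    do {
        c++;
        i += i;              // wraps to 0 after a few doublings; no UB for unsigned
        cout << i << endl;
    } while (i > 0);         // the test must stay: i really can become 0
    cout << c << endl;       // prints 4
    return 0;
}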
You may trigger some hardware safety feature. So no, you don't have any guarantee.
Edit:
Note that gcc has the -ftrapv option (but it doesn't seem to work for me).
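For what it's worth, a minimal sketch of how -ftrapv is intended to be used (the file name is illustrative, and as noted above it may or may not actually trap depending on the GCC version and optimization level):

// overflow.cpp -- compile with: g++ -ftrapv overflow.cpp -o overflow
int main()
{
    int a = 46341;
    int b = a * a;   // signed overflow: -ftrapv is supposed to abort here at run time
    return b;
}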
There are two views about undefined behavior. One is that it is there to cater for strange hardware and other special cases, but that code should usually behave sanely. The other is that anything can happen. Depending on the source of the UB, people hold different opinions.
While the UB around overflow was probably introduced to accommodate hardware that traps or saturates on overflow, and the differences in results between representations, so one could argue for the first view in this case, the people who write optimizers hold very dearly to the view that if the standard doesn't guarantee something, then really anything can happen, and they use every bit of that liberty to generate machine code that runs faster, even if the result no longer makes sense.
So when you see an undefined behavior, assume that anything can happen, however reasonable a given behavior may seem.
Related
Consider the following program:
#include <iostream>

int main()
{
    unsigned int a = 3;
    unsigned int b = 7;
    std::cout << (a - b) << std::endl; // underflow here!
    return 0;
}
In the line starting with std::cout an underflow happens: a is less than b, so a - b is mathematically less than 0, but since a and b are unsigned, so is a - b.
Is there a compiler flag (for G++) that gives me a warning when I try to calculate the difference of two unsigned integers?
Now, one could argue that an overflow/underflow can happen in any calculation using any operator. But I think it is more dangerous to apply operator - to unsigned ints, because with unsigned integers this error can happen with quite low (to me: "more common") numbers.
A (static analysis) tool that finds such things would also be great but I much prefer a compiler flag and warning.
GCC does not (afaict) support it, but Clang's UBSanitizer has the following option [emphasis mine]:
-fsanitize=unsigned-integer-overflow: Unsigned integer overflow, where the result of an unsigned integer computation cannot be represented in its type. Unlike signed integer overflow, this is not undefined behavior, but it is often unintentional. This sanitizer does not check for lossy implicit conversions performed before such a computation
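A usage sketch, assuming a reasonably recent Clang and that the snippet above is saved as underflow.cpp (a name I made up):

clang++ -fsanitize=unsigned-integer-overflow underflow.cpp -o underflow
./underflow

At run time the sanitizer reports a diagnostic along the lines of (exact wording may differ between versions):

runtime error: unsigned integer overflow: 3 - 7 cannot be represented in type 'unsigned int'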
In my software I am using the input values from the user at run time and performing some mathematical operations. Consider for simplicity below example:
#include <climits>

int multiply(const int a, const int b)
{
    if (a >= INT_MAX || b >= INT_MAX)
        return 0;
    else
        return a * b;
}
I can check if the input values are greater than the limits, but how do I check if the result will be out of limits? It is quite possible that a = INT_MAX - 1 and b = 2. Since the inputs are perfectly valid, it will execute the undefined code which makes my program meaningless. This means any code executed after this will be random and eventually may result in crash. So how do I protect my program in such cases?
This really comes down to what you actually want to do in this case.
For a machine where long or long long (or int64_t) is a 64-bit value, and int is a 32-bit value, you could do (I'm assuming long is 64 bit here):
long x = static_cast<long>(a) * b;
if (x > INT_MAX || x < INT_MIN)
    return 0;
else
    return static_cast<int>(x);
By casting one value to long, the other will have to be converted as well. You can cast both if that makes you happier. The overhead here, above a normal 32-bit multiply, is a couple of clock cycles on modern CPUs, and it's unlikely that you can find a safer solution that is also faster. [You can, in some compilers, add attributes to the if saying that it's unlikely, to encourage branch prediction to "get it right" for the common case of returning x.]
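Put together as a complete function, this might look like the sketch below. The long long type, the safe_multiply name and the GCC/Clang-specific __builtin_expect hint are my assumptions, not part of the original snippet:

#include <climits>

// Widen to 64 bits, multiply, and range-check before narrowing back.
int safe_multiply(int a, int b)
{
    long long x = static_cast<long long>(a) * b;     // cannot overflow for 32-bit inputs
    if (__builtin_expect(x > INT_MAX || x < INT_MIN, 0))
        return 0;                                     // out of range: report failure
    return static_cast<int>(x);
}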
Obviously, this won't work for values whose type is as big as the biggest integer type you can deal with (although you could possibly use floating point; that may still be a bit dodgy, since the precision of float is not sufficient, but it could be done with some safety margin, e.g. comparing against LONG_MAX / 2, if you don't need the entire range of integers). The penalty is a bit worse there, though; in particular, transitions between float and integer aren't "pleasant".
Another alternative is to actually test the relevant code with "known invalid values" and make sure the rest of the code copes with them. Make sure you test this with the relevant compiler settings, as changing the compiler options will change the behaviour. Note that your code then has to deal with cases like "65536 * 100000 is a negative number" that it probably didn't expect. Perhaps add something like:
int x = a * b;
if (x < 0) return 0;
[But this only works if you don't expect negative results, of course]
You could also inspect the assembly code generated and understand the architecture of the actual processor [the key here is to understand if "overflow will trap" - which it won't by default in x86, ARM, 68K, 29K. I think MIPS has an option of "trap on overflow"], and determine whether it's likely to cause a problem [1], and add something like
#if (defined(__X86__) || defined(__ARM__))
#error This code needs inspecting for correct behaviour
#endif
return a * b;
One problem with this approach, however, is that even the slightest changes in code, or compiler version may alter the outcome, so it's important to couple this with the testing approach above (and make sure you test the ACTUAL production code, not some hacked up mini-example).
[1] The "undefined behaviour" is undefined to allow C to "work" on processors that have trapping overflows of integer math, as well as the fact that that a * b when it overflows in a signed value is of course hard to determine unless you have a defined math system (two's complement, one's complement, distinct sign bit) - so to avoid "defining" the exact behaviour in these cases, the C standard says "It's undefined". It doesn't mean that it will definitely go bad.
Specifically for the multiplication of a by b, the mathematically correct way to detect whether it will overflow is to compute log₂ of both values. If their sum exceeds the log₂ of the highest value representable in the result type, there is overflow; conversely, the multiplication is safe when
log₂(a) + log₂(b) < log₂(UINT_MAX)
The difficulty is to compute the log₂ of an integer quickly. For that, there are several bit-twiddling hacks that can be used, like counting bits or counting leading zeros (some processors even have instructions for that). This site has several implementations:
https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious
The simplest implementation could be:
unsigned int log2(unsigned int v)
{
    unsigned int r = 0;
    while (v >>= 1)
        r++;
    return r;
}
In your program you only need to check then
if (log2(a) + log2(b) < MYLOG2UINTMAX)
    return a * b;
else
    printf("Overflow");
The signed case is similar but has to take care of the negative case specifically.
EDIT: My solution is not complete and has an error which makes the test more severe than necessary. The equation works if the log₂ function returns a floating-point value. In the implementation I limited the value to unsigned integers, which means that completely valid multiplications get refused. Why? Because log₂(UINT_MAX) is truncated:
log₂(UINT_MAX) = log₂(4294967295) ≈ 31.9999999997, truncated to 31.
We therefore have to change the implementation and replace the constant we compare against with:
#define MYLOG2UINTMAX (CHAR_BIT*sizeof (unsigned int))
You may try this:
if (b > INT_MAX / a)    // need to check a != 0 before this division
    return 0;           // a * b would overflow (UB)
else
    return a * b;
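As a complete sketch (the function name is mine, and only non-negative inputs are handled; negative values need the symmetric checks against INT_MIN):

#include <climits>

// Pre-check with a division so the multiplication itself can never overflow.
int multiply_checked(int a, int b)   // sketch: assumes a >= 0 and b >= 0
{
    if (a == 0 || b == 0)
        return 0;                    // result is 0, nothing can overflow
    if (b > INT_MAX / a)
        return 0;                    // a * b would overflow
    return a * b;
}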
Hi, I am new here, so please let me know if anything is wrong and I will try to do better next time.
I am trying to understand how underflow and overflow work in C++. My understanding is that if a variable's range is exceeded, it will start again from the other end of the range. Thus, if the minimum of short is -32768 and we subtract 1 from it, the new value should be SHRT_MAX (32767).
Here is my code:
#include<iostream.h>
#include<limits.h>
#include<conio.h>
int main ( void )
{
int testpositive =INT_MIN ;
short testnegative = SHRT_MIN ;
cout<< SHRT_MIN<<"\n";
cout << testnegative-1<<"\n";
cout << INT_MIN << "\n";
cout << testpositive-1 << "\n";
cout<<testpositive-2;
getch();
return 0;
}
The exact behavior on overflow/underflow is only specified for unsigned types.
Unsigned integers shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
Source: Draft N3690 §3.9.1 sentence 4
This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.
Source: Draft N3690 Note 47 for §3.9.1
For normal signed integer types, on the other hand, the C++ standard simply says that anything can happen:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined
Source: Draft N3690 §5 sentence 4
If we're talking about an x86 processor (or most other modern processors), the behavior is indeed exactly what you describe, and for the CPU there is no difference between a signed value and an unsigned value (there are signed and unsigned operations, but the values themselves are just bits).
Note that compilers can assume (and most modern optimizing compilers actually DO assume) that no signed integer overflow can occur in a correct program and for example in code like:
int do_something();
int do_something_else();

void foo() {
    int x = do_something();
    int y = x + 1;
    if (x < y) {
        do_something();
    } else {
        do_something_else();
    }
}
a compiler is free to skip the test and the else branch in the generated code completely because in a valid program a signed int x is always less than x+1 (as signed overflow cannot be considered valid behavior).
If you replace int with unsigned int however the compiler must generate code for the test and for the else branch because for unsigned types it's possible that x > x+1.
For example clang compiles the code for foo to
foo(): # #foo()
push rax
call do_something()
pop rax
jmp do_something() # TAILCALL
where you can see that the code just calls do_something twice (apart from the slightly odd handling of rax), and do_something_else is never even mentioned. More or less the same code is generated by gcc.
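For contrast, a sketch of the unsigned variant (my rewrite): here x + 1 can legitimately wrap to 0, so the compiler has to emit the comparison and keep the do_something_else branch.

int do_something();
int do_something_else();

void foo_unsigned() {
    unsigned int x = static_cast<unsigned int>(do_something());
    unsigned int y = x + 1;      // wraps to 0 when x == UINT_MAX; well defined
    if (x < y) {
        do_something();
    } else {
        do_something_else();     // reachable: taken exactly when x == UINT_MAX
    }
}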
Signed overflows are undefined behavior in C++.
For example:
INT_MIN - 1
-INT_MIN
are expressions that invoke undefined behavior.
SHRT_MIN - 1 and -SHRT_MIN are not undefined behavior in an environment with 16-bit short and 32-bit int, because with integer promotions the operand is promoted to int first. In an environment with 16-bit short and 16-bit int, these expressions are undefined behavior.
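A small sketch of the promotion at work (assuming a 16-bit short and a 32-bit int):

#include <climits>
#include <iostream>

int main()
{
    short s = SHRT_MIN;                // -32768 with a 16-bit short
    int r = s - 1;                     // s is promoted to int first: r == -32769, no UB
    std::cout << r << "\n";
    // short t = s - 1;                // converting -32769 back to short is a separate issue:
                                       // implementation-defined before C++20, not UB
    return 0;
}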
Typically yes. But since this is C++, and C++ is regulated by the C++ standard, you must know that overflows are undefined behavior.
Although what you stated probably applies on most platforms, it's in no way guaranteed, so don't rely on it.
The new value need not be SHRT_MAX it is undefined.
I know that when overflow occurs in C/C++, the typical behavior is to wrap around. For example, INT_MAX + 1 is an overflow.
Is it possible to modify this behavior, so that binary addition takes place as normal addition and there is no wrap-around at the end of the addition operation?
Some code so this makes sense. Basically, this is a full adder implemented bit by bit for 32-bit ints:
int adder(int x, int y)
{
    int sum;
    for (int i = 0; i < 31; i++)
    {
        sum = x ^ y;
        int carry = x & y;
        x = sum;
        y = carry << 1;
    }
    return sum;
}
If I try adder(INT_MAX, 1); it actually overflows, even though I'm not using the + operator.
Thanks !
Overflow means that the result of an addition would exceed std::numeric_limits<int>::max() (back in C days, we used INT_MAX). Performing such an addition results in undefined behavior. The machine could crash and still comply with the C++ standard. Although you're more likely to get INT_MIN as a result, there's really no advantage to depending on any result at all.
The solution is to test with a subtraction instead of performing the addition, so the overflow never occurs, and to treat the boundary as a special case:
if ( number > std::numeric_limits< int >::max() - 1 ) { // i.e. number + 1 > max
    // fix things so "normal" math happens, in this case saturation.
} else {
    ++number;
}
Without knowing the desired result, I can't be more specific about it. The performance impact should be minimal, as a rarely-taken branch can usually be retired in parallel with subsequent instructions without delaying them.
Edit: To simply do math without worrying about overflow or handling it yourself, use a bignum library such as GMP. It's quite portable, and usually the best on any given platform. It has C and C++ interfaces. Do not write your own assembly. The result would be unportable, suboptimal, and the interface would be your responsibility!
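For illustration, a minimal sketch using GMP's C++ interface (assuming GMP and its C++ wrapper are installed; link with -lgmpxx -lgmp):

#include <gmpxx.h>
#include <climits>
#include <iostream>

int main()
{
    mpz_class n = INT_MAX;     // arbitrary-precision integer
    n = n + 1;                 // no wrap-around: the value simply grows
    std::cout << n << "\n";    // prints 2147483648 with a 32-bit int
    return 0;
}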
No, you have to add them manually to check for overflow.
What do you want the result of INT_MAX + 1 to be? You can only fit INT_MAX into an int, so if you add one to it, the result is not going to be one greater. (Edit: on common platforms such as x86 it is going to wrap around to the most negative value, -(INT_MAX + 1).) The only way to get bigger numbers is to use a larger variable.
Assuming int is 4-bytes (as is typical on x86 compilers) and you are executing an add instruction (in 32-bit mode), the destination register simply does overflow -- it is out of bits and can't hold a larger value. It is a limitation of the hardware.
To get around this, you can hand-code, or use an arbitrarily-sized integer library that does the following:
First perform a normal add instruction on the lowest-order words. If overflow occurs, the Carry flag is set.
For each increasingly-higher-order word, use the adc instruction, which adds the two operands as usual, but takes into account the value of the Carry flag (as a value of 1.)
You can see this for a 64-bit value here.
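In portable C++ (without inline assembly) the same carry propagation can be sketched with unsigned halves, whose wrap-around is well defined; the names here are mine:

#include <cstdint>

// Add two 64-bit numbers given as (high, low) 32-bit halves, propagating the
// carry by hand, the way a hardware add/adc pair does.
void add64(uint32_t a_hi, uint32_t a_lo,
           uint32_t b_hi, uint32_t b_lo,
           uint32_t& r_hi, uint32_t& r_lo)
{
    r_lo = a_lo + b_lo;              // low word: plain add, may wrap
    uint32_t carry = (r_lo < a_lo);  // it wrapped exactly when the sum is smaller than an operand
    r_hi = a_hi + b_hi + carry;      // high word: "add with carry"
}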
According to C++ Standard (5/5) dividing by zero is undefined behavior. Now consider this code (lots of useless statements are there to prevent the compiler from optimizing code out):
#include <cstring>
#include <cstdlib>

int main()
{
    char buffer[1] = {};
    int len = strlen( buffer );
    if( len / 0 ) {
        rand();
    }
}
Visual C++ compiles the if-statement like this:
sub eax,edx
cdq
xor ecx,ecx
idiv eax,ecx
test eax,eax
je wmain+2Ah (40102Ah)
call rand
Clearly the compiler sees that the code is to divide by zero - it uses the xor x,x pattern to zero out ecx, which then serves as the second operand in the integer division. This code will definitely trigger an "integer division by zero" error at runtime.
IMO such cases (when the compiler knows that the code will divide by zero at all times) are worth a compile-time error - the Standard doesn't prohibit that. That would help diagnose such cases at compile time instead of at runtime.
However I talked to several other developers and they seem to disagree - their objection is "what if the author wanted to divide by zero to... emm... test error handling?"
Intentionally dividing by zero without compiler awareness is not that hard - using __declspec(noinline) Visual C++ specific function decorator:
__declspec(noinline)
void divide( int what, int byWhat )
{
    if( what / byWhat ) {
        rand();
    }
}

void divideByZero()
{
    divide( 0, 0 );
}
which is much more readable and maintainable. One can use that function when he "needs to test error handling" and have a nice compile-time error in all other cases.
Am I missing something? Is it necessary to allow emission of code that the compiler knows divides by zero?
There is probably code out there which has accidental division by zero in functions which are never called (e.g. because of some platform-specific macro expansion), and these would no longer compile with your compiler, making your compiler less useful.
Also, most division by zero errors that I've seen in real code are input-dependent, or at least are not really amenable to static analysis. Maybe it's not worth the effort of performing the check.
Dividing by 0 is undefined behavior because it might trigger, on certain platforms, a hardware exception. We could all wish for better-behaved hardware, but since nobody ever saw fit to have integers with -INF/+INF and NaN values, it's quite pointless.
Now, because it's undefined behavior, interesting things may happen. I encourage you to read Chris Lattner's articles on undefined behavior and optimizations, I'll just give a quick example here:
int foo(char* buf, int i) {
    if (5 / i == 3) {
        return 1;
    }
    if (buf != buf + i) {
        return 2;
    }
    return 0;
}
Because i is used as a divisor, the compiler may assume it is not 0. Therefore the second if is trivially true and can be optimized away.
In the face of such transformations, anyone hoping for a sane behavior of a division by 0... will be harshly disappointed.
In the case of integral types (int, short, long, etc.) I can't think of any uses for intentional divide by zero offhand.
However, for floating point types on IEEE-compliant hardware, explicit divide by zero is tremendously useful. You can use it to produce positive & negative infinity (+/- 1/0), and not a number (NaN, 0/0) values, which can be quite helpful.
In the case of sorting algorithms, you can use the infinities as initial values representing greater or less than all possible values.
For data analysis purposes, you can use NaNs to indicate missing or invalid data, which can then be handled gracefully. Matlab, for example, uses explicit NaN values to suppress missing data in plots, etc.
Although you can access these values through macros and std::numeric_limits (in C++), it is useful to be able to create them on your own (and allows you to avoid lots of "special case" code). It also allows implementors of the standard library to avoid resorting to hackery (such as manual assembly of the correct FP bit sequence) to provide these values.
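A small sketch (assuming IEEE-754 floating point, i.e. std::numeric_limits<double>::is_iec559 is true):

#include <iostream>

int main()
{
    double zero = 0.0;
    double pos_inf = 1.0 / zero;          // +infinity
    double neg_inf = -1.0 / zero;         // -infinity
    double not_a_number = zero / zero;    // NaN
    std::cout << pos_inf << " " << neg_inf << " " << not_a_number << "\n";
    std::cout << (not_a_number == not_a_number) << "\n";  // prints 0: NaN is not equal to itself
    return 0;
}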
If the compiler detects a division-by-0, there is absolutely nothing wrong with a compiler error. The developers you talked to are wrong - you could apply that logic to every single compile error. There is no point in ever dividing by 0.
Detecting divisions by zero at compile-time is the sort of thing that you'd want to have be a compiler warning. That's definitely a nice idea.
I don't keep no company with Microsoft Visual C++, but G++ 4.2.1 does do such checking. Try compiling:
#include <iostream>

int main() {
    int x = 1;
    int y = x / 0;
    std::cout << y;
    return 0;
}
And it will tell you:
test.cpp: In function ‘int main()’:
test.cpp:5: warning: division by zero in ‘x / 0’
But considering it an error is a slippery slope that the savvy know not to spend too much of their spare time climbing. Consider why G++ doesn't have anything to say when I write:
int main() {
    while (true) {
    }
    return 0;
}
Do you think it should compile that, or give an error? Should it always give a warning? If you think it must intervene on all such cases, I eagerly await your copy of the compiler you've written that only compiles programs that guarantee successful termination! :-)