I have been programming for a long time and stumbled across something very weird.
Somewhere in my code I had the following:
for (int i = 0; i < 512; i++) {
    if ((i * 12391823) % 5 == 1) {
        std::cout << i << std::endl;
    }
}
I have already tracked the problem down to this piece of code.
I am using CLion. When compiling a Debug build, the loop is not endless and finishes eventually after printing a few numbers.
Yet when building as Release, it seems to never exit the loop.
...
>>> 15968768
>>> 15968773
>>> 15968778
...
If (i * 12391823) is replaced with a different (smaller) number, this does not happen.
E.g. with (i * 123), it does exit nicely:
...
>>> 482
>>> 487
>>> 492
...
I have also looked at the compiler output, which displayed the following:
warning: iteration 174 invokes undefined behavior [-Waggressive-loop-optimizations]
16 | if((i * 12391823) % 5 == 1){
| ~~~^~~~~~~~~~~
I do wonder why this would lead to the loop not ending.
It seems to overflow, yet the overflow is not supposed to change i, so the loop should still end at some point, right?
I am very happy for an explanation on this topic!
Greetings
FInn
I do wonder why this would lead to the loop not ending. It seems to overflow, yet the overflow is not supposed to change i, so the loop should still end at some point, right?
The answer is "Undefined Behaviour is undefined". It can do anything. But the fact that you get a warning from -Waggressive-loop-optimizations may hint at the reason why the loop becomes endless. It is possible that the compiler decides to modify your loop into
for (int i = 0; i < 512 * 12391823; i += 12391823) {
    if (i % 5 == 1) {
        std::cout << i << std::endl;
    }
}
Or maybe even
for (int i = 24783646; i < 512 * 12391823; i += 12391823 * 5) {
    std::cout << i << std::endl;
}
Both options could act strangely when integer overflow happens.
The solution is to not wander into Undefined Behaviour land. You can do that, for example, by changing the type of i from int to unsigned long long:
for(unsigned long long i = 0; i < 512; i++)
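An alternative, if you want to keep i as a plain int counter, is to widen only the multiplication so the product can no longer overflow. A minimal sketch of that idea (my adaptation, not from the original question):

#include <iostream>

int main() {
    for (int i = 0; i < 512; i++) {
        // Promote i to long long before multiplying; the product
        // then fits comfortably for every i in [0, 511].
        if ((static_cast<long long>(i) * 12391823) % 5 == 1) {
            std::cout << i << std::endl;
        }
    }
}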
You are hitting undefined behavior in the line:
if((i * 12391823) % 5 == 1){
This is because for i larger than 173 the multiplication result exceeds the int range.
When it comes to undefined behavior, you are at the mercy of the compiler. In optimized builds, compilers tend to compile away some (even large) chunks of code in and around the expression which causes it. And as you just experienced, it may even spread to code which by itself is correct (the exit condition in the for loop).
BTW integer overflow (as far as I know) causes undefined behavior only for signed integers; it is well defined for unsigned types (the result wraps around). You may want to try an unsigned type for i (but this still MAY yield results which you didn't expect).
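For illustration, a minimal sketch of that defined unsigned wraparound (my own example, not from the question):

#include <iostream>
#include <limits>

int main() {
    unsigned int u = std::numeric_limits<unsigned int>::max();
    u += 1; // well defined: unsigned arithmetic wraps modulo 2^N
    std::cout << u << std::endl; // prints 0
}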
As explained in the comments, the values generated in the loop, specifically this sub-expression:
(i * 12391823)
overflows for larger values of i. This results in undefined behaviour.
The issue is resolved by using a wider type for i, like long long.
Note: If you use an unsigned type, the value will wrap around rather than overflow when its maximum is exceeded.
Related
I have the following code in C++:
int factorial_recursivo(int factorial) {
    if (factorial <= 1) return 1;
    // To show all the factors in each iteration:
    // std::cout << factorial << std::endl;
    return factorial * factorial_recursivo(--factorial);
}
However, if I enter a number n, the result is the factorial of n-1.
If I change the last line from factorial_recursivo(--factorial) to factorial_recursivo(factorial - 1), it works properly.
Why does this happen? I even printed the factors to the console and they showed correctly. For example, with factorial_recursivo(5) I got 5 4 3 2, yet the result was 24.
You should pass factorial - 1 instead:
return factorial * factorial_recursivo(factorial-1);
Executing:
return factorial * factorial_recursivo(--factorial);
results in an unsequenced modification of and access to factorial, which is an example of undefined behavior. On my laptop it actually produces 6 regardless of what I pass as a parameter. Thank you M.M for asking me to clarify this in the comments.
Unsequenced evaluation becomes a problem when the end result depends on which operand is evaluated first. A simple example would be: arr[j] = arr2[++j]; The result depends on whether arr[j] or arr2[++j] is evaluated first.
The program has undefined behaviour because the operands of * are unsequenced, and so there are unsequenced read and writes of factorial.
The problem is essentially the same as the cases in this thread: Undefined behavior and sequence points
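For completeness, a minimal runnable sketch of the corrected function (my own wrapper around the one-line fix shown above):

#include <iostream>

int factorial_recursivo(int factorial) {
    if (factorial <= 1) return 1;
    // factorial - 1 does not modify factorial, so there is no write
    // unsequenced with the read on the left-hand side of *.
    return factorial * factorial_recursivo(factorial - 1);
}

int main() {
    std::cout << factorial_recursivo(5) << std::endl; // prints 120
}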
I'm very new to C++ programming, and have written a simple program to calculate the factorial of an integer provided by the user. I am attempting to account for inputs which would cause an error, or do not make sense (e.g. I have accounted for input of a negative number/-1 already). I want to print out an error if the user enters a number whose factorial would be larger than the maximum integer size.
I started with:
if (factorial(n) > INT_MAX) {
    std::cout << "nope";
    continue;
}
I tested this with n = ~25 or 26 but it doesn't prevent the result from overflowing and printing out a large negative number instead.
Second, I tried assigning the maximum to a variable using a function from the <limits> header and then comparing the result of factorial(n) against it. Still no luck (you can see this attempt in the code sample below).
I could of course assign the result to a long and test against that but you wouldn't have to go very far until you started to wrap around that value, either. I'd prefer to find a way to simply prevent the value from being printed if this happens.
#include <iostream>
#include <cstdlib>
#include <limits>

int factorial(int n)
{
    auto total = 1;
    for (auto i = 1; i <= n; i++)
    {
        total = total * i; // Product of all numbers up to n
    }
    return total;
}
int main()
{
    auto input_toggle = true;
    auto n = 0;
    auto int_max_size = std::numeric_limits<int>::max();
    while (input_toggle == true)
    {
        /* get user input, check it is an integer */
        if (factorial(n) > int_max_size)
        {
            std::cout << "Error - Sorry, factorial of " << n << " is larger than \nthe maximum integer size supported by this system. " << std::endl;
            continue;
        }
        /* else std::cout << factorial(n) << std::endl; */
As with my other condition(s), I expect it to simply print out that small error message and then continue asking the user for input to calculate. The code does work, it just continues to print values that have wrapped around if I request the factorial of a value >25 or so. I feel this kind of error-checking will be quite useful.
Thanks!
You are trying to do things backwards.
First, no int can actually be bigger than INT_MAX; by definition, this is the maximum value an int can hold! So your condition factorial(n) > int_max_size is always going to be false.
Moreover, there is a logical flaw in your approach. You calculate the value first and then check whether it is less than the maximum value allowed. By that time it is too late! You have already calculated the value and gone through any overflows you might have encountered. Any check should be performed while you are still doing your calculations.
In essence, you need to check if multiplying X by Z will be within allowed range without actually doing the multiplication (unfortunately, C++ is very strict in leaving signed integer overflow undefined behavior, so you can't try and see.).
So how do you check whether X * Y will stay below Z? One approach is to divide Z by Y before doing the multiplication. If you end up with a number that is less than X, you know that multiplying X by Y would overflow.
I believe you now have enough information to code the solution yourself.
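A minimal sketch of that division test (the function name and signature are mine, not from the question):

#include <limits>

// Returns false instead of overflowing when n! does not fit in an int.
bool factorial_checked(int n, int &result)
{
    int total = 1;
    for (int i = 2; i <= n; i++)
    {
        // total * i would exceed INT_MAX exactly when total > INT_MAX / i,
        // so test with a division before multiplying.
        if (total > std::numeric_limits<int>::max() / i)
            return false;
        total *= i;
    }
    result = total;
    return true;
}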
I wrote a very trivial program to try to examine the undefined behavior attached to buffer overflows. Specifically, regarding what happens when you perform a read on data outside the allocated space.
#include <iostream>
#include <iomanip>
#include <cstdlib> // for system()

int main() {
    int values[10];
    for (int i = 0; i < 10; i++) {
        values[i] = i;
    }
    std::cout << values << " ";
    std::cout << std::endl;
    for (int i = 0; i < 11; i++) {
        // UB occurs here when values[i] is executed with i == 10
        std::cout << std::setw(2) << i << "(" << (values + i) << "): " << values[i] << std::endl;
    }
    system("pause");
    return 0;
}
When I run this program on Visual Studio, the results aren't terribly surprising: reading index 10 produces garbage:
000000000025FD70
0(000000000025FD70): 0
1(000000000025FD74): 1
2(000000000025FD78): 2
3(000000000025FD7C): 3
4(000000000025FD80): 4
5(000000000025FD84): 5
6(000000000025FD88): 6
7(000000000025FD8C): 7
8(000000000025FD90): 8
9(000000000025FD94): 9
10(000000000025FD98): -1966502944
Press any key to continue . . .
But when I fed this program into Ideone.com's online compiler, I got extremely bizarre behavior:
0xff8cac48
0(0xff8cac48): 0
1(0xff8cac4c): 1
2(0xff8cac50): 2
3(0xff8cac54): 3
4(0xff8cac58): 4
5(0xff8cac5c): 5
6(0xff8cac60): 6
7(0xff8cac64): 7
8(0xff8cac68): 8
9(0xff8cac6c): 9
10(0xff8cac70): 1
11(0xff8cac74): -7557836
12(0xff8cac78): -7557984
13(0xff8cac7c): 1435443200
14(0xff8cac80): 0
15(0xff8cac84): 0
16(0xff8cac88): 0
17(0xff8cac8c): 1434052387
18(0xff8cac90): 134515248
19(0xff8cac94): 0
20(0xff8cac98): 0
21(0xff8cac9c): 1434052387
22(0xff8caca0): 1
23(0xff8caca4): -7557836
24(0xff8caca8): -7557828
25(0xff8cacac): 1432254426
26(0xff8cacb0): 1
27(0xff8cacb4): -7557836
28(0xff8cacb8): -7557932
29(0xff8cacbc): 134520132
30(0xff8cacc0): 134513420
31(0xff8cacc4): 1435443200
32(0xff8cacc8): 0
33(0xff8caccc): 0
34(0xff8cacd0): 0
35(0xff8cacd4): 346972086
36(0xff8cacd8): -29697309
37(0xff8cacdc): 0
38(0xff8cace0): 0
39(0xff8cace4): 0
40(0xff8cace8): 1
41(0xff8cacec): 134514984
42(0xff8cacf0): 0
43(0xff8cacf4): 1432277024
44(0xff8cacf8): 1434052153
45(0xff8cacfc): 1432326144
46(0xff8cad00): 1
47(0xff8cad04): 134514984
...
//The heck?! This just ends with a Runtime Error after like 200 lines.
So apparently, with their compiler, overrunning the buffer by a single index causes the program to enter an infinite loop!
Now, to reiterate: I realize that I'm dealing with undefined behavior here. But despite that, I'd like to know what on earth is happening behind the scenes to cause this. The code that physically performs the buffer overrun is still performing a read of 4 bytes and writing whatever it reads to a (presumably better protected) buffer. What is the compiler/CPU doing that causes these issues?
There are two execution paths leading to the condition i < 11 being evaluated.
The first is before the initial loop iteration. Since i had been initialised to 0 just before the check, this is trivially true.
The second is after a successful loop iteration. Since the loop iteration caused values[i] to be accessed, and values only has 10 elements, this can only be valid if i < 10. And if i < 10, after i++, i < 11 must also be true.
This is what Ideone's compiler (GCC) is detecting. There is no way the condition i < 11 can ever be false unless you have an invalid program, therefore it can be optimised away. At the same time, your compiler doesn't go out of its way to check whether you might have an invalid program unless you provide additional options to tell it to do so (such as -fsanitize=undefined in GCC/clang).
This is a trade-off implementations must make. They can favour understandable behaviour for invalid programs, or they can favour raw speed for valid programs. Or a mix of both. GCC definitely focuses heavily on the latter, at least by default.
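If you do want the checked behaviour, a typical invocation looks like the following (my example command; -fsanitize=undefined is a real GCC/Clang flag whose bounds check should report the out-of-bounds index at run time instead of letting the optimiser reason from it):

g++ -O2 -g -fsanitize=undefined main.cpp -o main
./main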
A few days ago, I encountered what I believe to be a bug in g++ 5.3 concerning the nesting of for loops at higher -OX optimization levels. (I have been experiencing it specifically with -O2 and -O3.) The issue is that if you have two nested for loops that keep some internal sum of total iterations, once this sum exceeds its maximum value it prevents the outer loop from terminating. The smallest code set with which I have been able to replicate this is:
#include <iostream>

int main(){
    int sum = 0;
    // Value of 100 million. (2047483648 less than int32 max.)
    int maxInner = 100000000;
    int maxOuter = 30;
    // 100 million * 30 = 3 billion. (Larger than int32 max.)
    for(int i = 0; i < maxOuter; ++i)
    {
        for(int j = 0; j < maxInner; ++j)
        {
            ++sum;
        }
        std::cout<<"i = "<<i<<" sum = "<<sum<<std::endl;
    }
}
When this is compiled using g++ -o run.me main.cpp it runs just as expected outputting:
i = 0 sum = 100000000
i = 1 sum = 200000000
i = 2 sum = 300000000
i = 3 sum = 400000000
i = 4 sum = 500000000
i = 5 sum = 600000000
i = 6 sum = 700000000
i = 7 sum = 800000000
i = 8 sum = 900000000
i = 9 sum = 1000000000
i = 10 sum = 1100000000
i = 11 sum = 1200000000
i = 12 sum = 1300000000
i = 13 sum = 1400000000
i = 14 sum = 1500000000
i = 15 sum = 1600000000
i = 16 sum = 1700000000
i = 17 sum = 1800000000
i = 18 sum = 1900000000
i = 19 sum = 2000000000
i = 20 sum = 2100000000
i = 21 sum = -2094967296
i = 22 sum = -1994967296
i = 23 sum = -1894967296
i = 24 sum = -1794967296
i = 25 sum = -1694967296
i = 26 sum = -1594967296
i = 27 sum = -1494967296
i = 28 sum = -1394967296
i = 29 sum = -1294967296
However, when this is compiled using g++ -O2 -o run.me main.cpp, the outer loop fails to terminate. (This only occurs when maxInner * maxOuter > 2^31.) While sum continually overflows, that shouldn't in any way affect the other variables. I have also tested this on Ideone.com with the test case demonstrated here: https://ideone.com/5MI5Jb
My question is thus twofold.
How is it possible for the value of sum to in some way affect the rest of the program? No decisions are based upon its value; it is merely utilized as a counter and in the std::cout statement.
What could possibly be causing the dramatically different outcomes at different optimization levels?
Thank you greatly in advance for taking the time to read and consider my question.
Note: This question differs from existing questions such as Why does integer overflow on x86 with GCC cause an infinite loop?, because the issue in that problem was an overflow of the sentinel variable. However, both sentinel variables in this question, i and j, never exceed the value of 100 million, let alone 2^31.
This is an optimisation that's perfectly valid for correct code. Your code isn't correct.
What GCC sees is that the only way the loop exit condition i >= maxOuter could ever be reached is if you have signed integer overflow during earlier loop iterations in your calculation of sum. The compiler assumes there isn't signed integer overflow, because signed integer overflow isn't allowed in standard C. Therefore, i < maxOuter can be optimised to just true.
This is controlled by the -faggressive-loop-optimizations flag. You should be able to get the behaviour you expect by adding -fno-aggressive-loop-optimizations to your command line arguments. But better would be making sure your code is valid. Use unsigned integer types to get guaranteed valid wraparound behaviour.
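For example, a minimal sketch of the unsigned variant (my adaptation of the question's code; the arithmetic on sum is now defined, so the loop terminates under -O2 as well):

#include <iostream>

int main() {
    unsigned int sum = 0;
    unsigned int maxInner = 100000000;
    unsigned int maxOuter = 30;
    for (unsigned int i = 0; i < maxOuter; ++i) {
        for (unsigned int j = 0; j < maxInner; ++j) {
            // Unsigned arithmetic is defined to wrap modulo 2^32;
            // here 3 billion even fits in 32 unsigned bits without wrapping.
            ++sum;
        }
        std::cout << "i = " << i << " sum = " << sum << std::endl;
    }
}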
Your code invokes undefined behaviour, since the int sum overflows. You say "this shouldn't in any way affect the other variables". Wrong. Once you have undefined behaviour, all bets are off. Anything can happen.
gcc is (in)famous for optimisations that assume there is no undefined behaviour and that do, let's say, interesting things if undefined behaviour does happen.
Solution: Don't do it.
As #hvd pointed out, the problem is in your invalid code, not in the compiler.
During your program's execution, the value of sum overflows the int range. Since int is signed by default and overflow of signed values causes undefined behavior* in C, the compiler is free to do anything. As someone noted somewhere, dragons could be flying out of your nose. The result is simply undefined.
The difference -O2 causes is in testing the end condition. When the compiler optimizes your loop, it realizes that it can optimize away the inner loop, making it
int sum = 0;
for(int i = 0; i < maxOuter; i++) {
    sum += maxInner;
    std::cout<<"i = "<<i<<" sum = "<<sum<<std::endl;
}
and it may go further, transforming it to
int i = 0;
for(int sum = 0; sum < (maxInner * maxOuter); sum += maxInner) {
    i++;
    std::cout<<"i = "<<i<<" sum = "<<sum<<std::endl;
}
To be honest, I don't really know what it actually does; the point is that it can do exactly this. Or anything else; remember the dragons, your program causes undefined behavior.
Suddenly, your sum variable is used in the loop end condition. Note that for defined behavior, these optimizations are perfectly valid. If your sum were unsigned (and your maxInner and maxOuter too), the (maxInner * maxOuter) value (which would also be unsigned) would be reached after maxOuter loops, because unsigned operations are defined** to wrap around as expected.
Now since we're in the signed domain, the compiler is free to assume that at all times sum < (maxInner * maxOuter), just because the latter overflows and is therefore not defined. So the optimizing compiler can end up with something like
int i = 0;
for(int sum = 0; /* nothing here evaluates to true */; sum += maxInner) {
    i++;
    std::cout<<"i = "<<i<<" sum = "<<sum<<std::endl;
}
which looks like observed behavior.
*: According to the C11 standard draft, section 6.5 Expressions:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
**: According to the C11 standard draft, Annex H, H.2.2:
C’s unsigned integer types are ‘‘modulo’’ in the LIA−1 sense in that overflows or out-of-bounds results silently wrap.
I did some research on the topic. I compiled the code above with gcc and g++ (version 5.3.0 on Manjaro) and got some pretty interesting things out of it.
Description
To successfully compile it with gcc (C compiler, that is), I have replaced
#include <iostream>
...
std::cout<<"i = "<<i<<" sum = "<<sum<<std::endl;
with
#include <stdio.h>
...
printf("i = %d sum = %d\n", i, sum);
and guarded the two versions with #ifdef ORIG / #else, so I could have both. Then I ran 8 compilations: {gcc,g++} x {-O2, ""} x {-DORIG=1,""}. This yields the following results:
Results
gcc, -O2, -DORIG=1: Won't compile, missing <iostream>. Not surprising.
gcc, -O2, "": Produces a compiler warning and behaves "normally". A look at the assembly shows that the inner loop is optimized out (sum being incremented by 100000000 per outer iteration) and the outer loop variable is compared with the hardcoded value -1294967296. So GCC can detect this and do some clever things while the program still works as expected. More importantly, a warning is emitted to alert the user to the undefined behavior.
gcc, "", -DORIG=1: Won't compile, missing <iostream>. Not surprising.
gcc, "", "": Compiles without warning. No optimizations, program runs as expected.
g++, -O2, -DORIG=1: Compiles without warning, runs in an endless loop. This is OP's original code running. The C++ assembly is tough for me to follow, but the addition of 100000000 is there.
g++, -O2, "": Compiles with a warning. Merely changing how the output is printed is enough to change whether the compiler emits the warning. Runs "normally". Judging by the assembly, AFAIK the inner loop gets optimized out; at least there is again a comparison against -1294967296 and incrementation by 100000000.
g++, "", -DORIG=1: Compiles without warning. No optimization, runs "normally".
g++, "", "": dtto
The most interesting part for me was discovering the difference the change of printing makes. Of all the combinations, only the one used by OP produces an endless-loop program; the others either fail to compile, do not optimize, or optimize with a warning and preserve sane behaviour.
Code
An example build command and my full code follow.
$ gcc -x c -Wall -Wextra -O2 -DORIG=1 -o gcc_opt_orig main.cpp
main.cpp:
#ifdef ORIG
#include <iostream>
#else
#include <stdio.h>
#endif

int main(){
    int sum = 0;
    // Value of 100 million. (2047483648 less than int32 max.)
    int maxInner = 100000000;
    int maxOuter = 30;
    // 100 million * 30 = 3 billion. (Larger than int32 max.)
    for(int i = 0; i < maxOuter; ++i)
    {
        for(int j = 0; j < maxInner; ++j)
        {
            ++sum;
        }
#ifdef ORIG
        std::cout<<"i = "<<i<<" sum = "<<sum<<std::endl;
#else
        printf("i = %d sum = %d\n", i, sum);
#endif
    }
}
Consider the following piece of code. This function reads some integers and strings from a file.
const int vardo_ilgis = 10;

void skaityti(int &n, int &m, int &tiriama, avys A[])
{
    ifstream fd("test.txt");
    fd >> n >> m >> tiriama;
    fd.ignore(80, '\n');
    char vard[vardo_ilgis]; // <---
    for(int i = 1; i <= n; i++)
    {
        cout << i << ' ';
        fd.get(vard, vardo_ilgis+1); // <---
        cout << i << endl;
        A[i].vardas = vard;
        getline(fd, A[i].DNR);
    }
    fd.close();
}
and input:
4 6
4
Baltukas TAGCTT
Bailioji ATGCAA
Doli AGGCTC
Smarkuolis AATGAA
In this case, the variable vard has length vardo_ilgis = 10, but fd.get is told it may read vardo_ilgis+1 = 11 characters (more than the buffer in which the data is stored can hold). I'm not asking how to fix the problem; it's obvious that one should not read more than one can store in a variable.
However, I really want to understand the reason for this behaviour: the loop counter variable gets decreased by fd.get. Why and how can this even happen? This is the output of this little piece of code:
1 0
1 0
1 0
1 0
1 1
2 2
3 3
4 4
Why did you use +1 ??
fd.get(vard, vardo_ilgis+1);
Overrunning that buffer corrupts some memory. In a simple unoptimized build, that corrupted memory could be the loop index.
the loop count variable gets decreased by fd.get. Why and how even can this happen?
Once you know why you have caused undefined behavior, many people say you aren't supposed to inquire into the details of that undefined behavior. I disagree. By understanding the details, you can improve your ability to diagnose other situations where you don't know what undefined behavior you might have invoked.
All your local variables are stored together, so overwriting one will tend to clobber another.
You describe the variable as being "decreased" when in fact it was set to zero. The fact that it was 1 before being zeroed didn't affect its being zeroed. The undefined behavior happened to be equivalent to i &= ~255;, which for values under 256 is the same as i = 0;. It is merely accidental that it looked like i--;.
Hopefully it is clear why i stopped being zeroed once you ran out of input.
fd.get(vard, vardo_ilgis+1); causes the buffer to be written out of bounds.
In your case, the area where you write (and where you should not) is probably the same memory area where i is stored.
But what's most important is that you end up with the famous undefined behaviour, which means anything could happen, and there is little point trying to understand why or how (what happens is platform-, compiler- and even context-specific; I don't think anyone can predict or explain it).
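For what it's worth, a minimal sketch of the corrected read (my adaptation; std::istream::get(s, n) stores at most n - 1 characters plus a terminating '\0'):

char vard[vardo_ilgis];
// get() writes at most vardo_ilgis - 1 characters plus the null
// terminator, which always fits within the buffer.
fd.get(vard, vardo_ilgis);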