g++ "warning: iteration ... invokes undefined behavior" for Seemingly Unrelated Variable - c++

Consider the following code in strange.cpp:
#include <vector>
using namespace std;
int i = 0;
int *bar()
{
    ++i;
    return &i;
}
int main()
{
    for(size_t j = 0; j < 99999999999; ++j) // (*)
    {
        const auto p = bar();
        if(!p) // (**)
            return -1;
    }
}
Compiling this with g++ gives a warning:
$ g++ --std=c++11 -O3 strange.cpp
strange.cpp: In function ‘int main()’:
strange.cpp:12:12: warning: iteration 4294967296ul invokes undefined behavior [-Waggressive-loop-optimizations]
++i;
^
strange.cpp:19:9: note: containing loop
for(size_t j = 0; j < 99999999999; ++j) // (*)
^
I don't understand why the increment invokes undefined behavior. Moreover, there are two changes, each of which makes the warning disappear:
changing the line (*) to for(int j...
changing the line (**) to if(!*p)
What is the meaning of this warning, and why are the changes relevant to it?
Note
$ g++ --version
g++ (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4

The increment is undefined because once i reaches std::numeric_limits<int>::max() (2^31 - 1 on a 32-bit, LP64 or LLP64 platform), incrementing it will overflow, which is undefined behavior for signed integral types.
gcc is warning about iteration 4294967296ul (2^32) rather than iteration 2147483646u (just under 2^31) as you might expect, because it doesn't know the initial value of i; some other code might have run before main to set i to something other than 0. But once main is entered, no other code can run to alter i, and so once 2^32 iterations have completed it must at some point have reached 2^31 - 1 and overflowed.
Changing (*) to for(int j ...) "fixes" it by turning the controlling condition of the loop into a tautologically true expression (every int value is less than 99999999999); this makes the loop an infinite loop, since the if inside the loop will never execute, as &i cannot be a null pointer. Infinite loops can be optimized away, so gcc eliminates the body of the loop and the integer overflow of i does not occur.
Changing (**) to if(!*p) "fixes" it by allowing gcc an out from the undefined behavior of integer overflow. The only way to prevent integer overflow is for i to have an initial value that is negative, such that at some point i reaches zero. This is possible (see above), and the only alternative is undefined behavior, so it must happen. So i reaches zero, the if inside the loop executes, and main returns -1.

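Not part of the answer above, but as an illustrative sketch: one way to remove the undefined behavior (and with it the basis for the warning) is to make i unsigned, since unsigned arithmetic wraps around instead of overflowing.

#include <cstddef>

// Hypothetical variant of strange.cpp with an unsigned counter:
// unsigned wraparound is well-defined, so there is no UB for gcc to exploit.
unsigned int i = 0;

unsigned int *bar()
{
    ++i;            // well-defined wraparound at UINT_MAX
    return &i;
}

int main()
{
    for (std::size_t j = 0; j < 99999999999; ++j)
    {
        const auto p = bar();
        if (!p)     // still never taken: &i is never null
            return -1;
    }
}
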
Related

Sequence point, function call and undefined behaviour

main.cpp
const int& f(int& i) { return (++i); }
int main(){
    int i = 10;
    int a = i++ + i++; //undefined behavior
    int b = f(i) + f(i); //but this is not
}
compile
$ g++ main.cpp -Wsequence-point
The statement int a = i++ + i++; is undefined behaviour.
The statement int b = f(i) + f(i); is not undefined.
Why?
The statement int b = f(i) + f(i); is not undefined. Why?
No, the second statement will result in unspecified behavior. You can confirm this here. As you'll see in the above linked demo, gcc gives the output as 23 while msvc gives 24 for the same program.
Pre-C++11, we use the sequence point rules:
Pre-C++11 Undefined behavior
Between the previous and next sequence point, the value of any object in a memory location must be modified at most once by the evaluation of an expression, otherwise the behavior is undefined.
Pre-C++11 Rules
3) There is a sequence point after the copying of a returned value of a function and before the execution of any expressions outside the function.
In int a = i++ + i++;, i is modified twice between sequence points, so the behavior is undefined.
In int b = f(i) + f(i);, there are sequence points associated with the function calls; i is modified only once between consecutive sequence points, so there is no UB.
Note though that the order of evaluation is unspecified, so the first f(i) may be evaluated before or after the second f(i), which can lead to different results depending on the compiler and optimization level, and even between different calls.
Since C++11, we use the "sequenced before" rules instead, which in a similar way disallow i++ + i++ but allow f(i) + f(i).
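If a deterministic result is wanted, one option (my own rework of the example, not something the answer proposes) is to sequence the two calls in separate statements and copy each value out of the returned reference before the next call modifies i. The 23 vs 24 difference above arises precisely because both calls return a reference to the same i, which may be read before or after the second increment.

#include <iostream>

const int& f(int& i) { return ++i; }

int main() {
    int i = 10;
    // Each statement is fully evaluated before the next one starts, and the
    // returned reference is read immediately, so the result is always 11 + 12.
    int first  = f(i);       // i becomes 11, first = 11
    int second = f(i);       // i becomes 12, second = 12
    int b = first + second;  // always 23, on every compiler
    std::cout << b << '\n';
}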

Why does my for loop not stop when I use an int32_t loop variable? [duplicate]

This question already has answers here:
Why is this loop being endless?
(3 answers)
Is signed integer overflow still undefined behavior in C++?
(3 answers)
Why does integer overflow work differently inside of if statement conditional?
(2 answers)
Why does integer overflow on x86 with GCC cause an infinite loop?
(6 answers)
Closed 10 months ago.
In my program, I found that the loop is unable to exit correctly when i is int32_t. It looks like integer overflow: i becomes much larger than 10, and the loop does not stop.
Please tell me what happened and how I can avoid this error in a large project.
#include <iostream>
#include <stdint.h>
int f(int n){
    for (int32_t i = 0; i < 10; ++i)
    {
        int64_t time = 4500000000 + (i) * 500000000;
        std::cout << time << " i: " << i << std::endl;
    }
    return 0;
}
int main ()
{
    return f(10);
}
code link
If you use GCC 11.2.0 and the -Wall -O2 options, you will see a warning about undefined behavior:
test.cpp: In function 'int f(int)':
test.cpp:7:42: warning: iteration 5 invokes undefined behavior [-Waggressive-loop-optimizations]
7 | int64_t time = 4500000000 + (i) * 500000000;
| ~~~~^~~~~~~~~~~
test.cpp:5:27: note: within this loop
5 | for (int32_t i = 0; i < 10; ++i)
| ~~^~~~
The compiler knows that 5 * 500000000 is too large to fit in an int (which is typically 32 bits). Signed integer overflow is undefined behavior in C++. Therefore, the compiler is free to assume that this overflow never happens, so it will assume that i can never reach 10, so it can get rid of the part of your for loop that checks i < 10. I know that sounds crazy, but if your program invokes undefined behavior, the compiler is free to do whatever it wants.
Just add some casts to specify that you want to do 64-bit arithmetic. This eliminates the warnings, the overflows, and the undefined behavior:
int64_t time = (int64_t)4500000000 + i * (int64_t)500000000;
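For completeness, here is a minimal sketch of the whole function with that fix applied (written with static_cast, otherwise unchanged):

#include <cstdint>
#include <iostream>

int f(int n) {
    for (int32_t i = 0; i < 10; ++i) {
        // Promote i to int64_t before multiplying, so the arithmetic
        // is done in 64 bits and cannot overflow.
        int64_t time = 4500000000 + static_cast<int64_t>(i) * 500000000;
        std::cout << time << " i: " << i << std::endl;
    }
    return 0;
}

int main() { return f(10); }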
Update: For a larger project that could have more bugs, you might consider using GCC's -fwrapv option, which makes the behavior of signed integer overflow be defined. You could also use -fsanitize=signed-integer-overflow or -fsanitize=undefined to detect these issues at run time, if your toolchain supports those options.

Undefined behaviour accessing const ptr sometimes

I have a header file defined as
#pragma once
#include <iostream>
template<int size>
struct B
{
    double arr[size * size];
    constexpr B() : arr()
    {
        arr[0] = 1.;
    }
};
template<int size>
struct A
{
    const double* arr = B<size>().arr;
    void print()
    {
        // including this statement also causes undefined behaviour on subsequent lines
        //printf("%i\n", arr);
        printf("%f\n", arr[0]);
        printf("%f\n", arr[0]); // ???
        // prevent optimisation
        for (int i = 0; i < size * size; i++)
            printf("%f ", arr[i]);
    }
};
and call it with
auto a = A<8>();
a.print();
Now this code only runs as expected when compiled with msvc in release mode (everything compiled as C++17).
expected output:
1.000000
1.000000
msvc debug:
1.000000
-92559631349317830736831783200707727132248687965119994463780864.000000
gcc via mingw (with and without -g):
1.000000
0.000000
However, this behaviour is inconsistent. The expected output is given if I replace double arr[size * size] with double arr[size] instead. No more problems if I allocate arr on the heap of course.
I looked at the assembly of the msvc debug build but I don't see anything out of the ordinary. Why does this undefined behaviour only occur sometimes?
asm output
decompiled msvc release
In this declaration
const double* arr = B<size>().arr;
a pointer is declared that points to (the first element of) a temporary array, and that temporary is no longer alive after the declaration.
So dereferencing the pointer results in undefined behavior.
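A stripped-down illustration of the same lifetime problem (the struct name Temp is hypothetical, not taken from the question):

#include <cstdio>

struct Temp {
    double data[4] = {1.0, 2.0, 3.0, 4.0};
};

int main() {
    const double* p = Temp().data;  // Temp() is a temporary; it is destroyed
                                    // at the end of this full-expression
    std::printf("%f\n", p[0]);      // undefined behaviour: p dangles
}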
When you wrote:
const double* arr = B<size>().arr;
The above statement initializes a pointer to const double (i.e., a const double*) named arr with a pointer into a temporary array object. Since this temporary array object will be destroyed at the end of the full-expression, using arr afterwards leads to undefined behavior.
Why does this undefined behaviour only occur sometimes?
Undefined behavior means anything[1] can happen, including but not limited to the program giving your expected output. But never rely on (or draw conclusions from) the output of a program that has undefined behavior.
So the output that you're seeing (or may be seeing) is a result of undefined behavior. And as I said, don't rely on the output of a program that has UB; the program may just crash.
So the first step to making the program correct is to remove the UB. Then, and only then, can you start reasoning about its output.
[1] For a more technically accurate definition of undefined behavior, see this, where it is mentioned that there are no restrictions on the behavior of the program.
It seems that it was completely coincidental that smaller allocations were always addressed in a spot that would not get erased by the rep stosd instruction present in printf. Not caused by strange compiler optimisations as I first thought it was.
What does the "rep stos" x86 assembly instruction sequence do?
I also have no idea why I decided to do it this way. It's not exactly the question I asked, but I ultimately wanted a compile-time lookup table, so the real solution was static inline constexpr auto arr = B<size>() in C++20. Which is why the code looks strange.
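A minimal sketch of that direction (the member name table is mine, and I am assuming C++17 or later, where static constexpr data members are implicitly inline; the OP used C++20): the lookup table becomes an object with static storage duration, so nothing dangles.

#include <cstdio>

template<int size>
struct B
{
    double arr[size * size];
    constexpr B() : arr()
    {
        arr[0] = 1.;
    }
};

template<int size>
struct A
{
    // The table is computed at compile time and has static storage duration.
    static constexpr auto table = B<size>();

    void print()
    {
        std::printf("%f\n", table.arr[0]);
        std::printf("%f\n", table.arr[0]);
        for (int i = 0; i < size * size; i++)
            std::printf("%f ", table.arr[i]);
        std::printf("\n");
    }
};

int main()
{
    auto a = A<8>();
    a.print();
}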

is this for loop correct and infinite and why?

So I had this segment of code in my C++ test today:
for(int i=10;i++;i<15){cout<<i;}
What is this supposed to output, and why?
Thanks!
A for loop will run until either:
its 2nd argument evaluates as 0/false.
the loop body calls break or return, or throws an exception, or otherwise exits the calling thread.
The code you have shown may or may not loop indefinitely. However, it will not loop the 5 times you might have expected, because the 2nd and 3rd arguments of the for statement are reversed.
The loop should look like this:
for(int i = 10; i < 15; i++)
However, in the form you have shown:
for(int i = 10; i++; i < 15)
The loop will continue running as long as i is not 0. Depending on how the compiler interprets i++ as a loop condition, it may recognize that this will lead to overflow and just decide to ignore i and make the loop run indefinitely, or it may actually increment i and let overflow happen.
In the latter case, on every loop iteration, i will be incremented, and eventually i will overflow past the maximum value that an int can hold. At that time, i will usually wrap around to a negative value, since int is a signed type (though overflow behavior is not officially defined by the C++ standard, so the wrap is not guaranteed). Then the loop will keep running since i is not 0, incrementing i each time until it eventually reaches 0, thus breaking the loop.
In either case, the loop will end up calling cout << i many thousands/millions of times (depending on the byte size of int), or call it forever until the program is terminated.
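As a quick contrast, a minimal sketch of the corrected loop shown above, which runs exactly the intended five times:

#include <iostream>
using namespace std;

int main()
{
    // Condition and increment in their usual positions
    for (int i = 10; i < 15; i++)
        cout << i;      // prints 1011121314
    cout << endl;
}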
What is this supposed to output, and why?
It unconditionally causes a signed int overflow.
So it is undefined behavior and can do anything.
For practical purposes, with any modern compiler this loop will continue forever. The reason is that the code is syntactically correct (though very incorrect semantically).
for(int i = 10; i++; i < 15)
means: start with i equal to 10. Check whether i++ is true (it will be, since integers are convertible to booleans, with non-zero values converted to true). Proceed with the loop body, on every iteration comparing i with 15 (just comparing and discarding the result, since this sits in the increment-expression position), then incrementing i and checking whether its previous value is non-zero.
Since compilers are allowed to assume that signed integers never overflow, i++ can never reach 0 when starting from 10. As a result, an optimizing compiler will remove the check altogether and turn this into an infinite loop.
Last, but not least, learn to love compiler warnings. In particular, this code produces the following:
<source>:4:29: warning: for increment expression has no effect [-Wunused-value]
for (int i = 10; i++; i < 15) {
~~^~~~
<source>:4:23: warning: iteration 2147483637 invokes undefined behavior [-Waggressive-loop-optimizations]
for (int i = 10; i++; i < 15) {
~^~
<source>:4:23: note: within this loop
for (int i = 10; i++; i < 15) {
^~

Does while(i--) s+= a[i]; contain undefined behavior in C and C++?

Consider simple code:
#include "stdio.h"
#define N 10U
int main() {
    int a[N] = {0};
    unsigned int i = N;
    int s = 0;
    // Fill a
    while(i--)
        s += a[i];
    printf("Sum is %d\n", s);
    return 0;
}
Does the while loop contain undefined behavior because of integer underflow? Do compilers have the right to assume that the while loop condition is always true because of that, and end up with an endless loop?
What if i is signed int? Doesn't it contain pitfalls related to array access?
Update
I ran this and similar code many times and it worked fine. Moreover, it's a popular way to iterate over arrays and vectors backwards. I'm asking this question to make sure that this approach is OK from the point of view of the standard.
At a glance, it's obviously not infinite. On the other hand, the compiler can sometimes "optimize" away conditions and code on the assumption that the code contains no undefined behavior. That can lead to infinite loops and other unwanted consequences. See this.
This code doesn't invoke undefined behavior. The loop will be terminated once i becomes 0.
For unsigned int, there is no integer over/underflow (unsigned arithmetic is modular). The effect will be the same with i as signed, except there will be no wrapping in that case.
Does the while loop contain undefined behavior because of integer underflow?
No, overflow/underflow is only undefined behavior in the case of signed integers.
Do compilers have the right to assume that the while loop condition is always true because of that, and end up with an endless loop?
No, because the expression will eventually become zero.
What if i is signed int? Doesn't it contain pitfalls related to array access?
If it is signed and over/underflows, you invoke undefined behavior.
The loop does not yield undefined behaviour for the following reasons.
i is initialised to 10 and decremented in the loop. When i has the value zero, decrementing it produces a value equal to UINT_MAX (the largest value an unsigned int can represent), but the loop then terminates, because the condition tested the value i had before the decrement. a[i] will only ever be accessed (within the loop) for values of i between N-1 (i.e. 9) and 0. Those are all valid indices in array a.
s and all the elements of a are initialized to zero, so all the additions add 0 to 0. That will never overflow nor underflow an int, so it can never result in undefined behaviour.
If i is changed to signed int, the decrementing never underflows, and i will have a negative value when the loop terminates. The only net change is in the value that i has after the loop.
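A small sketch of that signed variant (kept valid as both C and C++, like the original), showing that the loop still only touches a[9] through a[0] and that i ends up as -1:

#include <stdio.h>
#define N 10

int main(void)
{
    int a[N] = {0};
    int i = N;                  /* signed this time */
    int s = 0;
    while (i--)                 /* tests 10..1, then tests 0 and stops */
        s += a[i];              /* accesses a[9] .. a[0], all in bounds */
    printf("Sum is %d, i is %d\n", s, i);   /* i is -1 after the loop */
    return 0;
}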
i will wrap around to ~0u (0xFFFFFFFF for 32 bits), but the loop will terminate, so there is no UB.
Your code is well-behaved and contains no undefined behavior for any value of N ≥ 0.
Let us focus on the while-loop, and trace through an execution example with N = 2.
// Initialization
#define N 2U
unsigned int i = N;   // i = 2

// First iteration
while(i--)            // condition = i = 2 = true; i post-decremented to 1
    s += a[i];        // i = 1 is in bounds

// Second iteration
while(i--)            // condition = i = 1 = true; i post-decremented to 0
    s += a[i];        // i = 0 is in bounds

// Third iteration
while(i--)            // condition = i = 0 = false; i post-decremented to 0xFFFFFFFF

// Loop terminated
We can see from this trace that a[i] is always in bounds, and i experiences an unsigned wraparound when decrementing from 0, which is perfectly well-defined.
To answer your second question, if we changed the type of i to signed int, the only behavior that changes in the example trace is that the loop terminates at the same place but i gets decremented to -1, which is also perfectly well-defined.
Thus in conclusion, your code is well-behaved assuming that N ≥ 0, whether i is unsigned int or signed int. (If N is negative and i is signed int, then the loop will keep decrementing until undefined behavior happens at INT_MIN.)
Does the while loop contain undefined behavior because of integer underflow?
First off this is a Boolean Conversion:
A prvalue of integral, floating-point, unscoped enumeration, pointer, and pointer-to-member types can be converted to a prvalue of type bool.
The value zero (for integral, floating-point, and unscoped enumeration) and the null pointer and the null pointer-to-member values become false. All other values become true.
So there will be no integer underflow if i is properly initialized; it will reach 0U, and that will be converted to a false value.
So what the compiler is effectively doing here is: while(static_cast<bool>(i--))
Do compilers have the right to assume that the while loop condition is always true because of that, and end up with an endless loop?
The key reason this isn't an endless loop is that the postfix decrement operator returns the value the variable had before the decrement. Its built-in form is declared as T operator--(T&, int). And as discussed previously, the compiler evaluates whether that returned value is 0U as part of the conversion to bool.
What the compiler is effectively doing here is: while(0 != operator--(i, 1))
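A rough desugaring along those lines (a hypothetical rewrite, just to make the order of operations explicit):

#include <cstdio>

int main()
{
    int a[10] = {0};
    int s = 0;
    unsigned int i = 10U;

    // Roughly what  while (i--) s += a[i];  does, step by step.
    for (;;) {
        unsigned int old = i;  // postfix -- yields the value before the decrement
        i = i - 1U;            // well-defined unsigned wraparound when old == 0
        if (old == 0U)         // the yielded value is what the condition tests
            break;
        s += a[i];             // i is now old - 1: a valid index 9 .. 0
    }
    std::printf("Sum is %d\n", s);
}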
In your update you cite this code as a motivation for your question:
void fn(void)
{
    /* write something after this comment so that the program output is 10 */
    int a[1] = {0};
    int j = 0;
    while(a[j] != 5) ++j; /* Search stack until you find 5 */
    a[j] = 10; /* Overwrite it with 10 */
    /* write something before this comment */
}
Upon inspection, that entire program has undefined behavior: there is only 1 element of a, and it's initialized to 0. So for any index other than 0, a[j] is reading off the end of the array. The loop will continue until a 5 is found or until the OS faults because the program has read from protected memory. This is unlike your loop's condition, which will exit when the postfix decrement operator returns a value of 0; so there can be no assumption that the condition is always true, or that the loop will go on infinitely.
What if i is signed int?
Both of the above operations are built into C++, and they are both defined for all integral types, signed and unsigned. So ultimately this expands to:
while(static_cast<bool>(operator--(i, 1)))
Doesn't it contain pitfalls related to array access?
Again this is calling a built-in operator, the subscript operator: T& operator[](T*, std::ptrdiff_t)
So a[i] is the equivalent of calling operator[](a, static_cast<std::ptrdiff_t>(i)).
So the obvious follow-up question is: what's a ptrdiff_t? It's an implementation-defined integer type, but each implementation of the standard is responsible for defining conversions to and from this type, so i will convert correctly.