Different output calling malloc on clang with compiler options -O0 vs -O3 - c++

The following code produces different output when compiled with -O3 vs -O0:
#include <stdlib.h>
#include <stdio.h>
int main(){
    int *p = (int*)malloc(sizeof(int));
    int *q = (int*)realloc(p, sizeof(int));
    *p = 1;
    *q = 2;
    if (p == q)
        printf("%d %d", *p, *q);
    return 0;
}
I was very surprised by the outcome.
Compiling with clang 3.4, 3.5 (http://goo.gl/sDLvrq)
using compiler options -O0 — output: 2 2
using compiler options -O3 — output: 1 2
Is it a bug?
Interestingly, if I modify the code slightly (http://goo.gl/QwrozF), it behaves as expected:
int *p = (int*)malloc(sizeof(int));
*p = 1;
Testing it on gcc seems to work fine.

After the realloc, p is no longer valid.

Assuming both of the allocations are successful, q points to an allocated region of memory and p is an invalid pointer. The standard treats realloc and free as deallocation routines, and if successful, the address the pointer held can no longer be used. If the call to realloc fails for some reason, the original memory is still valid (but of course q isn't, it's NULL).
Although you compare p and q, you've already written to an invalid pointer, so all bets are off.
What's probably happening here is that the -O3 setting causes the compiler to ignore the pointers and just substitute the numbers inline. High optimisation means a compiler can take all sorts of shortcuts and drop statements, so long as it guarantees the same result; the condition is that all of the code is well defined.
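For reference, here is a minimal sketch (my own rewrite, not the asker's code) of how the allocation could be done without invoking undefined behaviour: check the result of realloc, use only the returned pointer afterwards, and keep the old pointer only if the call fails.

#include <stdlib.h>
#include <stdio.h>

int main() {
    int *p = (int*)malloc(sizeof(int));
    if (p == NULL)
        return 1;
    *p = 1;                                  /* write before handing p to realloc */

    int *q = (int*)realloc(p, sizeof(int));
    if (q == NULL) {                         /* realloc failed: p is still valid */
        free(p);
        return 1;
    }
    /* realloc succeeded: p must not be used any more, only q */
    *q = 2;
    printf("%d\n", *q);

    free(q);
    return 0;
}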

Related

Casting array pointer to another type gives different output for different compiler optimizations

I was trying reinterpret_cast in C++ when I noticed an inconsistency with it: it gave different outputs at different optimization levels. Then I tried the C version and it again gave output I did not expect.
This is the C code:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char const *argv[]) {
    unsigned long long* arr = (unsigned long long*)malloc(16);
    arr[0] = 0x300000061;
    arr[1] = 9;
    int* casted = (int*)(arr);
    for (int i = 0; i < 3; ++i)
        printf("%d\n", casted[i]);
    return 0;
}
Notice the line int* casted = (int*)(arr);, which casts the pointer. When I instead cast to char* and increase 3 to 12 in the for loop, it gives the output I expect.
Output with the -O1 to -O3 flags:
0
0
0
Output without any -O flag:
97
3
9
Output with char (with or without optimization):
97
0
0
0
3
0
0
0
9
0
0
0
The second output is what I would expect. Is this kind of pointer casting undefined behaviour or is it a compiler bug?
I use the WSL gcc compiler.
Edit:
Thanks for the quick response. Is there a way to write an asm function to get the desired output? I know I could use memcpy() instead but I need to use this in a specific problem I can't explain easily, so I'd rather not.
Speaking to the C program presented:
The second output is what I would expect. Is this kind of pointer casting undefined behaviour or is it a compiler bug?
The casting has well-defined behavior, but reading out the data via the resulting int * is a violation of the strict-aliasing rule (C17 paragraph 6.5/7). That produces undefined behavior. On the other hand, it is allowed to read the representation of any object via a char *, so that variation is ok (at least in C).
Observable behavior changing at different optimization levels is one of the common symptoms of UB, and especially of UB arising from violation of the strict-aliasing rule.
As far as I am aware, the same applies to C++: you are allowed to use reinterpret_cast to convert unsigned long long * to int *, but UB results from attempting to dereference the resulting pointer.
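As an illustration (my own sketch, not the asker's exact use case), the 32-bit values can be read out without violating strict aliasing by copying the object representation with memcpy; on a little-endian machine this prints the same 97 3 9 as the unoptimized run, and compilers typically optimize the copies away.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    unsigned long long* arr = (unsigned long long*)malloc(16);
    if (!arr)
        return 1;
    arr[0] = 0x300000061;
    arr[1] = 9;

    /* Copy the bytes into an int instead of dereferencing an int*
       that aliases an unsigned long long object. */
    for (int i = 0; i < 3; ++i) {
        int value;
        memcpy(&value, (char*)arr + i * sizeof(int), sizeof(int));
        printf("%d\n", value);
    }

    free(arr);
    return 0;
}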

Feature, bug or UB? The expected infinite loop in my code is deleted when using gcc -O2

The code is the following:
#include <cstdint>
#include <iostream>
using u64 = std::uint64_t;
u64 *test() {
    u64 *a, *p;
    p = (u64 *)&a;
    a = (u64 *)&p;
    {
        for (int i = 0; i < 100; ++i) {
            p = new u64((u64)p);
        }
    }
    while (true) {
        if ((u64)p == 0) {
            break;
        }
        p = (u64 *)*p;
    }
    return p;
}
int main() {
    std::cout << test() << std::endl;
}
And the compiled asm of function test is the following:
test():
    xor eax, eax
    ret
You can see https://godbolt.org/z/8eTd8WMzG.
In fact, the result is as expected when the final statement is return a;, although the compiler issues a warning about returning a local address. And if I make a and p global variables, everything is ok, see https://godbolt.org/z/n7YWzGvd5.
So I think I may be hitting some UB, which is why the behavior does not match my expectation?
The statements p = (u64 *)&a; and a = (u64 *)&p;, followed by the assignments and the dereferencing of the variables, break the strict aliasing rule, resulting in undefined behaviour. Indeed, p and a are of type u64* while &a and &p are of type u64**. Moreover, p = (u64 *)*p; is a perfect example of a statement breaking the strict aliasing rule: u64**, u64* and u64 are three distinct types.
If you want to solve this, you first need to check that the size and alignment of the types match (they should on a mainstream 64-bit architecture). Then you should use std::bit_cast or memcpy to perform the conversions (see this related post).
Moreover, note that infinite loops without side effects are undefined behaviour too, and in your case p can never be null. A compiler that detects that a loop is infinite and has no side effects can simply remove it (or generate wrong code).
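A minimal sketch (my own, following the std::bit_cast suggestion above) of round-tripping pointer values through u64 objects without aliasing violations, assuming a typical 64-bit platform where a pointer's object representation is its address:

#include <cstdint>
#include <bit>
#include <iostream>

using u64 = std::uint64_t;

int main() {
    static_assert(sizeof(u64) == sizeof(u64 *), "pointer must fit in u64");

    // Build a chain: each node stores the previous node's address as a u64 value.
    u64 *p = nullptr;
    for (int i = 0; i < 100; ++i) {
        p = new u64(std::bit_cast<u64>(p));    // store the old pointer's bits in a u64 object
    }

    // Walk the chain back, converting each stored u64 back into a pointer.
    while (p != nullptr) {
        u64 *next = std::bit_cast<u64 *>(*p);  // read a u64, reinterpret its bits as a pointer
        delete p;
        p = next;
    }

    std::cout << p << std::endl;               // prints 0 once the chain is exhausted
}

Here every dereference reads a genuine u64 object, and the loop both terminates and has side effects, so neither source of UB from the question remains.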

Undefined behaviour observed in C++/memory allocation

#include <iostream>
using namespace std;
int main()
{
    int a=50;
    int b=50;
    int *ptr = &b;
    ptr++;
    *ptr = 40;
    cout<<"a= "<<a<<" b= "<<b<<endl;
    cout<<"address a "<<&a<<" address b= "<<&b<<endl;
    return 0;
}
The above code prints :
a= 50 b= 50
address a 0x7ffdd7b1b710 address b= 0x7ffdd7b1b714
Whereas when I remove the following line from the above code
cout<<"address a "<<&a<<" address b= "<<&b<<endl;
I get output as
a= 40 b= 50
My understanding was that the stack grows downwards, so the second answer seems to be the correct one. I am not able to understand why the print statement would mess up the memory layout.
EDIT:
I forgot to mention, I am using a 64-bit x86 machine, with Ubuntu 14.04 as the OS and gcc version 4.8.4.
First of all, it's all undefined behavior. The C++ standard says that you can increment pointers only as long as you stay within array boundaries (plus one element past the end), with some more exceptions for standard-layout classes, but that's about it. So, in general, snooping around with pointers is uncharted territory.
Coming to your actual code: since you never ask for the address of a, the compiler probably either just left a in a register, or even propagated it as a constant throughout the code. For this reason, a never touches the stack, and you cannot corrupt it through the pointer.
Notice anyhow that the compiler isn't required to push/pop variables on the stack in the order of their declaration: it reorders them however it sees fit, and they can even move within the stack frame (or be replaced) throughout the function. A seemingly small change in the function may make the compiler alter the stack layout completely. So even comparing the addresses as you did says nothing about the direction of stack growth.
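As an aside, a short sketch (my own illustration, not from the question) of the only situation in which the ptr++ pattern is well defined: when both elements belong to the same array object.

#include <iostream>
using namespace std;

int main()
{
    int arr[2] = {50, 50};   // both values live in one array object
    int *ptr = &arr[0];
    ptr++;                   // still inside the array: well defined
    *ptr = 40;               // modifies arr[1]
    cout << "arr[0]= " << arr[0] << " arr[1]= " << arr[1] << endl;
    return 0;
}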
UB - You have taken a pointer to b and moved it with ptr++, which means you are now pointing at some unknown, unassigned memory, and you then try to write to that memory region. That causes undefined behavior.
On VS 2008, debugging it step by step will throw this message for you, which is very self-explanatory:

issue related to const and pointers

I have written two programs. Please go through both programs and help me understand why variable 'i' and '*ptr' give different values.
//Program I:
//Assumption: Address of i = 100, address of ptr = 500
int i = 5;
int *ptr = (int *) &i;
*ptr = 99;
cout<<i; // 99
cout<<&i;// 100
cout<<ptr; // 100
cout<<*ptr; // 99
cout<<&ptr; // 500
//END_Program_I===============
//Program II:
//Assumption: Address of i = 100, address of ptr = 500
const int i = 5;
int *ptr = (int *) &i;
*ptr = 99;
cout<<i; // 5
cout<<&i;// 100
cout<<ptr; // 100
cout<<*ptr; // 99
cout<<&ptr; // 500
//END_PROGRAM_II===============
The confusion is: why is variable i still coming out as 5, even though *ptr == 99?
In the following three lines, you are modifying a constant:
const int i = 5;
int *ptr = (int *) &i;
*ptr = 99;
This is undefined behavior. Anything can happen. So don't do it.
As for what's happening underneath in this particular case:
Since i is const, the compiler assumes it will not change. Therefore, it simply inlines the 5 to each place where it is used. That's why printing out i shows the original value of 5.
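For contrast, here is a small sketch (my own illustration, not part of the original answer) of the well-defined variant: when i is not const, writing through the pointer is legal and the change is visible.

#include <iostream>
using namespace std;

int main()
{
    int i = 5;                 // not const, so modifying it through a pointer is fine
    int *ptr = &i;
    *ptr = 99;
    cout << i << endl;         // well defined: prints 99

    const int c = 5;
    // int *bad = (int *) &c;  // casting away const and writing *bad = 99 would be
    //                         // undefined behaviour; the compiler may still print 5
    //                         // for c because it folds the constant at compile time
    cout << c << endl;         // prints 5
    return 0;
}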
All answers will probably talk about "undefined behavior", since you are attempting the logical nonsense of modifying a constant.
Although this is technically correct, let me give you some hints about why this happens (about "how", see Mysticial's answer).
It happens because C++ is by design an "imperfectly specified language". The "imperfection" consists of a number of "undefined behaviors" that pervade the language specification.
In fact, the language designers deliberately chose that, in some circumstances, instead of saying "if you do this, you will get that" (that may be: you get this code, or you get this error), they prefer to say "we don't define what will happen".
This leaves compiler manufacturers free to decide what to do. And since there are many compilers working on many platforms, the optimal solution for one is not necessarily the optimal solution for another (which may rely on a machine with a different instruction set). Hence you (as a programmer) are left in the awkward situation that you will never know what to expect, and even if you test it, you cannot trust the result of the test, since in another situation (compiling the same code with a different compiler, or just a different version of it, or for a different platform) it will be different.
The "bad" thing here is that a compiler should warn when undefined behavior is hit (casting away const should be warned about as a potential bug, especially if the compiler does const-inlining optimizations, since it is nonsense if a const is allowed to be changed), and most likely it does, if you specify the proper flag (maybe -W4 or -Wall or -pedantic or similar, depending on the compiler you have).
In particular the line
int *ptr = (int *) &i;
should issue a warning like:
warning: removing cv-qualifier from &i.
So that, if you correct your program as
const int *ptr = (const int *) &i;
to satisfy the warning, you will get an error at
*ptr = 99;
as
error: *ptr is const
thus making the problem evident.
Moral of the story:
From a legal point of view, you wrote bad code, since it relies, by language definition, on undefined behavior.
From a moral point of view, the compiler behaved unfairly: performing const-inlining (replacing cout << i with cout << 5) after accepting (int*)&i is a self-contradiction, and incoherent behavior should at least be warned about.
If it wants to do one thing, it must not accept the other, and vice versa.
So check if there is a flag you can set to be warned, and if not, report the unfairness to the compiler manufacturer: it didn't warn about its own contradiction.
const int i = 5;
This implies that the variable i is const and cannot/should not be changed; it is immutable, and changing it through a pointer results in undefined behavior.
Undefined behavior means that the standard places no requirements on what the program does, so any behavior is possible. Your program might seem to work as desired, or not, or it might even crash. All bets are off.
Remember the Rule:
It is undefined behavior to modify a const variable. Don't ever do it.
You're attempting to modify a constant through a pointer, which is undefined. This means anything unexpected can happen, from the correct output, to the wrong output, to the program crashing.

Different behavior of shift operator with -O2 and without

Without -O2 this code prints 84 84; with the -O2 flag the output is 84 42. The code was compiled using gcc 4.4.3 on a 64-bit Linux platform. Why is the output of the following code different?
Note that when compiled with -Os the output is 0 42
#include <iostream>
using namespace std;
int main() {
    long long n = 42;
    int *p = (int *)&n;
    *p <<= 1;
    cout << *p << " " << n << endl;
    return 0;
}
When you use optimization with gcc, it can make certain assumptions based on the types of expressions, allowing it to avoid repeating unnecessary reads and to keep variables cached rather than re-reading them from memory.
Your code has undefined behaviour because you cast a pointer to a long long (which gcc allows as an extension) to a pointer to an int and then manipulate the pointed-to object as if it were an int. A pointer-to-int cannot normally point to an object of type long long, so gcc is allowed to assume that an operation that writes to an int (via a pointer) won't affect an object that has type long long.
It is therefore legitimate for it to cache the value of n between the time it was originally assigned and the time at which it is subsequently printed. No valid write operation could have changed its value.
The particular switch and documentation to read is -fstrict-aliasing.
You're breaking strict aliasing. Compiling with -Wall should give you a dereferencing type-punned pointer warning. See e.g. http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
I get the same results with GCC 4.4.4 on Linux/i386.
The program's behavior is undefined, since it violates the strict aliasing rule.
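For completeness, a sketch (my own, assuming a little-endian target where the low half of n occupies the first four bytes) of doing the same 32-bit shift without violating strict aliasing, via memcpy:

#include <cstring>
#include <iostream>
using namespace std;

int main() {
    long long n = 42;

    // Copy the low 32 bits out, shift them, and copy them back.
    // memcpy reads/writes the object representation, so no aliasing rule is broken.
    int low;
    memcpy(&low, &n, sizeof(low));      // assumes little-endian layout
    low <<= 1;
    memcpy(&n, &low, sizeof(low));

    cout << low << " " << n << endl;    // prints 84 84
    return 0;
}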