Reference to a temporary variable - why doesn't compiler detect it? - c++

I hope this is not a duplicate. I've read a number of related questions, but none seemed to cover this case:
#include <iostream>
int* return_dangling_p()
{
int x = 1;
return &x; // warning: address of local variable 'x' returned
}
void some_func()
{
int x = 2;
}
int main(int argc, char** argv)
{
// UB 1
int* p = return_dangling_p();
std::cout << *p; // 1
some_func();
std::cout << *p; // 2, some_func() wrote over the memory
// UB 2
if (true) {
int x = 3;
p = &x; // why does compiler not warn about this?
}
std::cout << *p; // 3
if (true) {
int x = 4;
}
std::cout << *p; // 3, why not 4?
return 0;
}
I thought these are two cases of the same undefined behaviour. The output is 1233 while I (naively?) expected 1234.
So my question is: why doesn't compiler complain in the second case and why the stack isn't rewritten like in the case of 12? Am I missing something?
(MinGW 4.5.2, -Wall -Wextra -pedantic)
EDIT: I'm aware that it's pointless to discuss outputs of UB. My main concern was if there's any deeper reason to why one is detected by the compiler and the other isn't.

I'm not sure why the compiler doesn't complain. I suppose it's not a very common use-case, so the compiler authors didn't think to add a warning for it.
You can't infer anything useful about behaviour you observe when you are invoking undefined behaviour. The final output could have been 3, it could have been 4, or it could have been something else.
[If you want an explanation, I suggest looking at the assembly that the compiler produced. If I had to guess, I'd say that the compiler optimised the final if (true) { ... } away entirely.]

why doesn't compiler complain in the second case
I am not sure. I suppose it could.
why the memory isn't rewritten like in the case of 12
It's undefined behaviour. Anything can happen.
Read on if you're really curious...
When I compile your code as-is, my compiler (g++ 4.4.3) places the two x variables in UB 2 at different locations on the stack (I've verified this by looking at the disassembly). Therefore they don't clash, and your code also prints out 1233 here.
However, the moment I take the address of the second x, the compiler suddenly decides to place it at the same address as the first x, so the output changes to 1234.
if (true) {
int x = 4; // 3, why not 4?
&x;
}
Now, this is what happens when I compile without any optimization options. I haven't experimented with optimizations (in your version of the code, there's no reason why int x = 4 can't be optimized away completely).
The wonders of undefined behaviour...
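For comparison, here is a minimal sketch of well-defined alternatives to the two dangling-pointer cases in the question: return the value itself rather than the address of a local, and copy the value out of the inner block before the block ends. The function names are made up for illustration and are not from the question.

```cpp
#include <cassert>

// Hypothetical helper: returns the value itself, so nothing refers to
// the local variable after the function has returned.
int return_by_value()
{
    int x = 1;
    return x;
}

// Hypothetical helper: copies the value while the block-local x is still
// alive, instead of keeping a pointer to it past the closing brace.
int copy_out_of_block()
{
    int kept = 0;
    {
        int x = 3;
        kept = x;
    }
    return kept;
}

// Usage sketch: both results are guaranteed, unlike the 1233 output.
void check_alternatives()
{
    assert(return_by_value() == 1);
    assert(copy_out_of_block() == 3);
}
```

Neither function leaves a pointer to an object whose lifetime has ended, so the result no longer depends on how the compiler lays out the stack.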


Why is it that my second snippet below shows undefined behavior?

Both clang and g++ seem to be compliant with the last version of the paragraph [expr.const]/5 in the C++ Standard. The following snippet prints 11 for both compilers. See live example:
#include <iostream>
#include <utility> // std::move
void f(void) {
static int n = 11;
static int* temp = &n;
static constexpr int *&&r = std::move(temp);
std::cout << *r << '\n';
}
int main()
{
f();
}
According to my understanding of this paragraph, both compilers should print 2016 for the code below, but they don't. Therefore I must conclude that the code shows undefined behavior, as clang prints an arbitrary number and g++ prints 0. I'd like to know why it is UB, taking into consideration, for example, draft N4527 of the Standard. Live example.
#include <iostream>
#include <utility> // std::move
void f(void) {
static int n = 11;
static int m = 2016;
static int* temp = &n + 1;
static constexpr int *&&r = std::move(temp);
std::cout << *r << '\n';
}
int main()
{
f();
}
Edit
I have a habit of not being satisfied with an answer that just says the code is UB, or shows undefined behavior. I always like to investigate a little deeper, and sometimes, as now, I happen to be lucky enough to understand a little bit more, how compilers are built. And that's what I found out in this case:
Both clang and GCC seem to eliminate any unused variable, like m, from the code, for any optimization level greater than -O0. GCC seems to order local variables with static storage duration, the same way variables are placed on the stack, i.e., from higher to lower addresses.
Thus, in clang, if we change the optimization level to -O0 we get the number 2016 printed as expected.
In GCC, if in addition to that, we also change the definition of
static int* temp = &n + 1;
to
static int* temp = &n - 1;
we will also get the number 2016 printed by the code.
I don't think there's anything subtle here. For the purposes of pointer arithmetic, the object n can be treated as an array of one element, so &n + 1 is a one-past-the-end pointer. It is a perfectly valid pointer value, but it is not dereferenceable. Thus temp and r are perfectly fine constexpr variables.
You could use r like this:
for (int * p = &n; p != r; ++p) { /* ... */ }
This loop could even appear in a constexpr function.
The behaviour is of course undefined when you attempt to dereference r, but that has nothing to do with constant expressions.
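As a rough illustration of that point, the sketch below uses a one-past-the-end pointer only as a loop sentinel inside a constexpr function. The names sum_range and sum_single_object are hypothetical, and the loop inside a constexpr function assumes C++14 or later.

```cpp
#include <cassert>

// Hypothetical helper: sums the elements in [first, last).
// 'last' is only compared against, never dereferenced.
constexpr int sum_range(const int* first, const int* last)
{
    int total = 0;
    for (const int* p = first; p != last; ++p) {
        total += *p;
    }
    return total;
}

int sum_single_object()
{
    static int n = 11;
    const int* end = &n + 1; // valid pointer value, used purely as a sentinel
    return sum_range(&n, end);
}

// Usage sketch: only n itself is ever read.
void check_sentinel()
{
    assert(sum_single_object() == 11);
}
```

The loop stops as soon as p equals the sentinel, so the one-past-the-end pointer is never dereferenced and no undefined behaviour is involved.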
You've apparently expected that you can:
obtain a pointer to a static storage duration object
add one to it
get a pointer to the "next" static storage duration object (in declaration order)
This is nonsense.
You'd have to eschew all standard-backed guarantees, relying only on an unholy combination of UB and implementation documentation. Clearly you have crossed the UB threshold long before we ever even entertain discussions about constexpr and std::move, so I'm not sure what relevance they were intended to hold in this question.
Pointers are not "memory addresses" that you can use to navigate your declaration space.
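If the intent really is to step from one int to the next, a sketch of the well-defined way is to put the values in a single array, whose elements are guaranteed to be contiguous. The array name and function name below are made up for illustration.

```cpp
#include <cassert>

// Elements of one array are contiguous, so stepping a pointer from one
// element to the next is well defined (unlike separate variables n and m).
int second_of_pair()
{
    static int values[2] = {11, 2016};
    int* p = &values[0];
    ++p; // now points at values[1], still inside the same array
    return *p;
}

// Usage sketch.
void check_pair()
{
    assert(second_of_pair() == 2016);
}
```

Here the result does not depend on optimization level or on how the compiler orders static objects in memory.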

Conversion from const int to int giving strange results. Can anyone explain the reason for the strange results?

When I tried the code below I got strange results. I am trying to change the value of a constant through a pointer. But when I print the value through the pointer and the value of the original variable, I get two different values. Can anyone explain what exactly happens when the explicit conversion takes place?
int main()
{
int *p ;
const int a = 20;
p=(int *)&a;
*p = *p +10;
cout<<"p is"<<*p<<"\na is"<<a;
}
output:
p is 30
a is 20
Both C and C++ say that any attempt to modify an object declared with the const qualifier results in undefined behavior.
So, as a is a const-qualified object, the *p = *p +10; statement invokes undefined behavior.
First off: you really shouldn't be doing this. const means constant, meaning don't change it! :)
Now to explain what happens (I think):
The space on the stack is allocated for both variables, p and a. This is done for a because it has been referenced by an address. If you removed p, you'd effectively remove a as well.
The number 20 is indeed written to the a variable, and modified to 30 via p, which is what is being printed.
The 20 printed is calculated at compile time. Since a is const, the compiler optimized the read away and replaced it with 20, as if you had written #define a 20.
Don't Do That.
If you would write this code in C++ with an explicit cast, you would get something like this:
int main()
{
int *p ;
const int a = 20;
p= const_cast<int*>(&a); // the change
*p = *p +10;
cout<<"p is"<<*p<<"\na is"<<a;
}
Now, this code tells a bit more about what's going on: the constant is cast to a non-constant.
If you are writing a compiler, constants are special variables that are allowed to be 'folded' in the const folding phase. Basically this means that the compiler is allowed to change your code into this:
int main()
{
int *p ;
const int a = 20;
p= const_cast<int*>(&a);
*p = *p +10;
cout<<"p is"<<*p<<"\na is" << 20; // const fold
}
Because you're also using &a, you tell the compiler to put the value 20 in a memory location. Combined with the above, you get the exact results you describe.
This is undefined behavior.
A compiler can assume that nothing is going to change the value of a const object. The compiler knows that the value of "a" is 20. You told the compiler that. So, the compiler actually goes ahead and simply compiles the equivalent of
cout << "p is" << *p << "\na is" << 20;
Your compiler should've also given you a big fat warning, about "casting away const-ness", or something along the same lines, when it tried to compile your code.
Although it is undefined behaviour (as everyone else tells you), it could be that your compiler has allocated a storage location (an int) for the const int, which is why *p = *p + 10 appears to work, but has replaced a in the output statement with the value 20, as it is supposed to be constant.
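For contrast, a minimal sketch of the one situation where writing through a const_cast is well defined: the object being modified was not declared const in the first place. The function names add_ten and modify_non_const_object are hypothetical.

```cpp
#include <cassert>

// Hypothetical legacy-style function: takes a const pointer but writes
// through it. This is only well defined if the pointed-to object was
// NOT declared const.
void add_ten(const int* value)
{
    *const_cast<int*>(value) += 10;
}

int modify_non_const_object()
{
    int a = 20; // not const, so modifying it through the cast is allowed
    add_ten(&a);
    return a;
}

// Usage sketch.
void check_const_cast()
{
    assert(modify_non_const_object() == 30);
}
```

Because a is an ordinary int here, the compiler may not assume its value stays 20, so both the pointer and the variable show 30. Declaring a as const int would bring back the undefined behaviour discussed above.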

Function of type int not using return C++

If I have a function like this:
int addNumbers(int x, int y)
{
return x + y;
}
and if I use it as such:
cout << addNumbers(4, 5) << endl;
It will return and print 9. Using the same cout line above, if I comment out or delete the return in addNumbers, it will return and print 1. If I do this:
int addNumbers(int x, int y)
{
int answer = x + y;
//return x + y;
}
It will automatically return and print 9, without me using return. Similarly, I can write int answer = x; and it will return 4. I can also write this:
int addNumbers(int x, int y)
{
int answer = x;
answer = 1;
//return x + y;
}
and it will still return 4.
What exactly is returned and why? It only returns something other than 1 when I use the parameter variables, but it isn't returning the variable answer as shown in the last example because I changed it to 1 and it still returned the value of x (4).
§6.6.3 [stmt.return]/p2:
Flowing off the end of a function is equivalent to a return with no
value; this results in undefined behavior in a value-returning
function.
(main() is a special exception. Flowing off the end of main() is equivalent to a return 0;)
Permissible UB include:
Returning what you "wanted" to return
Returning a garbage value instead
Crashing
Sending your password to hackers
Formatting your hard drive
Making your computer explode and blow your legs off
Conjuring nasal demons
Traveling back in time and fixing your program to do the right thing
Creating a black hole
......
But seriously, UB can manifest in all sorts of ways. For instance, given this code:
#include <iostream>
bool foo = false;
int addNumbers(int x, int y)
{
int answer = x;
answer = 1;
//return x + y;
}
int main(){
if(!foo) {
addNumbers(10, 20);
std::cout << 1 << std::endl;
}
else {
std::cout << 2 << std::endl;
}
}
clang++ at -O2 prints 2.
Why? Because it deduced that addNumbers(10, 20); has undefined behavior, which allows it to assume that the first branch is never taken and that foo is always true, even though that's obviously not the case.
You are relying on "undefined behaviour". The return value is, for simple types, typically stored in a register, which may also be used in the formation of the result of the calculation. But it may also NOT be used, and you get some arbitrary "random" result, and being "undefined behaviour", you may also get any other possible operation that your computer may perform - such as crashing or executing some code you didn't want to execute...
You are observing undefined behavior. There is no good reason "why" the program does that, because it isn't a well-formed program. It could do anything, including delete itself from disk when run. Enable compiler warnings and errors (e.g. g++ -Wall -Wextra -Werror) and you will be automatically prevented from writing such code (as you should be).
Since this is undefined behavior, disassembling your binary may explain why such values are returned.
objdump -d example.bin
On x86-64, an integer return value is passed back in the rax register, so if the compiler happens to use rax while evaluating the function body, the "returned" value is whatever remains in rax.
Anyway, you shouldn't rely on this, because compiler optimizations and register usage aren't known when you write such code.
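The fix itself is short. As a sketch, the version below ends every path through the function in a return statement; GCC and Clang can also report the missing return through the -Wreturn-type warning, which -Werror=return-type turns into a hard error.

```cpp
#include <cassert>

// Every path through a value-returning function must end in a return.
int addNumbers(int x, int y)
{
    int answer = x + y;
    return answer;
}

// Usage sketch: the result no longer depends on leftover register contents.
void check_add()
{
    assert(addNumbers(4, 5) == 9);
}
```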

Why strange behavior with casting back pointer to the original class?

Assume that in my code I have to store a void* as a data member and typecast it back to the original class pointer when needed. To test its reliability, I wrote a test program (linux ubuntu 4.4.1 g++ -O4 -Wall) and I was shocked to see the behavior.
struct A
{
int i;
static int c;
A () : i(c++) { cout<<"A() : i("<<i<<")\n"; }
};
int A::c;
int main ()
{
void *p = new A[3]; // good behavior for A* p = new A[3];
cout<<"p->i = "<<((A*)p)->i<<endl;
((A*&)p)++;
cout<<"p->i = "<<((A*)p)->i<<endl;
((A*&)p)++;
cout<<"p->i = "<<((A*)p)->i<<endl;
}
This is just a test program; in my actual use case, it's mandatory to store any pointer as void* and then cast it back to the actual pointer type (with the help of a template). So let's not worry about that part. The output of the above code is:
p->i = 0
p->i = 0 // ?? why not 1
p->i = 1
However if you change the void* p; to A* p; it gives expected behavior. WHY ?
Another question: I cannot do without (A*&), because otherwise I cannot use operator ++; but it also gives the warning "dereferencing type-punned pointer will break strict-aliasing rules". Is there any decent way to get rid of the warning?
Well, as the compiler warns you, you are violating the strict aliasing rule, which formally means that the results are undefined.
You can eliminate the strict aliasing violation by using a function template for the increment:
template<typename T>
void advance_pointer_as(void*& p, int n = 1) {
T* p_a(static_cast<T*>(p));
p_a += n;
p = p_a;
}
With this function template, the following definition of main() yields the expected results on the Ideone compiler (and emits no warnings):
int main()
{
void* p = new A[3];
std::cout << "p->i = " << static_cast<A*>(p)->i << std::endl;
advance_pointer_as<A>(p);
std::cout << "p->i = " << static_cast<A*>(p)->i << std::endl;
advance_pointer_as<A>(p);
std::cout << "p->i = " << static_cast<A*>(p)->i << std::endl;
}
You have already received the correct answer and it is indeed the violation of the strict aliasing rule that leads to the unpredictable behavior of the code. I'd just note that the title of your question makes reference to "casting back pointer to the original class". In reality your code does not have anything to do with casting anything "back". Your code performs reinterpretation of raw memory content occupied by a void * pointer as an A * pointer. This is not "casting back". This is reinterpretation. Not even remotely the same thing.
A good way to illustrate the difference would be to use an int and float example. A float value declared and initialized as
float f = 2.0;
can be cast (explicitly or implicitly converted) to int type
int i = (int) f;
with the expected result
assert(i == 2);
This is indeed a cast (a conversion).
Alternatively, the same float value can be also reinterpreted as an int value
int i = (int &) f;
However, in this case the value of i will be totally meaningless and generally unpredictable. I hope it is easy to see the difference between a conversion and a memory reinterpretation from these examples.
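To make the difference concrete without relying on the (int &) cast, the sketch below swaps in std::memcpy, which is a well-defined way to look at the bytes of a float. It assumes a 32-bit float, and the specific bit pattern mentioned in the comment holds only on IEEE-754 platforms. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Conversion: produces the int with the same numeric value (truncated).
int converted(float f)
{
    return static_cast<int>(f);
}

// Reinterpretation: copies the object representation of the float into
// an integer of the same size. Assumes float is 32 bits wide.
std::uint32_t bits_of(float f)
{
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Usage sketch.
void check_conversion_vs_bits()
{
    assert(converted(2.0f) == 2);
    if (std::numeric_limits<float>::is_iec559) {
        // On IEEE-754, 2.0f is stored as 0x40000000, nowhere near 2.
        assert(bits_of(2.0f) == 0x40000000u);
    }
}
```

The conversion yields 2, while the reinterpreted bits are an unrelated number, which is the same gap that separates (A *) p from (A *&) p.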
Reinterpretation is exactly what you are doing in your code. The (A *&) p expression is nothing else than a reinterpretation of raw memory occupied by pointer void *p as pointer of type A *. The language does not guarantee that these two pointer types have the same representation and even the same size. So, expecting the predictable behavior from your code is like expecting the above (int &) f expression to evaluate to 2.
The proper way to really "cast back" your void * pointer would be to do (A *) p, not (A *&) p. The result of (A *) p would indeed be the original pointer value, that can be safely manipulated by pointer arithmetic. The only proper way to obtain the original value as an lvalue would be to use an additional variable
A *pa = (A *) p;
...
pa++;
...
And there's no legal way to create an lvalue "in place", as you attempted to by your (A *&) p cast. The behavior of your code is an illustration of that.
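Putting the "additional variable" advice together, here is a sketch of a holder that must keep its pointer as void* but still does all arithmetic on a properly typed pointer. Holder, Item, as and advance are hypothetical names, not part of the question's code.

```cpp
#include <cassert>

struct Item { int i; };

// Hypothetical holder that is required to store its pointer as void*.
struct Holder
{
    void* p;

    // Cast back to the original type. This converts the pointer value;
    // it does not reinterpret the void* object itself.
    template <typename T>
    T* as() const { return static_cast<T*>(p); }

    // Advance by n elements of T: cast back into a real T* variable,
    // do the arithmetic there, then store the result as void* again.
    template <typename T>
    void advance(int n = 1)
    {
        T* typed = static_cast<T*>(p);
        typed += n;
        p = typed;
    }
};

int third_item_index()
{
    Item items[3] = {{0}, {1}, {2}};
    Holder h{items};
    h.advance<Item>();
    h.advance<Item>();
    return h.as<Item>()->i;
}

// Usage sketch.
void check_holder()
{
    assert(third_item_index() == 2);
}
```

No (A *&) style cast appears anywhere, so there is nothing for the strict-aliasing warning to complain about.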
As others have commented, your code appears like it should work. Only once (in 17+ years of coding in C++) I ran across something where I was looking straight at the code and the behavior, like in your case, just didn't make sense. I ended up running the code through debugger and opening a disassembly window. I found what could only be explained as a bug in VS2003 compiler because it was missing exactly one instruction. Simply rearranging local variables at the top of the function (30 lines or so from the error) made the compiler put the correct instruction back in. So try debugger with disassembly and follow memory/registers to see what it's actually doing?
As far as advancing the pointer, you should be able to advance it by doing:
p = (char*)p + sizeof( A );
VS2003 through VS2010 never give you complaints about that, not sure about g++

Contiguous memory guarantees with C++ function parameters

Appel [App02] very briefly mentions that C (and presumably C++) provides guarantees regarding the locations of actual parameters in contiguous memory, as opposed to registers, when the address-of operator is applied to one of the formal parameters within the function block.
e.g.
void foo(int a, int b, int c, int d)
{
int* p = &a;
for(int k = 0; k < 4; k++)
{
std::cout << *p << " ";
p++;
}
std::cout << std::endl;
}
and an invocation such as...
foo(1,2,3,4);
will produce the following output "1 2 3 4"
My question is "How does this interact with calling conventions?"
For example, __fastcall on GCC will try to place the first two arguments in registers and the remainder on the stack. The two requirements are at odds with each other. Is there any way to formally reason about what will happen, or is it subject to the capricious nature of implementation-defined behaviour?
[App02] Modern Compiler Implementation in Java, Andrew w. Appel, Chapter 6, Page 124
Update: I suppose that this question is answered. I think I was wrong to base the whole question on contiguous memory allocation, when what I was looking for (and what the reference speaks of) is the apparent mismatch between the need for parameters to be in memory due to the use of address-of, as opposed to in registers due to calling conventions. Maybe that is a question for another day.
Someone on the internet is wrong and sometimes that someone is me.
First of all your code doesn't always produce 1, 2, 3, 4. Just check this one: http://ideone.com/ohtt0
Correct code is at least like this:
void foo(int a, int b, int c, int d)
{
int* p = &a;
for (int i = 0; i < 4; i++)
{
std::cout << *p;
p++;
}
}
So now let's try with fastcall, here:
void __attribute__((fastcall)) foo(int a, int b, int c, int d)
{
int* p = &a;
for (int i = 0; i < 4; i++)
{
std::cout << *p << " ";
p++;
}
}
int main()
{
foo(1,2,3,4);
}
Result is messy: 1 -1216913420 134514560 134514524
So I really doubt that something can be guaranteed here.
There is nothing in the standard about calling conventions or how parameters are passed.
It is true that if you take the address of one variable (or parameter) that one has to be stored in memory. It doesn't say that the value cannot be passed in a register and then stored to memory when its address is taken.
It definitely doesn't affect other variables, whose addresses are not taken.
The C++ standard has no concept of a calling convention. That's left for the compiler to deal with.
In this case, if the standard requires that parameters be contiguous when the address-of operator is applied, there's a conflict between what the standard requires of your compiler and what you require of it.
It's up to the compiler to decide what to do. I'd think most compilers would give your requirements priority over the standard's, however.
Your basic assumption is flawed. On my machine, foo(1,2,3,4) using your code prints out:
1 -680135568 32767 4196336
Using g++ (Ubuntu 4.4.3-4ubuntu5) 4.4.3 on 64-bit x86.
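If contiguous storage is what the function needs, a sketch of a portable alternative is to pass the values in one container instead of walking from &a across separate parameters. The function name join_values is hypothetical, and std::array is simply one container with contiguous elements.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical rewrite of foo: the four values arrive in one std::array,
// whose elements are contiguous regardless of the calling convention.
std::string join_values(const std::array<int, 4>& values)
{
    std::ostringstream out;
    for (std::size_t k = 0; k < values.size(); ++k) {
        if (k != 0) {
            out << ' ';
        }
        out << values[k];
    }
    return out.str();
}

// Usage sketch.
void check_join()
{
    const std::array<int, 4> values{{1, 2, 3, 4}};
    assert(join_values(values) == "1 2 3 4");
}
```

This produces the same text on every compiler and calling convention, because it never relies on where individual parameters happen to live.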