Why is it that my second snippet below shows undefined behavior? - c++

Both clang and g++ seem to be compliant with the last version of the paragraph [expr.const]/5 in the C++ Standard. The following snippet prints 11 for both compilers. See live example:
#include <iostream>
void f(void) {
static int n = 11;
static int* temp = &n;
static constexpr int *&&r = std::move(temp);
std::cout << *r << '\n';
}
int main()
{
f();
}
According to my understanding of this paragraph both compilers should print 2016 for the code below. But they don't. Therefore, I must conclude that the code shows undefined behavior, as clang prints an arbitrary number and g++ prints 0. I'd like to know why is it UB, taking into consideration, for example, the draft N4527 of the Standard? Live example.
#include <iostream>
void f(void) {
static int n = 11;
static int m = 2016;
static int* temp = &n + 1;
static constexpr int *&&r = std::move(temp);
std::cout << *r << '\n';
}
int main()
{
f();
}
Edit
I have a habit of not being satisfied with an answer that just says the code is UB, or shows undefined behavior. I always like to investigate a little deeper, and sometimes, as now, I happen to be lucky enough to understand a little bit more, how compilers are built. And that's what I found out in this case:
Both clang and GCC seem to eliminate any unused variable, like m, from the code, for any optimization level greater than -O0. GCC seems to order local variables with static storage duration, the same way variables are placed on the stack, i.e., from higher to lower addresses.
Thus, in clang, if we change the optimization level to -O0 we get the number 2016 printed as expected.
In GCC, if in addition to that, we also change the definition of
static int* temp = &n + 1;
to
static int* temp = &n - 1;
we will also get the number 2016 printed by the code.

I don't think there's anything subtle here. &n + 1 points one-past-the-end of the array-of-one as which you may consider the location n, and so it does not constitute a dereferenceable pointer, although it is a perfectly valid pointer. Thus temp and r are perfectly fine constexpr variables.
You could use r like this:
for (int * p = &n; p != r; ++p) { /* ... */ }
This loop could even appear in a constexpr function.
The behaviour is of course undefined when you attempt to dereference r, but that has nothing to do with constant expressions.

You've apparently expected that you can:
obtain a pointer to a static storage duration object
add one to it
get a pointer to the "next" static storage duration object (in declaration order)
This is nonsense.
You'd have to eschew all standard-backed guarantees, relying only on an unholy combination of UB and implementation documentation. Clearly you have crossed the UB threshold long before we ever even entertain discussions about constexpr and std::move, so I'm not sure what relevance they were intended to hold in this question.
Pointers are not "memory addresses" that you can use to navigate your declaration space.

Related

Is the address of a local variable a constexpr?

In Bjarne Stroustrup's book "The C++ Programming Language (4th Edition)" on p. 267 (Section 10.4.5 Address Constant Expressions), he uses a code example where the address of a local variable is set to a constexpr variable. I thought this looked odd, so I tried running the example with g++ version 7.3.0 and was unable to get the same results. Here is his code example verbatim (although slightly abridged):
extern char glob;
void f(char loc) {
constexpr const char* p0 = &glob; // OK: &glob's is a constant
constexpr const char* p2 = &loc; // OK: &loc is constant in its scope
}
When I run this, I get:
error: ‘(const char*)(& loc)’ is not a constant expression
Is something happening with g++ that I'm not aware of, or is there something more to Bjarne's example?
An earlier printing of Bjarne Stroustrup's book "The C++ Programming Language (4th Edition)" on p. 267 has the error outlined in the OP's question. The current printing and electronic copies have been "corrected" but introduced another error described later. It now refers to the following code:
constexpr const char* p1="asdf";
This is OK because "asdf" is stored in a fixed memory location. In the earlier printing the book errs here:
void f(char loc) {
constexpr const char* p0 = &glob; // OK: &glob's is a constant
constexpr const char* p2 = &loc; // OK: &loc is constant in its scope
}
However, loc is not in a fixed memory location. it's on the stack and will have varying locations depending on when it is called.
However, the current 4th edition printing has another error. This is the code verbatim from 10.5.4:
int main() {
constexpr const char* p1 = "asdf";
constexpr const char* p2 = p1; // OK
constexpr const char* p3 = p1+2; // error: the compiler does not know the value of p1
}
This is wrong. The compiler/linker does know the value of p1 and can determine the value of p1+2 at link time. It compiles just fine.
It appears that the example from section 10.4.5 provided in my hard-copy of the "The C++ Programming Language (4th Edition)" is incorrect. And so I've concluded that the address of a local variable is not a constexpr.
The example appears to have been updated in some pdf versions as seen here:
This answer tries to clarify why the address of a local variable can't be constexpr by analysing an example for the x86-64 architecture.
Consider the following toy function print_addr(), which displays the address of its local variable local_var and call itself recursively n times:
void print_addr(int n) {
int local_var{};
std::cout << n << " " << &local_var << '\n';
if (!n)
return; // base case
print_addr(n-1); // recursive case
}
A call to print_addr(2) produced the following output on my x86-64 system:
2 0x7ffd89e2cd8c
1 0x7ffd89e2cd5c
0 0x7ffd89e2cd2c
As you can see, the corresponding addresses of local_var are different for each call to print_addr(). You can also see that the deeper the function call, the lower the address of the local variable local_var. This is because the stack grows downwards (i.e., from higher to lower addresses) on the x86-64 platform.
For the output above, the call stack would look like the following on the x86-64 platform:
| . . . |
Highest address ----------------- <-- call to print_addr(2)
| print_addr(2) |
-----------------
| print_addr(1) |
-----------------
| print_addr(0) | <-- base case, end of recursion
Lowest address ----------------- Top of the stack
Each rectangle above represents the stack frame for each call to print_addr(). The local_var of each call is located in its corresponding stack frame. Since the local_var of each call to print_addr() is located in its own (different) stack frame, the addresses of local_var differ.
To conclude, since the address of a local variable in a function may not be the same for every call to the function (i.e., each call's stack frame may be located in a different position in memory), the address of such a variable can't be determined at compile time, and therefore can't be qualified as constexpr.
Just to add to other answers that have pointed out the mistake, C++ standard only allows constexpr pointers to objects of static-storage duration, one past the end of such, or nullptr. See [expr.const/8] specifically #8.2;
It's worth noting that:
string-literals have static-storage duration:
Based on constraints in declaring extern variables, they'll inherently have static-storage duration or thread local-storage duration.
Hence this is valid:
#include <string>
extern char glob;
std::string boom = "Haha";
void f(char loc) {
constexpr const char* p1 = &glob;
constexpr std::string* p2 = nullptr;
constexpr std::string* p3 = &boom;
}

Conversion from const int to int giving strange results.Can anyone explain the reason for the strange results

When I tried below code I got strange results.I am trying to change value of constant by using the pointers.But when I output the results pointer value and the original variable variable value its giving two different values.Can anyone explain what exactly happens when explicit conversion take place?
int main()
{
int *p ;
const int a = 20;
p=(int *)&a;
*p = *p +10;
cout<<"p is"<<*p<<"\na is"<<a;
}
output:
p is 30
a is 20
Both C and C++ say that any attempt to modify an object declared with the const qualifier results in undefined behavior.
So as a is object is const qualified, the *p = *p +10; statement invokes undefined behavior.
First of - You really shouldn't be doing this. const is a constant, meaning don't change it! :)
Now to explain what happens (I think):
The space on the stack is allocated for both variables, p and a. This is done for a because it has been referenced by an address. If you removed p, you'd effectively remove a as well.
The number 20 is indeed written to the a variable, and modified to 30 via p, which is what is being printed.
The 20 printed is calculated at compile time. Since it is a const, the compiler optimized it away and replaced with 20, as if you did a #define a 20.
Don't Do That.
If you would write this code in C++ with an explicit cast, you would get something like this:
int main()
{
int *p ;
const int a = 20;
p= const_cast<int*>(&a); // the change
*p = *p +10;
cout<<"p is"<<*p<<"\na is"<<a;
}
Now, this code tells a bit more about what's going on: the constant is cast to a non-constant.
If you are writing a compiler, constants are special variables that are allowed to be 'folded' in the const folding phase. Basically this means that the compiler is allowed to change your code into this:
int main()
{
int *p ;
const int a = 20;
p= const_cast<int*>(&a);
*p = *p +10;
cout<<"p is"<<*p<<"\na is" << 20; // const fold
}
Because you're also using &a, you tell the compiler to put the value 20 in a memory location. Combined with the above, you get the exact results you describe.
This is undefined behavior.
A compiler can assume that nothing is going to change the value of a const object. The compiler knows that the value of "a" is 20. You told the compiler that. So, the compiler actually goes ahead and simply compiles the equivalent of
cout << "p is" << *p << "\na is" << 20;
Your compiler should've also given you a big fat warning, about "casting away const-ness", or something along the same lines, when it tried to compile your code.
Although it is defined as undefined behaviour (as everyone else tells you), it could be that your compiler has allocated a storage location (int) for the const int; that is why the *p= *p + 10 works, but may have repaced a in the output statement with the value 20, as it is supposed to be constant.

Reference to a temporary variable - why doesn't compiler detect it?

I hope this is not a duplicate, I've read a number of related questions but no one seemed to cover this case:
#include <iostream>
int* return_dangling_p()
{
int x = 1;
return &x; // warning: address of local variable 'x' returned
}
void some_func()
{
int x = 2;
}
int main(int argc, char** argv)
{
// UB 1
int* p = return_dangling_p();
std::cout << *p; // 1
some_func();
std::cout << *p; // 2, some_func() wrote over the memory
// UB 2
if (true) {
int x = 3;
p = &x; // why does compiler not warn about this?
}
std::cout << *p; // 3
if (true) {
int x = 4;
}
std::cout << *p; // 3, why not 4?
return 0;
}
I thought these are two cases of the same undefined behaviour. The output is 1233 while I (naively?) expected 1234.
So my question is: why doesn't compiler complain in the second case and why the stack isn't rewritten like in the case of 12? Am I missing something?
(MinGW 4.5.2, -Wall -Wextra -pedantic)
EDIT: I'm aware that it's pointless to discuss outputs of UB. My main concern was if there's any deeper reason to why one is detected by the compiler and the other isn't.
I'm not sure why the compiler doesn't complain. I suppose it's not a very common use-case, so the compiler authors didn't think to add a warning for it.
You can't infer anything useful about behaviour you observe when you are invoking undefined behaviour. The final output could have been 3, it could have been 4, or it could have been something else.
[If you want an explanation, I suggest look at the assembler that the compiler produced. If I had to guess, I'd say that the compiler optimised the final if (true) { ... } away entirely.]
why doesn't compiler complain in the second case
I am not sure. I suppose it could.
why the memory isn't rewritten like in the case of 12
It's undefined behaviour. Anything can happen.
Read on if you're really curious...
When I compile your code as-is, my compiler (g++ 4.4.3) places the two x variables in UB 2 at different locations on the stack (I've verified this by looking at the disassembly). Therefore they don't clash, and your code also prints out 1233 here.
However, the moment I take the address of the second x, the compiler suddenly decides to place it at the same address as the first x, so the output changes to 1234.
if (true) {
int x = 4; // 3, why not 4?
&x;
}
Now, this is what happens when I compile without any optimization options. I haven't experimented with optimizations (in your version of the code, there's no reason why int x = 4 can't be optimized away completely).
The wonders of undefined behaviour...

Why strange behavior with casting back pointer to the original class?

Assume that in my code I have to store a void* as data member and typecast it back to the original class pointer when needed. To test its reliability, I wrote a test program (linux ubuntu 4.4.1 g++ -04 -Wall) and I was shocked to see the behavior.
struct A
{
int i;
static int c;
A () : i(c++) { cout<<"A() : i("<<i<<")\n"; }
};
int A::c;
int main ()
{
void *p = new A[3]; // good behavior for A* p = new A[3];
cout<<"p->i = "<<((A*)p)->i<<endl;
((A*&)p)++;
cout<<"p->i = "<<((A*)p)->i<<endl;
((A*&)p)++;
cout<<"p->i = "<<((A*)p)->i<<endl;
}
This is just a test program; in actual for my case, it's mandatory to store any pointer as void* and then cast it back to the actual pointer (with help of template). So let's not worry about that part. The output of the above code is,
p->i = 0
p->i = 0 // ?? why not 1
p->i = 1
However if you change the void* p; to A* p; it gives expected behavior. WHY ?
Another question, I cannot get away with (A*&) otherwise I cannot use operator ++; but it also gives warning as, dereferencing type-punned pointer will break strict-aliasing rules. Is there any decent way to overcome warning ?
Well, as the compiler warns you, you are violating the strict aliasing rule, which formally means that the results are undefined.
You can eliminate the strict aliasing violation by using a function template for the increment:
template<typename T>
void advance_pointer_as(void*& p, int n = 1) {
T* p_a(static_cast<T*>(p));
p_a += n;
p = p_a;
}
With this function template, the following definition of main() yields the expected results on the Ideone compiler (and emits no warnings):
int main()
{
void* p = new A[3];
std::cout << "p->i = " << static_cast<A*>(p)->i << std::endl;
advance_pointer_as<A>(p);
std::cout << "p->i = " << static_cast<A*>(p)->i << std::endl;
advance_pointer_as<A>(p);
std::cout << "p->i = " << static_cast<A*>(p)->i << std::endl;
}
You have already received the correct answer and it is indeed the violation of the strict aliasing rule that leads to the unpredictable behavior of the code. I'd just note that the title of your question makes reference to "casting back pointer to the original class". In reality your code does not have anything to do with casting anything "back". Your code performs reinterpretation of raw memory content occupied by a void * pointer as a A * pointer. This is not "casting back". This is reinterpretation. Not even remotely the same thing.
A good way to illustrate the difference would be to use and int and float example. A float value declared and initialized as
float f = 2.0;
cab be cast (explicitly or implicitly converted) to int type
int i = (int) f;
with the expected result
assert(i == 2);
This is indeed a cast (a conversion).
Alternatively, the same float value can be also reinterpreted as an int value
int i = (int &) f;
However, in this case the value of i will be totally meaningless and generally unpredictable. I hope it is easy to see the difference between a conversion and a memory reinterpretation from these examples.
Reinterpretation is exactly what you are doing in your code. The (A *&) p expression is nothing else than a reinterpretation of raw memory occupied by pointer void *p as pointer of type A *. The language does not guarantee that these two pointer types have the same representation and even the same size. So, expecting the predictable behavior from your code is like expecting the above (int &) f expression to evaluate to 2.
The proper way to really "cast back" your void * pointer would be to do (A *) p, not (A *&) p. The result of (A *) p would indeed be the original pointer value, that can be safely manipulated by pointer arithmetic. The only proper way to obtain the original value as an lvalue would be to use an additional variable
A *pa = (A *) p;
...
pa++;
...
And there's no legal way to create an lvalue "in place", as you attempted to by your (A *&) p cast. The behavior of your code is an illustration of that.
As others have commented, your code appears like it should work. Only once (in 17+ years of coding in C++) I ran across something where I was looking straight at the code and the behavior, like in your case, just didn't make sense. I ended up running the code through debugger and opening a disassembly window. I found what could only be explained as a bug in VS2003 compiler because it was missing exactly one instruction. Simply rearranging local variables at the top of the function (30 lines or so from the error) made the compiler put the correct instruction back in. So try debugger with disassembly and follow memory/registers to see what it's actually doing?
As far as advancing the pointer, you should be able to advance it by doing:
p = (char*)p + sizeof( A );
VS2003 through VS2010 never give you complaints about that, not sure about g++

Contiguous memory guarantees with C++ function parameters

Appel [App02] very briefly mentions that C (and presumably C++) provide guarantees regarding the locations of actual parameters in contiguous memory as opposed to registers when the address-of operator is applied to one of formal parameters within the function block.
e.g.
void foo(int a, int b, int c, int d)
{
int* p = &a;
for(int k = 0; k < 4; k++)
{
std::cout << *p << " ";
p++;
}
std::cout << std::endl;
}
and an invocation such as...
foo(1,2,3,4);
will produce the following output "1 2 3 4"
My question is "How does this interact with calling conventions?"
For example __fastcall on GCC will try place the first two arguments in registers and the remainder on the stack. The two requirements are at odds with each other, is there any way to formally reason about what will happen or is it subject to the capricious nature of implementation defined behaviour?
[App02] Modern Compiler Implementation in Java, Andrew w. Appel, Chapter 6, Page 124
Update: I suppose that this question is answered. I think I was wrong to base the whole question on contiguous memory allocation when what I was looking for (and what the reference speaks of) is the apparent mismatch between the need for parameters being in memory due the use of address-of as opposed to in registers due to calling conventions, maybe that is a question for another day.
Someone on the internet is wrong and sometimes that someone is me.
First of all your code doesn't always produce 1, 2, 3, 4. Just check this one: http://ideone.com/ohtt0
Correct code is at least like this:
void foo(int a, int b, int c, int d)
{
int* p = &a;
for (int i = 0; i < 4; i++)
{
std::cout << *p;
p++;
}
}
So now let's try with fastcall, here:
void __attribute__((fastcall)) foo(int a, int b, int c, int d)
{
int* p = &a;
for (int i = 0; i < 4; i++)
{
std::cout << *p << " ";
p++;
}
}
int main()
{
foo(1,2,3,4);
}
Result is messy: 1 -1216913420 134514560 134514524
So I really doubt that something can be guaranteed here.
There is nothing in the standard about calling conventions or how parameters are passed.
It is true that if you take the address of one variable (or parameter) that one has to be stored in memory. It doesn't say that the value cannot be passed in a register and then stored to memory when its address is taken.
It definitely doesn't affect other variables, who's addresses are not taken.
The C++ standard has no concept of a calling convention. That's left for the compiler to deal with.
In this case, if the standard requires that parameters be contiguous when the address-of operator is applied, there's a conflict between what the standard requires of your compiler and what you require of it.
It's up to the compiler to decide what to do. I'd think most compilers would give your requirements priority over the standard's, however.
Your basic assumption is flawed. On my machine, foo(1,2,3,4) using your code prints out:
1 -680135568 32767 4196336
Using g++ (Ubuntu 4.4.3-4ubuntu5) 4.4.3 on 64-bit x86.