Pointer arithmetic ignored by the compiler - c++

I'm compiling the following with -O0 (recent gcc/clang) and both give me an answer I don't expect.
#include <iostream>
struct xy{
int x,y;
};
int main()
{
xy a{1,2};
int x{1};
int y{2};
int *ptr1=&a.x;
int *ptr2=&x;
ptr1++; // I now point to a.y!
(*ptr1)++; // I now incremented a.y to 3
ptr2++; // I now point to y!
(*ptr2)++; // I now incremented y to 3
std::cout << "a.y=" << a.y << " ptr1=" << *ptr1 << '\n';
std::cout << "y= " << y << " ptr2=" << *ptr2 << '\n';
}
Output:
a.y=3 ptr1=3
y= 2 ptr2=2
So this access with pointers to non-class variables is being optimized-out by the compiler.
I also tried to mark the int and int* as volatile, but it didn't make any difference.
What part of the standard am I missing / why is the compiler allowed to do this?
Coliru snippet at: http://coliru.stacked-crooked.com/a/ed0757a6621c37a9

In the first case, dealing with class members, the part you are ignoring is that the compiler is allowed to add any amount of padding between the members of an object and at the end of the object. Because of this, incrementing a pointer to one member does not have to give you the next member.
The second part of the standard you are missing is that it is illegal to access memory through a pointer that does not point to it. Even though y might be there in memory, the pointer is not allowed to access it. The pointer is allowed to access x, and it may be compared to see whether it is one past x, but it cannot be dereferenced at that one-past-x address.

Pointer arithmetic is only valid in arrays. You cannot reach y by incrementing a pointer to x. The behaviour of your program is undefined. Your statement
ptr1++; // I now point to a.y!
is simply wrong. Remember that a compiler is allowed to insert an arbitrary amount of padding between the elements in your struct.
In more detail, you can set a pointer to one past the address of a scalar, but you are not allowed to dereference it.
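A minimal sketch of the distinction drawn in the answers above, assuming the goal is simply to step from one int to the next. Arithmetic inside an array is well defined, so the two values are placed in an array here. The helper names increment_second and one_past_compares_equal are hypothetical and only serve the illustration.

```cpp
// Hypothetical helper: increments the second element of a two-element array
// by stepping a pointer from the first element to the second.
int increment_second(int (&values)[2])
{
    int* ptr = &values[0];
    ++ptr;       // well defined: ptr still points inside the same array
    ++(*ptr);    // increments values[1]
    return values[1];
}

// Hypothetical helper: forms the one-past-the-end pointer of a single int.
// Forming and comparing that pointer is allowed; dereferencing it is not.
bool one_past_compares_equal()
{
    int x{1};
    int* ptr = &x;
    ++ptr;                   // allowed: one past x
    return ptr == &x + 1;    // allowed: comparison only, no dereference
}
```

Calling increment_second on an array holding {1, 2} leaves the first element untouched and changes the second to 3, which is the effect the question was after.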

Related

Why a reference declaration influences my pointer?

Case 1
#include <iostream>
using namespace std;
int main() {
int n = 1;
// int & r = n;
int * p;
cout << &n << endl;
cout << p << endl;
return 0;
}
Output 1
0x7fffffffdc94
0
Case 2
#include <iostream>
using namespace std;
int main() {
int n = 1;
int & r = n;
int * p;
cout << &n << endl;
cout << p << endl;
return 0;
}
Output 2
0x7fffffffdc8c
0x7fffffffdd90
In Output 2, the pointer p pointed to an address just after int n. Shouldn't an uninitialized pointer point to some random place?
Why does adding a reference declaration influence the address p points to?
Shouldn't an uninitialized pointer point to some random place?
No, an uninitialized pointer points nowhere. It has an indeterminate value. Trying to read and/or print this indeterminate pointer value as you are doing in
cout << p << endl;
has undefined behavior. That means there is no guarantee whatsoever what will happen, whether there will be output or not or whatever the output will be.
Therefore, there is no guarantee that changing any other part of the code doesn't influence the output you will get. And there is also no guarantee that the output will be consistent in any way, even between runs of the same compiled program or multiple compilations of the same code.
Practically, the most likely behavior is that the compiler will emit instructions that will simply print the value of whatever memory was reserved on the stack for the pointer. If there wasn't anything there before, it will likely result in a zero value. If there was something there before it will give you that unrelated value.
None of this is guaranteed, however. The compiler can also just substitute the read of the pointer value with whatever suits it, or recognize that reading the value has undefined behavior and remove the output statement, since it can never be reached in a valid program.
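A small sketch of the usual way to avoid the indeterminate value, assuming the pointer has nothing to point to yet: initialize it to nullptr and test it before use. The helper name describe is hypothetical.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: describes a pointer without ever reading an
// indeterminate value. The caller must pass either nullptr or a valid pointer.
std::string describe(const int* p)
{
    std::ostringstream out;
    if (p == nullptr)
        out << "null";
    else
        out << "points to " << *p;
    return out.str();
}
```

With int* p = nullptr; the call describe(p) yields "null" regardless of what else is declared in the function, and with int n = 1; the call describe(&n) yields "points to 1".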

C++ Pointers break when changing a pointer

When I change a pointer in a Union, my other pointers break and show invalid pointer.
CustomDataTypeExample Class:
struct CustomDataTypeExample {
float x;
float y;
float z;
CustomDataTypeExample() = default;
CustomDataTypeExample(float x, float y, float z) {
this->x = x;
this->y = y;
this->z = z;
};
// ...
};
ConfigCustomDataTypeExample class:
struct ConfigCustomDataTypeExample {
public:
ConfigCustomDataTypeExample() = default;
ConfigCustomDataTypeExample(CustomDataTypeExample values) {
x = &values.x;
y = &values.y;
z = &values.z;
}
union {
struct {
CustomDataTypeExample* ex;
};
struct {
float* x;
float* y;
float* z;
};
};
};
main:
ConfigCustomDataTypeExample example({ 1.2f,3.4f,5.6f });
float value = 565;
example.x = &value;
std::cout << example.ex->x << ", " << example.ex->y << ", " << example.ex->z << "\n";
std::cout << *example.x << ", " << *example.y << ", " << *example.z << "\n";
Output:
565, -1.07374e+08, -1.07374e+08
565, 3.4, 5.6
What exactly is happening? If I don't change example.x to point to something else, it works just fine, but if I change it, it ruins the other pointers.
TL;DR: Three different kinds of undefined behaviour: a lifetime issue, accessing a non-active member of a union (without non-standard extensions), and dereferencing an invalid pointer value through the members of example.ex (a misunderstanding of what the declared union represents).
Looks like you could do with using plain references. The full solution is described at the end.
Deeper analysis
This is actually a really interesting problem as there is so much going on here! Three different kinds of undefined behavior. Let's go over these piece by piece.
First, as mentioned in the comments, you are assigning the addresses of the members of the parameter values to x, y and z. The parameter values has automatic storage duration, which means it is destroyed at the end of the constructor for ConfigCustomDataTypeExample.
struct ConfigCustomDataTypeExample {
public:
ConfigCustomDataTypeExample() = default;
ConfigCustomDataTypeExample(CustomDataTypeExample values) {
x = &values.x;
y = &values.y;
z = &values.z;
} // Past this line x, y and z store invalid pointer values
// (addresses to now destructed members of values).
// Any indirection through these pointers is undefined behavior.
...
With your program you were still able to read the values of y and z. This is the essence of undefined behaviour: you might sometimes get sensible results, but nothing is guaranteed. For example when I tried to run your program, I got wildly different results for y and z. This was the first clear UB. Let's examine the declaration of the union next to understand what it really represents.
A class is a type that consists of a sequence of members. A union is a special type of class that can hold at most one of its non-static data members at a time. The object currently held by a union is called the active member. This implies that a union is only as big as its largest data member, which is useful if memory usage is a concern.
union {
struct {
CustomDataTypeExample* ex;
};
struct {
float* x;
float* y;
float* z;
};
};
For this union the members are the two anonymous structs (note that anonymous structs are prohibited by the C++ standard). The size of the union is determined by the largest struct, which is the float* struct. On a 64-bit system the size of a pointer type is commonly 8 bytes, so the size of this union is 24 bytes.
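The size claim can be checked with a short sketch. It assumes named struct members instead of the anonymous structs from the question, so that it stays within standard C++; the names OnePointer, ThreePointers, PointerUnion, union_size and largest_member_size are hypothetical.

```cpp
#include <cstddef>

// Hypothetical named stand-ins for the two anonymous structs.
struct OnePointer    { float* only; };
struct ThreePointers { float* x; float* y; float* z; };

union PointerUnion {
    OnePointer    one;
    ThreePointers three;
};

// A union must be large enough to hold its largest member.
static_assert(sizeof(PointerUnion) >= sizeof(ThreePointers),
              "union is at least as large as its largest member");

std::size_t union_size()          { return sizeof(PointerUnion); }
std::size_t largest_member_size() { return sizeof(ThreePointers); }
```

On a typical 64-bit target both functions are expected to return 24, though the exact number is implementation-defined.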
As for the usage of the union, you are clearly not using it to reduce memory consumption. Instead, you are trying to do something called type punning. Type punning is when you try to interpret the binary representation of one type as another type. According to the C++ standard, type punning with unions is undefined behavior (the second kind), although many compilers provide non-standard extensions that allow it. Let's analyze your main program according to the standard rules:
ConfigCustomDataTypeExample example({1.2f, 3.4f, 5.6f});
// The anonymous struct holding 3 float* is now the active member.
// Though, all of the pointers are invalid, as already mentioned.
float value = 565;
example.x = &value;
// example.x is now a valid ptr value
std::cout
<< example.ex->x << ", " // UB: Accessing a non-active member
<< example.ex->y << ", " // UB: non-active and invalid ptr (more on that later)
<< example.ex->z << "\n"; // UB: same as above
std::cout
<< *example.x << ", " // This is ok (active member and valid ptr)
<< *example.y << ", " // UB: indirection to an invalid ptr
<< *example.z << "\n"; // UB: same as above
Yet again, undefined behavior was kind enough to print 565 when dereferencing example.ex->x. This is because float* x and ex overlap in the union's binary representation, so ex ends up holding &value, although this is still undefined behavior.
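As an aside, when reinterpreting bytes really is the goal, a standard-conforming route is to copy the bytes with std::memcpy rather than read a non-active union member. The sketch below assumes a 32-bit float; the helper name float_bits is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical helper: returns the object representation of a float
// as an unsigned 32-bit integer by copying its bytes.
std::uint32_t float_bits(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // well defined byte-wise copy
    return bits;
}
```

On a platform that uses IEEE 754 single precision (an assumption, not a guarantee), float_bits(1.0f) is 0x3f800000.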
Let's first quickly fix the lifetime issue by changing ConfigCustomDataTypeExample to take a reference as its parameter, ConfigCustomDataTypeExample(CustomDataTypeExample& values), and by declaring a CustomDataTypeExample variable in main. I am also compiling with gcc, where type punning with unions is well defined (non-standard extension):
CustomDataTypeExample data{1.0f, 2.0f, 3.0f};
ConfigCustomDataTypeExample example(data);
float value = 565;
example.x = &value;
std::cout
<< example.ex->x << ", " // This is now ok (using gcc's non-standard extension)
<< example.ex->y << ", " // Something seems odd
<< example.ex->z << "\n"; // with these two lines
std::cout
<< *example.x << ", " // Now well defined
<< *example.y << ", " // same
<< *example.z << "\n"; // same
Here goes nothing. The output from one of my runs is:
565, 1961.14, 4.59163e-41
565, 2, 3
Ok, at least now the x, y and z values are valid, but we are still getting junk values when dereferencing parts of example.ex. What gives? Let's go back to the declaration of our union and think how it translates to its binary representation. Here is a rough diagram:
[float* x, float* y, float* z]
So our union's memory layout is three float pointers that each point to a single float value (equivalent to an array that stores three float pointers, e.g. float* arr[3]). Yet with example.ex we are interpreting the float* x as a pointer to three consecutive floats. This is because CustomDataTypeExample's memory layout is equivalent to an array of 3 float values, and referring to its members is equivalent to array indexing.
I think gcc's extension bases its interpretation of example.ex on the C99 standard (TC3), section 6.5.2.3, footnote 82:
If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
We can also verify this by looking at how the compiler translates these three lines to assembly:
example.x = &value;
std::cout
<< example.ex->x << ", "
<< example.ex->y << ", "
<< example.ex->z << "\n";
Using godbolt we get the following (I only took the parts that are relevant):
// Copies the value of rax to the memory pointed by QWORD PTR [rbp-48]
mov QWORD PTR [rbp-48], rax // example.x = &value;
// Copy a 32-bit value from memory address rax to eax.
// (eax register is used here to pass the value to std::cout)
// No surprises yet, as this address has a well defined floating point value (565).
mov eax, DWORD PTR [rax] // example.ex->x
// Not good, tries to copy a floating point value from memory address
// [rax + 4 bytes]. Equivalent to *(&value + 1). This is gonna get
// whatever random junk is in that part of memory.
mov eax, DWORD PTR [rax+4] // example.ex->y
We can see quite clearly how the compiler tries to interpret the address pointed to by example.ex as a region in memory that contains 3 floating point values, even though it only contains one. Hence, the first read is fine, but the second and third dereferences go very wrong.
This code produces extremely similar assembly, which is no surprise, as the behavior is equivalent:
float* value_ptr = &value;
std::cout
<< *value_ptr << ", " // equivalent to example.ex->x, OK
<< value_ptr[1] << ", " // equivalent to example.ex->y, plain UB
<< value_ptr[2] << '\n'; // equivalent to example.ex->z, plain UB
This case of undefined behavior is very similar to the very first case. The program is performing indirection through invalid pointer values (the third kind).
These three undefined behaviors combined caused the weird values to appear when you executed main. Now on to the solution.
Solution
First, let's get a minor nitpick out of the way. CustomDataTypeExample is clearly meant to be an aggregate that just encloses data, so there is no need to explicitly declare special member functions for it (constructors in this case). The special member functions are implicitly declared (and trivial):
struct CustomDataTypeExample {
float x;
float y;
float z;
};
// Construct an instance of CustomDataTypeExample by aggregate initializing.
// This was also utilized earlier.
CustomDataTypeExample data{1.0f, 2.0f, 3.0f};
As for the solution, it looks like you are trying to come up with an extra layer of abstraction for a simple problem. Plain references should do the trick. There is no reason for that complicated union setup, which, as you might have noticed, is quite error-prone. In C++, unions should only really be used to reduce memory consumption on systems where memory is a scarce resource.
Thus, I would just get rid of the ConfigCustomDataTypeExample and utilize references like so:
CustomDataTypeExample data{1.0f, 2.0f, 3.0f};
CustomDataTypeExample& data_ref = data;
// Modifies the contents of the existing data
data_ref.x = 565;
std::cout
<< data_ref.x << ", "
<< data_ref.y << ", "
<< data_ref.z << '\n';
When you are working with variables that have an automatic storage duration, references are the way to go. Compared to pointers, with references lifetime issues are a little bit harder to create, and the overall solution is usually simpler.

Problem assigning a value to an "adjacent" variable

In this code, I first declare a and assign it the value 9, then declare b, and then assign 3 to *(&b-1), so (&b-1) refers to &a. When I print a, it still prints 9. But when I enable line 6 of the code, which prints a before the assignment to *(&b-1), a is updated to 3 and that value is printed. Why is this happening?
#include <iostream>
using namespace std;
int main() {
double a, b;
a = 9;
//cout<<&a<<" "<<a << endl ;
*(&b - 1) = 3;
cout << a << " " << &b - 1 << " ";
cout << &a;
}
so (&b-1) refers to &a
No, that's not how C++ works.
You can't "navigate" the stack frame like this, because C++ is an abstraction and does not have stack frames.
What you're doing here is pretending that b is a pointer to the second (or later) element of an array, and trying to get the value of the preceding element in that array. As we know, you do not actually have an array.
So why it's happening like this?
That's why. You lied to the compiler, and now it's freaking out.
Yes, it really does care about this kind of thing!
Your question is based on a false premise
[...] (&b-1) refers to &a [...]
That's wrong. So when you ...
*(&b - 1) = 3;
you are dereferencing a pointer that you are not allowed to dereference. There is no double stored at (&b - 1). As this is undefined behaviour, your program can do anything, and that's about as much as one can say about your code ;).
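If two adjacent doubles are what is actually wanted, a sketch under that assumption is to put them in an array, where stepping back one element is well defined. The helper name write_previous is hypothetical.

```cpp
// Hypothetical helper: stores 9 in the first element, then overwrites it
// through a pointer computed from the second element.
double write_previous()
{
    double values[2]{9, 0};
    double* second = &values[1];
    *(second - 1) = 3;   // well defined: second - 1 is &values[0]
    return values[0];
}
```

Here the result is 3 no matter what else the function prints beforehand, because the pointer arithmetic never leaves the array.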

Getting the offset of a member variable via casting a nullptr

I'm looking at the macro offsetof from <cstddef>, and saw that a possible implementation is via
#define my_offsetof(type, member) ((void*) &(((type*)nullptr)->member))
I tried it and indeed it works as expected
#include <iostream>
#define my_offsetof(type, member) ((void*) &(((type*)nullptr)->member))
struct S
{
char x;
short y;
int z;
};
int main()
{
std::cout << my_offsetof(S, x) << '\n';
std::cout << my_offsetof(S, y) << '\n';
std::cout << my_offsetof(S, z) << '\n';
S s;
std::cout << (void*) &((&s)->x) << '\n'; // no more relative offsets
std::cout << (void*) &((&s)->y) << '\n'; // no more relative offsets
std::cout << (void*) &((&s)->z) << '\n'; // no more relative offsets
}
the only modification I've done being that I use a final cast to void* instead of size_t, as I want to display the address as a pointer.
My question(s):
Is the code perfectly legal, i.e. is it legal to "access" a member via a nullptr, then take its address? If that's the case, then it seems that &(((type*)nullptr)->member) computes the address of the member relative to 0, is this indeed the case? (it seems so, as in the last 3 lines I get the offsets relative to the address of s).
If I remove the final cast to (void*) from the macro definition, I get a segfault. Why? Shouldn't &(((type*)nullptr)->member) be a pointer of type type*, or is the type somehow erased here?
Is the code perfectly legal?
No. It's undefined behavior. A compiler may choose to implement offsetof in that manner, but that's because it is the implementation: it can choose how to implement its own features. You, on the other hand, do not get such "luxury."
There is no way for you to implement the offsetof macro. Not in any standards-conforming manner.
If I remove the final cast to (void*) from the macro definition, I get a segfault. Why? Shouldn't &(((type*)nullptr)->member) be a pointer of type type*, or is the type somehow erased here?
It's probably a segfault from trying to print my_offsetof(S, x) (since x is a char and that expression results in char*), because std::ostream's operator<< will try to print char* as a C-style string.
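For comparison, a short sketch that uses the standard offsetof macro from <cstddef> on the same struct. The result is a std::size_t, so it prints as a number and no pointer cast is involved. The wrapper function names are hypothetical.

```cpp
#include <cstddef>

struct S
{
    char x;
    short y;
    int z;
};

// Hypothetical wrappers around the standard macro, one per member.
std::size_t offset_of_x() { return offsetof(S, x); }
std::size_t offset_of_y() { return offsetof(S, y); }
std::size_t offset_of_z() { return offsetof(S, z); }
```

The first member of a standard-layout struct sits at offset 0; the offsets of y and z depend on the implementation's padding, but they increase in declaration order.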

Please explain what is incorrect about this procedure to find the largest pointer

Wouldn't the highest pointer be the one which can't be incremented through pointer arithmetic?
#include <iostream>
int main()
{
// Find the largest pointer
int x = 0;
int* px = &x;
while ((px+1) != px)
++px;
std::cout << "The largest pointer is: " << px;
return 0;
}
yields
Timeout
As already mentioned, you've got an infinite loop because the condition can never be false.
That being said, what you're doing is undefined behaviour, illegal C++. Pointer arithmetic is only legal with pointers pointing to the same array (and a single object is treated as an array of one element) and right past the end of it. You can't expect a reasonable outcome of your program even if you fix the loop.
I suspect the value of std::numeric_limits<uintptr_t>::max() is the theoretical maximum value of a pointer (converted to an integer), but it might not be available to your program. There are things such as virtual address space and segmented memory models to consider. Anyway, the exact values of pointers (except for nullptr) are not something you should be concerned with. You get pointers by taking addresses of existing objects or by calling allocation functions, and that's that.
N.B. I think you have a misconception that attempting to increment an integer type beyond its maximum value will just do nothing. That's incorrect - unsigned integers will wrap around to 0 and with signed integers you get undefined behaviour again (see signed integer overflow).
Hope that helps.
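The two points in the answer above can be sketched as follows, assuming the implementation provides std::uintptr_t (it is an optional type, though widely available). The function names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: the largest value the integer type for pointers can hold.
// Whether a real pointer can ever have this value is a separate question.
std::uintptr_t max_pointer_integer()
{
    return std::numeric_limits<std::uintptr_t>::max();
}

// Hypothetical helper: incrementing an unsigned integer past its maximum
// is well defined and wraps around to 0.
unsigned wrap_after_max()
{
    unsigned value = std::numeric_limits<unsigned>::max();
    ++value;
    return value;
}
```

No pointer is incremented here, so the sketch avoids the undefined behavior of the loop in the question.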
This will never be false and thus never quit
while ((px+1) != px)
Look at this program:
#include <iostream>
int main()
{
int *px = (int *) (~0);
std::cout << "Value: " << px;
++px;
std::cout << " Value: " << px << std::endl;
}
whose output is:
Value: 0xffffffffffffffff Value: 0x3
As you can see, when you increment a pointer that is at its maximum, its value wraps around and starts over.
You might want to look for the largest pointer value that occurs before wrap-around, i.e.:
while (px+1 > px)
px++;
...which will not work, of course, without the proper casts:
while ((unsigned long long)(px + 1) > (unsigned long long)px)
px++;