Getting the offset of a member variable via casting a nullptr - c++

I'm looking at the macro offsetof from <cstddef>, and saw that a possible implementation is via
#define my_offsetof(type, member) ((void*) &(((type*)nullptr)->member))
I tried it, and indeed it works as expected:
#include <iostream>
#define my_offsetof(type, member) ((void*) &(((type*)nullptr)->member))
struct S
{
    char x;
    short y;
    int z;
};
int main()
{
    std::cout << my_offsetof(S, x) << '\n';
    std::cout << my_offsetof(S, y) << '\n';
    std::cout << my_offsetof(S, z) << '\n';
    S s;
    std::cout << (void*) &((&s)->x) << '\n'; // no more relative offsets
    std::cout << (void*) &((&s)->y) << '\n'; // no more relative offsets
    std::cout << (void*) &((&s)->z) << '\n'; // no more relative offsets
}
The only modification I've made is to use a final cast to void* instead of size_t, as I want to display the address as a pointer.
My question(s):
Is the code perfectly legal, i.e. is it legal to "access" a member via a nullptr, then take its address? If that's the case, then it seems that &(((type*)nullptr)->member) computes the address of the member relative to 0, is this indeed the case? (it seems so, as in the last 3 lines I get the offsets relative to the address of s).
If I remove the final cast to (void*) from the macro definition, I get a segfault. Why? Shouldn't &(((type*)nullptr)->member) be a pointer of type type*, or is the type somehow erased here?

Is the code perfectly legal?
No. It's undefined behavior. A compiler may choose to implement offsetof in that manner, but that's because it is the implementation: it can choose how to implement its own features. You, on the other hand, do not get such "luxury."
There is no way for you to implement the offsetof macro. Not in any standards-conforming manner.
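For comparison, here is a minimal sketch of the conforming route: use the offsetof macro that <cstddef> already provides. The struct S is the one from the question; the Offsets type and the offsets_of_S function are hypothetical names for illustration. The exact offsets of y and z are implementation-defined, so the sketch only relies on properties the standard guarantees for a standard-layout type.

```cpp
#include <cassert>
#include <cstddef>      // offsetof, std::size_t
#include <type_traits>  // std::is_standard_layout

struct S {
    char x;
    short y;
    int z;
};

// offsetof is only required to work on standard-layout types.
static_assert(std::is_standard_layout<S>::value, "offsetof needs a standard-layout type");

// Hypothetical helper bundling the three offsets.
struct Offsets {
    std::size_t x, y, z;
};

Offsets offsets_of_S() {
    Offsets o{offsetof(S, x), offsetof(S, y), offsetof(S, z)};
    // Members with the same access control are laid out in declaration order.
    assert(o.x < o.y && o.y < o.z);
    return o;
}
```

Printing the three fields with std::cout gives the numbers the question's macro was after (typically 0, 2 and 4 on common ABIs), without dereferencing a null pointer.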
If I remove the final cast to (void*) from the macro definition, I get a segfault. Why? Shouldn't &(((type*)nullptr)->member) be a pointer of type type*, or is the type somehow erased here?
It's probably a segfault from trying to print my_offsetof(S, x) (since x is a char and that expression results in char*), because std::ostream's operator<< will try to print char* as a C-style string.
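A small sketch of both halves of that explanation, reusing the question's S: the static_asserts show that the inner expression keeps the member's own pointer type (so nothing is erased; it just is not S*), and the two hypothetical helpers contrast streaming a char* with streaming it as const void*. std::ostringstream stands in for std::cout so the output can be inspected.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>

struct S {
    char x;
    short y;
    int z;
};

// The macro's inner expression has the member's pointer type, e.g. char* for x.
// decltype does not evaluate its operand, so these checks dereference nothing.
static_assert(std::is_same<decltype(&(static_cast<S*>(nullptr)->x)), char*>::value,
              "address of a char member is a char*");
static_assert(std::is_same<decltype(&(static_cast<S*>(nullptr)->z)), int*>::value,
              "address of an int member is an int*");

// operator<< treats a char* as a NUL-terminated string and reads characters from it.
std::string stream_as_text(const char* p) {
    assert(p != nullptr);  // streaming a null char* is undefined behavior
    std::ostringstream out;
    out << p;
    return out.str();
}

// Casting to const void* selects the overload that prints the pointer value instead.
std::string stream_as_address(const char* p) {
    std::ostringstream out;
    out << static_cast<const void*>(p);
    return out.str();
}
```

With the cast removed, my_offsetof(S, x) is a null char*, so operator<< tries to read a string starting at address 0, which is where the segfault comes from.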

Related

unexpected address for a referenced indirected pointer in a struct as opposed to same declaration with a plain variable?

I have a struct containing a byte array and several typecast references to various points in the array. Bytes 4:7 may be interpreted as a float, int32_t, or uint32_t as determined by other fields in the packet being received over a serial connection. To make access simple (e.g. message.argument.F for a float interpretation), I made multiple references to indirected typecast pointers. But when I ran the program, I got a segfault trying to write to the references in the struct. As near as I can tell, the problem has to do with the container, as illustrated by this example snippet (cpp shell: http://cpp.sh/3vmoy):
#include <cstdint>
#include <iostream>
#include <cstring>
using namespace std;
#define PACKET_SIZE 9
#define ARG_I 4
struct Message{
    uint8_t bytes[PACKET_SIZE];
    uint8_t* argbytes = static_cast<uint8_t*>(argp);
    float& argf = *static_cast<float*>(argp);
    void* argp = &bytes[ARG_I];
} message;
int main(){
    // USING STRUCT
    cout << "Using message struct" << endl;
    cout << message.argp << endl; // the pointer at index stored in struct
    cout << static_cast<float*>(message.argp) << endl; // casting the pointer to a float* - should be the same
    cout << &message.argf << endl; // the address of the float reference cast from argp, ** should be the same BUT IS NOT **
    // RAW VARS
    uint8_t bytes[PACKET_SIZE];
    void* argp = &bytes[ARG_I];
    float& argf = *static_cast<float*>(argp);
    cout << endl << "using raw vars" << endl;
    cout << argp << endl; // a pointer to a byte in an array of bytes.
    cout << static_cast<float*>(argp) << endl; // the same pointer cast as a float*
    cout << &argf << endl; // the address of a float reference cast from argp, **should be the same AND IS.**
}
I expect to see the same address for the pointer, a typecast pointer, and the address of the reference for the indirected pointer. I do see that if I create an array and the pointer/reference as standalone variables, but not for the same declarations in a struct. What arcane knowledge do I lack to explain this behavior (or what silly thing have I overlooked?)
My thoughts for fixing this are to a) ignore it and just typecast the pointer as necessary instead, or b) make some setter/getter functions to access the argument portion of the serial "packet".
There is one major, fundamental difference between the two alternative chunks of code: the order in which argp and argf are declared.
void* argp = &bytes[ARG_I];
float& argf = *static_cast<float*>(argp);
Here, this constructs and initializes argp first, then argf.
float& argf = *static_cast<float*>(argp);
void* argp = &bytes[ARG_I];
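And here, it does not.
Non-static data members are initialized in declaration order, so this initializes argf first, by dereferencing argp before argp itself has been initialized, which is undefined behavior (argbytes has the same problem). For a global like message, argp is still the null pointer left by zero-initialization at that point, so argf is bound through it and its address does not match argp.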
Note: I'm ignoring all the aliasing rule violations here, which are likely to be a source of further undefined behavior.
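The question's option b) can be sketched with std::memcpy-based accessors, which avoid both the initialization-order bug and the aliasing and alignment problems of binding a float& into a byte array. The accessor names and the kPacketSize/kArgIndex constants are hypothetical stand-ins for PACKET_SIZE and ARG_I, and the sketch assumes the bytes on the wire use the host's byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mirror PACKET_SIZE and ARG_I from the question.
constexpr std::size_t kPacketSize = 9;
constexpr std::size_t kArgIndex = 4;

// Hypothetical getter/setter version of Message: the argument bytes are
// copied in and out with std::memcpy instead of being reinterpreted in place.
struct Message {
    std::uint8_t bytes[kPacketSize] = {};

    float arg_float() const { return load<float>(); }
    void set_arg_float(float value) { store(value); }

    std::int32_t arg_int32() const { return load<std::int32_t>(); }
    void set_arg_int32(std::int32_t value) { store(value); }

    std::uint32_t arg_uint32() const { return load<std::uint32_t>(); }
    void set_arg_uint32(std::uint32_t value) { store(value); }

private:
    template <typename T>
    T load() const {
        static_assert(kArgIndex + sizeof(T) <= kPacketSize, "argument must fit in the packet");
        T value;
        std::memcpy(&value, bytes + kArgIndex, sizeof value);
        return value;
    }

    template <typename T>
    void store(T value) {
        static_assert(kArgIndex + sizeof(T) <= kPacketSize, "argument must fit in the packet");
        std::memcpy(bytes + kArgIndex, &value, sizeof value);
    }
};
```

If the serial protocol specifies a fixed byte order that differs from the host's, the bytes would need to be swapped before or after the memcpy.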

Dereference a structure to get value of first member

I found out that the address of the first element of a structure is the same as the address of the structure. But dereferencing the address of the structure doesn't give me the value of the first data member, whereas dereferencing the address of the first data member does return its value. E.g. the address of the structure is 100, and the address of its first element is also 100, so dereferencing should work the same way on both.
Code:
#include <iostream>
#include <cstring>
struct things{
    int good;
    int bad;
};
int main()
{
    things *ptr = new things;
    ptr->bad = 3;
    ptr->good = 7;
    std::cout << *(&(ptr->good)) <<" " << &(ptr->good) << std::endl;
    std::cout << "ptr also print same address = " << ptr << std::endl;
    std::cout << "But *ptr does not print 7 and gives compile time error. Why ?" << *ptr << std::endl;
    return 0;
}
*ptr returns to you an instance of type of things, for which there is no operator << defined, hence the compile-time error.
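As a sketch of what the compiler is missing, the snippet below gives things its own operator<<; the overload's output format and the describe helper are hypothetical choices for illustration. Once such an overload exists, the expression from the question compiles and prints both members.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

struct things {
    int good;
    int bad;
};

// Hypothetical overload: without something like this, 'std::cout << *ptr'
// has no operator<< to call, because *ptr is a things, not an int.
std::ostream& operator<<(std::ostream& os, const things& t) {
    return os << "good=" << t.good << " bad=" << t.bad;
}

// Streams *ptr into a string so the result can be inspected.
std::string describe(const things* ptr) {
    assert(ptr != nullptr);
    std::ostringstream out;
    out << *ptr;  // resolves to the overload above
    return out.str();
}
```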
A struct is not the same as an array†. That is, it does not decay to a pointer to its first element. The compiler, in fact, is free to (and often does) insert padding between the members of a struct, and after the last one, so that they align to certain byte boundaries‡. So even if a struct could decay in the same way as an array (bad idea), simply printing it would not guarantee printing of the first element!
† I mean a C-Style array like int[]
‡ These boundaries are implementation-dependent and can often be controlled via compiler-specific directives such as #pragma pack.
Try any of these:
#include <iostream>
#include <cstring>
struct things{
    int good;
    int bad;
};
int main()
{
    things *ptr = new things;
    ptr->bad = 3;
    ptr->good = 7;
    std::cout << *(int*)ptr << std::endl;
    std::cout << *reinterpret_cast<int*>(ptr) << std::endl;
    int* p = reinterpret_cast<int*>(ptr);
    std::cout << *p << std::endl;
    return 0;
}
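To make the padding point concrete, here is a sketch with a hypothetical struct where padding is likely: int usually needs stricter alignment than char, so bytes are typically inserted after c (often 3, but the amount is implementation-defined). For a standard-layout struct the standard does not allow padding before the first member, so offsetof(padded, c) is always 0; padding can only appear between members and at the end.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t

// Hypothetical struct: c is followed by padding on most platforms.
struct padded {
    char c;
    int i;
};

// Bytes between the end of c and the start of i.
std::size_t padding_after_c() {
    assert(offsetof(padded, i) >= sizeof(char));  // i cannot overlap c
    return offsetof(padded, i) - sizeof(char);
}
```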
You can cast the pointer to the struct to a pointer to its first member, so the compiler knows what size and alignment to use when reading the value from memory. This is only well-defined when the struct is standard-layout, as things is.
If you want a "clean" cast, you can consider converting it to a void pointer first:
(Struct*) to (void*) to (FirstElem*), e.g. static_cast<int*>(static_cast<void*>(ptr))
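The void* route can be sketched as follows, assuming the struct is standard-layout (which the question's things is): such an object and its first member are pointer-interconvertible, so the cast yields a real pointer to good. The with_virtual type is a hypothetical counter-example; adding a virtual function makes a class non-standard-layout, and then nothing guarantees that the first member sits at the object's address.

```cpp
#include <cassert>
#include <type_traits>

struct things {
    int good;
    int bad;
};

// Hypothetical counter-example: virtual functions rule out standard layout.
struct with_virtual {
    virtual ~with_virtual() = default;
    int good;
};

static_assert(std::is_standard_layout<things>::value, "things is standard-layout");
static_assert(!std::is_standard_layout<with_virtual>::value, "with_virtual is not");

// For a standard-layout class, the object and its first member are
// pointer-interconvertible, so going through void* yields a pointer to 'good'.
int first_member(const things* ptr) {
    assert(ptr != nullptr);
    const void* raw = ptr;
    return *static_cast<const int*>(raw);
}
```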
I found out that address of first element of structure is same as the address of structure.
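Wherever you found this out, the C++ standard only guarantees it for standard-layout classes, whose first member is pointer-interconvertible with the object itself. It's an incorrect assumption in the general case (for example, for a class with virtual functions).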
There is nothing but misery and pain for you if you continue down this path.

Pointer arithmetic ignored by the compiler

I'm compiling the following with -O0 (recent gcc/clang), and both give me an answer I don't expect.
#include <iostream>
struct xy{
    int x,y;
};
int main()
{
    xy a{1,2};
    int x{1};
    int y{2};
    int *ptr1=&a.x;
    int *ptr2=&x;
    ptr1++; // I now point to a.y!
    (*ptr1)++; // I now incremented a.y to 3
    ptr2++; // I now point to y!
    (*ptr2)++; // I now incremented y to 3
    std::cout << "a.y=" << a.y << " ptr1=" << *ptr1 << '\n';
    std::cout << "y= " << y << " ptr2=" << *ptr2 << '\n';
}
Output:
a.y=3 ptr1=3
y= 2 ptr2=2
So this access with pointers to non-class variables is being optimized-out by the compiler.
I also tried to mark the int and int* as volatile, but it didn't make any difference.
What part of the standard am I missing / why is the compiler allowed to do this?
Coliru snippet at: http://coliru.stacked-crooked.com/a/ed0757a6621c37a9
In the first case, dealing with class members, the part you are ignoring is that the compiler is allowed to add any amount of padding between the members of an object and at the end of the object. Because of this, incrementing a pointer to one member does not have to give you the next member.
The second part of the standard you are missing is that it is illegal to access memory through a pointer to something it doesn't point to. Even though y might be there in memory, the pointer is not allowed to access it. It is allowed to access x, and it is allowed to be compared to see whether it is one past x, but it cannot dereference that one-past-x address.
Pointer arithmetic is only valid in arrays. You cannot reach y by incrementing a pointer to x. The behaviour of your program is undefined. Your statement
ptr1++; // I now point to a.y!
is simply wrong. Remember that a compiler is allowed to insert an arbitrary amount of padding between the members of your struct.
In more detail, you can set a pointer to one past the address of a scalar, but you are not allowed to dereference it.
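Two well-defined alternatives, sketched with hypothetical helper names: keep values that must be stepped through in an array (where ++p stays inside the array), or, to visit the members of xy in turn, iterate over pointers-to-member instead of incrementing an int* across the struct.

```cpp
#include <cassert>
#include <iterator>  // std::end

struct xy {
    int x, y;
};

// Pointer arithmetic is defined only within one array (a single object counts
// as an array of one element), so this stepping is well-defined.
int increment_second(int (&values)[2]) {
    int* p = values;                // points to values[0]
    ++p;                            // points to values[1], still inside the array
    assert(p != std::end(values));  // the end pointer may be formed but never dereferenced
    ++*p;
    return values[1];
}

// Visits each member through a pointer-to-member; no pointer crosses member boundaries.
int sum_members(const xy& a) {
    constexpr int xy::* members[] = {&xy::x, &xy::y};
    int sum = 0;
    for (int xy::* m : members) {
        sum += a.*m;
    }
    return sum;
}
```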

Cast from hexadecimal to unsigned int* in c++

I have an assignment in which we're supposed to evaluate some pointer manipulation expressions and memory leak situations in C/C++. There's one I'm stuck with:
unsigned int* pInt = (unsigned int*) 0x403004;
Right off the bat this is suspicious to me, but in the assignment this line is supposedly working; however, when I run the program I get a segfault right at this line.
The question is: is this right, or even possible, or is the professor just fooling us by telling us it is right? I've seen some examples and questions about converting a "hex" string to int, but nothing regarding a "pure hex" literal to int or int*.
unsigned int* pInt = (unsigned int*) 0x403004;
Two things are suspicious here:
Unless you are writing specialized software like device drivers or an OS, or you are on some embedded or special system where memory is fixed, a hardcoded memory address is certainly suspicious. Your program will (at best) fail if it tries to access memory it doesn't have the access rights to.
On the right-hand side, the compiler treats the literal 0x403004 as an int, and the cast converts it to a pointer; the result of such an integer-to-pointer conversion is implementation-defined. Thus, your segfault is probably a result of the first point.
unsigned int* pInt = (unsigned int*) 0x403004;
Possible?: yes (compiles, builds just fine)
Is it right?: depends on what for. Evidently it is useful for illustration in a classroom assignment.
Is it recommended? No. You are creating a variable that points to a location in memory that you may or may not have rights to. If you never use it, fine. But dereferencing it invokes undefined behavior unless your program really owns an object at that address, so the results are unpredictable.
It works only if that number is the address of memory that is already allocated to your program, e.g.:
#include <iostream>
int main()
{
    int i = 7;
    std::cout << "i: " << i << std::endl; // 7
    std::cout << "&i: " << &i << std::endl; // in my case: 0018FF44
    unsigned int* ptr = (unsigned int*)0x0018FF44; // OK only while this is still the address of i
    /*
    This is just for explanation: the address of i may differ each time the program
    launches, so 0018FF44 will no longer be assigned to i and the access will likely segfault.
    The right thing is to make the pointer always point to whatever address i has.
    To do so:
    */
    //unsigned int* ptr = (unsigned int*)&i; // not i but the address of i
    (*ptr)++;
    std::cout << i << std::endl; // 8
    // above we changed i through pointer ptr
    int* pRandom = (int*)0xAB2FC0DE0; // an arbitrary address this program does not own
    *pRandom = 5; // segfault
    std::cout << "*pRandom: " << *pRandom << std::endl; // never reached
    std::cout << std::endl;
    return 0;
}
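The legitimate counterpart of a hardcoded address is an integer that was obtained from a real pointer. A minimal sketch, assuming std::uintptr_t is available (it is optional in the standard but present on mainstream platforms): converting a pointer to std::uintptr_t and back yields the original pointer, so the integer is only usable because it came from an object the program owns. The roundtrip name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Pointer -> integer -> pointer: the round trip gives back the same pointer.
unsigned int* roundtrip(unsigned int* p) {
    std::uintptr_t address = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<unsigned int*>(address);
}
```

Printing address in hex would show the kind of number the example above hardcodes, but only for this run of the program.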

where is the address that a function pointer stores

I know that a function pointer stores the address of a function.
#include <iostream>
int fun(int x){
    return x; // return something
}
int (*pfun)(int) = &fun;
int main(){
    std::cout << &fun << "\n"; // this prints 1
    std::cout << fun << "\n"; // this prints 1
    std::cout << &pfun << "\n"; // this prints 0x0022ff40
    std::cout << pfun << "\n"; // this prints 1
}
So my questions are :
1) If fun() doesn't even have an address, how can pfun point to fun()?
2) For example, in dynamic binding, when I use a function pointer at runtime, does the compiler change the value of pfun to a real pointer like 0X..... so that at runtime it knows which function to call, since the names don't exist after compilation?
The expressions fun and &fun have the same meaning here: fun decays to &fun, which is also the value stored in pfun, so it is no wonder that the three of them yield the same output. &pfun is the address of the pointer, that is, the address of the variable itself.
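A short sketch of that equivalence, which also answers the first question (the function does have an address; it just is not printed): fun, &fun and pfun compare equal, and calling through the pointer reaches the same function. The body of fun and the call_through_pointer helper are hypothetical, since the question elides what fun returns.

```cpp
#include <cassert>

// Hypothetical body: the question elides what fun returns.
int fun(int x) { return x * 2; }

// The name fun decays to a pointer to the function, so fun and &fun give the
// same value, and that value is what pfun stores.
int (*pfun)(int) = &fun;

int call_through_pointer(int (*f)(int), int arg) {
    assert(f != nullptr);  // calling through a null function pointer is undefined behavior
    return f(arg);         // equivalent to (*f)(arg)
}
```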
Now the question is why 1... well, the answer is that there is no overloaded operator<< that takes an std::ostream and a function pointer, so the compiler tries to find the best match among the existing overloads which happens to be bool (a function pointer is implicitly convertible to bool). The function pointer will be converted to false only if the function pointer is null, which is not the case. The true value is finally printed as 1 (you can check this by doing: std::cout << std::boolalpha << fun which will print true).
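The bool conversion can be observed directly with a sketch like the one below, where std::ostringstream stands in for std::cout and stream_function_pointer is a hypothetical helper. Function pointers do not convert implicitly to const void*, so the only viable overload is operator<<(bool).

```cpp
#include <cassert>
#include <sstream>
#include <string>

void fun() {}

// Streams a function pointer the way the question's std::cout lines do:
// overload resolution picks operator<<(bool), so a non-null pointer prints
// as 1, or as "true" once std::boolalpha is set.
std::string stream_function_pointer(void (*pf)(), bool use_boolalpha) {
    assert(pf != nullptr);  // a null pointer would print 0 (or "false") instead
    std::ostringstream out;
    if (use_boolalpha) {
        out << std::boolalpha;
    }
    out << pf;
    return out.str();
}
```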
If you want to obtain the actual address of the function (in this process), you can force a cast to a void pointer and print the result. Converting a function pointer to void* is only conditionally-supported by the standard, so this is not strictly portable, but mainstream compilers accept it and it will give you a number other than 1. Note that the value might differ between runs and basically has no meaning at all.
operator<< does not have an appropriate overload for printing function pointers. Try this instead.
#include <iostream>
void fun() {}
void (*pFun)() = &fun;
int main ()
{
    std::cout << (void*)pFun << "\n";
}