Accessing unions in struct - c++

Consider the following code:
struct test1str
{
int testintstr : 2;
int testintstr2 : 1;
};
struct test2str
{
int testintstr : 2;
int testintstr2 : 1;
};
union test1uni
{
int testint1;
test1str str1;
};
union test2uni
{
int testint2;
test2str str2;
};
struct finalstruct
{
test1uni union1;
test2uni union2;
} finstr;
int* ptr = &finstr.union1.testint1;
finstr.union1.testint1 = 2;
finstr.union2.testint2 = 4;
cout << &finstr.union1 << endl;
cout << &finstr.union2 << endl;
printf("val: %i addr: %x\n", *ptr, ptr);
ptr++;
printf("val: %i addr: %x\n", *ptr, ptr);
Is there a more appropriate way of accessing the values of the unions inside finalstruct? Using the code above, I could iterate through all the unions inside "finalstruct" and get the int I need, but is there some other way to do this?
Assume that the data in each struct is no larger than the variable inside the union: the structs will be treated as bitfields, and the data will be read through the union variable.
This will be used on only one type of processor, compiled with one compiler (gcc), and the sizes of all the structs and unions will be the same (except for finalstruct, of course). What I'm trying to achieve is to be able to change individual bits easily through the structs (test1str, test2str), while for reading I only need to know the final value that these bits make up; for that I will use the unions (test1uni, test2uni). By packing these unions inside a struct (finalstruct), I can easily process all the data.

ptr++
ptr does not point to an element of an array, so after you increment it, it is no longer valid - even if there happens to be another object in that address. When you indirect through it, the behaviour of the program is undefined.
What you really need in order to iterate over the members of a class is a language feature called "reflection". C++ has very limited support for reflection. You could store references to the members in an array, and iterate over the array. Note that since we cannot have arrays of references, we need to wrap them, and in the case of printf, explicitly convert the wrapper back:
std::array members {
std::ref(finstr.union1.testint1),
std::ref(finstr.union2.testint2),
};
auto ptr = std::begin(members);
printf("val: %i addr: %p\n", ptr->get(), (void*)&ptr->get());
ptr++;
printf("val: %i addr: %p\n", ptr->get(), (void*)&ptr->get());
P.S. I took the liberty of fixing the printf call. %x is wrong for a pointer.
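For completeness, here is a minimal self-contained sketch of the same idea written as a loop. The type definitions are copied from the question; the helper name read_all is made up for this illustration, and each union is assumed to have had its int member written last.

```cpp
#include <array>
#include <functional>
#include <vector>

struct test1str { int testintstr : 2; int testintstr2 : 1; };
struct test2str { int testintstr : 2; int testintstr2 : 1; };
union test1uni { int testint1; test1str str1; };
union test2uni { int testint2; test2str str2; };
struct finalstruct { test1uni union1; test2uni union2; };

// Hypothetical helper: gathers the int member of every union in finalstruct.
// The list of members is written out by hand because C++ cannot enumerate them.
std::vector<int> read_all(finalstruct& f) {
    std::array<std::reference_wrapper<int>, 2> members{
        std::ref(f.union1.testint1),
        std::ref(f.union2.testint2),
    };
    std::vector<int> values;
    for (const auto& m : members) {
        values.push_back(m.get());  // get() yields the wrapped int&
    }
    return values;
}
```

Adding a third union means adding one more std::ref line (and bumping the array size); nothing relies on the members being adjacent in memory.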

I don't think I've quite understood what you're trying to achieve, but...
If you're sure that sizeof(test1uni) == sizeof(test2uni) == sizeof(int) and there's nothing else in the struct, then you can treat finalstruct itself as an array of ints:
int *ptr = (int*)&finstr;
for(size_t i=0; i<sizeof(finstr)/sizeof(int); i++)
printf("val: %i addr: %p\n", ptr[i], (void*)&ptr[i]);
But as raised in the comments, casting a pointer to the struct to int* probably violates strict aliasing, and "type-punning" through the members of a union is not something standard C++ allows (C permits it, and GCC documents it as an extension). Thus this is in the realm of undefined behaviour, so you need to either:
Disable optimisations based on strict aliasing, e.g. pass -fno-strict-aliasing to GCC 3.4.1 and above. That still leaves the type-punning issue, though.
Check the assembly to make sure the compiler is doing what you want.
Change to C.
Also be aware of other gotchas: int has to be a multiple of a word, finstr has to be aligned to a word boundary, the compiler/platform has to follow convention. So I certainly wouldn't consider this portable without a fair bit more rigour.
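If the goal is only to read the ints out, one way to sidestep the aliasing question is to copy the bytes of finalstruct into a real array of int with memcpy. This is a sketch under the same assumptions as above (each union is exactly the size of an int, nothing else in the struct); the names snapshot and kCount are invented for the example, and the static_asserts make the assumptions explicit instead of silent.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <type_traits>

struct test1str { int testintstr : 2; int testintstr2 : 1; };
struct test2str { int testintstr : 2; int testintstr2 : 1; };
union test1uni { int testint1; test1str str1; };
union test2uni { int testint2; test2str str2; };
struct finalstruct { test1uni union1; test2uni union2; };

// Assumptions taken from the answer, checked at compile time.
static_assert(sizeof(test1uni) == sizeof(int), "union must be int-sized");
static_assert(sizeof(test2uni) == sizeof(int), "union must be int-sized");
static_assert(sizeof(finalstruct) % sizeof(int) == 0, "no odd-sized tail");
static_assert(std::is_trivially_copyable<finalstruct>::value, "memcpy needs this");

constexpr std::size_t kCount = sizeof(finalstruct) / sizeof(int);

// Hypothetical helper: copies the object representation into an int array.
std::array<int, kCount> snapshot(const finalstruct& f) {
    std::array<int, kCount> out{};
    std::memcpy(out.data(), &f, sizeof f);
    return out;
}
```

The copy is a snapshot, so writes still have to go through the union members; what you get back for a union whose bitfield struct was written last depends on how the compiler lays out bitfields.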

Related

Portable tagged pointers

Is there a portable way to implement a tagged pointer in C/C++, like some documented macros that work across platforms and compilers? Or when you tag your pointers you are at your own peril? If such helper functions/macros exist, are they part of any standard or just are available as open source libraries?
Just for those who do not know what tagged pointer is but are interested, it is a way to store some extra data inside a normal pointer, because on most architectures some bits in pointers are always 0 or 1, so you keep your flags/types/hints in those extra bits, and just erase them right before you want to use pointer to dereference some actual value.
#include <stdint.h>
const uintptr_t gc_flag = 1;
const uintptr_t flag_mask = 7; // aka 0b111, because on some theoretical CPU under some arbitrary OS compiled with some random compiler and using some particular malloc last three bits are always zero in pointers.
struct value {
void *data;
};
int data = 42;
struct value val;
val.data = (void *)((uintptr_t)&data | gc_flag); // store the flag in the low bits
int copy = *(int *)((uintptr_t)val.data & ~flag_mask); // clear the flag bits before dereferencing
https://en.wikipedia.org/wiki/Pointer_tagging
You can get the lowest N bits of an address for your personal use by guaranteeing that the objects are aligned to multiples of 1 << N. This can be achieved platform-independently in different ways (alignas and aligned_storage for stack-based objects or std::aligned_alloc for dynamic objects), depending on what you want to achieve:
struct Data { ... };
alignas(1 << 4) Data d; // 4-bits, 16-byte alignment
assert(reinterpret_cast<std::uintptr_t>(&d) % 16 == 0);
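// dynamic (preferably with a unique_ptr or alike)
// std::aligned_alloc expects the size to be a multiple of the alignment, so round it up
void* ptr = std::aligned_alloc(1 << 4, (sizeof(Data) + 15) / 16 * 16);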
auto obj = new (ptr) Data;
...
obj->~Data();
std::free(ptr);
You pay by throwing away memory, and the waste grows exponentially with the number of bits required. Also, if you plan to allocate many such objects contiguously, even a comparatively small array will no longer fit in the processor's cache lines, possibly slowing down the program considerably. This solution therefore does not scale.
If you're sure that the addresses you are passing around always have certain bits unused, then you could use uintptr_t as a transport type. This is an integer type that maps to pointers in the expected way (and will fail to exist on an obscure platform that offers no such possible map).
There aren't any standard macros but you can roll your own easily enough. The code (sans macros) might look like:
void T_func(uintptr_t t)
{
uint8_t tag = (t & 7);
T *ptr = (T *)(t & ~(uintptr_t)7);
// ...
}
int main()
{
T *ptr = new T;
assert( ((uintptr_t)ptr % 8) == 0 );
T_func( (uintptr_t)ptr + 3 );
}
This may defeat compiler optimizations that involve tracking pointer usage.
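The "roll your own" part could be wrapped in a few small functions. This is only a sketch: pack, pointer_of, tag_of and kTagMask are names made up here, and it assumes the pointers are at least 8-byte aligned and that converting a pointer to uintptr_t and back behaves like plain integer arithmetic on the address, which is the case on mainstream platforms but is implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

// Low 3 bits are used for the tag; assumes objects are aligned to 8 bytes.
constexpr std::uintptr_t kTagMask = 7;

// Combines a pointer and a small tag into one integer.
template <typename T>
std::uintptr_t pack(T* p, unsigned tag) {
    const auto bits = reinterpret_cast<std::uintptr_t>(p);
    assert((bits & kTagMask) == 0);  // the pointer must leave the tag bits free
    assert(tag <= kTagMask);         // the tag must fit in those bits
    return bits | tag;
}

// Recovers the original pointer by clearing the tag bits.
template <typename T>
T* pointer_of(std::uintptr_t packed) {
    return reinterpret_cast<T*>(packed & ~kTagMask);
}

// Extracts the tag.
inline unsigned tag_of(std::uintptr_t packed) {
    return static_cast<unsigned>(packed & kTagMask);
}
```

Keeping the packed value as a uintptr_t rather than a T* means it cannot be dereferenced by accident before the tag is stripped.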
Well, GCC at least can compute the size of bit-fields, so you can get portability across platforms (I don't have an MSVC available to test with). You can use this to pack the pointer and tag into an intptr_t, and intptr_t is guaranteed to be able to hold a pointer.
#include <limits.h>
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <inttypes.h>
struct tagged_ptr
{
intptr_t ptr : (sizeof(intptr_t)*CHAR_BIT-3);
intptr_t tag : 3;
};
int main(int argc, char *argv[])
{
struct tagged_ptr p;
p.tag = 3;
p.ptr = (intptr_t)argv[0];
printf("sizeof(p): %zu <---WTF MinGW!\n", sizeof p);
printf("sizeof(p): %lu\n", (unsigned long int)sizeof p);
printf("sizeof(void *): %u\n", (unsigned int)sizeof (void *));
printf("argv[0]: %p\n", argv[0]);
printf("p.tag: %" PRIxPTR "\n", (uintptr_t)p.tag);
printf("p.ptr: %" PRIxPTR "\n", (uintptr_t)p.ptr);
printf("(void *)*(intptr_t*)&p: %p\n", (void *)*(intptr_t *)&p);
}
Gives:
$ ./tag.exe
sizeof(p): zu <---WTF MinGW!
sizeof(p): 8
sizeof(void *): 8
argv[0]: 00000000007613B0
p.tag: 3
p.ptr: 7613b0
(void *)*(intptr_t*)&p: 60000000007613B0
I've put the tag at the top, but changing the order of the struct would put it at the bottom. Then shifting the pointer-to-be-stored right by 3 would implement the OP's use case. Probably make macros for access to make it easier.
I also kinda like the struct because you can't accidentally dereference it as if it were a plain pointer.

Request explanation on the behaviour of pointers in array

I'm running this using rextester (online compiler). I followed a tutorial but there is something I don't understand.
I thought it would be better to write my question directly inside the code.
//gcc 5.4.0
#include <stdint.h>
#include <stdio.h>
uint8_t var1 = 17;
uint8_t var2 = 23;
uint8_t arr[] ={7,9,11};
uint64_t *ptr1;//pointer
uint64_t *ptr2;
uint64_t *ptr3;
int main(void)
{
printf("var1: %d\n", var1) ;
//connecting pointer to address
ptr1 = &var1;
printf("address of ptr1: %d\n", ptr1) ;
printf("value of ptr1: %d\n\n", *ptr1) ;
//connecting pointer to address + 1
ptr2 = &var1 +1;
printf("address of ptr2: %d\n", ptr2) ;
//assign value to pointer
*ptr2 = var2;
printf("value of ptr2: %d\n\n", *ptr2) ;
//try on array
ptr3= &arr;//no need to point element 0, or yes?
printf("address of ptr3: %d\n", ptr3) ;
printf("value of ptr3: %d\n\n", *ptr3) ;//i expect 7
return 0;
}
Any help would be much appreciated in understanding the correct behaviour of pointers in C and C++.
I have tried many times, but I'm not able to link a pointer to an array.
Edit after mato's response:
Do you think this is a clean way to work with pointers and arrays? Or are there better solutions that take care not to overwrite memory?
//gcc 5.4.0
#include <stdint.h>
#include <stdio.h>
uint16_t var = 17;
uint16_t arr[] ={3,5,7,11,13};
uint16_t *ptr;
int main(void)
{
printf("var: %d\n", var) ;
//connecting pointer to address
ptr = &var;
printf("address of ptr: %d\n", ptr) ;
printf("value of ptr: %d\n\n", *ptr) ;
//try on array
for (uint16_t n =0;n<5;n++){
ptr= &arr[n] ;
printf("item: %d value: %d ads: %d pointer: %d\n", n, arr[n], ptr, *ptr) ;
}
return 0;
}
It seems that you do understand what pointers are and you can use them with basic types.
There are two problems in your code. First is this part:
//connecting pointer to address + 1
ptr2 = &var1 + 1;
Here you assigned some address to variable ptr2. Up to this point there is nothing dangerous about that.
But then you assign a value to memory at that address
//assign value to pointer
*ptr2 = var2;
This is dangerous because you, as a developer, don't know what is stored at that address. Even if you are lucky right now and that part of memory isn't being used for anything else, that will most likely change once your program gets longer, and then you will have a hard time searching for the bug.
Now arrays usually are a bit confusing, because when you create an array like this:
uint8_t arr[] = {7,9,11};
three things happen.
Your program allocates a contiguous block of memory that fits 3 variables of type uint8_t. The 3 variables in this context are called elements.
The elements will get the provided initial values 7, 9 and 11.
The name arr refers to that block. No separate pointer is stored, but in most expressions arr is converted ("decays") to the address of the first element (the one that contains value 7).
So arr is actually of type uint8_t[3], and in most contexts it behaves like a uint8_t *.
In order to get the last part do what you expect, you just need to change this one line (remove the &):
ptr3 = arr;
EDIT: BTW watch and understand this course and you will be expert on C memory manipulation. Video is a bit dated, but trust me, the guy is great.
EDIT2: I just realised the other answer is absolutely correct, you really need to match the types.
You are making many mistakes, to the point that g++ does not compile the code and explains why pretty well.
A pointer is an address. There is no "connecting pointer to address". ptr1 = &var1; means literally "store the address of var1 in the variable named ptr1".
You use incompatible pointer types. So as soon as you dereference one (e.g. using *), you are in undefined behaviour.
I am pretty sure you can reinterpret any type of data as char* or unsigned char*, and I imagine this is also true for equivalent types like uint8_t, i.e. single-byte types.
You, however, are going the other way: you declare 1-byte data and pretend it's an 8-byte integer (uint64_t). Basically you force the program to read memory outside the variable's bounds.
The fact that *ptr1 and *ptr2 give the result you expect is a rather lucky coincidence. Probably the memory behind them was zeroed. For ptr3 it isn't, because you have filled it with the other elements of the array (9 and 11).
I believe you also use the wrong format specifiers for printing. %d is for int; uint8_t should be printed with %hhu and uint64_t with %lu (or, portably, with the PRIu8 and PRIu64 macros from <inttypes.h>). I am not 100% sure how fatal this is, because of platform-specific widths and integer promotions.
You should use matching types for your pointers and variables.
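To make the "matching types" advice concrete, here is a small sketch, not the asker's program: the pointer has the same element type as the array, the loop never leaves the array, addresses are printed with %p and the values with the PRIu8 macro. The function name sum_and_print is invented for this example; it returns the sum only so that the result is easy to check.

```cpp
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Walks an array of uint8_t through a pointer of the matching type.
// Prints each element's address and value and returns the sum of the values.
unsigned sum_and_print(const std::uint8_t* first, std::size_t count) {
    unsigned sum = 0;
    for (const std::uint8_t* p = first; p != first + count; ++p) {
        // %p expects a void pointer, hence the cast.
        std::printf("address: %p value: %" PRIu8 "\n",
                    static_cast<const void*>(p), *p);
        sum += *p;
    }
    return sum;
}
```

Called as sum_and_print(arr, sizeof arr / sizeof arr[0]) on the question's arr, the array name decays to a pointer to its first element, so no & is needed.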

Are elements stored in struct are next each other

If I have a struct, say:
struct A {
int a,b,c,d,e;
};
A m; // struct of 5 ints
int n[5]; // array of 5 ints
I know that the elements of an array are stored one after another, so we can use *(n+i) or n[i].
But in the case of a struct, is each member stored next to the others (in struct A)?
The compiler may insert padding as it wishes, except before the first item.
In C++03 you were guaranteed increasing addresses of items between access specifiers.
I'm not sure if the access specifier restriction is still there in C++11.
The only thing that is guaranteed is that members are stored in the same order.
Between members the compiler may insert some "padding" so that each value is suitably aligned for the processor.
Different compilers can make different choices, also depending on the target platform, and can be forced to keep a given alignment by option switches or pragmas.
Your particular case is "lucky" for most compilers, since int is normally implemented as "the integral type that best fits the integer arithmetic of the processor". With this idea, a sequence of ints is aligned by definition. But that may not be the case, for example if you have
struct test
{
char a;
short b;
long c;
long long d;
};
You may discover that (&a)+1 != &b and (&b)+1 != &c or (&b)-1 != &a etc.
What is guaranteed is the progression &a < &b; &b < &c; &c < &d;
Struct members are in general stored at increasing addresses, but they are not guaranteed to be contiguous. For a struct of 4-byte ints such as the question's struct A, with $base as the base address of the struct, the layout will be the following.
a will be stored at $base+0
b will be stored at $base+4
c will be stored at $base+8 ... etc
You can see the typical alignment values at http://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86
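Rather than guessing, you can ask the compiler where it put each member with offsetof. The sketch below uses the struct test from the earlier answer; total_padding and print_layout are names invented here, and the numbers printed depend on the compiler and target.

```cpp
#include <cstddef>
#include <cstdio>

struct test {
    char a;
    short b;
    long c;
    long long d;
};

// Bytes of the struct that are not occupied by any member,
// i.e. padding between members plus padding after the last one.
std::size_t total_padding() {
    return sizeof(test)
         - (sizeof(char) + sizeof(short) + sizeof(long) + sizeof(long long));
}

// Prints the offset of each member from the start of the struct.
void print_layout() {
    std::printf("a at %zu, b at %zu, c at %zu, d at %zu, sizeof %zu, padding %zu\n",
                offsetof(test, a), offsetof(test, b),
                offsetof(test, c), offsetof(test, d),
                sizeof(test), total_padding());
}
```

The only things you can rely on across implementations are that a sits at offset 0 and that the offsets increase in declaration order.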
I have written a simple program to show that struct elements are next to each other
#include <iostream>
int main() {
struct S {
int a;
int b;
int c;
};
S s = {1,2,3};
// Viewing the struct as an int array relies on S having no padding; the standard does not guarantee it
int* p = reinterpret_cast <int *> (&s);
std::cout<<p[0]<<" "<<p[1]<<" "<<p[2];
return 0;
}
Output: 1 2 3
Remember, [] and *(p+i) are semantic constructs that work with pointers, not with struct variables directly.
As suggested in Cheers and hth. - Alf's answer, there can be padding between struct elements or after the last one.

casting object addresses to char ptrs then using pointer math on them

According to Effective C++, "casting object addresses to char* pointers and then using pointer arithmetic on them almost always yields undefined behavior."
Is this true for plain-old-data? for example in this template function I wrote long ago to print the bits of an object. It works splendidly on x86, but... is it portable?
#include <iostream>
template< typename TYPE >
void PrintBits( TYPE data ) {
unsigned char *c = reinterpret_cast<unsigned char *>(&data);
std::size_t i = sizeof(data);
std::size_t b;
while ( i>0 ) {
i--;
b=8;
while ( b > 0 ) {
b--;
std::cout << ( ( c[i] & (1<<b) ) ? '1' : '0' );
}
}
std::cout << "\n";
}
int main ( void ) {
unsigned int f = 0xf0f0f0f0;
PrintBits<unsigned int>( f );
return 0;
}
It certainly is not portable. Even if you stick to fundamental types, there is endianness and there is sizeof, so your function will print different results on big-endian machines, or on machines where int is 16 or 64 bits wide. Another issue is that not all PODs are fundamental types; structs may be POD, too.
POD struct members may have internal padding according to the implementation-defined alignment rules. So if you pass this POD struct:
struct PaddedPOD
{
char c;
int i;
};
your code would print the contents of padding, too. And that padding will be different even on the same compiler with different pragmas and options.
On the other side, maybe it's just what you wanted.
So, it's not portable, but it's not UB. There are some standard guarantees: you can copy PODs to and from array of char or unsigned char, and the result of this copying via char buffer will hold the original value. That implies that you can safely traverse that array, so your function is safe. But nobody guarantees that this array (or object representation) of objects with same type and value will be the same on different computers.
BTW, I couldn't find that passage in Effective C++. Would you quote it, pls? I could say, if a part of your code already contains lots of #ifdef thiscompilerversion, sometimes it makes sense to go all-nonstandard and use some hacks that lead to undefined behavior, but work as intended on this compiler version with this pragmas and options. In that sense, yes, casting to char * often leads to UB.
Yes, POD types can always be treated as an array of chars, of size sizeof (TYPE). POD types are just like the corresponding C types (that's what makes them "plain, old"). Since C doesn't have function overloading, writing "generic" functions to do things like write them to files or network streams depends on the ability to access them as char arrays.
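As an illustration of the "copy to an array of unsigned char" guarantee, here is a sketch of a variant that builds the bit string from a byte copy of the object. The name object_bits is invented for this example. Unlike the question's PrintBits, it walks the bytes in memory order (lowest address first) and uses CHAR_BIT instead of a hard-coded 8, so the result still differs between little-endian and big-endian machines and includes any padding bytes.

```cpp
#include <climits>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Returns the object representation of value as a string of '0'/'1',
// bytes in memory order, most significant bit of each byte first.
template <typename T>
std::string object_bits(const T& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types may be copied bytewise");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));  // the copy the standard guarantees

    std::string out;
    out.reserve(sizeof(T) * CHAR_BIT);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        for (int b = CHAR_BIT - 1; b >= 0; --b) {
            out += ((bytes[i] >> b) & 1u) ? '1' : '0';
        }
    }
    return out;
}
```

Returning a std::string instead of printing keeps the function usable with any output stream.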

Union hack for endian testing and byte swapping

For a union, writing to one member and reading from other member (except for char array) is UB.
//snippet 1(testing for endianess):
union
{
int i;
char c[sizeof(int)];
} x;
x.i = 1; // writing to i
if(x.c[0] == 1) // reading from c[0]
{ printf("little-endian\n");
}
else
{ printf("big-endian\n");
}
//snippet 2(swap bytes using union):
int swapbytes()
{
union // assuming 32bit, sizeof(int)==4
{
int i;
char c[sizeof(int)];
} x;
x.i = 0x12345678; // writing to member i
SWAP(x.c[0],x.c[3]); // writing to char array elements
SWAP(x.c[1],x.c[2]); // writing to char array elements
return x.i; // reading from x.i
}
Snippet 1 is legal C or C++ but not snippet 2. Am I correct? Can someone point to the section of the standard where it says it's OK to write to one member of a union and read from another member which is a char array?
There is a really simple way that gets around the undefined behaviour (well, undefined behaviour that is defined in pretty much every compiler out there ;)).
uint32_t i = 0x12345678;
char ch[4];
memcpy( ch, &i, 4 );
bool bLittleEndian = ch[0] == 0x78;
This has the added bonus that pretty much every compiler out there will see that you are memcpying a constant number of bytes and optimise out the memcpy completely resulting in exactly the same code as your snippet 1 while staying totally within the rules!
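The same memcpy trick covers both of the question's snippets. The following is a sketch with made-up function names (is_little_endian, swap_bytes); it assumes a platform that provides uint32_t and does not try to classify mixed-endian machines.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Snippet 1 without the union: inspect the first byte of a known value.
bool is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 1;  // least significant byte stored first
}

// Snippet 2 without the union: reverse the byte order of a 32-bit value.
std::uint32_t swap_bytes(std::uint32_t v) {
    unsigned char bytes[sizeof v];
    std::memcpy(bytes, &v, sizeof v);       // value -> bytes
    std::reverse(bytes, bytes + sizeof v);  // reverse them in place
    std::memcpy(&v, bytes, sizeof v);       // bytes -> value
    return v;
}
```

Both functions only ever read and write the object through memcpy into unsigned char buffers, so no union member is read after a different one was written.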
I believe it (snippet 1) is technically not allowed, but most compilers allow it anyway because people use this kind of code. GCC even documents that it is supported.
You will have problems on some machines where sizeof(int) == 1, and possibly on some that are neither big endian nor little endian.
Either use available functions that change words to the proper order, or set this with a configuration macro. You probably need to recognize compiler and OS anyway.