Best way to set bits of fields in union - c++

Let's say I have the following
struct S {
union {
uint8_t flags;
struct {
uint8_t flag2bits : 2;
uint8_t flag1bit : 1;
};
};
};
S s;
s.flag2bits = 2;
s.flag1bit = 1; // this will wipe out the values of other bits
What's the best way to assign value to a specific bit without affecting other bit fields?
I can shift around and then assign and then shift again but it means once someone changes the order of the bit fields, the code is broken....

I can shift around and then assign and then shift again but it means
once someone changes the order of the bit fields, the code is
broken....
No, it doesn't mean the code is broken. You can assign to the bit-fields in any order you like, and you can leave some of them unset.
In your example:
S s;
s.flag2bits = 2;
s.flag1bit = 1;
Changing flag2bits will not affect value stored in flag1bit.
However, your problem may be related to the union you hold in your struct. The bit-fields share their storage with flags, so assigning to flags overwrites both of them.
I hope this example will explain the case here:
#include <iostream>
#include <cstdint>
struct S {
union {
uint8_t flags;
struct {
uint8_t flag2bits : 2;
uint8_t flag1bit : 1;
};
};
};
int main(int argc, char *argv[]) {
S s;
s.flag2bits = 2;
s.flag1bit = 1;
std::cout << int(s.flag2bits) << int(s.flag1bit) << std::endl;
s.flags = 4; // As you are using union, at this point you are overwriting
// values stored in your (nested) struct
std::cout << int(s.flag2bits) << int(s.flag1bit) << std::endl;
return 0;
}
EDIT: As @M.M points out, reading a union member other than the one most recently written is undefined behavior. Still, at least on clang-3.5, the code above prints:
21
01
which illustrates the point I am trying to make (i.e. overwriting of union fields).
I would consider removing union from your struct S code, though I may not see the whole picture of what you are trying to achieve.

The C++ compiler will manage the bits for you. You can just set the values as you have it. Only the appropriate bits will be set.
Did you try it?

Related

uint32_t pointer to the same location as uint8_t pointer

#include <iostream>
int main(){
uint8_t memory[1024];
memory[0] = 1;
memory[1] = 1;
uint32_t *test = memory;
//is it possible to get a value for *test that would be in this example 257?
}
I want to create a uint32_t pointer to the same address as the uint8_t pointer. Is this possible without using new(address)? I don't want to lose the information at the address. I know pointers are just addresses, and therefore I should be able to just set the uint32_t pointer to the same address.
This code produces an error:
invalid conversion from 'uint8_t*' to 'uint32_t*' in initialization
This would be a violation of the so-called strict aliasing rule, so it cannot be done. Sad, but true.
Use memcpy to copy the data instead. In many cases the compiler optimizes the copy away and generates the same code as it would for the cast, but in a standard-conforming way.
As already mentioned you cannot convert uint8_t * to uint32_t * due to strict aliasing rule, you can convert uint32_t * to unsigned char * though:
#include <cstdint>
#include <iostream>
int main(){
uint32_t test[1024/4] = {}; // initialize it!
auto memory = reinterpret_cast<unsigned char *>( test );
memory[0] = 1;
memory[1] = 1;
std::cout << test[0] << std::endl;
}
This code is not portable because of endianness, but at least it has no UB.
This question completely ignores the concept of endianness. In your example the lower and upper byte have the same value, so swapping those two bytes makes no difference; but as soon as the bytes differ, the number you read depends on the byte order and may unexpectedly be wrong.
As such, there's no portable way to use the resulting number.
You can do that with a union. As mentioned above, you have to be aware of the endianness of the target device, but in most cases it will be little-endian. There is also some controversy about using unions this way (reading a member other than the one last written is undefined behavior in C++), but FWIW it gets the job done and for some uses it's good enough.
#include <cstdint>
#include <iostream>
int main(){
union {
uint8_t memory[1024] = {};
uint32_t test[1024/4];
};
memory[0] = 1;
memory[1] = 1;
std::cout << test[0]; // 257 on a little-endian target
}
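uint32_t *test = (uint32_t*) memory;
The cast tells the compiler to treat the memory that test points to as uint32_t. Note that this only makes the code compile; dereferencing test still violates the strict aliasing rule mentioned in the other answers.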

Can someone explain how the union works in this line of code and how the numbers are being swapped?

#include<iostream>
using namespace std;
union swap_byte { //This code is for union
public:
void swap();
void show_byte();
void set_byte(unsigned short x);
unsigned char c[2];
unsigned short s;
};
void swap_byte::swap() //swapping the two bytes of the declared char c[2]
{
unsigned char t;
t = c[1];
c[1] = c[0];
c[0] = t;
}
void swap_byte::show_byte()
{
cout << s << "\n";
}
void swap_byte::set_byte(unsigned short x) //input for the byte
{
s = x;
}
int main()
{
swap_byte b;
b.set_byte(49034);
b.show_byte();
b.swap();
b.show_byte();
cin.get();
return 0;
}
I am unable to understand the purpose of a union. I saw the implementation of a union in the code above but got confused. Please explain what the code does and how the union works.
A union is a special kind of struct in which the members overlap, so the layout of swap_byte is something like:
| c[0] | c[1] |   unsigned char c[2]
---------------
|      s      |   unsigned short s
Both members occupy the same 2 bytes of memory. That's why swapping the individual bytes of c has the effect of swapping the most significant and least significant byte of the short number.
Mind that this is fragile and not the best way to do it, because it relies on several assumptions (for example, that unsigned short is exactly two bytes wide). In addition, accessing a union member other than the last one written is undefined behavior in C++ (while it is allowed in C). This is an old trick which is rarely needed.

Padding in struct containing only one int array member in C++?

Given a simple structure (POD) that contains only one array of shorts (or bytes, ints from <cstdint>, etc.), with no more fields to be added later:
#define FIXED_SIZE 128 // 'fixed' in long term, shouldn’t change in future versions
struct Foo {
uint16_t bar[FIXED_SIZE];
};
is there any possibility of ending up with padding at the end of the structure, added by the compiler for any reason?
It seems reasonable not to add any padding, as there is no obvious need for it, but is there any guarantee in the standard (could you provide any links where it is explained)?
Later I would like to use arrays of Foo structs in simple serialization (IPC) within different platforms and don't want to use any libraries for this simple task (code simplified for demonstration):
#define FOO_ELEMS 1024
...
// sender
Foo *from = new Foo[FOO_ELEMS];
uint8_t *buff_to = new uint8_t[FOO_ELEMS * FIXED_SIZE * sizeof(uint16_t) ];
memcpy(buff_to, from, ...);
...
// receiver
uint8_t *buff_from = new uint8_t[ ... ];
Foo *to = new Foo[FOO_ELEMS];
memcpy(to, buff_from, ...);
I would like to use a struct here instead of plain arrays, as there will be some auxiliary methods within the struct, and that seems more convenient than using plain functions plus array pointers.
This overlaps with the following (plain C) question, but it seems a little bit different to me:
Alignment of char array struct members in C standard
The various standards provide for padding to occur (but not at the start).
There is no strict requirement at all that it will only appear to align the members and the object in arrays.
So the truly conformant answer is:
Yes, there may be padding because the compiler can add it but not at the start or between array elements.
There is no standard way of forcing packing either.
However, every time this comes up, and every time I ask, no one has ever identified a real compiler on a platform that pads structures for any reason other than internal alignment and array alignment.
So for all known practical purposes that structure will not be padded on any known platform.
Please consider this yet another request for someone to find a real platform that breaks that principle.
Since we are already guaranteed that there will be no padding at the beginning of the structure, we don't have to worry about that. At the end, I could see padding being added if the size of the array was not divisible by the word size of the machine.
The only way I could get any padding added to the struct, though, was to add an int member to the struct as well. In that case the struct was padded so that its size is a multiple of the int's alignment.
#include <iostream>
#include <cstdint>
struct a
{
uint16_t bar[128];
};
struct b
{
uint16_t bar[127];
};
struct c
{
int test;
uint16_t bar[128];
};
struct d
{
int test;
uint16_t bar[127];
};
struct e
{
uint16_t bar[128];
int test;
};
struct f
{
uint16_t bar[127];
int test;
};
int main()
{
std::cout << sizeof(a) << "\t" << sizeof(b) << "\t" << sizeof(c) << "\t" << sizeof(d) << "\t" << sizeof(e) << "\t" << sizeof(f);
}

C++ understanding Unions and Structs

I've come to work on an ongoing project where some unions are defined as follows:
/* header.h */
typedef union my_union_t {
float data[4];
struct {
float varA;
float varB;
float varC;
float varD;
};
} my_union;
If I understand well, unions are for saving space, so sizeof(my_union_t) = MAX of the variables in it. What are the advantages of using the statement above instead of this one:
typedef struct my_struct {
float varA;
float varB;
float varC;
float varD;
};
Won't the space allocated for both of them be the same?
And how can I initialize varA,varB... from my_union?
Unions are often used when implementing a variant like object (a type field and a union of data types), or in implementing serialisation.
The way you are using a union is a recipe for disaster.
You are assuming that the struct in the union packs the floats with no gaps between them!
The standard guarantees that float data[4]; is contiguous, but not the structure elements. The only other thing you know is that the address of varA; is the same as the address of data[0].
Never use a union in this way.
As for your question: "And how can I initialize varA,varB... from my_union?". The answer is, access the structure members in the normal long-winded way not via the data[] array.
Unions are not mostly for saving space, but for implementing sum types (for that, you put the union in some struct or class that also has a discriminating field which keeps the run-time tag). Also, I suggest you use a recent C++ standard, at least C++11, since it has better support for unions (e.g. it more easily permits unions of objects and their construction or initialization).
The advantage of using your union is being able to index the n-th floating point number (with 0 <= n <= 3) as u.data[n]
To assign a union field in some variable declared my_union u; just code e.g. u.varB = 3.14; which in your case has, in practice on common compilers, the same effect as u.data[1] = 3.14;
A good example of well deserved union is a mutable object which can hold either an int or a string (you could not use derived classes in that case):
#include <new>
#include <string>
#include <utility>
class IntOrString {
    bool isint;
    union {
        int num;          // active when isint is true
        std::string str;  // active when isint is false
    };
public:
    IntOrString(int n = 0) : isint(true), num(n) {}
    IntOrString(std::string s) : isint(false), str(std::move(s)) {}
    IntOrString(const IntOrString& o) : isint(o.isint)
    { if (isint) num = o.num; else new (&str) std::string(o.str); }
    IntOrString(IntOrString&& p) : isint(p.isint)
    { if (isint) num = p.num; else new (&str) std::string(std::move(p.str)); }
    ~IntOrString() { if (!isint) str.~basic_string(); }
    void set(int n)
    { if (!isint) str.~basic_string(); isint = true; num = n; }
    void set(std::string s)
    { // construct the string in place if the union currently holds an int
      if (isint) { new (&str) std::string(std::move(s)); isint = false; }
      else str = std::move(s); }
    bool is_int() const { return isint; }
    int as_int() const { return isint ? num : 0; }
    std::string as_string() const { return isint ? std::string() : str; }
};
Notice the explicit calls of destructor of str field. Notice also that you can safely use IntOrString in a standard container (std::vector<IntOrString>)
See also std::variant and std::optional in C++17 (std::optional is conceptually a tagged union with void)
BTW, in Ocaml, you simply code:
type intorstring = Integer of int | String of string;;
and you'll use pattern matching. If you wanted to make that mutable, you'll need to make a record or a reference of it.
You had better use unions in an idiomatic C++ way.
I think the best way to understand unions is just to give 2 common practical examples.
The first example is working with images. Imagine you have an RGB image that is arranged in a long buffer.
What most people would do is represent the buffer as a char* and then loop over it in steps of 3 to get the R, G and B values.
What you could do instead is make a little union and use that to loop over the image buffer:
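union RGB
{
    char raw[3];
    struct
    {
        char R;
        char G;
        char B;
    } colors;
};
// buffer is assumed to be a char* that points at the packed RGB data
RGB* pixel = reinterpret_cast<RGB*>(buffer);
// pixel->colors.R == the red color in the first pixel; pixel[i] is the i-th pixel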
Another very useful use for unions is using registers and bitfields.
Let's say you have a 32 bit value that represents some HW register, or something.
Sometimes, to save space, you can split the 32 bits into bit fields, but you also want the whole representation of that register as a 32 bit type.
This obviously saves bit shift calculation that a lot of programmers use for no reason at all.
union MySpecialRegister
{
    uint32_t raw; // "register" is a reserved keyword in C++, so the member needs another name
    struct
    {
        unsigned int firstField : 5;
        unsigned int somethingInTheMiddle : 21; // the widths must add up to 32 bits
        unsigned int lastField : 6;
    } data;
};
// Now you can read the raw register into the raw field
// then you can read the fields using the inner data struct
The advantage is that with a union you can access the same memory in two different ways.
In your example the union contains four floats. You can access those floats as varA, varB... which might be more descriptive names or you can access the same variables as an array data[0], data[1]... which might be more useful in loops.
With a union you can also use the same memory for different kinds of data, you might find that useful for things like writing a function to tell you if you are on a big endian or little endian CPU.
No, it is not for saving space. It is for ability to represent some binary data as various data types.
for example
#include <iostream>
#include <cstddef>
union Foo{
int x;
struct
{
unsigned char b0, b1, b2, b3;
} y; // y is a member here, so the bytes can be read as bar.y.b0 and so on
char z[sizeof(int)];
};
int main()
{
Foo bar;
bar.x = 100;
std::cout << std::hex; // to show number in hexadec repr;
for(size_t i = 0; i < sizeof(int); i++)
{
std::cout << "0x" << (int)bar.z[i] << " "; // int is just to show values as numbers, not a characters
}
return 0;
}
Output: 0x64 0x0 0x0 0x0. The same values are stored in the struct bar.y, only as structure members rather than array elements. This is because my machine is little-endian. If it were big-endian, the output would be reversed: 0x0 0x0 0x0 0x64
You can achieve the same using reinterpret_cast:
#include <iostream>
#include <stdint.h>
int main()
{
int x = 100;
char * xBytes = reinterpret_cast<char*>(&x);
std::cout << std::hex; // to show number in hexadec repr;
for (size_t i = 0; i < sizeof(int); i++)
{
std::cout << "0x" << (int)xBytes[i] << " "; // (int) is just to show values as numbers, not a characters
}
return 0;
}
It's useful, for example, when you need to read a binary file that was written on a machine with a different endianness than yours. You can just access the values as a byte array and swap those bytes as you wish.
It is also useful when you have to deal with bit fields, but that's a whole different story :)
First of all: Avoid unions where the access goes to the same memory but to different types!
Unions do not save space at all. They only define multiple names for the same memory area! And you can only store one of the elements at a time in a union.
if you have
union X
{
int x;
char y[4];
};
you can store an int OR 4 chars, but not both! The general problem is that nobody knows which kind of data is currently stored in a union. If you store an int and read the chars, the compiler will not check that, and there is no runtime check either. A common solution is to wrap the union in a struct with an additional member, an enum that records which type is currently stored.
struct Y
{
enum { IS_CHAR, IS_INT } tinfo;
union
{
int x;
char y[4];
};
};
But in C++ you should generally prefer classes or structs that derive from a (possibly empty) parent class, like this:
class Base
{
};
class Int_Type: public Base
{
...
int x;
};
class Char_Type: public Base
{
...
char y[4];
};
So you can use pointers to Base which can actually point to an Int_Type or a Char_Type. With virtual functions you can access the members in an object-oriented way.
As mentioned already from Basile's answer, a useful case can be the access via different names to the same type.
union X
{
struct
{
float a;
float b;
} data; // data is a member here, not just a nested type
float arr[2];
};
which allows different ways of accessing the same data with the same type. Storing different types in the same memory should be avoided altogether!

Can I prevent breaking strict aliasing rules using this technique?

If I recall correctly, it would be undefined behavior to write to FastKey::key and then read from FastKey::keyValue:
struct Key {
std::array<uint8_t, 6> MACAddress;
uint16_t EtherType;
};
union FastKey {
Key key;
uint64_t keyValue;
};
However, I have been told that if I add a char array to the union then the UB is cleared:
union FastKey {
Key key;
uint64_t keyValue;
char fixUB[sizeof(Key)];
};
Is this true?
Edit
As usual my understanding was wrong. With the new information I gathered, I think that I can get the key as a uint64_t value like this:
struct Key {
std::array<uint8_t, 6> MACAddress;
uint16_t EtherType;
};
union FastKey {
Key key;
unsigned char data[sizeof(Key)];
};
inline uint64_t GetKeyValue(FastKey fastKey)
{
uint64_t key = 0;
// cast to uint64_t: size_t may be only 32 bits wide, and shifting by 32 or more would then be UB
key |= uint64_t(fastKey.data[0]) << 56;
key |= uint64_t(fastKey.data[1]) << 48;
key |= uint64_t(fastKey.data[2]) << 40;
key |= uint64_t(fastKey.data[3]) << 32;
key |= uint64_t(fastKey.data[4]) << 24;
key |= uint64_t(fastKey.data[5]) << 16;
key |= uint64_t(fastKey.data[6]) << 8;
key |= uint64_t(fastKey.data[7]) << 0;
return key;
}
I suspect that this will be equally fast as the original version. Feel free to correct me.
Update
@Steve Jessop I implemented a quick benchmark to test the performance of memcpy vs my solution. I'm not a benchmarking expert, so there may be stupid errors in the code that lead to wrong results. However, if the code is right, then it would seem that memcpy is much slower.
Note: It seems the benchmark is wrong, because the measured time for the fast key is always zero. I'll see if I can fix it.
No, reading a uint64_t when a Key object is stored there is still UB. What isn't UB is to read a char, because there's an exception for char in the aliasing rules. Adding the array doesn't propagate the exception to the other types.
The version in the edit seems fine (though I'd use unsigned char), but now it is more complex than just using a reinterpret_cast from Key* to unsigned char* or a memcpy.