In the following code, can the value of the int be predicted (and how?), or is it just garbage?
union a
{
int i;
char ch[2];
};
int main()
{
a u;
u.ch[0] = 0;
u.ch[1] = 0;
cout<<u.i;
}
I would say that depends on the sizes of int and char. A union is as large as its largest member. If int is 4 bytes and char[2] is 2 bytes, setting both chars to 0 leaves the other 2 bytes of the int untouched, so you are not initialising the full int. Those extra 2 bytes hold unspecified values, so the int will appear to be random. (Strictly speaking, reading a union member other than the one last written is undefined behaviour in C++, so the compiler promises nothing at all.)
Besides, writing one member of a union and reading another is exactly what makes unions unsafe, in my opinion.
If you are sure that int is the largest member, you can initialize the whole union by writing
union a
{
int i;
char ch[2];
};
void foo()
{
a u = { 0 }; // Initializes the first field in the union
cout << u.i;
}
Therefore it may be a good idea to place the largest type at the beginning of the union, although that doesn't guarantee that every member type can be considered zero or empty when all of its bits are set to 0.
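To make the initialization point concrete, here is a minimal sketch. The names IntOrChars and zeroed_value are made up for illustration, and it assumes (as in the question) that int is the first and largest member.

```cpp
// Hypothetical union mirroring the one in the question; int comes first.
union IntOrChars {
    int i;
    char ch[2];
};

// A union is at least as large as its largest member.
static_assert(sizeof(IntOrChars) >= sizeof(int), "union must be able to hold an int");

// = { 0 } initializes the first member, so every byte of i is zero and
// reading i afterwards reads the member that was actually initialized.
int zeroed_value() {
    IntOrChars u = { 0 };
    return u.i;
}
```

Because i is the member that was initialized, reading it here does not depend on writing one member and reading another.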
void Manager::byteArrayToDoubleArray(byte ch[]) {
int counter = 0;
// temp array to break the byte array into size of 8 and read it
byte temp[64];
// double result values
double res[8];
int index = 0;
int size = (sizeof(ch) / sizeof(*ch));
for (int i = 0; i < size; i++) {
counter++;
temp[i] = ch[i];
if (counter % 8 == 0) {
res[index] = *reinterpret_cast<double * const>(temp);
index++;
counter = 0;
}
}
}
Here result would be a list of double values with count = 8.
Your problem is twofold: your code has some typos and misunderstandings, and the C++ standard is somewhat broken in this area.
I'll try to fix both.
First, a helper function called laundry_pods. It takes raw memory and "launders" it into an array of a type of your choice, so long as you pick a pod type:
template<class T, std::size_t N>
T* laundry_pods( void* ptr ) {
static_assert( std::is_pod<std::remove_cv_t<T>>{} );
char optimized_away[sizeof(T)*N];
std::memcpy( optimized_away, ptr , sizeof(T)*N );
T* r = ::new( ptr ) T[N];
assert( r == ptr );
std::memcpy( r, optimized_away, sizeof(T)*N );
return r;
}
Now simply do
void Manager::byteArrayToDoubleArray(byte ch[]) {
double* pdouble = laundry_pods<double, 8>(ch);
}
and pdouble is a pointer to memory of ch interpreted as an array of 8 doubles. (It is not a copy of it, it interprets those bytes in-place).
While laundry_pods appears to copy the bytes around, both g++ and clang optimize it down into a binary noop. The seeming copying of bytes around is a way to get around aliasing restrictions and object lifetime rules in the C++ standard.
It relies on arrays of pod not having extra bookkeeping overhead (which C++ implementations are free to add; none that I know of do. That is what the runtime assert double-checks), but it returns a pointer to a real honest to goodness array of double. If you want to avoid that assumption, you could instead create each double as a separate object. However, then they aren't an array, and pointer arithmetic over non-arrays is fraught as far as the standard is concerned.
The use of the term "launder" has to do with getting around aliasing and object lifetime requirements. The function does nothing at runtime, but in the C++ abstract machine it takes the memory and converts it into binary identical memory that is now a bunch of doubles.
The trick of doing this kind of "conversion" is to always cast the double* to a char* (or unsigned char or std::byte). Never the other way round.
You should be able to do something like this:
void byteArrayToDoubleArray(byte* in, std::size_t n, double* out)
{
for(auto out_bytes = (byte*) out; n--;)
*out_bytes++ = *in++;
}
// ...
byte ch[64];
// .. fill ch with double data somehow
double res[8];
byteArrayToDoubleArray(ch, 64, res);
Assuming that type byte is an alias of char or unsigned char or std::byte.
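The same copy-the-bytes idea can also be written with std::memcpy. This is only a sketch: bytes_to_doubles is a hypothetical helper name, and it assumes the bytes were produced from doubles on the same platform (same size and byte order).

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies n_bytes raw bytes into freshly created doubles.
// Trailing bytes that do not fill a whole double are ignored.
std::vector<double> bytes_to_doubles(const unsigned char* in, std::size_t n_bytes) {
    std::vector<double> out(n_bytes / sizeof(double));
    if (!out.empty())
        std::memcpy(out.data(), in, out.size() * sizeof(double));
    return out;
}
```

Because the doubles are real objects that receive a byte-wise copy, no double* ever points into the byte buffer, so the aliasing rules are not involved.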
I am not completely sure what you are trying to achieve here, because the expression (sizeof(ch) / sizeof(*ch)) does not make sense for an array parameter of unknown size: inside the function, ch is just a pointer.
If you have a byte array (a POD data type; something like typedef char byte;) then the simplest solution would be a reinterpret_cast:
double *result = reinterpret_cast<double*>(ch);
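This allows you to use result[0]..result[7] as long as ch[] is valid and contains at least 64 bytes. Be aware that this construct does not generate code. It tells the compiler that result[0] corresponds to ch[0..7] and so on. An access to result[] will result in an access to ch[]. Note, however, that ch must be suitably aligned for double, and that reading a char array through a double* formally violates the aliasing rules of the standard, even if it usually works with common compilers.
But you have to know the number of elements in ch[] to calculate the number of valid double elements in result.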
If you need a copy (because - for example - the ch[] is a temporary array) you could use
std::vector<double> result(reinterpret_cast<double*>(ch), reinterpret_cast<double*>(ch) + itemsInCh * sizeof(*ch) / sizeof(double));
So if ch[] is an array with 64 items and a byte is really an 8-bit value, then
std::vector<double> result(reinterpret_cast<double*>(ch), reinterpret_cast<double*>(ch) + 8);
will provide a std::vector containing 8 double values.
There is another possible method using a union:
union ByteToDouble
{
byte b[64];
double d[8];
} byteToDouble;
The 8 double values will occupy the same memory as the 64 byte values. So you can write the byte values to byteToDouble.b[] and read the resulting double values from byteToDouble.d[].
I have something of this nature:
class SomeClass {
public:
union {
int m_256[256];
int m_16[16][16];
int m_4[4][4][4][4];
};
SomeClass() {
// Initialize Array to some default value
for ( unsigned u = 0; u < 256; u++ ) {
m_256[u] = 0;
}
}
};
With the understanding of unions the for loop within the constructor will initialize m_256 to all 0s and the other 2 arrays are just another version or alias of it, so those arrays should be initialized as well since the memory size is exactly the same and the memory is shared.
I'd prefer not to go through a for loop to initialize all values of the array to some default value. Also since there are 3 arrays within this union, you can only have 1 non static member with an initialize list within a union. So this is valid.
union {
int m_256[256]{};
int m_16[16][16];
int m_4[4][4][4][4];
};
Instead of using the for loop within the constructor or typing the same number manually 256 times, is there a short hand method that will initialize all values within the array to the same initial value?
EDIT
A comment from user657267 said:
Keep in mind that technically you can't write to one union member and read from another
Consider this: without making any changes to the class above and simply adding this operator overload:
std::ostream& operator<<( std::ostream& out, const SomeClass& s ) {
out << std::endl;
out << "{Box, Slice, Row, Column}\n";
for (unsigned box = 0; box < 4; box++) {
for (unsigned slice = 0; slice < 4; slice++) {
for (unsigned row = 0; row < 4; row++) {
for (unsigned col = 0; col < 4; col++) {
out << "(" << box << "," << slice << "," << row << "," << col << ") = "
<< s.m_4[box][slice][row][col] << std::endl;
}
}
}
}
return out;
} // operator<<
Now in the main function we can do this.
int main() {
SomeClass s;
// Initialize This Array To Have Each Value Incremented From 0 - 255
for ( unsigned u = 0; u < 256; u++ ) {
s.m_256[u] = u;
}
// Print Out Our Array That Is In The Union Using The Overloaded Operator.
// Note The Overloaded Operator Is Using m_4 And Not m_256
std::cout << s << std::endl;
// Now We Know That If You Extract A Value From Any Given Index It Will Return That Value.
// So Lets Pull Out Two Random Values From Using The Other Two Members of the Union.
int A = s.m_4[0][2][1][3];
int B = s.m_16[12][9];
// Now Print Out A & B
std::cout << A << ", " << B << std::endl;
return 0;
}
Other than the printed array table the last two values are:
39, 201
Now if we scroll through the table and look for (0,2,1,3), the value is 39. To test whether 201 is correct: we used [12][9]. If you are using a double for loop to index a flat array, the index is equal to (i * num_j + j), so, knowing that the 2D version of this 1D or 4D array is [16][16] in size, we can calculate this value mathematically: 12 * 16 + 9 = 201.
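The index arithmetic for both views can be written out as a small sketch. The helper names flat_index_2d and flat_index_4d are mine, not part of the class, and they assume the usual row-major layout of built-in C++ arrays.

```cpp
#include <cstddef>

// Row-major flat index for a [rows][cols] view; cols is the inner dimension.
constexpr std::size_t flat_index_2d(std::size_t i, std::size_t j, std::size_t cols) {
    return i * cols + j;
}

// Row-major flat index for a [4][4][4][4] view (every dimension has size 4).
constexpr std::size_t flat_index_4d(std::size_t box, std::size_t slice,
                                    std::size_t row, std::size_t col) {
    return ((box * 4 + slice) * 4 + row) * 4 + col;
}

// The two lookups from main(): m_16[12][9] and m_4[0][2][1][3].
static_assert(flat_index_2d(12, 9, 16) == 201, "12 * 16 + 9");
static_assert(flat_index_4d(0, 2, 1, 3) == 39, "0*64 + 2*16 + 1*4 + 3");
```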
Within my specific case doesn't this invalidate his statement? In c++ are unions not the sharing of memory between two variables, and if you have the same data type such as int & int doesn't this make them alias of one another? I was surely able to initialize the 1D flat array, use the 4D version to print a table, and was able to extract values from both the 2D & 4D versions.
Edit
I know what others are saying about you can't write to one and access another technically because of a case as such:
union foo {
int x;
char y;
};
Here is an excellent answer to a different question on unions Difference Between Structs & Unions. However, in my case here, the data type and the size in memory for all three arrays are the same. Also, if one value is changed in one I am expecting it to change in another. These are just different ways to access the same memory. Eventually this "nameless union" will be a private member within my class, and there will be public methods to retrieve a const reference to the full array, to retrieve the contents from the arrays by index values from each of the 3 types, and to place contents into the array by any of the three types.
All the boilerplate will be done behind the scenes in private, so I do not see this as an issue, and the behavior here is what I am after. It is not as if I'm creating an instance of a union with mixed types, and I'm not doing bit fields with them. I'm basically conserving memory so that I don't need 3 arrays of 256 x (size of element) plus a lot of copying from one array to another; this property of how unions work is what I'm actually after. A single instance of the class uses 1/3 of the memory, and is probably around 20% faster and more efficient, since there are no 3 independent arrays to keep synchronized every time an element is added, removed, or moved within the array.
As structs are defined: They will allocate enough memory for every data type and instance within the struct. You can set each data type independently
As unions are defined: They will allocate enough memory for the largest data type within the union. If you set one data type, it will change the other data type.
In the context of my class, the union is nameless so you can not create an instance of it, however you can access any of its members. In the case here with these three arrays:
byte a[256]; // 256 bytes
byte b[16][16]; // 16 x 16 = 256 bytes
byte c[4][4][4][4]; // 4^4 = 256 bytes
They are exactly the same size, and in my class all three of these are the same array. This is used to access the data in different ways.
The loop is certainly an overkill:
std::fill(std::begin(m_256), std::end(m_256), 42); // fills with 42
Other than that, no built-in language construct; it'd amount to pretty much the same as the above, though.
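As a sketch of the two loop-free options for a single flat array: Grid is a hypothetical stand-in for the class in the question, with the union left out so that only the initialization is shown.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical stand-in for the class in the question, with one flat array.
struct Grid {
    int cells[256] = {};   // empty braces: every element starts at 0

    // Set every element to the same value without a hand-written loop.
    void reset(int value) {
        std::fill(std::begin(cells), std::end(cells), value);
    }
};
```

The empty-brace initializer only gives zeros; for any other starting value, std::fill (for example from the constructor) is still needed.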
I've done some research and can't quite find what I'm looking for here or on Google. Is there a way to access the members of a Customer by address (and not by using customers[i].bottles)? I cannot modify the struct, so I cannot put the properties into an array.
typedef struct Customer {
int id;
int bottles;
int diapers;
int rattles;
} Customer;
Customer customers[100];
void setValue(int custInd, int propertyInd) {
//propertyInd would be 1 for id, 2 for bottles
//Attempting to set customers[0].bottles
*(&customers[custInd]+propertyInd) = 5;
}
I thought I'd be able to do this, but I got various errors. Knowing that the "bottles" value will be the second slot in memory from the address of a Customer, shouldn't I be able to set that spot directly?
I know this may be improper code, but I would like to understand how and why it does or doesn't work. I also promise I have reasons for attempting to do this over the conventional way, hah.
Instead of using propertyInd, perhaps pass an offset into the structure. That way, the code will work even if the layout changes dramatically (for example, if it includes non-int fields at the beginning).
Here's how you might do it:
void setValue(int custInd, int fieldOffset) {
int *ptr = (int *)((char *)&customers[custInd] + fieldOffset);
*ptr = 5;
}
...
setValue(custInd, offsetof(Customer, bottles));
offsetof is a standardized macro that returns the offset, in bytes, from the start of the structure to the given element.
If you still want to use indices, you can compute the offset as propertyInd * sizeof(int), assuming every field in the struct is an int.
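If you want to keep indices but not rely on every field being an int of the same size, one option is a small table of offsets. This is a sketch only: kFieldOffsets is a made-up name, the index is zero-based, and the value to store is passed in as an extra parameter. It is written as C++; in C the same table works with <stddef.h>.

```cpp
#include <cstddef>   // offsetof, std::size_t

struct Customer {
    int id;
    int bottles;
    int diapers;
    int rattles;
};

Customer customers[100];

// Hypothetical lookup table: zero-based property index -> byte offset.
const std::size_t kFieldOffsets[] = {
    offsetof(Customer, id),
    offsetof(Customer, bottles),
    offsetof(Customer, diapers),
    offsetof(Customer, rattles),
};

// Sets the selected int field; returns false for an out-of-range index.
bool setValue(int custInd, int propertyInd, int value) {
    if (custInd < 0 || custInd >= 100 || propertyInd < 0 || propertyInd >= 4)
        return false;
    char* base = reinterpret_cast<char*>(&customers[custInd]);
    *reinterpret_cast<int*>(base + kFieldOffsets[propertyInd]) = value;
    return true;
}
```

For example, setValue(0, 1, 5) stores 5 in customers[0].bottles.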
You can't do this:
*(&customers[custInd]+propertyInd) = 5;
because the type of &customers[custInd] is struct Customer *, not int *. So &customers[custInd]+propertyInd means the same thing as customers + custInd + propertyInd or, in other words, &customers[custInd + propertyInd]. The assignment then attempts to set a structure value to the integer 5, which is obviously illegal.
What I suppose you meant was
((int*)&customers[custInd])[propertyInd] = 5;
which would compile fine, and would probably work[*], but is undefined behaviour, because you cannot assume that a struct consisting of four ints is laid out in memory the same way as an int[4] would be. It may seem reasonable and even logical that the layouts be the same, but the standard doesn't require it, so that's that. Sorry.
As @iharob suggests in a comment, you might find a compiler clever enough to generate efficient code from the following verbiage:
void setValue(int custInd, int propertyInd, int value) {
//propertyInd would be 1 for id, 2 for bottles
switch (propertyInd) {
case 1: customers[custInd].id = value; break;
case 2: customers[custInd].bottles = value; break;
case 3: customers[custInd].diapers = value; break;
case 4: customers[custInd].rattles = value; break;
default: assert(0);
}
}
*: Actually, it would (probably) work if propertyInd for id were 0, not 1. C array indices start at 0.
&customers[custInd] is a pointer to customers[custInd], so &customers[custInd]+propertyInd is a pointer to customers[custInd+propertyInd]. It is not a pointer to a member. It will have type pointer to Customer. The value of that pointer will be equal to &(customers[custInd+propertyInd].id), but is not a pointer to int - hence the compiler error.
Your bigger problem is that four int in a struct are not necessarily laid out like an array of int - there may be padding between struct members. So, if we do
int *p = &(customers[custInd].id);
then p+1 is not necessarily equal to &(customers[custInd].bottles).
So you will need to do something like
void setValue(int custInd, int Offset)
{
int *ptr = (int *)(((char *)&customers[custInd]) + Offset);
*ptr = 5;
}
/* and to call it to set customers[custInd].bottles to 5 */
setValue(custInd, offsetof(Customer, bottles));
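If the code is compiled as C++ rather than C, a table of pointers to members gives the same index-to-field mapping without any byte arithmetic. This is a different technique from offsetof, shown only as a sketch: kFields and setField are hypothetical names, and the index is zero-based.

```cpp
struct Customer {
    int id;
    int bottles;
    int diapers;
    int rattles;
};

Customer customers[100];

// Table of pointers to members, indexed from 0 (id) to 3 (rattles).
int Customer::* const kFields[] = {
    &Customer::id, &Customer::bottles, &Customer::diapers, &Customer::rattles,
};

// Returns false instead of writing when either index is out of range.
bool setField(int custInd, int propertyInd, int value) {
    if (custInd < 0 || custInd >= 100 || propertyInd < 0 || propertyInd >= 4)
        return false;
    customers[custInd].*kFields[propertyInd] = value;
    return true;
}
```

The compiler resolves each member's position itself, so padding between members does not matter here.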
if i have a struct , say:
struct A {
int a,b,c,d,e;
};
A m; // struct of 5 ints
int n[5]; // array of 5 ints
I know that the elements of the array are stored one after another, so we can use *(n+i) or n[i].
But in the case of a struct, is each member stored next to the previous one (in struct A)?
The compiler may insert padding as it wishes, except before the first item.
In C++03 you were guaranteed increasing addresses of items between access specifiers.
I'm not sure if the access specifier restriction is still there in C++11.
The only thing that is granted is that members are stored in the same order.
Between elements there can be some "padding" the compiler may insert so that each value is aligned with the processor word length.
Different compiler can make different choices also depending on the target platform and can be forced to keep a given alignment by option switches or pragma-s.
Your particular case is "lucky" with most compilers, since int is normally implemented as "the integral type that best fits the integer arithmetic of the processor". With this idea, a sequence of ints is naturally aligned. But that may not be the case, for example if you have
struct test
{
char a;
short b;
long c;
long long d;
};
You may discover that, comparing addresses, (&a)+1 != &b and (&b)+1 != &c or (&b)-1 != &a etc.
What is granted is the progression &a < &b; &b < &c; &c < &d;
Struct members are in general stored at increasing addresses, but they are not guaranteed to be contiguous. For the struct A from the question, assuming a 4-byte int and no padding, and with $base as the base address of the struct, the layout will be the following.
a will be stored at $base+0
b will be stored at $base+4
c will be stored at $base+8 ... etc
You can see the typical alignment values at http://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86
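Rather than guessing offsets, you can ask the compiler. A small sketch using the struct test from the earlier answer; padding_bytes is a made-up helper, and its result differs between platforms.

```cpp
#include <cstddef>   // offsetof, std::size_t

struct test {
    char a;
    short b;
    long c;
    long long d;
};

// Guaranteed for this struct: the first member is at offset 0 and
// later members are at strictly increasing offsets.
static_assert(offsetof(test, a) == 0, "no padding before the first member");
static_assert(offsetof(test, a) < offsetof(test, b), "a precedes b");
static_assert(offsetof(test, b) < offsetof(test, c), "b precedes c");
static_assert(offsetof(test, c) < offsetof(test, d), "c precedes d");

// Not guaranteed: how many padding bytes the compiler adds in total.
constexpr std::size_t padding_bytes() {
    return sizeof(test)
         - (sizeof(char) + sizeof(short) + sizeof(long) + sizeof(long long));
}
```

Printing offsetof(test, b) and friends on your own target shows the actual layout that compiler chose.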
I have written a simple program to show that struct elements are next to each other (at least with my compiler):
#include <iostream>
using std::cout;
int main() {
struct S {
int a;
int b;
int c;
};
S s = {1,2,3};
int* p = reinterpret_cast <int *> (&s);
cout<<p[0]<<" "<<p[1]<<" "<<p[2];
return 0;
}
Output: 1 2 3
Remember, [] and *(p+i) are semantic constructs that work with pointers, not with struct variables directly.
As suggested in Cheers and hth. - Alf's answer, there can be padding, before or after struct elements.
I've come to work on an ongoing project where some unions are defined as follows:
/* header.h */
typedef union my_union_t {
float data[4];
struct {
float varA;
float varB;
float varC;
float varD;
};
} my_union;
If I understand well, unions are for saving space, so sizeof(my_union_t) = MAX of the variables in it. What are the advantages of using the statement above instead of this one:
typedef struct my_struct {
float varA;
float varB;
float varC;
float varD;
} my_struct;
Won't the space allocated for both of them be the same?
And how can I initialize varA,varB... from my_union?
Unions are often used when implementing a variant like object (a type field and a union of data types), or in implementing serialisation.
The way you are using a union is a recipe for disaster.
You are assuming that the struct in the union packs the floats with no gaps between them!
The standard guarantees that float data[4]; is contiguous, but not the structure elements. The only other thing you know is that the address of varA; is the same as the address of data[0].
Never use a union in this way.
As for your question: "And how can I initialize varA,varB... from my_union?". The answer is, access the structure members in the normal long-winded way not via the data[] array.
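If existing code already depends on the struct lining up with data[4], the assumption can at least be checked at compile time. A sketch, with Vars as a hypothetical named version of the inner struct (anonymous structs are not standard C++):

```cpp
#include <cstddef>   // offsetof

// Hypothetical named version of the anonymous struct inside my_union.
struct Vars {
    float varA;
    float varB;
    float varC;
    float varD;
};

// True only if the four members sit exactly where data[0..3] would.
constexpr bool vars_match_float_array() {
    return sizeof(Vars) == 4 * sizeof(float)
        && offsetof(Vars, varA) == 0 * sizeof(float)
        && offsetof(Vars, varB) == 1 * sizeof(float)
        && offsetof(Vars, varC) == 2 * sizeof(float)
        && offsetof(Vars, varD) == 3 * sizeof(float);
}

// Fails to compile on a platform that inserts padding between the floats.
static_assert(vars_match_float_array(), "Vars is not laid out like float[4]");
```

This only verifies the layout; it does not change what the standard says about reading a union member other than the one last written.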
Unions are mostly not for saving space, but for implementing sum types (for that, you put the union in some struct or class that also has a discriminating field which keeps the run-time tag). Also, I suggest you use a recent standard of C++, at least C++11, since it has better support for unions (e.g. it more easily permits unions of objects and their construction or initialization).
The advantage of using your union is to be able to index the n-th floating point (with 0 <= n <= 3) as u.data[n]
To assign a union field in some variable declared my_union u; just code e.g. u.varB = 3.14; which in your case has the same effect as u.data[1] = 3.14;
A good example of well deserved union is a mutable object which can hold either an int or a string (you could not use derived classes in that case):
#include <new>      // placement new
#include <string>
#include <utility>  // std::move
class IntOrString {
bool isint;
union {
int num;         // active when isint is true
std::string str; // active when isint is false
};
public:
IntOrString(int n = 0) : isint(true), num(n) {}
IntOrString(std::string s) : isint(false), str(std::move(s)) {}
IntOrString(const IntOrString& o) : isint(o.isint)
{ if (isint) num = o.num;
else new (&str) std::string(o.str); } // construct str, do not assign to it
IntOrString(IntOrString&& p) : isint(p.isint)
{ if (isint) num = p.num;
else new (&str) std::string(std::move(p.str)); }
~IntOrString() { if (!isint) str.~basic_string(); } // destroy str only if it is active
void set(int n)
{ if (!isint) str.~basic_string(); isint = true; num = n; }
void set(std::string s)
{ if (isint) { new (&str) std::string(std::move(s)); isint = false; }
else str = std::move(s); }
bool is_int() const { return isint; }
int as_int() const { return isint ? num : 0; }
std::string as_string() const { return isint ? std::string() : str; }
};
Notice the explicit destructor calls for the str member. Notice also that you can safely use IntOrString in a standard container (std::vector<IntOrString>).
See also std::optional (conceptually a tagged union with void) and std::variant, both available since C++17.
BTW, in Ocaml, you simply code:
type intorstring = Integer of int | String of string;;
and you'll use pattern matching. If you wanted to make that mutable, you'll need to make a record or a reference of it.
You had better use unions in an idiomatic C++ way (see this for general advice).
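For comparison, here is roughly what the same int-or-string idea looks like with std::variant, assuming a C++17 compiler. The alias IntOrStringV and the two free functions are illustrative names, not part of any library.

```cpp
#include <string>
#include <variant>

// Illustrative alias: holds either an int or a std::string, and tracks which.
using IntOrStringV = std::variant<int, std::string>;

// Returns the int, or 0 when a string is stored.
int as_int(const IntOrStringV& v) {
    if (const int* p = std::get_if<int>(&v)) return *p;
    return 0;
}

// Returns the string, or an empty string when an int is stored.
std::string as_string(const IntOrStringV& v) {
    if (const std::string* p = std::get_if<std::string>(&v)) return *p;
    return std::string();
}
```

The variant keeps the tag itself and runs the right constructor and destructor, so none of the manual bookkeeping is needed.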
I think the best way to understand unions is just to give 2 common practical examples.
The first example is working with images. Imagine you have an RGB image that is arranged in a long buffer.
What most people would do is represent the buffer as a char* and then loop over it in steps of 3 to get the R, G, B.
What you could do instead, is make a little union, and use that to loop over the image buffer:
union RGB
{
char raw[3];
struct
{
char R;
char G;
char B;
} colors;
};
RGB* pixel = reinterpret_cast<RGB*>(buffer);
// pixel->colors.R == the red component of the first pixel.
Another very useful use for unions is using registers and bitfields.
Lets say you have a 32 bit value, that represents some HW register, or something.
Sometimes, to save space, you can split the 32 bits into bit fields, but you also want the whole representation of that register as a 32 bit type.
This obviously saves bit shift calculation that a lot of programmers use for no reason at all.
union MySpecialRegister
{
uint32_t raw; // "register" is a reserved keyword, so the whole value is named raw here
struct
{
unsigned int firstField : 5;
unsigned int somethingInTheMiddle : 21; // widths must sum to 32: 5 + 21 + 6
unsigned int lastField : 6;
} data;
};
// Now you can read the raw register into the raw field,
// then read the individual fields through the inner data struct.
The advantage is that with a union you can access the same memory in two different ways.
In your example the union contains four floats. You can access those floats as varA, varB... which might be more descriptive names or you can access the same variables as an array data[0], data[1]... which might be more useful in loops.
With a union you can also use the same memory for different kinds of data, you might find that useful for things like writing a function to tell you if you are on a big endian or little endian CPU.
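The endianness check mentioned above can also be written without a union, by copying out the first byte. A sketch (is_little_endian is just an illustrative name):

```cpp
#include <cstdint>
#include <cstring>

// Reports whether the least significant byte of a multi-byte integer
// is stored at the lowest address.
bool is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);   // inspect the lowest-addressed byte
    return first == 1;
}
```

Inspecting an object's bytes as unsigned char is allowed by the language, which is why this version avoids the write-one-member, read-another question entirely.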
No, it is not for saving space. It is for ability to represent some binary data as various data types.
for example
#include <iostream>
#include <stdint.h>
union Foo{
int x;
struct
{
unsigned char b0, b1, b2, b3;
} y;
char z[sizeof(int)];
};
int main()
{
Foo bar;
bar.x = 100;
std::cout << std::hex; // to show number in hexadec repr;
for(size_t i = 0; i < sizeof(int); i++)
{
std::cout << "0x" << (int)bar.z[i] << " "; // int is just to show values as numbers, not a characters
}
return 0;
}
Output: 0x64 0x0 0x0 0x0. The same values are stored in bar.y, not as an array but as structure members. That is because my machine is little-endian. If it were big-endian, the output would be reversed: 0x0 0x0 0x0 0x64
You can achieve the same using reinterpret_cast:
#include <iostream>
#include <stdint.h>
int main()
{
int x = 100;
char * xBytes = reinterpret_cast<char*>(&x);
std::cout << std::hex; // to show number in hexadec repr;
for (size_t i = 0; i < sizeof(int); i++)
{
std::cout << "0x" << (int)xBytes[i] << " "; // (int) is just to show values as numbers, not a characters
}
return 0;
}
It's useful, for example, when you need to read a binary file that was written on a machine with a different endianness than yours. You can just access the values as a byte array and swap those bytes as you wish.
It is also useful when you have to deal with bit fields, but that's a whole different story :)
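For a fixed-width value, the byte swap itself can be done with shifts and masks instead of going through a byte array. A minimal sketch (swap_bytes32 is a hypothetical helper name):

```cpp
#include <cstdint>

// Reverses the byte order of a 32-bit value.
std::uint32_t swap_bytes32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

Applied to the value 100 from the example above (bytes 0x64 0x0 0x0 0x0 on a little-endian machine), this gives 0x64000000.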
First of all: avoid unions where the same memory is accessed as different types!
Unions do not save space at all. They only define multiple names for the same memory area! And you can only store one of the members at a time in a union.
if you have
union X
{
int x;
char y[4];
};
you can store an int OR 4 chars but not both! The general problem is that nobody knows which data is actually stored in a union. If you store an int and read the chars, the compiler will not check that, and there is no runtime check either. A common solution is to wrap the union in a struct with an additional member that records the type actually stored, as an enum.
struct Y
{
enum { IS_CHAR, IS_INT } tinfo;
union
{
int x;
char y[4];
};
};
But in c++ you always should use classes or structs which can derive from a maybe empty parent class like this:
class Base
{
};
class Int_Type: public Base
{
...
int x;
};
class Char_Type: public Base
{
...
char y[4];
};
So you can declare pointers to Base which can actually hold an Int_Type or a Char_Type for you. With virtual functions you can access the members in an object-oriented way.
As already mentioned in Basile's answer, a useful case can be access to the same type via different names.
union X
{
struct
{
float a;
float b;
} data;
float arr[2];
};
which allows different ways of accessing the same data with the same type. Storing different types in the same memory should be avoided altogether!