How do bit fields interplay with bits padding in C++ - c++

See the C version of this questions here.
I have two questions concerning bit fields when there are padding bits.
Say I have a struct defined as
struct T {
unsigned int x: 1;
unsigned int y: 1;
};
Struct T only has two bits actually used.
Question 1: are these two bits always the least significant bits of the underlying unsigned int? Or it is platform dependent?
Question 2: Are those unused 30 bits always initialized to 0? What does the C++ standard say about it?

Question 1: are these two bits always the least significant bits of the underlying unsigned int? Or it is platform dependent?
Very platform dependent. The standard even has a note just to clarify how much:
[class.bit]
1 ...Allocation of bit-fields within a class object is
implementation-defined. Alignment of bit-fields is
implementation-defined. Bit-fields are packed into some addressable
allocation unit. [ Note: Bit-fields straddle allocation units on some
machines and not on others. Bit-fields are assigned right-to-left on
some machines, left-to-right on others.  — end note ]
You can't assume much of anything about the object layout of a bit field.
Question 2: Are those unused 30 bits always initialized to 0? What does the C++ standard say about it?
Your example has a simple aggregate, so we can enumerate the possible initializations. Specifying no initializer...
T t;
... will default initialize it, leaving the members with indeterminate value. On the other hand, if you specify empty braces...
T t{};
... the object will be aggregate initialized, and so the bit fields will be initialized with {} themselves, and set to zero. But that applies only to the members of the aggregate, which are the bit fields. It's not specified what value, if any, the padding bits take. So we cannot assume they will be initialized to zero.

Q1: Usually from low to hi (i.e. x is 1 << 0, y is 1 << 1, etc).
Q2: The value of the unused bits is undefined. On some compilers/platforms, stack initialised variables might be set to zero first (might!!), but don't count on it!! Heap allocated variables could be anything, so it's best to assume the bits are garbage. Using a slightly non standard anonymous struct buried in a union, you could do something like this to ensure the value of the bits:
union T {
unsigned intval;
struct {
unsigned x : 1;
unsigned y : 1;
};
};
T foo;
foo.intval = 0;

Related

Do the padding bytes of a POD type get copied?

Suppose I have a POD type like this:
struct A {
char a;
int b;
};
On my system, sizeof(A) == 8, even though sizeof(char) == 1 and sizeof(b) == 4. This means that the data structure has 3 unused bytes.
Now suppose we do
A x = ...;
A y =x;
Question:
Is it guaranteed that all 8 bytes of x and y will be identical, even those 3 unused ones?
Equivalently, if I transfer the underlying bytes of some A objects to another program that does not understand their meaning or structure, and treats them as an array of 8 bytes, can that other program safely compare two As for equality?
Note: In an experiment with gcc 7, it appears that those bytes do get copied. I would like to know if this is guaranteed.
The implicitly-defined copy/move constructor for a non-union class X performs a memberwise copy/move
of its bases and members.
12.8/15 [class.copy] in N4141
The bit pattern in the padding bytes is thus allowed to differ.
It's not authoritative, but cppreference's entry for std::memcmp suggests that the padding bytes may differ:
memcmp() between two objects of type struct{char c; int n;} will compare the padding bytes whose values may differ when the values of c and n are the same
given that you asked about a POD type ( hence including unions ) it's worth mentioning that according to [class.copy]
The implicitly-defined copy/move constructor for a union X copies the object representation (6.9) of X
that for trivially copyable types should include padding bits as well.
So, it could be just a matter of replacing A with
union A{ struct {
char a;
int b;
}; };
(actually, the above uses a non standard anonymous struct, but you get the point ... )
Answering your second question:
Equivalently, if I transfer the underlying bytes of some A objects to another program that does not understand their meaning or structure, and treats them as an array of 8 bytes, can that other program safely compare two As for equality?
As an object of your type may contain padding bytes, another program generally can not compare two such objects for equality:
Knowing which bits of the bytes that make out the object semantically is the key for defining its value representation. However, in this scenario, the target program only knows the object representation, i.e. the sequence of bytes representing such an object in memory, including padding bytes. A function like memcmp can only compare such objects whose value representation is identical to its object representation in a meaningful way. If you use it to compare objects value-wise even though they have padding, it may fail to give the right results as it cannot tell which bits in the object representation are irrelevant for the value representations of two objects to be equal.
See http://en.cppreference.com/w/cpp/language/object

Storage of Union In Memory

I read that in Unions, the data members occupy the same block of memory. So, I tried to read off ASCII codes of the English Alphabet using this implementation.
union {
int i;
char a,b;
}eps;
eps.i=65;
cout<<eps.a<<eps.b;
I got the right output (A) for 65 but, both a and b seem to occupy the same place in the memory.
Q. But an integer being 2 bytes, shouldn't a have occupied the first 8 bits and b the other 8 ?
Also, while repeating the above with multiple integers inside the union, they seem to have the same value.
Q. So does that mean that every variable of a given data type acts like a reference for any other variable for the same data type? (Given simple adding on the variables int i,j,k,l.....)
Q. Can we only use one (distinct) variable of a given datatype in a union since all others point at the same location?
EDIT
I would like to mention that while adding on any more variables inside the union, it simply means adding them like int i,j,k... not using wrapping them inside struct or in any other way.
As Clarified by Baum mit in the chat (and comments), Here's the discussion for other/future users to see.
Reading a member of a union that is not the one you last wrote to is undefined behavior. This means that your code could do anything and arguing about its behavior is not meaningful.
To perform conversion between types, use the appropriate cast, not a union.
To answer your questions after the edit:
Q. But an integer being 2 bytes, shouldn't a have occupied the first 8 bits and b the other 8 ?
As you said, every member of the union shares the same space. Since a and b are different members, they share the same space too (in the sense that they both live somewhere in the space belonging to the union). The actual layout of the union might look like this:
byte 0 | byte 1 | byte 2 | byte 3
i i i i
a
b
Q. So does that mean that every variable of a given data type acts as a reference for any other variable for the same data type?
No, members of the same time do not act as references to one another. If your have a reference to an object, you can reliably access that object through the reference. Two members of the same type will probably use the exact same memory, but you cannot rely on that. The rule I stated above still applies.
Q. Can we only use one (distinct) variable of a given datatype in a union since all others point at the same location?
You can have as many members of the same type as you want. They might or might not live in the exact same memory. It does not matter because you can only access the last one written to anyways.
You have misunderstood what unions are for. They make no guarantees about sharing memory in any predictable way. They simply provide a way to say an entity could store one of several types of information. If you set one type, the others are undefined (could be anything, even something unrelated to the data you put in). How they share memory is up to the compiler, and could depend on optimisations which are enabled (e.g. memory alignment rules).
Having said all that, in most situations (and with optimisations disabled), you will find that each part of a union begins at byte 0 of the union (do not rely on this). In your code, union{int i;char a,b;} says "this union could be an integer i, or a char a, or a char b". You could use (as many have suggested), union{int i;struct{char a,b;}}, which would tell the compiler: "this union could be an integer i, or it could be a structure of characters a and b".
Casting from one type to another, or to its component bytes, therefore is not a job for unions. Instead you should use casts.
So where would you use a union? Here's an example:
struct {
int type; // maybe 0 = int, 1 = long, ...
union {
char c;
int i;
long l;
float f;
double d;
struct {
int x;
int y;
} pos;
// etc.
} value;
};
With an object like that, we can dynamically store numbers of any type (or whatever else we might want, like 2D position in this example), while keeping track of what's actually there using an external variable. It uses much less memory than the equivalent code would without a union, and makes setting/getting safe (we don't need to cast pointers all over the place)
Recall that an union type is a set of alternative possibilities. The formal wording is that it's the co-product of all the types its fields belong to.
union {
int i;
char a,b;
}
is syntactically equivalent to:
union {
int i;
char a;
char b;
}
a and b being of the same type, they don't contribute more together than each other taken individually. In other words, b is redundant.
You need to wrap the a and b fields in a struct to get them bundled as one alternative of the union.
union {
int i;
struct {
char a;
char b;
};
}
Furthermore, the int type is on most platforms a 32 bits wide integral type, and char a 8 bit wide integral type — I say usually, because the sizes are not formally defined more than just in terms of int being larger or equal to char.
So, assuming we have the usual definitions for char and int, the second alternative being 16 bit wide, the compiler has the opportunity to place it where it wants within the same space occupied by the larger field (32 bits).
Another issue is the byte ordering which could be different from one platform to the next.
You could perhaps get it to work (and in practice it almost always works) by padding the struct with the missing bytes to reach 32 bits:
union {
int i;
struct {
char a;
char b;
char c;
char d;
};
}
(think of a int representation of an IPv4 address for instance, and the htons function to cover the byte ordering issue).
The definitive rule however is dictated by the C language specifications, which don't specify that point.
To be on the safe side, rather than using an union, I would go for a set of functions to pull out bytes by bit masking, but if you are targeting a specific platform, and the above union works...

what does ":1" after a member variable mean? [duplicate]

What does the following C++ code mean?
unsigned char a : 1;
unsigned char b : 7;
I guess it creates two char a and b, and both of them should be one byte long, but I have no idea what the ": 1" and ": 7" part does.
The 1 and the 7 are bit sizes to limit the range of the values. They're typically found in structures and unions. For example, on some systems (depends on char width and packing rules, etc), the code:
typedef struct {
unsigned char a : 1;
unsigned char b : 7;
} tOneAndSevenBits;
creates an 8-bit value, one bit for a and 7 bits for b.
Typically used in C to access "compressed" values such as a 4-bit nybble which might be contained in the top half of an 8-bit char:
typedef struct {
unsigned char leftFour : 4;
unsigned char rightFour : 4;
} tTwoNybbles;
For the language lawyers amongst us, the 9.6 section of the C++11 standard explains this in detail, slightly paraphrased:
Bit-fields [class.bit]
A member-declarator of the form
identifieropt attribute-specifieropt : constant-expression
specifies a bit-field; its length is set off from the bit-field name by a colon. The optional attribute-specifier appertains to the entity being declared. The bit-field attribute is not part of the type of the class member.
The constant-expression shall be an integral constant expression with a value greater than or equal to zero. The value of the integral constant expression may be larger than the number of bits in the object representation of the bit-field’s type; in such cases the extra bits are used as padding bits and do not participate in the value representation of the bit-field.
Allocation of bit-fields within a class object is implementation-defined. Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit.
Note: bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. - end note
I believe those would be bitfields.
Strictly speaking, a bitfield must be a int, unsigned int, or _Bool. Although most compilers will take any integral type.
Ref C11 6.7.2.1:
A bit-field shall have a type that is a qualified or unqualified
version of _Bool, signed int, unsigned int, or some other
implementation-defined type.
Your compiler will probably allocate 1 byte of storage, but it is free to grab more.
Ref C11 6.7.2.1:
An implementation may allocate any addressable storage unit large
enough to hold a bit- field.
The savings comes when you have multiple bitfields that are declared one after another. In this case, the storage allocated will be packed if possible.
Ref C11 6.7.2.1:
If enough space remains, a bit-field that
immediately follows another bit-field in a structure shall be packed
into adjacent bits of the same unit. If insufficient space remains,
whether a bit-field that does not fit is put into the next unit or
overlaps adjacent units is implementation-defined.

Assigning a value to bitfield with length 1

Suppose I have
struct A
{
signed char a:1;
unsigned char b:1;
};
If I have
A two, three;
two.a = 2; two.b = 2;
three.a = 3; three.b = 3;
two will contain 0s in its fields, while three will contain 1s. So, this makes me think, that assigning a number to a single-bit-field gets the least significant bit (2 is 10 in binary and 3 is 11).
So, my question is - is this correct and cross-platform? Or it depends on the machine, on the compiler, etc. Does the standard says anything about this, or it's completely implementation defined?
Note: The same result may be achieved by assigning 0 and 1, instead of 2 and 3 respectively. I used 2 and 3 just for illustrating my question, I wouldn't use it in a real-world situation
P.S. And, yes, I'm interesting in both - C and C++, please don't tell me they are different languages, because I know this :)
The rules in this case are no different than in case of full-width arithmetic. Bit-fields behave the same way as the corresponding full-size types, except that their width is limited by the value you specified in the bit-field declaration (6.7.2.1/9 in C99).
Assigning an overflowing value to a signed bit-field leads to implementation-defined behavior, which means that behavior you observe with bit-field a is generally not portable.
Assigning an overflowing value to an unsigned bit-field uses the rules of modulo arithmetic, meaning that the value is taken modulo 2^N, where N is the width of the bit-field. This means, for example, that assigning even numbers to your bit-field b will always produce value 0, while assigning odd numbers to such bit-field will always produce 1.

Is the size of a struct required to be an exact multiple of the alignment of that struct?

Once again, I'm questioning a longstanding belief.
Until today, I believed that the alignment of the following struct would normally be 4 and the size would normally be 5...
struct example
{
int m_Assume_32_Bits;
char m_Assume_8_Bit_Bytes;
};
Because of this assumption, I have data structure code that uses offsetof to determine the distance in bytes between two adjacent items in an array. Today, I spotted some old code that was using sizeof where it shouldn't, couldn't understand why I hadn't had bugs from it, coded up a unit test - and the test surprised me by passing.
A bit of investigation showed that the sizeof the type I used for the test (similar to the struct above) was an exact multiple of the alignment - ie 8 bytes. It had padding after the final member. Here is an example of why I never expected this...
struct example2
{
example m_Example;
char m_Why_Cant_This_Be_At_Offset_6_Bytes;
};
A bit of Googling showed examples that make it clear that this padding after the final member is allowed - for example http://en.wikipedia.org/wiki/Data_structure_alignment#Data_structure_padding (the "or at the end of the structure" bit).
This is a bit embarrassing, as I recently posted this comment - Use of struct padding (my first comment to that answer).
What I can't seem to determine is whether this padding to an exact multiple of the alignment is guaranteed by the C++ standard, or whether it is just something that is permitted and that some (but maybe not all) compilers do.
So - is the size of a struct required to be an exact multiple of the alignment of that struct according to the C++ standard?
If the C standard makes different guarantees, I'm interested in that too, but the focus is on C++.
5.3.3/2
When applied to a class, the result [of sizeof] is the number of bytes in an object of that class, including any padding required for placing objects of that type in an array.
So yes, object size is a multiple of its alignment.
One definition of alignment size:
The alignment size of a struct is the offset from one element to the next element when you have an array of that struct.
By its nature, if you have an array of a struct with two elements, then both need to have aligned members, so that means that yes, the size has to be a multiple of the alignment. (I'm not sure if any standard explicitly enforce this, but because the size and alignment of a struct don't depend on whether the struct is alone or inside an array, the same rules apply to both, so it can't really be any other way.)
The standard says (section [dcl.array]:
An object of array type contains a contiguously allocated non-empty set of N subobjects of type T.
Therefore there is no padding between array elements.
Padding inside structures is not required by the standard, but the standard doesn't permit any other way of aligning array elements.
I am unsure if this is in the actual C/C++ standard, and I am inclined to say that it is up to the compiler (just to be on the safe side). However, I had a "fun" time figuring that out a few months ago, where I had to send dynamically generated C structs as byte arrays across a network as part of a protocol, to communicate with a chip. The alignment and size of all the structs had to be consistent with the structs in the code running on the chip, which was compiled with a variant of GCC for the MIPS architecture. I'll attempt to give the algorithm, and it should apply to all variants of gcc (and hopefully most other compilers).
All base types, like char, short and int align to their size, and they align to the next available position, regardless of the alignment of the parent. And to answer the original question, yes the total size is a multiple of the alignment.
// size 8
struct {
char A; //byte 0
char B; //byte 1
int C; //byte 4
};
Even though the alignment of the struct is 4 bytes, the chars are still packed as close as possible.
The alignment of a struct is equal to the largest alignment of its members.
Example:
//size 4, but alignment is 2!
struct foo {
char A; //byte 0
char B; //byte 1
short C; //byte 3
}
//size 6
struct bar {
char A; //byte 0
struct foo B; //byte 2
}
This also applies to unions, and in a curious way. The size of a union can be larger than any of the sizes of its members, simply due to alignment:
//size 3, alignment 1
struct foo {
char A; //byte 0
char B; //byte 1
char C; //byte 2
};
//size 2, alignment 2
struct bar {
short A; //byte 0
};
//size 4! alignment 2
union foobar {
struct foo A;
struct bar B;
}
Using these simple rules, you should be able to figure out the alignment/size of any horribly nested union/struct you come across. This is all from memory, so if I have missed a corner case that can't be decided from these rules please let me know!
C++ doesn't explicitly says so, but it is a consequence of two other requirements:
First, all objects must be well-aligned.
3.8/1 says
The lifetime of an object of type T begins when [...] storage with the proper alignment and size for type T is obtained
and 3.9/5:
Object types have *alignnment requirements (3.9.1, 3.9.2). The alignment of a complete object type is an implementation-defined integer value representing a number of bytes; an object is allocated at an address that meets the alignment requirements of its object type.
So every object must be aligned according to its alignment requirements.
The other requirement is that objects in an array are allocated contigulously:
8.3.4/1:
An object of array type contains a contiguously allocated non-empty set of N subobjects of type T.
For the objects in an array to be contiguously allocated, there can be no padding between them. But for every object in the array to be properly aligned, each individual object must be padded so that the byte immediately after the end of the object is also well aligned. In other words, the size of the object must be a multiple of its alignment.
So to split your question up into two:
1. Is it legal?
[5.3.3.2] When applied to a class, the result [of the sizeof() operator] is the number of bytes in an object of that class including any padding required for placing objects of that type in an array.
So, no, it's not.
2. Well, why isn't it?
Here, I cna only speculate.
2.1. Pointer arithmetics get weirder
If alignment would be "between array elements" but would not affect the size, zthigns would get needlessly complicated, e.g.
(char *)(X+1) != ((char *)X) + sizeof(X)
(I have a hunch that this is required implicitely by the standard even without above statement, but I can't put it to proof)
2.2 Simplicity
If alignment affects size, alignment and size can be decided by looking at a single type. Consider this:
struct A { int x; char y; }
struct B { A left, right; }
With the current standard, I just need to know sizeof(A) to determine size and layout of B.
With the alternate you suggest I need to know the internals of A. Similar to your example2: for a "better packing", sizeof(example) is not enough, you need to consider the internals of example.
It is possible to produce a C or C++ typedef whose alignment is not a multiple of its size. This came up recently in this bindgen bug. Here's a minimal example, which I'll call test.c below:
#include <stdio.h>
#include <stdalign.h>
__attribute__ ((aligned(4))) typedef struct {
char x[3];
} WeirdType;
int main() {
printf("sizeof(WeirdType) = %ld\n", sizeof(WeirdType));
printf("alignof(WeirdType) = %ld\n", alignof(WeirdType));
return 0;
}
On my Arch Linux x86_64 machine, gcc -dumpversion && gcc test.c && ./a.out prints:
9.3.0
sizeof(WeirdType) = 3
alignof(WeirdType) = 4
Similarly clang -dumpversion && clang test.c && ./a.out prints:
9.0.1
sizeof(WeirdType) = 3
alignof(WeirdType) = 4
Saving the file as test.cc and using g++/clang++ gives the same result. (Update from a couple years later: I get the same results from GCC 11.1.0 and Clang 13.0.0.)
Notably however, MSVC on Windows does not seem to reproduce any behavior like this.
The standard says very little about padding and alignment. Very little is guaranteed. About the only thing you can bet on is that the first element is at the beginning of the structure. After that...alignment and padding can be anything.
Seems the C++03 standard didn't say (or I didn't find) whether the alignment padding bytes should be included in the object representation.
And the C99 standard says the "sizeof" a struct type or union type includes internal and trailing padding, but I'm not sure if all alignment padding is included in that "trailing padding".
Now back to your example. There is really no confusion. sizeof(example) == 8 means the structure does take 8 bytes to represent itself, including the tailing 3 padding bytes. If the char in the second structure has an offset of 6, it will overwrite the space used by m_Example. The layout of a certain type is implementation-defined, and should be kept stable in the whole implementation.
Still, whether p+1 equals (T*)((char*)p + sizeof(T)) is unsure. And I'm hoping to find the answer.