what does ":1" after a member variable mean? [duplicate]

what does ":1" after a member variable mean? [duplicate] - c++

What does the following C++ code mean?
unsigned char a : 1;
unsigned char b : 7;
I guess it creates two char a and b, and both of them should be one byte long, but I have no idea what the ": 1" and ": 7" part does.

The 1 and the 7 are bit sizes to limit the range of the values. They're typically found in structures and unions. For example, on some systems (depends on char width and packing rules, etc), the code:
typedef struct {
unsigned char a : 1;
unsigned char b : 7;
} tOneAndSevenBits;
creates an 8-bit value, one bit for a and 7 bits for b.
Typically used in C to access "compressed" values such as a 4-bit nybble which might be contained in the top half of an 8-bit char:
typedef struct {
unsigned char leftFour : 4;
unsigned char rightFour : 4;
} tTwoNybbles;
For the language lawyers amongst us, the 9.6 section of the C++11 standard explains this in detail, slightly paraphrased:
Bit-fields [class.bit]
A member-declarator of the form
identifieropt attribute-specifieropt : constant-expression
specifies a bit-field; its length is set off from the bit-field name by a colon. The optional attribute-specifier appertains to the entity being declared. The bit-field attribute is not part of the type of the class member.
The constant-expression shall be an integral constant expression with a value greater than or equal to zero. The value of the integral constant expression may be larger than the number of bits in the object representation of the bit-field’s type; in such cases the extra bits are used as padding bits and do not participate in the value representation of the bit-field.
Allocation of bit-fields within a class object is implementation-defined. Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit.
Note: bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. - end note

I believe those would be bitfields.

Strictly speaking, a bitfield must be a int, unsigned int, or _Bool. Although most compilers will take any integral type.
Ref C11 6.7.2.1:
A bit-field shall have a type that is a qualified or unqualified
version of _Bool, signed int, unsigned int, or some other
implementation-defined type.
Your compiler will probably allocate 1 byte of storage, but it is free to grab more.
Ref C11 6.7.2.1:
An implementation may allocate any addressable storage unit large
enough to hold a bit- field.
The savings comes when you have multiple bitfields that are declared one after another. In this case, the storage allocated will be packed if possible.
Ref C11 6.7.2.1:
If enough space remains, a bit-field that
immediately follows another bit-field in a structure shall be packed
into adjacent bits of the same unit. If insufficient space remains,
whether a bit-field that does not fit is put into the next unit or
overlaps adjacent units is implementation-defined.

Related

Does std::memcpy make its destination determinate?

Here is the code:
unsigned int a; // a is indeterminate
unsigned long long b = 1; // b is initialized to 1
std::memcpy(&a, &b, sizeof(unsigned int));
unsigned int c = a; // Is this not undefined behavior? (Implementation-defined behavior?)
Is a guaranteed by the standard to be a determinate value where we access it to initialize c? Cppreference says:
void* memcpy( void* dest, const void* src, std::size_t count );
Copies count bytes from the object pointed to by src to the object pointed to by dest. Both objects are reinterpreted as arrays of unsigned char.
But I don't see anywhere in cppreference that says if an indeterminate value is "copied to" like this, it becomes determinate.
From the standard, it seems it's analogous to this:
unsigned int a; // a is indeterminate
unsigned long long b = 1; // b is initialized to 1
auto* a_ptr = reinterpret_cast<unsigned char*>(&a);
auto* b_ptr = reinterpret_cast<unsigned char*>(&b);
a_ptr[0] = b_ptr[0];
a_ptr[1] = b_ptr[1];
a_ptr[2] = b_ptr[2];
a_ptr[3] = b_ptr[3];
unsigned int c = a; // Is this undefined behavior? (Implementation defined behavior?)
It seems like the standard leaves room for this to be allowed, because the type aliasing rules allow for the object a to be accessed as an unsigned char this way. But I can't find something that says this makes a no longer indeterminate.

Is this not undefined behavior
It's UB, because you're copying into the wrong type. [basic.types]2 and 3 permit byte copying, but only between objects of the same type. You copied from a long long into an int. That has nothing to do with the value being indeterminate. Even though you're only copying sizeof(int) bytes, the fact that you're not copying from an actual int means that you don't get the protection of those rules.
If you were copying into the value of the same type, then [basic.types]3 says that it's equivalent to simply assigning them. That is, a " shall subsequently hold the same value as" b.

TL;DR: It's implementation-defined whether there will be undefined behavior or not. Proof-style, with lines of code numbered:
unsigned int a;
The variable a is assumed to have automatic storage duration. Its lifetime begins (6.6.3/1). Since it is not a class, its lifetime begins with default initialization, in which no other initialization is performed (9.3/7.3).
unsigned long long b = 1ull;
The variable b is assumed to have automatic storage duration. Its lifetime begins (6.6.3/1). Since it is not a class, its lifetime begins with copy-initialization (9.3/15).
std::memcpy(&a, &b, sizeof(unsigned int));
Per 16.2/2, std::memcpy should have the same semantics and preconditions as the C standard library's memcpy. In the C standard 7.21.2.1, assuming sizeof(unsigned int) == 4, 4 characters are copied from the object pointed to by &b into the object pointed to by &a. (These two points are what is missing from other answers.)
At this point, the sizes of unsigned int, unsigned long long, their representations (e.g. endianness), and the size of a character are all implementation defined (to my understanding, see 6.7.1/4 and its note saying that ISO C 5.2.4.2.1 applies). I will assume that the implementation is little-endian, unsigned int is 32 bits, unsigned long long is 64 bits, and a character is 8 bits.
Now that I have said what the implementation is, I know that a has a value-representation for an unsigned int of 1u. Nothing, so far, has been undefined behavior.
unsigned int c = a;
Now we access a. Then, 6.7/4 says that
For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.
I know now that the value of a is determined by the implementation-defined value bits in a, which I know hold the value-representation for 1u. The value of a is then 1u.
Then like (2), the variable c is copy-initialized to 1u.
We made use of implementation-defined values to find what happens. It is possible that the implementation-defined value of 1ull is not one of the implementation-defined set of values for unsigned int. In that case, accessing a will be undefined behavior, because the standard doesn't say what happens when you access a variable with a value-representation that is invalid.
AFAIK, we can take advantage of the fact that most implementations define an unsigned int where any possible bit pattern is a valid value-representation. Therefore, there will be no undefined behavior.

Note: I updated this answer since by exploring the issue further in some of the comments has reveled cases where it would be implementation defined or even undefined in a case I did not consider originally (specifically in C++17 as well).
I believe that this is either implementation defined behavior in some cases and undefined in others (as another answer came to conclude for similar reasons). In a sense it's implementation defined if it's undefined behavior or implementation defined, so I am not sure if it being undefined in general takes precedence in such a classification.
Since std::memcpy works entirely on the object representation of the types in question (by aliasing the pointers given to unsigned char as is specified by 6.10/8.8 [basic.lval]). If the bits within the bytes in question of the unsigned long long are guaranteed to be something specific then you can manipulate them however you wish or write them into the object representation of any other type. The destination type will then use the bits to form its value based on its value representation (whatever that may be) as is defined in 6.9/4 [basic.types]:
The object representation of an object of type T is the sequence of N
unsigned char objects taken up by the object of type T, where N equals
sizeof(T). The value representation of an object is the set of bits
that hold the value of type T. For trivially copyable types, the value
representation is a set of bits in the object representation that
determines a value, which is one discrete element of an
implementation-defined set of values.
And that:
The intent is that the memory model of C++ is compatible with that of
ISO/IEC 9899 Programming Language C.
Knowing this, all that matters now is what the object representation of the integer types in question are. According to 6.9.1/7 [basic.fundemental]:
Types bool, char, char16_t, char32_t, wchar_t, and the signed and
unsigned integer types are collectively called integral types. A
synonym for integral type is integer type. The representations of
integral types shall define values by use of a pure binary numeration
system. [Example: This International Standard permits two’s
complement, ones’ complement and signed magnitude representations for
integral types. — end example ]
A footnote does clarify the definition of "binary numeration system" however:
A positional representation for integers that uses the binary digits 0
and 1, in which the values represented by successive bits are
additive, begin with 1, and are multiplied by successive integral
power of 2, except perhaps for the bit with the highest position.
(Adapted from the American National Dictionary for Information
Processing Systems.)
We also know that unsigned integers have the same value representation as signed integers, just under a modulus according to 6.9.1/4 [basic.fundemental]:
Unsigned integers shall obey the laws of arithmetic modulo 2^n where n
is the number of bits in the value representation of that particular
size of integer.
While this does not say exactly what the value representation may be, based on the specified definition of a binary numeration system, successive bits are to be additive powers of two as expected (rather than allowing the bits to be in any given order), with the exception of the maybe present sign bit. Additionally since signed and unsigned value representations this means an unsigned integer will stored as an increasing binary sequence up until 2^(n-1) (past then depending on how signed number are handled things are implementation defined).
There are still some other considerations however, such as endianness and how many padding bits may be present due to sizeof(T) only measuring the size of the object representation rather than the value representation (as stated before). Since in C++17 there is no standard way (I think) to check for endianness, this is the main factor that would leave this to be implementation defined in what the outcome would be. As for padding bits, while they may be present (but not specified where they will be from what I can tell other than the implication that they will not interrupt the contiguous sequence of bits forming the value representation of a integer), writing to them can prove potentially problematic. Since the intent of the C++ memory model is based on the C99 standard's memory model in a "comparable" way, a footnote from 6.2.6.2 (which is referenced in the C++20 standard as a note to remind that it's based on that) can be taken which say as follows:
Some combinations of padding bits might generate trap representations,
for example, if one padding bit is a parity bit. Regardless, no
arithmetic operation on valid values can generate a trap
representation other than as part of an exceptional condition such as
an overflow, and this cannot occur with unsigned types. All other
combinations of padding bits are alternative object representations of
the value specified by the value bits.
This implies that writing directly to padding bits incorrectly could potentially generate a trap representation from what I can tell.
This shows that in some cases depending on if padding bits are present and endianness, the result can be influenced in an implementation-defined manner. If some combination of padding bits is also a trap representation, this may become undefined behavior.
While not possible in C++17, in C++20 one can use std::endian in conjunction with std::has_unique_object_representations<T> (which was present in C++17) or some math with CHAR_BIT, UINT_MAX/ULLONG_MAX and the sizeof those types to ensure the expected endianness is correct as well as the absence of padding bits, allowing this to actually produce the expected result in a defined manner given what was previously established with how integers are said to be stored. Of course C++20 also further refines this and specifies that integer are to be stored in two's complement alone eliminating further implementation-specific issues.

What is the " : " (two dots) operator in c++ [duplicate]

What does the following C++ code mean?
unsigned char a : 1;
unsigned char b : 7;
I guess it creates two char a and b, and both of them should be one byte long, but I have no idea what the ": 1" and ": 7" part does.

The 1 and the 7 are bit sizes to limit the range of the values. They're typically found in structures and unions. For example, on some systems (depends on char width and packing rules, etc), the code:
typedef struct {
unsigned char a : 1;
unsigned char b : 7;
} tOneAndSevenBits;
creates an 8-bit value, one bit for a and 7 bits for b.
Typically used in C to access "compressed" values such as a 4-bit nybble which might be contained in the top half of an 8-bit char:
typedef struct {
unsigned char leftFour : 4;
unsigned char rightFour : 4;
} tTwoNybbles;
For the language lawyers amongst us, the 9.6 section of the C++11 standard explains this in detail, slightly paraphrased:
Bit-fields [class.bit]
A member-declarator of the form
identifieropt attribute-specifieropt : constant-expression
specifies a bit-field; its length is set off from the bit-field name by a colon. The optional attribute-specifier appertains to the entity being declared. The bit-field attribute is not part of the type of the class member.
The constant-expression shall be an integral constant expression with a value greater than or equal to zero. The value of the integral constant expression may be larger than the number of bits in the object representation of the bit-field’s type; in such cases the extra bits are used as padding bits and do not participate in the value representation of the bit-field.
Allocation of bit-fields within a class object is implementation-defined. Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit.
Note: bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. - end note

I believe those would be bitfields.

Strictly speaking, a bitfield must be a int, unsigned int, or _Bool. Although most compilers will take any integral type.
Ref C11 6.7.2.1:
A bit-field shall have a type that is a qualified or unqualified
version of _Bool, signed int, unsigned int, or some other
implementation-defined type.
Your compiler will probably allocate 1 byte of storage, but it is free to grab more.
Ref C11 6.7.2.1:
An implementation may allocate any addressable storage unit large
enough to hold a bit- field.
The savings comes when you have multiple bitfields that are declared one after another. In this case, the storage allocated will be packed if possible.
Ref C11 6.7.2.1:
If enough space remains, a bit-field that
immediately follows another bit-field in a structure shall be packed
into adjacent bits of the same unit. If insufficient space remains,
whether a bit-field that does not fit is put into the next unit or
overlaps adjacent units is implementation-defined.

Size of a structure having unsigned short ints

I was surfing in one of our organisational data documents and I came across the following piece of code.
struct A {
unsigned short int i:1;
unsigned short int j:1;
unsigned short int k:14;
};
int main(){
A aa;
int n = sizeof(aa);
cout << n;
}
Initially I thought the size will be 6 bytes as the size of the unsigned short int is 2 bytes. but the output of the above code was 2 bytes(On visual studio 2008).
Is there a slight possibility that the i:1, j:1 and k:14 makes it a bit field or something? Its just a guess and I am not very sure about it. Can somebody please help me in this?

Yes, this is bitfield, indeed.
Well, i'm not very much sure about c++, but In c99 standard, as per chapter 6.7.2.1 (10):
An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next bits or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
That makes your structure size (1 bit + 1 bit + 14 bits) = 16 bits = 2 bytes.
Note: No structure padding is considered here.
Edit:
As per C++14 standard, chapter §9.7,
A member-declarator of the form
identifieropt attribute-specifier-seqopt: constant-expression
specifies a bit-field; its length is set off from the bit-field name by a colon. [...] Allocation of bit-fields within a class object is
implementation-defined. Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit.

When is uint8_t ≠ unsigned char?

According to C and C++, CHAR_BIT >= 8.
But whenever CHAR_BIT > 8, uint8_t can't even be represented as 8 bits.
It must be larger, because CHAR_BIT is the minimum number of bits for any data type on the system.
On what kind of a system can uint8_t be legally defined to be a type other than unsigned char?
(If the answer is different for C and C++ then I'd like to know both.)

If it exists, uint8_t must always have the same width as unsigned char. However, it need not be the same type; it may be a distinct extended integer type. It also need not have the same representation as unsigned char; for instance, the bits could be interpreted in the opposite order. This is a silly example, but it makes more sense for int8_t, where signed char might be ones complement or sign-magnitude while int8_t is required to be twos complement.
One further "advantage" of using a non-char extended integer type for uint8_t even on "normal" systems is C's aliasing rules. Character types are allowed to alias anything, which prevents the compiler from heavily optimizing functions that use both character pointers and pointers to other types, unless the restrict keyword has been applied well. However, even if uint8_t has the exact same size and representation as unsigned char, if the implementation made it a distinct, non-character type, the aliasing rules would not apply to it, and the compiler could assume that objects of types uint8_t and int, for example, can never alias.

On what kind of a system can uint8_t be legally defined to be a type other than unsigned char?
In summary, uint8_t can only be legally defined on systems where CHAR_BIT is 8. It's an addressable unit with exactly 8 value bits and no padding bits.
In detail, CHAR_BIT defines the width of the smallest addressable units, and uint8_t can't have padding bits; it can only exist when the smallest addressable unit is exactly 8 bits wide. Providing CHAR_BIT is 8, uint8_t can be defined by a type definition for any 8-bit unsigned integer type that has no padding bits.
Here's what the C11 standard draft (n1570.pdf) says:
5.2.4.2.1 Sizes of integer types
1 The values given below shall be replaced by constant expressions suitable for use in #if
preprocessing directives. ... Their implementation-defined values shall be equal or
greater in magnitude (absolute value) to those shown, with the same sign.
-- number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8
Thus the smallest objects must contain exactly CHAR_BIT bits.
6.5.3.4 The sizeof and _Alignof operators
...
4 When sizeof is applied to an operand that has type char, unsigned
char, or signed char, (or a qualified version thereof) the result is
1. ...
Thus, those are (some of) the smallest addressable units. Obviously int8_t and uint8_t may also be considered smallest addressable units, providing they exist.
7.20.1.1 Exact-width integer types
1 The typedef name intN_t designates a signed integer type with width
N, no padding bits, and a two’s complement representation. Thus,
int8_t denotes such a signed integer type with a width of exactly 8
bits.
2 The typedef name uintN_t designates an unsigned integer type with
width N and no padding bits. Thus, uint24_t denotes such an unsigned
integer type with a width of exactly 24 bits.
3 These types are optional. However, if an implementation provides
integer types with widths of 8, 16, 32, or 64 bits, no padding bits,
and (for the signed types) that have a two’s complement
representation, it shall define the corresponding typedef names.
The emphasis on "These types are optional" is mine. I hope this was helpful :)

A possibility that no one has so far mentioned: if CHAR_BIT==8 and unqualified char is unsigned, which it is in some ABIs, then uint8_t could be a typedef for char instead of unsigned char. This matters at least insofar as it affects overload choice (and its evil twin, name mangling), i.e. if you were to have both foo(char) and foo(unsigned char) in scope, calling foo with an argument of type uint8_t would prefer foo(char) on such a system.

Extracting two signed integers from one given integer?

I have the following structure:
struct
{
int a:4;
int b:7;
int c:21;
} example;
I would like to combine a and b to form an integer d in C++. For instance, I would like the bit values of a to be on the left of the bit values of b in order to form integer d. How is this implemented in c++?
Example:
a= 1001
b = 1010101
I would like int d = 10011010101 xxxxxxxxxxxxxxxxxxxxx
where x can be 21 bits that belonged to d previously. I would like the values of a and b to be put in bit positions 0-3 and 4-10 respectively since a occupies the first 4 bits and b occupies the next 7 bits in the struct "example".
The part that I am confused about is that variable a and variable b both have a "sign" bit at the most significant bit. Does this affect the outcome? Are all bits in variable a and variable b used in the end result for integer d? Will integer d look like a concatenation of variable a's bits and variable b's bits?
Thanks

Note that whether an int bit-field is signed or unsigned is implementation-defined. The C++ standard says this, and the C standard achieves the same net result with different wording:
ISO/IEC 14882:2011 — C++
§7.1.6.2 Simple type specifiers
¶3 ... [ Note: It is implementation-defined whether objects of char type and certain bit-fields (9.6) are
represented as signed or unsigned quantities. The signed specifier forces char objects and bit-fields to be
signed; it is redundant in other contexts. —end note ]
§9.6 Bit-fields
¶3 ... A bit-field shall have integral or enumeration type (3.9.1). It is
implementation-defined whether a plain (neither explicitly signed nor unsigned) char, short, int, long,
or long long bit-field is signed or unsigned.
ISO/IEC 9899:2011 — C
§6.7.2.1 Structure and union specifiers
¶10 A bit-field is interpreted as having a signed or unsigned integer type consisting of the specified number of bits.125)
125) As specified in 6.7.2 above, if the actual type specifier used is int or a typedef-name defined as int, then it is implementation-defined whether the bit-field is signed or unsigned.
§6.7.2 Type specifiers
¶5 ... for bit-fields, it is implementation-defined whether the specifier int designates the same type as signed int or the same type as unsigned int.
The context of §6.7.2 shows that int can be combined with short, long etc and the rule will apply; C++ specifies that a bit more clearly. The signedness of plain char is implementation-defined already, of course.
Unsigned bit-fields
If the type of the bit-fields are unsigned, then the expression is fairly straight-forward:
int d = (example.a << 7) | example.b;
Signed bit-fields
If the values are signed, then you have a major interpretation exercise to undertake, deciding what the value should be if example.a is negative and example.b is positive, or vice versa. To some extent, the problem arises even if the values are both negative or both positive.
Suppose example.a = 7; and example.b = 12; — what should be the value of d? Probably the same expression applies, but you could argue that it would be better to shift by 1 fewer places:
assert(example.a >= 0 && example.b >= 0);
int d = (example.a << 6) | example.b; // Alternative interpretation
The other cases are left for you to decide; it depends on the interpretation you want to place on the values. For example:
int d = ((example.a & 0x0F) << 7) | (example.b & 0x7F);
This forces the signed values to be treated as unsigned. It probably isn't what you're after.
Modified question
example.a = 1001 // binary
example.b = 1010101 // binary
d = 10011010101 xxxxxxxxxxxxxxxxxxxxx
where x can be 21 bits that belonged to d previously.
For this to work, then you need:
d = (d & 0x001FFFFF) | ((((example.a & 0x0F) << 7) | (example.b & 0x7F)) << 21);
You probably can use fewer parentheses; I'm not sure I'd risk doing so.
Union
However, with this revised specification, you might well be tempted to look at a union such as:
union u
{
struct
{
int a:4;
int b:7;
int c:21;
} y;
int x;
} example;
However, the layout of the bits in the bit-fields w.r.t the bits in the int x; is not specified (they could be most significant bits first or least significant bits first), and there are always mutterings about 'if you access a value in a union that wasn't the last one assigned to you invoke undefined behaviour'. Thus you have multiple platform-defined aspects of the bit field to deal with. In fact, this sort of conundrum generally means that bit-fields are closely tied to one specific type of machine (CPU) and compiler and operating system. They are very, very non-portable at the level of detail you're after.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js