How to merge two signed bit variables into one signed bit variable? - c++

Suppose the following c++ code:
#include <iostream>
using namespace std;
typedef struct
{
int a: 5;
int b: 4;
int c: 1;
int d: 22;
} example;
int main()
{
example blah;
blah.a = -5; // 11011
blah.b = -3; // 1101
int result = blah.a << 4 | blah.b;
cout << "Result = " << result << endl; // equals 445 , but I am interested in this having a value of -67
return 0;
}
I am interested in having the variable result be of type int where the 9th bit is the most significant bit. I would like this to be the case so that result = -67 instead of 445. How is this done? Thanks.

See Sign Extending an int in C for a closely related question (but not a duplicate).
You need to be aware that almost everything about bit-fields is 'implementation-defined'. In particular, it is not clear that you can assign negative numbers to a 'plain int' bit-field; you have to know whether your implementation treats plain int bit-fields as signed or unsigned. Deciding which bit is the 9th gets tricky too: are you counting from 0 or 1, and which end of the set of bit-fields is at bit 0 and which is at bit 31 (counting the least significant bit (LSB) as bit 0 and the most significant bit (MSB) as bit 31 of a 32-bit quantity)? Indeed, the size of your structure need not be 32 bits; the compiler might have different rules for the layout.
With all those caveats out of the way, you have a 9-bit value formed from (blah.a << 4) | blah.b, and you want that sign-extended as if it was a 9-bit 2's complement number being promoted to (32-bit) int.
The function in the cross-referenced answer could do the job:
#include <assert.h>
#include <limits.h>
extern int getFieldSignExtended(int value, int hi, int lo);
enum { INT_BITS = CHAR_BIT * sizeof(int) };
int getFieldSignExtended(int value, int hi, int lo)
{
assert(lo >= 0);
assert(hi > lo);
assert(hi < INT_BITS - 1);
int bits = (value >> lo) & ((1 << (hi - lo + 1)) - 1);
if (bits & (1 << (hi - lo)))
return(bits | (~0 << (hi - lo)));
else
return(bits);
}
Invoke it as:
int result = getFieldSignExtended((blah.a << 4) | blah.b, 8, 0);
If you want to hard-wire the numbers, you can write:
int x = (blah.a << 4) | blah.b;
int result = (x & (1 << 8)) ? (x | (~0 << 8)) : x;
Note I'm assuming the 9th bit is bit 8 of a value with bits 0..8 in it. Adjust if you have some other interpretation in mind.
Working code
Compiled with g++ (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44) on a RHEL 5 x86/64 machine.
#include <iostream>
using namespace std;
typedef struct
{
int a: 5;
int b: 4;
int c: 1;
int d: 22;
} example;
int main()
{
example blah;
blah.a = -5; // 11011
blah.b = -3; // 1101
int result = blah.a << 4 | blah.b;
cout << "Result = " << result << endl;
int x = (blah.a << 4) | blah.b;
cout << "x = " << x << endl;
int result2 = (x & (1 << 8)) ? (x | (~0 << 8)) : x;
cout << "Result2 = " << result2 << endl;
return 0;
}
Sample output:
Result = 445
x = 445
Result2 = -67
ISO/IEC 14882:2011 — C++ Standard
§7.1.6.2 Simple type specifiers
¶3 ... [ Note: It is implementation-defined whether objects of char type and certain bit-fields (9.6) are
represented as signed or unsigned quantities. The signed specifier forces char objects and bit-fields to be
signed; it is redundant in other contexts. —end note ]
§9.6 Bit-fields [class.bit]
¶1 A member-declarator of the form
identifier(opt) attribute-specifier-seq(opt) : constant-expression
specifies a bit-field; its length is set off from the bit-field name by a colon. The optional attribute-specifier-seq
appertains to the entity being declared. The bit-field attribute is not part of the type of the class
member. The constant-expression shall be an integral constant expression with a value greater than or equal
to zero. The value of the integral constant expression may be larger than the number of bits in the object
representation (3.9) of the bit-field’s type; in such cases the extra bits are used as padding bits and do not
participate in the value representation (3.9) of the bit-field. Allocation of bit-fields within a class object is
implementation-defined. Alignment of bit-fields is implementation-defined. Bit-fields are packed into some
addressable allocation unit. [ Note: Bit-fields straddle allocation units on some machines and not on others.
Bit-fields are assigned right-to-left on some machines, left-to-right on others. —end note ]
¶2 A declaration for a bit-field that omits the identifier declares an unnamed bit-field. Unnamed bit-fields
are not members and cannot be initialized. [ Note: An unnamed bit-field is useful for padding to conform
to externally-imposed layouts. —end note ] As a special case, an unnamed bit-field with a width of zero
specifies alignment of the next bit-field at an allocation unit boundary. Only when declaring an unnamed
bit-field may the value of the constant-expression be equal to zero.
¶3 A bit-field shall not be a static member. A bit-field shall have integral or enumeration type (3.9.1). It is
implementation-defined whether a plain (neither explicitly signed nor unsigned) char, short, int, long,
or long long bit-field is signed or unsigned. A bool value can successfully be stored in a bit-field of any
nonzero size. The address-of operator & shall not be applied to a bit-field, so there are no pointers to bit-fields.
A non-const reference shall not be bound to a bit-field (8.5.3). [ Note: If the initializer for a reference
of type const T& is an lvalue that refers to a bit-field, the reference is bound to a temporary initialized to
hold the value of the bit-field; the reference is not bound to the bit-field directly. See 8.5.3. —end note ]
¶4 If the value true or false is stored into a bit-field of type bool of any size (including a one bit bit-field),
the original bool value and the value of the bit-field shall compare equal. If the value of an enumerator is
stored into a bit-field of the same enumeration type and the number of bits in the bit-field is large enough
to hold all the values of that enumeration type (7.2), the original enumerator value and the value of the
bit-field shall compare equal. [ Example:
enum BOOL { FALSE=0, TRUE=1 };
struct A {
BOOL b:1;
};
A a;
void f() {
a.b = TRUE;
if (a.b == TRUE) // yields true
{ /* ... */ }
}
—end example ]
ISO/IEC 9899:2011 — C2011 Standard
The C standard has essentially the same effect, but the information is presented somewhat differently.
6.7.2.1 Structure and union specifiers
¶4 The expression that specifies the width of a bit-field shall be an integer constant
expression with a nonnegative value that does not exceed the width of an object of the
type that would be specified were the colon and expression omitted.122) If the value is
zero, the declaration shall have no declarator.
¶5 A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed
int, unsigned int, or some other implementation-defined type. It is
implementation-defined whether atomic types are permitted.
¶9 ... In addition, a member may be declared to consist of a
specified number of bits (including a sign bit, if any). Such a member is called a
bit-field;124) its width is preceded by a colon.
¶10 A bit-field is interpreted as having a signed or unsigned integer type consisting of the
specified number of bits.125) If the value 0 or 1 is stored into a nonzero-width bit-field of
type _Bool, the value of the bit-field shall compare equal to the value stored; a _Bool
bit-field has the semantics of a _Bool.
¶11 An implementation may allocate any addressable storage unit large enough to hold a bit-field.
If enough space remains, a bit-field that immediately follows another bit-field in a
structure shall be packed into adjacent bits of the same unit. If insufficient space remains,
whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is
implementation-defined. The order of allocation of bit-fields within a unit (high-order to
low-order or low-order to high-order) is implementation-defined. The alignment of the
addressable storage unit is unspecified.
¶12 A bit-field declaration with no declarator, but only a colon and a width, indicates an
unnamed bit-field.126) As a special case, a bit-field structure member with a width of 0
indicates that no further bit-field is to be packed into the unit in which the previous bit-field,
if any, was placed.
122) While the number of bits in a _Bool object is at least CHAR_BIT, the width (number of sign and
value bits) of a _Bool may be just 1 bit.
124) The unary & (address-of) operator cannot be applied to a bit-field object; thus, there are no pointers to
or arrays of bit-field objects.
125) As specified in 6.7.2 above, if the actual type specifier used is int or a typedef-name defined as int,
then it is implementation-defined whether the bit-field is signed or unsigned.
126) An unnamed bit-field structure member is useful for padding to conform to externally imposed
layouts.
Annex J of the standard defines Portability Issues, and §J.3 defines Implementation-defined Behaviour. In part, it says:
J.3.9 Structures, unions, enumerations, and bit-fields
¶1 — Whether a ‘‘plain’’ int bit-field is treated as a signed int bit-field or as an
unsigned int bit-field (6.7.2, 6.7.2.1).
— Allowable bit-field types other than _Bool, signed int, and unsigned int
(6.7.2.1).
— Whether atomic types are permitted for bit-fields (6.7.2.1).
— Whether a bit-field can straddle a storage-unit boundary (6.7.2.1).
— The order of allocation of bit-fields within a unit (6.7.2.1).

Related

What is the " : " (two dots) operator in c++ [duplicate]

What does the following C++ code mean?
unsigned char a : 1;
unsigned char b : 7;
I guess it creates two char a and b, and both of them should be one byte long, but I have no idea what the ": 1" and ": 7" part does.
The 1 and the 7 are bit sizes to limit the range of the values. They're typically found in structures and unions. For example, on some systems (depends on char width and packing rules, etc), the code:
typedef struct {
unsigned char a : 1;
unsigned char b : 7;
} tOneAndSevenBits;
creates an 8-bit value, one bit for a and 7 bits for b.
Bit-fields are typically used in C to access "compressed" values such as a 4-bit nybble which might be contained in the top half of an 8-bit char:
typedef struct {
unsigned char leftFour : 4;
unsigned char rightFour : 4;
} tTwoNybbles;
For the language lawyers amongst us, the 9.6 section of the C++11 standard explains this in detail, slightly paraphrased:
Bit-fields [class.bit]
A member-declarator of the form
identifier(opt) attribute-specifier(opt) : constant-expression
specifies a bit-field; its length is set off from the bit-field name by a colon. The optional attribute-specifier appertains to the entity being declared. The bit-field attribute is not part of the type of the class member.
The constant-expression shall be an integral constant expression with a value greater than or equal to zero. The value of the integral constant expression may be larger than the number of bits in the object representation of the bit-field’s type; in such cases the extra bits are used as padding bits and do not participate in the value representation of the bit-field.
Allocation of bit-fields within a class object is implementation-defined. Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit.
Note: bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. - end note
I believe those would be bitfields.
Strictly speaking, a bitfield must be an int, unsigned int, or _Bool, although most compilers will take any integral type.
Ref C11 6.7.2.1:
A bit-field shall have a type that is a qualified or unqualified
version of _Bool, signed int, unsigned int, or some other
implementation-defined type.
Your compiler will probably allocate 1 byte of storage, but it is free to grab more.
Ref C11 6.7.2.1:
An implementation may allocate any addressable storage unit large
enough to hold a bit-field.
The savings come when you have multiple bitfields that are declared one after another. In this case, the storage allocated will be packed if possible.
Ref C11 6.7.2.1:
If enough space remains, a bit-field that
immediately follows another bit-field in a structure shall be packed
into adjacent bits of the same unit. If insufficient space remains,
whether a bit-field that does not fit is put into the next unit or
overlaps adjacent units is implementation-defined.

Is overflow of an unsigned bit field guaranteed to wrap-around?

Details
The reference for bit fields at cppreference presents the following example:
#include <iostream>
struct S {
// three-bit unsigned field,
// allowed values are 0...7
unsigned int b : 3;
};
int main()
{
S s = {7};
++s.b; // unsigned overflow (guaranteed wrap-around)
std::cout << s.b << '\n'; // output: 0
}
Emphasis on the guaranteed wrap-around comment.
However, WG21 CWG Issue 1816 describes some possible issues with unclear specification of bit-field values, and [expr.post.incr]/1 in the latest standard draft states:
The value of a postfix ++ expression is the value of its operand. ...
If the operand is a bit-field that cannot represent the incremented value, the resulting value of the bit-field is implementation-defined.
I'm unsure, however, if this applies also for wrap-around of unsigned bit fields.
Question
Is overflow of an unsigned bit field guaranteed to wrap-around?
Both [expr.post.incr]/1 and [expr.ass]/6 agree that integer overflow on a (signed or unsigned) bit-field is implementation-defined.
[expr.post.incr]/1
[...] If the operand is a bit-field that cannot represent the incremented value, the resulting value of the bit-field is implementation-defined.
[expr.ass]/6
When the left operand of an assignment operator is a bit-field that cannot represent the value of the expression, the resulting value of the bit-field is implementation-defined.
I've fixed the cppreference page. Thank you for noticing.

Are there any guarantees on the representation of large enum values?

Suppose I have (on a 32 bit machine)
enum foo {
val1 = 0x7FFFFFFF, // originally '2^31 - 1'
val2,
val3 = 0xFFFFFFFF, // originally '2^32 - 1'
val4,
val5
};
what is the value of val2, val4 and val5? I know I could test it, but is the result standardized?
In C standard:
C11 (n1570), § 6.7.2.2 Enumeration specifiers
Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined, but shall be capable of representing the values of all the members of the enumeration.
If the underlying type used by the compiler cannot represent these values, the behavior is undefined.
C11 (n1570), § 4. Conformance
If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined.
From the C++11 standard (§7.2/6, emphasis mine):
For an enumeration whose underlying type is not fixed, the underlying type is an integral type that can represent all the enumerator values defined in the enumeration. If no integral type can represent all the enumerator values, the enumeration is ill-formed. It is implementation-defined which integral type is used as the underlying type except that the underlying type shall not be larger than int unless the value of an enumerator cannot fit in an int or unsigned int.
So the compiler will happily do The Right Thing if there is an integral type bigger than 32 bits. If not, the enum is ill-formed. There will be no wrapping around.
The values will be:
enum foo {
val1 = 0x7FFFFFFF,
val2, // 0x80000000 = 2^31
val3 = 0xFFFFFFFF,
val4, //0x0000000100000000 = 2^32
val5 //0x0000000100000001 = 2^32+1
};
The increasing numbers are well defined as well (§7.2/2):
[...] An enumerator-definition without an initializer gives the enumerator the value obtained by increasing the value of the previous enumerator by one.
C99 / C11
Prelude:
5.2.4.2.1 requires int to be at least 16 bits wide; AFAIK there's no upper bound (long must be longer or equal, though, 6.2.5 /8).
6.5 /5:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.
If your `int` is 32 bits wide (or less)
then the example in the OP is a violation of constraint 6.7.2.2 /2:
The expression that defines the value of an enumeration constant shall be an integer
constant expression that has a value representable as an int.
Furthermore, the enumerators are defined as constant of type int, 6.7.2.2 /3:
The identifiers in an enumerator list are declared as constants that have type int and
may appear wherever such are permitted.
Note, there's a difference between the type of the enumeration and the type of an enumerator / enumeration constant:
enum foo { val0 };
enum foo myVariable; // myVariable has the type of the enumeration
uint_least8_t v = val0*'c'; // if val0 appears in any expression, it has type int
It seems to me this allows narrowing, e.g. reducing the size of the enum type to 8 bits:
enum foo { val1 = 1, val2 = 5 };
enum foo myVariable = val1; // allowed to be 8-bit
But it seems to disallow widening, e.g.
enum foo { val1 = INT_MAX+1 }; // constraint violation AND undefined behaviour
// not sure about the following, we're already in UB-land
enum foo myVariable = val1; // maximum value of an enumerator still is INT_MAX
// therefore myVariable will have sizeof int
Auto-increment of enumerators
Because of 6.7.2.2 /3,
[...] Each subsequent enumerator with no = defines its enumeration constant as the value of the constant expression obtained by adding 1 to the value of the previous enumeration constant. [...]
the example results in UB:
enum foo {
val0 = INT_MAX,
val1 // equivalent to `val1 = INT_MAX+1`
};
Here's the C++ answer: in 7.2/6, it states:
[...] the underlying type is an integral type that can represent all
the enumerator values defined in the enumeration. If no integral type
can represent all the enumerator values, the enumeration is
ill-formed. It is implementation-defined which integral type is used
as the underlying type except that the underlying type shall not be
larger than int unless the value of an enumerator cannot fit in an int
or unsigned int.
So compared to C: no undefined behavior if the compiler can't find a type, and the compiler can't just use its 512-bit extended integer type for your two-value enum.
Which means that in your example, the underlying type will probably be some signed 64-bit type - most compilers always try the signed version of a type first.

Extracting two signed integers from one given integer?

I have the following structure:
struct
{
int a:4;
int b:7;
int c:21;
} example;
I would like to combine a and b to form an integer d in C++. For instance, I would like the bit values of a to be on the left of the bit values of b in order to form integer d. How is this implemented in c++?
Example:
a= 1001
b = 1010101
I would like int d = 10011010101 xxxxxxxxxxxxxxxxxxxxx
where x can be 21 bits that belonged to d previously. I would like the values of a and b to be put in bit positions 0-3 and 4-10 respectively since a occupies the first 4 bits and b occupies the next 7 bits in the struct "example".
The part that I am confused about is that variable a and variable b both have a "sign" bit at the most significant bit. Does this affect the outcome? Are all bits in variable a and variable b used in the end result for integer d? Will integer d look like a concatenation of variable a's bits and variable b's bits?
Thanks
Note that whether an int bit-field is signed or unsigned is implementation-defined. The C++ standard says this, and the C standard achieves the same net result with different wording:
ISO/IEC 14882:2011 — C++
§7.1.6.2 Simple type specifiers
¶3 ... [ Note: It is implementation-defined whether objects of char type and certain bit-fields (9.6) are
represented as signed or unsigned quantities. The signed specifier forces char objects and bit-fields to be
signed; it is redundant in other contexts. —end note ]
§9.6 Bit-fields
¶3 ... A bit-field shall have integral or enumeration type (3.9.1). It is
implementation-defined whether a plain (neither explicitly signed nor unsigned) char, short, int, long,
or long long bit-field is signed or unsigned.
ISO/IEC 9899:2011 — C
§6.7.2.1 Structure and union specifiers
¶10 A bit-field is interpreted as having a signed or unsigned integer type consisting of the specified number of bits.125)
125) As specified in 6.7.2 above, if the actual type specifier used is int or a typedef-name defined as int, then it is implementation-defined whether the bit-field is signed or unsigned.
§6.7.2 Type specifiers
¶5 ... for bit-fields, it is implementation-defined whether the specifier int designates the same type as signed int or the same type as unsigned int.
The context of §6.7.2 shows that int can be combined with short, long etc and the rule will apply; C++ specifies that a bit more clearly. The signedness of plain char is implementation-defined already, of course.
Unsigned bit-fields
If the bit-fields are unsigned, then the expression is fairly straightforward:
int d = (example.a << 7) | example.b;
Signed bit-fields
If the values are signed, then you have a major interpretation exercise to undertake, deciding what the value should be if example.a is negative and example.b is positive, or vice versa. To some extent, the problem arises even if the values are both negative or both positive.
Suppose example.a = 7; and example.b = 12; what should the value of d be? Probably the same expression applies, but you could argue that it would be better to shift by one fewer place:
assert(example.a >= 0 && example.b >= 0);
int d = (example.a << 6) | example.b; // Alternative interpretation
The other cases are left for you to decide; it depends on the interpretation you want to place on the values. For example:
int d = ((example.a & 0x0F) << 7) | (example.b & 0x7F);
This forces the signed values to be treated as unsigned. It probably isn't what you're after.
Modified question
example.a = 1001 // binary
example.b = 1010101 // binary
d = 10011010101 xxxxxxxxxxxxxxxxxxxxx
where x can be 21 bits that belonged to d previously.
For this to work, then you need:
d = (d & 0x001FFFFF) | ((((example.a & 0x0F) << 7) | (example.b & 0x7F)) << 21);
You probably can use fewer parentheses; I'm not sure I'd risk doing so.
Union
However, with this revised specification, you might well be tempted to look at a union such as:
union u
{
struct
{
int a:4;
int b:7;
int c:21;
} y;
int x;
} example;
However, the layout of the bits in the bit-fields with respect to the bits in int x; is not specified (they could be most significant bits first or least significant bits first), and there are always mutterings that if you access a member of a union other than the one last assigned to, you invoke undefined behaviour. Thus you have multiple platform-defined aspects of the bit-fields to deal with. In fact, this sort of conundrum generally means that bit-fields are closely tied to one specific type of machine (CPU), compiler, and operating system. They are very, very non-portable at the level of detail you're after.