Is overflow of an unsigned bit field guaranteed to wrap-around?


Details
The reference for bit fields at cppreference presents the following example:
#include <iostream>
struct S {
// three-bit unsigned field,
// allowed values are 0...7
unsigned int b : 3;
};
int main()
{
S s = {7};
++s.b; // unsigned overflow (guaranteed wrap-around)
std::cout << s.b << '\n'; // output: 0
}
Emphasis on the guaranteed wrap-around comment.
However, WG21 CWG Issue 1816 describes some possible issues with unclear specification of bit field values, and [expr.post.incr]/1 in the latest standard draft states:
The value of a postfix ++ expression is the value of its operand. ...
If the operand is a bit-field that cannot represent the incremented value, the resulting value of the bit-field is implementation-defined.
I'm unsure, however, if this applies also for wrap-around of unsigned bit fields.
Question
Is overflow of an unsigned bit field guaranteed to wrap-around?

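Both [expr.post.incr]/1 and [expr.ass]/6 agree that integer overflow on a (signed or unsigned) bit-field is implementation-defined. The example uses prefix ++, which [expr.pre.incr] defines as equivalent to s.b += 1, so [expr.ass]/6 is the rule that applies there.
[expr.post.incr]/1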
[...] If the operand is a bit-field that cannot represent the incremented value, the resulting value of the bit-field is implementation-defined.
[expr.ass]/6
When the left operand of an assignment operator is a bit-field that cannot represent the value of the expression, the resulting value of the bit-field is implementation-defined.
I've fixed the cppreference page. Thank you for noticing.

Related

Do signed integers now behave differently with regard to left shift?

In C++20, signed integers are now defined to use two's complement,
see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r3.html
This is a welcome change, however one of the bullet-points caught my eye:
Change Left-shift on signed integer types produces the same
results as left-shift on the corresponding unsigned integer type.
This seems like a strange change. Will this not shift away the sign bit?

The C++17 wording for signed left shifts (E1 << E2) was:
Otherwise, if E1 has a signed type and non-negative value, and E1 × 2**E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.
Note that it speaks of being representable in "the corresponding unsigned type". So if you have a 32-bit signed integer whose value is the 0x7FFFFFFF, and you left-shift it by 1, the resulting shift is representable in a 32-bit unsigned integer (0xFFFFFFFE). But then this unsigned value gets converted into the result type. And converting an unsigned integer whose value is too big for the corresponding signed type is implementation-defined.
Overall, in C++17, left-shifting into the sign bit could happen through implementation-defined behavior, and even then only if you don't shift beyond the unsigned result type's size. Going past that is explicitly UB.
The C++20 wording, for both signed and unsigned integers, is:
The value of E1 << E2 is the unique value congruent to E1 × 2**E2 modulo 2**N, where N is the width of the type of the result.
Integer congruence modulo a number basically means cutting off the bits beyond the modulo number. The "width" of an integer is explicitly defined as:
The range of representable values for a signed integer type is −2**(N−1) to 2**(N−1)−1 (inclusive), where N is called the width of the type.
This means that for a 32-bit signed integer, the width N is 32, because the sign bit counts toward the width (N − 1 = 31 gives the range −2**31 to 2**31−1). So the result of a shift is reduced modulo 2**32 and then mapped to the unique congruent value in the signed range. Bits shifted beyond bit 31 are cut off, but a 1 can land in the sign bit: in C++20, INT_MAX << 1 is -2.
So in C++20, we have a harder guarantee: a signed left shift always yields the same bit pattern as the corresponding unsigned shift. This differs from C++17 in that the implementation variance and UB have been replaced by a single defined result.
So left shift into the sign bit was at best implementation-defined in C++17, and is fully defined in C++20.
The quoted bullet point probably refers to the fact that left shift on a negative number is now valid, that the result is well-defined whatever bits are shifted out (as long as the shift count is non-negative and less than the width), and that the wording for signed and unsigned shifting is now the same.

Yes, the left shifting signed integer behavior changed with C++20.
With C++17, left-shifting a positive signed integer into the sign bit invokes implementation defined behavior.1 Example:
int i = INT_MAX;
int j = i << 1; // implementation defined behavior with std < C++20
C++20 changed this to defined behavior because it mandates two's complement representation for signed integers.2,3
With C++17, shifting a negative signed integer invokes undefined behavior.1 Example:
int i = -1;
int j = i << 1; // undefined behavior with std < C++20
In C++20, this changed as well and this operation now also invokes defined behavior.3
This seems like a strange change. Will this not shift away the sign bit?
Yes, a signed left shift shifts away the sign bit. Example:
int i = 1 << (sizeof(int)*8-1); // C++20: defined behavior, set most significant bit
int j = i << 1; // C++20: defined behavior, set to 0
The main reason for specifying something as undefined or implementation defined behavior is to allow for efficient implementations on different hardware.
Nowadays, since all CPUs implement two's complement, it's natural that the C++ standard mandates it. And if you mandate two's complement, it's only consistent to make the above operations defined behavior, because this is also how left shift behaves in all two's complement instruction set architectures (ISAs).
IOW, leaving it implementation defined and undefined wouldn't buy you anything.
Or, if you liked the previous undefined behavior why would you care if it gets changed to defined behavior? You can still avoid this operation as before. You wouldn't have to change your code.
1
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned
type, the value of the result is E1 × 2**E2, reduced modulo one more than the maximum value representable in
the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2**E2 is representable
in the corresponding unsigned type of the result type, then that value, converted to the result type, is the
resulting value; otherwise, the behavior is undefined.
(C++17 final working draft, Section 8.8 Shift operators [expr.shift], Paragraph 2, page 132 - emphasis mine)
2
[..] For each value x of a signed integer type, the value of the
corresponding unsigned integer type congruent to x modulo 2**N has the same value of corresponding bits in
its value representation. This is also known as two’s complement representation. [..]
(C++20 latest working draft, Section 6.8.1 Fundamental types [basic.fundamental], Paragraph 3, page 66)
3
The value of E1 << E2 is the unique value congruent to E1 × 2**E2 modulo 2**N, where N is the width of the
type of the result. [Note: E1 is left-shifted E2 bit positions; vacated bits are zero-filled. — end note]
(C++20 latest working draft, Section 7.6.7 Shift operators [expr.shift], Paragraph 2, page 129, link mine)

Does setting a 1-bit wide bitfield to 2 mean the bitfield is set or unset?

So I have a bitfield like so:
unsigned int foobar:1;
And then I set it using this code
uint32_t code = loadCode();
structure.foobar = code & 2;
So, if code is set to 2, would this mean that foobar is set to 1, 0, or undefined? The exact standard I'm using is actually C++11, not plain C.

[expr.ass]/6:
When the left operand of an assignment operator is a bit-field that
cannot represent the value of the expression, the resulting value of
the bit-field is implementation-defined.
Similarly, for initialization:
When initializing a bit-field with a value that it cannot represent,
the resulting value of the bit-field is implementation-defined.
This is added by DR 1816. As a defect report that fixes a bug in the standard, it is de facto retroactive.

Value of a+b and char type

I am working in C++ and I had (as an exercise) to write on paper 2 answers.
The first question: if we have the following declarations and initialisations of variables:
unsigned char x=250, z=x+7, a='8';
What is the value of the expression?
z|(a-'0') // (here | is bitwise disjunction)
We have unsigned char, so the number z=x+7 is reduced mod 256, thus, after writing the numbers in binary, the answer is 9.
The next question: a and b are int variables, a=1 and b=32767.
The range of int is [-32768, 32767]. We don't have an unsigned type here. My question is: what is the value of a+b? How does this work with signed data types if the value of a certain variable is greater than the range of that data type?

The next question: a and b are int variables, a=1 and b=32767.
[...]My question is: what is the value of a+b?
It's undefined behavior. We can't tell you what it will be. We could make a reasonable guess, but as far as C++ is concerned, signed integer overflow is undefined behavior.

There is no operator+(unsigned char, unsigned char) in C++, it first promotes these unsigned char arguments to int and only then does the addition, so that the type of the expression is int.
And then that int whose value is too big to fit in unsigned char gets converted to unsigned char.
The standard says:
A prvalue of an integer type can be converted to a prvalue of another integer type. A prvalue of an unscoped enumeration type can be converted to a prvalue of an integer type. If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2**n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). — end note ]

For the second question, the result is not determined by the language.
You can check what your implementation does like this:
#include <iostream>
using namespace std;
int main()
{
int a = 1;
int b = 32767;
int c = a+b;
cout << c << endl;
}
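The result will depend on your machine: with a 32-bit int the sum is simply 32768, while with a 16-bit int the addition overflows and the behavior is undefined.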

Should use of bit-fields of type int be discouraged? [closed]

From the Draft C++ Standard (N3337):
9.6 Bit-fields
4 If the value true or false is stored into a bit-field of type bool of any size (including a one bit bit-field), the original bool value and the value of the bit-field shall compare equal. If the value of an enumerator is stored into a bit-field of the same enumeration type and the number of bits in the bit-field is large enough to hold all the values of that enumeration type (7.2), the original enumerator value and the value of the bit-field shall compare equal.
The standard is non-committal about any such behavior for bit-fields of other types. To understand how g++ (4.7.3) deals with other types of bit-fields, I used the following test program:
#include <iostream>
enum TestEnum
{
V1 = 0,
V2
};
struct Foo
{
bool d1:1;
TestEnum d2:1;
int d3:1;
unsigned int d4:1;
};
int main()
{
Foo foo;
foo.d1 = true;
foo.d2 = V2;
foo.d3 = 1;
foo.d4 = 1;
std::cout << std::boolalpha;
std::cout << "d1: " << foo.d1 << std::endl;
std::cout << "d2: " << foo.d2 << std::endl;
std::cout << "d3: " << foo.d3 << std::endl;
std::cout << "d4: " << foo.d4 << std::endl;
std::cout << std::endl;
std::cout << (foo.d1 == true) << std::endl;
std::cout << (foo.d2 == V2) << std::endl;
std::cout << (foo.d3 == 1) << std::endl;
std::cout << (foo.d4 == 1) << std::endl;
return 0;
}
The output:
d1: true
d2: 1
d3: -1
d4: 1
true
true
false
true
I was surprised by the lines of the output corresponding to Foo::d3. The output is the same at ideone.com.
Since the standard is non-committal about the comparison of bit-fields of type int, g++ does not seem to be in violation of the standard. That brings me to my questions.
Is use of bit-fields of type int a bad idea? Should it be discouraged?

Yes, bit fields of type int are a bad idea, because their signedness is implementation-defined. Use signed int or unsigned int instead.
For non-bitfield declarations, the type name int is exactly equivalent to signed int (or int signed, or signed). The same pattern is followed for short, long, and long long: the unadorned type name is the signed version, and you have to add the unsigned keyword to name the corresponding unsigned type.
Bit fields, for historical reasons, are a special case. A bit-field defined with the type int is equivalent either to the same declaration with signed int, or to the same declaration with unsigned int. The choice is implementation-defined (i.e., it's up to the compiler, not to the programmer). A bit field is the only context in which int and signed int aren't (necessarily) synonymous. The same applies to char, short, long, and long long.
Quoting the C++11 standard, section 9.6 [class.bit]:
It is implementation-defined whether a plain (neither explicitly
signed nor unsigned) char, short, int, long, or long long bit-field is
signed or unsigned.
(I'm not entirely sure of the rationale for this. Very old versions of C didn't have the unsigned keyword, and unsigned bit fields are usually more useful than signed bit fields. It may be that some early C compilers implemented bit fields before the unsigned keyword was introduced. Making bit fields unsigned by default, even when declared as int, may have been just a matter of convenience. There's no real reason to keep the rule other than to avoid breaking old code.)
Most bit fields are intended to be unsigned, which of course means that they should be defined that way.
If you want a signed bit field (say, a 4-bit field that can represent values from -8 to +7, or from -7 to +7 on a non-two's-complement system), then you should explicitly define it as signed int. If you define it as int, then some compilers will treat it as unsigned int.
If you don't care whether your bit field is signed or unsigned, then you can define it as int -- but if you're defining a bit field, then you almost certainly do care whether it's signed or unsigned.

You can absolutely use unsigned bit-fields of any size no greater than the size of an unsigned int. While signed bit-fields are legal (at least if the width is greater than one), I personally prefer not to use them. If, however, you do want to use a signed bit-field, you should explicitly mark it as signed because it is implementation-dependent as to whether an unqualified int bit-field is signed or unsigned. (This is similar to char, but without the complicating feature of explicitly unqualified char* literals.)
So to that extent, I agree that int bit-fields should be discouraged. [Note 1] While I don't know of any implementation in which an int bitfield is implicitly unsigned, it is certainly allowed by the standard, and consequently there is lots of opportunity for implementation-specific unanticipated behaviour if you are not explicit about signs.
The standards specify that a signed integer representation consists of optional padding bits, exactly one sign bit, and value bits. While the standard does not guarantee that there is at least one value bit, -- and as the example in the OP shows, gcc does not insist that there be -- I think it is a plausible interpretation of the standard, since it explicitly allows there to be no padding bits, and does not have any such wording corresponding to value bits.
In any case, there are only three possible signed representations allowed:
2's complement, in which a single-bit field consisting of a 1 should be interpreted as -1
1's complement and sign-magnitude. In both of these cases, a single-bit field consisting of a 1 is allowed to be a trap representation, so the only number which can be represented in a 1-bit signed bit-field is 0.
Since portable code cannot assume that a 1-bit signed bit-field can represent any non-zero value, it seems reasonable to insist that a signed bit-field have at least 2 bits, regardless of whether you interpret the standard(s) to actually require that or not.
Notes:
1. Indeed, if it were not for the fact that string literals are explicitly unqualified, I would prefer to always specify unsigned char. But there's no way to roll back history on that point.

int is signed, and C++ implementations may use two's complement, so the most significant bit of the field may store the sign. When a signed int bit-field has 2 bits, it can be equal to 1; see it working.

This is perfectly logical. int is a signed integer type, and if the underlying architecture uses two's complement to represent signed integers (as all modern architectures do), then the high-order bit is the sign bit. So a 1-bit signed integer bitfield can take the values 0 or -1. And a 3-bit signed integer bitfield, for instance, can take values between -4 and 3 inclusive.
There is no reason for a blanket ban on signed integer bitfields, as long as you understand two's complement representation.

How to merge two signed bit variables into one signed bit variable?

Suppose the following c++ code:
#include <iostream>
using namespace std;
typedef struct
{
int a: 5;
int b: 4;
int c: 1;
int d: 22;
} example;
int main()
{
example blah;
blah.a = -5; // 11011
blah.b = -3; // 1101
int result = blah.a << 4 | blah.b;
cout << "Result = " << result << endl; // equals 445 , but I am interested in this having a value of -67
return 0;
}
I am interested in having the variable result be of type int where the 9th bit is the most significant bit. I would like this to be the case so that result = -67 instead of 445. How is this done? Thanks.

See Sign Extending an int in C for a closely related question (but not a duplicate).
You need to be aware that almost everything about bit fields is 'implementation defined'. In particular, it is not clear that you can assign negative numbers to a 'plain int' bit-field; you have to know whether your implementation uses 'plain int is signed' or 'plain int is unsigned'. Deciding which bit is the 9th gets tricky too: are you counting from 0 or 1, and which end of the set of bit-fields is at bit 0 and which at bit 31 (counting the least significant bit (LSB) as bit 0 and the most significant bit (MSB) as bit 31 of a 32-bit quantity)? Indeed, the size of your structure need not be 32 bits; the compiler might have different rules for the layout.
With all those caveats out of the way, you have a 9-bit value formed from (blah.a << 4) | blah.b, and you want that sign-extended as if it was a 9-bit 2's complement number being promoted to (32-bit) int.
The function in the cross-referenced answer could do the job:
#include <assert.h>
#include <limits.h>
extern int getFieldSignExtended(int value, int hi, int lo);
enum { INT_BITS = CHAR_BIT * sizeof(int) };
int getFieldSignExtended(int value, int hi, int lo)
{
assert(lo >= 0);
assert(hi > lo);
assert(hi < INT_BITS - 1);
int bits = (value >> lo) & ((1 << (hi - lo + 1)) - 1);
if (bits & (1 << (hi - lo)))
return(bits | (~0 << (hi - lo)));
else
return(bits);
}
Invoke it as:
int result = getFieldSignExtended((blah.a << 4) | blah.b, 8, 0);
If you want to hard-wire the numbers, you can write:
int x = (blah.a << 4) | blah.b;
int result = (x & (1 << 8)) ? (x | (~0 << 8)) : x;
Note I'm assuming the 9th bit is bit 8 of a value with bits 0..8 in it. Adjust if you have some other interpretation in mind.
Working code
Compiled with g++ (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44) from a RHEL 5 x86/64 machine.
#include <iostream>
using namespace std;
typedef struct
{
int a: 5;
int b: 4;
int c: 1;
int d: 22;
} example;
int main()
{
example blah;
blah.a = -5; // 11011
blah.b = -3; // 1101
int result = blah.a << 4 | blah.b;
cout << "Result = " << result << endl;
int x = (blah.a << 4) | blah.b;
cout << "x = " << x << endl;
int result2 = (x & (1 << 8)) ? (x | (~0 << 8)) : x;
cout << "Result2 = " << result2 << endl;
return 0;
}
Sample output:
Result = 445
x = 445
Result2 = -67
ISO/IEC 14882:2011 — C++ Standard
§7.1.6.2 Simple type specifiers
¶3 ... [ Note: It is implementation-defined whether objects of char type and certain bit-fields (9.6) are
represented as signed or unsigned quantities. The signed specifier forces char objects and bit-fields to be
signed; it is redundant in other contexts. —end note ]
§9.6 Bit-fields [class.bit]
¶1 A member-declarator of the form
identifier(opt) attribute-specifier-seq(opt): constant-expression
specifies a bit-field; its length is set off from the bit-field name by a colon. The optional attribute-specifier-seq
appertains to the entity being declared. The bit-field attribute is not part of the type of the class
member. The constant-expression shall be an integral constant expression with a value greater than or equal
to zero. The value of the integral constant expression may be larger than the number of bits in the object
representation (3.9) of the bit-field’s type; in such cases the extra bits are used as padding bits and do not
participate in the value representation (3.9) of the bit-field. Allocation of bit-fields within a class object is
implementation-defined. Alignment of bit-fields is implementation-defined. Bit-fields are packed into some
addressable allocation unit. [ Note: Bit-fields straddle allocation units on some machines and not on others.
Bit-fields are assigned right-to-left on some machines, left-to-right on others. —end note ]
¶2 A declaration for a bit-field that omits the identifier declares an unnamed bit-field. Unnamed bit-fields
are not members and cannot be initialized. [ Note: An unnamed bit-field is useful for padding to conform
to externally-imposed layouts. —end note ] As a special case, an unnamed bit-field with a width of zero
specifies alignment of the next bit-field at an allocation unit boundary. Only when declaring an unnamed
bit-field may the value of the constant-expression be equal to zero.
¶3 A bit-field shall not be a static member. A bit-field shall have integral or enumeration type (3.9.1). It is
implementation-defined whether a plain (neither explicitly signed nor unsigned) char, short, int, long,
or long long bit-field is signed or unsigned. A bool value can successfully be stored in a bit-field of any
nonzero size. The address-of operator & shall not be applied to a bit-field, so there are no pointers to bit-fields.
A non-const reference shall not be bound to a bit-field (8.5.3). [ Note: If the initializer for a reference
of type const T& is an lvalue that refers to a bit-field, the reference is bound to a temporary initialized to
hold the value of the bit-field; the reference is not bound to the bit-field directly. See 8.5.3. —end note ]
¶4 If the value true or false is stored into a bit-field of type bool of any size (including a one bit bit-field),
the original bool value and the value of the bit-field shall compare equal. If the value of an enumerator is
stored into a bit-field of the same enumeration type and the number of bits in the bit-field is large enough
to hold all the values of that enumeration type (7.2), the original enumerator value and the value of the
bit-field shall compare equal. [ Example:
enum BOOL { FALSE=0, TRUE=1 };
struct A {
BOOL b:1;
};
A a;
void f() {
a.b = TRUE;
if (a.b == TRUE) // yields true
{ /* ... */ }
}
—end example ]
ISO/IEC 9899:2011 — C2011 Standard
The C standard has essentially the same effect, but the information is presented somewhat differently.
6.7.2.1 Structure and union specifiers
¶4 The expression that specifies the width of a bit-field shall be an integer constant
expression with a nonnegative value that does not exceed the width of an object of the
type that would be specified were the colon and expression omitted.122) If the value is
zero, the declaration shall have no declarator.
¶5 A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed
int, unsigned int, or some other implementation-defined type. It is
implementation-defined whether atomic types are permitted.
¶9 ... In addition, a member may be declared to consist of a
specified number of bits (including a sign bit, if any). Such a member is called a
bit-field;124) its width is preceded by a colon.
¶10 A bit-field is interpreted as having a signed or unsigned integer type consisting of the
specified number of bits.125) If the value 0 or 1 is stored into a nonzero-width bit-field of
type _Bool, the value of the bit-field shall compare equal to the value stored; a _Bool
bit-field has the semantics of a _Bool.
¶11 An implementation may allocate any addressable storage unit large enough to hold a bit-field.
If enough space remains, a bit-field that immediately follows another bit-field in a
structure shall be packed into adjacent bits of the same unit. If insufficient space remains,
whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is
implementation-defined. The order of allocation of bit-fields within a unit (high-order to
low-order or low-order to high-order) is implementation-defined. The alignment of the
addressable storage unit is unspecified.
¶12 A bit-field declaration with no declarator, but only a colon and a width, indicates an
unnamed bit-field.126) As a special case, a bit-field structure member with a width of 0
indicates that no further bit-field is to be packed into the unit in which the previous bit-field,
if any, was placed.
122) While the number of bits in a _Bool object is at least CHAR_BIT, the width (number of sign and
value bits) of a _Bool may be just 1 bit.
124) The unary & (address-of) operator cannot be applied to a bit-field object; thus, there are no pointers to
or arrays of bit-field objects.
125) As specified in 6.7.2 above, if the actual type specifier used is int or a typedef-name defined as int,
then it is implementation-defined whether the bit-field is signed or unsigned.
126) An unnamed bit-field structure member is useful for padding to conform to externally imposed
layouts.
Annex J of the standard defines Portability Issues, and §J.3 defines Implementation-defined Behaviour. In part, it says:
J.3.9 Structures, unions, enumerations, and bit-fields
¶1 — Whether a ‘‘plain’’ int bit-field is treated as a signed int bit-field or as an
unsigned int bit-field (6.7.2, 6.7.2.1).
— Allowable bit-field types other than _Bool, signed int, and unsigned int
(6.7.2.1).
— Whether atomic types are permitted for bit-fields (6.7.2.1).
— Whether a bit-field can straddle a storage-unit boundary (6.7.2.1).
— The order of allocation of bit-fields within a unit (6.7.2.1).
