What is the maximal bit width for a struct bit-field?
struct i { long long i : 127; };
Can I define a bit-field inside a struct with a width of up to 128 bits, 256 bits, or larger? There are extra-wide vector types, like SSE2 (128-bit), AVX1/AVX2 (256-bit), and AVX-512 (512-bit, for the next Xeon Phis) registers, and also extensions like __int128 in GCC.
C99 §6.7.2.1, paragraph 3:
The expression that specifies the
width of a bit-field shall be an
integer constant expression that has
nonnegative value that shall not
exceed the number of bits in an object
of the type that is specified if the
colon and expression are omitted. If
the value is zero, the declaration
shall have no declarator.
C++0x §9.6, paragraph 1:
... The constant-expression shall be an
integral constant expression with a
value greater than or equal to zero.
The value of the integral constant
expression may be larger than the
number of bits in the object
representation (3.9) of the
bit-field’s type; in such cases the
extra bits are used as padding bits
and do not participate in the value
representation (3.9) of the bit-field.
So in C you can't do that at all, and in C++ it won't do what you want it to.
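A small sketch of that difference (the struct name is made up; GCC and Clang typically accept the C++ form with a warning): in C the declaration is a constraint violation, while in C++ it compiles but the extra bits become padding, so the value range is still that of the underlying type.
#include <cstdio>

struct Wide {
    unsigned long long v : 127;   // ill-formed in C; valid C++, but bits 64..126 are padding
};

int main() {
    Wide w{};
    w.v = ~0ULL;                                       // only the 64 value bits participate
    std::printf("%llu\n", (unsigned long long)w.v);    // prints 18446744073709551615
}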
The C++ Standard sets no limits on the size of a bit-field, other than that it must be greater than or equal to zero - section 9.6/1. It also says:
Bit-fields are packed into some
addressable allocation unit. [Note:
bit-fields straddle allocation units
on some machines and not on others.
Bit-fields are assigned right-to-left
on some machines, left-to-right on
others. ]
Which I suppose could be taken to indicate some sort of maximum size.
This does not mean that your specific compiler implementation supports arbitrarily sized bit-fields, of course.
Typically, you cannot allocate more bits than the underlying type has. If long long is 64 bits, then your bit-field is probably limited to :64.
Since the values of bit-fields are assigned to integers, I'd assume that the largest bit-field width you can use is that of intmax_t.
Edit:
From the C99 Spec:
6.7.2.1 Bullet 9:
A bit-field is interpreted as a signed
or unsigned integer type consisting of
the specified number of bits. If
the value 0 or 1 is stored into a
nonzero-width bit-field of type
_Bool, the value of the bit-field shall compare equal to the value
stored.
6.7.2.1 Bullet 10:
An implementation may allocate any
addressable storage unit large enough
to hold a bit-field. If enough space
remains, a bit-field that immediately
follows another bit-field in a
structure shall be packed into
adjacent bits of the same unit. If
insufficient space remains, whether a
bit-field that does not fit is put into
the next unit or overlaps adjacent
units is implementation-defined. The
order of allocation of bit-fields
within a unit (high-order to low-order
or low-order to high-order) is
implementation-defined. The alignment
of the addressable storage unit is
unspecified.
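As a rough illustration of that paragraph (the layout is implementation-defined; the sizes shown are what GCC and Clang typically produce on x86-64), adjacent small bit-fields get packed into one allocation unit:
#include <cstdint>
#include <iostream>

struct Packed {
    std::uint32_t a : 3;
    std::uint32_t b : 5;
    std::uint32_t c : 8;    // a, b and c usually share one 32-bit allocation unit
};

int main() {
    std::cout << sizeof(Packed) << '\n';   // typically prints 4
}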
Context:
This is mainly a follow-up to that other question. The OP wanted to guess whether a variable contained an int or not, and my first thought was that in C (as in C++) an int variable could only contain an int value. Then Eric Postpischil reminded me that the standard allows trap representations for the int type...
Of course, I know that most modern systems only use two's-complement representations of integers and no padding bits, meaning that no trap representation can be observed. Nevertheless, both standards still seem to allow three representations of signed types: sign and magnitude, ones' complement, and two's complement. And at least the C18 draft (N2310, 6.2.6 Representations of types) explicitly allows padding bits for integer types other than char.
Question
So in the context of possible padding bits, or non two's complement signed representation, int variables could contain trap values for conformant implementations. Is there a reliable way to make sure that an int variable contains a valid value?
In C++'s current working draft (for C++20), an integer cannot have a trap representation. An integer is mandated as two's complement: ([basic.fundamental]/3)
An unsigned integer type has the same object representation, value representation, and alignment requirements ([basic.align]) as the corresponding signed integer type.
For each value x of a signed integer type, the value of the corresponding unsigned integer type congruent to x modulo 2^N has the same value of corresponding bits in its value representation.41
[ Example: The value −1 of a signed integer type has the same representation as the largest value of the corresponding unsigned type.
— end example
]
Where the note 41 says
This is also known as two's complement representation.
This was changed in p0907.
Additionally, padding bits in integers cannot cause traps: ([basic.fundamental]/4)
Each set of values for any padding bits ([basic.types]) in the object representation are alternative representations of the value specified by the value representation.
[ Note: Padding bits have unspecified value, but do not cause traps.
See also ISO C 6.2.6.2.
— end note
]
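A small sketch of what that buys you in practice (assuming a C++20 implementation with a 32-bit int): any byte pattern copied into an int is a valid value, so reading it back cannot trap.
#include <cstring>
#include <cstdio>

int main() {
    unsigned char bytes[sizeof(int)];
    std::memset(bytes, 0xFF, sizeof bytes);    // an arbitrary all-ones bit pattern
    int value;
    std::memcpy(&value, bytes, sizeof value);  // every pattern is a valid int in C++20
    std::printf("%d\n", value);                // -1 under the mandated two's complement
}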
Suppose I have some legacy code which cannot be changed unless a bug is discovered, and it contains this code:
bool data[32];
memset(data, 0, sizeof(data));
Is this a safe way to set all bool in the array to a false value?
More generally, is it safe to memset a bool to 0 in order to make its value false?
Is it guaranteed to work on all compilers? Or do I need to request a fix?
Is it guaranteed by the law? No.
C++ says nothing about the representation of bool values.
Is it guaranteed by practical reality? Yes.
I mean, if you wish to find a C++ implementation that does not represent boolean false as a sequence of zeroes, I shall wish you luck. Given that false must implicitly convert to 0, and true must implicitly convert to 1, and 0 must implicitly convert to false, and non-0 must implicitly convert to true … well, you'd be silly to implement it any other way.
Whether that means it's "safe" is for you to decide.
I don't usually say this, but if I were in your situation I would be happy to let this slide. If you're really concerned, you can add a test executable to your distributable to validate the precondition on each target platform before installing the real project.
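A minimal sketch of that validation idea (it assumes that reading a memset bool is at least observable on the target, which is exactly the property being tested):
#include <cstring>
#include <cstdlib>

int main() {
    bool b = true;
    std::memset(&b, 0, sizeof b);    // the pattern the legacy code produces
    if (b != false) {
        std::abort();                // the representation assumption does not hold here
    }
}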
No. It is not safe (or more specifically, portable). However, it likely works by virtue of the fact that your typical implementation will:
use 0 to represent a boolean (actually, the C++ specification requires it)
generate an array of elements that memset() can deal with.
However, best practice would dictate using bool data[32] = {false}; - additionally, this will likely free the compiler up to represent the array differently internally, since using memset() could result in it generating a 32-byte array of values rather than, say, a single 4-byte value that will fit nicely within your average CPU register.
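For reference, a few equivalent initializations that avoid memset() altogether (a sketch; the names are arbitrary):
#include <algorithm>
#include <array>
#include <iterator>

int main() {
    bool data[32] = {};                                    // value-initializes every element to false
    std::fill(std::begin(data), std::end(data), false);    // explicit refill, also well-defined

    std::array<bool, 32> data2{};                          // same idea with std::array
    (void)data; (void)data2;
}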
Update
P1236R1: Alternative Wording for P0907R4 Signed Integers are Two's Complement says the following:
As per EWG decision in San Diego, deviating from P0907R3, bool is specified to have some integral type as its underlying type, but the presence of padding bits for "bool" will remain unspecified, as will the mapping of true and false to values of the underlying type.
Original Answer
I believe this is unspecified, although it seems likely the underlying representation of false would be all zeros. Boost.Container relies on this as well (emphasis mine):
Boost.Container uses std::memset with a zero value to initialize some
types as in most platforms this initialization yields to the desired
value initialization with improved performance.
Following the C11 standard, Boost.Container assumes that for any
integer type, the object representation where all the bits are zero
shall be a representation of the value zero in that type. Since
_Bool/wchar_t/char16_t/char32_t are also integer types in C, it considers all C++ integral types as initializable via std::memset.
This C11 quote they point to as a rationale actually comes from a C99 defect report, defect 263: all-zero bits representations, which added the following:
For any integer type, the object representation where all the bits are
zero shall be a representation of the value zero in that type.
So then the question here is: is that assumption correct, i.e. are the underlying object representations of integers compatible between C and C++?
The proposal Resolving the difference between C and C++ with regards to object representation of integers sought to answer this to some extent, but as far as I can tell it was not resolved. I cannot find conclusive evidence of this in the draft standard. We have a couple of cases where it links to the C standard explicitly with respect to types. Section 3.9.1 [basic.fundamental] says:
[...] The signed and unsigned integer types shall satisfy the
constraints given in the C standard, section 5.2.4.2.1.
and 3.9 [basic.types] which says:
The object representation of an object of type T is the sequence of N
unsigned char objects taken up by the object of type T, where N equals
sizeof(T). The value representation of an object is the set of bits
that hold the value of type T. For trivially copyable types, the value
representation is a set of bits in the object representation that
determines a value, which is one discrete element of an
implementation-defined set of values.44
where footnote 44 (which is not normative) says:
The intent is that the memory model of C++ is compatible with that of
ISO/IEC 9899 Programming Language C.
The farthest the draft standard gets to specifying the underlying representation of bool is in section 3.9.1:
Types bool, char, char16_t, char32_t, wchar_t, and the signed and
unsigned integer types are collectively called integral types.50 A
synonym for integral type is integer type. The representations of
integral types shall define values by use of a pure binary numeration
system.51 [ Example: this International Standard permits 2’s
complement, 1’s complement and signed magnitude representations for
integral types. —end example ]
the section also says:
Values of type bool are either true or false.
but all we know of true and false is:
The Boolean literals are the keywords false and true. Such literals
are prvalues and have type bool.
and we know they are convertible to 0 and 1:
A prvalue of type bool can be converted to a prvalue of type int, with
false becoming zero and true becoming one.
but this gets us no closer to the underlying representation.
As far as I can tell, the only place where the standard referenced the actual underlying bit values besides padding bits was removed via defect report 1796: Is all-bits-zero for null characters a meaningful requirement?:
It is not clear that a portable program can examine the bits of the representation; instead, it would appear to be limited to examining the bits of the numbers corresponding to the value representation (3.9.1 [basic.fundamental] paragraph 1). It might be more appropriate to require that the null character value compare equal to 0 or '\0' rather than specifying the bit pattern of the representation.
There are more defect reports that deal with the gaps in the standard with respect to what is a bit and difference between the value and object representation.
Practically, I would expect this to work, but I would not consider it safe, since we cannot nail this down in the standard. Whether you need to change it is not clear; you clearly have a non-trivial trade-off involved. So, assuming it works now, the question is whether it is likely to break with future versions of various compilers, and that is unknown.
From 3.9.1/7:
Types bool, char, char16_t, char32_t, wchar_t, and the signed and
unsigned integer types are collectively called integral types. A
synonym for integral type is integer type. The representations of
integral types shall define values by use of a pure binary numeration
system.
Given this I can't see any possible implementation of bool that wouldn't represent false as all 0 bits.
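If you want to see what your particular implementation does, a quick (non-portable, purely observational) sketch is to dump the object representation of a bool:
#include <cstddef>
#include <cstdio>

int main() {
    bool b = false;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&b);
    for (std::size_t i = 0; i < sizeof b; ++i)
        std::printf("%02x ", p[i]);   // prints "00" on every mainstream implementation
    std::printf("\n");
}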
There is a quote from cppreference:
Every object type has the property called alignment requirement, which
is an integer value (of type std::size_t, always a power of 2)
representing the number of bytes between successive addresses at which
objects of this type can be allocated.
I understand this reference is non-normative, but I cannot find anything in the standard about the value of alignof(T), other than that it is no more than alignof(std::max_align_t).
It is not obvious that an alignment must be a power of 2. Why can an alignment not be, say, 3?
The standard has the final word for the language, so here is a quote of that section. The power-of-2 requirement is in paragraph 4:
3.11 Alignment [basic.align]
1 Object types have alignment requirements (3.9.1, 3.9.2) which place restrictions on the addresses at which an object of that type may be allocated. An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated. An object type imposes an alignment requirement on every object of that type; stricter alignment can be requested using the alignment specifier (7.6.2).
2 A fundamental alignment is represented by an alignment less than or equal to the greatest alignment supported by the implementation in all contexts, which is equal to alignof(std::max_align_t) (18.2). The alignment required for a type might be different when it is used as the type of a complete object and when it is used as the type of a subobject. [ Example:
struct B { long double d; };
struct D : virtual B { char c; };
When D is the type of a complete object, it will have a subobject of type B, so it must be aligned appropriately for a long double. If D appears as a subobject of another object that also has B as a virtual base class, the B subobject might be part of a different subobject, reducing the alignment requirements on the D subobject. —end example ] The result of the alignof operator reflects the alignment requirement of the type in the complete-object case.
3 An extended alignment is represented by an alignment greater than alignof(std::max_align_t). It is implementation-defined whether any extended alignments are supported and the contexts in which they are supported (7.6.2). A type having an extended alignment requirement is an over-aligned type. [ Note: every over-aligned type is or contains a class type to which extended alignment applies (possibly through a non-static data member). —end note ]
4 Alignments are represented as values of the type std::size_t. Valid alignments include only those values returned by an alignof expression for the fundamental types plus an additional implementation-defined set of values, which may be empty. Every alignment value shall be a non-negative integral power of two.
5 Alignments have an order from weaker to stronger or stricter alignments. Stricter alignments have larger alignment values. An address that satisfies an alignment requirement also satisfies any weaker valid alignment requirement.
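To see paragraph 4 in action, here is a small sketch (the struct and the printed values are illustrative; exact numbers depend on the implementation):
#include <cstddef>
#include <iostream>

struct alignas(16) Vec4 { float v[4]; };   // request a stricter, power-of-two alignment

int main() {
    std::cout << alignof(char) << ' '                  // 1
              << alignof(int) << ' '                   // typically 4
              << alignof(Vec4) << ' '                  // 16
              << alignof(std::max_align_t) << '\n';    // typically 16
    // alignas(3) would be ill-formed: 3 is not a power of two, hence not a valid alignment.
}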
Why did all implementations conform to that requirement (That's part of the reason it could be included at all)?
Well, because it is natural to multiply / divide / mask powers of 2 in binary, and all systems were (excluding some really ancient ones), are, and for the foreseeable future will stay fundamentally binary.
Being natural means these operations are much more efficient than multiplication, division, or modulo arithmetic with any other base, sometimes by orders of magnitude.
As @MooingDuck points out, this fundamental binary nature of computing platforms has already pervaded the language and its standard to such an extent that trying to build a non-binary conforming implementation is about on par with untying the Gordian knot without just cutting it. There are really few computer languages where that's not true.
Also, see a table of word sizes on Wikipedia for corroboration.
That's how computers are built.
A computer has a natural 'word' size that is handled more easily than other sizes. On 64-bit CPUs, the size is 8 bytes. Operating on 8 bytes is most efficient. The hardware is built in a way that fetching memory aligned to this word size is also more efficient. So alignment is usually based on the CPU's word size.
Word sizes are powers of two because, again, that's how computers are built. Everything comes down to bits - so does the number of bits in a word. It's easier to design the hardware where the number of bits in a word is itself a power of two.
Does the C++ standard mandate that the non-negative range of a standard signed integer type is at least as big as the negative range?
EDIT: Please note that I am referring to the non-negative range here, not the positive range which is obviously one smaller than the non-negative range.
EDIT: If we assume C++11, the answer is "Yes". See my clarifications below. From the point of view of C++03, the answer is probably "No".
The same question can be posed as follows: Does the standard guarantee that the result of a - b is representable in a standard signed integer type T assuming that both a and b are negative values of type T, and that a ≥ b?
I know that the standard allows for two's complement, ones' complement, and sign magnitude representation of negative values (see C++11 section 3.9.1 [basic.fundamental] paragraph 7), but I am not sure whether it demands the use of one of those three representations. Probably not.
If we assume one of these three representations, and we assume that there is no "spurious" restrictions on either of the two ranges (negative, and non-negative), then it is indeed true that the non-negative range is at least as big as the negative one. In fact, with two's complement the size of the two ranges will be equal, and with the two other representations, the size of the non-negative range will be one greater than the size of the negative one.
However, even if we assume one of the mentioned representations, it is really not enough to guarantee anything about the size of either range.
What I am seeking here, is a section (or set of sections) that unambiguously provides the desired guarantee.
Any help will be appreciated.
Note that something like the following would suffice: Every bit within the "storage slot" of the integer has one, and only one of the following functions:
Unused
Sign bit (exclusively, or mixed sign/value bit)
Value bit (participating in the value)
I have a vague memory that C99 says something along those lines. Anyone that knows anything about that?
Alright, C99 (with TC3) does provide the necessary guarantees in section 6.2.6.2 "Integer types" paragraph 2:
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; there shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M ≤ N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:
the corresponding value with sign bit 0 is negated (sign and magnitude);
the sign bit has the value −(2^N) (two's complement);
the sign bit has the value −(2^N − 1) (ones' complement).
Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones' complement, if this representation is a normal value it is called a negative zero.
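To make the three options concrete, take an 8-bit pattern with sign bit 1 and value bits 0000001 (assuming no padding bits): under sign and magnitude it denotes -1, under two's complement it denotes 1 - 2^7 = -127, and under ones' complement it denotes 1 - (2^7 - 1) = -126.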
Can someone confirm that this part of C99 is also a binding part of C++11?
I have taken another careful look at both the C99 and the C++11 standards, and it is clear that the guarantees in C99 section 6.2.6.2 paragraph 2 are binding in C++11 too.
C89/C90 does not provide the same guarantees, so we do need C99, which means that we do need C++11.
In summary, C++11 (and C99) provides the following guarantees:
Negative values in fundamental signed integer types (standard + extended) must be represented using one of the following three representations: two's complement, ones' complement, or sign and magnitude.
The size of the non-negative range is one greater than, or equal to, the size of the negative range for all fundamental signed integer types (standard + extended).
The second guarantee can be restated as follows:
-1 ≤ min<T> + max<T> ≤ 0
for any fundamental signed integer type T (standard + extended), where min<T> and max<T> are shorthands for std::numeric_limits<T>::min() and std::numeric_limits<T>::max() respectively.
Also, if we assume that a and b are values of the same, or of different fundamental signed integer types (standard or extended), then it follows that a - b is well defined and representable in decltype(a - b) as long as a and b are either both negative or both non-negative.
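A quick sketch that exercises the restated guarantee on a concrete implementation (expect -1 for two's complement, 0 for the other two representations):
#include <cstdio>
#include <limits>

template <class T>
void check(const char* name) {
    long long lo = std::numeric_limits<T>::min();
    long long hi = std::numeric_limits<T>::max();
    std::printf("%s: min + max = %lld\n", name, lo + hi);   // -1 on two's-complement targets
}

int main() {
    check<signed char>("signed char");
    check<short>("short");
    check<int>("int");
    check<long long>("long long");
}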
The standard does not seem to mandate such a thing, although I may be missing key passages. All we know about fundamental signed integral types is in 3.9.1/2:
There are five standard signed integer types: “signed char”, “short
int”, “int”, “long int”, and “long long int”. In this list, each type
provides at least as much storage as those preceding it in the list.
And in 3.9.1/7:
Types bool, char, char16_t, char32_t, wchar_t, and the signed and
unsigned integer types are collectively called integral types.48 A
synonym for integral type is integer type. The representations of
integral types shall define values by use of a pure binary numeration
system.
Neither of these passages seems to say anything about the respective positive and negative ranges. Even so, I can't conceive of a binary representation that wouldn't meet your needs.
From the standard (4.7) it looks like the conversion from int to unsigned int, when they both use the same number of bits, is purely conceptual:
If the destination type is unsigned, the resulting value is the least
unsigned integer congruent to the source integer (modulo 2^n where n
is the number of bits used to represent the unsigned type). [ Note: In
a two’s complement representation, this conversion is conceptual and
there is no change in the bit pattern (if there is no truncation). —
end note ]
So in this direction the conversion preserves the bitmask. I am not sure the standard guarantees the same for the conversion from unsigned int to int (again, assuming the same number of bits are used). The standard here says:
If the destination type is signed, the value is unchanged if it can be
represented in the destination type (and bit-field width); otherwise,
the value is implementation-defined.
What exactly does "the destination type" mean here? For instance, 2^32-1 cannot be represented by a 32-bit int. Does that mean that it cannot be represented in the destination type, and therefore that it cannot be assumed that the bit pattern will stay the same?
You cannot assume anything.
The first quote doesn't state that the bitmask remains the same. It may be the same in two's complement, but not in one's complement or other representations.
Second, implementation-defined means implementation-defined, you can't assume anything in general.
In theory, the representation can be completely different after each conversion. That's it.
If you look at it realistically, things become more concrete. Usually, ints are stored in two's complement, and signed->unsigned preserves the bit pattern, as unsigned->signed does (since the result can be implementation-defined, the cheapest choice is to do nothing).
int is the destination type in this case. As you say, 2^32-1 cannot be represented in it, so the result is implementation-specific. Although, I've only ever seen it preserve bit patterns.
EDIT: I should add that in the embedded world, when one storage location needs multiple representations that are bit-for-bit identical, we often use unions.
So in this case:
#include <stdint.h>

union FOO {
    int32_t signedVal;
    uint32_t unsignedVal;
} var;
var can be accessed as var.signedVal to get the 32 bits stored as a signed int and var.unsignedVal to get the 32 bits stored as an unsigned value. In this case bits will be preserved.
"Destination type" refers to the type you're assigning/casting to.
The whole paragraph means a 32-bit unsigned int converted to a 32-bit signed int will stay as-is, provided the value fits into the signed int. If it doesn't fit, what happens is up to the implementation (e.g. truncation). That means it really depends on the implementation whether the bits stay the same or are changed (there's no guarantee).
Or in other words: If the unsigned int uses its most significant bit, the result is no longer predictable. Otherwise there's no change (other than changing the "name on the box").
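A short sketch of that round trip under the common two's-complement behaviour (the unsigned-to-signed direction is implementation-defined before C++20, but modulo 2^N, and therefore bit-preserving, in practice):
#include <cstdint>
#include <cstdio>

int main() {
    std::int32_t  s = -1;
    std::uint32_t u = static_cast<std::uint32_t>(s);     // well-defined: 0xFFFFFFFF
    std::int32_t  back = static_cast<std::int32_t>(u);   // impl-defined pre-C++20; -1 in practice
    std::printf("%08X %d\n", static_cast<unsigned>(u), back);   // FFFFFFFF -1
}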