Why is the size of an enum always 2 or 4 bytes (on a 16- or 32-bit architecture respectively), regardless of the number of enumerators in the type?
Does the compiler treat an enum like it does a union?
In both C and C++, the size of an enum type is implementation-defined, and is the same as the size of some integer type.
A common approach is to make all enum types the same size as int, simply because that's typically the type that makes for the most efficient access. Making it a single byte, for example, would save a very minor amount of space, but could require bigger and slower code to access it, depending on the CPU architecture.
In C, enumeration constants are by definition of type int. So given:
enum foo { zero, one, two };
enum foo obj;
the expression zero is of type int, but obj is of type enum foo, which may or may not have the same size as int. Given that the constants are of type int, it tends to be easier to make the enumerated type the same size.
In C++, the rules are different; the constants are of the enumerated type. But again, it often makes the most sense for each enum type to be one "word", which is typically the size of int, for efficiency reasons.
And the 2011 ISO C++ standard added the ability to specify the underlying integer type for an enum type. For example, you can now write:
enum foo: unsigned char { zero, one, two };
which guarantees that both the type foo and the constants zero, one, and two have a size of 1 byte. C does not have this feature, and it's not supported by older pre-2011 C++ compilers (unless they provide it as a language extension).
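As a quick check, here is a minimal, self-contained sketch (C++11 or later); the static_asserts simply verify the sizes claimed above:

enum foo : unsigned char { zero, one, two };

static_assert(sizeof(foo) == 1, "fixed underlying type makes sizeof(foo) 1");
static_assert(sizeof(zero) == 1, "the constants have the same size as foo");

int main() { return 0; }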
(Digression follows.)
So what if you have an enumeration constant too big to fit in an int? You don't need 2^31, or even 2^15, distinct constants to do this:
#include <limits.h>
enum huge { big = INT_MAX, bigger };
The value of big is INT_MAX, which is typically 2^31-1, but can be as small as 2^15-1 (32767). The value of bigger is implicitly big + 1.
In C++, this is ok; the compiler will simply choose an underlying type for huge that's big enough to hold the value INT_MAX + 1. (Assuming there is such a type; if int is 64 bits and there's no integer type bigger than that, that won't be possible.)
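Here is a small sketch of that C++ case, assuming a typical platform where long long is wider than int; the compiler quietly picks a wider underlying type (often unsigned int), so bigger keeps the value INT_MAX + 1:

#include <climits>

enum huge { big = INT_MAX, bigger };

static_assert(bigger == static_cast<long long>(INT_MAX) + 1,
              "bigger is big + 1, held in a wider underlying type");

int main() { return 0; }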
In C, since enumeration constants are of type int, the above is invalid. It violates the constraint stated in N1570 6.7.2.2p2:
The expression that defines the value of an enumeration constant shall
be an integer constant expression that has a value representable as an
int.
and so a compiler must reject it, or at least warn about it. gcc, for example, says:
error: overflow in enumeration values
An enum is not a structure, it's just a way of giving names to a set of integers. The size of a variable with this type is just the size of the underlying integer type. This will be a type needed to hold the largest value in the enum. So as long as all the types fit in the same integer type, the size won't change.
The size of an enum is implementation-defined -- the compiler is allowed to choose whatever size it wants, as long as it's large enough to fit all of the values. Some compilers choose to use 4-byte enums for all enum types, while some compilers will choose the smallest type (e.g. 1, 2, or 4 bytes) which can fit the enum values. The C and C++ language standards allow both of these behaviors.
From C99 §6.7.2.2/4:
Each enumerated type shall be compatible with char, a signed integer type, or an
unsigned integer type. The choice of type is implementation-defined,110) but shall be
capable of representing the values of all the members of the enumeration.
From C++03 §7.2/5:
The underlying type of an enumeration is an integral type that can represent all the enumerator values
defined in the enumeration. It is implementation-defined which integral type is used as the underlying type
for an enumeration except that the underlying type shall not be larger than int unless the value of an enumerator
cannot fit in an int or unsigned int. If the enumerator-list is empty, the underlying type is
as if the enumeration had a single enumerator with value 0. The value of sizeof() applied to an enumeration
type, an object of enumeration type, or an enumerator, is the value of sizeof() applied to the
underlying type.
It seems to me that the OP has assumed that an enum is some kind of collection which stores the values declared in it. This is incorrect.
An enumeration in C/C++ is simply a numeric variable with a strictly defined value range. The names of the enumerators are essentially aliases for numbers.
The storage size is not influenced by the number of values in the enumeration. The storage size is implementation-defined, but in most cases it is simply sizeof(int).
The size of an enum is "an integral type at least large enough to contain any of the values specified in the declaration". Many compilers will just use an int (possibly unsigned), but some will use a char or short, depending on optimization or other factors. An enum with less than 128 possible values would fit in a char (256 for unsigned char), and you would have to have 32768 (or 65536) values to overflow a short, and either 2 or 4 billion values to outgrow an int on most modern systems.
An enum is essentially just a better way of defining a bunch of different constants. Instead of this:
#define FIRST 0
#define SECOND 1
...
you just:
enum myenum
{
    FIRST,
    SECOND,
    ...
};
It helps avoid assigning duplicate values by mistake, and removes your need to even care what the particular values are (unless you really need to).
The big problem with making an enum type smaller than int when a smaller type could fit all the values is that it would make the ABI for a translation unit dependent on the number of enumeration constants. For instance, suppose you have a library that uses an enum type with 256 constants as part of its public interface, and the compiler chooses to represent the type as a single byte. Now suppose you add a new feature to the library and now need 257 constants. The compiler would have to switch to a new size/representation, and now all object files compiled for the old interface would be incompatible with your updated library; you would have to recompile everything to make it work again.
Thus, any sane implementation always uses int for enum types.
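To see the problem concretely, here is a hedged sketch: GCC and Clang offer a non-standard -fshort-enums option that makes them pick the smallest sufficient type, and with it the layout of any struct embedding the enum depends on how many constants the enum has (all names here are illustrative):

#include <iostream>

enum tag_v1 { V1_MAX = 255 };   // fits in one byte
enum tag_v2 { V2_MAX = 256 };   // needs at least two bytes

struct packet_v1 { tag_v1 tag; };
struct packet_v2 { tag_v2 tag; };

int main()
{
    // Built with `g++ -fshort-enums`, the sizes typically differ (1 vs 2),
    // which is exactly the ABI break described above; without the flag
    // both are usually sizeof(int).
    std::cout << sizeof(packet_v1) << ' ' << sizeof(packet_v2) << '\n';
    return 0;
}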
From cppreference/static_cast/8:
A value of integer or enumeration type can be converted to any
complete enumeration type.
If the underlying type is not fixed, the
behavior is undefined if the value of expression is out of range (the
range is all values possible for the smallest bit field large enough
to hold all enumerators of the target enumeration).
I have two questions here:
How can the underlying type of an enum not be fixed? Consider this example:
enum A : int { i = -1, b = 'c' };
Is enum A's underlying type fixed, regardless of the types of the enumerator values? Is whether the underlying type is fixed determined solely by whether a type is specified, regardless of the enumerator values? For example, is this enum fixed: enum B { b, c }?
How can I determine the range of an enumeration? Consider this example:
enum N { c = 'A', hex = 0x64 };
Is the range of enum N from 65 to 100? And hence, is the behavior undefined in the following casts?
static_cast<N>(64) // UB?
static_cast<N>(101) // UB?
I'm going to write this with a focus on how to read cppreference.com. (This might make it seem more like I am answering one question instead of going over the limit.)
How can the underlying type of an enum not be fixed?
This is a question about the enumeration type. So if the answer is not on the current page, the next place to look would be the page for enumeration type.
Conveniently, on the page you linked to, the phrase "enumeration type" is linked, so you don't have to search; just click the link.
Once you get to the enumeration type page, you are interested in "fixed". So do a find-in-page (ctrl-f) for "fixed".
The first occurrence of "fixed" is highly suggestive of what it means, while the second and third define the term in this context. The definition of an unscoped enumeration type whose underlying type is not fixed looks like form (1) in that section
enum A { i = -1, b = 'c' };
while the definition of an unscoped enumeration type whose underlying type is fixed looks like form (2) in that section
enum A : int { i = -1, b = 'c' };
If you specify the underlying type, then the underlying type is what you specify; it is fixed. If you do not specify the underlying type, then the underlying type is whatever the compiler decides to use; it is not determined in advance (i.e. not fixed).
How can I determine the range of an enumeration?
This is given in your quote:
the range is all values possible for the smallest bit field large enough to hold all enumerators of the target enumeration
That's a lot of words, but just take it one piece at a time. Let's use your example:
enum N { c = 'A', hex = 0x64 };
The range is all values possible for the smallest bit field large enough to hold all enumerators of the target enumeration.
The range is all values possible for the smallest bit field large enough to hold all enumerators of N.
The range is all values possible for the smallest bit field large enough to hold 'A' and 0x64.
The range is all values possible for the smallest bit field large enough to hold 65 and 100.
The range is all values possible for the smallest bit field large enough to hold values representable with 7 bits (unsigned).
The range is all values possible for a bit field of length 7.
The range is 0 to 127.
While the compiler has some leeway in choosing the underlying type, the underlying type must be able to represent all enumerators. No matter what underlying type is chosen, it will be comprised of bits and be at least as large as the "smallest bit field" from the definition. The range of an enumeration consists of the values that will be representable no matter what underlying type is chosen. Values outside this range might fit in the underlying type, but they might not. Hence, whether or not they can be converted is undefined.
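Applying that to the casts in the question: 64 and 101 both lie inside the 0..127 range worked out above, so neither conversion is undefined. A minimal sketch (the variable names are just for illustration):

enum N { c = 'A', hex = 0x64 };

int main()
{
    N low  = static_cast<N>(64);    // inside 0..127: well-defined
    N high = static_cast<N>(101);   // inside 0..127: well-defined
    // N bad = static_cast<N>(200); // outside 0..127: not well-defined
    (void)low;
    (void)high;
    return 0;
}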
I came across this question about the underlying types of enums, where an answer quotes Standard C++ 7.2/5 as:
The underlying type of an enumeration is an integral type that can represent all the enumerator values defined in the enumeration. It is implementation-defined which integral type is used as the underlying type for an enumeration except that the underlying type shall not be larger than int unless the value of an enumerator cannot fit in an int or unsigned int.
This is pretty clear for all reasonable cases. But what happens if I make an enum so ridiculously large that it can't even fit in a long long?
(I don't know why this would ever happen in practice, but maybe I'm feeling destructive and have a free afternoon)
Is this behavior defined by the standard?
The behaviour of
enum foo : int
{
    bar = INT_MAX,
    oops
};
and similar is ill-formed: the value of oops would be INT_MAX + 1, which is not representable in the fixed underlying type, so a conforming compiler must issue a diagnostic.
I've cheated a little here by forcing the type to an int, but the same applies to the largest integral type available on your platform.
TL;DR How many enumerators could an enum type hold, and could that number exceed what a 64-bit or 96-bit integer can represent?
In my experience with C and C++, I've found that the largest variable type that will compile is 96 bits (12 bytes), and even that is a non-standard addition in 64-bit GCC. So I started thinking about enumerations and how large one could be. Let's say, for the sake of simplicity, I have an enumeration type called foo:
enum foo
{
//Insert types here
};
And say we fill the enumeration with an ENORMOUS number of enumerators:
enum foo
{
type1,
type2,
type3,
//Some keyboard-time later....
type9999999999999999999999999999999999999997,
type9999999999999999999999999999999999999998,
type9999999999999999999999999999999999999999,
type10000000000000000000000000000000000000000 //That's fifty zeroes
};
Would that even compile? (Yes, I know compilation would give me a good idea of how long an ice age lasts, but still.) And would I be able to declare a foo with the value of every single one of the types?
From the standard:
For an enumeration whose underlying type is not fixed, the underlying type is an integral type that can represent all the enumerator values defined in the enumeration. If no integral type can represent all the enumerator values, the enumeration is ill-formed.
So for a standard-conforming compiler, the program should not compile if there is no integral type that can hold all the enumerator values.
This is from the working draft https://isocpp.org/std/the-standard, but I doubt that part has changed.
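A sketch of both sides of that rule (the names big64 and too_big are just illustrative), assuming a platform where long long and unsigned long long are 64 bits wide and are the widest integer types: the first enumeration forces a 64-bit underlying type, and uncommenting the second should make a conforming compiler reject the program, since no integral type can represent ULLONG_MAX + 1:

#include <climits>
#include <type_traits>

enum big64 { b = LLONG_MAX };   // forces a 64-bit (or wider) underlying type

static_assert(sizeof(std::underlying_type<big64>::type) >= sizeof(long long),
              "the chosen underlying type must be able to hold LLONG_MAX");

// enum too_big { t = ULLONG_MAX, u };  // ill-formed: u would have to be
//                                      // ULLONG_MAX + 1, which no integral
//                                      // type can represent

int main() { return 0; }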
Warning:
src/BoardRep.h:49:12: warning: ‘BoardRep::BoardRep::Row::<anonymous struct>::a’
is too small to hold all values of ‘enum class BoardRep::Piece’
[enabled by default]
Piece a:2;
^
Enum:
enum class Piece : unsigned char {
    EMPTY,
    WHITE,
    BLACK
};
Use:
union Row {
    struct {
        Piece a:2;
        Piece b:2;
        Piece c:2;
        Piece d:2;
        Piece e:2;
        Piece f:2;
        Piece g:2;
        Piece h:2;
    };
    unsigned short raw;
};
With a plain enum I'd agree with GCC: it may have to truncate, but that's because plain enums are not really separate from integers and preprocessor definitions. However, an enum class is much stronger. If it is not strong enough to guarantee that ALL Piece values, taken as integers, will be between 0 and 2 inclusive, then the warning makes sense. Otherwise GCC is being needlessly picky, and it might be worth mailing the list to say "look, this is a silly warning".
In case anyone cannot see the point
You can store 4 distinct values in 2 bits of data, and I only need 3 distinct values, so any enum with 4 or fewer enumerators should fit nicely in the 2 bits given (and my enum does "derive" (better term?) from an unsigned type). If I had 5 or more, THEN I'd expect a warning.
The warning issued by gcc is accurate; there's no need to compose a mail to the mailing list asking them to make the warning less likely to appear.
The standard says that an enumeration whose underlying type is unsigned char can hold values that a bit-field of length 2 cannot represent, even if no enumerator actually has such a value.
THE STANDARD
A value of an enumeration is valid even if no enumerator corresponds to that value; the standard only says that a value stored in an object of the enumeration type must fit in the underlying type. It does not require that value to be one of the enumerators.
7.2 Enumeration declarations [dcl.enum]
7 ... It is possible to define an enumeration that has values not defined by any of its enumerators. ...
Note: the quoted section is present in both C++11, and the draft of C++14.
Note: wording stating the same thing, but using different terminology, can be found in C++03 under [dcl.enum]p6
Note: the entire [dcl.enum]p7 hasn't been included, to save space in this post.
DETAILS
enum class E : unsigned char { A, B, C };
E x = static_cast<E> (10);
Above we initialize x to the value 10; even though there is no enumerator with that value in the declaration of enum class E, this is still a valid construct.
With the above in mind, we easily deduce that 10 cannot be stored in a bit-field of length 2, so the warning by gcc is nothing but accurate: we are potentially trying to store values in our bit-field that it cannot represent.
EXAMPLE
enum class E : unsigned char { A, B, C };

struct A {
    E value : 2;
};

int main()
{
    A val;
    val.value = static_cast<E>(10); // OMG, OOPS!? 10 does not fit in 2 bits
}
According to the C++ Standard
8 For an enumeration whose underlying type is fixed, the values of the
enumeration are the values of the underlying type.
So the values of your enumeration are in the range
std::numeric_limits<unsigned char>::min() - std::numeric_limits<unsigned char>::max()
Bit-field a, defined as
Piece a:2;
cannot hold all values of the enumeration.
If you defined an unscoped enumeration without a fixed underlying type, then the range of its values would only be
0 - 3
(the values of the smallest bit-field, two bits wide, that can hold the enumerators 0, 1, and 2), so a 2-bit bit-field could hold them all.
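To make the contrast concrete, here is a sketch of the two cases side by side (the warning behaviour noted in the comments is a GCC detail, as discussed in this thread):

enum class Fixed : unsigned char { EMPTY, WHITE, BLACK };  // values: 0..255
enum Unfixed { U_EMPTY, U_WHITE, U_BLACK };                // values: 0..3

struct Packed {
    Fixed   f : 2;   // older GCC warns here: 2 bits cannot hold 0..255
    Unfixed u : 2;   // no warning: 2 bits cover the whole 0..3 range
};

int main() { return 0; }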
Yes, this warning is pointless, because GCC already warns about assigning a value to a bit-field (of enum type) that gets truncated, like this:
warning: conversion from 'Some_Enum' to 'unsigned char:2'
changes value from '(Some_Enum)9' to '1' [-Woverflow]
At the location of the warning, it's only relevant that all declared enumerators can be held inside the bit-field.
The statement that other values in the range of the underlying integer type (values that don't correspond to a declared enumerator, which, by the way, is well-defined in general) can't be represented by the field if they are ever assigned is technically true, but has zero entropy as a warning.
Thus, this warning was fixed in GCC 9.3.
In other words, GCC 9.3 and later no longer warn about such code.
I would expect the following code snippet to complain about trying to assign something other than 0, 1, or 2 to a Color variable.
But the following does compile and I get the output
Printing:3
3
Can anybody explain why? Is enum not meant to be a true user-defined type? Thanks.
#include <iostream>

enum Color { blue = 0, green = 1, yellow = 2 };

void print_color(Color x);

int main() {
    Color x = Color(3);
    print_color(x);
    std::cout << x << std::endl;
    return 0;
}

void print_color(Color x)
{
    std::cout << "Printing:" << x << std::endl;
}
Since you manually cast the 3 to Color, the compiler will allow you to do that. If you tried to initialize the variable x with a plain 3 without a cast, you would get a diagnostic.
Note that the range of values an enumeration can store is not limited by the enumerators it contains. It's the range of values of the smallest bitfield that can store all enumerator values of the enumeration. That is, the range of your enumeration type is 0..3:
00
01
10
11
The value 3 is thus still in range, and so the code is valid. Had you cast a 4, then the resulting value would be left unspecified by the C++ Standard.
In practice, the implementation has to choose an underlying integer type for the enumeration. The smallest type it can choose is char, which can still store values at least up to 127. But as mentioned, the compiler is not required to convert a 4 to the value 4, because 4 is outside the range of your enumeration.
I figure I should post some explanation of the difference between "underlying type" and "range of enumeration values". The range of values for any type is the smallest and largest value of that type. The underlying type of an enumeration must be able to store the value of any enumerator (of course), and two enumerations that have the same underlying type are layout-compatible (this allows some flexibility in case a type mismatch occurs).
So while the underlying type is meant to fix the object representation (alignment and size), the values of the enumeration are defined as follows in 7.2/6:
For an enumeration where emin is the smallest enumerator and emax is the largest, the values of the enumeration are the values of the underlying type in the range bmin to bmax, where bmin and bmax are, respectively, the smallest and largest values of the smallest bit-field that can store emin and emax . It is possible to define an enumeration that has values not defined by any of its enumerators.
[Footnote: On a two's-complement machine, bmax is the smallest value greater than or equal to max(abs(emin) − 1, abs(emax)) of the form 2^M − 1; bmin is zero if emin is non-negative and −(bmax + 1) otherwise.]
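Putting the two notions together for the Color type from the question: the enumerators are 0, 1, and 2, the smallest bit-field that can hold them is two bits wide, so the range of the type is 0..3 even though the underlying type can hold far more. A minimal sketch:

enum Color { blue = 0, green = 1, yellow = 2 };

int main()
{
    Color in_range  = static_cast<Color>(3);  // inside 0..3: value preserved
    Color out_range = static_cast<Color>(4);  // outside 0..3: unspecified
                                              // (undefined in later standards)
    (void)in_range;
    (void)out_range;
    return 0;
}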
Color(3) is a cast, with the same semantics as (Color)3; it isn't a constructor. Note that you can also use static_cast<Color>(3) for the same conversion, but you can't use Color x(3).
An enum in C++ is more of a set of named integer constants than a true type, from a compile-time checking point of view. However, the C++ standard has this to say in [dcl.enum]:
9 An expression of arithmetic or enumeration type can be converted to an
enumeration type explicitly. The value is unchanged if it is in the
range of enumeration values of the enumeration type; otherwise the
resulting enumeration value is unspecified.
"Unspecified" is slightly better than the usual "undefined behavior".
Both the C and the C++ standards are kind of confusing on the subject of enums. Both insist that enums are "distinct types" but then both treat them as the underlying integral type. C++ even refers to an italic term "underlying type" which is only sort-of defined when introducing wchar_t.
In summary, wchar_t and enum types are "distinct" but simply mapped to an underlying integral type chosen by the implementation, and this is no doubt due to the need to be compatible with historical enum which was definitely just an int.
Modern compilers typically have options to add more type-like behavior to enums, turning on warnings and errors for various misuses. These can't be defaults because they would select non-conforming behavior.