While working with some custom comparators, I encountered the need for a type that only had a single possible value. There are types like std::nullptr_t and empty structs where this is the case.
Then I considered the possibility of using an enum. I might declare an enum with a single value, something like
enum E
{
only_value // BUT IS IT??
};
But it seems that the standard says that all of the underlying type's values, which fit into the "smallest bitfield" that can contain the declared values, are valid.
From cppreference.com:
(The source value, as converted to the enumeration's underlying type if floating-point, is in range if it would fit in the smallest bit field large enough to hold all enumerators of the target enumeration.)
If you declare an enum with only a single enumerator, the smallest it can be is one bit. Following that logic, the unnamed enumerator with the bit's other value should be legal. If an enum is based on a signed integer, then -1 and 0 are always legal. On an unsigned integer, 0 and 1 are always legal.
Is there something else in the standard that makes the unnamed bit value illegal or UB?
It appears that enum E { value = 0; } only has one value, that of 0. The standard has this to say:
[dcl.enum]/8 ... Otherwise [if the underlying type is not fixed - IT], for an enumeration where emin is the smallest enumerator and emax is the largest, the values of the enumeration are the values in the range bmin to bmax, defined as follows: Let K be 1 for a two's complement representation and 0 for a ones' complement or sign-magnitude representation. bmax is the smallest value greater than or equal to max(|emin| − K, |emaxn|) and equal to 2M − 1, where M is a non-negative integer. bmin is zero if emin is non-negative and −(bmax + K) otherwise.
When emin == emax == 0, this math also produces bmin == bmax == 0, by my count.
However, before you get your hopes up, see Footnote 96:
This set of values is used to define promotion and conversion semantics for the enumeration type. It does not preclude an expression of enumeration type from having a value that falls outside this range.
I must admit it's not immediately obvious to me how to produce an expression of enumeration type with value outside the range. At least, static_cast to enumeration type exhibits undefined behavior if the value being cast falls outside the range.
#Igor's answer is correct as far as older (C++17 and older) versions of C++ go.
I was perplexed that the standard language he presented didn't match what cppreference showed, until #LanguageLawyer pointed out in a comment that Igor was working off an old version of the Standard.
In C++20 and later, the language is different (https://timsong-cpp.github.io/cppwp/n4868/dcl.enum#8.sentence-2).
Interestingly,
For an enumeration whose underlying type is fixed, the values of the enumeration are the values of the underlying type.
Meaning, if an enumeration has its type specified, the full range of values for the underlying type are allowed for that enumeration! This makes it impossible to have an enum based on an unsigned type that "just fits".
But more pertinent to my question is what follows:
Otherwise, the values of the enumeration are the values representable by a hypothetical integer type with minimal width M such that all enumerators can be represented. The width of the smallest bit-field large enough to hold all the values of the enumeration type is M. It is possible to define an enumeration that has values not defined by any of its enumerators.
So, given
enum E
{
dummy = 0
};
the smallest bit-field is 1 bit, and it must have two values, the other one being -1 (since the default underlying type in C++ is int).
The sentence that follows in the Standard
If the enumerator-list is empty, the values of the enumeration are as if the enumeration had a single enumerator with value 0.
might seem worrying at first, until we realize the Standard makes a distinction between "enumerators" (the names inside the enum declaration) and "values". All it means is that
enum E0 { };
has an implicit enumerator, e.g. the same as if declared
enum E0 { dummy = 0 };
and it also has 1 bit to work with and the allowed -1 value.
Related
From cpprefernce/static_cast/8:
A value of integer or enumeration type can be converted to any
complete enumeration type.
If the underlying type is not fixed, the
behavior is undefined if the value of expression is out of range (the
range is all values possible for the smallest bit field large enough
to hold all enumerators of the target enumeration).
I have two questions here:
How the underlying type of an enum cannot be fixed. Consider this example:
enum A : int { i = -1, b = 'c' };
Does the enum A's underlying type is fixed, regardless of the type of the enumerator values? Does the fixation of the underlying type is determined by either specifying the type or not, regardless of the type of the enumerator values? For example does this enum is fixed enum B { b, c }?
How I can determine the range of an enumeration. Consider this example:
enum N { c = 'A', hex = 0x64 };
Does the range of enum N is from 65 to 100? Hence the behavior is undefined in the following casts:
static_cast<N>(64) // UB?
static_cast<N>(101) // UB?
I'm going to write this with a focus on how to read cppreference.com. (This might make it seem more like I am answering one question instead of going over the limit.)
How the underlying type of an enum cannot be fixed.
This is a question about the enumeration type. So if the answer is not on the current page, the next place to look would be the page for
enumeration type.
Conveniently, on the page you linked to, the phrase "enumeration type" is linked, so you you don't have to search; just click the link.
Once you get to the enumeration type page, you are interested in "fixed". So do a find-in-page (ctrl-f) for "fixed".
The first occurrence of "fixed" is highly suggestive of what it means, while the second and third define the term in this context. The definition of an unscoped enumeration type whose underlying type is not fixed looks like form (1) in that section
enum A { i = -1, b = 'c' };
while the definition of an unscoped enumeration type whose underlying type is fixed looks like form (2) in that section
enum A : int { i = -1, b = 'c' };
If you specify the underlying type, then the underlying type is what you specify; it is fixed. If you do not specify the underlying type, then the underlying type is whatever the compiler decides to use; it is not determined in advance (i.e. not fixed).
How I can determine the range of an enumeration
This is given in your quote:
the range is all values possible for the smallest bit field large enough to hold all enumerators of the target enumeration
That's a lot of words, but just take it one piece at a time. Let's use your example:
enum N { c = 'A', hex = 0x64 };
The range is all values possible for the smallest bit field large enough to hold all enumerators of the target enumeration.
The range is all values possible for the smallest bit field large enough to hold all enumerators of N.
The range is all values possible for the smallest bit field large enough to hold 'A' and 0x64.
The range is all values possible for the smallest bit field large enough to hold 65 and 100.
The range is all values possible for the smallest bit field large enough to hold values representable with 7 bits (unsigned).
The range is all values possible for a bit field of length 7.
The range is 0 to 127.
While the compiler has some leeway in choosing the underlying type, the underlying type must be able to represent all enumerators. No matter what underlying type is chosen, it will be comprised of bits and be at least as large as the "smallest bit field" from the definition. The range of an enumeration consists of the values that will be representable no matter what underlying type is chosen. Values outside this range might fit in the underlying type, but they might not. Hence, whether or not they can be converted is undefined.
Consider this C++ code:
enum class Color : char { red = 0x1, yellow = 0x2 }
// ...
char *data = ReadFile();
Color color = static_cast<Color>(data[0]);
Suppose that data[0] is actually 100. What is color set to according to the standard?
In particular, if I later do
switch (color) {
// ... red and yellow cases omitted
default:
// handle error
break;
}
does the standard guarantee that default will be hit? If not, what is the proper, most efficient, most elegant way to check for an error here? Does the standard make any guarantees as about this but with plain enum?
What is color set to according to the standard?
Answering with a quote from the C++11 and C++14 Standards:
[expr.static.cast]/10
A value of integral or enumeration type can be explicitly converted to an enumeration type. The value is unchanged if the original value is within the range of the enumeration values (7.2). Otherwise, the resulting value is unspecified (and might not be in that range).
Let's look up the range of the enumeration values: [dcl.enum]/7
For an enumeration whose underlying type is fixed, the values of the enumeration are the values of the underlying type.
Before CWG 1766 (C++11, C++14)
Therefore, for data[0] == 100, the resulting value is specified(*), and no Undefined Behaviour (UB) is involved. More generally, as you cast from the underlying type to the enumeration type, no value in data[0] can lead to UB for the static_cast.
After CWG 1766 (C++17)
See CWG defect 1766.
The [expr.static.cast]p10 paragraph has been strengthened, so you now can invoke UB if you cast a value that is outside the representable range of an enum to the enum type. This still doesn't apply to the scenario in the question, since data[0] is of the underlying type of the enumeration (see above).
Please note that CWG 1766 is considered a defect in the Standard, hence it is accepted for compiler implementers to apply to to their C++11 and C++14 compilation modes.
(*) char is required to be at least 8 bit wide, but isn't required to be unsigned. The maximum value storable is required to be at least 127 per Annex E of the C99 Standard.
Compare to [expr]/4
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.
Before CWG 1766, the conversion integral type -> enumeration type can produce an unspecified value. The question is: Can an unspecified value be outside the representable values for its type? I believe the answer is no -- if the answer was yes, there wouldn't be any difference in the guarantees you get for operations on signed types between "this operation produces an unspecified value" and "this operation has undefined behaviour".
Hence, prior to CWG 1766, even static_cast<Color>(10000) would not invoke UB; but after CWG 1766, it does invoke UB.
Now, the switch statement:
[stmt.switch]/2
The condition shall be of integral type, enumeration type, or class type. [...] Integral promotions are performed.
[conv.prom]/4
A prvalue of an unscoped enumeration type whose underlying type is fixed (7.2) can be converted to a prvalue of its underlying type. Moreover, if integral promotion can be applied to its underlying type, a prvalue of an unscoped enumeration type whose underlying type is fixed can also be converted to a prvalue of the promoted underlying type.
Note: The underlying type of a scoped enum w/o enum-base is int. For unscoped enums the underlying type is implementation-defined, but shall not be larger than int if int can contain the values of all enumerators.
For an unscoped enumeration, this leads us to /1
A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (4.13) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
In the case of an unscoped enumeration, we would be dealing with ints here. For scoped enumerations (enum class and enum struct), no integral promotion applies. In any way, the integral promotion doesn't lead to UB either, as the stored value is in the range of the underlying type and in the range of int.
[stmt.switch]/5
When the switch statement is executed, its condition is evaluated and compared with each case constant. If one of the case constants is equal to the value of the condition, control is passed to the statement following the matched case label. If no case constant matches the condition, and if there is a default label, control passes to the statement labeled by the default label.
The default label should be hit.
Note: One could take another look at the comparison operator, but it is not explicitly used in the referred "comparison". In fact, there's no hint it would introduce UB for scoped or unscoped enums in our case.
As a bonus, does the standard make any guarantees as about this but with plain enum?
Whether or not the enum is scoped doesn't make any difference here. However, it does make a difference whether or not the underlying type is fixed. The complete [decl.enum]/7 is:
For an enumeration whose underlying type is fixed, the values of the enumeration are the values of the underlying type. Otherwise, for an enumeration where emin is the smallest enumerator and emax is the largest, the values of the enumeration are the values in the range bmin to bmax, defined as follows: Let K be 1 for a two's complement representation and 0 for a one's complement or sign-magnitude representation. bmax is the smallest value greater than or equal to max(|emin| − K, |emax|) and equal to 2M − 1, where M is a non-negative integer. bmin is zero if emin is non-negative and −(bmax + K) otherwise.
Let's have a look at the following enumeration:
enum ColorUnfixed /* no fixed underlying type */
{
red = 0x1,
yellow = 0x2
}
Note that we cannot define this as a scoped enum, since all scoped enums have fixed underlying types.
Fortunately, ColorUnfixed's smallest enumerator is red = 0x1, so max(|emin| − K, |emax|) is equal to |emax| in any case, which is yellow = 0x2. The smallest value greater or equal to 2, which is equal to 2M - 1 for a positive integer M is 3 (22 - 1). (I think the intent is to allow the range to extent in 1-bit-steps.) It follows that bmax is 3 and bmin is 0.
Therefore, 100 would be outside the range of ColorUnfixed, and the static_cast would produce an unspecified value before CWG 1766 and undefined behaviour after CWG 1766.
The following piece of text is from the C++14 N4296 working draft 7.2/8 [dcl.enum]:
For an enumeration whose underlying type is fixed, the values of the
enumeration are the values of the underlying type. Otherwise, for an
enumeration where emin is the smallest enumerator and emax is the
largest, the values of the enumeration are the values in the range
bmin to bmax, defined as follows: Let K be 1 for a two’s complement
representation and 0 for a one’s complement or sign-magnitude
representation. bmax is the smallest value greater than or equal to
max(|emin| − K, |emax|) and equal to 2M − 1, where M is a non-negative
integer. bmin is zero if emin is non-negative and −(bmax + K)
otherwise.
Let's consider how it works by example. The enum declared below has non-fixed underlying type:
enum E { x = -2, y = 2 }
Assume that implementation defines signed magnitude representation, therfore K = 0. Now emin = -2, emax = 2 and bmax = 2^2 -1 = 3, bmin = 2. Therefore values of the enumeration are from range from 2 to 3. What does that mean? What should that interval tell us about? We can't assign a value from the interval to a variable of the enumeration type. Watch:
#include <iostream>
enum A { x = -2, y = 2 };
A a = 2; // cannot initialize a variable of type 'A' with an rvalue of type 'int'
A b = 3; // cannot initialize a variable of type 'A' with an rvalue of type 'int'
int main(){ }
DEMO
Your range is incorrect
bmin is zero if emin is non-negative and −(bmax + K) otherwise.
bmax = 3 (i.e. 22 -1), therefore bmin = -(3 + 0) or -3
What does that mean? What should that interval tell us about?
As footnote 96 informs you:
96) This set of values is used to define promotion and conversion semantics for the enumeration type. It does not preclude an expression of enumeration type from having a value that falls outside this range.
Since the underlying type of enums is always integral, the range also implicitly limits the number of usable distinct values for any given enum (in this case 7, but see below for what happens when values are outside this range).
This formula results in the smallest possible type that can hold every value being selected as the underlying type. It also allows values of an enum with a range shorter than its underlying type to be packed into bitfields.
We can't assign a value from the interval to a variable of the enumeration type
Yes you can
A a = static_cast<A>(2);
If you cast an integer outside the enum's range however you will end up with an unspecified (i.e. worthless, but valid) value
[expr.static.cast]
10 A value of integral or enumeration type can be explicitly converted to an enumeration type. The value is unchanged if the original value is within the range of the enumeration values (7.2). Otherwise, the resulting value is unspecified (and might not be in that range).[...]
Defect 1766 has strengthened this to undefined behaviour in the next version of the standard
10 A value of integral or enumeration type can be explicitly converted to a complete enumeration type. The value is unchanged if the original value is within the range of the enumeration values (7.2). Otherwise, the behavior is undefined. [...]
Either way you almost always shouldn't be casting integers to enums in the first place.
Consider this C++ code:
enum class Color : char { red = 0x1, yellow = 0x2 }
// ...
char *data = ReadFile();
Color color = static_cast<Color>(data[0]);
Suppose that data[0] is actually 100. What is color set to according to the standard?
In particular, if I later do
switch (color) {
// ... red and yellow cases omitted
default:
// handle error
break;
}
does the standard guarantee that default will be hit? If not, what is the proper, most efficient, most elegant way to check for an error here? Does the standard make any guarantees as about this but with plain enum?
What is color set to according to the standard?
Answering with a quote from the C++11 and C++14 Standards:
[expr.static.cast]/10
A value of integral or enumeration type can be explicitly converted to an enumeration type. The value is unchanged if the original value is within the range of the enumeration values (7.2). Otherwise, the resulting value is unspecified (and might not be in that range).
Let's look up the range of the enumeration values: [dcl.enum]/7
For an enumeration whose underlying type is fixed, the values of the enumeration are the values of the underlying type.
Before CWG 1766 (C++11, C++14)
Therefore, for data[0] == 100, the resulting value is specified(*), and no Undefined Behaviour (UB) is involved. More generally, as you cast from the underlying type to the enumeration type, no value in data[0] can lead to UB for the static_cast.
After CWG 1766 (C++17)
See CWG defect 1766.
The [expr.static.cast]p10 paragraph has been strengthened, so you now can invoke UB if you cast a value that is outside the representable range of an enum to the enum type. This still doesn't apply to the scenario in the question, since data[0] is of the underlying type of the enumeration (see above).
Please note that CWG 1766 is considered a defect in the Standard, hence it is accepted for compiler implementers to apply to to their C++11 and C++14 compilation modes.
(*) char is required to be at least 8 bit wide, but isn't required to be unsigned. The maximum value storable is required to be at least 127 per Annex E of the C99 Standard.
Compare to [expr]/4
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.
Before CWG 1766, the conversion integral type -> enumeration type can produce an unspecified value. The question is: Can an unspecified value be outside the representable values for its type? I believe the answer is no -- if the answer was yes, there wouldn't be any difference in the guarantees you get for operations on signed types between "this operation produces an unspecified value" and "this operation has undefined behaviour".
Hence, prior to CWG 1766, even static_cast<Color>(10000) would not invoke UB; but after CWG 1766, it does invoke UB.
Now, the switch statement:
[stmt.switch]/2
The condition shall be of integral type, enumeration type, or class type. [...] Integral promotions are performed.
[conv.prom]/4
A prvalue of an unscoped enumeration type whose underlying type is fixed (7.2) can be converted to a prvalue of its underlying type. Moreover, if integral promotion can be applied to its underlying type, a prvalue of an unscoped enumeration type whose underlying type is fixed can also be converted to a prvalue of the promoted underlying type.
Note: The underlying type of a scoped enum w/o enum-base is int. For unscoped enums the underlying type is implementation-defined, but shall not be larger than int if int can contain the values of all enumerators.
For an unscoped enumeration, this leads us to /1
A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (4.13) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
In the case of an unscoped enumeration, we would be dealing with ints here. For scoped enumerations (enum class and enum struct), no integral promotion applies. In any way, the integral promotion doesn't lead to UB either, as the stored value is in the range of the underlying type and in the range of int.
[stmt.switch]/5
When the switch statement is executed, its condition is evaluated and compared with each case constant. If one of the case constants is equal to the value of the condition, control is passed to the statement following the matched case label. If no case constant matches the condition, and if there is a default label, control passes to the statement labeled by the default label.
The default label should be hit.
Note: One could take another look at the comparison operator, but it is not explicitly used in the referred "comparison". In fact, there's no hint it would introduce UB for scoped or unscoped enums in our case.
As a bonus, does the standard make any guarantees as about this but with plain enum?
Whether or not the enum is scoped doesn't make any difference here. However, it does make a difference whether or not the underlying type is fixed. The complete [decl.enum]/7 is:
For an enumeration whose underlying type is fixed, the values of the enumeration are the values of the underlying type. Otherwise, for an enumeration where emin is the smallest enumerator and emax is the largest, the values of the enumeration are the values in the range bmin to bmax, defined as follows: Let K be 1 for a two's complement representation and 0 for a one's complement or sign-magnitude representation. bmax is the smallest value greater than or equal to max(|emin| − K, |emax|) and equal to 2M − 1, where M is a non-negative integer. bmin is zero if emin is non-negative and −(bmax + K) otherwise.
Let's have a look at the following enumeration:
enum ColorUnfixed /* no fixed underlying type */
{
red = 0x1,
yellow = 0x2
}
Note that we cannot define this as a scoped enum, since all scoped enums have fixed underlying types.
Fortunately, ColorUnfixed's smallest enumerator is red = 0x1, so max(|emin| − K, |emax|) is equal to |emax| in any case, which is yellow = 0x2. The smallest value greater or equal to 2, which is equal to 2M - 1 for a positive integer M is 3 (22 - 1). (I think the intent is to allow the range to extent in 1-bit-steps.) It follows that bmax is 3 and bmin is 0.
Therefore, 100 would be outside the range of ColorUnfixed, and the static_cast would produce an unspecified value before CWG 1766 and undefined behaviour after CWG 1766.
I would expect the following code snippet to complain about trying to assign something other that 0,1,2 to a Color variable.
But the following does compile and I get the output
Printing:3
3
Can anybody explain why? Is enum not meant to be a true user-defined type? Thanks.
enum Color { blue=0,green=1,yellow=2};
void print_color(Color x);
int main(){
Color x=Color(3);
print_color(x);
std::cout << x << std::endl;
return 0;
}
void print_color(Color x)
{
std::cout << "Printing:" << x << std::endl;
}
Since you manually cast the 3 to Color, the compiler will allow you to do that. If you tried to initialize the variable x with a plain 3 without a cast, you would get a diagnostic.
Note that the range of values an enumeration can store is not limited by the enumerators it contains. It's the range of values of the smallest bitfield that can store all enumerator values of the enumeration. That is, the range of your enumeration type is 0..3:
00
01
10
11
The value 3 is thus still in range, and so the code is valid. Had you cast a 4, then the resulting value would be left unspecified by the C++ Standard.
In practice, the implementation has to chose an underlying integer type for the enumeration. The smallest type it can choose is char, but which is still able to at least store values ranging up to 127. But as mentioned, the compiler is not required to convert a 4 to a value of 4, because it's outside the range of your enumeration.
I figure i should post some explanation on the difference of "underlying type" and "range of enumeration values". The range of values for any type is the smallest and largest value of that type. The underlying type of an enumeration must be able to store the value of any enumerator (of course) - and two enumerations that have the same underlying type are layout compatible (this allows some flexibility in case a type mismatch occurs).
So while the underlying type is meant to fix the object representation (alignment and size), the values of the enumeration is defined as follows in 7.2/6
For an enumeration where emin is the smallest enumerator and emax is the largest, the values of the enumeration are the values of the underlying type in the range bmin to bmax, where bmin and bmax are, respectively, the smallest and largest values of the smallest bit-field that can store emin and emax . It is possible to define an enumeration that has values not defined by any of its enumerators.
[Footnote: On a two’s-complement machine, bmax is the smallest value greater than or equal to max (abs(emin) − 1 ,abs(emax)) of the form
2M−1; bmin is zero if emin is non-negative and −(bmin+1) otherwise.]
Color(3) is a cast, with the same semantic as (Color)3, it isn't a constructor. Note that you can also use static_cast<Color>(3) for the same conversion but you can't use Color x(3).
Enum in C++ is more of a set of named integer constants than a true type, from compile-time checking point of view. However, the C++ standard has this to say [dcl.enum]:
9 An expression of arithmetic or enumeration type can be converted to an
enumeration type explicitly. The value is unchanged if it is in the
range of enumeration values of the enumeration type; otherwise the
resulting enumeration value is unspecified.
"Unspecified" is slightly better than the usual "undefined behavior".
Both the C and the C++ standards are kind of confusing on the subject of enums. Both insist that enums are "distinct types" but then both treat them as the underlying integral type. C++ even refers to an italic term "underlying type" which is only sort-of defined when introducing wchar_t.
In summary, wchar_t and enum types are "distinct" but simply mapped to an underlying integral type chosen by the implementation, and this is no doubt due to the need to be compatible with historical enum which was definitely just an int.
Modern compilers typically have options to add more type-like behavior to enums, turning on warnings and errors for various misuses. These can't be a default because they elect non-conforming behavior.