A footnote in the standard implies that any enum expression value is defined behavior; why does Clang's undefined behavior sanitizer flag out-of-range values?
Consider the following program:
enum A {B = 3, C = 7};
int main() {
A d = static_cast<A>(8);
return d + B;
}
The output under the undefined behavior sanitizer is:
$ clang++-5.0 -fsanitize=undefined -ggdb3 enum.cc && ./a.out
enum.cc:5:10: runtime error: load of value 8, which is not a valid value for type 'A'
Note that the error is not on the static_cast, but on the addition. This is also true when an A is created (but not initialized) and then an int with value 8 is memcpy'd into the A: the ubsan error is on the addition, not the initial load.
IIUC, ubsan in newer clangs does flag an error on the static_cast in C++17 mode. I don't know if that mode also finds an error in the memcpy. In any case, this question is focused on C++14.
The reported error comports with the following parts of the standard:
dcl.enum:
For an enumeration whose underlying type is fixed, the values of the enumeration are the values of the underlying type. Otherwise, the values of the enumeration are the values representable by a hypothetical integer type with minimal range exponent M such that all enumerators can be represented. The width of the smallest bit-field large enough to hold all the values of the enumeration type is M. It is possible to define an enumeration that has values not defined by any of its enumerators. If the enumerator-list is empty, the values of the enumeration are as if the enumeration had a single enumerator with value 0. [footnote 100]
So the values of the enumeration A are 0 through 7, inclusive, and the "range exponent" M is 3. Evaluating an expression of type A with value 8 is undefined behavior according to expr.pre:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.
But there is one hiccup: that footnote from dcl.enum reads:
This set of values is used to define promotion and conversion semantics for the enumeration type. It does not preclude an expression of enumeration type from having a value that falls outside this range. [emphasis mine]
Question: Why is an expression with value 8 and type A undefined behavior if "[dcl.enum] does not preclude an expression of enumeration type from having a value that falls outside this range"?
Clang flags the use of static_cast on a value that is out of range. The behavior is undefined if the integral value is not within the range of the enumeration.
C++ standard 5.2.9 Static cast [expr.static.cast] paragraph 7
A value of integral or enumeration type can be explicitly converted to
an enumeration type. The value is unchanged if the original value is
within the range of the enumeration values (7.2). Otherwise, the
resulting enumeration value is unspecified (until C++17); the behavior is undefined (since C++17).
Note the phrasing of footnote 100: "[This set of values] does not preclude [stuff]." This is not an endorsement of the "stuff" as valid; it merely emphasizes that this section does not declare the stuff invalid. It is in fact a neutral statement that should bring to mind the fallacy of the excluded middle. As far as this section goes, values outside the values of the enumeration are neither approved nor disapproved. This section defines which values are outside the values of the enumeration, but it is left to other sections (like expr.pre) to decide the validity of using such values.
You can think of this footnote as a warning to those writing compilers: do not assume! An expression of enumeration type need not have a value within the enumeration's set of values. Such a case must compile correctly unless another section classifies that case as undefined behavior.
For a better understanding of what exactly clang is complaining about, try the following code:
enum A {B = 3, C = 7};
int main() {
// Set a variable of type A to a value outside A's set of values.
A d = static_cast<A>(8);
// Try to evaluate an expression of type A with this too-big value.
if ( !static_cast<bool>(static_cast<A>(8)) )
return 2;
// Try again, but this time load the value from d.
if ( !static_cast<bool>(d) ) // Sanitizer flags only this
return 1;
return 0;
}
The sanitizer does not complain about forcing a value of 8 into a variable of type A. It does not complain about evaluating an expression of type A that happens to have the value 8 (the first if). It does, though, complain when the value of 8 comes from (is loaded from) a variable of type A.
I'm not very familiar with Clang since I am accustomed to Visual Studio. I am currently using Visual Studio 2017. I was able to compile and run your code with the language flag set to both c++14 and c++17 in both x86 and x64 debug builds. Instead of returning the addition in your example:
return d + B;
I had decided to output them to the console:
std::cout << (d + B);
and in all 4 cases my compiler printed out a value of 11.
I'm not sure about GCC either as I haven't tried it with your example, but this is leading me to believe that this is a compiler dependent issue.
I followed your link and read section 8, which you referred to, but what caught my attention in that draft were the details in other sections, namely 7 and 10.
Section 7 states:
For an enumeration whose underlying type is not fixed, the underlying type is an integral type that can represent all the enumerator values defined in the enumeration. If no integral type can represent all the enumerator values, the enumeration is ill-formed. It is implementation-defined which integral type is used as the underlying type except that the underlying type shall not be larger than int unless the value of an enumerator cannot fit in an int or unsigned int. If the enumerator-list is empty, the underlying type is as if the enumeration had a single enumerator with value 0.
But it is this sentence or clause that caught my attention:
It is implementation-defined which integral type is used as the underlying type except that the underlying type shall not be larger than int unless the value of an enumerator cannot fit in an int or unsigned int.
Section 10 states:
The value of an enumerator or an object of an unscoped enumeration type is converted to an integer by integral promotion. [ Example:
enum color { red, yellow, green=20, blue };
color col = red;
color* cp = &col;
if (*cp == blue) // ...
makes color a type describing various colors, and then declares col as an object of that type, and cp as a pointer to an object of that type. The possible values of an object of type color are red, yellow, green, blue; these values can be converted to the integral values 0, 1, 20, and 21. Since enumerations are distinct types, objects of type color can be assigned only values of type color.
color c = 1; // error: type mismatch, no conversion from int to color
int i = yellow; // OK: yellow converted to integral value 1, integral promotion
Note that this implicit enum to int conversion is not provided for a scoped enumeration:
enum class Col { red, yellow, green };
int x = Col::red; // error: no Col to int conversion
Col y = Col::red;
if (y) { } // error: no Col to bool conversion
— end example ]
It is these two lines that caught my attention:
color c = 1; // error: type mismatch, no conversion from int to color
int i = yellow; // OK: yellow converted to integral value 1, integral promotion
So let's look back at your example:
enum A {B = 3, C = 7};
int main() {
A d = static_cast<A>(8);
return d + B;
}
Here A is a complete type, B & C are the enumerators which are evaluated to a constant expression of integral type by promotion and are set to the values of 3 and 7 accordingly. This covers the declaration of enum A{...};
Inside of main() you now declare an instance or an object of A called d since A is a complete type. Then you assign d a value of 8 which is a constant expression or constant literal through the mechanism of static_cast. I'm not 100% sure if every compiler performs static_cast in the same exact manner or not; I'm not sure if this is compiler dependent.
So d is an object of type A, but since the value 8 is not in the list of enumerations, I believe this falls under the clause of implementation defined. This should then promote d to an integral type.
Then on your final statement where you return d+B.
Let's say that d was promoted to an integral type with a value of 8. Then you are adding the enumerated value of B, which is 3, to 8, and therefore you should get an output of 11, which is what I get in all four of my test cases in Visual Studio.
Now as for your compiler with Clang I can not say, but as far as I can tell this doesn't seem to produce any errors or undefined behavior at least according to Visual Studio. Then again because this code appears to be implementation defined, I think this relies heavily on your particular compiler and its version as well as the language version you are compiling it under.
I cannot say that this will completely answer your question, but maybe it will shed some light on the underlying workings of the compilers according to the drafts and standards.
-Edit-
I decided to run this through my debugger and I put a break point on this line:
A d = static_cast<A>(8);
Then I stepped through to execute this line of code and looked at the value in the debugger. Here in Visual Studio, d does have a value of 8. However, its type is listed as A and not int. So I do not know if this is promoting it to int or not, or if it might be a compiler optimization, something under the hood such as asm that is treating d as an int or unsigned int, etc.; but Visual Studio is allowing me to assign an integral value through a static_cast to an enumerated type. However, if I remove the static_cast, it fails to compile, stating that you cannot assign a type int to a type A.
This leads me to believe that my original statement above is actually incorrect or only partially correct. The compiler is not fully "promoting" it to an integer type on assignment as d still remains an instance of A unless if it is doing so under the hood that I am unaware of.
I have not yet checked the asm of this code to see what assembly instructions are being generated by Visual Studio, so I am not able to make a full assessment at this point. Later on, if I have more time, I may look at the asm generated by my compiler to see the underlying actions it is taking.
Related
In the following declaration:
enum class en : signed char { A = 127, B };
gcc (and clang) says that the B value is outside the range of the enumeration, which is [-128, 127]. An error is also shown when writing:
enum class en : signed char { A = 128 };
However, in the second case, the initializer must be a converted constant expression from type int to type signed char, which disallows narrowing conversion and thus that construct is ill-formed.
But what about the first case? There's no initializer so that rule doesn't apply.
The standard says:
[dcl.enum]§2 An enumerator-definition without an initializer gives the enumerator the value obtained by increasing the value of the previous enumerator by one.
So, it's obvious that the chosen range (signed char) is not big enough to hold the value 128.
Maybe compilers decide to treat that case as a rule violation (and thus the program is ill-formed), but I don't see that as a rule violation, because it's not some kind of property of the program that violates a rule, but a command about what the compiler shall do. In this case, I would say it is a corner case not explicitly described or forbidden.
Say, we have
enum E
{
Foo = 0,
Bar = 1
};
Now, we do
enum E v = ( enum E ) 2;
And then
switch ( v )
{
case Foo:
doFoo();
break;
case Bar:
doBar();
break;
default:
// Is the compiler required to honor this?
doOther();
break;
}
Since the switch above handles every possible listed value of the enum, is it allowed for the compiler to optimize away the default branch above, or otherwise have an unspecified or undefined behavior in the case the value of enum is not in the list?
As I am expecting that the behavior should be similar for C and C++, the question is about both languages. However, if there's a difference between C and C++ for that case, it would be nice to know about it, too.
C++ situation
In C++, each enum has an underlying integral type. It can be fixed, if it is explicitly specified (ex: enum test2 : long { a,b};) or if it is int by default in the case of a scoped enum (ex: enum class test { a,b };):
[dcl.enum]/5: Each enumeration defines a type that is different from all other types. Each enumeration also has an underlying type. (...) if not
explicitly specified, the underlying type of a scoped enumeration type
is int. In these cases, the underlying type is said to be fixed.
In the case of an unscoped enum where the underlying type was not explicitly fixed (your example), the standard gives more flexibility to your compiler:
[dcl.enum]/7: For an enumeration whose underlying type is not fixed, the underlying type is an integral type that can represent all the
enumerator values defined in the enumeration. (...) It is implementation-defined which integral type is used as the underlying
type except that the underlying type shall not be larger than int
unless the value of an enumerator cannot fit in an int or unsigned
int.
Now a very tricky thing: the values that can be held by an enum variable depend on whether or not the underlying type is fixed:
if it's fixed, "the values of the enumeration are the values of the
underlying type."
otherwise, they are the integral values between the minimum and the maximum of the smallest bit-field that can hold the smallest enumerator and the largest one.
You are in the second case. Although your code will work on most compilers, the smallest bit-field has a size of 1, so the only values that you can be sure to hold on all compliant C++ compilers are 0 and 1.
Conclusion: If you want to ensure that the value can be set to 2, you either have to make your enum a scoped enum, or explicitly indicate an underlying type.
More reading:
SO question on how to check if an enum value is valid
article on avoiding enum out-of-range values in secure coding.
Stroustrup's plea for scoped enums over unscoped ones
C situation
The C situation is much simpler (C11):
6.2.5/16: An enumeration comprises a set of named integer constant values. Each distinct enumeration constitutes a different enumerated
type.
So basically, it is an int:
6.7.2.2/2: The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value
representable as an int.
With the following restriction:
Each enumerated type shall be compatible with char, a signed integer
type, or an unsigned integer type. The choice of type is
implementation-defined, but shall be capable of representing the
values of all the members of the enumeration.
In C an enum type is an integer type large enough to hold all the enum constants:
(C11, 6.7.2.2p4) "Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined, [footnote 110] but shall be capable of representing the values of all the members of the enumeration".
Let's say the selected type for enum E is _Bool. A _Bool object can only store the values 0 and 1. It's not possible to have a _Bool object storing a value different than 0 or 1 without invoking undefined behavior.
In that case the compiler is allowed to assume that an object of the enum E type can only hold 0 or 1 in a strictly conforming program, and is therefore allowed to optimize out the default switch case.
C++Std 7.2.7 [dcl.enum]:
It is possible to define an enumeration that has values not defined by any of its enumerators.
So, you can have enumeration values which are not listed in enumerator list.
But in your specific case, the 'underlying type' is not 'fixed' (7.2.5). The specification doesn't say which is the underlying type in that case, but it must be integral. Since char is the smallest such type, we can conclude that there are other values of the enum which are not specified in the enumerator list.
Btw, I think that the compiler can optimize your case when it can determine that there are no other values ever assigned to v, which is safe, but I think there are no compilers which are that smart yet.
Also, 7.2/10:
An expression of arithmetic or enumeration type can be converted to an
enumeration type explicitly. The value is unchanged if it is in the
range of enumeration values of the enumeration type; otherwise the
resulting enumeration value is unspecified.
In C, enumerators have type int. Thus any integer value can be assigned to an object of the enumeration type.
From the C Standard (6.7.2.2 Enumeration specifiers)
3 The identifiers in an enumerator list are declared as constants that
have type int and may appear wherever such are permitted.
In C++, an enumerator has the type of the enumeration that defines it. In C++ you should either explicitly specify the underlying type, or the compiler itself calculates the maximum allowed value.
From the C++ Standard (7.2 Enumeration declarations)
5 Each enumeration defines a type that is different from all other types. Each enumeration also has an underlying type. The underlying type can be explicitly specified using enum-base; if not explicitly specified, the underlying type of a scoped enumeration type is int. In these cases, the underlying type is said to be fixed. Following the closing brace of an enum-specifier, each enumerator has the type of its enumeration.
Thus in C any integer value is a possible value of an enum. The compiler may not optimize a switch by removing the default label.
In C and C++, this can work.
Same code for both:
#include <stdio.h>
enum E
{
Foo = 0,
Bar = 1
};
int main()
{
enum E v = (enum E)2; // the cast is required for C++, but not for C
printf("v = %d\n", v);
switch (v) {
case Foo:
printf("got foo\n");
break;
case Bar:
printf("got bar\n");
break;
default:
printf("got default\n");
break;
}
}
Same output for both:
v = 2
got default
In C, an enum is an integral type, so you can assign an integer value to it without casting. In C++, an enum is its own type.
Consider the following program (See live demo here)
#include <iostream>
int main()
{
enum days{}d;
std::cout<<sizeof(d);
}
It prints 4 as an output on my local machine when compiling using g++ 4.8.1. How does it occupy 4 bytes here? On gcc 6.0 in the given link I also used the -pedantic option, but it still compiles fine.
Then why isn't it allowed in C? I tried the following program in gcc 4.8.1. (See live demo here)
#include <stdio.h>
int main(void)
{
enum days{}d;
printf("sizeof enum is %u",sizeof(d));
}
Compiler gives following errors:
4 12 [Error] expected identifier before '}' token
5 36 [Error] 'd' undeclared (first use in this function)
5 36 [Note] each undeclared identifier is reported only once for each function it appears in
Is it allowed to have empty enum in C++ but not in C?
C++ is not C. For C++, from [dcl.enum]:
For an enumeration whose underlying type is not fixed, the underlying type is an integral type that can
represent all the enumerator values defined in the enumeration. [...] It is implementation-defined which integral type is used
as the underlying type except that the underlying type shall not be larger than int unless the value of an
enumerator cannot fit in an int or unsigned int. If the enumerator-list is empty, the underlying type is as if the enumeration had a single enumerator with value 0.
So the underlying type of the enumeration (which determines its size) is as if it had a single enumerator with value 0, though the actual type is implementation-defined. Its size could be 1 (int8_t certainly can hold 0), but it definitely isn't larger than an int. In this case, you get 4, which is perfectly reasonable.
For C, the grammar simply requires having an enumerator.
As opposed to C, C++ does allow empty enumerations. [dcl.enum]/7:
If the enumerator-list is empty, the underlying type is as if the
enumeration had a single enumerator with value 0.
The underlying type (whose size is commonly also the enumeration's size) is actually implementation-defined in your case, although most compilers will presumably choose int (and aren't allowed to choose anything larger here):
It is implementation-defined which integral type is used as the
underlying type except that the underlying type shall not be larger
than int unless the value of an enumerator cannot fit in an int or
unsigned int.
C has the same requirements for the "underlying type" (although that exact notion doesn't exist in C), but its grammar does not allow for empty enumerations in the first place - §6.7.2.2/1:
enumerator-list:
enumerator
enumerator-list , enumerator
You are right. You cannot have an empty enumerator list in C. But you can have it in C++. See http://en.cppreference.com/w/c/language/enum and http://en.cppreference.com/w/cpp/language/enum
The C11 standard requires at least one enumerator in an enum declaration (section 6.7.2.2), the salient parts copied below:
enum-specifier:
enum identifier(opt) { enumerator-list }
enum identifier(opt) { enumerator-list , }
enum identifier
enumerator-list:
enumerator
enumerator-list , enumerator
Sorry for the somewhat wonky formatting, I tried to recreate the passage from the (proposed) standard as close as I could.
In C++ the size is 4 bytes because your compiler chose int as the underlying integer type for the enum. Apparently sizeof(int) is 4 on your platform. It is very popular in the compiler world to default to int for enum representation (unless a larger type is required).
As for why it isn't allowed in C... Well, it isn't allowed in C because it isn't allowed in C. C is a completely different language with its own syntactic rules.
Warning:
src/BoardRep.h:49:12: warning: ‘BoardRep::BoardRep::Row::<anonymous struct>::a’
is too small to hold all values of ‘enum class BoardRep::Piece’
[enabled by default]
Piece a:2;
^
Enum:
enum class Piece: unsigned char {
EMPTY,
WHITE,
BLACK
};
Use:
union Row {
struct {
Piece a:2;
Piece b:2;
Piece c:2;
Piece d:2;
Piece e:2;
Piece f:2;
Piece g:2;
Piece h:2;
};
unsigned short raw;
};
With an enum I'd agree with GCC, it may have to truncate but that's because enums are not really separate from integers and pre-processor definitions. However an enum class is much stronger. If it is not strong enough to assume ALL Piece values taken as integers will be between 0 and 2 inclusive then the warning makes sense. Otherwise GCC is being needlessly picky and it might be worth mailing the list to say "look, this is a silly warning"
In case anyone cannot see the point
You can store 4 distinct values in 2 bits of data, I only need 3 distinct values, so any enum of length 4 or less should fit nicely in the 2 bits given (and my enum does "derive" (better term?) from an unsigned type). If I had 5 or more THEN I'd expect a warning.
The warning issued by gcc is accurate, there's no need to compose a mail to the mailing list asking them to make the warning less likely to appear.
The standard says that not every value of an enumeration with the underlying type unsigned char can be represented by a bit-field of length 2, even if no enumerator has such a value.
THE STANDARD
A value of the underlying type is a valid value of the enumeration even if no enumerator corresponds to it. The standard only says that a legal value to be stored inside an enumeration must fit inside the underlying type; it doesn't state that such a value must be present among the enumerators.
7.2 Enumeration declarations [dcl.enum]
7 ... It is possible to define an enumeration that has values not defined by any of its enumerators. ...
Note: the quoted section is present in both C++11, and the draft of C++14.
Note: wording stating the same thing, but using different terminology, can be found in C++03 under [dcl.enum]p6
Note: the entire [dcl.enum]p7 hasn't been included, to save space in this post.
DETAILS
enum class E : unsigned char { A, B, C };
E x = static_cast<E> (10);
Above we initialize x to store the value 10. Even though no enumerator with that value is present in the declaration of enum class E, this is still a valid construct.
With the above in mind we easily deduce that 10 cannot be stored in a bit-field of length 2, so the warning by gcc is accurate: we are potentially trying to store values in our bit-field that it cannot represent.
EXAMPLE
enum class E : unsigned char { A, B, C };
struct A {
E value : 2;
};
A val;
val.value = static_cast<E> (10); // OMG, OPS!?
According to the C++ Standard
8 For an enumeration whose underlying type is fixed, the values of the
enumeration are the values of the underlying type.
So the values of your enumeration are in the range
std::numeric_limits<unsigned char>::min() - std::numeric_limits<unsigned char>::max()
Bit field a defined as
Piece a:2;
can not hold all values of the enumeration.
If you would define an unscoped enumeration without a fixed underlying type then the range of its values would be
0 - 2
Yes, this warning is pointless because GCC already warns about assigning a value to a bitfield field (of enum type) that is truncated like this:
warning: conversion from 'Some_Enum' to 'unsigned char:2'
changes value from '(Some_Enum)9' to '1' [-Woverflow]
At the location of the warning it's only relevant that all declared enumerators can be held inside the bitfield field.
The statement that other values that are in the range of the underlying integer type (but don't correspond to a declared enumerator, which btw is well-defined, in general) can't be represented by the field, if ever assigned, is technically true, but has zero entropy, as a warning.
Thus, this warning was fixed in GCC 9.3.
IOW, GCC 9.3 and later don't warn about such code, anymore.
I would expect the following code snippet to complain about trying to assign something other than 0, 1, or 2 to a Color variable.
But the following does compile and I get the output
Printing:3
3
Can anybody explain why? Is enum not meant to be a true user-defined type? Thanks.
#include <iostream>
enum Color { blue=0,green=1,yellow=2};
void print_color(Color x);
int main(){
Color x=Color(3);
print_color(x);
std::cout << x << std::endl;
return 0;
}
void print_color(Color x)
{
std::cout << "Printing:" << x << std::endl;
}
Since you manually cast the 3 to Color, the compiler will allow you to do that. If you tried to initialize the variable x with a plain 3 without a cast, you would get a diagnostic.
Note that the range of values an enumeration can store is not limited by the enumerators it contains. It's the range of values of the smallest bitfield that can store all enumerator values of the enumeration. That is, the range of your enumeration type is 0..3:
00
01
10
11
The value 3 is thus still in range, and so the code is valid. Had you cast a 4, then the resulting value would be left unspecified by the C++ Standard.
In practice, the implementation has to choose an underlying integer type for the enumeration. The smallest type it can choose is char, which is still able to store values up to at least 127. But as mentioned, the compiler is not required to convert a 4 to a value of 4, because it's outside the range of your enumeration.
I figure I should post some explanation of the difference between "underlying type" and "range of enumeration values". The range of values for any type is the smallest and largest value of that type. The underlying type of an enumeration must be able to store the value of any enumerator (of course) - and two enumerations that have the same underlying type are layout compatible (this allows some flexibility in case a type mismatch occurs).
So while the underlying type is meant to fix the object representation (alignment and size), the values of the enumeration are defined as follows in 7.2/6
For an enumeration where e_min is the smallest enumerator and e_max is the largest, the values of the enumeration are the values of the underlying type in the range b_min to b_max, where b_min and b_max are, respectively, the smallest and largest values of the smallest bit-field that can store e_min and e_max. It is possible to define an enumeration that has values not defined by any of its enumerators.
[Footnote: On a two's-complement machine, b_max is the smallest value greater than or equal to max(abs(e_min) − 1, abs(e_max)) of the form 2^M − 1; b_min is zero if e_min is non-negative and −(b_max + 1) otherwise.]
Color(3) is a cast, with the same semantic as (Color)3, it isn't a constructor. Note that you can also use static_cast<Color>(3) for the same conversion but you can't use Color x(3).
Enum in C++ is more of a set of named integer constants than a true type, from compile-time checking point of view. However, the C++ standard has this to say [dcl.enum]:
9 An expression of arithmetic or enumeration type can be converted to an
enumeration type explicitly. The value is unchanged if it is in the
range of enumeration values of the enumeration type; otherwise the
resulting enumeration value is unspecified.
"Unspecified" is slightly better than the usual "undefined behavior".
Both the C and the C++ standards are kind of confusing on the subject of enums. Both insist that enums are "distinct types" but then both treat them as the underlying integral type. C++ even refers to an italic term "underlying type" which is only sort-of defined when introducing wchar_t.
In summary, wchar_t and enum types are "distinct" but simply mapped to an underlying integral type chosen by the implementation, and this is no doubt due to the need to be compatible with historical enum which was definitely just an int.
Modern compilers typically have options to add more type-like behavior to enums, turning on warnings and errors for various misuses. These can't be a default because they elect non-conforming behavior.