I have a class that exposes an enum. I am trying to check the validity of the values in the setter function, like so:
enum abc
{
    X,
    Y
};

int my_class::set_abc(abc value)
{
    if (static_cast<int>(value) > static_cast<int>(Y))
        return -1;
    ...
}
There is a similar check for value being less than X.
I see that the compiler removes the check completely. I have Googled for the reason and come across many pages explaining the rules for integer conversions in C++, but I couldn't find any clarification about converting enums to ints, or about checking validity.
What is the correct way to accomplish this?
It seems arbitrary to test against Y, so I would add some limits. This also allows you to add more elements between min and max, and not be concerned with the ordering.
enum abc
{
    ABC_MIN = 0,
    X,
    Y,
    ABC_MAX
};
int my_class::set_abc(abc value)
{
    assert(value > ABC_MIN && value < ABC_MAX);
    ...
}
Since 0 and 1 are the only valid values of the type abc, whoever passes in a value larger or smaller than that has already invoked undefined behavior in order to create it.
You can't easily write code in C++ to detect conditions that have previously caused UB -- as you observe, the compiler has a tendency to optimize based on what is permitted or forbidden by the language.
You could write an int overload of the function that checks the value and then converts to the enum type, and not bother checking in the abc overload since it's someone else's problem to avoid invoking UB.
Alternatively, you could stop your test being redundant by putting some arbitrary additional values in the enum; then the compiler can't remove it.
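A sketch of the int-overload idea (the class layout and member names here are invented for illustration, not from the original question):

```cpp
#include <cassert>

enum abc { X, Y };

class my_class {
public:
    // int overload: validate here, where the whole range of the type
    // is legal, before converting to the enum.
    int set_abc(int value) {
        if (value < X || value > Y)
            return -1;  // reject out-of-range input
        return set_abc(static_cast<abc>(value));
    }

    // abc overload: passing a valid abc is the caller's responsibility.
    int set_abc(abc value) {
        current_ = value;
        return 0;
    }

private:
    abc current_ = X;
};
```

The check now operates on an `int`, whose entire range is well-defined, so the compiler cannot remove it.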
In C++, you can't directly assign integers to enum variables without an explicit cast.
If your code uses the enum type everywhere, then there's no point in checking that it's valid. It should be valid from the start and should remain valid.
If, however, your code gets the value as an integer and you need to convert it to an enum (or you perhaps do some arithmetic operation on an enum value), then you should validate the value at that site.
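For example, a minimal sketch of validating at the conversion site (`to_abc` is a hypothetical helper, not from the original code):

```cpp
#include <cassert>

enum abc { X, Y };

// Hypothetical helper: validates a raw int at the int-to-enum
// boundary, writing to `out` only when it names an enumerator.
bool to_abc(int raw, abc& out) {
    if (raw < X || raw > Y)
        return false;
    out = static_cast<abc>(raw);
    return true;
}
```
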
Let's say I have a very simple function called foo. foo can return two values; I'll use x and y as arbitrary placeholder variables.
I define it like so:
int foo(bool expression)
{
    static const int x = ..., y = ...;
    if (expression)
        return x;
    else
        return y;
}
This is obviously a branching statement.
I was thinking doing something like the following could remove any branching:
int foo(bool expression)
{
    static const int array[] = {y, x}; // index 0 (false) selects y, index 1 (true) selects x, matching the if/else above
    return array[expression];
}
Yet I'm not sure whether, by using C arrays, this still incurs branching. Does it? Do C++ std::arrays or vectors cause branching?
Is it worth it to attempt to read from the array, or is it a waste of memory and execution speed?
And lastly, if the expression contained a logical expression, such as &&, does this mean it will still branch?
As long as the condition relies on a boolean value to know what to do next, it's definitely branching. It's reasonable to say that the code still needs to wait and branch to decide which element from the array to access and return.
By the same concept, && or any other logical operator implies branching.
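The short-circuiting of && is observable; a small sketch (helper names invented) contrasting it with the non-short-circuiting bitwise &:

```cpp
#include <cassert>

// Count how many times the right-hand operand is evaluated.
int rhs_evaluations = 0;

bool rhs() { ++rhs_evaluations; return true; }

// Logical && short-circuits: rhs() is skipped (a branch) when lhs is false.
bool logical_and(bool lhs) { return lhs && rhs(); }

// Bitwise & on bools evaluates both operands unconditionally.
bool bitwise_and(bool lhs) { return lhs & rhs(); }
```

Whether `&` actually compiles to branch-free code still depends on the compiler and target, so the only way to be sure is to inspect the generated assembly.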
The question is complicated by the fact that in one case you are showing a bool expression, in the other case you are showing an int condition.
If your expression naturally evaluates to an int, then using this int to pick up an item from the array will not involve any branching. If the most natural type that your expression evaluates to is bool, then you will need to convert it to an int, and this conversion is likely to internally involve branching, so you are probably not going to gain anything. I am saying "probably" because a lot depends on the compiler and on the underlying CPU instruction set, so you will not know unless you have your compiler produce disassembly and examine the disassembly.
That having been said, I would add that your quest to eliminate a branch is rather an exercise in futility. There is nothing inherently evil with branching, nor does it perform badly. True, it is best to eliminate branches, but only if it is trivial to do so. If, in order to eliminate branching, you introduce an array that you otherwise wouldn't have, then you are probably adding an order of magnitude more overhead than you are saving. If you introduce a vector instead of an array, you may be introducing twice the overhead of the array. So, my recommendation about this would be: do not worry about the branching.
Let I be some integral type. Now suppose I have an enum class my_enum_class : I, with values which may not be consecutive. And now I get some I value. How do I check whether it's a value enumerated in my_enum_class?
An answer to a similar question (for the C language) makes the assumption that values are contiguous, and that one can add a "dummy" upper-bound value, and check the range between 0 and that value; that's not relevant in my case. Is there another way to do it?
There is currently no way to do this.
There are reflection proposals that may make it into C++20 and/or C++23 that would let you iterate (at compile, and hence run, time) over the enumerated values in an enum. Using that, the check would be relatively easy.
Sometimes people do manual enum reflection, often using macros.
There is no built-in way to do this. All Is are "valid" values of my_enum_class, so you can't do anything with the underlying type. As for validating Is against the list of enumerators, without reflection there is simply no way to do it.
Depending on the context, I tend to either build a static std::unordered_set (and do lookups into that), or have a function listing all my enumerators in a switch (and returning false iff the input matches none of them), or just not bother, instead documenting somewhere that passing an unenumerated my_enum_class value to my functions shall be deemed impish trickery and have unspecified behaviour.
Ultimately this all stems from the fact that enums are supposed to list "common conveniently named values" within a wider range of totally valid states, rather than a type comprised only of a fully constrained set of constants. We pretty much all abuse enums.
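A sketch of the switch-based validator mentioned above (the enum and its enumerators are made up for illustration):

```cpp
#include <cstdint>

enum class my_enum_class : std::int32_t { a = 1, b = 5, c = 42 };

// Returns true iff `raw` matches one of the enumerators. The switch
// must be kept in sync with the enum by hand; compiler switch warnings
// do not help here because we convert from the underlying integer.
bool is_valid(std::int32_t raw) {
    switch (static_cast<my_enum_class>(raw)) {
        case my_enum_class::a:
        case my_enum_class::b:
        case my_enum_class::c:
            return true;
        default:
            return false;
    }
}
```
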
Though the standard doesn't yet allow you to do introspection, there is a small workaround you could use, which can possibly be improved with ADL. Courtesy of this older answer.
#include <type_traits>

namespace sparse {
    template<typename E>
    constexpr bool in_(std::underlying_type_t<E>) { return false; }

    template<typename E, E value, E... values>
    constexpr bool in_(std::underlying_type_t<E> e) {
        return static_cast<E>(e) == value || in_<E, values...>(e);
    }

    // Primary variable template; specialize it per enum with the full
    // list of enumerators.
    template<typename E>
    constexpr auto in = in_<E>;
}
To be used like this:
enum class my_enum: int { a=3, b=4 };
template<>
constexpr auto sparse::in<my_enum> =
in_<my_enum, my_enum::a, my_enum::b>;
static_assert(sparse::in<my_enum>(3));
static_assert(sparse::in<my_enum>(4));
static_assert(!sparse::in<my_enum>(5));
Consider the following piece of code, which is perfectly acceptable by a C++11 compiler:
#include <array>
#include <iostream>
auto main() -> int {
    std::array<double, 0> A;
    for (auto i : A) std::cout << i << std::endl;
    return 0;
}
According to the standard § 23.3.2.8 [Zero sized arrays]:
1 Array shall provide support for the special case N == 0.
2 In the case that N == 0, begin() == end() == unique value. The return value of
data() is unspecified.
3 The effect of calling front() or back() for a zero-sized array is undefined.
4 Member function swap() shall have a noexcept-specification which is equivalent to
noexcept(true).
As displayed above, zero-sized std::arrays are perfectly allowable in C++11, in contrast with zero-sized built-in arrays (e.g., int A[0];), which are explicitly forbidden, yet allowed by some compilers (e.g., GCC) at the cost of undefined behaviour.
Considering this "contradiction", I have the following questions:
Why did the C++ committee decide to allow zero-sized std::arrays?
Are there any valuable uses?
If you have a generic function, it is bad if that function randomly breaks for special parameters. For example, let's say you could have a template function that takes N random elements from a vector:
template<typename T, size_t N>
std::array<T, N> choose(const std::vector<T> &v) {
...
}
Nothing is gained if this causes undefined behavior or a compiler error when N for some reason turns out to be zero.
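A sketch of such a function that handles N == 0 without any special-casing (simplified to copy the first N elements rather than random ones):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for choose(): takes the first N elements instead
// of random ones. Note that N == 0 needs no special handling.
template<typename T, std::size_t N>
std::array<T, N> choose(const std::vector<T>& v) {
    assert(v.size() >= N);
    std::array<T, N> result{};
    for (std::size_t i = 0; i < N; ++i)
        result[i] = v[i];
    return result;
}
```
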
For raw arrays, a reason behind the restriction is that you don't want types with sizeof(T) == 0: that leads to strange effects in combination with pointer arithmetic. An array with zero elements would have size zero if you didn't add any special rules for it.
But std::array<> is a class, and classes always have size > 0. So you don't run into those problems with std::array<>, and a consistent interface without an arbitrary restriction of the template parameter is preferable.
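A small demonstration of those guarantees (assertion-style sketch):

```cpp
#include <array>
#include <cassert>

// Exercise the zero-size guarantees quoted from the standard above.
bool zero_array_ok() {
    std::array<double, 0> a;                        // legal, unlike double a[0];
    bool empty_range = (a.begin() == a.end());      // begin() == end() guaranteed
    int iterations = 0;
    for (double d : a) { (void)d; ++iterations; }   // loop body never runs
    return a.empty() && empty_range && iterations == 0 && sizeof(a) > 0;
}
```
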
One use that I can think of is that returning zero-length arrays is possible, and there is functionality to check for that case specifically.
For example see the documentation on the std::array function empty(). It has the following return value:
true if the array size is 0, false otherwise.
http://www.cplusplus.com/reference/array/array/empty/
I think the ability to return and check for zero-length arrays is in line with the standard behaviour of other STL types, e.g. vectors and maps, and is therefore useful.
As with other container classes, it is useful to be able to have an object that represents an array of things, and to have it possible for that array to be or become empty. If that were not possible, then one would need to create another object, or a managing class, to represent that state in a legal way. Having that ability already contained in all container classes, is very helpful. In using it, one then just needs to be in the habit of relating to the array as a container that might be empty, and checking the size or index before referring to a member of it in cases where it might not point to anything.
There are actually quite a few cases where you want to be able to do this. It's present in a lot of other languages too. For example Java actually has Collections.emptyList() which returns a list which is not only size zero but cannot be expanded or resized or modified.
An example usage might be if you had a class representing a bus and a list of passengers within that class. The list might be lazy initialized, only created when passengers board. If someone calls getPassengers() though then an empty list can be returned rather than creating a new list each time just to report empty.
Returning null would also work for the internal efficiency of the class - but would then make life a lot more complicated for everyone using the class since whenever you call getPassengers() you would need to null check the result. Instead if you get an empty list back then so long as your code doesn't make assumptions that the list is not empty you don't need any special code to handle it being null.
I have a multithreaded application that stores data as an array of instances of the following union
union unMember {
    float fData;
    unsigned int uiData;
};
The object that stores this array knows what type the data in the union is, and so I don't have problems with UB when retrieving the correct type. However, in other parts of the program I need to test equality between two instances of these unions, and in this part of the code the true internal data type is not known. The result is that I can't test equality of the union using this kind of approach
unMember un1;
unMember un2;
if (un1 == un2) {
// do stuff
}
as I get compiler errors. As such, I simply compare the float part of the union:
if (un1.fData == un2.fData) {
// compiles but is it valid?
}
Now, given that I have read that it is UB to access any part of a union other than the part that was last written to (that is cumbersomely written, but I can think of no more articulate way to say it), I am wondering if the code above is a valid way to check equality of my union instances.
This has made me realise that internally I have no idea how unions really work. I had assumed that data was simply stored as a bit pattern and that you could interpret that in whatever way you like depending on the types listed in the union. If this is not the case, what is a safe/correct way to test equality of 2 instances of a union?
Finally, my application is written in C++ but I realise that unions are also part of C, so is there any difference in how they are treated by the 2 languages?
In general, you need to prepend some kind of indicator of the current union type:
struct myData
{
    int dataType;
    union {
        ...
    } u;
};
Then:
if (un1.dataType != un2.dataType)
    return false;

switch (un1.dataType)
{
case TYPE_1:
    return (un1.u.type1 == un2.u.type1);
case TYPE_2:
    ...
}
Anyway, the syntax
if (un1.fData == un2.fData) {
// compiles but is it valid?
}
which does compile and is valid, might not work for two reasons. One is that, as you said, maybe un2 contains an integer and not a floating-point value; but in that case the equality test will normally fail anyway. The second is that both structures hold a floating-point value representing the same number up to a slight machine error. Then the test will tell you the numbers are different (bit by bit they are), while their "meaning" is the same.
Floating-point numbers are usually compared like

if (fabs(f1 - f2) < error)

to avoid this pitfall.
In C++, members that are not the last member written to are considered to be uninitialized (and so reading them is undefined behaviour). In C, they are considered to contain the object representation of the member that was written to, which may or not be a valid object representation.
That is,
union U {
    S x;
    T y;
} u;

u.x = 0;
T t = u.y; // C++ - reading uninitialized memory - could crash
T t = u.y; /* C - reading object representation of u.x - could crash */
In practice, C++ reading a union non-assigned member will behave the same as C if the code is sufficiently remote from the code that wrote the assigned member, because the only way for the compiler to generate code that behaves differently is to optimize the read-write combination.
A safe method in both languages (guaranteed not to crash) is to compare the memory contents as an array of char e.g. using memcmp:
union U u1, u2;
u1.x = 0;
u2.x = 0;
memcmp(&u1, &u2, sizeof(union U));
This may not however reflect the actual equality of the union members; e.g. for floating-point types, two NaN values can have the same memory representation yet compare unequal, while -0.0 and 0.0 (negative and positive zero) have different memory representations but compare equal. There is also the issue of the two types having different sizes, or containing bits that do not participate in the value (padding bits, not an issue on most modern commodity platforms). In addition, struct types can contain padding for alignment.
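The NaN and signed-zero caveats can be demonstrated directly; a sketch using double (helper name invented):

```cpp
#include <cassert>
#include <cmath>
#include <cstring>

// Compare two doubles by their memory representation, as memcmp would.
bool same_bits(double a, double b) {
    return std::memcmp(&a, &b, sizeof(double)) == 0;
}

void demo() {
    double n1 = std::nan("");
    double n2 = n1;                 // copy: identical bit pattern
    assert(same_bits(n1, n2));      // memcmp sees them as equal...
    assert(!(n1 == n2));            // ...but NaN never compares equal to anything

    double pz = 0.0, nz = -0.0;
    assert(!same_bits(pz, nz));     // sign bit differs in memory...
    assert(pz == nz);               // ...yet == says they are equal
}
```
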
Different types are likely to have different storage lengths (two bytes vs say four bytes).
When a union member is written to, all that is guaranteed is that the member written to is correct.
If then you compare a different member, you have no idea what will be in the extra bytes.
The correct method to test for union equality is to have a struct which contains the union plus a member indicating which union member is currently in use, and to switch on that member, where the cases of the switch handle the equality check for each possible union member. That is, you have to store the in-use information along with the union.
E.g.
enum test_enum
{
    TEST_ENUM_INT,
    TEST_ENUM_FLOAT
};

union test_union
{
    int test_int;
    float test_float;
};

struct test_struct
{
    enum test_enum te;
    union test_union tu;
};
I think it would be safest if you implemented a class instead. If a construct does not provide a feature (in this case, automatically determining the right member to evaluate), then the construct might just not be suitable for your needs and you should use another construct ;) That may be a custom class, or perhaps a VARIANT if you use COM (which is basically a struct as proposed by @lserni).
In general what you are asking is impossible. Only the memory from the variable that you set would be guaranteed to be what you expect. The other memory is essentially random. However, in your case you can compare it because the size of everything is the same. If I were doing it I would just compare the unsigned ints or do a memcmp. This all relies on the fact that all members of the union have the same size. If you added a double for example all bets would be off. This falls into the bit twiddling that you can do and get away with in C/C++ but it's much harder to maintain. You are making an assumption about the union and it needs to be clear in the code that you made this assumption. A future maintainer could blow it and cause all kinds of hard to debug issues.
The best thing to do would be to have a struct with a type flag in it or use something like Boost Variant. When using something like this you would be future proofing yourself and using standard code that future maintainers have a chance at knowing or can look up the documentation on.
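For what it's worth, C++17's std::variant (the standard descendant of Boost Variant) provides exactly this: its operator== compares the active alternative first, then the values. A sketch using the union from the question:

```cpp
#include <cassert>
#include <variant>

// The union from the question, expressed as a variant: the active
// alternative is tracked automatically.
using unMember = std::variant<float, unsigned int>;

void demo() {
    unMember a = 1.5f, b = 1.5f;
    assert(a == b);                  // same alternative, same value

    unMember c = 3u;
    assert(!(a == c));               // different active alternatives

    unMember d = 2u;
    assert(!(c == d));               // same alternative, different values
}
```
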
Another note, you have to define what you mean by equality in the case of floats. If you want a fuzzy comparison then you certainly need to know the type. If you want a bit-wise comparison then that's easy enough.
Normally to detect a negative number, you just do if(x < 0) .... But what's the best way to do this without the hard-coded literal? It's not the value that I need to avoid, it's the datatype. I am looking for something in the style of <algorithm>.
I have the following solution, but is there a better way? Something tells me this is not as efficient as possible.
template<typename T>
inline bool is_negative(const T& n)
{
return n < (n - n);
}
I want to use the same restrictions that <algorithm> uses. So it's fine to require that T implement arithmetic operators, but nothing more specific/specialized than that.
Otherwise we could solve this by just requiring that T implement:
bool operator <(int)
T() (default constructor which we assume equals 0)
Or why not just require bool IsNegative()
It's just for my own curiosity; not for use in any project.
return n < T();
(Now I wonder why you have that restriction).
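A complete sketch of that approach:

```cpp
// T() value-initializes, which for arithmetic types yields zero, so no
// literal 0 of any hard-coded type appears.
template<typename T>
inline bool is_negative(const T& n)
{
    return n < T();
}
```
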
What's wrong with if (x < T(0)) ?
Since you are seemingly interested in numeric types, a very natural requirement for such types is that they provide a constructor which takes a fundamental type (int or double).
I find that more natural than default construction, which may not exist or may have different semantics (undefined, not-a-number, etc.; an example is the time classes from e.g. boost::posix_time, which default-construct to invalid dates).
Checking if something is less than literal zero is the right solution. It's the clearest about your intent. It's also the most efficient. But please stop thinking about efficiency when you are checking whether a number is negative; that's a bad habit.