How should C bitflag enumerations be translated into C++?

How should C bitflag enumerations be translated into C++? - c++

C++ is mostly a superset of C, but not always. In particular, while enumeration values in both C and C++ implicitly convert into int, the reverse isn't true: only in C do ints convert back into enumeration values. Thus, bitflags defined via enumeration declarations don't work correctly. Hence, this is OK in C, but not in C++:
typedef enum Foo
{
Foo_First = 1<<0,
Foo_Second = 1<<1,
} Foo;
int main(void)
{
Foo x = Foo_First | Foo_Second; // error in C++
return 0;
}
How should this problem be handled efficiently and correctly, ideally without harming the debugger-friendly nature of using Foo as the variable type (it decomposes into the component bitflags in watches etc.)?
Consider also that there may be hundreds of such flag enumerations, and many thousands of use-points. Ideally some kind of efficient operator overloading would do the trick, but it really ought to be efficient; the application I have in mind is compute-bound and has a reputation of being fast.
Clarification: I'm translating a large (>300K) C program into C++, so I'm looking for an efficient translation in both run-time and developer-time. Simply inserting casts in all the appropriate locations could take weeks.

Why not just cast the result back to a Foo?
Foo x = Foo(Foo_First | Foo_Second);
EDIT: I didn't understand the scope of your problem when I first answered this question. The above will work for doing a few spot fixes. For what you want to do, you will need to define a | operator that takes 2 Foo arguments and returns a Foo:
Foo operator|(Foo a, Foo b)
{
return Foo(int(a) | int(b));
}
The int casts are there to prevent undesired recursion.

It sounds like an ideal application for a cast - it's up to you to tell the compiler that yes, you DO mean to instantiate a Foo with a random integer.
Of course, technically speaking, Foo_First | Foo_Second isn't a valid value for a Foo.

Either leave the result as an int or static_cast:
Foo x = static_cast<Foo>(Foo_First | Foo_Second); // not an error in C++

Related

Type conversion vs type disambiguation

Casts are used for both type conversion and disambiguation. In further research I found these two as examples :
(double) 3; // conversion
(double) 3.0; // disambiguation
Can someone explain the difference, I don't see any. Is this distinction, also valid in C++
EDIT
Originally the code snippet was like so:
(float) 3; // conversion
(float) 3.0; // disambiguation
But changed it to double because floating point literals are no longer float in modern compilers and the question had no meaning. I hope I interpreted the comments correctly and I appologize for any answer already posted that became irrelevant after the edit.

The (double) 3 is a conversion from an Integer (int) to a floting point number (double).
The cast in (double) 3.0 is useless, it doesn't do anything since it's already double.
An unsuffixed floating constant has type double.
(ANSI C Standard, §3.1.3.1 Floating constants)
This answer is valid for C, it should be the same in C++.

This is similar to many things asked of programmers in any language. I will give a first example, which is different from what you are asking, but should illustrate better why this would appear wherever you've seen it.
Say I define a constant variable:
static const a = 5;
Now, what is 'a'? Let's try again...
static const max_number_of_buttons = 5;
This is explicit variable naming. It is much longer, and in the long run it is likely to save your ass. In C++ you have another potential problem in that regard: naming of member variables, versus local and parameter variables. All 3 should make use of a different scheme so when you read your C++ function you know exactly what it is doing. There is a small function which tells you nothing about the variables:
void func(char a)
{
char b;
b = a;
p = b * 3;
g = a - 7;
}
Using proper naming conventions and you would know whether p and g are local variables, parameters to the function, variable members, or global variables. (This function is very small so you have an idea, imagine a function of 1,000 lines of code [I've seen those], after a couple pages, you have no idea what's what and often you will shadow variables in ways that are really hard to debug...)
void func(char a)
{
char b;
b = a;
g_p = b * 3;
f_g = a - 7;
}
My personal convention is to add g_ for global variables and f_ for variable members. At this point I do not distinguish local and parameter variables... although you could write p_a instead of just a for the parameter and now you know for all the different types of variables.
Okay, so that makes sense in regard for disambiguation of variable names, although in your case you specifically are asking about types. Let's see...
Ada is known for its very strong typing of variables. For example:
type a is 1 .. 100;
type b is 1 .. 100;
A: a;
B: b;
A := 5;
B := A;
The last line does NOT compile. Even though type a and type b look alike, the compiler view them both as different numeric types. In other words, it is quite explicit. To make it work in Ada, you have to cast A as follow:
B := B'(A);
The same in C/C++, you declare two types and variables:
typedef int a;
typedef int b;
a A;
b B;
A = 5;
B = A;
That works as is in C/C++ because type a and type b are considered to be exactly the same (int, even though you gave them names a and b). It is dearly NOT explicit. So writing something like:
A = (b) 5;
would explicitly tell you that you view that 5 as of type b and convert it (implicitly) to type a in the assignment. You could also write it in this way to be fully explicit:
A = a(b(5));
Most C/C++ programmers will tell you that this is silly, which is why we have so many bugs in our software, unfortunately. Ada protects you against mixing carrots and bananas because even if both are defined as integers they both are different types.
Now there is a way to palliate to that problem in C/C++, albeit pretty much never used, you can make use of a class for each object, including numbers. Then you'd have a specific type for each different type (because variables of class A and class B cannot just be mixed together, at least not unless you allow it by adding functions for the purpose.) Very frankly, even I do not do that because it would be way too much work to write any C program of any length...
So as Juri said: 3.0 and (double) 3.0 are exactly the same thing and the (double) casting is redundant (a pleonasm, as in you say the same thing twice). Yet, it may help the guy who comes behind you see that you really meant for that number to be a double and not whatever the language says it could eventually be.

Let the compiler make the final choice which type to use

This question requires knowledge of C++ template meta-programming as (indirectly) expression templates are involved. I say indirectly because its not directly a question on expression templates, but involves C++ type computations. If you don't know what that is please don't answer this question.
To avoid putting out a question without enough background information let me elaborate a bit on the general problem I am trying to solve and then go to the more specific parts.
Suppose you have a library that provides Integers that the user can do calculations with just like with ints.
Furthermore it is possible to construct a Integer from an int. Just like:
Integer<int> i(2);
Internally my Integer class is a class template:
template<class T>
class Integer {
// cut out
};
So I can define it on whatever integer type I like.
Now without changing the API, I would like to change the library in a way that if Integer was constructed from an int it should be represented internally by a different type, say IntegerLit. The reason for this is that I can speed up some calculation knowing that an instance of Integer was created from an int (can pass it as a int argument to a function instead of as a general object described by a base pointer + separate data. This just as a comment.)
It is essential that the type is different when constructing from an int because I need the compiler to take up different code paths depending on whether constructed from an int or not. I cannot do this with a runtime data flag. (The reason in short: The compiler generates a function that takes either an int or the above mentioned more general type of object depending on the type.)
Having this said I run into a problem: When the uses does something like this:
Integer<int> a,b(2);
a = b + b;
Here a should be the general Integer and b the specialized IntegerLit. However, my problem is how to express this in C++ as the user is free to use the very same type Integer to define her variables.
Making the types polymorphic, i.e. deriving IntegerLit from Integer won't work. It looks fine for a moment. However since the user creates instances of Integer (the base class) that won't work because it is the base class the compiler sticks into the expression tree (this is why expression templates are involved in the question). So again no distinction is possible between the two cases. Doing a RTTI check a la dynamic cast is really not what I want on that point.
More promising seems to be adding a literal template parameter bool lit to the type saying if it was constructed from an int or not. The point is to not specify the conversion rule for one not literal one but do specify it for the other case.
However, I can't get that to work. The following code only compiles (GCC 4.7 C++11) if not constructing from an int. Otherwise it fails because the Integer is not specified with true as the value for lit. So the compiler searches the default implementation which doesn't have the conversion rule. It is not an option changing the API and requiring to write Integer<int,true> when constructing from an int.
template<class T,bool lit=false>
class Integer
{
public:
Integer() {
std::cout << __PRETTY_FUNCTION__ << "\n";
}
T F;
};
template<>
template<class T>
class Integer<T,true>
{
public:
Integer(int i) {
std::cout << __PRETTY_FUNCTION__ << "\n";
}
T F;
};
I am starting wondering if something like this is possible with C++.
Is there maybe a new feature of C++11 that can help here?

No, this isn't how C++ works. If you define b as Integer b, then it's an Integer. THis applies regardless of the expression which is subsequently used to initialize b.
Also, consider the following: extern Integer b;. There's an expression somewhere else that initializes b, yet the compiler still has to figure out here what type b has. (Not "will have", but "has").

You can't do that, exactly anyways.
But using "auto" you can come close.
// The default case
Integer<int> MakeInt()
{
return Integer<int>();
}
// The generic case
template <class T>
Integer<T> MakeInt(T n)
{
return Integer<T>(n);
}
// And specific to "int"
Integer<IntegerLit> MakeInt(int n)
{
return Integer<IntegerLit>(n);
}
auto a = MakeInt();
auto b = MakeInt(2);
auto c = MakeInt('c');
auto d = MakeInt(2.5);

You can't do that. Once you've said a variable is Integer<int> that is the variable's type. What you can do is make the underlying rep of the Integer vary depending on which constructor is used, something like this:
template<class T>
class Integer {
// cut out
Integer() : rep_(IntRep()) {}
explicit Integer(int val) : rep_(IntLitRep(val)) {}
private:
boost::variant<IntRep, IntLitRep> rep_;
};
Then you can easily determine which variant version is active and utilize different code paths when needed.
EDIT: In this case even though the type of the Integer is the same, you can easily use template functions to make it appear that it's behaving as two separate types (since the rep changes effective type).

Why is return type before the function name?

The new C++11 standard adds a new function declaration syntax with a trailing return type:
// Usual declaration
int foo();
// New declaration
auto foo() -> int;
This syntax has the advantage of letting the return type be deduced, as in:
template<class T, class U>
auto bar(T t, U u) -> decltype(t + u);
But then why the return type was put before the function name in the first place? I imagine that one answer will be that there was no need for such type deduction in that time. If so, is there a reason for a hypothetical new programming language to not use trailing return type by default?

As always, K&R are the "bad guys" here. They devised that function syntax for C, and C++ basically inherited it as-is.
Wild guessing here:
In C, the declaration should hint at the usage, i.e., how to get the value out of something. This is reflected in:
simple values: int i;, int is accessed by writing i
pointers: int *p;, int is accessed by writing *p
arrays: int a[n];, int is accessed by writing a[n]
functions: int f();, int is accessed by writing f()
So, the whole choice depended on the "simple values" case. And as #JerryCoffin already noted, the reason we got type name instead of name : type is probably buried in the ancient history of programming languages. I guess K&R took type name as it's easier to put the emphasis on usage and still have pointers etc. be types.
If they had chosen name : type, they would've either disjoined usage from declarations: p : int* or would've made pointers not be types anymore and instead be something like a decoration to the name: *p : int.
On a personal note: Imagine if they had chosen the latter and C++ inherited that - it simply wouldn't have worked, since C++ puts the emphasis on types instead of usage. This is also the reason why int* p is said to be the "C++ way" and int *p to be the "C way".
If so, is there a reason for a hypothetical new programming language to not use trailing return type by default?
Is there a reason to not use deduction by default? ;) See, e.g. Python or Haskell (or any functional language for that matter, IIRC) - no return types explicitly specified. There's also a movement to add this feature to C++, so sometime in the future you might see just auto f(auto x){ return x + 42; } or even []f(x){ return x + 42; }.

C++ is based on C, which was based on B, which was based on BCPL which was based on CPL.
I suspect that if you were to trace the whole history, you'd probably end up at Fortran, which used declarations like integer x (as opposed to, for example, Pascal, which used declarations like: var x : integer;).
Likewise, for a function, Pascal used something like function f(<args>) : integer; Under the circumstances, it's probably safe to guess that (at least in this specific respect) Pascal's syntax probably would have fit a bit better with type deduction.

Explain "C fundamentally has a corrupt type system"

In the book Coders at Work (p355), Guy Steele says of C++:
I think the decision to be
backwards-compatible with C is a fatal
flaw. It’s just a set of difficulties
that can’t be overcome. C
fundamentally has a corrupt type
system. It’s good enough to help you
avoid some difficulties but it’s not
airtight and you can’t count on it
What does he mean by describing the type system as "corrupt"?
Can you demonstrate with a simple example in C?
Edit:
The quote sounds polemic, but I'm not trying to be. I simply want to understand what he means.
Please give examples in C not C++. I'm interested in the "fundamentally" part too :)

The obvious examples in C of non-type-safety simply come from the fact you can cast from void * to any type without having to explicitly cast so.
struct X
{
int x;
};
struct Y
{
double y;
};
struct X xx;
xx.x = 1;
void * vv = &xx;
struct Y * yy = vv; /* no need to cast explicitly */
printf( "%f", yy->y );
Of course printf itself is not exactly typesafe.
C++ is not totally typesafe.
struct Base
{
int b;
};
struct Derived : Base
{
int d;
Derived()
{
b = 1;
d = 3;
}
};
Derived derivs[50];
Base * bb = &derivs[0];
std::cout << bb[3].b << std::endl;
It has no problem converting the Derived* to a Base* but you run into problems when you try using the Base* as an array as it will get the pointer arithmetic all wrong and whilst all the b values are 1 you may well get a 3 (As the ints will go 1-3-1-3 etc)

Basically you can cast any data type to any data type
struct SomeStruct {
void* data;
};
struct SomeStruct object;
*( (int*) &object ) = 10;
and noone catches you.

char buffer[42];
FunctionThatDestroysTheStack(buffer); // By writing 43 chars or more

The C type system does have some problems. Things like implicit function declaration and implicit conversion from void* can SILENTLY break type safety.
C++ fixes pretty much all of these holes. The C++ type system is NOT backwards compatible with C, it's only compatible with well-written typesafe C code.
Furthermore, the people arguing against C++ typically point you to Java or C# as the "solution". Yet Java and C# do have holes in their type system (array covariance). C++ doesn't have this problem.
EDIT: Examples, in C++, attempting to use array covariance that would (improperly) be allowed by the Java and C# type systems.
#include <stdlib.h>
struct Base {};
struct Derived : Base {};
template<size_t N>
void func1( Base (&array)[N] );
void func2( Base** pArray );
void func3( Base*& refArray );
void test1( void )
{
Base b[40];
Derived d[40];
func1(b); // ok
func1(d); // error caught by C++ type system
}
void test2( void )
{
Base* b[40] = {};
Derived* d[40] = {};
func2(b); // ok
func2(d); // error caught by C++ type system
func3(b[0]); // ok
func3(d[0]); // error caught by C++ type system
}
Results:
Comeau C/C++ 4.3.10.1 (Oct 6 2008 11:28:09) for ONLINE_EVALUATION_BETA2
Copyright 1988-2008 Comeau Computing. All rights reserved.
MODE:strict errors C++ C++0x_extensions
"ComeauTest.c", line 19: error: no instance of function template "func1" matches
the argument list
The argument types that you used are: (Derived [40])
func1(d); // error caught by C++ type system
^
"ComeauTest.c", line 28: error: argument of type "Derived **" is incompatible with
parameter of type "Base **"
func2(d); // error caught by C++ type system
^
"ComeauTest.c", line 31: error: a reference of type "Base *&" (not const-qualified)
cannot be initialized with a value of type "Derived *"
func3(d[0]); // error caught by C++ type system
^
3 errors detected in the compilation of "ComeauTest.c".
This doesn't mean that there are no holes at all in the C++ type system, but it does show that you can't silently overwrite a pointer-to-Derived with a pointer-to-Base like Java and C# allow.

You'd have to ask him what he meant to get a definitive answer, or perhaps provide more context for that quote.
However, it is pretty clear that if this is a fatal flaw for C++, the disease is chronic, not acute - C++ is thriving, and continually evolving as evidenced by ongoing Boost and C++0x efforts.
I don't even think about C and C++ as coupled any more - a few weeks on the respective fora here quickly cures one of any confusion over the fact that they are two different languages, each with its own strengths and foibles.

IMHO the "most broken" part of the C type system is that the concepts of
values/parameters that are optional
mutable values/pass-by-reference
arrays
non-POD function parameters
are all mapped to the single language concept "pointer". That means, if you get a function parameter of type X*, it might be an optional parameter, it might be expected that the function changes the value pointed to by X*, it might be that there are multiple instances of X after the one pointed to (it's open how many - the number could be passed as a separate parameter, or some kind special "terminator" value might mark the end of the array, as in nul-terminated strings). Or, the parameter might simply by a single structure, that you're not expected to change, but it's cheaper to pass it by reference.
If you get something of type X**, it might be an array of optional values, or it might be an array of simple values and you're expected to change it. Or it might be a 2d jagged array. Or an optional value passed by reference.
In contrast, take the ML family of languages (F#, OCaML, SML). Here these concepts map to separate language constructs:
values that are optional have the type X option
values that are mutable/pass by reference have the type X ref
arrays have the type X array
and non-POD types can be passed like PODs. Because they aren't mutable, the compiler can pass them by reference internally, but you don't need to know about that implementation detail
And you can of course combine those, i.e. int optional ref is a mutable value, that can be set to nothing or some integer value. int ref optional on the other hand is an optional mutable value; it can be nothing (and noone can change it) or it can be some mutable int (and you can change it to any other mutable it, but not to nothing).
These distinctions are very sublte, but you have to make them whether you program in ML or not. In C you have to make the same distinctions, but they're not explicitly stated in the type system. You have to read the documentation very carefully, or you might introduce sublte (read: hard to find) bugs if you misunderstand which kind of pointer usage is meant when.

Here, "corrupt" means that it is not "strict", leading to never-ending delight in C++ (because of the many custom types (objects) and overloaded operators, casting becomes a superior nuisance in C++).
The attack against C comes in regard to its MISPLACED USAGE as a strict OOP basis.
C has never been designed to limit coders, hence, maybe the frustration of Academia (and the flamboyant splendor of the ++ given to the World by B.S.).
"I invented the term Object-Oriented, and I can tell you I did not have C++ in mind"
(Alan Kay)

Is there a reason to use enum to define a single constant in C++ code?

The typical way to define an integer constant to use inside a function is:
const int NumbeOfElements = 10;
the same for using within a class:
class Class {
...
static const int NumberOfElements = 10;
};
It can then be used as a fixed-size array bound which means it is known at compile time.
Long ago compilers didn't support the latter syntax and that's why enums were used:
enum NumberOfElementsEnum { NumberOfElements = 10; }
Now with almost every widely used compiler supporting both the in-function const int and the in-class static const int syntax is there any reason to use the enum for this purpose?

The reason is mainly brevity. First of all, an enum can be anonymous:
class foo {
enum { bar = 1 };
};
This effectively introduces bar as an integral constant. Note that the above is shorter than static const int.
Also, no-one could possibly write &bar if it's an enum member. If you do this:
class foo {
static const int bar = 1;
}
and then the client of your class does this:
printf("%p", &foo::bar);
then he will get a compile-time linker error that foo::bar is not defined (because, well, as an lvalue, it's not). In practice, with the Standard as it currently stands, anywhere bar is used where an integral constant expression is not required (i.e. where it is merely allowed), it requires an out-of-class definition of foo::bar. The places where such an expression is required are: enum initializers, case labels, array size in types (excepting new[]), and template arguments of integral types. Thus, using bar anywhere else technically requires a definition. See C++ Core Language Active Issue 712 for more info - there are no proposed resolutions as of yet.
In practice, most compilers these days are more lenient about this, and will let you get away with most "common sense" uses of static const int variables without requiring a definition. However, the corner cases may differ, however, so many consider it to be better to just use anonymous enum, for which everything is crystal clear, and there's no ambiguity at all.

Defining static constants directly in the class definition is a later addition to C++ and many still stick to the older workaround of using an enum for that. There might even be a few older compilers still in use which don't support static constants directly defined in class definitions.

Use of enum have one advantage. An enum type is a type, so if you define, for example:
enum EnumType { EnumValue1 = 10, EnumValue2 = 20 ... };
and you have a function like:
void Function1(EnumType Value)
the compiler checks that you are passing a member of the enum EnumType to the function, so only valid values for parameter Value would be EnumValue1 and EnumValue2. If you use constants and change the function to
void Function1(int Value)
the compiler checks that you are passing an int (any int, a constant, variable or literal) to the function.
Enum types are good for grouping related const-values. For only one const value, I do not see any advantage.

In your case, I'd use a constant as well. However, there are other cases where I might be adding other, related constants. Like this:
const int TextFile = 1; // XXX Maybe add other constants for binary files etc.?
In such cases, I use an enum right away with a single value, like this:
enum FileType {
TextFile = 1
// XXX Maybe add other values for binary files etc.?
}
The reason is that the compiler can then issue warnings when I'm using the constant value in switch expressions, as in:
FileType type;
// ...
switch ( type ) {
case TextFile:
// ...
}
In case I decide to add another constant value which is related to the existing value (a different type of file, in this example), virtually all compilers will issue a warning since the new value is not handled in the switch statement.
If I had used 'int' and constants instead, the compiler wouldn't have a chance to issue warnings.

The only reason for using the "enum hack" is that old compilers do not support in-class const definitions, as you say in your question. So, unless you suspect that your code will be ported to an old compiler, you should use const where const is due.

I think that there's no reason to use an enum, and that it's actually better to use a static const int for this purpose, since an enum has its own type (even if implicitly convertible to an integer).

There is a difference between those two. Enums don't have an adress as far as I know. static const ints do though. So if someone takes the adress of the const static int, casts away the const, he can modify the value (although the compiler might ignore the change because he thinks it's const). This is of course pure evil and you should not do it - but the compiler can't prevent it. This can't happen with enums.
And of course - if you (for some reason) need the adress of that const, you need the static const int.
In short - enum is an rvalue while const static int is an lvalue. See http://www.embedded.com/story/OEG20011129S0065 for more details.

Well, the portability is a good reason for using enum. It is great, because you don't have to worry whether your compiler supports "static const int S = 10" or not...
Also, as far as I remember, static variable must be defined somewhere, as well as declared, and the enum value must be declared only.

bottom line - use const.
more details:
I'm not a c++ expert, but this seems a more general design question.
If it's not a must, and you think there's a very low/non-existent probability that the enum will grow to have more then one value, then use a regular const.
even if you are wrong and at some point in the future there will be more values, making an enum the right choice - a simple refactoring and you change you const to enum.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How should C bitflag enumerations be translated into C++? - c++

It sounds like an ideal application for a cast - it's up to you to tell the compiler that yes, you DO mean to instantiate a Foo with a random integer. Of course, technically speaking, Foo_First | Foo_Second isn't a valid value for a Foo.

Either leave the result as an int or static_cast: Foo x = static_cast<Foo>(Foo_First | Foo_Second); // not an error in C++

Related

Type conversion vs type disambiguation

Let the compiler make the final choice which type to use

Why is return type before the function name?

Explain "C fundamentally has a corrupt type system"

Is there a reason to use enum to define a single constant in C++ code?

Categories

Resources