How to convert between int * and enum *? - c++

From some C legacy code I get a number of constants as int *. In the C++ part, I have an enum of underlying type int. Conversion between the enum and int on a single value basis works. However, conversion between int * and enum * is not possible. See code example below.
Why is that, and how would I convert a pointer to some int values into a pointer to int-based enums and vice versa? I kind of expected it to work, since the single-value conversions work and the underlying types are the same. I read "What happens if you static_cast invalid value to enum class?" but could not determine whether potentially invalid values play a role here.
int i = 3;
enum E : int;
E e;
e = static_cast<E>(i); // ok
i = static_cast<int>(e); // ok
int *j;
E * f;
j = static_cast<int *>(&i); // ok
f = static_cast<E *>(&i); // 'static_cast': cannot convert from 'int *' to 'E *'
j = static_cast<int *>(&e); // 'static_cast': cannot convert from 'E *' to 'int *'
// now use j and f
*j = *f;

Why is that?
From the compiler's point of view, int* and E* are pointers to different, unrelated types; that is why static_cast is not applicable here.
How would I convert a pointer to some int values to a pointer to int enums and vice versa?
You might try reinterpret_cast instead of static_cast:
f = reinterpret_cast<E *>(&i);
j = reinterpret_cast<int *>(&e);
From reinterpret_cast:
Any pointer to object of type T1 can be converted to pointer to object of another type cv T2
However, note that dereferencing f or j (i.e. *f or *j) violates the strict aliasing rule: an enumeration type and its underlying type are distinct types, and neither is on the list of types through which the other may be accessed. This means that this kind of conversion, though possible, is usually not useful.
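If the goal is simply to work with the legacy values as E, one way that avoids both the pointer cast and the aliasing question is to copy them, converting one value at a time with the static_cast that already works. This is a minimal sketch, assuming the legacy data arrives as a pointer plus an element count; the helpers `to_enums` and `to_ints` and the enumerators `A`, `B`, `C` are hypothetical and exist only for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical enumerators for illustration; the question only declares
// E opaquely as `enum E : int;`.
enum E : int { A = 1, B = 2, C = 3 };

// Copy legacy int values into a new buffer of E, converting each value
// individually (static_cast<E>(int) is valid because E has a fixed underlying type).
std::vector<E> to_enums(const int* src, std::size_t n) {
    std::vector<E> out(n);
    std::transform(src, src + n, out.begin(),
                   [](int v) { return static_cast<E>(v); });
    return out;
}

// The reverse direction: copy E values into an int buffer owned by the caller.
void to_ints(const E* src, std::size_t n, int* dst) {
    std::transform(src, src + n, dst,
                   [](E e) { return static_cast<int>(e); });
}
```

This costs a copy, but every access goes through an object of its real type, so the aliasing rule never comes into play.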

The underlying type of a scoped enumeration (enum class) defaults to int; for an unscoped enum without an explicitly specified one it is implementation-defined. In the OP it is explicitly specified as int. Logically, then, the value stored at E* e, where E is an enumeration with underlying type int, is an int, yet the pointer cannot be statically cast.
There's no guarantee in C++ that an enum with underlying type (say) short is layout-compatible with short, but even if the language tightened up that point there could be issues of type compatibility.
One issue is that converting E* to int* pi would violate type safety, because pi could be used to quietly set values outside the enumeration.
Similarly int* to E* may violate type safety if the integer value isn't in the enumeration.
Note however the standard makes a clear note that there's nothing to preclude an enum taking a value outside its defined set of values:
This set of values is used to define promotion and conversion semantics for the enumeration type. It does not preclude an
expression of enumeration type from having a value that falls outside this range.
See here: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf (see note 95 bottom of p. 156)
The only case that could (if layout compatibility were assured) be valid is E* to const int*, because all values of E are ints and the value cannot be (correctly) modified through a const int* pointer without a further violation of the type system.
But I think the language definition is not that subtle.
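Since an int may hold a value that names no enumerator, a conversion from int can be made to reject such values explicitly instead of relying on a cast. A sketch, assuming a hypothetical enumeration `Color` whose enumerators are contiguous and a hypothetical helper `to_color`; it uses C++17 `std::optional` to signal failure:

```cpp
#include <optional>

// Hypothetical example enumeration with contiguous enumerators.
enum class Color : int { Red = 0, Green = 1, Blue = 2 };

// Convert an int to Color only if it names one of the declared enumerators.
// The range check assumes the enumerators are contiguous from Red to Blue.
std::optional<Color> to_color(int v) {
    if (v < static_cast<int>(Color::Red) || v > static_cast<int>(Color::Blue)) {
        return std::nullopt;  // outside the declared set of enumerators
    }
    return static_cast<Color>(v);
}
```

A plain static_cast<Color>(7) would also compile and, because the underlying type is fixed, yields a valid Color value of 7; the check only matters if the code needs the value to be one of the named enumerators.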


Is converting between pointer-to-T, array-of-T and pointer-to-array-of-T ever undefined behaviour?

Consider the following code.
#include <stdio.h>
int main() {
typedef int T;
T a[] = { 1, 2, 3, 4, 5, 6 };
T(*pa1)[6] = (T(*)[6])a;
T(*pa2)[3][2] = (T(*)[3][2])a;
T(*pa3)[1][2][3] = (T(*)[1][2][3])a;
T *p = a;
T *p1 = *pa1;
//T *p2 = *pa2; //error in c++
//T *p3 = *pa3; //error in c++
T *p2 = **pa2;
T *p3 = ***pa3;
printf("%p %p %p %p %p %p %p %p\n", a, pa1, pa2, pa3, p, p1, p2, p3);
printf("%d %d %d %d %d %d %d %d\n", a[5], (*pa1)[5],
(*pa2)[2][1], (*pa3)[0][1][2], p[5], p1[5], p2[5], p3[5]);
return 0;
}
The above code compiles and runs in C, producing the expected results. All the pointer values are the same, as are all the int values. I think the result will be the same for any type T, but int is the easiest to work with.
I confess to being initially surprised that dereferencing a pointer-to-array yields an identical pointer value, but on reflection I think that is merely the converse of the array-to-pointer decay we know and love.
[EDIT: The commented out lines trigger errors in C++ and warnings in C. I find the C standard vague on this point, but this is not the real question.]
In this question, it was claimed to be Undefined Behaviour, but I can't see it. Am I right?
Right after I wrote the above it dawned on me that those errors are because there is only one level of pointer decay in C++. More dereferencing is needed!
T *p2 = **pa2; //no error in c or c++
T *p3 = ***pa3; //no error in c or c++
And before I managed to finish this edit, @AntonSavin provided the same answer. I have edited the code to reflect these changes.
This is a C-only answer.
C11 (n1570) 6.3.2.3 p7
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned*) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer.
*) In general, the concept “correctly aligned” is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.
The standard is a little vague about what happens if we use such a pointer (strict aliasing aside) for anything other than converting it back, but the intent and widespread interpretation is that such pointers should compare equal (and have the same numerical value, e.g. they should also be equal when converted to uintptr_t). As an example, think about (void *)array == (void *)&array (converting to char * instead of void * is explicitly guaranteed to work).
T(*pa1)[6] = (T(*)[6])a;
This is fine, the pointer is correctly aligned (it’s the same pointer as &a).
T(*pa2)[3][2] = (T(*)[3][2])a; // (i)
T(*pa3)[1][2][3] = (T(*)[1][2][3])a; // (ii)
Iff T[6] has the same alignment requirements as T[3][2], and the same as T[1][2][3], then (i) and (ii), respectively, are safe. It seems strange to me that they might not, but I cannot find a guarantee in the standard that they must have the same alignment requirements.
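Whatever the wording guarantees, a given implementation's answer can at least be checked at compile time. A minimal C++ sketch, with int standing in for T (C11 offers the same check through `_Alignof`); note that both C11 and C++ specify that applying the alignment operator to an array type yields the alignment of the element type, so these checks are expected to pass everywhere:

```cpp
using T = int;  // T as in the question

// If these hold, conversions (i) and (ii) produce correctly aligned pointers
// on this implementation. They say nothing about out-of-bounds arithmetic.
static_assert(alignof(T[6]) == alignof(T), "T[6] aligned like T");
static_assert(alignof(T[3][2]) == alignof(T[6]), "T[3][2] aligned like T[6]");
static_assert(alignof(T[1][2][3]) == alignof(T[6]), "T[1][2][3] aligned like T[6]");

// All three array types also occupy the same number of bytes.
static_assert(sizeof(T[6]) == sizeof(T[3][2]) && sizeof(T[6]) == sizeof(T[1][2][3]),
              "same size");
```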
T *p = a; // safe, of course
T *p1 = *pa1; // *pa1 has type T[6], after lvalue conversion it's T*, OK
T *p2 = **pa2; // **pa2 has type T[2], or T* after conversion, OK
T *p3 = ***pa3; // ***pa3 has type T[3], T* after conversion, OK
Ignoring the UB caused by passing int * where printf expects void *, let’s look at the expressions in the arguments for the next printf, first the defined ones:
a[5] // OK, of course
(*pa1)[5]
(*pa2)[2][1]
(*pa3)[0][1][2]
p[5] // same as a[5]
p1[5]
Note that strict aliasing isn't a problem here: no wrongly-typed lvalue is involved, and we access T as T.
The following expressions depend on the interpretation of out-of-bounds pointer arithmetic. The more relaxed interpretation (allowing container_of, array flattening, the “struct hack” with char[], etc.) allows them as well. The stricter interpretation (allowing a reliable run-time bounds-checking implementation for pointer arithmetic and dereferencing, but disallowing container_of, array flattening (though not necessarily array “lifting”, which is what you did), the struct hack, etc.) renders them undefined:
p2[5] // UB, p2 points to the first element of a T[2] array
p3[5] // UB, p3 points to the first element of a T[3] array
The only reason your code compiles in C is that your default compiler setup lets the compiler implicitly perform some illegal pointer conversions. Formally, this is not allowed by the C language. These lines
T *p2 = *pa2;
T *p3 = *pa3;
are ill-formed in C++ and produce constraint violations in C. In casual parlance, these lines are errors in both C and C++ languages.
Any self-respecting C compiler will issue (is actually required to issue) diagnostic messages for these constraint violations. The GCC compiler, for example, will issue "warnings" telling you that the pointer types in the above initializations are incompatible. While "warnings" are perfectly sufficient to satisfy standard requirements, if you really want to use GCC's ability to recognize constraint-violating C code, you have to run it with the -pedantic-errors switch and, preferably, explicitly select the standard language version with the -std= switch.
In your experiment, the C compiler performed these implicit conversions for you as a non-standard compiler extension. However, the fact that the GCC compiler running behind the ideone front end completely suppressed the corresponding warning messages (which the standalone GCC compiler issues even in its default configuration) means that ideone is a broken C compiler. Its diagnostic output cannot be meaningfully relied upon to tell valid C code from invalid code.
As for the conversion itself... It is not undefined behavior to perform this conversion. But it is undefined behavior to access array data through the converted pointers.
UPDATE: The following applies to C++ only; for C, scroll down.
In short, there's no UB in C++ and there is UB in C.
8.3.4/7 says:
A consistent rule is followed for multidimensional arrays. If E is an n-dimensional array of rank i x j x ... x k,
then E appearing in an expression that is subject to the array-to-pointer conversion (4.2) is converted to a
pointer to an (n - 1)-dimensional array with rank j x ... x k. If the * operator, either explicitly or implicitly
as a result of subscripting, is applied to this pointer, the result is the pointed-to (n - 1)-dimensional array,
which itself is immediately converted into a pointer.
So this won't produce an error in C++ (and will work as expected):
T *p2 = **pa2;
T *p3 = ***pa3;
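The same rule can be observed without any reinterpreting cast by starting from a genuine two-dimensional array, where each `*` (explicit, or implicit through subscripting) yields a row that immediately decays again. A minimal sketch; the variable names are illustrative only:

```cpp
// A real int[3][2] object: walking it needs no cast at all.
int m[3][2] = {{1, 2}, {3, 4}, {5, 6}};

int (*row)[2] = m;   // m decays to a pointer to its first row, type int(*)[2]
int* first = *row;   // *row is an int[2] lvalue, which decays to int*
int* third = m[2];   // subscripting applies * implicitly; m[2] decays to int*
```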
Regarding whether this is UB or not. Consider the very first conversion:
T(*pa1)[6] = (T(*)[6])a;
In C++ it's in fact
T(*pa1)[6] = reinterpret_cast<T(*)[6]>(a);
And this is what the standard says about reinterpret_cast:
An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue
v of type “pointer to T1” is converted to the type “pointer to cv T2”, the result is static_cast< cv
T2 * >(static_cast< cv void * >(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment
requirements of T2 are no stricter than those of T1, or if either type is void.
So a is converted to pa1 through a static_cast to void* and back. A static_cast to void* is guaranteed to return the real address of a, as stated in 4.10/2:
A prvalue of type “pointer to cv T,” where T is an object type, can be converted to a prvalue of type “pointer
to cv void”. The result of converting a non-null pointer value of a pointer to object type to a “pointer to
cv void” represents the address of the same byte in memory as the original pointer value.
The subsequent static_cast to T(*)[6] is again guaranteed to return the same address, as stated in 5.2.9/13:
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T,” where T is
an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. The null pointer
value is converted to the null pointer value of the destination type. If the original pointer value represents
the address A of a byte in memory and A satisfies the alignment requirement of T, then the resulting pointer
value represents the same address as the original pointer value, that is, A
So pa1 is guaranteed to point to the same byte in memory as a, and so access to data through it is perfectly valid, because the alignment of an array is the same as the alignment of its element type.
What about C?
Consider again:
T(*pa1)[6] = (T(*)[6])a;
In C11 standard, 6.3.2.3/7 states the following:
A pointer to an object type may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced type, the behavior is
undefined. Otherwise, when converted back again, the result shall compare equal to the
original pointer. When a pointer to an object is converted to a pointer to a character type,
the result points to the lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining bytes of the object.
This means that unless the conversion is to char*, the value of the converted pointer is not guaranteed to equal the value of the original pointer, resulting in undefined behavior when accessing data through the converted pointer. To make it work, the conversion has to be done explicitly through void*:
T(*pa1)[6] = (T(*)[6])(void*)a;
Conversions back to T*
T *p = a;
T *p1 = *pa1;
T *p2 = **pa2;
T *p3 = ***pa3;
All of these are conversions from array of T to pointer to T, which are valid in both C++ and C, and no UB is triggered by accessing the data through converted pointers.

Is C++11 auto keyword exactly defined for all cases? Or: how does auto know what I intend?

Let's say, in C++11, I do
auto a = 4;
What will a be? An int (as I often read), an unsigned int, a short, a long, a size_t, a char? Is the behaviour of auto always defined, will it always be the exact same type (with the exact same bit length!) on each compiler and each architecture?
Another example:
class A{};
class B:A{};
auto x = new B();
Will x be of type *B or of type *A? Always the same on each compiler and platform? Both are perfectly legit, how does the compiler know which one I intend?
Is there an exact list of the auto behaviour?
What will a be?
int, since that's the type of 4.
Will x be of type *B or of type *A?
B*, since that's the type of new B().
Is there an exact list of the auto behaviour?
Usually, it's the type of the initialiser; unless that's a reference type, in which case it's the underlying object type. There are a few other wrinkles, for instance top-level const is dropped and arrays decay to pointers.
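These deduction rules can be checked by the compiler itself. A small sketch using `decltype` and `std::is_same`; the variable names are illustrative only:

```cpp
#include <initializer_list>
#include <type_traits>

int n = 0;
const int cn = 0;
int& rn = n;
int arr[3] = {};

auto a1 = n;       // int
auto a2 = cn;      // int: top-level const is dropped
auto a3 = rn;      // int: the reference is stripped and a copy is made
auto a4 = arr;     // int*: the array decays to a pointer
auto& a5 = cn;     // const int&: with auto&, the const is kept
auto a6 = {1, 2};  // std::initializer_list<int>: a braced initialiser

static_assert(std::is_same<decltype(a1), int>::value, "plain copy");
static_assert(std::is_same<decltype(a2), int>::value, "top-level const dropped");
static_assert(std::is_same<decltype(a3), int>::value, "reference stripped");
static_assert(std::is_same<decltype(a4), int*>::value, "array decays");
static_assert(std::is_same<decltype(a5), const int&>::value, "auto& keeps const");
static_assert(std::is_same<decltype(a6), std::initializer_list<int>>::value, "braces");
```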
will it always be the exact same type (with the exact same bit length!) on each compiler and each architecture?
In most cases, the initialiser has a well-defined type, and that determines the type deduced by auto.
If the initialiser is an integer literal, then the type might depend on the platform; for example, 1000000 might be int on a 32-bit platform, but long on a 16-bit platform.
Every expression in C++ has a type. auto can only be used when there is an initialization expression, and the type will be the type of that expression. The expression 4, for example, has type int, always, and the type of new B() is B*, always.
Of course, the fact that the type is clear to the compiler doesn't mean that it is clear to the reader. Abuse of auto is a good way of rendering a program unreadable, and also of making it fragile, since the compiler cannot check whether the type of the initialization expression is compatible with the desired type.
In the first case, a will be an int; see below:
auto a = 4 ; // int
auto b = 4U ; // unsigned int
auto c = 4L ; // long int
auto d = 4LLU ; // unsigned long long int (the suffix may also be written ULL)
auto x = 4.0 ; // double
auto y = 4.0f ; // float
In fact, literal suffixes let you select most of the integer types directly, although there is no suffix for short or char.
For new B(), well the compiler will take the only answer, which is B *.
auto matches the type of the expression the variable is initialized with, without trying to infer anything else; that's not its job!
You should not see the auto keyword as something magical, just as something that may help when you don't want to write out a long type in a declaration.
The compiler doesn't know what you intend, and it doesn't care. The type of a or x is the type of the expression on the right hand side. Since the type of 4 is int, and the type of new B() is B*, it's the same as if you wrote int a = 4; B* x = new B();
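For the class example, getting an A* instead of a B* means spelling the type out. A sketch under one stated assumption: B inherits publicly here, whereas the question's `class B:A{};` inherits privately, which would make the conversion to A* inaccessible outside the class. The helper name `upcast_keeps_object` is hypothetical:

```cpp
#include <type_traits>

struct A { virtual ~A() = default; };
struct B : A {};  // public base (the question's `class B:A` is a private base)

// Hypothetical helper for illustration.
bool upcast_keeps_object() {
    auto x = new B();  // deduced from the type of `new B()`: always B*
    static_assert(std::is_same<decltype(x), B*>::value, "auto deduces B*");
    A* y = x;              // wanting an A* means writing A* explicitly
    bool same = (y == x);  // x is converted to A* for the comparison
    delete y;              // fine: A has a virtual destructor
    return same;
}
```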

Type Conversion/Casting Confusion in C++

What is Type Conversion and what is Type Casting?
When should I use each of them?
Detail: Sorry if this is an obvious question; I'm new to C++, coming from a ruby background and being used to to_s and to_i and the like.
Conversion is when a value is, um, converted to a different type. The result is a value of the target type, and there are rules for what output value results from what input (of the source type).
For example:
int i = 3;
unsigned int j;
j = i; // the value of "i" is converted to "unsigned int".
The result is the unsigned int value that is equal to i modulo UINT_MAX+1, and this rule is part of the language. So, in this case the value (in English) is still "3", but it's an unsigned int value of 3, which is subtly different from a signed int value of 3.
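The modulo rule is easiest to see with a negative value, where the result differs visibly from the input. A minimal sketch; the helper name `to_unsigned` is hypothetical:

```cpp
#include <climits>

// Implicit conversion int -> unsigned int: the result is the source value
// modulo UINT_MAX + 1, so -1 becomes UINT_MAX and small non-negative values are unchanged.
unsigned int to_unsigned(int i) {
    unsigned int j = i;  // implicit conversion, no cast written
    return j;
}
```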
Note that conversion happened automatically, we just used a signed int value in a position where an unsigned int value is required, and the language defines what that means without us actually saying that we're converting. That's called an "implicit conversion".
"Casting" is an explicit conversion.
For example:
unsigned int k = (unsigned int)i;
long l = long(i);
unsigned int m = static_cast<unsigned int>(i);
are all casts. Specifically, according to 5.4/2 of the standard, k uses a cast-expression, and according to 5.2.3/1, l uses an equivalent thing (except that I've used a different type). m uses a "type conversion operator" (static_cast), but other parts of the standard refer to those as "casts" too.
User-defined types can define "conversion functions" which provide specific rules for converting your type to another type, and single-arg constructors are used in conversions too:
struct Foo {
int a;
Foo(int b) : a(b) {} // single-arg constructor
Foo(int b, int c) : a(b+c) {} // two-arg constructor
operator float () { return float(a); } // conversion function
};
Foo f(3,4); // two-arg constructor
f = static_cast<Foo>(4); // conversion: single-arg constructor is called
float g = f; // conversion: conversion function is called
Classic casting (something like (Bar)foo in C, done in C++ with reinterpret_cast<>) is when the actual memory contents of a variable are treated as a variable of a different type. Type conversion (i.e. Boost's lexical_cast<> or other user-defined functions that convert types) is when some logic is performed to actually convert a variable from one type to another, such as integer to string, where some code runs to build a string out of a given integer.
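For the Ruby-style conversions mentioned in the question (`to_s`, `to_i`), the standard library's value conversions are a close analogue of this logic-performing kind of conversion. A sketch using `std::to_string` and `std::stoi` from `<string>` rather than Boost's lexical_cast; the wrapper names `int_to_s` and `s_to_int` are hypothetical:

```cpp
#include <string>

// Like Ruby's 42.to_s: builds a new string from the integer's value.
std::string int_to_s(int n) { return std::to_string(n); }

// Like Ruby's "123".to_i, except that std::stoi throws std::invalid_argument
// when no digits can be parsed (Ruby's to_i returns 0 instead).
int s_to_int(const std::string& s) { return std::stoi(s); }
```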
There is also static and dynamic casting, which are used in inheritance, for instance, to force usage of a parent's member functions on a child's type (dynamic_cast<>), or vice-versa (static_cast<>). Static casting also allows you to perform the typical "implicit" type conversion that occurs when you do something like:
float f = 3.14;
int i = f; //float converted to int by dropping the fraction
which can be rewritten as:
float f = 3.14;
int i = static_cast<int>(f); //same thing
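In a class hierarchy, a common pattern is static_cast for conversions known to be valid (such as an upcast, or a downcast the programmer has already checked) and dynamic_cast for a checked downcast that yields nullptr on failure. A sketch with hypothetical types `Base` and `Derived` and a hypothetical helper `as_derived`:

```cpp
// dynamic_cast requires a polymorphic base, i.e. at least one virtual function.
struct Base { virtual ~Base() = default; };
struct Derived : Base { int extra = 7; };

// Returns b as a Derived* if it really points to a Derived, otherwise nullptr.
Derived* as_derived(Base* b) { return dynamic_cast<Derived*>(b); }
```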
In C++, any expression has a type. When you use an expression of one type (say type S) in a context where a value of another type is required (say type D), the compiler tries to convert the expression from type S to type D. If no such implicit conversion exists, the result is an error. The term type cast is not standard, but it means the same as conversion.
E.g.:
void f(int x){}
char c;
f(c); //c is converted from char to int.
The conversions are ranked and you can google for promotions vs. conversions for more details.
There are five explicit cast operators in C++: static_cast, const_cast, reinterpret_cast and dynamic_cast, and also the C-style cast.
Type conversion is when you actually convert a value of one type into another type, for example a string into an integer and vice versa; type casting is when the actual content of the memory isn't changed, but the compiler interprets it in a different way.
Type casting indicates you are treating a block of memory differently.
int i = 10;
int* ip = &i;
char* cp = reinterpret_cast<char*>(ip);
if ( *cp == 10 ) // Here, you are treating memory that was declared as int
{                // as char. (True on little-endian machines, where the
}                // low-order byte of i is stored first.)
Type conversion indicates that you are converting a value from one type to another.
char c = 'A';
int i = c; // This converts a char to an int.
// Memory used for c is independent of memory
// used for i.

Assigning a pointer variable to a const int in C++?

I'm wondering if anyone can explain the following to me: If I write
int i = 0;
float* pf = i;
I get a compile error (gcc 4.2.1):
error: invalid conversion from ‘int’ to ‘float*’
Makes sense - they are obviously two completely different types. But if instead I write
const int i = 0;
float* pf = i;
It compiles without error. Why should the 'const' make a difference on the right hand side of the assignment? Isn't part of the idea of the 'const' keyword to be able to enforce type constraints for constant values?
Any explanation I have been able to come up with feels kind of bogus. And none of my explanations also explain the fact that
const int i = 1;
float* pf = i;
fails to compile. Can anyone offer an explanation?
Your second example simply happens to be covered by the conversion rules as specified in §4.10/1 (C++03):
A null pointer constant is an integral constant expression (5.19) rvalue of integer type that evaluates to zero. A null pointer constant can be converted to a pointer type; the result is the null pointer value of that type and is distinguishable from every other value of pointer to object or pointer to function type.
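Later standards narrowed this rule: from C++14 on, a null pointer constant is an integer literal with value zero or a prvalue of type std::nullptr_t (the change came from core language issue 903), so the `const int` version is expected to be rejected by a modern compiler in C++14 mode and later. A minimal sketch of the forms that stay valid:

```cpp
float* p1 = 0;        // OK: the literal 0 is a null pointer constant
float* p2 = nullptr;  // C++11 and later: states the intent explicitly

// const int i = 0;
// float* p3 = i;     // accepted under the C++03 wording quoted above; expected to be
//                    // rejected from C++14 on, where only a literal 0 qualifies
```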

How do C/C++ compilers handle type casting between types with different value ranges?

How does type casting happen without loss of data inside the compiler?
For example:
int i = 10;
UINT k = (UINT) i;
float fl = 10.123;
UINT ufl = (UINT) fl; // data loss here?
char *p = "Stackoverflow Rocks";
unsigned char *up = (unsigned char *) p;
How does the compiler handle this type of typecasting? A low-level example showing the bits would be highly appreciated.
Well, first note that a cast is an explicit request to convert a value of one type to a value of another type. A cast will also always produce a new object, which is a temporary returned by the cast operator. Casting to a reference type, however, will not create a new object. The object referenced by the value is reinterpreted as a reference of a different type.
Now to your question. Note that there are two major types of conversions:
Promotions: This kind of conversion can be thought of as casting from a possibly narrower type to a wider type. Casting from char to int, short to int, float to double are all promotions.
Conversions: These allow casting from long to int, int to unsigned int and so forth. They can in principle cause loss of information. There are rules for what happens if you assign a -1 to an unsigned typed object for example. In some cases, a wrong conversion can result in undefined behavior. If you assign a double larger than what a float can store to a float, the behavior is not defined.
Let's look at your casts:
int i = 10;
unsigned int k = (unsigned int) i; // :1
float fl = 10.123;
unsigned int ufl = (unsigned int) fl; // :2
char *p = "Stackoverflow Rocks";
unsigned char *up = (unsigned char *) p; // :3
1. This cast causes a conversion to happen. No data is lost, since 10 is guaranteed to be representable by an unsigned int. If the integer were negative, the value would basically wrap around the maximal value of an unsigned int (see 4.7/2).
2. The value 10.123 is truncated to 10. Here, it does cause loss of information, obviously. As 10 fits into an unsigned int, the behavior is defined.
3. This actually requires more attention. First, there is a deprecated conversion from a string literal to char* (removed altogether in C++11), but let's ignore that here. More importantly, what happens if you cast to a pointer to an unsigned type? Actually, the result of that is unspecified per 5.2.10/7 (note that the semantics of that cast are the same as using reinterpret_cast in this case, since that is the only C++ cast capable of doing that):
A pointer to an object can be explicitly converted to a pointer to
an object of different type. Except that converting an rvalue of type “pointer to T1” to the type "pointer to T2" (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.
So you are only safe to use the pointer after you cast back to char * again.
The two C-style casts in your example are different kinds of cast. In C++, you'd normally write them
unsigned int ufl = static_cast<unsigned int>(fl);
and
unsigned char* up = reinterpret_cast<unsigned char*>(p);
The first performs an arithmetic cast, which truncates the floating point number, so there is data loss.
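A small sketch of that arithmetic cast, with hypothetical helper names: conversion from floating point to an integer type discards the fractional part (truncation toward zero, also for negative values), and the behavior is undefined if the truncated value does not fit in the target type.

```cpp
// Floating point -> integer: the fractional part is discarded (rounds toward zero).
int trunc_to_int(double d) { return static_cast<int>(d); }

// Same for unsigned targets; undefined if the truncated value is not representable.
unsigned int trunc_to_uint(float f) { return static_cast<unsigned int>(f); }
```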
The second makes no changes to data - it just instructs the compiler to treat the pointer as a different type. Care needs to be taken with this kind of cast: it can be very dangerous.
"Type" in C and C++ is a property assigned to variables when they're handled in the compiler. The property doesn't exist at runtime anymore, except for virtual functions/RTTI in C++.
The compiler uses the type of variables to determine a lot of things. For instance, in the assignment of a float to an int, it will know that it needs to convert. Both types are probably 32 bits, but with different meanings. It's likely that the CPU has an instruction for this; otherwise the compiler would know to call a conversion function, i.e. something like this (pseudocode):
& __stack[4] = float_to_int_bits(& __stack[0])
The conversion from char* to unsigned char* is even simpler. That is just a different label. At the bit level, p and up are identical. The compiler just needs to remember that *p requires sign extension (on platforms where plain char is signed) while *up does not.
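The sign-extension difference can be shown by reading the same byte through both pointer types. A sketch assuming 8-bit bytes and two's complement representation; the helper names are hypothetical:

```cpp
#include <cstring>
#include <type_traits>

// true on platforms where plain char is signed (common on x86 ABIs)
constexpr bool char_is_signed = std::is_signed<char>::value;

// Build a char whose bit pattern is 0xFF without any value conversion.
char make_ff_char() {
    unsigned char raw = 0xFF;
    char c;
    std::memcpy(&c, &raw, 1);
    return c;
}

int load_via_char(const char* p) { return *p; }  // sign-extended if char is signed

int load_via_uchar(const char* p) {
    // Same address, different pointer type: the byte is read as 0..255.
    return *reinterpret_cast<const unsigned char*>(p);
}
```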
Casts mean different things depending on what they are. They can just be renamings of a data type, with no change in the bits represented (most casts between integral types and pointers are like this), or conversions that don't even preserve length (such as between double and int on most compilers). In many cases, the meaning of a cast is simply unspecified, meaning the compiler has to do something reasonable but doesn't have to document exactly what.
A cast doesn't even need to result in a usable value. Something like
char * cp;
float * fp;
cp = malloc(100);
fp = (float *)(cp + 1);
will almost certainly result in a misaligned pointer to float, which will crash the program on some systems if the program attempts to use it.
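If a float really has to be read from or written to an arbitrary byte offset, one common approach is to copy the bytes with memcpy instead of forming a misaligned float*, since memcpy works on bytes and has no alignment requirement. A sketch with hypothetical helper names `load_float` and `store_float`:

```cpp
#include <cstddef>
#include <cstring>

// Read a float stored at any byte offset without creating a misaligned float*.
float load_float(const char* bytes, std::size_t offset) {
    float f;
    std::memcpy(&f, bytes + offset, sizeof f);
    return f;
}

// Write a float to any byte offset the same way.
void store_float(char* bytes, std::size_t offset, float f) {
    std::memcpy(bytes + offset, &f, sizeof f);
}
```

Compilers typically turn such fixed-size memcpy calls into plain loads and stores where the target allows unaligned access.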