use of reinterpret_cast while reading value from binary file [duplicate] - c++

I hear that reinterpret_cast is implementation defined, but I don't know what this really means. Can you provide an example of how it can go wrong, and if it does go wrong, is it better to use a C-style cast?

The C-style cast isn't better.
It simply tries the various C++-style casts in order, until it finds one that works. That means that when it acts like a reinterpret_cast, it has the exact same problems as a reinterpret_cast. But in addition, it has these problems:
It can do many different things, and it's not always clear from reading the code which type of cast will be invoked (it might behave like a reinterpret_cast, a const_cast or a static_cast, and those do very different things)
Consequently, changing the surrounding code might change the behaviour of the cast
It's hard to find when reading or searching the code - reinterpret_cast is easy to find, which is good, because casts are ugly and should be paid attention to when used. Conversely, a C-style cast (as in (int)42.0) is much harder to find reliably by searching
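A minimal sketch of the second point, using hypothetical types A, B, Related and Unrelated that are not from the question: the same (B*) spelling adjusts the pointer when B is a base class, and silently keeps the address when it is not.

```cpp
#include <cassert>

struct A { int a; };
struct B { int b; };
struct Related : A, B {};            // B is a base class of Related
struct Unrelated { int a; int b; };  // no inheritance relationship with B

// Here (B*) behaves like static_cast<B*> and yields the B sub-object.
B* as_b(Related* p) { return (B*)p; }

// Here no static_cast is possible, so the identical spelling behaves like
// reinterpret_cast<B*> and the address is left unchanged.
B* as_b(Unrelated* p) { return (B*)p; }

void demo_c_style() {
    Related r;
    Unrelated u;
    assert(as_b(&r) == static_cast<B*>(&r));
    assert(static_cast<void*>(as_b(&u)) == static_cast<void*>(&u));
}
```

Removing the inheritance from Related would switch the first function from one behaviour to the other without any change to the cast itself.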
To answer the other part of your question, yes, reinterpret_cast is implementation-defined. This means that when you use it to convert from, say, an int* to a float*, then you have no guarantee that the resulting pointer will point to the same address. That part is implementation-defined. But if you take the resulting float* and reinterpret_cast it back into an int*, then you will get the original pointer. That part is guaranteed.
But again, remember that this is true whether you use reinterpret_cast or a C-style cast:
int i;
int* p0 = &i;
float* p1 = (float*)p0; // implementation-defined result
float* p2 = reinterpret_cast<float*>(p0); // implementation-defined result
int* p3 = (int*)p1; // guaranteed that p3 == p0
int* p4 = (int*)p2; // guaranteed that p4 == p0
int* p5 = reinterpret_cast<int*>(p1); // guaranteed that p5 == p0
int* p6 = reinterpret_cast<int*>(p2); // guaranteed that p6 == p0

It is implementation-defined in the sense that the standard prescribes (almost) nothing about how values of different types look at the bit level, how the address space is structured, and so on. So the result is very platform-specific for conversions like:
double d;
int &i = reinterpret_cast<int&>(d);
However, as the standard says:
It is intended to be unsurprising to those who know the addressing structure
of the underlying machine.
So if you know what you are doing and how it all looks at a low level, nothing can go wrong.
The C-style cast is somewhat similar in a sense that it can perform reinterpret_cast, but it also "tries" static_cast first and it can cast away cv qualification (while static_cast and reinterpret_cast can't) and perform conversions disregarding access control (see 5.4/4 in C++11 standard). E.g.:
#include <iostream>
using namespace std;
class A { int x; };
class B { int y; };
class C : A, B { int z; };
int main()
{
C c;
// just type-pun the pointer to c; the pointer value remains the same,
// only its type is different.
B *b1 = reinterpret_cast<B *>(&c);
// perform the conversion with the semantics of static_cast<B*>(&c), disregarding
// that B is an inaccessible base of C; the resulting pointer will point
// to the B sub-object in c.
B *b2 = (B*)(&c);
cout << "reinterpret_cast:\t" << b1 << "\n";
cout << "C-style cast:\t\t" << b2 << "\n";
cout << "no cast:\t\t" << &c << "\n";
}
And here is the output from ideone:
reinterpret_cast: 0xbfd84e78
C-style cast: 0xbfd84e7c
no cast: 0xbfd84e78
Note that the value produced by reinterpret_cast is exactly the same as the address of 'c', while the C-style cast resulted in a correctly offset pointer.

There are valid reasons to use reinterpret_cast, and for these reasons the standard actually defines what happens.
The first is to use opaque pointer types, either for a library API or just to store a variety of pointers in a single array (obviously along with their type). You are allowed to convert a pointer to a suitably sized integer and then back to a pointer and it will be the exact same pointer. For example:
T b;
intptr_t a = reinterpret_cast<intptr_t>( &b );
T * c = reinterpret_cast<T*>(a);
In this code c is guaranteed to point to the object b as you'd expect. Conversion back to a different pointer type is of course undefined (sort of).
Similar conversions are allowed for function pointers and member function pointers, but in the latter case you can cast to/from another member function pointer simply to have a variable that is big enough.
The second case is for using standard layout types. This is something that was de facto supported prior to C++11 and has now been specified in the standard. In this case the standard treats reinterpret_cast as a static_cast to void* first and then a static_cast to the destination type. This is used a lot when doing binary protocols where data structures often have the same header information, and it allows you to convert between types which have the same layout but differ in C++ class structure.
In both of these cases you should use the explicit reinterpret_cast operator rather than a C-style cast. Though the C-style cast would normally do the same thing, it has the danger of being subjected to overloaded conversion operators.
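As a rough sketch of the binary-protocol case, assume a made-up wire format in which every message begins with the same Header struct. The names Header, PingMessage and header_of are illustrative only and do not come from any real protocol.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical wire format: every message starts with the same header.
struct Header { std::uint16_t type; std::uint16_t length; };
struct PingMessage { Header header; std::uint32_t sequence; };

static_assert(std::is_standard_layout<PingMessage>::value,
              "the cast below relies on standard layout");

// A pointer to a standard-layout object can be converted to a pointer to
// its first member, so the header is reachable through the message pointer.
const Header* header_of(const PingMessage& msg) {
    return reinterpret_cast<const Header*>(&msg);
}

void demo_header() {
    PingMessage ping = {{1, 8}, 42};
    assert(header_of(ping) == &ping.header);
    assert(header_of(ping)->type == 1);
}
```

The static_assert documents the assumption the cast depends on; if someone later adds a virtual function or a base class with data, the build fails instead of the cast quietly changing meaning.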

C++ has types, and the only way they normally convert between each other is by well-defined conversion operators that you write. In general, that's all you both need and should use to write your programs.
Sometimes, however, you want to reinterpret the bits that represent a type into something else. This is usually used for very low-level operations and is not something you should typically use. For those cases, you can use reinterpret_cast.
It is implementation defined because the C++ standard does not really say much at all about how things should actually be laid out in memory. That is controlled by your specific implementation of C++. Because of this, the behaviour of reinterpret_cast depends upon how your compiler lays structures out in memory and how it implements reinterpret_cast.
C-style casts are quite similar to reinterpret_casts, but they have much less syntax and are not recommended. The thinking goes that casting is inherently an ugly operation and it requires ugly syntax to inform the programmer that something dubious is happening.
An easy example of how it could go wrong:
std::string a;
double* b;
b = reinterpret_cast<double*>(&a);
*b = 3.4;
That program's behaviour is undefined - a compiler could do anything it likes to that. Most probably, you would get a crash when the string's destructor is called, but who knows! It might just corrupt your stack and cause a crash in an unrelated function.

Both reinterpret_cast and C-style casts are implementation defined and they do almost the same thing. The differences are:
1. reinterpret_cast cannot remove constness. For example:
const unsigned int d = 5;
int *g=reinterpret_cast< int* >( &d );
will issue an error:
error: reinterpret_cast from type 'const unsigned int*' to type 'int*' casts away qualifiers
2. If you use reinterpret_cast, it is easy to find the places where you did it. That is not practical with C-style casts.

C-style casts do different things depending on the operands. Sometimes they type-pun an object in an unspecified way, such as (unsigned int)-1; sometimes they convert the same value to a different format, such as (double)42; sometimes they could do either, like how (void*)0xDEADBEEF reinterprets bits but (void*)0 is guaranteed to be a null pointer constant, which does not necessarily have the same object representation as (intptr_t)0; and very rarely they tell the compiler to do something like shoot_self_in_foot_with((char*)&const_object);.
That's usually all well and good, but when you want to cast a double to a uint64_t, sometimes you want the value and sometimes you want the bits. If you know C, you know which one the C-style cast does, but it's nicer in some ways to have different syntax for both.
Bjarne Stroustrup, in his guidelines, recommended reinterpret_cast in another context: if you want to type-pun in a way that the language does not define by a static_cast, he suggested that you do it with something like reinterpret_cast<double&>(uint64) rather than the other methods. They're all undefined behavior, but that makes it very explicit what you're doing and that you're doing it on purpose. Reading a different member of a union than you last wrote to does not.
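To make the value-versus-bits distinction concrete, here is a small sketch. Note that it copies the bits with std::memcpy, which is a different technique from the reinterpret_cast<double&> form mentioned above; it is shown because it has defined behavior. The function names are made up, and the expected bit pattern assumes an IEEE 754 64-bit double.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Value conversion: the number is preserved, the bit pattern changes.
std::uint64_t value_of(double d) { return static_cast<std::uint64_t>(d); }

// Bit copy: the bit pattern is preserved, the number changes.
std::uint64_t bits_of(double d) {
    static_assert(sizeof(std::uint64_t) == sizeof(double),
                  "assumes a 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

void demo_value_vs_bits() {
    assert(value_of(2.0) == 2);
    assert(bits_of(2.0) == 0x4000000000000000ULL);  // assumes IEEE 754 binary64
}
```

The same memcpy pattern applies when pulling a value out of a buffer read from a binary file, which is the situation in the question title.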

Related

Assigning an int value to enum and vice versa in C++

#include <iostream>
typedef enum my_time {
day,
night
} my_time;
int main(){
// my_time t1 = 1; <-- will not compile
int t2 = night;
return 0;
}
How is it expected that I can assign an enum value to an int but not the other way in C++?
Of course this is all doable in C.
Implicit conversions, or conversions in general, are not mutual. Just because a type A can be converted to a type B does not imply that B can be converted to A.
Old enums (unscoped enums) can be converted to integer but the other way is not possible (implicitly). That's just how it is defined. See here for details: https://en.cppreference.com/w/cpp/language/enum
Consider that roughly speaking enums are just named constants and for a function
void foo(my_time x);
It is most likely an error to pass an arbitrary int. However, a
void bar(int x);
can use an enum for special values of x while others are still allowed:
enum bar_parameter { NONE, ONE, MORE, EVEN_MORE, SOME_OTHER_NAME };
bar(NONE);
bar(SOME_OTHER_NAME);
bar(42);
This has been "fixed" in C++11 with scoped enums that don't implicitly convert either way.
You could cast to int. This expression explicitly converts the given value (night) to the specified type (int).
int t2 = static_cast<int>(night);
Of course this is all doable in C
That doesn't mean that the minds behind C++ automatically consider it a desired behavior. Nor should they have such an attitude. C++ follows its own philosophy with regard to types. This is not the only aspect where a conscious decision was made to be more strongly typed than C. This valid C snippet is invalid in C++
void *vptr = NULL;
int *iptr = vptr; // No implicit conversion from void* to T* in C++
How is it expected that I can assign an enum value to an int but not the other way in C++?
It's the behavior because one side of the conversion is less error prone. Allowing an enumerator to become an integer isn't likely to break any assumptions the programmer has about an integer value.
An enumeration is a new type. Some of the enumeration's values are named. And for most cases of using an enumeration, we really do want to restrict ourselves to those named constants only.
Even if an enumeration can hold the integer value, it doesn't mean that value is one of the named constants. And that can easily violate the assumptions code has about its input.
// Precondition: e is one of the named values of Enum, under pain of UB
void frombulate_the_cpu(Enum e);
This function documents its precondition. A violation of the precondition can cause dire problems; that's what UB usually is. If an implicit conversion were possible everywhere in the program, it would be that much more likely that we violate the precondition unintentionally.
C++ is geared to catch problems at compile time whenever it can. And this is deemed problematic.
If a programmer needs to convert an integer to an enumeration, they can still do it with a cast. Casts stand out in code bases. They require a conscious decision to override a compiler's checks. And it's a good thing, because when something potentially unsafe is done, it should be with full awareness.
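One way to keep that conscious decision in a single place is a small checked conversion. This is only a sketch: to_my_time is a hypothetical helper, and the enum is the one from the question, written without the typedef.

```cpp
#include <cassert>

enum my_time { day, night };

// Returns true and writes the enumerator only if raw matches a named value.
// On failure, out is left untouched.
bool to_my_time(int raw, my_time& out) {
    switch (raw) {
        case day:
        case night:
            out = static_cast<my_time>(raw);  // the one explicit cast
            return true;
        default:
            return false;
    }
}

void demo_to_my_time() {
    my_time t = day;
    assert(to_my_time(1, t) && t == night);
    assert(!to_my_time(42, t) && t == night);  // rejected, t unchanged
}
```

Callers then never need a cast of their own, and an out-of-range integer is reported instead of becoming an unnamed enumeration value.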
Cast the int when assigning . . .
my_time t1 = (my_time)1;

Should std::byte pointers be used for pointer arithmetic?

It seems std::byte has become the way (in C++17) to work with buffers holding object representations, but it's unclear whether this intent still allows for performing pointer arithmetic.
The question in the title is intentionally phrased as should because I'm looking for a recommendation. For example, void* can be used for pointer arithmetic as a GCC extension, but that is not standard (at least this is true for C), hence it is a possibility but not a recommendation.
I know the motivation for std::byte is to detach the character and the numeric aspects from the concept of byte. But at the same time, does pointer arithmetic stay?
EDIT: adjusted to clarify that I'm looking to do "pointer arithmetic" using std::byte*, not the numerical value of pointers stored in std::bytes
Yes, std::byte* can be used for pointer arithmetic.
And you can even do things like
struct foo{int x,y;};
foo f;
int* ptr_to_y = reinterpret_cast<int*>(reinterpret_cast<std::byte*>(&f)+offsetof(foo,y));
You do have to be careful that your locations are reachable through your operations. Just because pointers-as-integers gets the right result doesn't mean that the C++ code is doing defined behavior. There are a number of quirks in C++ around permitting the optimizer to "know" that a certain value cannot be modified.
struct loc {
int x,y;
};
void f( int* );
loc work( loc l ) {
l.x=3;
f(&l.y);
return l;
}
In the above case, someone who used the &l.y pointer to do pointer arithmetic (within f) and modify l.x, regardless of whether they went through std::byte* or not, would be invoking undefined behavior. The compiler is allowed to assume the returned l will have an .x value of 3.
These are not new pitfalls introduced by std::byte*.
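For illustration, a sketch that stays within a single object and reads a member by stepping through its bytes. It needs C++17 for std::byte, read_y is a made-up name, and it copies the value out with std::memcpy instead of casting back to int*, which is one cautious way to do it and not the only one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct foo { int x, y; };

// Reads member y by walking the object's bytes (std::byte requires C++17).
int read_y(const foo& f) {
    const std::byte* base = reinterpret_cast<const std::byte*>(&f);
    const std::byte* y_bytes = base + offsetof(foo, y);  // arithmetic in bytes
    int y;
    std::memcpy(&y, y_bytes, sizeof y);  // copy the bytes out
    return y;
}

void demo_read_y() {
    foo f = {3, 4};
    assert(read_y(f) == 4);
}
```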

Why can I use static_cast With void* but not With char*

I know that reinterpret_cast is primarily used going to or from a char*.
But I was surprised to find that static_cast could do the same with a void*. For example:
auto foo = "hello world"s;
auto temp = static_cast<void*>(&foo);
auto bar = static_cast<string*>(temp);
What do we gain from using reinterpret_cast and char* over static_cast and void*? Is it something to do with the strict aliasing problem?
Generally speaking, static_cast will cast between any two types if one of them can be converted to the other implicitly. That includes arithmetic casts, down-casts, up-casts and casts to and from void*.
That is, if this cast is valid:
void foo(A a);
B b;
foo(b);
Then both static_cast<B>(a) and static_cast<A>(b) will also be valid.
Since any pointer can be implicitly converted to void*, static_cast can also convert back from void*; that explains the behavior you found peculiar.
reinterpret_cast casts by reinterpreting the bit pattern of the value. That, as you said in the question, is usually done to convert between unrelated pointer types.
Yes, you can convert between unrelated pointer types through void*, by using two static_cast:
B *b;
A *a1 = static_cast<A*>(b); //compiler error
A *a2 = static_cast<A*>(static_cast<void*>(b)); //it works (evil laugh)!
But that is bending the rules. Just use reinterpret_cast if you really need this.
Your question really has 2 parts:
Should I use static_cast or reinterpret_cast to work with a pointer to the underlying bit pattern of an object without concern for the object type?
If I should use reinterpret_cast is a void* or a char* preferable to address this underlying bit pattern?
static_cast: Converts between types using a combination of implicit and user-defined conversions
In 5.2.9[expr.static.cast]13 the standard, in fact, gives the example:
T* p1 = new T;
const T* p2 = static_cast<const T*>(static_cast<void*>(p1));
It leverages the implicit cast:
A prvalue pointer to any (optionally cv-qualified) object type T can be converted to a prvalue pointer to (identically cv-qualified) void. The resulting pointer represents the same location in memory as the original pointer value. If the original pointer is a null pointer value, the result is a null pointer value of the destination type.
There is however no implicit cast from a pointer of type T to a char*. So the only way to accomplish that cast is with a reinterpret_cast.
reinterpret_cast: Converts between types by reinterpreting the underlying bit pattern
So in answer to part 1 of your question: when you cast to a void* or a char* in order to work with the underlying bit pattern, reinterpret_cast should be used, because its use signals to the reader a conversion to/from the underlying bit pattern.
Next let's compare void* to char*. The decision between these two may be a bit more application dependent. If you are going to use a standard library function with your underlying bit pattern just use the type that function accepts:
void* is used in the mem functions provided in the cstring library
read and write use char* as inputs
It's notable that C++ specific libraries prefer char* for pointing to memory.
Holding onto memory as a void* seems to have been preserved for compatibility reasons, as pointed out here. So if a cstring library function won't be used on your underlying bit pattern, use the C++-specific libraries' behavior to answer part 2 of your question: prefer char* to void*.
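A small sketch of looking at an object's underlying bytes through a character pointer. The helper object_bytes is hypothetical, and it uses unsigned char as the byte type so the values are easy to compare; a plain char* is obtained the same way.

```cpp
#include <cassert>
#include <vector>

// Copies the object representation of obj, one byte at a time.
template <typename T>
std::vector<unsigned char> object_bytes(const T& obj) {
    // reinterpret_cast signals that we are looking at the raw bytes.
    const unsigned char* first = reinterpret_cast<const unsigned char*>(&obj);
    return std::vector<unsigned char>(first, first + sizeof(T));
}

void demo_object_bytes() {
    unsigned char raw[3] = {1, 2, 3};
    std::vector<unsigned char> bytes = object_bytes(raw);
    assert(bytes.size() == 3);
    assert(bytes[0] == 1 && bytes[2] == 3);
}
```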

Is reinterpret_cast and c-style cast compatible (by C++ standard)?

The C++ standard mentions that reinterpret_cast is implementation-defined, and doesn't give any guarantees except that casting back (using reinterpret_cast) to the original type will result in the original value passed to the first cast.
C-style casting of at least some types behaves much the same way: casting back and forth results in the same value. Currently I am working with enumerations and ints, but there are some other examples as well.
While the C++ standard gives those definitions for both cast styles, does it also give the same guarantee for mixed casts? If library X returns some enum value from function int Y(), can I use any of the above casts without worrying about which cast was used to convert the initial enum to int in Y's body? I don't have X's source code, so I cannot check (and it can change with the next version anyway), and things like that are hardly mentioned in documentation.
I know that under most implementations in such cases both casts behave the same; my question is: what does C++ standard say about such cases - if anything at all.
C++ defines the semantics of the C cast syntax in terms of static_cast, const_cast and reinterpret_cast. So you get the same guarantees for the same operation whatever syntax you use to achieve it.
reinterpret_cast can only be used for specific conversions:
Pointer to (sufficiently large) integer, and the reverse
Function pointer to function pointer
Object pointer to object pointer
Pointer-to-member to pointer-to-member
lvalue expression to reference
plus (conditionally) function pointer to object pointer and the reverse. In most cases, the converted value is unspecified, but there is a guarantee that a conversion followed by its reverse will yield the original value.
In particular, you can't use reinterpret_cast to convert between integer and enumeration types; the conversion must be done using static_cast (or implicitly, when converting an unscoped enumeration to an integer type), which is well defined for sufficiently large integer types. The only possible problem is if the library did something completely insane such as return reinterpret_cast<int&>(some_enum);
A C-style cast will perform either a static_cast or a reinterpret_cast, followed by a const_cast, as necessary; so any conversion that's well-defined by static_cast is also well-defined by a C-style cast.
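A sketch of the mixed case from the question, under the assumption that the library converts with a C-style cast and the caller converts back with static_cast. Color, library_value and from_library are invented stand-ins for the library's enum and its function Y().

```cpp
#include <cassert>

enum Color { red, green = 5, blue };

// Stand-in for the library function; assume it uses a C-style cast,
// which for enum-to-int performs the same conversion as static_cast.
int library_value(Color c) { return (int)c; }

// The caller converts back with static_cast.
Color from_library(int v) { return static_cast<Color>(v); }

void demo_round_trip() {
    assert(from_library(library_value(green)) == green);
    assert(library_value(blue) == 6);
}
```

Because both spellings name the same conversion here, the round trip does not depend on which one the library happened to use.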
No, reinterpret_cast is not equivalent to a C-style cast. C-style casts allow casting away const-volatile (so they include the functionality of const_cast), which is not allowed in reinterpret_cast. If static_cast is allowed between the source and destination types, the C-style cast will perform a static_cast, which has different semantics than reinterpret_cast. If that conversion is not allowed, it will fall back to reinterpret_cast. Finally, there is a corner case where the C cast cannot be represented in terms of any of the other casts: it ignores access specifiers.
Some examples that illustrate differences:
#include <cassert>
class b0 { int a; };
class b1 { int b; };
class b2 { int c; };
class d : public b0, public b1, b2 {};
int main() {
d x;
assert( static_cast<b1*>(&x) == (b1*)&x );
assert( reinterpret_cast<b1*>(&x) != (b1*)&x ); // Different value
assert( reinterpret_cast<b2*>(&x) != (b2*)&x ); // Different value,
// cannot be done with static_cast
const d *p = &x;
// reinterpret_cast<b0*>(p); // Error cannot cast const away
(b0*)p; // C style can
}

Which cast to use; static_cast or reinterpret_cast?

int i = 1000;
void *p = &i;
int *x = static_cast<int*>(p);
int *y = reinterpret_cast<int*>(p);
which cast should be used to convert from void* to int* and why?
static_cast provided that you know (by design of your program) that the thing pointed to really is an int.
static_cast is designed to reverse any implicit conversion. You converted to void* implicitly, therefore you can (and should) convert back with static_cast if you know that you really are just reversing an earlier conversion.
With that assumption, nothing is being reinterpreted - void is an incomplete type, meaning that it has no values, so at no point are you interpreting either a stored int value "as void" or a stored "void value" as int. void* is just an ugly way of saying, "I don't know the type, but I'm going to pass the pointer on to someone else who does".
reinterpret_cast if you've omitted details that mean you might actually be reading memory using a type other than the type it was written with, and be aware that your code will have limited portability.
By the way, there are not very many good reasons for using a void* pointer in this way in C++. C-style callback interfaces can often be replaced with either a template function (for anything that resembles the standard function qsort) or a virtual interface (for anything that resembles a registered listener). If your C++ code is using some C API then of course you don't have much choice.
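For the case where a C API leaves no choice, here is a minimal sketch of a void* callback in which static_cast reverses the earlier implicit conversion. The names callback_fn, invoke and increment are hypothetical and do not come from any real library.

```cpp
#include <cassert>

// Hypothetical C-style callback API: the library only sees a void*.
typedef void (*callback_fn)(void* user_data);

void invoke(callback_fn fn, void* user_data) { fn(user_data); }

// The callback knows the pointer started life as an int*, so static_cast
// simply reverses the implicit int* -> void* conversion.
void increment(void* user_data) {
    int* counter = static_cast<int*>(user_data);
    ++*counter;
}

void demo_callback() {
    int hits = 0;
    invoke(increment, &hits);  // int* converts to void* implicitly
    invoke(increment, &hits);
    assert(hits == 2);
}
```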
In current C++, you can't use reinterpret_cast like in that code. For a conversion of void* to int* you can only use static_cast (or the equivalent C-style cast).
For a conversion between different function type pointers or between different object type pointers you need to use reinterpret_cast.
In C++0x, reinterpret_cast<int*>(p) will be equivalent to static_cast<int*>(p). It's probably incorporated in one of the next WPs.
It's a misconception that reinterpret_cast<T*>(p) would interpret the bits of p as if they were representing a T*. In that case it will read the value of p using p's type, and that value is then converted to a T*. An actual type-pun that directly reads the bits of p using the representation of type T* only happens when you cast to a reference type, as in reinterpret_cast<T*&>(p).
As far as I know, all current compilers allow reinterpret_cast from void* and behave equivalently to the corresponding static_cast, even though it is not allowed in current C++03. The amount of code broken when it's rejected will be no fun, so there is no motivation for them to forbid it.
When should static_cast, dynamic_cast, const_cast and reinterpret_cast be used? gives some good details.
From the semantics of your problem, I'd go with reinterpret, because that's what you actually do.