Has a std::byte pointer the same aliasing implications as char*? - c++

C++ (and C) strict aliasing rules include that a char* and unsigned char* may alias any other pointer.
AFAIK there is no analogous rule for uint8_t*.
Thus my question: What are the aliasing rules for a std::byte pointer?
The C++ reference currently just specifies:
Like the character types (char, unsigned char, signed char) it can be used to access raw memory occupied by other objects (object representation), but unlike those types, it is not a character type and is not an arithmetic type.

From the current Standard draft ([basic.types]/2):
For any object (other than a base-class subobject) of trivially
copyable type T, whether or not the object holds a valid value of type
T, the underlying bytes ([intro.memory]) making up the object can be
copied into an array of char, unsigned char, or std​::​byte
([cstddef.syn]).43 If the content of that array is copied back into
the object, the object shall subsequently hold its original value.
So yes, the same aliasing rules apply for the three types, just as cppreference sums up.
It also might be valuable to mention ([basic.lval]/8.8):
If a program attempts to access the stored value of an object through
a glvalue of other than one of the following types the behavior is
undefined:
a char, unsigned char, or std​::​byte type.

Related

reinterpret_cast between char* and std::byte*

I'm reading type aliasing rules but can't figure out if this code has UB in it:
std::vector<std::byte> vec = {std::byte{'a'}, std::byte{'b'}};
auto sv = std::string_view(reinterpret_cast<char*>(vec.data()), vec.size());
std::cout << sv << '\n';
I'm fairly sure it does not, but I often get surprised by C++.
Is reinterpret_cast between char*, unsigned char* and std::byte* always allowed?
Additionally, is addition of const legal in such cast, e.g:
std::array<char, 2> arr = {'a', 'b'};
auto* p = reinterpret_cast<const std::byte*>(arr.data());
Again, I suspect it is legal since it says
AliasedType is the (possibly cv-qualified) signed or unsigned variant of DynamicType
but I would like to be sure with reinterpret_casting once and for all.
The code is ok.
char* and std::byte* are allowed by the standard to alias any pointer type. (Be careful as the reverse is not true).
([basic.types]/2):
For any object (other than a base-class subobject) of trivially
copyable type T, whether or not the object holds a valid value of type
T, the underlying bytes ([intro.memory]) making up the object can be
copied into an array of char, unsigned char, or std​::​byte
([cstddef.syn]).43 If the content of that array is copied back into
the object, the object shall subsequently hold its original value.
([basic.lval]/8.8):
If a program attempts to access the stored value of an object through
a glvalue of other than one of the following types the behavior is
undefined:
a char, unsigned char, or std​::​byte type.
And yes you can add const.

Converting from std::string to const unsigned int

My payload is stored in a std::string xyz (holds binary data), and I need to pass it to a function that takes it as const unsigned int*. How would I convert from std::string to const unsigned int*?
I tried reinterpret_cast<const unsigned int*>(&xyz.front()) but it is not working!
The function prototype is as follows:
void roll(void *pdst, const unsigned int *psrc);
pdst will hold the results.
Don't use std::string to store binary data; that class is specifically designed for working with strings. It feels like there was original C code that was using a char array to store a sequence of bytes and translated that to std::string for C++. In this case, it's not being used as a string, so it doesn't make sense to store it in a std::string.
From there, translating to an unsigned int, well for starters, you can't simply cast it even if you were using a more primitive type such as a char *, as it would violate the rules of strict aliasing resulting in undefined behavior. What you want to do is create a new variable and memcpy the data into this new variable.
Here is the section from the C++14 standard working draft describing compatible types (3.10 p10):
If a program attempts to access the stored value of an object through a glvalue of other than one of the
following types the behavior is undefined:
54
— the dynamic type of the object,
— a cv-qualified version of the dynamic type of the object,
— a type similar (as defined in 4.4) to the dynamic type of the object,
— a type that is the signed or unsigned type corresponding to the dynamic type of the object,
— a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type
of the object,
— an aggregate or union type that includes one of the aforementioned types among its elements or non-
static data members (including, recursively, an element or non-static data member of a subaggregate
or contained union),
— a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
— a
char
or
unsigned char
type.
As you can see, it explicitly allows for accessing any object as a char or unsigned char, but it gives no such allowance to access a char or unsigned char as anything else.
The problem is how you store binary data to std string? If you are simply using the constructor, you could get your binary data by xyz.data().

Pointer arithmetics on non-array types

Let's consider following piece of code:
struct Blob {
double x, y, z;
} blob;
char* s = reinterpret_cast<char*>(&blob);
s[2] = 'A';
Assuming that sizeof(double) is 8, does this code trigger undefined behaviour?
Quoting from N4140 (roughly C++14):
3.9 Types [basic.types]
2 For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char.42 If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.
42) By using, for example, the library functions (17.6.1.2) std::memcpy or std::memmove.
3 For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the underlying bytes (1.7) making up obj1 are copied
into obj2,43 obj2 shall subsequently hold the same value as obj1. [ Example: ... ]
43) By using, for example, the library functions (17.6.1.2) std::memcpy or std::memmove.
This does, in principle, allow assignment directly to s[2] if you take the position that assignment to s[2] is indirectly required to be equivalent to copying all of some other Blob into an array that just happens to be bytewise identical except for the third byte, and copying it into your Blob: you're not assigning to s[0], s[1], etc. For trivially copyable types including char, that is equivalent to setting them to the exact value they already have, which also has no observable effect.
However, if the only way to get s[2] == 'A' is by memory manipulation, then a valid argument could also be made that what you're copying back into your Blob isn't the underlying bytes that made up any previous Blob. In that case, technically, the behaviour would be undefined by omission.
I do strongly suspect, especially given the "whether or not the object holds a valid value of type T" comment, that it's intended to be allowed.
Chapter 3.10 of the standard seems to allow for that specific case, assuming that "access the stored value" means "read or write", which is unclear.
3.10-10
If a program attempts to access the stored value of an object through
a glvalue of other than one of the following types the behavior is
undefined:
—(10.1) the dynamic type of the object,
—(10.2) a cv-qualified version of the dynamic type of the object,
—(10.3) a type similar (as defined in 4.4) to the dynamic type of the
object,
—(10.4) a type that is the signed or unsigned type corresponding to
the dynamic type of the object,
—(10.5) a type that is the signed or unsigned type corresponding to a
cv-qualified version of the dynamic type of the object,
—(10.6) an aggregate or union type that includes one of the
aforementioned types among its elements or nonstatic data members
(including, recursively, an element or non-static data member of a
subaggregate or contained union),
—(10.7) a type that is a (possibly cv-qualified) base class type of the
dynamic type of the object,
—(10.8) a char or unsigned char type.

Aliasing T* with char* is allowed. Is it also allowed the other way around?

Note: This question has been renamed and reduced to make it more focused and readable. Most of the comments refer to the old text.
According to the standard, objects of different type may not share the same memory location. So this would not be legal:
std::array<short, 4> shorts;
int* i = reinterpret_cast<int*>(shorts.data()); // Not OK
The standard, however, allows an exception to this rule: any object may be accessed through a pointer to char or unsigned char:
int i = 0;
char * c = reinterpret_cast<char*>(&i); // OK
However, it is not clear to me whether this is also allowed the other way around. For example:
char * c = read_socket(...);
unsigned * u = reinterpret_cast<unsigned*>(c); // huh?
Some of your code is questionable due to the pointer conversions involved. Keep in mind that in those instances reinterpret_cast<T*>(e) has the semantics of static_cast<T*>(static_cast<void*>(e)) because the types that are involved are standard-layout. (I would in fact recommend that you always use static_cast via cv void* when dealing with storage.)
A close reading of the Standard suggests that during a pointer conversion to or from T* it is assumed that there really is an actual object T* involved -- which is hard to fulfill in some of your snippet, even when 'cheating' thanks to the triviality of types involved (more on this later). That would be besides the point however because...
Aliasing is not about pointer conversions. This is the C++11 text that outlines the rules that are commonly referred to as 'strict aliasing' rules, from 3.10 Lvalues and rvalues [basic.lval]:
10 If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
(This is paragraph 15 of the same clause and subclause in C++03, with some minor changes in the text with e.g. 'lvalue' being used instead of 'glvalue' since the latter is a C++11 notion.)
In the light of those rules, let's assume that an implementation provides us with magic_cast<T*>(p) which 'somehow' converts a pointer to another pointer type. Normally this would be reinterpret_cast, which yields unspecified results in some cases, but as I've explained before this is not so for pointers to standard-layout types. Then it's plainly true that all of your snippets are correct (substituting reinterpret_cast with magic_cast), because no glvalues are involved whatsoever with the results of magic_cast.
Here is a snippet that appears to incorrectly use magic_cast, but which I will argue is correct:
// assume constexpr max
constexpr auto alignment = max(alignof(int), alignof(short));
alignas(alignment) char c[sizeof(int)];
// I'm assuming here that the OP really meant to use &c and not c
// this is, however, inconsequential
auto p = magic_cast<int*>(&c);
*p = 42;
*magic_cast<short*>(p) = 42;
To justify my reasoning, assume this superficially different snippet:
// alignment same as before
alignas(alignment) char c[sizeof(int)];
auto p = magic_cast<int*>(&c);
// end lifetime of c
c.~decltype(c)();
// reuse storage to construct new int object
new (&c) int;
*p = 42;
auto q = magic_cast<short*>(p);
// end lifetime of int object
p->~decltype(0)();
// reuse storage again
new (p) short;
*q = 42;
This snippet is carefully constructed. In particular, in new (&c) int; I'm allowed to use &c even though c was destroyed due to the rules laid out in paragraph 5 of 3.8 Object lifetime [basic.life]. Paragraph 6 of same gives very similar rules to references to storage, and paragraph 7 explains what happens to variables, pointers and references that used to refer to an object once its storage is reused -- I will refer collectively to those as 3.8/5-7.
In this instance &c is (implicitly) converted to void*, which is one of the correct use of a pointer to storage that has not been yet reused. Similarly p is obtained from &c before the new int is constructed. Its definition could perhaps be moved to after the destruction of c, depending on how deep the implementation magic is, but certainly not after the int construction: paragraph 7 would apply and this is not one of the allowed situations. The construction of the short object also relies on p becoming a pointer to storage.
Now, because int and short are trivial types, I don't have to use the explicit calls to destructors. I don't need the explicit calls to the constructors, either (that is to say, the calls to the usual, Standard placement new declared in <new>). From 3.8 Object lifetime [basic.life]:
1 [...] The lifetime of an object of type T begins when:
storage with the proper alignment and size for type T is obtained, and
if the object has non-trivial initialization, its initialization is complete.
The lifetime of an object of type T ends when:
if T is a class type with a non-trivial destructor (12.4), the destructor call starts, or
the storage which the object occupies is reused or released.
This means that I can rewrite the code such that, after folding the intermediate variable q, I end up with the original snippet.
Do note that p cannot be folded away. That is to say, the following is defintively incorrect:
alignas(alignment) char c[sizeof(int)];
*magic_cast<int*>(&c) = 42;
*magic_cast<short*>(&c) = 42;
If we assume that an int object is (trivially) constructed with the second line, then that must mean &c becomes a pointer to storage that has been reused. Thus the third line is incorrect -- although due to 3.8/5-7 and not due to aliasing rules strictly speaking.
If we don't assume that, then the second line is a violation of aliasing rules: we're reading what is actually a char c[sizeof(int)] object through a glvalue of type int, which is not one of the allowed exception. By comparison, *magic_cast<unsigned char>(&c) = 42; would be fine (we would assume a short object is trivially constructed on the third line).
Just like Alf, I would also recommend that you explicitly make use of the Standard placement new when using storage. Skipping destruction for trivial types is fine, but when encountering *some_magic_pointer = foo; you're very much likely facing either a violation of 3.8/5-7 (no matter how magically that pointer was obtained) or of the aliasing rules. This means storing the result of the new expression, too, since you most likely can't reuse the magic pointer once your object is constructed -- due to 3.8/5-7 again.
Reading the bytes of an object (this means using char or unsigned char) is fine however, and you don't even to use reinterpret_cast or anything magic at all. static_cast via cv void* is arguably fine for the job (although I do feel like the Standard could use some better wording there).
This too:
// valid: char -> type
alignas(int) char c[sizeof(int)];
int * i = reinterpret_cast<int*>(c);
That is not correct. The aliasing rules state under which circumstances it is legal/illegal to access an object through an lvalue of a different type. There is an specific rule that says that you can access any object through a pointer of type char or unsigned char, so the first case is correct. That is, A => B does not necessarily mean B => A. You can access an int through a pointer to char, but you cannot access a char through a pointer to int.
For the benefit of Alf:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or non- static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
Regarding the validity of …
alignas(int) char c[sizeof(int)];
int * i = reinterpret_cast<int*>(c);
The reinterpret_cast itself is OK or not, in the sense of producing a useful pointer value, depending on the compiler. And in this example the result isn't used, in particular, the character array isn't accessed. So there is not much more that can be said about the example as-is: it just depends.
But let's consider an extended version that does touch on the aliasing rules:
void foo( char* );
alignas(int) char c[sizeof( int )];
foo( c );
int* p = reinterpret_cast<int*>( c );
cout << *p << endl;
And let's only consider the case where the compiler guarantees a useful pointer value, one that would place the pointee in the same bytes of memory (the reason that this depends on the compiler is that the standard, in §5.2.10/7, only guarantees it for pointer conversions where the types are alignment-compatible, and otherwise leave it as "unspecified" (but then, the whole of §5.2.10 is somewhat inconsistent with §9.2/18).
Now, one interpretation of the standard's §3.10/10, the so called "strict aliasing" clause (but note that the standard does not ever use the term "strict aliasing"),
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or non- static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
is that, as it itself says, concerns the dynamic type of the object residing in the c bytes.
With that interpretation, the read operation on *p is OK if foo has placed an int object there, and otherwise not. So in this case, a char array is accessed via an int* pointer. And nobody is in any doubt that the other way is valid: even though foo may have placed an int object in those bytes, you can freely access that object as a sequence of char values, by the last dash of §3.10/10.
So with this (usual) interpretation, after foo has placed an int there, we can access it as char objects, so at least one char object exists within the memory region named c; and we can access it as int, so at least that one int exists there also; and so David’s assertion in another answer that char objects cannot be accessed as int, is incompatible with this usual interpretation.
David's assertion is also incompatible with the most common use of placement new.
Regarding what other possible interpretations there are, that perhaps could be compatible with David's assertion, well, I can't think of any that make sense.
So in conclusion, as far as the Holy Standard is concerned, merely casting oneself a T* pointer to the array is practically useful or not depending on the compiler, and accessing the pointed to could-be-value is valid or not depending on what's present. In particular, think of a trap representation of int: you would not want that blowing up on you, if the bitpattern happened to be that. So to be safe you have to know what's in there, the bits, and as the call to foo above illustrates the compiler can in general not know that, like, the g++ compiler's strict alignment-based optimizer can in general not know that…

casting via void* instead of using reinterpret_cast [duplicate]

This question already has answers here:
Should I use static_cast or reinterpret_cast when casting a void* to whatever
(9 answers)
Closed 1 year ago.
I'm reading a book and I found that reinterpret_cast should not be used directly, but rather casting to void* in combination with static_cast:
T1 * p1=...
void *pv=p1;
T2 * p2= static_cast<T2*>(pv);
Instead of:
T1 * p1=...
T2 * p2= reinterpret_cast<T2*>(p1);
However, I can't find an explanation why is this better than the direct cast. I would very appreciate if someone can give me an explanation or point me to the answer.
Thanks in advance
p.s. I know what is reinterpret_cast used for, but I never saw that is used in this way
For types for which such cast is permitted (e.g. if T1 is a POD-type and T2 is unsigned char), the approach with static_cast is well-defined by the Standard.
On the other hand, reinterpret_cast is entirely implementation-defined - the only guarantee that you get for it is that you can cast a pointer type to any other pointer type and then back, and you'll get the original value; and also, you can cast a pointer type to an integral type large enough to hold a pointer value (which varies depending on implementation, and needs not exist at all), and then cast it back, and you'll get the original value.
To be more specific, I'll just quote the relevant parts of the Standard, highlighting important parts:
5.2.10[expr.reinterpret.cast]:
The mapping performed by reinterpret_cast is implementation-defined. [Note: it might, or might not, produce a representation different from the original value.] ... A pointer to an object can be explicitly converted to a pointer to an object of different type.) Except that converting an rvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.
So something like this:
struct pod_t { int x; };
pod_t pod;
char* p = reinterpret_cast<char*>(&pod);
memset(p, 0, sizeof pod);
is effectively unspecified.
Explaining why static_cast works is a bit more tricky. Here's the above code rewritten to use static_cast which I believe is guaranteed to always work as intended by the Standard:
struct pod_t { int x; };
pod_t pod;
char* p = static_cast<char*>(static_cast<void*>(&pod));
memset(p, 0, sizeof pod);
Again, let me quote the sections of the Standard that, together, lead me to conclude that the above should be portable:
3.9[basic.types]:
For any object (other than a base-class subobject) of POD type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.
The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T).
3.9.2[basic.compound]:
Objects of cv-qualified (3.9.3) or cv-unqualified type void* (pointer to void), can be used to point to objects of unknown type. A void* shall be able to hold any object pointer. A cv-qualified or cv-unqualified (3.9.3) void* shall have the same representation and alignment requirements as a cv-qualified or cv-unqualified char*.
3.10[basic.lval]:
If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined):
...
a char or unsigned char type.
4.10[conv.ptr]:
An rvalue of type “pointer to cv T,” where T is an object type, can be converted to an rvalue of type “pointer to cv void.” The result of converting a “pointer to cv T” to a “pointer to cv void” points to the start of the storage location where the object of type T resides, as if the object is a most derived object (1.8) of type T (that is, not a base class subobject).
5.2.9[expr.static.cast]:
The inverse of any standard conversion sequence (clause 4), other than the lvalue-to-rvalue (4.1), array-topointer (4.2), function-to-pointer (4.3), and boolean (4.12) conversions, can be performed explicitly using static_cast.
[EDIT] On the other hand, we have this gem:
9.2[class.mem]/17:
A pointer to a POD-struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [Note: There might therefore be unnamed padding within a POD-struct object, but not at its beginning, as necessary to achieve appropriate alignment. ]
which seems to imply that reinterpret_cast between pointers somehow implies "same address". Go figure.
There is not the slightest doubt that the intent is that both forms are well defined, but the wording fails to capture that.
Both forms will work in practice.
reinterpret_cast is more explicit about the intent and should be preferred.
The real reason this is so is because of how C++ defines inheritance, and because of member pointers.
With C, pointer is pretty much just an address, as it should be. In C++ it has to be more complex because of some of its features.
Member pointers are really an offset into a class, so casting them is always a disaster using C style.
If you have multiply inherited two virtual objects that also have some concrete parts, that's also a disaster for C style. This is the case in multiple inheritance that causes all the problems, though, so you should not ever want to use this anyway.
Really hopefully you never use these cases in the first place. Also, if you are casting a lot that's another sign you are messing up in in your design.
The only time I end up casting is with the primitives in areas C++ decides are not the same but where obviously they have to be. For actual objects, any time you want to cast something, start to question your design because you should be 'programming to the interface' most of the time. Of course, you can't change how 3rd party APIs work so you don't always have much choice.