Confusion about [expr.static.cast]/13

Confusion about [expr.static.cast]/13 - c++

I can't understand the quote (specifically, the bold part):
A prvalue of type “pointer to cv1 void” can be converted to a prvalue
of type “pointer to cv2 T”, where T is an object type and cv2 is the
same cv-qualification as, or greater cv-qualification than, cv1. If
the original pointer value represents the address A of a byte in
memory and A does not satisfy the alignment requirement of T, then the
resulting pointer value is unspecified.
int i = 0;
void *vp = &i;
auto *res = static_cast<double*>(vp);
My questions are:
does the address pointed by res (address of int) satisfy the alignment requirement of double?
does the resulting pointer res has an unspecified value?
And when I have something like this: static_cast<double*>(static_cast<void *>(&i))
Does i type is not stricter than the target type double? so that the result of this expression is not unspecified

does the address pointed by res (address of int) satisfy the alignment requirement of double?
That would depend on the implementation. Most likely it doesn't. Typically the alignment requirement of int is smaller than that of double.
For example on the x86-64 System V ABI used e.g. on Linux, int has alignment requirement 4 and double has 8.
You can check that the alignment requirement is satisfied for example with the following static_assert:
static_assert(alignof(int) >= alignof(double));
does the resulting pointer res has an unspecified value?
If the alignment requirement is not fulfilled (i.e. the static_assert fails), yes. Otherwise it will point to the object i.
And when I have something like this: static_cast<double*>(static_cast<void *>(&i))
This is exactly equivalent to what is shown in the snippet.
Note that even if the alignment requirements are satisfied, res cannot be used to access the value of the object it points to, because that would violate the pointer aliasing rules. So the result of the cast is very likely not useful for anything.

Related

How can the alignment requirement be satisfied?

I think that I am misreading the standard quotation, hence I do not fully understand what's the exact intent of the wording.
Firstly, I am already aware of what alignment requirement is, but I can't figure out the exact relation between alignment requirement and casting in general, and what're the points I should care about, regarding alignment requirement, when I perform static_casting or reinterpret_casting. I think now the reader got my first question.
Secondly, there're some words in the standard quotation I spend two days to understand them but I don't. From the paragraph in:
N4885: 7.6.1.9 Static cast [expr.static.cast]
A prvalue of type “pointer to cv1 void” can be converted to a prvalue
of type “pointer to cv2 T”, where T is an object type and cv2 is the
same cv-qualification as, or greater cv-qualification than, cv1. If
the original pointer value represents the address A of a byte in
memory and A does not satisfy the alignment requirement of T, then the
resulting pointer value is unspecified.
Here, they said "if the original pointer value doesn't satisfy the alignment requirement of T, then the resulting pointer value is unspecified". What really does that mean?
What's I can't understand is when does this "original pointer value" satisfies the alignment requirement of T, and when does not, to avoid such unspecified pointer value. I just need someone to explain the bold part from the above quote with simple examples; what I have to know, as a programmer, from that bold part and what I have to avoid. For example:
int i = 12;
double *pd = static_cast<double *>(static_cast<void *>(&i)); // does 'pd' has unspecified address value?.
short *ps = static_cast<short *>(static_cast<void *>(&i)); // does 'ps' has unspecified address value?.
Finally, there's a relatively same sentence I need to understand in:
N4885: 7.6.1.10 Reinterpret cast [expr.reinterpret.cast]
An object pointer can be explicitly converted to an object pointer of
a different type.61 When a prvalue v of object pointer type is
converted to the object pointer type “pointer to cv T”, the result is
static_cast<cv T*>(static_cast<cv void*>(v)). [Note 7: Converting a
pointer of type “pointer to T1” that points to an object of type T1 to
the type “pointer to T2” (where T2 is an object type and the
alignment requirements of T2 are no stricter than those of T1) and
back to its original type yields the original pointer value. — end
note].
What does the standard mean by this sentence "the alignment requirements of T2 are no stricter than those of T1", what does the word "stricter than" mean.
I think if I have this static_assert expression, then maybe the alignment requirements of T2 would not be stricter than those of T1: static_assert(alignof(T1) >= alignof(T2)); Or this assertion is not true for some cases.
int i = 34;
double *pd = reinterpret_cast<double *>(&i); // does 'pd' has unspecified address value?.
short *ps = reinterpret_cast<short *>(&i); // does 'ps' has unspecified address value?.
I am added these example to just clear what my problem lies, not to just answer the questions in the // comments

While common implementations have integer-like pointers (such that reinterpret_cast behaves like memcpy between pointers and integers and arithmetic on the pointers is reflected in the integer values), the standard as usual provides only weak guarantees to support less common architectures where pointers have other formats and/or special registers. As such, it’s impossible to observe alignment of a dynamic pointer value: the unspecified value applies if alignof(expression_type)<alignof(cast_type) unless the pointer refers to an object declared with alignas or to an object whose actual type is more strongly aligned.
This means that double* is a poor type to use for the sort of “temporary pointer storage” for which reinterpret_cast exists; fortunately, most C++ code uses templates (so this doesn’t come up), old C code uses char* (whose alignment is 1), and other code uses void* (which has no alignment), so there’s rarely an actual issue here.

What would happen if T1 is not stricter than T2 in reinterpret-casting?

[expr.reinterpret.cast]/7:
Any object pointer type T1* can be
converted to another object pointer type cv T2*. This is exactly
equivalent to static_cast<cv T2*>(static_cast<cv void*>(T1_var))
(which implies that if T2's alignment requirement is not stricter than
T1's, the value of the pointer does not change and conversion of the
resulting pointer back to its original type yields the original value)
Am I wondering that there's no limitation while casting a pointer from one type to another using reinterpret_cast. The only thing I notice, when reading the given quote, is that the target type shall be stricter than the original type, [otherwise] not mentioned in this quote. For example
int *p = 0;
double *q = reinterpret_cast<double *>(p)
Here alignof(int) is not greater (stricter?) than or equal alignof(double). Although the compiler gives no errors or even warnings, I think in this case, q pointer holds an unspecified value? and dereferencing it is undefined behavior.
So does this mean converting a pointer of type T1* into T2*, the resulting pointer always has an unspecified value as long as static_assert(alignof(T1) >= alignof(T2)) fails?

So does this mean converting a pointer of type T1* into T2*, the
resulting pointer always has an unspecified value as long as
static_assert(alignof(T1) >= alignof(T2)) fails?
Not really. The result is only (but always) unspecified if the actual address of the T1 type doesn't fulfil the alignment requirements of the T2 type. It may do so, in which case the behaviour is well-defined.
The paragraph from the Standard you cite states the equivalence of a reinterpret_cast operation to two static_cast operations via an intermediate void* pointer. And, from the Standard, in the previous section about static_cast (bolding for emphasis mine):
8.5.1.9 Static cast     [expr.static.cast]
13     A prvalue of type “pointer to cv1
void” can be converted to a prvalue of type “pointer to cv2 T”,
where T is an object type and cv2 is the same cv-qualification as,
or greater cv-qualification than, cv1. If the original pointer value
represents the address A of a byte in memory and A does not satisfy
the alignment requirement of T, then the resulting pointer value is
unspecified. Otherwise, if the original pointer value points to an
object a, and there is an object b of type T (ignoring
cv-qualification) that is pointer-interconvertible (6.7.2) with a, the
result is a pointer to b. Otherwise, the pointer value is unchanged by
the conversion.

Can pointers to different types have different binary representations?

I wonder if C++ implementations are allowed to represent pointers to different types differently. For instance, if we had 4-byte sized/aligned int and 8-byte sized/aligned long, would it be possible to represent pointers-to-int/long as object addresses shifted right by 2/3 bits, respectively? This would effectively forbid to convert a pointer-to-long into a pointer-to-int.
I am asking because of [expr.reinterpret.cast/7]:
An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)).
[Note 7: Converting a pointer of type “pointer to T1” that points to an object of type T1 to the type “pointer to T2” (where T2 is an object type and the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. — end note]
The first sentence suggests that we can convert pointers to any two object types. However, the empathized text in the (not normative) Note 7 then says that the alignment plays some role here as well. (That's why I came up with that int-long example above.)

Yep
As a concrete example, there is a C++ implementation where pointers to single-byte elements are larger than pointers to multi-byte elements, because the hardware uses word (not byte) addressing. To emulate byte pointers, C++ uses a hardware pointer plus an extra byte offset.
void* stores that extra offset, but int* does not. Converting int* to char* works (as it must under the standard), but char* to int* loses that offset (which your note implicitly permits).
The Cray T90 supercomputer is an example of such hardware.
I will see if I can find the standards argument why this is valid thing for a compliant C++ compiler to do; I am only aware someone did it, not that it is legal to do it, but that note rather implies it is intended to be legal.
The rules are going to be in the to-from void pointer casting rules. The paragraph you quoted implicitly forwards the meaning of the conversion to there.
7.6.1.9 Static cast [expr.static.cast]
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1.
If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified.
Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b.
Otherwise, the pointer value is unchanged by the conversion.
This demonstrates that converting to more-aligned types generates an unspecified pointer, but converting to equal-or-less aligned types that aren't actually there does not change the pointer value.
Which is permission to make a cast from a pointer to 4 byte aligned data converted to a pointer to 8 byte aligned data result in garbage.
Every object unrelated pointer cast needs to logically round-trip through a void* however.
An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)).
(From the OP)
That covers void* to T*; I have yet to find the T* to void* conversion text to make this a complete language-lawyer level answer.

The answer is yes. Simply because as the standard does not forbid it, an implementation could decide to have different representations for pointers to different types, or even different possible representations for a same pointer.
As most architecture now use flat addressing (meaning that the representation of the pointer is just the address), there are no good reason to do that. But I can still remember the old segment:offset address representation of the 8086 systems, that used to allow 16 bits systems to process 20 bits addresses (1024k). It used a 16 bit segment address (shifted by 4 bits to get a real address), and an offset of 16 bits for far pointers, or only 16 bits (relative to the current segment) for near addresses. In this mode, far pointers had a bunch of possible representations. BTW, far addressing was the default (so what was produced by normal source) in the large and compact mode (ref).

guarantee of reinterpret_cast output for serialization purpose

int main()
{
char buffer[5] = { 0 };
buffer[0] = 23;
std::string s(&buffer[0], 4);
std::uint32_t nb = *reinterpret_cast<const std::uint32_t*>(s.data());
return 0;
}
For this program, is reinterpret_cast's output implementation dependent? Or will any compiler conforming to the c++ standard always produce the same output?

For your example code, if you're looking for something that "any compiler conforming to the c++ standard always produce the same output", the answer is that there's no such guarantee.
A couple easy examples: alignment issues (as mentioned in several comments) and endianness differences.
C++11 5.2.10/7 "Reinterpret cast" says:
An object pointer can be explicitly converted to an object pointer of
a different type. When a prvalue v of type “pointer to T1” is
converted to the type “pointer to cv T2”, the result is
static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are standard-layout
types (3.9) and the alignment requirements of T2 are no stricter than
those of T1, or if either type is void. Converting a prvalue of type
“pointer to T1” to the type “pointer to T2” (where T1 and T2 are
object types and where the alignment requirements of T2 are no
stricter than those of T1) and back to its original type yields the
original pointer value. The result of any other such pointer
conversion is unspecified.
Since uint32_t will generally have a stricter alignment requirement than char[], the standard doesn't make any promises about the behavior (since the above only talks about the situation where the alignment requirements are met). So strictly speaking the behavior is undefined.
Now, lets assume that you're interested only in platforms where the alignment requirements are met (ie., uint32_t can be aligned on any address, same as char). Then your expression involving the reinterpret cast is equivalent to (note that you'd have to cast away the const from the const char* returned from std::string::data() as well):
std::uint32_t nb = *(static_cast<std::uint32_t*>(static_cast<void*>(const_cast<char*>(s.data()))));
The standard says this about using static_cast with object pointers (other than conversion between pointers in a class heirarchy) in 5.2.9/13 "Static cast":
A prvalue of type “pointer to cv1 void” can be converted to a prvalue
of type “pointer to cv2 T,” where T is an object type and cv2 is the
same cv-qualification as, or greater cv-qualification than, cv1. The
null pointer value is converted to the null pointer value of the
destination type. A value of type pointer to object converted to
“pointer to cv void” and back, possibly with different
cv-qualification, shall have its original value.
So, as far as the standard is concerned, all that you can do with the resulting pointer is cast it back to get the original value. Anything else would be undefined behavior (that an implementation might give a better guarantee on).
3.10/10 "Lvalues and rvalues" allows an object to be accessed through char or unsigned char types as well.
However, to reiterate: the standard does not guarantee that "any compiler conforming to the c++ standard always produce the same output" for the example you posted.

You're casting to std::uint32_t a buffer that is not necessarily properly aligned for such a value.
That's likely to blow up and/or be hugely inefficient.
The unsigned integer type means that any bitpattern for the value representation bits is OK, and on the PC platform for built-in type there are no bits other than the value representation bits; in particular no trap bits or trapping total bitpatterns.
Thus, you can do a memcpy and you'll be fine, technically – provided there are enough bytes, that s.length() >= sizeof(std::uint32_t).
However, such a conversion, if it occurred in ordinary code, would be a strong code-smell, an indication of something fundamentally wrong in the design.
Addendum, regarding “Or a compiler respectfull to the c++ standard will always produce the same output”.
I somehow didn’t see that when I answered. But the short answer is that if the conversion is performed in a way that works, such as using memcpy, then it depends on the endianness, a.k.a. byte order, in practice whether the most significant or least significant part of an integer is placed at lowest address.
In practice you can use network-oriented functions that convert to from network byte order. Just assume network byte order for the serialized data. Check out ntohl et al (these are not part of the C++ standard library, but commonly available).

Reshaping a 1-d array to a multidimensional array

Taking into consideration the entire C++11 standard, is it possible for any conforming implementation to succeed the first assertion below but fail the latter?
#include <cassert>
int main(int, char**)
{
const int I = 5, J = 4, K = 3;
const int N = I * J * K;
int arr1d[N] = {0};
int (&arr3d)[I][J][K] = reinterpret_cast<int (&)[I][J][K]>(arr1d);
assert(static_cast<void*>(arr1d) ==
static_cast<void*>(arr3d)); // is this necessary?
arr3d[3][2][1] = 1;
assert(arr1d[3 * (J * K) + 2 * K + 1] == 1); // UB?
}
If not, is this technically UB or not, and does that answer change if the first assertion is removed (is reinterpret_cast guaranteed to preserve addresses here?)? Also, what if the reshaping is done in the opposite direction (3d to 1d) or from a 6x35 array to a 10x21 array?
EDIT: If the answer is that this is UB because of the reinterpret_cast, is there some other strictly compliant way of reshaping (e.g., via static_cast to/from an intermediate void *)?

Update 2021-03-20:
This same question was asked on Reddit recently and it was pointed out that my original answer is flawed because it does not take into account this aliasing rule:
If a program attempts to access the stored value of an object through a glvalue whose type is not similar to one of the following types the behavior is undefined:
the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object, or
a char, unsigned char, or std::byte type.
Under the rules for similarity, these two array types are not similar for any of the above cases and therefore it is technically undefined behaviour to access the 1D array through the 3D array. (This is definitely one of those situations where, in practice, it will almost certainly work with most compilers/targets)
Note that the references in the original answer refer to an older C++11 draft standard
Original answer:
reinterpret_cast of references
The standard states that an lvalue of type T1 can be reinterpret_cast to a reference to T2 if a pointer to T1 can be reinterpret_cast to a pointer to T2 (§5.2.10/11):
An lvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast.
So we need to determine if a int(*)[N] can be converted to an int(*)[I][J][K].
reinterpret_cast of pointers
A pointer to T1 can be reinterpret_cast to a pointer to T2 if both T1 and T2 are standard-layout types and T2 has no stricter alignment requirements than T1 (§5.2.10/7):
When a prvalue v of type “pointer to T1” is converted to the type “pointer to cv T2”, the result is static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment requirements of T2 are no stricter than those of T1, or if either type is void.
Are int[N] and int[I][J][K] standard-layout types?
int is a scalar type and arrays of scalar types are considered to be standard-layout types (§3.9/9).
Scalar types, standard-layout class types (Clause 9), arrays of such types and cv-qualified versions of these types (3.9.3) are collectively called standard-layout types.
Does int[I][J][K] have no stricter alignment requirements than int[N].
The result of the alignof operator gives the alignment requirement of a complete object type (§3.11/2).
The result of the alignof operator reflects the alignment requirement of the type in the complete-object case.
Since the two arrays here are not subobjects of any other object, they are complete objects. Applying alignof to an array gives the alignment requirement of the element type (§5.3.6/3):
When alignof is applied to an array type, the result shall be the alignment of the element type.
So both array types have the same alignment requirement.
That makes the reinterpret_cast valid and equivalent to:
int (&arr3d)[I][J][K] = *reinterpret_cast<int (*)[I][J][K]>(&arr1d);
where * and & are the built-in operators, which is then equivalent to:
int (&arr3d)[I][J][K] = *static_cast<int (*)[I][J][K]>(static_cast<void*>(&arr1d));
static_cast through void*
The static_cast to void* is allowed by the standard conversions (§4.10/2):
A prvalue of type “pointer to cv T,” where T is an object type, can be converted to a prvalue of type “pointer to cv void”. The result of converting a “pointer to cv T” to a “pointer to cv void” points to the start of the storage location where the object of type T resides, as if the object is a most derived object (1.8) of type T (that is, not a base class subobject).
The static_cast to int(*)[I][J][K] is then allowed (§5.2.9/13):
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T,” where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1.
So the cast is fine! But are we okay to access objects through the new array reference?
Accessing array elements
Performing array subscripting on an array like arr3d[E2] is equivalent to *((E1)+(E2)) (§5.2.1/1). Let's consider the following array subscripting:
arr3d[3][2][1]
Firstly, arr3d[3] is equivalent to *((arr3d)+(3)). The lvalue arr3d undergoes array-to-pointer conversion to give a int(*)[2][1]. There is no requirement that the underlying array must be of the correct type to do this conversion. The pointers value is then accessed (which is fine by §3.10) and then the value 3 is added to it. This pointer arithmetic is also fine (§5.7/5):
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
This this pointer is dereferenced to give an int[2][1]. This undergoes the same process for the next two subscripts, resulting in the final int lvalue at the appropriate array index. It is an lvalue due to the result of * (§5.3.1/1):
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.
It is then perfectly fine to access the actual int object through this lvalue because the lvalue is of type int too (§3.10/10):
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object
[...]
So unless I've missed something. I'd say this program is well-defined.

I am under the impression that it will work. You allocate the same piece of contiguous memory. I know the C-standard guarantees it will be contiguous at least. I don't know what is said in the C++11 standard.
However the first assert should always be true. The address of the first element of the array will always be the same. All memory address will be the same since the same piece of memory is allocated.
I would therefore also say that the second assert will always hold true. At least as long as the ordering of the elements are always in row major order. This is also guaranteed by the C-standard and I would be surprised if the C++11 standard says anything differently.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js