How can the alignment requirement be satisfied? - c++

I think that I am misreading the standard quotation, hence I do not fully understand what's the exact intent of the wording.
Firstly, I am already aware of what alignment requirement is, but I can't figure out the exact relation between alignment requirement and casting in general, and what're the points I should care about, regarding alignment requirement, when I perform static_casting or reinterpret_casting. I think now the reader got my first question.
Secondly, there're some words in the standard quotation I spend two days to understand them but I don't. From the paragraph in:
N4885: 7.6.1.9 Static cast [expr.static.cast]
A prvalue of type “pointer to cv1 void” can be converted to a prvalue
of type “pointer to cv2 T”, where T is an object type and cv2 is the
same cv-qualification as, or greater cv-qualification than, cv1. If
the original pointer value represents the address A of a byte in
memory and A does not satisfy the alignment requirement of T, then the
resulting pointer value is unspecified.
Here, they said "if the original pointer value doesn't satisfy the alignment requirement of T, then the resulting pointer value is unspecified". What really does that mean?
What's I can't understand is when does this "original pointer value" satisfies the alignment requirement of T, and when does not, to avoid such unspecified pointer value. I just need someone to explain the bold part from the above quote with simple examples; what I have to know, as a programmer, from that bold part and what I have to avoid. For example:
int i = 12;
double *pd = static_cast<double *>(static_cast<void *>(&i)); // does 'pd' has unspecified address value?.
short *ps = static_cast<short *>(static_cast<void *>(&i)); // does 'ps' has unspecified address value?.
Finally, there's a relatively same sentence I need to understand in:
N4885: 7.6.1.10 Reinterpret cast [expr.reinterpret.cast]
An object pointer can be explicitly converted to an object pointer of
a different type.61 When a prvalue v of object pointer type is
converted to the object pointer type “pointer to cv T”, the result is
static_cast<cv T*>(static_cast<cv void*>(v)). [Note 7: Converting a
pointer of type “pointer to T1” that points to an object of type T1 to
the type “pointer to T2” (where T2 is an object type and the
alignment requirements of T2 are no stricter than those of T1) and
back to its original type yields the original pointer value. — end
note].
What does the standard mean by this sentence "the alignment requirements of T2 are no stricter than those of T1", what does the word "stricter than" mean.
I think if I have this static_assert expression, then maybe the alignment requirements of T2 would not be stricter than those of T1: static_assert(alignof(T1) >= alignof(T2)); Or this assertion is not true for some cases.
int i = 34;
double *pd = reinterpret_cast<double *>(&i); // does 'pd' has unspecified address value?.
short *ps = reinterpret_cast<short *>(&i); // does 'ps' has unspecified address value?.
I am added these example to just clear what my problem lies, not to just answer the questions in the // comments

While common implementations have integer-like pointers (such that reinterpret_cast behaves like memcpy between pointers and integers and arithmetic on the pointers is reflected in the integer values), the standard as usual provides only weak guarantees to support less common architectures where pointers have other formats and/or special registers. As such, it’s impossible to observe alignment of a dynamic pointer value: the unspecified value applies if alignof(expression_type)<alignof(cast_type) unless the pointer refers to an object declared with alignas or to an object whose actual type is more strongly aligned.
This means that double* is a poor type to use for the sort of “temporary pointer storage” for which reinterpret_cast exists; fortunately, most C++ code uses templates (so this doesn’t come up), old C code uses char* (whose alignment is 1), and other code uses void* (which has no alignment), so there’s rarely an actual issue here.

Related

Confusion about [expr.static.cast]/13

I can't understand the quote (specifically, the bold part):
A prvalue of type “pointer to cv1 void” can be converted to a prvalue
of type “pointer to cv2 T”, where T is an object type and cv2 is the
same cv-qualification as, or greater cv-qualification than, cv1. If
the original pointer value represents the address A of a byte in
memory and A does not satisfy the alignment requirement of T, then the
resulting pointer value is unspecified.
int i = 0;
void *vp = &i;
auto *res = static_cast<double*>(vp);
My questions are:
does the address pointed by res (address of int) satisfy the alignment requirement of double?
does the resulting pointer res has an unspecified value?
And when I have something like this: static_cast<double*>(static_cast<void *>(&i))
Does i type is not stricter than the target type double? so that the result of this expression is not unspecified
does the address pointed by res (address of int) satisfy the alignment requirement of double?
That would depend on the implementation. Most likely it doesn't. Typically the alignment requirement of int is smaller than that of double.
For example on the x86-64 System V ABI used e.g. on Linux, int has alignment requirement 4 and double has 8.
You can check that the alignment requirement is satisfied for example with the following static_assert:
static_assert(alignof(int) >= alignof(double));
does the resulting pointer res has an unspecified value?
If the alignment requirement is not fulfilled (i.e. the static_assert fails), yes. Otherwise it will point to the object i.
And when I have something like this: static_cast<double*>(static_cast<void *>(&i))
This is exactly equivalent to what is shown in the snippet.
Note that even if the alignment requirements are satisfied, res cannot be used to access the value of the object it points to, because that would violate the pointer aliasing rules. So the result of the cast is very likely not useful for anything.

Can pointers to different types have different binary representations?

I wonder if C++ implementations are allowed to represent pointers to different types differently. For instance, if we had 4-byte sized/aligned int and 8-byte sized/aligned long, would it be possible to represent pointers-to-int/long as object addresses shifted right by 2/3 bits, respectively? This would effectively forbid to convert a pointer-to-long into a pointer-to-int.
I am asking because of [expr.reinterpret.cast/7]:
An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_­cast<cv T*>(static_­cast<cv void*>(v)).
[Note 7: Converting a pointer of type “pointer to T1” that points to an object of type T1 to the type “pointer to T2” (where T2 is an object type and the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. — end note]
The first sentence suggests that we can convert pointers to any two object types. However, the empathized text in the (not normative) Note 7 then says that the alignment plays some role here as well. (That's why I came up with that int-long example above.)
Yep
As a concrete example, there is a C++ implementation where pointers to single-byte elements are larger than pointers to multi-byte elements, because the hardware uses word (not byte) addressing. To emulate byte pointers, C++ uses a hardware pointer plus an extra byte offset.
void* stores that extra offset, but int* does not. Converting int* to char* works (as it must under the standard), but char* to int* loses that offset (which your note implicitly permits).
The Cray T90 supercomputer is an example of such hardware.
I will see if I can find the standards argument why this is valid thing for a compliant C++ compiler to do; I am only aware someone did it, not that it is legal to do it, but that note rather implies it is intended to be legal.
The rules are going to be in the to-from void pointer casting rules. The paragraph you quoted implicitly forwards the meaning of the conversion to there.
7.6.1.9 Static cast [expr.static.cast]
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1.
If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified.
Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b.
Otherwise, the pointer value is unchanged by the conversion.
This demonstrates that converting to more-aligned types generates an unspecified pointer, but converting to equal-or-less aligned types that aren't actually there does not change the pointer value.
Which is permission to make a cast from a pointer to 4 byte aligned data converted to a pointer to 8 byte aligned data result in garbage.
Every object unrelated pointer cast needs to logically round-trip through a void* however.
An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_­cast<cv T*>(static_­cast<cv void*>(v)).
(From the OP)
That covers void* to T*; I have yet to find the T* to void* conversion text to make this a complete language-lawyer level answer.
The answer is yes. Simply because as the standard does not forbid it, an implementation could decide to have different representations for pointers to different types, or even different possible representations for a same pointer.
As most architecture now use flat addressing (meaning that the representation of the pointer is just the address), there are no good reason to do that. But I can still remember the old segment:offset address representation of the 8086 systems, that used to allow 16 bits systems to process 20 bits addresses (1024k). It used a 16 bit segment address (shifted by 4 bits to get a real address), and an offset of 16 bits for far pointers, or only 16 bits (relative to the current segment) for near addresses. In this mode, far pointers had a bunch of possible representations. BTW, far addressing was the default (so what was produced by normal source) in the large and compact mode (ref).

Is it UB to access a member by casting an object pointer to `char *`, then doing `*(member_type*)(pointer + offset)`?

Here's an example:
#include <cstddef>
#include <iostream>
struct A
{
char padding[7];
int x;
};
constexpr int offset = offsetof(A, x);
int main()
{
A a;
a.x = 42;
char *ptr = (char *)&a;
std::cout << *(int *)(ptr + offset) << '\n'; // Well-defined or not?
}
I always assumed that it's well-defined (otherwise what would be the point of offsetof), but wasn't sure.
Recently I was told that it's in fact UB, so I want to figure it out once and for all.
Does the example above cause UB or not? If you modify the class to not be standard-layout, does it affect the result?
And if it's UB, are there any workarounds for it (e.g. applying std::launder)?
This entire topic seems to be moot and underspecified.
Here's some information I was able to find:
Is adding to a “char *” pointer UB, when it doesn't actually point to a char array? - In 2011, CWG confirmed that we're allowed to examine the representation of a standard-layout object through an unsigned char pointer.
Unclear if a char pointer can be used insteaed, common sense says it can.
Unclear if staring from C++17 std::launder needs to be applied to the result of the (unsigned char *) cast. Given that it would be a breaking change, it's probably unnecessarly, at least in practice.
Unclear why C++17 changed offsetof to conditionally-support non-standard-layout types (used to be UB). It seems to imply that if an implementation supports that, then it also lets you examine the representation of non-standard-layout objects through unsigned char *.
Do we need to use std::launder when doing pointer arithmetic within a standard-layout object (e.g., with offsetof)? - A question similar to this one. No definitive answer was given.
Here I will refer to C++20 (draft) wording, because one relevant editorial issue was fixed between C++17 and C++20 and also it is possible to refer to specific sentences in HTML version of the C++20 draft, but otherwise there is nothing new in comparison to C++17.
At first, definitions of pointer values [basic.compound]/3:
Every value of pointer type is one of the following:
— a pointer to an object or function (the pointer is said to point to the object or function), or
— a pointer past the end of an object ([expr.add]), or
— the null pointer value for that type, or
— an invalid pointer value.
Now, lets see what happens in the (char *)&a expression.
Let me not prove that a is an lvalue denoting the object of type A, and I will say «the object a» to refer to this object.
The meaning of the &a subexpression is covered in [expr.unary.op]/(3.2):
if the operand is an lvalue of type T, the resulting expression is a prvalue of type “pointer to T” whose result is a pointer to the designated object
So, &a is a prvalue of type A* with the value «pointer to (the object) a».
Now, the cast in (char *)&a is equivalent to reinterpret_cast<char*>(&a), which is defined as static_cast<char*>(static_cast<void*>(&a)) ([expr.reinterpret.cast]/7).
Cast to void* doesn't change the pointer value ([conv.ptr]/2):
A prvalue of type “pointer to cv T”, where T is an object type, can be converted to a prvalue of type “pointer to cv void”. The pointer value ([basic.compound]) is unchanged by this conversion.
i.e. it is still «pointer to (the object) a».
[expr.static.cast]/13 covers the outer static_cast<char*>(...):
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1.
If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified.
Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b.
Otherwise, the pointer value is unchanged by the conversion.
There is no object of type char which is pointer-interconvertible with the object a ([basic.compound]/4):
Two objects a and b are pointer-interconvertible if:
— they are the same object, or
— one is a union object and the other is a non-static data member of that object ([class.union]), or
— one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no non-static data members, any base class subobject of that object ([class.mem]), or
— there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.
which means that the static_cast<char*>(...) doesn't change the pointer value and it is the same as in its operand, namely: «pointer to a».
So, (char *)&a is a prvalue of type char* whose value is «pointer to a». This value is stored into char* ptr variable. Then, when you try to do pointer arithmetic with such a value, namely ptr + offset, you step into [expr.add]/6:
For addition or subtraction, if the expressions P or Q have type “pointer to cv T”, where T and the array element type are not similar, the behavior is undefined.
For the purposes of pointer arithmetic, the object a is considered to be an element of an array A[1] ([basic.compound]/3), so the array element type is A, the type of the pointer expression P is «pointer to char», char and A are not similar types (see [conv.qual]/2), so the behavior is undefined.
This question, and the other one about launder, both seem to me to boil down to interpretation of the last sentence of C++17 [expr.static.cast]/13, which covers what happens for static_cast<T *> applied to an operand of pointer to unrelated type which is correctly aligned:
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T ”,
[...]
Otherwise, the pointer value is unchanged by the conversion.
Some posters appear to take this to mean that the result of the cast cannot point to an object of type T, and consequently that reinterpret_cast with pointers or references can only be used on pointer-interconvertible types.
But I don't see that as justified, and (this is a reductio ad absurdum argument) that position would also imply:
The resolution to CWG1314 is overturned.
Inspecting any byte of a standard-layout object is not possible (since casting to unsigned char * or whatever character type supposedly cannot be used to access that byte).
The strict aliasing rule would be redundant since the only way to actually achieve such aliasing is to use such casts.
There would be no normative text to justify the note "[Note: Converting a prvalue of type “pointer to T1 ” to the type “pointer to T2 ” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. —end note ]"
offsetof is useless (and the C++17 changes to it were therefore redundant)
It seems like a more sensible interpretation to me that this sentence means the result of the cast points to the same byte in memory as the operand. (As opposed to pointing to some other byte, as can happen for some pointer casts not covered by this sentence). Saying "the value is unchanged" does not mean "the type is unchanged", e.g. we describe conversion from int to long as preserving the value.
Also, I guess this may be controversial to some but I am taking as axiomatic that if a pointer's value is the address of an object, then the pointer points to that object, unless the Standard specifically excludes the case.
This is consistent with the text of [basic.compound]/3 which says the converse, i.e. that if a pointer points to an object, then its value is the address of the object.
There doesn't seem to be any other explicit statement defining when a pointer can or cannot be said to point to an object, but basic.compound/3 says that all pointers must be one of four cases (points to an object, points past the end, null, invalid).
Examples of excluded cases include:
The use case of std::launder specifically addresses a situation where there was such language ruling out the use of the un-laundered pointer.
A past-the-end pointer does not point to an object. (basic.compound/3)

guarantee of reinterpret_cast output for serialization purpose

int main()
{
char buffer[5] = { 0 };
buffer[0] = 23;
std::string s(&buffer[0], 4);
std::uint32_t nb = *reinterpret_cast<const std::uint32_t*>(s.data());
return 0;
}
For this program, is reinterpret_cast's output implementation dependent? Or will any compiler conforming to the c++ standard always produce the same output?
For your example code, if you're looking for something that "any compiler conforming to the c++ standard always produce the same output", the answer is that there's no such guarantee.
A couple easy examples: alignment issues (as mentioned in several comments) and endianness differences.
C++11 5.2.10/7 "Reinterpret cast" says:
An object pointer can be explicitly converted to an object pointer of
a different type. When a prvalue v of type “pointer to T1” is
converted to the type “pointer to cv T2”, the result is
static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are standard-layout
types (3.9) and the alignment requirements of T2 are no stricter than
those of T1, or if either type is void. Converting a prvalue of type
“pointer to T1” to the type “pointer to T2” (where T1 and T2 are
object types and where the alignment requirements of T2 are no
stricter than those of T1) and back to its original type yields the
original pointer value. The result of any other such pointer
conversion is unspecified.
Since uint32_t will generally have a stricter alignment requirement than char[], the standard doesn't make any promises about the behavior (since the above only talks about the situation where the alignment requirements are met). So strictly speaking the behavior is undefined.
Now, lets assume that you're interested only in platforms where the alignment requirements are met (ie., uint32_t can be aligned on any address, same as char). Then your expression involving the reinterpret cast is equivalent to (note that you'd have to cast away the const from the const char* returned from std::string::data() as well):
std::uint32_t nb = *(static_cast<std::uint32_t*>(static_cast<void*>(const_cast<char*>(s.data()))));
The standard says this about using static_cast with object pointers (other than conversion between pointers in a class heirarchy) in 5.2.9/13 "Static cast":
A prvalue of type “pointer to cv1 void” can be converted to a prvalue
of type “pointer to cv2 T,” where T is an object type and cv2 is the
same cv-qualification as, or greater cv-qualification than, cv1. The
null pointer value is converted to the null pointer value of the
destination type. A value of type pointer to object converted to
“pointer to cv void” and back, possibly with different
cv-qualification, shall have its original value.
So, as far as the standard is concerned, all that you can do with the resulting pointer is cast it back to get the original value. Anything else would be undefined behavior (that an implementation might give a better guarantee on).
3.10/10 "Lvalues and rvalues" allows an object to be accessed through char or unsigned char types as well.
However, to reiterate: the standard does not guarantee that "any compiler conforming to the c++ standard always produce the same output" for the example you posted.
You're casting to std::uint32_t a buffer that is not necessarily properly aligned for such a value.
That's likely to blow up and/or be hugely inefficient.
The unsigned integer type means that any bitpattern for the value representation bits is OK, and on the PC platform for built-in type there are no bits other than the value representation bits; in particular no trap bits or trapping total bitpatterns.
Thus, you can do a memcpy and you'll be fine, technically – provided there are enough bytes, that s.length() >= sizeof(std::uint32_t).
However, such a conversion, if it occurred in ordinary code, would be a strong code-smell, an indication of something fundamentally wrong in the design.
Addendum, regarding “Or a compiler respectfull to the c++ standard will always produce the same output”.
I somehow didn’t see that when I answered. But the short answer is that if the conversion is performed in a way that works, such as using memcpy, then it depends on the endianness, a.k.a. byte order, in practice whether the most significant or least significant part of an integer is placed at lowest address.
In practice you can use network-oriented functions that convert to from network byte order. Just assume network byte order for the serialized data. Check out ntohl et al (these are not part of the C++ standard library, but commonly available).

casting via void* instead of using reinterpret_cast [duplicate]

This question already has answers here:
Should I use static_cast or reinterpret_cast when casting a void* to whatever
(9 answers)
Closed 1 year ago.
I'm reading a book and I found that reinterpret_cast should not be used directly, but rather casting to void* in combination with static_cast:
T1 * p1=...
void *pv=p1;
T2 * p2= static_cast<T2*>(pv);
Instead of:
T1 * p1=...
T2 * p2= reinterpret_cast<T2*>(p1);
However, I can't find an explanation why is this better than the direct cast. I would very appreciate if someone can give me an explanation or point me to the answer.
Thanks in advance
p.s. I know what is reinterpret_cast used for, but I never saw that is used in this way
For types for which such cast is permitted (e.g. if T1 is a POD-type and T2 is unsigned char), the approach with static_cast is well-defined by the Standard.
On the other hand, reinterpret_cast is entirely implementation-defined - the only guarantee that you get for it is that you can cast a pointer type to any other pointer type and then back, and you'll get the original value; and also, you can cast a pointer type to an integral type large enough to hold a pointer value (which varies depending on implementation, and needs not exist at all), and then cast it back, and you'll get the original value.
To be more specific, I'll just quote the relevant parts of the Standard, highlighting important parts:
5.2.10[expr.reinterpret.cast]:
The mapping performed by reinterpret_cast is implementation-defined. [Note: it might, or might not, produce a representation different from the original value.] ... A pointer to an object can be explicitly converted to a pointer to an object of different type.) Except that converting an rvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.
So something like this:
struct pod_t { int x; };
pod_t pod;
char* p = reinterpret_cast<char*>(&pod);
memset(p, 0, sizeof pod);
is effectively unspecified.
Explaining why static_cast works is a bit more tricky. Here's the above code rewritten to use static_cast which I believe is guaranteed to always work as intended by the Standard:
struct pod_t { int x; };
pod_t pod;
char* p = static_cast<char*>(static_cast<void*>(&pod));
memset(p, 0, sizeof pod);
Again, let me quote the sections of the Standard that, together, lead me to conclude that the above should be portable:
3.9[basic.types]:
For any object (other than a base-class subobject) of POD type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.
The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T).
3.9.2[basic.compound]:
Objects of cv-qualified (3.9.3) or cv-unqualified type void* (pointer to void), can be used to point to objects of unknown type. A void* shall be able to hold any object pointer. A cv-qualified or cv-unqualified (3.9.3) void* shall have the same representation and alignment requirements as a cv-qualified or cv-unqualified char*.
3.10[basic.lval]:
If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined):
...
a char or unsigned char type.
4.10[conv.ptr]:
An rvalue of type “pointer to cv T,” where T is an object type, can be converted to an rvalue of type “pointer to cv void.” The result of converting a “pointer to cv T” to a “pointer to cv void” points to the start of the storage location where the object of type T resides, as if the object is a most derived object (1.8) of type T (that is, not a base class subobject).
5.2.9[expr.static.cast]:
The inverse of any standard conversion sequence (clause 4), other than the lvalue-to-rvalue (4.1), array-topointer (4.2), function-to-pointer (4.3), and boolean (4.12) conversions, can be performed explicitly using static_cast.
[EDIT] On the other hand, we have this gem:
9.2[class.mem]/17:
A pointer to a POD-struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [Note: There might therefore be unnamed padding within a POD-struct object, but not at its beginning, as necessary to achieve appropriate alignment. ]
which seems to imply that reinterpret_cast between pointers somehow implies "same address". Go figure.
There is not the slightest doubt that the intent is that both forms are well defined, but the wording fails to capture that.
Both forms will work in practice.
reinterpret_cast is more explicit about the intent and should be preferred.
The real reason this is so is because of how C++ defines inheritance, and because of member pointers.
With C, pointer is pretty much just an address, as it should be. In C++ it has to be more complex because of some of its features.
Member pointers are really an offset into a class, so casting them is always a disaster using C style.
If you have multiply inherited two virtual objects that also have some concrete parts, that's also a disaster for C style. This is the case in multiple inheritance that causes all the problems, though, so you should not ever want to use this anyway.
Really hopefully you never use these cases in the first place. Also, if you are casting a lot that's another sign you are messing up in in your design.
The only time I end up casting is with the primitives in areas C++ decides are not the same but where obviously they have to be. For actual objects, any time you want to cast something, start to question your design because you should be 'programming to the interface' most of the time. Of course, you can't change how 3rd party APIs work so you don't always have much choice.