UB When Dereferencing Array of Unions

UB When Dereferencing Array of Unions - c++

Which of these are undefined behaviour:
template <class T> struct Struct { T t; };
template <class T> union Union { T t; };
template <class T> void function() {
Struct aS[10];
Union aU[10];
// do something with aS[9].t and aU[9].t including initialization
T *aSP = reinterpret_cast<T *>(aS);
T *aUP = reinterpret_cast<T *>(aU);
// so here is this undefined behaviour?
T valueS = aSP[9];
// use valueS in whatever way
// so here is this undefined behaviour?
T valueU = aUP[9];
// use valueU in whatever way
// now is accessing aS[9].t or aU[9].t now UB?
}
So yeah, which of the last 3 operations is UB?
(My reasoning: I don't know about the struct, if there is any requirement for its size to be the same as its single element, but AFAIK the union has to be the same size as the element. Alignment requirements I don't know for the union, but I am guessing it is the same. For the struct I have no idea. In the case of the union I would guess that it is not UB, but as I said, I am really really not sure. For the struct I actually have no idea)

tl;dr: the last two statements in your code above will always invoke undefined behavior, simply casting a pointer to a union to a pointer to one of its member types is generally fine because it doesn't really do anything (it's unspecified at worst, but never undefined behavior; note: we're talking about just the cast itself, using the result of the cast to access an object is a whole different story).
Depending on what T ends up being, Struct<T> may potentially be a standard-layout struct [class.prop]/3 in which case
T *aSP = reinterpret_cast<T *>(aS);
would be well-defined because a Struct<T> would be pointer-interconvertible with its first member (which is of type T) [basic.compound]/4.3. Above reinterpret_cast is equivalent to [expr.reinterpret.cast]/7
T *aSP = static_cast<T *>(static_cast<void *>(aS));
which will invoke the array-to-pointer conversion [conv.array], resulting in a Struct<T>* pointing to the first element of aS. This pointer is then converted to void* (via [expr.static.cast]/4 and [conv.ptr]/2), which is then converted to T*, which would be legal via [expr.static.cast]/13:
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.
Similarly,
T *aUP = reinterpret_cast<T *>(aU);
would be well-defined in C++17 if Union<T> is a standard-layout union and looks to be well-defined in general with the coming version of C++ based on the current standard draft, where a union and one of its members are always pointer-interconvertible [basic.compound]/4.2
All of the above is irrelevant, however, because
T valueS = aSP[9];
and
T valueU = aUP[9];
will invoke undefined behavior no matter what. aSP[9] and aUP[9] are (by definition) the same as *(aSP + 9) and *(aUP + 9) respectively [expr.sub]/1. The pointer arithmetic in these expressions is subject to [expr.add]/4
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
Otherwise, if P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) element x[i−j] if 0≤i−j≤n.
Otherwise, the behavior is undefined.
aSP and aUP do not point to an element of an array. Even if aSP and aUP would be pointer-interconvertible with T, you'd only ever be allowed to access element 0 and compute the address of (but not access) element 1 of the hypothetical single-element array…

So if we look at the doc of reinterpret_cast (here)
5) Any object pointer type T1* can be converted to another object
pointer type cv T2*. This is exactly equivalent to static_cast(static_cast(expression)) (which implies that if T2's
alignment requirement is not stricter than T1's, the value of the
pointer does not change and conversion of the resulting pointer back
to its original type yields the original value). In any case, the
resulting pointer may only be dereferenced safely if allowed by the
type aliasing rules (see below)
Now What say the aliasing rules ?
Whenever an attempt is made to read or modify the stored value of an
object of type DynamicType through a glvalue of type AliasedType, the
behavior is undefined unless one of the following is true:
AliasedType and DynamicType are similar.
AliasedType is the (possibly cv-qualified) signed or unsigned variant of DynamicType.
AliasedType is std::byte, (since C++17)char, or unsigned char: this permits examination of the object representation of any object as
an array of bytes.
So it's not 2 nor 3. May be 1?
Similar:
Informally, two types are similar if, ignoring top-level
cv-qualification:
they are the same type; or
they are both pointers, and the pointed-to types are similar; or
they are both pointers to member of the same class, and the types of the pointed-to members are similar; or
they are both arrays of the same size or both arrays of unknown bound, and the array element types are similar.
And, from C++17 draft:
Two objects a and b are pointer-interconvertible if:
they are the same object, or
one is a union object and the other is a non-static data member of that object ([class.union]), or
one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no
non-static data members, any base class subobject of that object
([class.mem]), or
there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.
If two objects are pointer-interconvertible, then they have the same
address, and it is possible to obtain a pointer to one from a pointer
to the other via a reinterpret_cast. [ Note: An array object and its
first element are not pointer-interconvertible, even though they have
the same address. — end note]
So, for me :
T *aSP = reinterpret_cast<T *>(aS); // Is OK
T *aUP = reinterpret_cast<T *>(aU); // Is OK.

I found c++ - Is sizeof(T) == sizeof(int). This specifies that structs do not have to have the same size as their elements (sigh). As for unions, the same would probably apply (after reading the answers, I am led to believe so). This is alone necessary to make this situation UB. However, if sizeof(Struct) == sizeof(T), and "It's well-established that" in https://stackoverflow.com/a/21515546, a pointer to aSP[9] would be the same location as that of aS[9] (at least I think so), and reinterpret_cast'ing that is guarantied by the standard (according to the quote in https://stackoverflow.com/a/21509729).
EDIT: This is actually wrong. The correct answer is here.

Related

process array in chunks using struct then cast as flat array - how to avoid UB (strict aliasing)?

An external API expects a pointer to an array of values (int as simple example here) plus a size.
It is logically clearer to deal with the elements in groups of 4.
So process elements via a "group of 4" struct and then pass the array of those structs to the external API using a pointer cast. See code below.
Spider sense says: "strict aliasing violation" in the reinterpret_cast => possible UB?
Are the static_asserts below enough to ensure:
a) this works in practice
b) this is actually standards compliant and not UB?
Otherwise, what do I need to do, to make it "not UB". A union? How exactly please?
or, is there overall a different, better way?
#include <cstddef>
void f(int*, std::size_t) {
// external implementation
// process array
}
int main() {
static constexpr std::size_t group_size = 4;
static constexpr std::size_t number_groups = 10;
static constexpr std::size_t total_number = group_size * number_groups;
static_assert(total_number % group_size == 0);
int vals[total_number]{};
struct quad {
int val[group_size]{};
};
quad vals2[number_groups]{};
// deal with values in groups of four using member functions of `quad`
static_assert(alignof(int) == alignof(quad));
static_assert(group_size * sizeof(int) == sizeof(quad));
static_assert(sizeof(vals) == sizeof(vals2));
f(vals, total_number);
f(reinterpret_cast<int*>(vals2), total_number); /// is this UB? or OK under above asserts?
}

No, this is not permitted. The relevant C++ standard section is §7.6.1.10. From the first paragraph, we have (emphasis mine)
The result of the expression reinterpret_cast<T>(v) is the result of converting the expression v to type T.
If T is an lvalue reference type or an rvalue reference to function type, the result is an lvalue; if T is an rvalue reference to object type, the result is an xvalue; otherwise, the result is a prvalue and the lvalue-to-rvalue, array-to-pointer, and function-to-pointer standard conversions are performed on the expression v.
Conversions that can be performed explicitly using reinterpret_cast are listed below.
No other conversion can be performed explicitly using reinterpret_cast.
So unless your use case is listed on that particular page, it's not valid. Most of the sections are not relevant to your use case, but this is the one that comes closest.
An object pointer can be explicitly converted to an object pointer of a different type.[58]
When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)).
So a reinterpret_cast from one pointer type to another is equivalent to a static_cast through an appropriately cv-qualified void*. Now, a static_cast that goes from T* to S* can be acceptably used as a S* if the types T and S are pointer-interconvertible. From §6.8.4
Two objects a and b are pointer-interconvertible if:
they are the same object, or
one is a union object and the other is a non-static data member of that object ([class.union]), or
one is a standard-layout class object and the other is the first non-static data member of that object or any base class subobject of that object ([class.mem]), or
there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.
If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast ([expr.reinterpret.cast]).
[Note 4: An array object and its first element are not pointer-interconvertible, even though they have the same address.
— end note]
To summarize, you can cast a pointer to a class C to a pointer to its first member (and back) if there's no vtable to stop you. You can cast a pointer to C into another pointer to C (that can come up if you're adding cv-qualifiers; for instance, reinterpret_cast<const C*>(my_c_ptr) is valid if my_c_ptr is C*). There are also some special rules for unions, which don't apply here. However, you can't factor through arrays, as per Note 4. The conversion you want here is quad[] -> quad -> int -> int[], and you can't convert between the quad[] and the quad. If quad was a simple struct that contained only an int, then you could reinterpret a quad* as an int*, but you can't do it through arrays, and certainly not through a nested layer of them.
None of the sections I've cited say anything about alignment. Or size. Or packing. Or padding. None of that matters. All your static_asserts are doing is slightly increasing the probability that the undefined behavior (which is still undefined) will happen to work on more compilers. But you're using a bandaid to repair a dam; it's not going to work.

No amount of static_asserts is going to make something which is categorically UB into well-defined behavior in accord with the standard. You did not create an array of ints; you created a struct containing an array of ints. So that's what you have.
It's legal to convert a pointer to a quad into a pointer to an int[group_size] (though you'll need to alter your code appropriately. Or you could just access the array directly and cast that to an int*.
Regardless of how you get a pointer to the first element, it's legal to do pointer arithmetic within that array. But the moment you try to do pointer arithmetic past the boundaries of the array within that quad object, you achieve undefined behavior. Pointer arithmetic is defined based on the existence of an array: [expr.add]/4
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) array element i−j of x if 0≤i−j≤n.
Otherwise, the behavior is undefined.
The pointer isn't null, so case 1 doesn't apply. The n above is group_size (because the array is the one within quad), so if the index is > group_size, then case 2 doesn't apply.
Therefore, undefined behavior will happen whenever someone tries to access the array past index 4. There is no cast that can wallpaper over that.
Otherwise, what do I need to do, to make it "not UB". A union? How exactly please?
You don't. What you're trying to do is simply not valid with respect to the C++ object model. You need an array of ints, so you must create an array of ints. You cannot treat an array of something other than ints as an array of ints (well, with minor exceptions of byte-wise arrays, but that's unhelpful to you).
The simplest valid way to process the array in groups is to just... do some nested loops:
int arr[total_number];
for(int* curr = arr; curr != std::end(arr); curr += 4)
{
//Use `curr[0]` to `curr[3]`;
//Or create a `std::span<int, 4> group(curr)`;
}

Is reinterpret_cast<char*>(myTypePtr) assumed to point to an array?

We know that char* can alias anything: According to cppreference
Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true:
[...]
AliasedType is std::byte, char, or unsigned char: this permits examination of the object representation of any object as an array of bytes. [...]
The statement in boldface is not present in n4659 draft [6.10, (8.8)].
Since doing pointer arithmetic on pointers that don't point to elements of the same array is undefined, can we really access bytes other than the first one using only reinterpret_cast?
Or maybe std::memcpy must be used for that purpose?

auto ptr = reinterpret_cast<char*>(myTypePtr);
The standard permit this conversion, due to:
An object pointer can be explicitly converted to an object pointer of a different type.73 When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)). [ Note: Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value.  — end note ]
So, the conversion is equivalent to:
assume myTypePtr has no any cv qualifier.
auto ptr = static_cast<char*>(static_cast<void*>(myTypePtr))
And you are permitted to dereference myTypePtr to access the value within the object(the pointer point to), due to:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
a char, unsigned char, or std::byte type.
If myTypePtr is not an object of array of char type, as long as you applied addition to ptr, It will result in undefined behavior, due to:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements,86 the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element
x[j + p] if 0 ≤ i+j≤n ; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i - j] if 0 ≤ i - j≤n ; otherwise, the behavior is undefined.
For addition or subtraction, if the expressions P or Q have type “pointer to cv T”, where T and the array element type are not similar, the behavior is undefined.
Because the element of myTypePtr is not of type char. Hence applying addition to ptr result in undefined behavior.
Or maybe std::memcpy must be used for that purpose?
Yes, If the object to which myTypePtr point subject to the following rules:
For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes ([intro.memory]) making up the object can be copied into an array of char, unsigned char, or std::byte ([cstddef.syn]).43 If the content of that array is copied back into the object, the object shall subsequently hold its original value.
OR
For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the underlying bytes ([intro.memory]) making up obj1 are copied into obj2,44 obj2 shall subsequently hold the same value as obj1.
However, It's unfortunately we can't implement such a memcpy subject to the current standard.

As std::as_bytes and std::as_writable_bytes essentially depend on such usage (which is specified in [span.objectrep]), I guess we can assume it is supported, even though it is not true according to C++17/20 (and the latest working draft).
This defect has been revealed by P1839, and, unfortunately, has not been resolved yet.

Is it UB to access a member by casting an object pointer to `char `, then doing `(member_type*)(pointer + offset)`?

Here's an example:
#include <cstddef>
#include <iostream>
struct A
{
char padding[7];
int x;
};
constexpr int offset = offsetof(A, x);
int main()
{
A a;
a.x = 42;
char *ptr = (char *)&a;
std::cout << *(int *)(ptr + offset) << '\n'; // Well-defined or not?
}
I always assumed that it's well-defined (otherwise what would be the point of offsetof), but wasn't sure.
Recently I was told that it's in fact UB, so I want to figure it out once and for all.
Does the example above cause UB or not? If you modify the class to not be standard-layout, does it affect the result?
And if it's UB, are there any workarounds for it (e.g. applying std::launder)?
This entire topic seems to be moot and underspecified.
Here's some information I was able to find:
Is adding to a “char *” pointer UB, when it doesn't actually point to a char array? - In 2011, CWG confirmed that we're allowed to examine the representation of a standard-layout object through an unsigned char pointer.
Unclear if a char pointer can be used insteaed, common sense says it can.
Unclear if staring from C++17 std::launder needs to be applied to the result of the (unsigned char *) cast. Given that it would be a breaking change, it's probably unnecessarly, at least in practice.
Unclear why C++17 changed offsetof to conditionally-support non-standard-layout types (used to be UB). It seems to imply that if an implementation supports that, then it also lets you examine the representation of non-standard-layout objects through unsigned char *.
Do we need to use std::launder when doing pointer arithmetic within a standard-layout object (e.g., with offsetof)? - A question similar to this one. No definitive answer was given.

Here I will refer to C++20 (draft) wording, because one relevant editorial issue was fixed between C++17 and C++20 and also it is possible to refer to specific sentences in HTML version of the C++20 draft, but otherwise there is nothing new in comparison to C++17.
At first, definitions of pointer values [basic.compound]/3:
Every value of pointer type is one of the following:
— a pointer to an object or function (the pointer is said to point to the object or function), or
— a pointer past the end of an object ([expr.add]), or
— the null pointer value for that type, or
— an invalid pointer value.
Now, lets see what happens in the (char *)&a expression.
Let me not prove that a is an lvalue denoting the object of type A, and I will say «the object a» to refer to this object.
The meaning of the &a subexpression is covered in [expr.unary.op]/(3.2):
if the operand is an lvalue of type T, the resulting expression is a prvalue of type “pointer to T” whose result is a pointer to the designated object
So, &a is a prvalue of type A* with the value «pointer to (the object) a».
Now, the cast in (char *)&a is equivalent to reinterpret_cast<char*>(&a), which is defined as static_cast<char*>(static_cast<void*>(&a)) ([expr.reinterpret.cast]/7).
Cast to void* doesn't change the pointer value ([conv.ptr]/2):
A prvalue of type “pointer to cv T”, where T is an object type, can be converted to a prvalue of type “pointer to cv void”. The pointer value ([basic.compound]) is unchanged by this conversion.
i.e. it is still «pointer to (the object) a».
[expr.static.cast]/13 covers the outer static_cast<char*>(...):
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1.
If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified.
Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b.
Otherwise, the pointer value is unchanged by the conversion.
There is no object of type char which is pointer-interconvertible with the object a ([basic.compound]/4):
Two objects a and b are pointer-interconvertible if:
— they are the same object, or
— one is a union object and the other is a non-static data member of that object ([class.union]), or
— one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no non-static data members, any base class subobject of that object ([class.mem]), or
— there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.
which means that the static_cast<char*>(...) doesn't change the pointer value and it is the same as in its operand, namely: «pointer to a».
So, (char *)&a is a prvalue of type char* whose value is «pointer to a». This value is stored into char* ptr variable. Then, when you try to do pointer arithmetic with such a value, namely ptr + offset, you step into [expr.add]/6:
For addition or subtraction, if the expressions P or Q have type “pointer to cv T”, where T and the array element type are not similar, the behavior is undefined.
For the purposes of pointer arithmetic, the object a is considered to be an element of an array A[1] ([basic.compound]/3), so the array element type is A, the type of the pointer expression P is «pointer to char», char and A are not similar types (see [conv.qual]/2), so the behavior is undefined.

This question, and the other one about launder, both seem to me to boil down to interpretation of the last sentence of C++17 [expr.static.cast]/13, which covers what happens for static_cast<T *> applied to an operand of pointer to unrelated type which is correctly aligned:
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T ”,
[...]
Otherwise, the pointer value is unchanged by the conversion.
Some posters appear to take this to mean that the result of the cast cannot point to an object of type T, and consequently that reinterpret_cast with pointers or references can only be used on pointer-interconvertible types.
But I don't see that as justified, and (this is a reductio ad absurdum argument) that position would also imply:
The resolution to CWG1314 is overturned.
Inspecting any byte of a standard-layout object is not possible (since casting to unsigned char * or whatever character type supposedly cannot be used to access that byte).
The strict aliasing rule would be redundant since the only way to actually achieve such aliasing is to use such casts.
There would be no normative text to justify the note "[Note: Converting a prvalue of type “pointer to T1 ” to the type “pointer to T2 ” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. —end note ]"
offsetof is useless (and the C++17 changes to it were therefore redundant)
It seems like a more sensible interpretation to me that this sentence means the result of the cast points to the same byte in memory as the operand. (As opposed to pointing to some other byte, as can happen for some pointer casts not covered by this sentence). Saying "the value is unchanged" does not mean "the type is unchanged", e.g. we describe conversion from int to long as preserving the value.
Also, I guess this may be controversial to some but I am taking as axiomatic that if a pointer's value is the address of an object, then the pointer points to that object, unless the Standard specifically excludes the case.
This is consistent with the text of [basic.compound]/3 which says the converse, i.e. that if a pointer points to an object, then its value is the address of the object.
There doesn't seem to be any other explicit statement defining when a pointer can or cannot be said to point to an object, but basic.compound/3 says that all pointers must be one of four cases (points to an object, points past the end, null, invalid).
Examples of excluded cases include:
The use case of std::launder specifically addresses a situation where there was such language ruling out the use of the un-laundered pointer.
A past-the-end pointer does not point to an object. (basic.compound/3)

Is reinterpret_cast-ing a memory buffer back and forth safe, and is the alignas necessary in placement new context? [duplicate]

(In reference to this question and answer.)
Before the C++17 standard, the following sentence was included in [basic.compound]/3:
If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.
But since C++17, this sentence has been removed.
For example I believe that this sentence made this example code defined, and that since C++17 this is undefined behavior:
alignas(int) unsigned char buffer[2*sizeof(int)];
auto p1=new(buffer) int{};
auto p2=new(p1+1) int{};
*(p1+1)=10;
Before C++17, p1+1 holds the address to *p2 and has the right type, so *(p1+1) is a pointer to *p2. In C++17 p1+1 is a pointer past-the-end, so it is not a pointer to object and I believe it is not dereferencable.
Is this interpretation of this modification of the standard right or are there other rules that compensate the deletion of the cited sentence?

Is this interpretation of this modification of the standard right or are there other rules that compensate the deletion of this cited sentence?
Yes, this interpretation is correct. A pointer past the end isn't simply convertible to another pointer value that happens to point to that address.
The new [basic.compound]/3 says:
Every value of pointer type is one of the following:
(3.1)
a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2)
a pointer past the end of an object ([expr.add]), or
Those are mutually exclusive. p1+1 is a pointer past the end, not a pointer to an object. p1+1 points to a hypothetical x[1] of a size-1 array at p1, not to p2. Those two objects are not pointer-interconvertible.
We also have the non-normative note:
[ Note: A pointer past the end of an object ([expr.add]) is not considered to point to an unrelated object of the object's type that might be located at that address. [...]
which clarifies the intent.
As T.C. points out in numerous comments (notably this one), this is really a special case of the problem that comes with trying to implement std::vector - which is that [v.data(), v.data() + v.size()) needs to be a valid range and yet vector doesn't create an array object, so the only defined pointer arithmetic would be going from any given object in the vector to past-the-end of its hypothetical one-size array. Fore more resources, see CWG 2182, this std discussion, and two revisions of a paper on the subject: P0593R0 and P0593R1 (section 1.3 specifically).

In your example, *(p1 + 1) = 10; should be UB, because it is one past the end of the array of size 1. But we are in a very special case here, because the array was dynamically constructed in a larger char array.
Dynamic object creation is described in 4.5 The C++ object model [intro.object], §3 of the n4659 draft of the C++ standard:
3 If a complete object is created (8.3.4) in storage associated with another object e of type “array of N
unsigned char” or of type “array of N std::byte” (21.2.1), that array provides storage for the created
object if:
(3.1) — the lifetime of e has begun and not ended, and
(3.2) — the storage for the new object fits entirely within e, and
(3.3) — there is no smaller array object that satisfies these constraints.
The 3.3 seems rather unclear, but the examples below make the intent more clear:
struct A { unsigned char a[32]; };
struct B { unsigned char b[16]; };
A a;
B *b = new (a.a + 8) B; // a.a provides storage for *b
int *p = new (b->b + 4) int; // b->b provides storage for *p
// a.a does not provide storage for *p (directly),
// but *p is nested within a (see below)
So in the example, the buffer array provides storage for both *p1 and *p2.
The following paragraphs prove that the complete object for both *p1 and *p2 is buffer:
4 An object a is nested within another object b if:
(4.1) — a is a subobject of b, or
(4.2) — b provides storage for a, or
(4.3) — there exists an object c where a is nested within c, and c is nested within b.
5 For every object x, there is some object called the complete object of x, determined as follows:
(5.1) — If x is a complete object, then the complete object of x is itself.
(5.2) — Otherwise, the complete object of x is the complete object of the (unique) object that contains x.
Once this is established, the other relevant part of draft n4659 for C++17 is [basic.coumpound] §3(emphasize mine):
3 ... Every
value of pointer type is one of the following:
(3.1) — a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2) — a pointer past the end of an object (8.7), or
(3.3) — the null pointer value (7.11) for that type, or
(3.4) — an invalid pointer value.
A value of a pointer type that is a pointer to or past the end of an object represents the address of the
first byte in memory (4.4) occupied by the object or the first byte in memory after the end of the storage
occupied by the object, respectively. [ Note: A pointer past the end of an object (8.7) is not considered to
point to an unrelated object of the object’s type that might be located at that address. A pointer value
becomes invalid when the storage it denotes reaches the end of its storage duration; see 6.7. —end note ]
For purposes of pointer arithmetic (8.7) and comparison (8.9, 8.10), a pointer past the end of the last element
of an array x of n elements is considered to be equivalent to a pointer to a hypothetical element x[n]. The
value representation of pointer types is implementation-defined. Pointers to layout-compatible types shall
have the same value representation and alignment requirements (6.11)...
The note A pointer past the end... does not apply here because the objects pointed to by p1 and p2 and not unrelated, but are nested into the same complete object, so pointer arithmetics make sense inside the object that provide storage: p2 - p1 is defined and is (&buffer[sizeof(int)] - buffer]) / sizeof(int) that is 1.
So p1 + 1 is a pointer to *p2, and *(p1 + 1) = 10; has defined behaviour and sets the value of *p2.
I have also read the C4 annex on the compatibility between C++14 and current (C++17) standards. Removing the possibility to use pointer arithmetics between objects dynamically created in a single character array would be an important change that IMHO should be cited there, because it is a commonly used feature. As nothing about it exist in the compatibility pages, I think that it confirms that it was not the intent of the standard to forbid it.
In particular, it would defeat that common dynamic construction of an array of objects from a class with no default constructor:
class T {
...
public T(U initialization) {
...
}
};
...
unsigned char *mem = new unsigned char[N * sizeof(T)];
T * arr = reinterpret_cast<T*>(mem); // See the array as an array of N T
for (i=0; i<N; i++) {
U u(...);
new(arr + i) T(u);
}
arr can then be used as a pointer to the first element of an array...

To expand on the answers given here is an example of what I believe the revised wording is excluding:
Warning: Undefined Behaviour
#include <iostream>
int main() {
int A[1]{7};
int B[1]{10};
bool same{(B)==(A+1)};
std::cout<<B<< ' '<< A <<' '<<sizeof(*A)<<'\n';
std::cout<<(same?"same":"not same")<<'\n';
std::cout<<*(A+1)<<'\n';//!!!!!
return 0;
}
For entirely implementation dependent (and fragile) reasons possible output of this program is:
0x7fff1e4f2a64 0x7fff1e4f2a60 4
same
10
That output shows that the two arrays (in that case) happen to be stored in memory such that 'one past the end' of A happens to hold the value of the address of the first element of B.
The revised specification is ensuring that regardless A+1 is never a valid pointer to B. The old phrase 'regardless of how the value is obtained' says that if 'A+1' happens to point to 'B[0]' then it's a valid pointer to 'B[0]'.
That can't be good and surely never the intention.

Is a pointer with the right address and type still always a valid pointer since C++17?

(In reference to this question and answer.)
Before the C++17 standard, the following sentence was included in [basic.compound]/3:
If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.
But since C++17, this sentence has been removed.
For example I believe that this sentence made this example code defined, and that since C++17 this is undefined behavior:
alignas(int) unsigned char buffer[2*sizeof(int)];
auto p1=new(buffer) int{};
auto p2=new(p1+1) int{};
*(p1+1)=10;
Before C++17, p1+1 holds the address to *p2 and has the right type, so *(p1+1) is a pointer to *p2. In C++17 p1+1 is a pointer past-the-end, so it is not a pointer to object and I believe it is not dereferencable.
Is this interpretation of this modification of the standard right or are there other rules that compensate the deletion of the cited sentence?

Is this interpretation of this modification of the standard right or are there other rules that compensate the deletion of this cited sentence?
Yes, this interpretation is correct. A pointer past the end isn't simply convertible to another pointer value that happens to point to that address.
The new [basic.compound]/3 says:
Every value of pointer type is one of the following:
(3.1)
a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2)
a pointer past the end of an object ([expr.add]), or
Those are mutually exclusive. p1+1 is a pointer past the end, not a pointer to an object. p1+1 points to a hypothetical x[1] of a size-1 array at p1, not to p2. Those two objects are not pointer-interconvertible.
We also have the non-normative note:
[ Note: A pointer past the end of an object ([expr.add]) is not considered to point to an unrelated object of the object's type that might be located at that address. [...]
which clarifies the intent.
As T.C. points out in numerous comments (notably this one), this is really a special case of the problem that comes with trying to implement std::vector - which is that [v.data(), v.data() + v.size()) needs to be a valid range and yet vector doesn't create an array object, so the only defined pointer arithmetic would be going from any given object in the vector to past-the-end of its hypothetical one-size array. Fore more resources, see CWG 2182, this std discussion, and two revisions of a paper on the subject: P0593R0 and P0593R1 (section 1.3 specifically).

In your example, *(p1 + 1) = 10; should be UB, because it is one past the end of the array of size 1. But we are in a very special case here, because the array was dynamically constructed in a larger char array.
Dynamic object creation is described in 4.5 The C++ object model [intro.object], §3 of the n4659 draft of the C++ standard:
3 If a complete object is created (8.3.4) in storage associated with another object e of type “array of N
unsigned char” or of type “array of N std::byte” (21.2.1), that array provides storage for the created
object if:
(3.1) — the lifetime of e has begun and not ended, and
(3.2) — the storage for the new object fits entirely within e, and
(3.3) — there is no smaller array object that satisfies these constraints.
The 3.3 seems rather unclear, but the examples below make the intent more clear:
struct A { unsigned char a[32]; };
struct B { unsigned char b[16]; };
A a;
B *b = new (a.a + 8) B; // a.a provides storage for *b
int *p = new (b->b + 4) int; // b->b provides storage for *p
// a.a does not provide storage for *p (directly),
// but *p is nested within a (see below)
So in the example, the buffer array provides storage for both *p1 and *p2.
The following paragraphs prove that the complete object for both *p1 and *p2 is buffer:
4 An object a is nested within another object b if:
(4.1) — a is a subobject of b, or
(4.2) — b provides storage for a, or
(4.3) — there exists an object c where a is nested within c, and c is nested within b.
5 For every object x, there is some object called the complete object of x, determined as follows:
(5.1) — If x is a complete object, then the complete object of x is itself.
(5.2) — Otherwise, the complete object of x is the complete object of the (unique) object that contains x.
Once this is established, the other relevant part of draft n4659 for C++17 is [basic.coumpound] §3(emphasize mine):
3 ... Every
value of pointer type is one of the following:
(3.1) — a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2) — a pointer past the end of an object (8.7), or
(3.3) — the null pointer value (7.11) for that type, or
(3.4) — an invalid pointer value.
A value of a pointer type that is a pointer to or past the end of an object represents the address of the
first byte in memory (4.4) occupied by the object or the first byte in memory after the end of the storage
occupied by the object, respectively. [ Note: A pointer past the end of an object (8.7) is not considered to
point to an unrelated object of the object’s type that might be located at that address. A pointer value
becomes invalid when the storage it denotes reaches the end of its storage duration; see 6.7. —end note ]
For purposes of pointer arithmetic (8.7) and comparison (8.9, 8.10), a pointer past the end of the last element
of an array x of n elements is considered to be equivalent to a pointer to a hypothetical element x[n]. The
value representation of pointer types is implementation-defined. Pointers to layout-compatible types shall
have the same value representation and alignment requirements (6.11)...
The note A pointer past the end... does not apply here because the objects pointed to by p1 and p2 and not unrelated, but are nested into the same complete object, so pointer arithmetics make sense inside the object that provide storage: p2 - p1 is defined and is (&buffer[sizeof(int)] - buffer]) / sizeof(int) that is 1.
So p1 + 1 is a pointer to *p2, and *(p1 + 1) = 10; has defined behaviour and sets the value of *p2.
I have also read the C4 annex on the compatibility between C++14 and current (C++17) standards. Removing the possibility to use pointer arithmetics between objects dynamically created in a single character array would be an important change that IMHO should be cited there, because it is a commonly used feature. As nothing about it exist in the compatibility pages, I think that it confirms that it was not the intent of the standard to forbid it.
In particular, it would defeat that common dynamic construction of an array of objects from a class with no default constructor:
class T {
...
public T(U initialization) {
...
}
};
...
unsigned char *mem = new unsigned char[N * sizeof(T)];
T * arr = reinterpret_cast<T*>(mem); // See the array as an array of N T
for (i=0; i<N; i++) {
U u(...);
new(arr + i) T(u);
}
arr can then be used as a pointer to the first element of an array...

To expand on the answers given here is an example of what I believe the revised wording is excluding:
Warning: Undefined Behaviour
#include <iostream>
int main() {
int A[1]{7};
int B[1]{10};
bool same{(B)==(A+1)};
std::cout<<B<< ' '<< A <<' '<<sizeof(*A)<<'\n';
std::cout<<(same?"same":"not same")<<'\n';
std::cout<<*(A+1)<<'\n';//!!!!!
return 0;
}
For entirely implementation dependent (and fragile) reasons possible output of this program is:
0x7fff1e4f2a64 0x7fff1e4f2a60 4
same
10
That output shows that the two arrays (in that case) happen to be stored in memory such that 'one past the end' of A happens to hold the value of the address of the first element of B.
The revised specification is ensuring that regardless A+1 is never a valid pointer to B. The old phrase 'regardless of how the value is obtained' says that if 'A+1' happens to point to 'B[0]' then it's a valid pointer to 'B[0]'.
That can't be good and surely never the intention.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

UB When Dereferencing Array of Unions - c++

Related

process array in chunks using struct then cast as flat array - how to avoid UB (strict aliasing)?

Is reinterpret_cast<char*>(myTypePtr) assumed to point to an array?

Is it UB to access a member by casting an object pointer to `char `, then doing `(member_type*)(pointer + offset)`?

Is reinterpret_cast-ing a memory buffer back and forth safe, and is the alignas necessary in placement new context? [duplicate]

Is a pointer with the right address and type still always a valid pointer since C++17?

Categories

Resources

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

UB When Dereferencing Array of Unions - c++

Related

process array in chunks using struct then cast as flat array - how to avoid UB (strict aliasing)?

Is reinterpret_cast<char*>(myTypePtr) assumed to point to an array?

Is it UB to access a member by casting an object pointer to `char *`, then doing `*(member_type*)(pointer + offset)`?

Is reinterpret_cast-ing a memory buffer back and forth safe, and is the alignas necessary in placement new context? [duplicate]

Is a pointer with the right address and type still always a valid pointer since C++17?

Categories

Resources

Is it UB to access a member by casting an object pointer to `char `, then doing `(member_type*)(pointer + offset)`?