Do data member addresses lie between (this) and (this+1)? - c++

Suppose that we have the following two inequalities inside a member function
this <= (void *) &this->data_member
and
&this->data_member < (void *) (this+1)
Are they guaranteed to be true?
(They seem to be true in a few cases that I checked.)
Edit: I missed ampersands, now it's the correct form of inequalities.

From CPP standard draft 4713:
6.6.2 Object model [intro.object]/7
An object of trivially copyable or standard-layout type (6.7) shall occupy contiguous bytes of storage.
12.2 Class members [class.mem]/18
Non-static data members of a (non-union) class with the same access control (Clause 14) are allocated so that later members have higher addresses within a class object.
12.2 Class members [class.mem]/25
If a standard-layout class object has any non-static data members, its address is the same as the address of its first non-static data member. Otherwise, its address is the same as the address of its first base class subobject (if any).
Taking all the above together, we can say the first equation holds for at least trivially copyable objects.
Also from the online cpp reference:
The result of comparing two pointers to objects (after conversions) is defined as follows:
1) If two pointers point to different elements of the same array, or to subobjects within different elements of the same array, the pointer to the element with the higher subscript compares greater. In other words, they results of comparing the pointers is the same as the result of comparing the indexes of the elements they point to.
2) If one pointer points to an element of an array, or to a subobject of the element of the array, and another pointer points one past the last element of the array, the latter pointer compares greater. Pointers to single objects are treated as pointers to arrays of one: &obj+1 compares greater than &obj (since C++17)
So if your data_member is not a pointer and has not been allocated memory separately, the equations you have posted hold good
for at least trivially copyable objects.

The full standard text amounts to this:
[expr.rel] - 4: The result of comparing unequal pointers to objects82 is defined in terms of a partial order consistent with the following rules:
We are dealing with a partial order here, not a total order. That does mean a < b and b < c implies a < c, but not much else.
(Note 82 states that non-array objects are considered elements of a single-element array for this purpose, with the intuitive meaning/behavior of "pointer to element one past the end").
(4.1)
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript is required to compare greater.
Pointers to different members are not pointers to (subobjects of) elements of the same array. This rule does not apply.
(4.2)
If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member is required to compare greater provided the two members have the same access control ([class.access]), neither member is a subobject of zero size, and their class is not a union.
This rule only relates pointers to data members of the same object, not of a different object.
(4.3)
Otherwise, neither pointer is required to compare greater than the other.
Thus, you do not get any guarantees from the standard. Whether you can find a real-world system where you get a different result than you expect is another question.

The value of an object is within its representation, and this representation is a sequence of unsigned char: [basic.types]/4
The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T).
The value representation of an object of type T is the set of bits that participate in representing a value of type T.[...]
So for formalism fundamentalists, it is true that value is not defined but the term appears in the definition of access:[defns.access]:
read or modify the value of an object
So is the value of a suboject part of the value of a complete object? I suppose this is what is intended by the standard.
The comparison should be true if you cast object pointers to unsigned char*. (This is a common practice that falls on an under-specification core issue #1701)

No, here's a counterexample
#include <iostream>
struct A
{
int a_member[10];
};
struct B : public virtual A
{
int b_member[10];
void print_b() { std::cout << static_cast<void*>(this) << " " << static_cast<void*>(std::addressof(this->a_member)) << " " << static_cast<void*>(this + 1) << std::endl; }
};
struct C : public virtual A
{
int c_member[10];
void print_c() { std::cout << static_cast<void*>(this) << " " << static_cast<void*>(std::addressof(this->a_member)) << " " << static_cast<void*>(this + 1) << std::endl; }
};
struct D : public B, public C
{
void print_d()
{
print_b();
print_c();
}
};
int main()
{
D d;
d.print_d();
}
With the possible output (as seen here)
0x7fffc6bf9fb0 0x7fffc6bfa010 0x7fffc6bfa008
0x7fffc6bf9fe0 0x7fffc6bfa010 0x7fffc6bfa038
Note that the a_member is outside of the B pointed to by this in print_b

Related

process array in chunks using struct then cast as flat array - how to avoid UB (strict aliasing)?

An external API expects a pointer to an array of values (int as simple example here) plus a size.
It is logically clearer to deal with the elements in groups of 4.
So process elements via a "group of 4" struct and then pass the array of those structs to the external API using a pointer cast. See code below.
Spider sense says: "strict aliasing violation" in the reinterpret_cast => possible UB?
Are the static_asserts below enough to ensure:
a) this works in practice
b) this is actually standards compliant and not UB?
Otherwise, what do I need to do, to make it "not UB". A union? How exactly please?
or, is there overall a different, better way?
#include <cstddef>
void f(int*, std::size_t) {
// external implementation
// process array
}
int main() {
static constexpr std::size_t group_size = 4;
static constexpr std::size_t number_groups = 10;
static constexpr std::size_t total_number = group_size * number_groups;
static_assert(total_number % group_size == 0);
int vals[total_number]{};
struct quad {
int val[group_size]{};
};
quad vals2[number_groups]{};
// deal with values in groups of four using member functions of `quad`
static_assert(alignof(int) == alignof(quad));
static_assert(group_size * sizeof(int) == sizeof(quad));
static_assert(sizeof(vals) == sizeof(vals2));
f(vals, total_number);
f(reinterpret_cast<int*>(vals2), total_number); /// is this UB? or OK under above asserts?
}
No, this is not permitted. The relevant C++ standard section is §7.6.1.10. From the first paragraph, we have (emphasis mine)
The result of the expression reinterpret_­cast<T>(v) is the result of converting the expression v to type T.
If T is an lvalue reference type or an rvalue reference to function type, the result is an lvalue; if T is an rvalue reference to object type, the result is an xvalue; otherwise, the result is a prvalue and the lvalue-to-rvalue, array-to-pointer, and function-to-pointer standard conversions are performed on the expression v.
Conversions that can be performed explicitly using reinterpret_­cast are listed below.
No other conversion can be performed explicitly using reinterpret_­cast.
So unless your use case is listed on that particular page, it's not valid. Most of the sections are not relevant to your use case, but this is the one that comes closest.
An object pointer can be explicitly converted to an object pointer of a different type.[58]
When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_­cast<cv T*>(static_­cast<cv void*>(v)).
So a reinterpret_cast from one pointer type to another is equivalent to a static_cast through an appropriately cv-qualified void*. Now, a static_cast that goes from T* to S* can be acceptably used as a S* if the types T and S are pointer-interconvertible. From §6.8.4
Two objects a and b are pointer-interconvertible if:
they are the same object, or
one is a union object and the other is a non-static data member of that object ([class.union]), or
one is a standard-layout class object and the other is the first non-static data member of that object or any base class subobject of that object ([class.mem]), or
there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.
If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_­cast ([expr.reinterpret.cast]).
[Note 4: An array object and its first element are not pointer-interconvertible, even though they have the same address.
— end note]
To summarize, you can cast a pointer to a class C to a pointer to its first member (and back) if there's no vtable to stop you. You can cast a pointer to C into another pointer to C (that can come up if you're adding cv-qualifiers; for instance, reinterpret_cast<const C*>(my_c_ptr) is valid if my_c_ptr is C*). There are also some special rules for unions, which don't apply here. However, you can't factor through arrays, as per Note 4. The conversion you want here is quad[] -> quad -> int -> int[], and you can't convert between the quad[] and the quad. If quad was a simple struct that contained only an int, then you could reinterpret a quad* as an int*, but you can't do it through arrays, and certainly not through a nested layer of them.
None of the sections I've cited say anything about alignment. Or size. Or packing. Or padding. None of that matters. All your static_asserts are doing is slightly increasing the probability that the undefined behavior (which is still undefined) will happen to work on more compilers. But you're using a bandaid to repair a dam; it's not going to work.
No amount of static_asserts is going to make something which is categorically UB into well-defined behavior in accord with the standard. You did not create an array of ints; you created a struct containing an array of ints. So that's what you have.
It's legal to convert a pointer to a quad into a pointer to an int[group_size] (though you'll need to alter your code appropriately. Or you could just access the array directly and cast that to an int*.
Regardless of how you get a pointer to the first element, it's legal to do pointer arithmetic within that array. But the moment you try to do pointer arithmetic past the boundaries of the array within that quad object, you achieve undefined behavior. Pointer arithmetic is defined based on the existence of an array: [expr.add]/4
When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.
If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) array element i−j of x if 0≤i−j≤n.
Otherwise, the behavior is undefined.
The pointer isn't null, so case 1 doesn't apply. The n above is group_size (because the array is the one within quad), so if the index is > group_size, then case 2 doesn't apply.
Therefore, undefined behavior will happen whenever someone tries to access the array past index 4. There is no cast that can wallpaper over that.
Otherwise, what do I need to do, to make it "not UB". A union? How exactly please?
You don't. What you're trying to do is simply not valid with respect to the C++ object model. You need an array of ints, so you must create an array of ints. You cannot treat an array of something other than ints as an array of ints (well, with minor exceptions of byte-wise arrays, but that's unhelpful to you).
The simplest valid way to process the array in groups is to just... do some nested loops:
int arr[total_number];
for(int* curr = arr; curr != std::end(arr); curr += 4)
{
//Use `curr[0]` to `curr[3]`;
//Or create a `std::span<int, 4> group(curr)`;
}

Is it implementation-defined that how to deal with [[no_unique_address]]?

Below is excerpted from cppref but reduced to demo:
#include <iostream>
struct Empty {}; // empty class
struct W
{
char c[2];
[[no_unique_address]] Empty e1, e2;
};
int main()
{
std::cout << std::boolalpha;
// e1 and e2 cannot have the same address, but one of them can share with
// c[0] and the other with c[1]
std::cout << "sizeof(W) == 2 is " << (sizeof(W) == 2) << '\n';
}
The documentation says the output might be:
sizeof(W) == 2 is true
However, both gcc and clang output as follows:
sizeof(W) == 2 is false
Is it implementation-defined that how to deal with [[no_unique_address]]?
See [intro.object]/8:
An object has nonzero size if ... Otherwise, if the object is a base class subobject of a standard-layout class type with no non-static data members, it has zero size.
Otherwise, the circumstances under which the object has zero size are implementation-defined.
Empty base class optimization became mandatory for standard-layout classes in C++11 (see here for discussion). Empty member optimization is never mandatory. It is implementation-defined, as you suspected.
Yes, virtually all aspects of no_unique_address are implementation-defined. It is a tool to allow for optimizations, not to enforce them.
That being said, you should never assume that no_unique_address will work when you attempt to have two subobjects with the same type. The standard still requires that all distinct subobjects of the same type have different addresses, no_unique_address or not. And while it is possible that the compiler could assign these empty subobjects distinct addresses by radically reordering the members... they're pretty much not going to do that.
Your best bet for reasonably taking advantage of no_unique_address optimizations is to never have two subobjects of the same type, and try to put all possibly empty members first. That is, you should expect implementations to assign an empty no_unique_address member to the offset of the next member (or to the offset of the containing struct as a whole).

`std::complex<T>[n]` and `T[n*2]` type aliasing

Since C++11 std::complex<T>[n] is guaranteed to be aliasable as T[n*2], with well defined values. Which is exactly what one would expect for any mainstream architecture. Is this guarantee achievable with standard C++ for my own types, say struct vec3 { float x, y, z; } or is it only possible with special support from the compiler?
TL;DR: The compiler must inspect reinterpret_casts and figure out that (standard library) specializations of std::complex are involved. We cannot conformably mimic the semantics.
I think it's fairly clear that treating three distinct members as array elements is not going to work, since pointer arithmetic on pointers to them is extremely restricted (e.g. adding 1 yields a pointer past-the-end).
So let's assume vec3 contained an array of three ints instead.
Even then, the underlying reinterpret_cast<int*>(&v) you implicitly need (where v is a vec3) does not leave you with a pointer to the first element. See the exhaustive requirements on pointer-interconvertibility:
Two objects a and b are pointer-interconvertible if:
they are the same object, or
one is a standard-layout union object and the other is a non-static data member of that object ([class.union]), or
one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no
non-static data members, the first base class subobject of that object
([class.mem]), or
there exists an object c such that a and c are pointer-interconvertible, and c and b are
pointer-interconvertible.
If two objects are pointer-interconvertible, then they have the same
address, and it is possible to obtain a pointer to one from a pointer
to the other via a reinterpret_­cast. [ Note: An array object
and its first element are not pointer-interconvertible, even though
they have the same address.  — end note ]
That's quite unequivocal; while we can get a pointer to the array (being the first member), and while pointer-interconvertibility is transitive, we cannot obtain a pointer to its first element.
And finally, even if you managed to obtain a pointer to the first element of your member array, if you had an array of vec3s, you cannot traverse all the member arrays using simple pointer increments, since we get pointers past-the-end of the arrays in between. launder doesn't solve this problem either, because the objects that the pointers are associated with don't share any storage (cf [ptr.launder] for specifics).
It's only possible with special support from the compiler, mostly.
Unions don't get you there because the common approach actually has undefined behaviour, although there are exceptions for layout-compatible initial sequences, and you may inspect an object through an unsigned char* as a special case. That's it, though.
Interestingly, unless we assume a broad and useless meaning of "below", the standard is technically contradictory in this regard:
[C++14: 5.2.10/1]: [..] Conversions that can be performed explicitly using reinterpret_cast are listed below. No other conversion can be performed explicitly using reinterpret_cast.
The case for complex<T> is then not mentioned. Finally the rule you're referring to is introduced much, much later, in [C++14: 26.4/4].
I think it would work for a single vec3 if your type contained float x[3] instead, and you ensure sizeof(vec3) == 3*sizeof(float) && is_standard_layout_v<vec3>. Given those conditions, the standard guarantees that the first member is at zero offset so the address of the first float is the address of the object, and you can perform array arithmetic to get the other elements in the array:
struct vec3 { float x[3]; } v = { };
float* x = reinterpret_cast<float*>(&v); // points to first float
assert(x == v.x);
assert(&x[0] == &v.x[0]);
assert(&x[1] == &v.x[1]);
assert(&x[2] == &v.x[2]);
What you can't do is treat an array of vec3 as an array of floats three times the length. Array arithmetic on the array inside each vec3 won't allow you to access the array inside the next vec3. CWG 2182 is relevant here.

When are two pointers comparable?

There so many questions on comparing two pointers, but I found none on whether the two types are such that the pointers can be compared. Given
A* a;
B* b;
I want to know if expression a # b is valid, where # is one of ==,!=,<,<=,>,>= (I don't mind about nullptr_t or any other type that can be implicitly converted to a pointer). Is it when A, B are
equal?
equal except for cv-qualification?
in the same class hierarchy?
...?
I didn't find anything in std::type_traits. I could always do my own SFINAE test, but I am looking for the rules to apply them directly. I guess that will be easier for the compiler, right?
EDIT To clarify again: I am comparing pointers, not objects pointed to. I want to know in advance when a # b will give a compiler error, not what will be its value (true or false or unspecified).
C++ Standard
5.9 Relational operators
Pointers to objects or functions of the same type (after pointer conversions) can be compared,
with a result defined as follows:
— If two pointers p and q of the same type point to the same object or function, or both point one past
the end of the same array, or are both null, then p<=q and p>=q both yield true and p<q and p>q both
yield false.
— If two pointers p and q of the same type point to different objects that are not members of the same
object or elements of the same array or to different functions, or if only one of them is null, the results
of pq, p<=q, and p>=q are unspecified.
— If two pointers point to non-static data members of the same object, or to subobjects or array elements
of such members, recursively, the pointer to the later declared member compares greater provided the
two members have the same access control (Clause 11) and provided their class is not a union.
— If two pointers point to non-static data members of the same object with different access control
(Clause 11) the result is unspecified.
— If two pointers point to non-static data members of the same union object, they compare equal (after
conversion to void*, if necessary). If two pointers point to elements of the same array or one beyond
the end of the array, the pointer to the object with the higher subscript compares higher.
— Other pointer comparisons are unspecified.
§
Those comparison operators are always 'valid', in that you can always use them. But the results usually will not be meaningful or useful.
== and != will essentially tell you whether or not a and b refer to the same object. If a == b, then changing *a will affect *b and vice-versa.
>, <, >=, and <= will tell you where the memory addresses are, relative to one another. Usually this information will be irrelevant to the functionality of your program, and in most cases it will be unpredictable. However, one example that comes to mind where you might be able to use it is if you know that a and b point to member of the same array. In which case a < b will tell you whether or not the object *a comes before *b in the array.

Is it safe to compare pointers of same type?

char** buffer{ /* some buffer */ };
char* ptr1{buffer[0]};
char* ptr2{buffer[10]};
assert(ptr1 < ptr2);
If two pointers point to different locations in the same buffer, is it safe to compare them?
I want to know if a range of pointers is valid by comparing: assert(rangeBeginPtr < rangeEndPtr).
You can compare pointers with the relational operators (<, >, <= and >=) provided they both point to an element of the same array, or one past that array. Anything else is unspecified behaviour as per C++11 5.9 Relational operators. So, given:
char xyzzy[10];
char plugh[10];
All these are specified to function correctly:
assert(&(xyzzy[1]) < &(xyzzy[4]));
assert(&(xyzzy[9]) < &(xyzzy[10])); // even though [10] isn't there.
but these are not:
assert(&(xyzzy[1]) < &(xyzzy[15]));
assert(&(xyzzy[9]) < &(plugh[3]));
The type doesn't come into it except that it has to be the same type if you're comparing two elements in the same array. If you have two char * variables, that's unspecified if they point to different arrays even though they have the same type.
You can determine the order of pointers only with on array object and if they are non-void. However, within one array object the comparison is well defined. The relevant clause in the standard is 5.9 [expr.rel] paragraph 2:
[...] Pointers to objects or functions of the same type (after pointer conversions) can be compared, with a result defined as follows:
If two pointers p and q of the same type point to the same object or function, or both point one past the end of the same array, or are both null, then p<=q and p>=q both yield true and p<q and p>q both yield false.
If two pointers p and q of the same type point to different objects that are not members of the same object or elements of the same array or to different functions, or if only one of them is null, the results of p<q, p>q, p<=q, and p>=q are unspecified.
If two pointers point to non-static data members of the same object, or to subobjects or array elements of such members, recursively, the pointer to the later declared member compares greater provided the two members have the same access control (Clause 11) and provided their class is not a union.
If two pointers point to non-static data members of the same object with different access control (Clause 11) the result is unspecified.
If two pointers point to non-static data members of the same union object, they compare equal (after conversion to void*, if necessary). If two pointers point to elements of the same array or one beyond the end of the array, the pointer to the object with the higher subscript compares higher.
Other pointer comparisons are unspecified.
== and != are valid and well-defined for all pointers of the same type. <, <=, >, and >= are only meaningful for pointers that point to objects in the same array or one-past-the-end of the array. They are also meaningful for pointers to sub-objects of a class object if the sub-objects have the same type and the same access specifier. If those conditions aren't met, the result is unspecified; one immediate consequence is that a<b and b<c does not imply that a<c, so you cannot use <, etc. as the comparator for a sort function.
std::less, std::less_equal, std::greater, and std::greater_equal for pointer types all define a total ordering; they can be used for sorting.