Reference to type-erased void*


I'm implementing an iterator adaptor for an old data type that stores a void*, and I would like to provide a forward_iterator that lets the user swap values of that old data type by giving them a view of the real pointer stored in that structure. Example:
auto it = iterator_adaptor<T*>(pos);
where pos->object is a void* that was originally of type T*. The problem is the reference type:
// within iterator_adaptor
typedef T*& reference;
// I want to promise that to the user.
typedef std::forward_iterator_tag iterator_category;
reference operator*() const { return static_cast<reference>(_pos->object); }
This yields a compiler error, since I cannot bind a reference to an object of a different type. I could cast between references, or between pointers, if the types were related, but how can I cast a void* lvalue to a T*& without undefined behaviour, knowing that _pos->object points to an object of type T?
The only thing I can think of that the compiler might accept is:
return *reinterpret_cast<T**>(&_pos->object);
or something along those lines, but I am fairly sure the standard makes that undefined behaviour.
NOTE: I would like to return a T*&, not a T&, since some semantics of each T are defined by its address (specifically, there are hash tables that map T::id() to its address, since each T::id() is unique per T*). If I return T& and the user swaps the objects, address and id no longer match, to give one example of how that might break the application. I would rather allow the user to swap the positions of each T* within the structure (the user stores pointers after all; each T is created dynamically before being inserted into the structure), for example to customize the ordering, or to use any standard algorithm that requires forward or input iterators.
Actually, swapping positions is not that important, but being able to use <algorithm> functions that require forward iterators is a feature I would like to offer.

OK, let me get this straight (an MCVE would have helped a lot):
You have this situation:
X x1{}, x2;
X* p = &x1;
void* vp = reinterpret_cast<void*>(p);
// p is lost
// here you want to recover p such that:
X*& q = /* something magic from vp */;
q = &x2; // this will modify p
If this is the case, it is simply impossible, because you lost the object p forever. What you saved in vp is what p pointed to, i.e. its value, i.e. the address of x1 (in a type-erased way). That is recoverable, and so is the pointee (if you know the original type), but p itself is lost; it was never saved.
If you want to recover p, then you need to save its address:
X x1{11}, x2{27};
X* p = &x1;
void* vpp = reinterpret_cast<void*>(&p);
// p must not end lifetime !! very important
X*& q = *reinterpret_cast<X**>(vpp);
q = &x2; // will indeed modify p (p must still be alive)
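A minimal sketch of the "save the address of p" approach above, written as self-contained functions. The names pointer_ref_from_erased and redirect_demo are hypothetical, and it uses static_cast for the void* to X** step, which for a round trip gives the same result as the reinterpret_cast in the snippet. It assumes the X* object whose address was stored is still alive:

```cpp
struct X { int value; };

// Hypothetical helper: vpp must hold the address of a live X* object
// (an X** converted to void*); we recover a reference to that pointer object.
X*& pointer_ref_from_erased(void* vpp) {
    return *static_cast<X**>(vpp);  // void* -> X** round-trips to the original X**
}

// Writing through the recovered reference reseats p itself; x1 is untouched.
bool redirect_demo() {
    X x1{11}, x2{27};
    X* p = &x1;
    void* vpp = &p;                        // implicit X** -> void*: saves p's address, not its value
    X*& q = pointer_ref_from_erased(vpp);  // q now aliases p (p must still be alive)
    q = &x2;                               // modifies p
    return p == &x2 && p->value == 27 && x1.value == 11;
}
```

Applied to the iterator question, this would mean the old structure has to store the address of a real T* object, not just the pointer value.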
Otherwise you can do this; it's perfectly valid:
T& a = *reinterpret_cast<T*>(pos->object);
T* p = reinterpret_cast<T*>(pos->object);
And finally some standard dessert (emphasis mine):
§8.5.1.10 Reinterpret cast [expr.reinterpret.cast]
An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)). [ Note: Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. — end note ]
In the simplest example:
X* p = /* ... */;
void* v = reinterpret_cast<void*>(p);
X* q = reinterpret_cast<X*>(v);
// q is guaranteed to have the original value of p,
// i.e. p == q is true
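As a small sketch of that round trip (erase is a hypothetical stand-in for whatever fills pos->object), both static_cast and reinterpret_cast recover the original pointer value from the void*. What you get back is the pointee and a fresh copy of the pointer value, never a reference to the stored void* slot itself:

```cpp
struct X { int n; };

// Hypothetical helper mirroring pos->object: the slot stores the pointer *value*.
void* erase(X* p) { return p; }  // implicit X* -> void*

bool round_trip_demo() {
    X x{42};
    void* v = erase(&x);
    X* via_static = static_cast<X*>(v);            // well-defined: v came from an X*
    X* via_reinterpret = reinterpret_cast<X*>(v);  // same result per [expr.reinterpret.cast]
    X& a = *via_static;                            // reference to the pointee, not to the slot
    a.n = 7;                                       // modifies x
    return via_static == &x && via_reinterpret == &x && x.n == 7;
}
```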

Related

Trouble understanding how pointer dereference works in C++

I'm having some trouble understanding how pointer dereferencing in C++ works. Let's look at this simple example:
#include <iostream>
struct Value {
int x = 0;
void Inc() { x++; }
};
int main(int argc, char* argv[]) {
Value* v = new Value();
v->Inc();
std::cout << v->x << std::endl; // prints 1, as I would expect
(*v).Inc();
std::cout << v->x << std::endl; // prints 2, but I would have expected it to print 1,
// as I thought (*v) would create a local copy of
// the original `Value` object.
Value v2 = *v;
v2.Inc();
std::cout << v->x << std::endl; // prints 2, as I would expect
delete v;
}
I'm a bit confused here. I would assume that the 2nd and 3rd calls to Inc() would be equivalent. Namely, that (*v).Inc() would unfold into a temporary variable holding a copy of *v on the stack, and that Inc() would then increment that copy instead of the original object. Why is that not the case?
Thanks

In the (*v).Inc(); statement, the LHS of the . operator is the result of the indirection of the v pointer. This will be an lvalue expression referring to the object to which v points. From this Draft C++ Standard (emphasis mine):
8.5.2.1 Unary operators [expr.unary.op]
1 The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.
So, in this first case, no temporary object need be created and the Inc() function is called on the original Value object created by the new operation.
However, in this statement: Value v2 = *v;, you are declaring a separate Value object and initialising it with a copy of the Value pointed to by v. Thus, any subsequent modifications to v2 will not affect the object referred to by v.
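The question's program can be restated as a function that records what it observes (the Observed struct and run are hypothetical names). It shows that *v names the original object, so &*v == v and no copy is made, while the copy-initialization of v2 creates a separate object:

```cpp
struct Value {
    int x = 0;
    void Inc() { x++; }
};

// Values observed at each step of the question's program.
struct Observed {
    int after_arrow;
    int after_deref;
    int after_copy_inc;
    int copy_value;
    bool same_address;
};

Observed run() {
    Value* v = new Value();
    Observed o{};
    v->Inc();
    o.after_arrow = v->x;         // 1
    (*v).Inc();                   // *v is an lvalue for the same object: same call as v->Inc()
    o.after_deref = v->x;         // 2
    o.same_address = (&*v == v);  // no temporary was created by *v
    Value v2 = *v;                // copy-initialization: a separate Value object
    v2.Inc();
    o.after_copy_inc = v->x;      // still 2
    o.copy_value = v2.x;          // 3
    delete v;
    return o;
}
```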

*pointer simply yields an lvalue referring to the object the pointer points to; quoting [expr.unary.op]/1:
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points
Value v2 = *v is a form of initialisation, so it actually calls a constructor. This would be equivalent to Value v2{ *v } (for this particular class).
As for why *pointer doesn't create a temporary: there are well-defined rules for when temporaries are created:
Temporary objects are created when a prvalue is materialized so that it can be used as a glvalue, which occurs (since C++17) in the following situations:
binding a reference to a prvalue
initializing an object of type std::initializer_list from a braced-init-list (since C++11)
returning a prvalue from a function
conversion that creates a prvalue (including T(a,b,c) and T{})
lambda expression (since C++11)
copy-initialization that requires conversion of the initializer,
reference-initialization to a different but convertible type or to a bitfield.
plus some other scenarios since C++17. For this particular case the important part is that indirection yields an lvalue, so none of these rules applies to (*v).Inc(): the expression already refers to an existing object and nothing needs to be materialized.

Take a reference to a pointer with erased type (void*)

I can take a T*& from a T*. Now I need to store my T* in a type-erased way, more specifically as a void*. Can I take a T*& from a void* ? (knowing, of course, that my void* does point to Ts)
Example:
#include <iostream>
#include <cstdlib>
#include <numeric>
int main() {
int n = 10;
void* mdbuf = malloc(n*sizeof(double));
double* mdarr = (double*)mdbuf;
std::iota(mdarr,mdarr+n,0.); // initialize the memory with doubles
// solution 1: works but not what I want since I want to refer to the type-erased mdbuf variable
double*& mdarr_ref = mdarr; // ok, now mdarr_ref refers to variable mdarr
// solution 2: does not compile (commented out so the rest of the example builds)
// double*& mdbuf_ref = (double*)mdbuf; // error: cannot bind non-const lvalue reference of type 'double*&' to an rvalue of type 'double*'
// solution 3: compiles and work but I want to be sure this is not out of pure luck: is it undefined behavior?
double*& mdbuf_ref = (double*&)mdbuf; // we would like mdbuf_ref to refer to variable mdbuf. It compiles...
std::iota(mdbuf_ref,mdbuf_ref+n,100.);
for (int i=0; i<n; ++i) {
std::cout << mdbuf_ref[i] << ", "; // ...does what we want in this case... is it valid however?
}
free(mdbuf);
}
Edit: Maybe one way to look at it is the following:
double d;
void* v_ptr = &d;
double* d_ptr = (double*)v_ptr; // (1) valid
double& d_ref = d; // (2) valid
double& d_ref2 = (double&)d; // (3) valid? Should be the same as (2) ?
double*& d_ref3 = (double*&)v_ptr; // (4)
The question is: is (4) valid? If (1) and (3) hold, it is just chaining both, so I expect it to be valid, but I would like some evidence of it

I'm going to take your second example and rewrite parts of it using aliases to better illustrate what you're asking for.
using V = void*;
using K = double*;
double d;
V v_ptr = reinterpret_cast<V>(&d);
V &v_ptr_ref1 = v_ptr; //Refers to the `V` object denoted by `v_ptr`.
K d_ptr = &d;
K &d_ptr_ref1 = d_ptr; //Refers to the `K` object denoted by `d_ptr`.
V &d_ptr_ref2 = reinterpret_cast<V&>(d_ptr);
So, we have two types: K and V. In the last line, we initialize a reference to a V using an object of type K. So d_ptr_ref2 is initialized to reference an object of type K, but the type of the reference is V.
It doesn't matter if they are "just" pointer types. In C++, pointers are object types and they follow all the rules of any other object type.
C++'s strict aliasing rule forbids accessing an object of one type through a glvalue (like a reference) of a different type, outside of certain very specific circumstances. The specific exceptions vary slightly from version to version, but there is no version of C++ where void* and double* are an exception.
Attempting to access d_ptr_ref2 means that you're accessing an object of type K through a reference of an unrelated type V. That violates strict aliasing, thus yielding undefined behavior.

Your solution 1 is the only answer; you can’t lie about the pointer type itself. (That question is about C, but the rules for C++ references are equivalent.)
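A minimal sketch of solution 1 applied to the question's buffer, assuming the void* slot stays the storage of record and that malloc'd memory can hold the doubles (C++20 implicit object creation, which the question's code already relies on). All typed work goes through a genuine double* object, and any change is written back to the slot explicitly. fill_erased is a hypothetical helper name:

```cpp
#include <cstdlib>
#include <numeric>

// Fills n doubles starting at `start` in the buffer held by a type-erased slot.
double fill_erased(void*& slot, int n, double start) {
    double* typed = static_cast<double*>(slot);  // copy of the pointer value (well-defined)
    double*& typed_ref = typed;                  // refers to a real double* object
    std::iota(typed_ref, typed_ref + n, start);
    slot = typed_ref;                            // if typed_ref were reseated, store it back explicitly
    return typed_ref[n - 1];
}
```

Usage: void* buf = std::malloc(10 * sizeof(double)); fill_erased(buf, 10, 100.0); reads back 109.0 as the last element, and no double*& ever refers to the void* object itself.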

std::launder use cases in C++20

[1]
Are there any cases in which the addition of p0593r6 into C++20 (§ 6.7.2.11 Object model [intro.object]) made std::launder not necessary, where the same use case in C++17 required std::launder, or are they completely orthogonal?
[2]
The example in the spec for [ptr.launder] is:
struct X { int n; };
const X *p = new const X{3};
const int a = p->n;
new (const_cast<X*>(p)) const X{5}; // p does not point to new object ([basic.life]) because its type is const
const int b = p->n; // undefined behavior
const int c = std::launder(p)->n; // OK
Another example is given by Nicol Bolas in this SO answer, using a pointer that points to valid storage but of a different type:
std::aligned_storage<sizeof(int), alignof(int)>::type data;
new(&data) int;
int *p = std::launder(reinterpret_cast<int*>(&data));
Are there other use cases, not related to allowing casting of two objects which are not transparently replaceable, for using std::launder?
Specifically:
Might a reinterpret_cast from A* to B*, where both are pointer-interconvertible, require std::launder in any case? (That is, can two objects be pointer-interconvertible and yet not be transparently replaceable? The spec does not relate these two terms.)
Does reinterpret_cast from void* to T* require using std::launder?
Does the following code below require use of std::launder? If so, under which case in the spec does it fall to require that?
A struct with reference member, inspired by this discussion:
#include <iostream>
#include <new>
struct A {
constexpr A(int &x) : ref(x) {}
int &ref;
};
int main() {
int n1 = 1, n2 = 2;
A a { n1 };
a.~A();
new (&a) A {n2};
a.ref = 3; // do we need to launder somebody here?
std::cout << a.ref << ' ' << n1 << ' ' << n2 << std::endl;
}

Before C++17, a pointer with a given address and type always pointed to an object of that type located at that address, provided that the code respects the rules of [basic.life]. (see: Is a pointer with the right address and type still always a valid pointer since C++17?).
But the C++17 standard added a new quality to pointer values. This quality is not encoded in the pointer type but qualifies the value directly, independently of the type (as is also the case for traceability). It is described in [basic.compound]/3:
Every value of pointer type is one of the following:
a pointer to an object or function (the pointer is said to point to the object or function), or
a pointer past the end of an object ([expr.add]), or
the null pointer value for that type, or
an invalid pointer value.
This quality of a pointer value has its own semantics (transition rules); for reinterpret_cast they are described in the next paragraph:
If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast.
In [basic.life], another rule describes how this quality is transferred when an object's storage is reused:
If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, [...]
As you can see the quality "pointer to an object" is attached to a specific object.
That means that in the variation below of your first example, reinterpret_cast does not let us avoid the pointer optimization barrier (std::launder):
struct X { int n; };
const X *p = new const X{3};
const int a = p->n;
new (const_cast<X*>(p)) const X{5}; // p does not point to new object ([basic.life]) because its type is const
const int b = *reinterpret_cast<const int*>(p); // undefined behavior
const int c = *std::launder(reinterpret_cast<const int*>(p)); // OK
A reinterpret_cast is not a pointer optimization barrier: reinterpret_cast<const int*>(p) points to the member of the destroyed object. (Note that the cast must keep the const; reinterpret_cast cannot cast it away.)
Another way to see it is that the "pointer to" quality is preserved by reinterpret_cast as long as the objects are pointer-interconvertible, or if the pointer is cast to void* and then back to a pointer-interconvertible type (see [expr.static.cast]/13). So reinterpret_cast<const int*>(reinterpret_cast<const void*>(p)) still points to the destroyed object.
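A sketch of the const-replacement case above, requiring C++17 for std::launder. As an assumption for self-containment it uses local raw storage instead of the heap allocation in the spec example, so no delete is needed (X is trivially destructible). Reads and read_after_const_replacement are hypothetical names:

```cpp
#include <new>

struct X { int n; };

struct Reads { int via_object; int via_member; };

Reads read_after_const_replacement() {
    alignas(X) unsigned char storage[sizeof(X)];
    const X* p = new (storage) const X{3};  // original const complete object
    new (const_cast<X*>(p)) const X{5};     // reuses storage; p still points to the old object
    // const int b = p->n;                  // would be undefined behavior
    Reads r{};
    r.via_object = std::launder(p)->n;                               // reaches the new X
    r.via_member = *std::launder(reinterpret_cast<const int*>(p));   // reaches the new X::n
    return r;
}
```

Both laundered reads observe the value 5 stored by the replacing object.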
For the last example you give, the name a refers to a non-const complete object, so the original a is transparently replaceable by the new object.
For the first question you ask: "Are there any cases in which the addition of p0593r6 into C++20 (§ 6.7.2.11 Object model [intro.object]) made std::launder not necessary, where the same use case in C++17 required std::launder, or are they completely orthogonal?"
Honestly, I have not been able to find any case where implicit-lifetime objects make std::launder unnecessary. But I found an example where implicit-lifetime objects make std::launder useful:
#include <cstddef>
#include <new>

class my_buffer {
alignas(int) std::byte buffer[2*sizeof(int)];
public:
int* begin() {
// An array of int is implicitly created inside the buffer (C++20).
// Nevertheless, to get a pointer to this array, std::launder is necessary,
// because the buffer is not pointer-interconvertible with that array.
return *std::launder(reinterpret_cast<int(*)[2]>(&buffer));
}
void create_int(std::size_t index, int value) {
new (begin() + index) auto{value};
}
};

reinterpret_cast vs strict aliasing

I was reading about strict aliasing, but it's still kind of foggy and I am never sure where the line between defined and undefined behaviour is. The most detailed post I found concentrates on C. So it would be nice if you could tell me whether this is allowed and what has changed since C++98/11/...
#include <iostream>
#include <cstring>
template <typename T> T transform(T t);
struct my_buffer {
char data[128];
unsigned pos;
my_buffer() : pos(0) {}
void rewind() { pos = 0; }
template <typename T> void push_via_pointer_cast(const T& t) {
*reinterpret_cast<T*>(&data[pos]) = transform(t);
pos += sizeof(T);
}
template <typename T> void pop_via_pointer_cast(T& t) {
t = transform( *reinterpret_cast<T*>(&data[pos]) );
pos += sizeof(T);
}
};
// actually do some real transformation here (and actually also needs an inverse)
// ie this restricts allowed types for T
template<> int transform<int>(int x) { return x; }
template<> double transform<double>(double x) { return x; }
int main() {
my_buffer b;
b.push_via_pointer_cast(1);
b.push_via_pointer_cast(2.0);
b.rewind();
int x;
double y;
b.pop_via_pointer_cast(x);
b.pop_via_pointer_cast(y);
std::cout << x << " " << y << '\n';
}
Please don't pay too much attention to a possible out-of-bounds access and the fact that maybe there is no need to write something like that. I know that char* is allowed to point to anything, but I also have a T* that points into a char array. And maybe there is something else I am missing.
Here is a complete example also including push/pop via memcpy, which, as far as I know, isn't affected by strict aliasing.
TL;DR: Does the above code exhibit undefined behaviour (neglecting an out-of-bounds access for the moment), and if so, why? Did anything change with C++11 or one of the newer standards?

Aliasing is a situation where two entities refer to the same object. They may be references or pointers.
int x;
int* p = &x;
int& r = x;
// aliases: x, r and *p refer to the same object.
It's important for the compiler to expect that if a value was written using one name, it is visible through another.
int foo(int* a, int* b) {
*a = 0;
*b = 1;
return *a;
// *a might be 0, might be 1, if b points at same object.
// Compiler can't short-circuit this to "return 0;"
}
Now, if the pointers are of unrelated types, the compiler has no reason to expect that they point to the same address. This is the simplest UB:
#include <iostream>
int foo( float *f, int *i ) {
*i = 1;
*f = 0.f;
return *i;
}
int main() {
int a = 0;
std::cout << a << std::endl;
int x = foo(reinterpret_cast<float*>(&a), &a);
std::cout << a << "\n";
std::cout << x << "\n"; // Surprise?
}
// Output 0 0 0 or 0 0 1 , depending on optimization.
Simply put, strict aliasing means that the compiler assumes that glvalues of unrelated types refer to different objects, located in separate storage. Because the addresses used to access that storage are in fact the same, the result of reading the stored value is undefined and in practice depends on optimization flags.
memcpy() circumvents this: it copies the stored bytes inside the library function, accessing them as characters rather than through a glvalue of the original type.
Strict aliasing also applies to union members, which are described separately, but the reason is the same: writing to one member of a union doesn't guarantee that the values of the other members change. That doesn't apply to the shared fields at the beginning of structs stored within a union (their common initial sequence). Thus, type punning through a union is prohibited. (Most compilers do not enforce this, for historical reasons and for the convenience of maintaining legacy code.)
From 2017 Standard: 6.10 Lvalues and rvalues
8 If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined
(8.1) — the dynamic type of the object,
(8.2) — a cv-qualified version of the dynamic type of the object,
(8.3) — a type similar (as defined in 7.5) to the dynamic type of the object,
(8.4) — a type that is the signed or unsigned type corresponding to the dynamic type of the object,
(8.5) — a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
(8.6) — an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
(8.7) — a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
(8.8) — a char, unsigned char, or std::byte type.
In 7.5:
1 A cv-decomposition of a type T is a sequence of $cv_i$ and $P_i$ such that T is “$cv_0\,P_0\,cv_1\,P_1 \cdots cv_{n-1}\,P_{n-1}\,cv_n\,U$” for $n > 0$, where each $cv_i$ is a set of cv-qualifiers (6.9.3), and each $P_i$ is “pointer to” (11.3.1), “pointer to member of class $C_i$ of type” (11.3.3), “array of $N_i$”, or “array of unknown bound of” (11.3.4). If $P_i$ designates an array, the cv-qualifiers $cv_{i+1}$ on the element type are also taken as the cv-qualifiers $cv_i$ of the array. [ Example: The type denoted by the type-id const int ** has two cv-decompositions, taking U as “int” and as “pointer to const int”. —end example ] The n-tuple of cv-qualifiers after the first one in the longest cv-decomposition of T, that is, $cv_1, cv_2, \ldots, cv_n$, is called the cv-qualification signature of T.
2 Two types T1 and T2 are similar if they have cv-decompositions with the same n such that corresponding $P_i$ components are the same and the types denoted by U are the same.
The upshot is: while you can reinterpret_cast a pointer to a different, unrelated and non-similar type, you can't use that pointer to access the stored value:
char* pc = new char[100]{1,2,3,4,5,6,7,8,9,10}; // Note, initialized.
int* pi = reinterpret_cast<int*>(pc); // no problem.
int i = *pi; // UB
char* pc2 = reinterpret_cast<char*>(pi+2); // *(pi+2) would be UB
char c = *pc2; // no problem, unless increment didn't put us beyond array bound.
// c equals 9
reinterpret_cast doesn't create objects. Dereferencing a pointer to a non-existent object is undefined behaviour, so you can't use the dereferenced result of the cast for writing if the class it points to isn't trivial.
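The char, unsigned char, or std::byte bullet of the quoted rule is what makes byte-level inspection legal. A small sketch (both function names are hypothetical) reads the object representation of a std::uint32_t through unsigned char and, for comparison, through a memcpy'd copy; the byte sum does not depend on endianness:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Allowed by (8.8): accessing any object's bytes through an unsigned char glvalue.
unsigned byte_sum_via_char(std::uint32_t v) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&v);
    unsigned sum = 0;
    for (std::size_t i = 0; i < sizeof v; ++i) sum += bytes[i];
    return sum;
}

// Same result by copying the object representation into a byte array.
unsigned byte_sum_via_memcpy(std::uint32_t v) {
    unsigned char copy[sizeof v];
    std::memcpy(copy, &v, sizeof v);
    unsigned sum = 0;
    for (unsigned char b : copy) sum += b;
    return sum;
}
```

The reverse direction (reading an int through a pointer into a char array, as in the snippet above) is not covered by that bullet.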

I know that char* is allowed to point to anything, but I also have a T* that points to a char*.
Right, and that is a problem. While the pointer cast itself has defined behaviour, using it to access a non-existent object of type T is not.
Unlike C, C++ does not allow impromptu creation of objects*. You cannot simply assign to some memory location as type T and have an object of that type be created, you need an object of that type to be there already. This requires placement new. Previous standards were ambiguous on it, but currently, per [intro.object]:
1 [...] An object is created by a definition (6.1), by a new-expression (8.3.4), when implicitly changing the active member of a union (12.3), or when a temporary object is created (7.4, 15.2). [...]
Since you are not doing any of these things, no object is created.
Furthermore, C++ does not implicitly treat pointers to different objects at the same address as equivalent. Your &data[pos] computes a pointer to a char object. Casting it to T* does not make it point to any T object residing at that address, and dereferencing that pointer has undefined behaviour. C++17 adds std::launder, which is a way to let the compiler know that you want to access a different object at that address than the one you have a pointer to.
When you modify your code to use placement new and std::launder, and ensure you have no misaligned accesses (I presume you left that out for brevity), your code will have defined behaviour.
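A minimal sketch of that modification, requiring C++17 for std::launder. Assumptions not in the question: the storage is an aligned unsigned char array (which can provide storage for other objects), pos is rounded up to alignof(T), the transform step is dropped, and bounds are still not checked. typed_buffer is a hypothetical name:

```cpp
#include <cstddef>
#include <new>

struct typed_buffer {
    alignas(std::max_align_t) unsigned char data[128];
    std::size_t pos = 0;

    void rewind() { pos = 0; }

    // Round pos up to a multiple of T's alignment (data itself is max-aligned).
    template <typename T> void align_for() {
        pos = (pos + alignof(T) - 1) / alignof(T) * alignof(T);
    }

    template <typename T> void push(const T& t) {
        align_for<T>();
        ::new (static_cast<void*>(data + pos)) T(t);  // actually creates a T object here
        pos += sizeof(T);
    }

    template <typename T> void pop(T& t) {
        align_for<T>();
        t = *std::launder(reinterpret_cast<T*>(data + pos));  // reach the T created by push
        pos += sizeof(T);
    }
};
```

Pushing an int and then a double, rewinding, and popping them in the same order reads back the same values, because each pop recomputes the same aligned offsets as the matching push.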
* There is discussion on allowing this in a future version of C++.

Short answer:
You may not do this: *reinterpret_cast<T*>(&data[pos]) = until there has been an object of type T constructed at the pointed-to address. Which you can accomplish by placement new.
Even then, in C++17 and later you might need std::launder, since you access the created object (of type T) through a pointer derived from &data[pos], which has type char*.
"Direct" reinterpret_cast is allowed only in some special cases, e.g., when T is std::byte, char, or unsigned char.
Before C++17 I would use the memcpy-based solution. Compiler will likely optimize away any unnecessary copies.
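A sketch of that memcpy-based variant (memcpy_buffer is a hypothetical name; the transform step and bounds checks are omitted as in the question). Bytes are copied in and out, so no T object ever has to live inside the char array and no cast pointer is dereferenced; memcpy requires trivially copyable T:

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

struct memcpy_buffer {
    char data[128];
    std::size_t pos = 0;

    void rewind() { pos = 0; }

    template <typename T> void push(const T& t) {
        static_assert(std::is_trivially_copyable<T>::value, "memcpy requires trivially copyable T");
        std::memcpy(data + pos, &t, sizeof(T));  // copy T's bytes into the buffer
        pos += sizeof(T);
    }

    template <typename T> void pop(T& t) {
        static_assert(std::is_trivially_copyable<T>::value, "memcpy requires trivially copyable T");
        std::memcpy(&t, data + pos, sizeof(T));  // copy the bytes back into a real T
        pos += sizeof(T);
    }
};
```

Alignment is not an issue here, since memcpy works on bytes at any address.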

Why declare a constant pointer using the const keyword when the reference (const pointer) is available?

For example:
I could make a constant pointer, which points to an object that I can change through my pointer. The pointer cannot be reassigned:
MyObj const *ptrObj = &MyObj2;
Why would I use this over:
MyObj &ptrObj = MyObj2;

What you have there isn't a const pointer, it's a pointer to a const object - that is, the pointer can be changed but the object can't. A const pointer would be:
MyObj *const ptrObj = &MyObj2;
As to why you might prefer it over a reference, you might want the flexibility of using the NULL special value for something - you don't get that with a reference.

You got it wrong. What you have is a mutable pointer to a constant object:
T const * p;
p = 0; // OK, p isn't const
p->mutate(); // Error! *p is const
T const & r = *p; // "same thing"
What you really want is a constant pointer to mutable object:
T * const p = &x; // OK, cannot change p
T & r = x; // "same thing"
p->mutate(); // OK, *p is mutable
Indeed, references are morally equivalent to constant pointers, i.e. T & vs T * const, and the constant version T const & vs T const * const.
If you insist on getting some advice, then I'd say, "don't use pointers".
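The placement of const described above can be checked with type traits. In this sketch (reseat_demo is a hypothetical name), the static_asserts confirm which part is const, and the function exercises what each form permits:

```cpp
#include <type_traits>

// const before '*' applies to the pointee; const after '*' applies to the pointer.
static_assert(!std::is_const<const int*>::value, "pointer to const: the pointer itself can be reseated");
static_assert(std::is_const<std::remove_pointer<const int*>::type>::value, "...but the pointee is const");
static_assert(std::is_const<int* const>::value, "const pointer: cannot be reseated");
static_assert(!std::is_const<std::remove_pointer<int* const>::type>::value, "...but the pointee is mutable");

int reseat_demo() {
    int a = 1, b = 2;
    const int* pc = &a;  // pointer to const int
    pc = &b;             // OK: the pointer is not const
    int* const cp = &a;  // const pointer to int, like a reference
    *cp = 10;            // OK: the pointee is not const
    return *pc + a;      // 2 + 10
}
```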

The important difference between a pointer and a reference is how many objects they may refer to. A reference always refers to exactly one object. A pointer may refer to zero (when the pointer is null), one (when the pointer was assigned the location of a single object) or n objects (when the pointer was assigned to some point inside an array).
The ability of pointers to refer to 0 to n objects means that a pointer is more flexible in what it can represent. When the extra flexibility of a pointer is not necessary it is generally better to use a reference. That way someone reading your code doesn't have to work out whether the pointer refers to zero, one or n objects.
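As a small illustration of that guideline (both function names are hypothetical), a reference parameter states that exactly one object is required, while a pointer parameter can also express "no object":

```cpp
#include <cstddef>
#include <string>

// Must be called with exactly one string.
std::size_t length_of(const std::string& s) { return s.size(); }

// May be called with no string at all; the null case has to be handled.
std::size_t length_or_zero(const std::string* s) {
    return s ? s->size() : 0;
}
```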
