Array-like container implementation vs strict aliasing

I'm trying to implement an array-like container with some special requirements and a subset of std::vector interface. Here is a code excerpt:
template<typename Type>
class MyArray
{
public:
explicit MyArray(const uint32_t size) : storage(new char[size * sizeof(Type)]), maxElements(size) {}
MyArray(const MyArray&) = delete;
MyArray& operator=(const MyArray&) = delete;
MyArray(MyArray&& op) { /* some code */ }
MyArray& operator=(MyArray&& op) { /* some code */ }
~MyArray() { if (storage != nullptr) delete[] storage; /* No explicit destructors. Let it go. */ }
Type* data() { return reinterpret_cast<Type*>(storage); }
const Type* data() const { return reinterpret_cast<const Type*>(storage); }
template<typename... Args>
void emplace_back(Args&&... args)
{
assert(current < maxElements);
new (storage + current * sizeof(Type)) Type(std::forward<Args>(args)...);
++current;
}
private:
char* storage = nullptr;
uint32_t maxElements = 0;
uint32_t current = 0;
};
It works perfectly well on my system, but dereferencing the pointer returned by data() seems to violate the strict aliasing rules. The same applies to a naive implementation of the subscript operator, iterators, etc.
So what is a proper way to implement containers backed by arrays of char without breaking strict aliasing rules? As far as I understand, using std::aligned_storage will only provide a proper alignment, but will not save the code from being broken by compiler optimizations which rely on strict aliasing. Also, I don't want to use -fno-strict-aliasing and similar flags due to performance considerations.
For example, consider subscript operator (nonconstant for brevity), which is a classical code snippet from articles about UB in C++:
Type& operator[](const uint32_t idx)
{
Type* ptr = reinterpret_cast<Type*>(storage + idx * sizeof(ptr)); // Cast is OK.
return *ptr; // Dereference is UB.
}
What is a proper way to implement it without any risk of finding my program broken? How is it implemented in standard containers? Is there some cheating with undocumented compiler intrinsics in all compilers?
Sometimes I see code with two static casts through void* instead of one reinterpret cast:
Type* ptr = static_cast<Type*>(static_cast<void*>(storage + idx * sizeof(ptr)));
How is that better than a single reinterpret_cast? To me it does not solve any problem and just looks overcomplicated.

but dereferencing a pointer returned by data seems to violate strict aliasing rules
I disagree.
Both char* storage and a pointer returned by data() point to the same region of memory.
This is irrelevant. Multiple pointers pointing to the same object do not violate the aliasing rules.
Moreover, subscript operator will ... dereference a pointer of incompatible type, which is UB.
But the object isn't of incompatible type. In emplace_back, you use placement new to construct objects of Type into the memory. Assuming no code path can avoid this placement new and therefore assuming that the subscript operator returns a pointer which points at one of these objects, then dereferencing the pointer of Type* is well defined, because it points to an object of Type, which is compatible.
This is what is relevant for pointer aliasing: The type of the object in memory, and the type of the pointer that is dereferenced. Any intermediate pointer that the dereferenced pointer was converted from is irrelevant to aliasing.
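Note that your destructor does not call the destructors of the objects constructed within storage, so if Type isn't trivially destructible, the objects' resources are leaked, and the behaviour is undefined if the program depends on the side effects of those destructors.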
Type* ptr = reinterpret_cast<Type*>(storage + idx * sizeof(ptr));
The sizeof is wrong. What you need is sizeof(Type), or sizeof *ptr. Or more simply
auto ptr = reinterpret_cast<Type*>(storage) + idx;
Sometimes I see code with two static casts through void* instead of one reinterpret cast: How is it better than reinterpret cast?
I can't think of any situation where the behaviour would be different.
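To make the points above concrete, here is a minimal sketch of the question's container with the element destructors added and the corrected index arithmetic. FixedArray and slot are hypothetical names, not from the question. The sketch assumes C++17, assumes Type has no extended alignment (plain ::operator new only guarantees the default alignment), and uses std::launder as a conservative choice; whether that call is strictly required here is debated.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical name: a fixed-capacity array built on raw storage.
template <typename Type>
class FixedArray {
public:
    explicit FixedArray(std::size_t capacity)
        : storage_(static_cast<unsigned char*>(::operator new(capacity * sizeof(Type)))),
          capacity_(capacity) {}
    FixedArray(const FixedArray&) = delete;
    FixedArray& operator=(const FixedArray&) = delete;

    ~FixedArray() {
        // Destroy the constructed elements in reverse order, then free the storage.
        while (size_ > 0) {
            --size_;
            std::destroy_at(slot(size_));
        }
        ::operator delete(storage_);
    }

    template <typename... Args>
    Type& emplace_back(Args&&... args) {
        assert(size_ < capacity_);
        void* where = storage_ + size_ * sizeof(Type);
        // Placement new starts the lifetime of a Type object in the raw storage.
        Type* created = ::new (where) Type(std::forward<Args>(args)...);
        ++size_;
        return *created;
    }

    Type& operator[](std::size_t idx) {
        assert(idx < size_);  // only slots that hold a live object may be accessed
        return *slot(idx);
    }

    std::size_t size() const { return size_; }

private:
    // Assumption: only called for indices that hold a live Type object.
    Type* slot(std::size_t idx) {
        return std::launder(reinterpret_cast<Type*>(storage_ + idx * sizeof(Type)));
    }

    unsigned char* storage_;
    std::size_t capacity_;
    std::size_t size_ = 0;
};
```

The important part is that every access goes to a slot where emplace_back has already created an object; the cast itself is never the problem.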

Related

How to cast from const void* in a constexpr expression?

I'm trying to reimplement memchr as constexpr (1). I didn't expect any issues, as I have already successfully done the same thing with strchr, which is very similar.
However, both clang and gcc refuse to cast const void* to anything else within a constexpr function, which prevents me from accessing the actual values.
I understand that working with void* in a constexpr function is weird, as we cannot do malloc and there is no way to specify arbitrary data as literal values. I'm doing this basically as part of an exercise to rewrite as much as I can from as constexpr (2).
Still I'd like to know why this is not allowed and if there is any way around it.
Thank you!
(1) My implementation of memchr:
constexpr void const *memchr(const void *ptr, int ch, size_t count) {
const auto block_address = static_cast<const uint8_t *>(ptr);
const auto needle = static_cast<uint8_t>(ch);
for (uintptr_t pos{0}; pos < count; ++pos) {
auto byte_address = block_address + pos;
const uint8_t value = *byte_address;
if (needle == value) {
return static_cast<void const *>(byte_address);
}
}
return nullptr;
}
(2) The entire project on Github: https://github.com/jlanik/constexprstring

No, it is impossible to use void* in such a way in constant expressions. Casts from void* to other object pointer types are forbidden in constant expressions. reinterpret_cast is forbidden as well.
This is probably intentional to make it impossible to access the object representation at compile-time.
You cannot have a memchr with its usual signature at compile-time.
The best that I think you can do is write the function for pointers to char and its cv-qualified versions, as well as std::byte (either as overloads or as template), instead of void*.
For pointers to objects of other types it is going to be tricky in some cases and impossible in most cases to implement the exact semantics of memchr.
While I am not certain that it is possible, maybe, in a templated version of memchr, one can read the underlying bytes of the objects passed-by-pointer via a std::bit_cast into a struct containing a std::byte/unsigned char array of appropriate size.
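As a rough illustration of the "write it for pointers to char instead of void*" suggestion, here is a minimal sketch. find_byte and sample are hypothetical names, and the template is only meant to be instantiated with char-like element types (char, unsigned char, std::byte); nothing in the sketch enforces that.

```cpp
#include <cstddef>

// Hypothetical name: a memchr-like search over a typed byte pointer.
// No cast from void* is needed, so it is usable in constant expressions.
template <typename Byte>
constexpr const Byte* find_byte(const Byte* ptr, Byte ch, std::size_t count) {
    for (std::size_t pos = 0; pos < count; ++pos) {
        if (ptr[pos] == ch) {
            return ptr + pos;
        }
    }
    return nullptr;
}

// Hypothetical sample data, checked at compile time.
constexpr char sample[] = "constexpr";
static_assert(find_byte(sample, 'x', sizeof sample) == sample + 6,
              "'x' is the seventh character");
static_assert(find_byte(sample, 'z', sizeof sample) == nullptr,
              "'z' does not occur");
```

The signature differs from the real memchr (typed pointers in and out), which is exactly the compromise described above.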

Is this strict aliasing violation? Can any type pointer alias a char pointer?

I'm still struggling to understand what's allowed and not allowed with strict aliasing. In this concrete example, is there a violation of the strict aliasing rule? If not, why? Is it because I placement new a different type into a char* buffer?
template <typename T>
struct Foo
{
struct ControlBlock { unsigned long long numReferences; };
Foo()
{
char* buffer = new char[sizeof(T) + sizeof(ControlBlock)];
// Construct control block
new (buffer) ControlBlock{};
// Construct the T after the control block
this->ptr = buffer + sizeof(ControlBlock);
new (this->ptr) T{};
}
char* ptr;
T* get() {
// Here I cast the char* to T*.
// Is this OK because T* can alias char* or because
// I placement newed a T at char*
return (T*)ptr;
}
};
For the record, a void* can alias any other type pointer, and any type pointer can alias a void*. A char* can alias any type pointer, but is the reverse true? Can any type alias a char* assuming the alignment is correct? So is the following allowed?
char* buffer = (char*)malloc(16);
float* pFloat = buffer;
*pFloat = 6; // Can any type pointer alias a char pointer?
// If the above is illegal, then how about:
new (pFloat) float; // Placement new construct a float at pointer
*pFloat = 7; // What about now?
Once I've assigned char* buffer pointer to the new allocation, in order to use it as a float buffer do I need to loop through and placement new a float at each place? If I had not assigned the allocation to a char* in the first place, but a float* to begin with, I'd be able to use it immediately as a float buffer, right?

Strict aliasing means that to dereference a T* ptr, there must be a T object at that address, alive obviously. Effectively this means you cannot naively bit-cast between two incompatible types and also that a compiler can assume that no two pointers of incompatible types point to the same location.
The exceptions are unsigned char, char and std::byte: you can reinterpret_cast any object pointer to a pointer to one of these three types and dereference it.
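A small sketch of that exception in use, assuming nothing beyond the standard library. all_bytes_zero is a hypothetical helper name; it inspects the bytes of an arbitrary object through an unsigned char pointer, which is the direction the aliasing rules permit.

```cpp
#include <cstddef>

// Hypothetical helper: reads the object representation of obj byte by byte.
template <typename T>
bool all_bytes_zero(const T& obj) {
    // Viewing any object through unsigned char* is allowed by the aliasing rules.
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&obj);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        if (bytes[i] != 0) {
            return false;
        }
    }
    return true;
}
```

The reverse direction (reading a char buffer through a float* or T*) is only fine once an object of that type actually lives in the buffer.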
(T*)ptr; is valid because at ptr there exists a T object. That is all that is required, it does not matter how you got that pointer*, through how many casts it went. There are some more requirements when T has constant members but that has to do more with placement new and object resurrection - see this answer if you are interested.
*It probably does matter even when there are no const members (I am not sure; see the relevant question). @eerorika's answer is more correct in suggesting std::launder or assigning from the placement new expression.
For the record, a void* can alias any other type pointer, and any type pointer can alias a void*.
That is not true, void is not one of the three allowed types. But I assume you are just misinterpreting the word "alias": strict aliasing only applies when a pointer is dereferenced. You are of course free to have as many pointers pointing wherever you want, as long as you do not dereference them. Since void* cannot be dereferenced, it's a moot point.
Addressing your second example:
char* buffer = (char*)malloc(16); //OK
// Assigning pointers is always defined; the rules only say when
// it is safe to dereference such a pointer.
// You are missing a cast here: pointers cannot be converted implicitly in C++; C only produces a warning.
float* pFloat = buffer;
// -> float* pFloat =reinterpret_cast<float*>(buffer);
// NOT OK, there is no float at `buffer` - violates strict aliasing.
*pFloat = 6;
// Now there is a float
new (pFloat) float;
// Yes, now it is OK.
*pFloat = 7;

Is this strict aliasing violation?
Yes.
Can any type pointer alias a char pointer?
No.
You can launder the pointer:
T* get() {
return std::launder(reinterpret_cast<T*>(ptr)); // OK
}
Or, you could store the result of the placement new:
Foo()
{
...
this->ptr = new (buffer + sizeof(ControlBlock)) T{};
}
T* ptr;
T* get() {
return ptr; // OK
}
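Filling in the elided parts of the second option, a complete version might look like the sketch below. Boxed, offset and references are hypothetical names. It assumes T is default-constructible and not over-aligned, and it ignores exception safety (if T{} throws, the storage leaks).

```cpp
#include <cstddef>
#include <new>

// Hypothetical name: one allocation holding a control block followed by a T.
template <typename T>
class Boxed {
    struct ControlBlock { unsigned long long numReferences; };

    // Round the control block size up so the T that follows is suitably aligned.
    static constexpr std::size_t offset =
        (sizeof(ControlBlock) + alignof(T) - 1) / alignof(T) * alignof(T);

public:
    Boxed() : raw_(::operator new(offset + sizeof(T))) {
        // Keep the pointers returned by placement new; no cast is needed later.
        control_ = ::new (raw_) ControlBlock{1};
        object_ = ::new (static_cast<unsigned char*>(raw_) + offset) T{};
    }
    Boxed(const Boxed&) = delete;
    Boxed& operator=(const Boxed&) = delete;

    ~Boxed() {
        object_->~T();            // end the lifetime of the T explicitly
        ::operator delete(raw_);  // ControlBlock is trivially destructible
    }

    T* get() { return object_; }
    unsigned long long references() const { return control_->numReferences; }

private:
    void* raw_;
    ControlBlock* control_;
    T* object_;
};
```

Storing the typed pointers costs a little memory per object but sidesteps the question of whether a cast from char* needs std::launder.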
do I need to loop through and placement new a float at each place
Not since the proposal P0593R6 was accepted into the language (C++20). Prior to that, placement-new was required by the standard. You don't necessarily have to write that loop yourself, since there are function templates for that in the standard library: std::uninitialized_fill_n, std::uninitialized_default_construct_n etc. Also, you can rest assured that a decent optimiser will compile such a loop to zero instructions.
constexpr std::size_t N = 4;
float* pFloat = static_cast<float*>(malloc(N * sizeof(float)));
// OK since P0593R6, C++20
pFloat[0] = 6;
// OK prior to P0593R6, C++20 (to the extent it can be OK)
std::uninitialized_default_construct_n(pFloat, N);
pFloat[0] = 7;
// don't forget
free(pFloat);
P.S. Don't use std::malloc in C++, unless you need it for interacting with a C API that requires it (which is a somewhat rare requirement even in C). I also recommend against reusing a new char[] buffer, as it is unnecessary for the demonstrated purpose. Instead, use ::operator new, which allocates storage without creating objects (even trivial ones). Or even better, since you already have a template, let the user of the template provide an allocator of their own to make your template more generally useful.
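Putting the P.S. together with the library algorithms mentioned earlier, a minimal sketch (C++17 assumed; sum_of_filled is a hypothetical helper name used only for illustration) could look like this:

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical helper: creates n floats in raw storage and returns their sum.
inline float sum_of_filled(std::size_t n, float value) {
    // ::operator new only allocates storage; it does not construct any float.
    void* raw = ::operator new(n * sizeof(float));
    float* first = static_cast<float*>(raw);

    // Starts the lifetime of each float, so no hand-written placement-new loop.
    std::uninitialized_fill_n(first, n, value);

    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        total += first[i];
    }

    std::destroy_n(first, n);  // a no-op for float, shown for symmetry
    ::operator delete(raw);
    return total;
}
```

For a trivial type like float an optimiser is free to drop the construction and destruction steps entirely, so the explicit calls document intent rather than cost anything.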

std::launder use cases in C++20

[1]
Are there any cases in which the addition of p0593r6 into C++20 (§ 6.7.2.11 Object model [intro.object]) made std::launder not necessary, where the same use case in C++17 required std::launder, or are they completely orthogonal?
[2]
The example in the spec for [ptr.launder] is:
struct X { int n; };
const X *p = new const X{3};
const int a = p->n;
new (const_cast<X*>(p)) const X{5}; // p does not point to new object ([basic.life]) because its type is const
const int b = p->n; // undefined behavior
const int c = std::launder(p)->n; // OK
Another example is given by @Nicol Bolas in this SO answer, using a pointer that points to valid storage but of a different type:
aligned_storage<sizeof(int), alignof(int)>::type data;
new(&data) int;
int *p = std::launder(reinterpret_cast<int*>(&data));
Are there other use cases, not related to allowing casting of two objects which are not transparently replaceable, for using std::launder?
Specifically:
Could a reinterpret_cast from A* to B*, where the two objects are pointer-interconvertible, ever require std::launder? (That is, can two objects be pointer-interconvertible and yet not transparently replaceable? The spec doesn't relate these two terms.)
Does reinterpret_cast from void* to T* require using std::launder?
Does the following code below require use of std::launder? If so, under which case in the spec does it fall to require that?
A struct with reference member, inspired by this discussion:
struct A {
constexpr A(int &x) : ref(x) {}
int &ref;
};
int main() {
int n1 = 1, n2 = 2;
A a { n1 };
a.~A();
new (&a) A {n2};
a.ref = 3; // do we need to launder somebody here?
std::cout << a.ref << ' ' << n1 << ' ' << n2 << std::endl;
}

Before C++17, a pointer with a given address and type always pointed to an object of that type located at that address, provided that the code respects the rules of [basic.life]. (see: Is a pointer with the right address and type still always a valid pointer since C++17?).
But the C++17 standard added a new quality to a pointer value. This quality is not encoded in the pointer type but qualifies the value directly, independently of the type (this is also the case for traceability). It is described in [basic.compound]/3:
Every value of pointer type is one of the following:
a pointer to an object or function (the pointer is said to point to the object or function), or
a pointer past the end of an object ([expr.add]), or
the null pointer value for that type, or
an invalid pointer value.
This quality of a pointer value has its own semantics (transition rules), and for the case of reinterpret_cast it is described in the next paragraph:
If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast.
In [basic.life], we can find another rule that describes how this quality transitions when an object's storage is reused:
If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, [...]
As you can see, the quality "pointer to an object" is attached to a specific object.
That means that in the variation below of the first example you give, the reinterpret_cast does not let us do without the pointer optimization barrier:
struct X { int n; };
const X *p = new const X{3};
const int a = p->n;
new (const_cast<X*>(p)) const X{5}; // p does not point to new object ([basic.life]) because its type is const
const int b = *reinterpret_cast <const int*> (p); // undefined behavior
const int c = *std::launder(reinterpret_cast <const int*> (p));
A reinterpret_cast is not a pointer optimization barrier: reinterpret_cast <const int*>(p) points to the member of the destroyed object.
Another way to think of it is that the "pointer to" quality is preserved by reinterpret_cast as long as the objects are pointer-interconvertible, or if the pointer is cast to a pointer to void and then back to a pointer to a pointer-interconvertible type. (See [expr.static.cast]/13). So reinterpret_cast <const int*>(reinterpret_cast <const void*>(p)) still points to the destroyed object.
For the last example you give, the name a refers to a non-const complete object, so the original a is transparently replaceable by the new object.
For the first question you ask: "Are there any cases in which the addition of p0593r6 into C++20 (§ 6.7.2.11 Object model [intro.object]) made std::launder not necessary, where the same use case in C++17 required std::launder, or are they completely orthogonal?"
Honestly, I have not been able to find any case where implicit-lifetime objects make std::launder unnecessary. But I found an example where implicit-lifetime objects make std::launder useful:
class my_buffer {
alignas(int) std::byte buffer [2*sizeof(int)];
public:
int * begin(){
// An array of int is implicitly created inside the buffer.
// Nevertheless, to get a pointer to this array,
// std::launder is necessary, as the buffer is not
// pointer-interconvertible with that array.
return *std::launder (reinterpret_cast <int(*)[2]>(&buffer));
}
void create_int(std::size_t index, int value){
new (begin()+index) auto{value};
}
};

How can boost::asio::buffer_cast violate type-safety?

Comments are sprinkled throughout boost::asio that say this:
The boost::asio::buffer_cast function permits violations of type
safety, so uses of it in application code should be carefully
considered.
However, ultimately what the buffer interface boils down to is this:
struct buffer {
void *data;
friend void* cast_helper(const buffer& b);
};
void* cast_helper(const buffer& b) {
return b.data;
}
template <typename to_t>
to_t buffer_cast(const buffer& b) {
return static_cast<to_t>(cast_helper(b));
}
static_cast a void* to a pointer type is well-defined and considered the appropriate thing to do for void* data (see "Should I use static_cast or reinterpret_cast when casting a void* to whatever"). So what does it mean by violating type-safety?

Consider the following code:
char i = 2;
buffer b;
b.data = &i;
double *pd = buffer_cast<double*>(b);
*pd = 1.0;
This will compile correctly, but it obviously invokes undefined behaviour. It's no different really to:
char i = 2;
void *pv = &i;
double *pd = static_cast<double*>(pv);
*pd = 1.0;
In both the case of static_cast and buffer_cast a reviewer needs to look carefully at the code to make sure the cast is legal.
Using static_cast to convert a void* to a pointer type is only well defined if the void* was originally obtained from a pointer of that type or something similar (where "similar" includes some, but not all, base/derived relationships, and unsigned char vs plain char vs signed char, etc).
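To contrast the two situations, here is a minimal sketch with hypothetical helper names (round_trip, read_double). The first function is the well-defined case described above: the void* came from a double*, so casting back is fine. The second shows one commonly used alternative when the bytes did not start out as a double object: copy them with std::memcpy instead of dereferencing a cast pointer. The caller is assumed to guarantee that at least sizeof(double) bytes are readable.

```cpp
#include <cstring>

// Well defined: the void* originally came from a double*.
inline double round_trip(double value) {
    void* erased = &value;
    double* restored = static_cast<double*>(erased);
    return *restored;
}

// Hypothetical helper: copies the bytes out instead of aliasing them.
// Assumption: data points to at least sizeof(double) readable bytes.
inline double read_double(const void* data) {
    double out;
    std::memcpy(&out, data, sizeof out);
    return out;
}
```

Neither helper makes buffer_cast itself safe; the reviewer still has to know what the buffer really holds.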

Can a std::array alias a fragment of a larger array?

Suppose we have a pointer T* ptr; and ptr, ptr+1, … ptr+(n-1) all refer to valid objects of type T.
Is it possible to access them as if they were an STL array? Or does the following code:
std::array<T,n>* ay = (std::array<T,n>*) ptr
invoke undefined behaviour?

Yes, it's undefined behavior, a classic one...
First, understand that what you just did:
std::array<T,n>* ay = (std::array<T,n>*) ptr
can be translated as:
using Arr = std::array<T,n>;
std::array<T,n>* ay = reinterpret_cast<Arr*>( const_cast<TypeOfPtr>(ptr));
You have not only cast away any const and volatile qualification but also changed the type. See this answer: https://stackoverflow.com/a/103868/1621391 ...indiscriminately casting away cv-qualifications can also lead to UB.
Secondly, it is undefined behavior to access an object through a pointer that was cast from an unrelated type. See the strict aliasing rule (thanks, zenith). Therefore any read or write access through the pointer ay is undefined. If you are extremely lucky, the code will crash instantly. If it works, evil days are awaiting you....
Note that std::array is not and will never be the same as anything that isn't std::array.
Just to add... The working draft of the C++ standard lists the explicit conversion rules (you can read them) and has a clause stating that
.....
5.4.3: Any type conversion not mentioned below and not explicitly defined by
the user ([class.conv]) is ill-formed.
.....
I suggest you cook up your own array_view (hopefully coming in C++17). It's really easy. Or, if you want some ownership, you can cook up a simple one like this:
template<typename T>
class OwnedArray{
T* data_ = nullptr;
std::size_t sz = 0;
OwnedArray(T* ptr, std::size_t len) : data_(ptr), sz(len) {}
public:
static OwnedArray own_from(T* ptr, std::size_t len)
{ return OwnedArray(ptr, len); }
OwnedArray(){}
OwnedArray(OwnedArray&& o)
{ data_ = o.data_; sz = o.sz; o.data_=nullptr; o.sz=0; }
OwnedArray& operator = (OwnedArray&& o)
{ if (this != &o) { delete[] data_; data_ = o.data_; sz = o.sz; o.data_=nullptr; o.sz=0; } return *this; }
OwnedArray(const OwnedArray& o) = delete;
OwnedArray& operator = (const OwnedArray& o) = delete;
~OwnedArray(){ delete[] data_; }
std::size_t size() const { return sz; }
T* data() { return data_; }
T& operator[] (std::size_t idx) { return data_[idx]; }
};
...and you can roll out more member functions/const qualifications as you like. But this has a caveat: the pointer must have been allocated through new T[len].
Thus you can use it in your example like this (decltype(*ptr) is a reference type, so the reference has to be stripped first):
auto ay = OwnedArray<std::remove_reference_t<decltype(*ptr)>>::own_from(ptr, ptr_len);
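For the non-owning array_view idea mentioned above, a minimal sketch might look like the following. ArrayView is a hypothetical name and the interface is deliberately tiny; C++20's std::span covers the same ground if it is available to you.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical name: a non-owning view over n contiguous objects of type T.
// It never casts the pointer to another type, so no aliasing question arises.
template <typename T>
class ArrayView {
public:
    constexpr ArrayView(T* ptr, std::size_t len) : data_(ptr), size_(len) {}

    constexpr T* begin() const { return data_; }
    constexpr T* end() const { return data_ + size_; }
    constexpr std::size_t size() const { return size_; }

    T& operator[](std::size_t idx) const {
        assert(idx < size_);  // debug-only bounds check
        return data_[idx];
    }

private:
    T* data_;
    std::size_t size_;
};
```

Because the view only stores the pointer and a length, the caller remains responsible for keeping the underlying objects alive for as long as the view is used.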

Yes, this invokes undefined behaviour. Generally you can't cast pointers to unrelated types between each other.
The code is no different from
std::string str;
std::array<double,10>* arr = (std::array<double,10>*)(&str);
Explanation: The Standard does not provide any guarantee of compatibility between std::array<T,n> and T*. It is simply not there. It doesn't say that std::array is a trivial type either. Absent such guarantees, any conversion between T* and std::array<T,n> is undefined behavior on the same scale as conversion between pointers to any unrelated types.
I also fail to see the benefit of accessing an already constructed dynamic array as a std::array.
P.S. Usual disclaimer. The cast, on its own, is always 100% fine. It is indirection through the resulting pointer which triggers the fireworks, but this part is omitted for simplicity.

I'm answering the first question here, as the second one has already been treated in the other answers:
Recap: you ...
have a pointer T* ptr; and ptr, ptr+1, … ptr+(n-1) all refer to valid objects of type T.
And you ask whether it is ...
possible to access them as if they were an STL array?
Answer: This is no problem, but it works differently from what you sketched in your code example:
std::array<T*, N> arr;
for(int i = 0; i<N; ++i)
{
arr[i] = ptr + i;
}
Now you can use the array-elements as if they were the original pointers. And there is no undefined behaviour anywhere.
