Can a std::array alias a fragment of a larger array? - c++

Suppose we have a pointer T* ptr; and ptr, ptr+1, … ptr+(n-1) all refer to valid objects of type T.
Is it possible to access them as if they were an STL array? Or does the following code:
std::array<T,n>* ay = (std::array<T,n>*) ptr
invoke undefined behaviour?

Yes, it's undefined behavior, a classic one...
First, understand that what you just did:
std::array<T,n>* ay = (std::array<T,n>*) ptr
can be translated as:
using Arr = std::array<T,n>;
std::array<T,n>* ay = reinterpret_cast<Arr*>( const_cast<TypeOfPtr>(ptr));
You've not just cast away all const and volatile qualification but also changed the type. See this answer: https://stackoverflow.com/a/103868/1621391 ...indiscriminately casting away cv qualifications can also lead to UB.
Secondly, it is undefined behavior to access an object through a pointer that was cast from an unrelated type. See the strict aliasing rule (thanks, zenith). Therefore any read or write access through the pointer ay is undefined. If you are extremely lucky, the code will crash instantly. If it works, evil days are awaiting you...
Note that std::array is not and will never be the same as anything that isn't std::array.
Just to add: the working draft of the C++ standard lists the explicit conversion rules (you can read them) and has a clause stating that
.....
5.4.3: Any type conversion not mentioned below and not explicitly defined by
the user ([class.conv]) is ill-formed.
.....
I suggest you cook up your own array_view (hopefully coming in C++17). It's really easy. Or, if you want some ownership, you can cook up a simple one like this:
template<typename T>
class OwnedArray{
T* data_ = nullptr;
std::size_t sz = 0;
OwnedArray(T* ptr, std::size_t len) : data_(ptr), sz(len) {}
public:
static OwnedArray own_from(T* ptr, std::size_t len)
{ return OwnedArray(ptr, len); }
OwnedArray(){}
OwnedArray(OwnedArray&& o)
{ data_ = o.data_; sz = o.sz; o.data_=nullptr; o.sz=0; }
OwnedArray& operator = (OwnedArray&& o)
{ if (this != &o) { delete[] data_; data_ = o.data_; sz = o.sz; o.data_=nullptr; o.sz=0; } return *this; }
OwnedArray(const OwnedArray& o) = delete;
OwnedArray& operator = (const OwnedArray& o) = delete;
~OwnedArray(){ delete[] data_; }
std::size_t size() const { return sz; }
T* data() { return data_; }
T& operator[] (std::size_t idx) { return data_[idx]; }
};
...and you can roll out more member functions/const qualifications as you like. But this has caveats... The pointer must have been allocated through new T[len].
Thus you can use it in your example like this:
auto ay = OwnedArray<std::remove_pointer_t<decltype(ptr)>>::own_from(ptr, ptr_len); // needs <type_traits>; decltype(*ptr) would be T&, not T
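For the non-owning case, a view can be sketched in a few lines. This is only a minimal illustration of the array_view idea mentioned above, not a standard type: ArrayView and sum_fragment are hypothetical names, and the class deliberately does no bounds checking and never frees anything.

```cpp
#include <cstddef>

// Hypothetical non-owning view over len contiguous T objects.
// It never allocates or frees; the caller keeps ownership of the storage.
template <typename T>
class ArrayView {
    T* data_ = nullptr;
    std::size_t size_ = 0;
public:
    constexpr ArrayView() = default;
    constexpr ArrayView(T* ptr, std::size_t len) : data_(ptr), size_(len) {}

    constexpr std::size_t size() const { return size_; }
    constexpr T* data() const { return data_; }
    constexpr T* begin() const { return data_; }
    constexpr T* end() const { return data_ + size_; }
    // No bounds checking, like operator[] on std::array.
    constexpr T& operator[](std::size_t idx) const { return data_[idx]; }
};

// Views a fragment of a larger array without any cast and sums it.
inline int sum_fragment(int* ptr, std::size_t n) {
    ArrayView<int> view(ptr, n);
    int total = 0;
    for (int x : view) total += x;
    return total;
}
```

Given int big[6], calling sum_fragment(big + 2, 3) walks elements 2 to 4 of the original array; no std::array object is ever pretended into existence.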

Yes, this invokes undefined behaviour. Generally, you can't cast between pointers to unrelated types.
The code is no different from
std::string str;
std::array<double,10>* arr = (std::array<double,10>*)(&str);
Explanation: The standard does not provide any guarantee of compatibility between std::array<T,n> and T*. It is simply not there. It doesn't say that std::array is a trivial type either. Absent such guarantees, any conversion between T* and std::array<T,n> is undefined behavior on the same scale as conversion between pointers to any unrelated types.
I also fail to see what the benefit is of accessing an already constructed dynamic array as a std::array.
P.S. Usual disclaimer. The cast, on its own, is always 100% fine. It is indirection through the resulting pointer that triggers the fireworks, but this part is omitted for simplicity.

I'm answering the first question here, as the second one has already been treated in the other answers:
Recap: you ...
have a pointer T* ptr; and ptr, ptr+1, … ptr+(n-1) all refer to valid objects of type T.
And you ask whether it is ...
possible to access them as if they were an STL array?
Answer: This is no problem, but it works differently from what you assumed in your code example:
std::array<T*, N> arr;
for(int i = 0; i<N; ++i)
{
arr[i] = ptr + i;
}
Now you can use the array-elements as if they were the original pointers. And there is no undefined behaviour anywhere.

Related

How to cast from const void* in a constexpr expression?

I'm trying to reimplement memchr as constexpr (1). I didn't expect issues, as I have already successfully done the same thing with strchr, which is very similar.
However, both clang and gcc refuse to cast const void* to anything else within a constexpr function, which prevents me from accessing the actual values.
I understand that working with void* in a constexpr function is weird, as we cannot do malloc and there is no way to specify arbitrary data as literal values. I'm doing this basically as a part of an exercise to rewrite as much as I can from as constexpr (2).
Still I'd like to know why this is not allowed and if there is any way around it.
Thank you!
(1) My implementation of memchr:
constexpr void const *memchr(const void *ptr, int ch, size_t count) {
const auto block_address = static_cast<const uint8_t *>(ptr);
const auto needle = static_cast<uint8_t>(ch);
for (uintptr_t pos{0}; pos < count; ++pos) {
auto byte_address = block_address + pos;
const uint8_t value = *byte_address;
if (needle == value) {
return static_cast<void const *>(byte_address);
}
}
return nullptr;
}
(2) The entire project on Github: https://github.com/jlanik/constexprstring
No, it is impossible to use void* in such a way in constant expressions. Casts from void* to other object pointer types are forbidden in constant expressions. reinterpret_cast is forbidden as well.
This is probably intentional to make it impossible to access the object representation at compile-time.
You cannot have a memchr with its usual signature at compile-time.
The best that I think you can do is write the function for pointers to char and its cv-qualified versions, as well as std::byte (either as overloads or as template), instead of void*.
For pointers to objects of other types it is going to be tricky in some cases and impossible in most cases to implement the exact semantics of memchr.
While I am not certain that it is possible, maybe, in a templated version of memchr, one can read the underlying bytes of the objects passed-by-pointer via a std::bit_cast into a struct containing a std::byte/unsigned char array of appropriate size.
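The suggestion to take a typed pointer instead of void* could look like the following. This is a sketch only: find_byte is a hypothetical name, it assumes C++14 or later (loops in constexpr functions), and it is restricted to one-byte element types rather than reproducing the full semantics of memchr.

```cpp
#include <cstddef>

// Hypothetical constexpr analogue of memchr for byte-like element types.
// It takes a typed pointer, so no cast from const void* is needed.
template <typename Byte>
constexpr const Byte* find_byte(const Byte* ptr, Byte ch, std::size_t count) {
    static_assert(sizeof(Byte) == 1, "find_byte expects a one-byte element type");
    for (std::size_t pos = 0; pos < count; ++pos) {
        if (ptr[pos] == ch) {
            return ptr + pos;  // pointer to the first match
        }
    }
    return nullptr;  // no match within count elements
}

// Both checks are evaluated entirely at compile time.
constexpr char text[] = "constexpr";
static_assert(find_byte(text, 'x', sizeof(text)) == text + 6, "x is at index 6");
static_assert(find_byte(text, 'z', sizeof(text)) == nullptr, "z is absent");
```

The same template also accepts unsigned char and, from C++17, std::byte, as long as the needle has the same type as the elements.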

Is this strict aliasing violation? Can any type pointer alias a char pointer?

I'm still struggling to understand what's allowed and not allowed with strict aliasing. Is this concrete example a violation of the strict aliasing rule? If not, why? Is it because I placement new a different type into a char* buffer?
template <typename T>
struct Foo
{
struct ControlBlock { unsigned long long numReferences; };
Foo()
{
char* buffer = new char[sizeof(T) + sizeof(ControlBlock)];
// Construct control block
new (buffer) ControlBlock{};
// Construct the T after the control block
this->ptr = buffer + sizeof(ControlBlock);
new (this->ptr) T{};
}
char* ptr;
T* get() {
// Here I cast the char* to T*.
// Is this OK because T* can alias char* or because
// I placement newed a T at char*
return (T*)ptr;
}
};
For the record, a void* can alias any other type pointer, and any type pointer can alias a void*. A char* can alias any type pointer, but is the reverse true? Can any type alias a char* assuming the alignment is correct? So is the following allowed?
char* buffer = (char*)malloc(16);
float* pFloat = buffer;
*pFloat = 6; // Can any type pointer alias a char pointer?
// If the above is illegal, then how about:
new (pFloat) float; // Placement new construct a float at pointer
*pFloat = 7; // What about now?
Once I've assigned char* buffer pointer to the new allocation, in order to use it as a float buffer do I need to loop through and placement new a float at each place? If I had not assigned the allocation to a char* in the first place, but a float* to begin with, I'd be able to use it immediately as a float buffer, right?
Strict aliasing means that to dereference a T* ptr, there must be a T object at that address, alive obviously. Effectively this means you cannot naively bit-cast between two incompatible types and also that a compiler can assume that no two pointers of incompatible types point to the same location.
The exceptions are unsigned char, char and std::byte, meaning you can reinterpret_cast any object pointer to a pointer to one of these three types and dereference it.
(T*)ptr; is valid because at ptr there exists a T object. That is all that is required; it does not matter how you got that pointer* or how many casts it went through. There are some more requirements when T has constant members, but that has more to do with placement new and object resurrection; see this answer if you are interested.
*It probably does matter even when there are no const members; I am not sure (see the relevant question). @eerorika's answer is more correct in suggesting std::launder or assigning from the placement new expression.
For the record, a void* can alias any other type pointer, and any type pointer can alias a void*.
That is not true; void is not one of the three allowed types. But I assume you are just misinterpreting the word "alias": strict aliasing only applies when a pointer is dereferenced. You are of course free to have as many pointers pointing wherever you want as long as you do not dereference them. Since void* cannot be dereferenced, it's a moot point.
Addressing your second example:
char* buffer = (char*)malloc(16); //OK
// Assigning pointers is always defined; the rules only say when
// it is safe to dereference such a pointer.
// You are missing a cast here: pointers cannot be converted implicitly in C++ (C only produces a warning).
float* pFloat = buffer;
// -> float* pFloat =reinterpret_cast<float*>(buffer);
// NOT OK, there is no float at `buffer` - violates strict aliasing.
*pFloat = 6;
// Now there is a float
new (pFloat) float;
// Yes, now it is OK.
*pFloat = 7;
Is this strict aliasing violation?
Yes.
Can any type pointer alias a char pointer?
No.
You can launder the pointer:
T* get() {
return std::launder(reinterpret_cast<T*>(ptr)); // OK
}
Or, you could store the result of the placement new:
Foo()
{
...
this->ptr = new (buffer + sizeof(ControlBlock)) T{};
}
T* ptr;
T* get() {
return ptr; // OK
}
do I need to loop through and placement new a float at each place
Not since the proposal P0593R6 was accepted into the language (C++20). Prior to that, placement-new was required by the standard. You don't necessarily have to write that loop yourself since there are function templates for that in the standard library: std::uninitialized_fill_n, std::uninitialized_default_construct_n etc. Also, you can rest assured that a decent optimiser will compile such a loop to zero instructions.
constexpr std::size_t N = 4;
float* pFloat = static_cast<float*>(malloc(N * sizeof(float)));
// OK since P0593R6, C++20
pFloat[0] = 6;
// OK prior to P0593R6, C++20 (to the extent it can be OK)
std::uninitialized_default_construct_n(pFloat, N);
pFloat[0] = 7;
// don't forget
free(pFloat);
P.S. Don't use std::malloc in C++, unless you need it for interacting with a C API that requires it (which is a somewhat rare requirement even in C). I also recommend against reuse of a new char[] buffer as it is unnecessary for the demonstrated purpose. Instead, use the operator ::new, which allocates storage without creating objects (even trivial ones). Or even better, since you already have a template, let the user of the template provide an allocator of their own to make your template more generally useful.
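Putting the two recommendations together (allocate raw storage with ::operator new and keep the pointer returned by placement new), the constructor from the question might be restructured roughly as below. This is a sketch under assumptions, not the asker's actual class: Holder and references() are hypothetical names, the offset is rounded up so that the T is suitably aligned, and exception safety of T's constructor is ignored for brevity.

```cpp
#include <cstddef>
#include <new>

// Hypothetical holder: a control block followed by a T in one allocation.
template <typename T>
class Holder {
    struct ControlBlock { unsigned long long numReferences; };

    static_assert(alignof(T) <= alignof(std::max_align_t),
                  "over-aligned types would need an aligned allocation");

    // Offset of the T, rounded up to a multiple of alignof(T).
    static constexpr std::size_t offset =
        (sizeof(ControlBlock) + alignof(T) - 1) / alignof(T) * alignof(T);

    void* raw_ = nullptr;              // start of the allocation
    ControlBlock* control_ = nullptr;  // result of the first placement new
    T* ptr_ = nullptr;                 // result of the second placement new

public:
    Holder() {
        raw_ = ::operator new(offset + sizeof(T));  // storage only, no objects
        control_ = ::new (raw_) ControlBlock{1};
        // Assumption: T{} does not throw; a real class would guard against that.
        ptr_ = ::new (static_cast<char*>(raw_) + offset) T{};
    }
    Holder(const Holder&) = delete;
    Holder& operator=(const Holder&) = delete;
    ~Holder() {
        ptr_->~T();               // end the lifetime of the T explicitly
        ::operator delete(raw_);  // then release the raw storage
    }

    T* get() { return ptr_; }  // no cast and no std::launder needed
    unsigned long long references() const { return control_->numReferences; }
};
```

Because get() returns the pointer that placement new produced, the question of whether a cast from char* may be dereferenced never arises.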

Is it legal c++ to use reference as array/pointer?

My team (including myself) is new to C++. A piece of our new development is a C++ function that needs to interface with a C function that takes an array as input. Something like the following construct was made to achieve this:
#include "stdio.h"
void the_c_function(double *array, int len)
{
for (int i = 0; i < len; i++)
{
printf("%d: %g\n", i, array[i]);
}
}
void the_cpp_wrapper(double& dref, int len)
{
the_c_function(&dref, len);
}
int main()
{
const int LEN = 4;
double dbl_array[LEN] = { 3,4,5,6 };
the_cpp_wrapper(dbl_array[0], LEN);
return 0;
}
When compiled, this works as expected: it prints the contents of the array:
0: 3
1: 4
2: 5
3: 6
But this hardly feels legal to me, or at best it is something that should be discouraged.
Is this legal C++, i.e. is it guaranteed that a pointer to a reference of an array points to the original array?
Is there any reason why one would do it like this instead of using a pointer directly, rather than using the reference as an intermediary?
My team (including myself) is new to C++. ...
[...]
... something that should be discouraged.
You should get in the habit now of using the Standard C++ Library; in your case the best choice is std::vector:
#include <stdio.h>
#include <cstdlib>
#include <vector>
void the_c_function(const double *array, size_t len) {/*...*/}
void the_cpp_wrapper(const std::vector<double>& v)
{
the_c_function(v.data(), v.size());
}
// ----------------------------
int main()
{
const std::vector<double> dbl_array { 3,4,5,6 };
the_cpp_wrapper(dbl_array);
return EXIT_SUCCESS;
}
You should also be clearer about const double* vs. double*. C++ intentionally wants you to use a much more verbose const_cast<double*> to cast away const-ness.
If you want to go "all in" with C++, you can make the_cpp_wrapper() a bit more generic with a template:
template<typename TSpan>
void the_cpp_wrapper(const TSpan& v)
{
the_c_function(v.data(), v.size());
}
With this code, you can pass anything to the_cpp_wrapper that has data() and size() methods. (Note that TSpan "can" be std::span<int> which could cause some obscure compiler errors; there are ways to fix that, but it's more C++.)
Not directly related, but you'll probably find std::span useful too.
The question of code readability aside,
is it guaranteed that a pointer to a reference of an array points to the original array?
Yes, see § 5.5 Expressions:
If an expression initially has the type “reference to T” ([dcl.ref], [dcl.init.ref]), the type is adjusted to T prior to any further analysis. The expression designates the object or function denoted by the reference, and the expression is an lvalue or an xvalue, depending on the expression.
And §8.3.2 References:
4   It is unspecified whether or not a reference requires storage.
5   There shall be no references to references, no arrays of references, and no pointers to references.
In other words, an "address of a reference" isn't a thing; given double& dref, taking an address &dref will give the address of the original element inside the array.
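A small check can make this concrete. The helper names same_address and sum_from below are hypothetical and exist only for illustration; sum_from assumes the reference denotes an array element with at least len elements available from that position.

```cpp
#include <cstddef>

// True when the address taken through the reference equals the expected
// address of the referenced array element.
inline bool same_address(double& dref, const double* expected) {
    return &dref == expected;
}

// Sums len doubles starting at the element that dref refers to.
// Assumption: dref refers to an element of an array with at least len
// elements from that position onward.
inline double sum_from(double& dref, std::size_t len) {
    const double* first = &dref;  // address of the original element
    double total = 0.0;
    for (std::size_t i = 0; i < len; ++i) total += first[i];
    return total;
}
```

With double a[4] = {3, 4, 5, 6}, both same_address(a[0], a) and same_address(a[2], a + 2) hold, which is why the wrapper in the question hands the C function the original array.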
Yes, it is legal, and based on your code it is guaranteed to refer to the original element in the array.
Some people like to design interface to force the caller to pass by reference to avoid checking whether the argument is a null pointer, which might be required when passing by pointer.

Array-like container implementation vs strict aliasing

I'm trying to implement an array-like container with some special requirements and a subset of std::vector interface. Here is a code excerpt:
template<typename Type>
class MyArray
{
public:
explicit MyArray(const uint32_t size) : storage(new char[size * sizeof(Type)]), maxElements(size) {}
MyArray(const MyArray&) = delete;
MyArray& operator=(const MyArray&) = delete;
MyArray(MyArray&& op) { /* some code */ }
MyArray& operator=(MyArray&& op) { /* some code */ }
~MyArray() { if (storage != nullptr) delete[] storage; /* No explicit destructors. Let it go. */ }
Type* data() { return reinterpret_cast<Type*>(storage); }
const Type* data() const { return reinterpret_cast<const Type*>(storage); }
template<typename... Args>
void emplace_back(Args&&... args)
{
assert(current < maxElements);
new (storage + current * sizeof(Type)) Type(std::forward<Args>(args)...);
++current;
}
private:
char* storage = nullptr;
uint32_t maxElements = 0;
uint32_t current = 0;
};
It works perfectly well on my system, but dereferencing a pointer returned by data seems to violate strict aliasing rules. The same applies to a naive implementation of the subscript operator, iterators, etc.
So what is a proper way to implement containers backed by arrays of char without breaking strict aliasing rules? As far as I understand, using std::aligned_storage will only provide a proper alignment, but will not save the code from being broken by compiler optimizations which rely on strict aliasing. Also, I don't want to use -fno-strict-aliasing and similar flags due to performance considerations.
For example, consider subscript operator (nonconstant for brevity), which is a classical code snippet from articles about UB in C++:
Type& operator[](const uint32_t idx)
{
Type* ptr = reinterpret_cast<Type*>(storage + idx * sizeof(ptr)); // Cast is OK.
return *ptr; // Dereference is UB.
}
What is a proper way to implement it without any risk of finding my program broken? How is it implemented in standard containers? Is there any cheating with non-documented compiler intrinsics in all compilers?
Sometimes I see code with two static casts through void* instead of one reinterpret cast:
Type* ptr = static_cast<Type*>(static_cast<void*>(storage + idx * sizeof(ptr)));
How is it better than reinterpret cast? To me, it does not solve any problems, but looks overcomplicated.
but dereferencing a pointer returned by data seems to violate strict aliasing rules
I disagree.
Both char* storage and a pointer returned by data() point to the same region of memory.
This is irrelevant. Multiple pointers pointing to same object doesn't violate aliasing rules.
Moreover, subscript operator will ... dereference a pointer of incompatible type, which is UB.
But the object isn't of incompatible type. In emplace_back, you use placement new to construct objects of Type into the memory. Assuming no code path can avoid this placement new and therefore assuming that the subscript operator returns a pointer which points at one of these objects, then dereferencing the pointer of Type* is well defined, because it points to an object of Type, which is compatible.
This is what is relevant for pointer aliasing: The type of the object in memory, and the type of the pointer that is dereferenced. Any intermediate pointer that the dereferenced pointer was converted from is irrelevant to aliasing.
Note that your destructor does not call the destructor of objects constructed within storage, so if Type isn't trivially destructible, then the behaviour is undefined.
Type* ptr = reinterpret_cast<Type*>(storage + idx * sizeof(ptr));
The sizeof is wrong. What you need is sizeof(Type), or sizeof *ptr. Or more simply
auto ptr = reinterpret_cast<Type*>(storage) + idx;
Sometimes I see code with two static casts through void* instead of one reinterpret cast: How is it better than reinterpret cast?
I can't think of any situation where the behaviour would be different.

Find a pointer in a std::set based on a const pointer [duplicate]

Is there any good way to obviate the const_cast below, while keeping const correctness?
Without const_cast the code below doesn't compile. set::find gets a const reference to the set's key type, so in our case it guarantees not to change the passed-in pointer value; however, nothing is guaranteed about not changing what the pointer points to.
class C {
public:
std::set<int*> m_set;
bool isPtrInSet(const int* ptr) const
{
return m_set.find(const_cast<int*>(ptr)) != m_set.end();
}
};
Yes.
In C++14, you can use a comparator that declares itself transparent (through an is_transparent member type), such as std::less<>. This enables the template overload of find(), which can compare keys against arbitrary types, including int const*. See this related SO question. And here's Jonathan Wakely's explanation.
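A minimal sketch of that approach, assuming C++14 or later and using std::less<> as the transparent comparator rather than a hand-written one:

```cpp
#include <functional>
#include <set>

class C {
public:
    // std::less<> defines is_transparent, which enables the templated
    // overloads of find() that accept types other than the key type.
    std::set<int*, std::less<>> m_set;

    bool isPtrInSet(const int* ptr) const
    {
        // No const_cast: the comparator compares int* against const int*.
        return m_set.find(ptr) != m_set.end();
    }
};
```

The lookup stays O(log N), and the set's key type is still int*, so nothing in the set becomes const.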
I want to explain the underlying logic of why this is impossible.
Suppose set<int*>::find(const int*) would be legitimate. Then you could do the following:
set<int*> s;
const int* p_const;
// fill s and p_const
auto it = s.find(p_const);
int* p = *it;
Hey presto! You transformed const int* to int* without performing const_cast.
Is there any good way to obviate the const_cast below, while keeping const correctness?
I am not sure whether what I am going to suggest qualifies as a "good way". However, you can avoid the const_cast if you don't mind iterating over the contents of the set yourself. Keep in mind that this transforms what could be an O(log(N)) operation to an O(N) operation.
bool isPtrInSet(const int* ptr) const
{
for ( auto p : m_set )
{
if ( p == ptr )
{
return true;
}
}
return false;
}