In C++20 we're getting std::assume_aligned. This would be very useful for audio code, where pointers to aligned blocks of floats are passed around all the time. Let's say we have the following span type:
template<typename T>
struct Signal
{
const T* data;
size_t size;
};
How would one indicate that the data pointer in this struct is aligned by some constexpr integer? Is something like this already possible in C++ 20?
constexpr int SIMDAlignment = 16;
template<typename T>
struct Signal
{
aligned<SIMDAlignment> const T* data; // hypothetical syntax
size_t size;
};
It seems that the assume-aligned hint is a property of a particular pointer object, and it cannot be made a property of a pointer type. However, you might try to wrap that pointer by an (inline) getter function and use std::assume_aligned for its return value. For example, in my experiment, when I used the pointer returned by such a function, it was treated as "aligned" (pointing to aligned data) correctly by GCC:
double* f()
{
static double* data =
(double*)std::aligned_alloc(64, 1024 * sizeof(double));
return std::assume_aligned<64>(data);
}
void g()
{
double* a = f();
for (int i = 0; i < 1024; i++)
a[i] = 123.45;
}
In this case, the array was filled by vmovapd which requires aligned memory access.
On the contrary, when I changed:
return std::assume_aligned<64>(data);
to:
return data;
The generated assembly contained vmovupd which works with unaligned data.
Live demo: https://godbolt.org/z/d5aPPj — check the .L19 loop in both cases.
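Applied to the Signal struct from the question, the getter idea might look like the sketch below. The member name raw, the factory make_signal, and the helper sum are made up for illustration. The feature-test macro only keeps the sketch compiling before C++20, in which case no hint is attached. Whether a given compiler actually emits aligned loads for it would need to be checked in the generated assembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>   // std::assume_aligned (C++20)

constexpr std::size_t SIMDAlignment = 16;

template <typename T>
struct Signal
{
    const T* raw;       // precondition: aligned to SIMDAlignment bytes
    std::size_t size;

    // Inline getter that attaches the alignment hint to the returned pointer.
    const T* data() const noexcept
    {
#ifdef __cpp_lib_assume_aligned
        return std::assume_aligned<SIMDAlignment>(raw);
#else
        return raw;     // pre-C++20 fallback: no hint
#endif
    }
};

// Hypothetical factory that checks the precondition in debug builds,
// because the hint is undefined behaviour for a misaligned pointer.
template <typename T>
Signal<T> make_signal(const T* p, std::size_t n)
{
    assert(reinterpret_cast<std::uintptr_t>(p) % SIMDAlignment == 0);
    return Signal<T>{p, n};
}

// Example consumer: the loop reads through the hinted pointer.
inline float sum(const Signal<float>& s)
{
    const float* p = s.data();
    float total = 0.0f;
    for (std::size_t i = 0; i < s.size; ++i)
        total += p[i];
    return total;
}
```

The hint lives in the getter's return value, so every access has to go through data() rather than the raw member.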
I've taken over some code, and came across a weird reallocation of an array. This is a function from within an Array class (used by the JsonValue)
void reserve( uint32_t newCapacity ) {
if ( newCapacity > length + additionalCapacity ) {
newCapacity = std::min( newCapacity, length + std::numeric_limits<decltype( additionalCapacity )>::max() );
JsonValue *newPtr = new JsonValue[newCapacity];
if ( length > 0 ) {
memcpy( newPtr, values, length * sizeof( JsonValue ) );
memset( values, 0, length * sizeof( JsonValue ) );
}
delete[] values;
values = newPtr;
additionalCapacity = uint16_t( newCapacity - length );
}
}
I get the point of this; it is just allocating a new array, and doing a copy of the memory contents from the old array into the new array, then zero-ing out the old array's contents. I also know this was done in order to prevent calling destructors, and moves.
The JsonValue is a class with functions, and some data which is stored in a union (string, array, number, etc.).
My concern is whether this is actually defined behaviour or not. I know it works, and we have not had a problem since we began using it a few months ago; but if it's undefined, the fact that it works today doesn't mean it is going to keep working.
EDIT:
JsonValue looks something like this:
struct JsonValue {
// …
~JsonValue() {
switch ( details.type ) {
case Type::Array:
case Type::Object:
array.destroy();
break;
case Type::String:
delete[] string.buffer;
break;
default: break;
}
}
private:
struct Details {
Key key = Key::Unknown;
Type type = Type::Null; // (0)
};
union {
Array array;
String string;
EmbedString embedString;
Number number;
Details details;
};
};
Where Array is a wrapper around an array of JsonValues, String is a char*, EmbedString is char[14], Number is a union of int, unsigned int, and double, Details contains the type of value it holds. All values have 16-bits of unused data at the beginning, which is used for Details. Example:
struct EmbedString {
uint16_t : 16;
char buffer[14] = { 0 };
};
Whether this code has well-defined behavior basically depends on two things: 1) is JsonValue trivially-copyable and, 2) if so, are a bunch of all-zero bytes a valid object representation for a JsonValue.
If JsonValue is trivially-copyable, then the memcpy from one array of JsonValues to another will indeed be equivalent to copying all the elements over [basic.types]/3. If all-zeroes is a valid object representation for a JsonValue, then the memset should be ok (I believe this actually falls into a bit of a grey-area with the current wording of the standard, but I believe at least the intention would be that this is fine).
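The first condition can be checked by the compiler. Below is a minimal sketch with two made-up stand-in types (PlainValue and OwningValue are not from the question) and a hypothetical helper copy_trivially. Note that a class with a user-provided destructor, like the JsonValue shown in the question's edit, is not trivially copyable, so the static_assert would reject it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Copies n elements bytewise; refuses to compile for types where memcpy is not allowed.
template <typename T>
void copy_trivially(T* dst, const T* src, std::size_t n)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only defined for trivially copyable types");
    assert(n == 0 || (dst != nullptr && src != nullptr));
    if (n > 0)
        std::memcpy(dst, src, n * sizeof(T));
}

// Hypothetical stand-ins: a plain aggregate and a type that owns a buffer.
struct PlainValue  { int type; double number; };
struct OwningValue { char* buffer; ~OwningValue() { delete[] buffer; } };

static_assert(std::is_trivially_copyable<PlainValue>::value,
              "a plain aggregate qualifies");
static_assert(!std::is_trivially_copyable<OwningValue>::value,
              "a user-provided destructor disqualifies the type");
```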
I'm not sure why you'd need to "prevent calling destructors and moves", but overwriting objects with zeroes does not prevent destructors from running. delete[] values will call the destructors of the array members. And moving the elements of an array of trivially-copyable type should compile down to just copying over the bytes anyway.
Furthermore, I would suggest to get rid of these String and EmbedString classes and simply use std::string. At least, it would seem to me that the sole purpose of EmbedString is to manually perform small string optimization. Any std::string implementation worth its salt is already going to do exactly that under the hood. Note that std::string is not guaranteed (and will often not be) trivially-copyable. Thus, you cannot simply replace String and EmbedString with std::string while keeping the rest of this current implementation.
If you can use C++17, I would suggest to simply use std::variant instead of or at least inside this custom JsonValue implementation as that seems to be exactly what it's trying to do. If you need some common information stored in front of whatever the variant value may be, just have a suitable member holding that information in front of the member that holds the variant value rather than relying on every member of the union starting with the same couple of members (which would only be well-defined if all union members are standard-layout types that keep this information in their common initial sequence [class.mem]/23).
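A minimal sketch of that layout, assuming C++17. The set of alternatives, the key member, and the helper as_number are chosen for illustration only and are not the question's actual types.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

struct JsonValue;
using JsonArray = std::vector<JsonValue>;   // std::vector accepts an incomplete element type since C++17

struct JsonValue
{
    std::uint16_t key = 0;   // hypothetical common header, stored in front of the variant
    std::variant<std::monostate, double, std::string, JsonArray> value;
};

// Hypothetical accessor; the assert documents the precondition.
inline double as_number(const JsonValue& v)
{
    assert(std::holds_alternative<double>(v.value));
    return std::get<double>(v.value);
}
```

Copying, moving, and destroying such a value is handled by the variant, so no hand-written switch over the type tag is needed. The price is that the type is no longer trivially copyable, so it cannot be relocated with memcpy.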
The sole purpose of Array would seem to be to serve as a vector that zeroes memory before deallocating it for security reasons. If this is the case, I would suggest to just use an std::vector with an allocator that zeros memory before deallocating instead. For example:
template <typename T>
struct ZeroingAllocator
{
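using value_type = T;
ZeroingAllocator() noexcept = default;
// Converting constructor, needed so containers can rebind the allocator
template <typename U>
ZeroingAllocator(const ZeroingAllocator<U>&) noexcept {}
T* allocate(std::size_t N)
{
return reinterpret_cast<T*>(new unsigned char[N * sizeof(T)]);
}
void deallocate(T* buffer, std::size_t N) noexcept
{
auto ptr = reinterpret_cast<volatile unsigned char*>(buffer);
// Zero every byte of the block (N elements, not N bytes)
std::fill(ptr, ptr + N * sizeof(T), 0);
delete[] reinterpret_cast<unsigned char*>(buffer);
}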
};
template <typename A, typename B>
bool operator ==(const ZeroingAllocator<A>&, const ZeroingAllocator<B>&) noexcept { return true; }
template <typename A, typename B>
bool operator !=(const ZeroingAllocator<A>&, const ZeroingAllocator<B>&) noexcept { return false; }
and then
using Array = std::vector<JsonValue, ZeroingAllocator<JsonValue>>;
Note: I fill the memory via volatile unsigned char* to prevent the compiler from optimizing away the zeroing. If you need to support overaligned types, you can replace the new[] and delete[] with direct calls to ::operator new and ::operator delete (doing this will prevent the compiler from optimizing away allocations). Pre C++17, you will have to allocate a sufficiently large buffer and then manually align the pointer, e.g., using std::align…
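For the over-aligned case, a C++17 variant of the allocator could call the aligned forms of ::operator new and ::operator delete directly. This is only a sketch: the names AlignedZeroingAllocator and CacheLine are made up, and the size overflow check is reduced to an assert.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

template <typename T>
struct AlignedZeroingAllocator
{
    using value_type = T;

    AlignedZeroingAllocator() noexcept = default;
    template <typename U>
    AlignedZeroingAllocator(const AlignedZeroingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n)
    {
        // Sketch only: a production allocator would throw std::bad_array_new_length here.
        assert(n <= static_cast<std::size_t>(-1) / sizeof(T));
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t(alignof(T))));
    }

    void deallocate(T* buffer, std::size_t n) noexcept
    {
        // Write through a volatile pointer so the zeroing is not optimized away.
        auto bytes = reinterpret_cast<volatile unsigned char*>(buffer);
        for (std::size_t i = 0; i < n * sizeof(T); ++i)
            bytes[i] = 0;
        ::operator delete(buffer, std::align_val_t(alignof(T)));
    }
};

template <typename A, typename B>
bool operator==(const AlignedZeroingAllocator<A>&, const AlignedZeroingAllocator<B>&) noexcept { return true; }
template <typename A, typename B>
bool operator!=(const AlignedZeroingAllocator<A>&, const AlignedZeroingAllocator<B>&) noexcept { return false; }

// Hypothetical over-aligned element type used to exercise the allocator.
struct alignas(64) CacheLine { unsigned char bytes[64]; };
```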
void Manager::byteArrayToDoubleArray(byte ch[]) {
int counter = 0;
// temp array to break the byte array into size of 8 and read it
byte temp[64];
// double result values
double res[8];
int index = 0;
int size = (sizeof(ch) / sizeof(*ch));
for (int i = 0; i < size; i++) {
counter++;
temp[i] = ch[i];
if (counter % 8 == 0) {
res[index] = *reinterpret_cast<double * const>(temp);
index++;
counter = 0;
}
}
}
Here result would be a list of double values with count = 8.
There are two problems here: your code has some typos and a misunderstanding, and the C++ standard is somewhat broken in this area.
I'll try to fix both.
First, a helper function called laundry_pods. It takes raw memory and "launders" it into an array of a type of your choice, so long as you pick a pod type:
template<class T, std::size_t N>
T* laundry_pods( void* ptr ) {
static_assert( std::is_pod<std::remove_cv_t<T>>{} );
char optimized_away[sizeof(T)*N];
std::memcpy( optimized_away, ptr , sizeof(T)*N );
T* r = ::new( ptr ) T[N];
assert( r == ptr );
std::memcpy( r, optimized_away, sizeof(T)*N );
return r;
}
now simply do
void Manager::byteArrayToDoubleArray(byte ch[]) {
double* pdouble = laundry_pods<double, 8>(ch);
}
and pdouble is a pointer to memory of ch interpreted as an array of 8 doubles. (It is not a copy of it, it interprets those bytes in-place).
While laundry_pods appears to copy the bytes around, both g++ and clang optimize it down into a binary noop. The seeming copying of bytes around is a way to get around aliasing restrictions and object lifetime rules in the C++ standard.
It relies on arrays of pod not having extra bookkeeping overhead (which C++ implementations are free to add; none that I know of do. That is what the non-static assert double-checks), but it returns a pointer to a real honest to goodness array of double. If you want to avoid that assumption, you could instead create each double as a separate object. However, then they aren't an array, and pointer arithmetic over non-arrays is fraught as far as the standard is concerned.
The use of the term "launder" has to do with getting around aliasing and object lifetime requirements. The function does nothing at runtime, but in the C++ abstract machine it takes the memory and converts it into binary identical memory that is now a bunch of doubles.
The trick of doing this kind of "conversion" is to always cast the double* to a char* (or unsigned char or std::byte). Never the other way round.
You should be able to do something like this:
void byteArrayToDoubleArray(byte* in, std::size_t n, double* out)
{
for(auto out_bytes = (byte*) out; n--;)
*out_bytes++ = *in++;
}
// ...
byte ch[64];
// .. fill ch with double data somehow
double res[8];
byteArrayToDoubleArray(ch, 64, res);
Assuming that type byte is an alias of char or unsigned char or std::byte.
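The same byte-wise copy can also be written with std::memcpy, which is allowed to copy the object representation of a double from a byte buffer. A minimal sketch follows; the function name bytes_to_doubles is made up, the bytes are assumed to be in the machine's native double format, and trailing bytes that do not fill a whole double are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies byte_count / sizeof(double) doubles out of a raw byte buffer.
inline std::vector<double> bytes_to_doubles(const unsigned char* bytes, std::size_t byte_count)
{
    assert(bytes != nullptr || byte_count == 0);
    std::vector<double> out(byte_count / sizeof(double));
    if (!out.empty())
        std::memcpy(out.data(), bytes, out.size() * sizeof(double));
    return out;
}
```

Passing the byte count explicitly avoids the sizeof(ch) problem in the question, where ch is really a pointer.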
I am not completely sure what you are trying to achieve here, because the expression (sizeof(ch) / sizeof(*ch)) does not make sense for an array of unknown size: as a function parameter, ch is just a pointer.
If you have a byte array (a POD data type; something like a typedef char byte;) then the simplest solution would be a reinterpret_cast:
double *result = reinterpret_cast<double*>(ch);
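This allows you to use result[0]..result[7] as long as ch[] is valid and contains at least 64 bytes. Be aware that this construct does not generate code. It tells the compiler that result[0] corresponds to ch[0..7] and so on. An access to result[] will result in an access to ch[]. Note that, strictly speaking, reading the bytes through a double* runs into the aliasing rules discussed in the other answers, and ch must also be suitably aligned for double.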
But you have to know the number of elements in ch[] to calculate the number of valid double elements in result.
If you need a copy (because - for example - the ch[] is a temporary array) you could use
std::vector<double> result(reinterpret_cast<double*>(ch), reinterpret_cast<double*>(ch) + itemsInCh * sizeof(*ch) / sizeof(double));
So if ch[] is an array with 64 items and a byte is really an 8-bit value, then
std::vector<double> result(reinterpret_cast<double*>(ch), reinterpret_cast<double*>(ch) + 8);
will provide a std::vector containing 8 double values.
There is another possible method using a union:
union ByteToDouble
{
byte b[64];
double d[8];
} byteToDouble;
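The 8 double values will occupy the same memory as the 64 byte values. So you can write the byte values to byteToDouble.b[] and read the resulting double values from byteToDouble.d[]. Note that reading a union member other than the one last written is undefined behaviour in standard C++, although many compilers support it as an extension.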
I'm trying to allocate an array of struct and I want each struct to be aligned to 64 bytes.
I tried this (it's for Windows only for now), but it doesn't work (I tried with VS2012 and VS2013):
struct __declspec(align(64)) A
{
std::vector<int> v;
A()
{
assert(sizeof(A) == 64);
assert((size_t)this % 64 == 0);
}
void* operator new[] (size_t size)
{
void* ptr = _aligned_malloc(size, 64);
assert((size_t)ptr % 64 == 0);
return ptr;
}
void operator delete[] (void* p)
{
_aligned_free(p);
}
};
int main(int argc, char* argv[])
{
A* arr = new A[200];
return 0;
}
The assert ((size_t)this % 64 == 0) breaks (the modulo returns 16). It looks like it works if the struct only contains simple types though, but breaks when it contains an std container (or some other std classes).
Am I doing something wrong? Is there a way of doing this properly? (Preferably c++03 compatible, but any solution that works in VS2012 is fine).
Edit:
As hinted by Shokwav, this works:
A* arr = (A*)new std::aligned_storage<sizeof(A), 64>::type[200];
// this works too actually:
//A* arr = (A*)_aligned_malloc(sizeof(A) * 200, 64);
for (int i=0; i<200; ++i)
new (&arr[i]) A();
So it looks like it's related to the use of new[]... I'm very curious if anybody has an explanation.
I wonder why you need such a large alignment, especially for a struct whose only member manages a heap allocation. But you can do this:
struct __declspec(align(64)) A
{
unsigned char ___padding[64 - sizeof(std::vector<int>)];
std::vector<int> v;
void* operator new[] (size_t size)
{
// Make sure the buffer will fit even in the worst case
unsigned char* ptr = (unsigned char*)malloc(size + 63);
// Find out the next aligned position in the buffer
unsigned char* endptr = (unsigned char*)(((intptr_t)ptr + 63) & ~63ULL);
// Also store the misalignment in the first padding of the structure
unsigned char misalign = (unsigned char)(endptr - ptr);
*endptr = misalign;
return endptr;
}
void operator delete[] (void* p)
{
unsigned char * ptr = (unsigned char*)p;
// It's required to call back with the original pointer, so subtract the misalignment offset
ptr -= *ptr;
free(ptr);
}
};
int main()
{
A * a = new A[2];
printf("%p - %p = %d\n", &a[1], &a[0], int((char*)&a[1] - (char*)&a[0]));
return 0;
}
I did not have your _aligned_malloc and _aligned_free functions, so the implementation I'm providing does this:
It allocates larger to make sure it will fit in 64-bytes boundaries
It computes the offset from the allocation to the closest 64-bytes boundary
It stores the "offset" in the padding of the first structure (else I would have required a larger allocation space each time)
This is used to compute back the original pointer to the free()
Outputs:
0x7fff57b1ca40 - 0x7fff57b1ca00 = 64
Warning: If there is no padding in your structure, then the scheme above will corrupt data, since I'll be storing the misalignment offset in a place that'll be overwritten by the constructor of the internal members.
Remember that when you do "new X[n]", "n" has to be stored "somewhere" so that when delete[] is called, "n" calls to the destructors can be made. Usually, it's stored before the returned memory buffer (new will likely allocate the required size + 4 for storing the number of elements). The scheme here avoids this.
Another warning: Because C++ calls this operator with some additional bytes included in the size for storing the array's number of elements, you might still get a "shift" in the address of your objects relative to the returned pointer. You might need to account for it. This is what std::align does: it takes the extra space, computes the alignment like I did, and returns the aligned pointer. However, you cannot get both done in the new[] overload, because of the "count storage" shift that happens after returning from operator new[]. You can, however, figure out the "count storage" size once with a single allocation, and adjust the offset accordingly in the new[] implementation.
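For comparison, and outside the C++03/VS2012 constraints of the question: since C++17, a plain new[] expression on an over-aligned type goes through the aligned allocation functions, and the compiler places the element count so that the elements stay aligned. A minimal sketch, assuming a C++17 compiler and standard library (the helper names are made up):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

struct alignas(64) A
{
    std::vector<int> v;
};

static_assert(alignof(A) == 64, "A is over-aligned");
static_assert(sizeof(A) % 64 == 0, "sizeof is always a multiple of the alignment");

inline bool is_aligned_64(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 64 == 0;
}

// Allocates n elements with new[] and reports whether each one sits on a 64-byte boundary.
inline bool all_elements_aligned(std::size_t n)
{
    std::unique_ptr<A[]> arr(new A[n]);   // C++17: uses operator new[](size, std::align_val_t)
    for (std::size_t i = 0; i < n; ++i)
        if (!is_aligned_64(&arr[i]))
            return false;
    return true;
}

// Usage sketch mirroring the 200-element array from the question.
inline void demo()
{
    assert(all_elements_aligned(200));
}
```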
I'm writing a skip list.
What I have:
template<typename T>
struct SkipListNode
{
T data;
SkipListNode* next[32];
};
The problem with this code is that it wastes space: it requires all nodes to contain 32 pointers, even though in a typical list half of the nodes will only need one pointer.
The C language has a neat feature called flexible array member that could solve that problem. Had it existed in C++ (even for trivial classes), I could write code like this:
template<typename T>
struct SkipListNode
{
alignas(T) char buffer[sizeof(T)];
SkipListNode* next[];
};
and then manually create nodes with a factory function and destroy them when deleting elements.
Which brings the question - how can I emulate such functionality portably, without undefined behaviour in C++?
I considered mallocing the buffer and then manipulating the offsets appropriately by hand - but it's too easy to violate the alignment requirements - if you malloc(sizeof(char) + sizeof(void*)*5), the pointers are unaligned. Also, I'm not even sure if such hand-created buffers are portable to C++.
Note that I don't require the exact syntax, or even ease of use - this is a node class, internal to the skip list class, which won't be a part of the interface at all.
This is the implementation I wrote, based on R. Martinho Fernandes's idea - it constructs a buffer that happens to have a correct size and alignment in specific places (the AlignmentExtractor is used to extract the offset of the pointer array, which ensures that the pointers in the buffer have correct alignment). Then, placement-new is used to construct the type in the buffer.
T isn't used directly in AlignmentExtractor because offsetof requires standard layout type.
#include <cstdlib>
#include <cstddef>
#include <utility>
template<typename T>
struct ErasedNodePointer
{
void* ptr;
};
void* allocate(std::size_t size)
{
return ::operator new(size);
}
void deallocate(void* ptr)
{
return ::operator delete(ptr);
}
template<typename T>
struct AlignmentExtractor
{
static_assert(alignof(T) <= alignof(std::max_align_t), "extended alignment types not supported");
alignas(T) char data[sizeof(T)];
ErasedNodePointer<T> next[1];
};
template<typename T>
T& get_data(ErasedNodePointer<T> node)
{
return *reinterpret_cast<T*>(node.ptr);
}
template<typename T>
void destroy_node(ErasedNodePointer<T> node)
{
get_data(node).~T();
deallocate(node.ptr);
}
template<typename T>
ErasedNodePointer<T>& get_pointer(ErasedNodePointer<T> node, int pos)
{
auto next = reinterpret_cast<ErasedNodePointer<T>*>(reinterpret_cast<char*>(node.ptr) + offsetof(AlignmentExtractor<T>, next));
next += pos;
return *next;
}
template<typename T, typename... Args>
ErasedNodePointer<T> create_node(std::size_t height, Args&& ...args)
{
ErasedNodePointer<T> p = { nullptr };
try
{
p.ptr = allocate(sizeof(AlignmentExtractor<T>) + sizeof(ErasedNodePointer<T>)*(height-1));
::new (p.ptr) T(std::forward<Args>(args)...); // forward each argument with its own deduced type
for(std::size_t i = 0; i < height; ++i)
get_pointer(p, i).ptr = nullptr;
return p;
}
catch(...)
{
deallocate(p.ptr);
throw;
}
}
#include <iostream>
#include <string>
int main()
{
auto p = create_node<std::string>(5, "Hello world");
auto q = create_node<std::string>(2, "A");
auto r = create_node<std::string>(2, "B");
auto s = create_node<std::string>(1, "C");
get_pointer(p, 0) = q;
get_pointer(p, 1) = r;
get_pointer(r, 0) = s;
std::cout << get_data(p) << "\n";
std::cout << get_data(get_pointer(p, 0)) << "\n";
std::cout << get_data(get_pointer(p, 1)) << "\n";
std::cout << get_data(get_pointer(get_pointer(p, 1), 0)) << "\n";
destroy_node(s);
destroy_node(r);
destroy_node(q);
destroy_node(p);
}
Output:
Hello world
A
B
C
Longer explanation:
The point of this code is to create a node dynamically, without using types directly (type erasure). This node stores an object, and N pointers, with N variable at runtime.
You can use any memory as if it had a specific type, provided that:
size is correct
alignment is correct
(only non-trivially constructible types) you manually call the constructor before using
(only non-trivially destructible types) you manually call the destructor after using
In fact, you rely on this every time you call malloc:
// 1. Allocating a block
int* p = (int*)malloc(5 * sizeof *p);
p[2] = 42;
free(p);
Here, we treat the chunk of memory returned by malloc as if it was an array of ints. This must work because of these guarantees:
malloc returns a pointer guaranteed to be properly aligned for any object type.
If your pointer p points to aligned memory, (int*)((char*)p + sizeof(int)) (or p + 1, which is equivalent) also does.
The dynamically created node must have enough size to contain N ErasedNodePointers (which are used as handles here) and one object of size T. This is satisfied by allocating enough memory in create_node function - it will allocate sizeof(T) + sizeof(ErasedNodePointer<T>)*N bytes or more, but not less.
That was the first step. The second is to find the required position relative to the beginning of the block. That's where AlignmentExtractor<T> comes in.
AlignmentExtractor<T> is a dummy struct I use to ensure correct alignment:
// 2. Finding position
AlignmentExtractor<T>* p = (AlignmentExtractor<T>*)malloc(sizeof *p);
p->next[0].ptr = nullptr;
// or
void* q = (char*)p + offsetof(AlignmentExtractor<T>, next);
((ErasedNodePointer<T>*)q)->ptr = nullptr;
It doesn't matter how I got the position of the pointer, as long as I obey the rules of pointer arithmetic.
The assumptions here are:
I can cast any pointer to void* and back.
I can cast any pointer to char* and back.
I can operate on a struct as if it was a char array of size equal to the size of the struct.
I can use pointer arithmetic to point at any element of an array.
These are all guaranteed by the C++ standard.
Now, after I have allocated a block of sufficient size, I calculate the offset with offsetof(AlignmentExtractor<T>, next) and add it to the pointer pointing to the block. We "pretend" (the same way the code "1. Allocating a block" pretends it has an array of ints) that the resulting pointer points to the beginning of the array. This pointer is aligned correctly, because otherwise the code "2. Finding position" couldn't access the next array due to misaligned access.
If you have a struct of standard layout type, the pointer to the struct has the same address as the first member of the struct. AlignmentExtractor<T> is standard layout.
That's not all though - requirements 1. and 2. are satisfied, but we need to satisfy requirements 3. and 4. - the data in the node doesn't have to be trivially constructible or destructible. That's why we use placement-new to construct the data - the create_node uses variadic templates and perfect forwarding to forward arguments to the constructor. And the data is destroyed in the destroy_node function by calling the destructor.
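The size and offset reasoning above can be checked at compile time. The sketch below uses a made-up NodeLayout with the same shape as AlignmentExtractor, with void* standing in for the erased node handle; node_bytes is a hypothetical helper that mirrors the allocation size computed in create_node.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Same shape as AlignmentExtractor<T>: storage for one T followed by a pointer array.
template <typename T>
struct NodeLayout
{
    alignas(T) char data[sizeof(T)];
    void* next[1];
};

using StringLayout = NodeLayout<std::string>;

static_assert(std::is_standard_layout<StringLayout>::value,
              "offsetof requires a standard-layout type");
static_assert(offsetof(StringLayout, data) == 0,
              "the object lives at the start of the block");
static_assert(offsetof(StringLayout, next) >= sizeof(std::string),
              "the pointer array starts after the object");
static_assert(offsetof(StringLayout, next) % alignof(void*) == 0,
              "the pointer array is suitably aligned");

// Bytes to allocate for a node holding one T and `height` pointers (height >= 1).
template <typename T>
std::size_t node_bytes(std::size_t height)
{
    assert(height >= 1);
    return sizeof(NodeLayout<T>) + sizeof(void*) * (height - 1);
}
```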
Pointers can be declared like this:
int
a = 1,
*b = &a, // 1st order pointer
**c = &b, // 2nd order pointer
***d = &c, // 3rd order pointer
****e = &d, // 4th order pointer
*****f = &e, // 5th order pointer
*** n-stars *** f; // n-th order pointer
Here, we need to know at compile-time the order of the pointer when we are declaring it.
Is it possible at all to declare a pointer whose order is only known at run time? Linked to this question: is it possible to query at run time the order of an arbitrary pointer?
int order = GET_ORDER_OF_PTR(f) // returns 5
int /* insert some syntax here to make ptr a pointer of order (order + 1) */ ptr = &f;
Note:
I already know this (generally) might not be a good idea. Still want to know if it's doable :)
At run time you cannot, because C++ is statically typed. At compile time it is possible with templates, e.g.
template<typename T, int order> struct P
{
typedef typename P<T, order-1>::pointer* pointer;
};
template<typename T> struct P<T, 0>
{
typedef T pointer;
};
Then P<int, 3>::pointer is equivalent to int***.
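That equivalence can be verified by the compiler itself. The sketch repeats the two templates so that it is self-contained; the static_asserts are the only addition.

```cpp
#include <type_traits>

template <typename T, int order>
struct P
{
    typedef typename P<T, order - 1>::pointer* pointer;
};

template <typename T>
struct P<T, 0>
{
    typedef T pointer;
};

// The order is a template argument, so it must be a compile-time constant.
static_assert(std::is_same<P<int, 0>::pointer, int>::value, "order 0 is the value type");
static_assert(std::is_same<P<int, 1>::pointer, int*>::value, "order 1 adds one level");
static_assert(std::is_same<P<int, 3>::pointer, int***>::value, "order 3 adds three levels");
```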
You can't. C++ is statically typed, so the order of a pointer (which is part of its type) must be known at compile time.
Not in C or C++ because they're statically typed languages (i.e. the type of the values you store in a variable are known at compile time and fixed).
You can emulate this kind of possibility by defining a C++ class
template<typename T>
struct NPtr {
int order;
void *p; // order 0 -> T*, otherwise NPtr<T>*
NPtr(NPtr *ptr) : order(ptr->order+1), p(ptr) {}
NPtr(T *final) : order(0), p(final) {}
NPtr<T>& nptr() {
assert(order > 0);
return *(NPtr<T>*)p;
}
T& final() {
assert(order == 0);
return *(T*)p;
}
};
An NPtr<int> instance can either be a pointer to an integer (when order=0) or a pointer to another NPtr<int> instance.
The semantic equivalent of what you want is a linked list, with a number of nodes determined at runtime:
#include <iostream>
union kind_of_pointer
{
int data;
kind_of_pointer *next;
kind_of_pointer(int val) : data(val) {}
kind_of_pointer(kind_of_pointer* ptr) : next(ptr) {}
operator int()
{
return data;
}
kind_of_pointer& operator *()
{
return *next;
}
};
int main(void)
{
kind_of_pointer dyn_ptr{new kind_of_pointer{new kind_of_pointer{new kind_of_pointer{42}}}};
int*** static_ptr = new int**{new int *{new int{42}}};
std::cout << ***dyn_ptr << std::endl;
std::cout << ***static_ptr << std::endl;
}
I find this funny, interesting and horrible :)
You can do this:
int *f = nullptr;
decltype(f) *newptr = 0;
As for the
whether is it possible to query at run-time the order of an arbitrary
pointer?
part: no, you most definitely cannot, unless you write your own wrapper class that stores the order of the pointer. A pointer is, basically, a number: the address in memory.
This gives you at least one problem: you can't follow the pointer "all the way through" (which you would have to do to check if the thing your pointer is pointing at is itself a pointer) without potentially causing a segfault or reading "not your application's memory" (which is often prohibited by the OS and will cause your application to be aborted; not sure whether you can prevent this or not).
For example, NULL (or 0) can be cast to any pointer type, so is itself a pointer. But does it point to another pointer? Let's find out... BAM! SEGFAULT.
Oh, wait, there's another problem: you can cast (with c-style cast or with reinterpret_cast) pointer of any type to pointer of any other type. So, say, a might be pointing to a pointer, but was cast to a different type. Or a might have a type of "pointing to a pointer", but actually isn't pointing to one.
P.S. Sorry for using the verb "point" so freely.
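A wrapper of the kind mentioned above could record the order when the pointer is wrapped, while its static type is still known, and carry it along as ordinary data. The names order_of, TaggedPointer, make_tagged and get_as are made up for this sketch, and the check in get_as compares only the order, not the pointee type.

```cpp
#include <cassert>

// Compile-time order of a pointer type: int -> 0, int* -> 1, int** -> 2, ...
template <typename P> struct order_of     { static constexpr int value = 0; };
template <typename P> struct order_of<P*> { static constexpr int value = order_of<P>::value + 1; };

// Type-erased pointer that remembers its order at run time.
struct TaggedPointer
{
    void* address;
    int   order;
};

// Captures the order while the static type is still available.
template <typename P>
TaggedPointer make_tagged(P* p)
{
    return TaggedPointer{ const_cast<void*>(static_cast<const void*>(p)), order_of<P*>::value };
}

// Checked retrieval: the caller names the pointee type P, the stored order must match.
template <typename P>
P* get_as(const TaggedPointer& t)
{
    assert(t.order == order_of<P*>::value);
    return static_cast<P*>(t.address);
}
```

The order becomes queryable at run time, but dereferencing still requires naming a concrete type at compile time, so this does not get around static typing.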
... we need to know at compile-time the order of the pointer when we are declaring it.
Yes it is possible at compile time to determine the order of the pointer type being declared; based on the type (possibly via typedef) or the variable, if C++11 can be used (via decltype()).
#include <iostream>
using namespace std;
template <typename P>
struct ptr_order
{
static const int order = 0;
};
template <typename P>
struct ptr_order<P*>
{
static const int order = ptr_order<P>::order + 1;
};
int main()
{
typedef int*** pointer;
cout << ptr_order<pointer>::order << endl; // outputs 3
// could also use decltype if available...
// int*** p;
// ptr_order<decltype(p)>::order is also 3
}
The runtime calculation of the order of the pointer is not possible, since C++ is statically typed.