I have been reading a few strict aliasing questions, such as "Cast array of bytes to POD" and "Aliasing `T*` with `char*` is allowed. Is it also allowed the other way around?".
From these I gather that the only legal way to access a memory location declared as one type (in particular, an array of char) as another type is to invoke placement new on it, as that changes the dynamic type.
Since std::aligned_storage normally has to have an underlying type other than the intended use type, it seems to me it is impossible to use the storage without invoking placement new on it first.
So I would not be allowed to create aligned_storage for, e.g. a double and use it as a double via pointer casting? Or rather, before I would be allowed to access the memory as double via pointer cast, I'd have to do a placement new on it, "turning it into" a dynamic object of type double?
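A minimal sketch of the placement-new route the question describes, assuming C++17 (for std::byte). The function name store_and_read_double is hypothetical, and an alignas byte array stands in for std::aligned_storage; both only provide raw, suitably aligned bytes:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical demo: create a double in raw byte storage via placement new
// and read it back through the pointer that placement new returned.
double store_and_read_double(double value) {
    // alignas + std::byte array used here in place of std::aligned_storage
    // (deprecated in C++23); both only provide raw, suitably aligned bytes.
    alignas(double) std::byte storage[sizeof(double)];

    // Placement new begins the lifetime of a double in that storage.
    double* d = ::new (static_cast<void*>(storage)) double(value);

    // Using the returned pointer needs no std::launder; a pointer obtained via
    // reinterpret_cast<double*>(storage) would have to be passed through
    // std::launder (C++17) before use.
    double result = *d;

    // double is trivially destructible, so no destructor call is required.
    return result;
}
```

Keeping the pointer returned by placement new is the simplest way to avoid any question about which object a cast pointer refers to.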
Related
When implementing certain data structures in C++, one needs to be able to create an array with uninitialized elements. Because of that, writing
buffer = new T[capacity];
is not suitable, as new T[capacity] initializes the array elements, which is not always possible (if T does not have a default constructor) or desired (as constructing objects might take time). The typical solution is to allocate memory and use placement new.
For that, if the number of elements is known (or at least has an upper bound) and we allocate on the stack, then, as far as I am aware, one can use an aligned array of bytes or chars, construct the elements in it with placement new, and then use std::launder to access them.
alignas(T) std::byte buffer[capacity];
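A sketch of the stack-based approach, assuming C++17 (std::byte, std::launder, std::destroy_at). StackBuffer is a hypothetical name; arithmetic happens on the byte array and std::launder recovers a pointer to each constructed element:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>

// Hypothetical fixed-capacity buffer: room for Capacity objects of type T
// on the stack, with elements constructed on demand.
template <typename T, std::size_t Capacity>
class StackBuffer {
public:
    StackBuffer() = default;
    StackBuffer(const StackBuffer&) = delete;             // copying would need
    StackBuffer& operator=(const StackBuffer&) = delete;  // element-wise copies

    ~StackBuffer() {
        // Destroy constructed elements in reverse order of construction.
        while (size_ > 0) {
            --size_;
            std::destroy_at(slot(size_));
        }
    }

    void push_back(const T& value) {
        assert(size_ < Capacity);
        // Placement new creates the element; it did not exist before,
        // so no std::launder is involved here.
        ::new (static_cast<void*>(buffer_ + size_ * sizeof(T))) T(value);
        ++size_;
    }

    T& operator[](std::size_t i) { return *slot(i); }
    std::size_t size() const { return size_; }

private:
    // Arithmetic is done on the byte array; std::launder then yields a usable
    // pointer to the T object living at that address.
    T* slot(std::size_t i) {
        return std::launder(reinterpret_cast<T*>(buffer_ + i * sizeof(T)));
    }

    alignas(T) std::byte buffer_[sizeof(T) * Capacity];
    std::size_t size_ = 0;
};
```

Because sizeof(T) is always a multiple of alignof(T), every slot at offset i * sizeof(T) is suitably aligned once the array itself is.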
However, this solves the problem only for stack allocations, not for heap allocations. For that, I assume one needs to use aligned new and write something like this:
auto memory = ::operator new(sizeof(T) * capacity, std::align_val_t{alignof(T)});
and then cast it to std::byte*, unsigned char* or T*.
// not sure what the right target type for the reinterpret_cast should be
buffer = reinterpret_cast<T*>(memory); // or std::byte* / unsigned char*
However, there are several things that are not clear to me.
The result of reinterpret_cast<T*>(ptr) is defined if ptr points to an object that is pointer-interconvertible with an object of type T (see this answer or https://eel.is/c++draft/basic.types#basic.compound-3 for more detail). I assume that converting it to T* is not valid, as T is not necessarily pointer-interconvertible with the result of new. However, is it well defined for char* or std::byte*?
When converting the result of new to a valid pointer type (assuming it is not implementation defined), is it treated as a pointer to the first element of an array, or just a pointer to a single object? As far as I know it rarely (if ever) matters in practice, but there is a semantic difference: an expression of the form pointer + integer is well defined only if the pointed-to element is an array member and the result of the arithmetic points to another element of the same array or one past its end (see https://eel.is/c++draft/expr.add#4).
As far as lifetimes are concerned, an array of unsigned char or std::byte can provide storage for the result of placement new (https://eel.is/c++draft/basic.memobj#intro.object-3), but is this defined for arrays of other types?
As far as I know, new T and new T[n] expressions call ::operator new or ::operator new[] behind the scenes. Since the result of the built-in allocation function is void*, how is the conversion to the right type done? Is it implementation defined, or are there well-defined rules for it?
When freeing the memory, should one use
::operator delete(static_cast<void*>(buffer), sizeof(T) * capacity, std::align_val_t{alignof(T)});
or there is another way?
PS: I'd probably use the standard library for these purposes in real code, however I try to understand how things work behind the scenes.
pointer-interconvertibility
Regarding pointer-interconvertibility, it doesn't matter if you use T * or {[unsigned] char|std::byte} *. You will have to cast it to T * to use it anyway.
Note that you must call std::launder (on the result of the cast) to access the pointed-to T objects. The only exception is the placement-new call that creates the objects, because they don't exist yet. The manual destructor call is not an exception.
The lack of pointer-interconvertibility would only be a problem if you didn't use std::launder.
When converting the result of new to a valid pointer type (assuming it is not implementation defined), is it treated as a pointer to first element of array, or just a pointer to a single object?
If you want to be extra safe, store the pointer as {[unsigned] char|std::byte} * and reinterpret_cast it after performing any pointer arithmetic.
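A small sketch of that advice, assuming C++17: the buffer stays a std::byte*, offsets are computed in bytes, and only the final address is converted to T* and laundered. The helper names element_at and sum_three_ints are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical helper: byte-level arithmetic first, then a single
// reinterpret_cast to T* followed by std::launder.
template <typename T>
T* element_at(std::byte* storage, std::size_t index) {
    return std::launder(reinterpret_cast<T*>(storage + index * sizeof(T)));
}

// Hypothetical usage: three ints placement-new'ed into an aligned byte array.
int sum_three_ints() {
    alignas(int) std::byte storage[3 * sizeof(int)];
    for (std::size_t i = 0; i < 3; ++i)
        ::new (static_cast<void*>(storage + i * sizeof(int))) int(static_cast<int>(i) + 1);

    int total = 0;
    for (std::size_t i = 0; i < 3; ++i)
        total += *element_at<int>(storage, i);  // never does arithmetic on int*
    return total;
}
```

No arithmetic is ever performed on the int* pointers, which sidesteps the question of whether the constructed ints form an array.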
an object of type array unsigned char or std::byte can provide storage for result of placement new
The standard doesn't say anywhere that "providing storage" is required for placement-new to work. I think this term is defined solely to be used in definitions of other terms in the standard.
Consider [basic.life]/example-2 where operator= uses placement-new to reconstruct an object in place, even though type T doesn't "provide storage" for the same type T.
Since the result of builtin new is void, how conversion to the right type is done?
Not sure what the standard has to say about it, but what else can it be other than reinterpret_cast?
freeing the memory
Your approach looks correct, but I think you don't have to pass the size.
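A sketch of the full heap lifecycle from the question, assuming C++17 for the aligned allocation and deallocation overloads; fill_and_release is a hypothetical name. The sized, aligned ::operator delete mirrors the call in the question, and the unsized ::operator delete(void*, std::align_val_t) overload also exists. One caveat: the byte arithmetic on memory from ::operator new is exactly what a later answer here calls undefined before C++20; C++20's implicit object creation (P0593) is generally read as making it well-defined:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>

// Hypothetical demo: aligned heap storage for `capacity` T objects, `count`
// of them constructed with placement new, then destroyed and freed.
// Not exception-safe: a throwing T constructor would leak the storage.
template <typename T>
std::size_t fill_and_release(std::size_t capacity, std::size_t count, const T& value) {
    assert(count <= capacity);
    const std::align_val_t align{alignof(T)};

    // Aligned allocation function (C++17); it returns void*.
    void* memory = ::operator new(sizeof(T) * capacity, align);
    auto* bytes = static_cast<std::byte*>(memory);  // void* -> std::byte*

    for (std::size_t i = 0; i < count; ++i)
        ::new (static_cast<void*>(bytes + i * sizeof(T))) T(value);

    // Read the elements back through laundered pointers.
    std::size_t matches = 0;
    for (std::size_t i = 0; i < count; ++i) {
        T* element = std::launder(reinterpret_cast<T*>(bytes + i * sizeof(T)));
        if (*element == value) ++matches;
    }

    // Destroy in reverse order, then free with the matching aligned
    // deallocation function (sized variant shown).
    for (std::size_t i = count; i > 0; --i)
        std::destroy_at(std::launder(reinterpret_cast<T*>(bytes + (i - 1) * sizeof(T))));
    ::operator delete(memory, sizeof(T) * capacity, align);
    return matches;
}
```

Only the first `count` slots are destroyed; slots that never held an object must not have destructors run on them.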
I think your premise may be incorrect. If T is a class, its default constructor will be called. However, that constructor can be empty, and if your class contains only POD (plain old data) members, then nothing will be initialized. I actually count on this all the time, because I often don't want things initialized for performance reasons.
I believe there are a few caveats with this for global data and so forth, where some things are zero-initialized. But in general, heap data isn't. You can test it and you will find there's a bunch of garbage in memory, at least when compiled in release mode. Some compilers will initialize memory in debug mode, but that's done outside constructors.
For instance, you can set data in a custom placement new function, and if it's POD it will still be there in the constructor. Some people will argue this is UB, but I think the standard says "nothing is done" for POD, which implies no initialization.
A common idiom when constructing buffers (say a ring buffer) for objects of class type T is to initialize a T* object with the address of memory obtained from std::malloc() or operator new(), and then to construct objects in that buffer on demand using placement new, using pointer arithmetic on the T pointer to traverse the block of memory.
While it seems highly unlikely that there is any compiler on which this would not work (it does work with g++ and clang++), it seems to me that strictly speaking this may have undefined behavior. This is because §8.7/4 of C++17 seems only to permit pointer arithmetic on arrays, and a block of memory returned by malloc, operator new or operator new[] is not an array - as I understand it only the new[] expression can create an array in dynamic memory, which will thereby be fully initialized at the point of construction.
This also got me thinking that the reference implementation of std::uninitialized_copy has undefined behaviour for dynamically allocated uninitialized memory, because its reference implementation in §23.10.10.4/1 of C++17 uses pointer arithmetic on the destination iterator, which would here be a pointer.
Arguably the same applies for std::uninitialized_copy if the uninitialized memory is obtained non-dynamically, say using an aligned array of unsigned char or of std::byte as permitted by §4.5/3 of C++17, because the arithmetic in §8.7/4 implies that the destination pointer type upon which the arithmetic is carried out should be that of the array element type (unsigned char or std::byte) and not the type constructed in it using placement new.
This seems surprising. Can anyone point out the flaw (if any) in this reasoning?
Yes, pointer arithmetic on a pointer returned from malloc or operator new has undefined behavior without a previous array placement-new (which itself cannot be done reliably), and so does using std::uninitialized_copy on it, which is defined to behave as if this pointer arithmetic were done.
The best you can do is to create a std::byte (or unsigned char) array as storage, directly using new[], and then placement-new individual objects into that storage array, which will make these objects nested in the buffer array.
Pointer arithmetic on the storage array is well-defined, so you can reach pointers to the positions of the individual objects in the storage array, and with std::launder or by using the pointer returned from the placement-new you can obtain a pointer to the nested object. Even then, you will not be able to use pointer arithmetic on pointers to the nested objects, because these do not form an array.
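A sketch of this storage-array approach, assuming C++17 and a T without extended alignment (new std::byte[] only guarantees fundamental alignment). The name build_in_byte_array is hypothetical; since arithmetic on the nested T* pointers is not allowed, each pointer returned by placement new is stored individually:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <string>
#include <vector>

// Hypothetical demo: a std::byte array created with new[], objects
// placement-new'ed into it, accessed via the pointers placement new returned.
template <typename T>
std::vector<T> build_in_byte_array(const std::vector<T>& values) {
    static_assert(alignof(T) <= alignof(std::max_align_t),
                  "new std::byte[] only guarantees fundamental alignment");

    const std::size_t n = values.size();
    std::byte* storage = new std::byte[n * sizeof(T)];  // a real array object

    std::vector<T*> objects;  // one pointer per nested object
    objects.reserve(n);
    for (std::size_t i = 0; i < n; ++i)  // arithmetic on the byte array only
        objects.push_back(::new (static_cast<void*>(storage + i * sizeof(T))) T(values[i]));

    std::vector<T> copies;
    for (T* p : objects) copies.push_back(*p);  // returned pointers: no launder

    for (auto it = objects.rbegin(); it != objects.rend(); ++it)
        std::destroy_at(*it);
    delete[] storage;  // matches new std::byte[]
    return copies;
}
```

Exception safety is ignored here to keep the sketch short.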
See the paper P0593R5, "Implicit creation of objects for low-level object manipulation", for further examples of this surprisingly undefined behavior and suggestions to make it defined in the future.
In the example provided at this link from the isocpp.org FAQ, a Fred object is constructed with placement new in a buffer that was allocated as an object of another type, i.e. in
char memory[sizeof(Fred)];
As I understand it, the strict aliasing rules allow us to do the opposite, i.e. for an object of any type, we are allowed to have a char* point at it, and we can dereference that pointer and use it as we want.
But here in the example the opposite is happening. What am I missing?
The strict aliasing rules don't mention that a Fred* must be cast to char*; only that variables of type char* and Fred* may point to the same object and be used to access it.
Quoting [basic.lval] paragraph 8
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
[..]
a char or unsigned char type.
Placement-new creates a new object. It doesn't alias the old object. The old object (the char array in this example) is considered to stop existing when the placement-new executes.
Before placement-new, there is storage filled with char objects. After placement-new, there is storage filled with one Fred object.
Since there is no aliasing, there are no strict-aliasing problems.
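A sketch of the FAQ scenario, assuming a hypothetical Fred with a single int member. It uses an alignas unsigned char array, because a plain char array does not by itself guarantee Fred's alignment, and accesses the new object only through the Fred* that placement new returns:

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-in for the FAQ's Fred class.
struct Fred {
    int value;
    explicit Fred(int v) : value(v) {}
};

int make_fred_in_buffer(int v) {
    // Raw storage with Fred's size and alignment.
    alignas(Fred) unsigned char memory[sizeof(Fred)];

    // Placement new starts the lifetime of a Fred object in that storage;
    // from here on the storage is accessed as a Fred, not as chars.
    Fred* f = ::new (static_cast<void*>(memory)) Fred(v);
    int result = f->value;
    f->~Fred();  // trivial here, but pairs with the construction
    return result;
}
```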
The context is:
Writing a container, containing type T, and a char * p to a memory region. Let's suppose the pointer is already suitably aligned for type T - the alignment issue is not part of the question.
How do I default construct an element on that memory region?
((*T)(p))->T();
works for classes, but not with some builtin types.
((*T)(p)) = 0; // or simply memset
for integral types, pointers.
Do these two cover everything, unions and what not?
Is there a best practice for this, or some standard library feature?
std::allocator::construct can do it (that is what, e.g., std::vector uses), but it is not a static member function, so I would need an instance of the allocator. Is there some freestanding or static function that can do it?
--EDIT--
Yes, the answer is obvious, and I was dumb today -- placement new
BTW, now I'm trying to destroy the element...
"Placement new" is the term to look for. It is a standard library operator new overload that does not actually allocate memory, but just returns whatever pointer you pass to it.
Include the <new> header and use its placement new allocation function like this:
::new (p) T()
The :: qualification avoids picking up a class-specific allocation function.
The parenthesized (p) is the argument list for the allocation function.
This allocation function just returns the passed in pointer.
To be pedantic about things you would also cast the pointer to void*, to avoid picking up some hypothetical other operator new in the global namespace.
The code shown in the question, ((*T)(p))->T();, should not compile. The standard explicitly points out that a constructor doesn't have a name. So it can't be called like an ordinary function.
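A sketch covering both halves of the question (construction and, per the edit, destruction), assuming C++17 and that p is already suitably aligned. construct_default and destroy_element are hypothetical names; std::destroy_at is the standard C++17 function, and C++20 adds std::construct_at, which takes a T* rather than a char*:

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <string>

// Hypothetical helper: default-construct a T in raw, suitably aligned memory.
template <typename T>
T* construct_default(char* p) {
    // Global placement new; the void* cast avoids other global overloads.
    // T() value-initializes: class types run their default constructor,
    // scalars such as int or pointers are zero-initialized.
    return ::new (static_cast<void*>(p)) T();
}

// Hypothetical helper: destroy the element again without freeing memory.
template <typename T>
void destroy_element(T* obj) {
    // Equivalent to obj->~T(); also valid for scalar types such as int.
    std::destroy_at(obj);
}
```

The same pair works uniformly for class types, unions, enums, pointers and arithmetic types, so no separate memset path is needed.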
Say I have the following C++:
char *p = new char[cb];
SOME_STRUCT *pSS = (SOME_STRUCT *) p;
delete pSS;
Is this safe according to the C++ standard? Do I need to cast back to a char* and then use delete[]? I know it'll work in most C++ compilers, because it's plain old data with no destructors. Is it guaranteed to be safe?
It's not guaranteed to be safe. Here's a relevant link in the C++ FAQ lite:
[16.13] Can I drop the [] when deleting array of some built-in type (char, int, etc.)?
http://www.parashift.com/c++-faq-lite/freestore-mgmt.html#faq-16.13
No, it's undefined behaviour - a compiler could plausibly do something different, and as the C++ FAQ entry that thudbang linked to says, operator delete[] might be overloaded to do something different to operator delete. You can sometimes get away with it, but it's also good practice to get into the habit of matching delete[] with new[] for the cases where you can't.
I highly doubt it.
There are a lot of questionable ways of freeing memory, for example you can use delete on your char array (rather than delete[]) and it will likely work fine. I blogged in detail about this (apologies for the self-link, but it's easier than rewriting it all).
The compiler is not so much the issue as the platform. Most libraries will use the allocation methods of the underlying operating system, which means the same code could behave differently on Mac vs. Windows vs. Linux. I have seen examples of this and every single one was questionable code.
The safest approach is to always allocate and free memory using the same data type. If you are allocating chars and returning them to other code, you may be better off providing specific allocate/deallocate methods:
SOME_STRUCT* Allocate()
{
    size_t cb; // Initialised to something
    return (SOME_STRUCT*)(new char[cb]);
}
void Free(SOME_STRUCT* obj)
{
    delete[] (char*)obj;
}
(Overloading the new and delete operators may also be an option, but I have never liked doing this.)
C++ Standard [5.3.5.2] declares:
If the operand has a class type, the operand is converted to a pointer type by calling the above-mentioned conversion function, and the converted operand is used in place of the original operand for the remainder of this section. In either alternative, the value of the operand of delete may be a null pointer value. If it is not a null pointer value, in the first alternative (delete object), the value of the operand of delete shall be a pointer to a non-array object or a pointer to a subobject (1.8) representing a base class of such an object (clause 10). If not, the behavior is undefined. In the second alternative (delete array), the value of the operand of delete shall be the pointer value which resulted from a previous array new-expression. If not, the behavior is undefined. [ Note: this means that the syntax of the delete-expression must match the type of the object allocated by new, not the syntax of the new-expression. —end note ] [ Note: a pointer to a const type can be the operand of a delete-expression; it is not necessary to cast away the constness (5.2.11) of the pointer expression before it is used as the operand of the delete-expression. —end note ]
This is a very similar question to one that I answered previously.
In short, no, it's not safe according to the C++ standard. If, for some reason, you need a SOME_STRUCT object allocated in an area of memory whose size differs from sizeof(SOME_STRUCT) (and it had better be bigger!), then you are better off using a raw allocation function like global operator new to perform the allocation and then creating the object instance in raw memory with placement new. Placement new will be extremely cheap if the object type has no constructor.
void* p = ::operator new( cb );
SOME_STRUCT* pSS = new (p) SOME_STRUCT;
// ...
delete pSS;
This will work most of the time. It should always work if SOME_STRUCT is a POD-struct. It will also work in other cases if SOME_STRUCT's constructor does not throw and if SOME_STRUCT does not have a custom operator delete. This technique also removes the need for any casts.
::operator new and ::operator delete are C++'s closest equivalent to malloc and free and as these (in the absence of class overrides) are called as appropriate by new and delete expressions they can (with care!) be used in combination.
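A variant of the code above, sketched under the assumption of a hypothetical POD-like SOME_STRUCT. Instead of `delete pSS`, it destroys the object explicitly and frees the block with ::operator delete(p), so the deallocation is paired directly with ::operator new(cb) and does not depend on whether a sized operator delete would be handed sizeof(SOME_STRUCT) rather than cb:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical POD-like struct standing in for SOME_STRUCT.
struct SOME_STRUCT {
    int id;
    char name[16];
};

// Allocates cb bytes (cb >= sizeof(SOME_STRUCT)), creates a SOME_STRUCT at
// the start, and releases the memory with the matching deallocation function.
int roundtrip(std::size_t cb) {
    assert(cb >= sizeof(SOME_STRUCT));
    void* p = ::operator new(cb);
    SOME_STRUCT* pSS = ::new (p) SOME_STRUCT{};  // value-initialized: all zero
    pSS->id = 7;
    int id = pSS->id;
    pSS->~SOME_STRUCT();   // trivial here; shown for symmetry
    ::operator delete(p);  // pairs with ::operator new(cb)
    return id;
}
```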
While this should work, I don't think you can guarantee it to be safe because the SOME_STRUCT is not a char* (unless it's merely a typedef).
Additionally, since you're using different types of references, if you continue to use the *p access, and the memory has been deleted, you will get a runtime error.
This will work OK if the memory being pointed to and the pointer you are pointing with are both POD. In this case, no destructor would be called anyhow, and the memory allocator does not know or care about the type stored within the memory.
The only case where this is OK with non-POD types is if the pointee is a subtype of the pointer's type (e.g. you are pointing at a Car with a Vehicle*) and the base class's destructor has been declared virtual.
This isn't safe, and none of the responses so far have emphasized enough the madness of doing this. Simply don't do it if you consider yourself a real programmer, or ever want to work as a professional programmer in a team. You can only say that your struct contains no destructor at the moment; you are laying a nasty, possibly compiler- and system-specific trap for the future. Also, your code is unlikely to work as expected. The very best you can hope for is that it doesn't crash. However, I suspect you will slowly get a memory leak, as array allocations via new very often allocate extra memory in the bytes prior to the returned pointer. You won't be freeing the memory you think you are. A good memory allocation routine should pick up this mismatch, as would tools like Lint.
Simply don't do that, and purge from your mind whatever thinking process led you to even consider such nonsense.
I've changed the code to use malloc/free. While I know how MSVC implements new/delete for plain-old-data (and SOME_STRUCT in this case was a Win32 structure, so simple C), I just wanted to know if it was a portable technique.
It's not, so I'll use something that is.
If you use malloc/free instead of new/delete, malloc and free won't care about the type.
So if you're using a C-like POD (plain old data, like a built-in type or a struct), you can malloc some type and free another. Note that this is poor style even if it works.
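A sketch of the malloc/free variant, assuming a hypothetical C-style SOME_STRUCT. free() only needs the pointer value malloc returned, whatever type was stored there. Placement new makes the object's creation explicit; since C++20, malloc is also specified to implicitly create objects of implicit-lifetime types such as this one (P0593):

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical C-style struct standing in for the question's SOME_STRUCT.
struct SOME_STRUCT {
    unsigned version;
    unsigned length;
};

// Allocates cb bytes with malloc, creates a SOME_STRUCT at the start of the
// block, and releases the block with free.
unsigned malloc_roundtrip(std::size_t cb) {
    if (cb < sizeof(SOME_STRUCT)) return 0;
    void* raw = std::malloc(cb);
    if (raw == nullptr) return 0;
    SOME_STRUCT* s = ::new (raw) SOME_STRUCT{2u, static_cast<unsigned>(cb)};
    unsigned length = s->length;
    // SOME_STRUCT is trivially destructible, so no destructor call is needed.
    std::free(raw);  // same pointer value malloc returned
    return length;
}
```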