Managing trivial types - c++

I have found the intricacies of trivial types in C++ non-trivial to understand and hope someone can enlighten me on the following.
Given type T, storage for T allocated using ::operator new(std::size_t) or ::operator new[](std::size_t) or std::aligned_storage, and a void * p pointing to a location in that storage suitably aligned for T so that it may be constructed at p:
If std::is_trivially_default_constructible<T>::value holds, is the code invoking undefined behavior when code skips initialization of T at p (i.e. by using T * tPtr = new (p) T();) before otherwise accessing *p as T? Can one just use T * tPtr = static_cast<T *>(p); instead without fear of undefined behavior in this case?
If std::is_trivially_destructible<T>::value holds, does skipping destruction of T at *p (i.e by calling tPtr->~T();) cause undefined behavior?
For any type U for which std::is_trivially_assignable<T, U>::value holds, is std::memcpy(&t, &u, sizeof(U)); equivalent to t = std::forward<U>(u); (for any t of type T and u of type U) or will it cause undefined behavior?

No, you can't. There is no object of type T in that storage, and accessing the storage as if there was is undefined. See also T.C.'s answer here.
Just to clarify on the wording in [basic.life]/1, which says that objects with vacuous initialization are alive from the storage allocation onward: that wording obviously refers to an object's initialization. There is no object whose initialization is vacuous when allocating raw storage with operator new or malloc, hence we cannot consider "it" alive, because "it" does not exist. In fact, only objects created by a definition with vacuous initialization can be accessed after storage has been allocated but before the vacuous initialization occurs (i.e. their definition is encountered).
Omitting destructor calls never per se leads to undefined behavior. However, it's pointless to attempt any optimizations in this area in e.g. templates, since a trivial destructor is just optimized away.
Right now, the requirement is being trivially copyable, and the types have to match. However, this may be too strict. Dos Reis's N3751 at least proposes distinct types to work as well, and I could imagine this rule being extended to trivial copy assignment across one type in the future.
However, what you've specifically shown does not make a lot of sense (not least because you're asking for assignment to a scalar xvalue, which is ill-formed), since trivial assignment can hold between types whose assignment is not actually "trivial", that is, has the same semantics as memcpy. E.g. is_trivially_assignable<int&, double> does not imply that one can be "assigned" to the other by copying the object representation.

Technically reinterpreting storage is not enough to introduce a new object as. Look at the note for Trivial default constructor states:
A trivial default constructor is a constructor that performs no action. All data types compatible with the C language (POD types) are trivially default-constructible. Unlike in C, however, objects with trivial default constructors cannot be created by simply reinterpreting suitably aligned storage, such as memory allocated with std::malloc: placement-new is required to formally introduce a new object and avoid potential undefined behavior.
But the note says it's a formal limitation, so probably it is safe in many cases. Not guaranteed though.
No. is_assignable does not even guarantee the assignment will be legal under certain conditions:
This trait does not check anything outside the immediate context of the assignment expression: if the use of T or U would trigger template specializations, generation of implicitly-defined special member functions etc, and those have errors, the actual assignment may not compile even if std::is_assignable::value compiles and evaluates to true.
What you describe looks more like is_trivially_copyable, which says:
Objects of trivially-copyable types are the only C++ objects that may be safely copied with std::memcpy or serialized to/from binary files with std::ofstream::write()/std::ifstream::read().
I don't really know. I would trust KerrekSB's comments.

Related

Is it legal to construct data members of a struct separately?

class A;
class B;
//we have void *p pointing to enough free memory initially
std::pair<A,B> *pp=static_cast<std::pair<A,B> *>(p);
new (&pp->first) A(/*...*/);
new (&pp->second) B(/*...*/);
After the code above get executed, is *pp guaranteed to be in a valid state? I know the answer is true for every compiler I have tested, but the question is whether this is legal according to the standard and hence. In addition, is there any other way to obtain such a pair if A or B is not movable in C++98/03? (thanks to #StoryTeller-UnslanderMonica , there is a piecewise constructor for std::pair since C++11)
“Accessing” the members of the non-existent pair object is undefined behavior per [basic.life]/5; pair is never a POD-class (having user-declared constructors), so a pointer to its storage out of lifetime may not be used for its members. It’s not clear whether forming a pointer to the member is already undefined, or if the new is.
Neither is there a way to construct a pair of non-copyable (not movable, of course) types in C++98—that’s why the piecewise constructor was added along with move semantics.
A more simple question: is using a literal string well defined?
Not even that is, as its lifetime is not defined. You can't use a string literal in conforming code.
So the committee that never took the time to make string literals well defined obviously did not bother with specifying which objects of class type can be made to exist by placement new of its subobjects - polymorphic objects obviously cannot be created that way!
That standard did not even bother describing the semantic of union.
About lifetime the standard is all over the place, and that isn't just editorial: it reflects a deep disagreement between serious people about what begins a lifetime, what an object is, what an lvalue is, etc.
Notably people have all sorts of false or contradicting intuitions:
an infinite number of objects cannot be created by one call to malloc
an lvalue refers to an object
overlapping objects are against the object model
a unnamed object can only be created by new or by the compiler (temporaries)
...

Does a member have to be initialized to take its address?

Can I initialize a pointer to a data member before initializing the member? In other words, is this valid C++?
#include <string>
class Klass {
public:
Klass()
: ptr_str{&str}
, str{}
{}
private:
std::string *ptr_str;
std::string str;
};
this question is similar to mine, but the order is correct there, and the answer says
I'd advise against coding like this in case someone changes the order of the members in your class.
Which seems to mean reversing the order would be illegal but I couldn't be sure.
Does a member have to be initialized to take its address?
No.
Can I initialize a pointer to a data member before initializing the member? In other words, is this valid C++?
Yes. Yes.
There is no restriction that operand of unary & need to be initialised. There is an example in the standard in specification of unary & operator:
int a;
int* p1 = &a;
Here, the value of a is indeterminate and it is OK to point to it.
What that example doesn't demonstrate is pointing to an object before its lifetime has begun, which is what happens in your example. Using a pointer to an object before and after its lifetime is explicitly allowed if the storage is occupied. Standard draft says:
[basic.life] Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that represents the address of the storage location where the object will be or was located may be used but only in limited ways ...
The rule goes on to list how the usage is restricted. You can get by with common sense. In short, you can treat it as you could treat a void*, except violating these restrictions is UB rather than ill-formed. Similar rule exists for references.
There are also restrictions on computing the address of non-static members specifically. Standard draft says:
[class.cdtor] ... To form a pointer to (or access the value of) a direct non-static member of an object obj, the construction of obj shall have started and its destruction shall not have completed, otherwise the computation of the pointer value (or accessing the member value) results in undefined behavior.
In the constructor of Klass, the construction of Klass has started and destruction hasn't completed, so the above rule is satisfied.
P.S. Your class is copyable, but the copy will have a pointer to the member of another instance. Consider whether that makes sense for your class. If not, you will need to implement custom copy and move constructors and assignment operators. A self-reference like this is a rare case where you may need custom definitions for those, but not a custom destructor, so it is an exception to the rule of five (or three).
P.P.S If your intention is to point to one of the members, and no object other than a member, then you might want to use a pointer to member instead of pointer to object.
Funny question.
It is legitimate and will "work", though barely. There is a little "but" related to types which makes the whole thing a bit awkward with a bad taste (but not illegitimate), and which might make it illegal some border cases involving inheritance.
You can, of course, take the address of any object whether it's initialized or not, as long as it exists in the scope and has a name which you can prepend operator& to. Dereferencing the pointer is a different thing, but that wasn't the question.
Now, the subtle problem is that the standard defines the result of operator& for non-static struct members as "“pointer to member of class C of type T” and is a prvalue designating C::m".
Which basically means that ptr_str{&str} will take the address of str, but the type is not pointer-to, but pointer-to-member-of. It is then implicitly and silently cast to pointer-to.
In other words, although you do not need to explicitly write &this->str, that's nevertheless what its type is -- it's what it is and what it means [1].
Is this valid, and is it safe to use it within the initializer list? Well yes, just... barely. It's safe to use it as long as it's not being used to access uninitialized members or virtual functions, directly or indirectly. Which, as it happens, is the case here (it might not be the case in a different, arguably contrived case).
[1] Funnily, paragraph 4 starts with a clause that says that no member pointer is formed when you put stuff in parentheses. That's remarkable because most people would probably do that just to be 100% sure they got operator precedence right. But if I read correctly, then &this->foo and &(this->foo) are not in any way the same!

What is the purpose of std::launder?

P0137 introduces the function template std::launder and makes many, many changes to the standard in the sections concerning unions, lifetime, and pointers.
What is the problem this paper is solving? What are the changes to the language that I have to be aware of? And what are we laundering?
std::launder is aptly named, though only if you know what it's for. It performs memory laundering.
Consider the example in the paper:
struct X { const int n; };
union U { X x; float f; };
...
U u = {{ 1 }};
That statement performs aggregate initialization, initializing the first member of U with {1}.
Because n is a const variable, the compiler is free to assume that u.x.n shall always be 1.
So what happens if we do this:
X *p = new (&u.x) X {2};
Because X is trivial, we need not destroy the old object before creating a new one in its place, so this is perfectly legal code. The new object will have its n member be 2.
So tell me... what will u.x.n return?
The obvious answer will be 2. But that's wrong, because the compiler is allowed to assume that a truly const variable (not merely a const&, but an object variable declared const) will never change. But we just changed it.
[basic.life]/8 spells out the circumstances when it is OK to access the newly created object through variables/pointers/references to the old one. And having a const member is one of the disqualifying factors.
So... how can we talk about u.x.n properly?
We have to launder our memory:
assert(*std::launder(&u.x.n) == 2); //Will be true.
Money laundering is used to prevent people from tracing where you got your money from. Memory laundering is used to prevent the compiler from tracing where you got your object from, thus forcing it to avoid any optimizations that may no longer apply.
Another of the disqualifying factors is if you change the type of the object. std::launder can help here too:
alignas(int) char data[sizeof(int)];
new(&data) int;
int *p = std::launder(reinterpret_cast<int*>(&data));
[basic.life]/8 tells us that, if you allocate a new object in the storage of the old one, you cannot access the new object through pointers to the old. launder allows us to side-step that.
std::launder is a mis-nomer. This function performs the opposite of laundering: It soils the pointed-to memory, to remove any expectation the compiler might have regarding the pointed-to value. It precludes any compiler optimizations based on such expectations.
Thus in #NicolBolas' answer, the compiler might be assuming that some memory holds some constant value; or is uninitialized. You're telling the compiler: "That place is (now) soiled, don't make that assumption".
If you're wondering why the compiler would always stick to its naive expectations in the first place, and would need to you to conspicuously soil things for it - you might want to read this discussion:
Why introduce `std::launder` rather than have the compiler take care of it?
... which led me to this view of what std::launder means.
I think there are two purposes of std::launder.
A barrier for constant folding/propagation, including devirtualization.
A barrier for fine-grained object-structure-based alias analysis.
Barrier for overaggressive constant folding/propagation (abandoned)
Historically, the C++ standard allowed compilers to assume that the value of a const-qualified or reference non-static data member obtained in some ways to be immutable, even if its containing object is non-const and may be reused by placement new.
In C++17/P0137R1, std::launder is introduced as a functionality that disables the aforementioned (mis-)optimization (CWG 1776), which is needed for std::optional. And as discussed in P0532R0, portable implementations of std::vector and std::deque may also need std::launder, even if they are C++98 components.
Fortunately, such (mis-)optimization is forbidden by RU007 (included in P1971R0 and C++20). AFAIK there's no compiler performing this (mis-)optimization.
Barrier for devirtualization
A virtual table pointer (vptr) can be considered constant during the lifetime of its containing polymorphic object, which is needed for devirtualization. Given that vptr is not non-static data member, compilers is still allowed to perform devirtualization based on the assumption that the vptr is not changed (i.e., either the object is still in its lifetime, or it is reused by a new object of the same dynamic type) in some cases.
For some unusual uses that replace a polymorphic object with a new object of different dynamic type (shown here), std::launder is needed as a barrier for devirtualization.
IIUC Clang implemented std::launder (__builtin_launder) with these semantics (LLVM-D40218).
Barrier for object-structure-based alias analysis
P0137R1 also changes the C++ object model by introducing pointer-interconvertibility. IIUC such change enables some "object-structure-based alias analysis" proposed in N4303.
As a result, P0137R1 makes the direct use of dereferencing a reinterpret_cast'd pointer from an unsigned char [N] array undefined, even if the array is providing storage for another object of correct type. And then std::launder is needed for access to the nested object.
This kind of alias analysis seems overaggressive and may break many useful code bases. AFAIK it's currently not implemented by any compiler.
Relation to type-based alias analysis/strict aliasing
IIUC std::launder and type-based alias analysis/strict aliasing are unrelated. std::launder requires that an living object of correct type to be at the provided address.
However, it seems that they are accidently made related in Clang (LLVM-D47607).

When is placement new well defined, and what happens to an existing type when calling placement new?

I've seen some examples of placement new, and am a little confused as to what's happening internally with the various types.
A simple example:
using my_type = std::string;
using buffer_type = char;
buffer_type buffer[1000];
my_type* p{ new (buffer) my_type() };
p->~my_type();
From what I understand, this is valid, though I'm concerned about what happens to the char array of buffer[]. It seems like this is ok to do as long as I don't access the variable buffer in any form after creating a new object in place on its memory.
For the sake of keeping this simple, I'm not concerned about anything to do with proper alignment here, or any other topics of what can go wrong when calling placement new other than: what happens to the original type? Could I use another type such as buffer_type = int to achieve a similar effect (ignoring the possibility of how much memory that will actually take)? Is any POD safe as buffer_type? How about non-POD types? Do I have to tell the compiler that the char array is no longer valid in some way? Are there restrictions on what my_type could be or not be here?
Does this code do what I expect it to, is it well defined, and are there any minor modifications that would either keep this as well defined or break it into undefined behaviour?
what happens to the original type?
You mean the original object? It get's destroyed, that is, its lifetime ends. The lifetime of the array object of type buffer_type [1000] ends as soon as you reuse its storage.
A program may end the lifetime of any object by reusing the storage
which the object occupies or by explicitly calling the destructor for
an object of a class type with a non-trivial destructor. For an object
of a class type with a non-trivial destructor, the program is not
required to call the destructor explicitly before the storage which
the object occupies is reused or released; however, if there is no
explicit call to the destructor or if a delete-expression (5.3.5) is
not used to release the storage, the destructor shall not be
implicitly called and any program that depends on the side effects
produced by the destructor has undefined behavior.
Note that this implies that we should not use something with a non-trivial destructor for the buffer: The elements' destructors are called at the end of the scope it's defined in. For e.g. std::string as the element type that would imply a destructor call on an non-existing array subobject, which clearly triggers undefined behavior.
If a program ends the lifetime of an object of type T with […]
automatic (3.7.3) storage duration and if T has a non-trivial
destructor, the program must ensure that an object of the original
type occupies that same storage location when the implicit destructor
call takes place; otherwise the behavior of the program is undefined.
To avoid that you would have to construct std:strings into that buffer after you're done with it, which really seems nonsensical.
Is any POD safe as buffer_type?
We do not necessarily need PODs - they have a lot of requirements that are not a necessity here.
The only things that bother us are the constructor and the destructor.
It matters whether the types destructor is trivial (or for an array, the arrays element types' destructor). Also it's feasible that the constructor is trivial, too.
POD types feel safer since they suffice both requirements and convey the idea of "bare storage" very well.
Are there restrictions on what my_type could be or not be here?
Yes. It shall be an object type. It can be any object type, but it cannot be a reference.
Any POD type would be safe. Also, while a bit dangerous, I believe most non-POD types whose destructor is empty would work here as well. If the destructor is not empty, it will be called on buffer and try to access its data which is no longer vaild (due to the placement new).

What's the c++'s default assign operation behavior?

eg, it puzzles me:
struct A {
// some fileds...
char buf[SIZE];
};
A a;
a = a;
Through A's field buf, it looks like probably that the default assign operation will call something like memcpy to assign an object X to Y, so what if assign an object to itself and there are no explicit assign operation defined, like a = a; above.
memcpy manual page:
DESCRIPTION
The memcpy() function copies n bytes from memory area src to memory area dest. The memory areas must not overlap. Use memmove(3) if the memory areas do overlap.
If use memcpy, there may some undefined behavior occur.
So, what's the default assign operation behavior in C++ object?
The assignment operator is not defined in terms of memcpy (§12.8/28).
The implicitly-defined copy/move assignment operator for a non-union
class X performs memberwise copy/move assignment of its subobjects.
The direct base classes of X are assigned first, in the order of their
declaration in the base-specifier-list, and then the immediate
non-static data members of X are assigned, in the order in which they
were declared in the class definition. Let x be either the parameter
of the function or, for the move operator, an xvalue referring to the
parameter. Each subobject is assigned in the manner appropriate to its
type:
[...]
— if the subobject is an array, each element is assigned, in the
manner appropriate to the element type;
[...]
As you see, each char element will be assigned individually. That is always safe.
However, under the as-if rule, a compiler may replace this with a memmove because it has identical behaviour for a char array. It could also replace it with a memcpy if it can guarantee that memcpy will result in this same behaviour, even if theoretically such a thing is undefined. Compilers can rely on theoretically undefined behaviour; one of the reasons undefined behaviour exists is so that compilers can define it to whatever is more appropriate for their operation.
Actually, in this case a compiler could take the as-if rule even further and not do anything with the array at all, since that also results in the same behaviour.
Default assign (and copy) behaviour does not memcpy the whole class, which would break things. Each member is copied using their copy constructor or assignment operator (depending on operation). This is applied recursively for members and their members. When a basic data type is reached, it simply performs a straight copy of data, similar to memcpy. So an array of basic data types may be copied similar to memcpy, but the whole class is not. If you add std::string to your class its = operator would be called, alongside copy of array. If you used array of std::string, each string in your array will have their operator called. They won't memcpy.
Some limited experimentation tells me that g++ completely removes any attempt to copy a = a; [assuming it is obvious - I'm sure with sufficient messing about with pointers, it will eventually be possible to copy the same object over itself, and get undefined behaviour].
If use memcpy, there may some undefined behavior occur.
It's an implementation detail how the given class will be copied. Both memcpy() function and copy constructor will be converted into some machine code. However your objects in memory should not overlap because default assignment does not guarantee you'll have a proper result in case they overlap.
So, what's the default assign operation behavior in C++ object?
As in other responses, the behaviour is such that it will call assignments on all class/struct members recursively. However technically, as in your case, it may just copy whole block of memory, especially if your structure is POD (plain old data).