Is it legal to construct data members of a struct separately? - c++

class A;
class B;
//we have void *p pointing to enough free memory initially
std::pair<A,B> *pp=static_cast<std::pair<A,B> *>(p);
new (&pp->first) A(/*...*/);
new (&pp->second) B(/*...*/);
After the code above get executed, is *pp guaranteed to be in a valid state? I know the answer is true for every compiler I have tested, but the question is whether this is legal according to the standard and hence. In addition, is there any other way to obtain such a pair if A or B is not movable in C++98/03? (thanks to #StoryTeller-UnslanderMonica , there is a piecewise constructor for std::pair since C++11)

“Accessing” the members of the non-existent pair object is undefined behavior per [basic.life]/5; pair is never a POD-class (having user-declared constructors), so a pointer to its storage out of lifetime may not be used for its members. It’s not clear whether forming a pointer to the member is already undefined, or if the new is.
Neither is there a way to construct a pair of non-copyable (not movable, of course) types in C++98—that’s why the piecewise constructor was added along with move semantics.

A more simple question: is using a literal string well defined?
Not even that is, as its lifetime is not defined. You can't use a string literal in conforming code.
So the committee that never took the time to make string literals well defined obviously did not bother with specifying which objects of class type can be made to exist by placement new of its subobjects - polymorphic objects obviously cannot be created that way!
That standard did not even bother describing the semantic of union.
About lifetime the standard is all over the place, and that isn't just editorial: it reflects a deep disagreement between serious people about what begins a lifetime, what an object is, what an lvalue is, etc.
Notably people have all sorts of false or contradicting intuitions:
an infinite number of objects cannot be created by one call to malloc
an lvalue refers to an object
overlapping objects are against the object model
a unnamed object can only be created by new or by the compiler (temporaries)
...

Related

Is it really possible to separate storage allocation from object initialization?

From [basic.life/1]:
The lifetime of an object or reference is a runtime property of the object or reference. A variable is said to have vacuous initialization if it is default-initialized and, if it is of class type or a (possibly multi-dimensional) array thereof, that class type has a trivial default constructor. The lifetime of an object of type T begins when:
storage with the proper alignment and size for type T is obtained, and
its initialization (if any) is complete (including vacuous initialization) ([dcl.init]),
except that if the object is a union member or subobject thereof, its lifetime only begins if that union member is the initialized member in the union ([dcl.init.aggr], [class.base.init]), or as described in [class.union] and [class.copy.ctor], and except as described in [allocator.members].
From [dcl.init.general/1]:
If no initializer is specified for an object, the object is default-initialized.
From [basic.indet/1]:
When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced ([expr.ass]).
[Note 1: Objects with static or thread storage duration are zero-initialized, see [basic.start.static]. — end note]
Consider this C++ program:
int main() {
int i;
i = 3;
return 0;
}
Is initialization performed in the first statement int i; or second statement i = 3; of the function main according to the C++ standard?
I think it is the former, which performs vacuous initialization to an indeterminate value and therefore begins the lifetime of the object (the latter does not perform initialization, it performs assignment to the value 3). If that is so, is it really possible to separate storage allocation from object initialization?
If that is so, is it really possible to separate storage allocation from object initialization?
Yes:
void *ptr = malloc(sizeof(int));
ptr points to allocated storage, but no objects live in that storage (C++20 says that some objects may be there, but nevermind that now). Objects won't exist unless we create some there:
new(ptr) int;
You are getting confused between allocating storage for an object and initializing an object, and they are definitely not the same thing.
In your example, the object i is never initialized. As a local value, it has space reserved for its storage, but it is not initialized with any value.
The second line’s statement assigns a value of 3 to it. This again is not initialization.
Objects with global storage are required by the standard to be both allocated and initialized (to zero or whatever the default initializer does). All other objects are only initialized if the written language construct can support it.
C++ allocators work on this same distinction. The new operator, behind the scenes, both allocates and initializes objects, after which you may assign the object a new value. If you need to, though, you can use the underlying language constructs to manage the two things separately.
For most purposes you do not need to care about the difference between object initialization and assignment in your code. If you get to the point where it matters, you either already know how the concepts differ or need to learn really quickly.
Is initialization performed in the first statement int i; or second statement i = 3; of the function main according to the C++ standard?
The first. The second statement is assignment, not initialization. The second statement marks the point "until that value is replaced ([expr.ass])" from your quotes of the standard.
If [initialization is the first statement] is so, is it really possible to separate storage allocation from object initialization?
Yes, but not in a such a simple example as yours. A common example that comes to mind is a std::vector. Reserving capacity allocates storage space, but that storage is not initialized until an object is added to the vector.
std::vector<int> v; // Allocates and initializes the vector object.
v.reserve(1); // Ensures space has been allocated for an int object.
/*
At this point, the first contained element has space allocated, but has
not yet been initialized. If you want to do nutty things between allocation
and object initialization, this is the place to do it. Note that you are
not allowed to access the allocated space since it belongs to the vector.
You'd have to replicate the inner workings of a vector to do that...
*/
v.emplace_back(3); // Initializes the first contained object.
Quoting the standard
Short version:
There is nothing to quote because the standard does not explicitly prohibit all spurious actions. Compilers avoid spurious actions by their own volition.
Long version:
Strictly speaking, the standard does not guarantee that reserve() does not initialize anything. The requirements imposed on reserve() in [vector.capacity] are more focused on what must be done than on prohibiting spurious activity. The closest it comes to this guarantee is the requirement that the time complexity of reserve() be linear in the size of the container (not in the capacity, but in the current size). This would make it impossible to always initialize everything that was reserved. However, a compiler could still choose to initialize a fixed number of reserved elements, say up to 10 million of them. As long as this limit is fixed, it counts as constant-time complexity, so is allowed by [vector.capacity].
Now let's get real. Compilers are designed to produce fast code, without introducing unnecessary, useless busywork. Compilers do not seek out the possibility of doing additional work simply because the standard does not prohibit it. Except for debug builds, no compiler is going to introduce an initialization when it it not required. The people who view the possibility as something worth considering are language lawyers who lose sight of the big picture. You don't pay for what you don't need. The question to ask here is not "Could you quote the standard supporting that no initialization happens?" but "Could you quote the standard supporting that no initialization is required?" Since the additional work of initialization is not required, it will not happen in practice.
Still, reality means little to some language lawyers, and this question does have that tag. To be thorough, I will demonstrate that it is "possible to separate storage allocation from object initialization" even if you happen to use a pathological, yet standards-compliant, compiler that was over-engineered by masochists. I need only one case to demonstrate "possible", so let's abandon int for a more bizarre, yet fully legal, type.
The sole precondition for reserve() is that the contained type can be move-inserted into the container. This precondition is satisfied by the following class.
class C {
// Default construction is not supported.
C() = delete;
public:
// Move construction is allowed, even outside this class.
C(C &&) = default;
};
I have designed this class to be rather hard to initialize. The only allowed construction is move-construction; in order to initialize an object of this type, you need to already have an object of this type. Who creates the first object? No one. No objects of this type can exist. However, one can still create a vector of these objects (an empty vector, but still a vector).
It is legal to define std::vector<C> v;, and to follow that by a call to v.reserve(1);. This allocates space (1 byte is needed on my system) for an object of type C, and yet there is no possible initialization of this object. QED.

Does a member have to be initialized to take its address?

Can I initialize a pointer to a data member before initializing the member? In other words, is this valid C++?
#include <string>
class Klass {
public:
Klass()
: ptr_str{&str}
, str{}
{}
private:
std::string *ptr_str;
std::string str;
};
this question is similar to mine, but the order is correct there, and the answer says
I'd advise against coding like this in case someone changes the order of the members in your class.
Which seems to mean reversing the order would be illegal but I couldn't be sure.
Does a member have to be initialized to take its address?
No.
Can I initialize a pointer to a data member before initializing the member? In other words, is this valid C++?
Yes. Yes.
There is no restriction that operand of unary & need to be initialised. There is an example in the standard in specification of unary & operator:
int a;
int* p1 = &a;
Here, the value of a is indeterminate and it is OK to point to it.
What that example doesn't demonstrate is pointing to an object before its lifetime has begun, which is what happens in your example. Using a pointer to an object before and after its lifetime is explicitly allowed if the storage is occupied. Standard draft says:
[basic.life] Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that represents the address of the storage location where the object will be or was located may be used but only in limited ways ...
The rule goes on to list how the usage is restricted. You can get by with common sense. In short, you can treat it as you could treat a void*, except violating these restrictions is UB rather than ill-formed. Similar rule exists for references.
There are also restrictions on computing the address of non-static members specifically. Standard draft says:
[class.cdtor] ... To form a pointer to (or access the value of) a direct non-static member of an object obj, the construction of obj shall have started and its destruction shall not have completed, otherwise the computation of the pointer value (or accessing the member value) results in undefined behavior.
In the constructor of Klass, the construction of Klass has started and destruction hasn't completed, so the above rule is satisfied.
P.S. Your class is copyable, but the copy will have a pointer to the member of another instance. Consider whether that makes sense for your class. If not, you will need to implement custom copy and move constructors and assignment operators. A self-reference like this is a rare case where you may need custom definitions for those, but not a custom destructor, so it is an exception to the rule of five (or three).
P.P.S If your intention is to point to one of the members, and no object other than a member, then you might want to use a pointer to member instead of pointer to object.
Funny question.
It is legitimate and will "work", though barely. There is a little "but" related to types which makes the whole thing a bit awkward with a bad taste (but not illegitimate), and which might make it illegal some border cases involving inheritance.
You can, of course, take the address of any object whether it's initialized or not, as long as it exists in the scope and has a name which you can prepend operator& to. Dereferencing the pointer is a different thing, but that wasn't the question.
Now, the subtle problem is that the standard defines the result of operator& for non-static struct members as "“pointer to member of class C of type T” and is a prvalue designating C::m".
Which basically means that ptr_str{&str} will take the address of str, but the type is not pointer-to, but pointer-to-member-of. It is then implicitly and silently cast to pointer-to.
In other words, although you do not need to explicitly write &this->str, that's nevertheless what its type is -- it's what it is and what it means [1].
Is this valid, and is it safe to use it within the initializer list? Well yes, just... barely. It's safe to use it as long as it's not being used to access uninitialized members or virtual functions, directly or indirectly. Which, as it happens, is the case here (it might not be the case in a different, arguably contrived case).
[1] Funnily, paragraph 4 starts with a clause that says that no member pointer is formed when you put stuff in parentheses. That's remarkable because most people would probably do that just to be 100% sure they got operator precedence right. But if I read correctly, then &this->foo and &(this->foo) are not in any way the same!

Is it legal to use placement new on initialised memory?

I am exploring the possibility of implementing true (partially) immutable data structures in C++. As C++ does not seem to distinguish between a variable and the object that variable stores, the only way to truly replace the object (without assignment operation!) is to use placement new:
auto var = Immutable(state0);
// the following is illegal as it requires assignment to
// an immutable object
var = Immutable(state1);
// however, the following would work as it constructs a new object
// in place of the old one
new (&var) Immutable(state1);
Assuming that there is no non-trivial destructor to run, is this legal in C++ or should I expect undefined behaviour? If its standard-dependant, which is the minimal/maximal standard version where I can expect this to work?
Addendum: since it seems people still read this in 2019, a quick note — this pattern is actually legally possible in modern (post 17) C++ using std::launder().
What you wrote is technically legal but almost certainly useless.
Suppose
struct Immutable {
const int x;
Immutable(int val):x(val) {}
};
for our really simple immutable type.
auto var = Immutable(0);
::new (&var) Immutable(1);
this is perfectly legal.
And useless, because you cannot use var to refer to the state of the Immutable(1) you stored within it after the placement new. Any such access is undefined behavior.
You can do this:
auto var = Immutable(0);
auto* pvar1 = ::new (&var) Immutable(1);
and access to *pvar1 is legal. You can even do:
auto var = Immutable(0);
auto& var1 = *(::new (&var) Immutable(1));
but under no circumstance may you ever refer to var after you placement new'd over it.
Actual const data in C++ is a promise to the compiler that you'll never, ever change the value. This is in comparison to references to const or pointers to const, which is just a suggestion that you won't modify the data.
Members of structures declared const are "actually const". The compiler will presume they are never modified, and won't bother to prove it.
You creating a new instance in the spot where an old one was in effect violates this assumption.
You are permitted to do this, but you cannot use the old names or pointers to refer to it. C++ lets you shoot yourself in the foot. Go right ahead, we dare you.
This is why this technique is legal, but almost completely useless. A good optimizer with static single assignment already knows that you would stop using var at that point, and creating
auto var1 = Immutable(1);
it could very well reuse the storage.
Caling placement new on top of another variable is usually defined behaviour. It is usually a bad idea, and it is fragile.
Doing so ends the lifetime of the old object without calling the destructor. References and pointers to and the name of the old object refer to the new one if some specific assumptions hold (exact same type, no const problems).
Modifying data declared const, or a class containing const fields, results in undefined behaviour at the drop of a pin. This includes ending the lifetime of an automatic storage field declared const and creating a new object at that location. The old names and pointers and references are not safe to use.
[Basic.life 3.8]/8:
If, after the lifetime of an object has ended and before the storage which the object occupied is reused or
released, a new object is created at the storage location which the original object occupied, a pointer that
pointed to the original object, a reference that referred to the original object, or the name of the original
object will automatically refer to the new object and, once the lifetime of the new object has started, can be
used to manipulate the new object, if:
(8.1)
the storage for the new object exactly overlays the storage location which the original object occupied,
and
(8.2)
the new object is of the same type as the original object (ignoring the top-level cv-qualifiers), and
(8.3)
the type of the original object is not const-qualified, and, if a class type, does not contain any non-static
data member whose type is const-qualified or a reference type, and
(8.4)
the original object was a most derived object (1.8) of type
T
and the new object is a most derived
object of type
T
(that is, they are not base class subobjects).
In short, if your immutability is encoded via const members, using the old name or pointers to the old content is undefined behavior.
You may use the return value of placement new to refer to the new object, and nothing else.
Exception possibilities make it extremely difficult to prevent code that exdcutes undefined behaviour or has to summarially exit.
If you want reference semantics, either use a smart pointer to a const object or an optional const object. Both handle object lifetime. The first requires heap allocation but permits move (and possibly shared references), the second permits automatic storage. Both move manual object lifetime management out of business logic. Now, both are nullable, but avoiding that robustly is difficult doing it manually anyhow.
Also consider copy on write pointers that permit logically const data with mutation for efficiency purposes.
From the C++ standard draft N4296:
3.8 Object lifetime
[...]
The lifetime of an object of type T ends when:
(1.3) — if T is a class
type with a non-trivial destructor (12.4), the destructor call starts,
or
(1.4) — the storage which the object occupies is reused or
released.
[...]
4 A program may end the lifetime of any object by
reusing the storage which the object occupies or by explicitly calling
the destructor for an object of a class type with a non-trivial
destructor. For an object of a class type with a non-trivial
destructor, the program is not required to call the destructor
explicitly before the storage which the object occupies is reused or
released; however, if there is no explicit call to the destructor or
if a delete-expression (5.3.5) is not used to release the storage, the
destructor shall not be implicitly called and any program that depends
on the side effects produced by the destructor has undefined behavior.
So yes, you can end the lifetime of an object by reusing its memory, even of one with non-trivial destructor, as long as you don't depend on the side effects of the destructor call.
This applies when you have non-const instances of objects like struct ImmutableBounds { const void* start; const void* end; }
You've actually asked 3 different questions :)
1. The contract of immutability
It's just that - a contract, not a language construct.
In Java for instance, instances of String class are immutable. But that means that all methods of the class have been designed to return new instances of class rather than modifying the instance.
So if you would like to make Java's String into a mutable object, you couldn't, without having access to its source code.
Same applies to classes written in C++, or any other language. You have an option to create a wrapper (or use a Proxy pattern), but that's it.
2. Using placement constructor and allocating into an initialized are off memory.
That's actually what they were created to do in the first place.
The most common use case for the placement constructor are memory pools - you preallocate a large memory buffer, and then you allocate your stuff into it.
So yes - it is legal, and nobody won't mind.
3. Overwriting class instance's contents using a placement allocator.
Don't do that.
There's a special construct that handles this type of operation, and it's called a copy constructor.

Managing trivial types

I have found the intricacies of trivial types in C++ non-trivial to understand and hope someone can enlighten me on the following.
Given type T, storage for T allocated using ::operator new(std::size_t) or ::operator new[](std::size_t) or std::aligned_storage, and a void * p pointing to a location in that storage suitably aligned for T so that it may be constructed at p:
If std::is_trivially_default_constructible<T>::value holds, is the code invoking undefined behavior when code skips initialization of T at p (i.e. by using T * tPtr = new (p) T();) before otherwise accessing *p as T? Can one just use T * tPtr = static_cast<T *>(p); instead without fear of undefined behavior in this case?
If std::is_trivially_destructible<T>::value holds, does skipping destruction of T at *p (i.e by calling tPtr->~T();) cause undefined behavior?
For any type U for which std::is_trivially_assignable<T, U>::value holds, is std::memcpy(&t, &u, sizeof(U)); equivalent to t = std::forward<U>(u); (for any t of type T and u of type U) or will it cause undefined behavior?
No, you can't. There is no object of type T in that storage, and accessing the storage as if there was is undefined. See also T.C.'s answer here.
Just to clarify on the wording in [basic.life]/1, which says that objects with vacuous initialization are alive from the storage allocation onward: that wording obviously refers to an object's initialization. There is no object whose initialization is vacuous when allocating raw storage with operator new or malloc, hence we cannot consider "it" alive, because "it" does not exist. In fact, only objects created by a definition with vacuous initialization can be accessed after storage has been allocated but before the vacuous initialization occurs (i.e. their definition is encountered).
Omitting destructor calls never per se leads to undefined behavior. However, it's pointless to attempt any optimizations in this area in e.g. templates, since a trivial destructor is just optimized away.
Right now, the requirement is being trivially copyable, and the types have to match. However, this may be too strict. Dos Reis's N3751 at least proposes distinct types to work as well, and I could imagine this rule being extended to trivial copy assignment across one type in the future.
However, what you've specifically shown does not make a lot of sense (not least because you're asking for assignment to a scalar xvalue, which is ill-formed), since trivial assignment can hold between types whose assignment is not actually "trivial", that is, has the same semantics as memcpy. E.g. is_trivially_assignable<int&, double> does not imply that one can be "assigned" to the other by copying the object representation.
Technically reinterpreting storage is not enough to introduce a new object as. Look at the note for Trivial default constructor states:
A trivial default constructor is a constructor that performs no action. All data types compatible with the C language (POD types) are trivially default-constructible. Unlike in C, however, objects with trivial default constructors cannot be created by simply reinterpreting suitably aligned storage, such as memory allocated with std::malloc: placement-new is required to formally introduce a new object and avoid potential undefined behavior.
But the note says it's a formal limitation, so probably it is safe in many cases. Not guaranteed though.
No. is_assignable does not even guarantee the assignment will be legal under certain conditions:
This trait does not check anything outside the immediate context of the assignment expression: if the use of T or U would trigger template specializations, generation of implicitly-defined special member functions etc, and those have errors, the actual assignment may not compile even if std::is_assignable::value compiles and evaluates to true.
What you describe looks more like is_trivially_copyable, which says:
Objects of trivially-copyable types are the only C++ objects that may be safely copied with std::memcpy or serialized to/from binary files with std::ofstream::write()/std::ifstream::read().
I don't really know. I would trust KerrekSB's comments.

What are some of the disadvantages of using a reference instead of a pointer?

Given a class "A" exists and is correct. What would be some of the negative results of using a reference to "A" instead of a pointer in a class "B". That is:
// In Declaration File
class A;
class B
{
public:
B();
~B();
private:
A& a;
};
// In Definition File
B::B(): a(* new A())
{}
B::~B()
{
delete &a;
}
Omitted extra code for further correctness of "B", such as the copy constructor and assignment operator, just wanted to demonstrate the concept of the question.
The immediate limitations are that:
You cannot alter a reference's value. You can alter the A it refers to, but you cannot reallocate or reassign a during B's lifetime.
a must never be 0.
Thus:
The object is not assignable.
B should not be copy constructible, unless you teach A and its subtypes to clone properly.
B will not be a good candidate as an element of collections types if stored as value. A vector of Bs would likely be implemented most easily as std::vector<B*>, which may introduce further complications (or simplifications, depending on your design).
These may be good things, depending on your needs.
Caveats:
slicing is another problem to be aware of if a is assignable and assignment is reachable within B.
You can't change the object referred to by a after the fact, e.g. on assignment. Also, it makes your type non-POD (the type given would be non-POD anyway due to the private data member anyway, but in some cases it might matter).
But the main disadvantage is probably it might confuse readers of your code.
Of course, by adding a reference member to your class B means that the compiler can no longer generate the implicit default and copy constructors, and assignment operators; and neither a manually written assignment operator can reasign a.
I don't think there are negative results, other than the fact that delete &a may look odd. The fact that the object was created by new is somewhat lost by binding the result to a reference, and it may only matter since the fact that its lifetime has to be controlled by B is not clear.
If you use a reference:
You must provide the value at construction time
You cannot change what it refers to
It cannot be null
It will prevent your class from being assignable
You might perhaps consider using a smart pointer of some sort instead (std::unique_ptr, std::shared_ptr, etc). This has the added benefit of automatically deleting the object for you.
A reference to a dynamically-allocated object violates the principle of least surprise; that is, no one normally expects code written as you have. In general that will make the maintenance cost higher in the future.
The problem with references is that it looks like a alias or other name for same variable but it you look at the machine code generated internally they use constant pointers to do operations, and this can be a performance issue because you may assume within a code that variable arithmetic is taking place but at assembly code level they are manipulating pointers which you may not realize at code level.
You may not want to use pointer manipulation where you want fast integer or float arithmetic.