Class that holds a reference to itself - c++

Skimming through the standard draft (n3242) I found this sentence in Clause 9.2 (emphasis mine):
Non-static (9.4) data members shall not have incomplete types. In particular, a class C shall not contain a non-static member of class C, but it can contain a pointer or reference to an object of class C.
From this I argue that it is fine to define a class like this:
class A {
public:
A(A& a) : a_(a){
}
private:
A& a_;
};
Then in clause 8.3.2 I found the following:
A reference shall be initialized to refer to a valid object or function
Question 1: Is it permitted to define an object of this type passing its name as a reference:
A a(a);
or will this trigger undefined behavior?
Question 2: If yes, what are the parts of the standard that permit the initialization of the reference from a still-to-be-constructed object?
Question 3: If no, does this mean the definition of class A is well formed but no first object can be created without triggering UB? In this case what is the rationale behind this?

"valid object" is not defined anywhere in the standard, but it is intended to mean a region of memory with appropriate size and alignment that can contain an object of the specified type. It just means to exclude references to such things as dereferenced null pointers, misaligned regions of memory, etc. An uninitialised object is valid.
There is an open issue to clear up the wording, CWG 453.

n3337 § 3.8/6
Similarly, before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object may be used but only in limited ways. For an object under construction or destruction, see 12.7. Otherwise, such a glvalue refers to allocated storage (3.7.4.2), and using the properties of the glvalue that do not depend on its value is well-defined. The program has undefined behavior if:
— an lvalue-to-rvalue conversion (4.1) is applied to such a glvalue,
— the glvalue is used to access a non-static data member or call a non-static member function of the object, or
— the glvalue is implicitly converted (4.10) to a reference to a base class type, or
— the glvalue is used as the operand of a static_cast (5.2.9) except when the conversion is ultimately to cv char& or cv unsigned char&, or
— the glvalue is used as the operand of a dynamic_cast (5.2.7) or as the operand of typeid.
So, to answer your questions:
Question 1: Is it permitted to define an object of this type passing its name as a reference?
A a(a);
or will this trigger undefined behavior?
Yes, it is permitted, and no, it does not trigger undefined behavior. Using just the address does not seem to violate any of the restrictions above (at least for a variable on the stack).
Question 2: If yes, what are the parts of the standard that permit the initialization of the reference from a still-to-be-constructed object?
§ 3.8/6 (above)
The only question that remains is how this corresponds to
A reference shall be initialized to refer to a valid object or function.
The problem lies in the term "valid object". Because § 8.3.2/4 says that
It is unspecified whether or not a reference requires storage
it seems that § 8.3.2 is problematic and should be reworded. The confusion led to the change proposed in the document C++ Standard Core Language Active Issues, Revision 87, dated 20.01.2014:
A reference shall be initialized to refer to an object or function.
Change 8.3.2 [dcl.ref] paragraph 4 as follows:
If an lvalue to which a reference is directly bound designates neither an existing object or function of an appropriate type (8.5.3 [dcl.init.ref]), nor a region of memory of suitable size and alignment to contain an object of the reference's type (1.8 [intro.object], 3.8 [basic.life], 3.9 [basic.types]), the behavior is undefined.
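A minimal sketch of the class from the question, assuming the reading given in this answer: binding the reference only uses the object's address, which § 3.8/6 permits. The accessor `target()` and the helper `refers_to_itself()` are hypothetical additions for illustration; they only compare addresses and never read the object's value before it is constructed.

```cpp
#include <cassert>

// The class from the question: it stores a reference to an A.
class A {
public:
    explicit A(A& a) : a_(a) {}              // binds the reference; does not read a's value
    const A* target() const { return &a_; }  // hypothetical accessor: address only
private:
    A& a_;
};

// Hypothetical helper: defines the self-referencing object from Question 1.
bool refers_to_itself() {
    A a(a);                   // 'a' is in scope right after its declarator
    return a.target() == &a;  // called after construction; compares addresses only
}
```

Since `A` has no default constructor, the self-referencing definition is the only way to create the first object, which is exactly the situation Question 3 asks about.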

From n1905, 3.3.1.1
The point of declaration for a name is immediately after its complete declarator (clause 8) and before its initializer (if any), except as noted below.
[ Example:
int x = 12;
{ int x = x; }
Here the second x
is initialized with its own (indeterminate) value.
—end example ]
My interpretation (correct me if I am wrong): in your example
A a(a);
is equivalent to
A a = a; // Copy initialization
So, according to the standard, a is initialized with its own indeterminate value, and the member holds a reference to that indeterminate value.
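The point-of-declaration rule quoted above can be observed without reading an indeterminate value: inside its own initializer the new name already refers to the variable being defined, so `sizeof` and the address-of operator see the inner variable rather than the outer one. This is a small sketch; the namespace `demo` and both function names are made up for the illustration.

```cpp
#include <cassert>

namespace demo {

int x = 12;  // outer x, as in the standard's example

// The inner x is declared before its initializer runs, so sizeof(x)
// measures the inner long long, not the outer int.
bool initializer_sees_inner_name() {
    long long x = sizeof(x);
    return x == static_cast<long long>(sizeof(long long));
}

// Likewise, &p is the address of the very variable being defined.
bool pointer_points_to_itself() {
    void* p = &p;
    return p == static_cast<void*>(&p);
}

}  // namespace demo
```

Both functions only use the name in ways that do not depend on the variable's value (an unevaluated operand and taking the address), so they stay well defined even though the initializer mentions the variable itself.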

Related

C++ lifetime of union member

In the current version of the C++ standard draft, [basic.life]/1 states:
The lifetime of an object or reference is a runtime property of the object or reference. A variable is said to have vacuous initialization if it is default-initialized and, if it is of class type or a (possibly multi-dimensional) array thereof, that class type has a trivial default constructor. The lifetime of an object of type T begins when:
storage with the proper alignment and size for type T is obtained, and
its initialization (if any) is complete (including vacuous initialization) ([dcl.init]),
except that if the object is a union member or subobject thereof, its lifetime only begins if that union member is the initialized member in the union ([dcl.init.aggr], [class.base.init]), or as described in [class.union]. [...]
From that paragraph I understand that the only way a member of a union begins its lifetime is if:
that member "is the initialized member in the union" (e.g. if it is referenced in a mem-initializer), or
some other way mentioned in [class.union]
However, the only normative paragraph in [class.union] that specifies how a union member can begin its lifetime is [class.union]/5 (but it only applies to specific types, i.e. either non-class, non-array, or class type with a trivial constructor that is not deleted, or array of such types).
The next paragraph, [class.union]/6 (comprising a note and an example, therefore it contains no normative text), describes a way to change the active member of a union, by using a placement new-expression, such as new (&u.n) N;, where
struct N { N() { /* non-trivial constructor */ } };
struct M { M() { /* non-trivial constructor */ } };
union
{
N n;
M m;
} u;
My question is where in the standard is it specified that new (&u.n) N; begins the lifetime of u.n?
Thank you!
An important rule regarding this is:
[class.union]/1
In a union, a non-static data member is active if its name refers to an object whose lifetime has begun and has not ended ([basic.life]). ...
As far as this rule is concerned, the active member could change any time a member object begins its lifetime. The rule [class.union]/5 further allows changing the active member by assigning to a non-active member of a limited set of types. The lack of a separate rule for placement new does not by itself disallow changing the member. If placement new begins the lifetime of the member, then that member is the active member of the union.
So, [basic.life]/1 says that the lifetime of the member begins only if [class.union] says so[1], and [class.union]/1 says that the member is active only if its lifetime has begun[2]. This does seem like a bit of a catch-22.
My best attempt at reading the rules in a way that makes sense is to assume that placement new begins the lifetime of the member; then [class.union]/1 applies, so "or as described in [class.union]" applies, and therefore the highlighted exception doesn't apply. Next I would like to conclude that the lifetime therefore begins, but that logic is circular.
The non-normative [class.union]/6 makes it quite clear that the placement new is intended to be allowed, but the normative rules are tangled. I would say that the wording could be improved.
[1] (or when the union is initialised with that member, which isn't the case we are considering)
[2] (or after assignment as per [class.union]/5, which isn't the case we are considering)
My question is where in the standard is it specified that new (&u.n) N; begins the lifetime of u.n?
Nowhere. Placement new creates a new object, which becomes a union member subobject per [intro.object]/2:
If an object is created in storage associated with a member subobject or array element e (which may or may not be within its lifetime), the created object is a subobject of e's containing object if:
— the lifetime of e's containing object has begun and not ended, and
— the storage for the new object exactly overlays the storage location associated with e, and
— the new object is of the same type as e (ignoring cv-qualification).
C++ can't define unions at that point, because:
lvalues by definition must refer to an object
objects have a lifetime; it isn't clear what a pre-lifetime object is
in theory two unrelated objects can't be at the same address, but pre-lifetime objects can, so there is no such thing as a pre-lifetime object
So mutable unions can't be well defined in C++, end of story.
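For contrast with the strict reading above, here is a sketch of the pattern that the non-normative [class.union] example intends to allow: a union whose member has non-trivial special member functions, with the active member switched by ending one member's lifetime and placement-new'ing the other. It relies on the intended reading (placement new creates a union member subobject, per the [intro.object]/2 quote), not on a rule the answers above agree is spelled out. The union `U`, its members and `switch_members()` are hypothetical; `std::string` stands in for the question's `N`/`M` types.

```cpp
#include <cassert>
#include <new>
#include <string>

// A union with a member whose special member functions are non-trivial, so the
// union needs a user-provided constructor and destructor.
union U {
    int n;
    std::string s;
    U() : n(0) {}  // n is the initially active member
    ~U() {}        // does nothing: the code using U destroys the active member
};

// Hypothetical helper: switches n -> s -> n and returns what was stored in s.
std::string switch_members() {
    using S = std::string;
    U u;                                // n is active
    new (&u.s) S("hello");              // int is trivially destructible, so reuse its storage
    std::string copy = u.s;             // s is now the active member
    u.s.~S();                           // end s's lifetime before switching back
    new (&u.n) int(42);                 // n is active again
    assert(u.n == 42);
    return copy;                        // ~U() runs with the trivially destructible n active
}
```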

Does copying an empty object involve accessing it

Inspired by this question.
struct E {};
E e;
E f(e); // Accesses e?
To access is to
read or modify the value of an object
The empty class has an implicitly defined copy constructor
The implicitly-defined copy/move constructor for a non-union class X performs a memberwise copy/move of its bases and members. [...] The order of initialization is the same as the order of initialization of bases and members in a user-defined constructor. Let x be either the parameter of the constructor or, for the move constructor, an xvalue referring to the parameter. Each base or non-static data member is copied/moved in the manner appropriate to its type:
[...] the base or member is direct-initialized with the corresponding base or member of x.
I think that the part of the standard that describes most precisely what performs an access is [basic.life]. This paragraph explains what can be done with a reference that refers to, or a pointer that points to, an object that is outside its lifetime. Anything that is allowed with such entities does not perform an access to the object's value, since that value does not exist (otherwise the standard would be inconsistent).
So we can take a more drastic example: if the following is not undefined behavior, then there is no access to e in your example code either (according to the reasoning above):
struct E{
E()=default;
E(const E&){}
};
E e;
e.~E();
E f(e);
Here e is an object whose lifetime has ended but whose storage is still allocated. What can be done with such an lvalue is described in [basic.life]/6
Similarly, before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object may be used but only in limited ways. For an object under construction or destruction, see [class.cdtor]. Otherwise, such a glvalue refers to allocated storage ([basic.stc.dynamic.deallocation]), and using the properties of the glvalue that do not depend on its value is well-defined. The program has undefined behavior if:
an lvalue-to-rvalue conversion ([conv.lval]) is applied to such a glvalue,
the glvalue is used to access a non-static data member or call a non-static member function of the object, or
the glvalue is implicitly converted ([conv.ptr]) to a reference to a base class type, or
the glvalue is used as the operand of a static_cast ([expr.static.cast]) except when the conversion is ultimately to cv char& or cv unsigned char&, or
the glvalue is used as the operand of a dynamic_cast ([expr.dynamic.cast]) or as the operand of typeid.
None of the points cited above happens inside E's copy constructor, so the example code in this answer is well defined, which implies that there has been no access to the value of the destroyed object. So there is no access to e in your example code.
I think it does not access the object, though a valid object is required to be present.
E f(e);
This calls E's implicitly defined constructor E::E(const E&). Obviously the body of this constructor is empty (because there is nothing to do). So if anything happens, it must happen during argument passing, i.e. during the initialization of const E& from e.
It is self-evident that this initialization does not modify e. Now, to read the value of e, an lvalue-to-rvalue conversion must take place. However, the standard actually says that this conversion does not take place during direct reference binding[1]. That is to say, no read is performed.
However, the standard does require that a reference must be initialized to refer to a valid object or function[2] (though this is subject to CWG 453), so things like E f(*reinterpret_cast<E*>(nullptr)); will have undefined behavior.
1. This is done by not normatively requiring such conversion, and further strengthened by the non-normative note in [dcl.init.ref].
2. [dcl.ref].
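The claim that the implicitly-defined copy constructor of an empty class has nothing to copy can be checked at compile time with the standard type traits. This sketch only illustrates that point; `copy_empty()` is a hypothetical helper mirroring the question's code, and the statement that binding `const E&` performs no lvalue-to-rvalue conversion is this answer's reading of the standard, not something the code itself proves.

```cpp
#include <cassert>
#include <type_traits>

struct E {};

// E has no non-static data members, and its implicit copy constructor is trivial:
// there are no bases or members to copy, only the const E& parameter to bind.
static_assert(std::is_empty<E>::value, "E is an empty class");
static_assert(std::is_trivially_copy_constructible<E>::value,
              "E's implicitly-defined copy constructor is trivial");

// Hypothetical helper mirroring the question: copy e into f.
bool copy_empty() {
    E e;
    E f(e);          // calls E::E(const E&); the parameter binds directly to e
    return &e != &f; // f is a distinct object with its own address
}
```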

Creating an invalid reference via reinterpret cast

I am trying to determine whether the following code invokes undefined behavior:
#include <iostream>
class A;
void f(A& f)
{
    char* x = reinterpret_cast<char*>(&f);
    for (int i = 0; i < 5; ++i)
        std::cout << x[i];
}
int main(int argc, char** argv)
{
    A* a = reinterpret_cast<A*>(new char[5]);
    f(*a);
}
My understanding is that reinterpret_casts to and from char* are compliant because the standard permits aliasing with char and unsigned char pointers (emphasis mine):
If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
However, I am not sure whether f(*a) invokes undefined behavior by creating an A& reference through the invalid pointer. The deciding factor seems to be what the "attempts to access" verbiage means in the context of the C++ standard.
My intuition is that this does not constitute an access, since an access would require A to be defined (it is declared, but not defined in this example). Unfortunately, I cannot find a concrete definition of "access" in the C++ standard.
Does f(*a) invoke undefined behavior? What constitutes "access" in the C++ standard?
I understand that, regardless of the answer, it is likely a bad idea to rely on this behavior in production code. I am asking this question primarily out of a desire to improve my understanding of the language.
[Edit] @SergeyA cited this section of the standard. I've included it here for easy reference (emphasis mine):
5.3.1/1 [expr.unary.op]
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. If the type of the expression is “pointer to T,” the type of the result is “T.” [Note: indirection through a pointer to an incomplete type (other than cv void) is valid. The lvalue thus obtained can be used in limited ways (to initialize a reference, for example); this lvalue must not be converted to a prvalue, see 4.1. — end note ]
Tracing the reference to 4.1, we find:
4.1/1 [conv.lval]
A glvalue (3.10) of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If T is a non-class type, the type of the prvalue is the cv-unqualified version of T. Otherwise, the type of the prvalue is T.
When an lvalue-to-rvalue conversion is applied to an expression e, and either:
e is not potentially evaluated, or
the evaluation of e results in the evaluation of a member ex of the set of potential results of e, and ex names a variable x that is not odr-used by ex (3.2)
the value contained in the referenced object is not accessed.
I think our answer lies in whether *a satisfies the second bullet point. I am having trouble parsing that condition, so I am not sure.
char* x = reinterpret_cast<char*>(&f); is valid. Or, more specifically, access through x is allowed - the cast itself is always valid.
A* a = reinterpret_cast<A*>(new char[5]) is not valid - or, to be precise, access through a will trigger undefined behaviour.
The reason for this is that while it's OK to access an object through a char*, it's not OK to access an array of chars through an lvalue of some unrelated object type. The Standard allows the first, but not the second.
Or, in layman terms, you can alias a type* through char*, but you can't alias char* through type*.
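A sketch of the two directions this answer distinguishes, using `int` as the object type (an assumption for illustration). Reading an existing object's bytes through `unsigned char` is the allowed direction. For the opposite direction (treating a raw byte buffer as some other type), a common portable alternative is to create a real object and copy the bytes into it with `std::memcpy`. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Allowed: examine the bytes of an existing int through unsigned char.
std::size_t count_nonzero_bytes(const int& value) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    std::size_t count = 0;
    for (std::size_t i = 0; i < sizeof value; ++i)
        if (bytes[i] != 0)
            ++count;
    return count;
}

// Not allowed (per the answer): reinterpret_cast<int*>(new char[sizeof(int)]) and then
// using the result as an int. Instead, copy the bytes into a real int object.
int int_from_bytes(const unsigned char* buf) {
    int result;
    std::memcpy(&result, buf, sizeof result);
    return result;
}

// Round trip: int -> bytes -> int.
int roundtrip(int v) {
    unsigned char buf[sizeof(int)];
    std::memcpy(buf, &v, sizeof v);
    return int_from_bytes(buf);
}
```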
EDIT
I just noticed I didn't answer the direct question ("What constitutes "access" in the C++ standard"). Apparently, the Standard does not define access (at least, I was not able to find a formal definition), but dereferencing a pointer is commonly understood to qualify as an access.

Can initializing expression use the variable itself?

Consider the following code:
#include <iostream>
struct Data
{
int x, y;
};
Data fill(Data& data)
{
data.x=3;
data.y=6;
return data;
}
int main()
{
Data d=fill(d);
std::cout << "x=" << d.x << ", y=" << d.y << "\n";
}
Here d is copy-initialized from the return value of fill(), but fill() writes to d itself before returning its result. What I'm concerned about is that d is non-trivially used before being initialized, and use of uninitialized variables in some (all?) cases leads to undefined behavior.
So is this code valid, or does it have undefined behavior? If it's valid, will the behavior become undefined once Data stops being POD or in some other case?
This does not seem like valid code. It is similar to the case outlined in the question "Is passing a C++ object into its own constructor legal?", although in that case the code was valid. The mechanics are not identical, but the basic reasoning can at least get us started.
We start with defect report 363 which asks:
And if so, what is the semantics of the self-initialization of UDT?
For example
#include <stdio.h>
struct A {
A() { printf("A::A() %p\n", this); }
A(const A& a) { printf("A::A(const A&) %p %p\n", this, &a); }
~A() { printf("A::~A() %p\n", this); }
};
int main()
{
A a=a;
}
can be compiled and prints:
A::A(const A&) 0253FDD8 0253FDD8
A::~A() 0253FDD8
and the proposed resolution was:
3.8 [basic.life] paragraph 6 indicates that the references here are valid. It's permitted to take the address of a class object before it is fully initialized, and it's permitted to pass it as an argument to a reference parameter as long as the reference can bind directly.
[...]
So although d is not fully initialized we can pass it as a reference.
Where we start to get into trouble is here:
data.x=3;
The draft C++ standard section 3.8 (the same section and paragraph the defect report quotes) says (emphasis mine):
Similarly, before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object may be used but only in limited ways. For an object under construction or destruction, see 12.7. Otherwise, such a glvalue refers to allocated storage (3.7.4.2), and using the properties of the glvalue that do not depend on its value is well-defined. The program has undefined behavior if:
an lvalue-to-rvalue conversion (4.1) is applied to such a glvalue,
the glvalue is used to access a non-static data member or call a non-static member function of the object, or
the glvalue is bound to a reference to a virtual base class (8.5.3), or
the glvalue is used as the operand of a dynamic_cast (5.2.7) or as the operand of typeid.
So what does access mean? That was clarified with defect report 1531 which defines access as:
access
to read or modify the value of an object
So fill accesses a non-static data member and hence we have undefined behavior.
This also agrees with section 12.7 which says:
[...] To form a pointer to (or access the value of) a direct non-static member of an object obj, the construction of obj shall have started and its destruction shall not have completed, otherwise the computation of the pointer value (or accessing the member value) results in undefined behavior.
Since you are making a copy anyway, you might as well create an instance of Data inside fill and initialize that. Then you avoid having to pass d.
As pointed out by T.C., it is important to quote explicitly the details of when lifetime starts. From section 3.8:
The lifetime of an object is a runtime property of the object. An object is said to have non-trivial initialization if it is of a class or aggregate type and it or one of its members is initialized by a constructor other than a trivial default constructor. [ Note: initialization by a trivial copy/move constructor is non-trivial initialization. — end note ] The lifetime of an object of type T begins when:
storage with the proper alignment and size for type T is obtained, and
if the object has non-trivial initialization, its initialization is complete.
The initialization is non-trivial since we are initializing via the copy constructor.
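The suggestion made in this answer, creating the Data instance inside fill instead of passing d, could look like the following sketch. `make_data()` is a hypothetical stand-in for the question's main(), so the result can be checked without printing.

```cpp
#include <cassert>

struct Data {
    int x, y;
};

// fill() builds and returns its own object, so nothing is written through a
// reference to a variable that is still being initialized.
Data fill() {
    Data data;
    data.x = 3;
    data.y = 6;
    return data;
}

// Hypothetical stand-in for main() in the question.
Data make_data() {
    Data d = fill();  // d is initialized from a fully built object
    return d;
}
```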
I don't see a problem. Accessing the uninitialized integer members is valid, because you're accessing for the purpose of writing. Reading them would cause UB.
I think it is valid (crazy, but valid).
This would be both legal and logically acceptable:
Data d;
d = fill(d);
and the fact is that this form is the same:
Data d = fill(d);
As far as the logical structure of the language is concerned those two versions are equivalent.
So it's legal and logically correct for the language.
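For comparison, the two-step form described in this answer might look like the sketch below, here with d value-initialized to zeros before fill() runs, so fill() writes to an object whose lifetime has already begun. `fill()` is copied from the question; `two_step()` is a hypothetical wrapper.

```cpp
#include <cassert>

struct Data {
    int x, y;
};

// fill() exactly as in the question: writes through the reference, returns a copy.
Data fill(Data& data) {
    data.x = 3;
    data.y = 6;
    return data;
}

// Two-step form: d already exists (value-initialized to {0, 0}) before fill() writes to it.
Data two_step() {
    Data d{};
    d = fill(d);
    return d;
}
```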
However, since we normally expect people to initialize variables to a default when they create them (for safety), it is bad programming practice.
It is interesting that g++ -Wall compiles this code without a single warning.

What is "a value not associated with an object"?

The C++11 and C++14 standard (and working draft, respectively) say in §3.10.1:
A prvalue (“pure” rvalue) is an rvalue that is not an xvalue. [Example: The result of calling a function whose return type is not a reference is a prvalue. The value of a literal such as 12, 7.3e5, or true is also a prvalue. —end example ]
and
An rvalue (so called, historically, because rvalues could appear on the right-hand side of an assignment expression) is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.
Which leads me to the question: How can an expression be "a value not associated with an object"?
I was under the impression that it is the purpose of expressions to return objects or void (which I do not expect to be a value either).
Is there some simple and common example for such expressions?
Edit 1
To further complicate things, consider the following:
int const& x = 3;
int&& y = 4;
In the context of §8.3.2.5, which contains the most interesting snippet:
[...] A reference shall be initialized to refer to a valid object or function [...]
Which is reinforced by §8.5.3.1:
A variable declared to be a T& or T&&, that is, “reference to type T” (8.3.2), shall be initialized by an object, or function, of type T or by an object that can be converted into a T. [...]
[intro.object]:
The constructs in a C++ program create, destroy, refer to, access, and manipulate objects. An object is a region of storage. [ Note: A function is not an object, regardless of whether or not it occupies storage in the way that objects do. —end note ] An object is created by a definition (3.1), by a new-expression (5.3.4) or by the implementation (12.2) when needed.
So "a value not associated with an object" is something that is created neither by a definition nor by a new-expression, which also means that it doesn't have a corresponding region of storage; a literal is an example.
Edit: Except string literals (see comments)
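To connect this with Edit 1 of the question: the literals 3 and 4 are values without an object, but binding them to a reference makes the implementation create a temporary object (the "by the implementation (12.2) when needed" case in the quote), and the reference then refers to that object. A small sketch; `bound_temporary()` and `const_ref_has_address()` are hypothetical names.

```cpp
#include <cassert>

// Binding int&& to the prvalue 4 creates a temporary int object whose lifetime
// is extended to that of the reference; y refers to that object, not to the literal.
int bound_temporary() {
    int&& y = 4;
    y = 5;  // modifies the temporary object
    return y;
}

// The same holds for a const lvalue reference: x refers to a temporary holding 3,
// so it has an address like any other object.
bool const_ref_has_address() {
    int const& x = 3;
    const int* p = &x;
    return *p == 3;
}
```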
Examples for such values are all non-array, non-class non-temporary prvalues (a temporary prvalue corresponds to a temporary object). Examples include 2.0 and 1. Counterexamples include "hello" (which is an array), std::string("haha") (which is a class object) or the float prvalue temporary initialized from 2 that is bound to the reference in (const float&){2} (the reference itself is an lvalue!). I think that this simple rule covers the rules accurately.
A footnote in the C++ Standard on the lvalue-to-rvalue conversion says (a little outdated, because it was not amended to mention array types):
In C++ class prvalues can have cv-qualified types (because they are objects). This differs from ISO C, in which non-lvalues never have cv-qualified types.
So the deeper reason that decltype((const int)0) still is type int is that it does not refer to an object. So because there is no object, there is nothing to make const, and consequently the expression will never be const either.
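The `decltype((const int)0)` observation can be checked at compile time. The sketch below contrasts it with a class prvalue, which is an object and therefore keeps its cv-qualification; the struct `S` is a hypothetical example type, and the checks assume C++11 or later.

```cpp
#include <type_traits>

struct S {};

// Non-class prvalue: there is no object, so the const in the cast is dropped.
static_assert(std::is_same<decltype((const int)0), int>::value,
              "a non-class prvalue never has a cv-qualified type");

// Class prvalue: it is an object, so the const is kept.
static_assert(std::is_same<decltype(static_cast<const S>(S{})), const S>::value,
              "a class prvalue can have a cv-qualified type");

// An arithmetic expression such as 1 + 1 is likewise a plain int prvalue.
static_assert(std::is_same<decltype(1 + 1), int>::value, "1 + 1 is an int prvalue");
```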
This quote is not as precisely worded as it could be:
An rvalue (so called, historically, because rvalues could appear on the right-hand side of an assignment expression) is an xvalue, a temporary object (12.2) or subobject thereof, or a value that is not associated with an object.
An rvalue is an expression, so it cannot be an object (temporary or otherwise). The intent of the part of this quote that talks about temporary objects is to say that the value resulting from evaluating the rvalue is a temporary object, and so on.
This is a common shortcut, e.g. with int x; we would casually say "x is an int", when in fact x is an identifier, and the expression x has type int and designates an int.
Anyway, it divides possible rvalues up into three categories:
xvalue
temporary object
value not associated with an object
The definition of temporary object includes being an object of class type, so it seems to me that "value not associated with an object" should be any non-xvalue of non-class type. For example 1 + 1.