Can an initializing expression use the variable itself? (C++)

Consider the following code:
#include <iostream>
struct Data
{
    int x, y;
};
Data fill(Data& data)
{
    data.x=3;
    data.y=6;
    return data;
}
int main()
{
    Data d=fill(d);
    std::cout << "x=" << d.x << ", y=" << d.y << "\n";
}
Here d is copy-initialized from the return value of fill(), but fill() writes to d itself before returning its result. My concern is that d is used non-trivially before it is initialized, and using uninitialized variables leads to undefined behavior in some (all?) cases.
So is this code valid, or does it have undefined behavior? If it's valid, will the behavior become undefined once Data stops being POD or in some other case?

This does not appear to be valid code. It is similar to the case in the question "Is passing a C++ object into its own constructor legal?", although in that case the code was valid. The mechanics are not identical, but the same reasoning is a good starting point.
We start with defect report 363, which asks:
And if so, what is the semantics of the self-initialization of UDT?
For example
#include <stdio.h>
struct A {
    A() { printf("A::A() %p\n", this); }
    A(const A& a) { printf("A::A(const A&) %p %p\n", this, &a); }
    ~A() { printf("A::~A() %p\n", this); }
};
int main()
{
    A a=a;
}
can be compiled and prints:
A::A(const A&) 0253FDD8 0253FDD8
A::~A() 0253FDD8
and the proposed resolution was:
3.8 [basic.life] paragraph 6 indicates that the references here are valid. It's permitted to take the address of a class object before it is fully initialized, and it's permitted to pass it as an argument to a reference parameter as long as the reference can bind directly.
[...]
So although d is not yet fully initialized, we can pass it by reference.
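To illustrate the part that is allowed, here is a minimal sketch in which the function only binds the reference and records its address, without reading or writing any member of the object being initialized. The names `record_address`, `g_seen` and `self_reference_demo` are hypothetical and exist only for this illustration; Clang may still emit its -Wuninitialized warning for the self-reference.

```cpp
struct Data
{
    int x, y;
};

// Hypothetical global: remembers which object the reference was bound to.
const Data* g_seen = nullptr;

// Binds a reference to the object being initialized and takes its address,
// but never reads or writes self.x or self.y.
Data record_address(Data& self)
{
    g_seen = &self;
    return Data{1, 2};
}

// Returns true when the reference seen by record_address() was d itself
// and d ended up holding the returned values.
bool self_reference_demo()
{
    Data d = record_address(d);
    return g_seen == &d && d.x == 1 && d.y == 2;
}
```

The contrast with the question's fill() is that nothing here touches d.x or d.y until d's initialization has completed.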
Where we start to get into trouble is here:
data.x=3;
Section 3.8 of the draft C++ standard (the same section and paragraph that the defect report cites) says (emphasis mine):
Similarly, before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object may be used but only in limited ways. For an object under construction or destruction, see 12.7. Otherwise, such a glvalue refers to allocated storage (3.7.4.2), and using the properties of the glvalue that do not depend on its value is well-defined. The program has undefined behavior if:
- an lvalue-to-rvalue conversion (4.1) is applied to such a glvalue,
- the glvalue is used to access a non-static data member or call a non-static member function of the object, or
- the glvalue is bound to a reference to a virtual base class (8.5.3), or
- the glvalue is used as the operand of a dynamic_cast (5.2.7) or as the operand of typeid.
So what does "access" mean? That was clarified by defect report 1531, which defines access as:
access: to read or modify the value of an object
So fill() accesses a non-static data member of d before d's lifetime has begun, and hence we have undefined behavior.
This also agrees with section 12.7, which says:
[...] To form a pointer to (or access the value of) a direct non-static member of an object obj, the construction of obj shall have started and its destruction shall not have completed, otherwise the computation of the pointer value (or accessing the member value) results in undefined behavior.
Since you are making a copy anyway, you might as well create a Data instance inside fill() and initialize that. Then you avoid having to pass d at all.
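A minimal sketch of that suggestion, assuming the caller has no other use for the argument: fill() builds its own local Data and returns it, so d is initialized from a fully constructed object and is never touched before its lifetime begins.

```cpp
struct Data
{
    int x, y;
};

// Creates and fills a local object instead of writing through a reference
// to the object that is still being initialized.
Data fill()
{
    Data data;
    data.x = 3;
    data.y = 6;
    return data;
}

// Usage: Data d = fill();  // d is initialized from a complete object
```

If fill() is only ever used this way, `return Data{3, 6};` does the same in one line.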
As T.C. pointed out, it is important to quote explicitly the rule for when an object's lifetime starts. From section 3.8:
The lifetime of an object is a runtime property of the object. An
object is said to have non-trivial initialization if it is of a class
or aggregate type and it or one of its members is initialized by a
constructor other than a trivial default constructor. [ Note:
initialization by a trivial copy/move constructor is non-trivial
initialization. — end note ] The lifetime of an object of type T
begins when:
- storage with the proper alignment and size for type T is obtained, and
- if the object has non-trivial initialization, its initialization is complete.
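The initialization of d is non-trivial because d is initialized via the copy constructor (per the note above, even a trivial copy constructor counts), so d's lifetime has not yet begun while fill() is running.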

I don't see a problem. Accessing the uninitialized integer members is valid because you are accessing them only to write to them. Reading them would cause UB.

I think it is valid (crazy, but valid).
This would be both legal and logically acceptable:
Data d;
d = fill(d);
and this form is the same:
Data d = fill(d);
As far as the logical structure of the language is concerned, those two versions are equivalent, so it's legal and logically correct for the language.
However, since we normally expect variables to be initialized to a default value when they are created (for safety), it is bad programming practice.
Interestingly, g++ -Wall compiles this code without a single warning.
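For comparison with the single-statement form, here is a sketch of the two-step version this answer describes. The only change from the answer's snippet is that d is value-initialized with `Data d{};`, an assumption added so that x and y are zeroed before fill() runs; the helper name `two_step` is hypothetical. In this form d's initialization has completed before the statement that calls fill() begins.

```cpp
struct Data
{
    int x, y;
};

Data fill(Data& data)
{
    data.x = 3;
    data.y = 6;
    return data;
}

// Hypothetical helper. d is fully initialized (zeroed by Data d{}) before the
// second statement, so fill() writes to, and then copies from, a live object.
Data two_step()
{
    Data d{};
    d = fill(d);
    return d;
}
```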

Related

Why can we use an object during its declaration? [duplicate]

I was surprised to discover, by accident, that the following works:
#include <iostream>
int main(int argc, char** argv)
{
    struct Foo {
        Foo(Foo& bar) {
            std::cout << &bar << std::endl;
        }
    };
    Foo foo(foo); // I can't believe this works...
    std::cout << &foo << std::endl; // but it does...
}
I am passing the object being constructed (by reference) into its own constructor. This looks like a circular definition at the source level. Does the standard really allow you to pass an object into a function before the object is even constructed, or is this undefined behavior?
I suppose it's not that odd given that all non-static member functions already receive a pointer to their class instance as an implicit parameter, and the layout of the data members is fixed at compile time.
Note: I'm NOT asking if this is useful or a good idea; I'm just tinkering around to learn more about classes.
This is not undefined behavior. Although foo is uninitialized, you are using it in a way that is allowed by the standard. After space is allocated for an object but before it is fully initialized, you are allowed to use it in limited ways. Both binding a reference to that variable and taking its address are allowed.
This is covered by defect report 363, "Initialization of class from self", which says:
And if so, what is the semantics of the self-initialization of UDT?
For example
#include <stdio.h>
struct A {
    A() { printf("A::A() %p\n", this); }
    A(const A& a) { printf("A::A(const A&) %p %p\n", this, &a); }
    ~A() { printf("A::~A() %p\n", this); }
};
int main()
{
    A a=a;
}
can be compiled and prints:
A::A(const A&) 0253FDD8 0253FDD8
A::~A() 0253FDD8
and the resolution was:
3.8 [basic.life] paragraph 6 indicates that the references here are valid. It's permitted to take the address of a class object before it is fully initialized, and it's permitted to pass it as an argument to a reference parameter as long as the reference can bind directly. Except for the failure to cast the pointers to void * for the %p in the printfs, these examples are standard-conforming.
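The resolution's caveat about %p can be addressed by casting the pointers to void*. Below is a sketch of the defect report's example with that fix applied. The `self` and `source` members and the `self_copy_demo()` function are hypothetical additions that record what the copy constructor saw, so the self-initialization can be checked. As in the original, the copy constructor only takes the address of its argument and never reads its members.

```cpp
#include <cstdio>

struct A {
    // Hypothetical members recording the addresses seen by the copy constructor.
    const void* self = nullptr;
    const void* source = nullptr;

    A() { std::printf("A::A() %p\n", static_cast<void*>(this)); }

    // Only the address of a is used; none of a's members are read.
    A(const A& a) : self(this), source(&a)
    {
        std::printf("A::A(const A&) %p %p\n",
                    static_cast<void*>(this),
                    static_cast<void*>(const_cast<A*>(&a)));
    }

    ~A() { std::printf("A::~A() %p\n", static_cast<void*>(this)); }
};

// Self-initialization as in the defect report; returns true if the copy
// constructor received a reference to the object it was constructing.
bool self_copy_demo()
{
    A a = a;
    return a.self == a.source;
}
```

The printed addresses vary from run to run; what the check relies on is only that both pointers designate the same object.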
The full quote of section 3.8 [basic.life] from the draft C++14 standard is as follows:
Similarly, before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object may be used but only in limited ways. For an object under construction or destruction, see 12.7. Otherwise, such a glvalue refers to allocated storage (3.7.4.2), and using the properties of the glvalue that do not depend on its value is well-defined. The program has undefined behavior if:
- an lvalue-to-rvalue conversion (4.1) is applied to such a glvalue,
- the glvalue is used to access a non-static data member or call a non-static member function of the object, or
- the glvalue is bound to a reference to a virtual base class (8.5.3), or
- the glvalue is used as the operand of a dynamic_cast (5.2.7) or as the operand of typeid.
We are not doing anything with foo that falls under undefined behavior as defined by the bullets above.
If we try this with Clang, we get an ominous warning:
warning: variable 'foo' is uninitialized when used within its own initialization [-Wuninitialized]
It is a valid warning since producing an indeterminate value from an uninitialized automatic variable is undefined behavior. However, in this case you are just binding a reference and taking the address of the variable within the constructor, which does not produce an indeterminate value and is valid. On the other hand, the following self-initialization example from the draft C++11 standard:
int x = x ;
does invoke undefined behavior.
Active issue 453, "References may only bind to 'valid' objects", also seems relevant but is still open. The initially proposed language is consistent with defect report 363.
The constructor is called at a point where memory is allocated for the object-to-be. At that point, no object exists at that location (or possibly an object with a trivial destructor). Furthermore, the this pointer refers to that memory and the memory is properly aligned.
Since it's allocated and aligned memory, we may refer to it using lvalue expressions of Foo type (i.e. Foo&). What we may not yet do is have an lvalue-to-rvalue conversion. That's only allowed after the constructor body is entered.
In this case, the code just tries to print &bar inside the constructor body. It would even be legal to print bar.member here. Since the constructor body has been entered, a Foo object exists and its members may be read.
This leaves us with one small detail, and that's name lookup. In Foo foo(foo), the first foo introduces the name in scope and the second foo therefore refers back to the just-declared name. That's why int x = x is invalid, but int x = sizeof(x) is valid.
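The point-of-declaration rule can be seen in a small sketch; the function names and the global `value` are hypothetical. In the first function, sizeof(x) names the x being declared, but sizeof does not evaluate its operand, so the uninitialized x is never read. In the second, the global has to be written as ::value, because an unqualified `value` in the initializer would already refer to the new local.

```cpp
#include <cstddef>

// sizeof does not evaluate its operand, so naming x inside its own
// initializer is fine here: only the type of x is used.
std::size_t size_of_self()
{
    int x = sizeof(x);
    return static_cast<std::size_t>(x);
}

int value = 12;  // hypothetical global

// Inside its own initializer, an unqualified `value` would already denote
// the local being declared, so the global must be named as ::value.
int copy_global_plus_one()
{
    int value = ::value + 1;
    return value;
}
```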
As said in other answers, an object can be initialized with itself as long as you do not use its values before they are initialized. You can still bind the object to a reference or take its address.
But beyond the fact that it is valid, let's explore a usage example.
The example below might be controversial, and you can surely propose many other ways to implement it. Still, it presents a valid use of this strange C++ property: you can pass an object into its own constructor.
#include <iostream>
#include <string>
#include <utility>
class Employee {
    std::string name;
    // manager may change so we don't hold it as a reference
    const Employee* pManager;
public:
    // we prefer to get the manager as a reference and not as a pointer
    Employee(std::string name, const Employee& manager)
        : name(std::move(name)), pManager(&manager) {}
    void modifyManager(const Employee& manager) {
        // TODO: check for recursive connection and throw an exception
        pManager = &manager;
    }
    friend std::ostream& operator<<(std::ostream& out, const Employee& e) {
        out << e.name << " reporting to: ";
        if(e.pManager == &e)
            out << "self";
        else
            out << *e.pManager;
        return out;
    }
};
Now comes the usage of initializing an object with itself:
// it is valid to create an employee who manages itself
Employee jane("Jane", jane);
In fact, with the given implementation of class Employee, the user has no choice but to initialize the first Employee ever created with itself as its own manager, since there is no other Employee yet that can be passed. In a way that makes sense, as the first employee created should manage itself.
Code: http://coliru.stacked-crooked.com/a/9c397bce622eeacd

Passing references to later members during construction [duplicate]

Consider the following example. When bar is constructed, it passes its base class (foo) constructor the address of my_member.y, where my_member is a data member that hasn't been initialized yet.
struct foo {
    foo(int * p_x) : x(p_x) {}
    int * x;
};
struct member {
    member(int p_y) : y(p_y) {}
    int y;
};
struct bar : foo
{
    bar() : foo(&my_member.y), my_member(42) {}
    member my_member;
};
#include <iostream>
int main()
{
    bar my_bar;
    std::cout << *my_bar.x;
}
Is this well defined? Is it legal to take the address of an uninitialized object's data member? I've found this question about passing a reference to an uninitialized object but it's not quite the same thing. In this case, I'm using the member access operator . on an uninitialized object.
It's true that the address of an object's data member shouldn't be changed by initialization, but that doesn't necessarily make taking that address well defined. Additionally, the cppreference.com page on member access operators has this to say:
The first operand of both operators is evaluated even if it is not necessary (e.g. when the second operand names a static member).
I understand this to mean that in the case of &my_member.y, my_member would be evaluated, which I believe is fine (just as int x; x; seems fine), but I can't find documentation to back that up either.
First, let's make the question precise.
What you are doing isn't using an uninitialized object; you are using an object that is not within its lifetime. my_member is constructed after foo, so the lifetime of my_member has not begun when foo(&my_member.y) is evaluated.
From [basic.life]
before the lifetime of an object has started but after the storage which the object will occupy has been allocated [...], any glvalue that refers to the original object may be used but only in limited ways. [...] such a glvalue refers to allocated storage, and using the properties of the glvalue that do not depend on its value is well-defined. The program has undefined behavior if:
the glvalue is used to access the object, or [...]
Here accessing it means specifically to either read or modify the value of the object.
The evaluation of my_member yields an lvalue, and nothing requires a conversion to a prvalue, so it stays an lvalue. Likewise, the evaluation of my_member.y is also an lvalue. We conclude that no object's value has been accessed, so this is well-defined.
Yes, you are allowed to pass &my_member.y to foo's constructor, and even to copy the pointer, which you do with x(p_x).
Dereferencing that pointer inside foo's constructor, though, would be undefined behavior. (But you don't do that.)

Class that holds a reference to itself

Skimming through the standard draft (n3242), I found this sentence in clause 9.2 (emphasis mine):
Non-static (9.4) data members shall not have incomplete types. In particular, a class C shall not contain a non-static member of class C, but it can contain a pointer or reference to an object of class C.
From this I conclude that it is fine to define a class like this:
class A {
public:
    A(A& a) : a_(a){
    }
private:
    A& a_;
};
Then in clause 8.3.2 I found the following:
A reference shall be initialized to refer to a valid object or function
Question 1: Is it permitted to define an object of this type passing its name as a reference:
A a(a);
or will this trigger undefined behavior?
Question 2: If yes, what are the parts of the standard that permit the initialization of the reference from a still-to-be-constructed object?
Question 3: If no, does this mean the definition of class A is well formed but no first object can be created without triggering UB? In this case what is the rationale behind this?
"Valid object" is not defined anywhere in the standard, but it is intended to mean a region of memory with appropriate size and alignment that can contain an object of the specified type. It just means to exclude references to such things as dereferenced null pointers, misaligned regions of memory, etc. An uninitialised object is valid.
There is an open issue to clear up the wording, CWG 453.
n3337 § 3.8/6
Similarly, before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object may be used but only in limited ways. For an object under construction or destruction, see 12.7. Otherwise, such a glvalue refers to allocated storage (3.7.4.2), and using the properties of the glvalue that do not depend on its value is well-defined. The program has undefined behavior if:
- an lvalue-to-rvalue conversion (4.1) is applied to such a glvalue,
- the glvalue is used to access a non-static data member or call a non-static member function of the object, or
- the glvalue is implicitly converted (4.10) to a reference to a base class type, or
- the glvalue is used as the operand of a static_cast (5.2.9) except when the conversion is ultimately to cv char& or cv unsigned char&, or
- the glvalue is used as the operand of a dynamic_cast (5.2.7) or as the operand of typeid.
So, to answer your questions:
Question 1: Is it permitted to define an object of this type passing its name as a reference?
Yes. Using just the address does not seem to violate this (at least for a variable on the stack).
A a(a);
or will this trigger undefined behavior?
No.
Question 2: If yes, what are the parts of the standard that permit the initialization of the reference from a still-to-be-constructed object?
§ 3.8/6 (above)
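A small sketch of what that permits, based on the class A from the question. The `refers_to_self()` member and the `self_reference_member_demo()` function are hypothetical additions used only to observe the result: the reference member is bound to the object under construction, and nothing is read through it until construction has finished.

```cpp
class A {
public:
    A(A& a) : a_(a) {}

    // Hypothetical helper: true if the stored reference designates *this.
    bool refers_to_self() const { return &a_ == this; }

private:
    A& a_;
};

// Binds a_ to the object being defined; only the reference binding happens
// before a's lifetime begins.
bool self_reference_member_demo()
{
    A a(a);
    return a.refers_to_self();
}
```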
The only question that remains is how this corresponds to
A reference shall be initialized to refer to a valid object or function.
The problem lies in the term "valid object", because § 8.3.2/4 says that
It is unspecified whether or not a reference requires storage
so it seems that § 8.3.2 is problematic and should be reworded. The confusion led to the change proposed in C++ Standard Core Language Active Issues, Revision 87, dated 20.01.2014:
A reference shall be initialized to refer to an object or function.
Change 8.3.2 [dcl.ref] paragraph 4 as follows:
If an lvalue to which a reference is directly bound designates neither an existing object or function of an appropriate type (8.5.3 [dcl.init.ref]), nor a region of memory of suitable size and alignment to contain an object of the reference's type (1.8 [intro.object], 3.8 [basic.life], 3.9 [basic.types]), the behavior is undefined.
From n1905, 3.3.1.1
The point of declaration for a name is immediately after its complete declarator (clause 8) and before its initializer (if any), except as noted below.
[ Example:
int x = 12;
{ int x = x; }
Here the second x
is initialized with its own (indeterminate) value.
—end example ]
My emphasis (correct me if I am wrong): in your example,
A a(a);
is equivalent to
A a = a; // Copy initialization
So, according to the standard, a is initialized with its own indeterminate value, and the member holds a reference to that indeterminate value.