Reference to uninitialized memory. Undefined behavior? - c++

Allow me to preface by saying that I don't recommend any of the practices below, for obvious reasons. However, I had a discussion today regarding it and some people were adamant about using a reference like this as being undefined behavior.
Here is a test case:
#include <new>
#include <string>
struct my_object {
int a = 1;
int b = 2;
std::string hi = "hello";
};
// Using union purely to reserve uninitialized memory for a class.
union my_object_storage {
char dummy;
my_object memory;
// C++ will yell at you for doing this without some constructors.
my_object_storage() {}
~my_object_storage() {}
} my_object_storage_instance;
// This is so we can easily access the storage memory through "I"
constexpr my_object &I = my_object_storage_instance.memory;
//-------------------------------------------------------------
int main() {
// Initialize the object.
new (&I) my_object();
// Use the reference.
I.a = 1;
// Destroy the object (typically this should be done using RAII).
I.~my_object();
// Phase two, REINITIALIZE an object with the SAME reference.
// We still have the memory allocated which is static, so why not?
new (&I) my_object();
// Use the reference.
I.a = 1;
// Destroy the object again.
I.~my_object();
}
https://wandbox.org/permlink/YEp9aQUcWdA9YiBI
Basically, the code reserves static memory for a struct and then initializes it in main(). Why would you want to do that? It isn't extremely useful and you should just use a pointer, but here is the question:
With this statement given,
constexpr my_object &I = my_object_storage_instance.memory;
is defining a reference to uninitialized memory undefined behavior? Other people have told me it is, but I'm trying to figure out concretely if that's the case. In the C++ standard we see this paragraph:
A reference shall be initialized to refer to a valid object or function. [ Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior.
Specifically "a valid object", which may boil down to: is an object that hasn't had its constructor called yet "valid"? What makes it invalid that it would cause undefined behavior? Are there actually real side effects that could arise?
My argument for this being labeled as undefined behavior is:
Compilers might be free to treat it like a valid object, because the standard states that it should be, especially during the assignment and especially if there are hidden debug instructions being inserted for diagnostics that assume such, which would certainly cause undefined behavior.
My arguments against it being undefined behavior are:
It's not dereferencing anything - the paragraph states that, during initialization of a reference, dereferencing nullptr is undefined. It doesn't specifically state undefined behavior if there isn't any dereferencing.
Dangling references are a thing, and appear in many cases in normal programs. They only cause undefined behavior IF they are used. This is similar to starting with a dangling reference.
Again, not very useful in practice because there are much better ways to spend your time, but what better place for odd questions and expert opinions than stackoverflow? :)

You're perfectly fine: your usage of the reference falls into the explicit exception to the rule that a live object is required. In [basic.life]:
Similarly, before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object may be used but only in limited ways.
For an object under construction or destruction, see [class.cdtor]. Otherwise, such a glvalue refers to allocated storage ([basic.stc.dynamic.allocation]), and using the properties of the glvalue that do not depend on its value is well-defined. The program has undefined behavior if:
the glvalue is used to access the object, or
the glvalue is used to call a non-static member function of the object, or
the glvalue is bound to a reference to a virtual base class ([dcl.init.ref]), or
the glvalue is used as the operand of a dynamic_cast ([expr.dynamic.cast]) or as the operand of typeid.
If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if:
the storage for the new object exactly overlays the storage location which the original object occupied, and
the new object is of the same type as the original object (ignoring the top-level cv-qualifiers), and
the type of the original object is not const-qualified, and, if a class type, does not contain any non-static data member whose type is const-qualified or a reference type, and
neither the original object nor the new object is a potentially-overlapping subobject ([intro.object]).
Thus, your reference validly refers to allocated storage, which is exactly what you need to perform a placement-new and vivify the union member.
And since the dynamic (runtime) type of the object you create exactly matches the static type of the reference you hold, it can be used to access the new object after placement new (either the first or the second).
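As a rough illustration of the rules quoted above, here is a minimal sketch of the construct, use, destroy, and re-construct cycle through one long-lived reference. The names Widget, WidgetStorage, g_storage, g_ref and construct_use_destroy_twice are hypothetical and not from the question; the sketch assumes the conditions listed above hold (same type, exact overlay, no const or reference members).

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical stand-in for the question's my_object.
struct Widget {
    int a = 1;
    std::string s = "hello";
};

// Union used only to reserve suitably sized and aligned static storage.
union WidgetStorage {
    char dummy;
    Widget value;
    WidgetStorage() : dummy{} {}  // makes dummy the active member; no Widget exists yet
    ~WidgetStorage() {}           // destroys nothing; Widget lifetime is managed by hand
};

WidgetStorage g_storage;
// Binds to allocated storage; nothing is accessed through the reference here.
Widget& g_ref = g_storage.value;

int construct_use_destroy_twice() {
    int sum = 0;
    for (int round = 0; round < 2; ++round) {
        void* where = &g_ref;    // taking the address does not access the object
        ::new (where) Widget();  // lifetime of a Widget starts here
        assert(g_ref.a == 1);    // the reference now denotes the live object
        g_ref.a += round;
        sum += g_ref.a;
        g_ref.~Widget();         // lifetime ends; the storage stays allocated
    }
    return sum;
}
```

Calling construct_use_destroy_twice() from main should return 3 (1 from the first round, 2 from the second). Between the destructor call and the next placement new, the reference is only used to obtain an address.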

Related

Is it UB to access a non-existent object?

There seems to be no more silly question than this. But does the standard allow it?
Consider:
void* p = operator new(sizeof(std::string));
*static_cast<std::string*>(p) = "string";
[basic.life]/6:
Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released... The program has undefined behavior if:
the pointer is used to access a non-static data member or call a non-static member function of the object, or
the pointer is used as the operand of a static_cast ([expr.static.cast]), except when the conversion is to pointer to cv void, or to pointer to cv void and subsequently to pointer to cv char, cv unsigned char, or cv std::byte ([cstddef.syn]), or
(Note that according to [intro.object]/10, a std::string object is not implicitly created by operator new because it is not of implicit-lifetime type.)
However, [basic.life]/6 does not apply to this code because there are no objects at all.
What am I missing?
[intro.object]/10
Some operations are described as implicitly creating objects within a specified region of storage. For each operation that is specified as implicitly creating objects, that operation implicitly creates and starts the lifetime of zero or more objects of implicit-lifetime types in its specified region of storage if doing so would result in the program having defined behavior. If no such set of objects would give the program defined behavior, the behavior of the program is undefined.
[intro.object]/11
Further, after implicitly creating objects within a specified region of storage, some operations are described as producing a pointer to a suitable created object. These operations select one of the implicitly-created objects whose address is the address of the start of the region of storage, and produce a pointer value that points to that object, if that value would result in the program having defined behavior. If no such pointer value would give the program defined behavior, the behavior of the program is undefined.
[intro.object]/13
Any implicit or explicit invocation of a function named operator new or operator new[] implicitly creates objects in the returned region of storage and returns a pointer to a suitable created object.
If a std::string[1] (or std::string[1][1], etc.) object were created, and a pointer to the std::string subobject were produced by operator new(sizeof(std::string)), then *static_cast<std::string*>(p) = "string" would have undefined behavior per [basic.life]/(7.2):
the glvalue [denoting an out-of-lifetime object] is used to call a non-static member function of the object
If operator new(sizeof(std::string)) produced a pointer to object of some other type (like int or double), then undefined behavior would be triggered by [expr.ref]/8:
If E2 is a non-static member and the result of E1 is an object whose type is not similar to the type of E1, the behavior is undefined.
So, there is no set of objects which would give the program defined behavior. Thus, the final sentence of [intro.object]/10 applies here.
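To make the contrast concrete, here is a small sketch assuming the C++20 implicit object creation rules quoted above. The function names are hypothetical. int is an implicit-lifetime type, so operator new can implicitly create it; std::string is not, so it has to be constructed explicitly before use.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// int is an implicit-lifetime type: under the C++20 wording, operator new
// implicitly creates an int in the returned storage if that makes the
// program well-defined, so the write below targets a live object.
int write_implicit_int() {
    void* raw = ::operator new(sizeof(int));
    int* ip = static_cast<int*>(raw);
    *ip = 42;
    const int value = *ip;
    ::operator delete(raw);
    return value;
}

// std::string is not an implicit-lifetime type, so its lifetime must be
// started explicitly with placement new before operator= may be called.
std::size_t write_explicit_string() {
    void* raw = ::operator new(sizeof(std::string));
    std::string* sp = ::new (raw) std::string;  // lifetime starts here
    *sp = "string";                             // member function call on a live object
    const std::size_t length = sp->size();
    sp->~basic_string();                        // lifetime ends
    ::operator delete(raw);
    return length;
}
```

Under those assumptions, write_implicit_int() returns 42 and write_explicit_string() returns 6. Removing the placement new from the second function reproduces the undefined behavior discussed in the question.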
The static_cast is fine; however, dereferencing the resulting pointer to a (non-existent) std::string object leads to Undefined Behaviour:
7.2.1 Value category [basic.lval]
11 If a program attempts to access (3.1) the stored value of an object through a glvalue whose type is not similar (7.3.5) to one of the following types the behavior is undefined:
(11.1) the dynamic type of the object,
(11.2) a type that is the signed or unsigned type corresponding to the dynamic type of the object, or
(11.3) a char, unsigned char, or std::byte type.
...
Edit:
The quote from the question is not really applicable here; it refers to situations like this:
struct foo
{
bar b1;
bar b2;
foo(void): b1{&b2}, b2{} {}
};
The behaviour is undefined by omission.
The assignment operator is overloaded, so a member function named operator= is invoked. The standard says
A non-static member function may be called for an object of its class type
There isn't any object of an appropriate type, and there is nothing else in the standard that might be giving meaning to this program.
Is it legal to access a non-existent object?
No, you wrote it yourself:
The program has undefined behavior if:
the pointer is used to access a non-static data member or call a non-static member function of the object
which is exactly what *static_cast<std::string*>(p) = "string"; does. You must construct a std::string before you call the assignment operator:
#include <new>
#include <string>
int main() {
void* p = operator new(sizeof(std::string));
std::string* sp = new(p) std::string; // construct the string
*sp = "string"; // now assignment is fine
sp->~basic_string();
operator delete(p);
}
However, [basic.life]/6 does not apply to this code because there are no objects at all.
Yes it's applicable here. Your code has allocated the storage but not started the lifetime of the object and calls a non-static member function of the (non-existing) object.
The confusion seems to stem from the fact that you never start the lifetime of the object and you therefore think that "before the lifetime" doesn't apply. It does. It's "before the lifetime" until you start the lifetime. If you never start the lifetime of the object, it's "before the lifetime" during the complete program run.
The standard clause could have said: The program has undefined behavior if the pointer is used to access a non-static data member or call a non-static member function of the object if the object's lifetime has not been started - and it would mean the same thing.

When I perform placement new on trivial object, Is it guaranteed to preserve the object/value representation?

struct A
{
int x;
};
A t{};
t.x = 5;
new (&t) A;
// is it always safe to assume that t.x is 5?
assert(t.x == 5);
As far as I know, when a trivial object of class type is created, the compiler can omit the call of explicit or implicit default constructor because no initialization is required.
(is that right?)
Then, If placement new is performed on a trivial object whose lifetime has already begun, is it guaranteed to preserve its object/value representation?
(If so, I want to know where I can find the specification..)
Well, let's ask some compilers for their opinion. Reading an indeterminate value is UB, which means that if it occurs inside a constant expression, it must be diagnosed. We can't directly use placement new in a constant expression, but we can use std::construct_at (which has a typed interface). I also modified the class A slightly so that value-initialization does the same thing as default-initialization:
#include <memory>
struct A
{
int x;
constexpr A() {}
};
constexpr int foo() {
A t;
t.x = 5;
std::construct_at(&t);
return t.x;
}
static_assert(foo() == 5);
As you can see on Godbolt, Clang, ICC, and MSVC all reject the code, saying that foo() is not a constant expression. Clang and MSVC additionally indicate that they have a problem with the read of t.x, which they consider to be a read of an uninitialized value.
P0593, while not directly related to this issue, contains an explanation that seems relevant:
The properties ascribed to objects and references throughout this document apply for a given object or reference only during its lifetime.
That is, reusing the storage occupied by an object in order to create a new object always destroys whatever value was held by the old object, because an object's value dies with its lifetime. Now, objects of type A are transparently replaceable by other objects of type A, so it is permitted to continue to use the name t even after its storage has been reused. That does not imply that the new t holds the value that the old t does. It only means that t is not a dangling reference to the old object.
Going off what is said in P0593, GCC is wrong and the other compilers are right. In constant expression evaluation, this kind of code is required to be diagnosed. Otherwise, it's just UB.
From looking at the Standard, the program has undefined behavior because of an invalid use of an object with indeterminate value.
Per [basic.life]/8, since the object of type A created by the placement new-expression exactly overlays the original object t, using the name t after that point refers to the A object created by the new-expression.
In [basic.indet]/1, we have:
When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced ([expr.ass]).
One important detail here (which I missed at first) is that "obtaining storage" is different from "allocating storage" or the storage duration of a storage region. The "obtain storage" words are also used to define the beginning of an object's lifetime in [basic.life]/1 and in the context of a new-expression in [expr.new]/10:
A new-expression may obtain storage for the object by calling an allocation function ([basic.stc.dynamic.allocation]). ... [ Note: ... The set of allocation and deallocation functions that may be called by a new-expression may include functions that do not perform allocation or deallocation; for example, see [new.delete.placement]. — end note ]
So the placement new-expression "obtains storage" for the object of type A and its subobject of type int when it calls operator new(void*). For this purpose, it doesn't make a difference that the memory locations in the storage region actually have static storage duration. Since "no initialization is performed" for the created subobject of type int with dynamic storage duration, it has an indeterminate value.
See also this Q&A: What does it mean to obtain storage?
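If the goal is to end up with a known value after re-creating the object, one option is to give the new-expression an initializer instead of relying on the old contents. The following is a minimal sketch with hypothetical function names: the first function copies the value out before reusing the storage, the second value-initializes the new object.

```cpp
#include <cassert>
#include <new>

struct A {
    int x;
};

// Copy the value out first, then re-create the object with an explicit
// initializer so the new x has a determinate value.
int recreate_keeping_value() {
    A t{};
    t.x = 5;
    const int saved = t.x;  // read while the old object is still alive
    assert(saved == 5);
    ::new (&t) A{saved};    // aggregate-initialization: new x is initialized from saved
    return t.x;             // the name t refers to the new object
}

// Value-initialization zeroes x, so the result is determinate but not 5.
int recreate_value_initialized() {
    A t{};
    t.x = 5;
    ::new (&t) A{};
    return t.x;
}
```

Here recreate_keeping_value() returns 5 and recreate_value_initialized() returns 0. Neither depends on the storage keeping its old bytes, which is the part the plain `new (&t) A;` form leaves indeterminate.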

Is reference to object still valid after its destruction and recreation in-place? [duplicate]

In order to stem the argument going on in the comments of an answer I gave recently, I'd like some constructive answers to the following questions:
Is a reference's lifetime distinct from the object it refers to? Is a reference simply an alias for its target?
Can a reference outlive its target in a well-formed program without resulting in undefined behaviour?
Can a reference be made to refer to a new object if the storage allocated for the original object is reused?
Does the following code demonstrate the above points without invoking undefined behaviour?
Example code by Ben Voigt and simplified (run it on ideone.com):
#include <iostream>
#include <new>
struct something
{
int i;
};
int main(void)
{
char buffer[sizeof (something) + 40];
something* p = new (buffer) something;
p->i = 11;
int& outlives = p->i;
std::cout << outlives << "\n";
p->~something(); // p->i dies with its parent object
new (p) char[40]; // memory is reused, lifetime of *p (and p->i) is so done
new (&outlives) int(13);
std::cout << outlives << "\n"; // but reference is still alive and well
// and useful, because strict aliasing was respected
}
Is a reference's lifetime distinct from the object it refers to? Is a reference simply an alias for its target?
A reference has its own lifetime:
int x = 0;
{
int& r = x;
} // r dies now
x = 5; // x is still alive
A ref-to-const additionally may extend the lifetime of its referee:
int foo() { return 0; }
const int& r = foo(); // note, this is *not* a reference to a local variable
cout << r; // valid; the lifetime of the result of foo() is extended
though this is not without caveats:
A reference to const only extends the lifetime of a temporary object if the reference is a) local and b) bound to a prvalue whose evaluation creates said temporary object. (So it doesn't work for members, or local references which are bound to xvalues.) Also, non-const rvalue references extend the lifetime in the exact same fashion. [#FredOverflow]
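A small sketch can make the lifetime-extension rule observable. Tracer, make_tracer and the three functions below are hypothetical names; Tracer simply counts how many instances are currently alive.

```cpp
#include <cassert>

// Counts live instances so that lifetime extension can be observed.
struct Tracer {
    static int alive;
    Tracer() { ++alive; }
    Tracer(const Tracer&) { ++alive; }
    ~Tracer() { --alive; }
};
int Tracer::alive = 0;

Tracer make_tracer() { return Tracer{}; }

// The temporary bound to a local const reference lives as long as the reference.
int alive_while_bound_to_const_ref() {
    const Tracer& r = make_tracer();
    (void)r;
    return Tracer::alive;  // evaluated before r's temporary is destroyed
}

// A local rvalue reference extends the lifetime in the same way.
int alive_while_bound_to_rvalue_ref() {
    Tracer&& r = make_tracer();
    (void)r;
    return Tracer::alive;
}

// Once the reference goes out of scope, the extended temporary is destroyed.
int alive_after_scope() {
    {
        const Tracer& r = make_tracer();
        (void)r;
    }
    return Tracer::alive;
}
```

The first two functions return 1 and the third returns 0, since the temporary is destroyed together with the reference that extended it.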
Can a reference outlive its target in a well-formed program without resulting in undefined behaviour?
Sure, as long as you don't use it.
Can a reference be made to refer to a new object if the storage allocated for the original object is reused?
Yes, under some conditions:
[C++11: 3.8/7]: If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if:
the storage for the new object exactly overlays the storage location which the original object occupied, and
the new object is of the same type as the original object (ignoring the top-level cv-qualifiers), and
the type of the original object is not const-qualified, and, if a class type, does not contain any non-static data member whose type is const-qualified or a reference type, and
the original object was a most derived object (1.8) of type T and the new object is a most derived object of type T (that is, they are not base class subobjects).
Does the following code demonstrate the above points without invoking undefined behaviour?
Tl;dr.
Yes. For example, local non-static references have automatic storage duration and a corresponding lifetime, and can refer to objects that have a longer lifetime.
Yes, dangling references are an example. As long as such references are not used in any expressions when they become dangling, they are fine.
There is a special rule in clause 3 about this case. Names of objects, pointers and references automatically refer to the new object that reuses the storage under restricted conditions. I believe it is at the end of 3.8. Someone who has the spec handy please fill in the correct ref here.
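For the case that rule covers, here is a minimal sketch with hypothetical names (Point, reuse_through_reference). It differs from the example in the question: the new object has the same type as the old one and exactly overlays it, and the buffer is explicitly aligned, so the quoted conditions are meant to be satisfied.

```cpp
#include <cassert>
#include <new>

struct Point {
    int x;
    int y;
};

int reuse_through_reference() {
    // Suitably aligned raw storage for exactly one Point.
    alignas(Point) unsigned char buffer[sizeof(Point)];

    Point* p = ::new (buffer) Point{1, 2};
    Point& ref = *p;
    assert(ref.x + ref.y == 3);

    p->~Point();  // the original object's lifetime ends; ref is not used here

    // Same type, same location, exact overlay, no const or reference members:
    // ref (and p) now refer to the new object.
    ::new (buffer) Point{3, 4};
    return ref.x + ref.y;
}
```

Under those assumptions reuse_through_reference() returns 7. If the new object had a different type or did not exactly overlay the old one, the reference would not automatically refer to it.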

After an object is destroyed, what happens to subobjects of scalar type?

Consider this code (for different values of renew and cleanse):
struct T {
int mem;
T() { }
~T() { mem = 42; }
};
// identity functions,
// but breaks any connexion between input and output
int &cleanse_ref(int &r) {
int *volatile pv = &r; // could also use cin/cout here
return *pv;
}
void foo () {
T t;
int &ref = t.mem;
int &ref2 = cleanse ? cleanse_ref(ref) : ref;
t.~T();
if (renew)
new (&t) T;
assert(ref2 == 42);
exit(0);
}
Is the assert guaranteed to pass?
I understand that this style is not recommended. Opinions like "this is not a sound practice" are not of interest here.
I want an answer showing a complete logical proof from standard quotes. The opinion of compiler writers might also be interesting.
EDIT: now with two questions in one! See the renew parameter (with renew == 0, this is the original question).
EDIT 2: I guess my question really is: what is a member object?
EDIT 3: now with another cleanse parameter!
I first had these two quotes, but now I think they actually just specify that things like int &ref = t.mem; must happen during the lifetime of t. Which it does, in your example.
12.7 paragraph 1:
For an object with a non-trivial destructor, referring to any non-static member or base class of the object after the destructor finishes execution results in undefined behavior.
And paragraph 3:
To form a pointer to (or access the value of) a direct non-static member of an object obj, the construction of obj shall have started and its destruction shall not have completed, otherwise the computation of the pointer value (or accessing the member value) results in undefined behavior.
We have here a complete object of type T and a member subobject of type int.
3.8 paragraph 1:
The lifetime of an object of type T begins when:
storage with the proper alignment and size for type T is obtained, and
if the object has non-trivial initialization, its initialization is complete.
The lifetime of an object of type T ends when:
if T is a class type with a non-trivial destructor (12.4), the destructor call starts, or
the storage which the object occupies is reused or released.
By the way, 3.7.3 p1:
The storage for these [automatic storage duration] entities lasts until the block in which they are created exits.
And 3.7.5:
The storage duration of member subobjects, base class subobjects and array elements is that of their complete object (1.8).
So no worries about the compiler "releasing" the storage before the exit in this example.
A non-normative note in 3.8p2 mentions that "12.6.2 describes the lifetime of base and member subobjects," but the language there only talks about initialization and destructors, not "storage" or "lifetime", so I conclude that section does not affect the definition of "lifetime" for subobjects of trivial type.
If I'm interpreting all this right, when renew is false, the lifetime of the complete class object ends at the end of the explicit destructor call, BUT the lifetime of the int subobject continues to the end of the program.
3.8 paragraphs 5 and 6 say that pointers and references to "allocated storage" before or after any object's lifetime can be used in limited ways, and list a whole lot of things you may not do with them. Lvalue-to-rvalue conversion, like the expression ref == 42 requires, is one of those things, but that's not an issue if the lifetime of the int has not yet ended.
So I think with renew false, the program is well-formed and the assert succeeds!
With renew true, the storage is "reused" by the program, so the lifetime of the original int is over, and the lifetime of another int begins. But then we get into 3.8 paragraph 7:
If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if:
the storage for the new object exactly overlays the storage location which the original object occupied, and
the new object is of the same type as the original object (ignoring the top-level cv-qualifiers), and
the type of the original object is not const-qualified, and, if a class type, does not contain any non-static data member whose type is const-qualified or a reference type, and
the original object was a most derived object (1.8) of type T and the new object is a most derived object of type T (that is, they are not base class subobjects).
The first bullet point here is the trickiest one. For a standard-layout class like your T, the same member certainly must always be in the same storage. I'm not certain whether or not this is technically required when the type is not standard-layout.
Regardless of whether ref may still be used, there's another issue in this example.
12.6.2 paragraph 8:
After the call to a constructor for class X has completed, if a member of X is neither initialized nor given a value during execution of the compound-statement of the body of the constructor, the member has indeterminate value.
Meaning the implementation is compliant if it sets t.mem to zero or 0xDEADBEEF (and sometimes debug modes will actually do such things before calling a constructor).
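One way to sidestep the indeterminate-value problem in the renew case is to have the constructor initialize the member. The sketch below is a variation on the question's T, not the original: the constructor sets mem to 0, and read_member_after_renew is a hypothetical name. Reading through ref after the renewal assumes the reference to the member subobject refers to the corresponding member of the new object, which is how the C++20 "transparently replaceable" wording treats direct subobjects; under the older wording discussed above this is less clear.

```cpp
#include <cassert>
#include <new>

struct T {
    int mem;
    T() : mem(0) {}     // initializes mem, unlike the constructor in the question
    ~T() { mem = 42; }
};

int read_member_after_renew() {
    T t;
    int& ref = t.mem;
    t.~T();             // lifetime of t ends when the destructor call starts
    ::new (&t) T;       // a new T of the same type exactly overlays the old one
    assert(t.mem == 0); // the name t refers to the new, initialized object
    return ref;         // assumption: ref refers to the new object's member
}
```

With the initializing constructor, read_member_after_renew() returns 0 rather than 42: the value written by the destructor does not carry over into the new object. The implicit destructor call at scope exit then runs on the new object.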
You have not destroyed the memory; you only manually called the destructor (in this context it's no different than calling a normal method). The memory (the stack part) of your t variable was not 'released', so this assert will always pass with your current code.
