Updated below: In clang, using an lvalue of a polymorphic object through its name does not activate virtual dispatch, but it does through its address.
Consider the following base class B, a derived class D, the virtual function something, and the union Space:
#include <iostream>
using namespace std;
struct B {
void *address() { return this; }
virtual ~B() { cout << "~B at " << address() << endl; }
virtual void something() { cout << "B::something"; }
};
struct D: B {
~D() { cout << "~D at " << address() << endl; }
void something() override { cout << "D::something"; }
};
union Space {
B b;
Space(): b() {}
~Space() { b.~B(); }
};
Given a value s of type Space, Clang++ behaves as follows (update: I incorrectly claimed that g++ had the same behavior):
Calling s.b.something() invokes B::something() without any dynamic binding on s.b. Calling (&s.b)->something(), however, does dispatch dynamically on whatever b really contains (either a B or a D).
The complete code is this:
union SpaceV2 {
B b;
SpaceV2(): b() {}
~SpaceV2() { (&b)->~B(); }
};
static_assert(sizeof(D) == sizeof(B), "");
static_assert(alignof(D) == alignof(B), "");
#include <new>
int main(int argc, const char *argv[]) {
{
Space s;
cout << "Destroying the old B: ";
s.b.~B();
new(&s.b) D;
cout << "\"D::something\" expected, but \"";
s.b.something();
cout << "\" happened\n";
auto &br = s.b;
cout << "\"D::something\" expected, and \"";
br.something();
cout << "\" happened\n";
cout << "Destruction of D expected:\n";
}
cout << "But did not happen!\n";
SpaceV2 sv2;
new(&sv2.b) D;
cout << "Destruction of D expected again:\n";
return 0;
}
When I compile with -O2 optimization and run the program, this is the output:
$./a.out
Destroying the old B: ~B at 0x7fff4f890628
"D::something" expected, but "B::something" happened
"D::something" expected, and "D::something" happened
Destruction of D expected:
~B at 0x7fff4f890628
But did not happen!
Destruction of D expected again:
~D at 0x7fff4f890608
~B at 0x7fff4f890608
What surprises me is that setting the dynamic type of s.b using placement new makes a difference between calling something on the very same l-value through its name and through its address. The first question is essential, but I have not been able to find an answer:
Is doing placement new to a derived class, like new(&s.b) D undefined behavior according to the C++ standard?
If it is not undefined behavior, is this choice of not activating virtual dispatch through the l-value of the named member something specified in the standard or a choice in G++, Clang?
Thanks, my first question in S.O. ever.
UPDATE
The answer and the comment that refer to the standard are accurate: according to the standard, s.b will forever refer to an object of exact type B. The memory is allowed to change type, but then any use of that memory through s.b is "undefined behavior", that is, prohibited, and the compiler may translate it however it pleases. If Space were just a buffer of chars, it would be valid to construct in place, destroy, and change the type. I did exactly that in the code that led to this question, and AFAIK it works in a standards-compliant way.
Thanks.
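The char-buffer approach mentioned in the update could look roughly like the sketch below. The names Shape, Circle, ShapeSlot and slot_demo are hypothetical and not from the original code. The key assumption is that the object is only ever reached through the pointer that placement new returned, never through a named member of the old type.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <type_traits>

struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "Shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "Circle"; }
};

// Hypothetical holder: raw bytes plus the pointer that placement new returned.
class ShapeSlot {
    alignas(Shape) unsigned char buf_[sizeof(Shape)];
    Shape* obj_ = nullptr;

public:
    ShapeSlot() = default;
    ShapeSlot(const ShapeSlot&) = delete;
    ShapeSlot& operator=(const ShapeSlot&) = delete;
    ~ShapeSlot() { reset(); }

    // Destroy whatever is stored, then construct a T in the same bytes.
    template <class T>
    T& emplace() {
        static_assert(std::is_base_of<Shape, T>::value, "T must derive from Shape");
        static_assert(sizeof(T) <= sizeof(Shape), "T must fit in the buffer");
        static_assert(alignof(T) <= alignof(Shape), "buffer alignment too weak for T");
        reset();
        T* p = ::new (static_cast<void*>(buf_)) T();
        obj_ = p;  // keep the pointer produced by placement new
        return *p;
    }

    void reset() {
        if (obj_ != nullptr) {
            obj_->~Shape();  // virtual destructor: runs the most derived one
            obj_ = nullptr;
        }
    }

    Shape& get() const {
        assert(obj_ != nullptr && "slot is empty");
        return *obj_;
    }
};

// Stores a Shape, then a Circle, in the same bytes and records what each call sees.
std::string slot_demo() {
    ShapeSlot slot;
    slot.emplace<Shape>();
    std::string seen = slot.get().name();
    slot.emplace<Circle>();
    seen += "," + slot.get().name();
    return seen;
}
```

Calling slot_demo() dispatches to Shape::name() first and to Circle::name() after the second emplace, because every access goes through the stored pointer. The static_asserts play the same role as the sizeof/alignof checks in the question.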
The expression new(&s.b) D; re-uses the storage named s.b, formerly occupied by a B, for storage of a new D.
However you then write s.b.something(); . This causes undefined behaviour because s.b denotes a B but the actual object stored in that location is a D. See C++14 [basic.life]/7:
If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if:
the storage for the new object exactly overlays the storage location which the original object occupied, and
the new object is of the same type as the original object (ignoring the top-level cv-qualifiers), and
[...]
The last bullet point is not satisfied because the new type differs.
(There are other potential issues later in the code too but since undefined behaviour is caused here, they're moot; you'd need to have a major design change to avoid this problem).
In my current understanding, roughly speaking, (non-static) local variables are destroyed when the thread of execution leaves their scope. However, I found that the exact points when the local variables are destroyed in different situations differ. I've been struggling with understanding how to determine when local variables are destroyed, and, in particular, the reason behind it. (possible related questions I asked include: 1, 2, 3)
Given 2 toy classes:
class A {
public:
A() { cout << "A created\n"; }
A(const A&) { cout << "A copied\n"; }
~A() { cout << "A destroyed" << endl; }
};
class B{
public:
B() { cout << "B created\n"; }
B(const B&) { cout << "B copied\n"; }
~B() { cout << "B destroyed\n"; }
};
Example 1:
B& f() {
A a;
static B b;
return b;
}
int main()
{
B b = f();
}
here the outputs are as follows:
A created
B created
A destroyed
B copied
B destroyed
B destroyed
In my current understanding, in example 1, the local (non-static) variable(s) are destroyed immediately after the reference is returned, then the object being referenced is copied to b.
Example 2:
B g() {
A a;
B b;
return b;
}
int main()
{
B b = g();
}
here the outputs are as follows:
A created
B created
B copied
B destroyed
A destroyed
B destroyed
this time, the copying to b happens before the local variables are destroyed.
In my understanding, for C++17 and newer, what is returned by g() is a prvalue and it is not an object. As a result, if the destruction of the local variables happened after the prvalue of g() is created but before the copying (as in example 1), we would be copying from an object that has already been destroyed. So, I think of the outputs as "the compiler is waiting for the copying to be done before it destroys the local variable".
However, this is just a really vague understanding of mine. May I ask what actually happens behind the scenes so that such "waiting" can always happen properly?
Example 3: (just for demonstration)
void fun() {
A a;
B b;
throw b;
}
int main()
try
{
fun();
}
catch (B cb) { }
here the outputs are as follows:
A created
B created
B copied
B copied
B destroyed
A destroyed
B destroyed
B destroyed
In this situation, I think b is used to copy-initialize a temporary, and this temporary is then used to copy-initialize cb. The local variables are destroyed after all that copying even though the temporary is an actual object, not a prvalue. So, this situation is also different from example 1.
I am using Visual Studio 2022, C++20.
Function local variables are destroyed immediately after the function's return value is initialized.
In the first example, that return value is just a reference, so its creation has no observable side-effect. After the function returns, b is then initialized using the returned reference, so you see the "B copied" output after f has returned.
In the second example, the function's return value is a B object; specifically, it is the variable b in main, due to the way prvalues work. That means that b is initialized directly by g, so you see the "B copied" output before g has returned.
In the third example, an exception is involved, and exceptions are complicated, and their specific mechanisms aren't terribly standardized. You can be sure that fun's local variables will be destroyed before the catch block is entered, but beyond that it's possible that the exception object could be copied any number of times during the stack unwinding.
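The difference between the first two examples can be reproduced with a small sketch. The names events, Local, Value, shared, by_reference and by_value are hypothetical. The by-value function deliberately returns a copy of a static object, so the copy cannot be elided and the order does not depend on NRVO. The stated order assumes C++17 prvalue semantics (or ordinary copy elision in older modes).

```cpp
#include <cassert>
#include <string>

std::string events;  // hypothetical log of what happened, in order

struct Local {
    ~Local() { events += "local destroyed;"; }
};

struct Value {
    Value() = default;
    Value(const Value&) { events += "value copied;"; }
};

Value& shared() {
    static Value v;
    return v;
}

// Return value is a reference: initializing it has no observable effect.
Value& by_reference() {
    Local guard;
    return shared();
}

// Return value is an object: it is copy-initialized before guard is destroyed.
Value by_value() {
    Local guard;
    return shared();
}

std::string order_by_reference() {
    events.clear();
    Value copy = by_reference();  // the copy happens here, after the function returned
    assert(&copy != &shared());   // copy is a distinct object
    return events;
}

std::string order_by_value() {
    events.clear();
    Value copy = by_value();      // copy is initialized inside by_value
    assert(&copy != &shared());
    return events;
}
```

order_by_reference() reports the local's destruction before the copy, and order_by_value() reports the copy first, which matches the rule that locals are destroyed immediately after the return value is initialized.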
As shown here, one can use dynamic_cast to detect a deleted pointer:
#include <iostream>
using namespace std;
class A
{
public:
A() {}
virtual ~A() {}
};
class B : public A
{
public:
B() {}
};
int main()
{
B* pB = new B;
cout << "dynamic_cast<B*>( pB) ";
cout << ( dynamic_cast<B*>(pB) ? "worked" : "failed") << endl;
cout << "dynamic_cast<B*>( (A*)pB) ";
cout << ( dynamic_cast<B*>( (A*)pB) ? "worked" : "failed") << endl;
delete pB;
cout << "dynamic_cast<B*>( pB) ";
cout << ( dynamic_cast<B*>(pB) ? "worked" : "failed") << endl;
cout << "dynamic_cast<B*>( (A*)pB) ";
cout << ( dynamic_cast<B*>( (A*)pB) ? "worked" : "failed") << endl;
}
the output:
dynamic_cast<B*>( pB) worked
dynamic_cast<B*>( (A*)pB) worked
dynamic_cast<B*>( pB) worked
dynamic_cast<B*>( (A*)pB) failed
It explains that the deletion of the vtable is detected.
But I am wondering how is that possible since we do not overwrite the freed memory?
And is that solution fully portable ?
Thanks
First off, trying to use a deleted object in any form results in undefined behavior: whatever result you see could happen!
The reason for the observed behavior is simply that an object changes type during destruction: starting as an object of the concrete type, it changes through all of the types in the hierarchy. At each point the virtual functions change and the vtable (or similar) gets replaced. The dynamic_cast<...>() simply detects this change in the bytes stored at the location of the object.
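The type change during destruction can be observed in a well-defined way from inside the destructors themselves, without touching a deleted object. This is only a sketch; Animal, Dog, seen and names_seen_during_destruction are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

std::string seen;  // hypothetical log of the names observed

struct Animal {
    virtual ~Animal() {
        // By now the Dog part is gone: the dynamic type is Animal.
        assert(typeid(*this) == typeid(Animal));
        seen += name() + ";";  // dispatches to Animal::name
    }
    virtual std::string name() const { return "Animal"; }
};

struct Dog : Animal {
    ~Dog() override {
        // While ~Dog runs, the dynamic type is still Dog.
        assert(typeid(*this) == typeid(Dog));
        seen += name() + ";";  // dispatches to Dog::name
    }
    std::string name() const override { return "Dog"; }
};

std::string names_seen_during_destruction() {
    seen.clear();
    {
        Dog d;  // destroyed at the end of this block
    }
    return seen;
}
```

The same virtual call yields a different result in each destructor. Whatever identifies the type is left in its base-class state once destruction finishes, which is what the dynamic_cast in the question happens to pick up.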
If you want to show that this technique doesn't work reliably, you can set the content of the deleted memory to a random bit pattern or to the bit pattern of an object of the most derived type: a random bit pattern probably yields a crash, and a memcpy() of a most derived object's bytes probably makes dynamic_cast claim that the object is still alive. Of course, since it is undefined behavior, anything can happen.
One relevant section on this 3.8 [basic.life] paragraph 5:
Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that refers to the storage location where the object will be or was located may be used but only in limited ways. For an object under construction or destruction, see 12.7. Otherwise, such a pointer refers to allocated storage (3.7.4.2), and using the pointer as if the pointer were of type void*,
is well-defined. Indirection through such a pointer is permitted but the resulting lvalue may only be used in limited ways, as described below. The program has undefined behavior if:
...
the pointer is used as the operand of a dynamic_cast (5.2.7). ...
Oddly, the example on the last bullet on dynamic_cast doesn't use dynamic_cast.
Of course, the storage has also probably been released by then, in which case the above guarantees don't even apply.
I have some difficulties with understanding what is really done behind returning values in C++.
Let's have following code:
class MyClass {
public:
int id;
MyClass(int id) {
this->id = id;
cout << "[" << id << "] MyClass::ctor\n";
}
MyClass(const MyClass& other) {
cout << "[" << id << "] MyClass::ctor&\n";
}
~MyClass() {
cout << "[" << id << "] MyClass::dtor\n";
}
MyClass& operator=(const MyClass& r) {
cout << "[" << id << "] MyClass::operator=\n";
return *this;
}
};
MyClass foo() {
MyClass c(111);
return c;
}
MyClass& bar() {
MyClass c(222);
return c;
}
MyClass* baz() {
MyClass* c = new MyClass(333);
return c;
}
I use gcc 4.7.3.
Case 1
When I call:
MyClass c1 = foo();
cout << c1.id << endl;
The output is:
[111] MyClass::ctor
111
[111] MyClass::dtor
My understanding is that in foo the object is created on the stack and then destroyed at the return statement, because that is the end of its scope. Returning is done by copying the object (copy constructor), and the copy is later assigned to c1 in main (assignment operator). If I'm right, why is there no output from the copy constructor or the assignment operator? Is this because of RVO?
Case 2
When I call:
MyClass c2 = bar();
cout << c2.id << endl;
The output is:
[222] MyClass::ctor
[222] MyClass::dtor
[4197488] MyClass::ctor&
4197488
[4197488] MyClass::dtor
What is going on here? I create a variable and return it, and the variable is destroyed because it is the end of its scope. The compiler then tries to copy that variable with the copy constructor, but it has already been destroyed; is that why I get a random value? So what is actually in c2 in main?
Case 3
When I call:
MyClass* c3 = baz();
cout << c3->id << endl;
The output is:
[333] MyClass::ctor
333
This is the simplest case? I return a pointer to a dynamically created object that lives on the heap, so the memory is allocated and not automatically freed. This is the case where the destructor isn't called and I have a memory leak. Am I right?
Are there any other cases or things that aren't obvious and I should know to fully master returning values in C++? ;) What is a recommended way to return a object from function (if any) - any rules of thumb upon that?
May I just add that case #2 is one of the cases of undefined behavior in the C++ language, since using a reference to a local variable after the function has returned is not allowed. A local variable has a precisely defined lifetime, and by returning it by reference you're returning a reference to a variable that no longer exists when the function returns. Therefore the program exhibits undefined behavior and the value of the given variable is practically random, as is the result of the rest of your program, since anything at all can happen.
Most compilers will issue a warning when you try to do something like this (either return a local variable by reference, or by address) - gcc, for example, tells me something like this :
bla.cpp:37:13: warning: reference to local variable ‘c’ returned [-Wreturn-local-addr]
You should remember, however, that the compiler is not at all required to issue any kind of warning when a statement that may exhibit undefined behavior occurs. Situations such as this one, though, must be avoided at all costs, because they're practically never right.
Case 1:
MyClass foo() {
MyClass c(111);
return c;
}
...
MyClass c1 = foo();
is a typical case when RVO can be applied. This is called copy-initialization and the assignment operator is not used since the object is created in place, unlike the situation:
MyClass c1;
c1 = foo();
where c1 is constructed, temporary c in foo() is constructed, [ copy of c is constructed ], c or copy of c is assigned to c1, [ copy of c is destructed] and c is destructed. (what exactly happens depends on whether the compiler eliminates the redundant copy of c being created or not).
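To see the difference between initialization and assignment, one can log which special member functions run. This is only a sketch: Tracked, calls, make and the two helper functions are hypothetical names, and the function returns a prvalue so that the result does not depend on whether NRVO is applied.

```cpp
#include <cassert>
#include <string>

std::string calls;  // hypothetical log of special member function calls

struct Tracked {
    int id;
    explicit Tracked(int i) : id(i) { calls += "ctor;"; }
    Tracked(const Tracked& other) : id(other.id) { calls += "copy;"; }
    Tracked& operator=(const Tracked& other) {
        id = other.id;
        calls += "assign;";
        return *this;
    }
};

Tracked make() { return Tracked(111); }

// Tracked c1 = make(); is initialization: no assignment operator involved.
std::string calls_for_initialization() {
    calls.clear();
    Tracked c1 = make();
    assert(c1.id == 111);
    return calls;
}

// Constructing first and assigning afterwards needs a second object.
std::string calls_for_assignment() {
    calls.clear();
    Tracked c1(0);
    c1 = make();
    assert(c1.id == 111);
    return calls;
}
```

With C++17 (or with copy elision enabled, as it is by default in older modes), the first function logs a single constructor call, while the second logs two constructors followed by an assignment.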
Case 2:
MyClass& bar() {
MyClass c(222);
return c;
}
...
MyClass c2 = bar();
invokes undefined behavior since you are returning a reference to the local variable c, an object with automatic storage duration that no longer exists once bar() returns.
Case 3:
MyClass* baz() {
MyClass* c = new MyClass(333);
return c;
}
...
MyClass* c3 = baz();
is the most straightforward one since you control what happens yet with a very unpleasant consequence: you are responsible for memory management, which is the reason why you should avoid dynamic allocation of this kind always when it is possible (and prefer Case 1).
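If dynamic allocation really is needed, one common way to avoid the manual memory management mentioned above is to return a smart pointer instead of a raw one. A minimal sketch, assuming C++11; Resource, make_resource and live_after_scope are hypothetical names, and the live counter exists only to make the destructor call observable.

```cpp
#include <cassert>
#include <memory>

struct Resource {
    static int live;  // number of Resource objects currently alive
    int id;
    explicit Resource(int i) : id(i) { ++live; }
    ~Resource() { --live; }
    Resource(const Resource&) = delete;
    Resource& operator=(const Resource&) = delete;
};
int Resource::live = 0;

// Ownership is part of the return type: the caller cannot forget to delete.
std::unique_ptr<Resource> make_resource(int id) {
    return std::unique_ptr<Resource>(new Resource(id));
}

int live_after_scope() {
    {
        std::unique_ptr<Resource> r = make_resource(333);
        assert(r->id == 333);
        assert(Resource::live == 1);
    }  // r goes out of scope here and deletes the Resource
    return Resource::live;
}
```

Unlike the raw pointer in Case 3, the destructor runs when the owning pointer goes out of scope, so nothing leaks.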
1) Yes.
2) You have a random value because your copy c'tor and operator= don't copy the value of id. However, you are correct in assuming there is no relying on the value of an object after it has been deleted.
3) Yes.
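Regarding point 2, a user-written copy constructor or assignment operator copies only what you tell it to copy. A sketch of versions that do carry the id over (Item, copied_id and assigned_id are hypothetical names, not the MyClass from the question):

```cpp
#include <cassert>

struct Item {
    int id;
    explicit Item(int i) : id(i) {}
    // Copy the member explicitly; an empty body would leave id uninitialized.
    Item(const Item& other) : id(other.id) {}
    Item& operator=(const Item& other) {
        id = other.id;
        return *this;
    }
};

int copied_id(int id) {
    Item original(id);
    Item copy = original;  // copy constructor
    assert(copy.id == original.id);
    return copy.id;
}

int assigned_id(int id) {
    Item source(id);
    Item target(0);
    target = source;       // copy assignment
    assert(target.id == source.id);
    return target.id;
}
```

This only addresses the uninitialized id in the copy. Copying from an object that has already been destroyed, as in Case 2, remains undefined behavior regardless.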
This question is directly related to another question that I've asked a time ago: "Opaque reference instead of PImpl. Is it possible?".
Let's say we have a class with a reference member of some other class which is initialized to a temporary variable in a constructor:
#include <iostream>
struct B
{
B(int new_x = 10) : x(new_x) { std::cout << "B constructed\n"; }
~B() { std::cout << "B destroyed\n"; }
public:
int x;
};
struct A
{
A()
: b( B(23) )
{
std::cout << "A constructed\n";
}
void Foo()
{
std::cout << "A::Foo()\n";
}
~A()
{
std::cout << "A destroyed\n";
}
public:
const B& b;
};
int main()
{
A a;
a.Foo();
std::cout << "x = " << a.b.x << std::endl;
}
When I run the code above, the output is:
B constructed
B destroyed
A constructed
A::Foo()
x = 23
A destroyed
It seems that even if temporary is destroyed so the reference member should be invalid, the integer field of the reference member is still readable. Why does it still work?
Undefined behaviour. It so happens in your case that the memory previously occupied by the temporary B is not overwritten before you reference it. Next time you run the program, anything could happen.
Note that the superficially similar
const B &b = B();
does have defined behaviour; the lifetime of the temporary B is extended through reference binding. This only applies to reference variables, not reference members.
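The lifetime extension for a local reference can be observed by logging construction and destruction. A small sketch, where Noisy, trace and local_reference_order are hypothetical names:

```cpp
#include <cassert>
#include <string>

std::string trace;  // hypothetical log of events, in order

struct Noisy {
    int x;
    explicit Noisy(int v) : x(v) { trace += "constructed;"; }
    ~Noisy() { trace += "destroyed;"; }
};

std::string local_reference_order() {
    trace.clear();
    {
        const Noisy& ref = Noisy(23);  // lifetime extended to the end of this block
        assert(ref.x == 23);           // safe: the temporary is still alive
        trace += "read " + std::to_string(ref.x) + ";";
    }  // the temporary is destroyed here, together with ref
    return trace;
}
```

Here the read happens between construction and destruction, whereas in the question's output "B destroyed" is printed before the member is ever read.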
If a reference is invalid, that doesn't mean it's unreadable. It means that it doesn't refer to a valid object. It may or may not refer to some piece of accessible memory; if it does, you may or may not find that the memory contains the remnants of whatever object used to be there.
To summarise, the behaviour is undefined.
The memory that has been allocated to the temporary B is now invalid, but it has not been put to any other use. That is why your read produces the value that was there the last time. However, this is undefined behavior. Running valgrind should pinpoint the place of the error.
If you are wondering "how can it be", here is a great answer explaining what happens in a very similar situation.
Recently, I found an interesting discussion on how to allow read-only access to private members without obfuscating the design with multiple getters, and one of the suggestions was to do it this way:
#include <iostream>
class A {
public:
A() : _ro_val(_val) {}
void doSomething(int some_val) {
_val = 10*some_val;
}
const int& _ro_val;
private:
int _val;
};
int main() {
A a_instance;
std::cout << a_instance._ro_val << std::endl;
a_instance.doSomething(13);
std::cout << a_instance._ro_val << std::endl;
}
Output:
$ ./a.out
0
130
GotW#66 clearly states that object's lifetime starts
when its constructor completes successfully and returns normally. That is, control reaches the end of the constructor body or an earlier return statement.
If so, we have no guarantee that the _val member will have been properly created by the time we execute _ro_val(_val). So how come the above code works? Is it undefined behaviour? Or are primitive types granted some exception to the object's lifetime?
Can anyone point me to some reference which would explain those things?
Before the constructor is called, an appropriate amount of memory is reserved for the object, on the free store (if you use new) or on the stack (if the object has automatic storage). This means that the memory for _val is already allocated by the time you refer to it in the member initializer list; it just has not been initialized yet.
_ro_val(_val)
Makes the reference member _ro_val refer to the memory allocated for _val, which might actually contain anything at this point of time.
There is still undefined behavior in your program, because _val is read before it has ever been initialized: you should explicitly initialize _val to 0 (or some value of your choice) in the constructor body or the member initializer list. The output 0 in this case is just luck; you might get some other value, since _val is left uninitialized. See the behavior here on gcc 4.3.4, which demonstrates the UB.
But as for the question itself: yes, binding the reference to _val is well-defined.
The object's address does not change.
I.e. it's well-defined.
However, the technique shown is just premature optimization. You don't save programmers' time. And with modern compiler you don't save execution time or machine code size. But you do make the objects un-assignable.
Cheers & hth.,
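The "un-assignable" point can be checked at compile time. The sketch below contrasts the reference-member pattern with a plain getter; WithReference, WithGetter and the two helper functions are hypothetical names, and the private member is declared first and initialized to 0 here so that every read is well-defined.

```cpp
#include <cassert>
#include <type_traits>

// The pattern from the question: a public const reference to a private member.
class WithReference {
    int val_;

public:
    WithReference() : val_(0), ro_val(val_) {}
    void doSomething(int some_val) { val_ = 10 * some_val; }
    const int& ro_val;
};

// The conventional alternative: a trivial inline getter.
class WithGetter {
    int val_ = 0;

public:
    void doSomething(int some_val) { val_ = 10 * some_val; }
    int val() const { return val_; }
};

// A reference member makes the implicit copy assignment operator deleted.
static_assert(!std::is_copy_assignable<WithReference>::value,
              "reference member removes copy assignment");
static_assert(std::is_copy_assignable<WithGetter>::value,
              "getter version stays assignable");

int reference_value_after(int some_val) {
    WithReference a;
    assert(a.ro_val == 0);  // val_ was initialized to 0
    a.doSomething(some_val);
    return a.ro_val;
}

int getter_value_after(int some_val) {
    WithGetter a;
    a.doSomething(some_val);
    WithGetter b;
    b = a;  // fine: no reference member
    return b.val();
}
```

Both variants expose the same value, but only the getter version can still be assigned.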
In my opinion, it is legal (well-defined) to initialize a reference with an uninitialized object. It is legal, but the standard (well, the latest C++11 draft, paragraph 8.5.3.3) recommends using a valid (fully constructed) object as an initializer:
A reference shall be initialized to refer to a valid object or function.
The next sentence from the same paragraph throws a bit more light at the reference creation:
[Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior.]
I understand that reference creation means binding reference to an object obtained by dereferencing its pointer and that probably explains that the minimal prerequisite for initialization of reference of type T& is having an address of the portion of the memory reserved for the object of type T (reserved, but not yet initialized).
Accessing uninitialized object through its reference can be dangerous.
I wrote a simple test application that demonstrates reference initialization with uninitialized object and consequences of accessing that object through it:
class C
{
public:
int _n;
C() : _n(123)
{
std::cout << "C::C(): _n = " << _n << " ...and blowing up now!" << std::endl;
throw 1;
}
};
class B
{
public:
// pC1- address of the reference is the address of the object it refers
// pC2- address of the object
B(const C* pC1, const C* pC2)
{
std::cout << "B::B(): &_ro_c = " << pC1 << "\n\t&_c = " << pC2 << "\n\t&_ro_c->_n = " << pC1->_n << "\n\t&_c->_n = " << pC2->_n << std::endl;
}
};
class A
{
const C& _ro_c;
B _b;
C _c;
public:
// Initializer list: members are initialized in the order how they are
// declared in class
//
// Initializes reference to _c
//
// Fully constructs object _b; its c-tor accesses uninitialized object
// _c through its reference and its pointer (valid but dangerous!)
//
// construction of _c fails!
A() : _ro_c(_c), _b(&_ro_c, &_c), _c()
{
// never executed
std::cout << "A::A()" << std::endl;
}
};
int main()
{
try
{
A a;
}
catch(...)
{
std::cout << "Failed to create object of type A" << std::endl;
}
return 0;
}
Output:
B::B(): &_ro_c = 001EFD70
&_c = 001EFD70
&_ro_c->_n = -858993460
&_c->_n = -858993460
C::C(): _n = 123 ...and blowing up now!
Failed to create object of type A