I'm pretty sure this is dangerous code. However, I wanted to check to see if anyone had an idea of what exactly would go wrong.
Suppose I have this class structure:
class A {
protected:
int a;
public:
A() { a = 0; }
int getA() { return a; }
void setA(int v) { a = v; }
};
class B: public A {
protected:
int b;
public:
B() { b = 0; }
};
And then suppose I want to have a way of automatically extending the class like so:
class Base {
public:
virtual ~Base() {}
};
template <typename T>
class Test: public T, public Base {};
One really important guarantee that I can make is that neither Base nor Test will have any other member variables or methods. They are essentially empty classes.
The (potentially) dangerous code is below:
int main() {
    B *b = new B();
    // dangerous part?
    // forcing Test<B> to point to an address of type B
    Test<B> *test = static_cast<Test<B> *>(b);
    //
    A *a = dynamic_cast<A *>(test);
    a->setA(10);
    std::cout << "result: " << a->getA() << std::endl;
}
The rationale for doing something like this is that I'm using a class similar to Test, but for it to work currently, a new instance of Test necessarily has to be made, copying the instance of T that is passed in. It would be really nice if I could just point Test at T's memory address.
If Base did not add a virtual destructor, and since nothing gets added by Test, I would think this code is actually okay. However, the addition of the virtual destructor makes me worried that the type info might get added to the class. If that's the case, then it would potentially cause memory access violations.
Finally, I can say this code works fine on my computer/compiler (clang), though this of course is no guarantee that it's not doing bad things to memory and/or won't completely fail on another compiler/machine.
The virtual destructor Base::~Base will be called when you delete the pointer. Since B doesn't have the proper vtable (none at all in the code posted here) that won't end very well.
It only works in this case because you have a memory leak: you never delete test.
Your code produces undefined behaviour, as it violates strict aliasing. Even if it did not, you would be invoking UB: neither B nor A is a polymorphic class, and the object pointed to is not of a polymorphic class, so dynamic_cast cannot succeed. When using dynamic_cast you are attempting to access a Base object that does not exist in order to determine the runtime type.
One really important guarantee that I can make is that neither Base
nor Test will have any other member variables or methods. They are
essentially empty classes.
It's not important at all; it's utterly irrelevant. The Standard would have to mandate EBO for this to even begin to matter, and it doesn't.
As long as you perform no operations with Test<B>* and avoid any magic like smart pointers or automatic memory management, you should be fine.
Be sure to look out for hidden code such as debug prints or logging that inspects the object. I've had debuggers crash on me for attempting to look at the value of a pointer set up like this. I'll bet this will cause you some pain, but you should be able to make it work.
I think the real problem is maintenance. How long will it be before some developer does an operation on Test<B>*?
Related
Consider three classes A, B, and C, where A is the parent of B and C with a function A::doThing().
Is there any difference between the following two methods of calling doThing() in terms of performance (assuming B and C don't override doThing())?
B b1;
C c1;
A* a1 = &b1;
A* a2 = &c1;
//Option 1:
b1.doThing();
c1.doThing();
//Option 2:
a1->doThing();
a2->doThing();
In a tutorial app I saw they claimed that the second option was faster. I understand that if B or C overrides doThing(), then the two different calls could have different results, but I don't get why the second way of calling the function would be faster? The direct quote (in the example they use option 2):
We would have achieved the same result by calling the functions directly on the objects. However, it's faster and more efficient to use pointers.
Edit: Code from app as some have suggested I misunderstood:
#include <iostream>
using namespace std;
class Enemy {
protected:
    int attackPower;
public:
    void setAttackPower(int a){
        attackPower = a;
    }
};
class Ninja: public Enemy {
public:
    void attack() {
        cout << "Ninja! - " << attackPower << endl;
    }
};
class Monster: public Enemy {
public:
    void attack() {
        cout << "Monster! - " << attackPower << endl;
    }
};
int main() {
    Ninja n;
    Monster m;
    Enemy *e1 = &n;
    Enemy *e2 = &m;
    e1->setAttackPower(20);
    e2->setAttackPower(80);
    n.attack();
    m.attack();
}
If doThing is not virtual, it's all the same, as in all cases the exact method to call is resolved at compile time.
Otherwise:
calling directly on the object should always be as fast as it gets, since the compiler is sure of the actual type of the object, so there's no extra indirection step (no vtable lookup, and inlining is possible);
when calling through a pointer to the base class, it comes down to the compiler's ability to prove the dynamic type of the object (or to realize that the method is never overridden); if everything is local to the function this may be easy, but otherwise it quickly gets difficult (it is not even trivial for the compiler to see that nobody overrides a virtual method, because, barring LTCG and similar mechanisms, it has no knowledge of what happens in other translation units). The sketch below makes the distinction concrete.
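Here is a small self-contained sketch of that distinction (hypothetical names, not the tutorial's code): the non-virtual call is bound at compile time, while the virtual call through a base pointer generally goes through the vtable unless the compiler can prove the dynamic type.
#include <iostream>

struct Plain {
    void doThing() { std::cout << "Plain::doThing\n"; }            // non-virtual
};

struct PolyBase {
    virtual void doThing() { std::cout << "PolyBase::doThing\n"; }
    virtual ~PolyBase() = default;
};

struct PolyDerived : PolyBase {
    void doThing() override { std::cout << "PolyDerived::doThing\n"; }
};

int main() {
    Plain p;
    p.doThing();        // resolved at compile time, can be inlined

    PolyDerived d;
    PolyBase* b = &d;
    b->doThing();       // dispatched through the vtable, unless the compiler
                        // can prove the dynamic type is PolyDerived
}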
We would have achieved the same result by calling the functions directly on the objects. However, it's faster and more efficient to use pointers.
If the context is as you reported, this is complete bullshit. Throw away whatever guide you found this in; the author has no idea what he is talking about.
I was looking at some of the new features in C++11, and due to my current version of GCC I am unable to use constructor delegation. But it got me thinking about replicating the feature like this:
class A
{
public:
A() : num( 42 ) {}
A( int input ) { *this = A(); num *= input; }
int num;
};
It certainly compiles and works fine, the code below:
A a;
cout << "a: " << a.num << endl;
A b( 2 );
cout << "b: " << b.num << endl;
Returns this, which is correct.
42
84
Obviously this is a very trivial example, but other than the memory inefficiencies (two A's created and one overwritten by the other before being destroyed), what problems could arise? It certainly looks like a code smell, but I can't think of a really good reason why.
You are not initializing your object with the integer, but modifying a default-initialized object. This might or might not be an issue. Quite often people factor the common stuff out into some init() function to get functionality similar to delegating ctors. However, there are some situations in which this is undesirable, wrong, or impossible:
When you have a reference member, you must initialize it in the ctor; you cannot default-initialize it and overwrite it later. Using a pointer instead can help (see the sketch below).
For certain members, default initialization does something, and overwriting does something additional. Performance-wise, it would have been more efficient to initialize the member right away. Depending on what the initialization does, this might just be a performance hit, but given certain side effects of the object, it might even be plain wrong.
The member might not be assignable.
Additionally, this is just considered bad style by some people. I personally consider it bad style because I think you should always initialize instead of assigning later, even in simple cases, since one day you will forget it in an important case and then the lost performance bites you.
But YMMV.
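To illustrate the reference-member bullet above with a minimal sketch (the Widget name is made up): the "default-construct, then assign later" trick cannot work here, because a reference must be bound in the constructor's initializer list and cannot be reseated; it also makes the class non-assignable, which is the last bullet.
struct Widget {
    explicit Widget(int& shared) : ref(shared) {}  // must be initialized here
    int& ref;                                      // cannot be rebound later
};
// Widget has no default constructor, and copy assignment is implicitly deleted:
//   Widget w1(x), w2(y);
//   w1 = w2;   // error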
Your code is not actually C++11 at all. I was wondering whether move constructors might work here, as you are essentially moving one A into another and then modifying it slightly.
As with C++03, you can factor out the initialisation you want to perform once in all your constructors either by putting it into a sub-class or a base class (often with protected or private inheritance, as it's an implementation detail). Using a base class:
class ABase
{
protected:
int num;
ABase() : num(42) {}
};
class A : protected ABase
{
public:
A() = default; // or in C++03 just {}
explicit A(int input) : ABase()
{
num *= input;
}
};
(You can adjust the access permissions to taste.) The key point here is that I only ever create one ABase object; if it has more than just a trivial int member, that might be significant. I quite like the inheritance, as I can then use the member in A directly rather than as a member of some aggregated object, and I prefer the inheritance to be protected or private here; but sometimes, if the base class has members I want to be public, I will use public inheritance and give the base class a protected destructor. This is assuming there is no v-table and thus no further derivation is expected. (You could actually finalize A here by making the inheritance virtual and private, but you probably don't want to.)
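For comparison, this is roughly what the original example looks like with real C++11 delegating constructors, once a compiler that supports them is available (a sketch of the feature the question is emulating):
class A
{
public:
    A() : num(42) {}
    A(int input) : A()   // delegate to A() first...
    {
        num *= input;    // ...then adjust, as in the original example
    }
    int num;
};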
In Why is there no base class in C++?, I quoted Stroustrup on why a common Object class for all classes is problematic in c++. In that quote there is the statement:
Using a universal base class implies cost: Objects must be heap-allocated to be polymorphic;
I really didn't look twice at it, and since it's on Bjarne's home page I would suppose a lot of eyes have scanned that sentence and reported any misstatements.
A commenter however pointed out that this is probably not the case, and in retrospect I can't find any good reason why this should be true. A short test case yields the expected result of VDerived::f().
struct VBase {
virtual void f() { std::cout <<"VBase::f()\n"; }
};
struct VDerived: VBase {
void f() { std::cout << "VDerived::f()\n"; }
};
void test(VBase& obj) {
obj.f();
}
int main() {
VDerived obj;
test(obj);
}
Of course if the formal argument to test was test(VBase obj) the case would be totally different, but that would not be a stack vs. heap argument but rather copy semantics.
Is Bjarne flat out wrong or am I missing something here?
Addendum:
I should point out that Bjarne has added to the original FAQ that
Yes. I have simplified the arguments; this is an FAQ, not an academic paper.
I understand and sympathize with Bjarne's point. Also, I suppose my eyes were one of the pairs scanning that sentence.
Looks like polymorphism to me.
Polymorphism in C++ works when you have indirection; that is, either a pointer-to-T or a reference-to-T. Where T is stored is completely irrelevant.
Bjarne also makes the mistake of saying "heap-allocated" which is technically inaccurate.
(Note: this doesn't mean that a universal base class is "good"!)
I think Bjarne means that obj, or more precisely the object it points to, can't easily be stack-based in this code:
int f(int arg)
{
std::unique_ptr<Base> obj;
switch (arg)
{
case 1: obj = std::make_unique<Derived1 >(); break;
case 2: obj = std::make_unique<Derived2 >(); break;
default: obj = std::make_unique<DerivedDefault>(); break;
}
return obj->GetValue();
}
You can't have an object on the stack which changes its class, or is initially unsure what exact class it belongs to.
(Of course, to be really pedantic, one could allocate the object on the stack by using placement-new on an alloca-allocated space. The fact that there are complicated workarounds is beside the point here, though.)
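For completeness, here is a rough sketch of such a workaround, using aligned storage and placement new instead of alloca; it assumes the same Base/Derived classes as above, that they all fit in the buffer, and that Base has a virtual destructor:
#include <new>
#include <type_traits>

int f(int arg)
{
    // storage big and aligned enough for the largest derived type (assumption)
    std::aligned_union_t<0, Derived1, Derived2, DerivedDefault> buf;
    Base* obj = nullptr;
    switch (arg)
    {
        case 1:  obj = new (&buf) Derived1();        break;
        case 2:  obj = new (&buf) Derived2();        break;
        default: obj = new (&buf) DerivedDefault();  break;
    }
    int result = obj->GetValue();
    obj->~Base();   // manual destruction; relies on Base's virtual destructor
    return result;
}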
The following code also doesn't work as might be expected:
int f(int arg)
{
Base obj = DerivedFactory(arg); // copy (return by value)
return obj.GetValue();
}
This code contains an object slicing error: the stack space for obj is only as large as an instance of class Base. When DerivedFactory returns an object of a derived class that has additional members, they are not copied into obj, which renders obj invalid and unusable as a derived object (and quite possibly unusable even as a base object).
Summing up, there is a class of polymorphic behaviour that cannot be achieved with stack objects in any straightforward way.
Of course any completely constructed derived object, wherever it is stored, can act as a base object, and therefore act polymorphically. This simply follows from the is-a relationship that objects of inherited classes have with their base class.
Having read it, I think the point is (especially given the second sentence about copy semantics) that a universal base class is useless for objects handled by value, so it would naturally lead to more handling via references and thus more memory allocation overhead (think of a template vector vs. a vector of pointers).
So I think he meant that the objects would have to be allocated separately from any structure containing them and that it would have lead to many more allocations on heap. As written, the statement is indeed false.
PS (re Captain Giraffe's comment): It would indeed be useless to have a function
f(object o)
which means that generic function would have to be
f(object &o)
And that would mean the object would have to be polymorphic, which in turn means it would have to be allocated separately; that would often mean on the heap, though it can be on the stack. On the other hand, now you have:
template <typename T>
f(T o) // see, no reference
which ends up being more efficient in most cases. This is especially true for collections: if all you had was a vector of such base objects (as Java does), you'd have to allocate all the objects separately, which would be a big overhead, especially given the poor allocator performance at the time C++ was created (Java still has an advantage here because copying garbage collectors are more efficient, and C++ can't use one).
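A rough, self-contained illustration of that collection argument (the object and IntBox names are hypothetical): with a universal base class every element needs its own heap allocation, whereas a by-value container stores everything contiguously.
#include <memory>
#include <vector>

struct object { virtual ~object() = default; };   // hypothetical universal base
struct IntBox : object { int value = 0; };

int main() {
    // Java-style: one heap allocation per element
    std::vector<std::unique_ptr<object>> boxed;
    for (int i = 0; i < 1000; ++i)
        boxed.push_back(std::make_unique<IntBox>());

    // Template/value style: one contiguous buffer, no per-element allocation
    std::vector<IntBox> plain(1000);
}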
Bjarne's statement is not correct.
Objects, that is instances of a class, become potentially polymorphic by adding at least one virtual method to their class declaration. Virtual methods add one level of indirection, allowing a call to be redirected to the actual implementation which might not be known to the caller.
For this it does not matter whether the instance is heap- or stack-allocated, as long as it is accessed through a reference or pointer (T& instance or T* instance).
One possible reason why this general assertion slipped onto Bjarne's web page might be that it is nonetheless extremely common to heap-allocate instances with polymorphic behavior. This is mainly because the actual implementation is indeed not known to the caller who obtained it through a factory function of some sort.
I think he was going along the lines of not being able to store it in a base-typed variable. You're right in saying that you can store it on the stack if it's of the derived type, because there's nothing special about that; conceptually, it's just storing the data of the class and its derivatives, plus a vtable.
edit: Okay, now I'm confused, re-looking at the example. It looks like you may be right now...
I think the point is that this is not "really" polymorphic (whatever that means :-).
You could write your test function like this
template<class T>
void test(T& obj)
{
obj.f();
}
and it would still work, whether the classes have virtual functions or not.
Polymorphism without heap allocation is not only possible but also relevant and useful in some real life cases.
This is quite an old question with already many good answers. Most answers indicate, correctly of course, that Polymorphism can be achieved without heap allocation. Some answers try to explain that in most relevant usages Polymorphism needs heap allocation.
However, an example of a viable usage of Polymorphism without heap allocation seems to be required (i.e. not just purely syntax examples showing it to be merely possible).
Here is a simple Strategy-Pattern example using Polymorphism without heap allocation:
Strategies Hierarchy
class StrategyBase {
public:
virtual ~StrategyBase() {}
virtual void doSomething() const = 0;
};
class Strategy1 : public StrategyBase {
public:
void doSomething() const override { std::cout << "Strategy1" << std::endl; }
};
class Strategy2 : public StrategyBase {
public:
void doSomething() const override { std::cout << "Strategy2" << std::endl; }
};
A non-polymorphic type, holding inner polymorphic strategy
class A {
const StrategyBase* strategy;
public:
// just for the example, could be implemented in other ways
const static Strategy1 Strategy_1;
const static Strategy2 Strategy_2;
A(const StrategyBase& s): strategy(&s) {}
void doSomething() const { strategy->doSomething(); }
};
const Strategy1 A::Strategy_1 {};
const Strategy2 A::Strategy_2 {};
Usage Example
int main() {
    // vector of non-polymorphic types, holding inner polymorphic strategy
    std::vector<A> vec { A::Strategy_1, A::Strategy_2 };
    // may also add strategy created on stack
    // using unnamed struct just for the example
    struct : StrategyBase {
        void doSomething() const override {
            std::cout << "Strategy3" << std::endl;
        }
    } strategy3;
    vec.push_back(strategy3);
    for (auto a : vec) {
        a.doSomething();
    }
}
Output:
Strategy1
Strategy2
Strategy3
Code: http://coliru.stacked-crooked.com/a/21527e4a27d316b0
Let's assume we have 2 classes
class Base
{
public:
int x = 1;
};
class Derived
: public Base
{
public:
int y = 5;
};
int main()
{
Base o = Derived{ 50, 50 };
std::cout << Derived{ o }.y;
return 0;
}
The output will be 5 and not 50: y is cut off. If the member variables and the virtual functions happen to be the same, there can be the illusion that polymorphism works on the stack, as a different vtable is used. The example above illustrates that the copy constructor is called: the variable x is copied into the derived object, but y is set by the initializer of the temporary object.
The stack space for o only grows by the size of Base (a single int, typically 4 bytes), so y is simply cut off in the assignment.
When you use polymorphism on the heap, you tell the allocator (operator new) which type you are allocating and therefore how much memory you need. With the stack this does not work: the amount of memory is fixed when the variable is defined, and it neither shrinks nor grows afterwards, because at the time of initialization you know exactly what you're initializing and exactly that much memory is set aside.
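For contrast, here is a small sketch (the same classes, redefined so the snippet is self-contained): accessing the derived object through a reference instead of copying it into a Base avoids the slicing entirely, even though everything lives on the stack.
#include <iostream>

struct Base { int x = 1; };
struct Derived : Base { int y = 5; };

int main()
{
    Derived d{ 50, 50 };         // C++17 aggregate initialization, as above
    Base& ref = d;               // refer to the Derived object, do not copy it
    std::cout << ref.x << '\n';  // 50: nothing was sliced away
    std::cout << d.y << '\n';    // 50: the derived part is intact
}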
I am currently debugging a crashlog. The crash occurs because the vtable pointer of a (c++-) object is 0x1, while the rest of the object seems to be ok as far as I can tell from the crashlog.
The program crashes when it tries to call a virtual method.
My question: Under what circumstances can a vtable pointer become null? Does operator delete set the vtable pointer to null?
This occurs on OS X using gcc 4.0.1 (Apple Inc. build 5493).
Could be a memory trample - something writing over that vtable by mistake. There is a nearly infinite amount of ways to "achieve" this in C++. A buffer overflow, for example.
Any kind of undefined behaviour you have may lead to this situation. For example:
Errors in pointer arithmetic or other mistakes that make your program write into invalid memory.
Uninitialized variables, invalid casts...
Treating an array polymorphically might cause this as a secondary effect.
Trying to use an object after delete.
See also the questions What’s the worst example of undefined behaviour actually possible? and What are all the common undefined behaviour that a C++ programmer should know about?.
Your best bet is to use a bounds and memory checker, as an aid to heavy debugging.
A very common case: trying to call a pure virtual method from the constructor...
Constructors
struct Interface
{
    Interface();
    virtual void logInit() const = 0;
};
struct Concrete: Interface
{
    virtual void logInit() const { std::cout << "Concrete" << std::endl; }
};
Now, suppose the following implementation of the Interface constructor:
Interface::Interface() {}
Then everything is fine:
Concrete myConcrete;
myConcrete.logInit(); // outputs "Concrete"
It's such a pain to have to call logInit after the constructor; it would be better to factor the code out, right?
Interface::Interface() { this->logInit(); } // DON'T DO THAT, REALLY ;)
Then we can do it in one line!!
Concrete myConcrete; // CRASHES VIOLENTLY
Why ?
Because the object is built bottom up. Let's look at it.
Instructions to build a Concrete class (roughly)
Allocate enough memory (of course), and enough memory for the _vtable too (1 function pointer per virtual function, usually in the order they are declared, starting from the leftmost base)
Call Concrete constructor (the code you don't see)
a> Call the Interface constructor, which initializes the _vtable with its pointers
b> Call Interface constructor's body (you wrote that)
c> Override the pointers in the _vtable for those methods Concrete overrides
d> Call Concrete constructor's body (you wrote that)
So what's the problem ? Well, look at b> and c> order ;)
When you call a virtual method from within a constructor, it doesn't do what you're hoping for. It does go to the _vtable to lookup the pointer, but the _vtable is not fully initialized yet. So, for all that matters, the effect of:
D() { this->call(); }
is in fact:
D() { this->D::call(); }
When calling a virtual method from within a constructor, you don't get the full dynamic type of the object being built; you get the static type of the constructor currently being invoked.
In my Interface / Concrete example, that means the Interface type, and the method is pure virtual, so the _vtable does not hold a real pointer (0x0 or 0x1 for example, if your compiler is friendly enough to set up debug values to help you there).
Destructors
Coincidently, let's examine the Destructor case ;)
struct Interface { ~Interface(); virtual void logClose() const = 0; };
Interface::~Interface() { this->logClose(); }
struct Concrete : Interface { ~Concrete(); virtual void logClose() const; char* m_data; };
Concrete::~Concrete() { delete[] m_data; } // It's all about being clean
void Concrete::logClose()
{
std::cout << "Concrete refering to " << m_data << std::endl;
}
So what happens at destruction ? Well the _vtable works nicely, and the real runtime type is invoked... what it means here however is undefined behavior, because who knows what happened to m_data after it's been deleted and before Interface destructor was invoked ? I don't ;)
Conclusion
Never ever call virtual methods from within constructors or destructors.
If it's not that, you're left with a memory corruption, tough luck ;)
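One common workaround, sketched here under the assumption that ConcreteT derives from the Interface above, is to finish construction first and only then make the virtual call, for example from a small factory function:
#include <memory>
#include <utility>

template <typename ConcreteT, typename... Args>
std::unique_ptr<ConcreteT> makeAndLog(Args&&... args)
{
    auto obj = std::make_unique<ConcreteT>(std::forward<Args>(args)...);
    obj->logInit();   // safe: the object is fully constructed, dispatch is dynamic
    return obj;
}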
My first guess would be that some code is memset()'ing a class object.
This is totally implementation-dependent. However, it is quite safe to assume that after delete some other operation may set that memory to null.
Other possibilities include the memory being overwritten through some stray pointer -- actually, in my case it's almost always this...
That said, you should never try to use an object after delete.
I am working on a legacy framework. Lets say 'A' is the base-class and 'B' is the derived class. Both the classes do some critical framework initialization. FWIW, it uses ACE library heavily.
I have a situation wherein an instance of 'B' is created, but the ctor of 'A' depends on some initialization that can only be performed from 'B'.
As we know, when 'B' is instantiated the ctor of 'A' is invoked before that of 'B'. The virtual mechanism doesn't work from ctors, and using static functions is ruled out (due to the static-initialization-order fiasco).
I considered using the CRTP pattern as follows :-
template<class Derived>
class A {
public:
    A() {
        static_cast<Derived*>(this)->fun();
    }
};
class B : public A<B> {
public:
    B() : a(0) {
        a = 10;
    }
    void fun() { std::cout << "Init Function, Variable a = " << a << std::endl; }
private:
    int a;
};
But the class members that are initialized in the initializer list still have indeterminate values, because the initializer list has not yet been executed (e.g. 'a' in the above case). In my case there are a number of such framework-based initialization variables.
Are there any well-known patterns to handle this situation?
Thanks in advance,
Update:
Based on the idea given by dribeas, I conjured up a temporary solution to this problem (a full-fledged refactoring does not fit my timeline for now). The following code demonstrates it:
// move all A's dependent data in 'B' to a new class 'C'.
class C {
public:
C() : a(10)
{ }
int getA() { return a; }
private:
int a;
};
// enhance class A's ctor with a pointer to the newly split class
class A {
public:
A(C* cptr)
{
std::cout << "O.K. B's Init Data From C:- " << cptr->getA() <<
std::endl;
}
};
// now modify the actual derived class 'B' as follows
class B : public C, public A {
public:
B()
: A(static_cast<C*>(this))
{ }
};
For some more discussion on the same see this link on c.l.c++.m. There is a nice generic solution given by Konstantin Oznobikhin.
Probably the best thing you can do is refactoring. It does not make sense to have a base class depend on one of its derived types.
I have seen this done before, and it caused the developers quite some pain: the ACE_Task class was extended to provide a periodic thread that could be extended with concrete functionality, and the thread was activated from the periodic thread's constructor. In testing it worked more often than not, but in some situations the thread actually started before the most derived object was initialized.
Inheritance is a strong relationship that should be used only when required. If you take a look at the boost thread library (just the docs, no need to go into detail), or the POCO library, you will see that they split the problem in two: thread classes control thread execution and call a method that is passed to them at construction. The thread control is separated from the actual code that will be run, and the fact that the code to be run is received as an argument to the constructor guarantees that it was constructed before the thread constructor was called.
Maybe you could use the same approach in your own code. Divide the functionality in two, whatever the derived class is doing now should be moved outside of the hierarchy (boost uses functors, POCO uses interfaces, use whatever seems to fit you most). Without a better description of what you are trying to do, I cannot really go into more detail.
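A rough sketch of that "split in two" idea with hypothetical names (this is not ACE, boost or POCO code): the work to be done is fully constructed before it is handed to the wrapper's constructor, so the ordering problem disappears.
#include <functional>
#include <iostream>

class PeriodicTask {   // hypothetical wrapper playing the role of the thread class
public:
    explicit PeriodicTask(std::function<void()> work)
        : work_(std::move(work)) {}   // `work` already exists and is fully built here
    void runOnce() { work_(); }
private:
    std::function<void()> work_;
};

int main() {
    PeriodicTask task([] { std::cout << "doing the periodic work\n"; });
    task.runOnce();
}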
Another thing you could try (this is fragile and I would recommend against) is breaking the B class into a C class that is independent of A and a B class that inherits from both, first from C then from A (with HUGE warning comments there). This will guarantee that C will be constructed prior to A. Then make the C subobject an argument of A (through an interface or as a template argument). This will probably be the fastest hack, but not a good one. Once you are willing to modify the code, just do it right.
First, I think your design is bad if the constructor of a base class depends on something done in the constructor of a derived class. It really shouldn't be that way. At the time the constructor of the base class runs, the object of the derived class basically doesn't exist yet.
A solution might be to have a helper object passed from the derived class to the constructor of the base class.
Perhaps Lazy Initialization does it for you. Store a flag in A indicating whether it's initialized or not. Whenever you call a method, check the flag; if it's false, initialize A (the ctor of B has run by then) and set the flag to true.
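A minimal sketch of that lazy-initialization idea, with made-up names (the key point is that the virtual call happens on first use, long after both constructors have finished):
class A {
public:
    virtual ~A() = default;
    int getValue() {
        if (!initialized) {            // first use: B is fully constructed by now
            value = computeValue();    // virtual dispatch works as expected here
            initialized = true;
        }
        return value;
    }
protected:
    virtual int computeValue() = 0;    // supplied by the derived class
private:
    bool initialized = false;
    int value = 0;
};

class B : public A {
    int computeValue() override { return 42; }   // placeholder for B's real init
};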
It is a bad design, and as already said it is UB. Please consider moving such dependencies to some other method, say 'initialize', and call this initialize method from your derived class constructor (or anywhere before you actually need the base class data to be initialized).
Hmm. So, if I'm reading into this correctly, "A" is part of the legacy code, and you're pretty damn sure the right answer to some problem is to use a derived class, B.
It seems to me that the simplest solution might be to make a functional (non-OOP) style static factory function:
static B& B::makeNew(...);
Except that you say you run into static initialization order fiasco? I wouldn't think you would with this kind of setup, since there's no initialization going on.
Alright, looking at the problem more, "C" needs to have some setup done by "B" that "A" needs done, only "A" gets first dibs, because you want to have inheritance. So... fake inheritance, in a way that lets you control construction order...?
class A
{
    B* pB;
public:
    rtype fakeVirtual(params) { return pB->fakeVirtual(params); }
    ~A()
    {
        pB->deleteFromA();
        delete pB;
        //Deletion stuff
    }
protected:
    void deleteFromB()
    {
        //Deletion stuff
        pB = NULL;
    }
};
class B
{
    A* pA;
public:
    rtype fakeInheritance(params) { return pA->fakeInheritance(params); }
    ~B()
    {
        //deletion stuff
        pA->deleteFromB();
    }
protected:
    friend class A;
    void deleteFromA()
    {
        //deletion stuff
        pA = NULL;
    }
};
While it's verbose, I think this should safely fake inheritance and allow you to wait to construct A until after B has done its thing. It's also encapsulated, so when you can pull A out you shouldn't have to change anything other than A and B.
Alternatively, you may also want to take a few steps back and ask yourself: what functionality does inheritance give me that I am trying to use, and how might I accomplish it by other means? For instance, CRTP can be used as an alternative to virtual functions, and policies as an alternative to inheriting functionality (I think that's the right phrasing). I'm using these ideas above, just dropping the templates because I'm only expecting A to template on B and vice versa. A quick sketch of the policy idea follows.
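Here is that sketch, with hypothetical names: the initialization code is supplied as a template parameter, so there is no virtual call and no base constructor that depends on a derived class.
template <typename InitPolicy>
class Framework {
public:
    Framework() { InitPolicy::initialize(); }   // resolved at compile time, no vtable involved
};

struct DefaultInit {
    static void initialize() { /* framework setup goes here */ }
};

using DefaultFramework = Framework<DefaultInit>;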