Object initialization - C++

I work on embedded software. Previously we didn't use many C++ features, so we used memset(this, 0, sizeof(child)) to initialize (zero out) an object. However, that no longer works now that we are using virtual functions: apparently it destroys the vtable/virtual pointer.
So my question is:
How can I initialize an object quickly and conveniently?
The class child inherits from class parent, which defines a lot of virtual functions and has many data members. If I only need to zero out all data members, is there any way to avoid member-by-member assignment in child's constructor without using memset()? Or is there any trick to use memset without destroying the vtable (in a compiler-independent way)?
Thank you very much.

You're asking to use the facilities of C++ but don't want the performance hit of per-member initialization. First, I'd ask myself whether this is really the hit you think it is; there are plenty of other bottlenecks worth looking for before worrying about setting a member to 0.
But if you want the features of C++ and still want the speed of memset(), then I suggest you put the data for this class in a separate class, initialize that to 0, and pass it by reference to the class that is going to use it.
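For example, a minimal sketch of that idea (all names here are made up): the zeroable data lives in a plain struct with no virtual functions, so memset is safe on it, and the class with the virtuals only keeps a reference to it.
#include <cstring>
struct ChildData { int a; int b; float c; };   // plain data, no vtable
class Child
{
public:
    explicit Child(ChildData& d) : data_(d) {}
    virtual ~Child() {}
    virtual void run() {}
private:
    ChildData& data_;   // the polymorphic class only refers to the data
};
int main()
{
    ChildData d;
    std::memset(&d, 0, sizeof d);   // safe: ChildData has no vtable pointer
    Child c(d);
}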

Using placement new is definitely an option to avoid member-wise zeroing of memory. Remember to destroy the object explicitly and release the raw buffer with delete[].
#include <cstring>   // memset
#include <new>       // placement new

struct base { virtual ~base() {} };
struct derived : base {};

int main()
{
    char *p = new char[sizeof(derived)];
    memset(p, 0, sizeof(derived));   // zero the raw storage first
    derived *pd = new (p) derived;   // then construct, which sets up the vtable
    pd->~derived();                  // destroy explicitly...
    delete[] p;                      // ...and release the raw buffer
}

DISCLAIMER: This is a cheap and dirty hack, not very C++ish, and haters will hate it. But hey. If you gotta do what you gotta do, and what you gotta do it to is a POD, then this will work.
If you can take the data members that you want to memset and put them in their own POD, you can memset that POD. To wit (the POD in question here is the BucketOBits struct):
NOTE: It is important that the datatype you use here is a POD (Plain Old Data). For more about what this means, see this FAQ entry.
#include <cstdlib>
#include <cstring>
class Interface
{
public:
virtual void do_it() const = 0;
virtual ~Interface() {};
};
class Object : public Interface
{
public:
Object();
void do_it() const {};
private:
struct BucketOBits
{
int int_a_;
int int_b_;
int int_c_;
} bucket_;
};
Object::Object()
{
memset(&bucket_, 0, sizeof(bucket_));
};
int main()
{
Interface* ifc = new Object;
}
Even better, you can use the fact that value-initialization for integral types means zero-initialization, and get rid of the memset entirely, possibly even making your code a little faster than with memset. Value-initialize BucketOBits in the constructor's initializer list:
Object::Object() : bucket_()
{
};
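As a side note, if your compiler supports C++11, the same zero-initialization can be written as a default member initializer, so every constructor picks it up automatically. A sketch (Object2 is a hypothetical variant of the Object class above):
class Object2 : public Interface
{
public:
    void do_it() const {}
private:
    struct BucketOBits
    {
        int int_a_;
        int int_b_;
        int int_c_;
    } bucket_{};   // value-initialized: all members start at zero
};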
EDIT2:
If both the base and derived classes have data members that need this zero-init, then you can still use this method by giving each class its own BucketOBits. Case in point:
#include <cstdlib>
#include <cstring>
class Interface
{
public:
virtual void do_it() const = 0;
Interface();
virtual ~Interface() {};
private:
struct BucketOBits
{
unsigned base_int_a_;
unsigned base_int_b_;
long base_int_c_;
} bucket_;
};
class Object : public Interface
{
public:
Object();
void do_it() const {};
private:
struct BucketOBits
{
int int_a_;
int int_b_;
int int_c_;
} bucket_;
};
Interface::Interface() : bucket_()
{
}
Object::Object() : bucket_()
{
}
int main()
{
Interface* ifc = new Object;
}

First, you cannot avoid using the constructor, because it will be called automatically when you create the object. If you do not define a constructor yourself, the compiler will define one for you. By the time you call memset(this), which BTW you should never ever do, the constructor has already been called.
Second, in C++ initialization and assignment are not the same thing. Initialization is faster, which is why you should initialize data members in the constructor's initializer list rather than assign values to them in the body of the constructor.
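For illustration, a minimal made-up example of the two styles (Widget and its members are hypothetical):
#include <string>
class Widget
{
public:
    // Initialization: s_ is constructed directly from name, n_ directly from 0.
    explicit Widget(const std::string& name) : s_(name), n_(0) {}
    // Assignment style (avoid): the members would be default-constructed first
    // and then assigned in the body, doing the work twice:
    // explicit Widget(const std::string& name) { s_ = name; n_ = 0; }
private:
    std::string s_;
    int n_;
};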
In short, I would advise you not to fight the language.

any trick to use memset without destroying the vtable? (compiler-independent way)
There is no platform-independent way to work around this.
The reason is that the vtable pointer is not placed at a fixed position: it could be at the beginning of the object or right after the last data member, so it is not portable to calculate addresses and jump over it. The pointer size also depends on the architecture, and so on.
With multiple inheritance it gets worse.
You should use either an initializer list (not assignment in the constructor) or placement new, as in Chubsdad's answer.
If I need only to zero out all data members, any way to avoid member-by-member assignment in child's constructor without using memset()?
You cannot (and must not) avoid calling the constructor in this context, since it is the constructor that sets up the vtable pointer.

Related

Why does C++ allow calling constructor explicitly? [duplicate]

I know we can explicitly call the constructor of a class in C++ using the scope resolution operator, i.e. className::className(). I was wondering where exactly I would need to make such a call.
You also sometimes explicitly use a constructor to build a temporary. For example, if you have some class with a constructor:
class Foo
{
public:
Foo(const char* c, int i);
};
and a function
void Bar(Foo foo);
but you don't have a Foo around, you could do
Bar(Foo("hello", 5));
This is like a cast. Indeed, if you have a constructor that takes only one parameter (and is not marked explicit), the C++ compiler will use that constructor to perform implicit conversions.
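A small self-contained sketch of that implicit-conversion behaviour (this Foo is a simplified, hypothetical variant with a single-argument constructor):
class Foo
{
public:
    Foo(int i) : i_(i) {}              // single-argument constructor
    // explicit Foo(int i) : i_(i) {}  // 'explicit' would forbid the implicit conversion
private:
    int i_;
};
void Bar(Foo foo) { (void)foo; }
int main()
{
    Bar(Foo(5));   // explicit temporary
    Bar(5);        // implicit conversion: the compiler inserts Foo(5) for you
}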
It is not legal to call a constructor on an already-existing object. That is, you cannot do
Foo foo;
foo.Foo(); // compile error!
no matter what you do. But you can invoke a constructor without allocating memory - that's what placement new is for.
char buffer[sizeof(Foo)]; // a bit of memory
Foo* foo = new(buffer) Foo(); // construct a Foo inside buffer
You give new some memory, and it constructs the object in that spot instead of allocating new memory. This usage is considered evil, and is rare in most types of code, but common in embedded and data structure code.
For example, std::vector::push_back uses this technique to invoke the copy constructor. That way, it only needs to do one copy, instead of creating an empty object and using the assignment operator.
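Roughly sketched, and greatly simplified compared to any real standard-library implementation, the idea looks like this (TinyVector and its fixed capacity are made up for illustration):
#include <cstdlib>   // std::malloc, std::free, std::size_t
#include <new>       // placement new
template <typename T>
class TinyVector
{
public:
    TinyVector() : data_(static_cast<T*>(std::malloc(sizeof(T) * 8))), size_(0) {}
    ~TinyVector()
    {
        for (std::size_t i = 0; i < size_; ++i)
            data_[i].~T();            // destroy elements explicitly
        std::free(data_);             // then release the raw storage
    }
    void push_back(const T& value)
    {
        // Capacity is fixed at 8 for brevity; a real vector would grow here.
        new (data_ + size_) T(value); // one copy construction, no assignment
        ++size_;
    }
private:
    T* data_;
    std::size_t size_;
};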
Most often, in a child class constructor that requires some parameters:
class BaseClass
{
public:
BaseClass( const std::string& name ) : m_name( name ) { }
const std::string& getName() const { return m_name; }
private:
const std::string m_name;
//...
};
class DerivedClass : public BaseClass
{
public:
DerivedClass( const std::string& name ) : BaseClass( name ) { }
// ...
};
class TestClass
{
public:
TestClass( int testValue ); //...
};
class UniqueTestClass
: public BaseClass
, public TestClass
{
public:
UniqueTestClass()
: BaseClass( "UniqueTest" )
, TestClass( 42 )
{ }
// ...
};
... for example.
Other than that, I don't see the utility. I only called the constructor directly in other code when I was too young to know what I was really doing...
I think the error message for compiler error C2585 gives the best reason why you would need to actually use the scope-resolution operator on the constructor, and it ties in with Charlie's answer:
Converting from a class or structure type based on multiple inheritance. If the type inherits the same base class more than once, the conversion function or operator must use scope resolution (::) to specify which of the inherited classes to use in the conversion.
So imagine you have BaseClass, and BaseClassA and BaseClassB both inherit BaseClass, and then DerivedClass inherits both BaseClassA and BaseClassB.
If you are doing a conversion or operator overload to convert DerivedClass to a BaseClassA or BaseClassB, you will need to identify which constructor (I'm thinking something like a copy constructor, IIRC) to use in the conversion.
In general you do not call the constructor directly. The new expression calls it for you, or a subclass calls the parent class's constructor. In C++, the base class is guaranteed to be fully constructed before the derived class's constructor body starts.
The only time you would call a constructor directly is in the extremely rare case where you are managing memory without using new. And even then, you shouldn't do it. Instead you should use the placement form of operator new.
I don't think you would typically use that for the constructor, at least not in the way you're describing. You would, however, need it if you have two classes in different namespaces. For example, to specify the difference between these two made-up classes, Xml::Element and Chemistry::Element.
Usually, the name of the class is used with the scope resolution operator to call a function on an inherited class's parent. So, if you have a class Dog that inherits from Animal, and both of those classes define the function Eat() differently, there might be a case when you want to use the Animal version of eat on a Dog object called "someDog". My C++ syntax is a little rusty, but I think in that case you would say someDog.Animal::Eat().
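A quick self-contained check of that syntax:
#include <iostream>
class Animal
{
public:
    virtual void Eat() { std::cout << "Animal::Eat" << std::endl; }
    virtual ~Animal() {}
};
class Dog : public Animal
{
public:
    void Eat() { std::cout << "Dog::Eat" << std::endl; }
};
int main()
{
    Dog someDog;
    someDog.Eat();          // prints "Dog::Eat"
    someDog.Animal::Eat();  // qualified call bypasses virtual dispatch: prints "Animal::Eat"
}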
There are valid use cases where you want to expose a class's constructors. If you wish to do your own memory management with an arena allocator, for example, you'll need two-phase construction consisting of allocation followed by object initialization.
The approach I take is similar to that of many other languages. I simply put my construction code in well known public methods (Construct(), init(), something like that) and call them directly when needed.
You can create overloads of these methods that match your constructors; your regular constructors just call into them. Put big comments in the code to warn others that you are doing this so they don't add important construction code in the wrong place.
Remember that there is only one destructor method no matter which construction overload was used, so make your destructors robust to uninitialized members.
I recommend against trying to write initializers that can re-initialize. It's hard to tell the case where you are looking at an object that just has garbage in it because of uninitialized memory vs. actually holding real data.
The most difficult issue comes with classes that have virtual methods. In this case the compiler normally plugs in the vtable (virtual function table) pointer as a hidden field at the start of the class. You can manually initialize this pointer, but you are basically depending on compiler-specific behavior, and it's likely to get your colleagues looking at you funny.
Placement new is broken in many respects; construction and destruction of arrays is one case, so I tend not to use it.
Consider the following program.
template<class T>
double GetAverage(T tArray[], int nElements)
{
T tSum = T(); // tSum = 0
for (int nIndex = 0; nIndex < nElements; ++nIndex)
{
tSum += tArray[nIndex];
}
// Whatever type of T is, convert to double
return double(tSum) / nElements;
}
The T() here explicitly value-initializes the variable: for class types it calls the default constructor, and for built-in types it means zero.
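For completeness, a possible call site for the template above (assuming it is in scope):
#include <iostream>
int main()
{
    int values[] = { 1, 2, 3, 4 };
    std::cout << GetAverage(values, 4) << std::endl;   // prints 2.5
}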

Achieve polymorphism with contiguous memory

I'm not facing a "problem" actually, as my code does work. I'm just curious about whether my implementations are reasonable and riskless.
I've been working on a project in C++, in which I first parse a file and then build a directed-acyclic-graph structure accordingly. Each node may have 0-2 out-neighbors depending on the type of the node. For different types of nodes, some functions for printing and accessing are needed, and I decided to do it using polymorphism.
My first attempt was to implement it with nodes storing pointers to their out-neighbors.
class Base{
public:
Base(){}
virtual ~Base(){}
virtual void foo()=0;
// ...
protected:
unsigned _index;
};
class Derived1: public Base{
public:
void foo(){ /*Do something here...*/ }
private:
Base* _out1;
};
class Derived2: public Base{
public:
void foo(){ /*Do something different here...*/ }
private:
Base* _out1;
Base* _out2;
};
int main(){
std::vector<Base*> _nodeList;
for(/*during parsing*/){
if(someCondition){
_nodeList.push_back(new Derived1);
}
// ...
}
}
Since the out-neighbors of a node may not yet be defined when the node is constructed, I have to add some tricks to first remember the ids of the out-neighbors and connect them after all nodes have been constructed.
However, since the number of nodes is determined by the file being parsed and will never grow afterwards, I consider it better to store all nodes contiguously and have each node store the indices of its out-neighbors instead of pointers. This lets me skip the connection step and also brings some minor benefits to other parts.
My current version is as follows:
// Something like this
class Base{
public:
Base(){}
virtual ~Base(){}
virtual void foo()=0;
// ...
protected:
unsigned _index;
unsigned _out1;
unsigned _out2;
};
class Derived1: public Base{
public:
void foo(){ /*Do something here...*/ }
};
class Derived2: public Base{
public:
void foo(){ /*Do something a little bit different here...*/ }
};
int main(){
// EDITED!!
// Base* _nodeList = new DefaultNode[max_num];
Base* _nodeList = new Derived2[max_num];
for(/*during parsing*/){
if(someCondition){
// EDITED!!
// _nodeList[i] = Derived1;
new(_nodeList+i) Derived1();
}
// ...
}
}
My questions
Are there any risks in storing objects of different classes in a newed array, given that they are all of the same size and can be destructed using a virtual destructor?
I've always heard that the use of new[] should be avoided. I did find some approaches that achieve what I want using a vector of unions with a type tag, but that seems somewhat dirty to me. Is there a way to achieve polymorphism while storing data in a std::vector?
Is the practice of using polymorphism merely to make use of the convenience of virtual functions considered a bad habit? By saying so I mean: if the memory taken by each object is already the same for each derived class, then they could be merged into one single class that stores its own type, and each member function could just behave according to that type. I chose not to do so since it also looks dirty to me to have a huge switch structure in each member function.
Is it good to choose contiguous memory in this case? Are there any reasons that such choice may be harmful?
EDIT:
It turns out that I'm making many mistakes, such as asking too many questions at a time. I think I'll first focus on the part about polymorphism and placement new. The following is a testable program of what I mean by "storing objects of different derived classes in a newed array", and it behaves on my laptop as shown below.
#include <iostream>
#include <new>
class Base{
public:
Base(){}
virtual ~Base(){}
void virtual printType() =0;
};
class Derived1: public Base{
public:
Derived1(){}
void printType(){ std::cout << "Derived 1." << std::endl; }
};
class Derived2: public Base{
public:
Derived2(){}
void printType(){ std::cout << "Derived 2." << std::endl; }
};
int main(){
Base* p = new Derived1[5];
new(p+2) Derived2();
for(unsigned i = 0; i < 5; ++i){
(p+i)->printType();
}
}
Outcome:
Derived 1.
Derived 1.
Derived 2.
Derived 1.
Derived 1.
Again, thanks for all the feedback and suggestions.
Are there any risks in storing objects of different classes in a newed array, given that they are all of the same size and can be destructed using a virtual destructor?
This is not what happens in your second proposition:
Base* _nodeList = new DefaultNode[max_num];
_nodeList is an array of DefaultNode and nothing else! Trying to store something into it like _nodeList[i] = ... will never change the dynamic type of the stored objects (note that _nodeList[i] = Derived1; is not even valid C++). If you want polymorphism, you need to refer to the objects through pointers or references. So the first solution is the correct one: std::vector<Base*> _nodeList;.
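If C++11 is available, a sketch of that first solution with automatic cleanup (reusing the Base/Derived1/Derived2 classes from the question's test program):
#include <memory>
#include <vector>
int main()
{
    std::vector<std::unique_ptr<Base>> nodeList;
    nodeList.push_back(std::unique_ptr<Base>(new Derived1));
    nodeList.push_back(std::unique_ptr<Base>(new Derived2));
    for (const auto& node : nodeList)
        node->printType();   // virtual dispatch through the pointer
}   // unique_ptr calls the virtual destructors here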
I've always heard that the use of new[] should be avoided. I did find some approaches that achieve what I want using a vector of unions with a type tag, but it seems somewhat dirty to me. Is there a way to achieve polymorphism while storing data in a std::vector?
"The use of new[] should be avoided" is nonsense. As said before, if you need polymorphism then std::vector<Base*> _nodeList; is perfect, because it means you can store in _nodeList the address of any object whose class is Base or any subtype of it.
Is the practice of using polymorphism merely to make use of the convenience of virtual functions considered a bad habit? By saying so I mean if the memory taken by each object is already the same for each derived class, then they may be merged into one single class that stores its own type, and each member function can just behave according to its own type. I chose not to do so since it also looks dirty to me to have a huge switch structure in each member function.
Subtype polymorphism is the use of virtual functions, so why would it be a bad habit? If you don't use virtual functions, that just means you are building the polymorphism by hand, which is probably a very bad idea.
Now, if your derived classes are just like the ones proposed in your example, I would suggest not using subclasses at all but only constructor overloading...
Is it good to choose contiguous memory in this case? Are there any reasons that such choice may be harmful?
I'm not sure I really understand why this is a concern for you. Contiguous memory is not harmful... this part of the question is at least not clear.
The problem is that normally you cannot allocate different polymorphic classes inside a vector or array - only pointers to them. So you cannot make it contiguous.
In your case the use of polymorphism is most probably a bad idea. It will result in memory fragmentation and slow performance due to the large number of virtual calls and the branch mispredictions they cause. Though, if there aren't many nodes, or you don't use them too frequently in your code, it won't affect the overall performance of the program.
To avoid this, simply store the nodes' data (as a plain struct) inside a vector and use separate classes that implement those foo() functions.
Example:
std::vector<NodeData> nodes;
class Method1
{
public:
static void Process(NodeData& node);
// ...
};
class Method2
{
public:
static void Process(NodeData& node);
// ...
};
Then you can either make a single switch to choose which method to apply or, say, store nodes' data inside several vectors so that each vector identifies which method to use.
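A sketch of that single-switch dispatch; NodeData, its type tag, and the Method classes here are all hypothetical stand-ins:
#include <cstddef>
#include <vector>
struct NodeData
{
    enum Type { kMethod1, kMethod2 } type;   // tag saying which method applies
    unsigned out1, out2;                     // indices of out-neighbors
};
struct Method1 { static void Process(NodeData&) { /* ... */ } };
struct Method2 { static void Process(NodeData&) { /* ... */ } };
void ProcessAll(std::vector<NodeData>& nodes)
{
    for (std::size_t i = 0; i < nodes.size(); ++i)
    {
        switch (nodes[i].type)   // one predictable branch instead of a virtual call
        {
        case NodeData::kMethod1: Method1::Process(nodes[i]); break;
        case NodeData::kMethod2: Method2::Process(nodes[i]); break;
        }
    }
}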

Is the memory layout of C++ single inheritance the same as this C code?

I'm working with a library written in C, which does inheritance like so:
struct Base
{
int exampleData;
int (*function1)(struct Base* param1, int param2);
void (*function2)(struct Base* param1, float param2);
//...
};
struct Derived
{
struct Base super;
//other data...
};
struct Derived* GetNewDerived(/*params*/)
{
struct Derived* newDerived = malloc(sizeof(struct Derived));
newDerived->super.function1 = /*assign function*/;
newDerived->super.function2 = /*assign function*/;
//...
return newDerived;
}
int main()
{
struct Derived *newDerived = GetNewDerived(/*params*/);
FunctionExpectingBase((struct Base*) newDerived);
free(newDerived);
}
It is my understanding this works because the pointer to Derived is the same as the pointer to the first member of Derived, so casting the pointer type is sufficient to treat an object as its "base class." I can write whatever gets assigned to function1 and function2 to cast incoming pointer from Base* to Derived* to access the new data.
I am extending functionality of code like this, but I am using a C++ compiler. I'm wondering if the below is equivalent to the above.
class MyDerived : public Base
{
int mydata1;
//...
public:
MyDerived(/*params*/)
{
function1 = /*assign function pointer*/;
function2 = /*assign function pointer*/;
//...
}
//...
};
int main()
{
MyDerived *newDerived = new MyDerived(/*params*/);
FunctionExpectingBase( static_cast<Base*>(newDerived) );
delete newDerived;
}
Can I expect the compiler to lay out the memory in MyDerived in the same way so I can do the same pointer cast to pass my object into their code? Or must I continue to write this more like their C architecture, and not take advantage of the C++ compiler doing some of the more tedious bits for me?
I'm only considering single inheritance for the scope of this question.
According to "Adding a default constructor to a base class changes sizeof() a derived type" and "When extending a padded struct, why can't extra fields be placed in the tail padding?", the memory layout can change even if you just add a constructor to MyDerived or make it non-POD in any other way. So I'm afraid there is no such guarantee. In practice you can make it work by using compile-time asserts that validate that both structures have the same memory layout, but such a solution does not seem to be safe or supported by standard C++.
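A sketch of the kind of compile-time check meant here, using C++11 static_assert and the Base/MyDerived/exampleData names from the question. Note that offsetof on a non-standard-layout type such as MyDerived is itself only conditionally supported, which is exactly the "not safe or supported by standard C++" caveat above:
#include <cstddef>   // offsetof
static_assert(sizeof(MyDerived) >= sizeof(Base),
              "MyDerived must contain a complete Base");
static_assert(offsetof(MyDerived, exampleData) == 0,
              "the Base subobject must sit at the very start of MyDerived");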
On another note, why can't your C++ wrapper MyDerived inherit from Derived? That would be safe (as safe as casting Derived to Base and back can be, but I assume that part is out of your control). It may make the initialization code in MyDerived::MyDerived() more verbose, but I guess that is a small price for a proper solution.
For your problem it does not really matter, since the "client" code only cares about getting a valid Base* pointer: they aren't going to downcast it to Derived or whatever, or copy it.

Do operator methods occupy memory in C++ objects?

Suppose I have some simple classes/structs without anything but data and a select few operators. If I understand correctly, a basic struct with only data in C++, just like in C, occupies as much memory as its members. For example,
struct SomeStruct { float data; };
sizeof(SomeStruct) == sizeof(float); // this should evaluate to true
What I'm wondering is if adding operators to the class will make the object larger in memory. For example
struct SomeStruct
{
public:
SomeStruct & operator=(const float f) { data = f; return *this; }
private:
float data;
};
will it still be true that sizeof(SomeStruct) == sizeof(float) evaluates to true? Are there any operators/methods which will not increase the size of the objects in memory?
The structure may not necessarily be only as large as its members (consider padding and alignment), but you are basically correct, in that:
Functions are not data, and are not "stored" inside the object type.
That said, watch out for the addition of virtual table pointers in the case where you add a virtual function to your type. This is a one-time size increase for the type, and does not re-apply when you add more virtual functions.
What I'm wondering is if adding operators to the class will make the object larger in memory.
The answer is "it depends".
If the class wasn't polymorphic prior to adding the function and this new function keeps the class non-polymorphic, then adding this non-polymorphic function does nothing to the size of your class instances.
On the other hand, if adding this new function does make your class polymorphic, this addition will make instances of your class bigger. Most C++ implementations use a virtual table, or vtable for short. Each instance of a polymorphic class contains a pointer to the vtable for that class. Instances of non-polymorphic classes don't need and thus don't contain a vtable pointer.
Finally, adding yet another virtual function to a class that is already polymorphic does not make the class instances bigger. This addition does makes the vtable for that class bigger, but the vtable itself isn't a part of the instance. A vtable pointer is a part of the instance, and that pointer is already a part of the class layout because the class is already polymorphic.
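A quick way to see this is to compare the size of an otherwise identical polymorphic and non-polymorphic type; the exact numbers are implementation-defined, but on a typical 64-bit compiler you would see something like 4 and 16:
#include <iostream>
struct Plain       { int a; void f() {} };           // non-polymorphic: no vtable pointer
struct Polymorphic { int a; virtual void f() {} };   // polymorphic: gains a vtable pointer
int main()
{
    std::cout << sizeof(Plain) << ' ' << sizeof(Polymorphic) << std::endl;
}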
When I was learning about C++ and OOP, I read somewhere (some bad source) that objects in C++ are essentially the same thing as C structs with function pointers inside of them.
They may be like that functionally, but if they were really implemented like that, it would have been a huge waste of space since all object instances would have to store the same pointers.
Method code is stored in one central location and C++ just makes it conveniently look like as if each instance had its methods inside of it.
(Operators are essentially functions with different syntax).
Methods and operators defined inside classes do not increase the size of instantiated objects. You can test it out for yourself:
#include <iostream>
using namespace std;
struct A {
int a;
};
struct B {
int a;
//SOME RANDOM METHODS AND OPERATORS
B() : a(1) {cout<<"I'm the constructor and I set 'a' to 1"<<endl;}
void some_method() const { for(int i=0;i<40;i++) cout<<"loop";}
B operator+=(const B& b){
a+=b.a;
return *this;
}
size_t my_size() const { return sizeof(*this);}
};
int main(){
cout<<sizeof(A)<<endl;
cout<<B().my_size()<<endl;
}
Output on a 64 bit system:
4
I'm the constructor and I set 'a' to 1
4
==> No change in size.

Residing a member of parent class type inside another class

#include <iostream>
class BarParent
{
virtual void fuz()
{
std::cout << "BarParent" << std::endl;
}
};
class BarChild : public BarParent
{
virtual void fuz()
{
std::cout << "BarChild" << std::endl;
}
};
class Foo
{
// ??BarParent bar;??
public:
Foo(BarParent bar);
};
What I seek is to store a copy of the BarParent that is passed to the constructor and let it reside in Foo, while still calling the correct virtual function.
This is an embedded application: use of the heap is frowned upon. So preferably, no heap.
SUMMARY: To the best of my knowledge, it cannot be done because of the slicing problem (long story short, the compiler cannot determine the size of a generic Bar, so copying it slices the object), so polymorphism cannot be achieved. Using templates might be a good idea; however, it defines multiple classes Foo<BarType>, and as a result a function such as changeBar(BarParent) would not be possible, since the compiler would define it as changeBar(BarType), valid only for Foo<BarType>. If someone has a better idea, please let me know.
I think I will have to go for the heap, or a const BarParent and pointers. If the user const_casts, then he is asking for trouble, not my fault!
class Foo
{
BarParent* bar; //or std::unique_ptr<>
public:
Foo(BarParent* barInst):bar(barInst){}
};
This will do what you want it to. You store a pointer to a BarParent object and you can polymorphically (is that a word?) call virtual functions through it.
You need to create the copy outside the constructor (on the heap or otherwise), and pass a pointer to it to the Foo constructor. Alternatively you can implement a clone method, as discussed at Copying derived entities using only base class pointers, (without exhaustive testing!) - C++
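For reference, a sketch of what that clone() approach could look like for the question's classes; note it still allocates, so it only helps if some heap or pool use is acceptable after all:
class BarParent
{
public:
    virtual ~BarParent() {}
    virtual BarParent* clone() const { return new BarParent(*this); }
    virtual void fuz() { /* ... */ }
};
class BarChild : public BarParent
{
public:
    BarChild* clone() const { return new BarChild(*this); }   // covariant return type
    void fuz() { /* ... */ }
};
class Foo
{
public:
    explicit Foo(const BarParent& bar) : bar_(bar.clone()) {}  // deep copy, no slicing
    ~Foo() { delete bar_; }
private:
    BarParent* bar_;
    Foo(const Foo&);              // copying of Foo itself disabled for brevity
    Foo& operator=(const Foo&);
};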
A radically different approach would be to use templates. It would leave you with a multitude of Foo<> types though. If you are not going to reassign the bar object, or store all Foos in a container, this might be the better option for you, since it doesn't involve the heap:
template<typename BarType>
class Foo
{
BarType bar; //pointer not needed any more since we are storing the exact type.
public:
Foo(BarType& barInst):bar(barInst){}
};
There is no way I know of to handle this gracefully without object slicing.
The only way I could think of would be to use a pointer, and create a copy when "calling" the Foo constructor:
class Foo
{
BarParent* bar;
public:
Foo(BarParent* b) : bar(b) {}
};
BarChild child;
Foo myFoo(new BarChild(child));