Alternatives for polymorphic data storage - C++

I'm storing a large amount of computed data and I'm currently using a polymorphic type to reduce the amount of storage required. Everything is extremely fast except for deleting the objects when I'm finished, and I think there must be a better alternative. The code computes the state at each step and, depending on the conditions present, needs to store certain values. The worst case is storing the full object state and the best case is storing almost nothing. The (very simplified) setup is as follows:
class BaseClass
{
public:
    virtual ~BaseClass() { }

    double time;
    unsigned int section;
};

class VirtualSmall : public BaseClass
{
public:
    double values[2];
    int othervalue;
};

class VirtualBig : public BaseClass
{
public:
    double values[16];
    int othervalues[5];
};

...

std::vector<BaseClass*> results(10000);
The appropriate object type is generated during computation and a pointer to it is stored in the vector. The overhead from vtable + pointer is overall much smaller than the size difference between the largest and smallest object (which is at least 200 bytes according to sizeof). Since the smallest object can often be used instead of the largest, and there are potentially many tens of millions of them stored, it can save a few gigabytes of memory usage. The results can then be searched extremely fast, as the base class contains the information necessary to find the correct item, which can then be dynamic_cast back to its real type. It works very well for the most part.
The only issue is with delete. It takes a few seconds to free all of the memory when there are tens of millions of objects. The deletion code iterates through each object and calls delete results[i], which invokes the virtual destructor. While it's not impossible to work around, I think there must be a more elegant solution.
It could definitely be done by allocating largish contiguous blocks of memory (with malloc or similar), which are kept track of while something hands out correctly typed pointers into the next free region of a block. Those pointers are then stored in the vector. To free the memory, only the much smaller number of large blocks needs to have free() called on them. There is no more vtable (it can be replaced by a smaller type field to ensure the correct cast), which saves space as well. It is very much a C-style solution though, and not particularly pretty.
Is there a C++ style solution to this type of problem I'm overlooking?

You can overload operator new (i.e. void* VirtualSmall::operator new(size_t)) for your classes, and implement them to obtain memory from custom allocators. I would use one block allocator for each derived class, so that each block size is a multiple of the size of the class it's supposed to store.
When it's time to clean up, tell each allocator to release all of its blocks. No destructors will be called, so make sure you don't need them.
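A minimal sketch of that idea, assuming the BaseClass/VirtualSmall definitions from the question; BlockPool, smallPool, and the 4096 slots-per-block figure are illustrative, not part of the original:
#include <cstddef>
#include <cstdlib>
#include <vector>
// Hands out fixed-size slots carved from large malloc'd blocks; memory is
// only reclaimed in bulk via releaseAll(), and destructors never run.
class BlockPool {
public:
    explicit BlockPool(std::size_t slotSize, std::size_t slotsPerBlock = 4096)
        : slotSize_(slotSize), slotsPerBlock_(slotsPerBlock), used_(slotsPerBlock) {}
    ~BlockPool() { releaseAll(); }
    void* allocate() {
        if (used_ == slotsPerBlock_) {   // current block exhausted, grab a new one
            blocks_.push_back(std::malloc(slotSize_ * slotsPerBlock_));
            used_ = 0;
        }
        void* slot = static_cast<char*>(blocks_.back()) + used_ * slotSize_;
        ++used_;
        return slot;
    }
    void releaseAll() {                  // frees a handful of blocks, not millions of objects
        for (std::size_t i = 0; i < blocks_.size(); ++i)
            std::free(blocks_[i]);
        blocks_.clear();
        used_ = slotsPerBlock_;
    }
private:
    std::size_t slotSize_, slotsPerBlock_, used_;
    std::vector<void*> blocks_;
};
// One pool per derived class, as suggested above. sizeof(VirtualSmall) is a
// multiple of the alignment of double here, so successive slots stay aligned.
class VirtualSmall : public BaseClass {  // BaseClass as defined in the question
public:
    double values[2];
    int othervalue;
    void* operator new(std::size_t);
    void operator delete(void*) {}       // individual delete is a no-op
};
static BlockPool smallPool(sizeof(VirtualSmall));
void* VirtualSmall::operator new(std::size_t) { return smallPool.allocate(); }
Cleanup then becomes one releaseAll() call per pool instead of tens of millions of deletes.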

Related

Avoiding repeated C++ virtual table lookup

I have a C++ program that reads a config file when the binary is executed, creates a number of child class instances based on the config file, and then periodically iterates over these instances and calls their respective virtual functions.
Gprof is telling me that these function calls are taking up a lot of time (the aforementioned iteration happens very frequently), so I want to try to avoid the repeated virtual function calls somehow.
The code is similar to the following. Once the program populates vector v at the start of the program, this vector won't change anymore for the rest of the program, so it seems inefficient to repeatedly have to do a virtual table lookup every time I want to call f(). I would think there must be a way to cache or save the function pointers somehow, but I'm not sure how.
Would love any suggestions you have on speeding things up. Thank you!
Edit: Sorry, I forgot to mention that the function calls f() for the vector of Child instances has to be in order from 0 to v.size() - 1, so I can't group together the elements of v that have the same derived type.
Also, this was built with -O3 -std=c++14
#include <cstddef>
#include <vector>
using namespace std;

class Parent {
public:
    virtual void f() { }
};

class Child1 : public Parent {
public:
    void f() override { /* do stuff for child1 */ }
};

//...

class Child9 : public Parent {
public:
    void f() override { /* do stuff for child9 */ }
};

int main() {
    vector<Parent*> v;
    // read config file and add Child instances to v based on the file contents
    while (true) {
        // do other stuff
        for (size_t i = 0; i != v.size(); ++i) {
            v[i]->f(); // expensive to do the same virtual table lookups every loop!
        }
    }
}
Based on some of the questions and your answers in the comments, here are a couple of considerations.
1) Your problem (if there is one, your solution might already be close to optimal, depending on details you have not mentioned) is most likely somewhere else, not in the overhead of a virtual function call.
If you really run this in a tight loop, and there's not much going on in the implementations of f() that touches a lot of memory, your vtables probably remain in the L1 cache, and the virtual function call overhead will be absolutely minimal, if any, on modern hardware.
2) You say "the functions f() themselves are very simple, for example one of them just multiplies the values at two memory addresses and stores the product in a third address" - this might not be as innocent as you expect. For reference, going to L1 cache will cost you about 3 cycles, while going to RAM may cost as much as 60-200, depending on your hardware.
If you have enough of these objects (so that keeping all of the memory they reference in L1 cache is not possible), and the memory locations they reference are basically random (so that prefetching is ineffective), and/or you touch enough things in the rest of your program (so that all the relevant data gets vacated from cache between the loops over your vector), the cost of fetching and storing the values from and to memory/lower levels of cache will outweigh the cost of the virtual function calls by orders of magnitude in the worst case.
3) You iterate over a vector of pointers to objects - not the objects themselves.
Depending on how you allocate the objects and how big they are, this might not be an issue - prefetching will do wonders for you if you allocate them in a tight loop and your allocator packs them nicely. If, however, you allocate/free a lot of other things and mix in the allocations of these objects in between, they may end up located sparsely and in basically random locations in memory; then iterating over them in the order of creation will involve a lot of random reads from memory, which will again be far slower than any virtual function overhead.
4) You say "calls to f() for the vector of children has to be in order" - do they?
If they do, then you are out of luck in some ways. If, however, you can re-architect your system so that they can be called ordered by type, then there is a lot of speed to be gained in various aspects - you could probably allocate an array of each type of object (nice, dense packing in memory), iterate over them in order (prefetcher friendly), and call your f()'s in groups for a single, well known type (inlining friendly, instruction cache friendly).
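To illustrate point 4, here is a sketch of the per-type layout, assuming the call order can indeed be relaxed; World, ones and twos are invented names:
#include <cstddef>
#include <vector>
struct Child1 { void f() { /* do stuff for child1 */ } };
struct Child2 { void f() { /* do stuff for child2 */ } };
// One dense array per concrete type: contiguous memory, prefetcher-friendly
// iteration, and non-virtual calls the compiler can inline.
struct World {
    std::vector<Child1> ones;
    std::vector<Child2> twos;
    void run() {
        for (std::size_t i = 0; i < ones.size(); ++i) ones[i].f();
        for (std::size_t i = 0; i < twos.size(); ++i) twos[i].f();
    }
};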
5) And finally - if none of the above applies and your problem is really in virtual function calls (unlikely), then, yes, you can try storing a pointer to the exact function you need to call for each object in some fashion - either manually or by using one of the type erasure / duck typing methods others have suggested.
My main point is this - there are a lot of performance benefits to be had from changing the architecture of your system in some ways.
Remember: accessing things that are already in L1/L2 cache is good, having to go to L3/RAM for data is worse; accessing memory in a sequential order is good, jumping all over memory is bad; calling the same method in a tight loop, potentially inlining it, is good, calling a lot of different methods in a tight loop is worse.
If this is a part of your program the performance of which really matters, you should consider changing the architecture of your system to allow for some of the previously mentioned optimizations. I know this may seem daunting, but that is the game we are playing. Sometimes you need to sacrifice "clean" OOP and abstractions for performance, if the problem you are solving allows for it.
Edit: For a vector of arbitrary child types mixed together, I recommend going with the virtual call.
If, depending on config, there were a vector of only one child type - or if you can separate the different types into separate containers - then this could be a case where compile-time polymorphism might be an option instead of a runtime one. For example:
template<class Child, class Range>
void f_for(Range& r) {
    for (Parent* p : r) {
        Child* c = static_cast<Child*>(p);
        c->Child::f(); // use static dispatch to avoid virtual lookup
    }
}
...
if (config)
    f_for<Child1>(v);
else
    f_for<Child2>(v);
An alternative to explicit static dispatch would be to mark the child class or the member function final.
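For instance (a sketch; Child1Final and g are illustrative, Parent is the class from the question):
class Child1Final final : public Parent {
public:
    void f() override { /* do stuff */ }
};
void g(Child1Final* p) {
    p->f(); // the static type cannot have subclasses, so the compiler is free
            // to devirtualize this into a direct (even inlined) call
}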
You might even expand the static portion of the program so that you get to use vector<Child1> or vector<Child2> directly, avoiding the extra indirection. At this point the inheritance is not even necessary.

Improve storage capacity / performance of std::vector

I am building a piece of modelling software and I have a few questions about how to get the best performance.
1) Should I use std::vector<class> or std::vector<class*>?
My class is quite complicated/big, and I think using the second option is better: since std::vector tries to allocate memory contiguously, there might not be a contiguous block of memory big enough to store a million class instances, but when I just store pointers, the objects don't have to be stored contiguously; only the pointers do, and the computer might have space for that. Is this reasoning correct?
2) As I said, I will have millions of instances (for a proper simulation I will need more than a billion); is inheritance a smart thing to use here?
For my simulation, there are multiple different types which inherit from the same base class:
class A <- class B
        <- class C
        <- class D
Should I avoid inheritance, as I keep hearing that there is a performance penalty for using it?
3) Also, how do I store all these different classes in a std::vector?
Can a std::vector<base_class*> or std::vector<base_class> store class B, class C, and class D objects, which all inherit from the base class?
4) In the previous version of the program, I used multithreading by making different threads handle different sections of the std::vector; is there a better way to do the threading?
5) Should I use smart pointers? Since I have so many objects, will they degrade performance?
I am in the planning stage and any help is greatly appreciated.
I deal with problems like this every day in a professional setting (I'm a C++ programmer by trade, dealing with big-data sets). As such, what I'm about to say here is as much personal advice as it is an answer. I won't go all out on the simple parts:
1 - Yes, store pointers; reallocations and moves will be much faster with pointers than with the full class objects.
2 - Yes, use inheritance if the objects have information in relation; I imagine in this case they most likely do, as you're considering it. If they don't, why would you store them together?
3 - Store them all using smart pointers to the base class (the parent object); you can then add a single virtual "get_type" function that returns an enumeration, and convert to a child when you need to. This will save the overhead of providing multiple virtual methods if you don't need child data often.
4 - Arguable, but threading separate parts of a larger array is the simpler approach (and when you're dealing with huge complexity of data, simpler is better).
Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it? ~ Brian Kernighan
5 - There will be some small penalty for using smart pointers (as explained in this question); however, in my opinion that penalty (especially with unique_ptr) is so small compared to the ease of use and loss of complexity that it's definitely worth it.
And putting it all together:
#include <memory>
#include <utility>
#include <vector>

enum ChildType { Child_1 = 0, Child_2 = 1 };

class Abstract_Parent
{
public:
    virtual ~Abstract_Parent() {}
    virtual ChildType GetType() = 0;
};

class Child_One : public Abstract_Parent
{
public:
    virtual ChildType GetType() { return Child_1; }
    void Do_Something_Specific() { /* child-specific work */ }
};

class Child_Two : public Abstract_Parent
{
public:
    virtual ChildType GetType() { return Child_2; }
};

std::vector<std::unique_ptr<Abstract_Parent>> Data;

void Some_Function()
{
    // this is how to insert a child object
    std::unique_ptr<Abstract_Parent> Push_Me_Back(new Child_One());
    Data.push_back(std::move(Push_Me_Back));

    if (Data[0]->GetType() == Child_1)
    {
        Child_One* Temp_Ptr = dynamic_cast<Child_One*>(Data[0].get());
        Temp_Ptr->Do_Something_Specific();
    }
}
1.) That depends on your use case. You will use pointers if you want to access objects through a base class pointer. On the other side, you lose the advantage of contiguous memory and cache locality of code and data.
2.) If you need 1 billion instances, then every additional byte of data per object will increase your memory footprint. For example, an additional 8-byte pointer to your virtual function table (vptr) will increase your memory requirements by 8 GBytes. Storing every type in a different vector, without a virtual base class, does not have this overhead.
2b) Yes, you should avoid inheritance with virtual functions if you aim for performance. The instruction cache will be thrashed if virtual functions are called with different implementations. At least you can sort your big vector by type to minimize this problem.
3.) You must use the pointer option to prevent slicing if you go for a base class with virtual functions.
4.) More information is needed; this should be asked as a separate question.
5.) Every indirection will degrade performance.
1) Should I use std::vector<class> or std::vector<class*>?
False dichotomy. There are a couple of other options:
boost::ptr_vector<class>
std::vector<std::unique_ptr<class>>
Probably even more.
Personally I like boost::ptr_vector<class>, as it stores an owned pointer (thus memory deallocation is done automatically), but when accessing members they are returned as a reference to the object (not a pointer). Thus using them with standard algorithms is vastly simplified over the other techniques.
My class is quite complicated/big, and I think using the second option is better, since std::vector tries to allocate memory contiguously and there might not be a contiguous block of memory to store a million class instances,
The real question here is if you can pre-calculate the maximum size of your vector and reserve() the required amount of space. If you can do this (and thus avoid any cost of copying) std::vector<class> would be the best solution.
This is because having the objects in contiguous storage is usually a significant advantage in terms of speed (especially when scanning a vector). The ability to do this should not be underestimated when you have huge datasets (especially in the billion range).
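A minimal sketch of that approach, with Cell standing in for your class and one million as a placeholder count:
#include <cstddef>
#include <vector>
struct Cell { double state[4]; };   // stand-in for the real simulation class
int main() {
    const std::size_t n = 1000000;  // maximum size, pre-calculated before the run
    std::vector<Cell> cells;
    cells.reserve(n);               // one contiguous allocation up front
    for (std::size_t i = 0; i < n; ++i)
        cells.push_back(Cell());    // never triggers a reallocation or copy
    return 0;
}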
but when I just store pointers, the class does not have to be stored contiguously, only the pointers have to be stored, and the computer might have space to do this. Is this reasoning correct?
By using pointers, you are also significantly increasing the amount of memory required by the application as you need to store the object and the pointer to the object. Over billions of objects this can be a significant cost.
2) As I said, I will have millions of instances (for a proper simulation I will need more than a billion); is inheritance a smart thing to use here?
Impossible to say without much more information.
3) Also how do I store all these different classes in a std::vector? Can a std::vector<base_class*> or std::vector<base_class> store class B, class C, class D, which all inherit from the base class?
If you do use inheritance, you will not be able to use std::vector<class> directly; you will need to store a pointer to the base class. But that does not preclude the other three techniques.
4) In the previous version of the program, I used multithreading by making different threads handle different sections of the std::vector; is there a better way to do the threading?
This seems a reasonable approach (assuming that the ranges don't overlap and are contiguous). Don't create more threads than you have available cores.
Should I use smart pointers? Since I have so many objects, will they degrade performance?
Use of unique_ptr over a normal pointer has zero overhead (assuming you don't use a custom deleter). The actual generated code will be basically equivalent.
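A quick sanity check of that claim (with the default deleter, the unique_ptr holds nothing but the raw pointer):
#include <cstdio>
#include <memory>
int main() {
    // prints the same size twice on common implementations
    std::printf("%zu %zu\n", sizeof(std::unique_ptr<int>), sizeof(int*));
    return 0;
}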

Instantiate an object in method vs. make a class member

What are some reasons to instantiate an object needed in a method, vs. making the object a class member?
For example, in the code below, I have a class ClassA that I want to use from another class. User1 has a pointer to a ClassA object as a member variable and instantiates it in its constructor; User2, on the other hand, instantiates a ClassA object in a method just before using it. What are some reasons to do it one way vs. the other?
class ClassA
{
public:
    void doStuff(void) { }
};

//
// this class has ClassA as a member
//
class User1
{
public:
    User1()
    {
        classA = new ClassA();
    }
    ~User1()
    {
        delete classA;
    }
    void use(void)
    {
        classA->doStuff();
    }
private:
    ClassA *classA;
};

//
// this class uses ClassA only in a method
//
class User2
{
public:
    void use(void)
    {
        ClassA *classA = new ClassA();
        classA->doStuff();
        delete classA;
    }
};

int main(void)
{
    User1 user1;
    user1.use();

    User2 user2;
    user2.use();

    return 0;
}
The advantages of making it a class member are:
You don't have to allocate the instance every time, which depending on the class could be very slow.
The member can store state (though some people would say that this is a bad idea)
Less code.
As a side note, if you are just instantiating and deleting with new and delete in the constructor and destructor, it should really not be a pointer, just a member instance; then get rid of the new and delete. I.e.
class User1
{
public:
    void use(void)
    {
        classA.doStuff();
    }
private:
    ClassA classA;
};
There are times that this isn't the case, for instance when the class being allocated on the stack is large, or you want the footprint of the holding class to be as small as possible. But these are the exception rather than the rule.
There are other thing to consider like memory fragmentation, the advantages of accessing contiguous memory blocks, and how memory is allocated on the target system. There are no silver bullets, only general advice, and for any particular program you need to measure and adjust to get the best performance or overcome the limitations of the particular program.
Memory fragmentation is when, even though you have a lot of memory free, the individual free blocks are quite small, and you will get memory errors when you try to allocate a large amount of memory. This is usually caused by creating and destroying a lot of different objects of various sizes, with some of them staying alive. If you have a system that suffers from memory fragmentation I would suggest a thorough analysis of how objects are created rather than worrying about how having a member or not will affect the system. However, here is a breakdown of how the four different scenarios play out when you are suffering from memory fragmentation:
Instantiating the class on the stack is very helpful as it won't contribute to overall memory fragmentation.
Creating it as a value member might cause problems as it might increase the overall size of the object, so when you get to the fragmentation scenario, the object may be too large to be created.
Creating the object and storing a pointer to it may increase memory fragmentation
Allocating on the heap and deleting at the end of use may increase memory fragmentation if something else is allocated after it was.
The advantage of accessing contiguous memory is that cache misses are minimised, so my feeling is that having the object as a value member would be faster; but as with so many things, depending on lots of other variables, this could be completely wrong. As always when it comes to performance, measure.
Memory is often aligned to a particular boundary, for instance 4-byte alignment, or power-of-2 block sizes. So, depending on the size of your object, when you allocate one it might take up more memory than you expect. If the object contains any members, making it a value member can significantly change the memory footprint of the holding class; if it doesn't (an empty class), it probably won't increase the footprint at all. Having a pointer to it will definitely increase the footprint by the size of a pointer, and that may result in a significant increase. Creating the class on either the heap or the stack will not affect the size of the using class. As always, if it is going to affect your program, you need to measure on the target system to see what the effects are going to be.
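A small illustration of that padding effect; the sizes in the comments are typical for a 64-bit ABI, not guaranteed:
#include <cstdio>
struct Loose { char a; double b; char c; };  // padding after a and c: usually 24 bytes
struct Dense { double b; char a; char c; };  // same members reordered: usually 16 bytes
int main() {
    std::printf("%zu %zu\n", sizeof(Loose), sizeof(Dense));
    return 0;
}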
If the constructor/destructor does something (for instance holding a file handle: opening the file in the constructor and closing it in the destructor) then you might want to only use the object in the function. But yet again, the pointer isn't usually necessary.
void use(void)
{
    ClassA classA;
    classA.doStuff();
} // classA will be destructed at end of scope
First off there is no reason to have a pointer in either class. If we use value semantics in User1 then there is no need to have a constructor or destructor as the compiler generated ones will be sufficient. That changes User1 to:
class User1
{
public:
    void use(void)
    {
        classA.doStuff();
    }
private:
    ClassA classA;
};
Likewise if we use value semantics in User2 then it would become:
class User2
{
public:
    void use(void)
    {
        ClassA classA;
        classA.doStuff();
    }
};
Now, whether you want to have ClassA as a member or just use it in the function is a matter of design. If the class is going to be using and updating the ClassA then it should be a member. If you just need it to do something in a function, then the second approach is okay.
If you are going to be calling the function that creates a ClassA a lot, it might be beneficial to make it a member, as you only need to construct it once before using it in the function. Conversely, if you are going to have a lot of objects but hardly ever call that function, it might be better to create the ClassA when you need it, as you will save space.
Really though, this is something that you would have to profile to determine which way would be better. We programmers are bad judges of what is faster and should let the profiler tell us if we need to change something. Some things, like using value semantics over a pointer with heap allocation, are generally faster. One example where we get this wrong is sorting: if N is small, then using a bubble sort, which is O(n^2), is faster than a quicksort, which is O(n log n). Another example of this is presented in this Herb Sutter talk, starting at 46:00. He shows that a std::vector is faster than a std::list at inserting and removing from the middle, because a std::vector is very cache friendly where a std::list is not.

Statically allocating array of inherited objects

The title of this question is pretty convoluted, so I'll try to frame it with an example. Let's say that I have an abstract base class, with a number of classes which inherit from it. In the example below I've only shown two inherited classes, but in reality there could be more.
class Base {
public:
    Base();
    virtual ~Base() = 0;
    /// Other methods/members
};

class SmallChild : public Base {
public:
    SmallChild();
    ~SmallChild();
    /// Other methods/members such that sizeof(SmallChild) < sizeof(LargeChild)
};

class LargeChild : public Base {
public:
    LargeChild();
    ~LargeChild();
    /// Other methods/members such that sizeof(LargeChild) > sizeof(SmallChild)
};
I need to implement a container which stores up to N inherited objects. These objects need to be created/destroyed at runtime and placed in the container, but due to constraints in the project (specifically that it's on embedded hardware), dynamic memory allocation isn't an option. The container needs to have all of its space statically allocated. Also, C++11 is not supported by the compiler.
There was only one way I could think to implement this. To reference the N objects, I'd first need to create an array of pointers to the base class, and then to actually store the objects, I'd need to create a buffer large enough to store N copies of the largest inherited object, which in this case is LargeChild
Base * children[N];
uint8_t childBuffer[N * sizeof(LargeChild)];
I could then distribute the pointers in children across childBuffer, each separated by sizeof(LargeChild). As objects need to be created, C++'s "placement new" could be used to place them at the specified locations in the array. I'd need to keep track of the type of each object in childBuffer in order to dereference the pointers in children, but this shouldn't be too bad.
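A C++03-friendly sketch of that scheme, assuming the Base/SmallChild/LargeChild classes above; the ChildKind tag, N, and the helper names are illustrative, and real code would also need to guarantee the buffer's alignment:
#include <new>        // placement new
#include <stdint.h>
static const unsigned N = 32;           // capacity, as in the question
enum ChildKind { KIND_NONE, KIND_SMALL, KIND_LARGE };
static Base*     children[N];
static ChildKind kinds[N] = { KIND_NONE };
static uint8_t   childBuffer[N * sizeof(LargeChild)];
Base* makeSmall(unsigned i) {
    void* slot = childBuffer + i * sizeof(LargeChild);
    children[i] = new (slot) SmallChild();  // construct in the static buffer
    kinds[i] = KIND_SMALL;
    return children[i];
}
void destroy(unsigned i) {
    if (kinds[i] != KIND_NONE) {
        children[i]->~Base();  // virtual dtor destroys the real type; note the pure
                               // virtual ~Base still needs an out-of-line definition
        kinds[i] = KIND_NONE;  // the slot can now be reused
    }
}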
I have a few questions regarding this entire setup/implementation:
Is this a good approach to solving the problem as I've described it? I've never implemented ANYTHING like this before, so I have no idea if I'm way out to lunch here and there's a much easier way to accomplish this.
How much of this can be done at compile-time? If I have M types of inherited classes (SmallChild, LargeChild, etc.) but I don't know their size in relation to each other, how can I determine the size of childBuffer? This size depends on the size of the largest class, but is there a way to determine this size at compile-time? I can imagine some preprocessor macros iterating through the classes, evaluating sizeof and finding the maximum, but I have very little experience with this level of preprocessor work and have no idea what this would look like. I can also imagine this being possible using templates, but again, I don't have any experience with compile-time template sorcery so I'm only basing this on my intuition. Any direction on how to implement this would be appreciated.
Do you need to be able to deallocate the objects? If not, it may be easier to override operator new. I refer to this:
void* operator new (std::size_t size) throw (std::bad_alloc);
All your overrides would allocate memory from a single large buffer. How much memory to allocate is specified by the size parameter.
This way you should be able to just say
children[i] = new SmallChild();
Edit: if you do need to deallocate, you need more complex data structures. You may end up re-implementing the heap anyway.
If the set of objects is fully static (set at build time and doesn't change at runtime), the usual approach is to use a set of arrays of each derived class and build up the 'global' array with pointers into the other arrays:
static SmallChild small_children[] = {
{ ...initializer for first small child... },
{ ...initializer for second small child... },
...
};
static LargeChild large_children[] = {
{ ...initializer for first large child... },
...
};
Base *children[N] = { &small_children[0], &small_children[1], &large_children[0], ....
This can be tricky to maintain if there are children being added/removed from the build frequently, or if the order in the children array is important. It may be desirable to generate the above source file with a script or build program that reads a description of the children needed.
Your approach is interesting, given your constraints (i.e. no use of dynamic allocation).
In fact you are managing, in your own way, a kind of array of union anyChild { SmallChild o1; LargeChild o2; ... }; the sizeof(anyChild) would give you the largest block size you are looking for.
By the way, there could be a risk of dangling pointers in your approach, as long as not all objects have been created with the placement new, or if some of them are deleted through explicit calls of their destructor.
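If the child types can't legally share a union (they are non-POD in C++03, having virtual destructors), a small recursive template computes the same maximum size at compile time; a sketch, with MaxOf, MaxSize and Nil as invented names:
template <unsigned A, unsigned B>
struct MaxOf { static const unsigned value = A > B ? A : B; };
struct Nil {};  // list terminator, contributes nothing
template <class T0, class T1 = Nil, class T2 = Nil, class T3 = Nil>
struct MaxSize {
    static const unsigned value =
        MaxOf<sizeof(T0), MaxSize<T1, T2, T3>::value>::value;
};
template <>
struct MaxSize<Nil, Nil, Nil, Nil> { static const unsigned value = 0; };
// Usage: size the raw buffer from the largest child type, at compile time.
// uint8_t childBuffer[N * MaxSize<SmallChild, LargeChild>::value];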
If you put your derived types into a union:
union Child {
    SmallChild asSmallChild;
    LargeChild asLargeChild;
};
Then the union will automatically be of the size of the largest type. Of course, now you have a new problem: what type is represented in the union? You could give yourself a hint in the base class, or you could instead make Child a struct which contains a hint and then the union inlined within. For examples, look at the components made by Espressif for the ESP32 on GitHub; there are lots of good union uses there.
Anyway, when you go to allocate, if you allocate an array of the union'ed type, it will make an array of largest children... because that's what unions do.
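A sketch of that tagged layout; note that pre-C++11 unions require POD members, so this only works if the children hold plain data (the Pod names and the capacity are illustrative):
struct SmallChildPod { double values[2];  int othervalue;     };
struct LargeChildPod { double values[16]; int othervalues[5]; };
struct Child {
    enum Kind { SMALL, LARGE } kind;  // records which member is active
    union {
        SmallChildPod asSmallChild;
        LargeChildPod asLargeChild;
    } u;
};
// An array of Child is an array of largest-size slots, tag included,
// with no heap allocation anywhere.
static Child children[64];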

In C++, where in memory are class functions put?

I'm trying to understand what kind of memory hit I'll incur by creating a large array of objects. I know that each object - when created - will be given space in the HEAP for member variables, and I think that all the code for every function that belongs to that type of object exists in the code segment in memory - permanently.
Is that right?
So if I create 100 objects in C++, I can estimate that I will need space for all the member variables that object owns multiplied by 100 (possible alignment issues here), and then I need space in the code segment for a single copy of the code for each member function for that type of object( not 100 copies of the code ).
Do virtual functions, polymorphism, inheritance factor into this somehow?
What about objects from dynamically linked libraries? I assume dlls get their own stack, heap, code and data segments.
Simple example (may not be syntactically correct):
// parent class
class Bar
{
public:
    Bar() {};
    ~Bar() {};

    // pure virtual function
    virtual void doSomething() = 0;

protected:
    // a protected variable
    int mProtectedVar;
};

// our object class that we'll create multiple instances of
class Foo : public Bar
{
public:
    Foo() {};
    ~Foo() {};

    // implement pure virtual function
    void doSomething() { mPrivate = 0; }

    // a couple public functions
    int getPrivateVar() { return mPrivate; }
    void setPrivateVar(int v) { mPrivate = v; }

    // a couple public variables
    int mPublicVar;
    char mPublicVar2;

private:
    // a couple private variables
    int mPrivate;
    char mPrivateVar2;
};
About how much memory should 100 dynamically allocated objects of type Foo take including room for the code and all variables?
It's not necessarily true that "each object - when created - will be given space in the HEAP for member variables". Each object you create will take some nonzero space somewhere for its member variables, but where is up to how you allocate the object itself. If the object has automatic (stack) allocation, so too will its data members. If the object is allocated on the free store (heap), so too will be its data members. After all, what is the allocation of an object other than that of its data members?
If a stack-allocated object contains a pointer or other type which is then used to allocate on the heap, that allocation will occur on the heap regardless of where the object itself was created.
For objects with virtual functions, each will have a vtable pointer allocated as if it were an explicitly-declared data member within the class.
As for member functions, the code for those is likely no different from free-function code in terms of where it goes in the executable image. After all, a member function is basically a free function with an implicit "this" pointer as its first argument.
Inheritance doesn't change much of anything.
I'm not sure what you mean about DLLs getting their own stack. A DLL is not a program, and should have no need for a stack (or heap), as objects it allocates are always allocated in the context of a program which has its own stack and heap. That there would be code (text) and data segments in a DLL does make sense, though I am not expert in the implementation of such things on Windows (which I assume you're using given your terminology).
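A small demonstration of the vtable-pointer point; on a typical 64-bit ABI this prints 4 and 16 (int alone versus int + vptr + padding), though the exact numbers are implementation-defined:
#include <cstdio>
struct Plain { int x; };  // no virtual functions: no vptr
struct Virt  {            // two virtual functions, still just one vptr
    int x;
    virtual ~Virt() {}
    virtual void f() {}
};
int main() {
    std::printf("%zu %zu\n", sizeof(Plain), sizeof(Virt));
    return 0;
}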
Code exists in the text segment, and how much code is generated based on classes is reasonably complex. A boring class with no virtual inheritance ostensibly has some code for each member function (including those that are implicitly created when omitted, such as copy constructors) just once in the text segment. The size of any class instance is, as you've stated, generally the sum size of the member variables.
Then, it gets somewhat complex. A few of the issues are...
The compiler can, if it wants or is instructed, inline code. So even though it might be a simple function, if it's used in many places and chosen for inlining, a lot of code can be generated (spread all over the program code).
Virtual functions increase the size of each polymorphic instance. Each class that uses virtual methods has a VTABLE (virtual table) containing the information for runtime dispatch, and this table can grow quite large if you have many virtual functions or multiple (virtual) inheritance. Clarification: the VTABLE is per class, but pointers to the VTABLE are stored in each instance (depending on the ancestral type structure of the object).
Templates can cause code bloat. Every use of a templated class with a new set of template parameters can generate brand new code for each member. Modern compilers try and collapse this as much as possible, but it's hard.
Structure alignment/padding can cause simple class instances to be larger than you expect, as the compiler pads the structure for the target architecture.
When programming, use the sizeof operator to determine object size - never hard code. Use the rough metric of "Sum of member variable size + some VTABLE (if it exists)" when estimating how expensive large groups of instances will be, and don't worry overly about the size of the code. Optimise later, and if any of the non-obvious issues come back to mean something, I'll be rather surprised.
Although some aspects of this are compiler vendor dependent, all compiled code goes into a section of memory on most systems called text segment. This is separate from both the heap and stack sections (a fourth section, data, holds most constants). Instantiating many instances of a class incurs run-time space only for its instance variables, not for any of its functions. If you make use of virtual methods, you will get an additional, but small, bit of memory set aside for the virtual look-up table (or equivalent for compilers that use some other concept), but its size is determined by the number of virtual methods times the number of virtual classes, and is independent of the number of instances at run-time.
This is true of statically and dynamically linked code. The actual code all lives in a text region. Most operating systems actually can share dll code across multiple applications, so if multiple applications are using the same dll's, only one copy resides in memory and both applications can use it. Obviously there is no additional savings from shared memory if only one application uses the linked code.
You can't completely accurately say how much memory a class or X objects will take up in RAM.
However to answer your questions, you are correct that code exists only in one place, it is never "allocated". The code is therefore per-class, and exists whether you create objects or not. The size of the code is determined by your compiler, and even then compilers can often be told to optimize code size, leading to differing results.
Virtual functions are no different, save the (small) added overhead of a virtual method table, which is usually per-class.
Regarding DLLs and other libraries... the rules are no different depending on where the code has come from, so this is not a factor in memory usage.
The information given above is of great help and gave me some insight into C++ memory structure. But I would like to add that no matter how many virtual functions a class has, there will always be only one vptr per instance and one VTABLE per class. After all, the vptr points to the VTABLE, so there is no need for more than one vptr in the case of multiple virtual functions.
Your estimate is accurate in the base case you've presented. Each polymorphic object also carries a vtable pointer, so expect an extra pointer's worth of memory per instance; the vtable itself exists once per class, with one slot per virtual function.
Member variables (and virtual functions) from any base classes are also part of the class, so include them.
Just as in C, you can use the sizeof(classname/datatype) operator to get the size in bytes of a class.
Yes, that's right, code isn't duplicated when an object instance is created. As far as virtual functions go, the proper function call is determined using the vtable, but that doesn't affect object creation per se.
DLLs (shared/dynamic libraries in general) are memory-mapped into the process' memory space. Every modification is carried on as Copy-On-Write (COW): a single DLL is loaded only once into memory and for every write into a mutable space a copy of that space is created (generally page-sized).
If compiled as 32-bit, then sizeof(Bar) should yield 8 (a 4-byte vptr plus the 4-byte int).
Foo adds 10 more bytes of data (2 ints + 2 chars).
Since Foo inherits from Bar, that is at least 8 + 10 bytes = 18 bytes.
GCC has attributes for packing structs so there is no padding. In this case 100 entries would take up 1800 bytes, plus a tiny overhead for aligning the allocation and some overhead for memory management.
If no packed attribute is specified, it depends on the compiler's alignment.
But this doesn't consider how much memory the vtable takes up, or the size of the compiled code.
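A sketch of the GCC packing mentioned above, applied to plain data (packing is not meaningful for a polymorphic class with a vptr); sizes in the comments are typical, not guaranteed:
#include <cstdio>
struct Padded { int a; char b; int c; char d; };                         // usually 16 bytes
struct __attribute__((packed)) Tight { int a; char b; int c; char d; }; // exactly 10 bytes
int main() {
    std::printf("%zu %zu\n", sizeof(Padded), sizeof(Tight));
    return 0;
}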
It's very difficult to give an exact answer to your question, as this is implementation dependent, but approximate values for a 32-bit implementation might be:
int Bar::mProtectedVar; // 4 bytes
int Foo::mPublicVar; // 4 bytes
char Foo::mPublicVar2; // 1 byte
int Foo::mPrivate; // 4 bytes
char Foo::mPrivateVar2; // 1 byte
There are alignment issues here and the final total may well be 16 bytes. You will also have a vptr - say another 4 bytes. So the total size for the data is around 20 bytes per instance. It's impossible to say how much space the code will take up, but you are correct in thinking there is only one copy of the code shared between all instances.
When you ask
I assume dlls get their own stack, heap, code and data segments.
the answer is that there really isn't much difference between data in a DLL and data in an app - basically they share everything between them. This has to be so when you think about it - if they had different stacks (for example), how could function calls work?