This is a pretty basic question but I'm still unsure:
If I have a class that will be instantiated millions of times -- is it advisable not to derive it from some other class? In other words, does inheritance carry some cost (in terms of memory or runtime to construct or destroy an object) that I should be worrying about in practice?
Example:
class Foo : public FooBase { // should I avoid deriving from FooBase?
// ...
};
int main() {
// constructs millions of Foo objects...
}
Inheriting from a class costs nothing at runtime.
The class instances will of course take up more memory if you have variables in the base class, but no more than if they were in the derived class directly and you didn't inherit from anything.
This does not take into account virtual methods, which do incur a small runtime cost.
tl;dr: You shouldn't be worrying about it.
I'm a bit surprised by some of the responses/comments so far...
does inheritance carry some cost (in terms of memory)
Yes. Given:
#include <cstdint>

namespace MON {
class FooBase {
public:
FooBase();
virtual ~FooBase();
virtual void f();
private:
uint8_t a;
};
class Foo : public FooBase {
public:
Foo();
virtual ~Foo();
virtual void f();
private:
uint8_t b;
};
class MiniFoo {
public:
MiniFoo();
~MiniFoo();
void f();
private:
uint8_t a;
uint8_t b;
};
class MiniVFoo {
public:
MiniVFoo();
virtual ~MiniVFoo();
void f();
private:
uint8_t a;
uint8_t b;
};
} // << MON
extern "C" {
struct CFoo {
uint8_t a;
uint8_t b;
};
}
on my system, the sizes are as follows:
32 bit:
FooBase: 8
Foo: 8
MiniFoo: 2
MiniVFoo: 8
CFoo: 2
64 bit:
FooBase: 16
Foo: 16
MiniFoo: 2
MiniVFoo: 16
CFoo: 2
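To reproduce these measurements on your own toolchain, a minimal driver over the declarations above is enough (nothing is instantiated, so the missing function definitions don't matter for sizeof):
#include <cstdio>

int main() {
    std::printf("FooBase:  %zu\n", sizeof(MON::FooBase));
    std::printf("Foo:      %zu\n", sizeof(MON::Foo));
    std::printf("MiniFoo:  %zu\n", sizeof(MON::MiniFoo));
    std::printf("MiniVFoo: %zu\n", sizeof(MON::MiniVFoo));
    std::printf("CFoo:     %zu\n", sizeof(CFoo));
    return 0;
}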
runtime to construct or destroy an object
There is additional function call overhead and virtual dispatch where needed (including for destructors where appropriate). This can cost a lot, and some really obvious optimizations such as inlining may not be possible.
The entire subject is much more complex, but that should give you an idea of the costs.
If speed or size is truly critical, then you can often use static polymorphism (e.g. templates) to achieve an excellent balance between performance and ease of programming.
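For illustration, here is a minimal sketch of what static polymorphism can look like here, using CRTP (the FooInterface/FastFoo names are hypothetical and not part of the benchmark below):
#include <cstdint>
#include <cstdio>

// Static polymorphism via CRTP: the "virtual" call is resolved at
// compile time, so it can be inlined and there is no per-object vptr.
template <typename Derived>
class FooInterface {
public:
    void f() { static_cast<Derived*>(this)->f_impl(); }
};

class FastFoo : public FooInterface<FastFoo> {
public:
    void f_impl() { ++a; }
private:
    std::uint8_t a = 0;
};

int main() {
    FastFoo foo;
    foo.f();                                                  // dispatch decided at compile time
    std::printf("sizeof(FastFoo) = %zu\n", sizeof(FastFoo));  // no vptr: typically 1
    return 0;
}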
Regarding CPU performance, I created a simple test which constructed millions of these types on the stack and on the heap and called f. The results are:
FooBase 16.9%
Foo 16.8%
Foo2 16.6%
MiniVFoo 16.6%
MiniFoo 16.2%
CFoo 15.9%
Note: Foo2 derives from Foo.
In the test, the allocations are added to a vector, then deleted. Without this stage, the CFoo case was entirely optimized away. As Jeff Dege posted in his answer, allocation time will be a huge part of this test.
Pruning the allocation functions and vector create/destroy from the sample produces these numbers:
Foo 19.7%
FooBase 18.7%
Foo2 19.4%
MiniVFoo 19.3%
MiniFoo 13.4%
CFoo 8.5%
This means the virtual variants take over twice as long as CFoo to execute their constructors, destructors, and calls, and MiniFoo is about 1.5 times faster than the virtual variants.
While we're on allocation: if you can use a single type for your implementation, you also reduce the number of allocations you must make in this scenario, because you can allocate one array of 1M objects rather than creating a list of 1M addresses and then filling it with uniquely new'ed objects (there are also special-purpose allocators which can reduce this weight). Since allocation/free times are the bulk of this test, this would significantly reduce the time you spend allocating and freeing objects; a sketch of the contiguous-array approach follows the numbers below.
Create many MiniFoos as array 0.2%
Create many CFoos as array 0.1%
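A minimal sketch of that contiguous-array approach (FlatFoo is a stand-in with the same two-byte layout as MiniFoo above; purely illustrative):
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Same layout as MiniFoo above: two bytes, no vptr.
struct FlatFoo {
    std::uint8_t a = 0;
    std::uint8_t b = 0;
    void f() { ++a; }
};

int main() {
    constexpr std::size_t kCount = 1000000;

    // One allocation for a million objects, laid out contiguously...
    std::vector<FlatFoo> flat(kCount);
    for (auto& x : flat) x.f();

    // ...versus a million separate allocations plus a million pointers.
    std::vector<std::unique_ptr<FlatFoo>> scattered;
    scattered.reserve(kCount);
    for (std::size_t i = 0; i < kCount; ++i)
        scattered.push_back(std::make_unique<FlatFoo>());
    for (auto& p : scattered) p->f();
    return 0;
}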
Also keep in mind that MiniFoo and CFoo consume 1/4 to 1/8 of the memory per element, and a contiguous allocation removes the need to store pointers to dynamically allocated objects. You could then keep track of an object in more ways (pointer or index), and the array can also significantly reduce the demands on clients (a uint32_t index vs. a pointer on a 64-bit arch) -- plus it avoids all the bookkeeping required by the system for the allocations (which is significant when dealing with so many small allocations).
Specifically, the sizes in this test consumed:
32 bit
267MB for dynamic allocations (worst)
19MB for the contiguous allocations
64 bit
381MB for dynamic allocations (worst)
19MB for the contiguous allocations
This means that the required memory was reduced by more than a factor of ten, and the time spent allocating/freeing improves by even more than that!
Static dispatch implementations can be several times faster than mixed or dynamic dispatch. This typically gives the optimizer more opportunity to see more of the program and optimize it accordingly.
In practice, dynamic types tend to export more symbols (methods, dtors, vtables), which can noticeably increase the binary size.
Assuming this is your actual use case, you can improve performance and resource usage significantly. I've presented a number of major optimizations here, just in case somebody believes changing the design in such a way would qualify as 'micro'-optimization.
Largely, this depends upon the implementation. But there are some commonalities.
If your inheritance tree includes any virtual functions, the compiler will need to create a vtable for each class - a jump table with pointers to the various virtual functions. Every instance of those classes will carry along a hidden pointer to its class's vtable.
And any call to a virtual function will involve a hidden level of indirection - rather than jumping to a function address that had been resolved at link time, a call will involve reading the address from the vtable and then jumping to that.
Generally speaking, this overhead isn't likely to be measurable on any but the most time-critical software.
OTOH, you said you'd be instantiating and destroying millions of these objects. In most cases, the largest cost isn't constructing the object, but allocating memory for it.
IOW, you might benefit from using your own custom memory allocators for the class.
http://www.cprogramming.com/tutorial/operator_new.html
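One common shape for such a class-level allocator is a free list that recycles freed instances. The sketch below is simplified and not thread-safe, and Foo's payload is just a placeholder:
#include <cstddef>
#include <new>

// Very simplified per-class allocator: recycle freed objects from a
// free list; otherwise fall back to the global heap. Illustrative only.
class Foo {
public:
    static void* operator new(std::size_t size) {
        if (size != sizeof(Foo)) return ::operator new(size);
        if (free_list_) {
            void* p = free_list_;
            free_list_ = *static_cast<void**>(p);  // pop the next free node
            return p;
        }
        return ::operator new(sizeof(Foo));
    }
    static void operator delete(void* p, std::size_t size) noexcept {
        if (!p) return;
        if (size != sizeof(Foo)) { ::operator delete(p); return; }
        *static_cast<void**>(p) = free_list_;      // push back onto the free list
        free_list_ = p;
    }
private:
    static void* free_list_;
    double payload_[4];                            // whatever Foo actually holds
};
void* Foo::free_list_ = nullptr;

int main() {
    Foo* f = new Foo();   // comes from ::operator new the first time
    delete f;             // goes onto the free list
    Foo* g = new Foo();   // recycled from the free list, no heap call
    delete g;
    return 0;
}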
I think we've all been programming too much as lone wolves.
We forget to take into account the cost of maintenance + readability + extensibility of features.
Here is my take
Inheritance Cost++
On smaller projects: time to develop increases. It is easy to write it all as quick global spaghetti-style code; it has always taken me more time to write a class hierarchy that does the right thing.
On smaller projects: time to modify increases. It is not always easy to modify the existing code to conform to the existing interface.
Time to design increases.
The program is slightly less efficient due to passing messages through interfaces rather than poking at exposed guts (I mean data members :)).
Only for virtual function calls via a pointer to the base class is there a single extra dereference.
There is a small space penalty in terms of RTTI.
For the sake of completeness I will add that too many classes add too many types, and that is bound to increase your compilation time, however small the increase may be.
There is also the cost to the run-time system of tracking objects in terms of their base classes, which obviously means a slight increase in code size plus a slight runtime performance penalty due to the exception delegation mechanism (whether you use it or not).
You don't have to twist your arm unnaturally into PIMPL if all you want to do is insulate users of your interface functions from being recompiled. (That IS a HEAVY cost, trust me.)
Inheritance Cost--
As the program grows past one or two thousand lines, it becomes more maintainable with inheritance. If you are the only one programming, then you can easily push code without objects up to 4k-5k lines.
Cost of bug fixing reduces.
You can easily extend the existing framework for more challenging tasks.
I know I am playing devil's advocate a little, but I think we have to be fair.
If you need the functionality of FooBase in Foo, you can either derive or use composition. Deriving has the cost of the vtable, while composition (holding a pointer) has the cost of a pointer to a FooBase, the FooBase itself, and the FooBase's vtable. So they are (roughly) similar, and you shouldn't have to worry about the cost of inheritance.
Creating a derived object involves calling constructors for all base classes, and destroying them invokes destructors for these classes. The cost depends then on what these constructors do, but then if you don't derive but include the same functionality in derived class, you pay the same cost. In terms of memory, every object of the derived class contains an object of its base class, but again, it's exactly the same memory usage as if you just included all of these fields in the class instead of deriving it.
Be aware that in many cases it's a better idea to compose (have a data member of the 'base' class rather than deriving from it), in particular if you're not overriding virtual functions and the relationship between 'derived' and 'base' is not an "is a kind of" relationship. But in terms of CPU and memory usage both techniques are equivalent.
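To make the equivalence concrete, a minimal sketch (illustrative names, no virtual functions involved):
#include <cstdio>

struct FooBase { int x; void frob() {} };

// Option 1: inheritance.
struct FooDerived : FooBase { int y; };

// Option 2: composition -- same data, and in practice the same size and layout.
struct FooComposed {
    FooBase base;
    int y;
    void frob() { base.frob(); }   // forward whatever you need
};

int main() {
    std::printf("%zu %zu\n", sizeof(FooDerived), sizeof(FooComposed));  // typically equal
    return 0;
}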
The fact is that if you are in doubt of whether you should inherit or not, the answer is that you should not. Inheritance is the second most coupling relationship in the language.
As for the performance difference, there should be almost none in most cases, unless you start using multiple inheritance: if one of the bases has virtual functions and the base subobject is not aligned with the final overrider, dispatch will have an additional (minimal, negligible) cost, because the compiler adds a thunk to adjust the this pointer.
Related
I am building modelling software and I have a few questions about how to get the best performance.
1) Should I use std::vector<class> or std::vector<class*>?
My class is quite complicated/big, and I think the second option is better: since std::vector tries to allocate memory contiguously, there might not be a contiguous block of memory large enough to store a million objects, but when I just store pointers, the objects do not have to be stored contiguously, only the pointers do, and the computer might have space for that. Is this reasoning correct?
2) As I said, I will have millions of objects (for a proper simulation I will need over a billion of them). Is inheritance a smart thing to use here?
For my simulation, there are multiple different types which inherit from the same base class:
class A - class B
        - class C
        - class D
Should I avoid inheritance, as I keep hearing that there is a performance penalty for using it?
3) Also, how do I store all these different classes in a std::vector?
Can a std::vector<base_class*> or std::vector<base_class> store class B, class C, and class D objects, which all inherit from the base class?
4) In the previous version of the program, I used multithreading by making different threads handle different sections of the std::vector. Is there a better way to do the threading?
5) Should I use smart pointers? Since I have so many objects, will they degrade performance?
I am in the planning stage and any help is greatly appreciated.
I deal with problems like this every day in a professional setting (I'm a C++ programmer by trade, dealing with big-data sets). As such what I'm about to say here is as much personal-advice as it is an answer. I won't go all out on the simple parts:
1 - Yes, store pointers; reallocations and moves will be much faster than with the full class object.
2 - Yes, use inheritance if the objects have related information; I imagine in this case they most likely do, as you're considering it. If they don't, why would you store them together?
3 - Store them all using smart pointers to the base class (the parent object). You can then add a single virtual "get_type" function that returns an enumeration, and convert to a child when you need to. This will save the overhead of providing multiple virtual methods if you don't need child data often.
4 - Arguable, but threading over separate parts of a larger array is the simpler approach (and when you're dealing with hugely complex data, simpler is better).
Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it? ~ Brian Kernighan
5 - There will be some small penalty for using smart pointers (as explained in this question); however, in my opinion that penalty (especially with unique_ptr) is so small compared to the ease of use and reduction in complexity that it's definitely worth it.
And putting it all together:
#include <memory>
#include <utility>
#include <vector>

enum ChildType { Child_1 = 0, Child_2 = 1 };

class Abstract_Parent
{
public:
    virtual ~Abstract_Parent() = default;
    virtual ChildType GetType() = 0;
};

class Child_One : public Abstract_Parent
{
public:
    ChildType GetType() override { return Child_1; }
    void Do_Something_Specific() { /* child-specific work */ }
};

class Child_Two : public Abstract_Parent
{
public:
    ChildType GetType() override { return Child_2; }
};

std::vector<std::unique_ptr<Abstract_Parent>> Data;

void Some_Function()
{
    // this is how to insert a child-object
    std::unique_ptr<Abstract_Parent> Push_me_Back(new Child_One());
    Data.push_back(std::move(Push_me_Back));

    if (Data[0]->GetType() == Child_1)
    {
        Child_One* Temp_Ptr = dynamic_cast<Child_One*>(Data[0].get());
        Temp_Ptr->Do_Something_Specific();
    }
}
1.) That depends on your use case. You will use pointers if you want to access objects through a base class pointer. On the other hand, you lose the advantage of contiguous memory and cache locality of code and data.
2.) If you need 1 billion instances, then every additional byte per object will increase your memory footprint. For example, an additional pointer to the virtual function table (vptr) of 8 bytes will increase your memory requirements by 8 GBytes. Storing every type in a different vector without a virtual base class does not have this overhead.
2b) Yes, you should avoid inheritance with virtual functions if you aim for performance. The instruction cache will be thrashed if virtual functions are called with different implementations. At the very least you can sort your big vector by type to minimize this problem (see the sketch after this list).
3.) You must use the pointer option to prevent slicing if you go for a base class with virtual functions.
4.) More information is needed and should be answered in separate question.
5.) Every indirection will degrade performance.
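A hedged sketch of the "one vector per concrete type, no virtual base" layout mentioned in 2) and 2b) above (B, C, D and their members are illustrative stand-ins for the simulation types):
#include <vector>

// Hypothetical concrete types; no common virtual base, so no per-object vptr.
struct B { double state[4]; void step() { state[0] += 1.0; } };
struct C { double state[2]; void step() { state[0] += 0.5; } };
struct D { float  state[8]; void step() { state[0] += 2.0f; } };

struct Simulation {
    std::vector<B> bs;   // each vector is homogeneous and contiguous,
    std::vector<C> cs;   // so both the data cache and the instruction
    std::vector<D> ds;   // cache stay warm while iterating
    void step_all() {
        for (auto& b : bs) b.step();
        for (auto& c : cs) c.step();
        for (auto& d : ds) d.step();
    }
};

int main() {
    Simulation sim;
    sim.bs.resize(1000);
    sim.cs.resize(1000);
    sim.ds.resize(1000);
    sim.step_all();
    return 0;
}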
1) Should I use std::vector<class> or std::vector<class*> ?
False dichotomy. There are a couple of other options:
boost::ptr_vector<class>
std::vector<std::unique_ptr<class>>
Probably even more.
Personally I like boost::ptr_vector<class>, as it stores an owned pointer (thus memory management is done automatically), but when accessing elements they are returned as a reference to the object (not a pointer). Thus using it with standard algorithms is vastly simpler than with the other techniques.
My class is quite complicated/big, and I think the second option is better: since std::vector tries to allocate memory contiguously, there might not be a contiguous block of memory large enough to store a million objects,
The real question here is whether you can pre-calculate the maximum size of your vector and reserve() the required amount of space. If you can do this (and thus avoid any cost of copying), std::vector<class> would be the best solution.
This is because having the objects in contiguous storage is usually a significant advantage in terms of speed (especially when scanning a vector). The ability to do this should not be underestimated when you have huge datasets (especially in the billion range).
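For example, a minimal sketch of pre-sizing the vector (Particle stands in for the real class; the bound is whatever you can pre-calculate):
#include <cstddef>
#include <vector>

struct Particle { double x, y, z; };          // stand-in for the real class

int main() {
    const std::size_t max_objects = 1000000;  // bound computed up front
    std::vector<Particle> v;
    v.reserve(max_objects);                   // one allocation, no copies while growing
    for (std::size_t i = 0; i < max_objects; ++i)
        v.push_back(Particle{0.0, 0.0, 0.0});
    return 0;
}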
but when I just store pointers, the objects do not have to be stored contiguously, only the pointers do, and the computer might have space for that. Is this reasoning correct?
By using pointers, you are also significantly increasing the amount of memory required by the application as you need to store the object and the pointer to the object. Over billions of objects this can be a significant cost.
2) As I said, I will have millions of objects (for a proper simulation I will need over a billion of them). Is inheritance a smart thing to use here?
Impossible to say without much more information.
3) Also, how do I store all these different classes in a std::vector? Can a std::vector<base_class*> or std::vector<base_class> store class B, class C, and class D objects, which all inherit from the base class?
But if you do use inheritance, you will not be able to use std::vector<class> directly. You will need to store a pointer to the base class. But that does not preclude the other three techniques.
4) In the previous version of the program, I used multithreading by making different threads handle different sections of the std::vector. Is there a better way to do the threading?
This seems a reasonable approach (assuming that the ranges don't overlap and are contiguous). Don't create more threads than you have available cores.
Should I use smart pointers? Since I have so many objects, will they degrade performance?
Use of unique_ptr over a normal pointer has zero overhead (assuming you don't use a custom deleter). The actual generated code will be basically equivalent.
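On mainstream implementations you can convince yourself of this quickly (Widget is just an arbitrary stand-in type; the assertion is not guaranteed by the standard, but it holds for the default deleter on common toolchains):
#include <memory>

struct Widget { int v; };   // arbitrary stand-in type

// With the default deleter, unique_ptr is just a raw pointer under the hood.
static_assert(sizeof(std::unique_ptr<Widget>) == sizeof(Widget*),
              "unique_ptr with the default deleter should add no size overhead");

int main() { return 0; }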
Considering new CPUs with new instructions for moving data and new memory controllers: if in C++ I have a vector of Derived objects, where Derived has virtual member functions, is this a good or a bad thing for locality?
And what if I have a vector of pointers to the base class, Base*, where I store pointers to derived objects that are 1-2-3 levels away from Base?
Basically, dynamic dispatch applies in both cases, but which one is better for caching and memory access?
I have my own preference between these two, but I would like to see a complete answer on the subject.
Is there something new and ground-breaking to consider from the hardware industry in the last 2-3 years?
Storing Derived rather than Base* in a vector is better because it eliminates one extra level of indirection and you have all objects laid out «together» in contiguous memory, which in turn makes life easier for the hardware prefetcher and helps with paging, TLB misses, etc. However, if you do this, make sure you don't introduce a slicing problem.
As for virtual dispatch in this case, it almost does not matter, with the exception of the adjustment required for the «this» pointer. For example, if Derived overrides a virtual function that you are calling and you already have a Derived*, then no «this» adjustment is required; otherwise it has to be adjusted to one of the base classes' «this» values (this also depends on the sizes of the classes in the inheritance hierarchy).
As long as all objects in the vector use the same overrides, the CPU will be able to predict what's going on. However, if you have a mix of different implementations, the CPU will have no clue what function will be called for each next object, and that might cause performance issues.
And don't forget to always profile before and after you make changes.
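To make the two layouts under discussion concrete, here is a minimal, hedged sketch (Base/Derived are illustrative stand-ins):
#include <memory>
#include <vector>

struct Base {
    virtual ~Base() = default;
    virtual void update() {}
};
struct Derived : Base {
    double payload[4] = {};
    void update() override { payload[0] += 1.0; }
};

int main() {
    // Option A: objects stored by value -- contiguous, one allocation,
    // and the static type is known, so calls may even be devirtualized.
    std::vector<Derived> by_value(100000);
    for (auto& d : by_value) d.update();

    // Option B: pointers to the base class -- one extra indirection per
    // element, and the objects themselves are scattered across the heap.
    std::vector<std::unique_ptr<Base>> by_pointer;
    by_pointer.reserve(100000);
    for (int i = 0; i < 100000; ++i)
        by_pointer.push_back(std::make_unique<Derived>());
    for (auto& p : by_pointer) p->update();
    return 0;
}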
Modern CPUs know how to optimise data-dependent jump instructions, much as they do data-dependent "branch" instructions - the processor will "learn" that "last time I went through here, I went THIS way", and if it has enough confidence (it has gone through several times with the same result) it will keep going that way.
Of course that doesn't help if the instances are a completely random selection of different classes that each have their own virtual function implementation.
Cache-locality is of course a slightly different matter, and it really depends on whether you are storing the object instances or the pointers/references to instances in the vector.
And of course, an important factor is "what is the alternative?" If you are using virtual functions "correctly", it means that there is (at least) one less conditional check on the code path, because the decision was taken at a much earlier stage. If you instead solve the same decision by some other method, its branch will have the same probability as the virtual dispatch, and will be at least as bad for performance (chances are it's worse, because we now have an if (x) foo(); else bar(); type scenario, so we first have to evaluate x and then choose the path; obj->vfunc() will just be unpredictable because fetching from the vtable gives an unpredictable result - but at least the vtable itself is cached).
I'm storing a large amount of computed data and I'm currently using a polymorphic type to reduce the amount of storage required. Everything is extremely fast except for deleting the objects when I'm finished, and I think there must be a better alternative. The code computes the state at each step and, depending on the conditions present, it needs to store certain values. The worst case is storing the full object state and the best case is storing almost nothing. The (very simplified) setup is as follows:
#include <vector>

class BaseClass
{
public:
virtual ~BaseClass() { }
double time;
unsigned int section;
};
class VirtualSmall : public BaseClass
{
public:
double values[2];
int othervalue;
};
class VirtualBig : public BaseClass
{
public:
double values[16];
int othervalues[5];
};
...
std::vector<BaseClass*> results(10000);
The appropriate object type is generated during computation and a pointer to it is stored in the vector. The overhead from the vtable pointer is overall much smaller than the size difference between the largest and smallest object (which is at least 200 bytes according to sizeof). Since the smallest object can often be used instead of the largest, and there are potentially many tens of millions of them stored, this can save a few gigabytes of memory usage. The results can then be searched extremely fast, as the base class contains the information necessary to find the correct item, which can then be dynamic_cast back to its real type. It works very well for the most part.
The only issue is with delete. It takes a few seconds to free all of the memory when there are many tens of millions of objects. The deletion code iterates through the vector and calls delete results[i], which invokes the virtual destructor. While it's not impossible to work around, I think there must be a more elegant solution.
It could definitely be done by allocating largish contiguous blocks of memory (with malloc or similar), keeping track of them, and having something hand out pointers to the next chunk of free memory inside the current block. Those pointers are then stored in the vector. To free the memory, only the smaller number of large blocks needs to have free() called on them. There is no more vtable (the vptr can be replaced by a smaller type field to ensure the correct cast), which saves space as well. It is very much a C-style solution, though, and not particularly pretty.
Is there a C++ style solution to this type of problem I'm overlooking?
You can overload operator new (i.e. void* VirtualSmall::operator new(size_t)) for your classes, and implement it to obtain memory from custom allocators. I would use one block allocator for each derived class, so that each block size is a multiple of the size of the class it's supposed to store.
When it's time to clean up, tell each allocator to release all of its blocks. No destructors will be called, so make sure you don't need them.
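A very rough sketch of that idea (single-threaded and purely illustrative; the BlockAllocator and SmallRecord names are made up here, error handling is omitted, and since no destructors ever run, the stored types must not need them):
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Dead-simple block allocator: hands out fixed-size slots from large
// malloc'ed blocks and frees everything at once. No per-object delete,
// no destructors run.
class BlockAllocator {
public:
    explicit BlockAllocator(std::size_t slot_size, std::size_t slots_per_block = 100000)
        : slot_size_(slot_size), slots_per_block_(slots_per_block) {}
    ~BlockAllocator() { release_all(); }

    void* allocate() {
        if (blocks_.empty() || used_ == slots_per_block_) {
            blocks_.push_back(std::malloc(slot_size_ * slots_per_block_));
            used_ = 0;
        }
        return static_cast<char*>(blocks_.back()) + slot_size_ * used_++;
    }
    void release_all() {
        for (void* b : blocks_) std::free(b);   // a handful of free() calls, not millions
        blocks_.clear();
        used_ = 0;
    }
private:
    std::size_t slot_size_;
    std::size_t slots_per_block_;
    std::size_t used_ = 0;
    std::vector<void*> blocks_;
};

// Hypothetical flat record standing in for one of the stored types.
struct SmallRecord { double time; unsigned section; double values[2]; };

int main() {
    BlockAllocator small_pool(sizeof(SmallRecord));
    std::vector<SmallRecord*> results;
    results.reserve(1000000);
    for (int i = 0; i < 1000000; ++i)
        results.push_back(new (small_pool.allocate()) SmallRecord{});
    // ... use results ...
    small_pool.release_all();   // frees a few big blocks instead of 1M objects
    return 0;
}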
I'm starting a new embedded project with C++ and I was wondering whether it is too expensive to use an interface-oriented design. Something like this:
#include <cstdio>

typedef int data;
class data_provider {
public:
virtual data get_data() = 0;
};
class specific_data_provider : public data_provider {
public:
data get_data() {
return 7;
}
};
class my_device {
public:
data_provider * dp;
data d;
my_device (data_provider * adp) {
dp = adp;
d = 0;
}
void update() {
d = dp->get_data();
}
};
int
main() {
specific_data_provider sdp;
my_device dev(&sdp);
dev.update();
printf("d = %d\n", dev.d);
return 0;
}
Inheritance, on its own, is free. For example, below, B and C are the same from a performance/memory point of view:
struct A { int x; };
struct B : A { int y; };
struct C { int x, y; };
Inheritance only incurs a cost when you have virtual functions.
struct A { virtual ~A(); };
struct B : A { ... };
Here, on virtually all implementations, both A and B will be one pointer size larger due to the virtual function.
Virtual functions also have other drawbacks (when compared with non-virtual functions)
Virtual functions require that you look up the vtable when called. If that vtable is not in the cache then you will get an L2 miss, which can be incredibly expensive on embedded platforms (over 600 cycles on current gen game consoles for example).
Even if you hit the L2 cache, if you branch to many different implementations then you will likely get a branch misprediction on most calls, causing a pipeline flush, which again costs many cycles.
You also miss out on many optimisation opportunities due to virtual functions being essentially impossible to inline (except in rare cases). If the function you call is small, then this could add a serious performance penalty compared to an inlined non-virtual function.
Virtual calls can contribute to code bloat. Every virtual function call adds several bytes worth of instructions to look up the vtable, and many bytes for the vtable itself.
If you use multiple inheritance then things get worse.
Often people will tell you "don't worry about performance until your profiler tells you to", but this is terrible advice if performance is at all important to you. If you don't worry about performance then what happens is that you end up with virtual functions everywhere, and when you run the profiler, there is no one hotspot that needs optimising -- the whole code base needs optimising.
My advice would be to design for performance if it is important to you. Design to avoid the need for virtual functions if at all possible. Design your data around the cache: prefer arrays to node-based data structures like std::list and std::map. Even if you have a container of a few thousand elements with frequent insertions into the middle, I would still go for an array on certain architectures. The several thousand cycles you lose copying data for the insertions may well be offset by the cache locality you will achieve on each traversal. (Remember the cost of a single L2 cache miss? You can expect a lot of those when traversing a linked list.)
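As one illustration of designing away the virtual call for the interface in the question, here is a hedged sketch of a template-based (compile-time) variant; it assumes the concrete provider type is known where my_device is instantiated:
#include <cstdio>

typedef int data;

// No virtual functions: the provider type is a template parameter,
// so get_data() can be inlined into update().
class specific_data_provider {
public:
    data get_data() { return 7; }
};

template <typename Provider>
class my_device {
public:
    explicit my_device(Provider* adp) : dp(adp), d(0) {}
    void update() { d = dp->get_data(); }
    Provider* dp;
    data d;
};

int main() {
    specific_data_provider sdp;
    my_device<specific_data_provider> dev(&sdp);
    dev.update();
    std::printf("d = %d\n", dev.d);
    return 0;
}
The trade-off is that my_device is now a template, so you lose the ability to swap providers at runtime.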
Inheritance is basically free. However, polymorphism and dynamic dispatch (virtual) have some consequences: each instance of a class with a virtual method contains a pointer to the vtable, which is used to select the right method to call. This adds two memory accesses for each virtual method call.
In most cases it won't be a problem, but it can become a bottleneck in some real time applications.
Really depends on your hardware. Inheritance per se probably doesn't cost you anything. Virtual methods cost you some amount of memory for the vtable of each class. Turning on exception handling probably costs you even more in both memory and performance. I have used all the features of C++ extensively on the NetBurner platform with chips like the MOD5272, which have a couple of megs of flash and 8 megs of RAM. Also, some things may be implementation dependent: on the GCC toolchain I use, when cout gets used instead of printf you take a big memory hit (it appears to link in a bunch of libraries). You might be interested in a blog post I wrote on the cost of type-safe code. You would have to run similar tests on your environment to truly answer your question.
The usual advice is to make the code clear and correct, and then think about optimisations only if it proves to be a problem (too slow or too much memory) in practice.
I'm trying to understand what kind of memory hit I'll incur by creating a large array of objects. I know that each object - when created - will be given space in the HEAP for member variables, and I think that all the code for every function that belongs to that type of object exists in the code segment in memory - permanently.
Is that right?
So if I create 100 objects in C++, I can estimate that I will need space for all the member variables each object owns multiplied by 100 (possible alignment issues here), and then I need space in the code segment for a single copy of the code for each member function of that type of object (not 100 copies of the code).
Do virtual functions, polymorphism, inheritance factor into this somehow?
What about objects from dynamically linked libraries? I assume dlls get their own stack, heap, code and data segments.
Simple example (may not be syntactically correct):
// parent class
class Bar
{
public:
Bar() {};
~Bar() {};
// pure virtual function
virtual void doSomething() = 0;
protected:
// a protected variable
int mProtectedVar;
};
// our object class that we'll create multiple instances of
class Foo : public Bar
{
public:
Foo() {};
~Foo() {};
// implement pure virtual function
void doSomething() { mPrivate = 0; }
// a couple public functions
int getPrivateVar() { return mPrivate; }
void setPrivateVar(int v) { mPrivate = v; }
// a couple public variables
int mPublicVar;
char mPublicVar2;
private:
// a couple private variables
int mPrivate;
char mPrivateVar2;
};
About how much memory should 100 dynamically allocated objects of type Foo take including room for the code and all variables?
It's not necessarily true that "each object - when created - will be given space in the HEAP for member variables". Each object you create will take some nonzero space somewhere for its member variables, but where is up to how you allocate the object itself. If the object has automatic (stack) allocation, so too will its data members. If the object is allocated on the free store (heap), so too will be its data members. After all, what is the allocation of an object other than that of its data members?
If a stack-allocated object contains a pointer or other type which is then used to allocate on the heap, that allocation will occur on the heap regardless of where the object itself was created.
For objects with virtual functions, each will have a vtable pointer allocated as if it were an explicitly-declared data member within the class.
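A quick way to see that hidden pointer (the exact numbers are implementation-defined, but the pattern below is typical):
#include <cstdio>

struct Plain   { int x; };                               // no vptr
struct Virtual { int x; virtual ~Virtual() = default; }; // plus a hidden vtable pointer

int main() {
    std::printf("Plain:   %zu\n", sizeof(Plain));    // typically 4
    std::printf("Virtual: %zu\n", sizeof(Virtual));  // typically 8 or 16 (vptr + padding)
    return 0;
}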
As for member functions, the code for those is likely no different from free-function code in terms of where it goes in the executable image. After all, a member function is basically a free function with an implicit "this" pointer as its first argument.
Inheritance doesn't change much of anything.
I'm not sure what you mean about DLLs getting their own stack. A DLL is not a program, and should have no need for a stack (or heap), as objects it allocates are always allocated in the context of a program which has its own stack and heap. That there would be code (text) and data segments in a DLL does make sense, though I am not expert in the implementation of such things on Windows (which I assume you're using given your terminology).
Code exists in the text segment, and how much code is generated based on classes is reasonably complex. A boring class with no virtual inheritance ostensibly has some code for each member function (including those that are implicitly created when omitted, such as copy constructors) just once in the text segment. The size of any class instance is, as you've stated, generally the sum size of the member variables.
Then, it gets somewhat complex. A few of the issues are...
The compiler can, if it wants or is instructed, inline code. So even though it might be a simple function, if it's used in many places and chosen for inlining, a lot of code can be generated (spread all over the program code).
Virtual functions increase the size of each polymorphic class instance. A pointer to the VTABLE (virtual table) rides along with each instance of a class that uses virtual methods; the table itself contains the information for runtime dispatch. The table can grow quite large if you have many virtual functions or multiple (virtual) inheritance. Clarification: the VTABLE is per class, but pointers to the VTABLE are stored in each instance (depending on the ancestral type structure of the object).
Templates can cause code bloat. Every use of a templated class with a new set of template parameters can generate brand new code for each member. Modern compilers try and collapse this as much as possible, but it's hard.
Structure alignment/padding can cause simple class instances to be larger than you expect, as the compiler pads the structure for the target architecture.
When programming, use the sizeof operator to determine object size - never hard code. Use the rough metric of "Sum of member variable size + some VTABLE (if it exists)" when estimating how expensive large groups of instances will be, and don't worry overly about the size of the code. Optimise later, and if any of the non-obvious issues come back to mean something, I'll be rather surprised.
Although some aspects of this are compiler vendor dependent, all compiled code goes into a section of memory on most systems called text segment. This is separate from both the heap and stack sections (a fourth section, data, holds most constants). Instantiating many instances of a class incurs run-time space only for its instance variables, not for any of its functions. If you make use of virtual methods, you will get an additional, but small, bit of memory set aside for the virtual look-up table (or equivalent for compilers that use some other concept), but its size is determined by the number of virtual methods times the number of virtual classes, and is independent of the number of instances at run-time.
This is true of statically and dynamically linked code. The actual code all lives in a text region. Most operating systems actually can share dll code across multiple applications, so if multiple applications are using the same dll's, only one copy resides in memory and both applications can use it. Obviously there is no additional savings from shared memory if only one application uses the linked code.
You can't completely accurately say how much memory a class or X objects will take up in RAM.
However to answer your questions, you are correct that code exists only in one place, it is never "allocated". The code is therefore per-class, and exists whether you create objects or not. The size of the code is determined by your compiler, and even then compilers can often be told to optimize code size, leading to differing results.
Virtual functions are no different, save the (small) added overhead of a virtual method table, which is usually per-class.
Regarding DLLs and other libraries... the rules are no different depending on where the code has come from, so this is not a factor in memory usage.
The information given above is of great help and gave me some insight into C++ memory structure. What I would like to add here is that no matter how many virtual functions a class has, there will always be only 1 VPTR per instance and 1 VTABLE per class (leaving multiple inheritance aside). After all, the VPTR points to the VTABLE, so there is no need for more than one VPTR in the case of multiple virtual functions.
Your estimate is accurate in the base case you've presented. Each object with virtual functions also carries a hidden vtable pointer, and the per-class vtable holds one entry per virtual function; per object, expect roughly one extra pointer's worth of memory.
Member variables (and virtual functions) from any base classes are also part of the class, so include them.
Just as in C, you can use the sizeof(classname/datatype) operator to get the size of a class in bytes.
Yes, that's right, code isn't duplicated when an object instance is created. As far as virtual functions go, the proper function call is determined using the vtable, but that doesn't affect object creation per se.
DLLs (shared/dynamic libraries in general) are memory-mapped into the process' memory space. Every modification is carried on as Copy-On-Write (COW): a single DLL is loaded only once into memory and for every write into a mutable space a copy of that space is created (generally page-sized).
If compiled as 32-bit, then sizeof(Bar) should yield 4.
Foo should add 10 bytes (2 ints + 2 chars).
Since Foo inherits from Bar, that is at least 4 + 10 = 14 bytes.
GCC has attributes for packing structs so that there is no padding. In this case 100 entries would take up 1400 bytes, plus a tiny overhead for aligning the allocations, plus some overhead for memory management.
If no packed attribute is specified, it depends on the compiler's alignment.
But this doesn't account for how much memory the vtable takes up, or the size of the compiled code.
It's very difficult to give an exact answer to your question, as this is implementation dependent, but approximate values for a 32-bit implementation might be:
int Bar::mProtectedVar; // 4 bytes
int Foo::mPublicVar; // 4 bytes
char Foo::mPublicVar2; // 1 byte
There are alignment issues here and the final total may well be 12 bytes. You will also have a vptr - say another 4 bytes. So the total size of the data is around 16 bytes per instance. It's impossible to say how much space the code will take up, but you are correct in thinking there is only one copy of the code shared between all instances.
When you ask
I assume dlls get their own stack,
heap, code and data segments.
The answer is that there really isn't much difference between data in a DLL and data in an app - basically they share everything between them. This has to be so when you think about it - if they had different stacks (for example), how could function calls work?