Do classes take memory? - c++

#include <iostream>

class Test
{
int x;
};
int main()
{
std::cout << sizeof(Test); // sizeof needs only the type; no Test object is created
return 0;
}
Output: 4
I just want to ask: even though I have not created any object of class Test, why does it print 4?

sizeof(X) is the number of bytes an X takes when created. A call to new tends to use a few more bytes of bookkeeping overhead, but an array X[N] with automatic or static storage (on the stack, local, global, static, etc.) will in practice take N*sizeof(X) bytes (perhaps a little extra for function-local statics, due to thread-safety requirements).
It has nothing to do with the amount of memory the type itself takes.
Classes themselves use memory if they have member functions that are not optimized away, if they have a vtable (caused by use of the virtual keyword), or similar. That memory, holding code or virtual function tables, exists separately from the memory used by instances of the class.
Within the C++ language itself, there is no way to determine how much memory the class itself takes, nor any reliable way to determine what the overhead of new is. For a given platform, you can usually puzzle that out by looking at the runtime behaviour, or at the code of the compiler or runtime libraries.
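Here is a minimal sketch of the answer's main point. sizeof is evaluated from the type alone at compile time, so no object has to exist. Array elements are contiguous, so an array of N objects occupies exactly N * sizeof(T) bytes. The helper `bytes_for` is a hypothetical name. The 4 printed in the question assumes a 4-byte int.

```cpp
#include <cassert>
#include <cstddef>

class Test {
    int x;  // the only data member; it determines sizeof(Test)
};

// sizeof is a compile-time property of the type: no Test object is created here.
static_assert(sizeof(Test) >= sizeof(int), "Test holds at least one int");

// Array elements have no gaps between them, so the array size is N * sizeof(T).
static_assert(sizeof(Test[10]) == 10 * sizeof(Test), "arrays are contiguous");

// Hypothetical helper: bytes needed for n Test objects in automatic or static storage.
constexpr std::size_t bytes_for(std::size_t n) { return n * sizeof(Test); }
```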

Related

Instantiate an object in method vs. make a class member

What are some reasons to instantiate an object needed in a method, vs. making the object a class member?
For example, in the code below, I have a class ClassA that I want to use from another class. User1 has a pointer to a ClassA object as a member variable and instantiates it in its constructor. User2, on the other hand, instantiates a ClassA object in a method just before using it. What are some reasons to do it one way vs. the other?
class ClassA
{
public:
void doStuff(void){ }
};
//
// this class has ClassA as a member
//
class User1
{
public:
User1()
{
classA = new ClassA();
}
~User1()
{
delete classA;
}
void use(void)
{
classA->doStuff();
}
private:
ClassA *classA;
};
//
// this class uses ClassA only in a method
//
class User2
{
public:
void use(void)
{
ClassA *classA = new ClassA();
classA->doStuff();
delete classA;
}
};
int main(void)
{
User1 user1;
user1.use();
User2 user2;
user2.use();
return 0;
}
The advantages of making it a class member are:
You don't have to allocate the instance every time, which depending on the class could be very slow.
The member can store state (though some people would say that this is a bad idea)
Less code.
As a side note, if you are just instantiating with new in the constructor and deleting with delete in the destructor, it should really not be a pointer at all. Make it a member instance and get rid of the new and delete.
I.e.:
class User1
{
public:
void use(void)
{
classA.doStuff();
}
private:
ClassA classA;
};
There are times that this isn't the case, for instance when the class being allocated on the stack is large, or you want the footprint of the holding class to be as small as possible. But these are the exception rather than the rule.
There are other things to consider, like memory fragmentation, the advantages of accessing contiguous memory blocks, and how memory is allocated on the target system. There are no silver bullets, only general advice. For any particular program you need to measure and adjust to get the best performance or to overcome the limitations of that program.
Memory fragmentation is when, even though you have a lot of memory free, the individual free blocks are quite small, so you get memory errors when you try to allocate a large amount of memory. This is usually caused by creating and destroying a lot of different objects of various sizes, with some of them staying alive. If you have a system that suffers from memory fragmentation, I would suggest a thorough analysis of how objects are created rather than worrying about how having a member or not will affect the system. However, here is a breakdown of how the four different scenarios play out when you are suffering from memory fragmentation:
Instantiating the class on the stack is very helpful as it won't contribute to overall memory fragmentation.
Creating it as a value member might cause problems as it might increase the overall size of the object, so when you get to the fragmentation scenario, the object may be too large to be created.
Creating the object and storing a pointer to it may increase memory fragmentation.
Allocating on the heap and deleting at the end of use may increase memory fragmentation if something else is allocated after it was.
The advantage of accessing contiguous memory is that cache misses are minimised. So my feeling is that having the object as a value member would be faster, but, as with so many things that depend on lots of other variables, this could be completely wrong. As always when it comes to performance, measure.
Memory is often aligned to a particular boundary (for instance 4-byte alignment) or allocated in power-of-2 blocks. So, depending on the size of your object, allocating one may take up more memory than you expect.
If the allocated object contains any members, making it a value member might significantly change the memory footprint of the holding class; if it doesn't, it probably won't increase it at all. A pointer to it, on the other hand, will definitely increase the footprint by the size of a pointer, and that may be a significant increase. Creating the object on the heap or on the stack inside a function will not affect the size of the using class. As always, if it is going to affect your program, you need to measure on the target system to see what the effects are going to be.
If the constructor/destructor does something (for instance, a class wrapping a file handle that opens the file in its constructor and closes it in its destructor), then you might want to use it only in the function. But yet again, the pointer isn't usually necessary.
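You can check the footprint trade-off described above with sizeof. This minimal sketch uses a made-up ClassA payload. Exact sizes depend on the platform, so only the relationships between them are meaningful.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical payload with a few members.
struct ClassA {
    double values[4];
    int count;
};

// Holding ClassA by value embeds all of its members in the holder.
struct HoldsValue {
    ClassA a;
};

// Holding ClassA through a pointer adds only a pointer to the holder,
// but the pointee needs its own (heap) allocation elsewhere.
struct HoldsPointer {
    ClassA* a;
};

// Using ClassA only inside a function adds nothing to the class layout.
struct UsesLocally {
    int use() const {
        ClassA local{};  // lives on the stack for the duration of the call
        local.count = 1;
        return local.count;
    }
};
```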
void use(void)
{
ClassA classA;
classA.doStuff();
} //classA will be destructed at end of scope
First off there is no reason to have a pointer in either class. If we use value semantics in User1 then there is no need to have a constructor or destructor as the compiler generated ones will be sufficient. That changes User1 to:
class User1
{
public:
void use(void)
{
classA.doStuff();
}
private:
ClassA classA;
};
Likewise if we use value semantics in User2 then it would become:
class User2
{
public:
void use(void)
{
ClassA classA;
classA.doStuff();
}
};
Now, whether you want ClassA as a member or should just use it in the function is a matter of design. If the class is going to be using and updating the ClassA, then it should be a member. If you just need it to do something in a function, then the second approach is okay.
If you are going to call the function that creates a ClassA a lot, it might be beneficial to make it a member. That way you construct it only once and can still use it in the function. Conversely, if you are going to have a lot of objects but hardly ever call that function, it might be better to create the ClassA when you need it, as you will save space.
Really, though, this is something you would have to profile to determine which way is better. We programmers are bad judges of what is faster and should let the profiler tell us whether we need to change something. Some things, like using value semantics instead of a pointer with heap allocation, are generally faster.
One example where we get this wrong is sorting: if N is small, then a bubble sort, which is O(n^2), can be faster than a quicksort, which is O(n log n). Another example is presented in a Herb Sutter talk starting at 46:00. He shows that a std::vector is faster than a std::list at inserting and removing in the middle, because a std::vector is very cache friendly where a std::list is not.
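The construct-once vs. construct-per-call difference can be made visible with a counter. This is a minimal sketch: the ClassA here is a hypothetical instrumented version of the question's class, and the static counter exists only for illustration.

```cpp
#include <cassert>

// Hypothetical ClassA that counts how often it is constructed.
struct ClassA {
    static int constructions;
    ClassA() { ++constructions; }
    void doStuff() {}
};
int ClassA::constructions = 0;

// Member: constructed once, together with the User1 object.
class User1 {
public:
    void use() { classA.doStuff(); }
private:
    ClassA classA;
};

// Local: constructed (and destroyed) on every call to use().
class User2 {
public:
    void use() {
        ClassA classA;
        classA.doStuff();
    }
};
```

Calling `use()` five times costs one ClassA construction for User1 and five for User2. Whether that matters depends on how expensive ClassA's constructor really is, which is what the profiler should tell you.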

Is there a way to distinguish what type of memory used by the object instance?

If I have this code:
#include <assert.h>
class Foo {
public:
bool is_static();
bool is_stack();
bool is_dynamic();
};
Foo a;
int main()
{
Foo b;
Foo* c = new Foo;
assert( a.is_static() && !a.is_stack() && !a.is_dynamic());
assert(!b.is_static() && b.is_stack() && !b.is_dynamic());
assert(!c->is_static() && !c->is_stack() && c->is_dynamic());
delete c;
}
Is it possible to implement the is_stack, is_static and is_dynamic methods so that these assertions hold?
Example of use: counting the memory that particular objects of type Foo use on the stack, without counting static or dynamic memory.
This cannot be done using standard C++ facilities, which take pains to ensure that objects work the same way no matter how they are allocated.
You can do it, however, by asking the OS about your process memory map, and figuring out what address range a given object falls into. (Be sure to use uintptr_t for arithmetic while doing this.)
Scroll down to the second answer that gives a wide array of available options depending on the Operating System:
How to determine CPU and memory consumption from inside a process?
I would also recommend reading this article on Tracking Memory Allocations in C++:
http://www.almostinfinite.com/memtrack.html
Just be aware that it's a ton of work.
While the intention is good here, the approach is not the best.
Consider a few things:
On the stack you allocate temporary variables for your methods. You don't always have to worry about how much stack you use, because the lifetime of the temporary variables is short.
Related to the stack, what you usually care about is not corrupting it, which can happen if your program uses pointers and accesses data outside the intended bounds. For this type of problem an isStatic function will not help.
For dynamic memory allocation you usually override the new/delete operators and keep a counter to track the amount of memory used. So again, an isDynamic function might not do the trick.
Global variables (you said static, but I extended the scope a bit) are allocated in a separate data section, neither stack nor heap. You don't always care about them, because they are statically allocated and the linker will tell you at link time if you don't have enough space. Plus, you can check the map file if you really want to know the address ranges.
So most of your concerns are solved at compile time, and to be honest you rarely care about them. The rest (dynamic memory allocation) is handled differently.
But if you insist on having those methods you can tell the linker to generate a map file which will give you the address ranges for all data sections and use those for your purposes.
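The "override new/delete" idea from the answer above can answer the dynamic case portably. This is a minimal sketch under several assumptions. The class remembers the addresses handed out by its own operator new. Only single-object `new Foo` is tracked, not `new Foo[n]`. It is not thread-safe. It cannot tell stack from static storage, which still needs platform-specific address ranges. `dynamic_count` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <unordered_set>

class Foo {
public:
    // Class-specific allocation: record every address given out.
    static void* operator new(std::size_t size) {
        void* p = ::operator new(size);
        try {
            live().insert(p);
        } catch (...) {
            ::operator delete(p);  // do not leak if the set cannot grow
            throw;
        }
        return p;
    }
    static void operator delete(void* p) noexcept {
        live().erase(p);
        ::operator delete(p);
    }
    // True only for objects created with `new Foo` and not yet deleted.
    bool is_dynamic() const { return live().count(this) != 0; }
    static std::size_t dynamic_count() { return live().size(); }

private:
    // Function-local static avoids static-initialization-order problems.
    static std::unordered_set<const void*>& live() {
        static std::unordered_set<const void*> set;
        return set;
    }
};
```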

Alternatives for polymorphic data storage

I'm storing a large amount of computed data and I'm currently using a polymorphic type to reduce the amount of storage required. Everything is extremely fast except for deleting the objects when I'm finished and I think there must be a better alternative. The code computes the state at each step and depending on the conditions present it needs to store certain values. The worst case is storing the full object state and the best state is storing almost nothing. The (very simplified) setup is as follows:
class BaseClass
{
public:
virtual ~BaseClass() { }
double time;
unsigned int section;
};
class VirtualSmall : public BaseClass
{
public:
double values[2];
int othervalue;
};
class VirtualBig : public BaseClass
{
public:
double values[16];
int othervalues[5];
};
...
std::vector<BaseClass*> results(10000);
The appropriate object type is generated during computation and a pointer to it is stored in the vector. The overhead from the vtable plus the pointer is much smaller overall than the size difference between the largest and smallest object (which is at least 200 bytes according to sizeof). Often the smallest object can be used instead of the largest, and there are potentially many tens of millions of them stored, so this can save a few gigabytes of memory. The results can then be searched extremely fast, because the base class contains the information needed to find the correct item. That item can then be dynamic_cast back to its real type. It works very well for the most part.
The only issue is with delete. It takes a few seconds to free all of the memory when there are many tens of millions of objects. The delete code iterates through the objects and calls delete results[i], which invokes the virtual destructor. While it's not impossible to work around, I think there must be a more elegant solution.
It could definitely be done by allocating largish contiguous blocks of memory (with malloc or similar) and keeping track of them. Something then generates a correct pointer to the next piece of free memory inside a block, and that pointer is stored in the vector. To free the memory, only the much smaller number of large blocks need to have free() called on them. There is no more vtable (it can be replaced by a smaller type field to ensure the correct cast), which saves space as well. It is very much a C-style solution, though, and not particularly pretty.
Is there a C++ style solution to this type of problem I'm overlooking?
You can overload operator new (i.e. void* VirtualSmall::operator new(size_t)) for your classes, and implement it to obtain memory from custom allocators. I would use one block allocator for each derived class, so that each block's size is a multiple of the size of the class it is supposed to store.
When it's time to clean up, tell each allocator to release all of its blocks. No destructors will be called, so make sure you don't need them.
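Here is a minimal sketch of the per-class operator new plus block allocator approach. It assumes single-threaded use and that skipping destructors is acceptable, which holds here because the classes own no resources. `BlockAllocator`, `allocate`, `release_all` and `block_count` are hypothetical names, not a library API, and the slots-per-block counts (4096 and 1024) are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hands out fixed-size slots from large blocks; frees every block at once.
class BlockAllocator {
public:
    BlockAllocator(std::size_t slot_size, std::size_t slots_per_block)
        : slot_(round_up(slot_size)), per_block_(slots_per_block) {}

    void* allocate(std::size_t size) {
        if (size > slot_) throw std::bad_alloc();  // wrong pool for this type
        if (blocks_.empty() || used_ == per_block_) {
            // new unsigned char[n] is suitably aligned for fundamental types.
            blocks_.push_back(std::make_unique<unsigned char[]>(slot_ * per_block_));
            used_ = 0;
        }
        return blocks_.back().get() + slot_ * used_++;
    }
    // A few large frees instead of millions of small ones. Destructors are NOT run.
    void release_all() { blocks_.clear(); used_ = 0; }
    std::size_t block_count() const { return blocks_.size(); }

private:
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;  // keep every slot maximally aligned
    }
    std::size_t slot_, per_block_, used_ = 0;
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};

class BaseClass {
public:
    virtual ~BaseClass() {}
    double time = 0;
    unsigned int section = 0;
};

// One pool per derived class.
class VirtualSmall : public BaseClass {
public:
    double values[2] = {};
    int othervalue = 0;
    static BlockAllocator pool;
    static void* operator new(std::size_t size) { return pool.allocate(size); }
    static void operator delete(void*) noexcept {}  // memory returns via release_all()
};

class VirtualBig : public BaseClass {
public:
    double values[16] = {};
    int othervalues[5] = {};
    static BlockAllocator pool;
    static void* operator new(std::size_t size) { return pool.allocate(size); }
    static void operator delete(void*) noexcept {}
};

BlockAllocator VirtualSmall::pool(sizeof(VirtualSmall), 4096);
BlockAllocator VirtualBig::pool(sizeof(VirtualBig), 1024);
```

To use it, keep writing `new VirtualSmall` and storing the pointer in the vector. When you are finished, clear the vector and call `release_all()` on each pool instead of deleting each element.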

What is the cost of inheritance?

This is a pretty basic question but I'm still unsure:
If I have a class that will be instantiated millions of times -- is it advisable not to derive it from some other class? In other words, does inheritance carry some cost (in terms of memory or runtime to construct or destroy an object) that I should be worrying about in practice?
Example:
class Foo : public FooBase { // should I avoid deriving from FooBase?
// ...
};
int main() {
// constructs millions of Foo objects...
}
Inheriting from a class costs nothing at runtime.
The class instances will of course take up more memory if you have variables in the base class, but no more than if they were in the derived class directly and you didn't inherit from anything.
This does not take into account virtual methods, which do incur a small runtime cost.
tl;dr: You shouldn't be worrying about it.
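The claim that non-virtual inheritance adds no per-object memory can be checked with a small sketch. It compares a derived class, the same members written out flat, and composition. Equal sizes are what common ABIs produce (an assumption, not a language guarantee), and all names are illustrative.

```cpp
#include <cassert>

struct FooBase {        // non-polymorphic base: no virtual functions
    int a;
};

struct Foo : FooBase {  // inherits a, adds b
    int b;
};

struct FlatFoo {        // the same members written out directly
    int a;
    int b;
};

struct ComposedFoo {    // composition: the base as a data member instead
    FooBase base;
    int b;
};

// Accessing an inherited member is just an offset into the object, like a direct member.
inline int sum(const Foo& f) { return f.a + f.b; }
```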
I'm a bit surprised by some of the responses/comments so far...
does inheritance carry some cost (in terms of memory)
Yes. Given:
#include <cstdint>

namespace MON {
class FooBase {
public:
FooBase();
virtual ~FooBase();
virtual void f();
private:
uint8_t a;
};
class Foo : public FooBase {
public:
Foo();
virtual ~Foo();
virtual void f();
private:
uint8_t b;
};
class MiniFoo {
public:
MiniFoo();
~MiniFoo();
void f();
private:
uint8_t a;
uint8_t b;
};
class MiniVFoo {
public:
MiniVFoo();
virtual ~MiniVFoo();
void f();
private:
uint8_t a;
uint8_t b;
};
} // << MON
extern "C" {
struct CFoo {
uint8_t a;
uint8_t b;
};
}
on my system, the sizes are as follows:
32 bit:
FooBase: 8
Foo: 8
MiniFoo: 2
MiniVFoo: 8
CFoo: 2
64 bit:
FooBase: 16
Foo: 16
MiniFoo: 2
MiniVFoo: 16
CFoo: 2
runtime to construct or destroy an object
Additional function overhead and virtual dispatch where needed (including destructors where appropriate). This can cost a lot, and some really obvious optimizations, such as inlining, may not be possible.
the entire subject is much more complex, but that will give you an idea of the costs.
If speed or size is truly critical, you can often use static polymorphism (e.g. templates) to achieve an excellent balance between performance and ease of programming.
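A common form of the static polymorphism mentioned above is the Curiously Recurring Template Pattern (CRTP). This sketch is a hedged illustration, not the answerer's benchmark code. The call is resolved at compile time, so objects carry no vptr and `f()` can be inlined. `ShapeBase`, `Square`, `Doubler` and `call_f` are hypothetical names.

```cpp
#include <cassert>

// The base forwards to the derived class at compile time: no virtual dispatch.
template <class Derived>
class ShapeBase {
public:
    int f() const { return static_cast<const Derived&>(*this).f_impl(); }
};

class Square : public ShapeBase<Square> {
public:
    explicit Square(int side) : side_(side) {}
    int f_impl() const { return side_ * side_; }
private:
    int side_;
};

class Doubler : public ShapeBase<Doubler> {
public:
    explicit Doubler(int v) : v_(v) {}
    int f_impl() const { return 2 * v_; }
private:
    int v_;
};

// Generic code written against the base template works with any derived type.
template <class D>
int call_f(const ShapeBase<D>& obj) { return obj.f(); }
```

The trade-off is that `Square` and `Doubler` share no common runtime base. You cannot store them in one `std::vector<ShapeBase*>`, so this fits best when the concrete type is known at compile time.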
regarding cpu performance, i created a simple test which created millions of these types on the stack and on the heap and called f, the results are:
FooBase 16.9%
Foo 16.8%
Foo2 16.6%
MiniVFoo 16.6%
MiniFoo 16.2%
CFoo 15.9%
note: Foo2 derives from Foo
in the test, the allocations are added to a vector, then deleted. without this stage, the CFoo was entirely optimized away. as Jeff Dege posted in his answer, allocation time will be a huge part of this test.
Pruning the allocation functions and vector create/destroy from the sample produces these numbers:
Foo 19.7%
FooBase 18.7%
Foo2 19.4%
MiniVFoo 19.3%
MiniFoo 13.4%
CFoo 8.5%
which means the virtual variants take over twice as long as CFoo to execute their constructors, destructors and calls, and MiniFoo is about 1.5 times faster than the virtual variants.
while we're on allocation: if you can use a single type for your implementation, you also reduce the number of allocations you must make in this scenario because you can allocate an array of 1M objects, rather than creating a list of 1M addresses and then filling it with uniquely new'ed types. of course, there are special purpose allocators which can reduce this weight. since allocations/free times are the weight of this test, it would significantly reduce the time you spend allocating and freeing objects.
Create many MiniFoos as array 0.2%
Create many CFoos as array 0.1%
Also keep in mind that MiniFoo and CFoo consume 1/4 to 1/8 of the memory per element, and a contiguous allocation removes the need to store pointers to dynamic objects. You can then keep track of an object in more ways (pointer or index). An index also reduces what clients must store (a uint32_t instead of a pointer on a 64-bit arch). On top of that, you avoid all the bookkeeping the system needs for the allocations, which is significant when dealing with so many small allocations.
Specifically, the sizes in this test consumed:
32 bit
267MB for dynamic allocations (worst)
19MB for the contiguous allocations
64 bit
381MB for dynamic allocations (worst)
19MB for the contiguous allocations
This means that the required memory was reduced by more than a factor of ten, and the time spent allocating/freeing improved by even more than that!
Static dispatch implementations can be several times faster than mixed or dynamic dispatch, typically because they give the optimizer more opportunities to see more of the program and optimize it accordingly.
In practice, dynamic types tend to export more symbols (methods, dtors, vtables), which can noticeably increase the binary size.
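The contiguous-allocation idea can be sketched as follows. One `std::vector` of a small POD holds all the objects, and clients refer to them by a 32-bit index instead of a pointer. The individually allocated version is shown for contrast. `CFooPool`, `Handle` and `make_individually` are hypothetical names. The per-allocation overhead of the heap is not visible from sizeof and has to be measured on the target system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

struct CFoo {
    std::uint8_t a;
    std::uint8_t b;
};

// A 32-bit handle instead of a 64-bit pointer.
using Handle = std::uint32_t;

// One contiguous allocation for all objects.
class CFooPool {
public:
    explicit CFooPool(std::size_t n) : items_(n) {}  // value-initialized (zeroed)
    CFoo& operator[](Handle h) { return items_[h]; }
    std::size_t payload_bytes() const { return items_.size() * sizeof(CFoo); }
private:
    std::vector<CFoo> items_;
};

// The per-object alternative: one heap allocation per element plus a pointer each.
std::vector<std::unique_ptr<CFoo>> make_individually(std::size_t n) {
    std::vector<std::unique_ptr<CFoo>> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) v.push_back(std::make_unique<CFoo>());
    return v;
}
```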
Assuming this is your actual use case, then you can improve the performance and resource usage significantly. i've presented a number of major optimizations... just in case somebody believes changing the design in such a way would qualify as 'micro'-optimizations.
Largely, this depends upon the implementation. But there are some commonalities.
If your inheritance tree includes any virtual functions, the compiler will need to create a vtable for each class - a jump table with pointers to the various virtual functions. Every instance of those classes will carry along a hidden pointer to its class's vtable.
And any call to a virtual function will involve a hidden level of indirection - rather than jumping to a function address that had been resolved at link time, a call will involve reading the address from the vtable and then jumping to that.
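To make the indirection concrete, here is a hand-written model of what compilers typically generate. Real vtable layouts are ABI-specific and contain more than this (e.g. RTTI and offset entries), so this is only a conceptual sketch. `Animal`, `AnimalVTable` and the function names are hypothetical.

```cpp
#include <cassert>

struct Animal;

// One table per "class", shared by all of its instances.
struct AnimalVTable {
    int (*legs)(const Animal*);
};

// Each instance carries only the hidden pointer to its class's table.
struct Animal {
    const AnimalVTable* vptr;
};

int dog_legs(const Animal*) { return 4; }
int bird_legs(const Animal*) { return 2; }

const AnimalVTable dog_vtable{&dog_legs};
const AnimalVTable bird_vtable{&bird_legs};

// A "virtual call": load the table pointer, load the entry, then call through it.
int legs(const Animal* a) { return a->vptr->legs(a); }
```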
Generally speaking, this overhead isn't likely to be measurable on any but the most time-critical software.
OTOH, you said you'd be instantiating and destroying millions of these objects. In most cases, the largest cost isn't constructing the object, but allocating memory for it.
IOW, you might benefit from using your own custom memory allocators, for the class.
http://www.cprogramming.com/tutorial/operator_new.html
I think we have all been programming too much as lone wolves...
We forget to take into account the cost of maintenance, readability, and extensibility with regard to features.
Here is my take
Inheritance Cost++
On smaller projects: time to develop increases. It is easy to write all-global code; it has always taken me more time to write a class hierarchy that does the right thing.
On smaller projects: time to modify increases. It is not always easy to modify the existing code to conform to the existing interface.
Time to design increases.
The program is slightly inefficient due to multiple message passing, rather than exposed guts (I mean data members :)).
For virtual function calls through a pointer to the base class only, there is one extra dereference.
There is a small space penalty in terms of RTTI.
For the sake of completeness, I will add that too many classes add too many types, and that is bound to increase your compilation time, no matter how slightly.
There is also the cost of the run-time system tracking multiple objects (base class subobjects and so on). That obviously means a slight increase in code size, plus a slight runtime performance penalty due to the exception delegation mechanism (whether you use it or not).
You don't have to twist your arm unnaturally into PIMPL if all you want to do is insulate users of your interface functions from getting recompiled. (This IS a HEAVY cost, trust me.)
Inheritance Cost--
As the program size grows beyond 500 lines, it is more maintainable with inheritance. If you are the only one programming, you can easily push code without objects up to 4k-5k lines.
Cost of bug fixing reduces.
You can easily extend the existing framework for more challenging tasks.
I know I am playing devil's advocate a little, but I think we gotta be fair.
If you need the functionality of FooBase in Foo, you can either derive or use composition. Deriving has the cost of the vtable. Composition (holding a pointer to a FooBase) has the cost of the pointer, the FooBase itself, and the FooBase's vtable. So they are (roughly) similar, and you shouldn't have to worry about the cost of inheritance.
Creating a derived object involves calling constructors for all base classes, and destroying them invokes destructors for these classes. The cost depends then on what these constructors do, but then if you don't derive but include the same functionality in derived class, you pay the same cost. In terms of memory, every object of the derived class contains an object of its base class, but again, it's exactly the same memory usage as if you just included all of these fields in the class instead of deriving it.
Be wary, that in many cases it's a better idea to compose (have a data member of the 'base' class rather than deriving from it), in particular if you're not overriding virtual functions and your relationship between 'derived' and 'base' is not an "is a kind of" relationship. But in terms of CPU and memory usage both these techniques are equivalent.
The fact is that if you are in doubt about whether you should inherit or not, the answer is that you should not. Inheritance is the second most tightly coupling relationship in the language.
As for the performance difference, there should be almost no difference in most cases unless you start using multiple inheritance. Suppose one of the bases has virtual functions and its subobject is not at the same address as the final overrider's object. Then dispatch has an additional (minimal, negligible) cost, because the compiler adds a thunk to adjust the this pointer.
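The this-pointer adjustment can be observed directly. Two non-empty base subobjects cannot share an address, so converting a pointer to the second base changes its value. Calling an override through that base must undo the shift, and compilers commonly do this with a thunk. A minimal sketch; `Left`, `Right`, `Both` and `call_right` are hypothetical names.

```cpp
#include <cassert>

struct Left {
    virtual ~Left() = default;
    virtual int left() const { return 1; }
    int l = 0;
};

struct Right {
    virtual ~Right() = default;
    virtual int right() const { return 2; }
    int r = 0;
};

// Right's subobject sits at a different address than Left's inside Both.
struct Both : Left, Right {
    int right() const override { return 20; }  // reached via Right* needs `this` adjusted back
};

// The call goes through Right's vtable slot, which points at an adjusting entry for Both.
int call_right(const Right& r) { return r.right(); }
```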

In C++, where in memory are class functions put?

I'm trying to understand what kind of memory hit I'll incur by creating a large array of objects. I know that each object - when created - will be given space in the HEAP for member variables, and I think that all the code for every function that belongs to that type of object exists in the code segment in memory - permanently.
Is that right?
So if I create 100 objects in C++, I can estimate that I will need space for all the member variables that object owns multiplied by 100 (possible alignment issues here), and then I need space in the code segment for a single copy of the code for each member function for that type of object( not 100 copies of the code ).
Do virtual functions, polymorphism, inheritance factor into this somehow?
What about objects from dynamically linked libraries? I assume dlls get their own stack, heap, code and data segments.
Simple example (may not be syntactically correct):
// parent class
class Bar
{
public:
Bar() {};
~Bar() {};
// pure virtual function
virtual void doSomething() = 0;
protected:
// a protected variable
int mProtectedVar;
};
// our object class that we'll create multiple instances of
class Foo : public Bar
{
public:
Foo() {};
~Foo() {};
// implement pure virtual function
void doSomething() { mPrivate = 0; }
// a couple public functions
int getPrivateVar() { return mPrivate; }
void setPrivateVar(int v) { mPrivate = v; }
// a couple public variables
int mPublicVar;
char mPublicVar2;
private:
// a couple private variables
int mPrivate;
char mPrivateVar2;
};
About how much memory should 100 dynamically allocated objects of type Foo take including room for the code and all variables?
It's not necessarily true that "each object - when created - will be given space in the HEAP for member variables". Each object you create will take some nonzero space somewhere for its member variables, but where is up to how you allocate the object itself. If the object has automatic (stack) allocation, so too will its data members. If the object is allocated on the free store (heap), so too will be its data members. After all, what is the allocation of an object other than that of its data members?
If a stack-allocated object contains a pointer or other type which is then used to allocate on the heap, that allocation will occur on the heap regardless of where the object itself was created.
For objects with virtual functions, each will have a vtable pointer allocated as if it were an explicitly-declared data member within the class.
As for member functions, the code for those is likely no different from free-function code in terms of where it goes in the executable image. After all, a member function is basically a free function with an implicit "this" pointer as its first argument.
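The "member function is a free function with an implicit this" view can be written out explicitly. This is a minimal sketch, and `Counter_add` is a hypothetical name for what the compiler conceptually generates. It also shows that adding a member function does not change sizeof.

```cpp
#include <cassert>

struct Counter {
    int n = 0;
    // Member function: `this` is passed implicitly.
    void add(int k) { n += k; }
};

// Roughly what Counter::add amounts to: an ordinary function whose
// first parameter is the object pointer. One copy serves all objects.
void Counter_add(Counter* self, int k) { self->n += k; }
```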
Inheritance doesn't change much of anything.
I'm not sure what you mean about DLLs getting their own stack. A DLL is not a program, and should have no need for a stack (or heap), as objects it allocates are always allocated in the context of a program which has its own stack and heap. That there would be code (text) and data segments in a DLL does make sense, though I am not expert in the implementation of such things on Windows (which I assume you're using given your terminology).
Code exists in the text segment, and how much code is generated based on classes is reasonably complex. A boring class with no virtual inheritance ostensibly has some code for each member function (including those that are implicitly created when omitted, such as copy constructors) just once in the text segment. The size of any class instance is, as you've stated, generally the sum size of the member variables.
Then, it gets somewhat complex. A few of the issues are...
The compiler can, if it wants or is instructed, inline code. So even though it might be a simple function, if it's used in many places and chosen for inlining, a lot of code can be generated (spread all over the program code).
Virtual functions (and virtual inheritance) increase the size of each polymorphic object. The VTABLE (virtual table) accompanies each instance of a class that uses virtual methods and contains the information needed for runtime dispatch. This table can grow quite large if you have many virtual functions, or multiple (virtual) inheritance. Clarification: the VTABLE is per class, but pointers to the VTABLE are stored in each instance (how many depends on the ancestral type structure of the object).
Templates can cause code bloat. Every use of a templated class with a new set of template parameters can generate brand new code for each member. Modern compilers try and collapse this as much as possible, but it's hard.
Structure alignment/padding can cause simple class instances to be larger than you expect, as the compiler pads the structure for the target architecture.
When programming, use the sizeof operator to determine object size - never hard code. Use the rough metric of "Sum of member variable size + some VTABLE (if it exists)" when estimating how expensive large groups of instances will be, and don't worry overly about the size of the code. Optimise later, and if any of the non-obvious issues come back to mean something, I'll be rather surprised.
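The padding point above can be checked with sizeof and offsetof. The sketch uses the same members declared in two orders; on typical platforms the interleaved order needs more padding. The exact numbers are platform-specific, and `bytes_for` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// chars between ints force padding up to the next int boundary.
struct Interleaved {
    char c1;
    int i1;
    char c2;
    int i2;
};

// Same members, grouped by size: padding only at the end.
struct Grouped {
    int i1;
    int i2;
    char c1;
    char c2;
};

// Estimate memory for n objects from sizeof, never from hard-coded numbers.
template <class T>
constexpr std::size_t bytes_for(std::size_t n) { return n * sizeof(T); }
```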
Although some aspects of this are compiler-vendor dependent, on most systems all compiled code goes into a section of memory called the text segment. This is separate from both the heap and the stack (a fourth section, data, holds global and static data). Instantiating many instances of a class incurs run-time space only for its instance variables (plus a hidden vtable pointer per instance if the class has virtual methods), not for any of its functions. If you use virtual methods, you also get an additional, but small, bit of memory set aside for the virtual lookup table (or its equivalent, for compilers that use some other concept). Its size is roughly the number of virtual methods times the number of polymorphic classes, and it is independent of the number of instances at run time.
This is true of both statically and dynamically linked code. The actual code all lives in a text region. Most operating systems can share DLL code across multiple applications: if several applications use the same DLLs, only one copy resides in memory and all of them can use it. Obviously there are no additional savings from shared memory if only one application uses the linked code.
You can't say completely accurately how much memory a class, or X objects of it, will take up in RAM.
However to answer your questions, you are correct that code exists only in one place, it is never "allocated". The code is therefore per-class, and exists whether you create objects or not. The size of the code is determined by your compiler, and even then compilers can often be told to optimize code size, leading to differing results.
Virtual functions are no different, save the (small) added overhead of a virtual method table, which is usually per-class.
Regarding DLLs and other libraries... the rules are no different depending on where the code has come from, so this is not a factor in memory usage.
The information given above is a great help and gave me some insight into C++ memory layout. But I would like to add that no matter how many virtual functions a class has, there is only one VTABLE per class and, with single inheritance, only one VPTR per object. After all, the VPTR points to the VTABLE, so multiple virtual functions do not require more than one VPTR.
Your estimate is accurate in the base case you've presented. Each object of a class with virtual functions also holds a pointer to the class's vtable, which contains a pointer for each virtual function. So expect an extra pointer's worth of memory per object; the vtable itself is shared by the class.
Member variables (and virtual functions) from any base classes are also part of the class, so include them.
Just as in C, you can use the sizeof operator (sizeof(classname) or sizeof(datatype)) to get the size of a class in bytes.
Yes, that's right, code isn't duplicated when an object instance is created. As far as virtual functions go, the proper function call is determined using the vtable, but that doesn't affect object creation per se.
DLLs (shared/dynamic libraries in general) are memory-mapped into the process's address space. Modifications are handled copy-on-write (COW): a single DLL is loaded only once into memory, and each write into a mutable part of it creates a private copy of that part (generally a page).
If compiled as 32-bit, the data member of Bar takes 4 bytes (sizeof(Bar) itself will be larger, because it also includes a vtable pointer).
Foo adds 10 bytes (2 ints + 2 chars).
Since Foo inherits from Bar, that is at least 4 + 10 = 14 bytes.
GCC has attributes for packing the structs so there is no padding. In this case 100 entries would take up 1400 bytes + a tiny overhead for aligning the allocation + some overhead of for memory management.
If no packed attribute is specified it depends on the compilers alignment.
But this doesn't consider how much memory the vtable pointer takes up, or the size of the compiled code.
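The packed layout mentioned in that answer can be checked directly. This sketch uses the GCC/Clang-specific `__attribute__((packed))` (MSVC uses `#pragma pack` instead) on a copy of Foo's own data members. The struct names are illustrative. Packed members may be misaligned, which can be slower and means you must not bind them to ordinary references like `int&`.

```cpp
#include <cassert>

// Foo's own data members with default alignment.
struct FooData {
    int mPublicVar;
    char mPublicVar2;
    int mPrivate;
    char mPrivateVar2;
};

// GCC/Clang extension: no padding between or after members.
struct __attribute__((packed)) PackedFooData {
    int mPublicVar;
    char mPublicVar2;
    int mPrivate;
    char mPrivateVar2;
};
```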
It's very difficult to give an exact answer to your question, as this is implementation dependent, but approximate values for a 32-bit implementation might be:
int Bar::mProtectedVar; // 4 bytes
int Foo::mPublicVar; // 4 bytes
char Foo::mPublicVar2; // 1 byte
int Foo::mPrivate; // 4 bytes
char Foo::mPrivateVar2; // 1 byte
There are alignment issues here: each char is followed by padding up to the next 4-byte boundary, so the final total for the data may well be 20 bytes. You will also have a vptr, say another 4 bytes, so the total size is around 24 bytes per instance. It's impossible to say how much space the code will take up, but you are correct in thinking there is only one copy of the code, shared between all instances.
When you ask
I assume dlls get their own stack,
heap, code and data segments.
The answer is that there really isn't much difference between data in a DLL and data in an app: basically they share everything. This has to be so when you think about it. If they had different stacks (for example), how could function calls work?