How can class definitions not occupy memory? - c++

So I have read this about whether class definitions occupy memory and this about whether functions occupy memory. This is what I do not get: how come class definitions do not occupy memory if functions, or rather their code, do? I mean, class definitions are also code, so shouldn't they occupy memory just like function code does?

It is not entirely correct to say that class definitions do not occupy memory: any class with member functions may place some code in memory, although the amount of code and its actual placement depends heavily on function inlining.
The Q&A at the first link talks about sizeof, which reports the per-instance memory requirement of the class. That figure excludes the memory needed for storing member functions, static members, inlined functions, dispatch tables, and so on, because all of these elements are shared among all instances of the class.
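You can see this for yourself with a minimal sketch (the class names here are made up for illustration): non-virtual member functions and static members leave sizeof untouched.

#include <iostream>

struct Plain {
    int a;
    int b;
};

struct WithFunctions {
    int a;
    int b;
    void set(int x) { a = x; }          // member functions add nothing per instance
    int sum() const { return a + b; }
    static int counter;                 // static members live outside the instances
};

int main() {
    // Typically both lines print the same number (e.g. 8): member functions
    // and static members are shared, so sizeof does not count them.
    std::cout << sizeof(Plain) << '\n' << sizeof(WithFunctions) << '\n';
}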

You don't need to keep the class definition anywhere, because the details of how to create an instance of a class are encoded in its constructors.
(In a sense, the class definition is code, it's just not represented explicitly.)
All you need to know in order to create an object is
How big it is,
Which constructor to use for creating it, and
What its virtual functions are.
To create an instance of class A:
Reserve a piece of memory of size sizeof(A) (or be handed one),
Associate that piece of memory with the virtual functions of A, if any (usually held in a table in a predetermined location), and
Tell the relevant A constructor where the A should be created, and then let it do the actual work.
You don't need to know a thing about the types of member variables or anything like that, the constructors know what to do once they know where the object is to be created.
(Every member variable can be found at an offset from the beginning of the object, so the constructor knows where things must be.)
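A rough sketch of those steps using placement new (the class A below is just an illustration; wiring up the vptr in step 2 is done for you by the constructor code the compiler generates):

#include <cstdlib>   // std::malloc, std::free
#include <new>       // placement new

struct A {
    int x;
    A() : x(42) {}
    virtual ~A() {}
    virtual void f() {}
};

int main() {
    // 1. Reserve a piece of memory of size sizeof(A).
    void* raw = std::malloc(sizeof(A));

    // 2. + 3. Tell A's constructor where to build the object; setting up the
    //         vptr is part of what the generated constructor does.
    A* a = new (raw) A();

    a->f();

    // Clean up: run the destructor explicitly, then release the raw memory.
    a->~A();
    std::free(raw);
}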
To create a function, on the other hand, you would need to store its definition in some form and then generate the code at runtime. (This is usually called "Just-in-time" compilation.)
This requires a compiler, which means that you need to either
Include a compiler in every executable, or
Provide (or require everyone to install) a shared compiler for all executables (Java VMs usually contain at least one).
C++ compilers instead generate the functions in advance.
Abusing terminology a little, you could say that the functions are "instantiated" by the compilation process, with the source code as a blueprint.

Related

Why must the size of a class in C++ always be known by its users?

Let's say that a class is completely defined in its .cpp file, so that in the source file you can find:
The constructor defined
The destructor defined
Every method defined
Then why must its private member variables still be in the header file?
Why do we still need PIMPL to get rid of them?
If for this class I also define its own operator new in the source file, why do I still need to know the size from the outside code?
Is it because the class can still be stack allocated?
If so, then why isn't the "function" that allocates on the stack part of the constructor call inside the .cpp file?
Then why must its private member variables still be in the header file? Why do we still need PIMPL to get rid of them?
Because for many operations - those allowed after a definition's been seen - the compiler needs to know the size of object instances. Further details below.
If for this class I also define its own operator new in the source file, why do I still need to know the size from the outside code?
Is it because the class can still be stack allocated? If so, then why isn't the "function" that allocates on the stack part of the constructor call inside the .cpp file?
Partly. It's simplest and most efficient for the compiler to move the stack pointer by the total size of local variables as a function call starts, then move it back as it returns. That size can normally be calculated at compile time. If you had runtime functions returning the individual object sizes, then the compiler would need to handle the stack pointer deltas in dribs and drabs, and either repeatedly calculate the address of specific objects at runtime as the cumulative total of earlier allocations, or use memory/registers to maintain a set of pointers or offsets to wherever they end up. (This is one of the main reasons most C++ compilers don't support runtime specification of array dimensions.)
I say "partly" because it's not just about the stack: similar issues apply to static / global and thread-local objects.

does a variable consume memory in addition to just its content (e.g. type, location)?

Quite likely this has been asked/answered before, but I'm not sure how to phrase it best; a link to a previously answered question would be great.
If you define something like
char myChar = 'a';
I understand that this will take up one byte in memory (depending on the implementation and assuming no Unicode and so on; the actual number is unimportant).
But I would assume the compiler/computer would also need to keep a table of variable types, addresses (i.e. pointers), and possibly more. Otherwise it would have the memory reserved, but would not be able to do anything with it. So that's already at least a few more bytes of memory consumed per variable.
Is this a correct picture of what happens, or am I misunderstanding what happens when a program gets compiled/executed? And if the above is correct, is it more to do with compilation, or execution?
The compiler will keep track of the properties of a variable - its name, lifetime, type, scope, etc. This information will exist in memory only during compilation. Once the program has been compiled and the program is executed, however, all that is left is the object itself. There is no type information at run-time (except if you use RTTI, then there will be some, but only because you required it for your program to function - such as is required for dynamic_casting).
Everything that happens in the code that accesses the object has been compiled into a form that treats it exactly as a single byte (because it's a char). The address that the object is located at can only be known at run-time anyway. However, variables with automatic storage duration (like local variables), are typically located simply by some fixed offset from the current stack frame. That offset is hard-baked into the executable.
Whether a variable carries extra information depends on the type of the variable and your compiler options. If you use RTTI, extra information is stored. If you compile with debug information, there will also be extra overhead.
For native data types like your char example there is usually no overhead, although structs can also contain padding bytes. If you define classes, there may be a virtual table associated with your class. And if you dynamically allocate memory, there will usually be some bookkeeping overhead alongside the allocated memory.
Sometimes a variable may not even exist in memory, because the optimizer realizes that no storage is needed for it and keeps it in a register.
So, in total, you cannot rely on counting your variables and summing up their sizes to calculate the amount of memory required, because there is not necessarily a 1:1 relation.
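For instance, padding alone breaks the naive sum (a minimal sketch; the struct is made up):

#include <iostream>

struct Padded {
    char c;    // 1 byte
    int  i;    // 4 bytes, but typically aligned to a 4-byte boundary, so the
               // compiler usually inserts 3 padding bytes after c
};

int main() {
    // On a typical implementation this prints 8, not 5.
    std::cout << sizeof(Padded) << '\n';
}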
Some types can be determined at compile time, say in this code:
void foo(char c) {...}
it is obvious at compile time what the type of the variable c is.
In the case of inheritance you cannot know the real type of the variable at compile time, like:
void draw(Drawable* drawable); // where drawable can be Circle, Line etc.
But the C++ compiler can help determine the type of the Drawable at runtime using dynamic_cast. In this case it uses the pointer to the virtual method table associated with the object to determine its real type.
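A small sketch of that runtime check, using the Drawable/Circle/Line names from the example above (the class bodies are made up):

#include <iostream>

struct Drawable {
    virtual ~Drawable() {}        // a polymorphic base: each object carries a vptr
};
struct Circle : Drawable {};
struct Line   : Drawable {};

void draw(Drawable* drawable) {
    // The vptr stored in *drawable is consulted at runtime to decide
    // whether these casts succeed.
    if (dynamic_cast<Circle*>(drawable))
        std::cout << "got a Circle\n";
    else if (dynamic_cast<Line*>(drawable))
        std::cout << "got a Line\n";
}

int main() {
    Circle circle;
    Line   line;
    draw(&circle);   // prints "got a Circle"
    draw(&line);     // prints "got a Line"
}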

Preventing a hierarchy of classes from being created on the stack

I'm not sure if this is possible.
I need to prevent all classes derived from X from being instantiated as local stack or member variables. I made all their destructors protected and this did the trick as far as outside scopes are concerned. However, I also need to prevent them from being instantiated by themselves. I mean, if Y has member variables of type Z, or instantiates local variables of type Z in its methods, this doesn't cut it.
Now I could create private destructors in all the leaves of the hierarchy tree, but the problem is that every node should be allowed to be a (heap) variable. In the case X <- Y <- Z, all three should be instantiable, but X and Y cannot have private destructors. Moreover, even that doesn't stop me from having local variables of type Z in the methods of Z.
I guess making their constructors private and adding operator new as a friend to all of them would do the trick, but that is a LOT of extra work (since we use several versions of operator new) and the hierarchy is big.
So, is there a way of getting a (preferably) compile-time or at least a runtime error for stack instantiation of these objects, without resorting to the private-constructors-friend-new way?
<edit>
The thing is that the previous programmers of this project wrote a ton of code, and all classes in this hierarchy have terribly complicated destructors. The authors also indiscriminately called virtual methods in those destructors, which led to a lot of (to them) inexplicable crashes and memory corruptions. I have now converted all destructors to an obj->Release() pattern, and in the top-most Release I have delete this. Obviously this won't work for stack objects, and now I have introduced some crashes of my own. Also, I'm kind of short on time, and the run / wait for crash / fix this specific crash method is very, very slow.
</edit>
In his book "More Effective C++", item 27, Scott Meyers (incidentally my favorite author on C++) describes why it's not possible in the general sense, within the bounds of portable or semi-portable C++, to definitively distinguish whether an object has been allocated on the stack, on the heap, or statically. The item also discusses various options for ensuring an object can only be allocated on the stack, or only on the heap. One of those is more or less doable, the other doesn't have a truly portable, foolproof way of working; I forget which is which. (The book is at work, I'm at home.)
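For reference, the protected-destructor idiom already tried in the question looks roughly like this (a sketch only, with illustrative names; as the question notes, it does not stop the hierarchy's own methods from creating stack instances):

class X {
public:
    void Release() { delete this; }   // the only sanctioned way to destroy one
protected:
    virtual ~X() {}                   // protected + virtual: code outside the
                                      // hierarchy cannot destroy (and therefore
                                      // cannot stack-allocate) an X
};

class Y : public X {
protected:
    ~Y() {}                           // keep the destructor protected in every
                                      // derived class as well
};

int main() {
    // Y onStack;       // error: ~Y() is inaccessible here, so the compiler
                        //        rejects a local (stack) instance
    X* p = new Y;       // heap allocation still works
    p->Release();       // delete this runs ~Y() then ~X() via the virtual dtor
}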

C++ Class Memory Model And Alignment

I have several questions that pertain to data position and alignment in C++. Do classes have the same memory placement and memory alignment format as structs?
More specifically, is data loaded into memory based on the order in which it's declared? Do functions affect memory alignment and data position, or are they allocated to another location? Generally speaking, I keep all of my alignment- and position-dependent stuff like file headers and algorithmic data within a struct. I'm just curious to know whether or not this is as intrinsic to classes as it is to structs, and whether or not it will translate well into classes if I choose to use that approach.
Edit: Thanks for all your answers. They've really helped a lot.
Do classes have the same memory placement and memory alignment format
as structs?
The memory placement/alignment of an object does not depend on whether its type was declared as a class or a struct. The only difference between a class and a struct in C++ is that a class has private members by default, while a struct has public members by default.
More specifically, is data loaded into memory based on the order in
which it's declared?
I'm not sure what you mean by "loaded into memory". Within an object however, the compiler is not allowed to rearrange variables. For example:
class Foo {
    int a;
    int b;
    int c;
};
The variable c must be located after b, and b must be located after a, within a Foo object. The members are also constructed (initialized) in the order shown in the class declaration when a Foo is created, and destructed in the reverse order when a Foo is destroyed.
It's actually more complicated than this due to inheritance and access modifiers, but that is the basic idea.
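You can observe that ordering directly with offsetof (a sketch; the members are made public here so the macro can name them):

#include <cstddef>    // offsetof
#include <iostream>

struct Foo {
    int a;
    int b;
    int c;
};

int main() {
    // Offsets grow in declaration order: typically 0, 4, 8.
    std::cout << offsetof(Foo, a) << ' '
              << offsetof(Foo, b) << ' '
              << offsetof(Foo, c) << '\n';
}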
Do functions affect memory alignment and data position or are they
allocated to another location?
Functions are not data, so alignment isn't a concern for them. In some executable file formats and/or architectures, function binary code does in fact occupy a separate area from data variables, but the C++ language is agnostic to that fact.
Generally speaking, I keep all of my memory alignment and position
dependent stuff like file headers and algorithmic data within a
struct. I'm just curious to know whether or not this is intrinsic to
classes as it is to structs and whether or not it will translate well
into classes if I chose to use that approach.
Memory alignment is something that's almost automatically taken care of for you by the compiler. It's more of an implementation detail than anything else. I say "almost automatically" since there are situations where it may matter (serialization, ABIs, etc) but within an application it shouldn't be a concern.
With respect to reading files (since you mention file headers), it sounds like you're reading files directly into the memory occupied by a struct. I can't recommend that approach, since differences in padding and alignment may make your code work on one platform and not another. Instead, you should read the raw bytes a few at a time from the file and assign them into the struct's fields with simple assignments.
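A sketch of that field-by-field approach (the header layout and names are hypothetical; it assumes the file stores the fields little-endian, in this order):

#include <cstdint>
#include <istream>

struct Header {
    std::uint32_t magic;      // 4 bytes in the file
    std::uint16_t version;    // 2 bytes in the file
};

bool readHeader(std::istream& in, Header& h) {
    unsigned char buf[6];
    if (!in.read(reinterpret_cast<char*>(buf), sizeof buf))
        return false;
    // Assemble the fields explicitly instead of relying on in-memory layout.
    h.magic   =  static_cast<std::uint32_t>(buf[0])
              | (static_cast<std::uint32_t>(buf[1]) << 8)
              | (static_cast<std::uint32_t>(buf[2]) << 16)
              | (static_cast<std::uint32_t>(buf[3]) << 24);
    h.version = static_cast<std::uint16_t>(buf[4] | (buf[5] << 8));
    return true;
}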
Do classes have the same memory placement and memory alignment format as structs?
Yes. Technically there is no difference between a class and a struct; the only difference is the default member access specification. Otherwise they are identical.
More specifically, is data loaded into memory based on the order in which it's declared?
Yes.
Do functions affect memory alignment and data position or are they allocated to another location?
No, they do not affect alignment. Member functions are compiled separately, and the object does not contain any reference to them. (To those who point out that virtual tables do affect layout: yes and no. The compiler is allowed to add implementation-specific data, such as a vtable pointer, to the object, but that is an implementation detail and does not change the relative layout of the declared members.)
Generally speaking, I keep all of my memory alignment and position dependent stuff like file headers and algorithmic data within a struct.
OK. Not sure how that affects anything.
I'm just curious to know whether or not this is intrinsic to classes as it is to structs
Class and struct are different names for the same thing.
and whether or not it will translate well into classes if I chose to use that approach.
Choose what approach?
C++ classes essentially translate into structs holding the instance variables, while the member functions are separated from the class and treated like free functions that accept (a pointer to) one of those structs as an argument.
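Conceptually, that translation looks something like this (a rough sketch, not literally what any particular compiler emits; the names are made up):

// What you write:
class Counter {
    int n;
public:
    Counter() : n(0) {}
    void add(int delta) { n += delta; }
};

// Roughly how the compiler treats it:
struct CounterData { int n; };
void Counter_add(CounterData* this_, int delta) { this_->n += delta; }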
The exact way instance variables are stored depends on the compiler used, but they generally tend to be in order.
C++ classes should not participate in "persistence" the way binary-mode structures do, and shouldn't have alignment attributes attached to them; keep the classes simple.
Attaching alignment requirements to classes may hurt performance and may have other side effects, too.

In C++, where in memory are class functions put?

I'm trying to understand what kind of memory hit I'll incur by creating a large array of objects. I know that each object - when created - will be given space in the HEAP for member variables, and I think that all the code for every function that belongs to that type of object exists in the code segment in memory - permanently.
Is that right?
So if I create 100 objects in C++, I can estimate that I will need space for all the member variables the object owns multiplied by 100 (possible alignment issues here), and then I need space in the code segment for a single copy of the code for each member function of that type of object (not 100 copies of the code).
Do virtual functions, polymorphism, inheritance factor into this somehow?
What about objects from dynamically linked libraries? I assume dlls get their own stack, heap, code and data segments.
Simple example (may not be syntactically correct):
// parent class
class Bar
{
public:
    Bar() {}
    ~Bar() {}
    // pure virtual function
    virtual void doSomething() = 0;
protected:
    // a protected variable
    int mProtectedVar;
};

// our object class that we'll create multiple instances of
class Foo : public Bar
{
public:
    Foo() {}
    ~Foo() {}
    // implement pure virtual function
    void doSomething() { mPrivate = 0; }
    // a couple public functions
    int getPrivateVar() { return mPrivate; }
    void setPrivateVar(int v) { mPrivate = v; }
    // a couple public variables
    int mPublicVar;
    char mPublicVar2;
private:
    // a couple private variables
    int mPrivate;
    char mPrivateVar2;
};
About how much memory should 100 dynamically allocated objects of type Foo take including room for the code and all variables?
It's not necessarily true that "each object - when created - will be given space in the HEAP for member variables". Each object you create will take some nonzero space somewhere for its member variables, but where is up to how you allocate the object itself. If the object has automatic (stack) allocation, so too will its data members. If the object is allocated on the free store (heap), so too will be its data members. After all, what is the allocation of an object other than that of its data members?
If a stack-allocated object contains a pointer or other type which is then used to allocate on the heap, that allocation will occur on the heap regardless of where the object itself was created.
For objects with virtual functions, each will have a vtable pointer allocated as if it were an explicitly-declared data member within the class.
As for member functions, the code for those is likely no different from free-function code in terms of where it goes in the executable image. After all, a member function is basically a free function with an implicit "this" pointer as its first argument.
Inheritance doesn't change much of anything.
I'm not sure what you mean about DLLs getting their own stack. A DLL is not a program, and should have no need for a stack (or heap), as objects it allocates are always allocated in the context of a program which has its own stack and heap. That there would be code (text) and data segments in a DLL does make sense, though I am not expert in the implementation of such things on Windows (which I assume you're using given your terminology).
Code exists in the text segment, and how much code is generated based on classes is reasonably complex. A boring class with no virtual inheritance ostensibly has some code for each member function (including those that are implicitly created when omitted, such as copy constructors) just once in the text segment. The size of any class instance is, as you've stated, generally the sum size of the member variables.
Then, it gets somewhat complex. A few of the issues are...
The compiler can, if it wants or is instructed, inline code. So even though it might be a simple function, if it's used in many places and chosen for inlining, a lot of code can be generated (spread all over the program code).
Virtual functions and virtual inheritance increase the size of each polymorphic instance. The VTABLE (virtual table) goes along with each class that uses virtual methods, containing the information needed for runtime dispatch, and it can grow quite large if you have many virtual functions or multiple (virtual) inheritance. Clarification: the VTABLE is per class, but pointers to the VTABLE are stored in each instance (how many, depending on the ancestral type structure of the object).
Templates can cause code bloat. Every use of a templated class with a new set of template parameters can generate brand new code for each member. Modern compilers try and collapse this as much as possible, but it's hard.
Structure alignment/padding can cause simple class instances to be larger than you expect, as the compiler pads the structure for the target architecture.
When programming, use the sizeof operator to determine object size - never hard code. Use the rough metric of "Sum of member variable size + some VTABLE (if it exists)" when estimating how expensive large groups of instances will be, and don't worry overly about the size of the code. Optimise later, and if any of the non-obvious issues come back to mean something, I'll be rather surprised.
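For the Bar/Foo classes from the question, that measurement might look like this (a sketch; it assumes the class definitions above appear in the same file, and the printed numbers are implementation-dependent):

#include <iostream>

// ... the Bar and Foo definitions from the question go here ...

int main() {
    // On one 64-bit implementation this might print 16 and 32, but only
    // sizeof knows the answer for your compiler, flags, and platform.
    std::cout << "sizeof(Bar) = " << sizeof(Bar) << '\n'
              << "sizeof(Foo) = " << sizeof(Foo) << '\n';
}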
Although some aspects of this are compiler-vendor dependent, all compiled code goes into a section of memory usually called the text segment. This is separate from both the heap and stack sections (a fourth section, data, holds global/static variables and most constants). Instantiating many instances of a class incurs run-time space only for its instance variables, not for any of its functions. If you make use of virtual methods, you will get an additional, but small, bit of memory set aside for the virtual look-up table (or equivalent for compilers that use some other mechanism), but its size is determined by the number of virtual methods and the number of classes that have them, and is independent of the number of instances at run-time.
This is true of statically and dynamically linked code. The actual code all lives in a text region. Most operating systems actually can share dll code across multiple applications, so if multiple applications are using the same dll's, only one copy resides in memory and both applications can use it. Obviously there is no additional savings from shared memory if only one application uses the linked code.
You can't completely accurately say how much memory a class or X objects will take up in RAM.
However, to answer your questions: you are correct that code exists in only one place; it is never "allocated" per object. The code is therefore per class, and exists whether you create objects or not. The size of the code is determined by your compiler, and even then compilers can often be told to optimize for code size, leading to differing results.
Virtual functions are no different, save the (small) added overhead of a virtual method table, which is usually per-class.
Regarding DLLs and other libraries... the rules are no different depending on where the code has come from, so this is not a factor in memory usage.
The information given above is of great help and gave me some insight into C++ memory structure. But I would like to add that no matter how many virtual functions a class has, there will still be only one VPTR per object (in the single-inheritance case) and one VTABLE per class. The VPTR simply points to the VTABLE, so there is no need for more than one VPTR just because there are multiple virtual functions.
Your estimate is accurate in the base case you've presented. Each object of a class with virtual functions also carries a pointer to the class's vtable, so expect an extra pointer's worth of memory per object; the vtable itself is shared by all instances and has one entry per virtual function.
Member variables (and virtual functions) from any base classes are also part of the class, so include them.
Just as in C, you can use the sizeof(classname/datatype) operator to get the size of a class in bytes.
Yes, that's right, code isn't duplicated when an object instance is created. As far as virtual functions go, the proper function call is determined using the vtable, but that doesn't affect object creation per se.
DLLs (shared/dynamic libraries in general) are memory-mapped into the process' memory space. Every modification is carried on as Copy-On-Write (COW): a single DLL is loaded only once into memory and for every write into a mutable space a copy of that space is created (generally page-sized).
If compiled as 32-bit, then sizeof(Bar) should yield 4.
Foo should add 10 bytes (2 ints + 2 chars).
Since Foo inherits from Bar, that is at least 4 + 10 bytes = 14 bytes.
GCC has attributes for packing structs so that there is no padding. In this case 100 entries would take up 1400 bytes + a tiny overhead for aligning the allocation + some overhead for memory management.
If no packed attribute is specified, it depends on the compiler's alignment.
But this doesn't account for how much memory the vtable takes up or the size of the compiled code.
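A sketch of that GCC extension (non-portable; the struct is illustrative):

#include <iostream>

struct Normal                         { char c; int i; };
struct __attribute__((packed)) Packed { char c; int i; };

int main() {
    // With GCC on a typical platform this prints "8 5": packing removes the
    // padding bytes the compiler would otherwise insert after c.
    std::cout << sizeof(Normal) << ' ' << sizeof(Packed) << '\n';
}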
It's very difficult to give an exact answer to your question, as this is implementation-dependent, but approximate values for a 32-bit implementation might be:
int Bar::mProtectedVar; // 4 bytes
int Foo::mPublicVar; // 4 bytes
char Foo::mPublicVar2; // 1 byte
There are alignment issues here and the final total may well be 12 bytes. You will also have a vptr - say another 4 bytes. So the total size of the data is around 16 bytes per instance. It's impossible to say how much space the code will take up, but you are correct in thinking there is only one copy of the code shared between all instances.
When you ask
I assume dlls get their own stack,
heap, code and data segments.
The answer is that there really isn't much difference between data in a DLL and data in an app - basically they share everything between them. This has to be so when you think about it: if they had different stacks (for example), how could function calls work?