What are the consequences (if any) of having big objects stored on the heap rather than on the stack? I remember reading that it was preferable to have bigger objects on the stack to limit heap fragmentation... is that true?
thanks
Edit: the question comes from a game I'm making, where my basic object, which will hold all the information about textures, entities, etc., will most likely be created on the heap. I don't really have any idea of its size; we could assume something like 300 MB.
Generally no.
It depends on the implementation, but on many systems the stack is much more limited in size than the heap. Heap fragmentation is typically going to be an issue if you have a large number of (small) objects allocated on the heap. It also tends to be caused by certain patterns of allocation and deallocation.
You have to keep in mind that the stack is limited. Its size can be configured in some environments, but that also has drawbacks. If your objects are short-lived, they can reside on the stack, but to keep them for a long time you have to create them early and keep passing them as parameters to the functions you call, because when the scope ends, your object is gone.
Following your edit, there's no way you're going to store an object of 300 MB on the stack.
You should decide where to put objects based on what their storage duration should be, more so than on what their size will be; however, as the stack is fairly limited, creating a large object on it is sometimes not a good idea, and it may be necessary (or more future-proof) to new it and put the pointer to it in a scoped_ptr (boost::scoped_ptr; in C++11 and later, std::unique_ptr serves the same purpose).
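A minimal sketch of that advice in modern C++, assuming C++14 or later, with std::unique_ptr playing the role of scoped_ptr. GameAssets and load_assets are hypothetical names, and the 16 MB buffer is a stand-in for the much larger object from the question (it is already bigger than common default stack sizes).

```cpp
#include <array>
#include <cassert>
#include <memory>

// Hypothetical stand-in for a large game-state object. 16 MB already exceeds
// typical default stack sizes (about 1 MB on Windows, 8 MB on many Linux
// systems), so it must not be created as a local variable.
struct GameAssets {
    std::array<unsigned char, 16 * 1024 * 1024> texture_bytes{};
    int entity_count = 0;
};

// The object itself lives on the heap; only the pointer-sized unique_ptr
// lives on the caller's stack. The memory is freed automatically when the
// unique_ptr goes out of scope, just as with a scoped_ptr.
std::unique_ptr<GameAssets> load_assets() {
    auto assets = std::make_unique<GameAssets>();  // value-initialized: buffer zeroed
    assets->entity_count = 42;                     // placeholder for real loading code
    return assets;
}
```

When the returned unique_ptr is destroyed, the GameAssets object is deleted; no explicit delete is needed.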
If you have enough big objects to cause significant heap fragmentation, or if you have an object that is so big as to be a significant factor by itself (to be honest, I'm not sure this is even possible), are you sure your design is right? Note also that your objects are likely to be smaller than the storage your containers use, and that storage (except that of std::arrays) is all dynamically allocated, i.e. on the heap.
In general, large objects should be created on the heap. The stack should generally be used only for small objects relevant to a particular stack context.
Related
I'm trying to create a class in C++ with an idea of absolute encapsulation and efficiency for the sake of practice. In my case this means every data member is supposed to be inside the class with no pointers pointing outside (e.g. to dynamically allocated storage).
For example, I'm using
char name [10];
instead of
std::string name;
char* name;
My idea is that objects of the class are created as completely enclosed blocks on the stack. I also expect better performance since, if I remember correctly, access to the stack is considerably faster than access to the heap.
Am I correct in those assumptions?
And is this idea of absolute encapsulation sensible outside practice? (For example to ensure safety, since there seems to be no risk of memory mismanagement or buffer overflow)
access to the stack is considerably faster than to the heap
This is false: an access to memory is an access to memory. Two things might have confused you here.
First, it is true that different types of memory can be accessed at different speeds. For example, the disk is usually the slowest (without talking about networking, which complicates things even further), while registers are usually the fastest. In between is the main memory, or RAM, where both the stack and the heap live. And then you can have caches, different types of disks, and so on.
Second, stack allocation is indeed faster than heap allocation, just because the allocation scheme is simpler. With the stack, as the name implies, you can only allocate and deallocate at the end, meaning you need to follow a specific order. With the heap, you can allocate pretty much anywhere, meaning that you can deallocate at any point and in any order. This implies some kind of management of the memory that comes with its own problems, for example fragmentation.
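To illustrate why the stack's allocation scheme is cheap, here is a toy LIFO arena (StackArena is a hypothetical class, not how a real call stack is implemented): allocating is a single offset bump, and freeing can only roll the offset back to an earlier mark.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy "stack allocator": allocation bumps an offset, and memory can only be
// released in reverse order by rolling the offset back to a saved mark.
// No search for a free block, no per-block bookkeeping.
class StackArena {
public:
    explicit StackArena(std::size_t capacity) : buffer_(capacity) {}

    // Returns nullptr when the arena is exhausted (the analogue of a stack
    // overflow). Offsets are rounded up relative to the buffer start, which
    // operator new aligns suitably for any fundamental type.
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        std::size_t start = (top_ + align - 1) / align * align;
        if (start + size > buffer_.size()) return nullptr;
        top_ = start + size;
        return buffer_.data() + start;
    }

    std::size_t mark() const { return top_; }

    // Frees everything allocated after mark m in one step (LIFO release).
    void release(std::size_t m) {
        assert(m <= top_);
        top_ = m;
    }

private:
    std::vector<unsigned char> buffer_;
    std::size_t top_ = 0;
};
```

A general-purpose heap cannot work this way because it must accept frees in any order; that is where its free-block bookkeeping, and fragmentation, come from.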
is this idea of absolute encapsulation sensible outside practice?
First of all, only using the stack is impossible in practice simply because of its limited size. While this size can vary in practice, it's unlikely to be more than 8MB currently. As soon as you need to load a file larger than that, you cannot do it on the stack.
However, even if the stack size were practically unlimited, you would still need to deallocate things in the reverse order that you allocated them, otherwise it is no longer a stack. Many things are infeasible that way. For example, as soon as you want interactivity, you need some sort of event processing (to respond to user input), and this is usually done with a queue, which is like the opposite of a stack. Sure, you could allocate an insanely large queue, but that's infeasible in practice. Another example that comes to mind is networking. If you want to deal with multiple connections at once (like a web browser, for example), you need to manage the memory associated with each one independently. Again, you could allocate an insane amount of memory to each connection, but that's infeasible in practice.
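A small sketch of the event-queue case, assuming a hypothetical Event type and EventQueue class: events are consumed in FIFO order, and std::queue keeps them in dynamically allocated storage, so each one outlives the function that posted it.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <utility>

// Hypothetical input event type.
struct Event {
    std::string kind;
    int x = 0, y = 0;
};

// Events are produced and consumed first in, first out: the opposite of a
// stack. std::queue (backed by std::deque by default) stores its elements in
// dynamically allocated memory, freed as each event is processed.
class EventQueue {
public:
    void post(Event e) { events_.push(std::move(e)); }
    bool empty() const { return events_.empty(); }
    Event next() {
        Event e = std::move(events_.front());
        events_.pop();
        return e;
    }
private:
    std::queue<Event> events_;
};

// The Event outlives this call because it is moved into the queue's storage.
void on_mouse_click(EventQueue& q, int x, int y) {
    q.post(Event{"click", x, y});
}
```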
Also, note that encapsulation does not mean "no pointers to dynamically allocated memory". Instead, "hidden memory management" would be closer to the meaning of this concept.
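As an illustration of "hidden memory management" (Player is a hypothetical class): the name lives in a std::string that may allocate on the heap, yet callers never see a pointer, and unlike a fixed char name[10], a long name is neither truncated nor written past the end of a buffer.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Encapsulation in the "hidden memory management" sense: Player never
// exposes how or where the name is stored.
class Player {
public:
    explicit Player(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }
    void rename(std::string new_name) { name_ = std::move(new_name); }
private:
    std::string name_;  // storage allocated and freed by std::string itself
};
```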
I know that local variables will be stored on the stack in order.
But what happens when I dynamically allocate variables in heap memory in C++, like this:
int * a = new int{1};
int * a2 = new int{2};
int * a3 = new int{3};
int * a4 = new int{4};
Question 1: Are these variables stored in contiguous memory locations?
Question 2: If not, is it because dynamic allocation stores variables in random locations in heap memory?
Question 3: So does dynamic allocation increase the possibility of cache misses and have low spatial locality?
Part 1: Are separate allocations contiguous?
The answer is probably not. How dynamic allocation occurs is implementation dependent. If you allocate memory like in the above example, two separate allocations might be contiguous, but there is no guarantee of this happening (and it should never be relied on to occur).
Different implementations of C++ use different algorithms for deciding how memory is allocated.
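One way to observe this yourself, as a sketch: allocate a few ints as in the question (using std::make_unique here so they are freed automatically) and look at the byte distance between consecutive addresses. allocation_gaps is a hypothetical helper, and the values it returns will differ between allocators and even between runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Allocates `count` separate ints and returns the distance in bytes between
// each allocation and the previous one. Nothing guarantees the distances are
// small, positive, or equal to sizeof(int): allocator headers, alignment
// padding and reuse of freed blocks all affect them.
std::vector<std::ptrdiff_t> allocation_gaps(std::size_t count) {
    std::vector<std::unique_ptr<int>> blocks;
    for (std::size_t i = 0; i < count; ++i) {
        blocks.push_back(std::make_unique<int>(static_cast<int>(i) + 1));
    }
    std::vector<std::ptrdiff_t> gaps;
    for (std::size_t i = 1; i < count; ++i) {
        auto prev = reinterpret_cast<std::intptr_t>(blocks[i - 1].get());
        auto curr = reinterpret_cast<std::intptr_t>(blocks[i].get());
        gaps.push_back(static_cast<std::ptrdiff_t>(curr - prev));
    }
    return gaps;  // all ints stay alive until this function returns
}
```

Printing the result on a couple of platforms makes the point: expect different numbers from different allocators.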
Part 2: Is allocation random?
Somewhat; but not entirely. Memory doesn’t get allocated in an intentionally random fashion. Oftentimes memory allocators will try to allocate blocks of memory near each other in order to minimize page faults and cache misses, but it’s not always possible to do so.
Allocation happens in two stages:
The allocator asks for a large chunk of memory from the OS
The allocator then takes pieces of that large chunk and returns them whenever you call new, until you ask for more memory than it has left to give, in which case it asks the OS for another large chunk.
This second stage is where an implementation can try to give you memory that’s near other recent allocations; however, it has little control over the first stage (the OS usually just provides whatever memory is available, without any knowledge of other allocations by your program).
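The two stages can be sketched with a toy fixed-size pool. FixedPool is hypothetical: ::operator new stands in for the request to the OS (a real allocator might call mmap instead), and no standard library implements new exactly like this.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Toy two-stage allocator:
//  stage 1: obtain a large chunk (::operator new stands in for the OS);
//  stage 2: hand out fixed-size pieces of it, fetching a new chunk only when
//           the current one is used up. Freed pieces are reused first.
class FixedPool {
public:
    FixedPool(std::size_t piece_size, std::size_t pieces_per_chunk)
        : piece_size_(round_up(piece_size)), per_chunk_(pieces_per_chunk) {
        assert(piece_size > 0 && pieces_per_chunk > 0);
    }

    void* allocate() {
        if (!free_list_.empty()) {  // reuse a previously freed piece
            void* p = free_list_.back();
            free_list_.pop_back();
            return p;
        }
        if (chunks_.empty() || used_in_chunk_ == per_chunk_) {
            // Stage 1: grab another big chunk.
            chunks_.emplace_back(static_cast<unsigned char*>(
                ::operator new(piece_size_ * per_chunk_)));
            used_in_chunk_ = 0;
        }
        // Stage 2: carve the next piece out of the current chunk, so
        // consecutive allocations end up adjacent in memory.
        return chunks_.back().get() + piece_size_ * used_in_chunk_++;
    }

    void deallocate(void* p) { free_list_.push_back(p); }

    std::size_t chunk_count() const { return chunks_.size(); }

private:
    struct ChunkDeleter {
        void operator()(unsigned char* p) const { ::operator delete(p); }
    };
    // Keep every piece aligned for any fundamental type.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }
    std::size_t piece_size_;
    std::size_t per_chunk_;
    std::size_t used_in_chunk_ = 0;
    std::vector<std::unique_ptr<unsigned char, ChunkDeleter>> chunks_;
    std::vector<void*> free_list_;
};
```

Consecutive allocations from the same chunk are adjacent, which is the locality a real allocator tries to provide; once the pool has to fetch a new chunk, or starts reusing freed pieces, that adjacency is lost.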
Part 3: avoiding cache misses
If cache misses are a bottleneck in your code,
Try to reduce the amount of indirection (by having arrays store objects by value, rather than by pointer);
Ensure that the memory you’re operating on is as contiguous as the design permits (so use a std::array or std::vector, instead of a linked list, and prefer a few big allocations to lots of small ones); and
Try to design the algorithm so that it has to jump around in memory as little as possible.
A good general principle is to just use a std::vector of objects, unless you have a good reason to use something fancier. Because they have better cache locality, std::vector is faster at inserting and deleting elements than std::list, even up to dozens or even hundreds of elements.
Finally: try to take advantage of the stack. Unless there’s a good reason for something to be a pointer, just declare it as a variable that lives on the stack. When possible,
Prefer to use MyClass x{}; instead of MyClass* x = new MyClass{};, and
Prefer std::vector<MyClass> instead of std::vector<MyClass*>.
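A sketch of the difference between the two containers, with a hypothetical Particle type: both functions compute the same sum, but only the first walks one contiguous buffer.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical small object.
struct Particle {
    float x, y, vx, vy;
};

// Stored by value: all Particles sit back-to-back in one buffer, so this
// loop reads memory sequentially.
float total_speed_x(const std::vector<Particle>& ps) {
    float sum = 0.0f;
    for (const Particle& p : ps) sum += p.vx;
    return sum;
}

// Stored by pointer: the vector holds pointers, and each Particle is a
// separate heap allocation that may be anywhere, so every iteration follows
// an extra indirection (a potential cache miss).
float total_speed_x(const std::vector<std::unique_ptr<Particle>>& ps) {
    float sum = 0.0f;
    for (const auto& p : ps) sum += p->vx;
    return sum;
}
```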
By extension, if you can use static polymorphism (i.e., templates), use that instead of dynamic polymorphism.
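A minimal sketch of the two styles, using hypothetical Shape, Square and Rect types. The virtual version needs a container of pointers to heap-allocated objects; the template version stores objects by value, at the cost of requiring a single concrete type per container.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Dynamic polymorphism: a base class with a virtual function. A container
// of mixed shapes usually holds pointers to separate heap allocations.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};
struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

double total_area(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double sum = 0.0;
    for (const auto& s : shapes) sum += s->area();  // virtual call per element
    return sum;
}

// Static polymorphism: any type with an area() member works, objects are
// stored by value in one contiguous vector, and calls are resolved at
// compile time. Rect needs no base class.
struct Rect {
    double w, h;
    double area() const { return w * h; }
};

template <typename ShapeT>
double total_area(const std::vector<ShapeT>& shapes) {
    double sum = 0.0;
    for (const ShapeT& s : shapes) sum += s.area();
    return sum;
}
```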
IMHO this is specific to the operating system and the C++ standard library implementation.
new ultimately uses lower-level virtual memory allocation services that allocate several pages at once, using system calls like mmap and munmap (on POSIX systems). The implementation of new could reuse previously freed memory space when relevant.
The implementation of new could use various and different strategies for "large" and "small" allocations.
In the example you gave, the first new may result in a system call for memory allocation (usually several pages), and the allocated memory could be large enough that subsequent new calls result in contiguous allocations. But this depends on the implementation.
In short:
not at all (there is padding due to alignment, heap housekeeping data, allocated chunks may be reused, etc.),
not at all (AFAIK, heap algorithms are deterministic without any randomness),
generally yes (e.g., memory pooling might help here).
I'm studying for my data organization final and I'm going over stacks and heaps because I know they will be on the final and I'm going to need to know the differences.
I know what the Stack is and what the Heap is.
But I'm confused about what a stack is and what a heap is.
The Stack is a place in RAM where memory is stored; if it runs out of space, a stack overflow occurs. Objects are stored here by default, memory is released when objects go out of scope, and it is faster.
The Heap is a place in RAM where memory is stored; if it runs out of space, the OS will assign it more. For an object to be stored on the Heap, it has to be created with the new operator, and it will only be deallocated when you say so. Fragmentation problems can occur, it is slower than the Stack, and it handles large amounts of memory better.
But what is a stack, and what is a heap? is it the way memory is stored? for example a static array or static vector is a stack type and a dynamic array, linked list a heap type?
Thank you all!
"The stack" and "the heap" are memory lumps used in a specific way by a program or operating system. For example, the call stack can hold data pertaining to function calls and the heap is a region of memory specifically used for dynamically allocating space.
Contrast these with stack and heap data structures.
A stack can be thought of as an array where the last element in will be the first element out. Operations on this are called push and pop.
A heap is a tree-based data structure (usually a complete binary tree, often stored in an array) in which each node's value is greater than or equal to the values of its children (a max-heap; in a min-heap the ordering is reversed).
On a side note, keep in mind that "the stack", "the heap", and the stack/heap data structures are not unique to any given programming language; they are simply concepts in the field of computer science.
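For contrast, a sketch of the two data structures with the C++ standard library: std::stack for LIFO, and std::make_heap to arrange a vector as a max-heap. The helper names (pop_order_demo, make_max_heap, is_max_heap) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stack>
#include <vector>

// "A stack" as a data structure: last in, first out.
int pop_order_demo() {
    std::stack<int> s;
    s.push(1);
    s.push(2);
    s.push(3);
    return s.top();  // the last element pushed is the first to come out
}

// "A heap" as a data structure: std::make_heap arranges the vector so every
// parent is >= its children (a max-heap), putting the largest element first.
// This is unrelated to "the heap" that operator new allocates from.
std::vector<int> make_max_heap(std::vector<int> v) {
    std::make_heap(v.begin(), v.end());
    return v;
}

// Checks the max-heap property for an array-based binary tree, where the
// children of index i are at 2i+1 and 2i+2.
bool is_max_heap(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        for (std::size_t c = 2 * i + 1; c <= 2 * i + 2 && c < v.size(); ++c) {
            if (v[c] > v[i]) return false;
        }
    }
    return true;
}
```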
I won't get into virtual memory (read about that if you want) so let's simplify and say you have RAM of some size.
Your program consists of code plus static data, some of it initialized and some uninitialized ("static" here means variables with static storage duration, such as global variables).
When you compile something, the compiler (and linker) translate your code into machine code (ones and zeroes) and organize it in the following way:
The binary file (and the object files) is organized into segments, each of which is loaded into its own portion of RAM.
First, there is the DATA segment. This segment contains the values of initialized variables, so if you have variables such as int a = 3, b = 4; they go into the DATA segment (4 bytes of RAM containing 00000003h and another 4 bytes containing 00000004h, in hexadecimal notation, assuming a 4-byte int). They are typically stored consecutively.
Then there is the CODE segment (often called the text segment). All your code is translated into machine code and stored in this segment consecutively.
Then there is the BSS segment. This is where uninitialized global variables go (all static variables that weren't explicitly initialized); they are zero-filled when the program starts.
Then there is the STACK segment, which is reserved for the stack. The default stack size is determined by the operating system; you can change this value, but I won't get into that now. All local variables go here. When you call a function, on a typical calling convention the function arguments are pushed onto the stack first (many conventions pass some of them in registers instead), then the return address (where to come back when you exit the function), then some saved CPU registers, and finally all local variables declared in the function get their reserved space on the stack.
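And there is the HEAP segment. This is the part of RAM where objects and data created with operator new are stored; rather than having a fixed size, it is grown through the OS as the program needs more memory.
In memory, the exact arrangement of these segments is platform-specific; a typical Linux process has the code first, then DATA and BSS, then the heap growing upward, with the stack near the top of the address space growing downward. There are some other segments, but they are not of interest here. The operating system loads everything into RAM; only the code and the initialized data are actually stored in the binary file, while BSS, heap and stack are set up when the program is loaded. The binary file also has headers that tell the OS at which address your code begins.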
So in short, they are all parts of RAM, since everything that is being executed is loaded into RAM (it isn't executed from ROM, which is read-only, or directly from the HDD, which is just for storing files).
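A small sketch that takes one address from each region described above. The variable and function names are hypothetical, and the placement comments describe a typical Linux/ELF layout rather than a guarantee; address-space layout randomization also changes the values between runs, so they are only worth printing and comparing.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

int initialized_global = 3;  // typically placed in the DATA segment
int uninitialized_global;    // typically placed in BSS (zero-initialized)

struct SegmentAddresses {
    std::uintptr_t data, bss, stack, heap;
};

// Collects one address from each region. Which region is higher in memory
// is platform-specific, so compare the printed values rather than rely on them.
SegmentAddresses sample_addresses() {
    int local = 4;                            // in this function's stack frame
    auto on_heap = std::make_unique<int>(5);  // allocated on the heap
    return {
        reinterpret_cast<std::uintptr_t>(&initialized_global),
        reinterpret_cast<std::uintptr_t>(&uninitialized_global),
        reinterpret_cast<std::uintptr_t>(&local),
        reinterpret_cast<std::uintptr_t>(on_heap.get()),
    };
}
```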
When specifically referring to C++'s memory model, the heap and stack refer to areas of memory. It is easy to confuse this with the stack data structure and heap data structure. They are, however, separate concepts.
When discussing programming languages, stack memory is called 'the stack' because it behaves like a stack data structure. The heap is a bit of a misnomer, as it does not necessarily (or likely) use a heap data structure. See Why are two different concepts both called "heap"? for a discussion of why C++'s heap and the data structure's names are the same, despite being two different concepts.
So to answer your question, it depends on the context. In the context of programming languages and memory management, the heap and stack refer to areas of memory with specific properties. Otherwise, they refer to specific data structures.
The technical definition of "a stack" is a Last In, First Out (LIFO) data structure where data is pushed onto and pulled off of the top. Just like with a stack of plates in the real world, you wouldn't pull one out from the middle or bottom, you [usually] wouldn't pull data out of the middle of or the bottom of a data structure stack. When someone talks about the stack in terms of programming, it can often (but not always) mean the hardware stack, which is controlled by the stack pointer register in the CPU.
As far as "a heap" goes, that generally becomes much more nebulous in terms of a definition everyone can agree on. The best definition is likely "a large amount of free memory from which space is allocated for dynamic memory management." In other words, when you need new memory, be it for an array, or an object created with the new operator, it comes from a heap that the OS has reserved for your program. This is "the heap" from the POV of your program, but just "a heap" from the POV of the OS.
The important thing for you to know about stacks is the relationship between the stack and function/method calls. Every function call reserves space on the stack, called a stack frame. This space contains your auto variables (the ones declared inside the function body). When you exit from the function, the stack frame and all the auto variables it contains disappear.
This mechanism is very cheap in terms of CPU resources used, but the lifetime of these stack-allocated variables is obviously limited by the scope of the function.
Memory allocations (objects) on the heap, on the other hand, can live "forever", or as long as you need them, regardless of the flow of control of your program. The downside is that since you don't get automatic lifetime management for these heap-allocated objects, you have to either 1) manage their lifetime yourself, or 2) use special mechanisms like smart pointers to manage it. If you get it wrong, your program leaks memory or accesses memory that has already been freed (whose contents may change unexpectedly).
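A sketch of the two lifetimes, with a hypothetical Enemy type: the local object dies with its stack frame, while the heap object created in spawn_enemy outlives the call and is deleted automatically by the std::unique_ptr that owns it.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical game object.
struct Enemy {
    std::string name;
    int health;
};

// An automatic variable lives in this function's stack frame and is
// destroyed on return, so a pointer to it must never escape the function.
int local_enemy_health() {
    Enemy e{"goblin", 10};
    return e.health;  // returning a copy of the value is fine
}

// A heap object owned by a smart pointer outlives the function that created
// it; ownership moves to the caller, and the Enemy is deleted automatically
// when the owning unique_ptr is destroyed or reset.
std::unique_ptr<Enemy> spawn_enemy(const std::string& name) {
    return std::make_unique<Enemy>(Enemy{name, 100});
}
```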
Re: Your question about A stack vs THE stack: When you are using multiple threads, each thread has a separate stack so that each thread can flow into and out of functions/methods independently. Most single threaded programs have only one stack: "the stack" in common terminology.
Likewise for heaps. If you have a special need, it is possible to allocate multiple heaps and choose at allocation time which heap should be used. This is much less common (and a much more complicated topic than I have mentioned here.)
Whilst asking another question (and also before) I was wondering how do I judge whether to create an object on the heap or keep it as an object on the stack? What should I ask myself about the object to make the correct allocation?
Put it on the heap if you have to, the stack if you can.
What kinds of things do you need to put on the heap? Anything of varying length. Any object that might need to be null. Anything that's very large, lest you cause a stack overflow.
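For the "varying length" case, a sketch with hypothetical function names: a small array whose size is known at compile time can stay on the stack, while a size only known at run time goes in a std::vector, whose elements live on the heap.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Size known at compile time and small: a std::array local is fine.
int sum_of_first_four_squares() {
    std::array<int, 4> squares{1, 4, 9, 16};  // lives in this stack frame
    int sum = 0;
    for (int s : squares) sum += s;
    return sum;
}

// Size only known at run time: use std::vector. The vector object itself is
// a small local, but its elements are allocated on the heap, so a large n
// cannot overflow the stack.
std::vector<long long> make_squares(std::size_t n) {
    std::vector<long long> squares(n);
    for (std::size_t i = 0; i < n; ++i) {
        squares[i] = static_cast<long long>((i + 1) * (i + 1));
    }
    return squares;
}
```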
Simple answer.
When it goes out of scope, do you want it to hang around and be able to use it?
Depends on intended lifetime of the object.
If you want the object to be alive even after the function returns, then HEAP, else STACK.
If an object is placed on the heap, it must be explicitly released with free() or delete by the programmer once it is no longer needed (or be owned by a smart pointer that does this for you); otherwise the program will leak memory.
Stack memory is fast. It is fast because (a) there is little overhead to allocate the memory: the allocation is done by simply moving the stack pointer, typically in one instruction, and (b) the memory on the stack is "hot", so it is likely already in cache. Heap memory is slower because (a) the allocator has to find a free chunk of memory (and occasionally ask the OS for more), and (b) the memory is probably not in cache, so using it may evict data you might have wanted.
Stack memory doesn't get fragmented. It is possible that a heap eventually gets so fragmented, you can't allocate anything (even though ironically there is still enough unused memory!)
For long lived data and for large data (multi KB or more), you have to use a heap.
The danger of allocating a bigger stack is that it might hurt you if you are running multiple threads. Each thread requires its own stack, and you have to size the stack for the "worst case" usage. On a high core count machine (where you might have 200+ threads running), you may not want to arbitrarily increase the stack size. The heap, on the other hand, does not need to be sized for "worst case" usage, so it is much more efficient.
Two reasons to use the heap:
1- You want the data after the current scope.
2- You want to reserve large memory.
Other than that stay on stack.
Note: don't reserve a lot of memory on the stack, or you'll get a stack overflow ;)
I have an object (A) that has a list of objects (B). The objects in the list (B) are pointers, but should the list itself be a pointer? I'm migrating from Java to C++ and still haven't gotten fully accustomed to the stack/heap. The list will not be passed outside of class A, only the elements in the list. Is it good practice to allocate the list itself on the heap just in case?
Also, should the class that contains the list(A) also be on the heap itself? Like the list, it will not be passed around.
Bear in mind that
1. The list would only be on the stack if Object-A was also on the stack.
2. Even if the list itself is not on the heap, it may allocate its storage from the heap. This is how std::list, std::vector and most C++ lists work: the reason is that stack-based elements cannot grow.
These days most default stacks are around 1 MB (the Windows default; many Linux systems default to 8 MB), so you'd need a pretty big list of pretty big objects before you need to worry about it. Even if your stack were only about 32 KB, you could store about four thousand 8-byte pointers (or eight thousand 4-byte ones) before it would be an issue.
IMO people new to the explicit memory management in C/C++ can have a tendency to overthink these things.
Unless you're writing something that you know will have thousands of sizable objects just put the list on the stack. Unless you're using giant C-style arrays in a function the chances are the memory used by the list will end up in the heap anyway due to #1 and #2 above.
You're better off storing a list, if it can grow, on the heap. Since you never know what the runtime stack will be, overflow is a real danger, and the consequences are fatal.
If you absolutely know the upper bound of the list, and it's small compared to the size of your stack, you can probably get away with stack allocating the list.
I work in environments where the stack can be small and heap fragmentation needs to be avoided, so I'd use these rules:
If the list is small and a known fixed size, stack.
If the list is small and an unknown fixed size, you can consider both the heap and alloca() (a non-standard but widely available function that allocates on the stack). Using the heap would be a fine choice if you can guarantee that your function doesn't allocate anything else on the heap for as long as your allocation is going to be there. If you can't guarantee this, you're asking for fragmentation, and alloca() would be a better choice.
If the list is large or will need to grow, use the heap. If you can't guarantee it won't fragment, we have some remedies for this built into our memory manager, such as top-down allocations and separate heaps.
Most situations don't call for people to worry about fragmentation, in which case they'd probably not recommend the usage of alloca.
With respect to the class containing the list, if it's local to the function scope I would put it on the stack provided that the internal data structures are not extremely large.
What do you mean by "list"? If it's std::list (or std::vector or any other STL container), then its elements are not stored on the stack (only the small container object itself is), so don't worry.
If you're in any doubt, look at sizeof(A) and that tells you how much memory it will use when it's on the stack.
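A sketch of that check, assuming A and B are shaped roughly like the question describes (std::unique_ptr is swapped in for the question's raw pointers so the B objects are freed automatically): sizeof(A) stays the same no matter how many elements the list holds, because only the vector's small bookkeeping object is inside A.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical classes from the question: A owns a list of B objects.
struct B {
    int id;
};

class A {
public:
    void add(int id) { items_.push_back(std::make_unique<B>(B{id})); }
    std::size_t count() const { return items_.size(); }
private:
    // The vector object is a member of A (so it lives wherever A lives), but
    // its element buffer, and each B, are allocated on the heap.
    std::vector<std::unique_ptr<B>> items_;
};

// sizeof is a compile-time constant: it is how much stack space an A local
// occupies, regardless of how many elements are later added.
constexpr std::size_t stack_footprint_of_A = sizeof(A);
```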
But ... the decision should mainly be based on the lifetime of the object. Stack-based objects are destroyed as soon as they go out of scope.
The stack is always simplest. Unfortunately it has one big drawback - you have to know the number of elements ahead of time.