C++ - contiguous memory and polymorphism

I have a base class named GameObject from which other classes derive.
I am wondering whether allocating all of the derived GameObjects in one contiguous block of memory will improve performance.
I will end up iterating over all of them every game engine frame.
My question is: does contiguous storage in this case give me faster iteration times than mallocing each object separately, without contiguity? In both cases I have to keep a vector of pointers to the GameObjects, since they vary in size.

Iterating through objects in contiguous memory will likely be faster because of cache locality. However, I recommend that you build both systems and actually profile them. Good luck!

I'm not sure I understand the question. Are you asking whether it's better to pre-allocate all your objects in one giant block of memory and store pointers to subsections of that block? If so, please don't do that.
Instead of being faster, it's more likely to slow things down, since the system has to find one large contiguous region instead of many smaller, non-contiguous ones. Keep in mind block allocation, paging, etc. You may request 100 megabytes of "contiguous" memory, but it isn't really contiguous: some of it may be paged out to disk, and everything is broken up into pages anyway.
Then you're faced with the question of whether you allocate all your GameObjects in one go to get contiguous memory, or create them on demand. Do you really want to pre-allocate everything for this one minor optimization? What happens if you need to create a new object and your contiguous memory block wasn't large enough? And so on.
Really, I'm just brainstorming potential problems here. As the other comments said, it's a case of premature optimization.
Now, it would certainly be faster if you stored all your pointers in a pre-sized contiguous array instead of a vector that grows and copies its contents as it fills up; but even then, unless you know the exact number of game objects up front, you're better off just reserving a sufficiently large vector so that it only grows once or twice.
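As a rough illustration of that last point, here is a minimal sketch (the GameObject hierarchy, sizes and the 1024 count are made up) of keeping owning pointers in a vector that is reserved once so it never reallocates during the game loop:

#include <memory>
#include <vector>

struct GameObject {
    virtual void update() = 0;
    virtual ~GameObject() = default;
};

struct Player  : GameObject { void update() override {} };
struct Monster : GameObject { void update() override {} };

int main() {
    std::vector<std::unique_ptr<GameObject>> objects;
    objects.reserve(1024);                    // grow once, up front

    objects.push_back(std::make_unique<Player>());
    objects.push_back(std::make_unique<Monster>());

    for (auto& obj : objects)                 // per-frame iteration
        obj->update();
}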

Related

How to avoid dynamically allocating small objects?

I have found that many times I only need a small std::map (say, fewer than 10 keys) or a small std::vector containing only a few elements, and I think it's really a waste of performance to always dynamically allocate them. Especially with structures like std::map<std::string, std::string> or std::vector<std::string>, there is really a lot of dynamic allocation involved.
Any good advice? At least something to reduce the amount of dynamic allocation, preferably without sacrificing ease of use. Thanks.
You can use stack-allocated memory for small amounts of data (stack allocations are very fast, basically just a stack-pointer adjustment, although stack space is precious and a very limited resource) and heap-allocated memory for larger sizes. In other words, think along the lines of std::string's small string optimization.
Moreover, to speed up allocations, you can also preallocate big memory chunks on the heap and then carve smaller allocations out of those chunks, again basically just advancing a pointer inside the chunk. For a sample implementation of this pool-allocator technique, consider reading this blog post.
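For illustration, here is a very rough sketch of the small-buffer idea, limited to trivial element types such as int; the names and growth policy are invented, and real implementations (LLVM's SmallVector, std::string's SSO) handle construction, alignment and exception safety far more carefully:

#include <cstddef>

template <typename T, std::size_t N>
class SmallVector {
    T           local_[N];          // in-object storage, no heap allocation
    T*          data_;
    std::size_t size_     = 0;
    std::size_t capacity_ = N;

public:
    SmallVector() : data_(local_) {}
    SmallVector(const SmallVector&) = delete;            // non-copyable in this sketch
    SmallVector& operator=(const SmallVector&) = delete;
    ~SmallVector() { if (data_ != local_) delete[] data_; }

    void push_back(const T& value) {
        if (size_ == capacity_) {                 // outgrew the local buffer
            T* heap = new T[capacity_ * 2];       // fall back to the heap
            for (std::size_t i = 0; i < size_; ++i) heap[i] = data_[i];
            if (data_ != local_) delete[] data_;
            data_     = heap;
            capacity_ *= 2;
        }
        data_[size_++] = value;
    }

    std::size_t size() const { return size_; }
    const T& operator[](std::size_t i) const { return data_[i]; }
};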
You will find this CppCon 2016 talk interesting as well:
High Performance Code 201: Hybrid Data Structures

Dealing with fragmentation in a memory pool?

Suppose I have a memory pool object with a constructor that takes a pointer to a large chunk of memory ptr and size N. If I do many random allocations and deallocations of various sizes, I can get the memory into such a state that I cannot allocate an M-byte object contiguously, even though there may be a lot free! At the same time, I can't compact the memory because that would leave dangling pointers in the consumers. How does one resolve fragmentation in this case?
I wanted to add my 2 cents only because no one else pointed out that, from your description, it sounds like you are implementing a standard heap allocator (i.e. what all of us already use every time we call malloc() or operator new).
A heap is exactly such an object: it goes to the virtual memory manager and asks for a large chunk of memory (what you call "a pool"). Then it has all kinds of algorithms for allocating and freeing chunks of various sizes as efficiently as possible. Furthermore, many people have modified and optimized these algorithms over the years. For a long time Windows shipped with an option called the low-fragmentation heap (LFH) which you had to enable manually. Starting with Vista, the LFH is used for all heaps by default.
Heaps are not perfect, and they can definitely bog down performance when not used properly. Since OS vendors can't possibly anticipate every scenario in which you will use a heap, their heap managers have to be optimized for the "average" use. But if your requirements are similar to those of a regular heap (i.e. many objects, different sizes...), you should consider just using a heap and not reinventing it, because chances are your implementation will be inferior to what the OS already provides for you.
With memory allocation, the only time you can gain performance over simply using the heap is by giving up some other aspect (allocation overhead, allocation lifetime...) that is not important to your specific application.
For example, in our application we had a requirement for many allocations of less than 1 KB, but these allocations were used only for very short periods of time (milliseconds). To optimize the app, I used the Boost Pool library, but extended it so that my "allocator" actually contained a collection of boost pool objects, each responsible for allocating one specific size from 16 bytes up to 1024 (in steps of 4). This provided almost free (O(1) complexity) allocation/free of these objects, but the catch is that a) memory usage is always large and never goes down, even if we don't have a single object allocated, and b) Boost Pool never frees the memory it uses (at least in the mode we are using it in), so we only use this for objects which don't stick around very long.
So which aspect(s) of normal memory allocation are you willing to give up in your app?
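To make the size-class idea concrete, here is a hedged, simplified sketch (not the poster's actual Boost.Pool-based code; all names are invented) of one fixed-size pool that carves nodes out of larger heap chunks and keeps freed nodes on a free list, giving O(1) allocate/free. A full allocator along the lines described above would keep one such pool per size class (16, 20, 24, ... 1024 bytes) and pick the bin from the rounded-up request size:

#include <cstddef>
#include <vector>

class FixedSizePool {
    std::size_t              node_size_;
    std::vector<char*>       chunks_;       // big blocks grabbed from the heap
    void*                    free_list_ = nullptr;
    static const std::size_t kNodesPerChunk = 256;

    // Round up so every node is aligned well enough to hold a pointer (and more).
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    void refill() {
        char* chunk = new char[node_size_ * kNodesPerChunk];
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < kNodesPerChunk; ++i) {
            void* node = chunk + i * node_size_;
            *static_cast<void**>(node) = free_list_;   // push onto free list
            free_list_ = node;
        }
    }

public:
    explicit FixedSizePool(std::size_t node_size)
        : node_size_(round_up(node_size < sizeof(void*) ? sizeof(void*) : node_size)) {}

    ~FixedSizePool() { for (char* c : chunks_) delete[] c; }

    void* allocate() {
        if (!free_list_) refill();
        void* node = free_list_;
        free_list_ = *static_cast<void**>(node);       // pop from free list
        return node;
    }

    void deallocate(void* node) {                      // O(1), no searching
        *static_cast<void**>(node) = free_list_;
        free_list_ = node;
    }
};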
Depending on the system, there are a couple of ways to do it.
Try to avoid fragmentation in the first place: if you allocate blocks in powers of 2, you have less of a chance of causing this kind of fragmentation. There are a couple of other ways around it, but if you ever reach this state you are effectively out of memory, because there are no graceful ways of handling it other than killing the process that asked for the memory, blocking until you can allocate, or returning NULL from your allocation routine.
Another way is to pass pointers to pointers to your data (e.g. int **). Then you can rearrange the memory beneath the program (thread-safely, one hopes) and compact the allocations so that you can allocate new blocks and still keep the data from the old blocks (once the system gets into this state, though, that becomes a heavy overhead, so it should seldom be done).
There are also ways of "binning" memory so that you have contiguous pages: for instance, dedicate one page only to allocations of 512 bytes or less, another to 1024 bytes or less, etc. This makes it easier to decide which bin to use, and in the worst case you split from the next-highest bin or merge from a lower bin, which reduces the chance of fragmenting across multiple pages.
Implementing object pools for the objects that you frequently allocate will drive fragmentation down considerably without the need to change your memory allocator.
It would be helpful to know more exactly what you are actually trying to do, because there are many ways to deal with this.
But, the first question is: is this actually happening, or is it a theoretical concern?
One thing to keep in mind is you normally have a lot more virtual memory address space available than physical memory, so even when physical memory is fragmented, there is still plenty of contiguous virtual memory. (Of course, the physical memory is discontiguous underneath but your code doesn't see that.)
I think there is sometimes unwarranted fear of memory fragmentation, and as a result people write a custom memory allocator (or worse, they concoct a scheme with handles and moveable memory and compaction). I think these are rarely needed in practice, and it can sometimes improve performance to throw this out and go back to using malloc.
Write the pool to operate as a list of allocations; you can then extend or destroy it as needed. This can reduce fragmentation.
And/or implement allocation transfer (or move) support so you can compact active allocations. The object/holder may need to assist you, since the pool may not necessarily know how to transfer types itself. If the pool is used with a collection type, then it is far easier to accomplish compacting/transfers.
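A rough illustration of the handle / pointer-to-pointer indirection mentioned here and earlier in the thread (entirely hypothetical code, with freeing and per-allocation alignment left out): clients hold an index into a table instead of a raw pointer, so the pool is free to move or compact the underlying storage and only has to patch its own table:

#include <cstddef>
#include <vector>

class HandlePool {
    std::vector<char>        storage_;   // one big relocatable buffer
    std::vector<std::size_t> offsets_;   // handle -> offset into storage_

public:
    using Handle = std::size_t;

    Handle allocate(std::size_t bytes) {
        offsets_.push_back(storage_.size());
        storage_.resize(storage_.size() + bytes);
        return offsets_.size() - 1;
    }

    // Resolve a handle each time you need the data; never cache the pointer,
    // because compaction (or growth of storage_) may move the buffer.
    void* get(Handle h) { return storage_.data() + offsets_[h]; }
};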

Dynamically allocate or waste memory?

I have a 2d integer array used for a tile map.
The size of the map is unknown and is read in from a file at runtime. Currently the biggest file is 2500 items (a 50x50 grid).
I have a working method of dynamic memory allocation from an earlier question, but people keep saying that it's a bad idea, so I have been wondering whether to just use a big array and not fill it all up when using a smaller map.
Does anyone know of any pros or cons to either solution? Any advice or personal opinions welcome.
C++, by the way.
Edit: all the maps are made by me, so I can pick a max size.
Probably the easiest way is for example a std::vector<std::vector<int> > to allow it to be dynamically sized AND let the library do all the allocations for you. This will prevent accidentally leaking memory.
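For example, a loader along these lines never touches new/delete directly (the "width height, then width*height tiles" file format is just an assumption here; the real map format may differ):

#include <fstream>
#include <vector>

std::vector<std::vector<int>> loadTileMap(const char* path) {
    std::ifstream in(path);
    int width = 0, height = 0;
    in >> width >> height;

    std::vector<std::vector<int>> map(height, std::vector<int>(width));
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            in >> map[y][x];           // the library manages all allocations
    return map;
}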
My preference would be to dynamically allocate. That way should you encounter a surprisingly large map you (hopefully) won't overflow if you've written it correctly, whereas with the fixed size your only option is to return an error and fail.
Presumably loading tile maps is a pretty infrequent operation. I'd also be willing to bet that you can't even measure a meaningful difference in speed between the two. Unless there is a measurable performance reduction, or you're actually hitting something else which is causing you problems, the statically sized array seems like a premature optimisation and is asking for trouble later on.
It depends entirely on requirements that you haven't stated :-)
If you want your app to be as blazingly fast as possible, with no ability to handle larger tile maps, then by all means just use a big array. For small PIC-based embedded systems this could be an ideal approach.
But, if you want your code to be robust, extensible, maintainable and generally suitable for a wider audience, use STL containers.
Or, if you just want to learn stuff, and have no concern about maintainability or performance, try and write your own dynamically allocating containers from scratch.
I believe the issue people refer to with dynamic allocation comes from allocating randomly sized blocks of memory and not being able to effectively manage the randomly sized holes left behind when they are deallocated. If you're allocating fixed-size tiles, then this may not be an issue.
I see quite a few people suggest allocating a large block of memory and managing it themselves. That might be an alternative solution.
Is allocating the memory dynamically a bottleneck in your program? Is it the cause of a performance issue? If not, then simply keep dynamic allocation; you can handle any map size. If yes, then maybe use a data structure that does not deallocate the memory it has allocated but rather reuses its old buffer, reallocating only when it needs more.
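As a small illustration of that last suggestion (a hypothetical helper, assuming the tiles are kept in a flat std::vector): clear() keeps the vector's capacity, so reloading a map of similar size reuses the old buffer instead of reallocating:

#include <cstddef>
#include <vector>

void reloadInto(std::vector<int>& tiles, std::size_t tileCount) {
    tiles.clear();                 // size -> 0, capacity unchanged
    tiles.reserve(tileCount);      // reallocates only if the new map is bigger
    for (std::size_t i = 0; i < tileCount; ++i)
        tiles.push_back(0);        // fill with the real tile data here
}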

What is more efficient stack memory or heap? [duplicate]

Possible Duplicate:
C++ Which is faster: Stack allocation or Heap allocation
What is more efficient from a memory-allocation perspective: stack memory or heap memory? What does it depend on?
Obviously there is an overhead to dynamic allocation versus allocation on the stack. Using the heap involves finding a location where the memory can be allocated and maintaining bookkeeping structures. On the stack it is simple, since you already know where to put the element. I would like to understand what the worst-case overhead is, in milliseconds, of the supporting structures that allow for dynamic allocation.
Stack is usually more efficient speed-wise, and simple to implement!
I tend to agree with Michael from the Joel on Software site, who says:
It is more efficient to use the stack when it is possible.
When you allocate from the heap, the heap manager has to go through what is sometimes a relatively complex procedure, to find a free chunk of memory. Sometimes it has to look around a little bit to find something of the right size.
This is not normally a terrible amount of overhead, but it is definitely more complex work compared to how the stack functions. When you use memory from the stack, the compiler is able to immediately claim a chunk of memory from the stack to use. It's fundamentally a more simple procedure.
However, the size of the stack is limited, so you shouldn't use it for very large things, if you need something that is greater than something like 4k or so, then you should always grab that from the heap instead.
Another benefit of using the stack is that it is automatically cleaned up when the current function exits, you don't have to worry about cleaning it yourself. You have to be much more careful with heap allocations to make sure that they are cleaned up. Using smart pointers that handle automatically deleting heap allocations can help a lot with this.
I sort of hate it when I see code that does stuff like allocates 2 integers from the heap because the programmer needed a pointer to 2 integers and when they see a pointer they just automatically assume that they need to use the heap. I tend to see this with less experienced coders somewhat - this is the type of thing that you should use the stack for and just have an array of 2 integers declared on the stack.
Quoted from a really good discussion at the Joel on Software site:
stack versus heap: more efficiency?
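To make the quote's last point concrete, here is a tiny made-up example of the two-integers case: the stack version is just part of the function's frame, while the heap version pays for an allocator call and a matching cleanup:

void heapVersion() {
    int* pair = new int[2];    // heap: allocator call now, delete[] later
    pair[0] = 1;
    pair[1] = 2;
    delete[] pair;
}

void stackVersion() {
    int pair[2] = {1, 2};      // stack: just part of the function's frame
    (void)pair;                // silence unused-variable warnings in this sketch
}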
Allocating/freeing on the stack is more "efficient" because it just involves incrementing/decrementing a stack pointer, typically, while heap allocation is generally much more complicated. That said, it's generally not a good idea to have huge things on your stack as stack space is far more limited than heap space on most systems (especially when multiple threads are involved as each thread has a separate stack).
These two regions of memory are optimized for different use cases.
The stack is optimized for the case where objects are deallocated in LIFO (last-in, first-out) order - that is, newer objects are always deallocated before older objects. Because of this, memory can be allocated and deallocated quickly by simply maintaining a giant array of bytes and then handing out or retracting bytes at the end. Because the memory needed to store local variables for function calls is always reclaimed in this way (functions always finish executing in the reverse order from which they were called), the stack is a great place to allocate this sort of memory.
However, the stack is not good at other sorts of allocation. You cannot easily deallocate memory allocated off the stack that isn't the most recently allocated block, since this leads to "gaps" in the stack and complicates the logic for determining where bytes are available. For these sorts of allocations, where object lifetime can't be determined from the time at which the object is allocated, the heap is a better place to store things. There are many ways to implement the heap, but most of them rely somehow on the idea of storing a giant table or linked list of the allocated blocks in a way that allows suitable chunks of memory to be located and handed back to clients. When memory is freed, it is added back into the table or linked list, and possibly some other logic is applied to coalesce the block with neighboring free blocks. Because of this search overhead, the heap is usually much, much slower than the stack. However, the heap can handle allocation patterns that the stack is not at all good at, hence the two are usually both present in a program.
Interestingly, there are some other ways of allocating memory that fall somewhere in-between the two. One common allocation technique uses something called an "arena," where a single large chunk of memory is allocated from the heap that is then partitioned into smaller blocks, like in the stack. This gives the benefit that allocations from the arena are very fast if allocations are sequential (for example, if you're going to allocate a lot of small objects that all live around the same length), but the objects can outlive any particular function call. Many other approaches exist, and this is just a small sampling of what's possible, but it should make clear that memory allocation is all about tradeoffs. You just need to find an allocator that fits your particular needs.
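Here is a minimal bump-pointer arena sketch along the lines described above (illustrative only, not a production allocator): one big heap block, each allocation just advances an offset, and everything is released at once when the arena is destroyed or reset:

#include <cstddef>
#include <new>

class Arena {
    char*       buffer_;
    std::size_t capacity_;
    std::size_t offset_ = 0;

public:
    explicit Arena(std::size_t capacity)
        : buffer_(new char[capacity]), capacity_(capacity) {}
    Arena(const Arena&) = delete;              // non-copyable in this sketch
    Arena& operator=(const Arena&) = delete;
    ~Arena() { delete[] buffer_; }             // frees every allocation at once

    // align must be a power of two.
    void* allocate(std::size_t bytes,
                   std::size_t align = alignof(std::max_align_t)) {
        std::size_t aligned = (offset_ + align - 1) & ~(align - 1);
        if (aligned + bytes > capacity_) throw std::bad_alloc();
        offset_ = aligned + bytes;
        return buffer_ + aligned;
    }

    void reset() { offset_ = 0; }              // reuse the block, no free() calls
};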
Stack is much more efficient, but limited in size. I think it's something like 1MByte.
When allocating memory on the heap, I keep in mind the figure 1000. Allocating on the heap is something like 1000 times slower than on the stack.

Faster to malloc multiple small times or few large times?

When using malloc to allocate memory, is it generally quicker to do multiple mallocs of smaller chunks of data or fewer mallocs of larger chunks of data?
For example, say you are working with an image file that has black pixels and white pixels. You are iterating through the pixels and want to save the x and y position of each black pixel in a new structure that also has pointers to the next and previous pixels' x and y values. Would it generally be faster to iterate through the pixels, allocating a new structure for each black pixel's x and y values along with the pointers, or would it be faster to get a count of the number of black pixels by iterating through once, then allocating one large chunk of memory using a structure containing just the x and y values (no pointers), and then iterating through again, saving the x and y values into that array?
I'm assuming certain platforms might differ from others as to which is faster, but what does everyone think would generally be faster?
It depends:
Multiple small allocations mean multiple calls into the allocator, which is slower.
There may be a special, fast path in the implementation for small allocations.
If I cared, I'd measure it! If I really cared a lot, and couldn't guess, then I might implement both, and measure at run-time on the target machine, and adapt accordingly.
In general I'd assume that fewer is better: but there are sizes and run-time library implementations such that a (sufficiently) large allocation will be delegated to the (relatively slow) OS, whereas a (sufficiently) small allocation will be served from a (relatively quick) already-allocated heap.
Allocating large blocks is more efficient; additionally, since you are using larger contiguous blocks, you have greater locality of reference, and traversing your in-memory structure once you've generated it should also be more efficient! Further, allocating large blocks should help to reduce memory fragmentation.
Generally speaking, allocating larger chunks of memory fewer times will be faster. There's overhead involved each time a call to malloc() is made.
Besides speed issues, there is also the memory fragmentation problem.
Allocating memory is work. The amount of work done when allocating a block of memory is typically independent of the size of the block. You work it out from here.
It's faster not to allocate in performance-sensitive code at all. Allocate the memory you're going to need once in advance, and then use and reuse that as much as you like.
Memory allocation is a relatively slow operation in general, so don't do it more often than necessary.
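A trivial sketch of that "allocate once, reuse" advice (processFrames and the 1 MB buffer size are made up for illustration): hoist the allocation out of the hot loop and reuse the same buffer on every iteration:

#include <vector>

void processFrames(int frameCount) {
    std::vector<unsigned char> scratch(1 << 20);       // one up-front allocation
    for (int i = 0; i < frameCount; ++i) {
        // ... fill and use scratch for this iteration; no per-iteration allocation ...
        scratch[0] = static_cast<unsigned char>(i);
    }
}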
In general malloc is expensive. It has to find an appropriate memory chunk from which to allocate memory and keep track of non-contiguous memory blocks. In several libraries you will find small memory allocators that try to minimize the impact by allocating a large block and managing the memory in the allocator.
Alexandrescu deals with the problem in 'Modern C++ Design' and in the Loki library, if you want to take a look at one such library.
This question is one of pragmatism, I'm afraid; that is to say, it depends.
If you have a LOT of pixels, only a few of which are black then counting them might be the highest cost.
If you're using C++, which your tags suggest you are, I would strongly suggest using the STL, something like std::vector.
The implementation of vector, if I remember correctly, uses a pragmatic approach to allocation. There are a few heuristics for allocation strategies, an informative one is this:
#include <cstdlib>

class SampleVector {
    int N, used;
    int* data;
public:
    SampleVector() { N = 1; used = 0; data = (int*)std::malloc(N * sizeof(int)); }
    ~SampleVector() { std::free(data); }
    void push_back(int i)
    {
        if (used >= N)
        {
            // handle reallocation: double the capacity and copy the old contents
            N *= 2;
            data = (int*)std::realloc(data, N * sizeof(int));
        }
        data[used++] = i;
    }
};
In this case, you DOUBLE the amount of memory allocated every time you realloc.
This means that reallocations progressively halve in frequency.
Your STL implementation will have been well-tuned, so if you can use that, do!
Another point to consider is how this interacts with threading. Calling malloc many times in a multithreaded application is a major drag on performance. In that environment you are better off with a scalable allocator like the one used in Intel's Threading Building Blocks or Hoard. The major limitation with malloc is that there is a single global lock that all the threads contend for. It can be so bad that adding another thread dramatically slows down your application.
As already mentioned, malloc is costly, so fewer calls will probably be faster.
Also, working with contiguous pixels will, on most platforms, cause fewer cache misses and be faster.
However, there is no guarantee on every platform.
Next to the allocation overhead itself, allocating multiple small chunks may result in lots of cache misses, while if you can iterate through a contiguous block, chances are better.
The scenario you describe asks for preallocation of a large block, imho.
Although allocating large blocks is faster per byte of allocated memory, it will probably not be faster if you artificially increase the allocation size only to chop it up yourself. You're just duplicating the memory management.
Do an iteration over the pixels to count the number of them to be stored.
Then allocate an array for the exact number of items. This is the most efficient solution.
You can use std::vector for easier memory management (see std::vector::reserve). Note: if you let the vector grow on its own instead of reserving, it may allocate a little more memory than necessary (up to about 2 times).
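A hedged sketch of that two-pass approach, assuming a made-up image representation (a row-major array of bytes where 0 means black): count in the first pass, then allocate exactly once with reserve and fill in the second pass:

#include <cstddef>
#include <vector>

struct Point { int x, y; };

std::vector<Point> collectBlackPixels(const unsigned char* pixels,
                                      int width, int height) {
    std::size_t blackCount = 0;
    for (int i = 0; i < width * height; ++i)    // pass 1: count black pixels
        if (pixels[i] == 0) ++blackCount;

    std::vector<Point> result;
    result.reserve(blackCount);                 // single allocation, exact size
    for (int y = 0; y < height; ++y)            // pass 2: store coordinates
        for (int x = 0; x < width; ++x)
            if (pixels[y * width + x] == 0)
                result.push_back({x, y});
    return result;
}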
"I can allocate-it-all" (really, I can!)
We can philosophize about special implementations that speed up small allocations considerably... yes! But in general this holds:
malloc must be general. It must handle all the different kinds of allocations, and that is the reason it is comparatively slow. You might use a special super-duper library that speeds things up, but even those cannot work wonders, since they still have to implement malloc in its full spectrum.
The rule is: the more specialized your allocation code is, the faster it can be compared to the broad "I can allocate-it-all" routine, malloc.
So when you are able to allocate the memory in bigger blocks in your code (and it does not cost you too much), you can speed things up considerably. Also, as mentioned by others, you will get a lot less memory fragmentation, which also speeds things up and can cost less memory. You must also remember that malloc needs additional memory for every chunk it returns to you (yes, special routines can reduce this... but you don't really know what they do unless you implement one yourself or buy some wonder library).