Should I worry about memory fragmentation with std::vector? If so, are there ways to help prevent it? I can't always predict that my programs will run on a PC; they may also run on embedded devices or game consoles, so I won't always be able to rely on virtual memory.
Then again, I believe it would be more efficient to use a dynamically sized array rather than a static one, so that memory is only allocated when needed. It would also simplify my programs' design. Are there ways to achieve this efficiently?
Thanks for any advice!
The answer to your worries may be std::deque. It gives you an interface similar to std::vector's, but copes better with fragmented memory, since it allocates several small arrays instead of one large one. It is less efficient than std::vector in some respects, but for your case it may be a good trade-off.
If your vector will be reallocated many times then yes, it can cause memory fragmentation.
The simplest way to avoid that would be to use std::vector::reserve() if you more or less know how big your array can grow.
You can also consider using std::deque instead of vector, so you won't have problems with memory fragmentation at all.
Here is a topic on Stack Overflow which may be interesting for you: what-is-memory-fragmentation.
std::vector is only as good as new; it simply handles the underlying memory allocation for you.
A couple of things you can do - assuming you don't want to write a whole replacement for new.
Pre-allocate vectors, or resize() them if you know their eventual size; this avoids wasteful memory copies as they grow.
If you are going to use the vector again with the same size, it's better to keep it and refill it than to delete it and recreate it (a sketch follows this answer).
Generally on embedded targets if you know the memory requirements it's best to statically allocate all the memory at the start and divide it up yourself - it's not like another user is going to want some.
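A minimal sketch of that keep-and-refill idea (the run_frames workload and the names are made up for illustration): clear() drops the elements but keeps the capacity, so refilling does not allocate again.

#include <cstddef>
#include <vector>

// Hypothetical per-frame workload: the same buffer is refilled every iteration.
void run_frames(std::size_t frame_count, std::size_t samples_per_frame)
{
    std::vector<float> samples;
    samples.reserve(samples_per_frame);      // one allocation, up front

    for (std::size_t f = 0; f < frame_count; ++f)
    {
        samples.clear();                     // size() -> 0, capacity is kept
        for (std::size_t i = 0; i < samples_per_frame; ++i)
            samples.push_back(static_cast<float>(i));  // never reallocates
        // ... hand samples off to whatever consumes them ...
    }
}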
You should always worry about performance and efficiency when your profiler tells you so (you can be that profiler, but you have to 'measure', not guess).
Things you can do:
pre-allocate capacity:
std::vector<int> x(1000); // size() is 1000
std::vector<int> y;
y.reserve(1000); // size() is 0, capacity is 1000
use a custom allocator (a minimal sketch appears at the end of this answer)
have a look at Boost Pool
have a look at EASTL (specialized for embedded/game programming)
STL Alternative
EASTL versus STL, how can there be such a performance difference in std::vector<uint64_t>::operator[]
The first option is clearly the quick win; the second is more involved, and I only recommend it when your heap profiler tells you that fragmentation is causing problems.
For heap profiling, I suggest
valgrind massif (see Using Massif and ms_print)
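For the custom-allocator option above, here is a minimal sketch (FixedArena and ArenaAllocator are made-up names, and this is not production code - a real pool such as Boost Pool also handles deallocation and thread safety): a bump allocator over one preallocated buffer, so containers using it never touch the general-purpose heap.

#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical bump arena: hands out aligned chunks from one preallocated buffer.
class FixedArena {
public:
    FixedArena(char* buffer, std::size_t size) : end_(buffer + size), cur_(buffer) {}

    void* allocate(std::size_t bytes, std::size_t alignment)
    {
        std::size_t space = static_cast<std::size_t>(end_ - cur_);
        void* p = cur_;
        if (std::align(alignment, bytes, p, space) == nullptr)
            throw std::bad_alloc();                       // arena exhausted
        cur_ = static_cast<char*>(p) + bytes;
        return p;
    }
    // Individual frees are a no-op; the arena is released all at once.

private:
    char* end_;
    char* cur_;
};

// Minimal C++11 allocator that forwards to the arena.
template <typename T>
struct ArenaAllocator {
    using value_type = T;

    explicit ArenaAllocator(FixedArena& arena) : arena_(&arena) {}
    template <typename U>
    ArenaAllocator(const ArenaAllocator<U>& other) : arena_(other.arena_) {}

    T* allocate(std::size_t n) { return static_cast<T*>(arena_->allocate(n * sizeof(T), alignof(T))); }
    void deallocate(T*, std::size_t) {}                   // reclaimed when the arena goes away

    FixedArena* arena_;
};

template <typename T, typename U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) { return a.arena_ == b.arena_; }
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) { return !(a == b); }

void example()
{
    static char storage[64 * 1024];                       // carved out once, at startup
    static FixedArena arena(storage, sizeof storage);
    ArenaAllocator<int> alloc(arena);
    std::vector<int, ArenaAllocator<int>> v(alloc);
    v.reserve(1000);                                      // comes out of the arena, not the heap
}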
One good way to minimize repeated memory allocation and reallocation with std::vector is to make liberal use of std::vector::reserve() if you have some idea of how many elements your vector will hold. That preallocates capacity and prevents the internal array the vector maintains from being resized as you add elements via push_back().
No, std::vector guarantees contiguous storage. You can use vector::reserve() to avoid reallocations as the vector size increases though.
Related
I'm working on a largish project, and we are having some memory issues. Vectors have been used for all arrays, and a quick search suggests there are about 2000 member vectors.
Going through the code, it seems nobody has ever used reserve() or swap() (we're not on C++11 yet for this project).
Are there any tools or techniques I can do to find out how much memory is being lost in these vectors?
Use valgrind for debugging memory issues:
http://valgrind.org/docs/manual/ms-manual.html
One quick-but-dirty trick to see the effect of capacity on memory would be to modify std::vector (or typedef a custom vector type in place of std::vector).
The idea is to make this custom vector grow its capacity by exactly what is needed instead of doubling it (yes, it will be extremely slow), and then see how the application's memory usage changes when you run it with this custom vector; a sketch follows below.
While not useful for actually optimizing the code, it at least quickly gives you an idea of how much you could gain by optimizing your vectors.
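A minimal sketch of that trick (TightVector is a made-up name, and publicly deriving from std::vector is only acceptable because this is throwaway measurement code): every push_back grows the capacity by exactly one element, so the process's memory use reflects sizes rather than capacities.

#include <vector>

// Quick-and-dirty measurement aid: capacity tracks size as closely as the
// implementation allows, making over-allocation visible in memory profiles.
template <typename T>
class TightVector : public std::vector<T> {
public:
    void push_back(const T& value)
    {
        if (this->size() == this->capacity())
            this->reserve(this->size() + 1);   // request exactly one more slot
        std::vector<T>::push_back(value);
    }
};

// typedef TightVector<Foo> FooVector;   // temporarily swap in for std::vector<Foo>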
Just add some periodic logging lines that print the vector size, capacity and
sizeof(v) + sizeof(element_type) * v.capacity();
for each of your vectors v (that last expression is essentially the vector's memory footprint). You could register all your vectors somewhere central to keep this tidy; a sketch of such a helper follows this answer.
Then you can do some analysis by searching through your logfiles - to see which ones are using significant amounts of memory and how the usage varies over time. If it is only peak usage that is high, then you may be able to 'resize' your vectors to get rid of the spare capacity.
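A minimal sketch of the logging helper described above (log_vector_stats is a made-up name):

#include <cstddef>
#include <cstdio>
#include <vector>

// Print one line per vector per pass; grep the log afterwards.
template <typename T>
void log_vector_stats(const char* name, const std::vector<T>& v)
{
    std::size_t bytes = sizeof(v) + sizeof(T) * v.capacity();
    std::printf("%s size=%zu capacity=%zu approx_bytes=%zu\n",
                name, v.size(), v.capacity(), bytes);
}

// Call it periodically, e.g. once per frame or once per second:
//   log_vector_stats("particles", particles);
//   log_vector_stats("indices", indices);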
I'm using a
vector<vector<size_t>> Ar;
structure. The contents of the structure change over time and, in particular, the length of each nested vector is random and changes over time. Order is important, and I cannot drop a nested vector just because it is empty. I know the maximum size of the nested vectors (say m) and of the outer vector (say n).
I'm having some difficulty getting the initialization right. If I use
Ar(n);
there is no problem, but I end up with memory fragmentation because the allocator does not know the size of each nested vector. I would like to avoid this if possible, because I don't know what impact it will have as the size of the data I'm handling increases. I try to get around the fragmentation by fixing the length of the nested vectors in advance to get a compact representation, but I'm having trouble doing this. I use
Ar(n,vector<size_t>(m));
but this is super slow and a massive waste of memory, because most of the entries will not be used.
I have successfully implemented this with a
vector<list<size_t> > Ar(n);
without suffering fragmentation, but it runs much slower than using nested vectors. A fixed representation such as boost::multi_array would take up too much space, for the same reason as the second initialization above, and it would be harder to implement because I would need to keep track of where the useful entries stop.
Any suggestions? Thanks in advance.
You don't know if memory fragmentation is a problem until you've profiled your code with a typical use case.
Unless m is very small compared to n, I don't think it will be a bottleneck at all, since you still have mostly sequential memory accesses.
If you want to avoid it anyway, you can use reserve instead of resize or initialization with m objects. It only allocates memory, without the overhead of constructing objects that will never be used, which speeds up initialization (a sketch follows this answer).
Moreover, reserving capacity for the vectors will likely only consume virtual memory, not "real" memory, until you actually use it.
And if you know the distribution of the inner vectors' sizes, using the mean value as the default length may help reduce wasted memory.
In any case, std::list wastes more space and is far worse where fragmentation is concerned.
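A minimal sketch of that reserve-based initialization, with n and m as in the question (make_ar is a made-up helper):

#include <cstddef>
#include <vector>

std::vector<std::vector<std::size_t>> make_ar(std::size_t n, std::size_t m)
{
    std::vector<std::vector<std::size_t>> ar(n);   // n empty inner vectors
    for (std::size_t i = 0; i < n; ++i)
        ar[i].reserve(m);                          // capacity m, size 0, nothing constructed
    return ar;
}

Note that this still performs one allocation per inner vector; it only avoids constructing (and later overwriting) m elements per row.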
Perhaps the resize function will help you. See here for details.
I have just read a question regarding initializing multidimensional vectors (question), and Viktor Sehr and Sbi recommended instead using a single vector and getting at an element with my_vector[x+y*100+z*100*100]. Why is this? Is it for performance reasons? If so, how does it improve performance? Thanks in advance, ell.
Edit: Do these reasons still apply when the width/height/depth are not the same and can change?
Just a few reasons:
It wastes space, it is slow (unpredictable memory access, cache waste, etc.), and it's cumbersome.
The main performance drawback is likely to be caching. With flat arrays you are guaranteed the memory is contiguous - the cache is happy. With a vector of vectors - who knows!
This advice is sound if you're looking at a bottleneck here. If memory usage or speed of access of this vector are not critical, just go down the easiest road.
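A minimal sketch of the flat-storage approach (Grid3D is a made-up name), with runtime dimensions so it also covers the edit about width/height/depth differing:

#include <cstddef>
#include <vector>

// One contiguous allocation, indexed as x + y*W + z*W*H (row-major).
class Grid3D {
public:
    Grid3D(std::size_t w, std::size_t h, std::size_t d)
        : w_(w), h_(h), data_(w * h * d) {}

    int& operator()(std::size_t x, std::size_t y, std::size_t z)
    {
        return data_[x + y * w_ + z * w_ * h_];
    }

private:
    std::size_t w_, h_;
    std::vector<int> data_;   // single block, so iteration stays cache friendly
};

// Grid3D g(100, 200, 50);    // dimensions need not be equal
// g(3, 4, 5) = 42;

As a later answer points out, though, resizing any dimension of this layout means rebuilding the whole block.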
You should have a look at Boost.MultiArray, which gives you the best of both worlds.
If for whatever reason you cannot use Boost, I'd definitely typedef it:
typedef vector<vector<vector<int> > > My3DIntVector;
My3DIntVector v;
…Viktor Sehr and Sbi recommended instead using a single vector and getting the element with my_vector[x+y*100+z*100*100]. Why is this?
Given the dimensions, it's a logical recommendation if the sizes are fixed.
Is it for performance reasons? If so, how does it improve performance?
Consider:
the number of allocations required to create all arrays
the time to copy even one dimension
the complexity it adds to the system's allocator
the time it takes to free
the complexity of common operations, such as filling
Edit: Do these reasons still apply when the width/height/depth are not the same and can change?
Resizing this (massive!) array can be extremely slow. You have to understand how your program will operate if you want it to be fastest. The copy and destruction complexity of elements is also a consideration (when using something more complex than int). If you do a lot of resizing or insertions/deletions, then the flattened vector can be very slow.
However, if its dimensions are fixed, you can do much better than std::vector. std::array is one alternative. (If you go the std::array route, be careful how much you allocate on the stack.)
The only thing I can imagine is that a single flat vector is one big block of memory, which helps prevent memory fragmentation and is much easier to cache.
A vector<vector<vector<int> > > contains a lot of chunks of memory: one for the outer vector, one for each vector<vector<int> > element, and one for each inner vector<int>. This is not cache friendly and can produce hard-to-predict memory usage.
std::realloc is dangerous in C++ if the malloc'd memory contains non-POD types. It seems the only problem is that std::realloc won't call the types' copy constructors or destructors if it cannot grow the memory in situ.
A trivial workaround would be a try_realloc function. Instead of malloc'ing new memory when it cannot grow in situ, it would simply return false. In that case new memory could be allocated, the objects copied (or moved) to the new memory, and finally the old memory freed.
This seems supremely useful. std::vector could make great use of this, possibly avoiding all copies/reallocations.
Preemptive flame retardant: technically that is the same big-O performance, but if vector growth is a bottleneck in your application, a 2x speed-up is nice even if the big-O remains unchanged.
BUT, I cannot find any C API that works like try_realloc.
Am I missing something? Is try_realloc not as useful as I imagine? Is there some hidden bug that makes try_realloc unusable?
Better yet, is there some less-documented API that performs like try_realloc?
NOTE: I'm obviously in library/platform-specific territory here. I'm not worried, as try_realloc is inherently an optimization.
Update:
Following Steve Jessop's comments on whether vector would be more efficient using realloc, I wrote up a proof of concept to test. The realloc-vector simulates a vector's growth pattern but has the option to realloc instead. I ran the program up to a million elements in the vector.
For comparison, a vector must allocate 19 times while growing to a million elements.
The results: if the realloc-vector is the only thing using the heap, the results are awesome: 3-4 allocations while growing to a million bytes.
If the realloc-vector is used alongside a vector that grows at 66% of the realloc-vector's rate, the results are less promising: 8-10 allocations during growth.
Finally, if the realloc-vector is used alongside a vector that grows at the same rate, the realloc-vector allocates 17-18 times, barely saving one allocation over the standard vector behaviour.
I don't doubt that a hacker could game allocation sizes to improve the savings, but I agree with Steve that the tremendous effort to write and maintain such an allocator isn't worth the gain.
vector generally grows in large increments. You can't do that repeatedly without relocating unless you carefully arrange things so that there's a large extent of free addresses just above the vector's internal buffer (which in effect requires allocating whole pages, because obviously you can't have other allocations later on the same page).
So I think that in order to get a really good optimization here, you need more than a "trivial workaround" that does a cheap reallocation if possible - you have to somehow do some preparation to make it possible, and that preparation costs you address space. If you only do it for certain vectors, ones that indicate they're going to become big, then it's fairly pointless, because they can indicate with reserve() that they're going to become big. You can only do it automatically for all vectors if you have a vast address space, so that you can "waste" a big chunk of it on every vector.
As I understand it, the reason that the Allocator concept has no reallocation function is to keep it simple. If std::allocator had a try_realloc function, then either every Allocator would have to have one (which in most cases couldn't be implemented, and would just have to return false always), or else every standard container would have to be specialized for std::allocator to take advantage of it. Neither option is a great Allocator interface, although I suppose it wouldn't be a huge effort for implementers of almost all Allocator classes just to add a do-nothing try_realloc function.
If vector is slow due to re-allocation, deque might be a good replacement.
You could implement something like the try_realloc you proposed, using mmap with MAP_ANONYMOUS and MAP_FIXED and mremap with MREMAP_FIXED; a sketch follows the quote below.
Edit: just noticed that the man page for mremap even says:
mremap() uses the Linux page table scheme. mremap() changes the mapping between virtual addresses and memory pages. This can be used to implement a very efficient realloc(3).
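A Linux-only sketch of that idea (page_alloc and try_grow_in_place are made-up names): calling mremap() without MREMAP_MAYMOVE means the kernel either extends the mapping in place or refuses, which is essentially the try_realloc behaviour asked about. It assumes the block came from mmap().

#define _GNU_SOURCE 1      // mremap() is a GNU extension (g++ usually defines this already)
#include <sys/mman.h>
#include <cstddef>

// Allocate a growable block straight from the kernel.
void* page_alloc(std::size_t bytes)
{
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Try to grow the mapping without moving it. Returns true on success;
// on failure the caller falls back to allocate-copy-free.
bool try_grow_in_place(void* p, std::size_t old_bytes, std::size_t new_bytes)
{
    // No MREMAP_MAYMOVE: extend in place or fail, never relocate.
    return mremap(p, old_bytes, new_bytes, 0) != MAP_FAILED;
}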
realloc in C is hardly more than a convenience function; it has very little benefit for performance/reducing copies. The main exception I can think of is code that allocates a big array then reduces the size once the size needed is known - but even this might require moving data on some malloc implementations (ones which segregate blocks strictly by size) so I consider this usage of realloc really bad practice.
As long as you don't reallocate your array every time you add an element, but instead grow it exponentially (e.g. by 25%, 50%, or 100%) whenever you run out of space, manually allocating new memory, copying, and freeing the old will yield roughly the same performance as realloc (and identical performance where memory fragmentation is concerned). This is surely the approach that C++ STL implementations use, so I think your whole concern is unfounded (a sketch follows this answer).
Edit: The one (rare but not unheard-of) case where realloc is actually useful is for giant blocks on systems with virtual memory, where the C library interacts with the kernel to relocate whole pages to new addresses. The reason I say this is rare is because you need to be dealing with very big blocks (at least several hundred kB) before most implementations will even enter the realm of dealing with page-granularity allocation, and probably much larger (several MB maybe) before entering and exiting kernelspace to rearrange virtual memory is cheaper than simply doing the copy. Of course try_realloc would not be useful here, since the whole benefit comes from actually doing the move inexpensively.
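A minimal sketch of that growth policy in plain C++ (IntBuffer is a made-up type), growing by 50% and doing the allocate-copy-free by hand:

#include <algorithm>
#include <cstddef>

// Grows by 50% when full: the manual equivalent of what realloc would do on a move.
struct IntBuffer {
    int*        data = nullptr;
    std::size_t size = 0;
    std::size_t cap  = 0;

    void push_back(int value)
    {
        if (size == cap) {
            std::size_t new_cap = cap ? cap + cap / 2 : 8;   // exponential growth
            int* new_data = new int[new_cap];
            std::copy(data, data + size, new_data);          // copy to the new block
            delete[] data;                                   // free the old one
            data = new_data;
            cap  = new_cap;
        }
        data[size++] = value;
    }

    ~IntBuffer() { delete[] data; }
};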
I'm doing some graphics programming and I'm using Vertex pools. I'd like to be able to allocate a range out of the pool and use this for drawing.
What's different about the solution I need compared to a C allocator is that I never call malloc. Instead I preallocate the array and then need an object that wraps it up, keeps track of the free space, and allocates a range (a pair of begin/end pointers) from the allocation I pass in.
Much thanks.
In general: you're looking for a memory manager that uses a memory pool (see Wikipedia), like the boost::pool suggested in TokenMacGuy's answer. They come in many flavours. Important considerations:
block size (fixed or variable; number of different block sizes; can the block-size usage be predicted, at least statistically?)
efficiency (some managers use 2^n block sizes, e.g. in network stacks where they search for the best-fit block; very good performance and no fragmentation at the cost of wasted memory)
administration overhead (I presume you'll have many very small blocks, so the number of ints and pointers maintained by the memory manager matters for efficiency)
In the case of boost::pool, I think the simple segregated storage is worth a look.
It will allow you to configure a memory pool with many different block sizes for which a best match is searched.
boost::pool does this for you very nicely!
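For reference, a minimal sketch of boost::pool in use (assuming Boost is available; Vertex is a made-up type). The chunk size is fixed per pool, which suits the many-identical-blocks case:

#include <boost/pool/pool.hpp>
#include <new>

struct Vertex { float x, y, z; };

boost::pool<> vertex_pool(sizeof(Vertex));   // every chunk is sizeof(Vertex) bytes

void example()
{
    void* raw = vertex_pool.malloc();                 // O(1) grab from the pool's free list
    Vertex* v = new (raw) Vertex{1.0f, 2.0f, 3.0f};   // construct in place (check raw for null in real code)
    v->~Vertex();                                     // destroy before handing the chunk back
    vertex_pool.free(raw);
}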
Instead I preallocate the array and then need an object that wraps it up, keeps track of the free space, and allocates a range (a pair of begin/end pointers) from the allocation I pass in.
That's basically what malloc() does internally (malloc() can increase the size of this "preallocated array" if it gets full, though). So yes, there is an algorithm for it. There are many, in fact, and Wikipedia gives a basic overview. Different strategies can work better in different situations. (E.g. if all the blocks are a similar size, or if there's some pattern to allocation and freeing)
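As a starting point before reaching for a full allocator, here is a minimal sketch of the wrapper described in the question (RangePool is a made-up name): a bump allocator that hands out [begin, end) ranges from the preallocated vertex array and only supports releasing everything at once.

#include <cstddef>
#include <utility>

// Hands out contiguous ranges from a caller-supplied buffer; reset() frees everything.
template <typename Vertex>
class RangePool {
public:
    RangePool(Vertex* buffer, std::size_t count)
        : begin_(buffer), end_(buffer + count), next_(buffer) {}

    // Returns {nullptr, nullptr} when not enough contiguous space is left.
    std::pair<Vertex*, Vertex*> allocate(std::size_t count)
    {
        if (static_cast<std::size_t>(end_ - next_) < count)
            return {nullptr, nullptr};
        Vertex* first = next_;
        next_ += count;
        return {first, next_};
    }

    void reset() { next_ = begin_; }   // release the whole pool in one go

private:
    Vertex* begin_;
    Vertex* end_;
    Vertex* next_;
};

Supporting per-range frees means keeping a free list and coalescing neighbouring ranges, which is where the "not an easy task" warning below applies.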
If you have many objects of the same size, look into obstacks.
You probably don't want to write the code yourself, it's not an easy task and bugs can be painful.