Using STL Allocator with STL Vectors

Here's the basic problem. There's an API which I depend on, with a method using the following syntax:
void foo_api (std::vector<type>& ref_to_my_populated_vector);
The area of code in question is rather performance intensive, and I want to avoid using the heap to allocate memory. As a result, I created a custom allocator which allocates the memory required for the vector on the stack. So, I can now define a vector as:
// Create the stack allocator, with room for 100 elements
my_stack_allocator<type, 100> my_allocator;
// Create the vector, specifying our stack allocator to use
std::vector<type, my_stack_allocator<type, 100>> my_vec(my_allocator);
This is all fine. Performance tests show the stack-allocated vector is roughly 4x faster than the standard vector. The problem is, I can't call foo_api! So...
foo_api(my_vec); // Results in an error due to incompatible types.
// Can't convert std::vector<type> to std::vector<type, allocator>
Is there a solution to this?

You have to use the default allocator just as the function expects. You have two different types, and there's no way around that.
Just call reserve prior to operating on the vector to get the memory allocations out of the way.
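For instance, a minimal sketch of that approach (int stands in for the question's element type, and the foo_api body is a stand-in, since neither is shown in full):

#include <vector>

void foo_api(std::vector<int>& v) { /* stand-in for the question's API */ }

int main() {
    std::vector<int> my_vec;   // default allocator, the type foo_api expects
    my_vec.reserve(100);       // one up-front heap allocation, outside the hot path
    // ... populate my_vec; no reallocation occurs until it grows past 100 elements
    foo_api(my_vec);           // the types now match
}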
Think about the bad things that could happen. That function may take your vector and start adding more elements. Soon, you could overflow the stack space you've allocated; oops!
If you're really concerned about performance, a much better route is to replace operator new and kin with a custom memory manager. I have done so, and allocations can be hugely improved: for me, allocating blocks of 512 bytes or less takes about 4 operations (moving a couple of pointers around), using a pool allocator.
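For illustration, here is a minimal sketch of that direction (my own illustration, not the actual memory manager described above; it ignores thread safety, over-aligned types, and the array/nothrow overloads that a production replacement must also provide):

#include <cstdlib>
#include <new>

namespace {
constexpr std::size_t kSmall = 512;
void* g_free_list = nullptr;  // recycled small blocks, linked through their first word

void* raw_alloc(std::size_t bytes) {
    // Prefix each block with its rounded-up size so operator delete knows the bucket.
    auto* base = static_cast<std::size_t*>(std::malloc(bytes + sizeof(std::size_t)));
    if (!base) throw std::bad_alloc();
    *base = bytes;
    return base + 1;
}
}  // unnamed namespace

void* operator new(std::size_t size) {
    if (size <= kSmall) {
        if (g_free_list) {                     // pop the free list: a few pointer moves
            void* p = g_free_list;
            g_free_list = *static_cast<void**>(p);
            return p;
        }
        return raw_alloc(kSmall);              // all small blocks share one size class
    }
    return raw_alloc(size);
}

void operator delete(void* p) noexcept {
    if (!p) return;
    std::size_t* base = static_cast<std::size_t*>(p) - 1;
    if (*base == kSmall) {                     // small block: cache it for reuse
        *static_cast<void**>(p) = g_free_list;
        g_free_list = p;
    } else {
        std::free(base);                       // large block: return it to the system
    }
}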

Related

Preallocate memory for dynamic data structure

I have a question/curiosity.
Let's say I want to implement a list; for example, I could basically use the Cormen book approach, where insertion, deletion, key search, and so on are explained.
However, nothing is said where memory use is concerned. For example, to insert an integer into a list of integers, I could first create a node (allocating memory there), store the integer, and then insert the node into the list. To delete an integer, once I know which node it is stored in, I have to free the memory.
I was now wondering whether it would instead be more convenient to preallocate memory to store, say, 10 nodes, and keep a pointer to a free node to be used. If the memory pool is full, I reallocate memory for 20 nodes; if the pool is mostly unused, I halve its size (and so on and so forth). The pool is of course more complicated to manage, since I'd need, for example, to handle possible memory fragmentation.
Does what I'm saying make any sense, or is it nonsense? I've read in a game programming book that memory preallocation can improve performance, but I was wondering how.
This is both a simple and a complex question. If you operate within standard problems, you don't really need to worry about memory allocation; for example, preallocating memory for 10 nodes won't be efficient at any scale, and your performance problems might be elsewhere. However, if your program constantly allocates and deallocates hundreds or thousands of small objects per second, it could lead to memory fragmentation, and you might need to write your own custom allocator.
Almost none of the standard containers have a method to preallocate element storage; the exception is std::vector::reserve. All of them, however, let you supply a custom allocator in their constructors. There is also the placement new operator.
You could experiment with such things; they're fun to write. Just don't use them in production unless you absolutely have to.
I was now wondering whether it would instead be more convenient to preallocate memory to store, say, 10 nodes, and keep a pointer to a free node to be used.
You are basically describing what a pool allocator usually does (I assume you are talking about nodes of constant size). So the short answer to your question is: yes, you would improve performance by using a pool allocator with a list container.
The memory allocators shipped with common compilers are quite good for general-purpose allocation (i.e. for allocating objects of random sizes). However, when you need to allocate objects of a constant size, you should consider a custom pool allocator; it is easy to see why a constant-size allocator performs faster than a general-purpose one.
You could write your own pool allocator, but it's not an easy task, and you'd do better to use an existing one, such as Boost's pool_allocator or fast_pool_allocator.
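A short usage sketch (assuming Boost is installed; the header and names are from Boost.Pool):

#include <list>
#include <boost/pool/pool_alloc.hpp>

int main() {
    // fast_pool_allocator suits node-based containers: every list node has the
    // same size, so each allocation is essentially a free-list pop.
    std::list<int, boost::fast_pool_allocator<int>> numbers;
    for (int i = 0; i < 1000; ++i)
        numbers.push_back(i);
    numbers.clear();  // nodes go back to the pool for reuse, not to the system heap
}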

Is it possible to implement a memory pool that works with arrays instead of single objects?

I know it's easy to make a memory pool for single objects, however I need to make a memory pool for arrays. The memory pool I have currently has a vector of addresses to contiguous memory blocks and a stack that points to each object from these blocks, so when you allocate from the pool you just pop the stack and when you free, you just push an object's address back to it. However I also need an array equivalent. Something like this:
template<typename T>
class ArrayPool
{
public:
    ArrayPool();
    ~ArrayPool();
    T* AllocateArray(int x);       // Returns a pointer to a T array that contains 'x' elements.
    void FreeArray(T* arr, int x); // Returns the array to the free address list/stack/whatever.
};
Has such a thing been implemented? I can see a big problem with such a pool: if I make sure the arrays returned by AllocateArray are contiguous in memory, I'm basically doing the same as not having a memory pool at all, just allocating arrays on the spot. With the normal object pool, I only ever allocate one object at a time. With arrays, I may allocate a different-sized array every time, so once an array is freed it won't be compatible with a new one of a different size, unless I stitch arrays together with some linked-list-like structure, but then they won't be contiguous.
Currently your allocator takes advantage of the fact that all allocations are the same size. This simplifies and speeds up allocation and freeing, and means memory fragmentation is impossible.
If you have to allocate arrays of any size, then what you want is a general-purpose allocator, not a pool allocator. What to do next depends why you're using a pool allocator in the first place. I can think of two other features of a pool allocator that might be relevant, and there may be others:
all memory comes from a particular region specified when you create the pool
all memory can be freed at once without freeing each individual allocation, by resetting the pool.
If you don't need any special features of controlling allocation yourself then just use vector or global operator new or malloc to allocate your memory. If you do need special features then you'll probably want to take an allocator off the shelf rather than implementing your own. If you really want to get into the details of how a good memory allocator works then look at http://g.oswego.edu/dl/html/malloc.html and perhaps adapt it to your use.
But if you really need to hand-roll an allocator for limited purposes, then the basic idea is that instead of a list of free nodes from which you can always take the first, you need some data structure (your choice what) containing free blocks of different sizes, that allows you to quickly find a block that's big enough to satisfy the current request. In the case where it's much bigger you might choose to split the block, return part of it, and keep the rest as a new smaller free block. In the case where two free blocks are adjacent you might choose to merge them into a single larger free block.
One common strategy is to keep pool-like lists of blocks of certain sizes (for example 16, 32, 64...). If the request is small enough, satisfy it using one of these. If not, do something more complex. But as I say, if you want to see a lot of tricks working together then look at dlmalloc.
What you could do is have fixed sizes and only work with those: for example, 400 32-byte arrays, 200 128-byte ones, 100 1024-byte ones, 50 8096-byte ones, or something like that. When something asks for an array of size N, you match it to the closest size that has a free array.
How many you need of each size is probably up for a lot of tweaking.
That would allow you to reuse arrays much more freely than allowing custom sizes would; a sketch of the idea follows.
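A minimal sketch of that bucketing idea, reusing the question's ArrayPool interface (the bucket sizes are illustrative element counts, and the fixed per-bucket quotas are left out for brevity):

#include <array>
#include <cstddef>
#include <vector>

template <typename T>
class ArrayPool {
    static constexpr std::array<std::size_t, 4> kBuckets{32, 128, 1024, 8192};
    std::array<std::vector<T*>, 4> free_;       // cached arrays, one list per bucket

    static int bucket_for(std::size_t n) {
        for (std::size_t i = 0; i < kBuckets.size(); ++i)
            if (n <= kBuckets[i]) return static_cast<int>(i);
        return -1;                              // too large for any bucket
    }

public:
    T* AllocateArray(std::size_t n) {
        int b = bucket_for(n);
        if (b < 0) return new T[n];             // oversized: plain allocation
        if (!free_[b].empty()) {                // reuse a cached array
            T* p = free_[b].back();
            free_[b].pop_back();
            return p;
        }
        return new T[kBuckets[b]];              // allocate a full-sized bucket
    }

    void FreeArray(T* arr, std::size_t n) {
        int b = bucket_for(n);
        if (b < 0) { delete[] arr; return; }    // oversized arrays are freed directly
        free_[b].push_back(arr);                // cache the bucket for later reuse
    }

    ~ArrayPool() {
        for (auto& list : free_)
            for (T* p : list)
                delete[] p;
    }
};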
What exactly are you trying to gain from this? Why isn't it enough just to treat each array as an object? Unless you are direly strapped for memory, or the time to construct the array elements is really excessive and not to be wasted, this sounds like a classic case of premature optimization. And if the above are your problems, I'd explore other data structures (not arrays) first before plunging into this.
Your time (getting this working and its quirks ironed out will be a week or so, methinks) is way more valuable than a few pennies of computer time or memory saved.

C++ layout of an object containing containers

As someone with a lot of assembler language experience and old habits to lose, I recently did a project in C++ using a lot of the features that C++03 and C++11 have to offer (mostly the container classes, including some from Boost). It was surprisingly easy, and I tried wherever I could to favor simplicity over premature optimization. As we move into code review and performance testing, I'm sure some of the old hands will have aneurysms at not seeing exactly how every byte is manipulated, so I want to have some advance ammunition.
I defined a class whose instance members contain several vectors and maps. Not "pointers to" vectors and maps. And I realized that I haven't got the slightest idea how much contiguous space my objects take up, or what the performance implications might be for frequently clearing and re-populating these containers.
What does such an object look like, once instantiated?
Formally, there aren't any constraints on the implementation other than those specified in the standard, with regards to interface and complexity. Practically, most, if not all, implementations derive from the same code base, and are fairly similar.
The basic implementation of vector is three pointers. The actual memory for the objects in the vector is dynamically allocated. Depending on how the vector was "grown", the dynamic area may contain extra memory; the three pointers point to the start of the memory, the byte after the last byte currently used, and the byte after the last byte allocated. Perhaps the most significant aspect of the implementation is that it separates allocation and initialization: the vector will, in many cases, allocate more memory than is needed, without constructing objects in it, and will only construct the objects when needed. In addition, when you remove objects, or clear the vector, it will not free the memory; it will only destruct the objects, and will change the pointer to the end of the used memory to reflect this. Later, when you insert objects, no allocation will be needed.
When you add objects beyond the amount of allocated space, vector will allocate a new, larger area; copy the objects into it; then destruct the objects in the old space and delete it. Because of the complexity constraints, vector must grow the area exponentially, by multiplying the size by some fixed constant (1.5 and 2 are the most common factors), rather than by incrementing it by some fixed amount. The result is that if you grow the vector from empty using push_back, there will not be too many reallocations and copies; another result is that the vector can end up using almost twice as much memory as necessary. These issues can be avoided if you preallocate using std::vector<>::reserve().
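You can watch this growth policy directly; a minimal demonstration (the exact capacities printed are implementation-dependent):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::vector<int>::size_type last = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last) {             // a reallocation just happened
            last = v.capacity();
            std::cout << "size " << v.size() << " -> capacity " << last << '\n';
        }
    }
    v.clear();                                  // destroys the elements...
    std::cout << "after clear(): capacity is still " << v.capacity() << '\n';
}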
As for map, the complexity constraints and the fact that it must be ordered mean that some sort of balanced tree must be used. In all of the implementations I know, this is a classical red-black tree: each entry is allocated separately, in a node which contains two or three pointers, plus maybe a boolean, in addition to the data.
I might add that the above applies to the optimized versions of the containers. The usual implementations, when not optimized, will add additional pointers to link all iterators to the container, so that they can be marked when the container does something which would invalidate them, and so that they can do bounds checking.
Finally: these classes are templates, so in practice, you have access to the sources, and can look at them. (Issues like exception safety sometimes make the implementations less straightforward than we might like, but the implementations with g++ or VC++ aren't really that hard to understand.)
A map is a binary tree (of some variety; I believe it's customarily a red-black tree), so the map itself probably only contains a pointer and some housekeeping data (such as the number of elements).
As with any other binary tree, each node will then contain two or three pointers: two for the left and right children, and perhaps one to the parent node, to avoid having to traverse the whole tree to find a node's predecessor.
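For illustration, a node in such a tree typically looks something like this (the field names are my own, not those of any particular library):

#include <utility>

template <typename Key, typename Value>
struct map_node {
    map_node* parent;                  // up-pointer, avoids full-tree traversals
    map_node* left;
    map_node* right;
    bool      is_red;                  // color flag, sometimes packed into a pointer
    std::pair<const Key, Value> data;  // the element itself, stored in the node
};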
In general, vector shouldn't be noticeably slower than a regular array, and certainly no worse than your own implementation of a variable size array using pointers.
A vector is a wrapper for an array. The vector class contains a pointer to a contiguous block of memory and knows its size. When you clear a vector, it usually retains its old buffer (this is implementation-dependent), so that the next time you reuse it there are fewer allocations. If you resize a vector beyond its current buffer size, it has to allocate a new buffer. Reusing and clearing the same vectors to store objects is therefore efficient (std::string behaves similarly). If you want to find out exactly how much memory a vector has allocated for its buffer, call the capacity function and multiply by the size of the element type. You can call the reserve function to increase the buffer size manually, in expectation of the vector taking more elements shortly.
Maps are more complicated so I don't know. But if you need an associative container, you would have to use something complicated in C too, right?
Just wanted to add a few things to the others' answers that I think are important.
Firstly, in the implementations I've seen, sizeof(std::vector<T>) is constant: the vector object itself is made up of three pointers. Below is an excerpt from the GCC 4.7.2 STL headers, reduced to the relevant parts:
template<typename _Tp, typename _Alloc>
struct _Vector_base
{
    ...
    struct _Vector_impl : public _Tp_alloc_type
    {
        pointer _M_start;
        pointer _M_finish;
        pointer _M_end_of_storage;
        ...
    };
    ...
    _Vector_impl _M_impl;
    ...
};

template<typename _Tp, typename _Alloc = std::allocator<_Tp> >
class vector : protected _Vector_base<_Tp, _Alloc>
{
    ...
};
That's where the three pointers come from. Their names are self-explanatory, I think. But there is also a base class, the allocator, which takes me to my second point.
Secondly, std::vector<T, Allocator = std::allocator<T>> takes a second template parameter: a class that handles memory operations. It is through the functions of this class that the vector does its memory management. There is a default STL allocator, std::allocator<T>. It has no data members, only functions such as allocate, destroy, etc., and it bases its memory handling on new/delete. But you can write your own allocator and supply it to std::vector as the second template parameter. It has to conform to certain rules, such as providing the required functions, but how the memory management is done internally is up to you, as long as it does not violate the logic that std::vector relies on. It might introduce data members that add to sizeof(std::vector) through the inheritance shown above. And it gives you that "control over each bit".
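As a minimal sketch of such an allocator (C++11 style, where std::allocator_traits fills in the rest of the interface from these pieces; the name logging_allocator is my own):

#include <cstddef>
#include <new>
#include <vector>

template <typename T>
struct logging_allocator {
    using value_type = T;

    logging_allocator() = default;
    template <typename U>
    logging_allocator(const logging_allocator<U>&) {}  // allow rebinding between types

    T* allocate(std::size_t n) {
        // Custom bookkeeping, pools, counters, etc. would go here.
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) {
        ::operator delete(p);
    }
};

// Stateless allocators of the same type are interchangeable.
template <typename T, typename U>
bool operator==(const logging_allocator<T>&, const logging_allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const logging_allocator<T>&, const logging_allocator<U>&) { return false; }

// Usage: std::vector<int, logging_allocator<int>> v;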
Basically, a vector is just a pointer to an array, along with its capacity (total allocated memory) and size (actually used elements):
struct vector {
    Item* elements;
    size_t capacity;
    size_t size;
};
Of course thanks to encapsulation all of this is well hidden and the users never get to handle the gory details (reallocation, calling constructors/destructors when needed, etc) directly.
As to your performance questions regarding clearing, it depends how you clear the vector:
Swapping it with a temporary empty vector (the usual idiom) will delete the old array: std::vector<int>().swap(myVector);
Using clear() or resize(0) will erase all the items and keep the allocated memory and capacity unchanged.
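A quick demonstration of the difference (the printed capacities are implementation-dependent, since the standard leaves capacity after clear() unspecified):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v(1000, 42);
    v.clear();                            // destroys the elements...
    std::cout << v.capacity() << '\n';    // ...but typically still prints 1000
    std::vector<int>().swap(v);           // the swap idiom releases the buffer
    std::cout << v.capacity() << '\n';    // now 0
}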
If you are concerned about efficiency, IMHO the main point to consider is to call reserve() in advance (if you can) in order to pre-allocate the array and avoid useless reallocations and copies (or moves with C++11). When adding a lot of items to a vector, this can make a big difference (as we all know, dynamic allocation is very costly so reducing it can give a big performance boost).
There is a lot more to say about this, but I believe I covered the essential details. Don't hesitate to ask if you need more information on a particular point.
Concerning maps, they are usually implemented as red-black trees. But the standard doesn't mandate this: it only gives functional and complexity requirements, so any data structure that fits the bill is good to go. I have to admit I don't know exactly how RB-trees are implemented, but I guess that, again, a map contains at least a pointer and a size.
And of course, each and every container type is different (eg. unordered maps are usually hash tables).

Is there an STL Allocator that will not implicitly free memory?

Memory usage in my STL containers is projected to be volatile - that is to say, it will frequently shrink and grow. I plan to account for this by specifying an allocator in the STL container type declarations. I understand that pool allocators are meant to handle this type of situation, but my concern is that the volatility will be more than the pool accounts for, and that to overcome it I would have to do a lot of testing to determine good pool metrics.
My ideal allocator would never implicitly release memory, and in fact is perfectly acceptable if memory is only ever released upon destruction of the allocator. A member function to explicitly release unused memory would be nice, but not necessary. I know that what I'm referring to sounds like a per-object allocator and this violates the standard. I'd rather stick with the standard, but will abandon it if I can't resolve this within it.
I am less concerned with initial performance and more with average performance. Put another way, it matters less whether a single element or a pool of them is allocated at a time, and more whether said allocation results in a call to new/malloc. I have no problem writing my own allocator, but does anyone know of a preexisting one that accomplishes this? If it makes a difference, this will be for contiguous memory containers (e.g. vector, deque), although a generalized solution would be nice.
I hope this isn't too basic.
Memory is allocated and freed more as a result of adding items than of removing them.
I believe that never "freeing" memory isn't possible unless you know the maximum number of elements allowed by your application. The CRT might try to allocate a larger block of memory in place, but how would you handle the failure cases?
Explanation:
To create a dynamically expanding vector, there will be a larger capacity to handle most push_backs, and a reallocation to handle when this is insufficient.
During the reallocation, a new, larger piece of memory is "newed" up, and the elements of the old piece of memory are copied into the new one.
It is important that you don't hold any iterators while you push_back elements, because the reallocation will invalidate the memory locations the iterators point to.
In C++11 and TR1, you may have move semantics, where only the pointers to the elements need to be copied. This is done via a move constructor instead of a copy constructor.
However, you seem to want to avoid reallocation as much as possible. Using the default allocator for vector, you can reserve an initial capacity. Capacity is the memory allocated; size is the number of elements. Memory will only be allocated at construction and when the size reaches the capacity, which should only happen on a push_back(). The vector will increase its capacity by a multiple (e.g. 1.5 or 2.0), so that a loop that pushes back N elements does only O(N) total work. And if you know the number of elements ahead of time, you can allocate just once.
There are pool concepts you can explore. The idea with a pool is that you are not destroying elements, only deactivating them.
If you still want to write your own allocator, this is a good article: custom allocators
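As a sketch of the direction such an allocator could take (my own illustration, not taken from the linked article: freed blocks are cached in per-size free lists and only returned to the system when the shared pool is destroyed):

#include <cstddef>
#include <map>
#include <memory>
#include <new>
#include <vector>

struct retaining_pool {
    std::map<std::size_t, std::vector<void*>> free_lists;  // keyed by block size

    ~retaining_pool() {                       // the only point where memory is released
        for (auto& kv : free_lists)
            for (void* p : kv.second)
                ::operator delete(p);
    }
    void* allocate(std::size_t bytes) {
        auto& list = free_lists[bytes];
        if (!list.empty()) {                  // reuse a cached block of this size
            void* p = list.back();
            list.pop_back();
            return p;
        }
        return ::operator new(bytes);         // fall back to the heap
    }
    void deallocate(void* p, std::size_t bytes) {
        free_lists[bytes].push_back(p);       // cache instead of freeing
    }
};

template <typename T>
struct retaining_allocator {
    using value_type = T;
    std::shared_ptr<retaining_pool> pool;     // copies and rebound copies share the pool

    retaining_allocator() : pool(std::make_shared<retaining_pool>()) {}
    template <typename U>
    retaining_allocator(const retaining_allocator<U>& o) : pool(o.pool) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(pool->allocate(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) {
        pool->deallocate(p, n * sizeof(T));
    }
};

template <typename T, typename U>
bool operator==(const retaining_allocator<T>& a, const retaining_allocator<U>& b) {
    return a.pool == b.pool;
}
template <typename T, typename U>
bool operator!=(const retaining_allocator<T>& a, const retaining_allocator<U>& b) {
    return !(a == b);
}

// Usage: std::vector<int, retaining_allocator<int>> v;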

shrinking a vector

I've got a problem with my terrain engine (using DirectX).
I'm using a vector to hold the vertices of a detail block.
When the block increases in detail, so does the vector.
BUT, when the block decreases in detail, the vector doesn't shrink in size.
So, my question: is there a way to shrink the size of a vector?
I did try this:
vertexvector.reserve(16);
If you pop elements from a vector, it does not free memory (because that would invalidate iterators into the container elements). You can copy the vector to a new vector and then swap that with the original; that stops the original wasting space. The swap has constant time complexity, because a swap must not invalidate iterators to elements of the swapped vectors, so it just exchanges the internal buffer pointers.
vector<vertex>(a).swap(a);
It is known as the "Shrink-to-fit" idiom. Incidentally, the next C++ version includes a "shrink_to_fit()" member function for std::vector.
The usual trick is to swap with a temporary copy of the vector:
vector<vertex>(vertexvector.begin(), vertexvector.end()).swap(vertexvector);
The reserved memory is not reduced when the vector size is reduced, because that is generally better for performance. Shrinking the memory reserved by the vector is as expensive as increasing the size of the vector beyond the reserved size, in that it requires the implementation to:
Ask the allocator for a new, smaller memory location,
Copy the contents from the old location, and
Tell the allocator to free the old memory location.
In some cases, the allocator can resize an allocation in-place, but it's by no means guaranteed.
If you have had a very large change in the size required, and you know that you won't want that vector to expand again (the principle of locality suggests you will, but of course there are exceptions), then you can use litb's suggested swap operation to explicitly shrink your vector:
vector<vertex>(a).swap(a);
There is a member function for this, shrink_to_fit. It's more efficient than most other methods, since it will only allocate new memory and copy if there is a need. The details are discussed here:
Is shrink_to_fit the proper way of reducing the capacity a `std::vector` to its size?
If you don't mind the libc allocation functions, realloc is even more efficient: it won't copy the data on a shrink, it will just mark the extra memory as free; and if you grow the memory and there is free memory after it, it will mark the needed memory as used and not copy either. Be careful, though: you are moving out of C++ STL templates into C void pointers, and you need to understand how pointers and memory management work. It's frowned upon by many nowadays as a source of bugs and memory leaks.
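A sketch of what that looks like for a raw buffer (note that realloc only works on malloc-family allocations, never on memory owned by a std::vector):

#include <cstdlib>

int main() {
    // A raw buffer managed with the C allocation functions.
    int* buf = static_cast<int*>(std::malloc(1000 * sizeof(int)));
    if (!buf) return 1;
    // ... fill and use buf ...
    // Shrink: a typical realloc just marks the tail as free, without copying.
    if (int* smaller = static_cast<int*>(std::realloc(buf, 100 * sizeof(int))))
        buf = smaller;                 // on failure, buf would remain valid
    std::free(buf);
}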