What does "STL allocate memory internally" mean? - c++

I was reading this answer and, maybe because I have never encountered these words, I don't understand what the user meant in the first point of that answer. Can someone use simpler words or an example to show what that statement means?

When you use something like vector or map, it belongs to the STL (Standard Template Library). You don't need to allocate memory yourself as you do with arrays. In real-world programs fixed-size arrays are often not sufficient because the required size cannot be determined in advance.
STL containers allocate memory internally as you add elements to them, so memory management is handled for you. [If users allocate manually, the allocation might not be enough if too little is allotted, or memory gets wasted if too much is allotted.]
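To make "allocates memory internally" concrete, here is a minimal sketch. The function name count_reallocations is made up for illustration; it appends elements to a std::vector without ever stating a size, and counts how often the vector silently obtained a bigger buffer (seen as a change in capacity()). How often that happens depends on the standard library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends `count` integers and returns how many times the vector had to
// obtain a larger internal buffer (observed as a change in capacity()).
std::size_t count_reallocations(std::size_t count) {
    std::vector<int> values;                    // no size given up front
    std::size_t reallocations = 0;
    std::size_t last_capacity = values.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        values.push_back(static_cast<int>(i));  // the vector allocates as needed
        if (values.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = values.capacity();
        }
    }
    assert(values.size() == count);             // every element was stored
    return reallocations;
}
```

The caller never writes new or delete; the buffer is also released automatically when the vector goes out of scope.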

Related

Finding the size of a large boost::unordered_map

I would like to find the size of a boost::unordered_map I have that maps a std::string to a pointer to a class. I am doing a sizeof(unordered_map var). Is that right? Would it give me the space it occupies, including the housekeeping it takes up? I want to measure it to compare it to a std::map that will hold the same data, which I would also measure with sizeof(std::map var). I would like to know both to decide how much storage each occupies, and which is the better alternative to go with, comparing speed and space.
Please let me know if my way of calculating the sizes is right, will give me the actual/correct sizes, and will help me make the right decision.
Edit 1:
If my way of trying to get the size is wrong, please let me know how to get the correct size (inclusive of housekeeping).
TIA
-R
The sizeof() operator returns only the size of an object, but not the space it occupies on the heap (dynamically allocated memory). Since maps and strings may very well allocate memory on the heap, this will not help you.
There is no simple way to measure the total memory footprint of certain parts of your program. However, it is not impossible. One option is to use a custom allocator, which records its memory allocation and which you use for all objects related to the entities you want to measure (for the map and its objects including the strings).
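A rough sketch of the custom-allocator idea follows. All names here (CountingAllocator, live_bytes, CountedMap, bytes_used_by_map_of) are hypothetical, and the sketch uses int keys and values to stay short. It only counts memory requested through the allocator, so with std::string keys the strings would need the same allocator for their heap storage to be counted. The global counter is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Bytes currently held through CountingAllocator (single shared counter).
inline std::size_t& live_bytes() {
    static std::size_t bytes = 0;
    return bytes;
}

template <typename T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        live_bytes() += n * sizeof(T);           // record the request
        return std::allocator<T>{}.allocate(n);  // then defer to the default
    }
    void deallocate(T* p, std::size_t n) noexcept {
        live_bytes() -= n * sizeof(T);
        std::allocator<T>{}.deallocate(p, n);
    }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

using CountedMap = std::map<int, int, std::less<int>,
                            CountingAllocator<std::pair<const int, int>>>;

// Fills a map and reports the bytes it holds on the heap while it is alive.
std::size_t bytes_used_by_map_of(int element_count) {
    CountedMap m;
    for (int i = 0; i < element_count; ++i) m[i] = i;
    assert(m.size() == static_cast<std::size_t>(element_count));
    return live_bytes();   // includes per-node housekeeping, unlike sizeof(m)
}
```

The number reported is larger than element_count * sizeof(pair) because the map allocates whole nodes, which carry the tree's housekeeping as well as the key and value. sizeof(CountedMap) stays the same no matter how many elements are inserted.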
You're simply not going to be able to reliably calculate the amount of space used up by your map. There are types and storage you have no access to.
What you should do is ask a totally different question having to do with the problem you're trying to solve where you think this is necessary.

What advantages do arrays hold over vectors?

Well, after a full year of programming and only knowing of arrays, I was made aware of the existence of vectors (by some members of StackOverflow on a previous post of mine). I did a load of researching and studying them on my own and rewrote an entire application I had written with arrays and linked lists, with vectors. At this point, I'm not sure if I'll still use arrays, because vectors seem to be more flexible and efficient. With their ability to grow and shrink in size automatically, I don't know if I'll be using arrays as much. At this point, the only advantage I personally see is that arrays are much easier to write and understand. The learning curve for arrays is nothing, whereas there is a small learning curve for vectors. Anyway, I'm sure there's probably a good reason for using arrays in some situations and vectors in others; I was just curious what the community thinks. I'm entirely a novice, so I assume that I'm just not well-informed enough on the strict usages of either.
And in case anyone is even remotely curious, this is the application I'm practicing using vectors with. It's really rough and needs a lot of work: https://github.com/JosephTLyons/Joseph-Lyons-Contact-Book-Application
A std::vector manages a dynamic array. If your program needs an array that changes its size dynamically at run-time then you would end up writing code to do all the things a std::vector does, but probably much less efficiently.
What the std::vector does is wrap all that code up in a single class so that you don't need to keep writing the same code to do the same stuff over and over.
Accessing the data in a std::vector is no less efficient than accessing the data in a dynamic array because the std::vector functions are all trivial inline functions that the compiler optimizes away.
If, however, you need a fixed size then you can get slightly more efficient than a std::vector with a raw array. However, you won't lose anything by using a std::array in those cases.
I still use raw arrays when I need a temporary fixed-size buffer that isn't going to be passed around to other functions:
// some code
{ // new scope for temporary buffer
    char buffer[1024]; // buffer
    file.read(buffer, sizeof(buffer)); // use buffer
} // buffer is destroyed here
But I find it hard to justify ever using a raw dynamic array over a std::vector.
This is not a full answer, but one thing I can think of is that the "ability to grow and shrink" is not such a good thing if you know what you want. For example: assume you want to hold 1000 objects, but the memory will be filled at a rate that causes the vector to grow repeatedly. The overhead from growing will be costly when you could simply define a fixed-size array.
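For completeness, when the element count is known in advance, a vector can also avoid the regrowth cost by reserving its storage first. The sketch below (buffer_moved_after_reserve is an illustrative name, not part of any library) checks that the buffer does not move while filling a vector after reserve():

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fills a vector with `count` elements after reserving room for all of them,
// and reports whether the internal buffer moved while filling.
bool buffer_moved_after_reserve(std::size_t count) {
    std::vector<int> values;
    values.reserve(count);                  // one allocation up front
    const int* start = values.data();
    for (std::size_t i = 0; i < count; ++i)
        values.push_back(static_cast<int>(i));
    assert(values.size() == count);
    return values.data() != start;          // true only if a regrowth happened
}
```

This still uses one heap allocation, so a fixed array on the stack remains cheaper when the size is a small compile-time constant.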
Generally speaking: if you will use an array over a vector - you will have more power at your hands, meaning no "background" function calls you don't actually need (resizing), no extra memory saved for things you don't use (size of vector...).
Additionally, using memory on the stack (array) is faster than heap (vector*) as shown here
*as shown here it's not entirely precise to say vectors reside on the heap, but they sure hold more memory on the heap than the array (that holds none on the heap)
One reason is that if you have a lot of really small structures, small fixed-length arrays can be memory efficient.
Compare
struct point
{
    float coords[4];
};
with
struct point
{
    std::vector<float> coords;
};
Alternatives include std::array for cases like this. Also, std::vector implementations may over-allocate, meaning that if you want 4 slots, you might end up with memory allocated for 16 slots.
Furthermore, the memory locations will be scattered and hard to predict, hurting performance. Using an exceptionally large number of std::vectors may also lead to memory fragmentation issues, where new starts failing.
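The difference can be seen in a small sketch. The struct names PointRaw, PointArray and PointVector are made up here so that all three variants can coexist in one file; the exact size of a std::vector object is implementation-defined (commonly three pointers' worth).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct PointRaw    { float coords[4]; };             // four floats stored inline
struct PointArray  { std::array<float, 4> coords; }; // also inline, with value semantics
struct PointVector { std::vector<float> coords; };   // only a handle; floats live on the heap

// The inline variants must be at least as large as the data they hold.
static_assert(sizeof(PointRaw) >= 4 * sizeof(float), "floats are stored inline");
static_assert(sizeof(PointArray) >= 4 * sizeof(float), "floats are stored inline");

// Returns how many float slots the vector actually allocated for 4 elements.
std::size_t heap_slots_for_four_floats() {
    PointVector p;
    p.coords.resize(4);                // triggers a separate heap allocation
    assert(p.coords.size() == 4);
    return p.coords.capacity();        // at least 4; may be more
}
```

So each PointVector costs sizeof(PointVector) for the handle plus a separate heap block for the floats, while the other two cost only their inline storage.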
I think this question is best answered flipped around:
What advantages does std::vector have over raw arrays?
I think this list is more easily enumerable (not to say this list is comprehensive):
Automatic dynamic memory allocation
Proper stack, queue, and sort implementations attached
Integration with C++11-related syntactical features such as iterators
If you aren't using such features there's not any particular benefit to std::vector over a "raw array" (though, similarly, in most cases the downsides are negligible).
Despite me saying this, for typical user applications (i.e. running on windows/unix desktop platforms) std::vector or std::array is (probably) typically the preferred data structure because even if you don't need all these features everywhere, if you're already using std::vector anywhere else you may as well keep your data types consistent so your code is easier to maintain.
However, since at the core std::vector simply adds functionality on top of "raw arrays", I think it's important to understand how arrays work in order to take full advantage of std::vector or std::array (knowing when to use std::array being one example), so you can reduce the "carbon footprint" of std::vector.
Additionally, be aware that you are going to see raw arrays when working with
Embedded code
Kernel code
Signal processing code
Cache efficient matrix implementations
Code dealing with very large data sets
Any other code where performance really matters
The lesson shouldn't be to freak out and say "must std::vector all the things!" when you encounter this in the real world.
Also: THIS!!!!
One of the powerful features of C++ is that often you can write a class (or struct) that exactly models the memory layout required by a specific protocol, then aim a class-pointer at the memory you need to work with to conveniently interpret or assign values. For better or worse, many such protocols often embed small fixed sized arrays.
There's a decades-old hack for putting an array of 1 element (or even 0 if your compiler allows it as an extension) at the end of a struct/class, aiming a pointer to the struct type at some larger data area, and accessing array elements off the end of the struct based on prior knowledge of the memory availability and content (if reading before writing) - see What's the need of array with zero elements?
embedding arrays can localise memory access requirement, improving cache hits and therefore performance

Preallocate memory for dynamic data structure

I have a question/curiosity.
Let's say I want to implement a list, and for example I could basically use the approach from the Cormen book, where it is explained how to implement insert, delete, key search etc.
However, nothing is said as far as memory use is concerned. For example, if I would like to insert an integer in a list of integers, I could first create a node (I allocate memory there), store the integer in it and then insert the node in the list. If I would like to delete an integer, once I know in which node it is stored, I have to free the memory.
I was now wondering if instead it would be more convenient to preallocate memory to store, say, 10 nodes and keeping a pointer to a free node to be used. If the memory pool is full then I reallocate memory for 20 nodes; if the pool is too large I halve the size of the pool (and so on and so forth). The pool is of course more complicated to manage, since I'd need for example to handle possible memory fragmentation etc.
Does what I'm saying make any sense? Or is it nonsense? I've read in a book on game programming that memory preallocation could improve performance, but I was wondering how.
This is both a simple and a complex question. If you operate within standard problems, you don't really need to worry about memory allocation. For example, preallocating memory for 10 nodes won't be efficient in any scale, and your performance problems might be elsewhere. However, if your program constantly allocates and deallocates hundreds or thousands of small objects per second, it could lead to memory fragmentation, and you might need to write your custom allocator.
Almost no standard containers have a method to preallocate element storage; std::vector::reserve is the notable exception (std::string and the unordered containers also provide reserve). All of them, however, accept a custom allocator in their constructors. There is also the placement new operator.
You could try to experiment with such things, they're fun to write, just don't use them in production if you absolutely don't have to.
I was now wondering if instead it would be more convenient to preallocate memory to store, say, 10 nodes and keeping a pointer to a free node to be used.
You basically are describing what a pool allocator usually does (I assume you are talking about nodes of constant size). So, the short answer to your question is: yes you would improve performance by using a pool allocator with a list container.
Memory allocators shipped with common compilers are quite good for general-purpose allocation (i.e. for allocation of arbitrarily sized objects). However, when you need to allocate objects of constant size, you should consider using a custom pool allocator. A constant-size allocator can be faster than the standard one because it never has to search for a block of a suitable size: every free slot fits every request.
You might write your own pool allocator; however, it's not an easy task and you would do better to consider using an existing one, such as Boost's pool_allocator or fast_pool_allocator.
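To show why a constant-size pool is cheap, here is a minimal sketch of the idea. It is not Boost's implementation. Node and NodePool are hypothetical names, the pool has a fixed capacity and does not grow (growing would need a list of chunks so existing nodes never move), and it is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical list node holding an int.
struct Node {
    int value;
    Node* next;
};

class NodePool {
public:
    explicit NodePool(std::size_t capacity) : slots_(capacity), free_head_(nullptr) {
        // Thread every slot onto the free list.
        for (Node& slot : slots_) {
            slot.next = free_head_;
            free_head_ = &slot;
        }
    }
    NodePool(const NodePool&) = delete;             // copies would share slot pointers
    NodePool& operator=(const NodePool&) = delete;

    // Pops a node off the free list, or returns nullptr when the pool is exhausted.
    Node* allocate(int value) {
        if (free_head_ == nullptr) return nullptr;
        Node* node = free_head_;
        free_head_ = node->next;
        node->value = value;
        node->next = nullptr;
        return node;
    }

    // Pushes a node previously obtained from allocate() back onto the free list.
    void release(Node* node) {
        assert(node >= slots_.data() && node < slots_.data() + slots_.size());
        node->next = free_head_;
        free_head_ = node;
    }

private:
    std::vector<Node> slots_;   // one up-front allocation, never resized
    Node* free_head_;           // first free slot, or nullptr
};
```

Both allocate and release are a couple of pointer assignments, and because all slots are the same size, a freed slot can always be reused by the next request.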

Is it possible to implement a memory pool that works with arrays instead of single objects?

I know it's easy to make a memory pool for single objects, however I need to make a memory pool for arrays. The memory pool I have currently has a vector of addresses to contiguous memory blocks and a stack that points to each object from these blocks, so when you allocate from the pool you just pop the stack and when you free, you just push an object's address back to it. However I also need an array equivalent. Something like this:
template<typename T>
class ArrayPool
{
public:
    ArrayPool();
    ~ArrayPool();
    T* AllocateArray(int x); // Returns a pointer to a T array that contains 'x' elements.
    void FreeArray(T* arr, int x); // Returns the array to the free address list/stack/whatever.
};
Has such a thing been implemented? I can imagine a big problem with such a pool: if I make sure the arrays returned by AllocateArray are contiguous in memory, I'm basically doing the same as not having a memory pool at all, just allocating arrays on the spot. With the normal object pool I allocate exactly 1 object every time. With arrays I may allocate a different-sized array every time, so once an array is freed, it won't be compatible with a new one of a different size, unless I stitch arrays together with some linked-list-like structure, but then they won't be contiguous.
Currently your allocator takes advantage of the fact that all allocations are the same size. This simplifies and speeds up allocation and freeing, and means memory fragmentation is impossible.
If you have to allocate arrays of any size, then what you want is a general-purpose allocator, not a pool allocator. What to do next depends on why you're using a pool allocator in the first place. I can think of two other features of a pool allocator that might be relevant, and there may be others:
all memory comes from a particular region specified when you create the pool
all memory can be freed at once without freeing each individual allocation, by resetting the pool.
If you don't need any special features of controlling allocation yourself then just use vector or global operator new or malloc to allocate your memory. If you do need special features then you'll probably want to take an allocator off the shelf rather than implementing your own. If you really want to get into the details of how a good memory allocator works then look at http://g.oswego.edu/dl/html/malloc.html and perhaps adapt it to your use.
But if you really need to hand-roll an allocator for limited purposes, then the basic idea is that instead of a list of free nodes from which you can always take the first, you need some data structure (your choice what) containing free blocks of different sizes, that allows you to quickly find a block that's big enough to satisfy the current request. In the case where it's much bigger you might choose to split the block, return part of it, and keep the rest as a new smaller free block. In the case where two free blocks are adjacent you might choose to merge them into a single larger free block.
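A minimal sketch of that idea, under some assumptions: the class name RangeAllocator is hypothetical, it works in abstract units (array slots) and offsets rather than raw pointers, it uses first fit with a std::map of free blocks as the "data structure of your choice", and the caller passes the size back on release (as FreeArray in the question does). It is meant to illustrate splitting and merging, not to compete with a real allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Hands out [offset, offset + size) ranges from a region of `total` units.
// Only free blocks are tracked: start offset -> length.
class RangeAllocator {
public:
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    explicit RangeAllocator(std::size_t total) {
        if (total > 0) free_[0] = total;
    }

    // First fit: returns the start offset, or npos if no free block is big enough.
    std::size_t allocate(std::size_t size) {
        if (size == 0) return npos;
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second < size) continue;
            const std::size_t start = it->first;
            const std::size_t remaining = it->second - size;
            free_.erase(it);
            if (remaining > 0) free_[start + size] = remaining;  // split the block
            return start;
        }
        return npos;
    }

    // Returns a range to the free set, merging it with adjacent free blocks.
    void release(std::size_t start, std::size_t size) {
        auto next = free_.lower_bound(start);
        if (next != free_.begin()) {
            auto prev = std::prev(next);
            assert(prev->first + prev->second <= start);   // catches overlap / double free
            if (prev->first + prev->second == start) {     // merge with left neighbour
                start = prev->first;
                size += prev->second;
                free_.erase(prev);
            }
        }
        if (next != free_.end() && start + size == next->first) {  // merge with right neighbour
            size += next->second;
            free_.erase(next);
        }
        free_[start] = size;
    }

    std::size_t free_block_count() const { return free_.size(); }

private:
    std::map<std::size_t, std::size_t> free_;
};
```

An ArrayPool<T> could own one large T buffer and translate the returned offsets into pointers. Note that, unlike the fixed-size pool, this scheme can fragment: free space may be split across blocks that are each too small for a request.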
One common strategy is to keep pool-like lists of blocks of certain sizes (for example 16, 32, 64...). If the request is small enough, satisfy it using one of these. If not, do something more complex. But as I say, if you want to see a lot of tricks working together then look at dlmalloc.
What you could do is have fixed sizes and only work with those. For example 400 arrays of 32 bytes, 200 of 128 bytes, 100 of 1024 bytes, 50 of 8096 bytes, or something like that. When something asks for an array of size N, you match it to the closest size that is large enough and has a free array.
How many you need of each size is probably up for a lot of tweaking.
That would allow you to re-use arrays much more freely than allowing custom sizes.
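The matching step could look like the sketch below. The names kBucketSizes and bucket_index_for are hypothetical, and the sizes are simply the ones from the example above. Each bucket would then be backed by its own fixed-size pool; the fallback to the next larger bucket when one is empty is left out here.

```cpp
#include <array>
#include <cstddef>

// Bucket sizes in bytes, smallest first (taken from the example in the text;
// real values would come from profiling the application).
constexpr std::array<std::size_t, 4> kBucketSizes = {32, 128, 1024, 8096};

// Returns the index of the smallest bucket that can hold `bytes`,
// or -1 when the request is larger than every bucket.
int bucket_index_for(std::size_t bytes) {
    for (std::size_t i = 0; i < kBucketSizes.size(); ++i) {
        if (bytes <= kBucketSizes[i]) return static_cast<int>(i);
    }
    return -1;
}
```

The price is internal waste: a 33-byte request occupies a 128-byte array.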
What exactly are you trying to win from this? Why isn't it enough just to treat each array as an object? Unless you are direly strapped for memory or the time to construct the array elements is really excessive and not to be wasted, this sounds like a classic case of premature optimization. And if the above are your problems, I'd explore other data structures (not arrays) first before plunging into this.
Your time (getting this working and its quirks ironed out will be a week or so, methinks) is way more valuable than a few pennies of computer time or memory saved.

Given an Array, is there an algorithm that can allocate memory out of it?

I'm doing some graphics programming and I'm using Vertex pools. I'd like to be able to allocate a range out of the pool and use this for drawing.
What's different about the solution I need, compared with a C allocator, is that I never call malloc. Instead I preallocate the array and then need an object that wraps that up and keeps track of the free space and allocates a range (a pair of begin/end pointers) from the allocation I pass in.
Much thanks.
In general: you're looking for a memory manager which uses a memory pool (see Wikipedia), like the boost::pool as answered by TokenMacGuy. They come in many flavours. Important considerations:
block size (fixed or variable; the number of different block sizes; whether block size usage can be predicted statistically)
efficiency (some managers have 2^n block sizes, e.g. for use in network stacks where they search for the best-fit block; very good performance and no fragmentation at the cost of wasting memory)
administration overhead (I presume that you'll have many very small blocks, so the number of ints and pointers maintained by the memory manager is significant for efficiency)
In case of boost::pool, I think the simple segregated storage is worth a look.
It will allow you to configure a memory pool with many different block sizes for which a best match is searched for.
boost::pool does this for you very nicely!
Instead I preallocate the array and then need an object that wraps that up and keeps track of the free space and allocates a range (a pair of begin/end pointers) from the allocation I pass in.
That's basically what malloc() does internally (malloc() can increase the size of this "preallocated array" if it gets full, though). So yes, there is an algorithm for it. There are many, in fact, and Wikipedia gives a basic overview. Different strategies can work better in different situations. (E.g. if all the blocks are a similar size, or if there's some pattern to allocation and freeing)
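As an illustration of one of the simplest such strategies (not what malloc() itself does), here is a sketch of a linear, or "bump", allocator over an array the caller has preallocated. Vertex, Range and VertexRangeAllocator are hypothetical names. It suits cases where ranges are handed out during a frame and all released together; it cannot free individual ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Stand-in for whatever the vertex pool stores.
struct Vertex { float x, y, z; };

using Range = std::pair<Vertex*, Vertex*>;   // {begin, end}

// Linear ("bump") allocator over storage owned by the caller.
class VertexRangeAllocator {
public:
    VertexRangeAllocator(Vertex* storage, std::size_t count)
        : begin_(storage), next_(storage), end_(storage + count) {
        assert(storage != nullptr || count == 0);
    }

    // Returns {begin, end} for `count` vertices, or {nullptr, nullptr}
    // if not enough space remains.
    Range allocate(std::size_t count) {
        if (count > remaining()) return Range(nullptr, nullptr);
        Vertex* first = next_;
        next_ += count;
        return Range(first, next_);
    }

    // Frees every range at once; individual ranges cannot be returned.
    void reset() { next_ = begin_; }

    std::size_t remaining() const { return static_cast<std::size_t>(end_ - next_); }

private:
    Vertex* begin_;
    Vertex* next_;   // start of the unused part of the array
    Vertex* end_;
};
```

If ranges need to be freed individually and in any order, a free-block structure with splitting and merging is needed instead, which is where the strategies in the Wikipedia overview come in.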
If you have many objects of the same size, look into obstacks.
You probably don't want to write the code yourself; it's not an easy task and bugs can be painful.