Do map and set always allocate 1 item at a time? - C++

I am implementing an allocator for std::map and std::set in C++14. The allocator has to provide a function, pointer allocate(size_type n), that allocates space for n items at a time.
After some tests, I have seen that std::map and std::set always call allocate(1) on my platform; I have not seen any n > 1. It makes sense to me if I think about the internal tree representation.
Does the standard guarantee this behavior? Or can I safely trust that n == 1 always holds on a specific platform?

Does the standard guarantee this behavior?
No. The standard does not guarantee this.
Or can I safely trust n == 1 always in any specific platform?
The number of allocations when inserting is constrained by the complexity of the container's methods. For example, for std::map::insert the standard specifies (quoting cppreference, only the first 3 overloads, which insert a single element):
1-3) Logarithmic in the size of the container, O(log(size())).
The implementers are then free to choose an implementation that fulfills this specification. The log(size()) part is because you need to find the place where to insert; allocating space for a fixed number of elements is just a constant contribution to the complexity.
An implementation could choose to allocate space for two elements every second time it is called. 2 is just as constant as 1. However, it shouldn't be too hard to find cases where allocating 1 is more efficient than allocating 2 in absolute terms. Moreover, std::map and std::set are not required to store their elements in contiguous memory.
Hence, I would assume that it is always 1, but you have no guarantee. If you want to be certain you have to look at the specific implementation, but then you rely on implementation details.
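For example (illustrative only), a minimal C++14 allocator that just logs each request and forwards the real work to std::allocator makes the n your implementation passes visible; LoggingAllocator is a made-up name, not part of any library:

#include <cstddef>
#include <cstdio>
#include <map>
#include <memory>
#include <set>
#include <utility>

// Minimal C++14 allocator that logs every allocation request so you can see
// which n std::map and std::set pass on your platform.
template<typename T>
struct LoggingAllocator {
    using value_type = T;

    LoggingAllocator() = default;
    template<typename U>
    LoggingAllocator(const LoggingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        std::printf("allocate(%zu) of %zu-byte objects\n", n, sizeof(T));
        return std::allocator<T>{}.allocate(n);
    }
    void deallocate(T* p, std::size_t n) {
        std::allocator<T>{}.deallocate(p, n);
    }
};

template<typename T, typename U>
bool operator==(const LoggingAllocator<T>&, const LoggingAllocator<U>&) { return true; }
template<typename T, typename U>
bool operator!=(const LoggingAllocator<T>&, const LoggingAllocator<U>&) { return false; }

int main() {
    std::set<int, std::less<int>, LoggingAllocator<int>> s;
    for (int i = 0; i < 5; ++i) s.insert(i);   // typically prints allocate(1) five times

    std::map<int, int, std::less<int>, LoggingAllocator<std::pair<const int, int>>> m;
    m[42] = 1;                                 // typically one more allocate(1) for the node
}

Note that the container rebinds the allocator to its internal node type, so the element size printed is the node size, not sizeof(int).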
allocate(n) is not the same as allocate(1) n times.
A::allocate(n) must return a single pointer, hence it is not trivial to allocate non-contiguous memory. There is however no requirement that this pointer is a T*. Instead A::allocate(n) returns an A::pointer. This can be any type as long as it satisfies NullablePointer, LegacyRandomAccessIterator, and LegacyContiguousIterator.
cppreference mentions boost::interprocess::offset_ptr as an example of how to allocate segmented memory. You might want to take a look at that. Here is the full quote:
Fancy pointers
When the member type pointer is not a raw pointer type, it is commonly referred to as a "fancy pointer". Such pointers were introduced to support segmented memory architectures and are used today to access objects allocated in address spaces that differ from the homogeneous virtual address space that is accessed by raw pointers. An example of a fancy pointer is the mapping address-independent pointer boost::interprocess::offset_ptr, which makes it possible to allocate node-based data structures such as std::set in shared memory and memory mapped files mapped in different addresses in every process. Fancy pointers can be used independently of the allocator that provided them, through the class template std::pointer_traits.

Related

Does the standard guarantee that the total memory occupied by a std::vector scales as C+N*sizeof(T)?

The C++ standard provides the guarantee that the content of a std::vector is stored contiguously. But does it state that the total occupied memory is:
S = C+N*sizeof(T)
where:
S is the total size on the stack AND on the heap
C is the total size on the stack: C = sizeof(std::vector)
N is the capacity of the vector
T is the type stored
In other words, do I have the guarantee that there is no overhead per element?
And if I have no such guarantee, is there any reason?
EDIT: to be clear, if I take the example of a std::list, it generally stores 2 extra pointers per element. So my question is: would such an implementation of a std::vector be standard-compliant?
For there to be any such guarantee, the standard would have to pass the requirement on to the interface of the allocator. It doesn't, so there isn't.
In practice though, as a quality of implementation issue, you expect that memory allocators probably have a constant overhead per allocation but no overhead proportional to the size of the allocation. A counter-example to this would be a memory allocator that always uses a power-of-two-sized block regardless of the size requested. This would be pretty wasteful for large allocations, but not forbidden either as a user-defined allocator or even as the system allocator used by ::operator new[]. It would create an overhead proportional to N on average, assuming that the vector capacities don't happen to fit nicely.
Leaving aside the allocator, I don't believe there's anything in the standard to say that the vector can't allocate (for example) an extra byte per element and use it to store some flags for who-knows-what purpose. As others have remarked, the contiguousness requirement means that those extra bytes cannot lie between the vector elements. They would have to be in a separate allocation or all together at one end of the allocation.
There's at least one good reason that the standard doesn't forbid implementations from "wasting" space by using it to store data used for operations not required by the standard -- doing so would rule out many debugging techniques!
Do I have the guarantee that there is no overhead per element?
Does the standard prohibit it? No.
But would you ever expect to see this in practice? No.
The rule of contiguous data storage and the complexity requirements of vector growth mean that the only possible way for a non-constant-sized data block to be part of the vector would be if it were emplaced directly before the dynamically-allocated element data, or somewhere else entirely. There is no guarantee that this doesn't happen, but, quite simply, no implementation does it because it would be entirely ridiculous and serve no purpose whatsoever.
Does it state that the total occupied memory is:
S = C+N*sizeof(T)
There may be other data members of the vector itself (what you've inaccurately deemed to be "on the stack"), increasing the object's size in constant terms.
The standard gives no guarantee, afaics. But the requirement that the elements be stored contiguously makes it likely that there is no per-element overhead. The whole data must be in a memory area which was allocated in one piece. @aschepler remarked correctly though that typical free store implementations have a (constant) overhead per allocation unit, typically a size variable or an end pointer.
Additionally there may be some padding overhead, e.g. an allocation unit will probably span multiples of the natural word size on a machine. And then the OS call will likely reserve a whole memory page to the program, even if you allocate only 1 byte. Whether you consider that as overhead or not is a matter of taste (from the outside yes, from the inside of the program no; and of course subsequent vectors or resize()s dine from the same page).
So at least it's CM + CV + N*sizeof(T), with CV being the overhead in the vector object itself (not necessarily on the stack, as Lightness said) and CM the overhead of the memory management.
No, the implementation characteristics you suggest would not be standard-compliant. The standard requires that a std::vector support appending individual elements in amortized constant time.
In order for the amortized cost of inserting an element to be O(1), the size of the array must increase in at least a geometric progression when it is reallocated (see here). A geometric progression means that if the size of the array was N, the new size after reallocation must be K * N, for some K > 1. The choice of K is implementation dependent.
To find out how much space a std::vector has allocated, call std::vector::capacity(). With regard to overhead per element, in the best case capacity() == size(); in the worst case capacity() == K * (size() - 1).
If you must ensure that your vector is absolutely no larger than it has to be, you can call std::vector::reserve() if you know exactly how large your std::vector will be. You may also call std::vector::shrink_to_fit() (C++11) after you are done adding elements to reduce the amount of memory reserved; note that resize() by itself changes the number of elements but does not release capacity.
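A quick way to see both the slack and how to give it back (the exact capacities printed are implementation-defined):

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;
    for (int i = 0; i < 1000; ++i) v.push_back(i);

    // capacity() >= size(); the difference is the worst-case slack discussed above.
    std::printf("size=%zu capacity=%zu\n", v.size(), v.capacity());

    v.shrink_to_fit();   // C++11: non-binding request to release the slack
    std::printf("after shrink_to_fit: capacity=%zu\n", v.capacity());
}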

C++ layout of an object containing containers

As someone with a lot of assembler language experience and old habits to lose, I recently did a project in C++ using a lot of the features that C++03 and C++11 have to offer (mostly the container classes, including some from Boost). It was surprisingly easy - and I tried wherever I could to favor simplicity over premature optimization. As we move into code review and performance testing I'm sure some of the old hands will have aneurysms at not seeing exactly how every byte is manipulated, so I want to have some advance ammunition.
I defined a class whose instance members contain several vectors and maps. Not "pointers to" vectors and maps. And I realized that I haven't got the slightest idea how much contiguous space my objects take up, or what the performance implications might be for frequently clearing and re-populating these containers.
What does such an object look like, once instantiated?
Formally, there aren't any constraints on the implementation other than those specified in the standard, with regards to interface and complexity. Practically, most, if not all, implementations derive from the same code base, and are fairly similar.
The basic implementation of vector is three pointers. The actual memory for the objects in the vector is dynamically allocated. Depending on how the vector was "grown", the dynamic area may contain extra memory; the three pointers point to the start of the memory, the byte after the last byte currently used, and the byte after the last byte allocated. Perhaps the most significant aspect of the implementation is that it separates allocation and initialization: the vector will, in many cases, allocate more memory than is needed, without constructing objects in it, and will only construct the objects when needed. In addition, when you remove objects, or clear the vector, it will not free the memory; it will only destruct the objects, and will change the pointer to the end of the used memory to reflect this. Later, when you insert objects, no allocation will be needed.
When you add objects beyond the amount of allocated space, vector will allocate a new, larger area; copy the objects into it, then destruct the objects in the old space, and delete it. Because of the complexity constraints, vector must grow the area exponentially, by multiplying the size by some fixed constant (1.5 and 2 are the most common factors), rather than by incrementing it by some fixed amount. The result is that if you grow the vector from empty using push_back, there will not be too many reallocations and copies; another result is that if you grow the vector from empty, it can end up using almost twice as much memory as necessary. These issues can be avoided if you preallocate using std::vector<>::reserve().
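A small sketch of that growth (the exact capacities printed depend on your standard library; libstdc++ roughly doubles, MSVC grows by about 1.5x):

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;
    auto last = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last) {     // a reallocation (and copy/move) just happened
            std::printf("size=%zu capacity=%zu\n", v.size(), v.capacity());
            last = v.capacity();
        }
    }

    std::vector<int> w;
    w.reserve(1000);                    // one allocation up front, no reallocations below
    for (int i = 0; i < 1000; ++i) w.push_back(i);
}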
As for map, the complexity constraints and the fact that it must be ordered mean that some sort of balanced tree must be used. In all of the implementations I know, this is a classical red-black tree: each entry is allocated separately, in a node which contains two or three pointers, plus maybe a boolean, in addition to the data.
I might add that the above applies to the optimized versions of the containers. The usual implementations, when not optimized, will add additional pointers to link all iterators to the container, so that they can be marked when the container does something which would invalidate them, and so that they can do bounds checking.
Finally: these classes are templates, so in practice, you have access to the sources, and can look at them. (Issues like exception safety sometimes make the implementations less straightforward than we might like, but the implementations with g++ or VC++ aren't really that hard to understand.)
A map is a binary tree (of some variety, I believe it's customarily a Red-Black tree), so the map itself probably only contains a pointer and some housekeeping data (such as the number of elements).
As with any other binary tree, each node will then contain two or three pointers (two for the "left" and "right" children, and perhaps one to the parent node above, to avoid having to traverse the whole tree to find where the previous node(s) are).
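As a rough sketch of what such a per-element node might look like (the names and exact layout here are hypothetical; real implementations, e.g. libstdc++'s _Rb_tree_node, differ in detail):

#include <utility>

// Hypothetical node of a red-black-tree-based std::map; one of these is
// allocated per element, in addition to the map object itself.
template<typename Key, typename Value>
struct MapNode {
    MapNode* parent;                      // lets iterators walk back up the tree
    MapNode* left;
    MapNode* right;
    bool     is_red;                      // color bit used for rebalancing
    std::pair<const Key, Value> data;     // the stored element
};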
In general, vector shouldn't be noticeably slower than a regular array, and certainly no worse than your own implementation of a variable size array using pointers.
A vector is a wrapper for an array. The vector class contains a pointer to a contiguous block of memory and knows its size somehow. When you clear a vector, it usually retains its old buffer (implementation-dependent) so that the next time you reuse it, there are fewer allocations. If you resize a vector above its current buffer size, it will have to allocate a new one. Reusing and clearing the same vectors to store objects is efficient. (std::string is similar.) If you want to find out exactly how much a vector has allocated in its buffer, call the capacity function and multiply this by the size of the element type. You can call the reserve function to manually increase the buffer size, in expectation of the vector taking more elements shortly.
Maps are more complicated so I don't know. But if you need an associative container, you would have to use something complicated in C too, right?
Just wanted to add a few things to the other answers that I think are important.
Firstly, in the implementations I've seen, sizeof(std::vector<T>) is constant: the vector object itself is made up of three pointers. Below is an excerpt of the relevant parts from a GCC 4.7.2 STL header:
template<typename _Tp, typename _Alloc>
struct _Vector_base
{
  ...
  struct _Vector_impl : public _Tp_alloc_type
  {
    pointer _M_start;
    pointer _M_finish;
    pointer _M_end_of_storage;
    ...
  };
  ...
  _Vector_impl _M_impl;
  ...
};

template<typename _Tp, typename _Alloc = std::allocator<_Tp> >
class vector : protected _Vector_base<_Tp, _Alloc>
{
  ...
};
That's where the three pointers come from. Their names are self-explanatory, I think. But there is also a base class - the allocator. Which takes me to my second point.
Secondly, std::vector<T, Allocator = std::allocator<T>> takes a second template parameter that is a class that handles memory operations. It is through the functions of this class that vector does its memory management. There is a default STL allocator, std::allocator<T>. It has no data members, only functions such as allocate, destroy, etc. It bases its memory handling around new/delete. But you can write your own allocator and supply it to std::vector as the second template parameter. It has to conform to certain rules, such as the functions it provides, but how the memory management is done internally is up to you, as long as it does not violate the logic that std::vector relies on. It might introduce some data members that will add to sizeof(std::vector) through the inheritance above. It also gives you "control over each bit".
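For instance, a made-up stateful allocator with a single data member (ArenaAllocator below is purely illustrative and just forwards to std::allocator) will typically enlarge the vector object; the exact sizes printed are implementation-defined:

#include <cstddef>
#include <cstdio>
#include <memory>
#include <vector>

// Illustrative allocator with one data member, to show its effect on
// sizeof(std::vector). It does no arena management of its own.
template<typename T>
struct ArenaAllocator {
    using value_type = T;
    void* arena = nullptr;   // state carried by every copy of the allocator

    ArenaAllocator() = default;
    template<typename U>
    ArenaAllocator(const ArenaAllocator<U>& o) : arena(o.arena) {}

    T* allocate(std::size_t n)           { return std::allocator<T>{}.allocate(n); }
    void deallocate(T* p, std::size_t n) { std::allocator<T>{}.deallocate(p, n); }
};

template<typename T, typename U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) { return a.arena == b.arena; }
template<typename T, typename U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) { return !(a == b); }

int main() {
    // On common implementations the first is three pointers wide; the second
    // is typically larger because the allocator's state lives inside the vector.
    std::printf("%zu\n", sizeof(std::vector<int>));
    std::printf("%zu\n", sizeof(std::vector<int, ArenaAllocator<int>>));
}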
Basically, a vector is just a pointer to an array, along with its capacity (total allocated memory) and size (actually used elements):
struct vector {
    Item* elements;   // pointer to the dynamically allocated storage
    size_t capacity;  // total number of elements allocated
    size_t size;      // number of elements actually in use
};
Of course thanks to encapsulation all of this is well hidden and the users never get to handle the gory details (reallocation, calling constructors/destructors when needed, etc) directly.
As to your performance questions regarding clearing, it depends how you clear the vector:
Swapping it with a temporary empty vector (the usual idiom) will delete the old array: std::vector<int>().swap(myVector);
Using clear() or resize(0) will erase all the items and keep the allocated memory and capacity unchanged.
If you are concerned about efficiency, IMHO the main point to consider is to call reserve() in advance (if you can) in order to pre-allocate the array and avoid useless reallocations and copies (or moves with C++11). When adding a lot of items to a vector, this can make a big difference (as we all know, dynamic allocation is very costly so reducing it can give a big performance boost).
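A small sketch of the two clearing strategies and of reserve() (the capacities printed are implementation-defined; the point is keep-the-buffer versus release-the-buffer):

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(1000);                              // pre-allocate once, avoid repeated reallocations
    for (int i = 0; i < 1000; ++i) v.push_back(i);

    v.clear();                                    // destroys the elements, keeps the buffer
    std::printf("after clear: size=%zu capacity=%zu\n", v.size(), v.capacity());

    std::vector<int>().swap(v);                   // swap-with-temporary idiom: releases the buffer
    std::printf("after swap:  size=%zu capacity=%zu\n", v.size(), v.capacity());
}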
There is a lot more to say about this, but I believe I covered the essential details. Don't hesitate to ask if you need more information on a particular point.
Concerning maps, they are usually implemented using red-black trees. But the standard doesn't mandate this, it only gives functional and complexity requirements so any other data structure that fits the bill is good to go. I have to admit, I don't know how RB-trees are implemented but I guess that, again, a map contains at least a pointer and a size.
And of course, each and every container type is different (eg. unordered maps are usually hash tables).

Does an allocation hint get used?

I was reading Why is there no reallocation functionality in C++ allocators? and Is it possible to create an array on the heap at run-time, and then allocate more space whenever needed?, which clearly state that reallocation of a dynamic array of objects is impossible.
However, in The C++ Standard Library by Josuttis, it states that an allocator, such as std::allocator, has a function allocate with the following signature:
pointer allocator::allocate(size_type num, allocator<void>::pointer hint = 0)
where the hint has an implementation defined meaning, which may be used to help improve performance.
Are there any implementations that take advantage of this?
I have gained significant performance advantages for iteration times on small scalar types in my plf::colony C++ container by using hints with std::allocator under Visual Studio 2010-2013 (iteration speed increased by ~21%), and much smaller speedups under GCC 5.1. So it's safe to say that with those compilers and std::allocator, it makes a difference. But the difference will be compiler-dependent. I am not aware of the ratio of hint-ignoring to hint-observing allocators.
I'm not sure about specific implementations, but note that the allocator isn't allowed to return the hint pointer value before it's been passed to deallocate. So that can't be used as a primitive operation to form a reallocate.
The Standard says the hint must have been returned by a previous call to allocate. It says "The use of [the hint] is unspecified, but it is intended as an aid to locality." So if you're allocating and releasing a sequence of similar-sized blocks on one thread, you might pass the previously-freed value to avoid cache contention between microprocessor caches.
Otherwise, when CPU B sees that you're using memory addresses still in CPU A's cache (even that memory contains objects that were destroyed according to C++), it must forward the junk data over the bus. Better to let CPU A and B each reuse their own respective cached addresses.
C++11 states, in 20.6.9.1 allocator members:
4 - [ Note: In a container member function, the address of an adjacent element is often a good choice to pass for the hint argument. — end note ]
[...]
6 - [...] The use of hint is unspecified, but intended as an aid to locality if an implementation so desires.
Allocating new elements adjacent or close to existing elements in memory can aid performance by improving locality; because they are usually cached together, nearby elements will tend to travel together up the memory hierarchy and will not evict each other.
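For reference, this is how a hint is passed through std::allocator_traits; the hint must come from a prior allocate that has not yet been deallocated, and the allocator is free to ignore it (std::allocator typically does):

#include <memory>

int main() {
    std::allocator<int> a;
    using traits = std::allocator_traits<std::allocator<int>>;

    int* first = traits::allocate(a, 16);

    // Pass the earlier, still-live block as a locality hint for the next one.
    int* second = traits::allocate(a, 16, first);

    traits::deallocate(a, second, 16);
    traits::deallocate(a, first, 16);
}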

std::vector versus std::array in C++

What are the differences between a std::vector and a std::array in C++? When should one be preferred over the other? What are the pros and cons of each? All my textbook does is list how they are the same.
std::vector is a template class that encapsulates a dynamic array[1], stored on the heap, that grows and shrinks automatically if elements are added or removed. It provides all the hooks (begin(), end(), iterators, etc.) that make it work fine with the rest of the STL. It also has several useful methods that let you perform operations that would be cumbersome on a normal array, e.g. inserting elements in the middle of a vector (it handles all the work of moving the following elements behind the scenes).
Since it stores the elements in memory allocated on the heap, it has some overhead with respect to static arrays.
std::array is a template class that encapsulates a statically-sized array, stored inside the object itself, which means that, if you instantiate the class on the stack, the array itself will be on the stack. Its size has to be known at compile time (it's passed as a template parameter), and it cannot grow or shrink.
It's more limited than std::vector, but it's often more efficient, especially for small sizes, because in practice it's mostly a lightweight wrapper around a C-style array. However, it's safer, since the implicit conversion to pointer is disabled, and it provides much of the STL-related functionality of std::vector and of the other containers, so you can use it easily with STL algorithms & co. Anyhow, because of the very limitation of a fixed size, it's much less flexible than std::vector.
For an introduction to std::array, have a look at this article; for a quick introduction to std::vector and to the operations that are possible on it, you may want to look at its documentation.
[1] Actually, I think that in the standard they are described in terms of maximum complexity of the different operations (e.g. random access in constant time, iteration over all the elements in linear time, addition and removal of elements at the end in constant amortized time, etc.), but AFAIK there's no other method of fulfilling such requirements other than using a dynamic array. As stated by @Lucretiel, the standard actually requires that the elements are stored contiguously, so it is a dynamic array, stored where the associated allocator puts it.
To emphasize a point made by @MatteoItalia, the efficiency difference is where the data is stored. Heap memory (required with vector) requires a call to the system to allocate memory and this can be expensive if you are counting cycles. Stack memory (possible for array) is virtually "zero-overhead" in terms of time, because the memory is allocated by just adjusting the stack pointer and it is done just once on entry to a function. The stack also avoids memory fragmentation. To be sure, std::array won't always be on the stack; it depends on where you allocate it, but it will still involve one less memory allocation from the heap compared to vector. If you have:
a small "array" (under 100 elements, say; a typical stack is about 8 MB, so don't allocate more than a few KB on the stack, or less if your code is recursive),
a fixed size,
a lifetime within the function scope (or as a member value with the same lifetime as the parent class), and
you are counting cycles,
then definitely use a std::array over a vector. If any of those requirements is not true, then use a std::vector.
If you are considering using multidimensional arrays, then there is one additional difference between std::array and std::vector. A multidimensional std::array will have the elements packed in memory in all dimensions, just as a C-style array is. A multidimensional std::vector will not be packed in all dimensions.
Given the following declarations:
int cConc[3][5];
std::array<std::array<int, 5>, 3> aConc;
int **ptrConc; // initialized to [3][5] via new and destructed via delete
std::vector<std::vector<int>> vConc; // initialized to [3][5]
A pointer to the first element in the C-style array (cConc) or the std::array (aConc) can be walked through the entire array by repeatedly adding 1, because the elements are tightly packed.
A pointer to the first element in the vector array (vConc) or the pointer array (ptrConc) can only be iterated through the first 5 (in this case) elements, and then there are 12 bytes (on my system) of overhead for the next vector.
This means that a std::vector<std::vector<int>> array initialized as a [3][1000] array will be much smaller in memory than one initialized as a [1000][3] array, and both will be larger in memory than a std::array allocated either way.
This also means that you can't simply pass a multidimensional vector (or pointer) array to, say, OpenGL without accounting for the memory overhead, but you can naively pass a multidimensional std::array to OpenGL and have it work out.
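A small sketch of the layout difference, reusing the names from the declarations above (the printed sizes are implementation-dependent; the nested std::array is tightly packed on common implementations):

#include <array>
#include <cstdio>
#include <vector>

int main() {
    std::array<std::array<int, 5>, 3> aConc{};
    // One contiguous block of 15 ints, no per-row pointers; on common
    // implementations sizeof(aConc) == 3 * 5 * sizeof(int).
    std::printf("nested std::array: %zu bytes\n", sizeof(aConc));

    int* flat = &aConc[0][0];                 // usable as a flat [15] buffer
    for (int i = 0; i < 15; ++i) flat[i] = i;

    std::vector<std::vector<int>> vConc(3, std::vector<int>(5));
    // Each inner vector owns a separate heap block; the outer object only
    // holds bookkeeping, and the rows are not adjacent in memory.
    std::printf("outer vector object: %zu bytes (row data lives elsewhere)\n", sizeof(vConc));
}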
Summarizing the above discussion in a table for quick reference:

|                     | C-Style Array                                           | std::array                                                      | std::vector                                                  |
|---------------------|---------------------------------------------------------|-----------------------------------------------------------------|--------------------------------------------------------------|
| Size                | Fixed/Static                                            | Fixed/Static                                                    | Dynamic                                                      |
| Memory efficiency   | More efficient                                          | More efficient                                                  | Less efficient (may double its capacity on a new allocation) |
| Copying             | Iterate over elements or use std::copy()                | Direct copy: a2 = a1;                                           | Direct copy: v2 = v1;                                        |
| Passing to function | Passed by pointer (size not available in the function)  | Passed by value                                                 | Passed by value (size available in the function)             |
| Getting the size    | sizeof(a1) / sizeof(a1[0])                              | a1.size()                                                       | v1.size()                                                    |
| Use case            | Quick access, when insertions/deletions are not frequently needed | Same as a classic array, but safer and easier to pass and copy | When frequent insertions or deletions might be needed        |
Using the std::vector<T> class:
...is just as fast as using built-in arrays, assuming you are doing only the things built-in arrays allow you to do (read and write to existing elements).
...automatically resizes when new elements are inserted.
...allows you to insert new elements at the beginning or in the middle of the vector, automatically "shifting" the rest of the elements "up" (does that make sense?). It allows you to remove elements anywhere in the std::vector, too, automatically shifting the rest of the elements down.
...allows you to perform a range-checked read with the at() method (you can always use the indexers [] if you don't want this check to be performed).
There are three main caveats to using std::vector<T>:
You don't have reliable access to the underlying pointer, which may be an issue if you are dealing with third-party functions that demand the address of an array. (Since C++11, data() largely addresses this.)
The std::vector<bool> class is silly. It's implemented as a condensed bitfield, not as an array. Avoid it if you want an array of bools!
During usage, std::vector<T>s are going to be a bit larger than a C++ array with the same number of elements. This is because they need to keep track of a small amount of other information, such as their current size, and because whenever std::vector<T>s resize, they reserve more space than they need. This is to prevent them from having to resize every time a new element is inserted. This behavior can be changed by providing a custom allocator, but I never felt the need to do that!
Edit: After reading Zud's reply to the question, I felt I should add this:
The std::array<T> class is not the same as a C++ array. std::array<T> is a very thin wrapper around C++ arrays, with the primary purpose of hiding the pointer from the user of the class (in C++, arrays are implicitly converted to pointers, often to dismaying effect). The std::array<T> class also stores its size (length), which can be very useful.
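A short illustration of the decay that std::array avoids (take_raw and take_arr are just example names):

#include <array>
#include <cstdio>

// With a raw array, the size is lost as soon as it decays to a pointer.
void take_raw(const int* p)                { std::printf("sizeof(p) = %zu\n", sizeof(p)); }

// With std::array, the size is part of the type and always recoverable.
void take_arr(const std::array<int, 8>& a) { std::printf("a.size() = %zu\n", a.size()); }

int main() {
    int raw[8] = {};
    std::array<int, 8> arr{};

    take_raw(raw);                   // decays to const int*; the callee can't recover 8
    take_arr(arr);                   // no decay; size() is available

    std::array<int, 8> copy = arr;   // direct element-wise copy, unlike raw arrays
    (void)copy;
}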
A vector is a container class, while an array is allocated memory.

Why is it not possible to access the size of a new[]'d array?

When you allocate an array using new [], why can't you find out the size of that array from the pointer? It must be known at run time, otherwise delete [] wouldn't know how much memory to free.
Unless I'm missing something?
In a typical implementation the size of dynamic memory block is somehow stored in the block itself - this is true. But there's no standard way to access this information. (Implementations may provide implementation-specific ways to access it). This is how it is with malloc/free, this is how it is with new[]/delete[].
In fact, in a typical implementation raw memory allocations for new[]/delete[] calls are eventually processed by some implementation-specific malloc/free-like pair, which means that delete[] doesn't really have to care about how much memory to deallocate: it simply calls that internal free (or whatever it is named), which takes care of that.
What delete[] does need to know, though, is how many elements to destruct in situations when the array element type has a non-trivial destructor. And this is what your question is about - the number of array elements, not the size of the block (these two are not the same; the block could be larger than what is really required for the array itself). For this reason, the number of elements in the array is normally also stored inside the block by new[] and later retrieved by delete[] to perform the proper array element destruction. There are no standard ways to access this number either.
(This means that in the general case, a typical memory block allocated by new[] will independently, simultaneously store both the physical block size in bytes and the array element count. These values are stored by different levels of the C++ memory allocation mechanism - the raw memory allocator and new[] itself, respectively - and don't interact with each other in any way.)
However, note that for the above reasons the array element count is normally only stored when the array element type has a non-trivial destructor, i.e. this count is not always present. This is one of the reasons why providing a standard way to access that data is not feasible: you'd either have to store it always (which wastes memory) or restrict its availability by destructor type (which is confusing).
To illustrate the above, when you create an array of ints
int *array = new int[100];
the size of the array (i.e. 100) is not normally stored by new[] since delete[] does not care about it (int has no destructor). The physical size of the block in bytes (like, 400 bytes or more) is normally stored in the block by the raw memory allocator (and used by raw memory deallocator invoked by delete[]), but it can easily turn out to be 420 for some implementation-specific reason. So, this size is basically useless for you, since you won't be able to derive the exact original array size from it.
You most likely can access it, but it would require intimate knowledge of your allocator and would not be portable. The C++ standard doesn't specify how implementations store this data, so there's no consistent method for obtaining it. I believe it's left unspecified because different allocators may wish to store it in different ways for efficiency purposes.
It makes sense, as, for example, the size of the allocated block may not necessarily be the same as the size of the array. While it is true that new[] may store the number of elements (for calling each element's destructor), it doesn't have to, since that wouldn't be required for an empty destructor. There is also no standard way (C++ FAQ Lite 1, C++ FAQ Lite 2) of specifying where new[] stores the array length, as each method has its pros and cons.
In other words, it allows allocations to be as fast and cheap as possible by not specifying anything about the implementation. (If the implementation had to store the size of the array as well as the size of the allocated block every time, it would waste memory that you may not need.)
Simply put, the C++ standard does not require support for this. It is possible that if you know enough about the internals of your compiler, you can figure out how to access this information, but that would generally be considered bad practice. Note that there may be a difference in memory layout for heap-allocated arrays and stack-allocated arrays.
Remember that essentially what you are talking about here are C-style arrays, too -- even though new and delete are C++ operators -- and the behavior is inherited from C. If you want a C++ "array" that is sized, you should be using the STL (e.g. std::vector, std::deque).
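A quick illustration of why the count is out of reach, and the idiomatic alternative:

#include <cstdio>
#include <vector>

int main() {
    int* p = new int[100];
    // sizeof(p) is the size of the pointer (e.g. 8), not 100 * sizeof(int);
    // whatever count new[] may have stored is not accessible portably.
    std::printf("sizeof(p) = %zu\n", sizeof(p));
    delete[] p;

    // The portable fix is to let a container carry the size for you.
    std::vector<int> v(100);
    std::printf("v.size() = %zu\n", v.size());
}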