Why is 'unbounded_array' more efficient than 'vector'? - c++

It says here that
The unbounded array is similar to a
std::vector in that it can grow in
size beyond any fixed bound. However
unbounded_array is aimed at optimal
performance. Therefore unbounded_array
does not model a Sequence like
std::vector does.
What does this mean?

As a Boost developer myself, I can tell you that it's perfectly fine to question the statements in the documentation ;-)
From reading those docs, and from reading the source code (see storage.hpp), I can say that it's somewhat correct given some assumptions about the implementation of std::vector at the time that code was written. That code dates to 2000 initially, and perhaps as late as 2002, which means that at the time many standard library implementations did not do a good job of optimizing destruction and construction of objects in containers. The claim about the non-resizing is easily refuted by using a vector with a large initial capacity. The claim about speed, I think, comes entirely from the fact that unbounded_array has special code for eliding destructors and constructors when the stored objects have trivial implementations of them. Hence it can avoid calling them when it has to rearrange things, or when it's copying elements. Compared to really recent standard library implementations it's not going to be faster, as newer implementations tend to take advantage of things like move semantics to do even more optimizations.
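To make the "eliding dtors & ctors" point concrete, here is a minimal sketch of that kind of optimization, assuming C++17. The name relocate is made up for illustration and this is not the actual code from storage.hpp: for trivially copyable element types the whole range is moved with one memcpy, otherwise each element is move-constructed into the new storage and the old one destroyed.

```cpp
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical helper: moves n objects from `from` into the raw storage at `to`.
// After the call the objects live at `to` and the storage at `from` holds no
// live objects that still need destruction.
template <typename T>
void relocate(T* from, T* to, std::size_t n) {
    if constexpr (std::is_trivially_copyable_v<T>) {
        // Trivial case: no constructor or destructor calls, just a block copy.
        std::memcpy(to, from, n * sizeof(T));
    } else {
        // General case: construct each element in place, then destroy the source.
        for (std::size_t i = 0; i < n; ++i) {
            ::new (static_cast<void*>(to + i)) T(std::move(from[i]));
            from[i].~T();
        }
    }
}
```

Modern std::vector implementations typically apply the same idea internally, which is why the advantage claimed in the old documentation has mostly disappeared.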

It appears to lack insert and erase methods. As these may be "slow," i.e., their performance depends on size() in the vector implementation, they were omitted to prevent the programmer from shooting himself in the foot.
insert and erase are required by the standard for a container to be called a Sequence, so unlike vector, unbounded_array is not a sequence.
No efficiency is gained by failing to be a sequence, per se.
However, it is more efficient in its memory allocation scheme, by avoiding a concept of vector::capacity and always having the allocated block exactly the size of the content. This makes the unbounded_array object smaller and makes the block on the heap exactly as big as it needs to be.

As I understood it from the linked documentation, it is all about allocation strategy. std::vector, as far as I know, postpones allocation until necessary and then might allocate some reasonable chunk of memory; unbounded_array seems to allocate more memory early, and therefore it might allocate less often. But this is only a guess based on the statement in the documentation that it allocates more memory than might be needed and that the allocation is more expensive.

Related

Does std::unordered_map::erase actually perform dynamic deallocation?

It isn't difficult to find information on the big-O time behavior of stl container operations. However, we operate in a hard real-time environment, and I'm having a lot more trouble finding information on their heap memory usage behavior.
In particular I had a developer come to me asking about std::unordered_map. We're allowed to be non-realtime at startup, so he was hoping to perform a .reserve() at startup time. However, he's finding he gets overruns at runtime. The operations he uses are lookups, insertions, and deletions with .erase().
I'm a little worried about whether that .reserve() actually prevents later runtime memory allocations (I don't really understand the explanation of what it does with respect to heap usage), but for .erase() in particular I don't see any guarantee whatsoever that it won't be asking the heap for a dynamic deallocation when called.
So the question is what's the specified heap interactions (if any) for std::unordered_map::erase, and if it actually does deallocations, if there's some kind of trick that can be used to avoid them?
The standard doesn't specify container allocation patterns per-se. These are effectively derived from iterator/reference invalidation rules. For example, vector::insert only invalidates all references if the number of elements inserted causes the size of the container to exceed its capacity. Which means reallocation happened.
By contrast, the only operations on unordered_map which invalidate references are those which actually remove that particular element. Even a rehash (which likely allocates memory) does not invalidate references (this is why reserve changes nothing).
This means that each element must be stored separately from the hash table itself. They are individual nodes (which is why it has a node_type extraction interface), and must be able to be allocated and deallocated individually.
So it is reasonable to assume that each insertion or erasure represents at least one allocation/deallocation.
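The reference-stability rule can be checked directly. This is a small sketch (the function name is made up for illustration) that keeps a pointer to one element, inserts until the bucket count changes, and confirms the element did not move:

```cpp
#include <cstddef>
#include <unordered_map>

// Returns true if the element with key 0 stays at the same address
// after enough insertions to force at least one rehash.
inline bool reference_survives_rehash() {
    std::unordered_map<int, int> m;
    m[0] = 42;
    int* p = &m[0];
    const std::size_t buckets = m.bucket_count();
    // Keep inserting until the table has been rehashed.
    for (int i = 1; m.bucket_count() == buckets; ++i) {
        m[i] = i;
    }
    return p == &m[0] && *p == 42;
}
```

The element can only stay put across a rehash if it lives in its own node, which is what makes per-element allocation and deallocation the expected behavior.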
If you're all right with nodes continuing to consume memory, even after they've been removed from the container, you could pretty easily write an Allocator class that basically made deallocation a NOP.
Quite a few real-time systems basically allocate all the memory they're going to use up-front, then once they've finished initialization they neither allocate nor release memory. This would allow you to do pretty much the same thing with an unordered_map.
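A minimal sketch of such an allocator, under several assumptions: the names BumpAllocator, PooledMap, arena_base, arena_used and kArenaBytes are made up for illustration, the pool is a single static buffer of arbitrary size, and nothing here is thread-safe. Allocation just bumps an offset into the buffer, and deallocation does nothing, so erased nodes keep consuming pool space.

```cpp
#include <cstddef>
#include <functional>
#include <new>
#include <unordered_map>
#include <utility>

constexpr std::size_t kArenaBytes = 1u << 20;  // assumed pool size

inline unsigned char* arena_base() {
    alignas(std::max_align_t) static unsigned char buffer[kArenaBytes];
    return buffer;
}

inline std::size_t& arena_used() {
    static std::size_t used = 0;
    return used;
}

template <typename T>
struct BumpAllocator {
    using value_type = T;

    BumpAllocator() = default;
    template <typename U>
    BumpAllocator(const BumpAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        const std::size_t align = alignof(T);
        const std::size_t start = (arena_used() + align - 1) / align * align;
        if (start > kArenaBytes || n > (kArenaBytes - start) / sizeof(T)) {
            throw std::bad_alloc();  // pool exhausted
        }
        arena_used() = start + n * sizeof(T);
        return reinterpret_cast<T*>(arena_base() + start);
    }

    // Deliberately a no-op: memory is never handed back.
    void deallocate(T*, std::size_t) noexcept {}
};

template <typename T, typename U>
bool operator==(const BumpAllocator<T>&, const BumpAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const BumpAllocator<T>&, const BumpAllocator<U>&) noexcept { return false; }

using PooledMap = std::unordered_map<int, int, std::hash<int>, std::equal_to<int>,
                                     BumpAllocator<std::pair<const int, int>>>;
```

With this, erase never touches the heap, and insert only advances an offset in the pre-sized pool. The cost is that the pool has to be sized for the total number of insertions over the program's lifetime, not for the peak number of live elements.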
That said, I'm somewhat skeptical about the benefit in this case. The main strength of unordered_map is supporting insertion and deletion that are usually fast. If you're not going to be doing insertion at runtime, chances are pretty good that it's not a particularly great choice.
If it's a collection that's mostly filled during initialization, then used mostly as-is, with a few items being "removed", but no more being inserted after you finish initialization, you're likely to be better off with a simple sorted array and an interpolating search (or, if the data is distributed extremely unpredictably, maybe a binary search--but an interpolating search is usually better). In this case, I'd handle removal by simply adding a boolean to each item saying whether that item is valid or not. Erase by setting that value to false. If you find such a value during a search, you basically just ignore it.
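A rough sketch of the sorted-array-with-validity-flag idea. The Entry type and function names are assumptions for illustration, and it uses std::lower_bound (a binary search) for brevity instead of the interpolating search suggested above:

```cpp
#include <algorithm>
#include <vector>

struct Entry {
    int key;
    int value;
    bool valid;  // false once the entry has been "erased"
};

// Assumes `table` is sorted by key with unique keys.
// Returns a pointer to the live entry, or nullptr if absent or erased.
inline Entry* find_entry(std::vector<Entry>& table, int key) {
    auto it = std::lower_bound(table.begin(), table.end(), key,
                               [](const Entry& e, int k) { return e.key < k; });
    if (it == table.end() || it->key != key || !it->valid) {
        return nullptr;
    }
    return &*it;
}

// "Erases" by clearing the flag; no memory is released or moved.
inline bool erase_entry(std::vector<Entry>& table, int key) {
    Entry* e = find_entry(table, key);
    if (e == nullptr) {
        return false;
    }
    e->valid = false;
    return true;
}
```

Since the vector is filled and sorted once at startup, neither lookup nor removal interacts with the heap at runtime.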

Is shrink_to_fit the proper way of reducing the capacity of a `std::vector` to its size?

In C++11 shrink_to_fit was introduced to complement certain STL containers (e.g., std::vector, std::deque, std::string).
In short, its main purpose is to ask the container it is called on to reduce its capacity to fit its size. However, this request is non-binding, and the container implementation is free to optimize otherwise and leave the vector with a capacity greater than its size.
Furthermore, in a previous SO question the OP was discouraged from using shrink_to_fit to reduce the capacity of his std::vector to its size. The reasons not to do so are quoted below:
shrink_to_fit does nothing or it gives you cache locality issues and it's O(n) to
execute (since you have to copy each item to their new, smaller home).
Usually it's cheaper to leave the slack in memory. @Massa
Could someone be so kind as to address the following questions:
Do the arguments in the quotation hold?
If yes, what's the proper way of shrinking an STL container's capacity to its size (at least for std::vector).
And if there's a better way to shrink a container, what's the reason for the existence of shrink_to_fit after-all?
Do the arguments in the quotation hold?
Measure and you will know. Are you constrained in memory? Can you figure out the correct size up front? It will be more efficient to reserve than it will be to shrink after the fact. In general I am inclined to agree on the premise that most uses are probably fine with the slack.
If yes, what's the proper way of shrinking an STL container's capacity to its size (at least for std::vector).
The comment does not only apply to shrink_to_fit, but to any other way of shrinking. Given that you cannot realloc in place, it involves acquiring a different chunk of memory and copying over there regardless of what mechanism you use for shrinking.
And if there's a better way to shrink a container, what's the reason for the existence of shrink_to_fit after-all?
The request is non-binding, but the alternatives don't have better guarantees. The question is whether shrinking makes sense: if it does, then it makes sense to provide a shrink_to_fit operation that can take advantage of the fact that the objects are being moved to a new location. I.e., if the type T has a noexcept(true) move constructor, it will allocate the new memory and move the elements.
While you can achieve the same externally, this interface simplifies the operation. The equivalent to shrink_to_fit in C++03 would have been:
std::vector<T>(current).swap(current);
But the problem with this approach is that when the copy is done to the temporary it does not know that current is going to be replaced, there is nothing that tells the library that it can move the held objects. Note that using std::move(current) would not achieve the desired effect as it would move the whole buffer, maintaining the same capacity().
Implementing this externally would be a bit more cumbersome:
{
    std::vector<T> copy;
    // Move the elements only if T's move constructor cannot throw.
    if (noexcept(T(std::move(std::declval<T>())))) {
        copy.assign(std::make_move_iterator(current.begin()),
                    std::make_move_iterator(current.end()));
    } else {
        copy.assign(current.begin(), current.end());
    }
    copy.swap(current);
}
Assuming that I got the if condition right... which is probably not what you want to write every time that you want this operation.
Will the arguments hold?
As the arguments are originally mine, don't mind if I defend them, one by one:
Either shrink_to_fit does nothing (...)
As was mentioned, the standard says (many times, but in the case of vector it's section 23.3.7.3...) that the request is non-binding to allow an implementation latitude for optimizations. This means that the implementation can define shrink_to_fit as a no-op.
(...) or it gives you cache locality issues
In the case that shrink_to_fit is not implemented as a no-op, you have to allocate a new underlying container with capacity size(), copy (or, in the best case, move) construct all your N = size() new items from the old ones, destruct all the old ones (in the move case this should be optimized, but it's possible that this involves another loop over the old container) and then destruct the old container itself. This is done, in libstdc++-4.9, exactly as David Rodriguez has described, by
_Tp(__make_move_if_noexcept_iterator(__c.begin()),
__make_move_if_noexcept_iterator(__c.end()),
__c.get_allocator()).swap(__c);
and in libc++-3.5, by a function in __alloc_traits that does approximately the same.
Oh, and an implementation absolutely cannot rely on realloc (even if it uses malloc inside ::operator new for its memory allocations) because realloc, if it cannot shrink in-place, will either leave the memory alone (no-op case) or make a bitwise copy (and miss the opportunity for readjusting pointers, etc. that the proper C++ copying/moving constructors would give).
Sure, one can write a shrinkable memory allocator, and use it in the constructor of its vectors.
In the easy case where the vectors are larger than the cache lines, all that movement puts pressure on the cache.
and it's O(n)
If n = size(), I think it was established above that, at the very least, you have to do one n sized allocation, n copy or move constructions, n destructions, and one old_capacity sized deallocation.
usually it's cheaper just to leave the slack in memory
Obviously, unless you are really pressed for free memory (in which case it might be wiser to save your data to the disk and re-load it later on demand...)
If yes, what's the proper way of shrinking an STL container's capacity to its size (at least for std::vector).
The proper way is still shrink_to_fit... you just have to either not rely on it or know very well your implementation!
And if there's a better way to shrink a container, what's the reason for the existence of shrink_to_fit after-all?
There is no better way, but the reason for the existence of shrink_to_fit is, AFAICT, that sometimes your program might feel memory pressure and it's one way of treating it. Not a very good way, but still.
HTH!
If yes, what's the proper way of shrinking an STL container's capacity to its size (at least for std::vector).
The 'swap trick' will trim a vector to the exact size required (from Effective STL):
vector<Person>(persons).swap(persons);
Particularly useful when the vector is empty, to release all memory:
vector<Person>().swap(persons);
Vectors were constantly tripping my unit tester's memory leak detection code because of retained allocations of unused space, and this sorted them out perfectly.
This is the kind of example where I really don't care about runtime efficiency (size or speed), but I do care about exact memory usage.
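For reference, both forms of the swap trick wrapped in small helpers (the names trim_to_size and release_all are made up for illustration). Note that the standard does not strictly promise that the copy has capacity() == size(); mainstream implementations allocate exactly what the copy needs, which is what the trick relies on.

```cpp
#include <vector>

// Copies the elements into a right-sized temporary, then swaps buffers.
// The oversized buffer is freed when the temporary is destroyed.
template <typename T>
void trim_to_size(std::vector<T>& v) {
    std::vector<T>(v).swap(v);
}

// Swaps with an empty temporary so the old buffer is freed immediately.
template <typename T>
void release_all(std::vector<T>& v) {
    std::vector<T>().swap(v);
}
```

Unlike shrink_to_fit, these always copy the elements rather than moving them, so they can be noticeably more expensive for element types that are cheap to move but costly to copy.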
And if there's a better way to shrink a container, what's the reason for the existence of shrink_to_fit after-all?
I really don't know what the point of providing a function that can legally do absolutely nothing is.
I cheered when I saw it had been introduced, then despaired when I found it couldn't be relied upon.
Perhaps we'll see maybe_sort() in the next version.

Overhead to using std::vector?

I know that manual dynamic memory allocation is a bad idea in general, but is it sometimes a better solution than using, say, std::vector?
To give a crude example, if I had to store an array of n integers, where n <= 16, say. I could implement it using
int* data = new int[n]; //assuming n is set beforehand
or using a vector:
std::vector<int> data;
Is it absolutely always a better idea to use a std::vector or could there be practical situations where manually allocating the dynamic memory would be a better idea, to increase efficiency?
It is always better to use std::vector/std::array, at least until you can conclusively prove (through profiling) that the T* a = new T[100]; solution is considerably faster in your specific situation. This is unlikely to happen: vector/array is an extremely thin layer around a plain old array. There is some overhead to bounds checking with vector::at, but you can circumvent that by using operator[].
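A short sketch of the difference between the two access styles (the helper names are made up for illustration): at() checks the index and throws std::out_of_range, while operator[] does no check at all.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if v.at(i) rejects the index by throwing std::out_of_range.
inline bool at_throws(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Unchecked access: the loop bound already guarantees a valid index,
// so paying for a check on every element would be redundant.
inline int sum_unchecked(const std::vector<int>& v) {
    int total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}
```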
I can't think of any case where dynamically allocating a C style
vector makes sense. (I've been working in C++ for over 25
years, and I've yet to use new[].) Usually, if I know the
size up front, I'll use something like:
std::vector<int> data( n );
to get an already sized vector, rather than using push_back.
Of course, if n is very small and is known at compile time,
I'll use std::array (if I have access to C++11), or even
a C style array, and just create the object on the stack, with
no dynamic allocation. (Such cases seem to be rare in the
code I work on; small fixed size arrays tend to be members of
classes. Where I do occasionally use a C style array.)
If you know the size in advance (especially at compile time), and don't need the dynamic re-sizing abilities of std::vector, then using something simpler is fine.
However, that something should preferably be std::array if you have C++11, or something like boost::scoped_array otherwise.
I doubt there'll be much efficiency gain unless it significantly reduces code size or something, but it's more expressive which is worthwhile anyway.
You should try to avoid C-style-arrays in C++ whenever possible. The STL provides containers which usually suffice for every need. Just imagine reallocation for an array or deleting elements out of its middle. The container shields you from handling this, while you would have to take care of it for yourself, and if you haven't done this a hundred times it is quite error-prone.
An exception is, of course, if you are addressing low-level issues which might not be able to cope with STL containers.
There has already been some discussion about this topic. See here on SO.
Is it absolutely always a better idea to use a std::vector or could there be practical situations where manually allocating the dynamic memory would be a better idea, to increase efficiency?
Call me a simpleton, but 99.9999...% of the times I would just use a standard container. The default choice should be std::vector, but also std::deque<> could be a reasonable option sometimes. If the size is known at compile-time, opt for std::array<>, which is a lightweight, safe wrapper of C-style arrays which introduces zero overhead.
Standard containers expose member functions to specify the initial reserved amount of memory, so you won't have troubles with reallocations, and you won't have to remember delete[]ing your array. I honestly do not see why one should use manual memory management.
Efficiency shouldn't be an issue, since you have throwing and non-throwing member functions to access the contained elements, so you have a choice whether to favor safety or performance.
std::vector can be constructed with a size_type parameter, which creates the vector with the specified number of elements using a single dynamic allocation (same as your array). You can also use reserve to decrease the number of reallocations over the vector's lifetime.
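To see the effect of reserve, here is a small sketch (the function name is made up for illustration) that counts how often the capacity changes while n elements are pushed back:

```cpp
#include <cstddef>
#include <vector>

// Counts capacity changes (i.e. reallocations) during n push_back calls.
inline std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);  // one allocation up front
    }
    std::size_t reallocations = 0;
    std::size_t capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != capacity) {
            ++reallocations;
            capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set the count is zero, since the standard guarantees no reallocation until size() would exceed the reserved capacity. Without it the exact count depends on the implementation's growth factor.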
If n is known at compile-time, then you should choose std::array as:
std::array<int, n> data; //n is compile-time constant
and if n is not known at compile-time, OR the array might grow at runtime, then go for std::vector:
std::vector<int> data(n); //n may be known at runtime
Or in some cases, you may also prefer std::deque which is faster than std::vector in some scenario. See these:
C++ benchmark – std::vector VS std::list VS std::deque
Using Vector and Deque by Herb Sutter
Hope that helps.
From a perspective of someone who often works with low level code with C++, std vectors are really just helper methods with a safety net for a classic C style array. The only overheads you'd experience realistically are memory allocations and safety checks for boundaries. If you're writing a program which needs performance and are going to be using vectors as a regular array I'd recommend to just use C style arrays instead of vectors. You should realistically be vetting the data that comes into the application and check the boundaries yourself to avoid checks on every memory access to the array.
It's good to see that others are checking the differences of the C ways and the C++ ways. More often than not C++ standard methods have significantly worse performance and uglier syntax than their C counterparts and is generally the reason people call C++ bloated. I think C++ focuses more on safety and making the language more like JavaScript/C# even though the language fundamentally lacks the foundation to be one.

Is std::vector::size() allowed to require non-trivial computations? When would it make sense?

I'm reviewing a piece of code and see a class where a std::vector is stored as a member variable and the size of that std::vector is stored as a separate member variable. Both the std::vector and its "stored copy" of the size never change during the containing object's lifetime, and the comments say the size is stored separately "for convenience and for cases when an implementation computes the size each time".
My first reaction was "WT*? Shouldn't it always be trivial to extract a std::vector's size?"
Now I've carefully read 23.2.4 of C++ Standard and can't see anything saying whether such implementations are allowed in the first place and I can't imagine why it would be necessary to implement std::vector in such way that its current size needs non-trivial computations.
Is an implementation in which std::vector::size() requires some non-trivial work allowed? When would such an implementation make sense?
C++03 says in Table 65, found in §23.1, that size() should have a constant complexity. (In C++0x, this is required for all containers.) You'd be hard-pressed to find a std::vector<> where it's not.
Typically, as Steve says, this is just the difference between two pointers, a simple operation.
I would guess that your definition of "trivial" doesn't match that of the author of the code.
If size isn't stored, I'd expect begin and end to be stored, and size to be computed as the difference of the two, and that code to be inlined. So we're basically talking two (nearby) memory accesses and a subtraction, instead of one memory access.
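As a sketch of that layout, here is a hypothetical IntBuffer type that mirrors the three-pointer representation many vector implementations use. It is not any particular library's actual code.

```cpp
#include <cstddef>

// Hypothetical stand-in for a vector's internals: no size member is stored.
struct IntBuffer {
    int* first;           // start of the elements
    int* last;            // one past the last element
    int* end_of_storage;  // one past the allocated block

    // Two loads and a subtraction; trivially inlined.
    std::size_t size() const {
        return static_cast<std::size_t>(last - first);
    }

    std::size_t capacity() const {
        return static_cast<std::size_t>(end_of_storage - first);
    }
};
```

Caching the result in a separate member saves at most one load and one subtraction per call, which is hard to justify against the risk of the cached value going stale.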
For most practical purposes, both of those are trivial, and if the standard library author thinks that the result of that computation isn't worth caching, then personally I am happy to accept their opinion. But the author of that code comment might think otherwise.
IIRC the standard says somewhere that size "should" be O(1), not sure if that's in the text for sequences or for containers. I don't think it anywhere specifies that it must be for vector. But even if we read that as a non-requirement there's a fundamental QOI issue here - what on earth am I doing optimizing my code for such a poor implementation at the expense of normal implementations?
If someone uses such an implementation, presumably that's because they want their code to run slowly. Who am I to judge otherwise? ;-)
It's also possible that the author of the code has profiled using a number of end-begin implementations, and measured a significant improvement by caching the size. But I think that's less likely than that the author is being too pessimistic about the worst case their code needs to handle.

Using realloc in c++

std::realloc is dangerous in C++ if the malloc'd memory contains non-POD types. It seems the only problem is that std::realloc won't call the type's constructors and destructors if it cannot grow the memory in situ.
A trivial work around would be a try_realloc function. Instead of malloc'ing new memory if it cannot be grown in situ, it would simply return false. In which case new memory could be allocated, the objects copied (or moved) to the new memory, and finally the old memory freed.
This seems supremely useful. std::vector could make great use of this, possibly avoiding all copies/reallocations.
Preemptive flame retardant: technically, that is the same Big-O performance, but if vector growth is a bottleneck in your application, a 2x speedup is nice even if the Big-O remains unchanged.
BUT, I cannot find any c api that works like a try_realloc.
Am I missing something? Is try_realloc not as useful as I imagine? Is there some hidden bug that makes try_realloc unusable?
Better yet, Is there some less documented API that performs like try_realloc?
NOTE: I'm obviously, in library/platform specific code here. I'm not worried as try_realloc is inherently an optimization.
Update:
Following Steve Jessop's comments on whether vector would be more efficient using realloc, I wrote up a proof of concept to test. The realloc-vector simulates a vector's growth pattern but has the option to realloc instead. I ran the program up to a million elements in the vector.
For comparison a vector must allocate 19 times while growing to a million elements.
If the realloc-vector is the only thing using the heap, the results are awesome: 3-4 allocations while growing to the size of a million bytes.
If the realloc-vector is used alongside a vector that grows at 66% of the speed of the realloc-vector, the results are less promising: 8-10 allocations during growth.
Finally, if the realloc-vector is used alongside a vector that grows at the same rate, the realloc-vector allocates 17-18 times. Barely saving one allocation over the standard vector behavior.
I don't doubt that a hacker could game allocation sizes to improve the savings, but I agree with Steve that the tremendous effort to write and maintain such an allocator isn't worth the gain.
vector generally grows in large increments. You can't do that repeatedly without relocating, unless you carefully arrange things so that there's a large extent of free addresses just above the internal buffer of the vector (which in effect requires assigning whole pages, because obviously you can't have other allocations later on the same page).
So I think that in order to get a really good optimization here, you need more than a "trivial workaround" that does a cheap reallocation if possible - you have to somehow do some preparation to make it possible, and that preparation costs you address space. If you only do it for certain vectors, ones that indicate they're going to become big, then it's fairly pointless, because they can indicate with reserve() that they're going to become big. You can only do it automatically for all vectors if you have a vast address space, so that you can "waste" a big chunk of it on every vector.
As I understand it, the reason that the Allocator concept has no reallocation function is to keep it simple. If std::allocator had a try_realloc function, then either every Allocator would have to have one (which in most cases couldn't be implemented, and would just have to return false always), or else every standard container would have to be specialized for std::allocator to take advantage of it. Neither option is a great Allocator interface, although I suppose it wouldn't be a huge effort for implementers of almost all Allocator classes just to add a do-nothing try_realloc function.
If vector is slow due to re-allocation, deque might be a good replacement.
You could implement something like the try_realloc you proposed, using mmap with MAP_ANONYMOUS and MAP_FIXED and mremap with MREMAP_FIXED.
Edit: just noticed that the man page for mremap even says:
mremap() uses the Linux page table scheme. mremap() changes the
mapping between virtual addresses and memory pages. This can be used
to implement a very efficient realloc(3).
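A Linux-only sketch of what a try_realloc on top of mremap might look like. The helper names page_size, page_alloc, try_realloc and page_free are made up for illustration, and sizes are assumed to be multiples of the page size. Calling mremap without MREMAP_MAYMOVE asks the kernel to resize the mapping in place and to fail rather than relocate it.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // mremap is a GNU/Linux extension
#endif
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

inline std::size_t page_size() {
    return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}

// Obtains `bytes` of zeroed, page-aligned anonymous memory, or nullptr.
inline void* page_alloc(std::size_t bytes) {
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Resizes the mapping in place. Returns false if the kernel would have
// had to move it; in that case the original mapping is left untouched.
inline bool try_realloc(void* p, std::size_t old_bytes, std::size_t new_bytes) {
    return mremap(p, old_bytes, new_bytes, 0) != MAP_FAILED;
}

inline void page_free(void* p, std::size_t bytes) {
    munmap(p, bytes);
}
```

On a false return the caller would allocate a new block, move-construct the objects across, and free the old one, exactly as a vector does today. Whether growing in place succeeds depends entirely on what happens to be mapped directly after the block.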
realloc in C is hardly more than a convenience function; it has very little benefit for performance/reducing copies. The main exception I can think of is code that allocates a big array then reduces the size once the size needed is known - but even this might require moving data on some malloc implementations (ones which segregate blocks strictly by size) so I consider this usage of realloc really bad practice.
As long as you don't constantly reallocate your array every time you add an element, but instead grow the array exponentially (e.g. by 25%, 50%, or 100%) whenever you run out of space, just manually allocating new memory, copying, and freeing the old will yield roughly the same (and identical, in case of memory fragmentation) performance to using realloc. This is surely the approach that C++ STL implementations use, so I think your whole concern is unfounded.
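To put numbers on exponential growth, here is a small sketch (the function name and the rational growth-factor parameters are assumptions for illustration) that counts how many allocations a buffer starting from empty needs to reach n elements:

```cpp
#include <cstddef>

// Simulates growing a buffer from capacity 0 until it can hold n elements,
// multiplying the capacity by num/den each time it fills up.
// Returns the number of allocations performed.
inline std::size_t allocations_to_reach(std::size_t n, std::size_t num, std::size_t den) {
    std::size_t capacity = 0;
    std::size_t allocations = 0;
    while (capacity < n) {
        const std::size_t grown = capacity * num / den;
        // Always grow by at least one slot so small capacities make progress.
        capacity = grown > capacity ? grown : capacity + 1;
        ++allocations;
    }
    return allocations;
}
```

Calling it with (1000000, 2, 1) models doubling and (1000000, 3, 2) models 50% growth. Either way the count is logarithmic in n, which is why the copying cost per element stays constant on average.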
Edit: The one (rare but not unheard-of) case where realloc is actually useful is for giant blocks on systems with virtual memory, where the C library interacts with the kernel to relocate whole pages to new addresses. The reason I say this is rare is because you need to be dealing with very big blocks (at least several hundred kB) before most implementations will even enter the realm of dealing with page-granularity allocation, and probably much larger (several MB maybe) before entering and exiting kernelspace to rearrange virtual memory is cheaper than simply doing the copy. Of course try_realloc would not be useful here, since the whole benefit comes from actually doing the move inexpensively.