Complexity requirements for std::deque::push_back/front - c++

As a result of this question from a few days ago there are a few things that have been bugging me about the complexity requirements for std::deque::push_back/push_front vs the actual std::deque implementations out in the wild.
The upshot of the previous question was that these operations are required to have O(1) worst case complexity. I verified that this was indeed the case in C++11:
From 23.3.3.4 deque modifiers, referring to insert, push/emplace front/back:
Complexity: The complexity is linear in the number of elements inserted plus the
lesser of the distances to the beginning and end of the deque. Inserting a single
element either at the beginning or end of a deque always takes constant time and
causes a single call to a constructor of T.
This is combined with the O(1) complexity requirement for indexing, via operator[] etc.
The issue is that implementations don't strictly satisfy these requirements.
In terms of both msvc and gcc the std::deque implementation is a blocked data structure, consisting of a dynamic array of pointers to (fixed size) blocks, where each block stores a number of data elements.
In the worst case, push_back/front etc could require an extra block to be allocated (which is fine - fixed size allocation is O(1)), but it could also require that the dynamic array of block pointers be resized - this is not fine, since this is O(m) where m is the number of blocks, which at the end of the day is O(n).
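To make that concrete, here is roughly the layout I have in mind - a hypothetical sketch, not the code of any real implementation, and simplified to growth at the back only (a real implementation uses raw storage with an allocator and leaves room to grow at both ends):

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Hypothetical blocked layout: a growable array of pointers to
    // fixed-size blocks, each block holding BlockSize elements.
    template <typename T, std::size_t BlockSize = 8>
    struct BlockedStorage {
        std::vector<T*> blocks;   // the "map" of block pointers
        std::size_t     count = 0;

        ~BlockedStorage() { for (T* b : blocks) delete[] b; }

        // O(1): one division/modulo plus two dereferences.
        T& operator[](std::size_t i) {
            return blocks[i / BlockSize][i % BlockSize];
        }

        // O(1) except when a fresh block is needed AND the vector of block
        // pointers itself has to reallocate: that copies O(number of blocks)
        // pointers, which is the worst case described above.
        void push_back(const T& value) {
            if (count % BlockSize == 0)                // need a fresh block
                blocks.push_back(new T[BlockSize]());  // may reallocate 'blocks'
            blocks[count / BlockSize][count % BlockSize] = value;
            ++count;
        }
    };

    int main() {
        BlockedStorage<int> s;
        for (int i = 0; i < 100; ++i) s.push_back(i);
        std::cout << s[42] << '\n';   // prints 42
    }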
Obviously this is still amortised O(1) complexity and since generally m << n it's going to be pretty fast in practice. But it seems there's an issue with conformance?
As a further point, I don't see how you can design a data structure that strictly satisfies O(1) worst-case complexity for both push_back/push_front and operator[]. You could have a linked list of block pointers, but this doesn't give you the desired operator[] behaviour. Can it actually be done?

In the C++11 FDIS, we can read:
23.2.3 Sequence containers [sequence.reqmts]
16/ Table 101 lists operations that are provided for some types of sequence containers but not others. An implementation shall provide these operations for all container types shown in the “container” column, and shall implement them so as to take amortized constant time.
Where Table 101 is named Optional sequence container operations and lists deque for the push_back and push_front operations.
Therefore, it seems more like a slight omission in the paragraph you cited. Perhaps worth a Defect Report?
Note that the single call to a constructor still holds.

I suspect that the reallocation of the block pointers is done with a geometrically increasing size - this is a common trick for std::vector. I think this is technically O(log m) but as you point out m << n, so as a practical matter it doesn't affect the real-world results.

Related

Does std::deque actually have constant time insertion at the beginning?

The Standard says:
A deque is a sequence container that supports random access iterators (27.2.7). In addition, it supports
constant time insert and erase operations at the beginning or the end; insert and erase in the middle take
linear time.
However, it also says in the same Clause:
All of the complexity requirements in this Clause are stated solely in terms of the number of operations on the contained objects. [ Example: The copy constructor of type vector<vector<int>> has linear complexity, even though the complexity of copying each contained vector<int> is itself linear. — end example ]
Doesn't this mean that insertion at the beginning of, say, deque<int> is allowed to take linear time as long as it doesn't perform more than a constant number of operations on the ints that are already in the deque and the new int object being inserted?
For example, suppose that we implement a deque using a "vector of size-K vectors". It seems that once every K times we insert at the beginning, a new size-K vector must be added at the beginning, so all other size-K vectors must be moved. This would mean the time complexity of insertion at the beginning is amortized O(N/K) where N is the total number of elements, but K is constant, so this is just O(N). But it seems that this is allowed by the Standard, because moving a size-K vector doesn't move any of its elements, and the "complexity requirements" are "stated solely in terms of the number of operations" on the contained int objects.
Does the Standard really allow this? Or should we interpret it as having a stricter requirement, i.e. constant number of operations on the contained objects plus constant additional time?
For example, suppose that we implement a deque using a "vector of size-K vectors".
That wouldn't be a valid implementation. Insertion at the front of a vector invalidates all of the pointers/references in the container. deque is required to not invalidate any pointers/references on front insertion.
But let's ignore that for now.
But it seems that this is allowed by the Standard, because moving a size-K vector doesn't move any of its elements, and the "complexity requirements" are "stated solely in terms of the number of operations" on the contained int objects.
Yes, that would be allowed. Indeed, real implementations of deque are not so dissimilar from that (though they don't use std::vector itself for obvious reasons). The broad outline of a deque implementation is an array of pointers to blocks (with space for growth at both the front and back), with each block containing up to X items as well as pointers to the next/previous blocks (to make single-element iteration fast).
If you insert enough elements at the front or back, then the array of block pointers has to grow. That will require an operation that is linear time relative to the number of items in the deque, but doesn't actually operate on the items themselves, so it doesn't count. The specification has nothing to say about the complexity of that operation.
Without this provision, I'm not sure if the unique feature-set of deque (fast front/back insert and random-access) would be implementable.
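For illustration, a hedged sketch of what growing that array of block pointers might look like - the names and the growth policy are made up, but it shows that only T* pointers are copied, never the elements themselves:

    #include <algorithm>
    #include <cstddef>

    // Hypothetical: grow the array of block pointers, leaving free slots at
    // both ends so the deque can keep growing at the front and the back.
    // Only T* pointers are copied; no element of type T is touched, so this
    // step performs zero "operations on the contained objects".
    template <typename T>
    void grow_block_map(T**& map, std::size_t& map_size,
                        std::size_t& first_block, std::size_t used_blocks) {
        std::size_t new_size  = map_size ? map_size * 2 : 8;  // geometric growth
        std::size_t new_first = (new_size - used_blocks) / 2; // recenter
        T** new_map = new T*[new_size]();                     // null-filled
        std::copy(map + first_block, map + first_block + used_blocks,
                  new_map + new_first);
        delete[] map;
        map = new_map;
        map_size = new_size;
        first_block = new_first;
    }

    int main() {
        int** map = nullptr;
        std::size_t map_size = 0, first_block = 0;
        grow_block_map(map, map_size, first_block, 0);  // 8 empty slots, centered
        delete[] map;
    }

The copy is linear in the number of blocks, but it performs no operations on the contained objects, which is exactly the point above.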
I think you're reaching a bit, in how you interpret the meaning of complexity domains. You're trying to make a distinction between "linear time" and "linear complexity" which I'm not convinced makes much sense.
The standard's clear that insertion at the front is constant-time, and I think we all agree on that. The latter passage just tells us that what each of that "constant" quantity of operations involves underneath is simply not specified or constrained by the standard.
And this is not unusual. Any algorithm works at some level of abstraction. Even if we were to write an algorithm that came down to individual machine instructions, and we said that there were only ever N machine instructions generated by our algorithm, we wouldn't go investigating what sort of complexity each individual instruction has inside the processor and add that into our results. We wouldn't say that some operations end up doing more at the quantum molecular level and thus our O(N) algorithm is actually O(N×M³) or some such. We've chosen not to consider that level of abstraction. And, unless said complexity depends on the algorithm's inputs, that's completely fine.
In your case, the size of the moved/copied inner vectors isn't really relevant, because these do not inherently change as the deque grows. The number of inner vectors does, but the size of each one is an independent property. Thus, it is irrelevant when describing the complexity of inserting a new element into the outer vector.
Does the actual execution time (which could itself be described in some algorithmical terms if you so chose) vary depending on the contents of the copied inner vectors? Yes, of course. But that has nothing to do with how the task of expanding the outer vector grows in workload as the outer vector itself grows.
So, here, the standard is saying that it will not copy N or N² or even log N inner vectors when you put another one at the front; it is saying that the number of these operations will not change in scale as your deque grows. It is also saying that, for the purposes of that rule, it doesn't care what copying/moving the inner vectors actually involves or how big they are.
Complexity analysis is not about performance. Complexity analysis is about how performance scales.

Does the std::vector implementation use an internal array or linked list or other?

I've been told that std::vector has a C-style array on the inside implementation, but would that not negate the entire purpose of having a dynamic container?
So is inserting a value in a vector an O(n) operation? Or is it O(1) like in a linked-list?
From the C++11 standard, in the "sequence containers" library section (emphasis mine):
23.3.6.1 Class template vector overview [vector.overview]
A vector is a sequence container that supports (amortized) constant time insert and erase operations at the
end; insert and erase in the middle take linear time. Storage management is handled automatically, though
hints can be given to improve efficiency.
This does not defeat the purpose of dynamic size -- part of the point of vector is that not only is it very fast to access a single element, but scanning over the vector has very good memory locality because everything is tightly packed together. In practice, having good memory locality is very important because it greatly reduces cache misses, which has a large impact on runtime. This is a major advantage of vector over list in many situations, particularly those where you need to iterate over the entire container more often than you need to add or remove elements.
The memory in a std::vector is required to be contiguous, so it's typically represented as an array.
Your question about the complexity of the operations on a std::vector is a good one - I remember wondering this myself when I first started programming. If you append an element to a std::vector, then it may have to perform a resize operation and copy over all the existing elements to a new array. This will take time O(n) in the worst case. However, the amortized cost of appending an element is O(1). By this, we mean that the total cost of any sequence of n appends to a std::vector is always O(n). The intuition behind this is that the std::vector usually overallocates space in its array, leaving a lot of free slots for elements to be inserted into without a reallocation. As a result, most of the appends will take time O(1) even though every now and then you'll have one that takes time O(n).
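You can watch this over-allocation happen yourself by printing capacity() whenever it changes (the exact growth factor, and therefore the printed numbers, is implementation-defined, so your output will differ between compilers):

    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> v;
        std::size_t last_capacity = 0;
        for (int i = 0; i < 1000; ++i) {
            v.push_back(i);
            if (v.capacity() != last_capacity) {       // a reallocation happened
                last_capacity = v.capacity();
                std::cout << "size " << v.size()
                          << " -> capacity " << v.capacity() << '\n';
            }
        }
    }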
That said, the cost of performing an insertion elsewhere in a std::vector will be O(n), because you may have to shift everything down.
You also asked why this is, if it defeats the purpose of having a dynamic array. Even if the std::vector just acted like a managed array, it's still a win over raw arrays. The std::vector knows its size, can do bounds-checking (with at), is a proper value type that can be copied and assigned (unlike a raw array), and doesn't decay to a pointer. These extra features - coupled with the extra logic to make appends work quickly - are almost always worth it.

Vector vs Deque insertion in middle

I know that deque is more efficient than vector when insertions are at the front or end, and vector is better if we have to do pointer arithmetic. But which one should we use when we have to perform insertions in the middle, and why?
You might think that a deque would have the advantage, because it stores the data broken up into blocks. However to implement operator[] in constant time requires all those blocks to be the same size. Inserting or deleting an element in the middle still requires shifting all the values on one side or the other, same as a vector. Since the vector is simpler and has better locality for caching, it should come out ahead.
The selection criteria with the Standard Library containers: you choose a container depending upon the type of data you want to store and the type of operations you want to perform on that data.
If you want to perform a large number of insertions in the middle, you are much better off using a std::list.
If the choice is just between a std::deque and std::vector then there are a number of factors to consider:
Typically, there is one more indirection in the case of a deque to access the elements, so element access and iterator movement of deques are usually a bit slower.
In systems that have size limitations for blocks of memory, a deque might contain more elements because it uses more than one block of memory. Thus, max_size() might be larger for deques.
Deques provide no support for controlling the capacity or the moment of reallocation (see the sketch after this list). In particular, any insertion or deletion of elements other than at the beginning or end invalidates all pointers, references, and iterators that refer to elements of the deque. However, reallocation may perform better than for vectors, because according to their typical internal structure, deques don't have to copy all elements on reallocation.
Blocks of memory might get freed when they are no longer used, so the memory size of a deque might shrink (this is not a requirement imposed by the standard, but most implementations do it).
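To illustrate the capacity-control point from the list above (this only shows which members exist; nothing here is implementation-specific):

    #include <deque>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> v;
        v.reserve(1000);                    // pre-allocate: the next 1000
        std::cout << v.capacity() << '\n';  //   push_backs won't reallocate

        std::deque<int> d;
        // d.reserve(1000);   // does not compile: deque has no reserve()
        // d.capacity();      // nor capacity(); only shrink_to_fit() since C++11
        d.push_back(1);
        std::cout << d.size() << '\n';
    }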
std::deque could perform better for large containers because it is typically implemented as a linked sequence of contiguous data blocks, as opposed to the single block used in an std::vector. So an insertion in the middle would result in less data being copied from one place to another, and potentially less reallocations.
Of course, whether that matters or not depends on the size of the containers and the cost of copying the elements stored. With C++11 move semantics, the cost of the latter is less important. But in the end, the only way of knowing is profiling with a realistic application.
Deque would still be more efficient, as it doesn't have to move half of the array every time you insert an element.
Of course, this will only really matter if you consider large numbers of elements, and even then it is advisable to run a benchmark and see which one works better in your particular case. Remember that premature optimization is the root of all evil.
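A minimal benchmark sketch along those lines - the element type, counts, and timing granularity here are arbitrary placeholders, so substitute your real workload before drawing any conclusions:

    #include <chrono>
    #include <deque>
    #include <iostream>
    #include <vector>

    // Insert 'count' ints into the middle of a container and report the time.
    template <typename Container>
    long long time_middle_inserts(int count) {
        Container c;
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < count; ++i)
            c.insert(c.begin() + c.size() / 2, i);    // middle insertion
        auto stop = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    }

    int main() {
        const int n = 100000;
        std::cout << "vector: " << time_middle_inserts<std::vector<int>>(n) << " ms\n";
        std::cout << "deque:  " << time_middle_inserts<std::deque<int>>(n) << " ms\n";
    }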

Which one is better stl map or unordered_map for the following cases

I am trying to compare stl map and stl unordered_map for certain operations. I looked on the net and it only increases my doubts regarding which one is better as a whole. So I would like to compare the two on the basis of the operation they perform.
Which one performs faster in
Insert, Delete, Look-up
Which one takes less memory and less time to clear it from the memory. Any explanations are heartily welcomed !!!
Thanks in advance
Which one performs faster in Insert, Delete, Look-up? Which one takes less memory and less time to clear it from the memory. Any explanations are heartily welcomed !!!
For a specific use, you should try both with your actual data and usage patterns and see which is actually faster... there are enough factors that it's dangerous to assume either will always "win".
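A minimal sketch of such a measurement - the key type, value type, sizes, and the sequential keys below are placeholders, so substitute your real data and access pattern:

    #include <chrono>
    #include <iostream>
    #include <map>
    #include <unordered_map>

    template <typename Map>
    long long time_insert_find_erase(int n) {
        Map m;
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < n; ++i) m.emplace(i, i);        // insert
        long long hits = 0;
        for (int i = 0; i < n; ++i) hits += m.count(i);     // look-up
        for (int i = 0; i < n; ++i) m.erase(i);             // delete
        auto stop = std::chrono::steady_clock::now();
        if (hits != n) std::cout << "unexpected\n";         // keep the work observable
        return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    }

    int main() {
        const int n = 1000000;
        std::cout << "std::map:           " << time_insert_find_erase<std::map<int, int>>(n) << " ms\n";
        std::cout << "std::unordered_map: " << time_insert_find_erase<std::unordered_map<int, int>>(n) << " ms\n";
    }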
implementation and characteristics of unordered maps / hash tables
Academically - as the number of elements increases towards infinity, those operations on a std::unordered_map (which is the C++ library offering for what Computing Science terms a "hash map" or "hash table") will tend to keep taking the same amount of time, O(1) (ignoring memory limits/caching etc.), whereas with a std::map (a balanced binary tree), each time the number of elements doubles it will typically need to do an extra comparison operation, so it gets gradually slower: O(log₂ n).
std::unordered_map implementations necessarily use open hashing: the fundamental expectation is that there'll be a contiguous array of "buckets", each logically a container of any values hashing thereto.
It generally serves to picture the hash table as a vector<list<pair<key,value>>> where getting from the vector elements to a value involves at least one pointer dereference as you follow the list-head-pointer stored in the bucket to the initial list node; the insert/find/delete operations' performance depends on the size of the list, which on average equals the unordered_map's load_factor.
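To make that mental picture concrete, here is a deliberately naive toy model of a lookup over exactly that shape - this is not how any real unordered_map is written, just the conceptual structure:

    #include <cstddef>
    #include <functional>
    #include <list>
    #include <string>
    #include <utility>
    #include <vector>

    // Toy model of a separate-chaining table: a bucket array of lists.
    using Bucket = std::list<std::pair<std::string, int>>;

    int* toy_find(std::vector<Bucket>& buckets, const std::string& key) {
        // Assumes buckets is non-empty.
        std::size_t b = std::hash<std::string>{}(key) % buckets.size(); // pick bucket
        for (auto& kv : buckets[b])     // walk the chain: its average length
            if (kv.first == key)        //   is the load factor
                return &kv.second;
        return nullptr;
    }

    int main() {
        std::vector<Bucket> buckets(8);
        buckets[std::hash<std::string>{}("answer") % buckets.size()].push_back({"answer", 42});
        return toy_find(buckets, "answer") ? 0 : 1;
    }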
If the max_load_factor is lowered (the default is 1.0), then there will be fewer collisions but more reallocation/rehashing during insertion and more wasted memory (which can hurt performance through increased cache misses).
The memory usage for this most-obvious of unordered_map implementations involves both the contiguous array of bucket_count() list-head-iterator/pointer-sized buckets and one doubly-linked list node per key/value pair. Typically, bucket_count() + 2 * size() extra pointers of overhead, adjusted for any rounding-up of dynamic memory allocation request sizes the implementation might do. For example, if you ask for 100 bytes you might get 128 or 256 or 512. An implementation's dynamic memory routines might use some memory for tracking the allocated/available regions too.
Still, the C++ Standard leaves room for real-world implementations to make some of their own performance/memory-usage decisions. They could, for example, keep the old contiguous array of buckets around for a while after allocating a new larger array, so rehashing values into the latter can be done gradually to reduce the worst-case performance at the cost of average-case performance as both arrays are consulted during operations.
implementation and characteristics of maps / balanced binary trees
A map is a binary tree, and can be expected to employ pointers linking distinct heap memory regions returned by different calls to new. As well as the key/value data, each node in the tree will need parent, left, and right pointers (see wikipedia's binary tree article if lost).
comparison
So, both unordered_map and map need to allocate nodes for key/value pairs with the former typically having two-pointer/iterator overhead for prev/next-node linkage, and the latter having three for parent/left/right. But, the unordered_map additionally has the single contiguous allocation for bucket_count() buckets (== size() / load_factor()).
For most purposes that's not a dramatic difference in memory usage, and the deallocation time difference for one extra region is unlikely to be noticeable.
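As a back-of-the-envelope illustration of that per-node bookkeeping (these structs are hypothetical, not any library's actual node types, and real nodes add allocator and alignment details):

    #include <iostream>
    #include <utility>

    // Hypothetical node of a chained unordered_map:
    // two pointers of linkage overhead (prev/next in the element list).
    template <typename Key, typename Value>
    struct HashNode {
        HashNode* prev;
        HashNode* next;
        std::pair<const Key, Value> kv;
    };

    // Hypothetical red-black tree node for std::map:
    // three pointers of linkage overhead plus a colour flag.
    template <typename Key, typename Value>
    struct TreeNode {
        TreeNode* parent;
        TreeNode* left;
        TreeNode* right;
        bool      is_red;
        std::pair<const Key, Value> kv;
    };

    int main() {
        std::cout << sizeof(HashNode<int, int>) << " vs "
                  << sizeof(TreeNode<int, int>) << '\n';
    }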
another alternative
For those occasions when the container is populated up front and then repeatedly searched without further inserts/erases, it can sometimes be fastest to use a sorted vector, searched using the Standard algorithms binary_search, equal_range, lower_bound, and upper_bound. This has the advantage of a single contiguous memory allocation, which is much more cache friendly. It generally outperforms map, but unordered_map may still be faster - measure if you care.
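For example, the sorted-vector idiom might look like this (just a sketch; the key/value types are arbitrary):

    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>

    // Compare a stored pair against a bare key for the binary search.
    struct KeyLess {
        bool operator()(const std::pair<int, std::string>& kv, int key) const {
            return kv.first < key;
        }
    };

    int main() {
        // Populate up front, then sort once; pair's operator< orders by key first.
        std::vector<std::pair<int, std::string>> data = {
            {3, "three"}, {1, "one"}, {2, "two"}
        };
        std::sort(data.begin(), data.end());

        // O(log n) comparisons over contiguous, cache-friendly storage.
        int key = 2;
        auto it = std::lower_bound(data.begin(), data.end(), key, KeyLess());
        if (it != data.end() && it->first == key)
            std::cout << it->second << '\n';   // prints "two"
    }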
The reason there is both is that neither is better as a whole.
Use either one. Switch if the other proves better for your usage.
std::map provides better space for worse time.
std::unordered_map provides better time for worse space.
The answer to your question is heavily dependent on the particular STL implementation you're using. Really, you should look at your STL implementation's documentation – it'll likely have a good amount of information on performance.
In general, though, according to cppreference.com, maps are usually implemented as red-black trees and support operations with time complexity O(log n), while unordered_maps usually support constant-time operations. cppreference.com offers little insight into memory usage; however, another StackOverflow answer suggests maps will generally use less memory than unordered_maps.
For the STL implementation Microsoft packages with Visual Studio 2012, it looks like map supports these operations in amortized O(log n) time, and unordered_map supports them in amortized constant time. However, the documentation says nothing explicit about memory footprint.
Map:
Insertion:
For the first version ( insert(x) ), logarithmic.
For the second version ( insert(position,x) ), logarithmic in general, but amortized constant if x is inserted right after the element pointed to by position.
For the third version ( insert(first,last) ), N·log(size+N) in general (where N is the distance between first and last, and size is the size of the container before the insertion), but linear if the elements between first and last are already sorted according to the same ordering criterion used by the container.
Deletion:
For the first version ( erase(position) ), amortized constant.
For the second version ( erase(x) ), logarithmic in container size.
For the last version ( erase(first,last) ), logarithmic in container size plus linear in the distance between first and last.
Lookup:
Logarithmic in size.
Unordered map:
Insertion:
Single element insertions:
Average case: constant.
Worst case: linear in container size.
Multiple elements insertion:
Average case: linear in the number of elements inserted.
Worst case: N*(size+1): number of elements inserted times the container size plus one.
Deletion:
Average case: Linear in the number of elements removed ( constant when you remove just one element )
Worst case: Linear in the container size.
Lookup:
Average case: constant.
Worst case: linear in container size.
Knowing these, you can decide which container to use according to the type of the implementation.
Source: www.cplusplus.com
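As a footnote on the "amortized constant" hinted insertion listed above: under the C++11 wording the new element must go just before the hint, so building a map from keys that arrive in ascending order while passing end() as the hint is the usual way to hit that case. A sketch:

    #include <iostream>
    #include <map>

    int main() {
        std::map<int, int> m;
        // Keys arrive in ascending order, so each new element belongs just
        // before end(); with that hint each insertion is amortized constant
        // instead of logarithmic (C++11 rules).
        for (int k = 0; k < 1000; ++k)
            m.emplace_hint(m.end(), k, k * k);
        std::cout << m.size() << '\n';   // 1000
    }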

vector implemented with many blocks and no resize copy

I'm wondering if it would be possible to implement an stl-like vector where the storage is done in blocks, and rather than allocate a larger block and copy from the original block, you could keep different blocks in different places, and overload the operator[] and the iterator's operator++ so that the user of the vector wasn't aware that the blocks weren't contiguous.
This could save a copy when moving beyond the existing capacity.
You would be looking for std::deque
See GotW #54 Using Vector and Deque
In Most Cases, Prefer Using deque (Controversial)
Contains benchmarks to demonstrate the behaviours
The latest C++11 standard says:
§ 23.2.3 Sequence containers
[2] The sequence containers offer the programmer different complexity trade-offs and should be used accordingly.
vector or array is the type of sequence container that should be used by default. list or forward_list
should be used when there are frequent insertions and deletions from the middle of the sequence. deque is
the data structure of choice when most insertions and deletions take place at the beginning or at the end of
the sequence.
FAQ > Prelude's Corner > Vector or Deque? (intermediate) Says:
A vector can only add items to the end efficiently; any attempt to insert an item in the middle of the vector or at the beginning can be, and often is, very inefficient. A deque can insert items at both the beginning and the end in constant time, O(1), which is very good. Insertions in the middle are still inefficient, but if such functionality is needed, a list should be used. A deque's method for inserting at the front is push_front(); the insert() method can also be used, but push_front is clearer.
Just like insertions, erasures at the front of a vector are inefficient, but a deque offers constant time erasure from the front as well.
A deque uses memory more effectively. Consider memory fragmentation, a vector requires N consecutive blocks of memory to hold its items where N is the number of items and a block is the size of a single item. This can be a problem if the vector needs 5 or 10 megabytes of memory, but the available memory is fragmented to the point where there are not 5 or 10 megabytes of consecutive memory. A deque does not have this problem, if there isn't enough consecutive memory, the deque will use a series of smaller blocks.
[...]
Yes, it's possible.
Do you know rope? It's what you describe, for strings (big string == rope, got the joke?). Rope is not part of the standard, but for practical purposes it's available on modern compilers. You could use it to represent the complete contents of a text editor.
Take a look here: STL Rope - when and where to use
And always remember:
the first rule of (performance) optimizations is: don't do it
the second rule (for experts only): don't do it now.