How do multimaps internally handle duplicate keys? - c++

With maps, I can understand it being implemented as a binary search tree (a red/black tree, for example) and the time complexity of it.
But with multimaps, how are key collisions handled internally? Is a list maintained for all the nodes with the same key, or is some other handling used? I came across a situation where I could use either a map<int,vector<string>> or a multimap<int,string> and would like to know the tradeoffs.

The C++ spec doesn't give a specific implementation for std::multimap, but instead gives requirements on how fast the operations on std::multimap should be and what guarantees should hold on those operations. For example, insert on a multimap needs to insert the key/value pair into the multimap and has to do so in a way that makes it come after all existing entries with the same key. That has to work in time O(log n), and specifically amortized O(1) if the insertion occurs with a hint and the hint is the spot right before where the element should go. With just this information, the multimap could work by having a red/black tree with many nodes, one per key/value pair, or it could be a red/black tree storing a vector of values for each key. (This rules out an AVL tree, though, because the rotations involved in an AVL tree insertion don't run in amortized O(1) time. However, it also permits things like 2-3-4 trees or deterministic skiplists.)
As we add more requirements, though, certain implementations get ruled out. For example, the erase operation needs to run in amortized constant time if given an iterator to the element to erase. That rules out the use of a single node with a key and a vector of values, but it doesn't rule out a single node with a key and a doubly-linked list of values. The iterator type needs to be able to dereference to a value_type, which needs to match the underlying allocator's value_type. This rules out the possibility that you could have individual nodes in a red/black tree with a single key and a linked list of values, since you couldn't obtain a reference to a value_type that way.
Overall, the restrictions are such that one permissible implementation is a red/black tree with one node per key/value pair, though others may be possible as well. Other ideas - like using an AVL tree, or coalescing values for a given key into a vector or list - aren't possible.
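To make the tradeoff from the question concrete, here is a small sketch of the two options as seen from the outside. It says nothing about the internal node layout; it only relies on documented behaviour: since C++11, insert places a new element after existing elements with an equivalent key, and equal_range returns the run of entries for one key. The helper names values_for and values_for_grouped are made up for this illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Collect every value stored under `key` in a multimap.
// equal_range returns the half-open iterator range of matching entries.
std::vector<std::string> values_for(const std::multimap<int, std::string>& m, int key) {
    std::vector<std::string> out;
    auto range = m.equal_range(key);
    for (auto it = range.first; it != range.second; ++it) {
        out.push_back(it->second);
    }
    return out;
}

// The same lookup against the map-of-vectors alternative.
std::vector<std::string> values_for_grouped(
        const std::map<int, std::vector<std::string>>& m, int key) {
    auto it = m.find(key);
    if (it == m.end()) return {};
    return it->second;
}
```

With the multimap, each key/value pair is its own element, so erasing a single value through an iterator is straightforward. With the map of vectors, all values for a key sit in one contiguous block, which is convenient when you always process them together.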
Hope this helps!

Related

How is the ordering of std::map achieved?

We can see from several sources that std::map is implemented using a red-black tree. It is my understanding that these types of data structures do not hold their elements in any particular order and just maintain the BST property and the height balancing requirements.
So, how is it that map::begin is constant time, and we are able to iterate over an ordered sequence?
Starting from the premise that std::map is maintaining a BST internally (which is not strictly required by the standard, but most libraries probably do that, like a red-black tree).
In a BST, to find the smallest element, you would just follow the left branches until you reach a leaf, which is O(log(N)). However, if you want to deliver the "begin()" iterator in constant time, it is quite simple to just keep track, internally, of the smallest element. Every time an insertion causes the smallest element to change, you update it, that's all. It's memory overhead, of course, but that's a trade-off.
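A minimal sketch of that bookkeeping, using a plain unbalanced BST of int keys for brevity. A real red-black tree would also rebalance and would have to update the cached pointer on erase, and actual library implementations differ in detail. TinyBst, insert, and begin_key are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Unbalanced BST that remembers its smallest node, so that "begin"
// is O(1) instead of an O(height) walk down the left spine.
class TinyBst {
    struct Node {
        int key;
        std::unique_ptr<Node> left, right;
        explicit Node(int k) : key(k) {}
    };
    std::unique_ptr<Node> root_;
    Node* leftmost_ = nullptr;  // cached smallest node

public:
    void insert(int key) {
        std::unique_ptr<Node>* slot = &root_;
        bool went_right = false;
        while (*slot) {
            if (key < (*slot)->key) {
                slot = &(*slot)->left;
            } else {
                slot = &(*slot)->right;
                went_right = true;
            }
        }
        slot->reset(new Node(key));
        // Only a node reached purely through left links (or the very
        // first node) can be the new minimum.
        if (!went_right) leftmost_ = slot->get();
    }

    bool empty() const { return leftmost_ == nullptr; }

    // O(1): no tree walk needed. Precondition: !empty().
    int begin_key() const {
        assert(!empty());
        return leftmost_->key;
    }
};
```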
There are possibly other ways to single out the smallest element (like keeping the root node unbalanced on purpose). Either way, it's not hard to do.
To iterate through the "ordered" sequence, you simply have to do an in-order traversal of the tree. Starting from the left-most node, you go (up), (right), (up, up), (right), and so on. It's a simple set of rules and it's easy to implement; just see a quick implementation of a simple BST inorder iterator that I wrote a while back. As you do the in-order traversal, you will visit every node from the smallest to the biggest, in the correct order. In other words, it just gives you the illusion that the "array" is sorted, but in reality, it's the traversal that makes it look like it's sorted.
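The (up)/(right) rules can be sketched as an in-order successor function on nodes that carry a parent pointer. This is an illustration under that assumption, not the code of any particular standard library; Node, link_left, link_right, leftmost, successor, and in_order are names chosen for this example.

```cpp
#include <cassert>
#include <vector>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
    explicit Node(int k) : key(k) {}
};

// Helpers for wiring up a tree by hand.
void link_left(Node& parent, Node& child)  { parent.left = &child;  child.parent = &parent; }
void link_right(Node& parent, Node& child) { parent.right = &child; child.parent = &parent; }

// Smallest node in the subtree rooted at n: keep going left.
Node* leftmost(Node* n) {
    while (n && n->left) n = n->left;
    return n;
}

// Next node in sorted order, or nullptr after the last one.
Node* successor(Node* n) {
    if (n->right) return leftmost(n->right);  // go right, then all the way left
    Node* p = n->parent;
    while (p && n == p->right) {              // go up while we came from a right child
        n = p;
        p = p->parent;
    }
    return p;
}

// Visit every key from smallest to largest.
std::vector<int> in_order(Node* root) {
    std::vector<int> out;
    for (Node* n = leftmost(root); n; n = successor(n)) out.push_back(n->key);
    return out;
}
```

Each call to successor costs O(height) in the worst case, but a full traversal crosses every edge at most twice, so iterating over all N elements is O(N) in total.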
The balancing properties of a red-black tree allow you to insert a node, anywhere in the tree, at O(log N) cost. For typical std::map implementations, the container will keep the tree sorted, and whenever you insert a new node, insert it into the correct location to keep the tree sorted, and then rebalance the tree to maintain the red-black property.
So no, red-black trees are not inherently sorted.
RB trees are binary search trees. Binary search trees don't necessarily store their elements in any particular order, but you can always get an inorder traversal. I'm not sure how map::begin guarantees constant time, I'd assume this involves always remembering the path to the smallest element (normally it'd be O(log(n))).

Queue-like data structure with random access element removal

Is there a data structure like a queue which also supports removal of elements at arbitrary points? Enqueueing and dequeueing occur most frequently, but mid-queue element removal must be similar in speed terms since there may be periods where that is the most common operation. Consistency of performance is more important than absolute speed. Time is more important than memory. Queue length is small, under 1,000 elements at absolute peak load. In case it's not obvious I'll state it explicitly: random insertion is not required.
Have tagged C++ since that is my implementation language, but I'm not using (and don't want to use) any STL or Boost. Pure C or C++ only (I will convert C solutions to a C++ class.)
Edit: I think what I want is a kind of dictionary that also has a queue interface (or a queue that also has a dictionary interface) so that I can do things like this:
Container.enqueue(myObjPtr1);
MyObj *myObjPtr2 = Container.dequeue();
Container.remove(myObjPtr3);
I think that a doubly linked list is exactly what you want (assuming you do not want a priority queue):
Easy and fast adding elements to both ends
Easy and fast removal of elements from anywhere
You can use the std::list container, but (in your case) it is difficult to remove an element from the middle of the list if you only have a pointer (or reference) to the element (wrapped in STL's list element) but do not have an iterator. If using iterators (e.g. storing them) is not an option, then implementing a doubly linked list (even with an element counter) should be pretty easy. If you implement your own list, you can operate directly on pointers to elements (each of them contains pointers to both of its neighbours). If you do not want to use Boost or STL, this is probably the best option (and the simplest), and you have control of everything (you can even write your own block allocator for list elements to speed things up).
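A hand-rolled list of this kind might look like the sketch below. It assumes the links can be embedded in the element itself (an intrusive list); MyObj comes from the question, while the id, prev, and next fields and the ObjQueue class are assumptions of this example. Apart from the assert header it uses no library containers.

```cpp
#include <cassert>
#include <cstddef>

// Element type with the list links embedded in it.
struct MyObj {
    int id;
    MyObj* prev = nullptr;
    MyObj* next = nullptr;
    explicit MyObj(int i) : id(i) {}
};

class ObjQueue {
    MyObj* head_ = nullptr;  // oldest element, next to be dequeued
    MyObj* tail_ = nullptr;  // newest element
    std::size_t size_ = 0;

public:
    // O(1): append at the tail. The object must not already be queued.
    void enqueue(MyObj* obj) {
        assert(obj != nullptr);
        obj->prev = tail_;
        obj->next = nullptr;
        if (tail_) tail_->next = obj; else head_ = obj;
        tail_ = obj;
        ++size_;
    }

    // O(1): unlink from anywhere, given only the element pointer.
    // Precondition: obj is currently in this queue.
    void remove(MyObj* obj) {
        if (obj->prev) obj->prev->next = obj->next; else head_ = obj->next;
        if (obj->next) obj->next->prev = obj->prev; else tail_ = obj->prev;
        obj->prev = obj->next = nullptr;
        --size_;
    }

    // O(1): pop the oldest element, or nullptr if the queue is empty.
    MyObj* dequeue() {
        MyObj* obj = head_;
        if (obj) remove(obj);
        return obj;
    }

    std::size_t size() const { return size_; }
};
```

All three operations touch a fixed number of pointers, which matches the requirement for consistent performance. The queue does not own the elements, so their lifetime stays under the caller's control.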
One option is to use an order statistic tree, an augmented tree structure that supports O(log n) random access to each element, along with O(log n) insertion and deletion at arbitrary points. Internally, the order statistic tree is implemented as a balanced binary search tree with extra information associated with it. As a result, lookups are slower than in a standard dynamic array, but insertions are much faster.
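The "extra information" is typically the size of each subtree. The sketch below shows only that augmentation and the resulting select operation; to stay short it does not balance the tree, so its costs are O(height) rather than the O(log n) a balanced version would give. OrderStatTree and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

class OrderStatTree {
    struct Node {
        int key;
        std::size_t size = 1;  // number of nodes in this subtree
        std::unique_ptr<Node> left, right;
        explicit Node(int k) : key(k) {}
    };
    std::unique_ptr<Node> root_;

    static std::size_t size_of(const std::unique_ptr<Node>& n) { return n ? n->size : 0; }

public:
    // Insert a key, bumping the subtree size of every node on the path.
    void insert(int key) {
        std::unique_ptr<Node>* slot = &root_;
        while (*slot) {
            ++(*slot)->size;
            slot = key < (*slot)->key ? &(*slot)->left : &(*slot)->right;
        }
        slot->reset(new Node(key));
    }

    std::size_t size() const { return size_of(root_); }

    // Return the k-th smallest key (0-based). Precondition: k < size().
    int select(std::size_t k) const {
        assert(k < size());
        const Node* n = root_.get();
        for (;;) {
            std::size_t left = size_of(n->left);
            if (k < left) {
                n = n->left.get();
            } else if (k == left) {
                return n->key;
            } else {
                k -= left + 1;  // skip the left subtree and this node
                n = n->right.get();
            }
        }
    }
};
```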
Hope this helps!
You can use a combination of a linked list and a hash table. In java it is called a LinkedHashSet.
The idea is simple, have a linked list of elements, and also maintain a hash map of (key,nodes), where node is a pointer to the relevant node in the linked list, and key is the key representing this node.
Note that the basic implementation is a set, and some extra work will be needed to make this data structure allow dupes.
This data structure gives you O(1) head/tail access as well as O(1) access to any element in the list (all average-case, amortized).
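In C++ the same idea can be sketched with std::list plus std::unordered_map. This uses the standard library, which the asker wanted to avoid, so treat it as an illustration of the structure rather than a drop-in answer; LinkedHashQueue and its int keys are assumptions of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <unordered_map>

// FIFO queue of unique int keys with O(1) average removal by key.
class LinkedHashQueue {
    std::list<int> order_;                                     // front = oldest
    std::unordered_map<int, std::list<int>::iterator> where_;  // key -> its list node

public:
    // Returns false if the key is already queued (set semantics, no duplicates).
    bool enqueue(int key) {
        if (where_.count(key)) return false;
        order_.push_back(key);
        where_[key] = std::prev(order_.end());
        return true;
    }

    // Remove a key from anywhere in the queue. Returns false if absent.
    bool remove(int key) {
        auto it = where_.find(key);
        if (it == where_.end()) return false;
        order_.erase(it->second);
        where_.erase(it);
        return true;
    }

    // Pop the oldest key into `out`. Returns false if the queue is empty.
    bool dequeue(int& out) {
        if (order_.empty()) return false;
        out = order_.front();
        where_.erase(out);
        order_.pop_front();
        return true;
    }

    std::size_t size() const { return order_.size(); }
};
```

Storing list iterators is safe here because std::list iterators stay valid until the element they refer to is erased.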

Advantages and Disadvantages of hashmap and tree map?

C++ by default provides a tree based map. With Boost you can get a hashmap.
What are the advantages and disadvantages of
C++'s Tree Based Map and
Boost's Hashmap
?
C++0x/TR1 also provides the unordered_map which is usually implemented as a hash map.
The differences are twofold:
The key type. In the ordered map, the key type must obey a strict weak ordering, and entries are maintained in that order. In the unordered map, the key type must be equality-comparable and you must provide a hash function h such that h(Key) returns size_t [thanks to Steve Jessop for the clarification].
Access complexity: Insert/delete/find in an ordered map is O(log n) in the map size n. In the unordered map, it is "usually" O(1), but worst-case behaviour is O(n) (e.g. if all keys map to the same hash value).
So the ordered map provides a total complexity guarantee, while the unordered map provides a (better) complexity in good cases, depending on the quality of your hash function.
The internal implementation complexity of the unordered map is greater than of the ordered map, but you can imagine that you get the better access complexity because you get fewer features, i.e. you don't get sorting for free. It's a classical trade-off.
Another point: Practically, if the weak ordering operator is expensive to compute, like for strings, the unordered map may actually be quite a bit faster, because comparisons on the hash type are very fast. On the other hand, if your key type is one with trivial hash function (like any built-in integral type) and if you don't need the ordering, consider using an unordered container.
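The different key requirements can be sketched with one user-defined key type plugged into both containers. Point, PointLess, PointEqual, PointHash, and labels_in_key_order are names invented for this example, and the hash combination is a deliberately simple choice, not a recommendation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical key type used only for this illustration.
struct Point { int x; int y; };

// Ordered map requirement: a strict weak ordering on the key.
struct PointLess {
    bool operator()(const Point& a, const Point& b) const {
        return a.x != b.x ? a.x < b.x : a.y < b.y;
    }
};

// Unordered map requirements: equality plus a hash returning std::size_t.
struct PointEqual {
    bool operator()(const Point& a, const Point& b) const {
        return a.x == b.x && a.y == b.y;
    }
};
struct PointHash {
    std::size_t operator()(const Point& p) const {
        // Simple combination of the two field hashes; adequate for a sketch.
        return std::hash<int>()(p.x) * 31u + std::hash<int>()(p.y);
    }
};

using OrderedPoints = std::map<Point, std::string, PointLess>;
using HashedPoints = std::unordered_map<Point, std::string, PointHash, PointEqual>;

// Iterating the ordered map yields entries in key order; the unordered
// map makes no such promise.
std::vector<std::string> labels_in_key_order(const OrderedPoints& m) {
    std::vector<std::string> out;
    for (const auto& kv : m) out.push_back(kv.second);
    return out;
}
```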
Hash tables provide very fast search access and insertion/deletions of objects ... the complexity for such operations is on average O(1), meaning constant time. The main limitation for these two operations is the speed of the hashing algorithm (for some types of objects that are not POD's, these can be a bit complex and take up more time for good ones that avoid "collisions" where two different objects hash to the same value). The main penalty for a hash table is that it requires a lot of extra space.
Binary trees, on the other hand, have relatively quick insertion and search times, and the complexity for deleting an object is the same as for insertions. Because of the way a binary tree works, where each node has up to two child nodes, search and access (as well as insertions and deletions) take O(log N) time, provided the tree is kept balanced. So binary trees are "slower" than hash tables, but are not as complex to implement (although balanced binary search trees are more complex than unbalanced trees).
Another side benefit of a binary search tree is that you can, by iterating through the container from the "first" element to the "last" element, get a sorted list of objects, whereas with the hash map that list would not be sorted. So the extra time for insertions also takes into account the fact that the binary search tree is a sorted insertion. For instance, the average-case complexity of a quicksort on a group of N items is the same as the complexity of building a balanced binary search tree (e.g., a red/black tree) for the same group of N items. Both operations are O(N log N).
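The sorting comparison can be made concrete with a "tree sort": insert everything into a tree-based container, then read it back in order. A minimal sketch, assuming the usual tree-based std::multiset; tree_sort is a name made up here.

```cpp
#include <cassert>
#include <set>
#include <vector>

// N insertions into a balanced tree at O(log N) each, followed by an
// in-order walk. std::multiset keeps duplicates, unlike std::set.
std::vector<int> tree_sort(const std::vector<int>& input) {
    std::multiset<int> tree(input.begin(), input.end());
    return std::vector<int>(tree.begin(), tree.end());
}
```

In practice std::sort on a vector is usually faster because of better memory locality, even though both approaches are O(N log N).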

Simple and efficient container in C++ with characteristics of map and list containers

I'm looking for a C++ container that will enjoy both map container and list container benefits.
map container advantages I would like to maintain:
O(log(n)) access
operator[] ease of use
sparse nature
list container advantages I would like to maintain:
having an order between the items
being able to traverse the list easily UPDATE: by a sorting order based on the key or value
A simple example application would be to hold a list of certain valid dates (business dates, holidays, some other set of important dates...), once given a specific date, you could find it immediately "map style" and then find the next valid date "list style".
std::map is already a sorted container where you can iterate over the contained items in order. It only provides O(log(n)) access, though.
std::tr1::unordered_map (or std::unordered_map in C++0x) has O(1) access but is unsorted.
Do you really need O(1) access? You would need large datasets and a great many lookups before O(log(n)) stops being fast enough.
If O(log(n)) is enough, std::map provides everything you are asking for.
If you don't consider the sparse nature, you can take a look at the Boost Multi-Index library. For the sparse nature, you can take a look at the Boost Flyweight library, but I guess you'll have to join both approaches by yourself. Note that your requirements are often contradictory and hard to achieve. For instance, O(1) and order between the items is difficult to maintain efficiently.
Maps are generally implemented as trees and thus have logarithmic look up time, not O(1), but it sounds like you want a sorted associative container. Hash maps have O(1) best case, O(N) worst case, so perhaps that is what you mean, but they are not sorted, and I don't think they are part of the standard library yet.
In the C++ standard library, map, set, multimap, and multiset are sorted associative containers, but you have to give up the O(1) look up requirement.
According to Stroustrup, the [] operator for maps is O(log(n)). That is much better than the O(n) you'd get if you were to try such a thing with a list, but it is definitely not O(1). The only container that gives you that for the [] operator is vector.
That aside, you can already do all your listy stuff with maps. Iterators work fine on them. So if I were you, I'd stick with map.
having an order between the items
being able to traverse the list easily
Maps already do both. They are sorted, so you start at begin() and traverse until you hit end(). You can, of course, start at any map iterator; you may find map's find, lower_bound, and related methods helpful.
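For the dates example from the question, a sketch along these lines would do, using std::set since only keys are needed (std::map works the same way). Encoding dates as YYYYMMDD integers and returning 0 for "no later date" are assumptions made to keep the example short; is_valid_date and next_valid_date are hypothetical names.

```cpp
#include <cassert>
#include <set>

// Dates encoded as YYYYMMDD integers purely for illustration.
using Date = int;

// "Map style": is this exact date in the set? O(log n).
bool is_valid_date(const std::set<Date>& dates, Date d) {
    return dates.find(d) != dates.end();
}

// "List style": first valid date strictly after d, or 0 if there is none.
Date next_valid_date(const std::set<Date>& dates, Date d) {
    auto it = dates.upper_bound(d);
    return it == dates.end() ? 0 : *it;
}
```

If you already hold an iterator from find, incrementing it gives the next date directly, without a second lookup.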
You can store data in a list and have a map to iterators of your list enabling you to find the actual list element itself. This kind of thing is something I often use for LRU containers, where I want a list because I need to move the accessed element to the end to make it the most recently accessed. You can use the splice function to do this, and since the 2003 standard it does not invalidate the iterator as long as you keep it in the same list.
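A minimal sketch of that list-plus-map arrangement, showing only the recency tracking and not a full LRU cache with eviction. RecencyList, touch, least_recent, and most_recent are names invented for this example.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <map>
#include <string>

// Tracks recency of string keys: back of the list = most recently used.
class RecencyList {
    std::list<std::string> order_;
    std::map<std::string, std::list<std::string>::iterator> where_;

public:
    // Mark `key` as just used, inserting it if it is new.
    void touch(const std::string& key) {
        auto found = where_.find(key);
        if (found == where_.end()) {
            order_.push_back(key);
            where_[key] = std::prev(order_.end());
        } else {
            // Relink the existing node at the end. Splicing within the same
            // list keeps the stored iterator valid, so the map needs no update.
            order_.splice(order_.end(), order_, found->second);
        }
    }

    // Precondition for both accessors: at least one key has been touched.
    const std::string& least_recent() const { return order_.front(); }
    const std::string& most_recent() const { return order_.back(); }
};
```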
How about this one: all dates are stored in std::list<Date>, but you look it up with helper structure stdext::hash_map<Date, std::list<Date>::iterator>. Once you have iterator for the list access to the next element is simple. In your STL implementation it could be std::tr1::unordered_map instead of stdext::hash_map, and there is boost::unordered_map as well.
You will never find a container that satisfies both O(log n) access and an ordered nature. The reason is that if a container is ordered then inherently it must support an arbitrary order. That's what an ordered nature means: you get to decide exactly where any element is positioned. So to find any element you have to guess where it is. It can be anywhere, because you can place it anywhere!
Note that an ordered sequence is not the same as a sorted sequence. A sorted nature means there is one particular ordering relation between any two elements. An ordered nature means there may be more than one ordering relation among the elements.

What's a good and stable C++ tree implementation?

I'm wondering if anyone can recommend a good C++ tree implementation, hopefully one that is STL compatible if at all possible.
For the record, I've written tree algorithms many times before, and I know it can be fun, but I want to be pragmatic and lazy if at all possible. So an actual link to a working solution is the goal here.
Note: I'm looking for a generic tree, not a balanced tree or a map/set, the structure itself and the connectivity of the tree is important in this case, not only the data within.
So each branch needs to be able to hold arbitrary amounts of data, and each branch should be separately iterateable.
I don't know about your requirements, but wouldn't you be better off with a graph (implementations for example in Boost Graph) if you're interested mostly in the structure and not so much in tree-specific benefits like speed through balancing? You can 'emulate' a tree through a graph, and maybe it'll be (conceptually) closer to what you're looking for.
Take a look at this.
The tree.hh library for C++ provides an STL-like container class for n-ary trees, templated over the data stored at the nodes. Various types of iterators are provided (post-order, pre-order, and others). Where possible the access methods are compatible with the STL or alternative algorithms are available.
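To show the kind of structure being discussed, here is a hand-rolled n-ary tree with a pre-order walk. It is explicitly not the tree.hh API, which is not reproduced here; TreeNode, add_child, and pre_order are names made up for this sketch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hand-rolled n-ary tree node; NOT the tree.hh interface.
template <typename T>
struct TreeNode {
    T value;
    std::vector<std::unique_ptr<TreeNode>> children;

    explicit TreeNode(T v) : value(std::move(v)) {}

    // Append a child and return a reference to it for further building.
    TreeNode& add_child(T v) {
        children.push_back(std::unique_ptr<TreeNode>(new TreeNode(std::move(v))));
        return *children.back();
    }
};

// Pre-order walk: the node first, then each child subtree left to right.
template <typename T>
void pre_order(const TreeNode<T>& node, std::vector<T>& out) {
    out.push_back(node.value);
    for (const auto& child : node.children) pre_order(*child, out);
}
```

Because every node is itself a tree, any branch can be walked separately by passing that node to pre_order, which covers the "separately iterable branches" requirement from the question.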
HTH
I am going to suggest using std::map instead of a tree.
The complexity characteristics of a tree are:
Insert: O(ln(n))
Removal: O(ln(n))
Find: O(ln(n))
These are the same characteristics the std::map guarantees.
As a result, most implementations of std::map use a tree (a red-black tree) under the covers (though technically this is not required).
If you don't have (key, value) pairs, but simply keys, use std::set. That uses the same Red-Black tree as std::map.
Ok folks, I found another tree library; stlplus.ntree. But haven't tried it out yet.
Let's suppose the question is about balanced binary trees (in some form, mostly red-black trees), even if that is not the case.
Balanced binary trees, like vector, can maintain an ordering of elements without needing a key (for example, by inserting elements anywhere, as in a vector), but:
With optimal O(log(n)) or better complexity for every single-element modification (add/remove at the beginning, at the end, and before or after any iterator)
With iterators that stay valid through any modification except direct destruction of the element the iterator points to.
Optionally, one may support access by index as in vector (at a cost of one size_t per element), with O(log(n)) complexity. If used, the iterators will be random access.
Optionally, order can be enforced by a comparison function, but the persistence of iterators allows non-repeatable comparison schemes (e.g. arbitrary lane changes by cars during a traffic jam).
In practice, a balanced binary tree offers the interface of vector, list, doubly linked list, map, multimap, deque, queue, priority_queue... while attaining the theoretically optimal O(log(n)) complexity for all single-element operations.
<sarcastic> this is probably why c++ stl does not propose it </sarcastic>
Individuals are unlikely to implement a general balanced tree themselves, because of the difficulty of getting the balancing right, especially during element removal.
There is no widely available implementation of a balanced binary tree because the known state-of-the-art red-black tree implementation (at this time the best type of balanced tree, thanks to the fixed number of costly tree reorganizations during removal), slavishly copied by every implementer from the initial code of the structure's inventor, does not allow iterator persistence. That is probably the reason for the absence of a fully functional tree template.