I'm looking for a C++ implementation of a data structure (or a combination of data structures) that meets the following criteria:
items are accessed in the same way as in std::vector
provides random access iterators (along with iterator comparison <, >)
average item access (lookup) time is at worst of O(log(n)) complexity
items are iterated over in the same order as they were added to the container
given an iterator, I can find out the ordinal position of the item pointed to in the container, at worst of O(log(n)) complexity
provides item insertion and removal at a specific position of at worst O(log(n)) complexity
removal/insertion of items does not invalidate previously obtained iterators
Thank you in advance for any suggestions
Dalibor
(Edit) Answers:
The answer I selected describes a data structure that meets all these requirements. However, boost::multi_index, as suggested by Maxim Yegorushkin, provides features very close to those above.
(Edit) Some of the requirements were not correctly specified. They have been modified according to the correction (original).
(Edit) I've found an implementation of the data structure described in the accepted answer. So far, it works as expected. It's called counter tree
(Edit) Consider using the AVL-Array suggested by sp2danny
Based on your requirements, boost::multi_index with two indices does the trick.
The first index is an ordered index. It allows for O(log(n)) insert/lookup/remove. The second index is a random access index. It allows for random access, and the elements are stored in the order of insertion. For both indices, iterators don't get invalidated when other elements are removed. Converting from one iterator to the other is an O(1) operation.
Let's go through these...
average item lookup time is at worst of O(log(n)) complexity
removal/insertion of items does not invalidate previously obtained iterators
provides item insertion and removal of at worst O(log(n)) complexity
That pretty much screams "tree".
provides random access iterators (along with iterator comparison <, >)
given an iterator, I can find out the ordinal position of the item pointed to in the container, at worst of O(log(n)) complexity
items are iterated over in the same order as they were added to the container
I'm assuming that the index you're providing your random-access iterator is by order of insertion, so [0] would be the oldest element in the container, [1] would be the next oldest, etc. This means that, on deletion, for the iterators to be valid, the iterator internally cannot store the index, since it could change without notice. So just using a map with the key being the insertion order isn't going to work.
Given that, each node of your tree needs to keep track of how many elements are in each subtree, in addition to its usual members. This will allow random access in O(log(N)) time. I don't know of a ready-to-go set of code, but adapting a red-black tree implementation (such as the non-standard rb_tree class and its node type shipped with some STL implementations) would be my starting point.
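As a rough sketch of the "count the elements in each subtree" idea, here is a small size-augmented tree. To keep it short it uses a treap (randomized priorities) instead of the red-black tree suggested above, stores plain int values, and hands out raw node pointers in place of real iterators. All names (Node, CountedSeq, insert_at, index_of) are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// All names here (Node, CountedSeq, insert_at, index_of) are hypothetical.
struct Node {
    int value;
    unsigned prio;               // random heap priority (treap balancing)
    std::size_t size = 1;        // number of nodes in this subtree
    Node *left = nullptr, *right = nullptr, *parent = nullptr;
    Node(int v, unsigned p) : value(v), prio(p) {}
};

class CountedSeq {
    Node* root_ = nullptr;
    std::mt19937 rng_{12345};

    static std::size_t sz(const Node* n) { return n ? n->size : 0; }
    static void pull(Node* n) {  // recompute size and re-link the children
        n->size = 1 + sz(n->left) + sz(n->right);
        if (n->left) n->left->parent = n;
        if (n->right) n->right->parent = n;
    }
    // Split t into its first k elements (a) and the remainder (b).
    static void split(Node* t, std::size_t k, Node*& a, Node*& b) {
        if (!t) { a = b = nullptr; return; }
        t->parent = nullptr;
        if (sz(t->left) >= k) { split(t->left, k, a, t->left); b = t; }
        else { split(t->right, k - sz(t->left) - 1, t->right, b); a = t; }
        pull(t);
    }
    // Concatenate two sequences; every element of a precedes every element of b.
    static Node* merge(Node* a, Node* b) {
        if (!a) return b;
        if (!b) return a;
        if (a->prio > b->prio) { a->right = merge(a->right, b); pull(a); return a; }
        b->left = merge(a, b->left); pull(b); return b;
    }
    static void destroy(Node* n) {
        if (!n) return;
        destroy(n->left); destroy(n->right); delete n;
    }
public:
    CountedSeq() = default;
    CountedSeq(const CountedSeq&) = delete;
    CountedSeq& operator=(const CountedSeq&) = delete;
    ~CountedSeq() { destroy(root_); }

    std::size_t size() const { return sz(root_); }

    // Insert before position pos; the returned pointer stays valid until erased.
    Node* insert_at(std::size_t pos, int value) {
        Node* n = new Node(value, static_cast<unsigned>(rng_()));
        Node *a, *b;
        split(root_, pos, a, b);
        root_ = merge(merge(a, n), b);
        root_->parent = nullptr;
        return n;
    }
    // Indexed access: descend using the subtree sizes.
    Node* at(std::size_t pos) const {
        Node* t = root_;
        while (t) {
            std::size_t l = sz(t->left);
            if (pos < l) t = t->left;
            else if (pos == l) return t;
            else { pos -= l + 1; t = t->right; }
        }
        return nullptr;
    }
    // Ordinal position of a node: climb to the root, counting what lies to the left.
    static std::size_t index_of(const Node* n) {
        std::size_t idx = sz(n->left);
        for (; n->parent; n = n->parent)
            if (n->parent->right == n) idx += sz(n->parent->left) + 1;
        return idx;
    }
    void erase(Node* n) {
        Node *a, *m, *c;
        split(root_, index_of(n), a, m);
        split(m, 1, m, c);
        assert(m == n);
        delete m;
        root_ = merge(a, c);
        if (root_) root_->parent = nullptr;
    }
};
```

Because a node never moves in memory, a pointer obtained from insert_at keeps referring to the same element after other insertions and removals, while index_of recomputes its current position on demand. With a treap the logarithmic bounds are expected rather than worst case; a red-black or AVL variant of the same idea would make them worst case.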
See here: STL Containers (scroll down the page to see information on algorithmic complexity) and I think std::deque fits your requirements.
AVL-Array should fit the bill.
Here's my "lv" container that fits the requirements, with O(log n) insert/delete/access time.
https://github.com/xhawk18/lv
The container is a header-only C++ library,
and has the same iterators and functions as other C++ containers, such as list and vector.
The "lv" container is based on an rb-tree, each node of which stores the number of nodes in its sub-tree. By checking the sizes of the left/right children of a node, we can quickly access any node by index.
Related
I would like to implement a map whose number of elements never exceeds a certain limit L. When the L+1-th element is inserted, the oldest entry should be removed from the map to empty the space.
I found something similar: Data Structure for Queue using Map Implementations in Java with Size limit of 5. There it is suggested to use a linked hash map, i.e., a hash map that also keeps a linked list of all elements. Unfortunately, that is for java, and I need a solution for C++. I could not find anything like this in the standard library nor in the boost libraries.
Same story here: Add and remove from MAP with limited size
A possible solution for C++ is given here, but it does not address my questions below: C++ how to mix a map with a circular buffer?
I would implement it in a very similar way to what is described there: a hash map to store the key-value pairs and a linked list, or a double-ended queue, of keys to keep the index of the entries. To insert a new value I would add it to the hash map and its key at the end of the index; if the size of the hash map at this point exceeds the limit, I would pop the first element of the index and remove the entry with that key from the hash map. Simple, same complexity as adding to the hash map.
Removing an entry requires an iteration over the index to remove the key from there, which has linear complexity for both linked lists and double-ended queues. (Double-ended queues also have the disadvantage that removing an element from the middle has itself linear complexity.) So it looks like the remove operation on such a data structure does not preserve the complexity of the underlying hash map.
The question is: Is this increase in complexity necessary, or are there some clever ways to implement a limited map data structure so that both insertion and removal keep the same complexity?
Ooops, just posted and immediately realized something important. The size of the index is also limited. If that limit is constant, then the complexity of iterating over it can also be considered constant.
Well, the limit gives an upper bound to the cost of the removal operation.
If the limit is very high one might still prefer a solution that does not involve a linear iteration over the index.
I would still use an associative container to have direct access and a sequential one to allow easy removal of the older item. Let us look at the required access methods:
access to an element given its key => ok, the associative container allows direct access
add a new key-value pair
if the map is not full it is easy: push_back on the sequence container, and simple addition to the associative one
if the map is full, the above action must happen, but the oldest element must be removed first => front on the sequence container will give that element, and pop_front and erase will remove it, provided the key is contained in the sequence container
remove an element given by its key => trivial to remove from an associative container, but only list allows for constant time removal of an element provided you have an iterator to it. The good news is that removing or inserting an element in a list does not invalidate iterators pointing to other elements.
As you did not give any requirement for keeping keys sorted, I would use an unordered_map for the associative container and a list for the sequence one. The additional requirements are that the list must contain the key, and that the unordered_map must contain an iterator to its corresponding element in the list. The value can be in either container. As I assume that the major access will be the direct one, I would store the value in the map.
It boils down to:
a list<K> to allow identification of the oldest key
an unordered_map<K, pair<V, list<K>::iterator>>
It doubles the storage for the key and adds an additional iterator. But keys are expected not to be too big, and list::iterator normally contains little more than a pointer: this trades a small amount of memory for speed.
That should be enough to provide constant time
key access
insertion of a new item
key removal of an item
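The list-plus-unordered_map layout described above could look roughly like the following. This is only a sketch: the class name BoundedMap and its member names are made up for the example, and re-inserting an existing key is assumed to overwrite the value without changing the key's age.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <unordered_map>
#include <utility>

// Hypothetical class: a map that holds at most `limit` entries and evicts the oldest.
template <class K, class V>
class BoundedMap {
    using KeyList = std::list<K>;
    std::size_t limit_;
    KeyList order_;  // keys in insertion order, oldest at the front
    std::unordered_map<K, std::pair<V, typename KeyList::iterator>> map_;
public:
    explicit BoundedMap(std::size_t limit) : limit_(limit) { assert(limit > 0); }

    void insert(const K& key, V value) {
        auto it = map_.find(key);
        if (it != map_.end()) {            // assumption: overwrite keeps the original age
            it->second.first = std::move(value);
            return;
        }
        if (map_.size() == limit_) {       // full: drop the oldest entry first
            map_.erase(order_.front());
            order_.pop_front();
        }
        order_.push_back(key);
        map_.emplace(key, std::make_pair(std::move(value), std::prev(order_.end())));
    }

    // Removal by key: the stored list iterator avoids any scan of the index.
    bool erase(const K& key) {
        auto it = map_.find(key);
        if (it == map_.end()) return false;
        order_.erase(it->second.second);
        map_.erase(it);
        return true;
    }

    // Returns nullptr when the key is absent.
    const V* find(const K& key) const {
        auto it = map_.find(key);
        return it == map_.end() ? nullptr : &it->second.first;
    }

    std::size_t size() const { return map_.size(); }
};
```

All three operations cost one hash lookup plus constant-time list work, so they are constant on average, with the usual hash-table caveat about worst-case collisions.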
You may want to take a look at Boost.MultiIndex MRU example.
I am looking for a data structure that has the following properties:
Sorted (unless this is not needed for sorted order iteration)
O(1) iteration in the sorted order.
Fast insertion. I would think O(lg(n)) is the way to go.
Fast deletion using an iterator. I am hoping for at least the same speed as insertion. I will never have to delete an item by value, and will always have the iterator available. This requirement means insertion and iteration will never invalidate iterators, unless deletion by value happens at the same speed as deletion by iterator.
Anything else is not relevant and will never be used.
After searching around for a while I was not able to find a data structure that has these properties. A heap allows for fast insertion and removal (although not by iterator per se), but is not easy to iterate in the required way.
I have also looked at a sorted vector. This has fast insertion and correct iteration, but deletion is pretty hard there.
I would say this is a pretty common data structure, even though I was not able to find a matching structure. Furthermore I think the fact that I always have an iterator when deleting might improve the performance.
I hope you can help me in the right direction.
std::set or std::multiset is specified to have logarithmic complexity for insert(), and amortized constant complexity for deletion via an existing iterator.
Iterating over a set/multiset will iterate in sorted order.
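A minimal sketch of that usage, with a hypothetical helper name (demo_sorted_erase) and int elements chosen just for the illustration: keep the iterator that insert() returns, and erase through it later.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper: inserts a few values, erases one through a saved
// iterator, and returns what is left in iteration (sorted) order.
std::vector<int> demo_sorted_erase() {
    std::multiset<int> s;
    auto it5 = s.insert(5);   // multiset::insert returns an iterator to the new element
    s.insert(1);
    auto it3 = s.insert(3);
    s.insert(9);              // later insertions do not invalidate it5 or it3

    s.erase(it3);             // erase by iterator: no search by value needed
    assert(*it5 == 5);        // iterators to other elements are still valid

    return std::vector<int>(s.begin(), s.end());
}
```

Only iterators to the erased element itself become invalid, which matches the requirement that insertion and iteration leave stored iterators usable.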
I'm looking for a C++ container that will enjoy both map container and list container benefits.
map container advantages I would like to maintain:
O(log(n)) access
operator[] ease of use
sparse nature
list container advantages I would like to maintain:
having an order between the items
being able to traverse the list easily UPDATE: by a sorting order based on the key or value
A simple example application would be to hold a list of certain valid dates (business dates, holidays, some other set of important dates...), once given a specific date, you could find it immediately "map style" and then find the next valid date "list style".
std::map is already a sorted container where you can iterate over the contained items in order. It only provides O(log(n)) access, though.
std::tr1::unordered_map (or std::unordered_map in C++0x) has O(1) access but is unsorted.
Do you really need O(1) access? You would need very large datasets and a great many lookups before O(log(n)) stops being fast enough.
If O(log(n)) is enough, std::map provides everything you are asking for.
If you don't consider the sparse nature, you can take a look at the Boost Multi-Index library. For the sparse nature, you can take a look at the Boost Flyweight library, but I guess you'll have to join both approaches by yourself. Note that your requirements are often contradictory and hard to achieve. For instance, O(1) and order between the items is difficult to maintain efficiently.
Maps are generally implemented as trees and thus have logarithmic look up time, not O(1), but it sounds like you want a sorted associative container. Hash maps have O(1) best case, O(N) worst case, so perhaps that is what you mean, but they are not sorted, and I don't think they are part of the standard library yet.
In the C++ standard library, map, set, multimap, and multiset are sorted associative containers, but you have to give up the O(1) look up requirement.
According to Stroustrup, the [] operator for maps is O(log(n)). That is much better than the O(n) you'd get if you were to try such a thing with a list, but it is definitely not O(1). The only container that gives you that for the [] operator is vector.
That aside, you can already do all your listy stuff with maps. Iterators work fine on them. So if I were you, I'd stick with map.
having an order between the items
being able to traverse the list easily
Maps already do both. They are sorted, so you start at begin() and traverse until you hit end(). You can, of course, start at any map iterator; you may find map's find, lower_bound, and related methods helpful.
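For the dates example from the question, a sketch could look like this. The Calendar alias, the encoding of dates as YYYYMMDD integers, the function names, and the convention of returning 0 for "no such date" are all assumptions made for the illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Assumption: a date is encoded as an int of the form YYYYMMDD.
using Calendar = std::map<int, std::string>;

// "Map style" lookup of an exact date, then "list style" step to its successor.
// Returns 0 if the date is not in the calendar or has no successor.
int following_entry(const Calendar& cal, int date) {
    auto it = cal.find(date);
    if (it == cal.end()) return 0;
    ++it;
    return it == cal.end() ? 0 : it->first;
}

// First valid date strictly after `date`, whether or not `date` itself is stored.
// Returns 0 if there is none.
int next_valid_date(const Calendar& cal, int date) {
    auto it = cal.upper_bound(date);
    return it == cal.end() ? 0 : it->first;
}
```

Both functions cost O(log(n)) for the search; stepping the iterator afterwards is amortized constant.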
You can store data in a list and have a map to iterators of your list enabling you to find the actual list element itself. This kind of thing is something I often use for LRU containers, where I want a list because I need to move the accessed element to the end to make it the most recently accessed. You can use the splice function to do this, and since the 2003 standard it does not invalidate the iterator as long as you keep it in the same list.
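Here is a small sketch of that LRU bookkeeping, assuming int keys and a hypothetical class name (LruKeys). The splice call relinks the existing list node, so the iterator stored in the map stays valid.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Hypothetical class: tracks recency of int keys, least recently used at the front.
class LruKeys {
    std::list<int> order_;
    std::unordered_map<int, std::list<int>::iterator> pos_;
public:
    // Mark key as just used, inserting it if it is new.
    void touch(int key) {
        auto it = pos_.find(key);
        if (it == pos_.end()) {
            pos_[key] = order_.insert(order_.end(), key);
        } else {
            // Move the node to the back of the same list; no copy, iterator unchanged.
            order_.splice(order_.end(), order_, it->second);
        }
    }
    // Precondition for the next three members: the container is not empty.
    int least_recent() const { return order_.front(); }
    int most_recent() const { return order_.back(); }
    void evict() {
        pos_.erase(order_.front());
        order_.pop_front();
    }
    std::size_t size() const { return order_.size(); }
};
```

Every operation is a hash lookup plus a constant-time list operation, so nothing ever has to scan the list.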
How about this one: all dates are stored in std::list<Date>, but you look it up with helper structure stdext::hash_map<Date, std::list<Date>::iterator>. Once you have iterator for the list access to the next element is simple. In your STL implementation it could be std::tr1::unordered_map instead of stdext::hash_map, and there is boost::unordered_map as well.
You will never find a container that satisfies both O(log n) access and an ordered nature. The reason is that if a container is ordered then inherently it must support an arbitrary order. That's what an ordered nature means: you get to decide exactly where any element is positioned. So to find any element you have to guess where it is. It can be anywhere, because you can place it anywhere!
Note that an ordered sequence is not the same as a sorted sequence. A sorted nature means there is one particular ordering relation between any two elements. An ordered nature means there may be more than one ordering relation among the elements.
I couldn't find an answer but I am pretty sure I am not the first one looking for this.
Has anyone seen or used an STL-like container with a bidirectional iterator that has O(1) complexity for Insert/Erase/Lookup?
Thank you.
There is no abstract data type with O(1) complexity for Insert, Erase AND Lookup which also provides a bi-directional access iterator.
Edit:
This is true for an arbitrarily large domain. Given a sufficiently small domain you can implement a set with O(1) complexity for Insert, Erase and Lookup and a bidirectional access iterator using an array and a doubly linked list:
std::list<int>::iterator array[MAX_VALUE];
std::list<int> list;
Initialise:
for (int i = 0; i < MAX_VALUE; i++)
    array[i] = list.end();
Insert:
if (array[value] == list.end())   // only insert values that are not present yet
    array[value] = list.insert(list.end(), value);
Erase:
if (array[value] != list.end()) {
    list.erase(array[value]);
    array[value] = list.end();
}
Lookup:
array[value] != list.end()
tr1's unordered_set (also available in boost) is probably what you are looking for. You don't specify whether you want a sequence container, and you don't specify what you are using to get O(1) lookup (i.e. vectors have O(1) lookup by index, while the unordered_set mentioned above has O(1) average case lookup based on the element itself).
In practice, it may be sufficient to use array (vector) and defer costs of inserts and deletes.
Delete element by marking it as deleted, insert element into bin at desired position and remember offset for larger indices.
Inserts and deletes will be O(1) plus an O(N) cleanup at a convenient time; lookup will be O(1) on average, and O(number of changes since last cleanup) in the worst case.
Associative arrays (hashtable) have O(1) lookup complexity, while doubly linked lists have O(1) bidi iteration.
One trick I've done when messing about storage optimization is to implement a linked list with an add of O(1)[1], then have a caching operation which provides a structure with a faster O(n) lookup[2]. The actual cache takes some O(n) time to build, and I didn't focus on erase. So I 'cheated' a bit and pushed the work into another operation. But if you don't have to do a ton of adds/deletes, it's not a bad way to do it.
[1] Store end pointer and only add onto the end. No traversal required.
[2] I created a dynamic array[3] and searched against it. Since the data wasn't sorted, I couldn't binsearch against it for O(lg n) time. Although I suppose I could have sorted it.
[3]Arrays have better cache performance than lists.
A full list of all the complexity guarantees for the STL can be found here:
What are the complexity guarantees of the standard containers?
Summary:
Insert: No container guarantees O(1) for generic insert.
The only container that has a generic insert guarantee is the 'Associative Container', and this is O(log(n)).
There are containers that provide limited insert guarantees:
Forward sequences guarantee an insert at the head of O(1)
Back sequences guarantee an insert at the tail of O(1)
Erase:
The Associative containers guarantee amortized O(1) for erase (if you have an iterator).
Lookup:
If you mean element access by lookup (as no container has O(1) find capabilities),
then the Random Access container is the only container with O(1) access.
So the answer is based on container types.
This is what the standard guarantees are defined for. How does this translate to real containers?
std::vector: Sequence, Back Sequence, Forward/Reverse/Random Container
std::deque: Sequence, Front/Back Sequence, Forward/Reverse/Random Container
std::list: Sequence, Front/Back Sequence, Forward/Reverse Container
std::set: Sorted/Simple/Unique Associative Container, Forward Container
std::map: Sorted/Pair/Unique Associative Container, Forward Container
std::multiset: Sorted/Simple/Multiple Associative Container, Forward Container
std::multimap: Sorted/Pair/Multiple Associative Container, Forward Container
You won't be able to fit all of your requirements into one container... something's gotta give ;)
However, maybe this is interesting for you:
http://www.cplusplus.com/reference/stl/
I need a container (not necessarily a STL container) which let me do the following easily:
Insertion and removal of elements at any position
Accessing elements by their index
Iterate over the elements in any order
I used std::list, but it won't let me insert at any position (it does, but for that I'll have to iterate over all elements and then insert at the position I want, which is slow, as the list may be huge). So can you recommend any efficient solution?
It's not completely clear to me what you mean by "Iterate over the elements in any order" - does this mean you don't care about the order, as long as you can iterate, or that you want to be able to iterate using arbitrarily defined criteria? These are very different conditions!
Assuming you meant iteration order doesn't matter, several possible containers come to mind:
std::map [a red-black tree, typically]
Insertion, removal, and access are O(log(n))
Iteration is ordered by index
hash_map or std::tr1::unordered_map [a hash table]
Insertion, removal, and access are all (approx) O(1)
Iteration is 'random'
This diagram will help you a lot, I think so.
Either a vector or a deque will suit. vector will provide faster accesses, but deque will provide faster insertions and removals.
Well, you can't have all of those in constant time, unfortunately. Decide if you are going to do more insertions or reads, and base your decision on that.
For example, a vector will let you access any element by index in constant time, iterate over the elements in linear time (all containers should allow this), but insertion and removal takes linear time (slower than a list).
You can try std::deque. It will not provide constant time removal of elements in the middle, but it supports:
random access to elements
constant time insertion and removal of elements at the end of the sequence
linear time insertion and removal of elements in the middle.
A vector. When you erase any item, copy the last item over one to be erased (or swap them, whichever is faster) and pop_back. To insert at a position (but why should you, if the order doesn't matter!?), push_back the item at that position and overwrite (or swap) with item to be inserted.
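The two tricks above can be sketched as a pair of helpers. The names unordered_erase and unordered_insert are made up for this example; both run in constant time (amortized for the insert) because they never shift elements, at the price of not preserving the order of the other elements.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Erase v[i] by overwriting it with the last element, then dropping the last slot.
template <class T>
void unordered_erase(std::vector<T>& v, std::size_t i) {
    assert(i < v.size());
    if (i + 1 != v.size()) v[i] = std::move(v.back());
    v.pop_back();
}

// Put value at position i; the element previously at i ends up at the back.
template <class T>
void unordered_insert(std::vector<T>& v, std::size_t i, T value) {
    assert(i <= v.size());
    v.push_back(std::move(value));
    if (i + 1 != v.size()) std::swap(v[i], v.back());
}
```

Note that both helpers change the index of one other element, so any index stored elsewhere for the moved element becomes stale.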
By "iterating over the elements in any order", do you mean you need support for both forward and backwards by index, or do you mean order doesn't matter?
You want a special tree called an unsorted counted tree. This allows O(log(n)) indexed insertion, O(log(n)) indexed removal, and O(log(n)) indexed lookup. It also allows O(n) iteration in either the forward or reverse direction. One example where these are used is text editors, where each line of text in the editor is a node.
Here are some references:
Counted B-Trees
Rope (computer science)
An order statistic tree might be useful here. It's basically just a normal tree, except that every node in the tree includes a count of the nodes in its left sub-tree. This supports all the basic operations with no worse than logarithmic complexity. During insertion, anytime you insert an item in a left sub-tree, you increment the node's count. During deletion, anytime you delete from the left sub-tree, you decrement the node's count. To index to node N, you start from the root. The root has a count of nodes in its left sub-tree, so you check whether N is less than, equal to, or greater than the count for the root. If it's less, you search in the left subtree in the same way. If it's greater, you descend the right sub-tree, add the root's count to that node's count, and compare that to N. Continue until A) you've found the correct node, or B) you've determined that there are fewer than N items in the tree.
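The left-count bookkeeping and the indexing walk described above might be sketched as follows. For brevity this is a plain, unbalanced binary search tree over int keys, so the logarithmic bounds only hold when the tree happens to be balanced; a real implementation would add rebalancing and deletion. The names OsNode, os_insert and os_select are hypothetical. Instead of adding the root's count on the way right, the sketch subtracts the skipped nodes from the target index, which amounts to the same comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical node type: a BST node that also counts its left subtree.
struct OsNode {
    int key;
    std::size_t left_count = 0;   // number of nodes in the left sub-tree
    std::unique_ptr<OsNode> left, right;
    explicit OsNode(int k) : key(k) {}
};

// Unbalanced BST insert that increments left_count whenever it descends left.
void os_insert(std::unique_ptr<OsNode>& root, int key) {
    std::unique_ptr<OsNode>* cur = &root;
    while (*cur) {
        if (key < (*cur)->key) {
            ++(*cur)->left_count;
            cur = &(*cur)->left;
        } else {
            cur = &(*cur)->right;
        }
    }
    cur->reset(new OsNode(key));
}

// Node at 0-based position n in sorted order, or nullptr if the tree
// holds n or fewer items.
const OsNode* os_select(const OsNode* t, std::size_t n) {
    while (t) {
        if (n < t->left_count) {
            t = t->left.get();
        } else if (n == t->left_count) {
            return t;
        } else {
            n -= t->left_count + 1;   // skip the left sub-tree and this node
            t = t->right.get();
        }
    }
    return nullptr;
}
```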
(source: adrinael.net)
But it sounds like you're looking for a single container with the following properties:
All the best benefits of various containers
None of their ensuing downsides
And that's impossible. One benefit causes a detriment. Choosing a container is about compromise.
std::vector
[padding for "15 chars" here]