Insert, find-min(key) and delete(based on value) functionality in logN with STL containers? - c++

Here's an interesting problem:
Let's say we have a set A for which the following are permitted:
Insert x
Find-min x
Delete the n-th inserted element in A
Create a data structure to permit these in logarithmic time.
The most common solution is with a heap. AFAIK, heaps with decrease-key (based on a value - generally the index when an element was added) keep a table with the Pos[1...N] meaning the i-th added value is now on index Pos[i], so it can find the key to decrease in O(1). Can someone confirm this?
Another question is how we solve the problem with STL containers? i.e. with sets, maps or priority queues. A partial solution i found is to have a priority queue with indexes but ordered by the value to these indexes. I.e. A[1..N] are our added elements in order of insertion. pri-queue with 1..N based on comparison of (A[i],A[j]). Now we keep a table with the deleted indexes and verify if the min-value index was deleted. Unfortunately, Find-min becomes slightly proportional with no. of deleted values.
Any alternative ideas?
Now I thought how to formulate a more general problem.
Create a data structure similar to multimap with <key, value> elements. Keys are not unique. Values are. Insert, find one (based on key), find (based on value), delete one (based on key) and delete (based on value) must be permitted O(logN).
Perhaps a bit oddly, this is possible with a manually implemented Binary Search Tree with a modification: for every node operation a hash-table or a map based on value is updated with the new pointer to the node.
Similar to having a strictly ordered std::set (if equal key order by value) with a hash-table on value giving the iterator to the element containing that value.
Possible with std::set and a (std::map/hash table) as described by Chong Luo.

You can use a combination of two containers to solve your problem - A vector in which you add each consecutive element and a set:
You use the set to execute find_min while
When you insert an element you execute push_back in the vector and insert in the set
When you delete the n-th element, you see it's value in the vector and erase it from the set. Here I assume the number of elements does not change after executing delete n-th element.
I think you can not solve the problem with only one container from STL. However there are some data structures that can solve your problem:
Skip list - can find the minimum in constant time and will perform the other two operations with amortized complexity O(log(n)). It is relatively easy to implement.
Tiered vector is easy to implement and will perform find_min in constant time and the other two operations in O(sqrt(n))
And of course the approach you propose - write your own heap that keeps track of where is the n-th element in it.

Related

How to implement a map with limited size

I would like to implement a map whose number of elements never exceeds a certain limit L. When the L+1-th element is inserted, the oldest entry should be removed from the map to empty the space.
I found something similar: Data Structure for Queue using Map Implementations in Java with Size limit of 5. There it is suggested to use a linked hash map, i.e., a hash map that also keeps a linked list of all elements. Unfortunately, that is for java, and I need a solution for C++. I could not find anything like this in the standard library nor in the boost libraries.
Same story here: Add and remove from MAP with limited size
A possible solution for C++ is given here, but it does not address my questions below: C++ how to mix a map with a circular buffer?
I would implement it in a very similar way to what is described there. An hash map to store the key-value pairs and a linked list, or a double-ended queue, of keys to keep the index of the entries. To insert a new value I would add it to the hash map and its key at the end of the index; if the size of the has at this point exceeds the limit, I would pop the first element of the index and remove the entry with that key from the has. Simple, same complexity as adding to the hash map.
Removing an entry requires an iteration over the index to remove the key from there, which has linear complexity for both linked lists and double-ended queue. (Double-ended queues also have the disadvantage that removing an element has itself linear complexity.) So it looks like the remove operation on such a data structure does not preserve the complexity as the underlying has map.
The question is: Is this increase in complexity necessary, or there are some clever ways to implement a limited map data structure so that both insertion and removal keep the same complexity?
Ooops, just posted and immediately realized something important. The size of the index is also limited. If that limit is constant, then the complexity of iterating over it can also be considered constant.
Well, the limit gives an upper bound to the cost of the removal operation.
If the limit is very high one might still prefer a solution that does not involve a linear iteration over the index.
I would still use an associative container to have direct access and a sequential one to allow easy removal of the older item. Let us look at the required access methods:
access to an element given its key => ok, the associative container allows direct access
add a new key-value pair
if the map is not full it is easy: push_back on the sequence container, and simple addition to the associative one
if the map is full, above action must happen, but the oldest element must be removed => front on the sequence container will give that element, and pop_front and erase will remove it, provided the key is contained in the sequence container
remove an element given by its key => trivial to remove from an associative container, but only list allows for constant time removal of an element provided you have an iterator on it. The good news is that removing or inserting an element in a list does not invalidate iterators pointing on other elements.
As you did not give any requirement for keeping keys sorted, I would use an unordered_map for the associative container and a list for the sequence one. The additional requirements is that the list must contain the key, and that the unordered_map must contain an iterator to its corresponding element in the list. The value can be in either container. As I assume that the major access will be the direct one, I would store the value in the map.
It boils down to:
a list<K> to allow identification of the oldest key
an unordered_map<K, pair<V, list<K>::iterator>>
It doubles the storage for key and adds an additional iterator. But keys are expected not to be too big and list::iterator normally contains little more than a pointer: this changes a small amount of memory for speed.
That should be enough to provide constant time
key access
insertion of a new item
key removal of an item
You may want to take a look at Boost.MultiIndex MRU example.

why Run-time for add-before for doubly-linked lists is O(1)?

In data structures, we say pushing an element before a node in singly-linked lists are O(n) operation! since there is no backward pointers, we have to walk all the way through the elements to get to the key we are going to add before the new element. Therefore, it has a linear run time.
Then, when we introduce doubly-linked lists, we say the problem is resolved and now since we have pointers in both directions pushing before becomes a constant time operation O(1).
I understand the logic but still, something is confusing to me! Since we DO NOT have constant time access to the elements of the list, for finding the element we want to add before, we have to walk through the previous element to get there! that is true that in the doubly-linked list it is now faster to implement the add-before command, but still, the action of finding the interested key is O(n)! then why we say with the doubly-linked list the operation of add before becomes O(1)?
Thanks,
In C++, the std::list::insert() function takes an iterator to indicate where the insert should occur. That means the caller already has this iterator, and the insert operation is not doing a search and therefore runs in constant time.
The find() algorithm, however, is linear, and is the normal way to search for a list element. If you need to find+insert, the combination is O(n).
However, there is no requirement to do a search before an insert. For example, if you have a cached (valid) iterator, you can insert in front of (or delete) the element it corresponds with in constant time.

Is there a linked hash set in C++?

Java has a LinkedHashSet, which is a set with a predictable iteration order. What is the closest available data structure in C++?
Currently I'm duplicating my data by using both a set and a vector. I insert my data into the set. If the data inserted successfully (meaning data was not already present in the set), then I push_back into the vector. When I iterate through the data, I use the vector.
If you can use it, then a Boost.MultiIndex with sequenced and hashed_unique indexes is the same data structure as LinkedHashSet.
Failing that, keep an unordered_set (or hash_set, if that's what your implementation provides) of some type with a list node in it, and handle the sequential order yourself using that list node.
The problems with what you're currently doing (set and vector) are:
Two copies of the data (might be a problem when the data type is large, and it means that your two different iterations return references to different objects, albeit with the same values. This would be a problem if someone wrote some code that compared the addresses of the "same" elements obtained in the two different ways, expecting the addresses to be equal, or if your objects have mutable data members that are ignored by the order comparison, and someone writes code that expects to mutate via lookup and see changes when iterating in sequence).
Unlike LinkedHashSet, there is no fast way to remove an element in the middle of the sequence. And if you want to remove by value rather than by position, then you have to search the vector for the value to remove.
set has different performance characteristics from a hash set.
If you don't care about any of those things, then what you have is probably fine. If duplication is the only problem then you could consider keeping a vector of pointers to the elements in the set, instead of a vector of duplicates.
To replicate LinkedHashSet from Java in C++, I think you will need two vanilla std::map (please note that you will get LinkedTreeSet rather than the real LinkedHashSet instead which will get O(log n) for insert and delete) for this to work.
One uses actual value as key and insertion order (usually int or long int) as value.
Another ones is the reverse, uses insertion order as key and actual value as value.
When you are going to insert, you use std::map::find in the first std::map to make sure that there is no identical object exists in it.
If there is already exists, ignore the new one.
If it does not, you map this object with the incremented insertion order to both std::map I mentioned before.
When you are going to iterate through this by order of insertion, you iterate through the second std::map since it will be sorted by insertion order (anything that falls into the std::map or std::set will be sorted automatically).
When you are going to remove an element from it, you use std::map::find to get the order of insertion. Using this order of insertion to remove the element from the second std::map and remove the object from the first one.
Please note that this solution is not perfect, if you are planning to use this on the long-term basis, you will need to "compact" the insertion order after a certain number of removals since you will eventually run out of insertion order (2^32 indexes for unsigned int or 2^64 indexes for unsigned long long int).
In order to do this, you will need to put all the "value" objects into a vector, clear all values from both maps and then re-insert values from vector back into both maps. This procedure takes O(nlogn) time.
If you're using C++11, you can replace the first std::map with std::unordered_map to improve efficiency, you won't be able to replace the second one with it though. The reason is that std::unordered map uses a hash code for indexing so that the index cannot be reliably sorted in this situation.
You might wanna know that std::map doesn't give you any sort of (log n) as in "null" lookup time. And using std::tr1::unordered is risky business because it destroys any ordering to get constant lookup time.
Try to bash a boost multi index container to be more freely about it.
The way you described your combination of std::set and std::vector sounds like what you should be doing, except by using std::unordered_set (equivalent to Java's HashSet) and std::list (doubly-linked list). You could also use std::unordered_map to store the key (for lookup) along with an iterator into the list where to find the actual objects you store (if the keys are different from the objects (or only a part of them)).
The boost library does provide a number of these types of combinations of containers and look-up indices. For example, this bidirectional list with fast look-ups example.

Looking for special C++ data structure

I'm looking for a C++ implementation of a data structure ( or a combination of data structures ) that meet the following criteria:
items are accessed in the same way as in std::vector
provides random access iterator ( along with iterator comparison <,> )
average item access(:lookup) time is at worst of O(log(n)) complexity
items are iterated over in the same order as they were added to the container
given an iterator, i can find out the ordinal position of the item pointed to in the container, at worst of O(log(n)) complexity
provides item insertion and removal at specific position of at worst O(log(n)) complexity
removal/insertion of items does not invalidate previously obtained iterators
Thank you in advance for any suggestions
Dalibor
(Edit) Answers:
The answer I selected describes a data structure that meet all these requirements. However, boost::multi_index, as suggested by Maxim Yegorushkin, provides features very close to those above.
(Edit) Some of the requirements were not correctly specified. They are modified according to correction(:original)
(Edit) I've found an implementation of the data structure described in the accepted answer. So far, it works as expected. It's called counter tree
(Edit) Consider using the AVL-Array suggested by sp2danny
Based on your requirements boost::multi_index with two indices does the trick.
The first index is ordered index. It allows for O(log(n)) insert/lookup/remove. The second index is random access index. It allows for random access and the elements are stored in the order of insertion. For both indices iterators don't get invalidated when other elements are removed. Converting from one iterator to another is O(1) operation.
Let's go through these...
average item lookup time is at worst of O(log(n)) complexity
removal/insertion of items does not invalidate previously obtained iterators
provides item insertion and removal of at worst O(log(n)) complexity
That pretty much screams "tree".
provides random access iterator ( along with iterator comparison <,> )
given an iterator, i can find out the ordinal position of the item pointed to in the container, at worst of O(log(n)) complexity
items are iterated over in the same order as they were added to the container
I'm assuming that the index you're providing your random-access iterator is by order of insertion, so [0] would be the oldest element in the container, [1] would be the next oldest, etc. This means that, on deletion, for the iterators to be valid, the iterator internally cannot store the index, since it could change without notice. So just using a map with the key being the insertion order isn't going to work.
Given that, each node of your tree needs to keep track of how many elements are in each subtree, in addition to its usual members. This will allow random-access with O(log(N)) time. I don't know of a ready-to-go set of code, but subclassing std::rb_tree and std::rb_node would be my starting point.
See here: STL Containers (scroll down the page to see information on algorithmic complexity) and I think std::deque fits your requirements.
AVL-Array should fit the bill.
Here's my "lv" container that fit the requirement, O(log n) insert/delete/access time.
https://github.com/xhawk18/lv
The container is header only C++ libraries,
and has the same iterator and functions with other C++ containers, such as list and vector.
"lv" container is based on rb-tree, each node of which has a size value about the amount of nodes in the sub-tree. By check the size of left/right child of a tree, we can fast access the node randomly.

Which STL Container?

I need a container (not necessarily a STL container) which let me do the following easily:
Insertion and removal of elements at any position
Accessing elements by their index
Iterate over the elements in any order
I used std::list, but it won't let me insert at any position (it does, but for that I'll have to iterate over all elements and then insert at the position I want, which is slow, as the list may be huge). So can you recommend any efficient solution?
It's not completely clear to me what you mean by "Iterate over the elements in any order" - does this mean you don't care about the order, as long as you can iterate, or that you want to be able to iterate using arbitrarily defined criteria? These are very different conditions!
Assuming you meant iteration order doesn't matter, several possible containers come to mind:
std::map [a red-black tree, typically]
Insertion, removal, and access are O(log(n))
Iteration is ordered by index
hash_map or std::tr1::unordered_map [a hash table]
Insertion, removal, and access are all (approx) O(1)
Iteration is 'random'
This diagram will help you a lot, I think so.
Either a vector or a deque will suit. vector will provide faster accesses, but deque will provide faster instertions and removals.
Well, you can't have all of those in constant time, unfortunately. Decide if you are going to do more insertions or reads, and base your decision on that.
For example, a vector will let you access any element by index in constant time, iterate over the elements in linear time (all containers should allow this), but insertion and removal takes linear time (slower than a list).
You can try std::deque, but it will not provide the constant time removal of elements in middle but it supports
random access to elements
constant time insertion and removal
of elements at the end of the
sequence
linear time insertion and removal of
elements in the middle.
A vector. When you erase any item, copy the last item over one to be erased (or swap them, whichever is faster) and pop_back. To insert at a position (but why should you, if the order doesn't matter!?), push_back the item at that position and overwrite (or swap) with item to be inserted.
By "iterating over the elements in any order", do you mean you need support for both forward and backwards by index, or do you mean order doesn't matter?
You want a special tree called a unsorted counted tree. This allows O(log(n)) indexed insertion, O(log(n)) indexed removal, and O(log(n)) indexed lookup. It also allows O(n) iteration in either the forward or reverse direction. One example where these are used is text editors, where each line of text in the editor is a node.
Here are some references:
Counted B-Trees
Rope (computer science)
An order statistic tree might be useful here. It's basically just a normal tree, except that every node in the tree includes a count of the nodes in its left sub-tree. This supports all the basic operations with no worse than logarithmic complexity. During insertion, anytime you insert an item in a left sub-tree, you increment the node's count. During deletion, anytime you delete from the left sub-tree, you decrement the node's count. To index to node N, you start from the root. The root has a count of nodes in its left sub-tree, so you check whether N is less than, equal to, or greater than the count for the root. If it's less, you search in the left subtree in the same way. If it's greater, you descend the right sub-tree, add the root's count to that node's count, and compare that to N. Continue until A) you've found the correct node, or B) you've determined that there are fewer than N items in the tree.
(source: adrinael.net)
But it sounds like you're looking for a single container with the following properties:
All the best benefits of various containers
None of their ensuing downsides
And that's impossible. One benefit causes a detriment. Choosing a container is about compromise.
std::vector
[padding for "15 chars" here]