I want a structure to store (for example) numbers where I can insert and remove elements, the structure always remains sorted (like a priority queue), BUT with the possibility of knowing where a given number is, and every operation in logarithmic time.
Maybe with lower_bound, upper_bound, or just a binary search, but what blocks me from doing a binary search in a priority_queue is that I cannot access the elements by index, only the first one.
I think you’re looking for an order statistics tree, an augmented BST that supports all the regular BST operations in time O(log n), along with two others:
rank(elem): return which index elem would occupy in the sorted sequence.
index(k): given an index k, return the element at that index in the sorted sequence.
Both operations run in O(log n) time, making them extremely fast.
You can treat an order statistics tree as a priority queue. Insertions work as normal BST insertions, and to extract the lowest/highest element you just remove the smallest/greatest element from the tree, which you can do in time O(log n) by just walking down the left or right spines of the tree.
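If you're using GCC, libstdc++ ships a (non-standard) policy tree that implements exactly this kind of order statistics tree. A minimal sketch, assuming the pb_ds extension is available; note that with std::less it stores distinct keys, like std::set, so duplicates need extra work (e.g. storing pairs):

// GCC-specific policy tree extension; not part of standard C++.
#include <cstddef>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

using ordered_set = __gnu_pbds::tree<
    int, __gnu_pbds::null_type, std::less<int>,
    __gnu_pbds::rb_tree_tag,
    __gnu_pbds::tree_order_statistics_node_update>;

int main() {
    ordered_set s;
    s.insert(30);
    s.insert(10);
    s.insert(20);

    int third = *s.find_by_order(2);     // element at sorted index 2 -> 30
    std::size_t r = s.order_of_key(20);  // number of elements strictly less than 20 -> 1
    s.erase(s.begin());                  // remove the smallest element, O(log n)
    (void)third; (void)r;
}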
A priority queue does not keep things in sorted order. At least, not typically. A priority queue makes it possible for you to quickly obtain the next item in the sequence. But you can't efficiently access, say, the 5th item in the queue.
I know of three different ways to build a priority queue in which you can efficiently access items by key:
Use a balanced binary search tree to implement the queue. Although all operations are O(log n), the typical running time is slower than with a binary heap.
Implement the heap priority queue as a skip list. This is a good option. I've seen some people report that a skip list priority queue outperforms a binary heap. A search for [C++ skip list] will return you lots of implementations.
What I call an indexed binary heap also works. Basically, you marry a hash map or dictionary with a binary heap. The map is indexed by key, and its value contains the index of the item in the heap array. Such a thing is not difficult to build, and is quite effective.
Come to think of it, you can make an indexed version of any type of heap.
You have a number of options. I rather like the skip list, myself, but your mileage may vary.
The indexed binary heap, as I pointed out, is a hybrid data structure that maintains a dictionary (hash map) and a binary heap. Briefly, here's how it works:
The dictionary key is the field that you use to look up an item that you put into the heap. The value is an integer: the index of that item in the heap.
The heap itself is a standard binary heap implemented in an array. The only difference is that every time you move an item from one place to another in the heap, you update its location in the dictionary. So, for example, if you swap two items, you have to swap not only the items themselves in the array, but also their positions as stored in the dictionary. For example:
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// heap is an array of strings; dict maps each string to its current index in heap
std::vector<std::string> heap;
std::unordered_map<std::string, std::size_t> dict;

void swap_items(std::size_t a, std::size_t b)
{
    // swap the item at heap[a] with the item at heap[b]
    std::swap(heap[a], heap[b]);

    // update their positions in the dictionary
    dict[heap[a]] = a;
    dict[heap[b]] = b;
}
It's a pretty simple modification of a standard binary heap implementation. You just have to be careful to update the position every time you move an item.
You can also do this with node-based heaps like Pairing heap, Fibonacci heap, Skew heap, etc.
Related
How can I access/search randomly in a priority queue? For example, if I have a priority queue like q = {5, 4, 3, 2, 1} and I want to access the 3rd value directly, which is 3, I cannot do this. Is there any way to access a priority queue randomly?
Most priority queue implementations, including the C++ std::priority_queue type, don't support random access. The idea behind a priority queue is to sacrifice random access for fast access to the smallest element.
Depending on what you're trying to do, there are a number of other approaches you could use. If you always want access to the third element in the queue (and not any other arbitrary positions), it's probably fast enough to just dequeue two elements, cache them, then dequeue the value you want and put the other two elements back.
If you want access to the kth-smallest element at any point in time, where k is larger, one option is to store two different priority queues: a reverse-sorted priority queue that holds k elements (call it the left queue) and a regular priority queue holding the remaining n-k elements (call it the right queue). To get the kth-smallest element, dequeue from the left queue (giving back the kth-smallest element), then dequeue an element from the right and enqueue into the left to get it back up to k total elements. To do an enqueue, check if the number is less than the top of the left queue. If so, dequeue from the left queue, enqueue the removed element into the right queue, then enqueue the original element into the left. Otherwise, enqueue into the right. This guarantees O(log n) runtimes for each operation.
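Here's a minimal sketch of that two-queue idea for int elements (the class and member names are made up): the left queue is a max-heap holding the k smallest elements seen so far, the right queue is a min-heap holding the rest, and the kth-smallest element is always left.top().

#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

class KthTracker {
    std::size_t k;
    std::priority_queue<int> left;                                        // max-heap: the k smallest
    std::priority_queue<int, std::vector<int>, std::greater<int>> right;  // min-heap: the rest
public:
    explicit KthTracker(std::size_t k) : k(k) {}

    void enqueue(int x) {
        if (left.size() < k || x < left.top()) {
            left.push(x);
            if (left.size() > k) {        // keep exactly k elements on the left
                right.push(left.top());
                left.pop();
            }
        } else {
            right.push(x);
        }
    }

    int kth_smallest() const { return left.top(); }  // valid once at least k elements were enqueued
};

Each enqueue does a constant number of O(log n) heap operations, so the bound above holds.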
If you need true random access to a sorted sequence, consider using an order statistics tree. This is an augmented binary search tree that supports O(log n) access to elements by index. You can use this to build a priority queue - the minimum element is always at index 0. The catch (of course there's a catch) is that it's hard to find a good implementation of one and the constant factors hidden in the O(log n) terms are much higher than in a standard binary heap.
To add to the answer of @templatetypedef:
You cannot combine random access of elements with a priority queue, unless you use a very inefficient priority queue. Here are a few options, depending on what you need:
1- An inefficient priority queue would be a std::vector that you keep sorted. Pushing an element means finding where it should be inserted and moving all subsequent elements over by one. Popping an element would be simply reading and deleting the last element (back and pop_back, these are efficient). Random access is efficient as well, of course.
2- You could use a std::multiset (or std::multimap) instead of a priority queue. It is a tree structure that keeps things sorted. You can insert instead of push, then read and remove the first (or last) element using a begin (or rbegin) iterator with erase. Insertion and finding the first/last element are log(n) operations. The data structure allows for reading all elements in order, though it doesn't give random access.
3- You could hack your own version of std::priority_queue using a std::vector and the std::push_heap and std::pop_heap algorithms (together with the push_back and pop_back methods of the std::vector). You'd get the same efficient priority queue, but also random access to all elements of the priority_queue. They are not sorted; all you know is that the first element in your array is the top-priority element, and that the other elements are stored in a way that satisfies the heap property. If you only occasionally want to read all elements in order, you can use std::sort_heap to sort all elements in your array by priority. std::make_heap will return your array to its heap state.
Note that std::priority_queue uses a std::vector by default to store its data. It might be possible to reinterpret_cast the std::priority_queue to a std::vector to get random access to the other elements in the queue, but even if that works on your implementation of the standard library, it might not on others, or on future versions of the same library, so I don't recommend it! It's safer to create your own heap class using the algorithms in the standard library, as per option 3 above.
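For what it's worth, here's a minimal sketch of option 3: a plain std::vector kept as a max-heap with the standard <algorithm> functions, which gives you the heap operations plus random access.

#include <algorithm>
#include <initializer_list>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> heap;   // a vector maintained with the heap property

    auto push = [&](int x) {                        // O(log n) insertion
        heap.push_back(x);
        std::push_heap(heap.begin(), heap.end());
    };
    auto pop = [&]() {                              // O(log n) removal of the largest element
        std::pop_heap(heap.begin(), heap.end());    // moves the top element to the back
        int top = heap.back();
        heap.pop_back();
        return top;
    };

    for (int x : {5, 1, 4, 2, 3}) push(x);

    std::cout << "top: " << heap.front() << "\n";   // like priority_queue::top()
    std::cout << "heap[2]: " << heap[2] << "\n";    // random access; heap order, not sorted
    std::cout << "popped: " << pop() << "\n";

    std::sort_heap(heap.begin(), heap.end());       // now sorted ascending, no longer a heap
    std::make_heap(heap.begin(), heap.end());       // restore the heap property
}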
I'm trying to implement a priority queue as a sorted-array-backed minimum binary heap. I'm trying to get the update_key function to run in logarithmic time, but to do this I have to know the position of the item in the array. Is there any way to do this without the use of a map? If so, how? Thank you
If you really want to be able to change the key of an arbitrary element, a heap is not the best choice of data structure. What it gives you is the combination of:
1. compact representation (no pointers, just an array and an implicit indexing scheme)
2. logarithmic insertion and rebalancing
3. logarithmic removal of the smallest (largest) element
4. O(1) access to the value of the smallest (largest) element
A side benefit of 1. is that the lack of pointers means you do substantially fewer calls to malloc/free (new/delete).
A map (represented in the standard library as a balanced binary tree) gives you the middle two of these, adding in
logarithmic find() on any key.
So while you could attach another data structure to your heap, storing pointers in the heap and then making the comparison operator dereference through the pointer, you'd pretty soon find yourself with the complexity in time and space of just using a map in the first place.
Your find key function should operate in log(n) time. Your updating (changing the key) should be constant time. Your remove function should run in log(n) time. Your insert function should be log(n) time.
If these assumptions are true try this:
1) Find your item in your heap (i.e., binary search, since it is a sorted array).
2) Update your key (you're just changing a value, constant time)
3) Remove the item from the heap log(n) to reheapify.
4) Insert your item into the heap log(n).
So, you'd have log(n) + 1 + log(n) + log(n), which reduces to O(log n).
Note: this is amortized, because if you have to realloc your array, etc... that adds overhead. But you shouldn't do that very often anyway.
That's the tradeoff of the array-backed heap: you get excellent memory use (good locality and minimal overhead), but you lose track of the elements. To solve it, you have to add back some overhead.
One solution would be this. The heap contains objects of type C*. C is a class with an int member heap_index, which is the index of the object in the heap array. Whenever you move an element inside the heap array, you'll have to update its heap_index to set it to the new index.
Update_key (as well as removal of an arbitrary element) is then log(n) time because it takes constant time to find the element (via heap_index), and log(n) time to bubble it into the correct position.
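Here's a rough sketch of that idea for a min-heap (the names Item, heap_index, and IndexedHeap are just for illustration): every time an element is placed in a slot, its new position is written back into the object, so update_key never has to search.

#include <cstddef>
#include <vector>

struct Item {
    int key;
    int heap_index = -1;   // this item's current slot in the heap array
};

class IndexedHeap {
    std::vector<Item*> a;

    // put item it into slot i and record its position
    void place(Item* it, std::size_t i) { a[i] = it; it->heap_index = static_cast<int>(i); }

    void sift_up(std::size_t i) {
        while (i > 0) {
            std::size_t p = (i - 1) / 2;
            if (a[p]->key <= a[i]->key) break;
            Item* parent = a[p];
            place(a[i], p);     // child moves up
            place(parent, i);   // parent moves down
            i = p;
        }
    }

    void sift_down(std::size_t i) {
        for (;;) {
            std::size_t l = 2 * i + 1, r = l + 1, m = i;
            if (l < a.size() && a[l]->key < a[m]->key) m = l;
            if (r < a.size() && a[r]->key < a[m]->key) m = r;
            if (m == i) break;
            Item* child = a[m];
            place(a[i], m);
            place(child, i);
            i = m;
        }
    }

public:
    void push(Item* it) {
        a.push_back(it);
        place(it, a.size() - 1);
        sift_up(a.size() - 1);
    }

    // O(log n): the position is already known through heap_index
    void update_key(Item* it, int new_key) {
        int old_key = it->key;
        it->key = new_key;
        std::size_t i = static_cast<std::size_t>(it->heap_index);
        if (new_key < old_key) sift_up(i);
        else                   sift_down(i);
    }
};

Removing an arbitrary element works the same way: overwrite its slot with the last element, shrink the array, then sift that element up or down from the vacated position.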
Since both std::priority_queue and std::set (and std::multiset) are data containers that store elements and allow you to access them in an ordered fashion, and have same insertion complexity O(log n), what are the advantages of using one over the other (or, what kind of situations call for the one or the other?)?
While I know that the underlying structures are different, I am not as much interested in the difference in their implementation as I am in a comparison of their performance and suitability for various uses.
Note: I know about the no-duplicates constraint in a set. That's why I also mentioned std::multiset, since it has exactly the same behavior as std::set but can be used where the stored data is allowed to contain equal elements. So please, don't comment on the single/multiple keys issue.
A priority queue only gives you access to one element in sorted order -- i.e., you can get the highest priority item, and when you remove that, you can get the next highest priority, and so on. A priority queue also allows duplicate elements, so it's more like a multiset than a set. [Edit: As @Tadeusz Kopec pointed out, building a heap is also linear in the number of items in the heap, whereas building a set is O(N log N) unless it's being built from a sequence that's already ordered (in which case it is also linear).]
A set allows you full access in sorted order, so you can, for example, find two elements somewhere in the middle of the set, then traverse in order from one to the other.
std::priority_queue allows you to do the following:
Insert an element O(log n)
Get the smallest element O(1)
Erase the smallest element O(log n)
while std::set has more possibilities:
Insert any element O(log n) and the constant is greater than in std::priority_queue
Find any element O(log n)
Find the first element >= the one you are looking for, O(log n) (lower_bound)
Erase any element O(log n)
Erase any element by its iterator O(1)
Move to previous/next element in sorted order O(1)
Get the smallest element O(1)
Get the largest element O(1)
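For illustration, a short example of std::multiset used this way, with only standard operations:

#include <iostream>
#include <set>

int main() {
    std::multiset<int> s = {5, 1, 4, 1, 3};

    std::cout << "smallest: " << *s.begin() << "\n";   // O(1)
    std::cout << "largest:  " << *s.rbegin() << "\n";  // O(1)

    s.insert(2);          // O(log n); duplicates are allowed
    s.erase(s.begin());   // pop the smallest element

    auto it = s.lower_bound(3);   // first element >= 3, O(log n)
    if (it != s.end()) std::cout << "lower_bound(3): " << *it << "\n";

    for (int x : s) std::cout << x << ' ';   // in-order traversal; no random access by index
    std::cout << "\n";
}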
set/multiset are generally backed by a binary tree. http://en.wikipedia.org/wiki/Binary_tree
priority_queue is generally backed by a heap. http://en.wikipedia.org/wiki/Heap_(data_structure)
So the question is really when should you use a binary tree instead of a heap?
Both structures are laid out as a tree; however, the rules about the relationship between ancestors and descendants are different.
We will call the positions P for parent, L for left child, and R for right child.
In a binary search tree, L < P < R.
In a heap, P < L and P < R (for a min-heap).
So binary trees sort "sideways" and heaps sort "upwards".
So if we look at this as a triangle, then in the binary tree L, P, R are completely sorted, whereas in the heap the relationship between L and R is unknown (only their relationship to P is known).
This has the following effects:
If you have an unsorted array and want to turn it into a binary tree, it takes O(n log n) time. If you want to turn it into a heap, it only takes O(n) time (heapify only enforces the parent-child ordering, not a total order).
Heaps are more efficient if you only need the extreme element (lowest or highest by some comparison function). Heaps only do the comparisons (lazily) necessary to determine the extreme element.
Binary trees perform the comparisons necessary to order the entire collection, and keep the entire collection sorted all-the-time.
Heaps have constant-time lookup (peek) of lowest element, binary trees have logarithmic time lookup of lowest element.
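A small illustration of the build-time difference with standard containers:

#include <algorithm>
#include <iostream>
#include <set>
#include <vector>

int main() {
    std::vector<int> v = {7, 2, 9, 4, 1, 8};

    std::make_heap(v.begin(), v.end());              // O(n): heapify in place (max-heap)
    std::cout << "heap top: " << v.front() << "\n";  // O(1) peek at the extreme element

    std::multiset<int> s(v.begin(), v.end());        // O(n log n): fully ordered
    std::cout << "smallest in tree: " << *s.begin() << "\n";
}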
Even though insert and erase operations for both containers have the same complexity O(log n), these operations are slower for std::set than for std::priority_queue. That's because std::set makes many memory allocations: every element of std::set is stored in its own allocation, while std::priority_queue (with the default underlying std::vector container) uses a single allocation to store all elements. On the other hand, std::priority_queue performs many swap operations on its elements, whereas std::set just swaps pointers. So if swapping is a very slow operation for the element type, using std::set may be more efficient. Moreover, the element type may not be swappable at all.
The memory overhead of std::set is also much bigger, because it has to store many pointers between its nodes.
Here's an interesting problem:
Let's say we have a set A for which the following are permitted:
Insert x
Find-min x
Delete the n-th inserted element in A
Create a data structure to permit these in logarithmic time.
The most common solution is with a heap. AFAIK, heaps with decrease-key (based on a value - generally the index at which an element was added) keep a table Pos[1..N], meaning the i-th added value is now at index Pos[i], so the key to decrease can be found in O(1). Can someone confirm this?
Another question is how we solve the problem with STL containers, i.e. with sets, maps, or priority queues. A partial solution I found is to have a priority queue of indices, but ordered by the values at those indices. I.e. A[1..N] are our added elements in order of insertion, and the priority queue holds the indices 1..N compared by (A[i], A[j]). Now we keep a table of the deleted indices and check whether the min-value index was deleted. Unfortunately, Find-min then becomes proportional to the number of deleted values.
Any alternative ideas?
Now I thought how to formulate a more general problem.
Create a data structure similar to a multimap with <key, value> elements. Keys are not unique. Values are. Insert, find one (based on key), find (based on value), delete one (based on key), and delete (based on value) must all be permitted in O(log N).
Perhaps a bit oddly, this is possible with a manually implemented Binary Search Tree with a modification: for every node operation a hash-table or a map based on value is updated with the new pointer to the node.
Similar to having a strictly ordered std::set (if equal key order by value) with a hash-table on value giving the iterator to the element containing that value.
Possible with std::set and a (std::map/hash table) as described by Chong Luo.
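A rough sketch of that combination for int keys and values (the structure and names are made up): a std::multimap ordered by key, plus an unordered_map from each unique value to its node's iterator. Multimap iterators stay valid across unrelated insertions and erasures, which is what makes this work.

#include <map>
#include <unordered_map>

// <key, value> elements; keys may repeat, values are unique.
// Operations on the tree side are O(log n); the hash-table side is expected O(1).
struct KeyValueIndex {
    std::multimap<int, int> by_key;   // ordered by key
    std::unordered_map<int, std::multimap<int, int>::iterator> by_value;

    void insert(int key, int value) {
        by_value[value] = by_key.insert({key, value});   // values are unique, no collision
    }
    void erase_by_value(int value) {
        auto it = by_value.find(value);
        if (it == by_value.end()) return;
        by_key.erase(it->second);   // iterators to other nodes stay valid
        by_value.erase(it);
    }
    void erase_one_by_key(int key) {
        auto it = by_key.find(key);   // erases a single element with this key
        if (it == by_key.end()) return;
        by_value.erase(it->second);
        by_key.erase(it);
    }
    // find one element with a given key (by_key.end() if absent)
    std::multimap<int, int>::iterator find_by_key(int key) { return by_key.find(key); }
};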
You can use a combination of two containers to solve your problem - A vector in which you add each consecutive element and a set:
You use the set to execute find_min.
When you insert an element, you push_back into the vector and insert into the set.
When you delete the n-th element, you look up its value in the vector and erase it from the set (see the sketch below). Here I assume that n always refers to the insertion order, which does not change when elements are deleted.
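A rough sketch of that combination (the names are made up; I use a multiset so repeated values are handled):

#include <cstddef>
#include <set>
#include <vector>

struct InsertOrderMinSet {
    std::vector<int> insert_order;   // i-th inserted value, in insertion order
    std::multiset<int> values;       // kept ordered for find_min

    void insert(int x) {
        insert_order.push_back(x);
        values.insert(x);                 // O(log n)
    }
    int find_min() const {
        return *values.begin();           // smallest element
    }
    void delete_nth(std::size_t n) {      // n = insertion index, 0-based here
        auto it = values.find(insert_order[n]);    // O(log n)
        if (it != values.end()) values.erase(it);  // erases a single copy
    }
};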
I think you cannot solve the problem with only one container from the STL. However, there are some data structures that can solve your problem:
Skip list - can find the minimum in constant time and will perform the other two operations with amortized complexity O(log(n)). It is relatively easy to implement.
Tiered vector is easy to implement and will perform find_min in constant time and the other two operations in O(sqrt(n))
And of course there is the approach you propose - write your own heap that keeps track of where the n-th element is in it.
I am trying to implement my own heap with a method for removing any number (not only the min or max), but I can't solve one problem. To write that removal function I need pointers to the elements in my heap (to get O(log n) removal of a given element). But when I tried to do it this way:
vector<int*> ptr(n);
it of course did not work.
Maybe I should insert another class or structure containing an int into my heap, but for now I would like to find a solution with plain int (because I have already implemented it using int).
When you need to remove (or change the priority of) objects other than the root, a d-heap isn't necessarily the ideal data structure: the nodes keep changing their position and you need to keep track of the various moves. It is doable, however. To use a heap like this you would return a handle for each newly inserted object which identifies some sort of node that stays put. Since the d-heap algorithm relies on the tree being perfectly balanced, you effectively need to implement it using an array. Since these two requirements (using an array and having nodes stay put) are mutually exclusive, you need to do both and keep an index from the nodes into the array (so you can find an object's position in the array) and a pointer from the array to the nodes (so you can update a node when its position changes). Almost certainly you don't want to move your nodes a lot, i.e. you would rather accept searching multiple nodes to find the proper direction to move a node, i.e. you want to use d > 2.
There are alternative approaches to implementing a heap which are inherently node-based, in particular Fibonacci heaps, which for certain usage patterns yield a better amortized complexity than the usual O(log n). However, they are somewhat harder to implement, and the actual efficiency only pays off if you either need to change the priority of a node frequently or you have fairly large data sets.
A heap is a particular sort of data structure; the elements are stored in a binary tree, and there are well-established procedures for adding or removing elements. Many implementations use an array to hold the tree nodes, and removing an element involves moving O(log n) elements around. Normally, the way the array is used, the children of the node at array location n are stored at locations 2n and 2n+1; element 0 is left empty.
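For reference, the index arithmetic for that 1-based layout (the common 0-based variant is noted in the comments):

#include <cstddef>

// 1-based layout (slot 0 unused): parent of i is i / 2, children are 2i and 2i + 1.
// A 0-based layout uses (i - 1) / 2, 2i + 1 and 2i + 2 instead.
inline std::size_t parent(std::size_t i)      { return i / 2; }
inline std::size_t left_child(std::size_t i)  { return 2 * i; }
inline std::size_t right_child(std::size_t i) { return 2 * i + 1; }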
This Wikipedia page does a fine job of explaining the algorithms.