Why Priority queue preferred to implement using heap not by array although Heap itself implemented using array - heap

Why Priority queue preferred to implement using heap not by the array, although Heap itself implemented using an array.

The binary heap provides a partial ordering, which guarantees O(log n) insertion and O(log n) removal of the highest-priority item.
With a flat array, you have two choices:
Maintain an ordered array. Insertion becomes an O(n) operation, and removal of the highest-priority item is O(1).
Insert items in the order they're received. Insertion then becomes O(1), and removal of the highest-priority item is O(n).
Either of those two options makes the priority queue less efficient than if you implement a binary heap.

Related

Randomly Access in a Priority Queue

How i can access/search randomly in a priority queue?. for example, if have a priority queue like q={5,4,3,2,1} for example, i want to access 3rd value directly which is 3, i could not do this,is there any process to access randomly in priority queue?
Most priority queue implementations, including the C++ std::priority_queue type, don't support random access. The idea behind a priority queue is to sacrifice random access for fast access to the smallest element.
Depending on what you're trying to do, there are a number of other approaches you could use. If you always want access to the third element in the queue (and not any other arbitrary positions), it's probably fast enough to just dequeue two elements, cache them, then dequeue the value you want and put the other two elements back.
If you want access to the kth-smallest element at any point in time, where k is larger, one option is to store two different priority queues: a reverse-sorted priority queue that holds k elements (call it the left queue) and a regular priority queue holding the remaining n-k elements (call it the right queue). To get the kth-smallest element, dequeue from the left queue (giving back the kth-smallest element), then dequeue an element from the right and enqueue into the left to get it back up to k total elements. To do an enqueue, check if the number is less than the top of the left queue. If so, dequeue from the left queue, enqueue the removed element into the right queue, then enqueue the original element into the left. Otherwise, enqueue into the right. This guarantees O(log n) runtimes for each operation.
If you need true random access to a sorted sequence, consider using an order statistics tree. This is an augmented binary search tree that supports O(log n) access to elements by index. You can use this to build a priority queue - the minimum element is always at index 0. The catch (of course there's a catch) is that it's hard to find a good implementation of one and the constant factors hidden in the O(log n) terms are much higher than in a standard binary heap.
To add to the answer of #templatetypedef:
You cannot combine random access of elements with a priority queue, unless you use a very inefficient priority queue. Here's a few options, depending on what you need:
1- An inefficient priority queue would be a std::vector that you keep sorted. Pushing an element means finding where it should be inserted, and move all subsequent elements forward. Popping an element would be simply reading and deleting the last element (back and pop_back, these are efficient). Random access is efficient as well, of course.
2- You could use a std::multiset (or std::multimap) instead of a priority queue. It is a tree structure that keeps things sorted. You can insert instead of push, then read and remove the first (or last) element using a begin (or rbegin) iterator with erase. Insertion and finding the first/last element are log(n) operations. The data structure allows for reading all elements in order, though it doesn't give random access.
3- You could hack your own version of std::priority_queue using a std::vector and the std::push_heap and std::pop_heap algorithms (together with the push_back and pop_back methods of the std::vector). You'd get the same efficient priority queue, but also random access to all elements of the priority_queue. They are not sorted, all you know is that the first element in your array is the top priority element, and that the other elements are stored in a way that the heap property is satisfied. If you only occasionally want to read all elements in order, you can use the function std::sort_heap to sort all elements in your array by priority. The function std::make_heap will return your array to its heap status.
Note that std::priority_queue uses a std::vector by default to store its data. It might be possible to reinterpret_cast the std::priority_queue to a std::vector, so you get random access to the other elements in the queue. But if it works on your implementation of the standard library, it might not on others, or on future versions of the same library, so I don't recommend you do this! It's safer to create your own heap class using the algorithms in the standard library as per #3 above.

c++ a fixed sized priority queue to store k-nearest neighbors

I am implementing k nearest neighbor search in a tree data structure. I store the results in a priority queue, which will automatically sort the elements in a ascending order, and so the first k elements are the results. The priority_queue container in STL is really not a good option here because it support only a few functions such as push(), pop(), top(), size() empty(), etc. A big problem here is that when searching the whole tree, I need to visit a lot of nodes, and using push() will make the priority queue longer and longer, which will increase time cost for later operations. What I really want is a fixed-length priority queue, so when push() a new element into the queue, some elements with larger values will be automatically deleted. How can I implement this? Or is there any standard container I can use? Thank you.
What about using std::set? It stores elements in order, and if it grows above k elements you can just remove the largest one (in constant time). Each insertion is O(log k).
One way with priority_queue but changing the ordering (ascending to descending), and if it grows above k elements, remove the top element (which is the farther).

STL priority_queue<pair> vs. map

I need a priority queue that will store a value for every key, not just the key. I think the viable options are std::multi_map<K,V> since it iterates in key order, or std::priority_queue<std::pair<K,V>> since it sorts on K before V. Is there any reason I should prefer one over the other, other than personal preference? Are they really the same, or did I miss something?
A priority queue is sorted initially, in O(N) time, and then iterating all the elements in decreasing order takes O(N log N) time. It is stored in a std::vector behind the scenes, so there's only a small coefficient after the big-O behavior. Part of that, though, is moving the elements around inside the vector. If sizeof (K) or sizeof (V) is large, it will be a bit slower.
std::map is a red-black tree (in universal practice), so it takes O(N log N) time to insert the elements, keeping them sorted after each insertion. They are stored as linked nodes, so each item incurs malloc and free overhead. Then it takes O(N) time to iterate over them and destroy the structure.
The priority queue overall should usually have better performance, but it's more constraining on your usage: the data items will move around during iteration, and you can only iterate once.
If you don't need to insert new items while iterating, you can use std::sort with a std::vector, of course. This should outperform the priority_queue by some constant factor.
As with most things in performance, the only way to judge for sure is to try it both ways (with real-world testcases) and measure.
By the way, to maximize performance, you can define a custom comparison function to ignore the V and compare only the K within the pair<K,V>.

Difference between std::set and std::priority_queue

Since both std::priority_queue and std::set (and std::multiset) are data containers that store elements and allow you to access them in an ordered fashion, and have same insertion complexity O(log n), what are the advantages of using one over the other (or, what kind of situations call for the one or the other?)?
While I know that the underlying structures are different, I am not as much interested in the difference in their implementation as I am in the comparison their performance and suitability for various uses.
Note: I know about the no-duplicates in a set. That's why I also mentioned std::multiset since it has the exactly same behavior as the std::set but can be used where the data stored is allowed to compare as equal elements. So please, don't comment on single/multiple keys issue.
A priority queue only gives you access to one element in sorted order -- i.e., you can get the highest priority item, and when you remove that, you can get the next highest priority, and so on. A priority queue also allows duplicate elements, so it's more like a multiset than a set. [Edit: As #Tadeusz Kopec pointed out, building a heap is also linear on the number of items in the heap, where building a set is O(N log N) unless it's being built from a sequence that's already ordered (in which case it is also linear).]
A set allows you full access in sorted order, so you can, for example, find two elements somewhere in the middle of the set, then traverse in order from one to the other.
std::priority_queue allows to do the following:
Insert an element O(log n)
Get the smallest element O(1)
Erase the smallest element O(log n)
while std::set has more possibilities:
Insert any element O(log n) and the constant is greater than in std::priority_queue
Find any element O(log n)
Find an element, >= than the one your are looking for O(log n) (lower_bound)
Erase any element O(log n)
Erase any element by its iterator O(1)
Move to previous/next element in sorted order O(1)
Get the smallest element O(1)
Get the largest element O(1)
set/multiset are generally backed by a binary tree. http://en.wikipedia.org/wiki/Binary_tree
priority_queue is generally backed by a heap. http://en.wikipedia.org/wiki/Heap_(data_structure)
So the question is really when should you use a binary tree instead of a heap?
Both structures are laid out in a tree, however the rules about the relationship between anscestors are different.
We will call the positions P for parent, L for left child, and R for right child.
In a binary tree L < P < R.
In a heap P < L and P < R
So binary trees sort "sideways" and heaps sort "upwards".
So if we look at this as a triangle than in the binary tree L,P,R are completely sorted, whereas in the heap the relationship between L and R is unknown (only their relationship to P).
This has the following effects:
If you have an unsorted array and want to turn it into a binary tree it takes O(nlogn) time. If you want to turn it into a heap it only takes O(n) time, (as it just compares to find the extreme element)
Heaps are more efficient if you only need the extreme element (lowest or highest by some comparison function). Heaps only do the comparisons (lazily) necessary to determine the extreme element.
Binary trees perform the comparisons necessary to order the entire collection, and keep the entire collection sorted all-the-time.
Heaps have constant-time lookup (peek) of lowest element, binary trees have logarithmic time lookup of lowest element.
Since both std::priority_queue and std::set (and std::multiset) are data containers that store elements and allow you to access them in an ordered fashion, and have same insertion complexity O(log n), what are the advantages of using one over the other (or, what kind of situations call for the one or the other?)?
Even though insert and erase operations for both containers have the same complexity O(log n), these operations for std::set are slower than for std::priority_queue. That's because std::set makes many memory allocations. Every element of std::set is stored at its own allocation. std::priority_queue (with underlying std::vector container by default) uses single allocation to store all elements. On other hand std::priority_queue uses many swap operations on its elements whereas std::set uses just pointers swapping. So if swapping is very slow operation for element type, using std::set may be more efficient. Moreover element may be non-swappable at all.
Memory overhead for std::set is much bigger also because it has to store many pointers between its nodes.

Efficiency of the STL priority_queue

I have an application (C++) that I think would be well served by an STL priority_queue. The documentation says:
Priority_queue is a container adaptor, meaning that it is implemented on top of some underlying container type. By default that underlying type is vector, but a different type may be selected explicitly.
and
Priority queues are a standard concept, and can be implemented in many different ways; this implementation uses heaps.
I had previously assumed that top() is O(1), and that push() would be a O(logn) (the two reasons I chose the priority_queue in the first place) - but the documentation neither confirms nor denies this assumption.
Digging deeper, the docs for the Sequence concept say:
The complexities of single-element insert and erase are sequence dependent.
The priority_queue uses a vector (by default) as a heap, which:
... supports random access to elements, constant time insertion and removal of elements at the end, and linear time insertion and removal of elements at the beginning or in the middle.
I'm inferring that, using the default priority_queue, top() is O(1) and push() is O(n).
Question 1: Is this correct? (top() access is O(1) and push() is O(n)?)
Question 2: Would I be able to achieve O(logn) efficiency on push() if I used a set (or multiset) instead of a vector for the implementation of the priority_queue? What would the consequences be of doing this? What other operations would suffer as a consequence?
N.B.: I'm worried about time efficiency here, not space.
The priority queue adaptor uses the standard library heap algorithms to build and access the queue - it's the complexity of those algorithms you should be looking up in the documentation.
The top() operation is obviously O(1) but presumably you want to pop() the heap after calling it which (according to Josuttis) is O(2*log(N)) and push() is O(log(N)) - same source.
And from the C++ Standard, 25.6.3.1, push_heap :
Complexity: At most log(last - first) comparisons.
and pop_heap:
Complexity: At most 2 * log(last - first) comparisons.
top() - O(1) -- as it just returns the element # front.
push() -
insert into vector - 0(1) amortized
push_into_heap - At most, log(n) comparisons. O(logn)
so push() complexity is --
log(n)
No. This is not correct. top() is O(1) and push() is O(log n). Read note 2 in the documentation to see that this adapter does not allow iterating through the vector. Neil is correct about pop(), but still this allows working with the heap doing insertions and removals in O(log n) time.
If the underlying data structure is a heap, top() will be constant time, and push (EDIT: and pop) will be logarithmic (like you are saying). The vector is just used to store these things like this:
HEAP:
1
2 3
8 12 11 9
VECTOR (used to store)
1 2 3 8 12 11 9
You can think of it as the element at position i's children is (2i) and (2i+1)
They use the vector because it stores the data sequentially (which is much more efficient and cache-friendly than discrete)
Regardless of how it is stored, a heap should always be implemented (especially by the gods who made the STD lib) so that pop is constant and push is logarithmic
C++ STL priority_queue underlying data structure is Heap data structure(Heap is a non linear ADT which based on complete binary tree and complete binary tree is implemented through vector(or Array) container.
ex Input data : 5 9 3 10 12 4.
Heap (Considering Min heap) would be :
[3]
[9] [4]
[10] [12] [5]
NOW , we store this min heap in to vector,
[3][9][4][10][12][5].
Using formula ,
Parent : ceiling of n-1/2
Left Child : 2n+1
Right Child : 2n+2 .
Hence ,
Time Complexity for
Top = O(1) , get element from root node.
POP()= O(logn) , During deletion of root node ,there is chance to violation of heap order . hence restructure of heap order takes at most O(logn) time (an element might move down to height of tree).
PUSH()= O(logn) , During insertion also , there might chance to violation of heap order . hence restructure of heap order takes at most O(logn) time (an element might move up to root from leaf node).
Q1: I didn't look in the standard, but AFAIK, using vector (or deque btw), the complexity would be O(1) for top(), O(log n) for push() and pop(). I once compared std::priority_queue with my own heap with O(1) push() and top() and O(log n) pop() and it certainly wasn't as slow as O(n).
Q2: set is not usable as underlying container for priority_queue (not a Sequence), but it would be possible to use set to implement a priority queue with O(log n) push() and pop(). However, this wouldn't probably outperform the std::priority_queue over std::vector implementation.