Can heap be used for strict priority queuing given priority isn't guaranteed between nodes of different branch levels - heap

I wonder about heap and priority queuing.
Seems with heap structure that we don't control whether left or right child has highest or lowest value of the two. Similarly between tree levels nodes at a level further from root can have higher priorities than nodes closer to the root of another branch.
Are we supposed not to care?
What if we require strict priority is heap still the way to go?
If so how would we implement them (sorted trees)?

Related

Clear mark attribute of a node in extract-min operation of Fibonacci Heap

In the DECREASE-KEY operation of Fibonacci Heap, whenever a node is cut from its parent and added to the root list, its mark attribute is set to FALSE. However, in the EXTRACT-MIN operation, the children of the min-node are added to the root list but their mark attributes aren't set to FALSE. Why is there such inconsistency?
Moreover, in the linking operation where a node is made the child of another node, the mark attribute of the new child is set to FALSE. The EXTRACT-MIN operation performs this linking operation multiple times. But in the amortized analysis of EXTRACT-MIN operation described in the CLRS book, the authors claim that the number of marked nodes doesn't change in EXTRACT-MIN operation. They use m(H) to denote the number of marked nodes both before and after EXTRACT-MIN operation. I am quoting the exact line from the book:
The potential before extracting the minimum node is t(H)+2m(H), and
the potential afterward is at most (D(n)+1)+2m(H).
Here D(n) is the maximum degree of any node in an n-node Fibonacci Heap, t(H) is the number of trees in the Fibonacci Heap and m(H) is the number of marked nodes in the Fibonacci Heap.
Isn't this calculation wrong?
Let's take a step back - why do we need mark bits in a Fibonacci heap in the first place?
The idea behind a Fibonacci heap is to make the DECREASE-KEY operation as fast as possible. To that end, the basic strategy of DECREASE-KEY is to take the element whose key is being decreased and to cut it from its parent if the heap property is violated. Unfortunately, if we do that, then we lose the exponential connection between the order of a tree and the number of nodes in the tree. That's a problem because the COALESCE step of an EXTRACT-MIN operation links trees based on their orders and assumes that each tree's order says something about how many nodes it contains. With that connection broken, all the nice runtime bounds we want go away.
As a compromise, we introduce mark bits. If a node loses a child, it gets marked to indicate "something was lost here." Once a marked node loses a second child, then it gets cut from its parent. Over time, if you do a huge number of cuts in a single tree, eventually the cuts propagate up to root of the tree, decreasing the tree order. That makes the tree behave differently during a COALESCE step.
With that said, the important detail here is that mark bits are only relevant for non-root nodes. If a node is a root of a tree and it loses a child, this immediately changes its order in a way that COALESCE will notice. Therefore, you can essentially ignore the mark bit of any tree root, since it never comes up. It's probably a wise idea to clear a node's mark bit before moving it up to the root list just so you don't have to clear it later, but that's more of an implementation detail than anything else.

Data structure for a priority queue of jobs with priority that changes often

I have a worker class, and I can submit jobs to the worker. Worker keeps these jobs and runs them sequentially in the order of priority (priority can be any unsigned int basically). For this case std::priority_queue or even a std::set/map could be used to store jobs ordered by priority and then worker would be able to to extract them in order in O(1). Adding jobs would be O(log N).
Now, the requirement that I have is to be able to change priority of any submitted job. In case of std::set/map I'd need to remove and add back the job with different priority. This would be O(log N) and on top of that with set/map it would reallocate nodes internally afaik (this might possibly be avoided with C++17 though). What makes it unusual is that in my case I'll update job priorities way more often than scheduling or executing them. Basically I might schedule a job once, and before it's executed I may end up updating its priority thousands times. In fact, priorities of each job will be changed like 10-20 times a second.
In my case it's reasonably safe to assume that I won't have more than 10K jobs in the queue. At start of my process I expect it always to grow to 10K or so jobs and as these jobs are removed queue should eventually be almost empty all the time, and occasionally there would be 10-50 new jobs added, but it shouldn't grow more than 1000 jobs. Jobs would be removed at a rate of a few jobs a second. Because of my weird requirement of that frequent priority update std::priority_queue or a set don't seem like a good fit. Plain std::list seems to be a better choice: priority change or update/removal is O(1), and when I need to remove jobs it's O(N) to walk entire list to find highest priority item which should happen less frequently than modifying priorities.
One other observation that even though job priorities change often, these changes do not necessarily result in ordering change, e.g. I could possibly simply update key element of my set (by casting away constness or making key mutable?) if that change would still keep that modified element between left and right nodes. What would you suggest for such priority queue? Any boost container or custom data structure design is OK.
In case of set/map I use priority as a key. To make keys unique in my case each key is actually two integers: job sequence number (derived from atomic int that I increment for each new request) and actual priority number. This way if I add multiple jobs with the same priority, they will be executed in order they were scheduled, as sequence numbers would keep them ordered.
A simple priority heap should fit your requirements. Insertion, removal and priority change is all O(log n). But you said usually the priority change would not result in a change in the order. So in case of a priority heap when you change the priority you would check the changed item against the parent and the 2 children and if none of the heap conditions are violated no up or down heap action is required. So only rarely the full O(log n) time will be needed. Practically it will be more like O(1).
Now for efficient operation it is crucial that given an item I you can find the position of that item in the heap in O(1) and access the parent and children.
If the heap simply contains the items in an array then that is all just pointer arithmetic. The drawback is that reordering the heap means copying the items.
If you store pointers to items in the heap then you have to also store a back reference to the position in the heap in the items them self. When you reorder the heap you then only swap the pointers and update the back references.
Basically your are looking for a IndexPriorityQueue. You can implement your own varient of the index priority queue based on your requirement.
A index priority queue allows you to decrease key or increase the key , i.e basically you can increase and decrease the priority of your jobs.
The following is the java implementation of the IndexMinQueue, hope it helps you. IndexMinQueue

augmenting/index priority_queue in STL

I am using STL priority_queue as an data structure in my graph application. You can safely assume it like a advance version of Prim's spanning tree algorithm.
With in the Algorithm I want to find a node in the priority queue (not just a minimum node) efficiently.[ this is needed because cost of node might get changed and need to be fixed in priority_queue]
All i have to do is augment the priority_queue and index it based on my node key's also. I don't find any way this can be done in STL. Can anyone have better idea how to do it in STL?
The std::priority_queue<T> doesn't support efficient look-up of nodes: it uses a d-ary heap, typically with d == 2. This representation doesn't keep nodes put. If you really want to use a std::priority_queue<T> with Prim's algorithm, the only way is to just add nodes with their current shortest distance and possibly add each node multiple times. This turns the size of the into O(E) instead of O(N), though, i.e., for graphs with many edges it will result in a much higher complexity.
You can use something like std::map<...> but that really suffers from pretty much the same problem: you can either locate the next node to extract efficiently or you can locate the nodes to update efficiently.
The "proper" approach is to use a node-based priority queue, e.g., a Fibanocci-heap: Since the nodes stay put, you can get a handle from the heap when inserting a node and efficiently update the distance of a node through the handle. Access to the closest node is efficient using the few top nodes in the heap's set of trees. The overall performance of basic heap operations (push(), top(), and pop()) are slower for Fibonacci heaps than for d-ary heaps but the efficient update of individual nodes makes their use worthwhile. I seem to recall that Prim's algorithm actually required Fibonacci-heaps anyway to achieve the tight complexity bound.
I know that there is an implementation of Fibonacci-heaps at Boost. An efficient implementation of Fibonacci heaps isn't entirely trivial but they are more efficient than just being of theoretical interest.

Why is std::map implemented as a red-black tree?

Why is std::map implemented as a red-black tree?
There are several balanced binary search trees (BSTs) out there. What were design trade-offs in choosing a red-black tree?
Probably the two most common self balancing tree algorithms are Red-Black trees and AVL trees. To balance the tree after an insertion/update both algorithms use the notion of rotations where the nodes of the tree are rotated to perform the re-balancing.
While in both algorithms the insert/delete operations are O(log n), in the case of Red-Black tree re-balancing rotation is an O(1) operation while with AVL this is a O(log n) operation, making the Red-Black tree more efficient in this aspect of the re-balancing stage and one of the possible reasons that it is more commonly used.
Red-Black trees are used in most collection libraries, including the offerings from Java and Microsoft .NET Framework.
It really depends on the usage. AVL tree usually has more rotations of rebalancing. So if your application doesn't have too many insertion and deletion operations, but weights heavily on searching, then AVL tree probably is a good choice.
std::map uses Red-Black tree as it gets a reasonable trade-off between the speed of node insertion/deletion and searching.
The previous answers only address tree alternatives and red black probably only remains for historical reasons.
Why not a hash table?
A type only requires < operator (comparison) to be used as a key in a tree. However, hash tables require that each key type has a hash function defined. Keeping type requirements to a minimum is very important for generic programming so you can use it with a wide variety of types and algorithms.
Designing a good hash table requires intimate knowledge of the context it which it will be used. Should it use open addressing, or linked chaining? What levels of load should it accept before resizing? Should it use an expensive hash that avoids collisions, or one that is rough and fast?
Since the STL can't anticipate which is the best choice for your application, the default needs to be more flexible. Trees "just work" and scale nicely.
(C++11 did add hash tables with unordered_map. You can see from the documentation it requires setting policies to configure many of these options.)
What about other trees?
Red Black trees offer fast lookup and are self balancing, unlike BSTs. Another user pointed out its advantages over the self-balancing AVL tree.
Alexander Stepanov (The creator of STL) said that he would use a B* Tree instead of a Red-Black tree if he wrote std::map again, because it is more friendly for modern memory caches.
One of the biggest changes since then has been the growth of caches.
Cache misses are very costly, so locality of reference is much more
important now. Node-based data structures, which have low locality of
reference, make much less sense. If I were designing STL today, I
would have a different set of containers. For example, an in-memory
B*-tree is a far better choice than a red-black tree for implementing
an associative container. - Alexander Stepanov
Should maps always use trees?
Another possible maps implementation would be a sorted vector (insertion sort) and binary search. This would work well
for containers which aren't modified often but are queried frequently.
I often do this in C as qsort and bsearch are built in.
Do I even need to use map?
Cache considerations mean it rarely makes sense to use std::list or std::deque over std:vector even for those situations we were taught in school (such as removing an element from the middle of the list).
Applying that same reasoning, using a for loop to linear search a list is often more efficient and cleaner than building a map for a few lookups.
Of course choosing a readable container is usually more important than performance.
AVL trees have a maximum height of 1.44logn, while RB trees have a maximum of 2logn. Inserting an element in a AVL may imply a rebalance at one point in the tree. The rebalancing finishes the insertion. After insertion of a new leaf, updating the ancestors of that leaf has to be done up to the root, or up to a point where the two subtrees are of equal depth. The probability of having to update k nodes is 1/3^k. Rebalancing is O(1). Removing an element may imply more than one rebalancing (up to half the depth of the tree).
RB-trees are B-trees of order 4 represented as binary search trees. A 4-node in the B-tree results in two levels in the equivalent BST. In the worst case, all the nodes of the tree are 2-nodes, with only one chain of 3-nodes down to a leaf. That leaf will be at a distance of 2logn from the root.
Going down from the root to the insertion point, one has to change 4-nodes into 2-nodes, to make sure any insertion will not saturate a leaf. Coming back from the insertion, all these nodes have to be analysed to make sure they correctly represent 4-nodes. This can also be done going down in the tree. The global cost will be the same. There is no free lunch! Removing an element from the tree is of the same order.
All these trees require that nodes carry information on height, weight, color, etc. Only Splay trees are free from such additional info. But most people are afraid of Splay trees, because of the ramdomness of their structure!
Finally, trees can also carry weight information in the nodes, permitting weight balancing. Various schemes can be applied. One should rebalance when a subtree contains more than 3 times the number of elements of the other subtree. Rebalancing is again done either throuh a single or double rotation. This means a worst case of 2.4logn. One can get away with 2 times instead of 3, a much better ratio, but it may mean leaving a little less thant 1% of the subtrees unbalanced here and there. Tricky!
Which type of tree is the best? AVL for sure. They are the simplest to code, and have their worst height nearest to logn. For a tree of 1000000 elements, an AVL will be at most of height 29, a RB 40, and a weight balanced 36 or 50 depending on the ratio.
There are a lot of other variables: randomness, ratio of adds, deletes, searches, etc.
It is just the choice of your implementation - they could be implemented as any balanced tree. The various choices are all comparable with minor differences. Therefore any is as good as any.
Update 2017-06-14: webbertiger edit its answer after I commented. I should point out that its answer is now a lot better to my eyes. But I kept my answer just as additional information...
Due to the fact that I think first answer is wrong (correction: not both anymore) and the third has a wrong affirmation. I feel I had to clarify things...
The 2 most popular tree are AVL and Red Black (RB). The main difference lie in the utilization:
AVL : Better if ratio of consultation (read) is bigger than manipulation (modification). Memory foot print is a little less than RB (due to the bit required for coloring).
RB : Better in general cases where there is a balance between consultation (read) and manipulation (modification) or more modification over consultation. A slightly bigger memory footprint due to the storing of red-black flag.
The main difference come from the coloring. You do have less re-balance action in RB tree than AVL because the coloring enable you to sometimes skip or shorten re-balance actions which have a relative hi cost. Because of the coloring, RB tree also have higher level of nodes because it could accept red nodes between black ones (having the possibilities of ~2x more levels) making search (read) a little bit less efficient... but because it is a constant (2x), it stay in O(log n).
If you consider the performance hit for a modification of a tree (significative) VS the performance hit of consultation of a tree (almost insignificant), it become natural to prefer RB over AVL for a general case.

Binary tree number of nodes with a given level

I need to write a program that counts the number of nodes from a certain level given in binary
tree.
I mean < numberofnodes(int level){} >
I tried writing it without any success because I don't how to get to a certain level then go
to count number of nodes.
Do it with a recursive function which only descends to a certain level.
Well, there are many ways you can do this. Best is to have a single dimensional array that keep track of the number of nodes that you add/remove at each level. Considering your requirement that would be the simplest way.
However, if provided with just a binary tree, you WILL have to traverse and go to that many levels and count the nodes, I do not see any other alternative.
To go to a certain level, you will typically need to have a variable called as 'current_depth' which will be tracking the level you are in. Once you reach your level of interest and that the nodes are visited once (typically a In order traversal would suffice) you can increment your count. Hope this helped.
I'm assuming that your binary tree is not necessarily complete (i.e., not every node has two or zero children or this becomes trivial). I'm also assuming that you are supposed to count just the nodes at a certain level, not at that level or deeper.
There are many ways to do what you're asked, but you can think of this as a graph search problem - you are given a starting node (the root of the tree), a way to traverse edges (the child links), and a criteria - a certain distance from the root.
At this point you probably learned graph search algorithms - which algorithm sounds like a natural fit for your task?
In general terms:
Recursion.
In each iteration of the recursion, you need to measure somehow what level are you on, and therefore to know how far down the tree you need to go beyond where you are now.
Recursion parts:
What are you base cases? at what conditions do you say "Okay, time to stop the recursion" ?
How can you count something in a recursion without any global count?
I think the easiest is to simply follow the recursive nature of the tree with an accumulator to track the current level (or maybe the number of levels remaining to be descended).
The non-recursive branch of that function is when you hit the level in question. At that point you simply return 1 (because you found one node at that level).
The recursive branch, simply sums the number of nodes at that level returned from the left and right recursive invocations.
This is the pseudo-code, this assumes the root has level 0
int count(x,currentLevel,desiredLevel)
if currentLevel = desiredLevel
return 1
left = 0
right = 0
if x.left != null
left = count(x.left, currentLevel+1, desiredLevel
if x.right != null
right = count(x.right, currentLevel+1, desiredLevel
return left + right
So to get the number of nodes for level 3 you call
count(root,0,3)