Augmenting/indexing a priority_queue in the STL - C++

I am using the STL priority_queue as a data structure in my graph application. You can think of it as an advanced version of Prim's spanning tree algorithm.
Within the algorithm I want to find a particular node in the priority queue (not just the minimum node) efficiently. [This is needed because a node's cost might change, and its position in the priority_queue then needs to be fixed.]
All I have to do is augment the priority_queue so that it is also indexed by my node keys. I don't see any way this can be done in the STL. Does anyone have a better idea of how to do it in the STL?

The std::priority_queue<T> doesn't support efficient look-up of nodes: it uses a d-ary heap, typically with d == 2. This representation doesn't keep nodes put. If you really want to use a std::priority_queue<T> with Prim's algorithm, the only way is to add nodes with their current shortest distance and possibly add each node multiple times. This turns the size of the queue into O(E) instead of O(N), though, i.e., for graphs with many edges it will result in a much higher complexity.
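For illustration, here is a minimal sketch of that duplicate-entry approach ("lazy deletion"); the adjacency-list layout and the function name are assumptions, not part of the question:

#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>; // (cost, vertex)

// Prim's with a plain std::priority_queue: instead of decreasing a key,
// push a fresh entry and skip stale ones when they surface. The queue can
// grow to O(E) entries, as described above.
int prim_total_cost(const std::vector<std::vector<Edge>>& adj) {
    std::vector<bool> in_tree(adj.size(), false);
    std::priority_queue<Edge, std::vector<Edge>, std::greater<Edge>> pq;
    pq.push({0, 0}); // start from vertex 0 with cost 0
    int total = 0;
    while (!pq.empty()) {
        auto [cost, u] = pq.top();
        pq.pop();
        if (in_tree[u]) continue; // stale duplicate: u was already extracted
        in_tree[u] = true;
        total += cost;
        for (auto [w, v] : adj[u])
            if (!in_tree[v]) pq.push({w, v}); // may insert v several times
    }
    return total;
}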
You can use something like a std::map<...>, but that really suffers from pretty much the same problem: you can either locate the next node to extract efficiently or locate the nodes to update efficiently, but not both.
The "proper" approach is to use a node-based priority queue, e.g., a Fibanocci-heap: Since the nodes stay put, you can get a handle from the heap when inserting a node and efficiently update the distance of a node through the handle. Access to the closest node is efficient using the few top nodes in the heap's set of trees. The overall performance of basic heap operations (push(), top(), and pop()) are slower for Fibonacci heaps than for d-ary heaps but the efficient update of individual nodes makes their use worthwhile. I seem to recall that Prim's algorithm actually required Fibonacci-heaps anyway to achieve the tight complexity bound.
I know that there is an implementation of Fibonacci heaps in Boost. An efficient implementation of Fibonacci heaps isn't entirely trivial, but they are efficient in practice, not just of theoretical interest.
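A minimal sketch of the handle-based update with Boost's boost::heap::fibonacci_heap (the (distance, vertex) pairing is an assumption for illustration):

#include <boost/heap/fibonacci_heap.hpp>
#include <functional>
#include <iostream>
#include <utility>

// boost::heap is a max-heap by default; invert the comparison for a min-heap.
using MinHeap = boost::heap::fibonacci_heap<
    std::pair<int, int>, // (distance, vertex)
    boost::heap::compare<std::greater<std::pair<int, int>>>>;

int main() {
    MinHeap heap;
    MinHeap::handle_type h = heap.push({10, 42}); // handle stays valid
    heap.push({5, 7});
    heap.update(h, {3, 42}); // decrease-key through the handle
    std::cout << heap.top().second << '\n'; // prints 42
}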

Related

Is this a bad way to implement a graph?

Almost every book I look at only shows examples of how to implement graphs with the adjacency matrix method and the adjacency list method.
I'm trying to create a node-based editor, where the number of edges fanning out from each node is small but there are a lot of vertices.
So I'm trying to implement the adjacency list method rather than the adjacency matrix method.
However, adjacency lists usually store each edge in a linked list.
Instead, I would like to use nodes in the form listed below:
class GraphNode
{
    int x, y;                    // position in the editor
    dataType data;               // payload stored at this vertex
    std::vector<GraphNode*> in;  // nodes with an edge into this node
    std::vector<GraphNode*> out; // nodes this node has an edge to
public:
    GraphNode(/* args... */);    // note: "= 0" is not valid on a constructor
};
So, like this, I want each node to act as a vertex and have access to the other nodes it is connected to.
This is because, in a node-based editor program, I have to access and process the nodes connected to each node.
I want to implement this without using a linked list.
And I'm going to run graph algorithms on this structure.
Is this a bad method?
Lastly, I apologize for my poor English.
Thank you for reading.
You're just missing the point of the difference between an adjacency list and an adjacency matrix. The main point is the complexity of operations, like finding edges or iterating over them. If you compare a std::list and a std::vector as the datatype implementing an adjacency list, both have a complexity of O(n) (n being the number of edges) for these operations, so they are equivalent.
Other considerations:
If you're modifying the graph, insertion and deletion may be relevant as well. In that case, you may prefer a linked list.
I said that the two are equivalent, but generally std::vector has better locality of reference and less memory overhead, so it performs better. The general rule in C++ is to use std::vector for any sequential container until profiling has shown that it is a bottleneck.
Short answer: It is probably a reasonable way for implementing a graph.
Long answer: Which graph data structure to use always depends on what you want to use it for. An adjacency matrix is good for very dense graphs, where it will not waste space due to many 0 entries, and when we want to answer the question "Is there an edge between A and B?" fast. Iterating over all neighbors of a node can take pretty long, since it has to look at a whole row and not just the neighbors.
An adjacency list is good for sparse graphs and if we mostly want to look up all neighbors of a node (which is very often the case for graph algorithms). In a directed graph where we want to treat ingoing and outgoing edges separately, it is probably a good idea to have a separate adjacency list for ingoing and outgoing edges (as your code does).
Regarding which container to use for the list, it depends on the use case. If you iterate over the graph much more often than you delete from it, using a vector over a list is a very good idea (basically all graph programs I ever wrote were of this type). If you have a graph that changes very often, where you delete edges frequently, don't want iterator invalidation, and so on, maybe it is better to have a list. But that is very seldom the case.
A good design would be to make it very easy to switch between list and vector, so you can easily profile both and then use whatever is better for your program.
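As a minimal sketch of that design, assuming the GraphNode layout from the question: route the container choice through one alias so switching is a one-line change.

#include <list>
#include <vector>

// Change this one alias to std::list<T> to profile the alternative.
template <typename T>
using EdgeContainer = std::vector<T>;

struct GraphNode {
    EdgeContainer<GraphNode*> in;
    EdgeContainer<GraphNode*> out;
};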
By the way, if you often delete a single edge, this is also pretty easily done fast with a vector, provided you do not care about the order of the edges in your adjacency list (so do not do this without thinking while iterating over the vector):
void delete_in_edge(size_t i) {
    std::swap(in[i], in.back()); // The element to be deleted is now at the last
                                 // position; the formerly last element is at position i
    in.pop_back();               // Remove the now-last element
}
This has O(1) complexity (and the swap is probably pretty fast).

Is this how I combine two min-heaps together?

I am currently writing code to combine two heaps that satisfy the min-heap property with the shape invariant of a complete binary tree. However, I'm not sure whether what I'm doing is the accepted method of merging two heaps satisfying the requirements I laid out.
Here is what I think:
Given two priority queues represented as min-heaps, I insert the nodes of the second tree one by one into the first tree and fix the heap property. I continue this until all of the nodes in the second tree are in the first tree.
From what I see, this feels like an O(n log n) algorithm, since I have to go through all the elements in the second tree, and every insert takes about O(log n) time because the height of a complete binary tree is at most log n. But I think there is a faster way; however, I'm not sure what other possible method there is.
I was thinking that I could just insert the entire tree, but that breaks the shape invariant and the order invariant. Is my method the only way?
In fact, building a heap is possible in linear time, and the standard function std::make_heap guarantees linear time. The method is explained in the Wikipedia article about binary heaps.
This means that you can simply merge heaps by calling std::make_heap on a range containing the elements from both heaps. This is asymptotically optimal if the heaps are of similar size. There might be a way to exploit the preexisting structure to reduce the constant factor, but I find it unlikely.
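A minimal sketch of that merge (the element type and function name are illustrative):

#include <algorithm>
#include <functional>
#include <vector>

// Concatenate the two underlying arrays, then rebuild the heap property in
// O(a.size() + b.size()) with std::make_heap.
std::vector<int> merge_min_heaps(std::vector<int> a, const std::vector<int>& b) {
    a.insert(a.end(), b.begin(), b.end());
    std::make_heap(a.begin(), a.end(), std::greater<int>()); // min-heap comparator
    return a;
}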

STL Implementation of reheapify

In a graph algorithm, I need to find the node with the smallest value.
In a step of the algorithm, the value of this node or its neighbors can be decreased, and a few of its neighbors can be removed depending on their value.
Also, I don't want to search the whole graph for this node each time (although the graph is not so big (<1000 nodes)).
Therefore I looked at the STL and found the heap structure, which almost does what I want. I can insert and delete nodes very fast, but is there a method to update the heap quickly when I have only changed the value of one node, without re-sorting the whole heap? I feel it would be a huge bottleneck in the program.
First the conceptual part:
If you use the heap insertion method with the element that decreased its value as the starting point for the insertion, instead of starting at the back of the collection, everything just works.
I haven't done that in C++ yet, but std::push_heap looks fine for that purpose.
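A minimal sketch of that trick on a vector-backed min-heap (the function name is illustrative): std::push_heap(first, last) sifts the element at last - 1 up toward the root, so calling it on the prefix that ends at the changed element restores the heap in O(log n).

#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Precondition: new_value <= heap[i]. With std::greater (a min-heap), a
// smaller value can only move toward the root, which is exactly the path
// push_heap repairs; the rest of the heap is untouched.
void decrease_key(std::vector<int>& heap, std::size_t i, int new_value) {
    heap[i] = new_value;
    std::push_heap(heap.begin(), heap.begin() + i + 1, std::greater<int>());
}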

Why is std::map implemented as a red-black tree?

Why is std::map implemented as a red-black tree?
There are several balanced binary search trees (BSTs) out there. What were design trade-offs in choosing a red-black tree?
Probably the two most common self-balancing tree algorithms are Red-Black trees and AVL trees. To balance the tree after an insertion/update, both algorithms use the notion of rotations, where the nodes of the tree are rotated to perform the re-balancing.
While in both algorithms the insert/delete operations are O(log n), in the case of a Red-Black tree the re-balancing rotation is an O(1) operation, while with AVL it is O(log n). This makes the Red-Black tree more efficient in this aspect of the re-balancing stage, and it is one of the possible reasons it is more commonly used.
Red-Black trees are used in most collection libraries, including the offerings from Java and Microsoft .NET Framework.
It really depends on the usage. An AVL tree usually has more rotations for rebalancing. So if your application doesn't have too many insertion and deletion operations, but weighs heavily on searching, then an AVL tree is probably a good choice.
std::map uses a Red-Black tree, as it provides a reasonable trade-off between the speed of node insertion/deletion and searching.
The previous answers only address tree alternatives, and red-black probably only remains for historical reasons.
Why not a hash table?
A type only requires the < operator (comparison) to be used as a key in a tree. However, hash tables require that each key type have a hash function defined. Keeping type requirements to a minimum is very important for generic programming, so you can use the container with a wide variety of types and algorithms.
Designing a good hash table requires intimate knowledge of the context in which it will be used. Should it use open addressing or linked chaining? What level of load should it accept before resizing? Should it use an expensive hash that avoids collisions, or one that is rough and fast?
Since the STL can't anticipate which is the best choice for your application, the default needs to be more flexible. Trees "just work" and scale nicely.
(C++11 did add hash tables with std::unordered_map. You can see from the documentation that it provides policies to configure many of these options.)
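For example, two of those policy knobs (the values here are illustrative):

#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> m;
    m.max_load_factor(0.5f); // rehash earlier: fewer collisions, more memory
    m.reserve(1000);         // pre-allocate buckets for ~1000 elements
    m["answer"] = 42;
}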
What about other trees?
Red-Black trees offer fast lookup and are self-balancing, unlike plain BSTs. Another user pointed out their advantages over the self-balancing AVL tree.
Alexander Stepanov (the creator of the STL) said that he would use a B* tree instead of a Red-Black tree if he wrote std::map again, because it is friendlier to modern memory caches.
"One of the biggest changes since then has been the growth of caches. Cache misses are very costly, so locality of reference is much more important now. Node-based data structures, which have low locality of reference, make much less sense. If I were designing STL today, I would have a different set of containers. For example, an in-memory B*-tree is a far better choice than a red-black tree for implementing an associative container." - Alexander Stepanov
Should maps always use trees?
Another possible map implementation would be a sorted vector (insertion sort) and binary search. This would work well for containers which aren't modified often but are queried frequently.
I often do this in C as qsort and bsearch are built in.
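In C++ that idea is a few lines (the names are illustrative):

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;

// Keys kept sorted; std::lower_bound gives O(log n) lookup.
const int* find_value(const std::vector<Entry>& sorted, const std::string& key) {
    auto it = std::lower_bound(
        sorted.begin(), sorted.end(), key,
        [](const Entry& e, const std::string& k) { return e.first < k; });
    return (it != sorted.end() && it->first == key) ? &it->second : nullptr;
}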
Do I even need to use map?
Cache considerations mean it rarely makes sense to use std::list or std::deque over std::vector, even in those situations we were taught about in school (such as removing an element from the middle of the list).
Applying that same reasoning, using a for loop to linearly search a list is often more efficient and cleaner than building a map for a few lookups.
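A sketch of that pattern (the names are illustrative):

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// A handful of lookups over a small collection: a linear scan beats
// building a std::map and reads just as clearly.
int* lookup(std::vector<std::pair<std::string, int>>& entries,
            const std::string& key) {
    auto it = std::find_if(entries.begin(), entries.end(),
                           [&](const auto& e) { return e.first == key; });
    return it != entries.end() ? &it->second : nullptr;
}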
Of course choosing a readable container is usually more important than performance.
AVL trees have a maximum height of 1.44 log n, while RB trees have a maximum of 2 log n. Inserting an element into an AVL tree may imply a rebalance at one point in the tree. The rebalancing finishes the insertion. After insertion of a new leaf, the ancestors of that leaf have to be updated up to the root, or up to a point where the two subtrees are of equal depth. The probability of having to update k nodes is 1/3^k. Rebalancing is O(1). Removing an element may imply more than one rebalancing (up to half the depth of the tree).
RB trees are B-trees of order 4 represented as binary search trees. A 4-node in the B-tree results in two levels in the equivalent BST. In the worst case, all the nodes of the tree are 2-nodes, with only one chain of 3-nodes down to a leaf. That leaf will be at a distance of 2 log n from the root.
Going down from the root to the insertion point, one has to change 4-nodes into 2-nodes to make sure any insertion will not saturate a leaf. Coming back up from the insertion, all these nodes have to be analysed to make sure they correctly represent 4-nodes. This can also be done going down the tree. The global cost will be the same. There is no free lunch! Removing an element from the tree is of the same order.
All these trees require that nodes carry information on height, weight, color, etc. Only Splay trees are free from such additional info. But most people are afraid of Splay trees, because of the randomness of their structure!
Finally, trees can also carry weight information in the nodes, permitting weight balancing. Various schemes can be applied. One should rebalance when a subtree contains more than 3 times the number of elements of the other subtree. Rebalancing is again done through either a single or a double rotation. This means a worst case of 2.4 log n. One can get away with 2 times instead of 3, a much better ratio, but it may mean leaving a little less than 1% of the subtrees unbalanced here and there. Tricky!
Which type of tree is the best? AVL for sure. They are the simplest to code and have their worst height nearest to log n. For a tree of 1,000,000 elements, an AVL will be at most of height 29, an RB 40, and a weight-balanced 36 or 50 depending on the ratio.
There are a lot of other variables: randomness, ratio of adds, deletes, searches, etc.
It is just the choice of your implementation - they could be implemented as any balanced tree. The various choices are all comparable, with minor differences. Therefore any is as good as any other.
Update 2017-06-14: webbertiger edited their answer after I commented. I should point out that their answer is now a lot better in my eyes. But I kept my answer as additional information...
Because I think the first answer is wrong (correction: not both anymore) and the third contains a wrong affirmation, I feel I have to clarify things...
The two most popular trees are AVL and Red-Black (RB). The main difference lies in the utilization:
AVL: Better if the ratio of consultation (read) is bigger than manipulation (modification). The memory footprint is a little smaller than RB (due to the bit required for coloring).
RB: Better in general cases, where there is a balance between consultation (read) and manipulation (modification), or more modification than consultation. A slightly bigger memory footprint due to the storing of the red-black flag.
The main difference comes from the coloring. You have fewer re-balance actions in an RB tree than in AVL, because the coloring enables you to sometimes skip or shorten re-balance actions, which have a relatively high cost. Because of the coloring, RB trees also have more levels of nodes, because they can accept red nodes between black ones (giving the possibility of ~2x more levels), making search (read) a little less efficient... but because it is a constant (2x), it stays O(log n).
If you consider the performance hit for a modification of a tree (significant) vs. the performance hit for consultation of a tree (almost insignificant), it becomes natural to prefer RB over AVL for the general case.

Algorithm for removing multiple elements from a Red-Black tree

Is there an algorithm that allows deleting multiple nodes from an RB tree, or is the only way to delete nodes from an RB tree to repeat:
1. Delete one node, and
2. fix the tree if necessary?
If more than half the nodes are being deleted, you can throw away the existing tree and build a new one in less time, since insertion and deletion have the same cost.
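A sketch of that rebuild step, assuming the surviving keys are collected in sorted order by an in-order walk (Node is a simplified stand-in for an RB node; colors can be assigned afterwards, e.g. by coloring the deepest level red):

#include <cstddef>
#include <vector>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Build a perfectly balanced BST from a sorted range in O(n) by always
// picking the middle element as the subtree root.
Node* build_balanced(const std::vector<int>& keys, std::size_t lo, std::size_t hi) {
    if (lo >= hi) return nullptr;
    std::size_t mid = lo + (hi - lo) / 2;
    Node* n = new Node{keys[mid]};
    n->left = build_balanced(keys, lo, mid);
    n->right = build_balanced(keys, mid + 1, hi);
    return n;
}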
If there is no constraint that says the tree must remain balanced while you are doing a multiple node deletion, it seems reasonable to me that you could fix the tree after doing multiple deletes.
The purpose of balancing the tree after each deletion is to make sure the delete operation is consistent in its computational cost. If you do not require deletes to be consistent in this fashion, you could write your delete algorithm differently. The fixup operation will be a lengthier computation than after just one delete, though. It will also likely be more complicated.
You might be interested in a data structure called TeardownTree. It supports a delete_range operation that works in O(k + log n) time, where n is the initial number of items in the tree and k is the number of items deleted (and returned to the caller). Full disclosure: I am the author.
I have to emphasize that the data structure does not support the insert operation, but it is optimized for clone and delete_range. I have written an informal description of the algorithm. With all the optimizations, the code is now significantly different from that document, but it should be enough to grasp the idea.
The way I solved this problem was to create a linked list of nodes to be deleted and to use the standard deletion method on them in succession. I would be interested to know if there is a better algorithm for mass deletion.
I would suggest using a Treap instead of a Red-Black tree, since balancing the tree in various scenarios seems easier with a Treap vs. a Red-Black tree. I have the same problem as you, but with Treaps: https://cstheory.stackexchange.com/questions/20495/algorithm-to-bulk-delete-nodes-from-a-treap
I am unsure whether the expected height bounds remain valid after bulk deletion (the algorithm mentioned in the linked question).