Merits of implementing tree by array over pointer? - c++

Recently I learned to implement a tree with a struct, such as

#include <vector>

struct Node {
    Node *parent;
    std::vector<Node*> child;
    Node() : parent(nullptr) {}
};

I thought this was a pretty straightforward way to implement a tree, and it also makes it easy to keep extra per-node data inside the struct. However, I noticed that in many people's code they prefer using an array instead of pointers. I can understand this for a binary tree, since it is easy to store one in an array, but why do the same for other, more complex trees and graphs?
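
For contrast, here is a minimal sketch (my own illustration, not from the question) of the index-based alternative: all nodes live in a single vector and refer to each other by index rather than by pointer, which keeps the nodes contiguous in memory and makes the tree trivial to copy or serialise.

#include <vector>

struct Tree {
    struct Node {
        int parent = -1;              // -1 means "no parent" (the root)
        std::vector<int> child;       // indices into `nodes`
        // extra per-node data goes here, exactly as in the pointer version
    };

    std::vector<Node> nodes;

    // Create a new node under `parent` (or a root if parent == -1)
    // and return its index.
    int add_node(int parent = -1) {
        int id = static_cast<int>(nodes.size());
        nodes.push_back(Node{});
        nodes[id].parent = parent;
        if (parent != -1) nodes[parent].child.push_back(id);
        return id;
    }
};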

Comparing adjacency matrices and adjacency lists for graphs, following Skiena (The Algorithm Design Manual, 2008) and Cormen et al. (Introduction to Algorithms, 2001), yields the following:
Adjacency matrices are faster for testing whether two nodes x and y are connected by an edge.
Adjacency lists are faster for finding the degree (the number of neighbours) of a given node.
A graph with n vertices and m edges consumes Θ(n + m) space if implemented using adjacency lists, compared to Θ(n^2) for adjacency matrices.
Adjacency matrices use slightly less memory for big graphs.
Edge insertion/deletion performs in O(1) when using adjacency matrices.
Traversing a graph implemented using adjacency lists takes Θ(n + m), versus Θ(n^2) for matrices.
If a graph has many vertices but few edges, adjacency matrices consume excessive memory.
So in general adjacency lists perform better.
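
To make the trade-off concrete, here is a minimal sketch (illustrative only, assuming vertices are numbered 0..n-1) of both representations and the edge-existence test each supports:

#include <vector>

int main() {
    int n = 5;

    // Adjacency matrix: Θ(n^2) space, O(1) edge test.
    std::vector<std::vector<bool>> matrix(n, std::vector<bool>(n, false));
    matrix[0][3] = true;                       // undirected edge 0-3
    matrix[3][0] = true;

    // Adjacency list: Θ(n + m) space, edge test costs O(degree).
    std::vector<std::vector<int>> list(n);
    list[0].push_back(3);
    list[3].push_back(0);

    // Edge test with the matrix: constant time.
    bool connected_matrix = matrix[0][3];

    // Edge test with the list: scan the neighbours of 0.
    bool connected_list = false;
    for (int v : list[0]) if (v == 3) connected_list = true;

    return connected_matrix && connected_list ? 0 : 1;
}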

Related

Efficient Transitive Reduction of Adjacency List DAG

I have a large directed acyclic graph, and I'd like to compute the transitive reduction of that graph.
I'm currently computing the transitive reduction using a naive depth-first search, but that algorithm is too slow for my use case. However, the efficient algorithms I've been able to find work on an adjacency matrix representation, whereas my representation is roughly equivalent to an adjacency list. (It's actually represented as a set of C++ objects, each with pointers to their children and parents).
I obviously could transform my DAG into an adjacency matrix, do the reduction, and transform it back; but that seems a bit wasteful, and I'd like a simpler algorithm if possible.
My graph contains ~100,000 nodes.
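
For reference, here is a minimal sketch of the kind of naive DFS-based reduction the question describes, assuming the DAG has been converted to an adjacency list over vertices 0..n-1 (the names are illustrative): for each vertex, mark everything reachable through a path of at least two edges, then drop the direct edges to any marked target.

#include <vector>

// Marks every vertex reachable from v (not counting v itself).
static void dfs(int v, const std::vector<std::vector<int>>& adj,
                std::vector<char>& reach) {
    for (int w : adj[v]) {
        if (!reach[w]) {
            reach[w] = 1;
            dfs(w, adj, reach);
        }
    }
}

// Naive O(V * (V + E)) transitive reduction of a DAG.
std::vector<std::vector<int>> transitive_reduction(
        const std::vector<std::vector<int>>& adj) {
    int n = static_cast<int>(adj.size());
    std::vector<std::vector<int>> reduced(n);
    for (int u = 0; u < n; ++u) {
        // reach[d] == 1  <=>  d is reachable from u via a path of length >= 2.
        std::vector<char> reach(n, 0);
        for (int c : adj[u]) dfs(c, adj, reach);
        // Keep edge (u, d) only if there is no longer path from u to d.
        for (int d : adj[u])
            if (!reach[d]) reduced[u].push_back(d);
    }
    return reduced;
}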

Does the time complexity of Dijkstra's shortest-path algorithm depend on the data structure used?

One way to store the graph is to implement nodes as structures, like

struct node {
    int vertex;    // the vertex number
    node* next;    // link to the next node in this adjacency list
};

where vertex stores the vertex number and next is a link to the next node in the list. Another way I can think of is to implement it with vectors, like

vector<vector<pair<int,int>>> G;

Now, when applying Dijkstra's algorithm for shortest paths, we need to build a priority queue and other required data structures, and the same holds in case 2 (the vector implementation).
Will there be any difference in complexity between these two ways of representing the graph? Which one is preferable?
EDIT:
In the first case, every node is associated with a linked list of the nodes that are directly reachable from it. In the second case,
G.size() is the number of vertices in our graph
G[i].size() is the number of vertices directly reachable from vertex with index i
G[i][j].first is the index of the j-th vertex reachable from vertex i
G[i][j].second is the length of the edge heading from vertex i to vertex G[i][j].first
Both are adjacency list representations. If implemented correctly, both would be expected to give the same time complexity. You would get a different time complexity if you used an adjacency matrix representation.
In more detail - this comes down to the difference between an array (vector) and a linked list. When all you're doing is iterating through the entire collection (i.e. the neighbours of a vertex), as you do in Dijkstra's algorithm, this takes linear time (O(n)) regardless of whether you're using an array or a linked list.
The resulting complexity for running Dijkstra's algorithm, as noted on Wikipedia, would be
O(|E| log |V|) with a binary heap in either case.
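
For illustration, a minimal sketch of Dijkstra on the second (vector-of-pairs) representation, using std::priority_queue as the binary heap; the 0-based vertex numbering and function name are assumptions, not from the question:

#include <vector>
#include <queue>
#include <utility>
#include <limits>
#include <functional>

// dist[v] = shortest distance from src to v, or "infinity" if unreachable.
std::vector<long long> dijkstra(const std::vector<std::vector<std::pair<int,int>>>& G,
                                int src) {
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> dist(G.size(), INF);
    // Min-heap of (distance, vertex) pairs.
    std::priority_queue<std::pair<long long,int>,
                        std::vector<std::pair<long long,int>>,
                        std::greater<>> pq;
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d != dist[u]) continue;          // stale entry, skip
        for (auto [v, w] : G[u]) {           // G[u][j] = {neighbour, edge length}
            if (d + w < dist[v]) {
                dist[v] = d + w;
                pq.push({dist[v], v});
            }
        }
    }
    return dist;
}

Swapping in the linked-list representation from case 1 only changes how the inner loop walks the neighbours; the O(|E| log |V|) bound stays the same.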

Comparison of graph implementations in C++ using the STL

To implement a graph we can use a vector of lists, std::vector<std::list<vertex>>, but I have seen somewhere that if we use maps, like std::map<vertex, std::set<vertex>>, then we can do better. Can anybody explain how this is a better option than the first one, in terms of memory or speed, and in which situations it is better?
There are two differences to note here.
std::vector<std::list<vertex>> is what is known as an "adjacency list", and std::map<vertex, std::set<vertex>> is known as an "adjacency set", with the added difference that vertices are looked up through an ordered map rather than indexed directly into a vector. I'll talk about the first difference first (that is, list<vertex> vs set<vertex>).
The first implementation is basically an array of linked lists, where each linked list gives all the vertices adjacent to a vertex. The second implementation is an ordered map mapping each vertex to a set of adjacent vertices.
Comparison of adjacency list vs adjacency set order of growth:
Space: O(E + V) vs O(E + V)
Add edge: O(1) vs O(log V)
Check adjacency: O(degree of the vertex checked) vs O(log V)
Iterating through the neighbours of a vertex: O(degree of the vertex checked) vs O(log V + degree of the vertex checked)
... where E is the number of edges, V the number of vertices, and the degree of a vertex is the number of edges connected to it. (I'm using the language of undirected graphs, but you can reason similarly for directed graphs.) So if you have a very dense graph (each vertex has lots of edges, i.e. high degree), then you want to use adjacency sets.
Regarding the use of map vs vector: insert and erase are O(N) for a vector and O(log N) for a map. However, lookup is O(1) for a vector and O(log N) for a map. Depending on your purposes you might use one over the other. You should also note that there are cache benefits when you use a contiguous memory space (as a vector does). I don't know much about that, but there are other answers that mention it: vector or map, which one to use?
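
A minimal sketch of the adjacency-set representation discussed above (the vertex type and member names are illustrative):

#include <map>
#include <set>

using vertex = int;

struct Graph {
    std::map<vertex, std::set<vertex>> adj;

    // O(log V) to find the source's set, plus O(log degree) to insert.
    void add_edge(vertex u, vertex v) {
        adj[u].insert(v);
        adj[v].insert(u);     // undirected; drop this line for a directed graph
    }

    // Roughly O(log V) membership test (map lookup plus set lookup),
    // versus a linear scan in the list representation.
    bool has_edge(vertex u, vertex v) const {
        auto it = adj.find(u);
        return it != adj.end() && it->second.count(v) > 0;
    }
};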

Fastest way to run Prim's on a growing range of coordinates

I was hoping someone could give me a general method for computing the MST for a problem whose input is formatted like this:
<number of vertices>
<x> <y>
<x> <y>
...
I understand how to implement Prim's algorithm, but I was looking for a method that (using Prim's algorithm) will require the least amount of memory/time to execute. Should I store everything in an adjacency matrix? If the number of vertices grows to, say, 10,000, what is the optimal way to solve this problem (assuming Prim's is used)?
Do you really need to use Prim's?
A simple way is to use Kruskal's algorithm to recompute the spanning tree (using only previously selected edges) every time you add a node. Kruskal's is O(E log E), and in every iteration you have exactly 2V - 1 edges to consider (V - 1 from the previous tree plus V from the newly added node), so each insertion costs O(V log V).
Prim's algorithm is faster if you have a dense graph (a graph that has a lot of edges). If you use an adjacency matrix, the complexity of Prim's algorithm is O(|V|^2).
This can be improved by using a binary heap data structure with the graph represented as an adjacency list. Using this method, the complexity would be O(|E| log |V|).
Using a Fibonacci heap data structure with an adjacency list would be even faster, with a complexity of O(|E| + |V| log |V|).
Note: E refers to the number of edges in the graph, while V refers to the number of vertices in the graph.
The STL already provides a binary heap data structure, std::priority_queue. A std::priority_queue calls the heap algorithms in the <algorithm> library. You could also use a std::vector (or any other container with random-access iterators) and call make_heap, push_heap, pop_heap, etc. These are all in the <algorithm> library. More info here: http://www.cplusplus.com/reference/algorithm/.
You could also implement your own heap data structure, but that may be too complicated and not worth the performance benefits.
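
For illustration, a minimal sketch of the binary-heap (lazy) variant of Prim's mentioned above, using std::priority_queue over an adjacency list of {neighbour, weight} pairs; the function name and 0-based vertex numbering are assumptions:

#include <vector>
#include <queue>
#include <functional>
#include <utility>

// Returns the total weight of a minimum spanning tree of a connected,
// undirected graph given as adj[u] = list of {neighbour, weight}.
long long prim_mst(const std::vector<std::vector<std::pair<int,int>>>& adj) {
    int n = static_cast<int>(adj.size());
    std::vector<char> in_tree(n, 0);
    // Min-heap of {edge weight, vertex} candidates.
    std::priority_queue<std::pair<int,int>,
                        std::vector<std::pair<int,int>>,
                        std::greater<>> pq;
    long long total = 0;
    pq.push({0, 0});                       // start from vertex 0 with cost 0
    while (!pq.empty()) {
        auto [w, u] = pq.top();
        pq.pop();
        if (in_tree[u]) continue;          // already in the tree, stale entry
        in_tree[u] = 1;
        total += w;
        for (auto [v, wt] : adj[u])
            if (!in_tree[v]) pq.push({wt, v});
    }
    return total;                          // O(|E| log |V|) overall
}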

Suitable data structure for large graphs

I have a large graph. Is there any data structure other than an adjacency list or adjacency matrix, in the C++ STL or elsewhere, that I can use for such a large graph? The adjacency matrix of my graph does not fit in main memory. My graph is directed, and I am implementing Dijkstra's algorithm in C++.
I have seen the previous posts, but I am searching for a data structure that suits Dijkstra's algorithm.
By large I mean a graph containing more than 100 million nodes and edges.
It's common to represent adjacency lists as lists of integers, where each integer is the index of a node. How about getting more space efficiency by instead treating each adjacency list as a bit string 00010111000..., where a 1 in the nth position represents an edge between this node and node n? Then compress the bit string with some standard algorithm and uncompress it as you need it. The bit strings will probably compress quite well, so this trades space efficiency for higher computational cost.
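
As an illustration, a minimal sketch of one uncompressed bit-string row as described above (the compression step is omitted; any general-purpose compressor could be applied to the blocks before storing them):

#include <vector>
#include <cstdint>
#include <cstddef>

// One adjacency row stored as a bit string: bit n is 1 iff there is an
// edge from this node to node n. Each uint64_t block holds 64 potential edges.
struct BitRow {
    std::vector<std::uint64_t> blocks;

    explicit BitRow(std::size_t num_nodes)
        : blocks((num_nodes + 63) / 64, 0) {}

    void set_edge(std::size_t n)       { blocks[n / 64] |= std::uint64_t(1) << (n % 64); }
    bool has_edge(std::size_t n) const { return (blocks[n / 64] >> (n % 64)) & 1; }
};

Rows like this would then be compressed individually and decompressed on demand while running Dijkstra, trading CPU time for memory as described above.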