Efficient Transitive Reduction of Adjacency List DAG - directed-acyclic-graphs

I have a large directed acyclic graph, and I'd like to compute the transitive reduction of that graph.
I'm currently computing the transitive reduction using a naive depth-first search, but that algorithm is too slow for my use case. However, the efficient algorithms I've been able to find work on an adjacency matrix representation, whereas my representation is roughly equivalent to an adjacency list. (It's actually represented as a set of C++ objects, each with pointers to their children and parents).
I obviously could transform my DAG into an adjacency matrix, do the reduction, and transform it back; but that seems a bit wasteful, and I'd like a simpler algorithm if possible.
My graph contains ~100,000 nodes.
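For context, here is a minimal sketch of the kind of naive DFS-based reduction the question describes, written against the pointer representation mentioned above (Node, reaches, and reduce are assumptions for illustration, not the asker's actual code; updating the parents lists is omitted):

#include <cstddef>
#include <vector>

struct Node {
    std::vector<Node*> children;
    std::vector<Node*> parents;
};

// DFS reachability test; without memoisation this revisits nodes and is
// the main reason the naive approach is slow on large DAGs.
static bool reaches(const Node* from, const Node* target) {
    if (from == target) return true;
    for (const Node* c : from->children)
        if (reaches(c, target)) return true;
    return false;
}

// Drop every edge u -> v for which v is reachable through another child.
void reduce(Node* u) {
    auto& ch = u->children;
    for (std::size_t i = 0; i < ch.size(); ) {
        bool redundant = false;
        for (Node* other : ch) {
            if (other != ch[i] && reaches(other, ch[i])) { redundant = true; break; }
        }
        if (redundant) ch.erase(ch.begin() + i);
        else ++i;
    }
}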

Related

Merits of implementing tree by array over pointer?

Recently I learned to implement a tree with a struct such as

#include <vector>

struct Node {
    Node* parent;
    std::vector<Node*> child;
    Node() : parent(nullptr) {}
};
I thought this was a pretty straightforward way to implement a tree,
and it also makes it easy to keep extra per-node data for processing inside the struct.
However, I noticed that in many people's code,
they prefer using an array instead of pointers.
I can understand this for a binary tree, since it is easy to lay one out in an array,
but why for other, more complex graphs?
Comparing adjacency matrices and adjacency lists, as in Skiena (The Algorithm Design Manual, 2008) and Cormen et al. (Introduction to Algorithms), yields the following:
Adjacency matrices are faster for testing whether two nodes x and y are connected by an edge.
Adjacency lists are faster for finding the degree (the number of neighbours) of a given node.
A graph with n nodes and m edges consumes Θ(n + m) space if implemented using adjacency lists, compared to Θ(n^2) for an adjacency matrix.
Adjacency matrices use slightly less memory for big graphs (a small win).
Edge insertion/deletion takes O(1) with adjacency matrices.
Traversing a graph takes Θ(n + m) with adjacency lists and Θ(n^2) with matrices.
If a graph has many vertices but few edges, an adjacency matrix consumes excessive memory.
So in general adjacency lists perform better.
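As a small illustration of the first two trade-offs above (the type and function names here are made up for the example, not taken from either book):

#include <cstddef>
#include <vector>

// Hypothetical minimal representations: an n x n matrix of booleans
// versus a per-node list of neighbour indices (n + m entries in total).
using Matrix = std::vector<std::vector<bool>>;
using Lists  = std::vector<std::vector<int>>;

// Matrix: edge test is a single O(1) lookup.
bool edge_matrix(const Matrix& m, int x, int y) { return m[x][y]; }

// List: edge test scans x's neighbours, O(deg(x)).
bool edge_list(const Lists& g, int x, int y) {
    for (int v : g[x])
        if (v == y) return true;
    return false;
}

// List: degree is immediate; a matrix would need an O(n) row scan.
std::size_t degree_list(const Lists& g, int x) { return g[x].size(); }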

Boost Graph : Test if two vertices are adjacent

I'm new to using the C++ Boost library, in particular the Boost Graph Library, which I needed for coding some algorithms in which I commonly check the adjacency of two vertices and deal with other graph concepts like computing graph invariants.
What I know is that we can iterate over adjacent vertices with the function adjacent_vertices(u, g), but I'm searching for an efficient way to test whether two vertices u, v are adjacent without doing a linear search.
The AdjacencyMatrix concept gives a complexity guarantee that the edge() function must return in constant time.
To check if two vertices v and w are adjacent in G, you write edge(v, w, G).second, since the function returns a pair where the second value indicates if the edge exists.
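For example (a minimal sketch using boost::adjacency_matrix, which models the AdjacencyMatrix concept; the little graph itself is made up):

#include <boost/graph/adjacency_matrix.hpp>
#include <iostream>

int main() {
    // 4-vertex undirected graph stored as an adjacency matrix,
    // so edge() is guaranteed to run in constant time.
    boost::adjacency_matrix<boost::undirectedS> g(4);
    boost::add_edge(0, 1, g);
    boost::add_edge(1, 2, g);

    // edge(u, v, g) returns a pair; .second says whether the edge exists.
    std::cout << std::boolalpha
              << boost::edge(0, 1, g).second << '\n'   // true
              << boost::edge(0, 3, g).second << '\n';  // false
}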
The edge() function is implemented for other graph representations as well. [The original answer includes a plot, with a logarithmic y axis, comparing how the different representations perform on vertex-adjacency checks, plus the code that generated it; each data point is 100 random graphs of medium density, with 100 random edge checks per graph.]
Which choice is best will ultimately depend on your particular application, because the ordering of the structures by speed is different for other operations. In other words, avoid premature optimization.
BGL is a highly generic library. You can adapt almost any data structure for use with its algorithms.
You can vary the edge container. You don't mention it, but I'm assuming you've been looking at the interface/complexity guarantees of boost::adjacency_list.
Indeed, the edge membership test will be O(n) even if you use setS for the edge container selector. This is mostly because adjacency lists store outgoing edges per vertex, so in the worst case each vertex contains at most one outgoing edge and the search is effectively O(n). [1]
In this case you simply want to select another graph implementation.
The documentation page on Graph Concepts is a good starting point to find out about which concepts are expected. As well as, which models supply those concepts.
In the worst case you can adapt your data structure for use with Boost Graph algorithms. E.g. you could store all edges in a simple std::[unordered_]set<std::pair<VID, VID> > and adapt it to model the EdgeListGraph concept.
That way you will have performant lookups.
[1] Of course this also means that, in the best case, the search is whatever your set implementation affords, e.g. O(log n), because all edges could originate from the same vertex...
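A bare-bones sketch of that edge-set idea (VID, EdgeHash, and adjacent are illustrative names; a full EdgeListGraph model would additionally need the BGL graph traits and free functions, which are omitted here):

#include <cstdint>
#include <unordered_set>
#include <utility>

using VID = std::uint32_t;  // vertex id type, as in the answer's sketch

// std::pair has no std::hash specialization, so supply one for the set.
struct EdgeHash {
    std::size_t operator()(const std::pair<VID, VID>& e) const noexcept {
        return static_cast<std::size_t>(
            (static_cast<std::uint64_t>(e.first) << 32) | e.second);
    }
};

using EdgeSet = std::unordered_set<std::pair<VID, VID>, EdgeHash>;

// O(1) expected-time adjacency test.
bool adjacent(const EdgeSet& edges, VID u, VID v) {
    return edges.count({u, v}) != 0;
}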

Implementing Christofides algorithm with CGAL in C++

I want to implement a slightly altered Christofides algorithm for undirected graphs whose vertices are 2D points. It seems like I need CGAL only for the triangulation; everything else is provided by Boost. Am I wrong?
Is there a better way to copy a graph from the Point_set_2 class into Boost's adjacency_list (other than iterating over the adjacency lists and adding an edge for every neighbour)?

Fastest way to run Prim's on a growing range of coordinates

I was hoping someone could give me a general method for computing the MST for a problem that works from input that is formatted as such:
<number of vertices>
<x> <y>
<x> <y>
...
I understand how to implement Prim's algorithm, but I was looking for a method that (using Prim's algorithm) will require the least amount of memory/time to execute. Should I store everything in an adjacency matrix? If the number of vertices grows to, say, 10,000, what is the optimal way to solve this problem (assuming Prim's is used)?
Do you really need to use Prim's?
A simple alternative is to use Kruskal's algorithm to recompute the spanning tree (using only previously selected edges) every time you add a node. Kruskal's is O(E log E), and in every iteration you'll have exactly 2V - 1 edges to consider (V - 1 from the previous tree plus V from the newly added node), so each insertion costs O(V log V).
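A minimal sketch of that incremental step (WEdge, DSU, and kruskal are illustrative names, not from the answer): when point k arrives, the candidate set is the previous tree's V - 1 edges plus the V new edges (k, i), and running kruskal over those 2V - 1 candidates rebuilds the tree.

#include <algorithm>
#include <numeric>
#include <vector>

struct WEdge { int u, v; double w; };

// Union-find for Kruskal's cycle test (path compression only).
struct DSU {
    std::vector<int> p;
    explicit DSU(int n) : p(n) { std::iota(p.begin(), p.end(), 0); }
    int find(int x) { return p[x] == x ? x : p[x] = find(p[x]); }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;
        p[a] = b;
        return true;
    }
};

// Returns the MST edges of the candidate set; with 2V - 1 candidates
// this is O(V log V) per insertion, as claimed above.
std::vector<WEdge> kruskal(int n, std::vector<WEdge> edges) {
    std::sort(edges.begin(), edges.end(),
              [](const WEdge& a, const WEdge& b) { return a.w < b.w; });
    DSU dsu(n);
    std::vector<WEdge> tree;
    for (const WEdge& e : edges)
        if (dsu.unite(e.u, e.v)) tree.push_back(e);
    return tree;
}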
Prim's algorithm is faster on dense graphs (graphs with many edges). If you use an adjacency matrix, the complexity of Prim's algorithm is O(|V|^2).
This can be improved by using a binary heap with the graph represented as an adjacency list. Using this method, the complexity becomes O(|E| log |V|).
Using a Fibonacci heap with an adjacency list would be even faster, with a complexity of O(|E| + |V| log |V|).
Note: E refers to the number of edges in the graph, while V refers to the number of vertices.
The STL already provides a binary heap: std::priority_queue, which calls the heap algorithms from the algorithm library. You could also use a std::vector (or any other container with random access iterators) and call make_heap, push_heap, pop_heap, etc., which are all in the algorithm library. More info here: http://www.cplusplus.com/reference/algorithm/.
You could also implement your own heap data structure, but that may be too complicated and not worth the performance benefits.
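To make the std::priority_queue suggestion concrete, here is a minimal sketch of Prim's on an adjacency list (Edge, Adj, and prim_mst_weight are names made up for the example; a connected graph with vertex ids 0..n-1 is assumed):

#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct Edge { int to; double w; };
using Adj = std::vector<std::vector<Edge>>;

double prim_mst_weight(const Adj& g) {
    const int n = static_cast<int>(g.size());
    std::vector<bool> in_tree(n, false);
    // Min-heap of (weight, vertex); std::greater flips the default max-heap.
    using Item = std::pair<double, int>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    pq.push({0.0, 0});                  // start from vertex 0
    double total = 0.0;
    while (!pq.empty()) {
        auto [w, u] = pq.top();
        pq.pop();
        if (in_tree[u]) continue;       // stale entry: u was added earlier
        in_tree[u] = true;
        total += w;
        for (const Edge& e : g[u])
            if (!in_tree[e.to]) pq.push({e.w, e.to});
    }
    return total;                       // O(|E| log |V|) overall
}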

Suitable data structure for large graphs

I have a large graph. Is there any data structure other than an adjacency list or an adjacency matrix, in the C++ STL or elsewhere, that I can employ for such a large graph? The adjacency matrix of my graph does not fit in main memory. My graph is directed and I am implementing Dijkstra's algorithm in C++.
I have seen the previous posts, but I am searching for a data structure suited to Dijkstra.
By large I mean a graph containing more than 100 million nodes and edges.
It's common to represent adjacency lists as lists of integers, where the integer is the index of a node. How about getting more space efficiency by instead treating the adjacency list as a bit string 00010111000... where a 1 in the nth position represents an edge between this node and node n? Then compress the bit string with some standard algorithm and decompress it as you need it. The bit strings will probably compress well, so this gains space at the cost of extra computation.
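A toy sketch of that idea, with run-length encoding standing in for the "standard algorithm" (encode and for_each_neighbour are illustrative names; a real system would pick a stronger codec):

#include <cstdint>
#include <vector>

// Each adjacency row is stored as alternating run lengths, starting with
// a (possibly empty) run of 0s: e.g. 00110 -> {2, 2, 1}.
using Row = std::vector<std::uint32_t>;

Row encode(const std::vector<bool>& bits) {
    Row runs;
    bool cur = false;
    std::uint32_t len = 0;
    for (bool b : bits) {
        if (b == cur) { ++len; continue; }
        runs.push_back(len);   // close the current run
        cur = b;
        len = 1;
    }
    runs.push_back(len);
    return runs;
}

// Visit the neighbours (positions of 1-bits) without materialising the row.
template <class F>
void for_each_neighbour(const Row& runs, F f) {
    std::uint32_t pos = 0;
    bool ones = false;         // runs alternate, starting with zeros
    for (std::uint32_t len : runs) {
        if (ones)
            for (std::uint32_t i = 0; i < len; ++i) f(pos + i);
        pos += len;
        ones = !ones;
    }
}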