I need to compare many graphs (up to a few million graph comparisons) and I wonder what the fastest way to do that is.
Each graph's vertices can have up to 8 neighbours/edges, and a vertex can have the value 0 or 1. A rotated graph is still the same graph, and every graph has an identical number of vertices.
A graph can look like this:
Right now I'm comparing graphs by taking one vertex from the first graph and comparing it with every vertex from the second graph. If I find an identical vertex, I check whether both vertices' neighbours are identical too, and I repeat this until I know whether the graphs are identical or not.
This approach is too slow. Without discarding graphs that are certainly different, it takes more than 40 seconds to compare several thousand graphs with about one hundred vertices each.
I was thinking about calculating a unique value for every graph and then only comparing the values. I tried to do this, but I only managed to come up with values such that if they are equal the graphs may be equal, and if they are different the graphs are definitely different too.
If my program compares these values, it finishes everything in about 2.5 seconds (which is still too slow).
And what is the best/fastest way to add a vertex to this graph and update its edges? Right now I'm storing the graph in std::map<COORD, Vertex> because I think searching for a vertex is easier/faster that way.
COORD is the vertex's position on the game board (vertices' positions are irrelevant when comparing graphs) and Vertex is:
struct Vertex
{
    Player player; // Player is enum, FIRST = 0, SECOND = 1
    Vertex* neighbours[8];
};
This graph represents the current board state of a Gomoku game with wrapping at the board edges and a board size of n*n, where n can be up to 2^16.
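To make the last part of the question concrete, here is a minimal sketch of what adding a stone and wiring up its edges could look like with this layout. The COORD definition and the neighbour ordering below are assumptions made only for this sketch; the question does not show the real COORD type.

#include <map>

enum Player { FIRST = 0, SECOND = 1 };

// Assumed stand-in for the real COORD type: an (x, y) pair with operator<
// so it can serve as the std::map key.
struct COORD {
    int x, y;
    bool operator<(const COORD& o) const { return x != o.x ? x < o.x : y < o.y; }
};

struct Vertex {
    Player player;
    Vertex* neighbours[8] = {};
};

using Board = std::map<COORD, Vertex>;

// Adds a stone at (x, y) on an n*n board and links it with any of the 8
// neighbouring stones that already exist, updating both ends of each edge.
// std::map never invalidates pointers to its elements on insertion, so
// storing Vertex* in the neighbours array is safe with this container.
Vertex& addVertex(Board& board, int x, int y, int n, Player player)
{
    Vertex& v = board[{x, y}];
    v.player = player;

    static const int dx[8] = {-1, -1, -1,  0, 0,  1, 1, 1};
    static const int dy[8] = {-1,  0,  1, -1, 1, -1, 0, 1};
    for (int k = 0; k < 8; ++k) {
        int nx = (x + dx[k] + n) % n;   // wrap at the board edges
        int ny = (y + dy[k] + n) % n;
        auto it = board.find({nx, ny});
        if (it == board.end()) continue;
        v.neighbours[k] = &it->second;
        it->second.neighbours[7 - k] = &v;   // direction 7-k is the opposite of k
    }
    return v;
}

Each insertion and lookup in the map is O(log V), so adding a stone costs O(log V) overall; a hash map keyed on COORD would make it O(1) on average.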
I hope I didn't make too many errors while writing this. I hope someone can help me.
First you need to get each graph into a consistent representation; the natural way to do this is to create an ordered representation of the graph.
The first level of ordering is achieved by grouping according to the number of neighbours.
Each group of nodes with the same number of neighbours is then sorted by mapping their neighbours' values (which are 0 and 1) onto a binary number, which is then used to enforce an order amongst the nodes of the group.
Then you can use a hashing function which iterates over each node of each group in the ordered form. The hashed value can then be used to provide an accelerated lookup.
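As an illustration only, here is a minimal sketch of that ordering-plus-hashing idea, using a flattened Node record in place of the pointer-based Vertex from the question. Equal hashes only mean the graphs might be equal and still need a full check; different hashes prove the graphs differ.

#include <algorithm>
#include <cstdint>
#include <vector>

// Assumed flat vertex record for this sketch only.
struct Node {
    int value;                    // 0 or 1
    std::vector<int> neighbours;  // indices of adjacent nodes
};

// Group nodes by neighbour count, order each group by the binary number
// formed from its neighbours' values, then hash the ordered sequence.
// Neighbour values are sorted first so the result does not depend on the
// order in which neighbours happen to be stored.
std::uint64_t orderedHash(const std::vector<Node>& graph)
{
    struct Key { int degree; std::uint32_t bits; int value; };
    std::vector<Key> keys;
    keys.reserve(graph.size());
    for (const Node& n : graph) {
        std::vector<int> vals;
        for (int nb : n.neighbours) vals.push_back(graph[nb].value);
        std::sort(vals.begin(), vals.end());
        std::uint32_t bits = 0;
        for (int v : vals) bits = (bits << 1) | static_cast<std::uint32_t>(v);
        keys.push_back({static_cast<int>(n.neighbours.size()), bits, n.value});
    }
    std::sort(keys.begin(), keys.end(), [](const Key& a, const Key& b) {
        if (a.degree != b.degree) return a.degree < b.degree;
        if (a.bits != b.bits)     return a.bits < b.bits;
        return a.value < b.value;
    });

    std::uint64_t h = 1469598103934665603ull;                  // FNV-1a basis
    auto mix = [&h](std::uint64_t x) { h ^= x; h *= 1099511628211ull; };
    for (const Key& k : keys) { mix(k.degree); mix(k.bits); mix(k.value); }
    return h;
}

Graphs whose hashes differ can be discarded immediately; graphs whose hashes collide still go through the full comparison.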
The problem you're trying to solve is called graph isomorphism.
The problem is in NP (although it is not known whether it is NP-complete) and no polynomial-time algorithm for it has been found.
The algorithm you describe seems to take exponential time.
This is a possible suggestion for optimization.
I would recommend trying memoization (store all the vertex pairs that are found to be different), so that the next time those two vertices are compared you just do a simple lookup and answer immediately. This may improve performance (or worsen it), depending on the type of graphs you have.
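For illustration, a minimal sketch of such a memo, assuming each vertex is identified by an integer index within its graph (the question uses pointers, so an index or the pointer value itself would have to serve as the key):

#include <cstdint>
#include <unordered_set>

// Remembers vertex pairs already proven incompatible so that repeated
// comparisons become a single hash lookup. Index i refers to a vertex of the
// first graph and j to a vertex of the second.
struct PairMemo {
    std::unordered_set<std::uint64_t> mismatched;

    static std::uint64_t key(std::uint32_t i, std::uint32_t j) {
        return (static_cast<std::uint64_t>(i) << 32) | j;
    }
    bool knownMismatch(std::uint32_t i, std::uint32_t j) const {
        return mismatched.count(key(i, j)) != 0;
    }
    void recordMismatch(std::uint32_t i, std::uint32_t j) {
        mismatched.insert(key(i, j));
    }
};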
You have found out yourself that checking isomorphism can be done by checking one board against all the n*n shifts times 8 rotations of the other, giving O(n^3) complexity.
This can be reduced to O(n^2). Let's shift only in one direction, say along the x axis. Then we only have to find the proper y-offset. For this, we concatenate the elements as follows for both graphs:
. . 1 .      0 . . 3
0 1 . .  =>  0 1 2 .  =>  0 3 0 1 2 0 2
. 0 . .      0 . 2 .      ^---- a 0 indicates the start of a row
             _______
             1 2 3 4
We get two arrays of size n and we have to check whether one is a cyclic permutation of the other. For this, we concatenate array a with itself and search for the other.
For example, if the two arrays were a=0301202 and b=0203012, we would search for 0203012 in 03012020301202 using KMP or similar, which runs in O(n + 2n) = O(n) time (we can get rid of the whole preprocessing, since the first array is always the same).
Combining this O(n) x-check with the n y-shifts and the 8 rotations gives O(n^2) overall complexity by using O(n) additional space.
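A minimal sketch of that cyclic-permutation check, with the encoded boards held as integer arrays and the search done with Knuth-Morris-Pratt so the whole test stays O(n):

#include <cstddef>
#include <vector>

// Returns true if b is a cyclic rotation of a (both of equal length).
// Searches for b inside a+a; the doubled array is simulated with an index
// modulo n so no extra copy is made.
bool isCyclicRotation(const std::vector<int>& a, const std::vector<int>& b)
{
    if (a.size() != b.size()) return false;
    const std::size_t n = b.size();
    if (n == 0) return true;

    // KMP failure function for the pattern b.
    std::vector<std::size_t> fail(n, 0);
    for (std::size_t i = 1, k = 0; i < n; ++i) {
        while (k > 0 && b[i] != b[k]) k = fail[k - 1];
        if (b[i] == b[k]) ++k;
        fail[i] = k;
    }

    // Scan a+a for an occurrence of b.
    std::size_t k = 0;
    for (std::size_t i = 0; i < 2 * n; ++i) {
        const int c = a[i % n];
        while (k > 0 && c != b[k]) k = fail[k - 1];
        if (c == b[k]) ++k;
        if (k == n) return true;
    }
    return false;
}

With the example above, a = {0,3,0,1,2,0,2} and b = {0,2,0,3,0,1,2} would return true.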
First, this is a homework question. I have a boolean matrix where 1s represent nodes, and adjacent nodes are considered connected.
For example:
1 0 0 0 1
1 1 1 0 0
1 0 0 1 1
0 0 0 0 0
0 0 0 0 0
This matrix contains 3 groups by the definition I've been given: one in the top left consisting of 5 nodes, one in the top right consisting of 1 node, and one below that consisting of 2 nodes.
What I need to do is write a function(s) that determines the fewest number of nodes that must be added to the matrix in order to connect all of the separate components. Two groups are connected when a path can be made from any node in one group to the other.
So, what I'm asking is for someone to point me in the right direction in terms of algorithms. I've considered how I might use pathfinding algorithms to find the shortest path between two groups, but I'm unsure how I could do this for every group in the matrix. If I use a depth-first traversal to determine the separate groups, could I then use a pathfinding algorithm from an arbitrary node within each group to another?
The generic problem is called the Steiner Tree problem, and it is NP-complete.
There is an algorithm that's not exponential, but gives you a suboptimal solution.
The way you can do it is to compute the shortest paths between every pair of components, build the minimum spanning tree using just the initial components and the weights you computed, then go through your solution and eliminate cycles.
Since you have a bunch of options for connecting the islands, I would add a step to optimize the connections.
But an algorithm for the optimal answer is NP-complete (try every combination).
Consider every connected component (group) as a node. Then you can run an MST (Minimum Spanning Tree) algorithm to find the minimum cost of connecting all the groups.
Complexity: building the edges costs O(M*M), where M is the number of 1's in the given grid (computing the Manhattan distance for every pair of 1's gives the distance between every pair of groups), plus O(E lg V) for finding the MST.
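For illustration, here is a sketch of that component-plus-MST idea under a couple of assumptions that are not stated in the question: 4-neighbour adjacency (which matches the 3-group example) and a bridging cost of the Manhattan distance minus 1 cells between two groups. As the other answer notes, this gives a good but not necessarily optimal (Steiner-tree) answer.

#include <algorithm>
#include <climits>
#include <cstdlib>
#include <queue>
#include <utility>
#include <vector>

// Returns an (approximate) number of cells to add so all groups are connected.
int minNodesToConnect(const std::vector<std::vector<int>>& grid)
{
    const int rows = grid.size(), cols = grid[0].size();
    std::vector<std::vector<int>> label(rows, std::vector<int>(cols, -1));
    std::vector<std::vector<std::pair<int,int>>> cells;   // cells of each group

    // 1. Label connected components (4-neighbourhood) with BFS.
    const int dr[] = {1, -1, 0, 0}, dc[] = {0, 0, 1, -1};
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            if (grid[r][c] != 1 || label[r][c] != -1) continue;
            int id = cells.size();
            cells.push_back({});
            std::queue<std::pair<int,int>> q;
            q.push({r, c});
            label[r][c] = id;
            while (!q.empty()) {
                auto [cr, cc] = q.front(); q.pop();
                cells[id].push_back({cr, cc});
                for (int k = 0; k < 4; ++k) {
                    int nr = cr + dr[k], nc = cc + dc[k];
                    if (nr >= 0 && nr < rows && nc >= 0 && nc < cols &&
                        grid[nr][nc] == 1 && label[nr][nc] == -1) {
                        label[nr][nc] = id;
                        q.push({nr, nc});
                    }
                }
            }
        }

    const int g = cells.size();
    if (g <= 1) return 0;

    // 2. Group-to-group cost = smallest Manhattan distance between their cells.
    std::vector<std::vector<int>> dist(g, std::vector<int>(g, INT_MAX));
    for (int a = 0; a < g; ++a)
        for (int b = a + 1; b < g; ++b)
            for (auto [r1, c1] : cells[a])
                for (auto [r2, c2] : cells[b])
                    dist[a][b] = dist[b][a] =
                        std::min(dist[a][b], std::abs(r1 - r2) + std::abs(c1 - c2));

    // 3. Prim's MST over the group graph; each edge adds (distance - 1) cells.
    std::vector<int> best(g, INT_MAX);
    std::vector<bool> used(g, false);
    best[0] = 0;
    int added = 0;
    for (int it = 0; it < g; ++it) {
        int v = -1;
        for (int i = 0; i < g; ++i)
            if (!used[i] && (v == -1 || best[i] < best[v])) v = i;
        used[v] = true;
        added += std::max(0, best[v] - 1);
        for (int i = 0; i < g; ++i)
            if (!used[i]) best[i] = std::min(best[i], dist[v][i]);
    }
    return added;
}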
There is a 2-D grid that contains chocolates in random cells. In one move, I can take all the chocolates contained in one row or in one column. What is the minimum number of moves required to take all the chocolates?
Example: the cells containing chocolates are:
0 0
1 1
2 2
min. no. of moves = 3
0 0
1 0
0 1
min. no. of moves = 2
I guess there is a greedy algorithm solution to this problem. But how should I approach it?
I think this is a variant of the classic Set Cover problem, which is proven to be NP-hard.
Therefore, a greedy algorithm could only get an approximation but not the optimal solution.
For this specific problem, it is not NP-hard. It can be solved in polynomial time. The solution is as below:
Transform the 2-D grid into a bipartite graph: the left side contains nodes representing rows and the right side contains nodes representing columns. For each cell containing a chocolate, say with coordinates (x, y), add an edge linking row_x and column_y. Once the bipartite graph is established, use the Hungarian algorithm to get the maximum matching. Since in a bipartite graph the size of a maximum matching equals the size of a minimum vertex cover (König's theorem), the answer is exactly what you want.
The Hungarian algorithm is an O(V*E) algorithm. For more details, please refer to Hungarian algorithm.
For more information about bipartite graphs, please refer to Bipartite graph; there you can also find why a maximum matching equals a minimum vertex cover.
PS: It's neither a greedy problem nor a dynamic programming problem. It is a graph problem.
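For illustration, a minimal sketch of that construction: rows on the left, columns on the right, one edge per chocolate, and Kuhn's augmenting-path algorithm (the O(V*E) matching referred to above) to compute the maximum matching, which by König's theorem equals the minimum number of row/column moves.

#include <vector>

struct ChocolateCover {
    int rows, cols;
    std::vector<std::vector<int>> adj;   // adj[r] = columns with a chocolate in row r
    std::vector<int> matchCol;           // matchCol[c] = row matched to column c, or -1
    std::vector<bool> visited;

    ChocolateCover(int r, int c) : rows(r), cols(c), adj(r), matchCol(c, -1) {}

    void addChocolate(int r, int c) { adj[r].push_back(c); }

    // Tries to find an augmenting path starting from row r.
    bool tryAugment(int r) {
        for (int c : adj[r]) {
            if (visited[c]) continue;
            visited[c] = true;
            if (matchCol[c] == -1 || tryAugment(matchCol[c])) {
                matchCol[c] = r;
                return true;
            }
        }
        return false;
    }

    // Size of the maximum matching = size of the minimum vertex cover
    // = minimum number of row/column moves.
    int minMoves() {
        int matching = 0;
        for (int r = 0; r < rows; ++r) {
            visited.assign(cols, false);
            if (tryAugment(r)) ++matching;
        }
        return matching;
    }
};

For the second example, adding chocolates at (0,0), (1,0) and (0,1) on a 2x2 grid and calling minMoves() yields 2, and the first example ((0,0), (1,1), (2,2)) yields 3, matching the expected answers.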
I'm performing some MD simulations involving systems of millions of atoms.
I have written some code to generate a file which is just a listing of XYZ atom coordinates. Now I need to generate bonds between the atoms. If two atoms are within a certain distance of each other, that is considered a bond.
Example XYZ file:
1 0 0
2 0 0
7 0 0
10 0 0
9 0 0
So I've got five atoms. If my distance threshold is 2 units, then my bond listing will be:
1 2
3 5
4 5
(where the numbers correspond to the index of the coordinate in the XYZ file).
The naive approach to generate this list is just:
for i = 1:numAtoms
    for j = i+1:numAtoms
        if distance(atom[i], atom[j]) < 2
            bonds.push [i, j]
However, this O(N^2) approach quickly hits an algorithmic limit and is slow even in highly optimized C for millions of atoms, at least for how frequently I'll be doing this.
The only experience I have with space-partitioning data structures is with kd-trees when I wrote a photon mapper once, so I don't really know what the best solution to this problem is. I'm sure there's probably something out there that's optimal for this though.
I should also mention that my simulation box is periodic, meaning that an atom at (0.5, 0, 0) will bond to an atom at (boxWidth - 0.5, 0, 0) with a distance threshold such as 2.
Simple solutions are the first to try. They are quick to code, and easy to test. If that does not give you the required performance, then you can try something trickier.
So, you can seriously prune the search space by assigning grid co-ordinates to your atoms. Nothing technical. Like a poor-man's octree...
All you need to do is use a grid cell size of 2 (the bond cutoff) and search all atoms within the local cell and each neighbouring cell. Obviously the grid is 3D. An atom's grid location is nothing more complex than dividing each of its co-ordinates by the cell size.
Obviously you do a preliminary pass and create a list (or vector) of the atoms belonging to each cell. You can store the lists in a map indexed by the 3D grid co-ordinate. Then for each atom, you can just look up the local lists and do the bond tests.
Also, don't use square-root for your distance. Operate with distance squared instead. This will save a bucket-load of processing.
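Here is a minimal sketch of such a cell list, assuming coordinates lie in [0, boxWidth) along each axis, the box is at least three cells wide per dimension (so wrapped neighbour cells are distinct), and the cell edge is at least the bond cutoff. The type and function names are made up for this sketch.

#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct Vec3 { double x, y, z; };

// Bins atoms into cubic cells of edge >= cutoff, then tests each atom only
// against atoms in its own and the 26 neighbouring cells. Periodic boundaries
// are handled by wrapping both the cell indices and the distance calculation.
std::vector<std::pair<int,int>> findBonds(const std::vector<Vec3>& atoms,
                                          double boxWidth, double cutoff)
{
    const double cutoff2 = cutoff * cutoff;            // compare squared distances
    const int cellsPerSide = std::max(1, static_cast<int>(boxWidth / cutoff));
    const double cellSize = boxWidth / cellsPerSide;

    auto cellKey = [&](int cx, int cy, int cz) {
        auto wrap = [&](int c) { return ((c % cellsPerSide) + cellsPerSide) % cellsPerSide; };
        return (static_cast<std::int64_t>(wrap(cx)) * cellsPerSide + wrap(cy)) * cellsPerSide + wrap(cz);
    };

    // Preliminary pass: list the atoms belonging to each cell.
    std::unordered_map<std::int64_t, std::vector<int>> cells;
    std::vector<std::array<int,3>> cellOf(atoms.size());
    for (int i = 0; i < static_cast<int>(atoms.size()); ++i) {
        int cx = static_cast<int>(atoms[i].x / cellSize);
        int cy = static_cast<int>(atoms[i].y / cellSize);
        int cz = static_cast<int>(atoms[i].z / cellSize);
        cellOf[i] = {cx, cy, cz};
        cells[cellKey(cx, cy, cz)].push_back(i);
    }

    // Minimum-image squared distance under periodic boundaries.
    auto dist2 = [&](const Vec3& a, const Vec3& b) {
        auto d = [&](double u, double v) {
            double diff = std::fabs(u - v);
            return std::min(diff, boxWidth - diff);
        };
        double dx = d(a.x, b.x), dy = d(a.y, b.y), dz = d(a.z, b.z);
        return dx * dx + dy * dy + dz * dz;
    };

    std::vector<std::pair<int,int>> bonds;
    for (int i = 0; i < static_cast<int>(atoms.size()); ++i)
        for (int ox = -1; ox <= 1; ++ox)
            for (int oy = -1; oy <= 1; ++oy)
                for (int oz = -1; oz <= 1; ++oz) {
                    auto it = cells.find(cellKey(cellOf[i][0] + ox,
                                                 cellOf[i][1] + oy,
                                                 cellOf[i][2] + oz));
                    if (it == cells.end()) continue;
                    for (int j : it->second)
                        if (j > i && dist2(atoms[i], atoms[j]) < cutoff2)
                            bonds.push_back({i, j});
                }
    return bonds;
}

Each atom is now tested against only the atoms in 27 cells instead of all N-1 others, so the expected cost is roughly O(N) for a reasonably uniform density.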
I have a question. Is there an efficient way to get the Hamiltonian paths between two nodes in a grid graph, leaving some predefined nodes out?
e.g. (4*3 grid)
1 0 0 0
0 0 0 0
0 0 2 3
finding a Hamiltonian path in this grid between vertices 1 and 2, but not covering 3? It seems bipartite graphs are one way, but what do you think would be the most efficient way? The problem itself is NP-complete.
Delete the nodes you don't need to include and run the brute-force algorithm to find a Hamiltonian path. This will take O((n-h)! + n) time, where h is the number of nodes deleted, which is still non-polynomial, but that is about the best you can do. There is also a dynamic programming approach to finding a Hamiltonian path, but that too is exponential (O(2^n * n^2)).
There's no need to leave those predefined nodes out: just consider them visited and you can still work with your graph as a rectangular grid graph. I'd recommend a bitset or bitvector representation for the grid if efficiency is important.
When the grid is this small, 4x3, a brute-force approach is the fastest. Dynamic programming actually makes it slower unless you have a bigger graph (6x7+). You can also use heuristics to prune your search tree, but again, the graph has to be bigger before it helps.
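For completeness, a minimal sketch of that brute-force search combined with the bitvector suggestion, for grids of at most 64 cells so the visited set fits in one 64-bit mask. The class and method names are made up for this sketch; the "left out" nodes are simply pre-marked as visited, as suggested above.

#include <cstdint>

struct GridHamiltonian {
    int rows, cols, target = 0;
    std::uint64_t blocked = 0;     // cells to leave out, pre-marked as visited

    GridHamiltonian(int r, int c) : rows(r), cols(c) {}

    int id(int r, int c) const { return r * cols + c; }
    void block(int r, int c) { blocked |= (1ull << id(r, c)); }

    // Depth-first search; 'left' counts the free cells not yet visited,
    // including the current one.
    bool dfs(int r, int c, std::uint64_t visited, int left) {
        const int cell = id(r, c);
        visited |= (1ull << cell);
        if (cell == target) return left == 1;       // must be the last free cell
        static const int dr[] = {1, -1, 0, 0}, dc[] = {0, 0, 1, -1};
        for (int k = 0; k < 4; ++k) {
            int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
            if (visited & (1ull << id(nr, nc))) continue;
            if (dfs(nr, nc, visited, left - 1)) return true;
        }
        return false;
    }

    // Is there a Hamiltonian path over all non-blocked cells from start to end?
    bool exists(int sr, int sc, int er, int ec) {
        target = id(er, ec);
        int free = 0;
        for (int i = 0; i < rows * cols; ++i)
            if (!(blocked & (1ull << i))) ++free;
        return dfs(sr, sc, blocked, free);
    }
};

For the 4*3 example, one would construct GridHamiltonian g(3, 4), call g.block(2, 3) for the cell holding vertex 3, and ask g.exists(0, 0, 2, 2) for a path from vertex 1 to vertex 2.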
Input is a set of n (n > 10,000) rectangles.
The output should be the minimum number of independent (non-intersecting) sets of rectangles, that is, the minimum possible number of sets such that every set contains only non-intersecting rectangles.
As the input is very large, what is the most efficient algorithm to solve this type of problem?
For example: the attached photo describes my problem well, with the upper part as the input and the lower part as the output.
Edit:
Yes, 1 can be considered the minimum number if and only if that single output set does not contain any intersecting rectangles.
I tried brute force: pick each rectangle and loop through all the others. It actually works, but unfortunately the time complexity is horrible for large inputs and it can take more than five minutes!
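For reference, here is a minimal sketch of that kind of brute force: each rectangle is tested against the rectangles already placed in each set and joins the first set it does not intersect (a greedy colouring of the intersection graph). The Rect layout is an assumption, and the number of sets it produces is only an upper bound on the true minimum, so this illustrates the behaviour described above rather than solving the problem optimally.

#include <vector>

struct Rect { double x1, y1, x2, y2; };   // assumes x1 <= x2 and y1 <= y2

// Axis-aligned rectangle overlap test.
bool intersects(const Rect& a, const Rect& b)
{
    return a.x1 < b.x2 && b.x1 < a.x2 && a.y1 < b.y2 && b.y1 < a.y2;
}

// Greedily partitions the rectangles into sets of mutually non-intersecting
// rectangles; the size of the result is an upper bound on the minimum.
std::vector<std::vector<Rect>> greedyPartition(const std::vector<Rect>& rects)
{
    std::vector<std::vector<Rect>> sets;
    for (const Rect& r : rects) {
        bool placed = false;
        for (auto& set : sets) {
            bool clash = false;
            for (const Rect& s : set)
                if (intersects(r, s)) { clash = true; break; }
            if (!clash) { set.push_back(r); placed = true; break; }
        }
        if (!placed) sets.push_back({r});
    }
    return sets;
}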
I searched for the problem and found an algorithm (by Chalermsook, called MISR), but I didn't understand its approach; more precisely, I would need a worked example using this approach for a better explanation.