Graph - strongly connected components - C++

Is there any fast way to determine the size of the largest strongly connected component in a graph?
The obvious approach would be to determine every SCC (which could be done with two DFS passes, I suppose) and then loop over them and take the maximum.
I'm pretty sure there has to be a better approach if I only need the size of that one largest component, but I can't think of a good solution. Any ideas?
Thanks.

Let me answer your question with another question -
How can you determine which value in a set is the largest without examining all of the values?

Firstly, you could use Tarjan's algorithm, which needs only one DFS instead of two. If you look at the algorithm closely, the SCCs form a DAG and Tarjan's finds them in reverse topological order. So if you have a sense of the graph (say, a visual representation) and you know that the relatively big SCCs occur at the end of the DAG, you could stop the algorithm once the first few SCCs are found.
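For illustration, here is a compact Tarjan sketch that only tracks the size of the largest component as SCCs are popped off the stack (the names are mine, not from any particular library; it is recursive for clarity, so very deep graphs would need an explicit stack):

    #include <algorithm>
    #include <vector>

    // Tarjan's SCC: a single DFS; components are emitted in reverse topological order.
    struct TarjanSCC {
        const std::vector<std::vector<int>>& adj;
        std::vector<int> index, low;
        std::vector<bool> onStack;
        std::vector<int> stk;
        int counter = 0;
        int largest = 0;                       // size of the largest SCC seen so far

        explicit TarjanSCC(const std::vector<std::vector<int>>& g)
            : adj(g), index(g.size(), -1), low(g.size(), 0), onStack(g.size(), false) {}

        void dfs(int v) {
            index[v] = low[v] = counter++;
            stk.push_back(v);
            onStack[v] = true;
            for (int w : adj[v]) {
                if (index[w] == -1) {          // tree edge: recurse
                    dfs(w);
                    low[v] = std::min(low[v], low[w]);
                } else if (onStack[w]) {       // edge back into the current component
                    low[v] = std::min(low[v], index[w]);
                }
            }
            if (low[v] == index[v]) {          // v is the root of an SCC: pop it
                int size = 0, w;
                do {
                    w = stk.back();
                    stk.pop_back();
                    onStack[w] = false;
                    ++size;
                } while (w != v);
                largest = std::max(largest, size);
            }
        }

        int largestComponentSize() {
            for (int v = 0; v < (int)adj.size(); ++v)
                if (index[v] == -1) dfs(v);
            return largest;
        }
    };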

Related

Query all pairs of points whose distance is smaller than a threshold

The problem is simple, just like the query_pair method in the Python SciPy kd-tree implementation.
I want a C/C++ version of that. However, I find that a typical kd-tree implementation only offers NN, k-NN, and range-query APIs.
I tried to implement query_pair in C++ using range queries as the building block. However, done this way I have to filter the results, since each pair of points is found twice, and the performance is not good enough.
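Roughly, the range-query-plus-filter approach I am using looks like this (just a sketch: rangeQuery stands in for whatever fixed-radius call the kd-tree library actually provides, with a brute-force body here only so the example is self-contained):

    #include <cmath>
    #include <utility>
    #include <vector>

    struct Point { double x, y, z; };

    // Stand-in for the library's fixed-radius search: indices of all points
    // within distance r of pts[query] (the query point itself included).
    std::vector<int> rangeQuery(const std::vector<Point>& pts, int query, double r) {
        std::vector<int> hits;
        for (int j = 0; j < (int)pts.size(); ++j) {
            double dx = pts[j].x - pts[query].x;
            double dy = pts[j].y - pts[query].y;
            double dz = pts[j].z - pts[query].z;
            if (std::sqrt(dx * dx + dy * dy + dz * dz) <= r) hits.push_back(j);
        }
        return hits;
    }

    // Every pair is found twice (once from each endpoint), so keep it only when i < j.
    std::vector<std::pair<int, int>> queryPairs(const std::vector<Point>& pts, double r) {
        std::vector<std::pair<int, int>> pairs;
        for (int i = 0; i < (int)pts.size(); ++i)
            for (int j : rangeQuery(pts, i, r))
                if (i < j) pairs.emplace_back(i, j);
        return pairs;
    }

The i < j filter is exactly the duplicated work I would like to avoid.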
I want to know two things:
Is there any method designed specifically for efficient point-pair queries? Even an approximate one would be enough.
In a paper ('Reduced hair models') it is said that k-NN is more efficient than a fixed-range search because of more effective pruning. Why?

C++ generate random graphs suitable for TSP

I'm testing various TSP models/algorithms. Right now I'm using a full adjacency matrix, filled with random values from 1 to 100, which represents a complete directed graph.
I'm searching for a more rigorous approach that would allow me to try different kinds of random graphs, like Erdos-Renyi, small-world networks and scale-free networks.
I know I may have to switch to adjacency lists for the new graphs.
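For illustration, this is the kind of Erdos-Renyi generator I have in mind, as a plain adjacency-list sketch without Boost (all the names are mine):

    #include <random>
    #include <vector>

    struct Edge { int to; int weight; };

    // Erdos-Renyi G(n, p), directed: every ordered pair (u, v) with u != v gets an
    // edge with probability p, weighted with a random value in [1, 100].
    std::vector<std::vector<Edge>> randomErdosRenyi(int n, double p, unsigned seed) {
        std::mt19937 rng(seed);
        std::bernoulli_distribution keepEdge(p);
        std::uniform_int_distribution<int> weight(1, 100);

        std::vector<std::vector<Edge>> adj(n);
        for (int u = 0; u < n; ++u)
            for (int v = 0; v < n; ++v)
                if (u != v && keepEdge(rng))
                    adj[u].push_back({v, weight(rng)});
        return adj;
    }

With p = 1 this degenerates to the complete directed graph I am using now; smaller p is where the question below becomes relevant.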
My approach would be to generate a random graph and then ensure it has the Hamiltonian path necessary for the problem to be a valid TSP instance. Is that feasible, or is it cheaper to just try to solve a possibly unsolvable instance (assuming all methods terminate on such an instance)?
BTW I was thinking of using the Boost Graph Library, but I'm not familiar with it, and maybe there's something more appropriate. Suggestions for alternatives are welcome, but should not be considered the main scope of this question.
I don't need a TSP solver, I need something to aid in the generation of acceptable problems.
Thanks.
You can try a monotone Gray code, a.k.a. a Hilbert curve. It can help find a Hamiltonian path: http://en.m.wikipedia.org/wiki/Gray_code.
"I'm searching for a more rigorous approach that would allow me to try different kinds of random graphs"
Check Mathematica. It has a built-in predicate to test whether a given graph has a Hamiltonian path or not. It can also generate random Hamiltonian graphs.
In addition, if you have not tried it yet, TSPLIB contains (hard and easy) instances that you may find useful.

How to implement TSP with dynamic programming in C++

Recently I asked a question on Stack Overflow asking for help to solve a problem. It is a travelling salesman problem where I have up to 40,000 cities but I only need to visit 15 of them.
I was advised to use Dijkstra with a priority queue to build a connectivity matrix for the 15 cities I need to visit, and then to do TSP on that matrix with DP. I had previously only used the O(n^2) version of Dijkstra. After figuring out how to implement the priority-queue version, I finally did it (enough to go from 240 seconds to 0.6 for 40,000 cities). But now I am stuck on the TSP part.
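For context, the priority-queue Dijkstra step has roughly this shape (a generic sketch, not my literal code):

    #include <functional>
    #include <limits>
    #include <queue>
    #include <utility>
    #include <vector>

    struct Edge { int to; long long w; };
    const long long INF = std::numeric_limits<long long>::max();

    // Single-source shortest paths with a binary heap: O((V + E) log V)
    // instead of the O(n^2) array-based version.
    std::vector<long long> dijkstra(const std::vector<std::vector<Edge>>& adj, int src) {
        std::vector<long long> dist(adj.size(), INF);
        using State = std::pair<long long, int>;              // (distance, vertex)
        std::priority_queue<State, std::vector<State>, std::greater<State>> pq;
        dist[src] = 0;
        pq.push({0, src});
        while (!pq.empty()) {
            auto [d, u] = pq.top();
            pq.pop();
            if (d != dist[u]) continue;                       // stale queue entry
            for (const Edge& e : adj[u])
                if (dist[u] + e.w < dist[e.to]) {
                    dist[e.to] = dist[u] + e.w;
                    pq.push({dist[e.to], e.to});
                }
        }
        return dist;
    }

Running it once from each of the 15 required cities fills the 15x15 matrix that the DP then works on.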
Here are the materials I used for learning TSP:
Quora
GeeksForGeeks
I sort of understand the algorithm (but not completely), but I am having trouble implementing it. Before this I had only done dynamic programming with arrays indexed like dp[int] or dp[int][int]. But now that my DP table has to be dp[subset][int], I have no idea how to do this.
My questions are :
How do I handle the subsets with dynamic programming? (an example in C++ would be appreciated)
Do the algorithms I linked to allow visiting cities more than once, and if they don't, what should I change?
Should I perhaps use another TSP algorithm instead? (I noticed there are several ways to do it.) Keep in mind that I need the exact value, not an approximation.
Edit:
After some more research I stumbled across some competitive programming contest lectures from Stanford and managed to find TSP here (slides 26-30). The key is to represent the subset as a bitmask. This still leaves my other questions unanswered though.
Can any changes be made to that algorithm to allow visiting a city more than once? If so, what are those changes? Otherwise, what should I try?
I think you can use the dynamic programming solution and, for each pair of nodes, add an edge weighted with the shortest path between them. See also this question: Variation of TSP which visits multiple cities.
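To make the bitmask part concrete, here is a minimal sketch of that dynamic programming solution (Held-Karp) over the reduced 15x15 shortest-path matrix; dist is assumed to be the matrix produced by the Dijkstra step, and the tour is fixed to start and end at city 0:

    #include <algorithm>
    #include <vector>

    // Held-Karp: dp[mask][i] = cheapest cost of starting at city 0, visiting exactly
    // the cities in 'mask' (a bitmask over the n reduced cities), and ending at city i.
    long long heldKarp(const std::vector<std::vector<long long>>& dist) {
        const int n = (int)dist.size();
        const long long INF = 1e18;
        std::vector<std::vector<long long>> dp(1 << n, std::vector<long long>(n, INF));
        dp[1][0] = 0;                                    // only city 0 visited so far
        for (int mask = 1; mask < (1 << n); ++mask)
            for (int i = 0; i < n; ++i) {
                if (!(mask & (1 << i)) || dp[mask][i] == INF) continue;
                for (int j = 0; j < n; ++j) {
                    if (mask & (1 << j)) continue;       // j already visited
                    int next = mask | (1 << j);
                    dp[next][j] = std::min(dp[next][j], dp[mask][i] + dist[i][j]);
                }
            }
        long long best = INF;                            // close the tour back at city 0
        for (int i = 1; i < n; ++i)
            if (dp[(1 << n) - 1][i] != INF)
                best = std::min(best, dp[(1 << n) - 1][i] + dist[i][0]);
        return best;
    }

Because dist already holds shortest paths, a tour that passes through one of the 15 cities more than once on the original map is covered automatically, which is the point of the extra shortest-path edges above.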
Here is a TSP implementation; you will find the link to the problem it implements in the post.
The algorithms you linked don't allow visiting cities more than once.
For your third question, I think Phpdna's answer was good.
Can cities be visited more than once? Yes and no. In your first step, you reduce the problem to the 15 relevant cities. This results in a complete graph, i.e. one where every node is connected to every other node. The connection between two such nodes might involve multiple cities on the original map, including some of the relevant ones, but that shouldn't matter to your algorithm in the second step.
As for whether to use a different algorithm: I would perhaps do a depth-first search through the graph. Using a minimum spanning tree, you can compute an upper and a lower bound for the remaining cities, and use those to pick promising partial solutions and to discard hopeless ones (a.k.a. pruning). There has also been a lot of research on this topic; just search the web. For example, in cases where the map is actually Cartesian (i.e. the travelling costs are the distances between points on a plane), you can exploit that to improve the algorithms a bit.
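For a rough shape of that prune-as-you-go idea, here is a sketch that, for brevity, uses a much cruder lower bound (the cheapest outgoing edge of every unvisited city) instead of the tighter MST bound mentioned above:

    #include <algorithm>
    #include <vector>

    // Depth-first tour search with pruning: abandon a partial tour as soon as its cost
    // plus a lower bound on the remaining work can no longer beat the best tour so far.
    struct BranchAndBoundTSP {
        const std::vector<std::vector<long long>>& dist;
        int n;
        std::vector<bool> visited;
        long long best = (long long)4e18;

        explicit BranchAndBoundTSP(const std::vector<std::vector<long long>>& d)
            : dist(d), n((int)d.size()), visited(d.size(), false) {}

        // Crude lower bound: every still-unvisited city must eventually be left through
        // at least its cheapest outgoing edge. (An MST-based bound would be tighter.)
        long long lowerBound() const {
            long long bound = 0;
            for (int v = 0; v < n; ++v) {
                if (visited[v]) continue;
                long long cheapest = (long long)4e18;
                for (int w = 0; w < n; ++w)
                    if (w != v) cheapest = std::min(cheapest, dist[v][w]);
                bound += cheapest;
            }
            return bound;
        }

        void dfs(int city, int count, long long cost) {
            if (cost + lowerBound() >= best) return;       // prune hopeless branches
            if (count == n) {                              // all cities visited: close tour
                best = std::min(best, cost + dist[city][0]);
                return;
            }
            for (int next = 0; next < n; ++next)
                if (!visited[next]) {
                    visited[next] = true;
                    dfs(next, count + 1, cost + dist[city][next]);
                    visited[next] = false;
                }
        }

        long long solve() {
            visited[0] = true;
            dfs(0, 1, 0);
            return best;
        }
    };

For only 15 cities the bitmask DP is usually the simpler exact choice; branch and bound mainly pays off when a good bound prunes most of the search tree.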
Lastly, if you really intend to increase the number of visited cities, you will find that the computation time increases vastly, so you will have to abandon your requirement for an exact solution.

Feature combinations

I have a feature set (40 features), and my first idea was to evaluate the classifier on all the combinations I can get. However, after doing some calculations I found that the number of combinations reaches millions, so it would take forever!
I read about using the random search method to choose random feature subsets. However, each time I run the random search I get the same feature sets. Do I need to change the seed number or some other option?
Also, is random search effective, and can it substitute for trying all combinations?
I would appreciate your help, experts.
Many thanks in advance,
Ahmad
When you want to perform attribute selection in WEKA, you should take into account two algorithms: the searcher and the attribute evaluator (I will talk about the latter later).
As you said, you probably cannot run an Exhaustive search because it takes too long; there are greedy alternatives that get good results (depending on the problem), like Best first (based on hill climbing). The option you mention (Random search) is another approach to building the candidate subsets: it performs random iterations to select the subsets that will be evaluated.
Why are you getting the same subset of selected attributes? Because the Random search is always selecting the same subsets and the evaluator determines the best one (the final output). "But if I change the seed parameter it should change." Maybe, or maybe not. Why? Because if the algorithm performs enough iterations (even though it starts from a different seed), it will end up exploring the same subsets as the previous run (convergence), and the evaluator will choose the same subset as in the previous execution.
If you do not want the selector output to converge, change the seed, but also choose a smaller search percentage to limit the exploration and get different results.
But, in my opinion, if you are always getting the same result, it is because the evaluator (I do not know which algorithm you are using) has determined that this subset is "the best" for your dataset. I also recommend trying another search method, such as Best first or a Genetic search.

Depth-first search algorithm

The depth-first search algorithm implemented in the Boost library visits each vertex just once.
Is there any workaround to deactivate this behaviour? I want vertices to be revisited whenever there is a branch at any vertex.
Any suggestions?
EDIT: The graph is acyclic.
If you want to enumerate all paths in an acyclic graph, then I don't think you can easily modify depth-first search to do that. There are algorithms specifically designed for this purpose, in particular: Rubin, F., "Enumerating all simple paths in a graph," IEEE Transactions on Circuits and Systems, vol. 25, no. 8, pp. 641-642, Aug. 1978.
If you know the Floyd-Warshall algorithm, you can easily modify it to compute a list of paths in each element of the matrix, instead of the minimum distance, which will do the job. The article above uses some bit operations to make this run a bit faster.
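For the acyclic case, a sketch of that idea, keeping a list of paths in each matrix cell instead of a distance (the representation and names are mine; note the number of paths can be exponential, so this is only practical for small graphs):

    #include <utility>
    #include <vector>

    using Path = std::vector<int>;                 // a path as a sequence of vertices

    // Floyd-Warshall-style enumeration: after the k loop, paths[i][j] holds every
    // simple path from i to j. Concatenating an i->k path with a k->j path is safe
    // here because the graph is acyclic, so no vertex other than k can repeat.
    std::vector<std::vector<std::vector<Path>>>
    allPaths(int n, const std::vector<std::pair<int, int>>& edges) {
        std::vector<std::vector<std::vector<Path>>> paths(
            n, std::vector<std::vector<Path>>(n));
        for (const auto& e : edges)
            paths[e.first][e.second].push_back({e.first, e.second});   // direct edges
        for (int k = 0; k < n; ++k)
            for (int i = 0; i < n; ++i)
                for (int j = 0; j < n; ++j)
                    for (const Path& a : paths[i][k])
                        for (const Path& b : paths[k][j]) {
                            Path joined = a;                              // ... -> k
                            joined.insert(joined.end(), b.begin() + 1, b.end());  // k -> ...
                            paths[i][j].push_back(joined);
                        }
        return paths;
    }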
"I want vertices to be revisited whenever there is a branch at any vertex."
What do you propose that an iterator do when it reaches a branch at a vertex?
Depth-first search is just one answer to this question. Here are some others.
But you have to choose something. It's not a matter of turning off DFS.
I think that is impossible by design: if your graph contains cycles (and it sounds like you have them when you say that a vertex can be visited more than once), the algorithm will end up in an endless loop.