Finding shortest circuit in a graph that visits X nodes at least once - c++

Even though I'm still a beginner, I love solving graph-related problems (shortest paths, searches, etc.). Recently I faced a problem like this:
Given an undirected, weighted graph (no negative weights) with N nodes and E edges (at most 1 edge between two nodes, and an edge can only connect two different nodes) and a list of X nodes that you must visit, find the shortest path that starts from node 0, visits all X nodes and returns to node 0. There is always at least one path connecting any two nodes.
Limits: 1 <= N <= 40 000, 1 <= X <= 15, 1 <= E <= 50 000
Here's an example:
The red node (0) should be the start and finish of the path. You must visit all blue nodes (1, 2, 3, 4) and return. The shortest path here would be:
0 -> 3 -> 4 -> 3 -> 2 -> 1 -> 0 with a total cost of 30
I thought about using Dijkstra to find the shortest paths between all X (blue) nodes and then greedily picking the closest unvisited X (blue) node, but it doesn't work (it comes up with 32 instead of 30 on paper). I also later noticed that just finding the shortest paths between all pairs of X nodes will take O(X*N^2) time, which is too much with so many nodes.
The only thing I could find for circuits was the Hamiltonian circuit, which only allows visiting each node once (and I don't need that). Is this solvable with Dijkstra, or is there another algorithm that could solve it?

Here is a solution which is likely to be fast enough:
1) Run a shortest path search algorithm from every blue node (this can be done in O(X * (E log N))) to compute pairwise distances.
2) Build a new graph with the zero vertex and the blue vertices only (X + 1 vertices). Add edges using the pairwise distances computed in the first step.
3) The new graph is small enough to use the dynamic programming solution for TSP (it has O(X^2 * 2^X) time complexity).
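The three steps can be sketched in C++17 roughly as follows. This is a minimal sketch under my own assumptions: nodes are numbered 0..N-1, weights are integers stored as long long, and the required nodes come as a vector. It also runs Dijkstra from node 0 so that node 0 belongs to the small graph, and the subset DP in step 3 is the Held-Karp algorithm. The helper names (addEdge, dijkstra, shortestTour) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

using ll = long long;
const ll INF = std::numeric_limits<ll>::max() / 4;  // headroom so INF + INF does not overflow
using Graph = std::vector<std::vector<std::pair<int, ll>>>;  // adj[u] = {(v, weight)}

void addEdge(Graph& g, int u, int v, ll w) {
    assert(u != v);  // the problem statement rules out self-loops
    g[u].push_back({v, w});
    g[v].push_back({u, w});
}

// Step 1: Dijkstra with a binary heap, O(E log N) per source.
std::vector<ll> dijkstra(const Graph& g, int src) {
    std::vector<ll> dist(g.size(), INF);
    using Item = std::pair<ll, int>;  // (distance, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue;  // stale queue entry
        for (auto [v, w] : g[u])
            if (d + w < dist[v]) {
                dist[v] = d + w;
                pq.push({dist[v], v});
            }
    }
    return dist;
}

// Steps 2-3: shortest closed walk from node 0 through every node in `must`.
ll shortestTour(const Graph& g, const std::vector<int>& must) {
    std::vector<int> key = {0};  // key[0] is node 0, key[i + 1] is must[i]
    key.insert(key.end(), must.begin(), must.end());
    const int K = static_cast<int>(key.size()), X = K - 1;
    std::vector<std::vector<ll>> d(K, std::vector<ll>(K));  // the small (X + 1)-vertex graph
    for (int i = 0; i < K; ++i) {
        std::vector<ll> dist = dijkstra(g, key[i]);
        for (int j = 0; j < K; ++j) d[i][j] = dist[key[j]];
    }
    // dp[mask][i]: cheapest walk from node 0 that has visited exactly the required
    // nodes in `mask` and currently stands at required node i.
    std::vector<std::vector<ll>> dp(1 << X, std::vector<ll>(X, INF));
    for (int i = 0; i < X; ++i) dp[1 << i][i] = d[0][i + 1];
    for (int mask = 1; mask < (1 << X); ++mask)
        for (int i = 0; i < X; ++i) {
            if (!(mask >> i & 1) || dp[mask][i] >= INF) continue;
            for (int j = 0; j < X; ++j) {
                if (mask >> j & 1) continue;
                ll cand = dp[mask][i] + d[i + 1][j + 1];
                dp[mask | 1 << j][j] = std::min(dp[mask | 1 << j][j], cand);
            }
        }
    ll best = INF;  // close the circuit back to node 0
    for (int i = 0; i < X; ++i) best = std::min(best, dp[(1 << X) - 1][i] + d[i + 1][0]);
    return best;
}
```

For X = 15 the DP table has 2^15 * 15 entries, so it is cheap; for large N the X + 1 Dijkstra runs dominate. Unlike the greedy nearest-neighbour idea, the DP considers every visiting order.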


Haar cascade for face detection xml file code explanation OpenCV

I am performing face detection using an OpenCV Haar cascade.
I want to understand the XML code of the Haar cascade that I have included in my program. Can someone help me understand the values in the XML file, for instance: weakcount, maxcount, threshold, internal nodes, leaf values, etc.?
I have used the haarcascade_frontalface_alt2.xml file. I have already performed face detection; currently I am working on counting the number of faces detected.
As I understand it, you already know the general structure of a Haar cascade and its OpenCV implementation. If not, please first look into the OpenCV manual and read about cascades of boosted trees, for example Lienhart's paper.
Now about the XML structure itself.
<maxWeakCount>3</maxWeakCount>
This parameter gives the number of weak classifiers (trees) in the stage.
<stageThreshold>3.5069230198860168e-01</stageThreshold>
This is the stage threshold, i.e. the score threshold for exiting the cascade at this stage. At each stage we sum the scores of its trees, and when that sum is less than the threshold, we exit the entire cascade and classify the window as a non-object.
<weakClassifiers>
Start of the tree parameters for this stage.
<_>
<internalNodes>
0 1 0 4.3272329494357109e-03 -1 -2 1 1.3076160103082657e-02
</internalNodes>
<leafValues>
3.8381900638341904e-02 8.9652568101882935e-01 2.6293140649795532e-01
</leafValues>
</_>
This is a tree description. The internalNodes parameter contains four values per node:
left and right: where to go when the feature value is below (left) or not below (right) the node threshold. A positive value is the index of another internal node of the same tree; a value of 0 or less denotes a leaf, whose index is the negated value;
feature index;
threshold for choosing the branch.
So the list above describes two nodes. 0 1 0 4.3272329494357109e-03 is node 0: it evaluates feature 0 and goes to leaf 0 if the value is below the threshold, and to node 1 otherwise. The remaining -1 -2 1 1.3076160103082657e-02 is node 1: it evaluates feature 1 and goes to leaf 1 or leaf 2. A value of 0 or less ends the traversal and selects a leaf, as the evaluation code from the OpenCV sources shows.
Consider the cascade evaluation code:
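int idx = 0, root = nodeOfs;  // start at the first node of the current tree
do
{
    CascadeClassifierImpl::Data::DTreeNode& node = cascadeNodes[root + idx];
    double val = featureEvaluator(node.featureIdx);
    // go to node.left if the feature value is below the node threshold, else to node.right
    idx = val < node.threshold ? node.left : node.right;
}
while( idx > 0 );  // idx <= 0 means a leaf has been reached
sum += cascadeLeaves[leafOfs - idx];  // add that leaf's score to the stage sum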
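leafValues contains the score of each leaf of the tree, here leaf 0, leaf 1 and leaf 2 (a tree with n internal nodes has n + 1 leaves). The score of the leaf reached by each tree is added to the stage sum, which is then compared with stageThreshold.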
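To make the node and leaf convention concrete, here is a small C++ sketch that walks one tree stored in the internalNodes/leafValues format, mirroring the OpenCV loop above. The TreeNode struct and the featureValues vector (a stand-in for OpenCV's feature evaluator) are hypothetical helpers, not OpenCV API.

```cpp
#include <cassert>
#include <vector>

// One entry of <internalNodes>: left right featureIdx threshold.
struct TreeNode {
    int left;          // > 0: index of the next internal node; <= 0: leaf number -left
    int right;         // same convention as left
    int featureIdx;    // which feature (rects entry) this node evaluates
    double threshold;  // go left if the feature value is below it
};

// Walks one tree and returns the value of the leaf that is reached.
// `featureValues[i]` plays the role of featureEvaluator(i) (hypothetical input).
double evalTree(const std::vector<TreeNode>& nodes, const std::vector<double>& leafValues,
                const std::vector<double>& featureValues) {
    assert(!nodes.empty());
    int idx = 0;
    do {
        const TreeNode& node = nodes[idx];
        double val = featureValues[node.featureIdx];
        idx = val < node.threshold ? node.left : node.right;
    } while (idx > 0);
    return leafValues[-idx];  // idx <= 0 here, so -idx is the leaf index
}
```

With the numbers from the example tree, a feature-0 value below 4.3272329494357109e-03 always yields leaf 0, whatever the value of feature 1.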
<_>
<rects>
<_>
6 3 1 9 -1.</_>
<_>
6 6 1 3 3.</_></rects></_>
<_>
This is the feature description itself, following the Haar paradigm. Each rects entry is x y width height weight, and the feature index from the previous section selects one of these rects groups (here a pair of rectangles).
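A rects group like the one above can be evaluated with an integral image, as in the Viola-Jones detector. The following C++ sketch computes the raw value of that two-rectangle feature. The struct and function names are hypothetical, the feature is evaluated at window offset (0, 0), and OpenCV's additional per-window normalization is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// One entry of <rects>: x y width height weight.
struct WeightedRect { int x, y, w, h; double weight; };

// Integral image with a zero border: ii[y][x] = sum of img over rows < y and columns < x.
Grid integralImage(const Grid& img) {
    const std::size_t rows = img.size(), cols = img[0].size();
    Grid ii(rows + 1, std::vector<double>(cols + 1, 0.0));
    for (std::size_t y = 0; y < rows; ++y)
        for (std::size_t x = 0; x < cols; ++x)
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x];
    return ii;
}

// Sum of the pixels inside one rectangle with four lookups.
double rectSum(const Grid& ii, const WeightedRect& r) {
    assert(r.x >= 0 && r.y >= 0);
    assert(r.y + r.h < static_cast<int>(ii.size()) && r.x + r.w < static_cast<int>(ii[0].size()));
    return ii[r.y + r.h][r.x + r.w] - ii[r.y][r.x + r.w] - ii[r.y + r.h][r.x] + ii[r.y][r.x];
}

// Raw feature value: weighted sum of the rectangle sums.
double haarFeature(const Grid& ii, const std::vector<WeightedRect>& rects) {
    double value = 0.0;
    for (const WeightedRect& r : rects) value += r.weight * rectSum(ii, r);
    return value;
}
```

With the weights -1 and 3, the feature is 0 on a flat patch and responds only to brightness differences between the two rectangles.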

How to find if there's a path between every pair of vertices of a given graph?

To find whether there's a path between every pair of vertices in a directed graph, I'm checking whether all vertices can be visited from a specific vertex using DFS. The problem is that I have to do V DFS's, where V is the number of vertices (V can be up to 10^5). Is there a more efficient way to do this? Some pseudo-code or implementations would be appreciated.
Consider this graph: (1 -> 3), (2 -> 3), (3 -> 1)
There's no path from 1 to 2, but there's a path from 2 to 1 (2 -> 3 -> 1). So this counts as a path for that pair of vertices: a path (u -> v) is enough, even if there's no path (v -> u).
Have a look at Tarjan's algorithm for strongly connected components. If there is only one strongly connected component, there is a path between every pair of vertices (in both directions).
To figure this out, do a topological sort of the graph, and then traverse it in reverse pseudo-topological order. If you never need to 'restart' a traversal, there exists a path between every pair of vertices.
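The 'restart' idea corresponds to Kosaraju's algorithm, used here in place of Tarjan's: one DFS records finishing order, then a DFS on the transposed graph in decreasing finishing order starts a new strongly connected component at every restart. The sketch below is my own C++ version, with vertices renumbered from 0 and iterative DFS, since recursion depth up to 10^5 may overflow the stack. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Kosaraju's algorithm with iterative DFS. Returns comp[v]; numComp receives the
// number of strongly connected components. Components come out numbered in
// topological order of the condensed graph.
std::vector<int> kosaraju(int n, const std::vector<std::pair<int, int>>& edges, int& numComp) {
    std::vector<std::vector<int>> g(n), gt(n);  // graph and its transpose
    for (auto [u, v] : edges) {
        assert(0 <= u && u < n && 0 <= v && v < n);
        g[u].push_back(v);
        gt[v].push_back(u);
    }
    // Pass 1: DFS on g, appending each vertex when it finishes.
    std::vector<int> order;
    std::vector<char> seen(n, 0);
    std::vector<std::size_t> next(n, 0);  // next outgoing edge to try for each vertex
    for (int s = 0; s < n; ++s) {
        if (seen[s]) continue;
        seen[s] = 1;
        std::vector<int> stack = {s};
        while (!stack.empty()) {
            int u = stack.back();
            if (next[u] < g[u].size()) {
                int v = g[u][next[u]++];
                if (!seen[v]) { seen[v] = 1; stack.push_back(v); }
            } else {
                order.push_back(u);  // u is finished
                stack.pop_back();
            }
        }
    }
    // Pass 2: DFS on the transpose in decreasing finishing time.
    // Every "restart" discovers exactly one strongly connected component.
    std::vector<int> comp(n, -1);
    numComp = 0;
    for (int i = n - 1; i >= 0; --i) {
        int s = order[i];
        if (comp[s] != -1) continue;
        comp[s] = numComp;
        std::vector<int> stack = {s};
        while (!stack.empty()) {
            int u = stack.back();
            stack.pop_back();
            for (int v : gt[u])
                if (comp[v] == -1) { comp[v] = numComp; stack.push_back(v); }
        }
        ++numComp;
    }
    return comp;
}

// A path in both directions between every pair of vertices <=> exactly one SCC.
bool stronglyConnected(int n, const std::vector<std::pair<int, int>>& edges) {
    int numComp = 0;
    kosaraju(n, edges, numComp);
    return numComp == 1;
}
```

For the graph in the question, renumbered as 0 -> 2, 1 -> 2, 2 -> 0, this finds two components, so the graph is not strongly connected, even though every pair is connected in at least one direction.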
To find whether there is a single path that visits all vertices of a directed graph (the path may visit vertices and edges multiple times):
Find the strongly connected components (SCCs) of the graph.
Reduce the graph by replacing each SCC with a single "pseudo-vertex", keeping the edges that connect different SCCs.
The graph will now contain no cycles (since every cycle would have been part of an SCC), so it is a directed acyclic graph and there must be:
One-or-more root pseudo-vertices (with no inbound edges);
One-or-more leaf pseudo-vertices (with no outbound edges; a pseudo-vertex can be both a leaf and a root); and
Zero-or-more pseudo-vertices which are part of a branch (with both inbound and outbound edges).
Trivially, each leaf and each branch pseudo-vertex is reachable from an ancestor pseudo-vertex (since they have inbound edges), but it is not possible to reach one branch from a parallel branch, so we only need to check whether the resulting graph is a simple path with no branches.
Count the number of root pseudo-vertices:
If there is a single root pseudo-vertex (SCC) then any vertex contained in that SCC can reach all other vertices in the graph;
If there is more than 1 root pseudo-vertex then there is no vertex that has a path to all other vertices (since you cannot reach one root from a different root).
If the single root pseudo-vertex and each subsequent descendant pseudo-vertex has only a single outbound edge (i.e. there are no branches) until the leaf is reached, then the resulting graph contains a path that can reach all vertices.
Examples:
After reducing the SCCs to pseudo-vertices if the graph is of the form:
(1) -> (2) -> ... (n-1) -> (n)
Then there is a path that can visit all vertices.
If it is of the form:
(1_a) --\
         +--> (2) -> ... (n-1) -> (n)
(1_b) --/
Then the vertex (1_a) is not reachable from (1_b) and vice-versa so there is no path that can reach all vertices.
Similarly:
                    /-> (n_a)
(1) -> (2) -> ... -+
                    \-> (n_b)
Then the vertex (n_a) is not reachable from (n_b) and vice-versa so there is no path that can reach all vertices.
And finally, if it is the form:
                    /-> (x_a) -\
(1) -> (2) -> ... -+            +-> ... (n-1) -> (n)
                    \-> (x_b) -/
Then there is no path that can reach both (x_a) and (x_b).
I don't know why some answers seem so unnecessarily complicated. In fact, you can just take a topological sort of the graph (after collapsing each strongly connected component into one vertex, if the graph has cycles) and check whether there is an edge connecting each node to the next node in that order.
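The topological-sort check can be sketched in C++ as follows. It assumes the input is already acyclic, for example the condensed graph of strongly connected components, and it uses Kahn's algorithm for the topological order, which is my choice rather than something the answer specifies. The function name is hypothetical.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// For a DAG: is there one path that visits every vertex?
// Take a topological order and require an edge between every pair of consecutive vertices.
bool hasPathThroughAll(int n, const std::vector<std::pair<int, int>>& edges) {
    std::vector<std::vector<int>> g(n);
    std::vector<int> indeg(n, 0);
    for (auto [u, v] : edges) {
        assert(0 <= u && u < n && 0 <= v && v < n);
        g[u].push_back(v);
        ++indeg[v];
    }
    // Kahn's algorithm: repeatedly remove a vertex with no remaining incoming edges.
    std::queue<int> ready;
    for (int v = 0; v < n; ++v)
        if (indeg[v] == 0) ready.push(v);
    std::vector<int> order;
    while (!ready.empty()) {
        int u = ready.front();
        ready.pop();
        order.push_back(u);
        for (int v : g[u])
            if (--indeg[v] == 0) ready.push(v);
    }
    // Cyclic input: collapse the SCCs first; this sketch does not handle it.
    if (static_cast<int>(order.size()) != n) return false;
    std::vector<int> pos(n);
    for (int i = 0; i < n; ++i) pos[order[i]] = i;
    // linked[i] is true if there is an edge order[i] -> order[i + 1].
    std::vector<char> linked(n, 0);
    for (auto [u, v] : edges)
        if (pos[v] == pos[u] + 1) linked[pos[u]] = 1;
    for (int i = 0; i + 1 < n; ++i)
        if (!linked[i]) return false;
    return true;
}
```

If consecutive vertices are all linked, the topological order itself is the path; if some pair is not linked, no path can visit both of them, because the order forbids going back.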

Find optimal route in farm land-dynamic programming/Dijkstra's

I was trying to solve a question on InterviewStreet (the competition has since ended). The problem is to build a ditch from a pond to a farm, given an N*M grid of elevations. The pond and the farm each occupy one tile of the N*M grid, and they are not the same tile.
The elevations are numbers between 0 and 9. Additionally, you are given the coordinates of the pond and the farm (1-indexed, row followed by column), which each take up exactly one tile on the grid. You are to write a program that, given this data, computes the minimum cost to build an irrigation ditch.
More specifically, the input that will be fed into your program will be formatted as follows:
N M
pondLocationX pondLocationY
farmLocationX farmLocationY
elevationX1Y1elevationX1Y2...elevationX1YM
elevationX2Y1elevationX2Y2...elevationX2YM
.
.
.
elevationXNY1elevationXNY2...elevationXNYM
where pondLocationX and farmLocationX are integers in the interval [1, N], and pondLocationY and farmLocationY are integers in the interval [1, M], and all elements are integers in the interval [0, 9]. Note that a single space separates the X and Y coordinates of the farm and pond, but there are no spaces separating the elevations.
Given such an input, your program should print out the minimum cost to build an irrigation ditch from the pond to the farm. The constraints are as follows. The pond and farm will not be at the same location. The elevation of all tiles except for the pond can be increased or decreased at a cost of one for every unit of change (you may leave the elevation the same for a cost of 0). N and M will each be at most 300. After paying for any excavation that is necessary, you can build a ditch at 0 additional cost if there is a sequence of tiles starting at the pond and ending at the farm such that the following are true:
(Contiguous path) Each tile in the sequence is adjacent to the previous tile (no diagonal adjacency -- tiles in the interior of the map have exactly 4 adjacent tiles)
(Downhill path) Each tile in the sequence, including the pond and farm, has an elevation that is at most that of the previous tile in the sequence.
For example, if the input is the following:
3 5
1 1
3 4
27310
21171
77721
then we can build an irrigation ditch at a cost of just 4, since it suffices to lower the tile at location (1, 3) from 3 to 1 (cost 2), raise the tile at position (1, 5) from 0 to 1 (cost 1), and lower the farm, which is at location (3, 4), from 2 to 1 (cost 1). Note that you cannot travel diagonally to get from (2, 3) to (3, 4) in one step.
Solution:
I think this is a variation of Dijkstra's algorithm, i.e. use the farm as the source node, and stop when you have computed the shortest path to the pond. The "adjacent" tiles are your neighbours, and your edge weights are the differences in elevation.
However, the weights can be modified in two ways: if you are higher than your neighbour, you can either 1) decrease your height to match your neighbour's, or 2) increase your neighbour's height to match yours. This effect can percolate outwards, and I'm not able to capture it in the algorithm.
How can I adjust Dijkstra's algorithm to accommodate the fact that the weights can change?
Use Dijkstra's algorithm on the 3D grid N*M*10, where vertex (x,y,z) means that tile (x,y) is given height z. Two vertices (x,y,z) and (x',y',z') are connected by a directed arc if (x,y) and (x',y') are adjacent and z' is not greater than z. The cost of the arc is the absolute difference between z' and the initial height at (x',y'). Then find the shortest path from the pond (at its initial height) to the farm (at any z coordinate).
The minimal path found this way may pass twice through the same point (x,y). For example, it could pass first through (x,y,z') and then through (x,y,z''). But if this happens, you can remove the part of the path from (x,y,z') to (x,y,z''), since replacing (x,y,z') with (x,y,z'') costs no more than the path from (x,y,z') to (x,y,z''). So you can assume that for every point (x,y) the path uses only a single value of z.
So the path you have found is the solution to the given problem.
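The layered-graph idea can be sketched in C++ as follows: a state is (row, column, height), and each move goes to an adjacent tile at a height no greater than the current one, paying the change from that tile's original elevation. Coordinates here are 0-indexed, and the function name minDitchCost is hypothetical. Moves back into the pond are skipped, since its height is fixed and, by the argument above, revisiting a tile never helps.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <limits>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Dijkstra on states (row, col, height); grid holds digit characters,
// pond and farm are 0-indexed (row, col).
int minDitchCost(const std::vector<std::string>& grid, int pr, int pc, int fr, int fc) {
    assert(!grid.empty() && !grid[0].empty());
    const int n = static_cast<int>(grid.size()), m = static_cast<int>(grid[0].size());
    auto id = [m](int r, int c, int z) { return (r * m + c) * 10 + z; };
    auto orig = [&grid](int r, int c) { return grid[r][c] - '0'; };
    std::vector<int> dist(n * m * 10, std::numeric_limits<int>::max());
    using Item = std::pair<int, int>;  // (cost so far, state id)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    const int start = id(pr, pc, orig(pr, pc));  // the pond keeps its own elevation
    dist[start] = 0;
    pq.push({0, start});
    const int dr[] = {1, -1, 0, 0}, dc[] = {0, 0, 1, -1};
    while (!pq.empty()) {
        auto [d, s] = pq.top();
        pq.pop();
        if (d > dist[s]) continue;  // stale entry
        const int z = s % 10, r = s / 10 / m, c = s / 10 % m;
        if (r == fr && c == fc) return d;  // first time the farm is popped, at any height
        for (int k = 0; k < 4; ++k) {
            const int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= n || nc < 0 || nc >= m) continue;
            if (nr == pr && nc == pc) continue;  // pond height is fixed; revisiting never helps
            for (int nz = 0; nz <= z; ++nz) {  // downhill: the next tile is no higher
                const int nd = d + std::abs(nz - orig(nr, nc));
                const int t = id(nr, nc, nz);
                if (nd < dist[t]) {
                    dist[t] = nd;
                    pq.push({nd, t});
                }
            }
        }
    }
    return -1;  // not reached in practice: every tile can be lowered to 0
}
```

On the example from the question (pond at (1, 1) and farm at (3, 4) in 1-indexed coordinates) this returns 4. The state space has at most 300 * 300 * 10 = 900 000 vertices, each with at most 40 outgoing arcs.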

Extracting segments from a list of 8-connected pixels

Current situation: I'm trying to extract segments from an image. Thanks to OpenCV's findContours() method, I now have a list of 8-connected points for every contour. However, these lists are not directly usable, because they contain a lot of duplicates.
The problem: Given a list of 8-connected points, which can contain duplicates, extract segments from it.
Possible solutions:
At first, I used OpenCV's approxPolyDP() method. However, the results are pretty bad... Here are the zoomed contours:
Here is the result of approxPolyDP() (9 segments! Some overlap):
but what I want is more like:
It's bad because approxPolyDP() converts something that "looks like several segments" into "several segments". However, what I have is a list of points that tends to iterate several times over itself.
For example, if my points are:
0 1 2 3 4 5 6 7 8
9
Then the list of points will be 0 1 2 3 4 5 6 7 8 7 6 5 4 3 2 1 9... And if the number of points becomes large (>100), the segments extracted by approxPolyDP() are unfortunately not duplicates (i.e. they overlap each other but are not strictly equal, so I can't just say "remove duplicates", as I could for pixels, for example).
I may have a solution, but it's pretty long (though interesting). First, for each 8-connected list, I create a sparse matrix (for efficiency) and set the matrix values to 1 if the pixel belongs to the list. Then I create a graph, with nodes corresponding to pixels and edges between neighbouring pixels. This also means that I add all the missing edges between pixels (cheap, thanks to the sparse matrix). Then I remove all possible "squares" (4 neighbouring nodes), which is possible because I am already working on pretty thin contours. Then I can run a minimal spanning tree algorithm. Finally, I can approximate every branch of the tree with OpenCV's approxPolyDP().
To sum up: I've got a tedious method that I have not yet implemented, as it seems error-prone. So I ask you, people at Stack Overflow: are there other existing methods, possibly with good implementations?
Edit: To clarify, once I have a tree, I can extract "branches" (branches start at leaves or at nodes linked to 3 or more other nodes). Then, the algorithm in OpenCV's approxPolyDP() is the Ramer–Douglas–Peucker algorithm, and here is the Wikipedia picture of what it does:
With this picture, it is easy to understand why it fails when points may be duplicates of each other
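Extracting the branches described in the edit is a short walk over the tree once it exists. Here is a C++ sketch, assuming the tree is given as adjacency lists over pixel indices (the representation and the function name are my assumptions); each branch could then be passed to approxPolyDP() on its own, so no point is traversed twice.

```cpp
#include <cassert>
#include <vector>

// Split a tree (adjacency lists) into branches: chains whose interior nodes have
// exactly 2 neighbours, starting and ending at a leaf or a junction (3+ neighbours).
std::vector<std::vector<int>> extractBranches(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    auto isEnd = [&adj](int v) { return adj[v].size() != 2; };
    std::vector<std::vector<int>> branches;
    for (int s = 0; s < n; ++s) {
        if (!isEnd(s)) continue;
        for (int first : adj[s]) {
            std::vector<int> branch = {s};
            int prev = s, cur = first;
            while (!isEnd(cur)) {  // walk through degree-2 nodes
                assert(adj[cur].size() == 2);
                branch.push_back(cur);
                int nxt = adj[cur][0] == prev ? adj[cur][1] : adj[cur][0];
                prev = cur;
                cur = nxt;
            }
            branch.push_back(cur);
            // Every branch is found once from each end; keep the copy starting at the smaller id.
            if (s < cur) branches.push_back(branch);
        }
    }
    return branches;
}
```

For a tree 0-1-2 with a junction at 2 that splits into 2-3 and 2-4-5, this yields the branches {0, 1, 2}, {2, 3} and {2, 4, 5}.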
Another edit: there is something in my method that may be worth noting. For points located on a grid (like pixels), the minimal spanning tree algorithm is generally not useful, because there are many possible minimal trees:
X-X-X-X
|
X-X-X-X
is fundamentally very different from
X-X-X-X
| | | |
X X X X
but both are minimal spanning trees
However, in my case the nodes rarely form clusters, because they are supposed to be contours, and a thinning algorithm already runs beforehand in findContours().
Answer to Tomalak's comment:
If the DP algorithm returned 4 segments (the segment from point 2 to the center being there twice), I would be happy! Of course, with good parameters I can get to a state where "by chance" I have identical segments and can remove duplicates. However, the algorithm is clearly not designed for this.
Here is a real example with far too many segments:
Using Mathematica 8, I created a morphological graph from the list of white pixels in the image. It is working fine on your first image:
Create the morphological graph:
graph = MorphologicalGraph[binaryimage];
Then you can query the graph properties that are of interest to you.
This gives the names of the vertices in the graph:
vertex = VertexList[graph]
The list of the edges:
EdgeList[graph]
And this gives the positions of the vertices:
pos = PropertyValue[{graph, #}, VertexCoordinates] & /@ vertex
This is what the results look like for the first image:
In[21]:= vertex = VertexList[graph]
Out[21]= {1, 3, 2, 4, 5, 6, 7, 9, 8, 10}
In[22]:= EdgeList[graph]
Out[22]= {1 \[UndirectedEdge] 3, 2 \[UndirectedEdge] 4, 3 \[UndirectedEdge] 4,
3 \[UndirectedEdge] 5, 4 \[UndirectedEdge] 6, 6 \[UndirectedEdge] 7,
6 \[UndirectedEdge] 9, 8 \[UndirectedEdge] 9, 9 \[UndirectedEdge] 10}
In[26]:= pos = PropertyValue[{graph, #}, VertexCoordinates] & /@ vertex
Out[26]= {{54.5, 191.5}, {98.5, 149.5}, {42.5, 185.5},
{91.5, 138.5}, {132.5, 119.5}, {157.5, 72.5},
{168.5, 65.5}, {125.5, 52.5}, {114.5, 53.5},
{120.5, 29.5}}
According to the documentation, http://reference.wolfram.com/mathematica/ref/MorphologicalGraph.html, the command MorphologicalGraph first computes the skeleton by morphological thinning:
skeleton = Thinning[binaryimage, Method -> "Morphological"]
Then the vertices are detected; they are the branch points and the end points:
verteximage = ImageAdd[
MorphologicalTransform[skeleton, "SkeletonEndPoints"],
MorphologicalTransform[skeleton, "SkeletonBranchPoints"]]
And then the vertices are linked after analysis of their connectivity.
For example, one could start by breaking the structure around the vertices and then look for the connected components, revealing the edges of the graph:
comp = MorphologicalComponents[
ImageSubtract[
skeleton,
Dilation[verteximage, CrossMatrix[1]]]];
Colorize[comp]
The devil is in the details, but that sounds like a solid starting point if you wish to develop your own implementation.
Try mathematical morphology. First you need to dilate or close your image to fill holes.
cvDilate(pimg, pimg, NULL, 3);
cvErode(pimg, pimg, NULL);
I got this image
The next step should be applying a thinning algorithm. Unfortunately, it's not implemented in OpenCV (MATLAB has bwmorph with the thin argument). For example, with MATLAB I refined the image to this one:
However OpenCV has all needed basic morphological operations to implement thinning (cvMorphologyEx, cvCreateStructuringElementEx, etc).
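As one possible way to fill that gap, here is a minimal C++ sketch of the Zhang-Suen thinning algorithm, a well-known thinning method that is not necessarily the one MATLAB's bwmorph uses. It works on a plain 0/1 grid rather than an OpenCV image, and it assumes the outermost rows and columns are background.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Image = std::vector<std::vector<int>>;  // 1 = foreground, 0 = background

// Zhang-Suen thinning: two sub-iterations per pass, each deleting all marked pixels at once.
void zhangSuenThin(Image& img) {
    assert(!img.empty() && !img[0].empty());
    const int rows = static_cast<int>(img.size()), cols = static_cast<int>(img[0].size());
    bool changed = true;
    while (changed) {
        changed = false;
        for (int step = 0; step < 2; ++step) {
            std::vector<std::pair<int, int>> toDelete;
            for (int r = 1; r + 1 < rows; ++r)
                for (int c = 1; c + 1 < cols; ++c) {
                    if (!img[r][c]) continue;
                    // Neighbours P2..P9, clockwise starting from north.
                    int p[8] = {img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]};
                    int b = 0, a = 0;  // b: foreground neighbours, a: 0 -> 1 transitions
                    for (int i = 0; i < 8; ++i) {
                        b += p[i];
                        if (p[i] == 0 && p[(i + 1) % 8] == 1) ++a;
                    }
                    // p[0] = P2 (N), p[2] = P4 (E), p[4] = P6 (S), p[6] = P8 (W)
                    bool cond = step == 0
                        ? (p[0] * p[2] * p[4] == 0 && p[2] * p[4] * p[6] == 0)
                        : (p[0] * p[2] * p[6] == 0 && p[0] * p[4] * p[6] == 0);
                    if (b >= 2 && b <= 6 && a == 1 && cond) toDelete.push_back({r, c});
                }
            for (auto [r, c] : toDelete) img[r][c] = 0;
            if (!toDelete.empty()) changed = true;
        }
    }
}
```

Pixels are only marked during a sub-iteration and removed afterwards, so the result does not depend on scan order.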
Another idea.
They say that the distance transform is very useful in such tasks. Maybe so.
Consider the cvDistTransform function. It produces an image like this:
Then apply something like cvAdaptiveThreshold:
That's the skeleton. I guess you can iterate over all connected white pixels, find curves and filter out small segments.
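To show what a distance transform computes, here is a minimal C++ sketch of the classic two-pass city-block (L1) distance transform on a 0/1 grid. cvDistTransform offers other metrics, so treat this as an illustration rather than a drop-in replacement. The ridge of larger values along the middle of each stroke is what the thresholding step turns into a skeleton.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Two-pass city-block (L1) distance transform: for each foreground pixel (1),
// the distance to the nearest background pixel (0).
std::vector<std::vector<int>> distanceL1(const std::vector<std::vector<int>>& img) {
    assert(!img.empty() && !img[0].empty());
    const int rows = static_cast<int>(img.size()), cols = static_cast<int>(img[0].size());
    const int INF = rows + cols;  // larger than any possible L1 distance
    std::vector<std::vector<int>> d(rows, std::vector<int>(cols));
    // Forward pass: top-left to bottom-right, using the up and left neighbours.
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            if (!img[r][c]) { d[r][c] = 0; continue; }
            int best = INF;
            if (r > 0) best = std::min(best, d[r - 1][c] + 1);
            if (c > 0) best = std::min(best, d[r][c - 1] + 1);
            d[r][c] = best;
        }
    // Backward pass: bottom-right to top-left, using the down and right neighbours.
    for (int r = rows - 1; r >= 0; --r)
        for (int c = cols - 1; c >= 0; --c) {
            if (r + 1 < rows) d[r][c] = std::min(d[r][c], d[r + 1][c] + 1);
            if (c + 1 < cols) d[r][c] = std::min(d[r][c], d[r][c + 1] + 1);
        }
    return d;
}
```

For a 3-pixel-thick bar, the middle row gets the highest values, which is the ridge you would keep.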
I've implemented a similar algorithm before, in a sort of incremental least-squares fashion. It worked fairly well. The pseudocode is roughly:
L = empty set of line segments
for each white pixel p
    line = new line containing only p
    C = empty set of points
    P = set of all neighboring pixels of p
    while P is not empty
        n = first point in P
        add n to C
        remove n from P
        line' = line with n added to it
        perform a least squares fit of line'
        if MSE(line') < max_mse and d(line, n) < max_distance
            line = line'
            add all neighbors of n that are not in C to P
    if size(line) > min_num_points
        add line to L
where MSE(line) is the mean squared error of the line (the squared distances of all points in the line to the best-fitting line, averaged over those points) and d(line, n) is the distance from point n to the line. Good values for max_distance seem to be about a pixel; max_mse seems to be much smaller and will depend on the average size of the line segments in your image. 0.1 or 0.2 pixels have worked for me on fairly large images.
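The two quantities in the pseudocode test, the MSE of a candidate line and the distance d(line, n), can be computed with a total least-squares fit (perpendicular distances), as in this C++ sketch. The answer does not say which fit the original implementation used, so the choice of total least squares and the names LineFit, fitLine and distanceToLine are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

struct LineFit {
    double cx, cy;  // centroid of the points
    double dx, dy;  // unit direction of the best-fitting line
    double mse;     // mean squared perpendicular distance to that line
};

// Total least-squares fit: the line through the centroid along the principal axis
// of the 2x2 covariance matrix minimises the sum of squared perpendicular distances.
LineFit fitLine(const std::vector<std::pair<double, double>>& pts) {
    assert(!pts.empty());
    const double n = static_cast<double>(pts.size());
    double cx = 0, cy = 0;
    for (auto [x, y] : pts) { cx += x; cy += y; }
    cx /= n;
    cy /= n;
    double sxx = 0, syy = 0, sxy = 0;
    for (auto [x, y] : pts) {
        sxx += (x - cx) * (x - cx);
        syy += (y - cy) * (y - cy);
        sxy += (x - cx) * (y - cy);
    }
    sxx /= n; syy /= n; sxy /= n;
    const double theta = 0.5 * std::atan2(2 * sxy, sxx - syy);  // principal direction
    const double half = (sxx - syy) / 2;
    // The smaller eigenvalue of the covariance matrix is the mean squared perpendicular distance.
    const double mse = (sxx + syy) / 2 - std::sqrt(half * half + sxy * sxy);
    return {cx, cy, std::cos(theta), std::sin(theta), std::max(mse, 0.0)};  // clamp rounding noise
}

// Perpendicular distance from (px, py) to the fitted line.
double distanceToLine(const LineFit& f, double px, double py) {
    return std::abs((px - f.cx) * f.dy - (py - f.cy) * f.dx);
}
```

Refitting from scratch for every candidate point is O(size of line); keeping running sums of x, y, x^2, y^2 and xy would make each update O(1), which may be how a fast implementation gets its speed.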
I had been using this on real images pre-processed with the Canny operator, so those are the only results I have. Here's the result of the above algorithm on an image:
It's possible to make the algorithm fast, too. The C++ implementation I have (closed source, enforced by my job; sorry, otherwise I would give it to you) processed the above image in about 20 milliseconds. That includes the Canny edge detection, so it should be even faster in your case.
You can start by extracting straight lines from your contour image using HoughLinesP, which is provided by OpenCV:
HoughLinesP(InputArray image, OutputArray lines, double rho, double theta, int threshold, double minLineLength = 0, double maxLineGap = 0)
If you choose threshold = 1 and a small minLineLength, you can even obtain all single elements. Be careful, though, since it yields many results if you have many edge pixels.

count of distinct acyclic paths from A[a,b] to A[c,d]?

I'm writing a Sokoban solver for fun and practice; it uses a simple algorithm (something like BFS with a few differences).
Now I want to estimate its running time (O and Omega), but I need to know how to calculate the number of acyclic paths from one vertex to another in a network.
Actually, I want an expression that calculates the number of valid paths between two vertices of an m*n matrix of vertices.
A valid path:
visits each vertex zero or one times;
has no circuits.
For example, this is a valid path:
(image: http://megapic.ir/images/f1hgyp5yxcu8887kfvkr.png)
but this is not:
(image: http://megapic.ir/images/wnnif13ir5gaqwvnwk9d.png)
What is needed is a method to find the number of all acyclic paths between the two vertices a and b.
Comments on solution methods and tricks are welcome.
Not a solution, but maybe you can take this idea a bit further. The problem is that you'll also need to calculate the longest possible path to get all paths. The longest path problem is NP-hard for general graphs, so it will take a very long time even for relatively small graphs (8x8 and greater).
Imagine the start vertex is in the top left corner and the end vertex is in the lower right corner of the matrix.
For a 1x2 matrix there is only 1 possible path
2x2 matrix => 2 * 1 paths => 2
3x2 matrix => 2 * 2 paths => 4
3x3 matrix => 2 * 4 + 2 * 2 paths => 12
3x4 matrix => 2 * 12 + 12 + 2 paths => 38
Every time, I combined the results from the previous calculations to get the current number of paths. There could be a closed formula for such a planar graph, and maybe there is even a lot of theory on it, but that is beyond me...
You can use the following Java snippet (sorry, I am not a C++ expert :-/) to calculate the number of possible paths for larger matrices:
public class Main {
    int xSize;
    int ySize;
    boolean[][] visited;

    public static void main(String[] args) {
        new Main(3, 2).start();
    }

    public Main(int maxX, int maxY) {
        xSize = maxX;
        ySize = maxY;
        visited = new boolean[xSize][ySize];
    }

    public void start() {
        // path starts in the top left corner
        long paths = nextCell(0, 0);
        System.out.println(paths);
    }

    // counts the self-avoiding paths from (x, y) to the lower right corner
    // (long instead of int: the 8x8 count no longer fits in an int)
    public long nextCell(int x, int y) {
        // path should end in the lower right corner
        if (x == xSize - 1 && y == ySize - 1)
            return 1;
        if (x < 0 || y < 0 || x >= xSize || y >= ySize || visited[x][y]) {
            return 0;
        }
        visited[x][y] = true;
        long c = 0;
        c += nextCell(x + 1, y);
        c += nextCell(x - 1, y);
        c += nextCell(x, y + 1);
        c += nextCell(x, y - 1);
        visited[x][y] = false; // backtrack so that other paths can use this cell
        return c;
    }
}
=>
4x4 => 184
5x5 => 8512
6x6 => 1262816
7x7 (even this simple case takes a lot of time!) => 575780564
This means you could (only theoretically) compute all possible paths from any position of an MxM matrix to the lower right corner and then use this matrix to quickly look up the number of paths. Dynamic programming (using previously calculated results) could speed things up a bit.
The general problem of counting the number of simple paths in a graph is #P-complete. Some #P-complete problems have fully polynomial randomized approximation schemes and some don't, but you claim not to be interested in an approximation. Perhaps there's a way to take advantage of the grid structure, as there is for computing the Tutte polynomial, but I don't have any ideas for how to do this.
There is a similar but less general problem on project Euler: http://projecteuler.net/index.php?section=problems&id=237
I think some of the solutions described in the forum there can be extended to solve your general case. It's a pretty difficult problem though, especially for your general case.
To get access to their forums, you first need to solve the problem. I won't post the answer here, nor link to a certain site that lists the answer, a site that you can easily find on google by searching for something really obvious.
This is an open question in mathematics, with direct applications in chemistry and physics, where it is used to model polymer bonds. Some of the earliest work on it was done during the Manhattan Project (the WWII nuclear bomb program).
It is better known as the self-avoiding walk problem.
I spent a summer at my university's mathematics department researching a Monte Carlo algorithm called the pivot algorithm, to approximate the parameters of the asymptotic fit of the number of self-avoiding walks of a given length n.
Please refer to Neal Madras and Gordon Slade's excellent book "The Self-Avoiding Walk" for extensive coverage of the techniques used to approach this problem so far.
This is a very complex problem and I wonder what your motivation may be for considering it. Perhaps you can find a simpler model for what you want, because Self Avoiding Walks are not simple at all.
Would a matrix showing the edges work? Consider building a matrix showing where the edges are, i.e. [a,b] = 1 <=> a->b is an edge in the graph, 0 otherwise. Now raise this matrix to various powers: entry [a,b] of the n-th power is the number of ways (walks, which may revisit vertices) to get from a to b in exactly n steps; sum these to get the result. This is just one idea for solving the problem; there may be other ways to frame it.
I wonder if this belongs on MathOverflow, as another idea
True, once you reach a zero matrix you can stop exponentiating, as in your case: there aren't many places to go after 3, and the paths from 1 to 3 are the direct one and the one that goes through 2, so there are only a few matrices to add together before the all-zero one is found.
I'd think there should be a way to compute a bound, say n^(n+1), where n is the number of vertices in the graph, that would indicate a stopping point, since by that point every vertex will have been visited once. I'm not sure how to get the cyclic paths out of the problem, though; or can one assume that the graph is free of cycles?
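The adjacency-matrix idea can be sketched in C++ as follows, with vertices 1, 2, 3 from the comment above renumbered to 0, 1, 2. The helper countWalks (a hypothetical name) sums (A^k)[from][to] for k = 1..maxLen and stops early once A^k is the zero matrix. The counts are walks, so they equal simple paths only when the graph has no cycles; in that case maxLen = n - 1 is enough.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Matrix = std::vector<std::vector<std::uint64_t>>;

Matrix multiply(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size();
    Matrix c(n, std::vector<std::uint64_t>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            if (a[i][k] != 0)
                for (std::size_t j = 0; j < n; ++j) c[i][j] += a[i][k] * b[k][j];
    return c;
}

bool isZero(const Matrix& m) {
    for (const auto& row : m)
        for (std::uint64_t v : row)
            if (v != 0) return false;
    return true;
}

// Sum of (A^k)[from][to] for k = 1..maxLen: walks of length 1..maxLen from `from` to `to`.
std::uint64_t countWalks(const Matrix& adj, int from, int to, int maxLen) {
    assert(maxLen >= 1);
    Matrix power = adj;  // A^1
    std::uint64_t total = 0;
    for (int k = 1; k <= maxLen; ++k) {
        total += power[from][to];
        power = multiply(power, adj);  // A^(k+1)
        if (isZero(power)) break;      // no longer walks exist
    }
    return total;
}
```

For the graph 0 -> 1, 1 -> 2, 0 -> 2 this gives 2 (the direct edge and the path through vertex 1). For the cycle 0 -> 1 -> 0, countWalks(cyc, 0, 1, 3) also gives 2, although only one simple path exists; this is the cyclic-path problem mentioned above.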