count of distinct acyclic paths from A[a,b] to A[c,d]? - c++

I'm writing a sokoban solver for fun and practice, it uses a simple algorithm (something like BFS with a bit of difference).
now i want to estimate its running time ( O and omega). but need to know how to calculate count of acyclic paths from a vertex to another in a network.
actually I want an expression that calculates count of valid paths, between two vertices of a m*n matrix of vertices.
a valid path:
visits each vertex 0 or one times.
have no circuits
for example this is a valid path:
alt text http://megapic.ir/images/f1hgyp5yxcu8887kfvkr.png
but this is not:
alt text http://megapic.ir/images/wnnif13ir5gaqwvnwk9d.png
What is needed is a method to find count of all acyclic paths between the two vertices a and b.
comments on solving methods and tricks are welcomed.

Not a solution but maybe you can think this idea a bit further. The problem is that you'll need to calculate also the longest possible path to get all paths. The longest path problem is NP complete for general graphs, so it will get a very long time even for relative small graphs (8x8 and greater).
Imagine the start-vertex is in the top, left corner and the end vertex is in the lower, right corner of the matrix.
For a 1x2 matrix there is only 1 possible path
2x2 matrix => 2***1** paths => 2
3x2 matrix => 2***2** paths => 4
3x3 matrix => 2***4** + 2*2 paths => 12
3x4 matrix => 2***12** + 12 + 2 paths => 38
Everytime I combined the results from the previous calculation for the current number of paths. It could be that there is a close formular for such a planar graph, maybe there is even a lot of theory for that, but I am too stupid for that ...
You can use the following Java (sorry, I am not a c++ expert :-/) snippet to calculate possible paths for larger matrices:
public static void main(String[] args) {
new Main(3, 2).start();
}
int xSize;
int ySize;
boolean visited[][];
public Main(int maxX, int maxY) {
xSize = maxX;
ySize = maxY;
visited = new boolean[xSize][ySize];
}
public void start() {
// path starts in the top left corner
int paths = nextCell(0, 0);
System.out.println(paths);
}
public int nextCell(int x, int y) {
// path should end in the lower right corner
if (x == xSize - 1 && y == ySize - 1)
return 1;
if (x < 0 || y < 0 || x >= xSize || y >= ySize || visited[x][y]) {
return 0;
}
visited[x][y] = true;
int c = 0;
c += nextCell(x + 1, y);
c += nextCell(x - 1, y);
c += nextCell(x, y + 1);
c += nextCell(x, y - 1);
visited[x][y] = false;
return c;
}
=>
4x4 => 184
5x5 => 8512
6x6 => 1262816
7x7 (even this simple case takes a lot of time!) => 575780564
This means you could (only theoretically) compute all possible paths from any position of a MxM matrix to the lower, right corner and then use this matrix to quickly look up the number of paths. Dynamic programming (using previous calculated results) could speed up things a bit.

The general problem of counting the number of simple paths in a graph is #P complete. Some #P-complete problems have fully polynomial randomized approximation schemes, and some don't, but you claim not to be interested in an approximation. Perhaps there's a way to take advantage of the grid structure, as there is for computing the Tutte polynomial, but I don't have any ideas for how to do this.

There is a similar but less general problem on project Euler: http://projecteuler.net/index.php?section=problems&id=237
I think some of the solutions described in the forum there can be extended to solve your general case. It's a pretty difficult problem though, especially for your general case.
To get access to their forums, you first need to solve the problem. I won't post the answer here, nor link to a certain site that lists the answer, a site that you can easily find on google by searching for something really obvious.

This is an open question in Mathematics with direct application to chemistry and physics in using it to model polymer bonds. Some of the earliest work done on this was done during the Manhattan project (nuclear bomb WWII.)
It is better known as the Self Avoiding Walk problem.
I spent a summer at my university mathematics department researching a monte-carlo algorithm called the pivot algorithm to approximate the parameters of the asymptotic fit of the number of Self-Avoiding Walks of a given length n.
Please refer to Gordon Slade's excellent book titled "The Self Avoiding Walk" for extensive coverage of the types of techniques used to approach this problem thus far.
This is a very complex problem and I wonder what your motivation may be for considering it. Perhaps you can find a simpler model for what you want, because Self Avoiding Walks are not simple at all.

Would a matrix showing the edges work? Consider building a Matrix showing where the edges are,i.e. [a,b]=1 <=> a->b is an edge in the graph, 0 otherwise. Now, raise this Matrix to various powers to show how many ways exist to get between vertices using n steps and then sum them to get the result. This is just an idea of one way to solve the problem, there may be other ways to frame the problem.
I wonder if this belongs on MathOverflow, as another idea
True, that once you have a zero matrix you could stop exponentiating as in your case, there aren't many places to go after 3, but the paths from 1 to 3 would be the direct one and the one that goes through 2, so there are only a few matrices to add together before the all zero one is found.
I'd think there should be a way to compute a bound of say n^(n+1), where n is the number of vertices in the graph, that would indicate a stopping point as by that point, every vertex will have been visited once. I'm not sure how to get the cyclic paths out of the problem though, or can one assume that the graph is free of cycles?

Related

Maximizing a Ratio/Percent

I'm using cvxpy to model a problem.
Inside a very large and complex LP, I create two continuous, affine (unconstrained) expressions:x and y.
Due to how they are created, I know that 0 < x < y <= U. Obviously: x/y < 1.
In my LP objective, how do I maximize the ratio x/y?
Things I tried:
Maximizing x*cp.inv_pos(Y) states my problem is non DCP (also if I try to minimize the inverse)
I found various LP formulations for maximizing ratios (e.g. here or here) but these requires rewriting the constraints on all the terms in the expressions for x - I have no idea how to do that with cvxpy.
If this is the way to go then an example would be very helpful!

Histogram Binning of Gradient Vectors

I am working on a project that has a small component requiring the comparison of distributions over image gradients. Assume I have computed the image gradients in the x and y directions using a Sobel filter and have for each pixel a 2-vector. Obviously getting the magnitude and direction is reasonably trivial and is as follows:
However, what is not clear to me is how to bin these two components in to a two dimensional histogram for an arbitrary number of bins.
I had considered something along these lines(written in browser):
//Assuming normalised magnitudes.
//Histogram dimensions are bins * bins.
int getHistIdx(float mag, float dir, int bins) {
const int magInt = reinterpret_cast<int>(mag);
const int dirInt = reinterpret_cast<int>(dir);
const int magMod = reinterpret_cast<int>(static_cast<float>(1.0));
const int dirMod = reinterpret_cast<int>(static_cast<float>(TWO_PI));
const int idxMag = (magInt % magMod) & bins
const int idxDir = (dirInt % dirMod) & bins;
return idxMag * bins + idxDir;
}
However, I suspect that the mod operation will introduce a lot of incorrect overlap, i.e. completely different gradients getting placed in to the same bin.
Any insight in to this problem would be very much appreciated.
I would like to avoid using any off the shelf libraries as I want to keep this project as dependency light as possible. Also I intend to implement this in CUDA.
This is more of a what is an histogram question? rather than one of your tags. Two things:
In a 2D plain two directions equal by modulation of 2pi are in fact the same - so it makes sense to modulate.
I see no practical or logical reason of modulating the norms.
Next, you say you want a "two dimensional histogram", but return a single number. A 2D histogram, and what would make sense in your context, is a 3D plot - the plane is theta/R, 2 indexed, while the 3D axis is the "count".
So first suggestion, return
return Pair<int,int>(idxMag,idxDir);
Then you can make a 2D histogram, or 2 2D histograms.
Regarding the "number of bins"
this is use case dependent. You need to define the number of bins you want (maybe different for theta and R). Maybe just some constant 10 bins? Maybe it should depend on the amount of vectors? In any case, you need a function that receives either the number of vectors, or the total set of vectors, and returns the number of bins for each axis. This could be a constant (10 bins) initially, and you can play with it. Once you decide on the number of bins:
Determine the bins
For a bounded case such as 0<theta<2 pi, this is easy. Divide the interval equally into the number of bins, assuming a flat distribution. Your modulation actually handles this well - if you would have actually modulated by 2*pi, which you didn't. You would still need to determine the bin bounds though.
For R this gets trickier, as this is unbounded. Two options here, but both rely on the same tactic - choose a maximal bin. Either arbitrarily (Say R=10), so any vector longer than that is placed in the "longer than max" bin. The rest is divided equally (for example, though you could choose other distributions). Another option is for the longest vector to determine the edge of the maximal bin.
Getting the index
Once you have the bins, you need to search the magnitude/direction of the current vector in your bins. If bins are pairs representing min/max of bin (and maybe an index), say in a linked list, then it would be something like (for mag for example):
bin = histogram.first;
while ( mag > bin.min ) bin = bin.next;
magIdx = bin.index;
If the bin does not hold the index you can just use a counter and increase it in the while. Also, for the magnitude the final bin should hold "infinity" or some large number as a limit. Note this has nothing to do with modulation, though that would work for your direction - as you have coded. I don't see how this makes sense for the norm.
Bottom line though, you have to think a bit about what you want. In any case all the "objects" here are trivial enough to write yourself, or even use small arrays.
I think you should arrange your bins in a square array, and then bin by vx and vy independently.
If your gradients are reasonably even you just need to scan the data first to accumulate the min and max in x and y, and then split the gradients evenly.
If the gradients are very unevenly distributed, you might want to sort the (eg) vx first and arrange that the boundaries between each bin exactly evenly divides the values.
An intermediate solution might be to obtain the min and max ignoring the (eg) 10% most extreme values.

How to find a set of points in a spherical space with specific radius around one point in a unorganized point cloud

I have a point cloud that consist of nearly 20000 points. We consider a point in the point cloud. I want to determine the set of points that are inside an spherical space around each point within the a predefined radius. In the image below there is a very clear illustration of what I mean,
[A point with a spherical space around it [Gross et al. (2007)][1]] 1
I have written the simplest way to find each set of point by using two loops. Here is the function,
void FindPointsInsideSphere(std::vector<Point>& points, double radius)
{
for (int i = 0; i < points.size(); i++)
{
for (int j = 0; j < points.size(); j++)
{
if (i != j && Distance(points[i], points[j]) < radius)
{
points[i].Sphere.push_back(points[j]);
}
}
}
}
The issue here is that the proposed algorithm is very time consuming. I wanted to know if there are any suggestions that can accelerate the process.
There was a similar question on stackexchange: https://cstheory.stackexchange.com/questions/17971/search-for-all-nearest-neighbors-within-a-certain-radius-of-a-point-in-3d
After discussing with #AMA and googling I realized that both KDTree and uniform grids improve the time complexity of brute force.
Grid
Radius search complexity:
O(log(n) + k)
Where k is the approximate number of neighbours inside the sphere.
I've never personally tried this approach but as #AMA pointed it has better time complexity than kdtree query.
After a quick google search I found this implementation on github.
KDTree
Radius search complexity:
O(sqrt(n) + k)
Where k is the approximate number of neighbours inside the sphere.
The best open source implementation the KDTree search FLANN, wich comes with multiple languages support.
Take a look at the manual in section: 3.1.4 flann::Index::radiusSearch
If you want to implement this by yourself you can take a look at the source code in github for inspiration.
I wanted to know if there are any suggestions that can accelerate the
process.
One well known way is to utilize any of space-partitioning data structures from a long list.
The easiest to implement (arguably) is a simple 3D grid.

Error in calculating exact nearest neighbors in radius with FLANN

I am trying to find the exact number of neighbour nodes in a big 3D points dataset. The goal is for each point of the dataset to retrieve all the possible neighbours in a region with a given radius. FLANN ensures that for lower dimensional data can retrieve the exact neighbors while comparing with brute force search it seems to not be the case. The neighbors are essential for further calculations and therefore I need the exact number. I tested increasing the radius a little bit but doesn't seem to be this the problem. Is anyone aware how to calculate the exact neighbors with FLANN or other C++ library?
The code:
// All nodes to be tested for inclusion in support domain.
flann::Matrix<double> query_nodes = flann::Matrix<double>(&nodes_pos[0].x, nodes_pos.size(), 3);
// Set default search parameters
flann::SearchParams search_parameters = flann::SearchParams();
search_parameters.checks = -1;
search_parameters.sorted = false;
search_parameters.use_heap = flann::FLANN_True;
flann::KDTreeSingleIndexParams index_parameters = flann::KDTreeSingleIndexParams();
flann::KDTreeSingleIndex<flann::L2_3D<double> > index(query_nodes, index_parameters);
index.buildIndex();
//FLANN uses L2 for radius search.
double l2_radius = (this->support_layer_*grid.spacing)*(this->support_layer_*grid.spacing);
double extension = l2_radius/10.;
l2_radius+= extension;
index.radiusSearch(query_nodes, indices, dists, l2_radius, search_parameters);
Try nanoflann. It is designed for low dimensional spaces and gives exact nearest neighbors. Furthermore, it is just one header file that you can either "install" or just copy to your project.
You should check page 6+ from the flann-manual, to fine-tune your search parameters, such as target_precision, which should be set to 1, for "maximum" accuracy.
That parameter is often found as epsilon (ε) in Approximate Nearest Neighbor Search (ANNS), which is used in high dimensional spaces, in order to (try) to beat the curse of dimensionality. FLANN is usually used in 128 dimensions, not 3, as far as I can tell, which may explain the bad performance you are experiencing.
A c++ library that works well in 3 dimensions is CGAL. However, it's much larger than FLANN, because it is a library for computational geometry, thus it provides functionality for many problems, not just NNS.

Backpropagation 2-Dimensional Neuron Network C++

I am learning about Two Dimensional Neuron Network so I am facing many obstacles but I believe it is worth it and I am really enjoying this learning process.
Here's my plan: To make a 2-D NN work on recognizing images of digits. Images are 5 by 3 grids and I prepared 10 images from zero to nine. For Example this would be number 7:
Number 7 has indexes 0,1,2,5,8,11,14 as 1s (or 3,4,6,7,9,10,12,13 as 0s doesn't matter) and so on. Therefore, my input layer will be a 5 by 3 neuron layer and I will be feeding it zeros OR ones only (not in between and the indexes depends on which image I am feeding the layer).
My output layer however will be one dimensional layer of 10 neurons. Depends on which digit was recognized, a certain neuron will fire a value of one and the rest should be zeros (shouldn't fire).
I am done with implementing everything, I have a problem in computing though and I would really appreciate any help. I am getting an extremely high error rate and an extremely low (negative) output values on all output neurons and values (error and output) do not change even on the 10,000th pass.
I would love to go further and post my Backpropagation methods since I believe the problem is in it. However to break down my work I would love to hear some comments first, I want to know if my design is approachable.
Does my plan make sense?
All the posts are speaking about ranges ( 0->1, -1 ->+1, 0.01 -> 0.5 etc ), will it work for either { 0 | .OR. | 1 } on the output layer and not a range? if yes, how can I control that?
I am using TanHyperbolic as my transfer function. Does it make a difference between this and sigmoid, other functions.. etc?
Any ideas/comments/guidance are appreciated and thanks in advance
Well, by the description given above, I think that the design and approach taken it's correct! With respect to the choice of the activation function, remember that those functions help to get the neurons which have the largest activation number, also, their algebraic properties, such as an easy derivative, help with the definition of Backpropagation. Taking this into account, you should not worry about your choice of activation function.
The ranges that you mention above, correspond to a process of scaling of the input, it is better to have your input images in range 0 to 1. This helps to scale the error surface and help with the speed and convergence of the optimization process. Because your input set is composed of images, and each image is composed of pixels, the minimum value and and the maximum value that a pixel can attain is 0 and 255, respectively. To scale your input in this example, it is essential to divide each value by 255.
Now, with respect to the training problems, Have you tried checking if your gradient calculation routine is correct? i.e., by using the cost function, and evaluating the cost function, J? If not, try generating a toy vector theta that contains all the weight matrices involved in your neural network, and evaluate the gradient at each point, by using the definition of gradient, sorry for the Matlab example, but it should be easy to port to C++:
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
% Set perturbation vector
perturb(p) = e;
loss1 = J(theta - perturb);
loss2 = J(theta + perturb);
% Compute Numerical Gradient
numgrad(p) = (loss2 - loss1) / (2*e);
perturb(p) = 0;
end
After evaluating the function, compare the numerical gradient, with the gradient calculated by using backpropagation. If the difference between each calculation is less than 3e-9, then your implementation shall be correct.
I recommend to checkout the UFLDL tutorials offered by the Stanford Artificial Intelligence Laboratory, there you can find a lot of information related to neural networks and its paradigms, it's worth to take look at it!
http://ufldl.stanford.edu/wiki/index.php/Main_Page
http://ufldl.stanford.edu/tutorial/