How to find all equal paths in a degenerate tree from a specific start vertex? [duplicate] - c++

I have a degenerate tree (it looks like an array or a doubly linked list). For example, take the chain of vertices a-b-c-d-e.
Each edge has some weight. I want to find all equal paths that start at each vertex.
In other words, I want to get all tuples (v1, v, v2) where v1 and v2 are an arbitrary ancestor and descendant such that c(v1, v) = c(v, v2).
Let the edges have the following weights (it is just an example):
a-b = 3
b-c = 1
c-d = 1
d-e = 1
Then:
The vertex A does not have any equal paths (there is no vertex on its left side).
The vertex B has one equal pair: the path B-A equals the path B-E (3 == 3).
The vertex C has one equal pair: the path C-B equals the path C-D (1 == 1).
The vertex D has one equal pair: the path D-C equals the path D-E (1 == 1).
The vertex E does not have any equal paths (there is no vertex on its right side).
I implemented a simple algorithm that works in O(n^2), but it is too slow for me.

You write in the comments that your current approach is:

It seems I am looking for a way to decrease the constant in O(n^2). I
choose some vertex. Then I create two sets. Then I fill these sets with
partial sums while iterating from this vertex to the start of the tree
and to the end of the tree. Then I find the set intersection and get
the number of paths from this vertex. Then I repeat the algorithm for
all other vertices.
There is a simpler and, I think, faster O(n^2) approach, based on the so-called two-pointers method.
For each vertex v, walk simultaneously in both possible directions. Keep one "pointer" vl to a vertex moving in one direction and another, vr, moving in the other direction, and try to keep the distance from v to vl as close to the distance from v to vr as possible. Each time these distances become equal, you have equal paths.
for v in vertices
    vl = prev(v)
    vr = next(v)
    while (vl is still inside the tree)
          and (vr is still inside the tree)
        if dist(v,vl) < dist(v,vr)
            vl = prev(vl)
        else if dist(v,vr) < dist(v,vl)
            vr = next(vr)
        else // dist(v,vr) == dist(v,vl)
            ans = ans + 1
            vl = prev(vl)
            vr = next(vr)
(By precalculating the prefix sums, you can find dist in O(1).)
It's easy to see that no equal pair will be missed provided that you do not have zero-length edges.
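Here is a minimal C++ sketch of that loop (my code, not taken from the question; it assumes the chain is given as edge weights, with w[i] the weight of the edge between vertex i and vertex i+1):

#include <vector>

// Count all (v1, v, v2) tuples with dist(v1, v) == dist(v, v2).
// pre[i] holds the prefix sum dist(vertex 0, vertex i), so any
// distance along the chain is an O(1) difference of prefix sums.
long long countEqualPaths(const std::vector<long long>& w) {
    int n = static_cast<int>(w.size()) + 1;          // number of vertices
    std::vector<long long> pre(n, 0);
    for (int i = 1; i < n; ++i) pre[i] = pre[i - 1] + w[i - 1];

    long long ans = 0;
    for (int v = 0; v < n; ++v) {
        int vl = v - 1, vr = v + 1;                  // the two "pointers"
        while (vl >= 0 && vr < n) {
            long long dl = pre[v] - pre[vl];         // dist(v, vl)
            long long dr = pre[vr] - pre[v];         // dist(v, vr)
            if (dl < dr)      --vl;                  // left path shorter: extend it
            else if (dr < dl) ++vr;                  // right path shorter: extend it
            else { ++ans; --vl; ++vr; }              // equal paths found
        }
    }
    return ans;
}

For the example chain w = {3, 1, 1, 1} this returns 3: one pair each for b, c and d.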
Regarding a faster solution, if you want to list all pairs, then you can't do it faster, because the number of pairs will be O(n^2) in the worst case. But if you need only the amount of these pairs, there might exist faster algorithms.
UPD: I came up with another algorithm for calculating the count, which might be faster if your edges are rather short. If you denote the total length of your chain (the sum of all edge weights) as L, the algorithm runs in O(L log L). However, it is much more advanced, both conceptually and in coding.
Firstly, some theoretical reasoning. Consider some vertex v. Let us have two arrays, a and b: not C-style zero-indexed arrays, but arrays indexed from -L to L.
Let us define:

- for i > 0: a[i] = 1 iff there is a vertex at distance exactly i to the right of v, otherwise a[i] = 0;
- for i = 0: a[i] ≡ a[0] = 1;
- for i < 0: a[i] = 1 iff there is a vertex at distance exactly -i to the left of v, otherwise a[i] = 0.
A simple understanding of this array is as follows. Stretch your graph and lay it along the coordinate axis so that each edge has the length equal to its weight, and that vertex v lies in the origin. Then a[i]=1 iff there is a vertex at coordinate i.
For your example and for vertex "b" chosen as v:
        a--------b--c--d--e
   --|--|--|--|--|--|--|--|--|-->
    -4 -3 -2 -1  0  1  2  3  4
a:   0  1  0  0  1  1  1  1  0
For another array, array b, we define the values symmetrically with respect to the origin, as if we had inverted the direction of the axis:

- for i > 0: b[i] = 1 iff there is a vertex at distance exactly i to the left of v, otherwise b[i] = 0;
- for i = 0: b[i] ≡ b[0] = 1;
- for i < 0: b[i] = 1 iff there is a vertex at distance exactly -i to the right of v, otherwise b[i] = 0.
Now consider a third array, c, such that c[i] = a[i]*b[i], where the asterisk stands for ordinary multiplication. Obviously c[i] = 1 iff the path of length abs(i) to the left ends in a vertex, and the path of length abs(i) to the right also ends in a vertex. So for i > 0, each position with c[i] = 1 corresponds to a path you need. There are also negative positions (c[i] = 1 with i < 0), which just mirror the positive positions, and one more position with c[i] = 1, namely i = 0.
Calculate the sum of all elements in c. This sum is sum(c) = 2P + 1, where P is the total number of paths you need with v as their center. So if you know sum(c), you can easily determine P.
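As a check against the example above, take v = b. Then a[i] = 1 exactly at i ∈ {-3, 0, 1, 2, 3}, and b[i] = 1 exactly at i ∈ {-3, -2, -1, 0, 3}, so c[i] = 1 only at i ∈ {-3, 0, 3}. Hence sum(c) = 3 = 2P + 1 gives P = 1, which matches the single equal pair found for vertex b earlier.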
Let us now look more closely at the arrays a and b and how they change when we change the vertex v. Let us denote by v0 the leftmost vertex (the root of your tree) and by a0 and b0 the corresponding arrays a and b for that vertex.
For arbitrary vertex v denote d=dist(v0,v). Then it is easy to see that for vertex v the arrays a and b are just arrays a0 and b0 shifted by d:
a[i]=a0[i+d]
b[i]=b0[i-d]
It is obvious if you remember the picture with the tree stretched along a coordinate axis.
Now let us consider one more array, S (a single array for all vertices), and for each vertex v put the value of sum(c) into the element S[d] (d and c depend on v).
More precisely, let us define array S so that for each d
S[d] = sum_over_i(a0[i+d]*b0[i-d])
Once we know the S array, we can iterate over vertices and for each vertex v obtain its sum(c) simply as S[d] with d=dist(v,v0), because for each vertex v we have sum(c)=sum(a0[i+d]*b0[i-d]).
But the formula for S is very simple: S is just the convolution of the sequences a0 and b0. (The formula does not exactly match the definition of convolution, but it is easy to bring it into the exact definitional form.)
So what we now need is: given a0 and b0 (which we can calculate in O(L) time and space), calculate the S array. After that, we can iterate over the S array and simply extract the number of paths from each S[d] = 2P + 1.
Direct application of the formula above is O(L^2). However, the convolution of two sequences can be calculated in O(L log L) by applying the Fast Fourier transform algorithm. Moreover, you can apply a similar Number theoretic transform (don't know whether there is a better link) to work with integers only and avoid precision problems.
So the general outline of the algorithm becomes
calculate a0 and b0                            // O(L)
calculate S = corrected_convolution(a0, b0)    // O(L log L)
v0 = leftmost vertex (root)
for v in vertices:
    d = dist(v0, v)
    ans = ans + (S[d] - 1) / 2
(I call it corrected_convolution because S is not exactly a convolution, but a very similar object for which a similar algorithm can be applied. Moreover, you can even define S'[2*d]=S[d]=sum(a0[i+d]*b0[i-d])=sum(a0[i]*b0[i-2*d]), and then S' is the convolution proper.)
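For completeness, here is a C++ sketch of this outline (my code, not the answerer's). To keep it short, the corrected convolution is computed with the direct O(L^2) double loop; to get the O(L log L) bound you would replace that loop with an FFT- or NTT-based convolution. Note that since v0 is the leftmost vertex, b0 is just a0 mirrored, so S'[s] is simply the self-convolution of a0:

#include <vector>

long long countEqualPathsByConvolution(const std::vector<long long>& w) {
    int n = static_cast<int>(w.size()) + 1;
    std::vector<long long> pos(n, 0);                // pos[v] = dist(v0, v)
    for (int i = 1; i < n; ++i) pos[i] = pos[i - 1] + w[i - 1];
    long long L = pos[n - 1];                        // total chain length

    std::vector<int> a0(L + 1, 0);                   // a0[i] = 1 iff a vertex sits at distance i from v0
    for (int i = 0; i < n; ++i) a0[pos[i]] = 1;

    // conv[s] = number of ordered vertex pairs (p, q) with p + q = s;
    // this is S'[s]. Replace this double loop with FFT/NTT for O(L log L).
    std::vector<long long> conv(2 * L + 1, 0);
    for (long long p = 0; p <= L; ++p)
        if (a0[p])
            for (long long q = 0; q <= L; ++q)
                if (a0[q]) ++conv[p + q];

    long long ans = 0;
    for (int v = 0; v < n; ++v)
        ans += (conv[2 * pos[v]] - 1) / 2;           // S[d] = 2P + 1  =>  P = (S[d] - 1) / 2
    return ans;
}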

Related

Given n points, how can I find the number of points with given distance

I have an input of n unique points (X,Y) that are between 0 and 2^32 inclusive. The coordinates are integers.
I need to create an algorithm that finds the number of pairs of points with a distance of exactly 2018.
I have thought of checking against every other point, but that would be O(n^2), and I have to make it more efficient. I also thought of using a set or a vector sorted with a comparator based on distance from the origin, but that wouldn't help at all.
So how can I do it efficiently?
There is one Pythagorean triple with a hypotenuse of 2018: 1118² + 1680² = 2018².
Since all coordinates are integers, the only possible differences between the coordinates (both X and Y) of the two points are 0, 1118, 1680, and 2018.
Finding all pairs of points with a given difference between X (or Y) coordinates is a simple n log n operation.
Numbers other than 2018 might need a bit more work, because they might be members of more than one Pythagorean triple (for example, 2015 is the hypotenuse of 3 triples). If the number is not given as a constant but provided at run time, you will have to generate all triples with this hypotenuse. This may require some sqrt(N) effort (N is the hypotenuse, not the number of points). One can find a recipe on the math stackexchange, e.g. here (there are many others).
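A C++ sketch of this decomposition approach (my code, not the answerer's), counting each unordered pair once by only probing offsets with dy > 0, or dy == 0 and dx > 0:

#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Two points are at distance exactly 2018 iff their coordinate
// differences (|dx|, |dy|) are (2018, 0), (0, 2018), (1118, 1680)
// or (1680, 1118). For each point, look up the points that would
// complete such a pair "above" it (or strictly to its right on the
// same row), so no pair is counted twice. O(n log n) with a set.
long long countPairsAtDistance2018(
        const std::vector<std::pair<std::int64_t, std::int64_t>>& pts) {
    const std::pair<std::int64_t, std::int64_t> offsets[] = {
        {2018, 0},    {0, 2018},
        {1118, 1680}, {-1118, 1680},
        {1680, 1118}, {-1680, 1118},
    };
    std::set<std::pair<std::int64_t, std::int64_t>> s(pts.begin(), pts.end());
    long long count = 0;
    for (const auto& p : pts)
        for (const auto& d : offsets)
            if (s.count({p.first + d.first, p.second + d.second})) ++count;
    return count;
}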
You could try using a quadtree. First, sort your points into the quadtree. You should specify a lower limit for the cell size, e.g. 2048, which is a power of 2. Then iterate through the points and calculate distances to the points in the same cell and to the points in adjacent cells. That way you should be able to decrease the number of distance calculations drastically.
The main difficulty will probably be implementing the tree structure. You also have to find a way to locate adjacent cells (you must include the possibility of traversing upwards in the tree).
The complexity of this is probably O(n*log(n)) in the best case, but don't pin me down on that.
One additional word on the distance calculation: You are probably much faster if you don't do
dx = p1x - p2x;
dy = p1y - p2y;
if ( sqrt(dx*dx + dy*dy) == 2018 ) {
    ...
}
but
dx = p1x - p2x;
dy = p1y - p2y;
if ( dx*dx + dy*dy == 2018*2018 ) {
    ...
}
Squaring is faster than taking the square root. So just compare the square of the distance with the square of 2018.

Find the summation of forces between all possible pairs of points?

There are n points, each having two attributes:

1. Position (along an axis)
2. Attraction value (an integer)
Attraction force between two points A & B is given by:
Attraction_force(A, B) = (distance between them) * Max(Attraction_val_A, Attraction_val_B);
Find the summation of all the forces between all possible pairs of points?
I tried calculating and adding the forces between all pairs:
for(int i=0; i<n-1; i++) {
    for(int j=i+1; j<n; j++) {
        force += abs(P[i].pos - P[j].pos) * max(P[i].attraction_val, P[j].attraction_val);
    }
}
Example:
Point:           P1  P2  P3
Position:         2   3   4
Attraction val:   4   5   6
Force = abs(2 - 3) * max(4, 5) + abs(2 - 4) * max(4, 6) + abs(3 - 4) * max(5, 6) = 23
But this takes O(n^2) time, and I can't think of a way to reduce it further!
Scheme of a solution:

1. Sort all points by their attraction value and process them one by one, starting with the one with the lowest attraction.
2. For each point, quickly calculate the sum of distances to all previously added points. That can be done using any online Range Sum Query solution, such as a segment tree or a BIT. The key idea is that all processed points on one side of the current point are interchangeable: their count and the sum of their coordinates are enough to compute the sum of distances to them.
3. For each newly added point, multiply the sum of distances obtained in step 2 by the point's attraction value and add that to the answer (a code sketch follows after the notes below).
Intuitive observations that I made in order to invent this solution:

- We have two "bad" (somewhat "discrete") functions here: max and the absolute value (in the distance).
- We can get rid of max by sorting our points and processing them in a specific order.
- We can get rid of the absolute value if we process points to the left and points to the right separately.
- After all these transformations, we have to calculate something which, after some simple algebraic manipulation, converts to an online RSQ problem.
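Here is a C++ sketch of the whole scheme (my code, not the answerer's), using two Fenwick trees (BITs) over compressed coordinates, one for counts and one for coordinate sums:

#include <algorithm>
#include <utility>
#include <vector>

struct BIT {
    std::vector<long long> t;
    explicit BIT(int n) : t(n + 1, 0) {}
    void add(int i, long long v) {                   // point update
        for (++i; i < (int)t.size(); i += i & -i) t[i] += v;
    }
    long long sum(int i) const {                     // prefix sum over [0, i]
        long long r = 0;
        for (++i; i > 0; i -= i & -i) r += t[i];
        return r;
    }
};

// pts[i] = {attraction value, position}
long long totalForce(std::vector<std::pair<long long, long long>> pts) {
    std::sort(pts.begin(), pts.end());               // ascending attraction, so the current
                                                     // point always carries the pair's max
    std::vector<long long> xs;                       // coordinate compression
    for (const auto& p : pts) xs.push_back(p.second);
    std::sort(xs.begin(), xs.end());
    xs.erase(std::unique(xs.begin(), xs.end()), xs.end());

    int m = (int)xs.size();
    BIT cnt(m), sum(m);
    long long ans = 0;
    for (const auto& [a, x] : pts) {
        int i = (int)(std::lower_bound(xs.begin(), xs.end(), x) - xs.begin());
        long long cl = cnt.sum(i), sl = sum.sum(i);  // processed points with y <= x
        long long cr = cnt.sum(m - 1) - cl;          // processed points with y > x
        long long sr = sum.sum(m - 1) - sl;
        long long distSum = (cl * x - sl) + (sr - cr * x);  // sum of |x - y|
        ans += a * distSum;
        cnt.add(i, 1);
        sum.add(i, x);
    }
    return ans;
}

For the example above (positions 2, 3, 4 with attraction values 4, 5, 6), totalForce({{4, 2}, {5, 3}, {6, 4}}) returns 23, matching the question's hand computation.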
An algorithm of O(N^2) is optimal, because you need the actual distance between all possible pairs.

A* Pathfinding without diagonal movement

I'm currently trying to implement A* in C++ using: Link
However, for the first version I decided not to include diagonal movement.
In the summary section, in point c, where it loops over the neighbours of the current square:
for each neighbour of the current square (above, below, left, right)
    if neighbour on closed list or not walkable {
        continue
    }
    if neighbour not in open list {
        add to open list
        set parent of neighbour to current square
        update F, G, H values
    } else if neighbour is on open list {
        check to see if this path to that square is better,
        using G cost as the measure. A lower G cost means that this is a better path.
        If so, change the parent of the square to the current square,
        and recalculate the G and F scores of the square.
    }
If I am only allowing 4-direction movement, do I still need to check the G cost to see if the path to that square is better? For example, starting at the start point, all 4 neighbours of the start point will have the same G.
The point of checking for the lower G cost is to set a new path for that square if a better one is eventually found. You may initially find a path from A to a square C. But when searching later, you'll find a different path to a neighbor of C. If C is not in the closed list, the G cost of the new path to C may be lower than the first, so you want to update C with the better G (and thus F) value.
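A hypothetical C++ sketch of just that branch (names are mine, not from the linked article; it assumes neighbour is already on the open list, so the maps already hold values for it):

#include <map>
#include <utility>

using Square = std::pair<int, int>;                  // (x, y) grid coordinates

// If the path through `current` is cheaper, re-parent `neighbour`
// and recalculate its G and F scores.
void maybeImprovePath(const Square& current, const Square& neighbour,
                      double edgeCost, double neighbourH,
                      std::map<Square, double>& g,
                      std::map<Square, double>& f,
                      std::map<Square, Square>& parent) {
    double tentativeG = g[current] + edgeCost;
    if (tentativeG < g[neighbour]) {                 // a better path was found
        g[neighbour] = tentativeG;                   // update G ...
        f[neighbour] = tentativeG + neighbourH;      // ... and F = G + H
        parent[neighbour] = current;                 // re-parent the square
    }
}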
Let's analyze the situation. The basic point is that every edge in the graph has the same weight.
Assume that you are currently at vertex c and you analyze neighbour n. Then the updated distance for neighbour n would be distance(c) + w, where w is the uniform edge length. Now the question is whether n can have a larger distance than that via a different path.
Assume n has a previous parent p that leads to a longer path. Then n has the distance distance(p) + w. For that expression to be larger than the updated cost from c, the following must hold:
distance(p) + w > distance(c) + w
distance(p) > distance(c)
So p would have to be farther away from the start than c. If you used Dijkstra's algorithm, this would not be possible because this would mean that p would be fixed after c. But you use A* and determine the order with the heuristic. Hence, the following two conditions need to hold:
distance(p) > distance(c)
distance(p) + heuristic(p) <= distance(c) + heuristic(c)
<=> heuristic(p) <= heuristic(c) - (distance(p) - distance(c))
So it comes down to your heuristic. The widely used Manhattan distance heuristic allows this situation. So if you use this heuristic, you have to check for a lower cost. If your heuristic does not allow the above condition (for example, a constant heuristic, as in the case of Dijkstra), you don't.

why does floyd warshall just use one distance matrix?

I read the pseudocode of the Floyd-Warshall algorithm:
let dist be a |V| × |V| array of minimum distances initialized to ∞ (infinity)
for each vertex v
    dist[v][v] ← 0
for each edge (u,v)
    dist[u][v] ← w(u,v)  // the weight of the edge (u,v)
for k from 1 to |V|
    for i from 1 to |V|
        for j from 1 to |V|
            if dist[i][j] > dist[i][k] + dist[k][j]
                dist[i][j] ← dist[i][k] + dist[k][j]
            end if
But it uses just one dist matrix to store the distances.
I think there should be n dist matrices, where n is the number of vertices, or at least two dist matrices: one stores the current shortest paths using intermediate vertices up to k-1, the other stores the shortest paths using intermediate vertices up to k; then the first one stores the shortest paths up to k+1, and so on.
How can we store the new shortest path distances of round k in the same matrix that still holds the distances of round k-1?
The picture (in the source I read) shows that we need D0, D1, D2, ..., D(n).
You're right in the sense that the original formula requires that calculations for step k use the calculations from step k-1:

dist_k[i][j] = min(dist_{k-1}[i][j], dist_{k-1}[i][k] + dist_{k-1}[k][j])

That can be organized easily if, as you say, the first matrix is used to store values from step k-1, the second is used to store values from step k, the first one is used again to store values from step k+1, etc.
But if we use the same matrix when updating values, in the above formula we might accidentally use dist_k[i][k] instead of dist_{k-1}[i][k] if the value for index (i,k) has already been updated during the current round k, or we might get dist_k[k][j] instead of dist_{k-1}[k][j] if the value for index (k,j) has been updated. Won't that be a violation of the algorithm, since we would be using the wrong recursive update formula?
Well, not really. Remember, the Floyd-Warshall algorithm deals with the "no negative cycles" constraint, which means that there is no cycle with edges that sum to a negative value. This means that for any k the shortest path from node k to node k is 0 (otherwise there would be a path from k to k with edges that sum to a negative value). So by definition:

dist_{k-1}[k][k] = 0

Now, let's just take the first formula and replace j with k:

dist_k[i][k] = min(dist_{k-1}[i][k], dist_{k-1}[i][k] + dist_{k-1}[k][k]) = dist_{k-1}[i][k]

And then let's replace i with k in the same formula:

dist_k[k][j] = min(dist_{k-1}[k][j], dist_{k-1}[k][k] + dist_{k-1}[k][j]) = dist_{k-1}[k][j]

So, basically, dist_k[i][k] will have the same value as dist_{k-1}[i][k], and dist_k[k][j] will have the same value as dist_{k-1}[k][j]. It therefore doesn't matter whether these values are updated during round k or not, and you can update the same matrix while reading it without breaking the algorithm.
You are partially correct here.
The output of the Floyd-Warshall algorithm (i.e. the N×N distance matrix) DOESN'T help to reconstruct the actual shortest path between any two given vertices.
These paths can be recovered if we retain a parent matrix P, such that it stores the last intermediate vertex used for each vertex pair (x, y). Say this value is k.
The shortest path from x to y is the concatenation of the shortest path from x to k with the shortest path from k to y, which can be reconstructed recursively given the matrix P.
Note, however, that most all-pairs applications need only the resulting distance matrix. These jobs are what Floyd's algorithm was designed for.
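A short C++ sketch of both pieces (my code, following the description above): the in-place Floyd-Warshall filling a parent matrix P, and the recursive reconstruction of the intermediate vertices:

#include <iostream>
#include <vector>

// Assumes dist is pre-filled with edge weights, 0 on the diagonal and a
// large-but-safe "infinity" (e.g. LLONG_MAX / 4) elsewhere, so that
// INF + INF does not overflow. P[x][y] keeps the last intermediate
// vertex used to improve the x->y distance, or -1 for a direct edge.
void floydWarshallWithParents(std::vector<std::vector<long long>>& dist,
                              std::vector<std::vector<int>>& P) {
    int n = (int)dist.size();
    P.assign(n, std::vector<int>(n, -1));
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (dist[i][k] + dist[k][j] < dist[i][j]) {
                    dist[i][j] = dist[i][k] + dist[k][j];
                    P[i][j] = k;                     // remember the intermediate vertex
                }
}

// Print the intermediate vertices of the shortest x->y path:
// it is (shortest x->k path) + k + (shortest k->y path).
void printIntermediate(const std::vector<std::vector<int>>& P, int x, int y) {
    int k = P[x][y];
    if (k == -1) return;                             // direct edge, nothing in between
    printIntermediate(P, x, k);
    std::cout << k << ' ';
    printIntermediate(P, k, y);
}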

Populating STL surface mesh uniformly with points

I would like to be able to take an STL file (a triangulated surface mesh) and populate the mesh with points such that the density of points is constant. I am writing the program in Fortran.
So far I can read in binary STL files and store vertices and surface normals. Here is an example file which has been read in (2D view for simplicity).
My current algorithm fills each triangle using the following formula:
x = v1 + a(v2 - v1) + b(v3 - v1) (from here)
where v1, v2, v3 are the vertices of the triangle and x is an arbitrary position within the triangle (or on its edges). "a" and "b" vary between 0 and 1, and their sum is less than 1. They represent the distance along two of the edges (which start from the same vertex). The gap between the particles should be the same for each edge. Below is an example of the results I get:
The resulting particle density is nowhere near uniform. Do you have any idea how I can adapt my code so that the density is constant from triangle to triangle? Relevant code below:
! Do for every triangle in the STL file
DO i = 1, nt
    ! The distance vector from the second point to the first
    v12 = (/v(1,j+1)-v(1,j), v(2,j+1)-v(2,j), v(3,j+1)-v(3,j)/)
    ! The distance vector from the third point to the first
    v13 = (/v(1,j+2)-v(1,j), v(2,j+2)-v(2,j), v(3,j+2)-v(3,j)/)
    ! The scalar distance from the second point to the first
    dist_a = sqrt( v12(1)**2 + v12(2)**2 + v12(3)**2 )
    ! The scalar distance from the third point to the first
    dist_b = sqrt( v13(1)**2 + v13(2)**2 + v13(3)**2 )
    ! The number of particles to be generated along the first edge vector
    no_a = INT(dist_a / spacing)
    ! The number of particles to be generated along the second edge vector
    no_b = INT(dist_b / spacing)
    ! For all the particles to be generated along the first edge
    DO a = 1, no_a
        ! For all the particles to be generated along the second edge
        DO b = 1, no_b
            IF ((REAL(a)/no_a)+(REAL(b)/no_b)>1) EXIT
            temp(1) = v(1,j) + (REAL(a)/no_a)*v12(1) + (REAL(b)/no_b)*v13(1)
            temp(2) = v(2,j) + (REAL(a)/no_a)*v12(2) + (REAL(b)/no_b)*v13(2)
            temp(3) = v(3,j) + (REAL(a)/no_a)*v12(3) + (REAL(b)/no_b)*v13(3)
            k = k + 1
            s_points(k, 1:3) = (/temp(1), temp(2), temp(3)/)
        END DO
    END DO
    j = j + 3
END DO
The solution I went with was to split each triangle into two right-angled triangles. This is done by projecting the vertex opposite the longest edge orthogonally onto the longest edge. This splits the triangle into two smaller triangles, each with a 90-degree angle. A detailed answer on how to do this can be found here. By generating points along the two perpendicular edges of each resulting right triangle, a uniform distribution of particles can be achieved.
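The projection itself can be sketched as follows (a small C++ illustration of the geometry, not the author's Fortran; because ab is the longest edge, the foot of the perpendicular always falls strictly inside it):

#include <array>

using Vec3 = std::array<double, 3>;

// Orthogonally project vertex c onto the edge ab; the returned foot
// point splits triangle abc into two right-angled triangles.
Vec3 projectOntoEdge(const Vec3& a, const Vec3& b, const Vec3& c) {
    const Vec3 ab{b[0] - a[0], b[1] - a[1], b[2] - a[2]};
    const Vec3 ac{c[0] - a[0], c[1] - a[1], c[2] - a[2]};
    // Parameter t in (0, 1) of the foot point along ab.
    const double t = (ab[0] * ac[0] + ab[1] * ac[1] + ab[2] * ac[2])
                   / (ab[0] * ab[0] + ab[1] * ab[1] + ab[2] * ab[2]);
    return {a[0] + t * ab[0], a[1] + t * ab[1], a[2] + t * ab[2]};
}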
This method needs to be adapted so that particles are not generated more than once along edges that are common to multiple triangles. I have not done this yet. See the image below for the results. This solution does not achieve an isotropic distribution, but that is not a concern for my intended application.
(Thanks to Vladimir F for his comments and advice on norm2; I tried to implement his approach but was not competent enough to get it to work.)