Axis of symmetry in cyclic graph - c++

I have to write a program in C++ which returns the number of axes of symmetry in a cyclic graph.
A cyclic graph has an axis of symmetry when the values of the vertices or edges on the left of the axis are a mirror image of the values on the right.
The axis of symmetry may pass through both vertices and edges.
For example:
Is there any way to do this faster than O(n^2)?

n.m.'s answer is nearly correct, but it does not handle every case.
Let's call one of the nodes the start node, and the axis passing through the start node the main axis.
Flipping the graph over any axis is equivalent to flipping it over the main axis and then rotating it:
after the rotation, the start node can land on any other node's position (and for every such position we can find the axis that produces it).
If we store the graph as a string, then a flipped graph is described by the reversed string, cyclically shifted by 0 to N-1 positions.
Equality of those strings means equality of the graphs. So the number of such matches equals the number of occurrences of the reversed string in the graph's string concatenated with itself.
So yes, KMP does the trick with O(N) complexity.
But you need to handle the case where str equals reverse(str), because the match is then counted at both shift 0 and shift N, even though they describe the same axis. So do not search in the full concatenation of str with itself, but only in its first (2*N - 1) characters; that gives the correct count in every case.
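A minimal sketch of the idea above, assuming the graph is encoded as a string with one character per vertex in cyclic order. The names countOccurrences and countSymmetryAxes are made up for this example.

```cpp
#include <string>
#include <vector>

// Counts (possibly overlapping) occurrences of pattern in text using the
// KMP failure function, in O(|text| + |pattern|) time.
int countOccurrences(const std::string& text, const std::string& pattern) {
    const int m = static_cast<int>(pattern.size());
    if (m == 0) return 0;
    std::vector<int> fail(m, 0);
    for (int i = 1, k = 0; i < m; ++i) {
        while (k > 0 && pattern[i] != pattern[k]) k = fail[k - 1];
        if (pattern[i] == pattern[k]) ++k;
        fail[i] = k;
    }
    int count = 0;
    for (int i = 0, k = 0; i < static_cast<int>(text.size()); ++i) {
        while (k > 0 && text[i] != pattern[k]) k = fail[k - 1];
        if (text[i] == pattern[k]) ++k;
        if (k == m) {
            ++count;
            k = fail[k - 1];
        }
    }
    return count;
}

// Assumption: s holds one character per vertex, in cyclic order.
// Searches reverse(s) in the first 2N-1 characters of s+s, so that
// shifts 0 and N are not both counted.
int countSymmetryAxes(const std::string& s) {
    if (s.empty()) return 0;
    const std::string doubled = (s + s).substr(0, 2 * s.size() - 1);
    const std::string reversed(s.rbegin(), s.rend());
    return countOccurrences(doubled, reversed);
}
```

For instance, a 4-cycle labelled "abab" has two axes (one through each pair of equal opposite vertices), and "aaaa" has all four.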


Data structure for Hamming cube

I have a Hamming cube, of general dimension, but in practice, usually, the dimension ranges from 3 to 6.
The search algorithm is:
Input: any vertex, `v`.
Find all vertices that lie in Hamming distance 1 from `v`.
Find all vertices that lie in Hamming distance 2 from `v`.
I do not know in advance how far away from `v` I will need to go. I might stop at distance 1, for example.
For instance, given this cube:
and v = 100, I would need to get the vertices at Hamming distance 1, which are 000, 101, 110 (in any order). Then, I might need to get those at distance 2, namely 111, 001, 010. In the unlikely event of needing the vertices at distance 3 too, I would get 011 as well.
A vertex of the cube may contain IDs (integers).
Which would be an appropriate Data structure to store this cube and efficiently search it? I am not really interested in other operations.
I thought about sorting all the bit sequences somehow, so that I can easily access them, but didn't get anything to work.
My approach so far:
Data structure: use a hash table (specifically std::unordered_map), where the keys are the vertices and the values are the IDs.
Algorithm: given a vertex `v`, generate all sequences of bits within Hamming distance t (i.e. 1, 2, ...).
However, this requires me to call a function every time a vertex `v` arrives (which often happens). I have a function to achieve this, based on this.
I'm rusty with C++, so I'll keep this abstract.
The neighbors of a given point of your Hamming cube are easily computable. Given a vertex's bit sequence, flip each bit individually.
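As a rough sketch of that bit flipping, assuming a vertex is stored as an unsigned integer whose low dim bits are its bit sequence (the signature of neighbors here is an assumption of this example):

```cpp
#include <cstdint>
#include <vector>

// Returns the dim vertices at Hamming distance 1 from v,
// obtained by flipping each of the low dim bits in turn.
std::vector<std::uint32_t> neighbors(std::uint32_t v, unsigned dim) {
    std::vector<std::uint32_t> result;
    result.reserve(dim);
    for (unsigned bit = 0; bit < dim; ++bit) {
        result.push_back(v ^ (std::uint32_t{1} << bit));
    }
    return result;
}
```

With v = 100 (decimal 4) and dim = 3 this yields 101, 110 and 000, the same three vertices as in the question.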
You could precompute that, though. You could cache the results of your neighbors() function, or you could save them to an array. Each vertex would have its own neighbors, so you have one array for each vertex. That gives you, essentially, your adjacency list.
With that adjacency list, you can search your Hamming cube using depth-limited search, a variant of DFS (or BFS, I guess—but space complexity is worse) that only goes k units deep.
Your data structure is a good choice, but consider that your vertices are binary strings, so they cover all points from 0 to 2^n - 1. You might as well just use an array—lookup will still be O(1), and it'll be more compact because there aren't unused buckets sitting around.
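Here is one way the array idea could look. The HammingCube struct and its member names are hypothetical, and instead of a depth-limited search this sketch simply enumerates every bit mask with exactly t set bits, which is cheap for dimensions 3 to 6 (at most 64 masks).

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: ids[v] holds the IDs stored at vertex v,
// where v is the vertex's bit sequence read as an integer.
struct HammingCube {
    unsigned dim;
    std::vector<std::vector<int>> ids;

    explicit HammingCube(unsigned d) : dim(d), ids(std::size_t{1} << d) {}

    // All vertices at Hamming distance exactly t from v:
    // XOR v with every mask that has exactly t bits set.
    std::vector<std::uint32_t> verticesAtDistance(std::uint32_t v, unsigned t) const {
        std::vector<std::uint32_t> result;
        for (std::uint32_t mask = 0; mask < ids.size(); ++mask) {
            if (std::bitset<32>(mask).count() == t) {
                result.push_back(v ^ mask);
            }
        }
        return result;
    }
};
```

Because the distance is passed in per call, you can ask for distance 1 first and only ask for distance 2 or 3 if you turn out to need them.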

Algorithm for fast array comparison and replacing elements with closest value. (Tracking Points)

I have two arrays, currPoints and prevPoints. They are not necessarily the same size. For each element in currPoints, I want to find the value in prevPoints that is closest to it and replace that value with the element from currPoints.
After applying the algorithm
So what is the best algorithm/method for this? It needs to be fast.
Context: If it helps, I am trying to work on a tracking algorithm that takes points from two consecutive frames in a video and tries to figure out which points in the first frame correspond to points in the second frame. I hope to track objects and tag them with an ID this way. Speed is crucial as processing is to be done in realtime.
You need to sort both arrays first. But remember the original order of the prevPoints array, as you need to restore it at the end.
So after sorting:
Now you need to figure out which of the currPoints values should go into prevPoints. The algorithm is similar to merging two sorted arrays, except that you replace values instead of merging them.
Initially both pointers are at the start of their arrays. The 1 from currPoints should replace the 2 in prevPoints, because the value in currPoints is smaller than the one in prevPoints and you know that the remaining values in prevPoints can only be higher than 2 (the array is sorted, remember). Replace it and advance the pointers.
Now the curr pointer is at 9 and the prev pointer is at 5. Calculate the absolute difference, and keep track of the minimum absolute difference encountered so far (4 in this case) together with the value that produced it. Move the prev pointer forward, as the curr pointer is pointing to a higher value.
Now the prev pointer is at 10 and the curr pointer is at 9. 9 is less than 10, so a replacement has to be made. As this absolute difference is smaller than the earlier one (1 < 4), 10 is replaced by 9.
Now the prevpointer is at 13 and currpointer is at 15.
Proceed in the same fashion.
Rearrange the prevPoints array to the original orientation.
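A possible sketch of this two-pointer walk for one-dimensional values. The function name replaceNearest is made up, distances are measured against the original prevPoints values, and if several current values share the same nearest previous value the last one wins; those choices are assumptions of this example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Replaces, for every value in curr, the closest value in prev.
// Returns prev in its original order with the replacements applied.
std::vector<double> replaceNearest(std::vector<double> prev, std::vector<double> curr) {
    if (prev.empty()) return prev;
    const std::vector<double> original = prev;

    // Sort indices instead of values so the original order is kept.
    std::vector<std::size_t> order(prev.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return original[a] < original[b]; });
    std::sort(curr.begin(), curr.end());

    std::size_t j = 0;  // position in the sorted view of prev; never moves back
    for (double c : curr) {
        // Advance while the next previous value is at least as close to c.
        while (j + 1 < order.size() &&
               std::abs(original[order[j + 1]] - c) <= std::abs(original[order[j]] - c)) {
            ++j;
        }
        prev[order[j]] = c;
    }
    return prev;
}
```

After the two sorts, the walk itself is linear in the combined size of the arrays, because the pointer into prevPoints only ever moves forward.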
Hope this helps!!!
We sort one list by x position and a second list by y position, so each point has a position in each list. For a nearest-neighbour search (at least the way I came up with), you find the position of the query point in each list through a binary search. That gives four directions to travel in, ±x and ±y, and we travel in each of them until the distance along that one coordinate alone exceeds the best distance found so far.
So we search in each direction. Say the closest point so far is at a distance of 25: if the next coordinate in the +x direction is more than 25 away in x alone, we can stop, because even if the change in y is 0, that point cannot be closer.
This makes for an effective closest-point algorithm: building the two sorted lists costs O(n log n), but we only need to build them once, and after that we can find the nearest point for each remaining query in something like O(log n) time. Find the position in the x-sorted list, find the position in the y-sorted list, then spiral out until every direction has been cut off and you have certainly found the nearest point. Since the scaffolding is the same for every query, it should end up being quite quick.
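To illustrate the cut-off rule, here is a simplified sketch that uses only the x-sorted list and walks outward in the -x and +x directions; the y-sorted list and the spiral are left out. Pt and nearestByXSweep are names invented for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

struct Pt { double x, y; };

// Assumes sortedByX is sorted by x. Returns the index of the point
// nearest to q, or -1 if the list is empty.
int nearestByXSweep(const std::vector<Pt>& sortedByX, const Pt& q) {
    const int n = static_cast<int>(sortedByX.size());
    if (n == 0) return -1;

    // Binary search for the first point with x >= q.x.
    auto it = std::lower_bound(sortedByX.begin(), sortedByX.end(), q.x,
                               [](const Pt& p, double x) { return p.x < x; });
    int right = static_cast<int>(it - sortedByX.begin());
    int left = right - 1;

    double best = std::numeric_limits<double>::infinity();
    int bestIdx = -1;
    auto consider = [&](int i) {
        const double d = std::hypot(sortedByX[i].x - q.x, sortedByX[i].y - q.y);
        if (d < best) { best = d; bestIdx = i; }
    };

    while (left >= 0 || right < n) {
        if (left >= 0) {
            // Stop going left once the x gap alone is at least the best distance.
            if (q.x - sortedByX[left].x >= best) left = -1;
            else consider(left--);
        }
        if (right < n) {
            if (sortedByX[right].x - q.x >= best) right = n;
            else consider(right++);
        }
    }
    return bestIdx;
}
```

The worst case is still linear (for example when all points share the same x), so this only pays off when the points are reasonably spread out.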
Though given your actual test case you might want to come up with something that is simply a very effective heuristic.
Simply matching nearest points seems rather naive. If we are tracking the same thing from frame to frame, the distance a point travels from F1 to F2 should be roughly equal to the distance it travelled from F0 to F1. If we assume all these points are travelling in roughly straight lines, we can do a much better job than simply taking closest points. We can estimate the curves these points are following. If we predict a point's position in F2 by extrapolating from F0 and F1 and, lo and behold, there is a point really close to that position, then we can be quite sure we nailed it.
Equally, one would assume that all the points of an object travel in roughly the same direction. If each point travels +5,+5 from F0 to F1, not only can we guess their positions in F2, but we can also tell rather reliably that these points make up the same object.
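A small sketch of the prediction step, assuming constant velocity between frames. Point, predictNext, matchPrediction and the maxDist threshold are all hypothetical names for this example.

```cpp
#include <cmath>
#include <vector>

struct Point { double x, y; };

// Constant-velocity guess: the step from F1 to F2 repeats the step from F0 to F1.
Point predictNext(const Point& f0, const Point& f1) {
    return {2.0 * f1.x - f0.x, 2.0 * f1.y - f0.y};
}

// Returns the index of the candidate closest to the predicted position,
// or -1 if no candidate lies within maxDist of it.
int matchPrediction(const Point& predicted, const std::vector<Point>& candidates,
                    double maxDist) {
    int best = -1;
    double bestDist = maxDist;
    for (int i = 0; i < static_cast<int>(candidates.size()); ++i) {
        const double d = std::hypot(candidates[i].x - predicted.x,
                                    candidates[i].y - predicted.y);
        if (d <= bestDist) {
            bestDist = d;
            best = i;
        }
    }
    return best;
}
```

A point that moved from (0,0) to (5,5) is predicted at (10,10); a detection at (10.5,9.5) would then be accepted with a threshold of 2, while with a threshold of 0.1 nothing matches and the track could be treated as lost.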

Finding shortest path in a graph, with additional restrictions

I have a graph with 2n vertices where every edge has a defined length. It looks like this:
I'm trying to find the length of the shortest path from u to v (smallest sum of edge lengths), with 2 additional restrictions:
The number of blue edges that the path contains is the same as the number of red edges.
The number of black edges that the path contains is not greater than p.
I have come up with an exponential-time algorithm that I think would work. It iterates through all binary combinations of length n - 1 that represent the path starting from u in the following way:
0 is a blue edge
1 is a red edge
There's a black edge whenever
the combination starts with 1. The first edge (from u) is then the first black one on the left.
the combination ends with 0. The last edge (to v) is then the last black one on the right.
adjacent digits are different. That means we went from a blue edge to a red edge (or vice versa), so there's a black one in between.
This algorithm would ignore the paths that don't meet the 2 requirements mentioned earlier and calculate the length for the ones that do, and then find the shortest one. However doing it this way would probably be awfully slow and I'm looking for some tips to come up with a faster algorithm. I suspect it's possible to achieve with dynamic programming, but I don't really know where to start. Any help would be very appreciated. Thanks.
This seems like a dynamic programming problem to me.
In here, v,u are arbitrary nodes.
Source node: s
Target node: t
For a node v, such that its outgoing edges are (v,u1) [red/blue], (v,u2) [black].
D(v,i,k) = min { ((v,u1) is red ? D(u1,i+1,k) : D(u1,i-1,k)) + w(v,u1),
                 D(u2,i,k-1) + w(v,u2) }
D(t,0,k) = 0          for 0 <= k <= p
D(v,i,k) = infinity   for k < 0     // note: for any v
D(t,i,k) = infinity   for i != 0
v - the current node
i - #reds_traversed - #blues_traversed
k - #black_edges_left
The stop clauses are at the target node: you end when reaching it, and you are allowed to reach it only with i = 0 and with 0 <= k <= p, that is, without having overspent the black-edge budget.
The recursive call is checking at each point "what is better? going through black or going though red/blue", and choosing the best solution out of both options.
The idea is, D(v,i,k) is the optimal result to go from v to the target (t), #reds-#blues used is i, and you can use up to k black edges.
From this, we can conclude D(s,0,p) is the optimal result to reach the target from the source.
Since |i| <= n, k<=p<=n - the total run time of the algorithm is O(n^3), assuming implemented in Dynamic Programming.
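Below is a sketch of such a dynamic program, written forwards (column by column) rather than as the recursion above. The graph layout is an assumption reconstructed from the question's description: a ladder with n columns, where blue[j] and red[j] join column j to column j+1 on the bottom and top row, black[j] is the rung in column j, u is the bottom-left vertex, v is the top-right vertex, and paths only move left to right. The function name and the return value -1 for "no valid path" are also made up.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// cost[side][reds][blacks]: cheapest way to stand in the current column on
// the given side (0 = blue row, 1 = red row) with those edge counts used.
long long shortestBalancedPath(const std::vector<long long>& blue,
                               const std::vector<long long>& red,
                               const std::vector<long long>& black,
                               int p) {
    const long long INF = std::numeric_limits<long long>::max() / 4;
    const int n = static_cast<int>(black.size());
    const int steps = n - 1;  // horizontal edges on any left-to-right path
    if (n == 0 || steps % 2 != 0 || p < 1) return -1;
    if (static_cast<int>(blue.size()) != steps ||
        static_cast<int>(red.size()) != steps) return -1;
    const int half = steps / 2;  // required number of red (and of blue) edges

    using Row = std::vector<long long>;
    using Table = std::vector<std::vector<Row>>;
    const Table empty(2, std::vector<Row>(half + 1, Row(p + 1, INF)));

    Table cur = empty;
    cur[0][0][0] = 0;  // start at u on the blue row
    for (int j = 0; j < n; ++j) {
        // Optionally cross the rung in column j (at most once per column).
        Table crossed = cur;
        for (int s = 0; s < 2; ++s)
            for (int r = 0; r <= half; ++r)
                for (int b = 0; b < p; ++b)
                    if (cur[s][r][b] < INF)
                        crossed[1 - s][r][b + 1] = std::min(
                            crossed[1 - s][r][b + 1], cur[s][r][b] + black[j]);
        if (j == steps) { cur = crossed; break; }

        // Take the horizontal edge to column j + 1.
        Table next = empty;
        for (int r = 0; r <= half; ++r)
            for (int b = 0; b <= p; ++b) {
                if (crossed[0][r][b] < INF)
                    next[0][r][b] = std::min(next[0][r][b], crossed[0][r][b] + blue[j]);
                if (r < half && crossed[1][r][b] < INF)
                    next[1][r + 1][b] = std::min(next[1][r + 1][b], crossed[1][r][b] + red[j]);
            }
        cur = next;
    }
    // Finish at v on the red row with exactly half red edges.
    const long long best = *std::min_element(cur[1][half].begin(), cur[1][half].end());
    return best < INF ? best : -1;
}
```

The table has O(n * p) entries per column and there are n columns, which matches the O(n^3) bound when p <= n. Only two columns are kept in memory because this version returns the length and does not backtrack the path.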
Edit: Somehow I looked at the "Finding shortest path" phrase in the question and ignored the "length of" phrase where the original question later clarified intent. So both my answers below store lots of extra data in order to easily backtrack the correct path once you have computed its length. If you don't need to backtrack after computing the length, my crude version can change its first dimension from N to 2 and just store one odd J and one even J, overwriting anything older. My faster version can drop all the complexity of managing J,R interactions and also just store its outer level as [0..1][0..H] None of that changes the time much, but it changes the storage a lot.
To understand my answer, first understand a crude N^3 answer: (I can't figure out whether my actual answer has better worst case than crude N^3 but it has much better average case).
Note that N must be odd, represent that as N=2H+1. (P also must be odd. Just decrement P if given an even P. But reject the input if N is even.)
Store costs using 3 real coordinates and one implied coordinate:
J = column 0 to N
R = count of red edges 0 to H
B = count of black edges 0 to P
S = side odd or even (S is just B%2)
We will compute/store cost[J][R][B] as the lowest cost way to reach column J using exactly R red edges and exactly B black edges. (We also used J-R blue edges, but that fact is redundant).
For convenience write to cost directly but read it through an accessor c(j,r,b) that returns BIG when r<0 || b<0 and returns cost[j][r][b] otherwise.
Then the innermost step is just:
if (S)
    cost[J+1][R][B] = red[J]+min( c(J,R-1,B), c(J,R-1,B-1)+black[J] );
else
    cost[J+1][R][B] = blue[J]+min( c(J,R,B), c(J,R,B-1)+black[J] );
Initialize cost[0][0][0] to zero and for the super crude version initialize all other cost[0][R][B] to BIG.
You could super crudely just loop through in increasing J sequence and whatever R,B sequence you like computing all of those.
At the end, we can find the answer as:
min( min(cost[N][H][all odd]), black[N]+min(cost[N][H][all even]) )
But half the R values aren't really part of the problem. In the first half any R>J are impossible and in the second half any R<J+H-N are useless. You can easily avoid computing those. With a slightly smarter accessor function, you could avoid using the positions you never computed in the boundary cases of ones you do need to compute.
If any new cost[J][R][B] is not smaller than a cost of the same J, R, and S but lower B, that new cost is useless data. If the last dim of the structure were map instead of array, we could easily compute in a sequence that drops that useless data from both the storage space and the time. But that reduced time is then multiplied by log of the average size (up to P) of those maps. So probably a win on average case, but likely a loss on worst case.
Give a little thought to the data type needed for cost and the value needed for BIG. If some precise value in that data type is both as big as the longest path and as small as half the max value that can be stored in that data type, then that is a trivial choice for BIG. Otherwise you need a more careful choice to avoid any rounding or truncation.
If you followed all that, you probably will understand one of the better ways that I thought was too hard to explain: This will double the element size but cut the element count to less than half. It will get all the benefits of the std::map tweak to the basic design without the log(P) cost. It will cut the average time way down without hurting the time of pathological cases.
Define a struct CB that contains cost and black count. The main storage is a vector<vector<CB>>. The outer vector has one position for every valid J,R combination. Those are in a regular pattern so we could easily compute the position in the vector of a given J,R or the J,R of a given position. But it is faster to keep those incrementally so J and R are implied rather than directly used. The vector should be reserved to its final size, which is approx N^2/4. It may be best if you precompute the index for H,0.
Each inner vector has C,B pairs in strictly increasing B sequence and, within each S, strictly decreasing C sequence. Inner vectors are generated one at a time (in a temp vector) then copied to their final location and only read (not modified) after that. Within generation of each inner vector, candidate C,B pairs will be generated in increasing B sequence. So keep the position of bestOdd and bestEven while building the temp vector. Then each candidate is pushed into the vector only if it has a lower C than best (or best doesn't exist yet). We can also treat all B<P+J-N as if B==S so lower C in that range replaces rather than pushing.
The implied (never stored) J,R pairs of the outer vector start with (0,0) (1,0) (1,1) (2,0) and end with (N-1,H-1) (N-1,H) (N,H). It is fastest to work with those indexes incrementally, so while we are computing the vector for implied position J,R, we would have V as the actual position of J,R and U as the actual position of J-1,R and minU as the first position of J-1,? and minV as the first position of J,? and minW as the first position of J+1,?
In the outer loop, we trivially copy minV to minU and minW to both minV and V, and pretty easily compute the new minW and decide whether U starts at minU or minU+1.
The loop inside that advances V up to (but not including) minW, while advancing U each time V is advanced, and in typical positions using the vector at position U-1 and the vector at position U together to compute the vector for position V. But you must cover the special case of U==minU in which you don't use the vector at U-1 and the special case of U==minV in which you use only the vector at U-1.
When combining two vectors, you walk through them in sync by B value, using one, or the other to generate a candidate (see above) based on which B values you encounter.
Concept: Assuming you understand how a value with implied J,R and explicit C,B is stored: Its meaning is that there exists a path to column J at cost C using exactly R red branches and exactly B black branches, and there does not exist a path to column J using exactly R red branches and the same S in which one of C' or B' is better and the other not worse.
Your exponential algorithm is essentially a depth-first search tree, where you keep track of the cost as you descend.
You could make it branch-and-bound by keeping track of the best solution seen so far, and pruning any branches that would go beyond the best so far.
Or, you could make it a breadth-first search, ordered by cost, so as soon as you find any solution, it is among the best.
The way I've done this in the past is depth-first, but with a budget.
I prune any branches that would go beyond the budget.
Then I run it with budget 0.
If it doesn't find any solutions, I run it with budget 1.
I keep incrementing the budget until I get a solution.
This might seem like a lot of repetition, but since each run visits many more nodes than the previous one, the previous runs are not significant.
This is exponential in the cost of the solution, not in the size of the network.
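A generic sketch of the budgeted depth-first search, assuming non-negative integer edge weights and a plain adjacency list. Edge, Graph, reachWithinBudget and cheapestCostByBudget are made-up names, and the red/blue/black constraints from the question are left out; they would have to be checked as part of the goal test.

```cpp
#include <vector>

struct Edge { int to; int weight; };
using Graph = std::vector<std::vector<Edge>>;

// Depth-first search that prunes any branch whose cost would exceed budget.
// onPath keeps the search on simple paths so cycles cannot loop forever.
bool reachWithinBudget(const Graph& g, int node, int target, int budget,
                       std::vector<bool>& onPath) {
    if (node == target) return true;
    onPath[node] = true;
    bool found = false;
    for (const Edge& e : g[node]) {
        if (e.weight <= budget && !onPath[e.to] &&
            reachWithinBudget(g, e.to, target, budget - e.weight, onPath)) {
            found = true;
            break;
        }
    }
    onPath[node] = false;
    return found;
}

// Tries budget 0, 1, 2, ... and returns the first budget that succeeds,
// which is the cost of a cheapest path; -1 if none exists up to maxBudget.
int cheapestCostByBudget(const Graph& g, int source, int target, int maxBudget) {
    for (int budget = 0; budget <= maxBudget; ++budget) {
        std::vector<bool> onPath(g.size(), false);
        if (reachWithinBudget(g, source, target, budget, onPath)) return budget;
    }
    return -1;
}
```

Incrementing the budget by 1 only makes sense for integer weights; with real-valued lengths you would instead raise the budget to the smallest cost that was pruned in the previous run.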

Given an even number of vertices, how to find an optimum set of pairs based on proximity?

The problem:
We have a set of n vertices in 3D euclidean space, and there is an even number of these vertices.
We want to pair them up based on their proximity. In other words, we'd like to be able to find a set of vertex pairs, where the vertices in each pair are as close as possible together.
In doing this, we want to sacrifice as little as possible of the proximity between the vertices of the other pairs.
I am not looking for the most optimal solution (if it even strictly exists/can be done), just a reasonable one that can be computed relatively quickly.
A relatively awful brute force approach involves choosing a vertex and looping through the rest to find its nearest neighbor and then repeating until there are none left. Of course as we near the end of the list the closest vertex could be very far away, but it is the only choice, therefore this can fail badly on the third point above.
A common approach for this kind of problem (especially if n is large) is to precompute a spatial index structure, such as a k-d tree or an octree, and perform the nearest-neighbor search with its help. Through the nodes of the octree, the available points are put into bins, so you can be sure that points in the same bin are mutually close. You also minimize the number of comparisons.
A sketch of the implementation with an octree: you need a Node class that stores its bounding box. A derived LeafNode class stores a small number of points up to a maximum (e.g. k = 20), which are added with an insert function. A derived NonLeafNode class stores references to 8 subnodes (which may be either LeafNodes or NonLeafNodes).
The tree is represented by a root node, all insertions and queries start here. The tree is built up by starting with the first k points being inserted into a LeafNode. If the k+1st point is inserted, the bounding box is split into 8 sub boxes and the contained points are sorted into them. The current LeafNode is replaced by one NonLeafNode with 8 subnodes.
This is iterated until all points are in the tree.
For nearest neighbor searches, the tree is traversed starting from the root node by comparing with the bounding box. If the query point is within a node's bounding box, the traversal goes into that node. Note that if you found the nearest candidate, you also need to check with neighboring nodes in the octtree.
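The insertion part could be sketched as follows. For brevity this uses a single OctreeNode class that acts as a leaf until it is split, rather than the Node/LeafNode/NonLeafNode hierarchy described above; the class name, the depthLeft limit (which stops endless splitting when many points coincide) and the count helper are assumptions of this example, and the nearest-neighbor query is not included.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

struct P3 { double x, y, z; };

class OctreeNode {
public:
    OctreeNode(P3 lo, P3 hi, std::size_t capacity, int depthLeft)
        : lo_(lo), hi_(hi), capacity_(capacity), depthLeft_(depthLeft) {}

    bool isLeaf() const { return children_[0] == nullptr; }

    void insert(const P3& p) {
        if (!isLeaf()) {
            children_[octantOf(p)]->insert(p);
            return;
        }
        points_.push_back(p);
        // Split when the leaf overflows, unless the depth limit is reached.
        if (points_.size() > capacity_ && depthLeft_ > 0) split();
    }

    std::size_t count() const {
        if (isLeaf()) return points_.size();
        std::size_t total = 0;
        for (const auto& child : children_) total += child->count();
        return total;
    }

private:
    P3 mid() const {
        return {(lo_.x + hi_.x) / 2, (lo_.y + hi_.y) / 2, (lo_.z + hi_.z) / 2};
    }

    // Bit 0 selects the upper x half, bit 1 the upper y half, bit 2 the upper z half.
    int octantOf(const P3& p) const {
        const P3 m = mid();
        return (p.x >= m.x ? 1 : 0) | (p.y >= m.y ? 2 : 0) | (p.z >= m.z ? 4 : 0);
    }

    // Creates the 8 sub boxes and redistributes the stored points into them.
    void split() {
        const P3 m = mid();
        for (int i = 0; i < 8; ++i) {
            const P3 lo{(i & 1) ? m.x : lo_.x, (i & 2) ? m.y : lo_.y, (i & 4) ? m.z : lo_.z};
            const P3 hi{(i & 1) ? hi_.x : m.x, (i & 2) ? hi_.y : m.y, (i & 4) ? hi_.z : m.z};
            children_[i] = std::make_unique<OctreeNode>(lo, hi, capacity_, depthLeft_ - 1);
        }
        std::vector<P3> old;
        old.swap(points_);
        for (const P3& p : old) children_[octantOf(p)]->insert(p);
    }

    P3 lo_, hi_;
    std::size_t capacity_;
    int depthLeft_;
    std::vector<P3> points_;
    std::array<std::unique_ptr<OctreeNode>, 8> children_;
};
```

The root is constructed with the bounding box of the whole point set, and all insertions start there.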
For a k-d tree implementation, check the Wikipedia page; it looks quite straightforward.
Since you are not looking for an optimal solution, here's a heuristic you may consider.
For each point p, compute two points: its nearest neighbour and its farthest neighbour. Now let q be the point whose farthest neighbour is the most distant (q is an extreme point in the input). Match q with its nearest neighbour, delete both of them, and recursively compute the matching for the remaining points.
This is certainly NOT optimal, but it does seem to do reasonably well on small input sets. If you need an optimal solution you should read about the euclidean matching problem.
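A straightforward sketch of this heuristic, written as a loop instead of a recursion and without any spatial index, so it takes roughly O(n^3) time. Vec3, euclid and pairExtremeFirst are names invented here, and ties are broken in favour of the lower index.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

struct Vec3 { double x, y, z; };

double euclid(const Vec3& a, const Vec3& b) {
    return std::sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y) +
                     (a.z - b.z) * (a.z - b.z));
}

// Repeatedly picks the unmatched point whose farthest unmatched neighbour is
// most distant and pairs it with its nearest unmatched neighbour.
std::vector<std::pair<int, int>> pairExtremeFirst(const std::vector<Vec3>& pts) {
    const int n = static_cast<int>(pts.size());
    std::vector<bool> used(n, false);
    std::vector<std::pair<int, int>> pairs;
    for (int remaining = n; remaining >= 2; remaining -= 2) {
        int q = -1;
        double qFarthest = -1.0;
        for (int i = 0; i < n; ++i) {
            if (used[i]) continue;
            double farthest = 0.0;
            for (int j = 0; j < n; ++j)
                if (!used[j] && j != i) farthest = std::max(farthest, euclid(pts[i], pts[j]));
            if (farthest > qFarthest) { qFarthest = farthest; q = i; }
        }
        int mate = -1;
        double nearest = std::numeric_limits<double>::infinity();
        for (int j = 0; j < n; ++j) {
            if (used[j] || j == q) continue;
            const double d = euclid(pts[q], pts[j]);
            if (d < nearest) { nearest = d; mate = j; }
        }
        used[q] = true;
        used[mate] = true;
        pairs.emplace_back(q, mate);
    }
    return pairs;
}
```

For four points on a line at x = 0, 1, 10, 11 this pairs (0, 1) and then (10, 11).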

Using STL map for searching for points lying in a rectangular area?

I have with me a lot of x,y points and each x,y point has some extra data associated with it. This extra data I'll be storing in a struct.
My application requires that given any one point, I'll have to find how many other points lie within a rectangular area surrounding this point (this point is at the centre of the rectangle).
One logic I've thought of is to store all x points as the keys in a map A and all y points as the keys in another map B.
Map A will have x as the key and y values as the value.
Map B will have y as the key and the associated struct as the value.
This way, if the given point is (10.5,20.6), I can use upper_bound(10.5+RECTANGLE_WIDTH) and lower_bound(10.5-RECTANGLE_WIDTH) to find the range of x values lying within the rectangle and for the corresponding y values, find whether the y values lie within the +- range of 20.6.
My whole point of using map was because I have a massive store of x,y points and the searching has to be done every two seconds. So I had to use the log(n) search of map.
I feel that this can be done in a more efficient way. Suggestions?
This is a typical application for a quadtree. The quadtree facilitates lookup of the m points lying in your rectangle in O(log(n) + m), where n is the total number of points.
Edit: Your approach using the map is not nearly as efficient. For randomly distributed points, it would have an O(sqrt(n)) average complexity, and O(n) worst-case.
How about you store the points as a simple 2 dimensional array of pointers to those structs, and when you need to find a point x,y it's a simple index operation. The same goes for any other points (x+a,y+b).
If you use a std::map of points the lookup will always be O(log N) where N is the number of points you have.
Your other option would be to divide your search space into buckets and put your point into buckets. You then calculate in your rectangle:
any buckets for which all the points are inside your rectangle
any for which there is some overlap.
For the buckets that only partly overlap, you then look up the points in the bucket's collection, which is O(M) if you use the right collection type per bucket, but M should be smaller than N. It may even be that M rarely exceeds a handful, in which case you can probably check them linearly.
Working out which buckets overlap is a constant time operation but you have to run through these linearly (even to check if they are empty) so having too many of them may also be an issue.
The first observation would be that std::map wouldn't be the most efficient structure in any case. Your input is pretty much fixed, apparently (from the comments). In that case, std::binary_search on a sorted std::vector is more efficient. The main benefit of std::map over a sorted std::vector is that insertion is O(log N) instead of O(N), and you don't need that.
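For a range query on a sorted std::vector, std::lower_bound and std::upper_bound are the natural calls. A rough sketch, where Sample, payload and countInRect are placeholder names and the vector is assumed to be sorted by x already:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Sample {
    double x, y;
    int payload;  // stands in for the extra data attached to each point
};

// Counts the points inside the rectangle centred at (cx, cy) with the given
// half extents. Assumes sortedByX is sorted by x.
std::size_t countInRect(const std::vector<Sample>& sortedByX, double cx, double cy,
                        double halfWidth, double halfHeight) {
    // Binary search for the slice whose x lies in [cx - halfWidth, cx + halfWidth].
    auto first = std::lower_bound(sortedByX.begin(), sortedByX.end(), cx - halfWidth,
                                  [](const Sample& s, double x) { return s.x < x; });
    auto last = std::upper_bound(first, sortedByX.end(), cx + halfWidth,
                                 [](double x, const Sample& s) { return x < s.x; });
    // Linear filter on y within that slice.
    return static_cast<std::size_t>(std::count_if(first, last, [&](const Sample& s) {
        return std::abs(s.y - cy) <= halfHeight;
    }));
}
```

If the centre point itself is part of the vector, subtract one from the result to get the number of other points. The y filter is still linear in the size of the x slice.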
The next observation would be that you can probably afford to be a bit inaccurate in the first phase. Your output set will probably be a lot smaller than the total number of points (otherwise a linear search would be in order). Assuming that this is the case, you might benefit from rounding up your rectangle. This will result in more candidate points, which you then check against the precise boundary.
For instance, if your points lie randomly distributed in the XY plane between (0,0) and (200,300), it would be possible to create a 20x30 matrix with each cell holding a subarea of size (10,10). If you now need the points in the rectangle from (64,23) to (78,45), you need to check subareas [6,2], [6,3], [6,4], [7,2], [7,3] and [7,4], which is only 6 of the 600. In the second step, you'd throw out results such as (61,25).
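The two-step lookup could be sketched like this. PointGrid and GridPoint are hypothetical names, coordinates outside the grid are clamped into the border cells, and the rectangle bounds are treated as inclusive.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct GridPoint { double x, y; };

class PointGrid {
public:
    PointGrid(double width, double height, double cellSize)
        : cell_(cellSize),
          cols_(std::max(1, static_cast<int>(std::ceil(width / cellSize)))),
          rows_(std::max(1, static_cast<int>(std::ceil(height / cellSize)))),
          cells_(static_cast<std::size_t>(cols_) * rows_) {}

    void add(const GridPoint& p) { cells_[index(col(p.x), row(p.y))].push_back(p); }

    // Step 1: visit only the cells the rectangle touches.
    // Step 2: check each candidate against the precise boundary.
    std::size_t countInRect(double x0, double y0, double x1, double y1) const {
        std::size_t count = 0;
        for (int c = col(x0); c <= col(x1); ++c)
            for (int r = row(y0); r <= row(y1); ++r)
                for (const GridPoint& p : cells_[index(c, r)])
                    if (p.x >= x0 && p.x <= x1 && p.y >= y0 && p.y <= y1) ++count;
        return count;
    }

private:
    static int clampTo(int v, int limit) { return std::min(std::max(v, 0), limit - 1); }
    int col(double x) const { return clampTo(static_cast<int>(std::floor(x / cell_)), cols_); }
    int row(double y) const { return clampTo(static_cast<int>(std::floor(y / cell_)), rows_); }
    std::size_t index(int c, int r) const { return static_cast<std::size_t>(r) * cols_ + c; }

    double cell_;
    int cols_, rows_;
    std::vector<std::vector<GridPoint>> cells_;
};
```

With a 200 by 300 area and cells of size 10, a query for (64,23) to (78,45) visits exactly the six cells listed above, and a stored point such as (61,25) is found in cell [6,2] but rejected by the precise check.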