Here's an illustration of the steps taken thus far:
Pseudo-random rectangle generation
"Central node" insertion, rect separation, and final node selections
Delaunay triangulation (shown with previously selected nodes)
Rendering of triangle edges
At this point (Step 5), I would like to use this data to form a Minimum Spanning Tree, but there's a slight catch...
Somewhere in the graph (likely near the center, but not always) will be a node that requires between 3 and 5 connections to it from other unique nodes. This complicates things, since every other node should only need a single connection, and the data structures being used make it difficult to determine "what's connected to what" in a solid, traversable format.
So, given an array of triangles in the above format, and a random vertex to use as the "root node", how could I properly traverse the network to create an MST where there are at least 3 connections to our "central node", but no more than 5 connections to it? Is this possible?
Since it's rare to have a vertex in a Delaunay triangulation have much more than 6 edges, you can use brute force: there are only 20+15+6 ways to select 3, 4, or 5 edges out of 6 (respectively), so try all of them. For each of the 41 (up to 336 for degree 9) small trees (the root and a few edges) thus created, run either Kruskal's algorithm or Prim's algorithm starting with that tree already "found" to be part of the MST. (Ignore the root's other edges so as not to increase its degree further.) Then just pick the best one (including the cost of the seed tree).
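As a sketch of the brute-force part (my own code, not the answerer's): the standard mask/prev_permutation idiom enumerates every 3-, 4-, or 5-edge subset of the root's incident edges; the seeded Kruskal/Prim run is only indicated by a comment.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    // Indices of the edges incident to the chosen root node (placeholder data).
    std::vector<int> rootEdges = {0, 1, 2, 3, 4, 5};
    const std::size_t n = rootEdges.size();

    for (std::size_t k = 3; k <= 5 && k <= n; ++k) {
        // Mask with k '1's followed by '0's; prev_permutation walks all C(n,k) subsets.
        std::string mask = std::string(k, '1') + std::string(n - k, '0');
        do {
            std::vector<int> seed;                 // one candidate seed tree for the root
            for (std::size_t i = 0; i < n; ++i)
                if (mask[i] == '1') seed.push_back(rootEdges[i]);

            // Here you would run Kruskal or Prim with 'seed' already accepted into
            // the MST (ignoring the root's other edges), then keep the cheapest result.
            for (int e : seed) std::cout << e << ' ';
            std::cout << '\n';
        } while (std::prev_permutation(mask.begin(), mask.end()));
    }
}
```

For a degree-6 root this prints 20 + 15 + 6 = 41 candidate seed sets, matching the counts above.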
As for the general neighbor information problem, it seems you just need to build a standard graph representation first. For example, you can make an adjacency list by scanning over all your Edge objects and storing each endpoint in a list associated with the other (in a map<Vector2<T>,vector<Vector2<T>>> or an equivalent based on whatever identifiers you use for your vertices).
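For example, a sketch of that construction (the Vector2 and Edge types here are simplified stand-ins for the ones in the question):

```cpp
#include <map>
#include <vector>

// Simplified stand-ins for the question's Vector2<T> and Edge types.
struct Vector2 { double x, y; };
bool operator<(const Vector2& a, const Vector2& b)
{
    return a.x < b.x || (a.x == b.x && a.y < b.y);   // strict ordering so Vector2 can key a std::map
}
struct Edge { Vector2 a, b; };

// Build an adjacency list: each vertex maps to the list of vertices it touches.
std::map<Vector2, std::vector<Vector2>> buildAdjacency(const std::vector<Edge>& edges)
{
    std::map<Vector2, std::vector<Vector2>> adj;
    for (const Edge& e : edges) {
        adj[e.a].push_back(e.b);   // store each endpoint...
        adj[e.b].push_back(e.a);   // ...in the list associated with the other
    }
    return adj;
}
```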
I've taken a workaround approach...
After step 3 of my algorithm, I simply remove all edges which connect to the "central node", keeping track of which edges form the "ring" (aka "edge loop") around it, and run the MST on all remaining edges.
For the MST, I went with the Boost Graph Library.
That made it easy to loop through the triangles I had, adding each one's three edges to an adjacency_list. Then a simple call to whichever Boost-provided MST algorithm took care of the rest.
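A rough sketch of what that can look like (the Triangle record and its field names are my stand-ins, not the original code; kruskal_minimum_spanning_tree is one of the Boost MST algorithms):

```cpp
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/kruskal_min_spanning_tree.hpp>
#include <iterator>
#include <vector>

using namespace boost;

typedef adjacency_list<vecS, vecS, undirectedS, no_property,
                       property<edge_weight_t, double> > Graph;
typedef graph_traits<Graph>::edge_descriptor EdgeDesc;

// Hypothetical triangle record: three vertex indices plus the three edge lengths,
// where len[i] is the length of the edge v[i] -> v[(i+1)%3].
struct Triangle { int v[3]; double len[3]; };

std::vector<EdgeDesc> mstEdges(const std::vector<Triangle>& tris, int numVertices)
{
    Graph g(numVertices);
    for (const Triangle& t : tris)
        for (int i = 0; i < 3; ++i)
            add_edge(t.v[i], t.v[(i + 1) % 3], t.len[i], g);   // duplicate edges are harmless here

    std::vector<EdgeDesc> mst;                                  // edges selected for the MST
    kruskal_minimum_spanning_tree(g, std::back_inserter(mst));
    return mst;
}
```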
Finally, I re-add the edges that were previously taken out. The shortest path is whatever it was in the previous step, plus the length of the shortest re-added edge that connects to an edge on the "ring".
I can then add (or remove) an arbitrary number of the previously removed edges to ensure there are between 3 and 5 edges connecting from the edge loop to the "central node".
Doing things in this order also lets me know as early as step 3 whether we'll even have a valid number of edges, so we don't waste cycles running an MST.
Related
I am trying to test my graph class's implementation of Dijkstra's algorithm. To do this, I generate a graph with a couple thousand vertices and then make the graph connected by randomly adding thousands of edges until it is connected. I can then run the search between any two random vertices over and over and be sure that there is a path between them. The problem is that I often end up with a nearly dense graph, which, because I am using an adjacency list representation, makes my search algorithm terribly slow.
Question:
Given a set of vertices V, how do you generate a strongly connected directed graph that has significantly fewer edges than a dense graph over the same vertices would have?
I was thinking about simply doing the following:
vertex 1 <--> vertex 2, vertex 2 <--> vertex 3, ..., vertex n-1 <--> vertex n
And then randomly adding roughly n/10 edges throughout the graph, but this doesn't seem like an optimal way of coming up with random graph structures to test my search algorithms on.
One approach would be to maintain a set of strongly connected components (starting with |V| single-vertex components), and in each iteration, merge some random subset of them into a single connected component by connecting a random vertex of each one to a random vertex of the next one, forming a cycle.
This will tend to generate very sparse graphs, so depending on your use case, you might want to toss in some extra random edges as well.
EDIT: Intuitively I think you'd want to use an exponential distribution when deciding how many components to merge in a single iteration. I don't have any real support for that, though.
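A minimal sketch of that merging scheme (my own code; it picks the number of components to merge uniformly rather than from the exponential distribution suggested in the edit):

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Grow a strongly connected digraph on n vertices by repeatedly merging a few
// random components into one via a directed cycle of cross edges.
std::vector<std::pair<int, int>> randomStronglyConnected(int n, std::mt19937& rng)
{
    std::vector<std::vector<int>> components(n);            // start: one component per vertex
    for (int v = 0; v < n; ++v) components[v] = {v};
    std::vector<std::pair<int, int>> edges;

    while (components.size() > 1) {
        std::shuffle(components.begin(), components.end(), rng);
        std::size_t k = std::uniform_int_distribution<std::size_t>(2, components.size())(rng);

        std::vector<int> merged;
        for (std::size_t i = 0; i < k; ++i) {
            // connect a random vertex of component i to a random vertex of component (i+1)%k
            std::vector<int>& a = components[i];
            std::vector<int>& b = components[(i + 1) % k];
            int u = a[std::uniform_int_distribution<std::size_t>(0, a.size() - 1)(rng)];
            int w = b[std::uniform_int_distribution<std::size_t>(0, b.size() - 1)(rng)];
            edges.emplace_back(u, w);                        // one directed edge of the cycle
            merged.insert(merged.end(), a.begin(), a.end());
        }
        components.erase(components.begin(), components.begin() + k);
        components.push_back(std::move(merged));
    }
    return edges;
}
```

Each merged group stays strongly connected because every component already was, and the new edges form a directed cycle through all of them.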
I don't know if there is a better way of doing it, but at least this seems to work:
I would add E (directed) edges between random vertices. This will generate several clusters of vertices.
Then, I need to connect those clusters to form a chain of clusters, thus ensuring that from any cluster I can reach any other cluster. For this I can label a random vertex of each cluster as the "master" vertex and join the master vertices to form a loop, with the last master connected back to the first. You then have a strongly-connected directed graph composed of clusters (not of individual vertices yet).
Now, in order to turn it into a strongly-connected digraph composed of vertices I need to make each cluster a strongly-connected digraph by itself. But this is easy if I run a DFS starting at the master node of a cluster and each time I find a leaf I add an edge from that leaf to its master vertex. Note that the DFS must not traverse outside of the cluster.
I think this may work, though the topology will not be truly random; it will look like a big loop composed of smaller graphs joined together. But depending on the algorithm you need to test, this may come in handy.
EDIT:
If after that you want to have a more random topology, you can add random edges between vertices of different clusters. That doesn't invalidate the rules and creates more complex paths for your algorithm to traverse.
I was reading through the Boost library and noticed that it doesn't have an algorithm to find bridges in a graph, though it does have one to find articulation points. Is there any way this could be done efficiently?
I have an idea:
1. Use Boost to find the articulation points
2. Using out_edges, find all edges attached to each articulation point
3. Remove each such edge and calculate the number of connected components (I assume my graph is originally fully connected); if it is more than 1, add that edge to the bridges.
QUESTION: Is it necessary for bridges to be attached to articulation points? I just assumed they are, but couldn't find anything on the net.
I would also like an idea how to approach this.
My other approach would be more naive and take O(V*(V+E)), which is very bad.
Sounds a bit slow with the whole graph rewriting. You'd have to check if Boost counts a single-connected vertex as an articulation point. (If not, this slightly complicates things).
Now it is fairly obvious that a bridge must be an edge between two articulation points, but not all edges between articulation points are necessarily bridges. Consider a chain of 4 articulation points connected by 3 edges: A-b-c-D. Now add a node e connected to both b and c. The outer two bridges remain bridges, and so all 4 original nodes remain articulation points, but the middle edge is no longer a bridge.
This means we have a necessary but insufficient extra condition: both nodes of the edge must be articulation points. Here's where the slight complication steps in; if Boost doesn't count single-connected nodes as articulation points, you have to treat them specially. But that's still simple; that one edge is necessarily a bridge.
This extra condition is quite efficient in dense graphs, as most nodes will not be articulation points. As a result, you cull most candidate edges before trying to alter the graph. Secondly, you don't need to test the component count of the resulting altered graph. You just need to check if the two articulation points are still connected after you cut the edge linking them directly.
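A sketch of that culling approach with Boost (my own code, not the answerer's): it collects the articulation points, skips edges that fail the necessary condition (treating degree-1 endpoints as automatic bridges, per the note above), and for each remaining candidate cuts the edge in a copy and checks whether its endpoints end up in different components. It assumes no parallel edges.

```cpp
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/biconnected_components.hpp>   // also provides articulation_points
#include <boost/graph/connected_components.hpp>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

using namespace boost;
typedef adjacency_list<vecS, vecS, undirectedS> Graph;
typedef graph_traits<Graph>::vertex_descriptor Vertex;

std::vector<std::pair<Vertex, Vertex> > findBridges(const Graph& g)
{
    std::vector<Vertex> arts;
    articulation_points(g, std::back_inserter(arts));
    std::set<Vertex> isArt(arts.begin(), arts.end());

    std::vector<std::pair<Vertex, Vertex> > bridges;
    for (auto es = edges(g); es.first != es.second; ++es.first) {
        Vertex u = source(*es.first, g), v = target(*es.first, g);

        if (degree(u, g) == 1 || degree(v, g) == 1) {    // a dangling edge is always a bridge
            bridges.push_back(std::make_pair(u, v));
            continue;
        }
        if (!isArt.count(u) || !isArt.count(v))          // necessary (not sufficient) condition
            continue;

        Graph h(g);                                      // cut the candidate edge in a copy
        remove_edge(u, v, h);
        std::vector<int> comp(num_vertices(h));
        connected_components(h, &comp[0]);
        if (comp[u] != comp[v])                          // endpoints separated: it was a bridge
            bridges.push_back(std::make_pair(u, v));
    }
    return bridges;
}
```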
There is a much easier way once you realize that if a biconnected component contains only one edge, that edge is a bridge. It is very easy to implement using Boost (http://www.boost.org/doc/libs/1_58_0/libs/graph/example/biconnected_components.cpp):
Calculate the biconnected components
Create an edge counter for each component. Iterate over all edges and increase the edge counter of the corresponding component
Iterate again over all edges and check whether the edge counter of the corresponding component is 1; if so, that edge is a bridge
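A sketch of those three steps, closely following the linked biconnected_components.cpp example (the small test graph is my own):

```cpp
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/biconnected_components.hpp>
#include <iostream>
#include <vector>

namespace boost {
  // Custom edge property recording each edge's biconnected component,
  // as in the library's biconnected_components.cpp example.
  struct edge_component_t {
    enum { num = 555 };
    typedef edge_property_tag kind;
  } edge_component;
}

int main()
{
    using namespace boost;
    typedef adjacency_list<vecS, vecS, undirectedS, no_property,
                           property<edge_component_t, std::size_t> > Graph;

    // Test graph: two triangles joined by the single bridge 2-3.
    Graph g(6);
    add_edge(0, 1, g); add_edge(1, 2, g); add_edge(2, 0, g);
    add_edge(3, 4, g); add_edge(4, 5, g); add_edge(5, 3, g);
    add_edge(2, 3, g);

    // Step 1: compute the biconnected components (stored per edge).
    property_map<Graph, edge_component_t>::type component = get(edge_component, g);
    std::size_t num_comps = biconnected_components(g, component);

    // Step 2: count how many edges fall into each component.
    std::vector<std::size_t> edgesInComp(num_comps, 0);
    for (auto es = edges(g); es.first != es.second; ++es.first)
        ++edgesInComp[component[*es.first]];

    // Step 3: an edge is a bridge iff its component contains exactly one edge.
    for (auto es = edges(g); es.first != es.second; ++es.first)
        if (edgesInComp[component[*es.first]] == 1)
            std::cout << source(*es.first, g) << " - "
                      << target(*es.first, g) << " is a bridge\n";
}
```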
I have a rectangular grid shaped DAG where the horizontal edges always point right and the vertical edges always point down. The edges have positive costs associated with them. Because of the rectangular format, the nodes are being referred to using zero-based row/column. Here's an example graph:
Now, I want to perform a search. The starting vertex will always be in the left column (column with index 0) and in the upper half of the graph. This means I'll pick the start to be either (0,0), (1,0), (2,0), (3,0) or (4,0). The goal vertex is always in the right column (column with index 6) and "corresponds" to the start vertex:
start vertex (0,0) corresponds to goal vertex (5,6)
start vertex (1,0) corresponds to goal vertex (6,6)
start vertex (2,0) corresponds to goal vertex (7,6)
start vertex (3,0) corresponds to goal vertex (8,6)
start vertex (4,0) corresponds to goal vertex (9,6)
I only mention this to demonstrate that the goal vertex will always be reachable. It's possibly not very important to my actual question.
What I want to know is what search algorithm should I use to find the path from start to goal? I am using C++ and have access to the Boost Graph Library.
For those interested, I'm trying to implement Fuchs' suggestions from his Optimal Surface Reconstruction from Planar Contours paper.
I looked at A* but to be honest didn't understand it, and wasn't sure how the heuristic works or even whether I could come up with one!
Because of the rectangular shape and regular edge directions, I figured there might be a well-suited algorithm. I considered Dijkstra, but the paper I mention said there were quicker algorithms (though, annoyingly for me, it doesn't provide an implementation); plus, that's single-source and I think I want single-pair.
So, this is your problem:
DAG (no cycles)
weights > 0
Left weight < Right weight
You can use a simple exhaustive search, enumerating every possible route, and then choose the shortest path. That gives you an O(NxN) algorithm. It is not a very smart solution, but it is effective.
But I suppose you want to be smarter than that. Consider that if a particular node can be reached from two nodes, you can take the minimum of the path costs at those two nodes, each plus the cost of the edge into the current node. You can consider this an extension of the previous exhaustive search.
Remember that a DAG can be laid out on a line (linearized). For DAG linearization, here's an interesting resource.
Now you have just defined a recursive algorithm.
MinimumPath(Source, B) = min(MinimumPath(Source, C) + cost(C, B), MinimumPath(Source, D) + cost(D, B), ...), where C, D, ... are the nodes with an edge into B.
The base case of the recursion is
MinimumPath(Source, Source)
which is of course 0.
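A minimal memoized version of that recursion on the grid (my sketch, assuming the source is node (0,0); right_cost and down_cost are assumed cost tables, not names from the question):

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// right_cost[i][j] is the cost of the edge (i,j) -> (i,j+1),
// down_cost[i][j] the cost of (i,j) -> (i+1,j).
double minimumPath(int i, int j,
                   const std::vector<std::vector<double>>& right_cost,
                   const std::vector<std::vector<double>>& down_cost,
                   std::vector<std::vector<double>>& memo)
{
    if (i == 0 && j == 0) return 0.0;              // MinimumPath(Source, Source) = 0
    if (memo[i][j] >= 0.0) return memo[i][j];      // already computed

    double best = std::numeric_limits<double>::infinity();
    if (i > 0)   // arrive from the node above via a downward edge
        best = std::min(best, minimumPath(i - 1, j, right_cost, down_cost, memo) + down_cost[i - 1][j]);
    if (j > 0)   // arrive from the node on the left via a rightward edge
        best = std::min(best, minimumPath(i, j - 1, right_cost, down_cost, memo) + right_cost[i][j - 1]);
    return memo[i][j] = best;
}
```

The caller would size memo to the grid dimensions, fill it with -1, and call minimumPath(goalRow, goalCol, ...).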
As far as I know, there isn't an out-of-the-box algorithm in Boost to do this, but it is quite straightforward to implement.
A good implementation is here.
If, for some reason, you do not have a DAG, Dijkstra's or Bellman-Ford should be used.
If I'm not mistaken, from the explanation this is really an optimal-path problem, not a search problem, since the goal is known. In optimization I don't think you can avoid doing an exhaustive search if you want the optimal path rather than an approximate one.
From the paper it seems like you are really subdividing a space many times and then running the algorithm. This would reduce your N to something closer to a constant in the context of the entire surface, making O(N^2) not so bad.
That being said, perhaps dynamic programming would be a good strategy, where N would be bounded by the difference between your start and goal nodes. Here is an example from genomic alignment. Just an illustration to give you an idea of how it works.
Construct an N by N array of cost values, all set to 0 or some default.

# cost[i][j] = cheapest cost of reaching node (i, j); down_edge/right_edge stand for the grid's edge costs
cost = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(N):
        if i == 0 and j == 0:
            continue  # the start node costs nothing to reach
        from_above = cost[i-1][j] + down_edge(i-1, j) if i > 0 else float('inf')
        from_left = cost[i][j-1] + right_edge(i, j-1) if j > 0 else float('inf')
        cost[i][j] = min(from_above, from_left)
Once you have your table filled in, start at the bottom-right corner, i.e. your goal. Move to whichever predecessor has the lowest cost of getting there, and keep working your way backwards, always choosing the lowest value, until you reach the start node (or rather the array entry corresponding to the start node). This gives you the optimal path very simply.
Dynamic programming works by solving the optimization on smaller sub-problems. In this case the sub-problems are optimal paths to preceding nodes. I think the rectangular nature of your graph makes this a good fit.
I have a wireless mesh network of nodes, each of which is capable of reporting its 'distance' to its neighbors, measured in (simplified) signal strength to them. The nodes are geographically in 3D space but, because of radio interference, the distances between nodes need not be geometrically consistent. I.e., given nodes A, B and C, the distance between A and B might be 10, between A and C also 10, yet between B and C 100.
What I want to do is visualize the logical network layout in terms of connectedness of nodes, i.e. include the logical distance between nodes in the visual.
So far my research has shown that multidimensional scaling (MDS) is designed for exactly this sort of thing. Given that my data can be directly expressed as a 2D distance matrix, it's even a simpler form of the more general MDS.
Now, there seem to be many MDS algorithms, see e.g. http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html and http://tapkee.lisitsyn.me/ . I need to do this in C++ and I'm hoping I can use a ready-made component, i.e. not have to re-implement an algo from a paper. So, I thought this: https://sites.google.com/site/simpmatrix/ would be the ticket. And it works, but:
The layout is not stable, i.e. every time the algorithm is re-run, the position of the nodes changes (see differences between image 1 and 2 below - this is from having been run twice, without any further changes). This is due to the initialization matrix (which contains the initial location of each node, which the algorithm then iteratively corrects) that is passed to this algorithm - I pass an empty one and then the implementation derives a random one. In general, the layout does approach the layout I expected from the given input data. Furthermore, between different runs, the direction of nodes (clockwise or counterclockwise) can change. See image 3 below.
The 'solution' I thought was obvious, was to pass a stable default initialization matrix. But when I put all nodes initially in the same place, they're not moved at all; when I put them on one axis (node 0 at 0,0 ; node 1 at 1,0 ; node 2 at 2,0 etc.), they are moved along that axis only. (see image 4 below). The relative distances between them are OK, though.
So it seems like this algorithm only changes distance between nodes, but doesn't change their location.
Thanks for reading this far - my questions are (I'd be happy to get just one or a few of them answered as each of them might give me a clue as to what direction to continue in):
Where can I find more information on the properties of each of the many MDS algorithms?
Is there an algorithm that derives the complete location of each node in a network, without having to pass an initial position for each node?
Is there a solid way to estimate the location of each point so that the algorithm can then correctly scale the distance between them? I have no geographic location of each of these nodes, that is the whole point of this exercise.
Are there any algorithms to keep the 'angle' at which the network is derived constant between runs?
If all else fails, my next option is going to be to use the algorithm I mentioned above, increase the number of iterations to keep the variability between runs at around a few pixels (I'd have to experiment with how many iterations that would take), then 'rotate' each node around node 0 to, for example, align nodes 0 and 1 on a horizontal line from left to right; that way, I would 'correct' the location of the points after their relative distances have been determined by the MDS algorithm. I would have to correct for the order of connected nodes (clockwise or counterclockwise) around each node as well. This might become hairy quite quickly.
Obviously I'd prefer a stable algorithmic solution - increasing iterations to smooth out the randomness is not very reliable.
Thanks.
EDIT: I was referred to cs.stackexchange.com and some comments have been made there; for algorithmic suggestions, please see https://cs.stackexchange.com/questions/18439/stable-multi-dimensional-scaling-algorithm .
Image 1 - with random initialization matrix:
Image 2 - after running with same input data, rotated when compared to 1:
Image 3 - same as previous 2, but nodes 1-3 are in another direction:
Image 4 - with the initial layout of the nodes on one line, their position on the y axis isn't changed:
Most scaling algorithms effectively set "springs" between nodes, where the resting length of the spring is the desired length of the edge. They then attempt to minimize the energy of the system of springs. When you initialize all the nodes on top of each other though, the amount of energy released when any one node is moved is the same in every direction. So the gradient of energy with respect to each node's position is zero, so the algorithm leaves the node where it is. Similarly if you start them all in a straight line, the gradient is always along that line, so the nodes are only ever moved along it.
(That's a flawed explanation in many respects, but it works for an intuition)
Try initializing the nodes to lie on the unit circle, on a grid or in any other fashion such that they aren't all co-linear. Assuming the library algorithm's update scheme is deterministic, that should give you reproducible visualizations and avoid degeneracy conditions.
If the library is non-deterministic, either find another library which is deterministic, or open up the source code and replace the randomness generator with a PRNG initialized with a fixed seed. I'd recommend the former option though, as other, more advanced libraries should allow you to set edges you want to "ignore" too.
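As a small illustration of the non-collinear initialization suggested above (how the coordinates are handed to the MDS routine depends on that library's API, so this is only a sketch):

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Spread n nodes evenly around the unit circle so no three start out collinear.
std::vector<std::pair<double, double> > unitCircleLayout(std::size_t n)
{
    const double pi = 3.14159265358979323846;
    std::vector<std::pair<double, double> > pos(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double angle = 2.0 * pi * static_cast<double>(i) / static_cast<double>(n);
        pos[i] = std::make_pair(std::cos(angle), std::sin(angle));
    }
    return pos;
}
```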
I have read the code of the "SimpleMatrix" MDS library and found that it uses a random permutation matrix to decide the order of points. After fixing the permutation order (just use srand(12345) instead of srand(time(0))), the result for the same data is unchanged between runs.
Obviously there's no exact solution in general to this problem; with just 4 nodes A, B, C, D and distances AB=BC=AC=AD=BD=1, CD=10, you clearly cannot draw a suitable 2D diagram (nor even a 3D one).
What those algorithms do is just place springs between the nodes and then simulate repulsion/attraction (depending on whether the spring is shorter or longer than the prescribed distance), probably also adding spatial friction to avoid resonance and explosion.
To keep a "stable" diagram, just build a solution once and then only update the distances, re-using the current positions from the previous solution as the starting point. Picking two fixed nodes and aligning them seems a good idea to prevent a slow drift, but I'd say that spring forces never end up creating rotational momentum, so I'd expect that just scaling and centering the solution should be enough anyway.
I have an undirected graph network made of streets and crossings, and I would like to know if there is any algorithm to help me find closed loops, i.e. places where I can put buildings.
Any help appreciated, thanks!
Based on comments to my earlier answer:
It seems the graphs are all undirected and planar, i.e. they can be embedded in a 2D plane without crossing edges, and one such embedding is given. This embedding will partition the plane; e.g. a figure 8 partitions the plane into three parts: two "inner" areas and an infinite outer area. An alternative view is that all edges of a node are cyclically ordered. (This is the essential part that allows us to apply graph theory.)
Each area is necessarily bounded by a cycle, but not every cycle bounds a single area. In the trivial case of a figure 8, though, all three areas are directly associated with a distinct cycle.
The input graph can generally be simplified. Some nodes may have only a single edge; they can't contribute to the partitioning and can be removed along with the edge. Other nodes have two edges connecting distinct nodes. Here, the node and the two edges can be replaced by a direct edge connecting the neighbors. I.e. a figure 8 graph can be simplified to two nodes and three edges between them. (This is not a necessary step but helps computation).
Now, each edge will have an area on either side (since the edges are undirected, "left" and "right" aren't obvious distinctions). So, for |E| edges, we need to consider 2 * |E| sides. They're in general not distinct: two adjacent edges (connected to the same node) may border the same area if they're also adjacent in the cyclic order of edges at that node. Obviously, for nodes with only two edges, the two edges share both areas (which is why we eliminated them in the previous step). For nodes with three edges, any two edges share at least one area.
So, here's how to enumerate those areas: Assign a sequential number to all edges and vertices. Assign a direction to each edge so that it runs from its lower-numbered vertex to the higher-numbered one. Start with edge 1, right side, and number this area 1. Trace the boundary of this area, assigning the same number 1 to the appropriate sides of its boundary edges. You do this by taking, at each node, the next adjacent edge in counter-cyclical order. When you get back to your starting point, you know all edges bounding area 1.
You then check the left side of edge 1. If it's not part of area 1, then it's area 2, and you apply the same algorithm. Next, check edge 2, right side and left side, etc. Each time you find an edge side that isn't numbered yet, assign the next area number and trace the boundary of the newly found area.
There's a slight problem with determining which area number corresponds to infinity. To see this, take a simple () graph: two edges, two nodes, and two areas (inside and outside). Due to the random numbering of edges and vertices, outside may end up as either 1 or 2. That's unavoidable; in graph theory there's no distinction between inside and outside.
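For what it's worth, here is a sketch of that tracing idea in half-edge form (my code, not part of the original answer). It assumes the embedding is given as a counterclockwise-ordered, symmetric neighbour list per vertex and treats every undirected edge as two directed sides; one of the reported faces will be the unbounded outer area, as noted above.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// ccw[v] lists v's neighbours in counterclockwise order around v.
// Returns the faces of the embedding, each as a list of directed half-edges.
std::vector<std::vector<std::pair<int, int>>>
enumerateFaces(const std::vector<std::vector<int>>& ccw)
{
    std::map<std::pair<int, int>, int> faceOf;            // half-edge (u,v) -> face number
    std::vector<std::vector<std::pair<int, int>>> faces;

    for (int u = 0; u < static_cast<int>(ccw.size()); ++u) {
        for (int v : ccw[u]) {
            if (faceOf.count({u, v})) continue;           // this side already belongs to a face
            int id = static_cast<int>(faces.size());
            faces.emplace_back();
            int a = u, b = v;
            do {                                          // trace the boundary of one face
                faceOf[{a, b}] = id;
                faces[id].push_back({a, b});
                // find a's position in b's cyclic order, then step to the previous neighbour
                const std::vector<int>& nb = ccw[b];
                std::size_t pos = 0;
                while (nb[pos] != a) ++pos;
                int next = nb[(pos + nb.size() - 1) % nb.size()];
                a = b;
                b = next;
            } while (!(a == u && b == v));
        }
    }
    return faces;
}
```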
It's a standard function in the Boost Graph library. See this previous answer for details.