I have an undirected graph network made of streets and crossings, and I would like to know if there is any algorithm to help me finding closed loops, ie places where I can put buildings.
Any help appreciated, thanks !
Based on comments to my earlier answer:
It seems the graphs are all undirected and planar, i.e. can be embedded in a 2D plane without crossing edges, and one such embedding is given. This embedding will partition the plane. E.g. a figure 8 partitions the plane in three: two "inner" areas and an infinite outer area. An alternative view is that all edges of a node are cyclically ordered. (This is the essential part that allows us to apply graph theory)
A partition is necessarily enclosed by a cycle, but not all cycles may partition a single area. In the trivial case of a figure 8, though, all three areas are directly associated with a distinct cycle.
The input graph can generally be simplified. Some nodes may have only a single edge; they can't contribute to the partitioning and can be removed along with the edge. Other nodes have two edges connecting distinct nodes. Here, the node and the two edges can be replaced by a direct edge connecting the neighbors. I.e. a figure 8 graph can be simplified to two nodes and three edges between them. (This is not a necessary step but helps computation).
Now, each vertex will have two areas to either side (since they're undirected, "left and right" aren't obvious distinctions). So, for |V| vertices, we need to consider 2 * |V| sides. They're in general not distinct. Two adjacent edges (connected to the same node) may border the same area, if they're also adjacent in the cyclic order of edges of that node. Obviously, for nodes with only two edges, the two edges share both areas (which is why we'd eliminated them in the previous step). For nodes with three edges, any two edges share at least one area.
So, here's how to enumerate those areas: Assign a sequential number to all edges and vertices. Assign a direction to each edge so it runs from the lower-numbered edge to the higher. Start with vertex 1, right side, and number this area 1. Trace the boundary edges of this area, assigning the same number 1 to the appropriate sides of its boundary edges. You do this by taking at each node the next adjacent edge in counter-cyclical order. When you get back to your starting point, you know all edges bounding area 1.
You then check the left edge of the first vertex. If it's not part of area 1, then it's area 2, and you apply the same algorithm. Next, check vertex 2, right side and left side, etc. Each time you find an edge and a side that's unnumbered yet, assign the next area number and trace the edges of the newly founded area.
There's a slight problem with determining which area number corresponds to infinity. To see this, take a simple () graph: two edges, two nodes, and two areas (inside and outside). Due to the random numbering of edges and vertices, outside may end up as either 1 or 2. That's unavoidable; in graph theory there's no distinction between inside and outside.
It's a standard function in the Boost Graph library. See this previous answer for details.
Related
For a project I am working on I have some txt files that have from id's, to id's, and weight. The id's are used to identify each vertex and the weight represents the distance between the vertices. The graph is undirected and completely connected and I am using c++ and openFrameworks. How can I translate this data into (x,y) coordinate points for a graph this 1920x1080, while maintaining the weight specified in the text files?
You can only do this if the dimension of the graph is 2 or less.
You therefore need to determine the dimension of the graph--this is a measure of its connectivity.
If the dimension is 2 or less, then you will always be able to plot the graph on a Euclidian plane while preserving relative edge lengths, as long as you allow the edges to intersect. If you prohibit intersecting edges, then you can only plot the graph on a Euclidian plane if the ratio of cycle size to density of cycles in the graph is sufficiently low throughout the graph (quite hard to compute). I can tell you how to plot the trickiest bit--cycles--and give you a general approach, but you actually have some freedom in how you plot this, so please, drop a comment or edit the question if you have a more specific request.
If you don't know whether you have cycles, then find out! Here are some efficient algorithms.
Plotting cycles. Give the first node in the cycle arbitrary coordinates.
Give the second node in the cycle coordinates bounded by the distance from the first node.
For example, if you plot the first node at (0,0) and the edge between the first and second nodes has length 1, then plot the second node at (1, 0).
Now it gets tricky because you need to calculate angles.
Count up the number n of nodes in the cycle.
In order for the cycle to form a polygon, the sum of the angles formed by the cycle must be (n - 2) * 180 degrees, where n is the number of sides (i.e., the number of nodes).
Now you won't have a regular polygon unless all the edge lengths are the same, so you can't just use (n - 2) / n * 180 degrees as your angle. If you make the angles too sharp, then the edges will be forced to intersect; and if you make them too large, then you will not be able to close the polygon. Compute the internal angles as explained on StackExchange Math.
Repeat this process to plot every cycle in the graph, in arbitrary positions if needed. You can always translate the cycles later if needed.
Plotting everything else. My naive, undoubtedly inefficient approach is to traverse each node in each cycle and plot the adjacent nodes (the 'branches') layer by layer. Then rotate and translate entire cycles (including connected branches) to ensure every edge can reach both of its nodes. Finally, if possible, rotate branches (relative to their connected cycles) as needed to resolve intersecting edges.
Again, please modify the question or drop a comment if you are looking for something more specific. A full algorithm (or full code, in any common language) would make a very long answer.
I've applied three different methods of getting sets of points as follows.
Every method produces a vector of Points. Each method is in a different color, red, blue, and green.
Here is the combined image, overlaying all 3 of the sets of points
As you can see in the combined image there are spots in which all three sets "agree" on (i.e are generally in the exact same spot). I would like to find these particular spots and combine them into a single coordinate. I'm not sure where to start with approaching this problem. I've looked into K-means clustering, but to me it seems the problem is that K-means will cluster all the points and take the average with surrounding points, shifting the cluster center from the original position. I could loop through all the points in all the vectors that store the points, but as these images get larger with more points, it becomes very costly and inefficient.
Does anybody have any tips on how to approach this problem? I've been using OpenCV with C++.
Notionally, what you want to do is consider the complete tripartite graph on the three sets of points with edges weighted by distance. Then select edges in order of weight until a triangle appears; call those points a corresponding set, choose (say) their centroid to represent them, and remove them from the graph. Stop when the edge length exceeds some tolerance.
The mathematical justification for this approach is that it is independent of point ordering (except in the unlikely case of problematic ties in distances between points).
The practical implementation of this algorithm (for a significant number of points) involves a search data structure that can quickly find nearby points (not just the nearest): bins of the threshold size, a quad trie, or a k-d tree would work. Probably you would create one for each point set and use the other sets’ points as query points.
Here's an illustration of the steps taken thus far:
Pseudo-random rectangle generation
"Central node" insertion, rect separation, and final node selections
Delaunay triangulation (shown with previously selected nodes)
Rendering of triangle edges
At this point (Step 5), I would like to use this data to form a Minimum Spanning Tree, but there's a slight catch...
Somewhere in the graph (likely near the center, but not always) will be a node that requires between 3-5 connections to it from other unique nodes. This complicates things, since every other node should only contain a single connection, and the data structures being used make it difficult to determine "what's connected to what" in a solid, traversable format.
So, given an array of triangles in the above format, and a random vertex to use as the "root node", how could I properly traverse the network to create an MST where there are at least 3 connections to our "central node", but no more than 5 connections to it? Is this possible?
Since it's rare to have a vertex in a Delaunay triangulation have much more than 6 edges, you can use brute force: there are only 20+15+6 ways to select 3, 4, or 5 edges out of 6 (respectively), so try all of them. For each of the 41 (up to 336 for degree 9) small trees (the root and a few edges) thus created, run either Kruskal's algorithm or Prim's algorithm starting with that tree already "found" to be part of the MST. (Ignore the root's other edges so as not to increase its degree further.) Then just pick the best one (including the cost of the seed tree).
As for the general neighbor information problem, it seems you just need to build a standard graph representation first. For example, you can make an adjacency list by scanning over all your Edge objects and storing each endpoint in a list associated with the other (in a map<Vector2<T>,vector<Vector2<T>>> or an equivalent based on whatever identifiers for your vertices).
I've taken a workaround approach...
After step 3 of my algorithm, I simply remove all edges which connect to the "central node", keeping track of which edges form the "ring" (aka "edge loop") around it, and run the MST on all remaining edges.
For the MST, I went with the boost graph library.
That made it easy to loop through the triangles I had, adding each of its three edges to an adjacency_list. Then a simple call to whichever boost-provided MST algorithm took care of the rest.
Finally, I readd the edges that were previously taken out. The shortest path is whatever it was in the previous step, plus the length of whichever readded edge that connects to another edge on the "ring" is shortest.
I can then add (or remove) an arbitrary number of previous edges to ensure there are between 3 to 5 edges connecting from the edge loop to the "central node".
Doing things in this order also allows me to know as soon as step 3 if we'll even have a valid number of edges, so we don't waste cycles running a MST.
I was reading through the BOOST library and noticed that they dint have an algorithm to find bridges in a graph, they do have one to find articulation points. Is there anyway this could be done efficiently?
I have an idea:
1. Use the BOOST to find articulation points
2. Using out_edges,find all edges attached the each articulation point
3. remove them and calculate the number on connected components,( I assume my graph is originally fully connected), if its more than 1,i add this edge to the bridges.
QUESTION: Is it necessary for bridges to be attached to articulation points? I just assumed they are,couldn't find anything no the net.
I would also like an idea how to approach this.
My other approach would be more naive and take O(v*(V+E)), which is very bad.
Sounds a bit slow with the whole graph rewriting. You'd have to check if Boost counts a single-connected vertex as an articulation point. (If not, this slightly complicates things).
Now it is fairly obvious that a bridge must be an edge between two articulation points, but not all edges between articulation points are necessarily bridges. Consider a chain of 4 articulation points connected by 3 edges: A-b-c-D. Now add a node e connected to both b and c. The outer two bridges remain bridges, and so all 4 original nodes remain articulation points, but the middle node is no longer a bridge.
This means we have a necessary but insufficient extra condition: both nodes of the edge must be articulation points. Here's where the slight complication steps in; if Boost doesn't count single-connected nodes as articulation points, you have to treat them specially. But that's still simple; that one edge is necessarily a bridge.
This extra condition is quite efficient in dense graphs, as most nodes will not be articulation points. As a result, you cull most candidate edges before trying to alter the graph. Secondly, you don't need to test the component count of the resulting altered graph. You just need to check if the two articulation points are still connected after you cut the edge linking them directly.
There is a much easier way when you realize if a biconnected component only contains one edge this edge is a bridge. It is very easy to implement using boost (http://www.boost.org/doc/libs/1_58_0/libs/graph/example/biconnected_components.cpp):
Calculate the biconnected components
Create a edge counter for each component. Iteratate over all edge and increase the coresponding edge counter of the respective component
Iterate again over all edges and check if the edge counter of the corresponding component is 1, if so this edge is a bridge
I'm asking this question out of curiosity, having first tried to implement such an algorithm before using GLU's for performance reasons.
I've looked into common algorithms (Delaunay, Ear Clipping are often mentioned), but I can't seem to understand how GLU does its job so well all the time.
Do any of you have interesting papers or articles on that subjects?
There are some notes alongside the source:
This is only a very brief overview. There is quite a bit of
additional documentation in the source code itself.
Goals of robust tesselation
The tesselation algorithm is fundamentally a 2D algorithm. We
initially project all data into a plane; our goal is to robustly
tesselate the projected data. The same topological tesselation is
then applied to the input data.
Topologically, the output should always be a tesselation. If the
input is even slightly non-planar, then some triangles will
necessarily be back-facing when viewed from some angles, but the goal
is to minimize this effect.
The algorithm needs some capability of cleaning up the input data as
well as the numerical errors in its own calculations. One way to do
this is to specify a tolerance as defined above, and clean up the
input and output during the line sweep process. At the very least,
the algorithm must handle coincident vertices, vertices incident to an
edge, and coincident edges.
Phases of the algorithm
Find the polygon normal N.
Project the vertex data onto a plane. It does not need to be perpendicular to the normal, eg. we can project onto the plane
perpendicular to the coordinate axis whose dot product with N is
largest.
Using a line-sweep algorithm, partition the plane into x-monotone regions. Any vertical line intersects an x-monotone region in at
most one interval.
Triangulate the x-monotone regions.
Group the triangles into strips and fans.
Finding the normal vector
A common way to find a polygon normal is to compute the signed area
when the polygon is projected along the three coordinate axes. We
can't do this, since contours can have zero area without being
degenerate (eg. a bowtie).
We fit a plane to the vertex data, ignoring how they are connected
into contours. Ideally this would be a least-squares fit; however for
our purpose the accuracy of the normal is not important. Instead we
find three vertices which are widely separated, and compute the normal
to the triangle they form. The vertices are chosen so that the
triangle has an area at least 1/sqrt(3) times the largest area of any
triangle formed using the input vertices.
The contours do affect the orientation of the normal; after computing
the normal, we check that the sum of the signed contour areas is
non-negative, and reverse the normal if necessary.
Projecting the vertices
We project the vertices onto a plane perpendicular to one of the three
coordinate axes. This helps numerical accuracy by removing a
transformation step between the original input data and the data
processed by the algorithm. The projection also compresses the input
data; the 2D distance between vertices after projection may be smaller
than the original 2D distance. However by choosing the coordinate
axis whose dot product with the normal is greatest, the compression
factor is at most 1/sqrt(3).
Even though the accuracy of the normal is not that important (since
we are projecting perpendicular to a coordinate axis anyway), the
robustness of the computation is important. For example, if there are many vertices which lie almost along a line, and one vertex V
which is well-separated from the line, then our normal computation
should involve V otherwise the results will be garbage.
The advantage of projecting perpendicular to the polygon normal is
that computed intersection points will be as close as possible to
their ideal locations. To get this behavior, define TRUE_PROJECT.
The Line Sweep
There are three data structures: the mesh, the event queue, and the
edge dictionary.
The mesh is a "quad-edge" data structure which records the topology of
the current decomposition; for details see the include file "mesh.h".
The event queue simply holds all vertices (both original and computed
ones), organized so that we can quickly extract the vertex with the
minimum x-coord (and among those, the one with the minimum y-coord).
The edge dictionary describes the current intersection of the sweep
line with the regions of the polygon. This is just an ordering of the
edges which intersect the sweep line, sorted by their current order of
intersection. For each pair of edges, we store some information about
the monotone region between them -- these are call "active regions"
(since they are crossed by the current sweep line).
The basic algorithm is to sweep from left to right, processing each
vertex. The processed portion of the mesh (left of the sweep line) is
a planar decomposition. As we cross each vertex, we update the mesh
and the edge dictionary, then we check any newly adjacent pairs of
edges to see if they intersect.
A vertex can have any number of edges. Vertices with many edges can
be created as vertices are merged and intersection points are
computed. For unprocessed vertices (right of the sweep line), these
edges are in no particular order around the vertex; for processed
vertices, the topological ordering should match the geometric
ordering.
The vertex processing happens in two phases: first we process are the
left-going edges (all these edges are currently in the edge
dictionary). This involves:
deleting the left-going edges from the dictionary;
relinking the mesh if necessary, so that the order of these edges around the event vertex matches the order in the dictionary;
marking any terminated regions (regions which lie between two left-going edges) as either "inside" or "outside" according to
their winding number.
When there are no left-going edges, and the event vertex is in an
"interior" region, we need to add an edge (to split the region into
monotone pieces). To do this we simply join the event vertex to the
rightmost left endpoint of the upper or lower edge of the containing
region.
Then we process the right-going edges. This involves:
inserting the edges in the edge dictionary;
computing the winding number of any newly created active regions. We can compute this incrementally using the winding of each edge
that we cross as we walk through the dictionary.
relinking the mesh if necessary, so that the order of these edges around the event vertex matches the order in the dictionary;
checking any newly adjacent edges for intersection and/or merging.
If there are no right-going edges, again we need to add one to split
the containing region into monotone pieces. In our case it is most
convenient to add an edge to the leftmost right endpoint of either
containing edge; however we may need to change this later (see the
code for details).
Invariants
These are the most important invariants maintained during the sweep.
We define a function VertLeq(v1,v2) which defines the order in which
vertices cross the sweep line, and a function EdgeLeq(e1,e2; loc)
which says whether e1 is below e2 at the sweep event location "loc".
This function is defined only at sweep event locations which lie
between the rightmost left endpoint of {e1,e2}, and the leftmost right
endpoint of {e1,e2}.
Invariants for the Edge Dictionary.
Each pair of adjacent edges e2=Succ(e1) satisfies EdgeLeq(e1,e2) at any valid location of the sweep event.
If EdgeLeq(e2,e1) as well (at any valid sweep event), then e1 and e2 share a common endpoint.
For each e in the dictionary, e->Dst has been processed but not e->Org.
Each edge e satisfies VertLeq(e->Dst,event) && VertLeq(event,e->Org) where "event" is the current sweep line
event.
No edge e has zero length.
No two edges have identical left and right endpoints. Invariants for the Mesh (the processed portion).
The portion of the mesh left of the sweep line is a planar graph, ie. there is some way to embed it in the plane.
No processed edge has zero length.
No two processed vertices have identical coordinates.
Each "inside" region is monotone, ie. can be broken into two chains of monotonically increasing vertices according to VertLeq(v1,v2)
a non-invariant: these chains may intersect (slightly) due to
numerical errors, but this does not affect the algorithm's operation.
Invariants for the Sweep.
If a vertex has any left-going edges, then these must be in the edge dictionary at the time the vertex is processed.
If an edge is marked "fixUpperEdge" (it is a temporary edge introduced by ConnectRightVertex), then it is the only right-going
edge from its associated vertex. (This says that these edges exist
only when it is necessary.)
Robustness
The key to the robustness of the algorithm is maintaining the
invariants above, especially the correct ordering of the edge
dictionary. We achieve this by:
Writing the numerical computations for maximum precision rather
than maximum speed.
Making no assumptions at all about the results of the edge
intersection calculations -- for sufficiently degenerate inputs,
the computed location is not much better than a random number.
When numerical errors violate the invariants, restore them
by making topological changes when necessary (ie. relinking
the mesh structure).
Triangulation and Grouping
We finish the line sweep before doing any triangulation. This is
because even after a monotone region is complete, there can be further
changes to its vertex data because of further vertex merging.
After triangulating all monotone regions, we want to group the
triangles into fans and strips. We do this using a greedy approach.
The triangulation itself is not optimized to reduce the number of
primitives; we just try to get a reasonable decomposition of the
computed triangulation.