First of all, I should say that I am not familiar with graph theory and my mathematics background is weak. Anyhow, I am using graph concepts for my analysis.
Basically, I am decomposing an undirected graph (say G) into cycles (closed paths). The special property of my cycles is that they are the shortest cycles one can traverse between two vertices (as they are cycles, the starting and ending vertices are the same). In my example graph, the cycles are (1,4,5,1), (1,2,3,4,1) and (7,9,8,7) (I neglect cycles whose length is less than 3).
Edit: I use depth-first search to get the cycles and then keep the smallest ones.
Later, I further break those cycles into directed paths. Here I broke the cycles through selected edges (the red lines in the figure) and inserted new starting and ending nodes for the resulting path graphs. So for the cycle (7,9,8,7), the new directed paths are (a,9,c) and (d,8,7,b).
Edit: the further breaking is done only for selected cycles. It is just a matter of inserting a new vector and updating its elements; no graph-theory algorithm is involved here.
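For illustration, here is a minimal C++ sketch of what this breaking step might look like. It is a guess at the bookkeeping, assuming a cycle is stored as a std::vector<int> without the repeated last vertex and the red-line cuts are given as positions along the cycle; the fresh endpoint labels are generated as a, b, c, ... in order, so they may differ from the figure:

#include <algorithm>
#include <string>
#include <vector>

// Split a cycle (v0, ..., v(n-1), v0) at the given edge positions and cap
// each resulting segment with freshly labelled start/end vertices.
std::vector<std::vector<std::string>> breakCycle(
        const std::vector<int>& cycle,  // vertices, without the repeated last one
        std::vector<int> cuts)          // position i cuts edge (cycle[i], cycle[(i+1)%n])
{
    std::sort(cuts.begin(), cuts.end());
    const int n = static_cast<int>(cycle.size());
    char label = 'a';
    std::vector<std::vector<std::string>> paths;
    for (std::size_t c = 0; c < cuts.size(); ++c) {
        const int from = (cuts[c] + 1) % n;          // first vertex after this cut
        const int to = cuts[(c + 1) % cuts.size()];  // last vertex before the next cut
        std::vector<std::string> path;
        path.push_back(std::string(1, label++));     // fresh start node
        for (int i = from; ; i = (i + 1) % n) {
            path.push_back(std::to_string(cycle[i]));
            if (i == to) break;
        }
        path.push_back(std::string(1, label++));     // fresh end node
        paths.push_back(path);
    }
    return paths;
}

Called as breakCycle({7, 9, 8}, {0, 1}), this yields the paths (a, 9, b) and (c, 8, 7, d), the same shape as the example above.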
Then I do some analysis with my data.
I have implemented all of the above. My problem is how to describe the entire procedure with mathematical notation (in general terms, without the example, as I said). This is very hard for me as I do not have even the basics.
I have been trying and googling but still cannot find a way to describe what I did. I hope what I did is clear to you.
So, could you please help me describe the following in mathematical notation (according to graph theory):
decomposing an undirected graph into (shortest) cycles
breaking cycles via edges to make directed path graphs (as shown in the figure)
I have seen many authors use different notations and symbols to define graphs and their subgraphs, but I cannot define such things myself as my basics are too poor. So, please help me to state these things in a formal, mathematical way. Thanks in advance.
I have inserted sample figures to illustrate the idea.
Note: I have added the c++ tag as many computer scientists use graph theory, and I would like to have a response from them.
The first problem you might encounter in an attempt to put your operations into a mathematical description is your definition of the "shortest cycles", as cycles are typically defined as a sequence of vertices connected by edges in which the first vertex is also the last one.
Math crash course
In math a graph is typically described by two sets, V (the vertices) and E (the edges).
The set E consists of two-element sets, each element being a vertex.
Such as
V = { v1, v2, ..., vn }
E = { ..., {vi, vk}, ... }
Every set in E corresponds to one edge in your graph.
As such, a (connected) path is typically defined as:
A sequence of vertices v1, ..., vn with the property that for every two consecutive vertices vi and vi+1 in the sequence, the set { vi, vi+1 } is an element of the set E.
(practically speaking: there is an edge from vertex vi to vertex vi+1)
A cycle is typically defined as a path with the property v1 = vn (thus the first vertex is also the last one).
With this definition and your example, already the sequence 1, 4, 1 forms a cycle (in the mathematical sense).
As such, every edge in your graph would count as a "shortest" cycle, while the examples you give are definitely longer!
You said that you
... neglect the cycles whose length is less than 3
this doesn't look too bad as a starting point for your description. Unfortunately, I didn't completely understand the next steps you want to perform.
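Collecting the definitions above, the part of your setup that is already clear could be written compactly in standard (LaTeX) notation, for instance:

G = (V, E), \qquad E \subseteq \bigl\{\, \{u, v\} : u, v \in V,\ u \neq v \,\bigr\}

\text{a path is a sequence } (v_1, \dots, v_n) \text{ with } \{v_i, v_{i+1}\} \in E \text{ for } 1 \le i < n

\text{a cycle is a path with } v_1 = v_n

\text{the cycles kept are those of length (number of edges) } n - 1 \ge 3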
Advice
My advice, or at least the way I would approach the problem, is to convert the rather long description into some kind of shorter algorithmic description, refining exactly how you perform each step. This way, getting to your final description shouldn't be too hard to accomplish. Especially don't forget to state what exactly the input to your algorithm is; even that doesn't seem to be clear from your description:
are you starting with an already known set of "shortest" cycles?
or are you just given a graph as input and have to determine the "shortest" cycles yourself?
if you detect them yourself, how exactly is this done?
Especially don't forget to describe this part of the story if it applies, as it seems to be one of the most crucial parts of your problem.
Related
I'm working on modeling a path search and deduction board game, to practice some concepts I am learning in school. This is a first attempt at analyzing graphs for me, and I would appreciate some advice on what kind of data structure might be appropriate for what I am trying to do.
The game I am modeling presents as a series of ~200 interconnected nodes, as shown below. Given a known starting position for the adversary (node 84, for example, in the figure below), the goal is to identify possible locations of the adversary's hideout. The adversary's moves away from 84 are, naturally, unknown.
Fig 1 - Illustrative Sub-Graph with Adversary Initial Position at Node 84
Initially, this leads to a situation like the one below. Given the adversary started at 84, he/she can only be at 66, 86 or 99 after taking their first turn. And so on.
Fig 2 - Possible Locations for Adversary after 1, 2 and 3 Turns (Based on Fig 1 Graph)
So far, I have modeled the board itself as an undirected graph - using an implementation of OCaml's ocamlgraph library. What I am now trying to do is to model the path taken by the adversary through the graph, so as to identify potential locations of the adversary after each turn.
While convenient for illustration purposes, the tree representation implied by the figure above has several drawbacks:
First, keeping track of all possible paths through the network is unnecessary (I care only about terminal location of the adversary's hideout, not the path taken) as well as burdensome: each node is connected to ~7 other nodes, on average. By the time we hit the end of the game's 15 turns, that's a lot of branches!
Second, I suspect pruning would become an issue as well. Indeed, part of the exercise here is to maximally exploit the limited information about the adversary's movements that is revealed as the game goes on. This information either states that the adversary "has never been to node X" or "has previously visited node X."
Information of the first type (e.g. "adversary has never been to node 65") would lead me to want to prune the tree "from above" by traveling down through the branches and cutting off any branch that is invalidated by the revealed information.
Fig 3 - Pruning from the Top ("Adversary Has Never Been to Node 65")
Information of the second type (e.g. "Adversary has Visited Node 100") would, however, invite pruning "from below" to cut off any branch that was not consistent with the information.
Fig 4 - Pruning from the Bottom (e.g. "Adversary Has Visited Node 100")
It seems to me that a naive tree approach would be a messy proposition, so I thought I would ask for any suggestions or advice on the best data structure to use here, or how to better approach the problem.
It's really hard to give advice for your case, as any optimization should be preceded by profiling. It sounds like you need a bitset of some sort and/or an incidence matrix. For the bitset you can either use the Batteries implementation or implement your own using OCaml arbitrary-precision numbers with the Zarith library. For the incidence matrix, you can opt for a trivial _ array array, use the Bigarray module, or, again, use Zarith and implement your own efficient representation using bitwise operations.
And if I were you, I would start by defining the abstraction that you need (i.e., the interface), then start with a drop-in implementation, and later optimize based on the real input by substituting implementations.
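To make the set-per-turn idea concrete, here is a minimal sketch, written in C++ rather than OCaml purely for illustration; the node count and the adjacency layout are assumptions:

#include <bitset>
#include <vector>

constexpr int N = 200;                  // ~200 board nodes, per the question
using NodeSet = std::bitset<N>;

std::vector<NodeSet> adj(N);            // adj[v] has one bit set per neighbour of v

// One turn: the adversary moves from some possible node to one of its neighbours.
NodeSet step(const NodeSet& possible) {
    NodeSet next;
    for (int v = 0; v < N; ++v)
        if (possible[v]) next |= adj[v];
    return next;
}

// "Adversary has never been to node x": prune x from every turn's set.
void neverVisited(std::vector<NodeSet>& perTurn, int x) {
    for (NodeSet& s : perTurn) s.reset(x);
}

Keeping one NodeSet per turn rather than a tree sidesteps path enumeration entirely; the "has visited node X" clue is trickier because the visit turn is unknown, but it too reduces to set operations on the per-turn sets.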
Are there any algorithms out there that can assist in and accelerate the construction of a jigsaw puzzle where the edges are already identified and each edge is guaranteed to fit exactly one other edge (or no edge if that piece is a corner or border piece)?
I've got a data set here that is roughly represented by the following structure:
struct tile {
    int a, b, c, d;
};

tile tiles[SOME_LARGE_NUMBER] = ...;
Each side (a, b, c, and d) is uniquely indexed within the puzzle so that only one other tile will match an edge (if that edge has a match at all; corner and border tiles have sides that don't).
Unfortunately there are no guarantees past that. The order of the tiles within the array is random, the only guarantee is that they're indexed from 0 to SOME_LARGE_NUMBER. Likewise, the side UIDs are randomized as well. They all fall within a contiguous range (where the max of that range depends on the number of tiles and the dimensions of the completed puzzle), but that's about it.
I'm trying to assemble the puzzle in the most efficient way possible, so that I can ultimately address the completed puzzle using rows and columns through a two dimensional array. How should I go about doing this?
The tile[] data defines an undirected graph where each node links with 2, 3 or 4 other nodes. Choose a node with just 2 links and set that as your origin. The two links from this node define your X and Y axes. If you follow, say, the X axis link, you will arrive at a node with 3 links — one pointing back to the origin, and two others corresponding to the positive X and Y directions. You can easily identify the link in the X direction, because it will take you to another node with 3 links (not 4).
In this way you can easily find all the pieces along one side until you reach the far corner, which only has two links. Of all the pieces found so far, the only untested links are pointing in the Y direction. This makes it easy to place the next row of pieces. Simply continue until all the pieces have been placed.
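A sketch of the bookkeeping that walk needs, reusing the tile struct from the question (the helper names are made up):

#include <unordered_map>
#include <vector>

struct tile { int a, b, c, d; };

// For each side UID, record which tiles carry it: a UID owned by two tiles
// is an interior edge; a UID owned by one tile lies on the border.
std::unordered_map<int, std::vector<int>> buildSideIndex(const std::vector<tile>& tiles) {
    std::unordered_map<int, std::vector<int>> owners;
    for (int i = 0; i < static_cast<int>(tiles.size()); ++i)
        for (int s : {tiles[i].a, tiles[i].b, tiles[i].c, tiles[i].d})
            owners[s].push_back(i);
    return owners;
}

// Link count of a tile: 2 = corner, 3 = border, 4 = interior.
int links(const tile& t, const std::unordered_map<int, std::vector<int>>& owners) {
    int d = 0;
    for (int s : {t.a, t.b, t.c, t.d})
        if (owners.at(s).size() == 2) ++d;
    return d;
}

Choosing the origin is then a scan for a tile with links(...) == 2, and following a link from tile i across side s is just owners[s][0] == i ? owners[s][1] : owners[s][0].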
This might be not what you are looking for, but because you asked for "most efficient way possible", here is a relatively recent scientific solution.
Puzzles are a complex combinatorial problem (NP-complete) and require some help from academia to solve efficiently. State-of-the-art algorithms were recently beaten by genetic algorithms.
Depending on your puzzle sizes (and desire to study scientific stuff ;)) you might be interested in this paper: A Genetic Algorithm-Based Solver for Very Large Jigsaw Puzzles. GAs work around some of the problems you encounter in classic algorithms in surprising ways.
Note that genetic algorithms are embarrassingly parallel, so there is a straightforward way to run them on parallel machines, such as multi-core CPUs, GPUs (CUDA/OpenCL) and even distributed/cloud frameworks, which makes them hundreds to thousands of times faster. GPU-accelerated GAs unlock puzzle sizes unavailable to conventional algorithms.
I'm using the LEMON library in a project of mine and I have a question about how best to use it to evaluate a complete distance matrix between vertices in a given set.
So, suppose we are given a large graph (represented as a ListDigraph) and a subset of vertices S, and we need to evaluate all shortest paths between any two vertices in S.
The easiest way to do that would be to run the Dijkstra algorithm for each combination of two vertices in S, but of course this is not the best idea in terms of efficiency.
One thing I thought of was to evaluate one path from a vertex i to a vertex j, both in S, and then search the ProcessedMap for any other vertex in S. If I find one, say k, I already have the distance from i to k. This would most probably reduce the number of calls to the algorithm. However, I still think there should be a better solution in LEMON.
Is adding multiple sources of any help? I haven't quite understood the behaviour of the Dijkstra class when using this feature.
Thank you =)
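One observation worth adding: a single Dijkstra run from a source already yields distances to all vertices, so |S| runs fill the whole matrix; there is no need for one run per pair. A hedged sketch against LEMON's documented Dijkstra interface (treat the exact template details as assumptions):

#include <lemon/dijkstra.h>
#include <lemon/list_graph.h>
#include <vector>

using namespace lemon;

// One full single-source run per vertex of S; each run fills a whole row.
std::vector<std::vector<int>> distanceMatrix(
        const ListDigraph& g,
        const ListDigraph::ArcMap<int>& length,
        const std::vector<ListDigraph::Node>& S)
{
    std::vector<std::vector<int>> dist(S.size(), std::vector<int>(S.size(), 0));
    Dijkstra<ListDigraph> dij(g, length);
    for (std::size_t i = 0; i < S.size(); ++i) {
        dij.run(S[i]);                       // shortest paths from S[i] to everything
        for (std::size_t j = 0; j < S.size(); ++j)
            dist[i][j] = dij.dist(S[j]);
    }
    return dist;
}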
The problem:
N nodes are related to each other by a 'closeness' factor ranging from 0 to 1, where a factor of 1 means that the two nodes have nothing in common and 0 means the two nodes are exactly alike.
If two nodes are both close to another node (i.e. they have a factor close to 0) then this doesn't mean that they will be close together, although probabilistically they do have a much higher chance of being close together.
The question:
If another node is placed in the set, find the node that it is closest to in the shortest possible amount of time.
This isn't a homework question; this is a real-world problem that I need to solve. I've never taken any algorithms courses, so I don't have a clue what sort of algorithm I should be researching.
I can index all of the nodes before another one is added and gather closeness data between each node, but short of comparing all nodes to the new node I haven't been able to come up with an efficient solution. Any ideas or help would be much appreciated :)
Because your 'closeness' metric obeys the triangle inequality, you should be able to use a variant of BK-trees to organize your elements. Adapting them to real numbers should simply be a matter of choosing an interval to quantize your numbers on, and otherwise using the standard BK-tree procedure. Some experimentation may be required; you might want to increase the resolution of the quantization as you progress down the tree, for instance.
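A minimal sketch of that adaptation, with the quantization resolution and the metric itself as placeholder assumptions:

#include <cmath>
#include <cstdlib>
#include <map>
#include <memory>

constexpr int QUANT = 100;            // quantization resolution (a guess; tune it)

struct Item { double x = 0; /* your node's payload */ };

// Placeholder metric; anything obeying the triangle inequality works here.
double distance(const Item& a, const Item& b) { return std::fabs(a.x - b.x); }

int bucket(double d) { return static_cast<int>(std::lround(d * QUANT)); }

struct BKNode {
    Item item;
    std::map<int, std::unique_ptr<BKNode>> children;  // keyed by quantized distance
};

void insert(std::unique_ptr<BKNode>& node, const Item& it) {
    if (!node) { node = std::make_unique<BKNode>(); node->item = it; return; }
    insert(node->children[bucket(distance(node->item, it))], it);
}

// Nearest-neighbour query; start with bestD = QUANT + 1 ("worse than anything").
void nearest(const BKNode* node, const Item& q, const Item*& best, int& bestD) {
    if (!node) return;
    const int d = bucket(distance(node->item, q));
    if (d < bestD) { bestD = d; best = &node->item; }
    for (const auto& [k, child] : node->children)
        if (std::abs(k - d) <= bestD)                 // triangle-inequality prune
            nearest(child.get(), q, best, bestD);
}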
but short of comparing all nodes to the new node I haven't been able to come up with an efficient solution
Without any other information about the relationships between nodes, this is the only way you can do it, since you have to compute the closeness factor between the new node and each existing node. An O(n) algorithm can be a perfectly decent solution.
One addition you might consider - keep in mind we have no idea what data structure you are using for your objects - is to organize all present nodes into a graph, where nodes with factors below a certain threshold can be considered connected, so you can first check nodes that are more likely to be similar/related.
If you want the optimal algorithm in terms of speed, but O(n^2) space, then for each node create a sorted list of other nodes (ordered by closeness).
When you get a new node, you have to add it to the indexed list of all the other nodes, and all the other nodes need to be added to its list.
To find the closest node, just find the first node on any node's list.
Since you already need O(n^2) space (in order to store all the closeness information you need basically an NxN matrix where A[i,j] represents the closeness between i and j) you might as well sort it and get O(1) retrieval.
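A sketch of that scheme (all names are illustrative):

#include <map>
#include <vector>

using Closeness = double;  // 0 = exactly alike, 1 = nothing in common

// byCloseness[i] holds every other node sorted by closeness to i.
std::vector<std::multimap<Closeness, int>> byCloseness;

// toExisting[j] = closeness between the new node and existing node j.
void addNode(const std::vector<Closeness>& toExisting) {
    const int id = static_cast<int>(byCloseness.size());
    byCloseness.emplace_back();
    for (int j = 0; j < id; ++j) {
        byCloseness[id].emplace(toExisting[j], j);  // new node's own sorted list
        byCloseness[j].emplace(toExisting[j], id);  // add new node to every other list
    }
}

int closestTo(int i) { return byCloseness[i].begin()->second; }  // O(1) retrieval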
If this closeness forms a linear spectrum (such that closeness to something implies closeness to the other things close to it, and not being close implies not being close to them), then you can simply do a binary or interpolation search on insertion for closeness, handling one extra complexity: at each point you have to see whether closeness increases or decreases below or above.
For example, if we consider letters (A is close to B but far from Z), then the pre-existing elements can be kept sorted, say: A, B, E, G, K, M, Q, Z. To insert, say, 'F', you start by comparing with the middle element, [3] G, and the one following it: [4] K. You establish that F is closer to G than K, so the best match is either at G or to the left, and you move halfway into the unexplored region to the left: 3/2 = [1] B, followed by E, and you find E is closer to F, so the match is either at E or to its right. Halving the space between the earlier checks at [3] and [1], you test at [2], find it equidistant, and insert F in between.
EDIT: it may work better in probabilistic situations, and require fewer comparisons, to start at the ends of the spectrum and work your way in (e.g. compare F to A and Z, decide it's closer to A, then see whether A or the halfway point [3] G is closer). Also, it might be good to finish with a comparison to the closest few points on either side of where the binary/interpolation search led you.
ACM Surveys September 2001 carried two papers that might be relevant, at least for background. "Searching in Metric Spaces", lead author Chavez, and "Searching in High Dimensional Spaces - Index Structures for Improving the Performance of Multimedia Databases", lead author Bohm. From memory, if all you have is the triangle inequality, you can use it to some effect, but if you can trim your data down to a sensible number of dimensions, you can do better by using a search structure that knows about this dimensional structure.
Facebook has this thing where it puts you and all of your friends in a graph, then slowly moves everyone around until people are grouped together based on mutual friends and so on.
It looked to me like they just made anything <0.5 an attractive force, anything >0.5 a repulsive force, and moved people with every iteration based on the net force. After a couple hundred iterations, it was looking pretty darn good.
Note: this is not an algorithm, it is a heuristic. In the Facebook implementation I saw, two people were not able to reach equilibrium and kept dancing around each other. It turned out they were actually the same person with two different accounts.
Also, it took about 15 minutes on a decent computer and ~100 nodes. YMMV.
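For the curious, a toy version of one such iteration might look like this; the 0.5 threshold and the step size are guesses, not anything Facebook published:

#include <vector>

struct Vec2 { double x = 0, y = 0; };

// One iteration: closeness < 0.5 attracts a pair, closeness > 0.5 repels it,
// and every node then moves according to its net force.
void iterate(std::vector<Vec2>& pos,
             const std::vector<std::vector<double>>& closeness,
             double step = 0.01)
{
    const int n = static_cast<int>(pos.size());
    std::vector<Vec2> force(n);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            const double dx = pos[j].x - pos[i].x;
            const double dy = pos[j].y - pos[i].y;
            const double sign = (closeness[i][j] < 0.5) ? 1.0 : -1.0;
            force[i].x += sign * dx;
            force[i].y += sign * dy;
        }
    for (int i = 0; i < n; ++i) {
        pos[i].x += step * force[i].x;
        pos[i].y += step * force[i].y;
    }
}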
It looks suspiciously like a nearest neighbor search problem (also called a similarity search).
I need a function that finds a cycle in an undirected graph (Boost) and returns its vertices and edges. It only needs to return the vertices/edges of one cycle in the graph.
My question is: what is the best way to do this with Boost? I am not experienced using it.
I do not know Boost, but here is an answer at a conceptual level:
Here is my guess: walk through the graph using BFS. On every node, note its "depth" and add a reference to its "parent" (there should be only one even if there are many cycles). Once you discover that a link from A to B creates a cycle (because B is already colored), then (a sketch follows the steps below):
1) Backtrack from A back to the root, save the edges / vertices along the way.
2) Backtrack from B back to the root, save the edges / vertices along the way.
3) Add A, B, AB
4) "Sort" to restore the proper order. Consider using a LIFO (stack) for 1) and a FIFO for 2)
I hope this helps.
Generally you can do this with a depth-first search. I'm not intimately familiar with Boost's graph facilities, but this page will give you an overview of the algorithm.
If you want to find a cycle, then using depth-first search should do just fine. The DFS visitor has a back_edge function. When it's called, you have an edge in the cycle. You can then walk the predecessor map to reconstruct the cycle (see the sketch after the notes below). Note that:
There's the strong_components function, to find, well, strong components
Finding all cycles, as opposed to a single cycle, is a much harder problem, and I believe Boost.Graph does not have an implementation for that at present.
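To make the back_edge approach concrete, here is a hedged sketch. One caveat worth verifying against the docs: on an undirected graph, a plain depth_first_search also classifies the reverse of each tree edge as a back edge, so the visitor skips those; boost::undirected_dfs with an edge color map is the cleaner alternative.

#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/depth_first_search.hpp>
#include <vector>

using Graph = boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS>;
using Vertex = boost::graph_traits<Graph>::vertex_descriptor;
using Edge = boost::graph_traits<Graph>::edge_descriptor;

struct CycleFinder : boost::default_dfs_visitor {
    std::vector<Vertex>& pred;    // predecessor map, filled on tree edges
    std::vector<Vertex>& cycle;   // output: vertices of the first cycle found

    CycleFinder(std::vector<Vertex>& p, std::vector<Vertex>& c) : pred(p), cycle(c) {}

    void tree_edge(Edge e, const Graph& g) {
        pred[boost::target(e, g)] = boost::source(e, g);
    }
    void back_edge(Edge e, const Graph& g) {
        const Vertex u = boost::source(e, g), v = boost::target(e, g);
        if (!cycle.empty() || pred[u] == v) return;  // already done / reversed tree edge
        for (Vertex x = u; x != v; x = pred[x])      // walk the predecessor map back to v
            cycle.push_back(x);
        cycle.push_back(v);
    }
};

// Usage sketch:
//   std::vector<Vertex> pred(boost::num_vertices(g)), cycle;
//   CycleFinder vis(pred, cycle);
//   boost::depth_first_search(g, boost::visitor(vis));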