How to find all clusters of forest on a map?
I have a simple Cell class like this (Type is enum {RIVER, FOREST, GRASS, HILL}):
class Cell {
public:
    Type type;
    int x;
    int y;
};
and a map stored as vector<Cell> grid. Can anyone suggest an algorithm to create a list<list<Cell>> clusters, where each inner list contains the FOREST cells of one cluster? A cluster is a set of connected cells, and a connection can be in any of eight directions: up, down, left, right, up_right, up_left, down_left, down_right. I need to find all clusters of forest on the map and put every single cluster in its own list<Cell>.
The algorithm is rather simple and it actually doesn't even depend on the exact definition of what a cluster is. Say you have a predicate cluster(f0, f1) which yields true if f0 and f1 are in the same cluster. All you need to do is to run through the grid and find the forests. If a cell f is a forest, you check cluster(f, other) for each known forest. If cluster(f, other) yields true, you add f to the cluster of other. You then continue to check the known forests in the other clusters: when you find a cell c in another cluster for which cluster(f, c) also yields true, you merge (std::list<Cell>::splice()) the two clusters.
I had put this as a comment, but may as well answer:
Look up the union-find algorithm. Using path compression, you can just
walk through the structure afterwards and create a list for each root,
adding your cells to the appropriate list as you go.
Link: http://en.wikipedia.org/wiki/Disjoint-set_data_structure
For all your forest cells, perform a union with the cell above and the cell to the left (when those are forest too). If you want diagonals to join, also include the top-left and top-right diagonals.
Use the path-compression version of union-find so that all nodes in a cluster point to a single root. Then all you have to do is walk through your structure (after doing all the unions) and add nodes as you go. Pseudo(ish)code:
foreach node
    Find(node) // this ensures path compression
    if not clusters.hasList(node.root)
        clusters.createList(node.root)
    end
    list <- clusters.getList(node.root)
    list.append(node)
end
The above assumes that if a node is a root, then node.root points to node.
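A minimal C++ sketch of the union-find approach described above. It is not from the original answer: it assumes the grid is stored row-major (the cell at (x, y) is grid[y * width + x]) and that width and height are known. The names forestClusters and findRoot are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <numeric>
#include <utility>
#include <vector>

enum Type { RIVER, FOREST, GRASS, HILL };

struct Cell {
    Type type;
    int x;
    int y;
};

// Find with path compression: every node on the path ends up pointing at the root.
int findRoot(std::vector<int>& parent, int i) {
    if (parent[i] != i) parent[i] = findRoot(parent, parent[i]);
    return parent[i];
}

// Assumption: grid is row-major, so the cell at (x, y) is grid[y * width + x].
std::list<std::list<Cell>> forestClusters(const std::vector<Cell>& grid, int width, int height) {
    assert(grid.size() == static_cast<std::size_t>(width) * height);
    std::vector<int> parent(grid.size());
    std::iota(parent.begin(), parent.end(), 0);  // every cell starts as its own root

    // Neighbours that were already visited in row-major order:
    // left, up-left, up, up-right (this covers all eight directions).
    const int dx[] = {-1, -1, 0, 1};
    const int dy[] = {0, -1, -1, -1};

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int i = y * width + x;
            if (grid[i].type != FOREST) continue;
            for (int d = 0; d < 4; ++d) {
                int nx = x + dx[d], ny = y + dy[d];
                if (nx < 0 || nx >= width || ny < 0) continue;
                int j = ny * width + nx;
                if (grid[j].type != FOREST) continue;
                int a = findRoot(parent, i);
                int b = findRoot(parent, j);
                parent[a] = b;  // union
            }
        }
    }

    // Group the forest cells by their root.
    std::map<int, std::list<Cell>> byRoot;
    for (int i = 0; i < static_cast<int>(grid.size()); ++i)
        if (grid[i].type == FOREST)
            byRoot[findRoot(parent, i)].push_back(grid[i]);

    std::list<std::list<Cell>> clusters;
    for (auto& entry : byRoot) clusters.push_back(std::move(entry.second));
    return clusters;
}
```

Only four of the eight neighbours are checked per cell because the other four are handled when the later cell is visited. For very large maps, union by rank or size would keep the trees shallower.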
I saw the following implementation of topological sort using DFS on Leetcode https://leetcode.com/problems/course-schedule/discuss/58509/18-22-lines-C++-BFSDFS-Solutions
Now the part of this that is confusing me is the representation of the directed graph which is used for the top sort. The graph is created as follows:
vector<unordered_set<int>> make_graph(int numCourses, vector<pair<int, int>>& prerequisites) {
    vector<unordered_set<int>> graph(numCourses);
    for (auto pre : prerequisites)
        graph[pre.second].insert(pre.first);
    return graph;
}
What is confusing me is this line:
graph[pre.second].insert(pre.first);
Presumably the graph is being represented as an adjacency list; if so why is each node being represented by the incoming edges and not the outgoing edges? Interestingly, if I flip pre.second and pre.first like this:
graph[pre.first].insert(pre.second);
The top sort still works. However, it seems most implementations of the same problem use the former method. Does this generalize to all directed graphs? I was taught in my undergraduate degree that a directed graph's adjacency list should contain, for each node, a list of the nodes its outgoing edges lead to. Is the choice of incoming vs outgoing edges arbitrary for the representation of the adjacency list?
For this specific problem, which only requires answering true or false, it doesn't matter if you flip every edge. That's because a graph can be topologically sorted if and only if it has no cycles, and reversing every edge neither creates nor removes a cycle. But if you want the order in which to take the courses, flipping doesn't work, as you can see from the different results for [[0, 1]] and [[1, 0]].
Which way to store the graph depends on how you solve the problem. In this case, we need to know the indegree of every node (course) and update it every time we delete a node from the graph (take the course), so that we know when we can delete a node (when its indegree is 0). When updating, we subtract 1 from the indegree of each node that the deleted node points to. If you apply this method (as most do), it's clear how you should store the graph: each node keeps the list of nodes it points to.
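The indegree method described above can be sketched as follows. This is an illustration, not the linked LeetCode solution; findOrder is a made-up name. It relies on the problem's convention that a pair (a, b) means course b must be taken before course a, so graph[pre.second].insert(pre.first) stores the edge from a prerequisite to the course that depends on it.

```cpp
#include <queue>
#include <unordered_set>
#include <utility>
#include <vector>

// Returns one valid order of taking the courses,
// or an empty vector if the prerequisites contain a cycle.
std::vector<int> findOrder(int numCourses,
                           const std::vector<std::pair<int, int>>& prerequisites) {
    // graph[b] holds every course a that has b as a prerequisite (edge b -> a).
    std::vector<std::unordered_set<int>> graph(numCourses);
    for (const auto& pre : prerequisites)
        graph[pre.second].insert(pre.first);

    // indegree[a] = number of prerequisites of a that have not been taken yet.
    std::vector<int> indegree(numCourses, 0);
    for (const auto& successors : graph)
        for (int next : successors) ++indegree[next];

    std::queue<int> ready;  // courses whose indegree is 0
    for (int c = 0; c < numCourses; ++c)
        if (indegree[c] == 0) ready.push(c);

    std::vector<int> order;
    while (!ready.empty()) {
        int c = ready.front();
        ready.pop();
        order.push_back(c);  // "delete" c from the graph
        for (int next : graph[c])
            if (--indegree[next] == 0) ready.push(next);
    }
    if (static_cast<int>(order.size()) != numCourses) order.clear();  // cycle
    return order;
}
```

With the edges flipped, the cycle check still gives the same true/false answer, but the returned order comes out reversed, which matches the [[0, 1]] versus [[1, 0]] example.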
I want to create a graph with nodes and edges, where each node contains n values. We are given the n values of the starting node, from which we need to generate other nodes, where each value in each node is of the form either:
t_n=t_(n-1)+2
or
t_n=t_(n-1)-1
When such a node is generated, it should create an edge from the old node to the new node.
I know this might be a very trivial job, but I have very limited programming knowledge. I have been advised to use classes or structures in C++ to represent the nodes. Please help me create a graph whose nodes hold multiple values, and in which the next nodes are generated from the parent node following the above rule. Some C++ code would be very helpful.
Thanks in Advance.
here you have some code but I don't really fully understand your task.
- graph with nodes and edges
- each node has n number of values
- we are given n values of the starting point
- need to generate other nodes where each value in each node would be either
- t_n=t_(n-1)+2
- t_n=t_(n-1)-1
- when such node is generated, it creates an edge from the old node to the new node.
About this starting point: do we have to generate a graph from it? What about the creation of the edge from the old node to the new node? Is the old node here the starting point?
Does "n number of values" describe what the node is connected to (a chain of the other edges this node is connected to)? For example, if we are given a node with the values (6, 4, 5), does this mean we need to generate extra nodes connected that many times (so the first one linked to our starting point would have 6 edges, one of them leading to the starting point)?
I will edit my answer when I have more information. Could you please draw an example in Paint, upload it and provide the link? It would be easier to imagine.
I'm using ELKI to cluster, in a hierarchical way, a dataset of geolocations using OPTICSXi.
The result of the execution of the algorithm is a set of files.
The content of a file could be:
# Cluster: nameOfCluster
# OPTICSModel
# Parents: nameOfParents (this element doesn't exist for the root cluster)
# Children: nameOfChild_0, nameOfChild_1 ... nameOfChild_n, (optional)
ID=1 lat0 lon0 reachability=?
ID=3062 lat1 lon1 reachability=1.30972586 predecessor=1
ID=7383 lat2 lon2 reachability=2.56784445 predecessor=3062
ID=42839 lat3 lon3 reachability=4.05510623 predecessor=1
I don't understand whether the elements in each file (four in the example) belong to the same cluster or could belong to different clusters. In the latter case, do I need to write some code that builds the clusters (for example by looking at the predecessor of each node), or are there parameters I could specify in ELKI to obtain each single cluster?
By default, ELKI will produce a directory with one file per cluster, unless the output file already exists, in which case you will get all the clusters written into the same file, separated by comments as seen above.
With a hierarchical result, such as that of OPTICSXi, you should however also treat all members of the child clusters as part of the parent. These are clusters nested inside the parent. They are not repeated in the parent, to reduce redundancy in the output.
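As a rough sketch of what "children are also part of the parent" means in code: assuming each output file has already been parsed into a structure like the hypothetical ClusterFile below (this is not an ELKI API, and the parsing itself is left out), the full membership of a cluster is its own IDs plus those of all its descendants.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory form of one output file (not part of ELKI).
struct ClusterFile {
    std::vector<int> ids;               // the ID=... lines listed in this file
    std::vector<std::string> children;  // names from the "# Children:" line
};

// All members of a cluster: its own points plus the points of every descendant.
std::vector<int> allMembers(const std::map<std::string, ClusterFile>& files,
                            const std::string& name) {
    std::vector<int> members;
    auto it = files.find(name);
    if (it == files.end()) return members;  // unknown cluster name
    members = it->second.ids;
    for (const std::string& child : it->second.children) {
        std::vector<int> sub = allMembers(files, child);
        members.insert(members.end(), sub.begin(), sub.end());
    }
    return members;
}
```

Calling this on the root would return every clustered point, so it is usually more useful on the inner clusters of the hierarchy.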
Compare the output of OPTICSXi to OPTICS output. What the Xi approach does, is split the data for you, based on sudden drops in reachability-distance. All clusters of Xi should be subsequences of the original OPTICS cluster order.
In your case, you may have chosen minPts too small, if your cluster has just 4 elements. (Although, you may have truncated the file, or you may have a lot of elements in child clusters; so the output may be fine).
Also note that you will usually want to validate whether you want the first element(s) of your cluster to belong to the cluster or not; similarly the last elements. OPTICSXi tends to err on the first elements, but not in a systematic way that would be trivial to fix. The first and last elements are those that bridge the gap from one cluster to another. You really should verify these manually (which is a good reason to not choose minPts too small).
I strongly recommend to build/use a visualization for your specific use case. Then you could just load such a cluster into your visualization and visually inspect if the result makes sense to you. I have used OPTICSXi on geographic data, and that worked very well for me.
So, if I've understood well, in the example above, the cluster is composed of the elements
ID=1, ID=3062, ID=7383, ID=42839, and all the elements in nameOfChild_0, nameOfChild_1 ... nameOfChild_n.
Maybe I shouldn't merge the children into the root element, because I guess I would obtain one big cluster containing all my geolocations; in fact I have 903 child elements and 18795 nodes (IDs).
I've done a lot of tests, choosing minPoint = {2,5,10} and xi = {0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001}. I use a visualization of my clusters, but I can't find a good result. I'm having a lot of trouble.
Thanks to your reply I've understood that I split my elements too much: I treated each file as a cluster, so I didn't count the child elements as part of the parent but considered them separate clusters.
Moreover, I noticed that the first and last elements are sometimes wrong. I thought of checking whether these elements are the predecessor of at least one element in the cluster, or whether at least one element in the cluster is a predecessor of them. Does this make sense?
Let me start off with saying that I have very basic knowledge of nodes and graphs.
My goal is to make a solver for a maze which is stored as an array. I know exactly how to implement the solving algorithm (I'm actually implementing a couple of them), but I am very confused about how to implement the nodes that the solver will use in each empty cell.
Here is an example array:
char maze[5][10] = {
    "#########",
    "#  #    #",
    "# ## ## #",
    "#     # #",
    "#########"
};
My solver starts at the top left and the solution (exit) is at the bottom right.
I've read up on how nodes work and how graphs are implemented, so here is how I think I need to make this:
Starting point will become a node
Each node will have as property the column and the row number
Each node will also have as property the visited state
Visited state can be visited, visited and leads to dead end, not visited
Every time a node gets visited, every directly adjacent, empty and not visited cell becomes the visited node's child
Every visited node gets put on top of the solutionPath stack (and marked on the map as '*')
Every node that led to a dead end is removed from the stack (and marked on the map as '~')
Example of finished maze:
"#########",
"#*~#****#",
"#*##*##*#",
"#****~#*#",
"#########"
Basically my question is, am I doing something really stupid here with my way of thinking (since I am really inexperienced with nodes) and if it is could you please explain to me why? Also if possible provide me other websites to check which implement examples of graphs on real world applications so I can get a better grasp of it.
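The plan outlined in the question (push visited cells, mark the path with '*', mark dead ends with '~') is essentially depth-first search with backtracking. A minimal sketch follows, using the call stack in place of an explicit solutionPath stack and a vector of strings instead of the char array. Both are choices made for this illustration, as is the neighbour order right, down, left, up.

```cpp
#include <string>
#include <vector>

// Depth-first search with backtracking, working directly on the maze.
// Cells on the current path are marked '*'; cells that turned out to be
// dead ends are re-marked '~' when the search backs out of them.
bool solve(std::vector<std::string>& maze, int row, int col, int goalRow, int goalCol) {
    if (row < 0 || row >= static_cast<int>(maze.size())) return false;
    if (col < 0 || col >= static_cast<int>(maze[row].size())) return false;
    if (maze[row][col] != ' ') return false;  // wall, or already visited

    maze[row][col] = '*';
    if (row == goalRow && col == goalCol) return true;

    // Try right, down, left, up in that order.
    const int dr[] = {0, 1, 0, -1};
    const int dc[] = {1, 0, -1, 0};
    for (int d = 0; d < 4; ++d)
        if (solve(maze, row + dr[d], col + dc[d], goalRow, goalCol)) return true;

    maze[row][col] = '~';  // no neighbour leads to the exit: dead end
    return false;
}
```

Called as solve(maze, 1, 1, 3, 7) on the example maze, this neighbour order reproduces the finished maze shown above, including both '~' cells. The visited state lives in the maze itself, so no separate node objects are needed.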
The answer really depends on what you find most important in the problem. If you're searching for efficiency and speed - you're adding way too many nodes. There's no need for so many.
The efficient method
Your solver only needs nodes at the start and end of the path, and at every possible corner on the map. Like this:
"#########",
"#oo#o  o#",
"# ## ## #",
"#o  oo#o#",
"#########"
There's no real need to test the other places on the map - you'll either HAVE TO walk thru them, or won't have need to even bother testing.
If it helps you - I got a template digraph class that I designed for simple graph representation. It's not very well written, but it's perfect for showing the possible solution.
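#include <map>
#include <set>
#include <utility>
template <class _nodeType, class _edgeType>
class digraph
{
public:
    std::set<_nodeType> _nodes;
    // key = (position of the source node, position of the target node) in _nodes
    std::map<std::pair<unsigned int, unsigned int>, _edgeType> _edges;
};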
I use this class to find a path in a tower defence game using the Dijkstra's algorithm. The representation should be sufficient for any other algorithm tho.
Nodes can be of any given type - you'll probably end up using pair<unsigned int, unsigned int>. The _edges connect two _nodes by their position in the set.
The easy to code method
On the other hand - if you're looking for an easy to implement method - you just need to treat every free space in the array as a possible node. And if that's what you're looking for - there's no need for designing a graph, because the array represents the problem in a perfect way.
You don't need dedicated classes to solve it this way.
bool myMap[5][9]; // the array containing the map info, indexed as myMap[y][x]. 0 = impassable, 1 = passable
vector<pair<int,int>> route; // the way you need to go
pair<int,int> start = pair<int,int>(1,1); // the route starts at (x,y) = (1,1)
pair<int,int> end = pair<int,int>(7,3); // the route ends at (7,3)
route = findWay(myMap,start,end); // finding the way with the algorithm you code
Where findWay has a prototype of vector<pair<int,int>> findWay(const bool map[5][9], pair<int,int> begin, pair<int,int> end), and implements the algorithm you desire. Inside the function you'll probably need another two-dimensional array of type bool that indicates which places were already tested.
When the algorithm finds a route, you usually have to read it in reverse, but I guess it depends on the algorithm.
In your particular example, myMap would contain:
bool myMap[5][9] = {{0,0,0,0,0,0,0,0,0},
                    {0,1,1,0,1,1,1,1,0},
                    {0,1,0,0,1,0,0,1,0},
                    {0,1,1,1,1,1,0,1,0},
                    {0,0,0,0,0,0,0,0,0}};
And findWay would return a vector containing (1,1),(1,2),(1,3),(2,3),(3,3),(4,3),(4,2),(4,1),(5,1),(6,1),(7,1),(7,2),(7,3)
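One possible findWay, sketched with breadth-first search. The answer leaves the choice of algorithm open, so BFS is only an assumption here, as are the constants ROWS and COLS and the indexing convention map[y][x] with points given as (x, y). It also shows the "read it in reverse" step: the route is rebuilt backwards from the end and then reversed.

```cpp
#include <algorithm>
#include <queue>
#include <utility>
#include <vector>

const int ROWS = 5;
const int COLS = 9;

// Breadth-first search. Points are (x, y) pairs; the map is indexed as map[y][x].
// Assumes begin and end lie inside the map and begin is passable.
std::vector<std::pair<int, int>> findWay(const bool map[ROWS][COLS],
                                         std::pair<int, int> begin,
                                         std::pair<int, int> end) {
    bool tested[ROWS][COLS] = {};              // which places were already tested
    std::pair<int, int> cameFrom[ROWS][COLS];  // predecessor of each tested place
    std::queue<std::pair<int, int>> frontier;
    frontier.push(begin);
    tested[begin.second][begin.first] = true;

    const int dx[] = {1, 0, -1, 0};
    const int dy[] = {0, 1, 0, -1};
    bool found = false;
    while (!frontier.empty()) {
        std::pair<int, int> cur = frontier.front();
        frontier.pop();
        if (cur == end) { found = true; break; }
        for (int d = 0; d < 4; ++d) {
            int nx = cur.first + dx[d], ny = cur.second + dy[d];
            if (nx < 0 || nx >= COLS || ny < 0 || ny >= ROWS) continue;
            if (!map[ny][nx] || tested[ny][nx]) continue;
            tested[ny][nx] = true;
            cameFrom[ny][nx] = cur;
            frontier.push(std::make_pair(nx, ny));
        }
    }

    std::vector<std::pair<int, int>> route;
    if (!found) return route;  // empty route: no way exists
    // Walk back from the end to the beginning, then reverse.
    for (std::pair<int, int> p = end; p != begin; p = cameFrom[p.second][p.first])
        route.push_back(p);
    route.push_back(begin);
    std::reverse(route.begin(), route.end());
    return route;
}
```

The example maze has only one simple path from (1,1) to (7,3), so on that map the sketch returns the route listed above.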
I need to do a partition of approximately 50000 points into distinct clusters. There is one requirement: the size of every cluster cannot exceed K. Is there any clustering algorithm that can do this job?
Please note that upper bound, K, of every cluster is the same, say 100.
Most clustering algorithms can be used to create a tree in which the lowest level is just a single element, either because they naturally work "bottom up" by joining pairs of elements and then groups of joined elements, or because, like K-Means, they can be used to repeatedly split groups into smaller groups.
Once you have a tree, you can decide where to split off subtrees to form your clusters of size <= 100. Pruning an existing tree is often quite easy. Suppose that you want to divide an existing tree to minimise the sum of some cost of the clusters you create. You might have:
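f(tree_node, list_of_clusters)
{
    // Cost of keeping everything below tree_node as one cluster (only allowed if it fits).
    cost = infinity;
    if (size of tree below tree_node <= 100)
    {
        cost = cost_function(stuff below tree_node);
    }
    // Cost of the best clustering of each child subtree, summed.
    temp_list = new List();
    cost_children = 0;
    for (child in children of tree_node)
    {
        cost_children += f(child, temp_list);
    }
    // A leaf has no children to split into, so it is always kept whole.
    if (tree_node has children && cost_children < cost)
    {
        list_of_clusters.add_all(temp_list);
        return cost_children;
    }
    list_of_clusters.add(tree_node);
    return cost;
}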
One way is to use hierarchical K-means: keep splitting each cluster that is larger than K until none of them exceeds K.
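A small sketch of that top-down idea, assuming 2-D points and a plain 2-means split at every level. The Point type, the seeding with the two extreme points, the fixed 10 iterations and the fallback for degenerate splits are all choices made for this illustration, not part of the answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

static double dist2(const Point& a, const Point& b) {
    return (a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y);
}

static Point mean(const std::vector<Point>& pts) {
    Point m{0.0, 0.0};
    for (const Point& p : pts) { m.x += p.x; m.y += p.y; }
    m.x /= pts.size();
    m.y /= pts.size();
    return m;
}

// Recursively split `points` in two until every cluster has at most maxSize members.
void splitUntilSmall(std::vector<Point> points, std::size_t maxSize,
                     std::vector<std::vector<Point>>& clusters) {
    assert(maxSize >= 1);
    if (points.size() <= maxSize) {
        if (!points.empty()) clusters.push_back(points);
        return;
    }
    // Seed 2-means with the two extreme points (ordered by x, then y).
    std::sort(points.begin(), points.end(), [](const Point& a, const Point& b) {
        return a.x != b.x ? a.x < b.x : a.y < b.y;
    });
    Point c0 = points.front(), c1 = points.back();

    std::vector<Point> left, right;
    for (int iter = 0; iter < 10; ++iter) {
        left.clear();
        right.clear();
        for (const Point& p : points)
            (dist2(p, c0) <= dist2(p, c1) ? left : right).push_back(p);
        if (left.empty() || right.empty()) break;
        c0 = mean(left);
        c1 = mean(right);
    }
    // Degenerate split (e.g. all points identical): cut the sorted list in half.
    if (left.empty() || right.empty()) {
        std::size_t half = points.size() / 2;
        left.assign(points.begin(), points.begin() + half);
        right.assign(points.begin() + half, points.end());
    }
    splitUntilSmall(left, maxSize, clusters);
    splitUntilSmall(right, maxSize, clusters);
}
```

This guarantees the size bound but not balanced clusters: some may end up much smaller than K.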
Another (in some sense opposite) approach would be to use hierarchical agglomerative clustering, i.e. a bottom-up approach, and again make sure you don't merge two clusters if they would form a new one of size > K.
The issue with naive clustering is that you do indeed have to calculate a distance matrix that holds the distance of each member from every other member in the set. It depends on whether you've pre-processed the population or you're amalgamating the clusters into typical individuals and then recalculating the distance matrix.