Normally, to find the connected components of a set of points, I build a graph from the points, which are placed in 2-dimensional Euclidean space, with the edges determined by thresholding: if the distance between two points is smaller than a predetermined cut-off radius, I consider them neighbors. I then run a depth-first search on this graph to determine the connected components.
The problem with this approach is that I have to use thresholding to build the graph in the first place. I am not a computer scientist, so I never took an algorithms class. Is there an algorithm with which I can find nearest neighbors or connected components without building the edges by thresholding? The main reason thresholding is so convenient is that the box is periodic, which is also why googling alone didn't help me much.
My code for this looks like this:
// +++++
// Graph
// +++++
// ( Note that edges are lines connecting nodes or vertices )
class Graph
{
public:
    Graph() {}
    ~Graph() {}
    void setNumNodes( int nds );
    int getNumNodes() { return numNodes; }
    void addEdge( int nd_idx, set<int> dest );
    map<int,set<int> > edges; // [node, destination]
    void setThreshold( double cutoff, double carpan );
    double getThreshold() { return threshold; }
private:
    int numNodes;
    double threshold;
};
void Graph::setNumNodes( int nds ){
    numNodes = nds;
}
void Graph::addEdge( int nd_idx, set<int> dest ){
    edges.insert( pair<int,set<int> >( nd_idx, dest ) );
}
void Graph::setThreshold( double cutoff, double carpan ){
    threshold = 2*R + carpan*cutoff;
}
// +++++
// Function for depth-first search from a starting node
int dfsFromNode( Graph& graph, int i, stack<int>& S, vector<bool>& visited ){
    int connected_comp = 0;
    // Add the node to the stack
    S.push( i );
    // While there are nodes that are not visited
    // (so as long as stack is not empty ..)
    while( !S.empty() ){
        // Read the top of the stack, then remove it (backtracking process)
        i = S.top();
        S.pop();
        if( !visited[i] ){
            visited[i] = true;
            connected_comp++;
            // Push every neighbor; the ones already visited are skipped when popped
            for( auto j: graph.edges[i] ){
                S.push( j );
            }
        } // if the node is visited before, get out
    } // while loop to check if the stack is empty or not
    return connected_comp;
}
edit:
To reiterate the question, how can I find the nearest neighbors or connected components without doing thresholding in periodic boundary settings?
For finding connected components, you can use kd-trees. A k-d tree (short for k-dimensional tree) is a data structure that recursively splits your data points in two, alternating between the orthogonal coordinate directions, one per degree of freedom. I find the following link quite useful for explanation.
Specifically, in the case of periodic boundary conditions, you can just add ghost (image) particles outside the primary box and build the kd-tree including these particles.
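A minimal sketch of the ghost-particle idea, assuming a square box of side box with all points in [0, box) and a margin equal to the largest neighbor distance you care about. The names Point2, Image, minImageDelta, periodicDistance and withGhosts are made up for this illustration and do not come from the question's code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Point2 { double x, y; };

// Shortest signed separation along one axis in a periodic box of side `box`
// (the "minimum image" convention).
double minImageDelta(double a, double b, double box) {
    assert(box > 0.0);
    double d = a - b;
    return d - box * std::round(d / box);
}

// Distance between two points in a periodic square box.
double periodicDistance(const Point2& p, const Point2& q, double box) {
    double dx = minImageDelta(p.x, q.x, box);
    double dy = minImageDelta(p.y, q.y, box);
    return std::sqrt(dx * dx + dy * dy);
}

// A point (original or ghost) plus the index of the original it images.
struct Image { Point2 pos; int source; };

// Returns the original points together with ghost copies of every point
// whose periodic image falls within `margin` of the primary box.
std::vector<Image> withGhosts(const std::vector<Point2>& pts, double box, double margin) {
    std::vector<Image> out;
    for (int i = 0; i < static_cast<int>(pts.size()); ++i) {
        for (int sx = -1; sx <= 1; ++sx) {
            for (int sy = -1; sy <= 1; ++sy) {
                Point2 g{pts[i].x + sx * box, pts[i].y + sy * box};
                bool inside = g.x >= -margin && g.x < box + margin &&
                              g.y >= -margin && g.y < box + margin;
                if (inside) out.push_back({g, i}); // sx == sy == 0 is the original
            }
        }
    }
    return out;
}
```

The positions returned by withGhosts can be handed to whatever kd-tree you use; when a query returns a ghost, its source field tells you which real particle it stands for.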
I am trying to implement a Breadth First Search (BFS) on my custom octree data structure to find the shortest path between two voxels in a 3D voxel octree. The start and end voxels are specified by their 3D positions (startVoxel and endVoxel) in the octree. The voxel_size is the size of a single voxel in the octree. The max_distance parameter specifies the maximum distance that the BFS search should cover. The function returns the path length (i.e., the number of voxels that need to be traversed) between the start and end voxels, or -1 if the end voxel is not found within the maximum distance.
As I am experiencing some issues with the positional coordinates of my voxels (I suspect some rounding errors), I am trying to use end_voxel_distance_threshold so that I don't need to hit the exact location of the end voxel and have some "buffer" instead. The issue is that if I make it too large, I get a lot of 0 distances; if I make it too small, I get some valid results, but beyond a certain distance I suddenly only get -1.
My question is: Is the below implementation valid or am I overlooking something that could result in this unexpected behavior? I am confident that my K-nearest neighbor is working correctly and returns the six nearest neighbors within a radius of the voxel size.
double bfs ( Custom_Octree& octree, double voxel_size, Eigen::Vector3d startVoxel, Eigen::Vector3d endVoxel, double max_distance )
{
    // Threshold for checking if the current voxel is close enough to the end voxel
    double end_voxel_distance_threshold = voxel_size;
    // Create a queue and push the start voxel into it
    std::queue< Eigen::Vector3d > q;
    q.push( startVoxel );
    // Create a vector to keep track of the visited voxels
    std::vector< Eigen::Vector3d > visited;
    visited.push_back( startVoxel );
    // Initialize distance to 0
    double distance = 0;
    // Keep looping until the queue is empty
    while( !q.empty() ) {
        // Get the size of the queue
        int size = q.size();
        // For each voxel in the queue
        while( size-- ) {
            // Get the current voxel from the front of the queue
            Eigen::Vector3d currentVoxel = q.front();
            q.pop();
            // Check if the current voxel is close enough to the end voxel
            if( ( currentVoxel - endVoxel ).norm() <= end_voxel_distance_threshold )
            {
                // Return the distance between the start and end voxel
                return distance;
            }
            // Get the center of the current voxel
            Eigen::Vector3d current_voxel_center( currentVoxel[0], currentVoxel[1], currentVoxel[2] );
            // Find the 6 nearest neighbors of the current voxel
            auto neighbors = octree.k_nearest_neighbors( current_voxel_center, voxel_size, 6 );
            // For each neighbor
            for( auto neighbor : neighbors ) {
                // Get the center of the neighbor voxel
                double x = neighbor->x_center();
                double y = neighbor->y_center();
                double z = neighbor->z_center();
                Eigen::Vector3d nb( x, y, z );
                // If the neighbor has not been visited and the distance from the start voxel is within the max_distance
                if( std::find( visited.begin(), visited.end(), nb ) == visited.end() && distance <= max_distance ) {
                    // Push the neighbor voxel into the queue
                    q.push( nb );
                    // Add the neighbor voxel to the visited vector
                    visited.push_back( nb );
                }
            }
        }
        // Increment the distance by the voxel size
        distance += voxel_size;
    }
    // Return -1 if the end voxel was not found
    return -1;
}
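One way to take floating-point rounding out of the comparisons, sketched here under the assumption that the voxels sit on a regular grid: convert each position to integer cell indices once, and run the BFS on those. This is not the octree-based code above; Cell, toCell and bfsSteps are hypothetical names, the occupied set stands in for whatever the octree stores, and the structured binding needs C++17.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <queue>
#include <set>
#include <utility>

using Cell = std::array<int, 3>;

// Map a position to integer voxel indices so that equality tests are exact.
Cell toCell(double x, double y, double z, double voxel_size) {
    assert(voxel_size > 0.0);
    return { static_cast<int>(std::floor(x / voxel_size)),
             static_cast<int>(std::floor(y / voxel_size)),
             static_cast<int>(std::floor(z / voxel_size)) };
}

// BFS over occupied cells with 6-connectivity.
// Returns the number of steps from start to goal, or -1 if the goal
// is not reached within max_steps.
int bfsSteps(const std::set<Cell>& occupied, Cell start, Cell goal, int max_steps) {
    const int dirs[6][3] = {{1,0,0},{-1,0,0},{0,1,0},{0,-1,0},{0,0,1},{0,0,-1}};
    std::queue<std::pair<Cell, int>> q;
    std::set<Cell> visited;
    visited.insert(start);
    q.push({start, 0});
    while (!q.empty()) {
        auto [cell, steps] = q.front();
        q.pop();
        if (cell == goal) return steps;
        if (steps == max_steps) continue; // do not expand past the limit
        for (const auto& d : dirs) {
            Cell nb{cell[0] + d[0], cell[1] + d[1], cell[2] + d[2]};
            // insert().second is true only the first time a cell is seen
            if (occupied.count(nb) && visited.insert(nb).second)
                q.push({nb, steps + 1});
        }
    }
    return -1;
}
```

Multiplying the returned step count by voxel_size gives a length comparable to the distance value in the function above.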
I have a class that implements a graph in C++, as shown below. This is the given code and it cannot be modified.
Graph(vector<Edge> const &edges, int N)
{
    // construct a vector of vectors of Pairs to represent an adjacency list
    vector<vector<Pair> > adjList;
    // resize the vector to N elements of type vector<Pair>
    adjList.resize(N);
    // add edges to the directed graph
    for (auto &edge: edges)
    {
        int src = edge.src;
        int dest = edge.dest;
        int weight = edge.weight;
        // insert at the end
        adjList[src].push_back(make_pair(dest, weight));
    }
    this->N = N;
}
In the main program, I have the default input for the constructor, shown below. I have to check whether the graph has a cycle or not. If it does not, the program has to generate random edges until a cycle is found in the graph. The default graph does not contain a cycle and has the following edges:
vector<Edge> edges =
{
// (x, y, w) -> edge from x to y having weight w
{ 0,1,6 }, { 0,2,12 }, { 1,4,9 }, { 3,4,1 }, { 3,2,4 }
};
I tried appending the random edges to the default graph using the code below. However, it does not work.
do
{
    src=rand()%5;
    dest=rand()%5;
    weight=rand()%20;
    vector<Edge> edges1{
        {src, dest, weight}};
    Graph graph1(edges1,N);
    graph.push_back(graph1);
    if(graph.isCyclic())
    {
        //print the graph
    }
}while(!graph.isCyclic());
I think the push_back() function is not used properly. Does anyone know how to use it correctly? Thanks.
Based on the limited information provided, it seems the following would work.
vector<Edge> edges = ...;
for (;;)
{
    int src=rand()%5;
    int dest=rand()%5;
    int weight=rand()%20;
    Edge new_edge{src, dest, weight};
    edges.push_back(new_edge);
    Graph graph(edges, N);
    if (graph.isCyclic())
    {
        //print the graph
        break; // exit the loop
    }
}
But this code recreates the graph on every pass through the loop, so something more efficient may be possible.
UPDATE
It seems the following might work; it avoids recreating the graph each time (it assumes adjList is accessible from outside the class):
vector<Edge> edges = ...;
Graph graph(edges, N);
for (;;)
{
    int src=rand()%5;
    int dest=rand()%5;
    int weight=rand()%20;
    graph.adjList[src].push_back(std::make_pair(dest, weight));
    if (graph.isCyclic())
    {
        //print the graph
        break; // exit the loop
    }
}
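Both loops rely on isCyclic(), which the question does not show. For reference, here is a minimal sketch of one common way to detect a cycle in a directed graph, using a depth-first search with three vertex states. The names AdjList, dfsFindsCycle and hasCycle are illustrative assumptions, not the asker's actual class members.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Adjacency list of (destination, weight) pairs, as in the question.
using AdjList = std::vector<std::vector<std::pair<int, int>>>;

// state: 0 = unvisited, 1 = on the current DFS path, 2 = fully explored.
// Returns true if a cycle is reachable from u.
bool dfsFindsCycle(const AdjList& adj, int u, std::vector<int>& state) {
    state[u] = 1;
    for (const auto& edge : adj[u]) {
        int v = edge.first;
        assert(v >= 0 && v < static_cast<int>(adj.size()));
        if (state[v] == 1) return true; // back edge to the current path
        if (state[v] == 0 && dfsFindsCycle(adj, v, state)) return true;
    }
    state[u] = 2;
    return false;
}

// Returns true if the directed graph contains at least one cycle.
bool hasCycle(const AdjList& adj) {
    std::vector<int> state(adj.size(), 0);
    for (int u = 0; u < static_cast<int>(adj.size()); ++u)
        if (state[u] == 0 && dfsFindsCycle(adj, u, state)) return true;
    return false;
}
```

Note that with src and dest both drawn from rand()%5, a self-loop (src == dest) is possible and counts as a cycle under this check.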
Is there a way to modify this to show the route of the shortest path? For example, if I had a list of edges like (3,1),(3,0),(4,3),(2,1), the output for getting from 4 to 1 would be 4->3,3->1
// Prints shortest paths from src to all other vertices
void Graph::shortestPath(int src)
{
    // Create a priority queue to store vertices that
    // are being preprocessed. This is weird syntax in C++.
    // Refer below link for details of this syntax
    // http://geeksquiz.com/implement-min-heap-using-stl/
    priority_queue< iPair, vector <iPair> , greater<iPair> > pq;
    // Create a vector for distances and initialize all
    // distances as infinite (INF)
    vector<int> dist(V, INF);
    // Insert source itself in priority queue and initialize
    // its distance as 0.
    pq.push(make_pair(0, src));
    dist[src] = 0;
    /* Looping till priority queue becomes empty (or all
       distances are not finalized) */
    while (!pq.empty())
    {
        // The first vertex in pair is the minimum distance
        // vertex, extract it from priority queue.
        // vertex label is stored in second of pair (it
        // has to be done this way to keep the vertices
        // sorted by distance; distance must be first item
        // in pair)
        int u = pq.top().second;
        pq.pop();
        // 'i' is used to get all adjacent vertices of a vertex
        list< pair<int, int> >::iterator i;
        for (i = adj[u].begin(); i != adj[u].end(); ++i)
        {
            // Get vertex label and weight of current adjacent
            // of u.
            int v = (*i).first;
            int weight = (*i).second;
            // If there is a shorter path to v through u.
            if (dist[v] > dist[u] + weight)
            {
                // Updating distance of v
                dist[v] = dist[u] + weight;
                pq.push(make_pair(dist[v], v));
            }
        }
    }
    // Print shortest distances stored in dist[]
    printf("Vertex Distance from Source\n");
    for (int i = 0; i < V; ++i)
        printf("%d \t\t %d\n", i, dist[i]);
}
Putting in an array that stores the numbers of the path, like 4,3,3,1 (using the above example), seems like the best idea, but I don't know where to insert the array in this code to do that.
Just as you save the distances for each vertex in the dist vector, save the predecessor vertex that last updated it in a vector called predecessor.
vector<int> dist(V, INF);
vector<int> predecessor(V, -1); // -1 means "no predecessor" (the source, or an unreachable vertex)
Then whenever you update the distance, update the predecessor:
dist[v] = dist[u] + weight;
predecessor[v] = u;
Finally, for any vertex you can trace the shortest path backward to the source:
printf("Vertex Distance from Source shortest path (backward)\n");
for (int i = 0; i < V; ++i)
{
    printf("%d \t\t %d\t\t", i, dist[i]);
    // Follow the predecessors from i; the chain stops at -1, right after src
    for (int j = i; j != -1; j = predecessor[j])
    {
        printf("%d,", j);
    }
    printf("\n");
}
Sounds like a homework problem.
Your idea to store the numbers of the path would be great if this were a DFS. Unfortunately, Dijkstra's algorithm doesn't naturally keep track of the path like a DFS does; it simply takes the next closest node and updates the distance values. It's probably more similar to a BFS in that regard.
What you could do is, as you update the distance to each node, also store which node you're coming from (maybe in your iPair struct if you're allowed to, maybe in a map/array if you have a way to ID your nodes). I'll call it a "from" reference for the sake of this post. Then, each time you find a shorter path to a node, you also update that "from" reference.
How do you find the path to a given node then? Simple: just start at the end node, and follow the "from" references back to the source.
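Both answers describe the same idea, so here is a self-contained sketch of it rather than a patch to the posted class. It assumes the pairs in the question are directed edges of weight 1; Adj, predecessors and pathTo are hypothetical names, and the structured bindings need C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

using Adj = std::vector<std::vector<std::pair<int, int>>>; // (neighbor, weight)

// Dijkstra from src that records, for every vertex, the vertex from which
// its distance was last improved (-1 if there is none).
std::vector<int> predecessors(const Adj& adj, int src) {
    const int INF = std::numeric_limits<int>::max();
    std::vector<int> dist(adj.size(), INF), pred(adj.size(), -1);
    using Item = std::pair<int, int>; // (distance, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue; // stale queue entry
        for (auto [v, w] : adj[u]) {
            assert(w >= 0); // Dijkstra requires non-negative weights
            if (dist[v] > d + w) {
                dist[v] = d + w;
                pred[v] = u; // remember where we came from
                pq.push({dist[v], v});
            }
        }
    }
    return pred;
}

// Follows the "from" references back from dst and reverses the result.
// Returns an empty vector if dst is not reachable from src.
std::vector<int> pathTo(const std::vector<int>& pred, int src, int dst) {
    std::vector<int> path;
    for (int v = dst; v != -1; v = pred[v]) path.push_back(v);
    std::reverse(path.begin(), path.end());
    if (path.front() != src) path.clear();
    return path;
}
```

With the edges from the question, pathTo for 4 to 1 yields the vertex sequence 4, 3, 1, which can be printed pairwise as 4->3,3->1.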
Disclaimer: There are some bad practices in the following code
Hello, I just had a few questions on how to correctly format my KD tree K nearest neighbor search. Here is an example of my function.
void nearest_neighbor(Node *T, int K) {
    if (T == NULL) return;
    nearest_neighbor(T->left, K);
    //do stuff find dist etc
    if(?)nearest_neighbor(T->right, K);
}
This code is confusing, so I will try to explain it. My function only takes the k value and a Node T. What I am trying to do is find the distance between the current node and every other value in the structure. These parts all work; the issue I'm having is understanding when and how to make the recursive calls nearest_neighbor(T->left/T->right,K). I know I am meant to prune the calls to the right side, but I'm not sure how to do this. This is a multidimensional KD tree, by the way. Any guidance or pointers to better examples would be much appreciated.
I would advise you to implement it the way Wikipedia describes. For your specific question, this passage:
Starting with the root node, the algorithm moves down the tree
recursively, in the same way that it would if the search point were
being inserted (i.e. it goes left or right depending on whether the
point is lesser than or greater than the current node in the split
dimension).
answers the question. Of course you can keep the usual picture of a 2-D kd-tree partition in mind:
if you have more than two dimensions, unlike in that example, you simply split in the first dimension, then in the second, then in the third, then in the fourth and so on, following a cyclic policy, so that when you reach the final dimension you start from the first dimension again.
The general idea is to keep a global point closest to the target, updating it with newly discovered points and never descending into a region (bounding box) that can't possibly contain a point closer to the target than the nearest already found. I'll show it in C rather than C++. You can easily translate to object-oriented form.
#include <math.h>    // INFINITY
#include <stdbool.h>
#define N_DIM <k for the k-D tree>
typedef float COORD;
typedef struct point_s {
    COORD x[N_DIM];
} POINT;
typedef struct node_s {
    struct node_s *lft, *rgt;
    POINT p[1];
} NODE;
POINT target[1];    // target for nearest search
POINT nearest[1];   // nearest found so far
POINT b0[1], b1[1]; // search bounding box
// Squared Euclidean distance (enough for comparisons).
COORD dist(const POINT *a, const POINT *b) {
    COORD sum = 0;
    for (int i = 0; i < N_DIM; i++) {
        COORD d = a->x[i] - b->x[i];
        sum += d * d;
    }
    return sum;
}
bool prune_search(void) {
    // Return true if no point in the bounding box [b0..b1] is closer
    // to the target than the current value of nearest.
    // One way to do it: measure the squared distance from target to the box.
    COORD sum = 0;
    for (int i = 0; i < N_DIM; i++) {
        COORD d = 0;
        if (target->x[i] < b0->x[i]) d = b0->x[i] - target->x[i];
        else if (target->x[i] > b1->x[i]) d = target->x[i] - b1->x[i];
        sum += d * d;
    }
    return sum >= dist(nearest, target);
}
void search(NODE *node, int dim);
void search_lft(NODE *node, int dim) {
    if (!node->lft) return;
    COORD save = b1->x[dim];
    b1->x[dim] = node->p->x[dim];
    if (!prune_search()) search(node->lft, (dim + 1) % N_DIM);
    b1->x[dim] = save;
}
void search_rgt(NODE *node, int dim) {
    if (!node->rgt) return;
    COORD save = b0->x[dim];
    b0->x[dim] = node->p->x[dim];
    if (!prune_search()) search(node->rgt, (dim + 1) % N_DIM);
    b0->x[dim] = save;
}
void search(NODE *node, int dim) {
    if (dist(node->p, target) < dist(nearest, target)) *nearest = *node->p;
    if (target->x[dim] < node->p->x[dim]) {
        search_lft(node, dim);
        search_rgt(node, dim);
    } else {
        search_rgt(node, dim);
        search_lft(node, dim);
    }
}
/** Set *nst to the point in the given kd-tree nearest to tgt. */
void get_nearest(POINT *nst, POINT *tgt, NODE *root) {
    // Start with an unbounded box: (-infinity, +infinity) in every dimension
    for (int i = 0; i < N_DIM; i++) {
        b0->x[i] = -INFINITY;
        b1->x[i] = INFINITY;
    }
    *target = *tgt;
    *nearest = *root->p;
    search(root, 0);
    *nst = *nearest;
}
Note this is not the most economical implementation. It does some unnecessary nearest updates and pruning comparisons for simplicity. But its asymptotic performance is as expected for kd-tree NN. After you get this one working, you can use it as a base implementation to squeeze out the extra comparisons.
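As an illustration of what "squeezing out the extra comparisons" could look like, here is a sketch of a commonly used variant in C++. Instead of tracking a bounding box, it prunes the far subtree using only the distance from the target to the current splitting plane. This swaps in a different pruning test from the answer's; K, Pt, KdNode, insert, nearest and findNearest are assumed names, and the tree is built by plain insertion just to keep the example short.

```cpp
#include <array>
#include <cassert>
#include <memory>

constexpr int K = 2; // number of dimensions (assumed for this example)
using Pt = std::array<double, K>;

struct KdNode {
    Pt p;
    std::unique_ptr<KdNode> lft, rgt;
    explicit KdNode(const Pt& q) : p(q) {}
};

// Insert like in a binary search tree, cycling through the split dimension.
void insert(std::unique_ptr<KdNode>& node, const Pt& q, int dim = 0) {
    if (!node) { node = std::make_unique<KdNode>(q); return; }
    insert(q[dim] < node->p[dim] ? node->lft : node->rgt, q, (dim + 1) % K);
}

double sqDist(const Pt& a, const Pt& b) {
    double s = 0;
    for (int i = 0; i < K; ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

// Updates best/bestSq with the point nearest to target in the subtree at node.
void nearest(const KdNode* node, const Pt& target, int dim,
             const KdNode*& best, double& bestSq) {
    if (!node) return;
    double d = sqDist(node->p, target);
    if (!best || d < bestSq) { best = node; bestSq = d; }
    double diff = target[dim] - node->p[dim];
    const KdNode* nearSide = diff < 0 ? node->lft.get() : node->rgt.get();
    const KdNode* farSide  = diff < 0 ? node->rgt.get() : node->lft.get();
    // Descend first into the side that contains the target.
    nearest(nearSide, target, (dim + 1) % K, best, bestSq);
    // The far side can only help if the splitting plane is closer
    // than the best match found so far.
    if (diff * diff < bestSq)
        nearest(farSide, target, (dim + 1) % K, best, bestSq);
}

Pt findNearest(const KdNode* root, const Pt& target) {
    assert(root != nullptr);
    const KdNode* best = nullptr;
    double bestSq = 0;
    nearest(root, target, 0, best, bestSq);
    return best->p;
}
```

The plane test is cheaper than the box test but prunes a little less aggressively, since the box also accounts for splits made higher up in the tree.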
Warning: Fairly long question, perhaps too long. If so, I apologize.
I'm working on a program involving a nearest neighbor(s) search of a kd tree (in this example, it is an 11 dimensional tree with 3961 individual points). We've only just learned about them, and while I have a good grasp of what the tree does, I get very confused when it comes to the nearest neighbor search.
I've set up a 2D array of points, each containing a quality and a location, which looks like this.
struct point{
    double quality;
    double location;
};
// in main
point **parray;
// later points to an array of [3961][11] points
I then translated the data so it has zero mean, and rescaled it for unit variance. I won't post the code as it's not important to my questions. Afterwards, I built the points into the tree in random order like this:
struct Node {
    point *key;
    Node *left;
    Node *right;
    Node (point *k) { key = k; left = right = NULL; }
};
Node *kd = NULL;
// Build the data into a kd-tree
random_shuffle(parray, &parray[n]);
for(int val=0; val<n; val++) {
    for(int dim=1; dim<D+1; dim++) {
        kd = insert(kd, &parray[val][dim], dim);
    }
}
Pretty standard stuff. If I've used random_shuffle() incorrectly, or if anything is inherently wrong about the structure of my tree, please let me know. It should shuffle the first dimension of the parray, while leaving the 11 dimensions of each in order and untouched.
Now I'm on to the neighbor() function, and here's where I've gotten confused.
The neighbor() function (last half is pseudocode, where I frankly have no idea where to start):
Node *neighbor (Node *root, point *pn, int d,
                Node *best, double bestdist) {
    double dist = 0;
    // Recursively move down tree, ignore the node we are comparing to
    if(!root || root->key == pn) return NULL;
    // Dist = SQRT of the SUMS of SQUARED DIFFERENCES of qualities
    for(int dim=1; dim<D+1; dim++)
        dist += pow(pn[d].quality - root->key->quality, 2);
    dist = sqrt(dist);
    // If T is better than current best, current best = T
    if(!best || dist<bestdist) {
        bestdist = dist;
        best = root;
    }
    // If the dist doesn't reach a plane, prune search, walk back up tree
    // Else traverse down that tree
    // Process root node, return
}
Here's the call to neighbor in main(), mostly uncompleted. I'm not sure what should be in main() and what should be in the neighbor() function:
// Nearest neighbor(s) search
double avgdist = 0.0;
// For each neighbor
for(int i=0; i<n; i++) {
    // Should this be an array/tree of x best neighbors to keep track of them?
    Node *best;
    double bestdist = 1000000000;
    // Find nearest neighbor(s)?
    for(int i=0; i<nbrs; i++) {
        neighbor(kd, parray[n], 1, best, &bestdist);
    }
    // Determine "distance" between the two?
    // Add to total dist?
    avgdist += bestdist;
}
// Average the total dist
// avgdist /= n;
As you can see, I'm stuck on these last two sections of code. I've been wracking my brain over this for a few days now, and I'm still stuck. It's due very soon, so of course any and all help is appreciated. Thanks in advance.
The kd-tree does not involve shuffling.
In fact, you will want to use sorting (or better, quickselect) to build the tree.
First solve it for the nearest neighbor (1NN). It should be fairly clear how to find the kNN once you have this part working, by keeping a heap of the top candidates, and using the kth point for pruning.
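To make these two points concrete, here is a hedged sketch: the tree is built in place with std::nth_element (a quickselect-style partial sort), and the k nearest neighbors are collected in a max-heap whose top, the kth best candidate so far, drives the pruning. It uses an implicit array-based tree rather than the Node struct from the question; D, Pt, build, knn and kNearest are assumed names chosen for the example.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

constexpr int D = 3; // number of dimensions (assumed for this example)
using Pt = std::array<double, D>;

double sqDist(const Pt& a, const Pt& b) {
    double s = 0;
    for (int i = 0; i < D; ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return s;
}

// Rearranges pts[lo, hi) into an implicit balanced kd-tree: the median
// along `dim` ends up at the midpoint, with no larger value to its left.
void build(std::vector<Pt>& pts, int lo, int hi, int dim = 0) {
    if (hi - lo <= 1) return;
    int mid = lo + (hi - lo) / 2;
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                     [dim](const Pt& a, const Pt& b) { return a[dim] < b[dim]; });
    build(pts, lo, mid, (dim + 1) % D);
    build(pts, mid + 1, hi, (dim + 1) % D);
}

// Max-heap of (squared distance, index): the top is the worst kept candidate.
using Heap = std::priority_queue<std::pair<double, int>>;

void knn(const std::vector<Pt>& tree, int lo, int hi, int dim,
         const Pt& q, std::size_t k, Heap& heap) {
    if (lo >= hi) return;
    int mid = lo + (hi - lo) / 2;
    double d = sqDist(tree[mid], q);
    if (heap.size() < k) heap.push({d, mid});
    else if (d < heap.top().first) { heap.pop(); heap.push({d, mid}); }
    double diff = q[dim] - tree[mid][dim];
    int next = (dim + 1) % D;
    // Near side first, then the far side only if it could still matter.
    if (diff < 0) knn(tree, lo, mid, next, q, k, heap);
    else          knn(tree, mid + 1, hi, next, q, k, heap);
    if (heap.size() < k || diff * diff < heap.top().first) {
        if (diff < 0) knn(tree, mid + 1, hi, next, q, k, heap);
        else          knn(tree, lo, mid, next, q, k, heap);
    }
}

// Indices (into the built array) of the k nearest points, nearest first.
std::vector<int> kNearest(const std::vector<Pt>& tree, const Pt& q, std::size_t k) {
    assert(k > 0);
    Heap heap;
    knn(tree, 0, static_cast<int>(tree.size()), 0, q, k, heap);
    std::vector<int> out;
    while (!heap.empty()) { out.push_back(heap.top().second); heap.pop(); }
    std::reverse(out.begin(), out.end());
    return out;
}
```

Call build(pts, 0, pts.size()) once, then kNearest(pts, query, k) for each query; with k = 1 this reduces to the plain nearest-neighbor search.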