Where can I find a serial C/C++ implementation of the k-nearest neighbour algorithm?
Do you know of any library that has this?
I have found OpenCV, but its implementation is already parallel.
I want to start from a serial implementation and parallelize it with pthreads, OpenMP, and MPI.
Thanks,
Alex
How about ANN (http://www.cs.umd.edu/~mount/ANN/)? I once used its kd-tree implementation, but it offers other options as well.
Quoting from the website: "ANN is a library written in C++, which supports data structures and algorithms for both exact and approximate nearest neighbor searching in arbitrarily high dimensions."
I wrote a C++ implementation for a KD-tree with nearest neighbor search. You can easily extend it for K-nearest neighbors by adding a priority queue.
Update: I added support for k-nearest neighbor search in N dimensions
The simplest way to implement this is to loop through all elements and keep the K nearest found so far (just comparing distances). The complexity is O(n) per query for a fixed K, which is not great, but no preprocessing is needed. What is best really depends on your application. To do better, use a spatial index to partition the area in which you search for the nearest neighbours. For some applications a grid-based spatial structure is just fine (divide your world into fixed-size blocks and search only the closest blocks first). This works well when your entities are evenly distributed. A better approach is to use a hierarchical structure such as a kd-tree. It really all depends on what you need.
For more information, including pseudocode, see these presentations:
http://www.ulozto.net/xCTidts/dpg06-pdf
http://www.ulozto.net/xoh6TSD/dpg07-pdf
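As a starting point for the serial version, the brute-force loop described above could look roughly like the sketch below. The names `Point`, `squaredDistance`, and `bruteForceKnn` are made up for this illustration and do not come from any of the libraries mentioned. A max-heap holds the best K candidates seen so far, so each query costs O(n log K).

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

using Point = std::vector<double>;  // hypothetical point type, any dimension

// Squared Euclidean distance; the square root is not needed for ranking.
double squaredDistance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Returns the indices of the k points closest to `query`, nearest first.
// If fewer than k points exist, all of them are returned.
std::vector<std::size_t> bruteForceKnn(const std::vector<Point>& points,
                                       const Point& query, std::size_t k) {
    // Max-heap on distance: the top is the worst of the current best k.
    std::priority_queue<std::pair<double, std::size_t>> heap;
    for (std::size_t i = 0; i < points.size(); ++i) {
        const double d = squaredDistance(points[i], query);
        if (heap.size() < k) {
            heap.emplace(d, i);
        } else if (k > 0 && d < heap.top().first) {
            heap.pop();          // drop the current worst candidate
            heap.emplace(d, i);  // and keep the closer point instead
        }
    }
    // Popping yields farthest first, so fill the result back to front.
    std::vector<std::size_t> result(heap.size());
    for (std::size_t i = result.size(); i-- > 0;) {
        result[i] = heap.top().second;
        heap.pop();
    }
    return result;
}
```

When there are many query points, each call is independent of the others, so the loop over queries is probably the most natural place to start parallelizing.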
Related
I want an algorithm that finds an optimal path between two vertices of a graph (with positive integer weights). The thing is, my graph is relatively big (up to 100 vertices). I have considered Dijkstra's algorithm, but from what I found online most implementations use an adjacency matrix, which in my case would be 100x100.
If you could recommend a source to read and learn from, or even better provide a C++ implementation, that would be great.
PS: The algorithm needs to output the required route and not just the shortest distance between two points.
Thank you for your time.
Have you looked into A*?
Here's a good article to start reading: http://www.redblobgames.com/pathfinding/a-star/introduction.html
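For a graph of this size, plain Dijkstra with an adjacency list is also an option. The sketch below is a minimal version, not taken from the linked article; `Edge`, `Graph`, and `shortestPath` are names invented for this example. It returns the route itself by recording each vertex's predecessor, and it assumes all weights are positive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Edge {
    int to;
    int weight;  // assumed positive
};

using Graph = std::vector<std::vector<Edge>>;  // adjacency list

// Returns the vertices on a shortest path from source to target,
// or an empty vector if target cannot be reached.
std::vector<int> shortestPath(const Graph& graph, int source, int target) {
    assert(source >= 0 && static_cast<std::size_t>(source) < graph.size());
    assert(target >= 0 && static_cast<std::size_t>(target) < graph.size());
    const long long kInf = std::numeric_limits<long long>::max();
    std::vector<long long> dist(graph.size(), kInf);
    std::vector<int> prev(graph.size(), -1);  // predecessor on the best path

    using Item = std::pair<long long, int>;  // (distance, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[source] = 0;
    pq.emplace(0LL, source);

    while (!pq.empty()) {
        const long long d = pq.top().first;
        const int u = pq.top().second;
        pq.pop();
        if (d > dist[u]) continue;  // stale queue entry
        if (u == target) break;     // target settled, path is final
        for (const Edge& e : graph[u]) {
            const long long nd = d + e.weight;
            if (nd < dist[e.to]) {
                dist[e.to] = nd;
                prev[e.to] = u;
                pq.emplace(nd, e.to);
            }
        }
    }

    if (dist[target] == kInf) return {};
    std::vector<int> path;
    for (int v = target; v != -1; v = prev[v]) path.push_back(v);
    std::reverse(path.begin(), path.end());
    return path;
}
```

With 100 vertices an adjacency matrix would work just as well; the list form mainly pays off when the graph is sparse.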
I have a grid of a fixed size. I'm trying to tile it into
multiple rectangles of different sizes.
The size difference is needed because I have to balance
the content of each rectangle in order to parallelize
the process and have balanced threads.
The grid contains individuals, so some cells may be free,
others may contain different types of individuals.
I've been told to look into "Plane tiling algorithms" but
I can't seem to find anything or can't find the right term
for what I'm looking for.
It looks to me like you want to do space subdivision in order to have some kind of search operations up and running in a balanced way.
For that, there are quadtrees, balanced binary space partitioning (BSP) trees, and k-d trees.
If you need optimized parallel space partitioning, there is a great, fairly recent article about a parallelized octree, with a related open-source library, from Uni Bonn, Germany.
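To make the k-d tree idea concrete for a grid, here is a rough sketch of a recursive bisection that cuts each rectangle along its longer side at the first position where the first part holds at least half of the individuals. It is only one possible reading of the problem: `Grid`, `Rect`, `countIn`, and `splitBalanced` are hypothetical names, each cell is assumed to store a count of individuals, and `depth` levels of splitting give up to 2^depth rectangles.

```cpp
#include <cassert>
#include <vector>

using Grid = std::vector<std::vector<int>>;  // grid[y][x] = individuals in the cell

// Half-open rectangle [x0, x1) x [y0, y1) in cell coordinates.
struct Rect {
    int x0, y0, x1, y1;
};

// Number of individuals inside a rectangle.
long long countIn(const Grid& grid, const Rect& r) {
    long long total = 0;
    for (int y = r.y0; y < r.y1; ++y)
        for (int x = r.x0; x < r.x1; ++x) total += grid[y][x];
    return total;
}

// Splits `r` recursively, k-d tree style, and appends the leaves to `out`.
void splitBalanced(const Grid& grid, const Rect& r, int depth,
                   std::vector<Rect>& out) {
    assert(depth >= 0);
    const int w = r.x1 - r.x0;
    const int h = r.y1 - r.y0;
    if (depth == 0 || (w <= 1 && h <= 1)) {
        out.push_back(r);
        return;
    }
    const bool cutX = w >= h;  // cut across the longer side
    const int lo = cutX ? r.x0 : r.y0;
    const int hi = cutX ? r.x1 : r.y1;
    const long long total = countIn(grid, r);

    // Advance the cut one column (or row) at a time until the first part
    // holds at least half of the content; both parts stay non-empty.
    long long acc = 0;
    int cut = lo + 1;
    for (int c = lo; c < hi - 1; ++c) {
        const Rect slice = cutX ? Rect{c, r.y0, c + 1, r.y1}
                                : Rect{r.x0, c, r.x1, c + 1};
        acc += countIn(grid, slice);
        cut = c + 1;
        if (2 * acc >= total) break;
    }

    const Rect first = cutX ? Rect{r.x0, r.y0, cut, r.y1}
                            : Rect{r.x0, r.y0, r.x1, cut};
    const Rect second = cutX ? Rect{cut, r.y0, r.x1, r.y1}
                             : Rect{r.x0, cut, r.x1, r.y1};
    splitBalanced(grid, first, depth - 1, out);
    splitBalanced(grid, second, depth - 1, out);
}
```

Each resulting rectangle can then be handed to one thread. The balance is only approximate, since a cut always falls on a cell boundary.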
OpenCV has an implementation of a max-flow algorithm (class GCGRAPH in file gcgraph.hpp). It's available here.
Does anyone know which particular max-flow algorithm is implemented by this class?
I am not 100% confident about this, but I believe that the algorithm is based on this research paper describing max-flow algorithms for computer vision. Specifically, Section 3 describes a new algorithm for computing maximum flows.
I haven't lined up every detail of the paper's algorithm with the implementation of the algorithm, but many details seem to match:
The algorithm described works by using a bidirectional search from both s and t, which the implementation is doing as well: for example, there's a comment reading // grow S & T search trees, find an edge connecting them.
The algorithm described keeps track of a set of orphaned nodes, which the variable std::vector<Vtx*> orphans seems to track in the implementation.
The algorithm described works by building up a set of trees and reusing them; the algorithm implementation keeps track of a tree associated with each node.
I hope this helps!
And another algorithm I'm looking for: A free C/C++ implementation of the average distance to nearest neighbour problem.
So basically I have a cloud of points in 3D and I want the average of the distances between each point and its nearest neighbour. The easiest way to do this would be to find the nearest neighbour of every point, calculate the distance from that neighbour to the point, and divide the sum of those distances by the number of points. However, there are much better algorithms, as this involves a lot of redundancy, and approximate methods run even faster. I'm looking for a free C/C++ implementation of those better algorithms.
An ε-approximation is fine.
The FLANN library allows you to do "fast approximate nearest-neighbor searches." It's written in C++ and claims to be one of the fastest implementations of this sort of search available.
Hope this helps!
You might try a Quadtree, as described in this question. There are many implementations for your problem in other 3D/2D graphic libraries, too.
I used GEOS, the 'Geometry Engine, Open Source' once in a project some years ago and was very satisfied.
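For reference, the naive O(n²) method described in the question can be written in a few lines, which gives a baseline to check any faster or approximate implementation against. This is a sketch only; `Point3` and `averageNearestNeighbourDistance` are names chosen for this example, and it requires at least two points.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Point3 = std::array<double, 3>;

// Mean, over all points, of the distance to each point's nearest neighbour.
// Brute force: every pair is compared, so the cost is O(n^2).
double averageNearestNeighbourDistance(const std::vector<Point3>& pts) {
    assert(pts.size() >= 2);
    double total = 0.0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        double best = std::numeric_limits<double>::infinity();
        for (std::size_t j = 0; j < pts.size(); ++j) {
            if (i == j) continue;
            double sq = 0.0;
            for (std::size_t d = 0; d < 3; ++d) {
                const double diff = pts[i][d] - pts[j][d];
                sq += diff * diff;
            }
            if (sq < best) best = sq;  // compare squared distances
        }
        total += std::sqrt(best);  // one square root per point
    }
    return total / static_cast<double>(pts.size());
}
```

Replacing the inner loop with a query against a spatial index (a kd-tree or one of the libraries above) removes most of the redundancy.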
Is there any fast way to determine the size of the largest strongly connected component in a graph?
I mean, like, the obvious approach would mean determining every SCC (could be done using two DFS calls, I suppose) and then looping through them and taking the maximum.
I'm pretty sure there has to be some better approach if I only need to have the size of that component and only the largest one, but I can't think of a good solution. Any ideas?
Thanks.
Let me answer your question with another question -
How can you determine which value in a set is the largest without examining all of the values?
Firstly, you could use Tarjan's algorithm, which needs only one DFS instead of two. If you understand the algorithm clearly, you will see that the SCCs form a DAG and that the algorithm finds them in reverse topological order. So if you have a sense of the graph (like a visual representation) and you know that relatively big SCCs occur at the end of the DAG, you could stop the algorithm once the first few SCCs are found.
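A compact sketch of the Tarjan approach, tracking only the size of the largest component rather than storing every SCC, might look like this. The class and function names (`LargestScc`, `largestSccSize`) are invented for this example, and the graph is assumed to be given as an adjacency list of vertex indices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Single-DFS Tarjan that keeps only the size of the largest SCC.
class LargestScc {
public:
    explicit LargestScc(const std::vector<std::vector<int>>& adj)
        : adj_(adj), index_(adj.size(), -1), low_(adj.size(), 0),
          onStack_(adj.size(), false) {}

    int run() {
        for (std::size_t v = 0; v < adj_.size(); ++v)
            if (index_[v] == -1) visit(static_cast<int>(v));
        return largest_;
    }

private:
    void visit(int v) {
        index_[v] = low_[v] = counter_++;
        stack_.push_back(v);
        onStack_[v] = true;
        for (int w : adj_[v]) {
            assert(w >= 0 && static_cast<std::size_t>(w) < adj_.size());
            if (index_[w] == -1) {
                visit(w);  // tree edge
                low_[v] = std::min(low_[v], low_[w]);
            } else if (onStack_[w]) {
                low_[v] = std::min(low_[v], index_[w]);  // edge into open SCC
            }
        }
        if (low_[v] == index_[v]) {  // v is the root of an SCC: pop it
            int size = 0;
            int w;
            do {
                w = stack_.back();
                stack_.pop_back();
                onStack_[w] = false;
                ++size;
            } while (w != v);
            largest_ = std::max(largest_, size);
        }
    }

    const std::vector<std::vector<int>>& adj_;
    std::vector<int> index_, low_, stack_;
    std::vector<bool> onStack_;
    int counter_ = 0;
    int largest_ = 0;
};

int largestSccSize(const std::vector<std::vector<int>>& adj) {
    return LargestScc(adj).run();
}
```

The recursion depth can reach the number of vertices, so a very large graph may need an iterative DFS instead.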