Optimizing Dijkstra's algorithm - C++

I need a graph-search algorithm that is sufficient for our robot navigation application, and I chose Dijkstra's algorithm.
We are given the gridmap which contains free, occupied and unknown cells where the robot is only permitted to pass through the free cells. The user will input the starting position and the goal position. In return, I will retrieve the sequence of free cells leading the robot from starting position to the goal position which corresponds to the path.
Since running Dijkstra's algorithm from start to goal and then tracing back the predecessors would give a reversed path (from goal to start), I decided to run the algorithm backwards, starting at the goal, so that I retrieve the path from start to goal directly.
Starting from the goal cell, each cell has 8 neighbors. Moving horizontally or vertically costs 1 and moving diagonally costs sqrt(2), provided the neighbor is reachable (i.e. not out of bounds and a free cell).
Here are the rules that should be observed when updating the neighboring cells. The current cell can only treat its 8 neighboring cells as reachable (i.e. at a distance of 1 or sqrt(2)) under the following conditions:
The neighboring cell is not out of bounds
The neighboring cell is unvisited.
The neighboring cell is a free cell which can be checked via the 2-D grid map.
Here is my implementation:
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cassert>
#include <cmath>
#include <iostream>
#include <map>
#include <vector>
#include "Timer.h"
static const int UNKNOWN_CELL = 197;
static const int FREE_CELL = 255;
static const int OCCUPIED_CELL = 0;
/// STRUCTURES for easier management.
struct vertex {
    cv::Point2i id_;
    cv::Point2i from_;
    vertex(cv::Point2i id, cv::Point2i from)
    {
        id_ = id;
        from_ = from;
    }
};
/// To be used for finding an element in std::multimap STL.
struct CompareID
{
    CompareID(cv::Point2i val) : val_(val) {}
    bool operator()(const std::pair<double, vertex> & elem) const {
        return val_ == elem.second.id_;
    }
    cv::Point2i val_;
};
/// Some helper functions for dijkstra's algorithm.
/// Note that x indexes the row and y indexes the column.
uint8_t get_cell_at(const cv::Mat & image, int x, int y)
{
    assert(x >= 0 && x < image.rows);
    assert(y >= 0 && y < image.cols);
    return image.data[x * image.cols + y];
}
/// Checks that the cell lies inside the map, using the same convention
/// as get_cell_at(): x is compared against rows and y against cols.
bool checkIfNotOutOfBounds(cv::Point2i current, int rows, int cols)
{
    return (current.x >= 0 && current.y >= 0 &&
            current.x < rows && current.y < cols);
}
/// Brief: Finds the shortest possible path from starting position to the goal position
/// Param gridMap: The stage where the tracing of the shortest possible path will be performed.
/// Param start: The starting position in the gridMap. It is assumed that start cell is a free cell.
/// Param goal: The goal position in the gridMap. It is assumed that the goal cell is a free cell.
/// Param path: Returns the sequence of free cells leading to the goal starting from the starting cell.
bool findPathViaDijkstra(const cv::Mat& gridMap, cv::Point2i start, cv::Point2i goal, std::vector<cv::Point2i>& path)
{
    // Clear the path just in case
    path.clear();
    // Create working and visited set.
    std::multimap<double,vertex> working, visited;
    // Initialize working set. We are going to perform Dijkstra's
    // backwards in order to get the actual path without reversing the path.
    working.insert(std::make_pair(0, vertex(goal, goal)));
    // Conditions in continuing
    // 1.) Working is not empty (an empty working set implies all nodes are visited).
    // 2.) The start is still not found in the visited set.
    // The Dijkstra's algorithm
    while(!working.empty() && std::find_if(visited.begin(), visited.end(), CompareID(start)) == visited.end())
    {
        // Get the top of the STL.
        // It is already given that the top of the multimap has the lowest cost.
        std::pair<double, vertex> currentPair = *working.begin();
        cv::Point2i current = currentPair.second.id_;
        // Move the current cell from the working set to the visited set.
        visited.insert(currentPair);
        working.erase(working.begin());
        // Check all arcs
        // Only insert the cells into working under these 3 conditions:
        // 1. The cell is not in visited cell
        // 2. The cell is not out of bounds
        // 3. The cell is free
        for (int x = current.x-1; x <= current.x+1; x++)
        {
            for (int y = current.y-1; y <= current.y+1; y++)
            {
                if (checkIfNotOutOfBounds(cv::Point2i(x, y), gridMap.rows, gridMap.cols) &&
                    get_cell_at(gridMap, x, y) == FREE_CELL &&
                    std::find_if(visited.begin(), visited.end(), CompareID(cv::Point2i(x, y))) == visited.end())
                {
                    vertex newVertex = vertex(cv::Point2i(x,y), current);
                    double cost = currentPair.first + sqrt(2);
                    // Cost is 1
                    if (x == current.x || y == current.y)
                        cost = currentPair.first + 1;
                    std::multimap<double, vertex>::iterator it =
                        std::find_if(working.begin(), working.end(), CompareID(cv::Point2i(x, y)));
                    if (it == working.end())
                        working.insert(std::make_pair(cost, newVertex));
                    else if(cost < (*it).first)
                    {
                        // Replace the more expensive entry for this cell.
                        working.erase(it);
                        working.insert(std::make_pair(cost, newVertex));
                    }
                }
            }
        }
    }
    // Now, recover the path.
    // Path is valid!
    if (std::find_if(visited.begin(), visited.end(), CompareID(start)) != visited.end())
    {
        std::pair <double, vertex> currentPair = *std::find_if(visited.begin(), visited.end(), CompareID(start));
        path.push_back(currentPair.second.id_);
        do
        {
            // Follow the "from" links until the goal is reached.
            currentPair = *std::find_if(visited.begin(), visited.end(), CompareID(currentPair.second.from_));
            path.push_back(currentPair.second.id_);
        } while(currentPair.second.id_.x != goal.x || currentPair.second.id_.y != goal.y);
        return true;
    }
    // Path is invalid!
    return false;
}
int main()
// cv::Mat image = cv::imread("filteredmap1.jpg", CV_LOAD_IMAGE_GRAYSCALE);
cv::Mat image = cv::Mat(100,100,CV_8UC1);
std::vector<cv::Point2i> path;
for (int i = 0; i < image.rows; i++)
for(int j = 0; j < image.cols; j++)
image.data[i*image.cols+j] = FREE_CELL;
if (j == image.cols/2 && (i > 3 && i < image.rows - 3))
image.data[i*image.cols+j] = OCCUPIED_CELL;
// if (image.data[i*image.cols+j] > 215)
// image.data[i*image.cols+j] = FREE_CELL;
// else if(image.data[i*image.cols+j] < 100)
// image.data[i*image.cols+j] = OCCUPIED_CELL;
// else
// image.data[i*image.cols+j] = UNKNOWN_CELL;
// Goal top right
cv::Point2i goal(image.cols-1, 0);
// Start bottom left
cv::Point2i start(0, image.rows-1);
// Time the algorithm.
Timer timer;
findPathViaDijkstra(image, start, goal, path);
std::cerr << "Time elapsed: " << timer.getElapsedTimeInMilliSec() << " ms";
// Add the path in the image for visualization purpose.
cv::cvtColor(image, image, CV_GRAY2BGRA);
int cn = image.channels();
for (int i = 0; i < path.size(); i++)
image.data[path[i].x*cn*image.cols+path[i].y*cn+0] = 0;
image.data[path[i].x*cn*image.cols+path[i].y*cn+1] = 255;
image.data[path[i].x*cn*image.cols+path[i].y*cn+2] = 0;
cv::imshow("Map with path", image);
return 0;
For the algorithm implementation, I decided to have two sets, namely the visited set and the working set, whose elements each contain:
The location of itself in the 2D grid map.
The accumulated cost
Through what cell did it get its accumulated cost (for path recovery)
And here is the result:
The black pixels represent obstacles, the white pixels represent free space and the green line represents the path computed.
In this implementation, I only search within the current working set for the minimum value and do NOT need to scan the whole cost matrix (where, initially, the cost of every cell is set to infinity and the cost of the starting point to 0). I think maintaining a separate working set promises better performance, because cells whose cost is still infinity are never in the working set; only cells that have been touched are.
I also took advantage of the STL that C++ provides. I decided to use std::multimap since it can store duplicate keys (the costs) and keeps its elements sorted automatically. However, I was forced to use std::find_if() to find the id (the row,col of the current cell) in the visited set to check whether the current cell is in it, which has linear complexity. I really think this is the bottleneck of my Dijkstra implementation.
I am well aware that the A* algorithm is much faster than Dijkstra's algorithm, but what I want to ask is: is my implementation of Dijkstra's algorithm optimal? If I implemented A* on top of my current Dijkstra implementation, which I believe is suboptimal, then the A* implementation would consequently also be suboptimal.
What improvement can I perform? What STL is the most appropriate for this algorithm? Particularly, how do I improve the bottleneck?

You're using a std::multimap for 'working' and 'visited'. That's not great.
The first thing you should do is change visited into a per-vertex flag, so that you can do your find_if in constant time instead of linear time and so that operations on the set of visited vertices take constant instead of logarithmic time. You know what all the vertices are and you can map them to small integers trivially, so you can use either a std::vector or a std::bitset.
The second thing you should do is turn working into a priority queue, rather than a balanced binary tree structure, so that operations are a (largish) constant factor faster. std::priority_queue is a barebones binary heap. A higher-radix heap---say quaternary for concreteness---will probably be faster on modern computers due to its reduced depth. Andrew Goldberg suggests some bucket-based data structures; I can dig up references for you if you get to that stage. (They're not too complicated.)
Once you've taken care of these two things, you might look at A* or meet-in-the-middle tricks to speed things up even more.
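To make the two suggestions concrete, here is a minimal sketch rather than a drop-in replacement for the question's code. It assumes the map has already been converted to a hypothetical Grid of booleans (true = free cell); shortestPathCost and every other name in it are made up for illustration. Visited flags and costs live in flat arrays indexed by row * cols + col, and the working set is a std::priority_queue used as a min-heap. Since std::priority_queue cannot decrease a key, an improved cost is pushed as a new entry and stale entries are skipped when they are popped.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical grid type: grid[row][col] is true when the cell is free.
using Grid = std::vector<std::vector<bool>>;

// Returns the cost of the shortest 8-connected path between two free cells,
// or infinity if the goal cannot be reached.
double shortestPathCost(const Grid& grid, int startRow, int startCol,
                        int goalRow, int goalCol)
{
    assert(!grid.empty() && !grid[0].empty());
    const int rows = static_cast<int>(grid.size());
    const int cols = static_cast<int>(grid[0].size());
    const double inf = std::numeric_limits<double>::infinity();

    // Per-vertex arrays indexed by row * cols + col: O(1) lookups.
    std::vector<double> dist(rows * cols, inf);
    std::vector<char> visited(rows * cols, 0);

    // Min-heap of (cost, cell index).
    using Entry = std::pair<double, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> working;

    const int start = startRow * cols + startCol;
    const int goal = goalRow * cols + goalCol;
    dist[start] = 0.0;
    working.push({0.0, start});

    while (!working.empty()) {
        const int current = working.top().second;
        working.pop();
        if (visited[current]) continue;  // stale duplicate entry
        visited[current] = 1;
        if (current == goal) break;

        const int r = current / cols, c = current % cols;
        for (int dr = -1; dr <= 1; ++dr) {
            for (int dc = -1; dc <= 1; ++dc) {
                if (dr == 0 && dc == 0) continue;
                const int nr = r + dr, nc = c + dc;
                if (nr < 0 || nc < 0 || nr >= rows || nc >= cols) continue;
                if (!grid[nr][nc]) continue;
                const int next = nr * cols + nc;
                if (visited[next]) continue;
                const double step = (dr != 0 && dc != 0) ? std::sqrt(2.0) : 1.0;
                if (dist[current] + step < dist[next]) {
                    dist[next] = dist[current] + step;
                    working.push({dist[next], next});
                }
            }
        }
    }
    return dist[goal];
}
```

To recover the path as well, a third array of predecessor indices can be filled in at the same place where dist[next] is updated.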

Your performance is several orders of magnitude worse than it could be because you're using graph search algorithms for what looks like geometry. This geometry is much simpler and less general than the problems that graph search algorithms can solve. Also, with a vertex for every pixel your graph is huge even though it contains basically no information.
I heard you asking "how can I make this better without changing what I'm thinking" but nevertheless I'll tell you a completely different and better approach.
It looks like your robot can only go horizontally, vertically or diagonally. Is that for real or just a side effect of you choosing graph search algorithms? I'll assume the latter and let it go in any direction.
The algorithm goes like this:
(0) Represent your obstacles as polygons by listing the corners. Work in real numbers so you can make them as thin as you like.
(1) Try for a straight line between the end points.
(2) Check if that line goes through an obstacle or not. To do that for any line, show that all corners of any particular obstacle lie on the same side of the line. To do that, translate all points by (-X,-Y) of one end of the line so that that point is at the origin, then rotate until the other point is on the X axis. Now all corners should have the same sign of Y if there's no obstruction. There might be a quicker way just using gradients.
(3) If there's an obstruction, propose N two-segment paths going via the N corners of the obstacle.
(4) Recurse for all segments, culling any paths with segments that go out of bounds. That won't be a problem unless you have obstacles that go out of bounds.
(5) When it stops recursing, you should have a list of locally optimised paths from which you can choose the shortest.
(6) If you really want to restrict bearings to multiples of 45 degrees, then you can do this algorithm first and then replace each segment by any 45-only wiggly version that avoids obstacles. We know that such a version exists because you can stay extremely close to the original line by wiggling very often. We also know that all such wiggly paths have the same length.
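The side test in step (2) does not actually need the translate-and-rotate step: the sign of a 2D cross product gives the same information. The following is only a sketch of that idea; Point, side and allCornersOnOneSide are hypothetical names. It tests against the infinite line through the two end points, so it can report an obstruction for an obstacle that lies beyond the ends of the segment, and it treats a corner lying exactly on the line as an obstruction.

```cpp
#include <cassert>
#include <vector>

struct Point { double x, y; };

// Cross product of (b - a) and (p - a): positive if p is to the left of the
// directed line a->b, negative if to the right, zero if p is on the line.
double side(const Point& a, const Point& b, const Point& p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// True when every corner lies strictly on the same side of the infinite
// line through a and b, i.e. the polygon cannot block that line.
bool allCornersOnOneSide(const Point& a, const Point& b,
                         const std::vector<Point>& corners)
{
    assert(!corners.empty());
    bool anyLeft = false, anyRight = false;
    for (const Point& p : corners) {
        const double s = side(a, b, p);
        if (s >= 0.0) anyLeft = true;   // a corner on the line sets both flags
        if (s <= 0.0) anyRight = true;
    }
    return !(anyLeft && anyRight);
}
```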


Trying to scan a vector of vehicles and extract information for lane switching C++

I am trying to build a simulation which contains certain objects. I have vehicles and lanes. I have an engine which allows vehicles to advance, based on their velocity and acceleration.
bool Lane::allowedOvertake(double pos, double mindist)
for (unsigned int iV = 0; iV < getNVehiclesinLane() - 1; iV++)
if ((fVehicles[iV]->getPosition() > pos - mindist) // If inside rear safety distance.
|| fVehicles[iV]->getPosition() < pos + mindist) // If inside front safety distance.
//else {return false;}
return true;
I would like this for loop to scan over all the vehicles in a lane, so that a vehicle from a neighbouring lane can check whether it can move into this scanned lane. As a note, the pos and mindist parameters are the positions and the minimum distance the lane seeking vehicle needs to safely switch lanes. Also, fVehicles is a vector of vehicles. If the result is true, I then use an if statement in my 'master' object, the road, which allows for an actual switch to take place (using vector.insert()).
I currently get vehicles switching lanes without regard. At first glance, I would suspect it is the above function's logic which is incorrect. Any help in providing a fix, or even a better solution, would be appreciated.
Note: I have a vector of vehicles, and a vector of lanes. However, the vehicles are not ordered in the vector by their positions. I have been advised to re-design this so that the order of the vehicles in the vector is more significant and one can benefit from this when developing the code. However, for now, I would like to fix the design I currently have. Then I will look into redesigning the simulation to make the order more significant. Besides this, my problem above would still exist, just in a slightly different form.
Given an unsorted vector, you have to check if all of them are distant enough from the passed position or, in other words, if none of them is too close:
#include <algorithm>
bool Lane::allowedOvertake(double pos, double mindist)
{
    return std::none_of(
        fVehicles.begin(), fVehicles.end(), [pos, mindist] (auto & v) {
            return v->getPosition() <= pos + mindist
                and v->getPosition() >= pos - mindist;
        });
}
Loop to check apple position against snake body positions

I'm trying to figure out how to write a loop to check the position of a circle against a variable number of rectangles so that the apple is not placed on top of the snake, but I'm having a bit of trouble thinking it through. I tried:
do {
    apple.setPosition(randX()*20+10, randY()*20+10); // apple is a CircleShape
} while (apple.getPosition() == snakeBody[i].getPosition());
Although, in this case, if it detects a collision with one rectangle of the snake's body, it could end up just placing the apple at a previous position of the body. How do I make it check all positions at the same time, so it can't correct itself only to have a chance of repeating the same problem again?
There are three ways (I could think of) of generating a random number meeting a requirement:
The first way, and the simpler, is what you're trying to do: retry if it doesn't.
However, you should change the condition so that it checks all the forbidden cells at once:
bool collides_with_snake(const sf::Vector2f& pos, //not sure if it's 2i or 2f
                         const /*type of snakeBody*/& snakeBody,
                         std::size_t partsNumber) {
    bool noCollision = true;
    // stop as soon as one body part is found at the same position
    for( std::size_t i = 0 ; i < partsNumber && noCollision ; ++i )
        noCollision = pos != snakeBody[i].getPosition();
    return !noCollision;
}
do {
    apple.setPosition(randX()*20+10, randY()*20+10);
} while (collides_with_snake(apple.getPosition(), snakeBody,
                             /* snakeBody.size() ? */));
The second way is to generate fewer numbers and find a function which maps these numbers onto the set you want. For instance, if your grid has N cells, you could generate a number X between 0 and N - [number of parts of your snake], then map X to the smallest number Y such that Y doesn't refer to a cell occupied by a snake part and Y = X + S, where S is the number of cells occupied by a snake part that are referred to by a number smaller than Y.
It's more complicated though.
The third way is to "cheat" and choose a stronger requirement which is easier to enforce. For instance, if you know that the snake's body is N cells long, then only spawn the apple on a cell which is N + 1 cells away from the snake's head (you can do that by generating the angle).
The question is very broad, but assuming that snakeBody is a vector of Rectangles (or of a type derived from Rectangle), and that you have a checkoverlap() function:
do {
    // assuming that randX() and randY() always return different random values
    apple.setPosition(randX()*20+10, randY()*20+10); // set the apple
} while (any_of(snakeBody.begin(), snakeBody.end(), [&](Rectangle &r)->bool { return checkoverlap(r,apple); } ));
This relies on standard algorithm any_of() to check in one simple expression if any of the snake body elements overlaps the apple. If there's an overlap, we just iterate once more and get a new random position until it's fine.
If snakebody is an array and not a standard container, just use snakeBody, snakeBody+snakesize instead of snakeBody.begin(), snakeBody.end() in the code above.
If the overlap check is as simple as comparing the positions, you can replace return checkoverlap(r,apple); in the code above with return r.getPosition()==apple.getPosition();
The "naive" approach would be generating apples and testing their positions against the whole snake until we find a free spot:
bool applePlaced = false;
while(!applePlaced) { //As long as we haven't found a valid place for the apple
    apple.setPosition(randX()*20+10, randY()*20+10);
    applePlaced = true; //We assume, that we can place the apple
    for(std::size_t i=0; i<snakeBody.size(); i++) { //Check the apple position with all snake body parts (assuming snakeBody is a standard container)
        if(apple.getPosition() == snakeBody[i].getPosition()) {
            applePlaced=false; //Our prediction was wrong, we could not place the apple
            break; //No further testing necessary
        }
    }
}
A better way would be to store all free positions in an array and then pick a position out of this array (and delete it from the array), so that no random retrying is necessary. This also requires updating the array whenever the snake moves.
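As a rough illustration of picking from the free positions, here is a sketch that works in grid units rather than pixels. Cell, freeCells and pickAppleCell are hypothetical names, and the snake is assumed to be available as a set of occupied cells. For simplicity it rebuilds the list of free cells on demand instead of updating it on every move, which is a simpler variant of what is described above.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <set>
#include <utility>
#include <vector>

using Cell = std::pair<int, int>;  // (column, row) in grid units

// Collects every grid cell that is not covered by the snake.
std::vector<Cell> freeCells(int cols, int rows, const std::set<Cell>& snake)
{
    std::vector<Cell> cells;
    for (int x = 0; x < cols; ++x)
        for (int y = 0; y < rows; ++y)
            if (snake.count({x, y}) == 0)
                cells.push_back({x, y});
    return cells;
}

// Picks one free cell uniformly at random; the caller must make sure that
// at least one free cell exists.
Cell pickAppleCell(const std::vector<Cell>& cells, std::mt19937& rng)
{
    assert(!cells.empty());
    std::uniform_int_distribution<std::size_t> pick(0, cells.size() - 1);
    return cells[pick(rng)];
}
```

With the question's layout, a picked cell (x, y) would then be placed with apple.setPosition(x*20+10, y*20+10). Every call succeeds on the first try, no matter how much of the grid the snake covers.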

Compute the "lower contour" of a set of segments in the plane in `O(n log n)`

Suppose you've a set s of horizontal line segments in the plane described by a starting point p, an end point q and a y-value.
We can assume that all values of p and q are pairwise distinct and no two segments overlap.
I want to compute the "lower contour" of the set of segments.
We can sort s by p and iterate through each segment j. If i is the "active" segment and j->y < i->y, we "switch to" j (and output the corresponding contour element).
However, what can we do when no such j exists and we find a j with i->q < j->p? Then we would need to switch to the "next higher segment". But how do we know which segment that is? I can't find a way such that the resulting algorithm has a running time of O(n log n). Any ideas?
A sweep line algorithm is an efficient way to solve your problem. As explained previously by Brian, we can sort all the endpoints by the x-coordinate and process them in order. An important distinction to make here is that we are sorting the endpoints of the segment and not the segments in order of increasing starting point.
If you imagine a vertical line sweeping from left to right across your segments, you will notice two things:
At any position, the vertical line either intersects a set of segments or nothing. Let's call this set the active set. The lower contour is the segment within the active set with the smallest y-coordinate.
The only x-coordinates where the lower contour can change are the segment endpoints.
This immediately brings one observation: the lower contour should be a list of segments. A list of points does not provide sufficient information to define the contour, which can be undefined at certain x-coordinates (where there are no segments).
We can model the active set with a std::set ordered by the y position of the segment, processing the endpoints in order of increasing x-coordinate. When encountering a left endpoint, insert the segment. When encountering a right endpoint, erase the segment. We can find the active segment with the lowest y-coordinate with set::begin() in constant time thanks to the ordering. Since each segment is only ever inserted once and erased once, maintaining the active set takes O(n log n) time in total.
In fact, it is possible to maintain a std::multiset of only the y-coordinates for each segment that intersects the sweep line, if it is easier.
The assumption that the segments are non-overlapping and have distinct endpoints is not entirely necessary. Overlapping segments are handled both by the ordered set of segments and the multiset of y-coordinates. Coinciding endpoints can be handled by considering all endpoints with the same x-coordinate at one go.
Here, I assume that there are no zero-length segments (i.e. points) to simplify things, although they can also be handled with some additional logic.
std::list<segment> lower_contour(std::list<segment> segments)
{
    enum event_type { OPEN, CLOSE };
    struct event {
        event_type type;
        const segment &s;
        inline int position() const {
            return type == OPEN ? s.sp : s.ep;
        }
    };
    struct order_by_position {
        bool operator()(const event& first, const event& second) const {
            return first.position() < second.position();
        }
    };
    std::list<event> events;
    for (auto s = segments.cbegin(); s != segments.cend(); ++s)
    {
        events.push_back( event { OPEN, *s } );
        events.push_back( event { CLOSE, *s } );
    }
    // process the endpoints from left to right
    events.sort(order_by_position());
    // maintain a (multi)set of the y-positions for each segment that intersects the sweep line
    // insertion and erasure take O(log N) time and the lowest segment is always at begin()
    // the multiset also allows overlapping segments to be handled correctly
    std::multiset<int> active_segments;
    bool contour_is_active = false;
    int contour_y = 0;
    int contour_sp = 0;
    // the resulting lower contour
    std::list<segment> contour;
    for (auto i = events.cbegin(); i != events.cend();)
    {
        // handle all events at the same position in one go
        auto j = i;
        int current_position = i->position();
        while (j != events.cend() && j->position() == current_position)
        {
            switch (j->type)
            {
                case OPEN: active_segments.insert(j->s.y); break;
                // erase a single instance only; erase(key) would remove
                // every active segment at this height
                case CLOSE: active_segments.erase(active_segments.find(j->s.y)); break;
            }
            ++j;
        }
        i = j;
        if (contour_is_active)
        {
            if (active_segments.empty())
            {
                // the active segment ends here
                contour_is_active = false;
                contour.push_back( segment { contour_sp, current_position, contour_y } );
            }
            else
            {
                // if the current lowest position is different from the previous one,
                // the old active segment ends here and a new active segment begins
                int current_y = *active_segments.cbegin();
                if (current_y != contour_y)
                {
                    contour.push_back( segment { contour_sp, current_position, contour_y } );
                    contour_y = current_y;
                    contour_sp = current_position;
                }
            }
        }
        else
        {
            if (!active_segments.empty())
            {
                // a new contour segment begins here
                int current_y = *active_segments.cbegin();
                contour_is_active = true;
                contour_y = current_y;
                contour_sp = current_position;
            }
        }
    }
    return contour;
}
As Brian also mentioned, a binary heap like std::priority_queue can also be used to maintain the active set and tends to outperform std::set, even if it does not allow arbitrary elements to be deleted. You can work around this by flagging a segment as removed instead of erasing it. Then, repeatedly remove the top() of the priority_queue if it is a flagged segment. This might end up being faster, but it may or may not matter for your use case.
First sort all the endpoints by x-coordinate (both starting and ending points). Iterate through the endpoints and keep a std::set of all the y-coordinates of active segments. When you reach a starting point, add its y-coordinate to the set and "switch" to it if it's the lowest; when you reach an ending point, remove its y-coordinate from the set and recalculate the lowest y-coordinate using the set. This gives an O(n log n) solution overall.
A balanced binary search tree such as that used to implement std::set generally has a large constant factor. You can speed up this approach by using a binary heap (std::priority_queue) instead of a set, with the lowest y-coordinate at the root. In this case, you can't remove a non-root node, but when you reach such an ending point, just mark the segment inactive in an array. When the root node is popped, continue popping until there is a new root node that hasn't been marked inactive already. I think this will be about twice as fast as the set-based approach, but you'll have to code it yourself and see, if that's a concern.
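For reference, the heap-with-lazy-deletion variant could look roughly like the sketch below. It makes no claim about how it compares in speed to the set-based version. The segment layout (sp, ep, y as ints) and the name lower_contour_heap are assumptions made for the example. A closed segment is only flagged, and flagged entries are popped once they reach the top of the heap, so each segment is pushed and popped at most once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct segment { int sp, ep, y; };  // start x, end x, height

std::vector<segment> lower_contour_heap(const std::vector<segment>& segments)
{
    // One OPEN and one CLOSE event per segment, sorted by x.
    struct event { int x; std::size_t id; bool open; };
    std::vector<event> events;
    for (std::size_t i = 0; i < segments.size(); ++i) {
        assert(segments[i].sp < segments[i].ep);  // no zero-length segments
        events.push_back({segments[i].sp, i, true});
        events.push_back({segments[i].ep, i, false});
    }
    std::sort(events.begin(), events.end(),
              [](const event& a, const event& b) { return a.x < b.x; });

    // Min-heap of (y, segment index); closed segments are only flagged.
    using entry = std::pair<int, std::size_t>;
    std::priority_queue<entry, std::vector<entry>, std::greater<entry>> heap;
    std::vector<bool> closed(segments.size(), false);

    std::vector<segment> contour;
    bool active = false;
    int contour_y = 0, contour_sp = 0;

    for (std::size_t i = 0; i < events.size();) {
        const int x = events[i].x;
        // Apply every event at this x before looking at the heap.
        for (; i < events.size() && events[i].x == x; ++i) {
            if (events[i].open)
                heap.push({segments[events[i].id].y, events[i].id});
            else
                closed[events[i].id] = true;
        }
        // Lazy deletion: discard flagged segments sitting at the top.
        while (!heap.empty() && closed[heap.top().second]) heap.pop();

        const bool now_active = !heap.empty();
        const int now_y = now_active ? heap.top().first : 0;
        if (active && (!now_active || now_y != contour_y)) {
            contour.push_back({contour_sp, x, contour_y});  // piece ends at x
            active = false;
        }
        if (!active && now_active) {
            active = true;  // a new piece begins at x
            contour_y = now_y;
            contour_sp = x;
        }
    }
    return contour;
}
```

For the input {0,10,5}, {2,4,3}, {12,15,7} this yields four contour pieces: {0,2,5}, {2,4,3}, {4,10,5} and {12,15,7}.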

How do I most efficiently perform collision detection on a group of spheres

Suppose I have a CPU with several cores, on which I want to find which spheres are touching. Any set of spheres where each sphere is connected (ie. they're all touching at least one of the spheres in the set) is called a "group" and is to be organized into a vector called, in the example below, "group_members". To achieve this I am currently using a rather expensive operation that looks conceptually like this:
vector<Sphere*> unallocated_spheres = all_spheres; // start with a copy of all spheres
vector<vector<Sphere*>> group_sequence; // groups will be collected here
while (unallocated_spheres.size() > 0U) // each iteration of this will represent the creation of a new group
{
    std::vector<Sphere*> group_members; // this will store all members of the current group
    group_members.push_back(unallocated_spheres.back()); // start with the last sphere (pop_back requires less resources than erase)
    unallocated_spheres.pop_back(); // it has been allocated to a group so remove it from the unallocated list
    // compare each sphere in the new group to every other sphere, and continue to do so until no more spheres are added to the current group
    for (size_t i = 0U; i != group_members.size(); ++i) // iterators would be unsuitable in this case
    {
        Sphere const * const sphere = group_members[i]; // the sphere to which all others will be compared to check if they should be added to the group
        auto it = unallocated_spheres.begin();
        while (it != unallocated_spheres.end())
        {
            // check if the iterator sphere belongs to the same group
            if ((*it)->IsTouching(sphere))
            {
                // it does belong to the same group; add it and remove it from the unallocated_spheres vector and repair iterators
                group_members.push_back(*it);
                it = unallocated_spheres.erase(it); // repair the iterator
            }
            else ++it; // if no others were found, increment iterator manually
        }
    }
    group_sequence.push_back(group_members); // the group is complete
}
Does anyone have any suggestions for improving the efficiency of this code in terms of wall time? My program spends a significant fraction of the time running through these loops, and any advice on how to structurally change it to make it more efficient would be appreciated.
Note that as these are spheres, "IsTouching()" is a very quick floating point operation (comparing position and radii of the two spheres). It looks like this (note that x,y and z are the position of the sphere in that euclidean dimension):
// returns whether this cell is touching the input cell (or if they are the same cell; both return true)
bool const Sphere::IsTouching(Sphere const * const that) const
{
    // Apply pythagoras' theorem in 3 dimensions
    double const dx = this->x - that->x;
    double const dy = this->y - that->y;
    double const dz = this->z - that->z;
    // get the sum of the radii of the two cells
    double const rad_sum = this->radius + that->radius;
    // to avoid taking the square root to get actual distances, we instead compare
    // the square of the pythagorean distance with the square of the radii sum
    return dx*dx + dy*dy + dz*dz < rad_sum*rad_sum;
}
Does anyone have any suggestions for improving the efficiency of this code in terms of wall time?
Change the algorithm. Low-level optimization won't help you (although you'll get a very small speedup if you move group_members outside of the while loop).
You need to use space partitioning (BSP tree, octree) or a sweep and prune algorithm.
Sweep and prune (Wikipedia has links to the original article, plus you can google it) can easily handle 100000 moving and potentially colliding spheres on a single-core machine (well, as long as you don't put them all at the same coordinates) and is a bit easier to implement than space partitioning. If you know the maximum possible size of a colliding object, sweep and prune will be more suitable/simpler to implement.
If you're going to use the sweep and prune algorithm, you should learn the insertion sort algorithm. This sorting algorithm is faster than pretty much any other algorithm when you work on "almost" sorted data, which is the case with sweep and prune. Of course, you'll also need some implementation of quicksort or heapsort, but the standard library provides that.
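A single-axis sweep and prune pass for this grouping problem might be sketched as follows. This is an illustration under assumptions, not a tuned implementation: the Sphere struct is a simplified stand-in, groupSpheres and findRoot are made-up names, the groups are merged with a small union-find structure (a swapped-in technique that the answer does not mention), and std::sort is used because the sketch does one static pass. Insertion sort becomes the better choice when the same, nearly sorted list is re-sorted every frame.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct Sphere { double x, y, z, radius; };

bool isTouching(const Sphere& a, const Sphere& b)
{
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    const double radSum = a.radius + b.radius;
    return dx * dx + dy * dy + dz * dz < radSum * radSum;
}

// Union-find lookup with path halving.
std::size_t findRoot(std::vector<std::size_t>& parent, std::size_t i)
{
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

// Returns one label per sphere; spheres in the same connected group of
// touching spheres share a label.
std::vector<std::size_t> groupSpheres(const std::vector<Sphere>& spheres)
{
    const std::size_t n = spheres.size();
    std::vector<std::size_t> order(n), parent(n);
    std::iota(order.begin(), order.end(), 0);
    std::iota(parent.begin(), parent.end(), 0);

    // Sweep axis: sort by the left edge of each sphere's x-interval.
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return spheres[a].x - spheres[a].radius < spheres[b].x - spheres[b].radius;
    });

    for (std::size_t i = 0; i < n; ++i) {
        const Sphere& a = spheres[order[i]];
        assert(a.radius >= 0.0);
        const double rightEdge = a.x + a.radius;
        for (std::size_t j = i + 1; j < n; ++j) {
            const Sphere& b = spheres[order[j]];
            // Prune: later spheres start even further right, so stop here.
            if (b.x - b.radius > rightEdge) break;
            if (isTouching(a, b)) {
                const std::size_t ra = findRoot(parent, order[i]);
                const std::size_t rb = findRoot(parent, order[j]);
                parent[ra] = rb;
            }
        }
    }
    std::vector<std::size_t> label(n);
    for (std::size_t i = 0; i < n; ++i) label[i] = findRoot(parent, i);
    return label;
}
```

The inner loop only visits spheres whose x-intervals overlap, so the cost is close to that of the sort when the spheres are spread out. It degrades back to quadratic when most intervals overlap on the chosen axis.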

KD tree, slow tree construction

I am trying to build KD Tree (static case). We assume points are sorted on both x and y coordinates.
For even depth of recursion the set is split into two subsets with a vertical line going through median x coordinate.
For odd depth of recursion the set is split into two subsets with a horizontal line going through median y coordinate.
The median can be determined from sorted set according to x / y coordinate. This step I am doing before each splitting of the set. And I think that it causes the slow construction of the tree.
Could you please help me check and optimize the code?
I can not find the k-th nearest neighbor, could somebody help me with the code?
Thank you very much for your help and patience...
Please see the sample code:
class KDNode
Point2D *data;
KDNode *left;
KDNode *right;
void KDTree::createKDTree(Points2DList *pl)
//Create list
KDList kd_list;
//Create KD list (all input points)
for (unsigned int i = 0; i < pl->size(); i++)
//Sort points by y (the first split, at depth 1, is by y coordinate)
std::sort(kd_list.begin(), kd_list.end(), sortPoints2DByY());
//Build KD Tree
root = buildKDTree(&kd_list, 1);
KDNode * KDTree::buildKDTree(KDList *kd_list, const unsigned int depth)
//Build KD tree
const unsigned int n = kd_list->size();
//No leaf will be built
if (n == 0)
return NULL;
//Only one point: create leaf of KD Tree
else if (n == 1)
//Create one leaft
return new KDNode(new Point2D ((*kd_list)[0]));
//At least 2 points: create one leaf, split tree into left and right subtree
//New KD node
KDNode *node = NULL;
//Get median index
const unsigned int median_index = n/2;
//Create new KD Lists
KDList kd_list1, kd_list2;
//The depth is even, process by x coordinate
if (depth%2 == 0)
//Create new median node
node = new KDNode(new Point2D( (*kd_list)[median_index]));
//Split list
for (unsigned int i = 0; i < n; i++)
//Geta actual point
Point2D *p = &(*kd_list)[i];
//Add point to the first list: x < median.x
if (p->getX() < (*kd_list)[median_index].getX())
//Add point to the second list: x > median.x
else if (p->getX() > (*kd_list)[median_index].getX())
//Sort points by y for the next recursion step: slow construction of the tree???
std::sort(kd_list1.begin(), kd_list1.end(), sortPoints2DByY());
std::sort(kd_list2.begin(), kd_list2.end(), sortPoints2DByY());
//The depth is odd, process by y coordinates
//Create new median node
node = new KDNode(new Point2D((*kd_list)[median_index]));
//Split list
for (unsigned int i = 0; i < n; i++)
//Geta actual point
Point2D *p = &(*kd_list)[i];
//Add point to the first list: y < median.y
if (p->getY() < (*kd_list)[median_index].getY())
//Add point to the second list: y > median.y
else if (p->getY() >(*kd_list)[median_index].getY())
//Sort points by x for the next recursion step: slow construction of the tree???
std::sort(kd_list1.begin(), kd_list1.end(), sortPoints2DByX());
std::sort(kd_list2.begin(), kd_list2.end(), sortPoints2DByX());
//Build left subtree
node->setLeft( buildKDTree(&kd_list1, depth +1 ) );
//Build right subtree
node->setRight( buildKDTree(&kd_list2, depth + 1 ) );
//Return new node
return node;
The sorting to find the median is probably the worst culprit here, since that is O(nlogn) while the problem is solvable in O(n) time. You should use nth_element instead: http://www.cplusplus.com/reference/algorithm/nth_element/. That'll find the median in linear time on average, after which you can split the vector in linear time.
Memory management in vector is also something that can take a lot of time, especially with large vectors, since every time the vector's size is doubled all the elements have to be moved. You can use the reserve method of vector to reserve exactly enough space for the vectors in the newly created nodes, so they need not increase dynamically as new stuff is added with push_back.
And if you absolutely need the best performance, you should use lower level code, doing away with vector and reserving plain arrays instead. Nth element or 'selection' algorithms are readily available and not too hard to write yourself: http://en.wikipedia.org/wiki/Selection_algorithm
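A minimal sketch of the nth_element idea, assuming simplified stand-ins for the question's types (a plain Point2D struct, a KDNode that owns its children through std::unique_ptr, and a hypothetical build function): each call partitions a sub-range of one shared vector in place around its median, so no per-level lists are created and nothing is sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Point2D { double x, y; };

struct KDNode {
    Point2D point;
    std::unique_ptr<KDNode> left, right;
};

// Builds a subtree from points[first, last), splitting on x at even depth
// and on y at odd depth. The vector is reordered in place.
std::unique_ptr<KDNode> build(std::vector<Point2D>& points,
                              std::size_t first, std::size_t last,
                              unsigned depth)
{
    assert(first <= last && last <= points.size());
    if (first == last) return nullptr;

    const std::size_t mid = first + (last - first) / 2;
    const bool byX = (depth % 2 == 0);
    // Linear time on average: afterwards points[mid] is the median, with
    // smaller-or-equal elements before it and larger-or-equal after it.
    std::nth_element(points.begin() + first, points.begin() + mid,
                     points.begin() + last,
                     [byX](const Point2D& a, const Point2D& b) {
                         return byX ? a.x < b.x : a.y < b.y;
                     });

    std::unique_ptr<KDNode> node(new KDNode());
    node->point = points[mid];
    node->left = build(points, first, mid, depth + 1);
    node->right = build(points, mid + 1, last, depth + 1);
    return node;
}
```

A call such as build(points, 0, points.size(), 0) builds the whole tree. Unlike the loop in the question, points that share the median's coordinate are kept: they end up in one of the two sub-ranges instead of being dropped.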
Some hints on optimizing the kd-tree:
Use a linear time median finding algorithm, such as QuickSelect.
Avoid actually using "node" objects. You can store whole tree using the points only, with ZERO additional information. Essentially by just sorting an array of objects. The root node will then be in the middle. A rearrangement that puts the root first, then uses a heap layout will likely be nicer to the CPU memory cache on query time, but more tricky to build.
Not really an answer to your questions, but I would highly recommend the forum at http://ompf.org/forum/
They have some great discussions over there for fast kd-tree constructions in various contexts. Perhaps you'll find some inspiration over there.
The OMPF forums have since gone down, although a direct replacement is currently available at http://ompf2.com/
Your first culprit is sorting to find the median. This is almost always the bottleneck for K-d tree construction, and using more efficient algorithms here will really pay off.
However, you're also constructing a pair of variable-sized vectors each time you split and transferring elements to them.
Here I recommend the good ol' singly-linked list. The beauty of the linked list is that you can transfer elements from parent to child by simply changing next pointers to point at the child's root pointer instead of the parent's.
That means no heap overhead whatsoever during construction to transfer elements from parent nodes to child nodes, only to aggregate the initial list of elements to insert to the root. That should do wonders as well, but if you want even faster, you can use a fixed allocator to efficiently allocate nodes for the linked list (as well as for the tree) and with better contiguity/cache hits.
Last but not least, if you're involved in intensive computing tasks that call for K-d trees, you need a profiler. Measure your code and you'll see exactly where the culprit lies, with exact time distributions.