How to calculate the miss links in a BVH tree? - c++

I am creating an OpenGL-based ray tracer for polygon models. To accelerate the application I am using BVH trees. Because GLSL does not support recursion, I decided to find another way to traverse the bounding boxes, which are sent to the fragment shader as shader storage buffers.
I would like to implement the approach described here: Traversal of BVH tree in shaders
Actually, I don't really understand how to calculate the hit and miss links during the construction of the tree. Hit and miss links tell the program which node (bounding box) to visit next during the traversal, depending on whether the current box is hit or missed.
So far I have written the method that constructs the tree, and I can also store the tree in a simple array: I flatten it depth-first.
Here are the depth-first tree-flattening methods:
// Copies one BVH node into its flat representation (passed by const reference to avoid copying node.indices).
FlatBvhNode nodeConverter2(const BvhNode& node, int ind) {
    return FlatBvhNode(node.bBox.min, node.bBox.max, ind, node.isLeaf, node.indices);
}
// Pre-order (depth-first) flattening: every node is stored before its children.
void flattenRecursion(const BvhNode& bvhNode, vector<FlatBvhNode>& nodes, int& ind) {
    ++ind;
    nodes.push_back(nodeConverter2(bvhNode, ind));
    if (!bvhNode.isLeaf) {
        flattenRecursion(*bvhNode.children.at(0), nodes, ind);
        flattenRecursion(*bvhNode.children.at(1), nodes, ind);
    }
}
vector<FlatBvhNode>* flatten(const BvhNode& root) {
    vector<FlatBvhNode>* nodesArray = new vector<FlatBvhNode>;
    nodesArray->reserve(root.countNodes());
    int ind = 0;
    flattenRecursion(root, *nodesArray, ind);
    return nodesArray;
}
I have to calculate the following "links":
The image is from: source. It shows the different kinds of links. For example, if the ray intersects a bounding box (hit link), we can move to the next node in the array. This works because I use a depth-first layout. The problem comes when I have to move to the sibling, or even to the parent's sibling. How can I implement these links/offsets? I know I should store indices, but how do I compute them with a depth-first tree construction?
Any help is appreciated.

I do not have an answer for a depth-first tree, but I have figured out a way to do it if your tree is stored as a heap (the children of node i are at 2i+1 and 2i+2). Here is the GLSL code I used:
int left(in int index) { // left child
    return 2 * index + 1;
}
int right(in int index) { // right child
    return 2 * index + 2;
}
int parent(in int index) {
    return (index - 1) / 2;
}
int right_sibling(in int index) { // a leaf hit or a miss link
    int result = index;
    while(result % 2 == 0 && result != 0) {
        result = parent(result);
    }
    return result + 1 * int(result != 0);
}
I am using this and it works at a pretty reasonable speed. The only problem is that loop, which slows down performance. I would really like to have a constant-time expression in that function.


Traversal of Bounding Volume Hierarchy in Shaders

I am working on a path tracer using Vulkan compute shaders. I implemented a tree representing a bounding volume hierarchy. The idea of the BVH is to minimize the number of objects a ray intersection test needs to be performed on.
#1 Naive Implementation
My first implementation is very fast: it traverses the tree down to a single leaf of the BVH. However, the ray might intersect multiple leaves, so this code leaves some triangles unrendered even though they should be rendered.
int box_index = -1;
for (int i = 0; i < boxes_count; i++) {
    // the first box has no parent, boxes[0].parent is set to -1
    if (boxes[i].parent == box_index) {
        if (intersect_box(boxes[i], ray)) {
            box_index = i;
        }
    }
}
if (box_index > -1) {
    uint a = boxes[box_index].ids_offset;
    uint b = a + boxes[box_index].ids_count;
    for (uint j = a; j < b; j++) {
        uint triangle_id = triangle_references[j];
        // triangle intersection code ...
    }
}
#2 Multi-Leaf Implementation
My second implementation accounts for the fact that multiple leaves might be intersected. However, this implementation is 36x slower than implementation #1 (okay, I miss some intersection tests in #1, but still...).
bool[boxes.length()] hits;
hits[0] = intersect_box(boxes[0], ray);
for (int i = 1; i < boxes_count; i++) {
    if (hits[boxes[i].parent]) {
        hits[i] = intersect_box(boxes[i], ray);
    } else {
        hits[i] = false;
    }
}
for (int i = 0; i < boxes_count; i++) {
    if (!hits[i]) {
        continue;
    }
    // only leaves have ids_offset and ids_count defined (not set to -1)
    if (boxes[i].ids_offset < 0) {
        continue;
    }
    uint a = boxes[i].ids_offset;
    uint b = a + boxes[i].ids_count;
    for (uint j = a; j < b; j++) {
        uint triangle_id = triangle_references[j];
        // triangle intersection code ...
    }
}
This performance difference drives me crazy. It seems that a single statement like if(dynamically_modified_array[some_index]) has a huge impact on performance. I suspect that the SPIR-V or GPU compiler is no longer able to do its optimization magic. So here are my questions:
Is this indeed an optimization problem?
If yes, can I transform implementation #2 to be better optimizable?
Can I somehow give optimization hints?
Is there a standard way to implement BVH tree queries in shaders?
After some digging, I found a solution. It is important to understand that a BVH tree does not rule out the possibility that all leaves need to be evaluated.
Implementation #3 below uses hit and miss links. The boxes need to be sorted so that, in the worst case, all of them are queried in the correct order (so a single loop is enough). However, links are used to skip nodes that don't need to be evaluated. When the current node is a leaf node, the actual triangle intersections are performed.
hit link ~ which node to jump to in case of a hit (green below)
miss link ~ which node to jump to in case of a miss (red below)
Image taken from here. The associated paper and source code is also on Prof. Toshiya Hachisuka's page. The same concept is also described in this paper referenced in the slides.
#3 BVH Tree with Hit and Miss Links
I had to extend the data pushed to the shader with the links. Also, some offline fiddling was required to store the tree correctly. At first I tried using a while loop (looping until box_index_next is -1), which resulted in a crazy slowdown again. Anyway, the following works reasonably fast:
int box_index_next = 0;
for (int box_index = 0; box_index < boxes_count; box_index++) {
    if (box_index != box_index_next) {
        continue;
    }
    bool hit = intersect_box(boxes[box_index], ray);
    bool leaf = boxes[box_index].ids_count > 0;
    if (hit) {
        box_index_next = boxes[box_index].links.x; // hit link
    } else {
        box_index_next = boxes[box_index].links.y; // miss link
    }
    if (hit && leaf) {
        uint a = boxes[box_index].ids_offset;
        uint b = a + boxes[box_index].ids_count;
        for (uint j = a; j < b; j++) {
            uint triangle_id = triangle_references[j];
            // triangle intersection code ...
        }
    }
}
This code is about 3x slower than the fast but flawed implementation #1. This is somewhat expected: the speed now depends on the actual tree, not on GPU optimization. Consider, for example, a degenerate case where triangles are aligned along an axis: a ray in the same direction might intersect all triangles, and then all tree leaves need to be evaluated.
Prof. Toshiya Hachisuka proposes a further optimization for such cases in his slides (page 36 onward): store multiple versions of the BVH tree, spatially sorted along x, -x, y, -y, z and -z. For traversal, the correct version is selected based on the ray. Then one can stop the traversal as soon as a triangle from a leaf is intersected, since all remaining nodes to be visited will be spatially behind this node (from the ray's point of view).
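The selection rule is not spelled out here, so as an assumption: a natural way to pick among the six copies is the dominant axis of the ray direction together with its sign. A minimal sketch, where the numbering 0..5 of the copies and the function name are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical numbering of the six pre-sorted copies of the BVH:
// 0: +x, 1: -x, 2: +y, 3: -y, 4: +z, 5: -z
int selectBvhOrdering(float dx, float dy, float dz) {
    const float ax = std::fabs(dx), ay = std::fabs(dy), az = std::fabs(dz);
    if (ax >= ay && ax >= az) return dx >= 0.0f ? 0 : 1;  // x dominates
    if (ay >= az) return dy >= 0.0f ? 2 : 3;              // y dominates
    return dz >= 0.0f ? 4 : 5;                            // z dominates
}
```

In the shader, the returned index would choose which node array (or which offset into one buffer) the loop from implementation #3 walks.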
Once the BVH tree is built, finding the links is quite straightforward (some python code below):
class NodeAABB(object):
    def __init__(self, obj_bounds, obj_ids):
        self.children = [None, None]
        self.obj_bounds = obj_bounds
        self.obj_ids = obj_ids
    def split(self):
        # split recursively and create children here
        raise NotImplementedError()
    def is_leaf(self):
        return set(self.children) == {None}
    def build_links(self, next_right_node=None):
        if not self.is_leaf():
            child1, child2 = self.children
            self.hit_node = child1
            self.miss_node = next_right_node
            child1.build_links(next_right_node=child2)
            child2.build_links(next_right_node=next_right_node)
        else:
            self.hit_node = next_right_node
            self.miss_node = self.hit_node
    def collect(self):
        # retrieve in depth first fashion for correct order
        yield self
        if not self.is_leaf():
            child1, child2 = self.children
            yield from child1.collect()
            yield from child2.collect()
After you store all AABBs in an array (which will be sent to the GPU) you can use hit_node and miss_node to look up the indices for the links and store them as well.
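To connect this with a depth-first flattening like the one in the first question: when nodes are stored in pre-order, the subtree of the node at index i occupies the contiguous slots right after it. That yields the same links as build_links above without looking up node references: an inner node's hit link is i + 1 (its first child), its miss link is the first slot after its subtree, and for a leaf both links are that slot. The C++ sketch below fills in the links after recursing and then emulates the shader loop on the CPU; the Node and FlatNode types are hypothetical and omit bounding boxes, and -1 marks the end of traversal.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical pointer-based BVH node (bounding box omitted for brevity).
struct Node {
    int id = 0;
    std::unique_ptr<Node> left, right;  // both null for a leaf
    bool isLeaf() const { return !left; }
};

std::unique_ptr<Node> leaf(int id) {
    auto n = std::make_unique<Node>();
    n->id = id;
    return n;
}

std::unique_ptr<Node> inner(int id, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    auto n = leaf(id);
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Flat node as it could be uploaded to a shader storage buffer.
struct FlatNode {
    int id;
    int hit;   // next index if the box is hit
    int miss;  // next index if the box is missed
};

// Pre-order flattening: everything pushed while recursing belongs to this
// subtree, so out.size() afterwards is the first slot after the subtree.
void flattenRec(const Node& n, std::vector<FlatNode>& out) {
    const int index = static_cast<int>(out.size());
    out.push_back({n.id, -1, -1});
    if (!n.isLeaf()) {
        flattenRec(*n.left, out);
        flattenRec(*n.right, out);
    }
    const int afterSubtree = static_cast<int>(out.size());
    out[index].hit = n.isLeaf() ? afterSubtree : index + 1;
    out[index].miss = afterSubtree;
}

std::vector<FlatNode> flattenWithLinks(const Node& root) {
    std::vector<FlatNode> out;
    flattenRec(root, out);
    const int end = static_cast<int>(out.size());
    for (FlatNode& f : out) {  // "one past the last node" becomes -1 (stop)
        if (f.hit == end) f.hit = -1;
        if (f.miss == end) f.miss = -1;
    }
    return out;
}

// CPU emulation of the stackless shader loop; returns the ids in visit order.
template <class HitTest>
std::vector<int> traverse(const std::vector<FlatNode>& nodes, HitTest hits) {
    std::vector<int> visited;
    int i = nodes.empty() ? -1 : 0;
    while (i != -1) {
        visited.push_back(nodes[i].id);
        i = hits(nodes[i].id) ? nodes[i].hit : nodes[i].miss;
    }
    return visited;
}
```

For the tree 0(1(2, 3), 4(5, 6)) with box 1 missed and every other box hit, the traversal visits 0, 1, 4, 5, 6: the miss on 1 skips its whole subtree.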

Optimizing the Dijkstra's algorithm

I need a graph-search algorithm that is good enough for our robot-navigation application, and I chose Dijkstra's algorithm.
We are given a grid map that contains free, occupied and unknown cells; the robot may only pass through the free cells. The user inputs the starting position and the goal position. In return, I retrieve the sequence of free cells that leads the robot from the starting position to the goal position, which corresponds to the path.
Since running Dijkstra's algorithm from start to goal and following the parent links would give a reversed path (from goal to start), I decided to run the algorithm backwards, from the goal, so that I retrieve the path from start to goal directly.
Starting from the goal cell, each cell has 8 neighbors: horizontal and vertical moves cost 1 and diagonal moves cost sqrt(2), provided the neighbor is reachable (i.e. not out of bounds and a free cell).
These are the rules for updating the neighboring cells: the current cell may treat a neighbor as reachable (at distance 1 or sqrt(2)) only under the following conditions:
The neighboring cell is not out of bounds.
The neighboring cell is unvisited.
The neighboring cell is a free cell, which can be checked via the 2-D grid map.
Here is my implementation:
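#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <map>
#include <vector>
#include "Timer.h"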
/// CONSTANTS
static const int UNKNOWN_CELL = 197;
static const int FREE_CELL = 255;
static const int OCCUPIED_CELL = 0;
/// STRUCTURES for easier management.
struct vertex {
cv::Point2i id_;
cv::Point2i from_;
vertex(cv::Point2i id, cv::Point2i from)
{
id_ = id;
from_ = from;
}
};
/// To be used for finding an element in std::multimap STL.
struct CompareID
{
CompareID(cv::Point2i val) : val_(val) {}
bool operator()(const std::pair<double, vertex> & elem) const {
return val_ == elem.second.id_;
}
private:
cv::Point2i val_;
};
/// Some helper functions for dijkstra's algorithm.
uint8_t get_cell_at(const cv::Mat & image, int x, int y)
{
assert(x < image.rows);
assert(y < image.cols);
return image.data[x * image.cols + y];
}
/// Returns true if the cell lies inside the map. Like get_cell_at(), it treats x as the row and y as the column.
bool checkIfNotOutOfBounds(cv::Point2i current, int rows, int cols)
{
return (current.x >= 0 && current.y >= 0 &&
current.x < rows && current.y < cols);
}
/// Brief: Finds the shortest possible path from starting position to the goal position
/// Param gridMap: The stage where the tracing of the shortest possible path will be performed.
/// Param start: The starting position in the gridMap. It is assumed that start cell is a free cell.
/// Param goal: The goal position in the gridMap. It is assumed that the goal cell is a free cell.
/// Param path: Returns the sequence of free cells leading to the goal starting from the starting cell.
bool findPathViaDijkstra(const cv::Mat& gridMap, cv::Point2i start, cv::Point2i goal, std::vector<cv::Point2i>& path)
{
// Clear the path just in case
path.clear();
// Create working and visited set.
std::multimap<double,vertex> working, visited;
// Initialize working set. We are going to perform Dijkstra's algorithm
// backwards in order to get the actual path without reversing the path.
working.insert(std::make_pair(0, vertex(goal, goal)));
// Continue while:
// 1.) working is not empty (an empty working set implies all reachable nodes are visited), and
// 2.) the start has not been added to the visited set yet.
// Dijkstra's algorithm
while(!working.empty() && std::find_if(visited.begin(), visited.end(), CompareID(start)) == visited.end())
{
// Get the top of the STL.
// It is already given that the top of the multimap has the lowest cost.
std::pair<double, vertex> currentPair = *working.begin();
cv::Point2i current = currentPair.second.id_;
visited.insert(currentPair);
working.erase(working.begin());
// Check all arcs
// Only insert the cells into working under these 3 conditions:
// 1. The cell is not in visited cell
// 2. The cell is not out of bounds
// 3. The cell is free
for (int x = current.x-1; x <= current.x+1; x++)
for (int y = current.y-1; y <= current.y+1; y++)
{
if (checkIfNotOutOfBounds(cv::Point2i(x, y), gridMap.rows, gridMap.cols) &&
get_cell_at(gridMap, x, y) == FREE_CELL &&
std::find_if(visited.begin(), visited.end(), CompareID(cv::Point2i(x, y))) == visited.end())
{
vertex newVertex = vertex(cv::Point2i(x,y), current);
double cost = currentPair.first + sqrt(2);
// Cost is 1
if (x == current.x || y == current.y)
cost = currentPair.first + 1;
std::multimap<double, vertex>::iterator it =
std::find_if(working.begin(), working.end(), CompareID(cv::Point2i(x, y)));
if (it == working.end())
working.insert(std::make_pair(cost, newVertex));
else if(cost < (*it).first)
{
working.erase(it);
working.insert(std::make_pair(cost, newVertex));
}
}
}
}
// Now, recover the path.
// Path is valid!
if (std::find_if(visited.begin(), visited.end(), CompareID(start)) != visited.end())
{
std::pair <double, vertex> currentPair = *std::find_if(visited.begin(), visited.end(), CompareID(start));
path.push_back(currentPair.second.id_);
do
{
currentPair = *std::find_if(visited.begin(), visited.end(), CompareID(currentPair.second.from_));
path.push_back(currentPair.second.id_);
} while(currentPair.second.id_.x != goal.x || currentPair.second.id_.y != goal.y);
return true;
}
// Path is invalid!
else
return false;
}
int main()
{
// cv::Mat image = cv::imread("filteredmap1.jpg", CV_LOAD_IMAGE_GRAYSCALE);
cv::Mat image = cv::Mat(100,100,CV_8UC1);
std::vector<cv::Point2i> path;
for (int i = 0; i < image.rows; i++)
for(int j = 0; j < image.cols; j++)
{
image.data[i*image.cols+j] = FREE_CELL;
if (j == image.cols/2 && (i > 3 && i < image.rows - 3))
image.data[i*image.cols+j] = OCCUPIED_CELL;
// if (image.data[i*image.cols+j] > 215)
// image.data[i*image.cols+j] = FREE_CELL;
// else if(image.data[i*image.cols+j] < 100)
// image.data[i*image.cols+j] = OCCUPIED_CELL;
// else
// image.data[i*image.cols+j] = UNKNOWN_CELL;
}
// Goal: bottom left (x is used as the row index, y as the column index)
cv::Point2i goal(image.cols-1, 0);
// Start: top right
cv::Point2i start(0, image.rows-1);
// Time the algorithm.
Timer timer;
timer.start();
findPathViaDijkstra(image, start, goal, path);
std::cerr << "Time elapsed: " << timer.getElapsedTimeInMilliSec() << " ms";
// Add the path in the image for visualization purpose.
cv::cvtColor(image, image, cv::COLOR_GRAY2BGRA);
int cn = image.channels();
for (size_t i = 0; i < path.size(); i++)
{
image.data[path[i].x*cn*image.cols+path[i].y*cn+0] = 0;
image.data[path[i].x*cn*image.cols+path[i].y*cn+1] = 255;
image.data[path[i].x*cn*image.cols+path[i].y*cn+2] = 0;
}
cv::imshow("Map with path", image);
cv::waitKey();
return 0;
}
For the algorithm implementation, I decided to have two sets, namely the visited set and the working set, each of whose elements contains:
The location of itself in the 2D grid map.
The accumulated cost
Through what cell did it get its accumulated cost (for path recovery)
And here is the result:
The black pixels represent obstacles, the white pixels represent free space and the green line represents the path computed.
In this implementation I only search the current working set for the minimum value and do NOT need to scan the whole cost matrix (where initially the cost of every cell is set to infinity and the cost of the starting point to 0). I think maintaining a separate working set promises better performance, because cells with infinite cost are never included in it; only cells that have been touched are.
I also took advantage of the STL that C++ provides. I decided to use std::multimap since it can store duplicate keys (the cost) and keeps its elements sorted automatically. However, I was forced to use std::find_if() to find the id (the row and column of the current cell) in the visited set to check whether the current cell is in it, which has linear complexity. I really think this is the bottleneck of my Dijkstra implementation.
I am well aware that the A* algorithm is much faster than Dijkstra's algorithm, but what I want to ask is: is my implementation of Dijkstra's algorithm optimal? If I implemented A* on top of my current Dijkstra implementation, which I believe is suboptimal, then A* would be suboptimal as well.
What improvements can I make? Which STL container is most appropriate for this algorithm? In particular, how do I fix the bottleneck?
You're using a std::multimap for 'working' and 'visited'. That's not great.
The first thing you should do is change visited into a per-vertex flag, so you can do your find_if in constant time instead of linear time, and so that operations on the set of visited vertices take constant instead of logarithmic time. You know what all the vertices are and you can map them to small integers trivially, so you can use either a std::vector or a std::bitset.
The second thing you should do is turn working into a priority queue rather than a balanced binary tree structure, so that operations are a (largish) constant factor faster. std::priority_queue is a barebones binary heap. A higher-radix heap (say, quaternary for concreteness) will probably be faster on modern computers due to its reduced depth. Andrew Goldberg suggests some bucket-based data structures; I can dig up references for you if you get to that stage. (They're not too complicated.)
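A sketch of both suggestions together, under some assumptions: the grid is a row-major std::vector<bool> of free cells instead of a cv::Mat, a cell is identified by the integer r * cols + c, and the search runs backwards from the goal as in the question. Because std::priority_queue has no decrease-key operation, an improved cell is simply pushed again and stale entries are skipped when popped (lazy deletion). The function name gridDijkstra is hypothetical; the structured binding needs C++17.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// freeCell[r * cols + c] is true for a free cell. Returns the path as cell ids
// from start to goal, or an empty vector if the start is unreachable.
std::vector<int> gridDijkstra(const std::vector<bool>& freeCell, int rows, int cols,
                              int startId, int goalId) {
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> dist(freeCell.size(), inf);
    std::vector<int> parent(freeCell.size(), -1);
    std::vector<bool> done(freeCell.size(), false);  // per-vertex "visited" flag
    using Entry = std::pair<double, int>;             // (cost, cell id)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> open;
    dist[goalId] = 0.0;  // search backwards from the goal
    open.push({0.0, goalId});
    while (!open.empty()) {
        auto [d, id] = open.top();
        open.pop();
        if (done[id]) continue;  // stale entry left behind by a later improvement
        done[id] = true;
        if (id == startId) break;
        const int r = id / cols, c = id % cols;
        for (int dr = -1; dr <= 1; ++dr) {
            for (int dc = -1; dc <= 1; ++dc) {
                if (dr == 0 && dc == 0) continue;
                const int nr = r + dr, nc = c + dc;
                if (nr < 0 || nc < 0 || nr >= rows || nc >= cols) continue;
                const int nid = nr * cols + nc;
                if (!freeCell[nid] || done[nid]) continue;
                const double step = (dr != 0 && dc != 0) ? std::sqrt(2.0) : 1.0;
                if (d + step < dist[nid]) {
                    dist[nid] = d + step;
                    parent[nid] = id;
                    open.push({dist[nid], nid});
                }
            }
        }
    }
    std::vector<int> path;
    if (!done[startId]) return path;
    // Parents point towards the goal, so this walk already runs start -> goal.
    for (int id = startId; id != -1; id = parent[id]) path.push_back(id);
    return path;
}
```

Every lookup that used std::find_if on the multimaps becomes a vector index here, and each pop or push on the heap is O(log n).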
Once you've taken care of these two things, you might look at A* or meet-in-the-middle tricks to speed things up even more.
Your performance is several orders of magnitude worse than it could be because you're using graph search algorithms for what looks like geometry. This geometry is much simpler and less general than the problems that graph search algorithms can solve. Also, with a vertex for every pixel your graph is huge even though it contains basically no information.
I heard you asking "how can I make this better without changing what I'm thinking" but nevertheless I'll tell you a completely different and better approach.
It looks like your robot can only go horizontally, vertically or diagonally. Is that for real or just a side effect of you choosing graph search algorithms? I'll assume the latter and let it go in any direction.
The algorithm goes like this:
(0) Represent your obstacles as polygons by listing the corners. Work in real numbers so you can make them as thin as you like.
(1) Try for a straight line between the end points.
(2) Check if that line goes through an obstacle or not. To do that for any line, show that all corners of any particular obstacle lie on the same side of the line. To do that, translate all points by (-X,-Y) of one end of the line so that that point is at the origin, then rotate until the other point is on the X axis. Now all corners should have the same sign of Y if there's no obstruction. There might be a quicker way just using gradients.
(3) If there's an obstruction, propose N two-segment paths going via the N corners of the obstacle.
(4) Recurse for all segments, culling any paths with segments that go out of bounds. That won't be a problem unless you have obstacles that go out of bounds.
(5) When it stops recursing, you should have a list of locally optimised paths from which you can choose the shortest.
(6) If you really want to restrict bearings to multiples of 45 degrees, then you can do this algorithm first and then replace each segment by any 45-only wiggly version that avoids obstacles. We know that such a version exists because you can stay extremely close to the original line by wiggling very often. We also know that all such wiggly paths have the same length.
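The rotate-and-check-Y test in step (2) can be done without trigonometry: after translating A to the origin and rotating B onto the positive X axis, a point P has Y = cross(B - A, P - A) / |B - A|, so the sign of a 2D cross product is enough. A minimal sketch (the names Pt, side and allCornersOnOneSide are hypothetical). It tests the infinite line through the two endpoints, so it is conservative: an obstacle lying entirely beyond one end of the segment can still be reported as an obstruction.

```cpp
#include <cassert>
#include <vector>

struct Pt { double x, y; };

// Proportional to the Y coordinate P would have after moving A to the origin
// and rotating B onto the positive X axis: cross(B - A, P - A).
double side(Pt a, Pt b, Pt p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// True if no two corners lie strictly on opposite sides of the line through a
// and b. Corners exactly on the line count as touching, not obstructing.
bool allCornersOnOneSide(Pt a, Pt b, const std::vector<Pt>& corners) {
    bool anyPos = false, anyNeg = false;
    for (const Pt& c : corners) {
        const double s = side(a, b, c);
        if (s > 0) anyPos = true;
        if (s < 0) anyNeg = true;
    }
    return !(anyPos && anyNeg);
}
```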

kd-tree construction very slow

I am trying to implement a kd-tree for my C++ (DirectX) project to speed up my collision detection.
My implementation is a really primitive recursive function. The nth_element call seems to be working okay (only a 1 fps difference if I comment it out). I am not quite sure where the culprit is coming from.
KDTreeNode Box::buildKDTree(std::vector<Ball> balls, int depth) {
if (balls.size() < 3) {
return KDTreeNode(balls[0].getPos(), KDTreeLeaf(), KDTreeLeaf());
}
Variables::currAxis = depth % 3;
size_t n = (balls.size() / 2);
std::nth_element(balls.begin(), balls.begin() + n, balls.end()); // SORTS FOR THE ACCORDING AXIS - SEE BALL.CPP FOR IMPLEMENTATION
std::vector<Ball> leftSide(balls.begin(), balls.begin() + n);
std::vector<Ball> rightSide(balls.begin() + n, balls.end());
return KDTreeNode(balls[n].getPos(), this->buildKDTree(leftSide, depth + 1), this->buildKDTree(rightSide, depth + 1));
}
I have overloaded the < operator in the Ball class:
bool Ball::operator < (Ball& ball)
{
if (Variables::currAxis == 0) {
return (XMVectorGetX(this->getPos()) < XMVectorGetX(ball.getPos()));
} else if (Variables::currAxis == 1) {
return (XMVectorGetY(this->getPos()) < XMVectorGetY(ball.getPos()));
} else {
return (XMVectorGetZ(this->getPos()) < XMVectorGetZ(ball.getPos()));
}
}
I am pretty sure that this is not an optimal way to handle the construction in real time.
Maybe you can help me to get on the right track.
There is one other thing I am really wondering about: say I have a lot of spheres in the scene and I use a kd-tree. How do I determine which leaf they belong to? During construction I only use the center position, not the actual diameter. How do I go about this?
Thanks
EDIT: I've implemented all the suggested changes and it runs very well now. Thanks!
Here is what I did:
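KDTreeNode Box::buildKDTree(std::vector<Ball>::iterator start, std::vector<Ball>::iterator end, int depth) {
    if ((end - start) == 1) {
        // use the current range (*start), not the member vector: balls[0] is always the first ball overall
        return KDTreeNode(start->getPos(), &KDTreeLeaf(), &KDTreeLeaf());
    }
    Variables::currAxis = depth % 3;
    size_t n = (end - start) / 2;
    std::nth_element(start, start + n, end); // SORTS FOR THE ACCORDING AXIS - SEE BALL.CPP FOR IMPLEMENTATION
    // caution: &KDTreeLeaf() and &this->buildKDTree(...) take the address of temporaries;
    // standard C++ rejects this, and with compiler extensions the pointers dangle after this statement
    return KDTreeNode(start[n].getPos(), &this->buildKDTree(start, (start + n), depth + 1), &this->buildKDTree((start + n), end, depth + 1));
}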
As you can see, I am not copying the vectors anymore, and I am also passing the left and right children by address so that they are not copied.
I see two possible problems:
Passing the vector to the function as a value (this effectively copies the whole vector)
Creating new vectors for the smaller and bigger elements, instead of some in-place processing
Basically, the function copies all balls in the initial vector twice for every level of your kd-tree. This should cause a serious slowdown, so try to avoid requesting so much memory.
One way to solve it would be to access the data of the vector directly, use nth_element etc. and only pass the indices of the subvectors to the recursive call.
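A sketch of that suggestion for the ball case, under assumptions: a plain Vec3 stands in for XMVECTOR, nodes live in one std::vector and refer to their children by index (so no pointer to a temporary is ever stored), and the split axis is captured by the comparator instead of a global such as Variables::currAxis. Unlike the question's code, the median ball is stored in the node and excluded from both child ranges, so an empty range simply returns -1. Vec3, coord, KdNode and buildKd are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

float coord(const Vec3& p, int axis) {
    return axis == 0 ? p.x : (axis == 1 ? p.y : p.z);
}

struct KdNode {
    Vec3 point;
    int axis;
    int left = -1, right = -1;  // indices into the node vector, -1 = none
};

using It = std::vector<Vec3>::iterator;

// Builds the subtree for [first, last) in place and returns its node index.
int buildKd(It first, It last, int depth, std::vector<KdNode>& nodes) {
    if (first == last) return -1;
    const int axis = depth % 3;
    It mid = first + (last - first) / 2;
    std::nth_element(first, mid, last, [axis](const Vec3& a, const Vec3& b) {
        return coord(a, axis) < coord(b, axis);
    });
    const int index = static_cast<int>(nodes.size());
    nodes.push_back({*mid, axis});
    const int l = buildKd(first, mid, depth + 1, nodes);
    const int r = buildKd(mid + 1, last, depth + 1, nodes);
    nodes[index].left = l;  // re-index after recursion: push_back may reallocate
    nodes[index].right = r;
    return index;
}
```

Only iterators into the original vector are passed down, so the balls are rearranged in place and never copied per level.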

How do you make Tree Data Structures in C++?

I'm taking a class in AI Methods along with a friend of mine, and we've partnered for the final project, which is coding Othello and an AI for it using C++ and OpenGL.
So far we have the board and the Othello engine (I'm using an MVC-type approach). But the one thing that's proving difficult to grasp is the AI.
We're supposed to write an AI that uses Alpha-Beta pruning on a tree to quickly calculate the next move it should make.
We understand the concepts of Alpha-Beta pruning, as well as the algorithm for deciding which squares are worth more than others as far as the game is concerned.
However, neither my partner nor I have taken the data structures class yet, and as such we don't know how to properly create a tree in C++ or even where to get started.
So my question to you, Stack Overflow is: Where do I get started to quickly (and effectively) write and traverse a Tree for Alpha-Beta Pruning in C++ without using STL. (Our assignment states that we're not allowed to use STL).
Any and all help is appreciated, thank you!
The tree for alpha-beta pruning is usually implicit. It is a way of preventing your AI search algorithm from wasting time on bad solutions. Here is the pseudocode from Wikipedia:
function alphabeta(node, depth, α, β, Player)
    if depth = 0 or node is a terminal node
        return the heuristic value of node
    if Player = MaxPlayer
        for each child of node
            α := max(α, alphabeta(child, depth-1, α, β, not(Player) ))
            if β ≤ α
                break (* Beta cut-off *)
        return α
    else
        for each child of node
            β := min(β, alphabeta(child, depth-1, α, β, not(Player) ))
            if β ≤ α
                break (* Alpha cut-off *)
        return β
(* Initial call *)
alphabeta(origin, depth, -infinity, +infinity, MaxPlayer)
The function recursively evaluates board positions. The "node" is the current position, and where it says "for each child of node" is where you generate new board positions resulting from each possible move at the current one. The depth parameter controls how far ahead you want to evaluate the tree, since analyzing moves to an unlimited depth might be impractical.
Still, if you have to build a tree of some given depth before pruning it for educational purposes, the structure for a tree with nodes that can have variable numbers of children is very simple and could look something like this:
struct Node
{
Node ** children;
int childCount;
double value;
};
Node * tree;
Here children is an array of childCount pointers to Node. Leaf nodes would have childCount=0. To construct the tree, you would search the available board positions like this:
Node * CreateTree(Board board, int depth)
{
    Node * node = new Node();
    node->childCount = board.GetAvailableMoveCount();
    node->value = board.GetValue();
    if (depth > 0 && node->childCount > 0)
    {
        node->children = new Node * [node->childCount];
        for (int i = 0; i != node->childCount; ++i)
            node->children[i] = CreateTree(board.Move(i), depth - 1);
    }
    else
    {
        node->children = NULL;
        node->childCount = 0; // no children were created, so DeleteTree must not iterate over any
    }
    return node;
}
void DeleteTree(Node * tree)
{
    for (int i = 0; i != tree->childCount; ++i)
        DeleteTree(tree->children[i]);
    delete [] tree->children; // deleting NULL is OK
    delete tree;
}
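Putting the pseudocode and the Node structure together, here is a minimal sketch of alpha-beta over an explicitly built tree, written without any STL containers to respect the assignment's restriction. The leaf-counting global and the maxOf/minOf helpers are hypothetical additions; the example tree is a small textbook-style one (three MIN nodes under a MAX root) whose minimax value is 3.

```cpp
#include <cassert>

struct Node {
    Node** children;  // array of childCount pointers (nullptr for a leaf)
    int childCount;
    double value;     // heuristic value, used at leaves and at the depth limit
};

int g_leavesEvaluated = 0;  // only used to show how much pruning happened

double maxOf(double a, double b) { return a > b ? a : b; }
double minOf(double a, double b) { return a < b ? a : b; }

// Direct translation of the pseudocode above.
double alphabeta(const Node* node, int depth, double alpha, double beta, bool maximizing) {
    if (depth == 0 || node->childCount == 0) {
        ++g_leavesEvaluated;
        return node->value;
    }
    if (maximizing) {
        for (int i = 0; i != node->childCount; ++i) {
            alpha = maxOf(alpha, alphabeta(node->children[i], depth - 1, alpha, beta, false));
            if (beta <= alpha) break;  // beta cut-off
        }
        return alpha;
    }
    for (int i = 0; i != node->childCount; ++i) {
        beta = minOf(beta, alphabeta(node->children[i], depth - 1, alpha, beta, true));
        if (beta <= alpha) break;  // alpha cut-off
    }
    return beta;
}
```

With the leaves ordered 3, 12, 8 / 2, 4, 6 / 14, 5, 2, only 7 of the 9 leaves are evaluated: the second MIN node is cut off after its first leaf, because its value can be at most 2 while the root already has 3.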

KD tree, slow tree construction

I am trying to build a KD tree (static case). We assume the points are sorted on both x and y coordinates.
For even depth of recursion the set is split into two subsets with a vertical line going through median x coordinate.
For odd depth of recursion the set is split into two subsets with a horizontal line going through median y coordinate.
The median can be determined from the set sorted by the x / y coordinate. I do this step before each split of the set, and I think it causes the slow construction of the tree.
Could you please help me check and optimize the code?
I also cannot figure out how to find the k-th nearest neighbor; could somebody help me with that code?
Thank you very much for your help and patience...
Please see the sample code:
class KDNode
{
private:
Point2D *data;
KDNode *left;
KDNode *right;
....
};
void KDTree::createKDTree(Points2DList *pl)
{
//Create list
KDList kd_list;
//Create KD list (all input points)
for (unsigned int i = 0; i < pl->size(); i++)
{
kd_list.push_back((*pl)[i]);
}
//Sort points by y (the root is built with depth 1, which is odd, so it splits on y)
std::sort(kd_list.begin(), kd_list.end(), sortPoints2DByY());
//Build KD Tree
root = buildKDTree(&kd_list, 1);
}
KDNode * KDTree::buildKDTree(KDList *kd_list, const unsigned int depth)
{
//Build KD tree
const unsigned int n = kd_list->size();
//No leaf will be built
if (n == 0)
{
return NULL;
}
//Only one point: create leaf of KD Tree
else if (n == 1)
{
//Create one leaf
return new KDNode(new Point2D ((*kd_list)[0]));
}
//At least 2 points: create one node, split the tree into left and right subtrees
else
{
//New KD node
KDNode *node = NULL;
//Get median index
const unsigned int median_index = n/2;
//Create new KD Lists
KDList kd_list1, kd_list2;
//The depth is even, process by x coordinate
if (depth%2 == 0)
{
//Create new median node
node = new KDNode(new Point2D( (*kd_list)[median_index]));
//Split list
for (unsigned int i = 0; i < n; i++)
{
//Get actual point
Point2D *p = &(*kd_list)[i];
//Add point to the first list: x < median.x
if (p->getX() < (*kd_list)[median_index].getX())
{
kd_list1.push_back(*p);
}
//Add point to the second list: x >= median.x (ties go right, so duplicate x values are not lost; the median itself is skipped)
else if (i != median_index)
{
kd_list2.push_back(*p);
}
}
//Sort points by y for the next recursion step: slow construction of the tree???
std::sort(kd_list1.begin(), kd_list1.end(), sortPoints2DByY());
std::sort(kd_list2.begin(), kd_list2.end(), sortPoints2DByY());
}
//The depth is odd, process by y coordinates
else
{
//Create new median node
node = new KDNode(new Point2D((*kd_list)[median_index]));
//Split list
for (unsigned int i = 0; i < n; i++)
{
//Get actual point
Point2D *p = &(*kd_list)[i];
//Add point to the first list: y < median.y
if (p->getY() < (*kd_list)[median_index].getY())
{
kd_list1.push_back(*p);
}
//Add point to the second list: y >= median.y (ties go right, so duplicate y values are not lost; the median itself is skipped)
else if (i != median_index)
{
kd_list2.push_back(*p);
}
}
//Sort points by x for the next recursion step: slow construction of the tree???
std::sort(kd_list1.begin(), kd_list1.end(), sortPoints2DByX());
std::sort(kd_list2.begin(), kd_list2.end(), sortPoints2DByX());
}
//Build left subtree
node->setLeft( buildKDTree(&kd_list1, depth +1 ) );
//Build right subtree
node->setRight( buildKDTree(&kd_list2, depth + 1 ) );
//Return new node
return node;
}
}
The sorting to find the median is probably the worst culprit here, since it is O(n log n) while the problem is solvable in O(n) time. You should use nth_element instead: http://www.cplusplus.com/reference/algorithm/nth_element/. That will find the median in linear time on average, after which you can split the vector in linear time.
Memory management in vector can also take a lot of time, especially with large vectors, since every time the vector outgrows its capacity it allocates a larger buffer and all the elements have to be moved. You can use the reserve method of vector to reserve exactly enough space for the vectors in the newly created nodes, so they need not grow dynamically as new elements are added with push_back.
And if you absolutely need the best performance, you should use lower level code, doing away with vector and reserving plain arrays instead. Nth element or 'selection' algorithms are readily available and not too hard to write yourself: http://en.wikipedia.org/wiki/Selection_algorithm
Some hints on optimizing the kd-tree:
Use a linear time median finding algorithm, such as QuickSelect.
Avoid actually using "node" objects. You can store whole tree using the points only, with ZERO additional information. Essentially by just sorting an array of objects. The root node will then be in the middle. A rearrangement that puts the root first, then uses a heap layout will likely be nicer to the CPU memory cache on query time, but more tricky to build.
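A sketch of that node-free layout for 2D points, assuming std::nth_element (linear on average) is acceptable as the selection step: each range keeps its median in the middle slot and its two halves become the subtrees, alternating between x and y. A nearest-neighbour query then walks the same index ranges; extending it to k neighbours would mean keeping the k best candidates and pruning with the k-th best distance. P2, buildImplicit and nearest are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct P2 { double x, y; };

double coordOf(const P2& p, int axis) { return axis == 0 ? p.x : p.y; }

double dist2(const P2& a, const P2& b) {
    const double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Rearranges pts[lo, hi) in place: the median on `axis` goes to the middle
// slot, points not greater before it, points not smaller after it, and both
// halves are processed recursively on the other axis. No node objects exist.
void buildImplicit(std::vector<P2>& pts, int lo, int hi, int axis) {
    if (hi - lo <= 1) return;
    const int mid = lo + (hi - lo) / 2;
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                     [axis](const P2& a, const P2& b) { return coordOf(a, axis) < coordOf(b, axis); });
    buildImplicit(pts, lo, mid, 1 - axis);
    buildImplicit(pts, mid + 1, hi, 1 - axis);
}

// Nearest neighbour over the same implicit layout (same mid and axis rules).
void nearest(const std::vector<P2>& pts, int lo, int hi, int axis, const P2& q,
             int& best, double& bestD2) {
    if (lo >= hi) return;
    const int mid = lo + (hi - lo) / 2;
    const double d2 = dist2(pts[mid], q);
    if (d2 < bestD2) { bestD2 = d2; best = mid; }
    const double diff = coordOf(q, axis) - coordOf(pts[mid], axis);
    const int nearLo = diff < 0 ? lo : mid + 1, nearHi = diff < 0 ? mid : hi;
    const int farLo = diff < 0 ? mid + 1 : lo, farHi = diff < 0 ? hi : mid;
    nearest(pts, nearLo, nearHi, 1 - axis, q, best, bestD2);
    // The far side can only help if the splitting line is closer than the best so far.
    if (diff * diff < bestD2) nearest(pts, farLo, farHi, 1 - axis, q, best, bestD2);
}
```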
Not really an answer to your questions, but I would highly recommend the forum at http://ompf.org/forum/
They have some great discussions over there for fast kd-tree constructions in various contexts. Perhaps you'll find some inspiration over there.
Edit:
The OMPF forums have since gone down, although a direct replacement is currently available at http://ompf2.com/
Your first culprit is sorting to find the median. This is almost always the bottleneck for K-d tree construction, and using more efficient algorithms here will really pay off.
However, you're also constructing a pair of variable-sized vectors each time you split and transferring elements to them.
Here I recommend the good ol' singly-linked list. The beauty of the linked list is that you can transfer elements from parent to child by simply changing next pointers to point at the child's root pointer instead of the parent's.
That means no heap overhead whatsoever during construction to transfer elements from parent nodes to child nodes, only to aggregate the initial list of elements to insert to the root. That should do wonders as well, but if you want even faster, you can use a fixed allocator to efficiently allocate nodes for the linked list (as well as for the tree) and with better contiguity/cache hits.
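The relinking step can be sketched as follows, assuming the split value (for example the median coordinate) has already been determined; the Item type and splitList are hypothetical. Each element moves to one of two child lists by rewriting its next pointer, so nothing is allocated.

```cpp
#include <cassert>

struct Item {
    double key;  // coordinate on the current split axis
    Item* next;
};

// Relinks every item of `list` into `less` (key < pivot) or `rest` (key >= pivot).
// Items are pushed onto the front of their child list, so the order is reversed.
void splitList(Item* list, double pivot, Item*& less, Item*& rest) {
    less = nullptr;
    rest = nullptr;
    while (list) {
        Item* next = list->next;
        Item*& head = (list->key < pivot) ? less : rest;
        list->next = head;
        head = list;
        list = next;
    }
}
```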
Last but not least, if you're involved in intensive computing tasks that call for K-d trees, you need a profiler. Measure your code and you'll see exactly where the culprit lies, along with exact time distributions.