KD tree, slow tree construction - c++

I am trying to build KD Tree (static case). We assume points are sorted on both x and y coordinates.
For even depth of recursion the set is split into two subsets with a vertical line going through median x coordinate.
For odd depth of recursion the set is split into two subsets with a horizontal line going through median y coordinate.
The median can be determined from sorted set according to x / y coordinate. This step I am doing before each splitting of the set. And I think that it causes the slow construction of the tree.
Could you please help me check and optimize the code?
I also cannot find the k-th nearest neighbor; could somebody help me with that code as well?
Thank you very much for your help and patience...
Please see the sample code:
class KDNode
{
private:
Point2D *data;
KDNode *left;
KDNode *right;
....
};
void KDTree::createKDTree(Points2DList *pl)
{
//Create list
KDList kd_list;
//Create KD list (all input points)
for (unsigned int i = 0; i < pl->size(); i++)
{
kd_list.push_back((*pl)[i]);
}
//Sort points by y (the first split, at odd depth 1, uses the y coordinate)
std::sort(kd_list.begin(), kd_list.end(), sortPoints2DByY());
//Build KD Tree
root = buildKDTree(&kd_list, 1);
}
KDNode * KDTree::buildKDTree(KDList *kd_list, const unsigned int depth)
{
//Build KD tree
const unsigned int n = kd_list->size();
//No leaf will be built
if (n == 0)
{
return NULL;
}
//Only one point: create leaf of KD Tree
else if (n == 1)
{
//Create one leaf
return new KDNode(new Point2D ((*kd_list)[0]));
}
//At least 2 points: create one leaf, split tree into left and right subtree
else
{
//New KD node
KDNode *node = NULL;
//Get median index
const unsigned int median_index = n/2;
//Create new KD Lists
KDList kd_list1, kd_list2;
//The depth is even, process by x coordinate
if (depth%2 == 0)
{
//Create new median node
node = new KDNode(new Point2D( (*kd_list)[median_index]));
//Split list
for (unsigned int i = 0; i < n; i++)
{
//Get actual point
Point2D *p = &(*kd_list)[i];
//Add point to the first list: x < median.x
if (p->getX() < (*kd_list)[median_index].getX())
{
kd_list1.push_back(*p);
}
//Add point to the second list: x > median.x
else if (p->getX() > (*kd_list)[median_index].getX())
{
kd_list2.push_back(*p);
}
}
//Sort points by y for the next recursion step: slow construction of the tree???
std::sort(kd_list1.begin(), kd_list1.end(), sortPoints2DByY());
std::sort(kd_list2.begin(), kd_list2.end(), sortPoints2DByY());
}
//The depth is odd, process by y coordinates
else
{
//Create new median node
node = new KDNode(new Point2D((*kd_list)[median_index]));
//Split list
for (unsigned int i = 0; i < n; i++)
{
//Get actual point
Point2D *p = &(*kd_list)[i];
//Add point to the first list: y < median.y
if (p->getY() < (*kd_list)[median_index].getY())
{
kd_list1.push_back(*p);
}
//Add point to the second list: y > median.y
else if (p->getY() >(*kd_list)[median_index].getY())
{
kd_list2.push_back(*p);
}
}
//Sort points by x for the next recursion step: slow construction of the tree???
std::sort(kd_list1.begin(), kd_list1.end(), sortPoints2DByX());
std::sort(kd_list2.begin(), kd_list2.end(), sortPoints2DByX());
}
//Build left subtree
node->setLeft( buildKDTree(&kd_list1, depth +1 ) );
//Build right subtree
node->setRight( buildKDTree(&kd_list2, depth + 1 ) );
//Return new node
return node;
}
}

The sorting to find the median is probably the worst culprit here, since that is O(nlogn) while the problem is solvable in O(n) time. You should use nth_element instead: http://www.cplusplus.com/reference/algorithm/nth_element/. That'll find the median in linear time on average, after which you can split the vector in linear time.
Memory management in vector is also something that can take a lot of time, especially with large vectors, since every time the vector's size is doubled all the elements have to be moved. You can use the reserve method of vector to reserve exactly enough space for the vectors in the newly created nodes, so they need not increase dynamically as new stuff is added with push_back.
And if you absolutely need the best performance, you should use lower level code, doing away with vector and reserving plain arrays instead. Nth element or 'selection' algorithms are readily available and not too hard to write yourself: http://en.wikipedia.org/wiki/Selection_algorithm
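A minimal sketch of the nth_element idea, not the asker's actual classes: `Pt`, `Node`, `build` and `countNodes` are hypothetical stand-ins for Point2D, KDNode and buildKDTree. It partitions one shared vector in place, so there is no per-level sort and no temporary list to grow.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the question's Point2D / KDNode.
struct Pt { double x, y; };

struct Node {
    Pt point;
    std::unique_ptr<Node> left, right;
};

// Builds the subtree for pts[lo, hi). The points are partitioned in place,
// so no per-level copies or full sorts are needed.
std::unique_ptr<Node> build(std::vector<Pt>& pts, std::size_t lo, std::size_t hi,
                            unsigned depth)
{
    if (lo >= hi) return nullptr;
    const std::size_t mid = lo + (hi - lo) / 2;
    const bool byX = (depth % 2 == 0);  // even depth: split on x, odd: on y
    // Average O(n): afterwards pts[mid] is the median on the split axis,
    // nothing before it compares greater and nothing after it compares less.
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
        [byX](const Pt& a, const Pt& b) { return byX ? a.x < b.x : a.y < b.y; });
    std::unique_ptr<Node> node(new Node());
    node->point = pts[mid];
    node->left = build(pts, lo, mid, depth + 1);
    node->right = build(pts, mid + 1, hi, depth + 1);
    return node;
}

// Counts the nodes of a subtree (only used to sanity-check a build).
std::size_t countNodes(const Node* n)
{
    return n ? 1 + countNodes(n->left.get()) + countNodes(n->right.get()) : 0;
}
```

A call would look like `build(points, 0, points.size(), 0)`. Unlike the strict `<` / `>` tests in the question, which drop every point whose coordinate equals the median's, this keeps all points.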

Some hints on optimizing the kd-tree:
Use a linear time median finding algorithm, such as QuickSelect.
Avoid actually using "node" objects. You can store whole tree using the points only, with ZERO additional information. Essentially by just sorting an array of objects. The root node will then be in the middle. A rearrangement that puts the root first, then uses a heap layout will likely be nicer to the CPU memory cache on query time, but more tricky to build.
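Here is a rough sketch of the "array is the tree" idea, under my own assumptions: `P2`, `arrange`, `nearest` and `nearestIndex` are made-up names, and the layout is the simple "root in the middle" one, not the heap layout. It also shows a plain nearest-neighbour query (the single nearest point only, not the k-th).

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

struct P2 { double x, y; };  // hypothetical point type

// Rearranges pts[lo, hi) so that the median sits at the middle index and
// each half is arranged the same way, recursively. The array is the tree.
void arrange(std::vector<P2>& pts, std::size_t lo, std::size_t hi, unsigned depth)
{
    if (hi - lo < 2) return;
    const std::size_t mid = lo + (hi - lo) / 2;
    const bool byX = (depth % 2 == 0);
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
        [byX](const P2& a, const P2& b) { return byX ? a.x < b.x : a.y < b.y; });
    arrange(pts, lo, mid, depth + 1);
    arrange(pts, mid + 1, hi, depth + 1);
}

// Nearest-neighbour search over the implicit tree; best/bestD2 carry the
// current candidate index and its squared distance.
void nearest(const std::vector<P2>& pts, std::size_t lo, std::size_t hi, unsigned depth,
             const P2& q, std::size_t& best, double& bestD2)
{
    if (lo >= hi) return;
    const std::size_t mid = lo + (hi - lo) / 2;
    const double dx = pts[mid].x - q.x, dy = pts[mid].y - q.y;
    const double d2 = dx * dx + dy * dy;
    if (d2 < bestD2) { bestD2 = d2; best = mid; }
    // Signed distance from the query to the splitting line.
    const double delta = (depth % 2 == 0) ? q.x - pts[mid].x : q.y - pts[mid].y;
    // Visit the side containing the query first; visit the other side only
    // if the splitting line is closer than the best match so far.
    if (delta < 0) {
        nearest(pts, lo, mid, depth + 1, q, best, bestD2);
        if (delta * delta < bestD2) nearest(pts, mid + 1, hi, depth + 1, q, best, bestD2);
    } else {
        nearest(pts, mid + 1, hi, depth + 1, q, best, bestD2);
        if (delta * delta < bestD2) nearest(pts, lo, mid, depth + 1, q, best, bestD2);
    }
}

// Convenience wrapper; pts must be non-empty and already arranged.
std::size_t nearestIndex(const std::vector<P2>& pts, const P2& q)
{
    std::size_t best = 0;
    double bestD2 = std::numeric_limits<double>::infinity();
    nearest(pts, 0, pts.size(), 0, q, best, bestD2);
    return best;
}
```

Call `arrange(pts, 0, pts.size(), 0)` once, then `nearestIndex(pts, query)` as often as needed. No node objects or child pointers are stored; the children of the range [lo, hi) are simply its two halves.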

Not really an answer to your questions, but I would highly recommend the forum at http://ompf.org/forum/
They have some great discussions over there for fast kd-tree constructions in various contexts. Perhaps you'll find some inspiration over there.
Edit:
The OMPF forums have since gone down, although a direct replacement is currently available at http://ompf2.com/

Your first culprit is sorting to find the median. This is almost always the bottleneck for K-d tree construction, and using more efficient algorithms here will really pay off.
However, you're also constructing a pair of variable-sized vectors each time you split and transferring elements to them.
Here I recommend the good ol' singly-linked list. The beauty of the linked list is that you can transfer elements from parent to child by simply changing next pointers to point at the child's root pointer instead of the parent's.
That means no heap overhead whatsoever during construction to transfer elements from parent nodes to child nodes, only to aggregate the initial list of elements to insert to the root. That should do wonders as well, but if you want even faster, you can use a fixed allocator to efficiently allocate nodes for the linked list (as well as for the tree) and with better contiguity/cache hits.
Last but not least, if you're doing compute-intensive work that calls for K-d trees, you need a profiler. Measure your code and you'll see exactly where the time goes, with an exact breakdown of how it is distributed.

Related

How to calculate the miss links in a BVH tree?

I am creating an OpenGL-based ray tracer for polygon models. To accelerate the application I am using BVH trees. Because there is no recursion in GLSL, I decided to find another way to traverse the bounding boxes, which are sent to the fragment shader as shader storage buffers.
I would like to implement it the way described in: Traversal of BVH tree in shaders
What I don't really understand is how to calculate the hit and miss links during the construction of the tree. Hit and miss links tell the program which node (bounding box) to visit next during the traversal, depending on whether the current box is intersected (hit) or not (miss).
So far I have a method that constructs the tree, and I can also store the tree in a simple array, using a depth-first implementation to flatten it.
Here are the depth-first, tree flattening methods:
FlatBvhNode nodeConverter2(BvhNode node, int& ind){
FlatBvhNode result = FlatBvhNode(node.bBox.min, node.bBox.max, ind, node.isLeaf,
node.indices);
return result;
}
void flattenRecursion(const BvhNode &bvhNode, vector<FlatBvhNode>& nodes, int& ind) {
++ind;
nodes.push_back(nodeConverter2(bvhNode, ind));
if (!bvhNode.isLeaf) {
flattenRecursion(*bvhNode.children.at(0), nodes, ind);
flattenRecursion(*bvhNode.children.at(1), nodes,ind);
}
}
vector<FlatBvhNode>* flatten(const BvhNode& root) {
vector<FlatBvhNode>* nodesArray=new vector<FlatBvhNode>;
nodesArray->reserve(root.countNodes());
int ind=0;
flattenRecursion(root, *nodesArray, ind);
return nodesArray;
}
I have to calculate the following "links" :
The image is from: source. The image shows the different links. For example, if the ray intersects a bounding box (hit link), we can move to the next node in the array. That works because I have a depth-first traversal. The problem comes when I have to move to the sibling, or even to the parent's sibling. How can I implement these links / offsets? I know I should create an index, but how do I do this with a depth-first tree construction?
Any help is appreciated.
I do not have an answer about a depth-first tree, but I have figured out a way to do that if your tree is a heap. So here is some code in GLSL I used
int left(in int index) { // left child
return 2 * index + 1;
}
int right(in int index) { // right child
return 2 * index + 2;
}
int parent(in int index) {
return (index - 1) / 2;
}
int right_sibling(in int index) { // a leaf hit or a miss link
int result = index;
while(result % 2 == 0 && result != 0) {
result = parent(result);
}
return result + 1 * int(result != 0);
}
I am using this and it works with a pretty reasonable speed. The only problem I have is that loop, which slows the performance. I would really like to have a constant complexity expression in that function.
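For the depth-first (pre-order) layout from the question, one possible sketch follows. It relies on the fact that in pre-order a node's whole subtree is stored directly after it, so the miss link is "own index + subtree size" and the hit link of an inner node is "own index + 1". `TreeNode`, `FlatNode`, `flattenWithLinks` and `flattenTree` are hypothetical names, indices are 0-based (the question's `ind` starts at 1), and -1 is an assumed "traversal finished" marker.

```cpp
#include <vector>

// Hypothetical minimal types standing in for the question's BvhNode / FlatBvhNode.
struct TreeNode {
    bool isLeaf;
    const TreeNode* child[2];
};

struct FlatNode {
    int hit;   // index to visit next if the box is hit (-1 = traversal done)
    int miss;  // index to visit next if the box is missed (-1 = traversal done)
};

// Pre-order flattening. Returns the number of nodes written for this subtree.
int flattenWithLinks(const TreeNode& node, std::vector<FlatNode>& out)
{
    const int self = static_cast<int>(out.size());
    out.push_back(FlatNode{-1, -1});
    int count = 1;
    if (!node.isLeaf) {
        count += flattenWithLinks(*node.child[0], out);
        count += flattenWithLinks(*node.child[1], out);
    }
    // Everything in [self + 1, self + count) is this node's subtree, so the
    // first index past it is where a miss continues.
    out[self].miss = self + count;
    // An inner node that is hit descends to its first child, stored right
    // after it; a leaf continues at the same place as a miss.
    out[self].hit = node.isLeaf ? out[self].miss : self + 1;
    return count;
}

std::vector<FlatNode> flattenTree(const TreeNode& root)
{
    std::vector<FlatNode> out;
    const int total = flattenWithLinks(root, out);
    // Links pointing one past the last node mean "stop".
    for (FlatNode& n : out) {
        if (n.hit == total) n.hit = -1;
        if (n.miss == total) n.miss = -1;
    }
    return out;
}
```

In the shader the traversal then becomes a single loop: start at index 0, test the box, follow `hit` or `miss`, and stop at -1. No recursion, stack or parent lookup is needed.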

Compute the "lower contour" of a set of segments in the plane in `O(n log n)`

Suppose you've a set s of horizontal line segments in the plane described by a starting point p, an end point q and a y-value.
We can assume that all values of p and q are pairwise distinct and no two segments overlap.
I want to compute the "lower contour" of the segment.
We can sort s by p and iterate through each segment j. If i is the "active" segment and j->y < i->y we "switch to" j (and output the corresponding contour element).
However, what can we do when no such j exists and we find a j with i->q < j->p? Then we would need to switch to the "next higher segment". But how do we know which segment that is? I can't find a way to do this such that the resulting algorithm runs in O(n log n). Any ideas?
A sweep line algorithm is an efficient way to solve your problem. As explained previously by Brian, we can sort all the endpoints by the x-coordinate and process them in order. An important distinction to make here is that we are sorting the endpoints of the segment and not the segments in order of increasing starting point.
If you imagine a vertical line sweeping from left to right across your segments, you will notice two things:
At any position, the vertical line either intersects a set of segments or nothing. Let's call this set the active set. The lower contour is the segment within the active set with the smallest y-coordinate.
The only x-coordinates where the lower contour can change are the segment endpoints.
This immediately brings one observation: the lower contour should be a list of segments. A list of points does not provide sufficient information to define the contour, which can be undefined at certain x-coordinates (where there are no segments).
We can model the active set with an std::set ordered by the y position of the segment, processing the endpoints in order of increasing x-coordinate. When encountering a left endpoint, insert the segment. When encountering a right endpoint, erase the segment. We can find the active segment with the lowest y-coordinate with set::begin() in constant time thanks to the ordering. Since each segment is only ever inserted once and erased once, maintaining the active set takes O(n log n) time in total.
In fact, it is possible to maintain a std::multiset of only the y-coordinates for each segment that intersects the sweep line, if it is easier.
The assumption that the segments are non-overlapping and have distinct endpoints is not entirely necessary. Overlapping segments are handled both by the ordered set of segments and the multiset of y-coordinates. Coinciding endpoints can be handled by considering all endpoints with the same x-coordinate at one go.
Here, I assume that there are no zero-length segments (i.e. points) to simplify things, although they can also be handled with some additional logic.
std::list<segment> lower_contour(std::list<segment> segments)
{
enum event_type { OPEN, CLOSE };
struct event {
event_type type;
const segment &s;
inline int position() const {
return type == OPEN ? s.sp : s.ep;
}
};
struct order_by_position {
bool operator()(const event& first, const event& second) {
return first.position() < second.position();
}
};
std::list<event> events;
for (auto s = segments.cbegin(); s != segments.cend(); ++s)
{
events.push_back( event { OPEN, *s } );
events.push_back( event { CLOSE, *s } );
}
events.sort(order_by_position());
// maintain a (multi)set of the y-positions for each segment that intersects the sweep line
// the ordering allows querying for the lowest segment in constant time via begin()
// the multiset also allows overlapping segments to be handled correctly
std::multiset<int> active_segments;
bool contour_is_active = false;
int contour_y;
int contour_sp;
// the resulting lower contour
std::list<segment> contour;
for (auto i = events.cbegin(); i != events.cend();)
{
auto j = i;
int current_position = i->position();
while (j != events.cend() && j->position() == current_position)
{
switch (j->type)
{
case OPEN: active_segments.insert(j->s.y); break;
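// erase a single occurrence; erase(key) would remove every active segment with this y
case CLOSE: active_segments.erase(active_segments.find(j->s.y)); break;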
}
++j;
}
i = j;
if (contour_is_active)
{
if (active_segments.empty())
{
// the active segment ends here
contour_is_active = false;
contour.push_back( segment { contour_sp, current_position, contour_y } );
}
else
{
// if the current lowest position is different from the previous one,
// the old active segment ends here and a new active segment begins
int current_y = *active_segments.cbegin();
if (current_y != contour_y)
{
contour.push_back( segment { contour_sp, current_position, contour_y } );
contour_y = current_y;
contour_sp = current_position;
}
}
}
else
{
if (!active_segments.empty())
{
// a new contour segment begins here
int current_y = *active_segments.cbegin();
contour_is_active = true;
contour_y = current_y;
contour_sp = current_position;
}
}
}
return contour;
}
As Brian also mentioned, a binary heap like std::priority_queue can also be used to maintain the active set and tends to outperform std::set, even if it does not allow arbitrary elements to be deleted. You can work around this by flagging a segment as removed instead of erasing it. Then, repeatedly remove the top() of the priority_queue if it is a flagged segment. This might end up being faster, but it may or may not matter for your use case.
First sort all the endpoints by x-coordinate (both starting and ending points). Iterate through the endpoints and keep a std::set of all the y-coordinates of active segments. When you reach a starting point, add its y-coordinate to the set and "switch" to it if it's the lowest; when you reach an ending point, remove its y-coordinate from the set and recalculate the lowest y-coordinate using the set. This gives an O(n log n) solution overall.
A balanced binary search tree such as that used to implement std::set generally has a large constant factor. You can speed up this approach by using a binary heap (std::priority_queue) instead of a set, with the lowest y-coordinate at the root. In this case, you can't remove a non-root node, but when you reach such an ending point, just mark the segment inactive in an array. When the root node is popped, continue popping until there is a new root node that hasn't been marked inactive already. I think this will be about twice as fast as the set-based approach, but you'll have to code it yourself and see, if that's a concern.
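The heap variant could look roughly like this. It is only a sketch under my own assumptions: `Seg` mirrors the sp / ep / y fields used above, `lowerContourHeap` is a made-up name, segments are assumed to have sp < ep, and instead of a separate "inactive" flag an entry is treated as removed once its end point is at or before the sweep position.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct Seg { int sp, ep, y; };  // hypothetical: start x, end x, height

std::vector<Seg> lowerContourHeap(const std::vector<Seg>& segs)
{
    // Every endpoint is a sweep position; segment indices are sorted by start.
    std::vector<int> xs;
    std::vector<std::size_t> order(segs.size());
    for (std::size_t i = 0; i < segs.size(); ++i) {
        xs.push_back(segs[i].sp);
        xs.push_back(segs[i].ep);
        order[i] = i;
    }
    std::sort(xs.begin(), xs.end());
    xs.erase(std::unique(xs.begin(), xs.end()), xs.end());
    std::sort(order.begin(), order.end(),
        [&segs](std::size_t a, std::size_t b) { return segs[a].sp < segs[b].sp; });

    // Min-heap of (y, ep). An entry whose ep is at or before the sweep
    // position is stale and is popped lazily once it reaches the top.
    typedef std::pair<int, int> Entry;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > heap;

    std::vector<Seg> contour;
    bool active = false;
    int curY = 0, curStart = 0;
    std::size_t next = 0;
    for (std::size_t k = 0; k < xs.size(); ++k) {
        const int x = xs[k];
        // Open every segment that starts at x.
        while (next < order.size() && segs[order[next]].sp == x) {
            heap.push(Entry(segs[order[next]].y, segs[order[next]].ep));
            ++next;
        }
        // Lazy deletion: discard finished segments sitting on top.
        while (!heap.empty() && heap.top().second <= x) heap.pop();

        const bool nowActive = !heap.empty();
        if (active && (!nowActive || heap.top().first != curY)) {
            contour.push_back(Seg{curStart, x, curY});  // close the current piece
            active = false;
        }
        if (!active && nowActive) {  // a new contour piece begins at x
            active = true;
            curY = heap.top().first;
            curStart = x;
        }
    }
    return contour;
}
```

Each segment is pushed once and popped at most once, so the total stays O(n log n). Whether it actually beats the set-based version is something to measure on your own data.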

Optimizing the Dijkstra's algorithm

I need a graph-search algorithm that is sufficient for our robot navigation application, and I chose Dijkstra's algorithm.
We are given the gridmap which contains free, occupied and unknown cells where the robot is only permitted to pass through the free cells. The user will input the starting position and the goal position. In return, I will retrieve the sequence of free cells leading the robot from starting position to the goal position which corresponds to the path.
Since executing the dijkstra's algorithm from start to goal would give us a reverse path coming from goal to start, I decided to execute the dijkstra's algorithm backwards such that I would retrieve the path from start to goal.
Starting from the goal cell, I would have 8 neighbors whose cost horizontally and vertically is 1 while diagonally would be sqrt(2) only if the cells are reachable (i.e. not out-of-bounds and free cell).
Here are the rules that should be observed when updating the neighboring cells. The current cell can only treat its 8 neighboring cells as reachable (at a distance of 1 or sqrt(2)) under the following conditions:
The neighboring cell is not out of bounds
The neighboring cell is unvisited.
The neighboring cell is a free cell which can be checked via the 2-D grid map.
Here is my implementation:
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cassert>
#include <cmath>
#include <iostream>
#include <map>
#include <vector>
#include "Timer.h"
/// CONSTANTS
static const int UNKNOWN_CELL = 197;
static const int FREE_CELL = 255;
static const int OCCUPIED_CELL = 0;
/// STRUCTURES for easier management.
struct vertex {
cv::Point2i id_;
cv::Point2i from_;
vertex(cv::Point2i id, cv::Point2i from)
{
id_ = id;
from_ = from;
}
};
/// To be used for finding an element in std::multimap STL.
struct CompareID
{
CompareID(cv::Point2i val) : val_(val) {}
bool operator()(const std::pair<double, vertex> & elem) const {
return val_ == elem.second.id_;
}
private:
cv::Point2i val_;
};
/// Some helper functions for dijkstra's algorithm.
uint8_t get_cell_at(const cv::Mat & image, int x, int y)
{
assert(x < image.rows);
assert(y < image.cols);
return image.data[x * image.cols + y];
}
/// Some helper functions for dijkstra's algorithm.
/// x is compared against rows and y against cols, matching the indexing in get_cell_at().
bool checkIfNotOutOfBounds(cv::Point2i current, int rows, int cols)
{
return (current.x >= 0 && current.y >= 0 &&
current.x < rows && current.y < cols);
}
/// Brief: Finds the shortest possible path from starting position to the goal position
/// Param gridMap: The stage where the tracing of the shortest possible path will be performed.
/// Param start: The starting position in the gridMap. It is assumed that start cell is a free cell.
/// Param goal: The goal position in the gridMap. It is assumed that the goal cell is a free cell.
/// Param path: Returns the sequence of free cells leading to the goal starting from the starting cell.
bool findPathViaDijkstra(const cv::Mat& gridMap, cv::Point2i start, cv::Point2i goal, std::vector<cv::Point2i>& path)
{
// Clear the path just in case
path.clear();
// Create working and visited set.
std::multimap<double,vertex> working, visited;
// Initialize working set. We are going to perform Dijkstra's algorithm
// backwards in order to get the actual path without reversing the path.
working.insert(std::make_pair(0, vertex(goal, goal)));
// Conditions in continuing
// 1.) Working is empty implies all nodes are visited.
// 2.) If the start is still not found in the visited set.
// The Dijkstra's algorithm
while(!working.empty() && std::find_if(visited.begin(), visited.end(), CompareID(start)) == visited.end())
{
// Get the top of the STL.
// It is already given that the top of the multimap has the lowest cost.
std::pair<double, vertex> currentPair = *working.begin();
cv::Point2i current = currentPair.second.id_;
visited.insert(currentPair);
working.erase(working.begin());
// Check all arcs
// Only insert the cells into working under these 3 conditions:
// 1. The cell is not in visited cell
// 2. The cell is not out of bounds
// 3. The cell is free
for (int x = current.x-1; x <= current.x+1; x++)
for (int y = current.y-1; y <= current.y+1; y++)
{
if (checkIfNotOutOfBounds(cv::Point2i(x, y), gridMap.rows, gridMap.cols) &&
get_cell_at(gridMap, x, y) == FREE_CELL &&
std::find_if(visited.begin(), visited.end(), CompareID(cv::Point2i(x, y))) == visited.end())
{
vertex newVertex = vertex(cv::Point2i(x,y), current);
double cost = currentPair.first + sqrt(2);
// Cost is 1
if (x == current.x || y == current.y)
cost = currentPair.first + 1;
std::multimap<double, vertex>::iterator it =
std::find_if(working.begin(), working.end(), CompareID(cv::Point2i(x, y)));
if (it == working.end())
working.insert(std::make_pair(cost, newVertex));
else if(cost < (*it).first)
{
working.erase(it);
working.insert(std::make_pair(cost, newVertex));
}
}
}
}
// Now, recover the path.
// Path is valid!
if (std::find_if(visited.begin(), visited.end(), CompareID(start)) != visited.end())
{
std::pair <double, vertex> currentPair = *std::find_if(visited.begin(), visited.end(), CompareID(start));
path.push_back(currentPair.second.id_);
do
{
currentPair = *std::find_if(visited.begin(), visited.end(), CompareID(currentPair.second.from_));
path.push_back(currentPair.second.id_);
} while(currentPair.second.id_.x != goal.x || currentPair.second.id_.y != goal.y);
return true;
}
// Path is invalid!
else
return false;
}
int main()
{
// cv::Mat image = cv::imread("filteredmap1.jpg", CV_LOAD_IMAGE_GRAYSCALE);
cv::Mat image = cv::Mat(100,100,CV_8UC1);
std::vector<cv::Point2i> path;
for (int i = 0; i < image.rows; i++)
for(int j = 0; j < image.cols; j++)
{
image.data[i*image.cols+j] = FREE_CELL;
if (j == image.cols/2 && (i > 3 && i < image.rows - 3))
image.data[i*image.cols+j] = OCCUPIED_CELL;
// if (image.data[i*image.cols+j] > 215)
// image.data[i*image.cols+j] = FREE_CELL;
// else if(image.data[i*image.cols+j] < 100)
// image.data[i*image.cols+j] = OCCUPIED_CELL;
// else
// image.data[i*image.cols+j] = UNKNOWN_CELL;
}
// Goal: x = cols-1, y = 0
cv::Point2i goal(image.cols-1, 0);
// Start: x = 0, y = rows-1
cv::Point2i start(0, image.rows-1);
// Time the algorithm.
Timer timer;
timer.start();
findPathViaDijkstra(image, start, goal, path);
std::cerr << "Time elapsed: " << timer.getElapsedTimeInMilliSec() << " ms";
// Add the path in the image for visualization purpose.
cv::cvtColor(image, image, CV_GRAY2BGRA);
int cn = image.channels();
for (int i = 0; i < path.size(); i++)
{
image.data[path[i].x*cn*image.cols+path[i].y*cn+0] = 0;
image.data[path[i].x*cn*image.cols+path[i].y*cn+1] = 255;
image.data[path[i].x*cn*image.cols+path[i].y*cn+2] = 0;
}
cv::imshow("Map with path", image);
cv::waitKey();
return 0;
}
For the algorithm implementation, I decided to have two sets namely the visited and working set whose each elements contain:
The location of itself in the 2D grid map.
The accumulated cost
Through what cell did it get its accumulated cost (for path recovery)
And here is the result:
The black pixels represent obstacles, the white pixels represent free space and the green line represents the path computed.
In this implementation, I only search within the current working set for the minimum value and do NOT need to scan the whole cost matrix (where initially the cost of all cells is set to infinity and that of the starting point to 0). I think maintaining a separate working set promises better performance, because cells with a cost of infinity are never in the working set; only cells that have been touched are.
I also took advantage of the STL which C++ provides. I decided to use std::multimap since it can store duplicate keys (the costs) and keeps the entries sorted automatically. However, I was forced to use std::find_if() to find the id (the row, col of the current cell) in the visited set to check whether the current cell is in it, which has linear complexity. I really think this is the bottleneck of my implementation.
I am well aware that the A* algorithm is much faster than Dijkstra's algorithm, but what I want to ask is: is my implementation of Dijkstra's algorithm optimal? If I implemented A* on top of my current Dijkstra implementation, which I believe is suboptimal, then the A* implementation would be suboptimal as well.
What improvement can I perform? What STL is the most appropriate for this algorithm? Particularly, how do I improve the bottleneck?
You're using a std::multimap for 'working' and 'visited'. That's not great.
The first thing you should do is change visited into a per-vertex flag, so that the lookup you now do with find_if takes constant time instead of linear time, and so that operations on the set of visited vertices take constant instead of logarithmic time. You know what all the vertices are and you can map them to small integers trivially, so you can use either a std::vector or a std::bitset.
The second thing you should do is turn working into a priority queue, rather than a balanced binary tree structure, so that operations are a (largish) constant factor faster. std::priority_queue is a barebones binary heap. A higher-radix heap---say quaternary for concreteness---will probably be faster on modern computers due to its reduced depth. Andrew Goldberg suggests some bucket-based data structures; I can dig up references for you if you get to that stage. (They're not too complicated.)
Once you've taken care of these two things, you might look at A* or meet-in-the-middle tricks to speed things up even more.
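To make the two suggestions concrete, here is a minimal sketch without OpenCV. `Grid` and `dijkstraGrid` are hypothetical names, cells are identified by `row * cols + col`, and 255 is assumed to mean "free" as in the question. It uses a per-cell visited flag and a std::priority_queue; since the binary heap has no decrease-key, improved cells are simply pushed again and stale entries are skipped when popped.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for the cv::Mat grid: row-major bytes, 255 = free.
struct Grid {
    int rows, cols;
    std::vector<std::uint8_t> cells;
    bool isFree(int r, int c) const {
        return r >= 0 && c >= 0 && r < rows && c < cols &&
               cells[static_cast<std::size_t>(r) * cols + c] == 255;
    }
};

// Returns the cost of the shortest 8-connected path from startIdx to goalIdx
// (infinity if unreachable) and fills `parent` for path recovery.
double dijkstraGrid(const Grid& g, int startIdx, int goalIdx, std::vector<int>& parent)
{
    const double INF = std::numeric_limits<double>::infinity();
    const std::size_t n = static_cast<std::size_t>(g.rows) * g.cols;
    std::vector<double> dist(n, INF);
    std::vector<char> visited(n, 0);      // per-vertex flag: O(1) lookup
    parent.assign(n, -1);

    typedef std::pair<double, int> Item;  // (cost, cell index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item> > open;
    dist[startIdx] = 0.0;
    open.push(Item(0.0, startIdx));

    while (!open.empty()) {
        const int u = open.top().second;
        open.pop();
        if (visited[u]) continue;         // stale duplicate, skip it
        visited[u] = 1;
        if (u == goalIdx) break;          // goal settled, its cost is final
        const int r = u / g.cols, c = u % g.cols;
        for (int dr = -1; dr <= 1; ++dr)
            for (int dc = -1; dc <= 1; ++dc) {
                if ((dr == 0 && dc == 0) || !g.isFree(r + dr, c + dc)) continue;
                const int v = (r + dr) * g.cols + (c + dc);
                const double w = (dr != 0 && dc != 0) ? std::sqrt(2.0) : 1.0;
                if (!visited[v] && dist[u] + w < dist[v]) {
                    dist[v] = dist[u] + w;
                    parent[v] = u;
                    open.push(Item(dist[v], v));  // no decrease-key: push again
                }
            }
    }
    return dist[goalIdx];
}
```

The path is recovered by following `parent` from the goal back to the start. Running the search from the goal instead, as in the question, yields the path in forward order.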
Your performance is several orders of magnitude worse than it could be because you're using graph search algorithms for what looks like geometry. This geometry is much simpler and less general than the problems that graph search algorithms can solve. Also, with a vertex for every pixel your graph is huge even though it contains basically no information.
I heard you asking "how can I make this better without changing what I'm thinking" but nevertheless I'll tell you a completely different and better approach.
It looks like your robot can only go horizontally, vertically or diagonally. Is that for real or just a side effect of you choosing graph search algorithms? I'll assume the latter and let it go in any direction.
The algorithm goes like this:
(0) Represent your obstacles as polygons by listing the corners. Work in real numbers so you can make them as thin as you like.
(1) Try for a straight line between the end points.
(2) Check if that line goes through an obstacle or not. To do that for any line, show that all corners of any particular obstacle lie on the same side of the line. To do that, translate all points by (-X,-Y) of one end of the line so that that point is at the origin, then rotate until the other point is on the X axis. Now all corners should have the same sign of Y if there's no obstruction. There might be a quicker way just using gradients.
(3) If there's an obstruction, propose N two-segment paths going via the N corners of the obstacle.
(4) Recurse for all segments, culling any paths with segments that go out of bounds. That won't be a problem unless you have obstacles that go out of bounds.
(5) When it stops recursing, you should have a list of locally optimised paths from which you can choose the shortest.
(6) If you really want to restrict bearings to multiples of 45 degrees, then you can do this algorithm first and then replace each segment by any 45-only wiggly version that avoids obstacles. We know that such a version exists because you can stay extremely close to the original line by wiggling very often. We also know that all such wiggly paths have the same length.
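The side test in step (2) does not need the translate-and-rotate step; the sign of a 2-D cross product gives the same answer. A small sketch, with `V2`, `side` and `allOnOneSide` as hypothetical names:

```cpp
#include <cstddef>
#include <vector>

struct V2 { double x, y; };  // hypothetical 2-D point

// Twice the signed area of triangle (a, b, p): positive if p is to the left
// of the directed line a->b, negative if to the right, zero if on the line.
double side(const V2& a, const V2& b, const V2& p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// True if every corner lies strictly on one side of the infinite line a-b.
bool allOnOneSide(const V2& a, const V2& b, const std::vector<V2>& corners)
{
    bool pos = false, neg = false;
    for (std::size_t i = 0; i < corners.size(); ++i) {
        const double s = side(a, b, corners[i]);
        if (s > 0) pos = true;
        else if (s < 0) neg = true;
        else return false;  // a corner exactly on the line counts as touching
    }
    return !(pos && neg);
}
```

Note that this tests the infinite line through the two points, so it is conservative: a segment that stops short of an obstacle lying on its extension is still reported as blocked.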

How to Store Very Large Graphs Space Efficiently Yet have Fast Indexing?

I am working on a graph with 875713 nodes and 5105039 edges. Using vector<bitset<875713>> vec(875713) or array<bitset<875713>, 875713> throws a segfault at me. I need to calculate all-pair-shortest-paths with path recovery. What alternative data structures do I have?
I found this SO Thread but it doesn't answer my query.
EDIT
I tried this after reading the suggestions, seems to work. Thanks everyone for helping me out.
vector<vector<uint>> neighboursOf; // An edge between i and j exists if
// neighboursOf[i] contains j
neighboursOf.resize(nodeCount);
// Loop on getline itself so a failed read at end of file is not processed
while (getline(input, line))
{
uint fromNodeId = 0;
uint toNodeId = 0;
// Skip empty lines and comments in the input file
if (line.empty() || line[0] == '#')
continue;
// Each line is of the format "<fromNodeId> [TAB] <toNodeId>"
// (%u matches the unsigned ids; lines that do not parse are skipped)
if (sscanf(line.c_str(), "%u\t%u", &fromNodeId, &toNodeId) != 2)
continue;
// Store the edge
neighboursOf[fromNodeId].push_back(toNodeId);
}
Your graph is sparse, that is, |E| << |V|^2, so you should probably either use a sparse matrix to represent your adjacency matrix, or equivalently, store for each node a list of its neighbors (which results in a jagged array), like this -
vector<vector<int> > V (number_of_nodes);
// For each cell of V, which is a vector itself, push only the indices of adjacent nodes.
V[0].push_back(2); // Node number 2 is a neighbor of node number 0
...
V[number_of_nodes-1].push_back(...);
This way, your expected memory requirements are O(|E| + |V|) instead of O(|V|^2), which in your case should be around 50 MB instead of a gazillion MB.
This will also result in a faster Dijkstra (or any other shortest-path algorithm) since you only need to consider the neighbors of a node at each step.
You could store lists of edges per node in a single array. If the number of edges per node is variable you can terminate the lists with a null edge. This will avoid the space overhead for many small lists (or similar data structures). The result could look like this:
enum {
MAX_NODES = 875713,
MAX_EDGES = 5105039,
};
int nodes[MAX_NODES+1]; // contains index into array edges[].
// index zero is reserved as null node
// to terminate lists.
int edges[MAX_EDGES+MAX_NODES]; // contains null terminated lists of edges.
// each edge occupies a single entry in the
// array. each list ends with a null node.
// there are MAX_EDGES entries and MAX_NODES
// lists.
[...]
/* find edges for node */
int node, edge, edge_index;
for (edge_index=nodes[node]; edges[edge_index]; edge_index++) {
edge = edges[edge_index];
/* do something with edge... */
}
Minimizing the space overhead is very important since you have a huge number of small data structures. The overhead for each list of nodes is just one integer, which is much less than the overhead of e.g. an STL vector. Also, the lists are laid out contiguously in memory, which means that there is no wasted space between any two lists. With variable-sized vectors this will not be the case.
Reading all edges for any given node will be very fast because the edges for any node are stored contiguously in memory.
The downside of this data arrangement is that when you initialize the arrays and construct the edge lists, you need to have all the edges for a node at hand. This is not a problem if you get the edges sorted by node, but does not work well if the edges are in random order.
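If the edges do arrive in random order, a two-pass counting build still produces one flat array. The sketch below is a variation on the layout above, not the same thing: it stores a start offset per node instead of null terminators, and `Csr` and `buildCsr` are hypothetical names.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Csr {
    std::vector<std::size_t> offset;  // neighbours of v: target[offset[v] .. offset[v+1])
    std::vector<int> target;          // all neighbour lists, back to back
};

// Builds the layout from (from, to) edges given in arbitrary order.
Csr buildCsr(int nodeCount, const std::vector<std::pair<int, int> >& edges)
{
    Csr g;
    g.offset.assign(static_cast<std::size_t>(nodeCount) + 1, 0);
    // Pass 1: count the out-degree of each node (stored one slot to the right).
    for (std::size_t i = 0; i < edges.size(); ++i)
        ++g.offset[edges[i].first + 1];
    // A prefix sum turns the counts into start positions.
    for (int v = 0; v < nodeCount; ++v)
        g.offset[v + 1] += g.offset[v];
    // Pass 2: drop each edge into the next free slot of its source node.
    g.target.resize(edges.size());
    std::vector<std::size_t> cursor(g.offset.begin(), g.offset.end() - 1);
    for (std::size_t i = 0; i < edges.size(); ++i)
        g.target[cursor[edges[i].first]++] = edges[i].second;
    return g;
}
```

Iterating over the neighbours of `v` is then a loop from `offset[v]` to `offset[v + 1]`. The price is that all edges have to be held in memory (or read twice) while building.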
If we declare a Node as below:
struct Node {
int node_id;
vector<int> edges; // all the edges that start from this node
};
Then all the nodes can be expressed as below:
vector<Node> nodes;

improving performance for graph connectedness computation

I am writing a program to generate a graph and check whether it is connected or not. Below is the code. Here is some explanation: I generate a number of points on the plane at random locations. I then connect the nodes, NOT based on proximity only. By that I mean that a node is more likely to be connected to nodes that are closer, and this is determined by a random variable that I use in the code (h_sq) and the distance. Hence, I generate all links (symmetric, i.e., if i can talk to j then vice versa is also true) and then check with a BFS to see if the graph is connected.
The code seems to work properly. My problem is that when the number of nodes becomes greater than ~2000 it is terribly slow, and I need to run this function many times for simulation purposes. I even tried to use other graph libraries but the performance is the same.
Does anybody know how I could speed everything up?
Thanks,
int Graph::gen_links() {
if( save == true ) { // in case I want to store the structure of the graph
links.clear();
links.resize(xy.size());
}
double h_sq, d;
vector< vector<luint> > neighbors(xy.size());
// generate links
double tmp = snr_lin / gamma_0_lin;
// xy is a std vector of pairs containing the nodes' locations
for(luint i = 0; i < xy.size(); i++) {
for(luint j = i+1; j < xy.size(); j++) {
// generate |h|^2
d = distance(i, j);
if( d < d_crit ) // for sim purposes
d = 1.0;
h_sq = pow(mrand.randNorm(0, 1), 2.0) + pow(mrand.randNorm(0, 1), 2.0);
if( h_sq * tmp >= pow(d, alpha) ) {
// there exists a link between i and j
neighbors[i].push_back(j);
neighbors[j].push_back(i);
// options
if( save == true )
links.push_back( make_pair(i, j) );
}
}
if( neighbors[i].empty() && save == false ) {
// graph not connected. since save=false i dont need to store the structure,
// hence I exit
connected = 0;
return 1;
}
}
// here I do BFS to check whether the graph is connected or not, using neighbors
// BFS code...
return 1;
}
UPDATE:
the main problem seems to be the push_back calls within the inner for loops. It's the part that takes most of the time in this case. Shall I use reserve() to increase efficiency?
Are you sure the slowness is caused by the generation but not by your search algorithm?
The graph generation is O(n^2) and you can't do too much to it. However, you can apparently use memory in exchange of some of the time if the point locations are fixed for at least some of the experiments.
First, the distances of all node pairs, and pow(d, alpha), can be precomputed and kept in memory so that you don't need to compute them again and again. The extra memory cost for 10000 nodes will be about 800 MB for double and 400 MB for float.
In addition, the sum of squares of two independent standard normal variables follows a chi-square distribution with two degrees of freedom, if I remember correctly. You could probably use a precomputed lookup table if the accuracy allows.
Finally, if the probability that two nodes are connected becomes negligible once the distance exceeds some value, then you don't need O(n^2) work and can evaluate only those node pairs whose distance is below that limit.
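On the chi-square point: with two degrees of freedom that distribution is the exponential distribution with mean 2, so a single draw can replace the two normal draws and the two pow calls. A sketch under that assumption; `sampleHSq` and `hasLink` are hypothetical names, and std::mt19937 stands in for the question's mrand.

```cpp
#include <cmath>
#include <random>

// |h|^2 = N(0,1)^2 + N(0,1)^2 is chi-square with 2 degrees of freedom,
// i.e. exponential with mean 2 (rate 0.5), so one draw replaces two.
double sampleHSq(std::mt19937& rng)
{
    std::exponential_distribution<double> dist(0.5);
    return dist(rng);
}

// Link test on squared distances: d^alpha equals (d^2)^(alpha / 2), so the
// square root inside the distance computation can be skipped.
bool hasLink(double hSq, double tmp, double dSquared, double alpha)
{
    return hSq * tmp >= std::pow(dSquared, alpha / 2.0);
}
```

The d_crit clamp from the question carries over by comparing the squared distance against d_crit squared. Because the exponential tail is P(h_sq >= t) = exp(-t / 2), the link probability at distance d is exp(-pow(d, alpha) / (2 * tmp)), which also gives a principled cut-off distance for the last suggestion.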
As a first step you should try to use reserve for both the inner and outer vectors.
If this does not bring performance up to your expectations, I believe that is because memory allocations are still happening.
There is a handy class I've used in similar situations, llvm::SmallVector (find it in Google). It provides a vector with a few pre-allocated items, so you can avoid one heap allocation per vector.
It can still grow when it runs out of pre-allocated space.
So:
1) Examine the number of items you have in your vectors on average during runs (I'm talking about both inner and outer vectors)
2) Put in llvm::SmallVector with a pre-allocation of that size (as the vector's inline storage lives on the stack, you might need to increase the stack size, or reduce the pre-allocation if you are restricted on available stack memory).
Another good thing about SmallVector is that it has almost the same interface as std::vector (so it can easily be dropped in as a replacement).