For my application I need to create a fixed-size buffer (3 elements) of point clouds.
To do this I tried the naive way in my callback (I'm working on ROS):
vector< vector<Point2d> > points_buffer(3); // buffer of point clouds, fixed size = 3

void laserToWorldCallback(const icars_laser_roi::stx_points::ConstPtr& laser_points, const icars_2d_map_manager::Status::ConstPtr& car_pos){
    double x_w, y_w;
    double x, y;
    vector<Point2d> temp;
    for(int i = 0; i < laser_points->points_x.size(); i++){
        // get the coordinates
        x = laser_points->points_x[i];
        y = laser_points->points_y[i];
        // transform the coordinates
        x_w = car_pos->xGlobal + x*cos(car_pos->yaw) - y*sin(car_pos->yaw);
        y_w = car_pos->yGlobal + x*sin(car_pos->yaw) + y*cos(car_pos->yaw);
        temp.push_back(Point2d(x_w, y_w));
    }
    if(points_buffer.size() != 3){ // the buffer is not yet full
        points_buffer.push_back(temp);
    }else{ // the buffer is full: shift the elements and overwrite the last one
        points_buffer[0] = points_buffer[1];
        points_buffer[1] = points_buffer[2];
        points_buffer[3] = temp;
    }
}
But this way seems a bit rough to me and not efficient at all.
Could someone suggest a more elegant and efficient way to do what I want?
Thank you
Regards
To fix some efficiency problems: first, right after declaring temp, you can reserve the memory it will use with
temp.reserve(laser_points->points_x.size());
so there will be no reallocation of memory in the push_back calls.
If you are using C++11 or greater, then in the case where the buffer is not yet full, you can move the contents of temp with std::move:
points_buffer.push_back(std::move(temp));
This is an O(1) operation. The contents of temp afterwards are valid but unspecified.
Then, when deleting the oldest element, use vector::swap instead of copying, as it swaps the contents and is guaranteed to be constant time:
points_buffer[0].swap(points_buffer[1]);
points_buffer[1].swap(points_buffer[2]);
points_buffer[2].swap(temp); // note: the index in the question's code is a typo; it should be 2, not 3
The program would be more readable if you wrapped points_buffer in a class. You could then also consider not rotating the contents of the whole vector, but instead keeping track of the index of the first (oldest) element. This also works well for buffers larger than 3. Adding a new element to the buffer would then be just
points_buffer[first_element_].swap(temp);
first_element_ = (first_element_ + 1) % 3;
Then, to access the element at position i, you could implement operator[] as
vector<Point2d>& operator[](int i){
    return points_buffer[(i + first_element_) % 3];
}
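Putting these pieces together, a minimal sketch of such a wrapper class might look like this (PointCloudBuffer and the stand-in Point2d struct are hypothetical names; the real Point2d in the question likely comes from OpenCV or a similar library):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Point2d { double x, y; }; // stand-in for the question's Point2d

class PointCloudBuffer {
public:
    explicit PointCloudBuffer(std::size_t capacity)
        : clouds_(capacity), first_(0), size_(0) {}

    // O(1): move the new cloud into the next free (or oldest) slot.
    void push(std::vector<Point2d>&& cloud) {
        std::size_t slot = (first_ + size_) % clouds_.size();
        clouds_[slot] = std::move(cloud);
        if (size_ < clouds_.size())
            ++size_;
        else
            first_ = (first_ + 1) % clouds_.size(); // overwrote the oldest
    }

    // Element 0 is the oldest cloud still in the buffer.
    std::vector<Point2d>& operator[](std::size_t i) {
        return clouds_[(first_ + i) % clouds_.size()];
    }

    std::size_t size() const { return size_; }

private:
    std::vector<std::vector<Point2d>> clouds_;
    std::size_t first_; // index of the oldest element
    std::size_t size_;  // number of clouds currently stored
};
```

With this, the callback shrinks to building temp and calling push(std::move(temp)), and no per-callback copying or shifting of whole clouds takes place.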
I am a data scientist, currently working on some C++ code to extract triplet particles from a rather large text file containing 2D coordinate data of particles in ~10⁵ consecutive frames. I am struggling with a strange memory error that I don't seem to understand.
I have a vector of structs which can be divided into snippets defined by their frame. For each frame, I build an array with a unique ID for each individual coordinate pair; if a coordinate pair is repeated at any point, it is assigned the ID of the old pair. I use this later to decide whether a particle triplet is indeed a trimer.
I loop over all particles and search forward for any matching coordinate pair. If no match is found, I define the triplet as unique and push its coordinates into a vector indexed by particle ID.
The problem is: after the 18th iteration, at the line trimerIDs[i][0] = particleCounter;, the variable trimerCands (my big vector) suddenly becomes unreadable. Could it be that the vector object itself is being overwritten? I put this vector fully on the heap, but even if I put it on the stack, the error persists.
Do any of you have an idea of what I might be overlooking? Please note that I am rather new at C++, coming from other languages that are less close to the metal. While I think I understand how stack/heap allocations work, especially with respect to vectors/vector structs, I might be very wrong!
The error that Eclipse gives me in the variables tab is:
Failed to execute MI command:
-data-evaluate-expression trimerCands
Error message from debugger back end:
Cannot access memory at address 0x7fff0000000a
The function is as follows.
struct trimerCoords{
    float x1, y1, x2, y2, x3, y3;
    int frame;
    int tLength1, tLength2, tLength3;
};

void removeNonTrimers(std::vector<trimerCoords> trimerCands, int *trCandLUT){
    // trimerCands is a vector containing possible trimers; tLengthx is an attribute of the particle;
    // trCandLUT is a look-up table array with indices
    for (int currentFrame = 1; currentFrame <= framesTBA; currentFrame++){ // for each individual frame
        int nTrimers = trCandLUT[currentFrame] - trCandLUT[currentFrame-1]; // number of trimers for this specific frame
        int trimerIDs[nTrimers][3] = {0}; // preallocate an array for each individual particle in each triplet
        int firstTrim = trCandLUT[currentFrame-1]; // first index for this particular frame
        int lastTrim = trCandLUT[currentFrame] - 1; // last index for this particular frame
        bool found;
        std::vector<int> traceLengths;
        traceLengths.reserve(nTrimers*3);
        // Block of code to create a unique ID array for this particular frame
        std::vector<Particle> currentFound;
        Particle tempEntry;
        int particleCounter = 0;
        for (int i = firstTrim; i <= lastTrim; i++){
            // First triplet particle. In the real code this is repeated three times,
            // for x2/y2 and x3/y3, corresponding to the other two triplet particles.
            tempEntry.x = trimerCands[i].x1;
            tempEntry.y = trimerCands[i].y1;
            found = false;
            for (long unsigned int j = 0; j < currentFound.size(); j++){
                if (fabs(tempEntry.x - currentFound[j].x) + fabs(tempEntry.y - currentFound[j].y) < 0.001){
                    trimerIDs[i][0] = j; found = true; break;
                }
            }
            if (found == false) {
                currentFound.push_back(tempEntry);
                traceLengths.push_back(trimerCands[i].tLength1);
                trimerIDs[i][0] = particleCounter;
                particleCounter++;
            }
        }
        // end of unique-ID code block
        compareTrips(nTrimers, trimerIDs, traceLengths, trimerfile_out);
    }
}
If anything's unclear, let me know!
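One detail worth noting in the snippet above: trimerIDs has only nTrimers rows, but it is indexed with the absolute index i, which starts at firstTrim, so for every frame after the first the writes land past the end of the array. A minimal sketch of keeping the row index relative to the frame (buildFrameIDs is a hypothetical name, not from the original code):

```cpp
#include <array>
#include <vector>

// Hypothetical sketch: fill a per-frame ID table using *relative* row
// indices, so every write stays inside the nTrimers rows actually allocated.
std::vector<std::array<int, 3>> buildFrameIDs(int firstTrim, int lastTrim) {
    int nTrimers = lastTrim - firstTrim + 1;
    std::vector<std::array<int, 3>> trimerIDs(nTrimers, {0, 0, 0});
    for (int i = firstTrim; i <= lastTrim; ++i) {
        // i - firstTrim is always in [0, nTrimers), unlike the absolute i
        trimerIDs[i - firstTrim][0] = i;
    }
    return trimerIDs;
}
```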
I am holding some 2-D points with x-y coordinates in a list. I have a method which sorts the list according to the points' distances from the cursor and returns a pointer to the point that is closest to the cursor.
I am currently using &points.first(), but this always points to the first element of the list, so the pointer refers to a different point after I re-sort. How do I get a pointer to the specific ELEMENT, not to whatever happens to be first in the list?
I've tried:
&points.first()
QList<Point2> points;

Point2 *DrawingWidget::closestToCursor(){
    // Current mouse position
    Point2 pos(m_x, m_y);
    // There are no points
    if(points.isEmpty()){
        return NULL;
    }
    // Sort according to distance to the cursor
    std::sort(std::begin(points), std::end(points), [&pos](Point2 a, Point2 b) {
        return pos.distanceFrom(a) < pos.distanceFrom(b);
    });
    // We don't allow points closer than 50px apart
    if(pos.distanceFrom(points.first()) > 50){
        return NULL;
    }
    // Even after the re-sort, this always points to the first element of the list.
    // Currently the pointer is basically LIST+0x0; if the element shifts to some
    // other position, how do I still get its pointer?
    return &points.first();
}
Each time I call this method near a new point, the pointer just shifts to the first element of the list, which is what it's supposed to do, I know. But how do I get the behaviour I need?
You should probably do linear search to find that element because sorting is more expensive.
Linear search is O(N).
Sorting is O(N*log2(N)).
E.g.:
auto& found = *std::min_element(std::begin(points), std::end(points),
    [&pos](Point2 a, Point2 b) { return pos.distanceFrom(a) < pos.distanceFrom(b); });
return pos.distanceFrom(found) > 50 ? NULL : &found;
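A self-contained version of this linear-scan approach, using std::vector and a toy Point2 with just a distance method (the Qt types aren't reproduced here):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Point2 {
    double x, y;
    double distanceFrom(const Point2& o) const {
        return std::hypot(x - o.x, y - o.y);
    }
};

// Returns a pointer to the element closest to pos, or nullptr if the
// container is empty or the closest point is farther than maxDist away.
Point2* closestTo(std::vector<Point2>& points, const Point2& pos, double maxDist) {
    if (points.empty()) return nullptr;
    auto it = std::min_element(points.begin(), points.end(),
        [&pos](const Point2& a, const Point2& b) {
            return pos.distanceFrom(a) < pos.distanceFrom(b);
        });
    return pos.distanceFrom(*it) > maxDist ? nullptr : &*it;
}
```

Because nothing is sorted, the container is never rearranged, so the returned pointer keeps referring to the same element until the vector itself reallocates.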
Since your list ends up sorted, you can find the original first point in log2(n) steps using a binary search:
#include <algorithm>
Point2 *DrawingWidget::closestToCursor() {
    if (points.isEmpty())
        return NULL;
    Point2 pos(m_x, m_y);
    auto cmpfun = [&pos](Point2 a, Point2 b) {
        return pos.distanceFrom(a) < pos.distanceFrom(b);
    };
    auto firstPoint = points.first();
    std::sort(std::begin(points), std::end(points), cmpfun);
    if (pos.distanceFrom(points.first()) > 50)
        return NULL;
    // return a pointer to the original first point
    return &*std::lower_bound(std::begin(points), std::end(points),
                              firstPoint, cmpfun);
}
There are other approaches, such as a decorate-sort-undecorate to sort the pointers and truly retain the original point, but those would likely end up being significantly more expensive to execute.
What would be the fastest and most efficient way of using std::remove_if with a lambda predicate to delete multiple elements at the same time? At the moment I have a Point struct with a position and a unique id. Inside an update loop we fill the points vector, and at the end of the loop we collect the points to be deleted. Currently I have to call remove_if inside a loop to remove all the deleted points from the points vector. For example: we add 10 points per frame, then loop over all points to check whether each one is outside the screen bounds; if it is, it is added to deletedPoints_.
struct Point
{
    /// Position.
    Vector3 position_;
    /// Unique id per point
    int id_;
};

/// Current max id
int maxId_;
/// All points
std::vector<Point> points_;
/// Deleted points
std::vector<Point> deletedPoints_;

/// Updates at 60fps
void App::Update()
{
    /// Add 10 points per frame
    for (int i = 0; i < 10; ++i)
    {
        Point newPoint;
        /// Set position
        newPoint.position_ = worldPosition;
        /// Set id, starting from 1
        maxId_ += 1;
        newPoint.id_ = maxId_;
        /// Add new point to points
        points_.push_back(newPoint);
    }
    /// If points are outside of screen bounds, add them to deletedPoints_
    if (points_.size() > 0)
    {
        for (int i = 0; i < points_.size(); ++i)
        {
            /// Bounds
            Vector2 min = Vector2(0.00, 0.00);
            Vector2 max = Vector2(1.00, 1.00);
            /// Check bounds
            if (points_[i].position_.x < min.x || points_[i].position_.y < min.y ||
                points_[i].position_.x > max.x || points_[i].position_.y > max.y)
            {
                deletedPoints_.push_back(points_[i]);
            }
        }
        /// Loop over deleted points
        for (int i = 0; i < deletedPoints_.size(); ++i)
        {
            int id = deletedPoints_[i].id_;
            /// Remove by id
            auto removeIt = std::remove_if(points_.begin(), points_.end(),
                [id](const Point& point)
                { return point.id_ == id; });
            points_.erase(removeIt, points_.end());
        }
    }
}
Without changing your structures, the quickest fix is to invert the whole loop and check deletedPoints from inside the lambda instead.
Then, make deletedPoints a std::set<int> storing your unique IDs. Then it'll be relatively fast, because std::set<int>::find doesn't need to scan the entire container, though your final complexity will still not be quite linear-time.
std::vector<Point> points_;
std::set<int> deletedPointIds_;

/// Remove by id, in a single pass
auto removeIt = std::remove_if(points_.begin(), points_.end(),
    [&](const Point& point)
    { return deletedPointIds_.count(point.id_); });
points_.erase(removeIt, points_.end());
deletedPointIds_.clear();
That being said, whether the switch over to std::set will be actually faster depends on a few things; you lose memory locality and drop cache opportunities due to the way in which set's elements are stored.
An alternative might be to keep the vector (of IDs not points!), pre-sort it, then use std::binary_search to get the benefits of a quick search as well as the benefits of sequentially-stored data. However, performing this search may not be suitable for your application, depending on how much data you have and on how often you need to execute this algorithm.
You could also use a std::unordered_set<int> instead of a std::set; this has the same problems as a std::set but the hash-based lookup may be faster than a tree-based lookup. Again, this entirely depends on the size, form and distribution of your data.
Ultimately, the only way to know for sure is to try a few approaches at realistic data sizes and measure.
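As a self-contained sketch of the suggested single-pass removal (with a simplified Point struct and std::unordered_set rather than the question's exact types):

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

struct Point { double x, y; int id; }; // simplified stand-in for the question's Point

// Single pass over points: erase every point whose id is in deletedIds.
void erasePoints(std::vector<Point>& points,
                 const std::unordered_set<int>& deletedIds) {
    points.erase(std::remove_if(points.begin(), points.end(),
                     [&](const Point& p) { return deletedIds.count(p.id) != 0; }),
                 points.end());
}
```

This replaces the O(deleted × points) loop of remove_if calls with one O(points) sweep plus an average O(1) hash lookup per element.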
I have a 2D matrix and I want to copy its values to a 1D array vertically, in an efficient way, as follows.
Matrice(3x3)
[1 2 3;
4 5 6;
7 8 9]
myarray:
{1,4,7,2,5,8,3,6,9}
Brute force takes 0.25 s for a 1000x750x3 image. I don't want to use vector, because I pass myarray as input to another function (which I didn't write). So, is there a C++ or OpenCV function that I can use? Note that I'm using the OpenCV library.
Copying the matrix to an array is also fine: I can first take the transpose of the Mat, then copy it to the array.
cv::Mat transposed = myMat.t();
uchar* X = transposed.reshape(1,1).ptr<uchar>(0);
or
int* X = transposed.reshape(1,1).ptr<int>(0);
depending on your matrix type. It might copy data though.
You can optimize to make it more cache friendly, i.e. you can copy blockwise, keeping track of the positions in myarray where the data should go. The point is that your brute-force approach most likely makes each access to the matrix miss the cache, which has a tremendous performance impact. Hence it is better to copy vertically/horizontally taking the cache-line size into account.
See the idea below (I didn't test it, so it most likely has bugs, but it should make the idea clear).
struct pixel
{
    char r;
    char g;
    char b;
};

size_t cachelinesize = 128 / sizeof(pixel); // assumed cache-line size of 128 bytes

array<array<pixel, 1000>, 750> matrice;
vector<pixel> vec(1000 * 750);

for (size_t row = 0; row < matrice.size(); ++row)
{
    for (size_t col = 0; col < matrice[0].size(); col += cachelinesize)
    {
        for (size_t i = 0; i < cachelinesize && col + i < matrice[0].size(); ++i)
        {
            // column-major target index: column first, then row
            vec[(col + i) * matrice.size() + row] = matrice[row][col + i];
        }
    }
}
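A small standalone version of the column-major copy (plain nested vectors, no cache blocking) can be checked directly against the 3x3 example from the question:

```cpp
#include <cstddef>
#include <vector>

// Copy a rows x cols matrix (stored row-major) into a flat array column by
// column, i.e. out = {m[0][0], m[1][0], m[2][0], m[0][1], ...}.
std::vector<int> copyColumnwise(const std::vector<std::vector<int>>& m) {
    const std::size_t rows = m.size(), cols = m[0].size();
    std::vector<int> out(rows * cols);
    for (std::size_t row = 0; row < rows; ++row)
        for (std::size_t col = 0; col < cols; ++col)
            out[col * rows + row] = m[row][col];
    return out;
}
```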
If you are using the matrix before the vertical assignment/querying, you can cache the necessary columns as you touch the elements of those columns.
//Multiplies and caches
doCalcButCacheVerticalsByTheWay(myMatrix,calcType,myMatrix2,cachedColumns);
instead of
doCalc(myMatrix,calcType,myMatrix2); //Multiplies
then use it like this:
...
tmpVariable=cachedColumns[i];
...
For example, the upper function multiplies the matrix by another one; when the necessary columns are reached, they are cached into a temporary array, so you can access their elements later in contiguous order.
I think Mat::reshape is what you want. It does not copy data.
I am trying to build KD Tree (static case). We assume points are sorted on both x and y coordinates.
For even depth of recursion the set is split into two subsets with a vertical line going through median x coordinate.
For odd depth of recursion the set is split into two subsets with a horizontal line going through median y coordinate.
The median can be determined from the set sorted by the x / y coordinate. I do this sorting step before each split of the set, and I think it is what makes construction of the tree slow.
Could you please help me check and optimize the code?
I also cannot find the k-th nearest neighbor; could somebody help me with that code?
Thank you very much for your help and patience...
Please see the sample code:
class KDNode
{
    private:
        Point2D *data;
        KDNode *left;
        KDNode *right;
        ....
};

void KDTree::createKDTree(Points2DList *pl)
{
    //Create list
    KDList kd_list;
    //Create KD list (all input points)
    for (unsigned int i = 0; i < pl->size(); i++)
    {
        kd_list.push_back((*pl)[i]);
    }
    //Sort points by y (the first split, at depth 1, is by y)
    std::sort(kd_list.begin(), kd_list.end(), sortPoints2DByY());
    //Build KD Tree
    root = buildKDTree(&kd_list, 1);
}
KDNode * KDTree::buildKDTree(KDList *kd_list, const unsigned int depth)
{
    //Build KD tree
    const unsigned int n = kd_list->size();
    //No leaf will be built
    if (n == 0)
    {
        return NULL;
    }
    //Only one point: create leaf of KD Tree
    else if (n == 1)
    {
        //Create one leaf
        return new KDNode(new Point2D((*kd_list)[0]));
    }
    //At least 2 points: create one node, split tree into left and right subtrees
    else
    {
        //New KD node
        KDNode *node = NULL;
        //Get median index
        const unsigned int median_index = n/2;
        //Create new KD Lists
        KDList kd_list1, kd_list2;
        //The depth is even, process by x coordinate
        if (depth%2 == 0)
        {
            //Create new median node
            node = new KDNode(new Point2D((*kd_list)[median_index]));
            //Split list
            for (unsigned int i = 0; i < n; i++)
            {
                //Get actual point
                Point2D *p = &(*kd_list)[i];
                //Add point to the first list: x < median.x
                if (p->getX() < (*kd_list)[median_index].getX())
                {
                    kd_list1.push_back(*p);
                }
                //Add point to the second list: x > median.x
                else if (p->getX() > (*kd_list)[median_index].getX())
                {
                    kd_list2.push_back(*p);
                }
            }
            //Sort points by y for the next recursion step: slow construction of the tree???
            std::sort(kd_list1.begin(), kd_list1.end(), sortPoints2DByY());
            std::sort(kd_list2.begin(), kd_list2.end(), sortPoints2DByY());
        }
        //The depth is odd, process by y coordinate
        else
        {
            //Create new median node
            node = new KDNode(new Point2D((*kd_list)[median_index]));
            //Split list
            for (unsigned int i = 0; i < n; i++)
            {
                //Get actual point
                Point2D *p = &(*kd_list)[i];
                //Add point to the first list: y < median.y
                if (p->getY() < (*kd_list)[median_index].getY())
                {
                    kd_list1.push_back(*p);
                }
                //Add point to the second list: y > median.y
                else if (p->getY() > (*kd_list)[median_index].getY())
                {
                    kd_list2.push_back(*p);
                }
            }
            //Sort points by x for the next recursion step: slow construction of the tree???
            std::sort(kd_list1.begin(), kd_list1.end(), sortPoints2DByX());
            std::sort(kd_list2.begin(), kd_list2.end(), sortPoints2DByX());
        }
        //Build left subtree
        node->setLeft(buildKDTree(&kd_list1, depth + 1));
        //Build right subtree
        node->setRight(buildKDTree(&kd_list2, depth + 1));
        //Return new node
        return node;
    }
}
The sorting to find the median is probably the worst culprit here, since it is O(n log n) while the problem is solvable in O(n) time. You should use nth_element instead: http://www.cplusplus.com/reference/algorithm/nth_element/. That finds the median in linear time on average, after which you can split the vector in linear time.
Memory management in vector is also something that can take a lot of time, especially with large vectors, since every time the vector's size is doubled all the elements have to be moved. You can use the reserve method of vector to reserve exactly enough space for the vectors in the newly created nodes, so they need not increase dynamically as new stuff is added with push_back.
And if you absolutely need the best performance, you should use lower level code, doing away with vector and reserving plain arrays instead. Nth element or 'selection' algorithms are readily available and not too hard to write yourself: http://en.wikipedia.org/wiki/Selection_algorithm
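A minimal sketch of the nth_element-based split (1-D keys for brevity; in the real tree the comparator would compare x or y depending on depth):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Partition around the median in O(n) average time: afterwards pts[mid] is
// the median, everything before it is <= it and everything after is >= it.
std::size_t medianSplit(std::vector<double>& pts) {
    std::size_t mid = pts.size() / 2;
    std::nth_element(pts.begin(), pts.begin() + mid, pts.end());
    return mid;
}
```

The two halves on either side of the returned index can then go straight into the left and right recursive calls, with no per-level sorting at all.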
Some hints on optimizing the kd-tree:
Use a linear time median finding algorithm, such as QuickSelect.
Avoid actually using "node" objects. You can store the whole tree using the points only, with ZERO additional information, essentially by just sorting an array of objects: the root node is then in the middle. A rearrangement that puts the root first and then uses a heap layout will likely be nicer to the CPU memory cache at query time, but is trickier to build.
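The node-free layout above can be sketched roughly as follows (buildImplicit and Pt are hypothetical names; a real version would also fix a convention for the split axis so queries can reproduce it):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

// Implicit kd-tree: rearrange pts in place so that for every subrange
// [lo, hi) the middle element is the subtree root and splits the range on
// the current axis. No node objects are allocated.
void buildImplicit(std::vector<Pt>& pts, std::size_t lo, std::size_t hi, int axis) {
    if (hi - lo <= 1) return;
    std::size_t mid = lo + (hi - lo) / 2;
    auto cmp = [axis](const Pt& a, const Pt& b) {
        return axis == 0 ? a.x < b.x : a.y < b.y;
    };
    // Linear-time median selection instead of a full sort
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi, cmp);
    buildImplicit(pts, lo, mid, 1 - axis);      // left subtree
    buildImplicit(pts, mid + 1, hi, 1 - axis);  // right subtree
}
```

Queries then descend by halving the index range, exactly as a binary search does, so the whole tree lives in one contiguous allocation.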
Not really an answer to your questions, but I would highly recommend the forum at http://ompf.org/forum/
They have some great discussions over there for fast kd-tree constructions in various contexts. Perhaps you'll find some inspiration over there.
Edit:
The OMPF forums have since gone down, although a direct replacement is currently available at http://ompf2.com/
Your first culprit is sorting to find the median. This is almost always the bottleneck for K-d tree construction, and using more efficient algorithms here will really pay off.
However, you're also constructing a pair of variable-sized vectors each time you split and transferring elements to them.
Here I recommend the good ol' singly-linked list. The beauty of the linked list is that you can transfer elements from parent to child by simply changing next pointers to point at the child's root pointer instead of the parent's.
That means no heap overhead whatsoever during construction to transfer elements from parent nodes to child nodes, only to aggregate the initial list of elements to insert to the root. That should do wonders as well, but if you want even faster, you can use a fixed allocator to efficiently allocate nodes for the linked list (as well as for the tree) and with better contiguity/cache hits.
Last but not least, if you're involved in intensive computing tasks that call for K-d trees, you need a profiler. Measure your code and you'll see exactly what lies at the culprit, and with exact time distributions.