I'm using a particle physics library written in C++ for a game.
In order to draw the particles I must get an array of all their positions, like so:
b2Vec2* particlePositionBuffer = world->GetParticlePositionBuffer();
This returns an array of b2Vec2 objects (which represent 2 dimensional vectors in the physics engine).
Also I can get and set their colour using
b2ParticleColor* particleColourBuffer = world->GetParticleColorBuffer();
I would like to get the 10% of the particles with the highest Y values (and then change their colour).
My idea is:
1. Make an array of structs the same size as the particlePositionBuffer array. The struct just contains an int (the particle's index in the particlePositionBuffer array) and a float (the particle's y position).
2. Sort the array by the y position.
3. Use the int in each struct from the top 10% of the sorted array to change the corresponding entries in the particleColourBuffer array.
Could someone show me how to sort an array of structs like that in C++?
Also, do you think this is a decent way of going about this? I only need to do it once (not every frame).
The following may help:
#include <algorithm>
#include <cstddef>
#include <vector>

// Functor to compare indices according to Y value.
struct comp
{
    explicit comp(const b2Vec2* particlePositionBuffer) :
        particlePositionBuffer(particlePositionBuffer)
    {}
    bool operator()(std::size_t lhs, std::size_t rhs) const
    {
        // b2Vec2 exposes its coordinates as the public members x and y.
        // Note that I compare rhs < lhs to have higher values first.
        return particlePositionBuffer[rhs].y < particlePositionBuffer[lhs].y;
    }
    const b2Vec2* particlePositionBuffer;
};
void foo()
{
    const std::size_t size = world->GetParticleCount(); // How do you get Count ?
    const std::size_t subsize = size / 10; // zero if there are fewer than 10 particles
    std::vector<std::size_t> indices(size);
    for (std::size_t i = 0; i != size; ++i) {
        indices[i] = i;
    }
    // Afterwards the first subsize indices refer to the particles with the
    // highest Y values (in unspecified order among themselves).
    std::nth_element(indices.begin(), indices.begin() + subsize, indices.end(),
                     comp(world->GetParticlePositionBuffer()));
    b2ParticleColor* particleColourBuffer = world->GetParticleColorBuffer();
    for (std::size_t i = 0; i != subsize; ++i) {
        // Look the particle up through the stored index, not through i itself.
        changeColor(particleColourBuffer[indices[i]]);
    }
}
If your particle count is low, it won't matter much either way, and sorting them all first with a simple STL sort routine such as std::sort would be fine.
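For the simple route, a minimal sketch of sorting an array of structs by y could look like the following. IndexedY and topIndicesByY are hypothetical names, and the std::vector<float> of y values is an assumption standing in for the y components read from the position buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical pairing of a particle's index with its y position.
struct IndexedY {
    int index;
    float y;
};

// Returns the indices of the `keep` entries with the highest y values,
// highest first. `ys` stands in for the y components of the particles.
std::vector<int> topIndicesByY(const std::vector<float>& ys, std::size_t keep) {
    std::vector<IndexedY> items;
    items.reserve(ys.size());
    for (std::size_t i = 0; i < ys.size(); ++i) {
        items.push_back({static_cast<int>(i), ys[i]});
    }
    // Sort the structs in descending order of y.
    std::sort(items.begin(), items.end(),
              [](const IndexedY& a, const IndexedY& b) { return a.y > b.y; });
    keep = std::min(keep, items.size());
    std::vector<int> result;
    result.reserve(keep);
    for (std::size_t i = 0; i < keep; ++i) {
        result.push_back(items[i].index);
    }
    return result;
}
```

Calling topIndicesByY(ys, ys.size() / 10) gives the indices to use when changing entries of the colour buffer.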
If the number were large though, I'd create a binary search tree whose maximum size was 10% of the number of your particles. Then I'd maintain the minY actually stored in the tree for quick rejection purposes. Then this algorithm should do it:
Walk through your original array and add items to the tree until it is full (10%)
Update minY
For each remaining item in the original array:
    If item.y is less than minY, go to the next item (quick rejection)
    Otherwise:
        Remove the currently smallest Y value from the tree
        Add the larger Y item to the tree
        Update minY
A binary search tree has a nice advantage of quick insert, quick search, and maintained ordering. If you want to be FAST, this is better than a complete sort on the entire array.
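The bounded-tree idea can be sketched with std::multiset, which is an ordered container typically implemented as a balanced binary search tree. This is only a sketch under assumptions: topIndicesBounded is a hypothetical name, and the y values are passed as a plain std::vector<float> rather than read from the physics engine.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Keeps only the `capacity` largest (y, index) pairs seen so far.
// Returns the kept indices, highest y first.
std::vector<int> topIndicesBounded(const std::vector<float>& ys,
                                   std::size_t capacity) {
    std::vector<int> result;
    if (capacity == 0) {
        return result;
    }
    // Ordered by y, so *best.begin() always holds the current minY.
    std::multiset<std::pair<float, int>> best;
    for (std::size_t i = 0; i < ys.size(); ++i) {
        const std::pair<float, int> item(ys[i], static_cast<int>(i));
        if (best.size() < capacity) {
            best.insert(item);                 // tree not full yet
        } else if (best.begin()->first < ys[i]) {
            best.erase(best.begin());          // drop the smallest kept y
            best.insert(item);                 // keep the larger one
        }                                      // else: quick rejection
    }
    for (auto it = best.rbegin(); it != best.rend(); ++it) {
        result.push_back(it->second);
    }
    return result;
}
```

Because the minimum is always the first element of the set, no separate minY variable is needed in this version.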
I have a collection of 512D std::vector to store face embeddings. I create my index and perform training on a subset of the data.
int d = 512;
size_t nb = this->templates.size(); // 95000
size_t nt = 50000; // training data size
std::vector<float> training_set(nt * d);
faiss::IndexFlatIP coarse_quantizer(d);
int ncentroids = int(4 * sqrt(nb));
faiss::IndexIVFPQ index(&coarse_quantizer, d, ncentroids, 4, 8);
The this->templates has an index value in [0] and the 512D vectors in [1]. My question is about the training and indexing. I have this currently:
int v = 0;
for (auto const& element : this->templates)
{
    std::vector<double> enrollment_template = element.second;
    for (int i = 0; i < d; i++) {
        training_set[(v * d) + i] = (float)enrollment_template.at(i);
    }
    v++;
}
index.train(nt, training_set.data());
FAISS Index.Train function
virtual void train(idx_t n, const float *x)
Perform training on a representative set of vectors
Parameters:
n – nb of training vectors
x – training vectors, size n * d
Is that the proper way of adding the 512D vector data into Faiss for training? It seems to me that if I have 2 face embeddings that are 512D in size, the training_set would be like this:
training_set[0-511] - Face #1's 512D vectors
training_set[512-1023] - Face #2's 512D vectors
and since Faiss knows we are working with 512D vectors, it will intelligently parse them out of the array.
Here's a more efficient way to write it:
int v = 0;
for (auto const& element : this->templates)
{
auto& enrollment_template = element.second; // not copy
if (v + d > training_set.size()) {
break; // prevent overflow, "nt" is smaller than templates.size()
}
for (int i = 0; i < d; i++) {
training_set[v] = enrollment_template[i]; // not at()
v++;
}
}
We avoid a copy with auto& enrollment_template, avoid extra branching with enrollment_template[i] (you know you won't be out of bounds), and simplify the address computation with training_set[v] by making v a count of elements rather than rows.
Further efficiency could be gained if templates can be changed to store floats rather than doubles--then you'd just be bitwise-copying 512 floats rather than converting doubles to floats.
Also, be sure to declare d as constexpr to give the compiler the best chance of optimizing the loop.
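To make the row-major layout concrete, here is a minimal self-contained sketch of the flattening step. It assumes the templates container behaves like a std::map<int, std::vector<double>> (the question only shows element.second), and flattenTemplates is a hypothetical helper name.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Copies up to `nt` vectors of dimension `d` into one row-major float buffer:
// vector k occupies out[k * d] .. out[k * d + d - 1].
// Returns the number of vectors actually copied.
std::size_t flattenTemplates(const std::map<int, std::vector<double>>& templates,
                             std::size_t nt, std::size_t d,
                             std::vector<float>& out) {
    out.assign(nt * d, 0.0f);
    std::size_t row = 0;
    for (const auto& element : templates) {
        if (row == nt) {
            break;                      // buffer is full
        }
        const std::vector<double>& v = element.second;  // no copy
        if (v.size() != d) {
            continue;                   // skip vectors of the wrong dimension
        }
        for (std::size_t i = 0; i < d; ++i) {
            out[row * d + i] = static_cast<float>(v[i]);
        }
        ++row;
    }
    return row;
}
```

The returned count is the value to pass as n to train(n, x), with out.data() as x, so that the index is never told about more vectors than the buffer actually holds.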
I'm writing a C++ program that performs calculations on a huge graph and therefore has to be as fast as possible. I have a 100MB text file of unweighted edges and am reading them into a 2D vector of integers (the first index is the nodeID, and each entry is a sorted list of the nodeIDs of nodes which have edges to that node). During the program, the edges are looked up exactly in the order in which they're stored in the list, so my expectation was that, apart from a few bigger gaps, the data would always be nicely preloaded into the cache. However, according to my profiler, iterating through the edges of a player is an issue. I therefore suspect that the 2D vector isn't placed in memory compactly.
How can I ensure that my 2D vector is as compact as possible and the subvectors in the order in which they should be?
(I thought for example about making a "2D array" from the 2D vector, first an array of pointers, then the lists.)
BTW: In case it wasn't clear: The nodes can have different numbers of edges, so a normal 2D array is no option. There are a couple ones with lots of edges, but most have very few.
EDIT:
I've solved the problem and my program is now more than twice as fast:
There was a first solution and then a slight improvement:
I put the lists of neighbour ids into a 1D integer array and had another array to know where a certain id's neighbour lists start
I got a noticeable speedup by replacing the pointer array (a pointer needs 64 bit) with a 32 bit integer array containing indices instead
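A minimal sketch of that layout, assuming the total number of edges fits into 32 bits; FlatGraph and flatten are hypothetical names, not the code actually used:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flat adjacency structure: all neighbour lists stored back to
// back, plus one 32-bit start offset per node (and a final end offset).
struct FlatGraph {
    std::vector<std::uint32_t> offsets;     // size = node count + 1
    std::vector<std::uint32_t> neighbours;  // concatenated neighbour lists

    std::size_t nodeCount() const {
        return offsets.empty() ? 0 : offsets.size() - 1;
    }
    // Pointers delimiting the neighbour list of `node`.
    const std::uint32_t* rowBegin(std::size_t node) const {
        return neighbours.data() + offsets[node];
    }
    const std::uint32_t* rowEnd(std::size_t node) const {
        return neighbours.data() + offsets[node + 1];
    }
};

// Builds the flat structure from a vector of per-node neighbour lists.
FlatGraph flatten(const std::vector<std::vector<std::uint32_t>>& lists) {
    FlatGraph g;
    g.offsets.reserve(lists.size() + 1);
    g.offsets.push_back(0);
    for (const auto& list : lists) {
        g.neighbours.insert(g.neighbours.end(), list.begin(), list.end());
        g.offsets.push_back(static_cast<std::uint32_t>(g.neighbours.size()));
    }
    return g;
}
```

Iterating from rowBegin(n) to rowEnd(n) for consecutive n walks the neighbours array strictly front to back, which is the access pattern described in the question.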
What data structure are you using for the 2D vector? If you use std::vector, the elements of each vector are contiguous, but in a std::vector<std::vector<int>> every inner vector has its own heap allocation, so the rows are not necessarily adjacent to each other in memory.
Next, if pointers are stored then only the addresses will take advantage of the vector's spatial locality. Are you accessing the objects pointed to when iterating over the edges? If so, this could be a bottleneck. To get around this, perhaps you can set up your objects so they are also in contiguous memory and take advantage of spatial locality.
Finally, the way in which you access the members of a vector affects caching. Make sure you are accessing them in an order advantageous to the container used (e.g. change the column index first when iterating).
Here are some helpful links:
Cache Blocking Techniques
SO on cache friendly code
I have written a few structures of this type by having a 2D view onto a 1D vector, and there are lots of different ways to do it. I have never made one that allows the internal arrays to vary in length before, so this may contain bugs, but it should illustrate the general approach:
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <iostream>
#include <iterator>
#include <vector>
template<typename T>
class array_of_arrays
{
public:
    array_of_arrays() {}
    // Append a new row holding a copy of [beg, end).
    template<typename Iter>
    void push_back(Iter beg, Iter end)
    {
        m_idx.push_back(m_vec.size());
        m_vec.insert(std::end(m_vec), beg, end);
    }
    // data() + offset stays valid even when the last row is empty.
    T* operator[](std::size_t row) { assert(row < rows()); return m_vec.data() + m_idx[row]; }
    T const* operator[](std::size_t row) const { assert(row < rows()); return m_vec.data() + m_idx[row]; }
    std::size_t rows() const { return m_idx.size(); }
    std::size_t cols(std::size_t row) const
    {
        assert(row < rows());
        auto b = m_idx[row];
        auto e = row + 1 >= m_idx.size() ? m_vec.size() : m_idx[row + 1];
        return std::size_t(e - b);
    }
private:
    std::vector<T> m_vec;            // all rows stored back to back
    std::vector<std::size_t> m_idx;  // start offset of each row in m_vec
};
int main()
{
    array_of_arrays<int> aoa;
    auto data = {2, 4, 3, 5, 7, 2, 8, 1, 3, 6, 1};
    aoa.push_back(std::begin(data), std::begin(data) + 3);
    aoa.push_back(std::begin(data) + 3, std::begin(data) + 8);
    for(std::size_t row = 0; row < aoa.rows(); ++row)
    {
        for(std::size_t col = 0; col < aoa.cols(row); ++col)
        {
            std::cout << aoa[row][col] << ' ';
        }
        std::cout << '\n';
    }
}
Output:
2 4 3
5 7 2 8 1
I am creating a sparse matrix in CSR format, for which I start with a vector of matrix element structures. It needs to be std::vector at the beginning because I don't know ahead of time how many non-zeros my matrix is going to have. Then, to fill up the appropriate arrays for the CSR matrix, I need to first sort this array of non-zeros, in the order they appear in the matrix if one goes through it line-by-line. But above a certain matrix size (roughly 1 500 000 non-zeros), the sorted vector does not start from the beginning of the matrix. It is still sorted, but starts around row 44000.
// Matrix element struct:
struct mel
{
int Ncols;
int row,col;
MKL_Complex16 val;
void print();
};
// Custom function for sorting:
struct less_than_MElem
{
inline bool operator() (const mel& ME1, const mel& ME2)
{
return ( ( ME1.row*ME1.Ncols+ME1.col ) < ( ME2.row*ME2.Ncols+ME2.col ) );
}
};
int main()
{
std::vector<mel> mevec;
/* long piece of code that fills up mevec */
std::sort( mevec.begin(), mevec.end(), less_than_MElem() );
return 0;
}
I thought maybe as the vector was grown dynamically it wound up in separate blocks in the memory and the iterator wasn't pointing at the genuine beginning/end anymore. So I have tried creating a new vector and started with resizing it to the size that is known by that time. Then copied the elements one-by-one into this new vector and sorted it, but the result was the same.
Nelements = mevec.size();
std::vector<mel> nzeros;
nzeros.resize(Nelements);
for( int i = 0; i < Nelements; i++ )
{
nzeros[i].Ncols = mevec[i].Ncols;
nzeros[i].row = mevec[i].row;
nzeros[i].col = mevec[i].col;
nzeros[i].val = mevec[i].val;
}
std::sort( nzeros.begin(), nzeros.end(), less_than_MElem() );
Can anyone think of a solution?
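The question does not say how large Ncols is, so the following is only a guess. One possible cause is that row*Ncols+col no longer fits into an int for the later rows: signed overflow is undefined behaviour and in practice often wraps to negative keys, which would place those rows before row 0. A sketch that avoids computing the product at all compares (row, col) lexicographically. MatrixElement, RowMajorLess and sortRowMajor are hypothetical names, and std::complex<double> stands in for MKL_Complex16 to keep the example self-contained.

```cpp
#include <algorithm>
#include <complex>
#include <tuple>
#include <vector>

// Stand-in for the question's `mel` struct.
struct MatrixElement {
    int Ncols;
    int row, col;
    std::complex<double> val;  // assumption: replaces MKL_Complex16
};

// Orders elements row by row, then by column, without multiplying indices,
// so no intermediate value can overflow.
struct RowMajorLess {
    bool operator()(const MatrixElement& a, const MatrixElement& b) const {
        return std::tie(a.row, a.col) < std::tie(b.row, b.col);
    }
};

void sortRowMajor(std::vector<MatrixElement>& elements) {
    std::sort(elements.begin(), elements.end(), RowMajorLess());
}
```

An alternative with the same ordering would be to keep the linear key but compute it in a 64-bit type such as long long.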
Suppose I have a Mat of indices (locations) called B with dimensions 1 x 100, and another Mat called A, full of data, with the same dimensions as B.
Now I would like to access the data of A through B. Usually I would write a for loop and, for each element of B, take the corresponding element of A. For the fussiest readers of the site, this is the code that I would write:
for(int i=0; i < B.cols; i++){
    int index = B.at<int>(0, i);
    std::cout<<A.at<int>(0, index)<<std::endl;
}
OK, now that I have shown what I can do, is there a more intelligent and faster way to access the matrix A using the indices in B, as one could do in Python with the numpy.take() function?
This operation is called remapping. In OpenCV, you can use the function cv::remap for this purpose.
Below I present a very basic example of how the remap algorithm works. Please note that I don't handle border conditions in this example, but cv::remap does: it allows you to use mirroring, clamping, etc. to specify what happens if the indices exceed the dimensions of the image. I also don't show how interpolation is done; check the cv::remap documentation for that.
If you are going to use remapping you will probably have to convert the indices to floating point; you will also have to introduce another array of indices, which is trivial (all equal to 0) if your image is one-dimensional. If this becomes a performance problem, I'd suggest you implement the 1-D remap equivalent yourself. But benchmark first before optimizing, of course.
For all the details, check the documentation, which covers everything you need to know to use the algorithm.
#include <opencv2/core.hpp>

cv::Mat_<float> remap_example(const cv::Mat_<float>& image,
                              const cv::Mat_<float>& positions_x,
                              const cv::Mat_<float>& positions_y)
{
    // sizes of positions arrays must be the same
    int size_x = positions_x.cols;
    int size_y = positions_x.rows;
    cv::Mat_<float> out(size_y, size_x);
    for(int y = 0; y < size_y; ++y)
        for(int x = 0; x < size_x; ++x)
        {
            // cv::Mat_ is indexed as (row, column), i.e. (y, x)
            float ps_x = positions_x(y, x);
            float ps_y = positions_y(y, x);
            // A real implementation would interpolate the intensity at
            // (ps_x, ps_y) and handle border conditions at this point, e.g.
            // float interpolated = bilinear_interpolation(image, ps_x, ps_y);
            // Placeholder: nearest-neighbour lookup without border handling.
            float interpolated = image(cvRound(ps_y), cvRound(ps_x));
            out(y, x) = interpolated;
        }
    return out;
}
One fast way is to use pointers for both A (data) and B (indices).
const int* pA = A.ptr<int>(0);
const int* pIndexB = B.ptr<int>(0);
int sum = 0;
for(int i = 0; i < B.cols; ++i)
{
sum += pA[*pIndexB++];
}
Note: Be careful with the pixel type; in this case (as in your code) it is int!
Note2: Calling cout for every element access makes the optimization pointless!
Note3: In this article, Satya compares four methods for pixel access, and the fastest seems to be "forEach": https://www.learnopencv.com/parallel-pixel-access-in-opencv-using-foreach/
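Independent of OpenCV, the gather operation that the question compares to numpy.take() can be sketched on plain std::vector. This is only an illustration of the idea; take is a hypothetical helper, not an OpenCV function.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns data[indices[0]], data[indices[1]], ... as a new vector.
// Throws std::out_of_range if any index does not refer to an element of data.
template <typename T>
std::vector<T> take(const std::vector<T>& data, const std::vector<int>& indices) {
    std::vector<T> out;
    out.reserve(indices.size());
    for (int idx : indices) {
        if (idx < 0 || static_cast<std::size_t>(idx) >= data.size()) {
            throw std::out_of_range("take: index out of range");
        }
        out.push_back(data[static_cast<std::size_t>(idx)]);
    }
    return out;
}
```

The bounds check costs a branch per element; in a hot loop where the indices are known to be valid it could be dropped, which is essentially what the pointer version above does.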
I am writing a module that estimates optical flow. At each time step it consumes a std::vector where each element of the vector is a current pixel location and a previous pixel location. The vector is not ordered. New pixels that were previously not seen will be present, and flow locations that were not found will be gone. Is there a correct way to match elements in the new vector to the set of optical flow locations being estimated?
The vectors are on the order of 2000 elements.
These are the approaches I am considering:
naively iterate through the new vector for each estimated optical flow location
naively iterating through the new vector but removing each matched location so the search gets faster as it goes on
run std::sort on my list and the new list at every time step. Then iterate through the new vector starting at the last matched index +1
I suspect that there is an accepted way to go about this, but I don't have any comp sci training.
I'm using C++11, if that is relevant.
// each element in the new vector is an int. I need to check if
// there are matches between the new vec and old vec
void Matcher::matchOpticalFlowNaive(std::vector<int> new_vec)
{
    for(std::size_t i = 0; i < this->old_vec.size(); i++)
        for(std::size_t j = 0; j < new_vec.size(); j++)
            if(this->old_vec[i] == new_vec[j]){
                do_stuff(this->old_vec[i], new_vec[j]);
                break; // stop scanning new_vec after the first match
            }
}
I'm not sure I understand what you need. Supposing that your Matcher is constructed with a vector of integers, that the order isn't important, and that you need to check this vector against other vectors (method matchOpticalFlowNaive()) to do something when there is a match, I suppose you can write something as follows:
struct Matcher
{
std::set<int> oldSet;
Matcher (std::vector<int> const & oldVect)
: oldSet{oldVect.cbegin(), oldVect.cend()}
{ }
void matchOpticalFlowNaive (std::vector<int> const & newVec)
{
for ( auto const & vi : newVec )
{
if ( oldSet.cend() != oldSet.find(vi) )
/* do something */ ;
}
}
};
where the Matcher object is constructed with a vector that is used to initialize a std::set (or a std::multiset, or an unordered set/multiset?) to simplify the work in matchOpticalFlowNaive().
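As a sketch of the unordered variant mentioned above, the lookup can be done with std::unordered_set, which gives average constant-time membership tests. matchValues is a hypothetical free function, and the elements are assumed to be plain ints as in the question's code.

```cpp
#include <unordered_set>
#include <vector>

// Returns the values of newVec that also occur in oldVec, in newVec order.
std::vector<int> matchValues(const std::vector<int>& oldVec,
                             const std::vector<int>& newVec) {
    // Build the hash set once; each lookup is then O(1) on average.
    const std::unordered_set<int> oldSet(oldVec.begin(), oldVec.end());
    std::vector<int> matches;
    for (int v : newVec) {
        if (oldSet.count(v) != 0) {
            matches.push_back(v);
        }
    }
    return matches;
}
```

For vectors of about 2000 elements this replaces roughly 2000 x 2000 comparisons with one pass over each vector.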