Fastest between a Mat of Vec2 and two Mats - C++

I have a question concerning access speed of OpenCV matrix.
I currently need two channels of unsigned char to contain my data.
But at one point I need to split my data to process the channels separately (which probably results in a matrix copy):
for (auto ptr = ROI.begin<cv::Vec2b>(); ptr != ROI.end<cv::Vec2b>(); ++ptr) {
    // insert values
}
cv::split(ROI, channels_vector);
process(channels_vector[0]);
process(channels_vector[1]);
more_stuff(ROI);
My question is the following:
Should I use two different matrices from the beginning to avoid the split, or leave it like this?
Or, since it may depend on my computation: what is the difference in cost between two accesses of a matrix and a matrix copy?
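For reference, a minimal sketch of the two-Mat alternative being weighed (fill_planes is a hypothetical stand-in for the value-insertion loop above; the point is that each channel could then be processed directly, with no cv::split and no extra copy):

#include <opencv2/opencv.hpp>

// Hypothetical sketch: keep the two channels in separate single-channel
// matrices from the start, so no cv::split (and no copy) is needed later.
void fill_planes(cv::Mat& a, cv::Mat& b)
{
    CV_Assert(a.size() == b.size() && a.type() == CV_8UC1 && b.type() == CV_8UC1);
    for (int r = 0; r < a.rows; ++r) {
        uchar* pa = a.ptr<uchar>(r);
        uchar* pb = b.ptr<uchar>(r);
        for (int c = 0; c < a.cols; ++c) {
            pa[c] = 0;   // value for channel 0 goes here
            pb[c] = 0;   // value for channel 1 goes here
        }
    }
}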

Memory Efficiency - Eigen::VectorXd in a loop

I have a Measurement object that has two Eigen::VectorXd members -- one for position and the other for velocity.
Measurements are arranged in a dataset by scans -- i.e., at each timestep, a new scan of measurements is added to the dataset. These types are defined as:
typedef std::shared_ptr<Measurement> MeasurementPtr;
typedef std::vector<MeasurementPtr> scan_t;
typedef std::vector<scan_t> dataset_t;
At the beginning of each iteration of my algorithm, I need to apply a new transformation to each measurement. Currently, I have:
for (auto scan = dataset_.begin(); scan != dataset_.end(); ++scan) {
    for (auto meas = scan->begin(); meas != scan->end(); ++meas) {
        // Transform this measurement to bring it into the same
        // coordinate frame as the current scan
        if (scan != std::prev(dataset_.end())) {
            core::utils::perspective_transform(T_, (*meas)->pos);
            core::utils::perspective_transform(T_, (*meas)->vel);
        }
    }
}
Where perspective_transform is defined as
void perspective_transform(const Eigen::Projective2d& T, Eigen::VectorXd& pos) {
    pos = (T * pos.homogeneous()).hnormalized();
}
Adding this code increases computation time by 40x when I run the algorithm on a dataset of scans with 50 measurements in each scan -- making it rather slow. I believe this is because I have 550 small objects, each with 2 Eigen memory writes. I removed the writing of the result to memory and my benchmark shows only a slight decrease -- suggesting that this is a memory-efficiency problem rather than a computation bottleneck.
How can I speed up this computation? Is there a way to first loop through and create an Eigen::Matrix from an Eigen::Map, do the computation once on that, and have it automatically update the two members of all the Measurement objects?
You might want to rework your data-structures.
Currently you have an array-of-struct (AOS), with a number of indirections.
A structure-of-arrays (SOA) is generally more efficient in memory access.
What about:
struct Scan_t
{
    Eigen::MatrixXd position;
    Eigen::MatrixXd velocity;
};
The .rowwise() and .colwise() operators might be powerful enough to do the homogeneous transform, which would save you writing the inner loop.
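A minimal sketch of that idea, assuming the measurements are 2-D and stored one column per measurement (ScanSOA and transform_scan are hypothetical names, not taken from the question's code):

#include <Eigen/Dense>
#include <Eigen/Geometry>

// Hypothetical SOA layout: one 2xN matrix per quantity, one column per measurement.
struct ScanSOA
{
    Eigen::Matrix2Xd position;
    Eigen::Matrix2Xd velocity;
};

// Transform a whole scan with two matrix products instead of looping over
// the individual Measurement objects.
void transform_scan(const Eigen::Projective2d& T, ScanSOA& scan)
{
    Eigen::Matrix3Xd ph = scan.position.colwise().homogeneous();   // 3xN
    Eigen::Matrix3Xd vh = scan.velocity.colwise().homogeneous();   // 3xN
    scan.position = (T.matrix() * ph).colwise().hnormalized();     // back to 2xN
    scan.velocity = (T.matrix() * vh).colwise().hnormalized();
}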

What data structure is preferred instead of manipulating multiple vectors

I have implemented a class that makes computations on images. The processing is done on a subset of the given images (let's say 100 out of 1000) at a time, and each image takes a different number of iterations to finish. The processing uses GPUs, and therefore it is not possible to use all the images at once. When the processing of an image is finished, this image is removed and another one is added. So I am using three different vectors, image_outcome, image_index, and image_operation, to keep information about the images:
The image_outcome is a std::vector<float> and each of its elements is a value that is used as a criterion to decide when the image is finished.
The image_index is a std::vector<int> that holds the index of image in the original dataset.
The image_operation is a std::vector<MyEnumValue> that holds the operation used to update the image_outcome; MyEnumValue is an enum type and its value is one of many possible operations.
There are also two functions, one to remove the finished images and one to add as many images as removed (if there are still enough in the input).
The remove_images() function takes all three vectors and the image matrix and removes the elements using std::vector::erase().
The add_images() function again takes the three vectors and the image matrix, adds new images, and appends the relevant information to the vectors.
Because I call erase() on each vector at the same index (and add in a similar way), I was thinking to:
Use a private struct that has three vectors (nested struct).
Use a private class that is implemented using three vectors (nested class).
Use a different data structure other than std::vector.
A high-level example of the code can be found below:
class ComputationClass {
public:
    // the constructor initializes the member variables
    ComputationClass();
    void computation_algorithm(std::vector<cv::Mat> images);

private:
    // member variables which define the algorithm's parameters
    // add_images() and remove_images() take more arguments than these,
    // but I only show the relevant ones here
    void add_images(std::vector<float>&, std::vector<int>&, std::vector<MyEnumValue>&);
    void remove_images(std::vector<float>&, std::vector<int>&, std::vector<MyEnumValue>&);
};
void ComputationClass::computation_algorithm(std::vector<cv::Mat> images) {
    std::vector<float> image_output;
    std::vector<int> image_index;
    std::vector<MyEnumValue> image_operation;

    add_images(image_output, image_index, image_operation);

    while (there_are_still_images_to_process) {
        // make computations by updating the image_output vector
        // check which images finished computing
        remove_images(image_output, image_index, image_operation);
        add_images(image_output, image_index, image_operation);
    }
}
I think that, instead of a struct with 3 vectors, a single vector of user-defined objects would work better:
class MyImage {
public:
    Image OImage;           // the actual image
    float fOutcome;
    int dIndex;
    MyEnumValue eOperation;

    bool getIsDone() const {
        return fOutcome > 0;  // random condition
    }
};

std::vector<MyImage> images;
You can add to the vector, or erase from it based on a condition:
if (it->getIsDone()) {
    it = VMyVector.erase(it);   // erase() invalidates 'it', so reuse the returned iterator
}
In my opinion, maintaining 3 parallel vectors is error-prone and hard to modify.
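As a further sketch (remove_finished is a hypothetical helper, not part of the answer above), all finished images can also be dropped in one pass with the erase-remove idiom:

#include <algorithm>
#include <vector>

// Remove every finished image in a single pass over the vector.
void remove_finished(std::vector<MyImage>& images)
{
    images.erase(std::remove_if(images.begin(), images.end(),
                                [](const MyImage& img) { return img.getIsDone(); }),
                 images.end());
}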

Armadillo: efficient RAM sparse batch insertion

I know that sparse matrix support in Armadillo is still preliminary.
I'm using the Armadillo library in my quantum systems research, and I have a problem constructing a sparse matrix in a RAM-efficient way.
So far I have been using my own implementation of sparse matrices, but I want to use an optimized matrix class.
I'm filling elements in batch mode:
umat loc(2,size);
cx_vec val(size);
// calculate loc and val
...
//
sp_cx_mat Hamiltonian(loc, val);
This copies the values from loc and val into the Hamiltonian constructor, and for a few seconds it requires 2x the RAM. I calculate huge matrices (size is about 2**L, where L = 22, 24, ...), so I would like the code to be well optimised for memory.
For comparison, matrix size: 705432x705432 - RAM and "filling time":
my implementation (COO format): time 7.95s, memory 317668kB
armadillo (CSC format): time 5.32s, memory 715000kB
Is it possible to deallocate fragments of the loc and val vectors on the fly, element by element, to save memory?
The answer here will be to use the other sparse matrix constructor that takes the CSC format, so you will need to modify your // calculate loc and val code, instead filling the following three arrays:
values (length equal to number of points)
row_indices (length equal to number of points)
col_ptrs (length equal to number of columns plus one)
The points should be arranged in column-major ordering in the values and row_indices vectors, and the col_ptrs vector contains the number of nonzero elements before the beginning of each column. That is, col_ptrs[0] will always contain 0, col_ptrs[1] will contain the number of nonzero elements in the first column, col_ptrs[2] will contain the number of nonzero elements in the first and second columns, and col_ptrs[n_cols] will contain the number of nonzero elements in the entire matrix.
For more documentation on this constructor, see the "Batch constructors" section of http://arma.sourceforge.net/docs.html#SpMat ; this is the fourth entry in that list.
If you cannot easily modify your calculation code to adhere to that format, then you might be better off trying to specify sort_locations = false to the constructor you are using, if you are not already doing that.
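A minimal sketch of the CSC batch constructor on a tiny 3x3 example with made-up values (assuming a C++11 build of Armadillo that supports initializer-list construction):

#include <armadillo>

int main()
{
    using namespace arma;

    // 3x3 matrix with 4 nonzeros, listed column by column (CSC order):
    // (0,0)=1, (2,0)=3, (1,1)=2, (2,2)=4
    uvec   rowind = {0, 2, 1, 2};   // row index of each nonzero
    uvec   colptr = {0, 2, 3, 4};   // nonzeros before each column; last entry = total
    cx_vec values = {cx_double(1, 0), cx_double(3, 0),
                     cx_double(2, 0), cx_double(4, 0)};

    // No COO-to-CSC conversion is needed here, so no temporary 2x copy of loc/val.
    sp_cx_mat Hamiltonian(rowind, colptr, values, 3, 3);

    Hamiltonian.print("Hamiltonian:");
    return 0;
}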

How to get values of a Matrix which are non zero

I am translating some MATLAB code to C++ using OpenCV. I want to get the values of a matrix that satisfy a condition. I created a mask for this, and when I apply it to the original matrix I get a matrix of the same size as the original, but with 0 in the places that are not in the mask. My question is: how can I get only the values that are non-zero in the matrix and assign them to a different matrix?
My MATLAB code is:
for i = 1:size(no,1)
    mask = labels == i;
    op = orig(mask, :); % op only contains the values from orig that are in the mask, so orig and op do not have the same size
    .....
end
The C++ translation that I have now is:
for (int i = 0; i < n.rows; i++)
{
    Mat mask;
    compare(labels, i, mask, CMP_EQ);
    Mat op;
    orig.copyTo(op, mask); // here orig and op always have the same size, but the values that are not in the mask are 0
}
So, how can I create a matrix which only has the values that satisfy the mask?
You might try to make use of cv::SparseMat (http://docs.opencv.org/modules/core/doc/basic_structures.html#sparsemat), which only keeps non-zero values in a hash.
When you assign a regular cv::Mat to a cv::SparseMat, it automatically captures the non-zero values. From that point, you can iterate through the non-zero values and manipulate them as you'd like.
Hope I got the question right and that it helps!
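A minimal sketch of that approach (nonzero_values is a made-up helper; it assumes op is a single-channel CV_32F matrix such as the masked result above):

#include <opencv2/opencv.hpp>
#include <vector>

// Collect the non-zero values of a dense single-channel float matrix by going
// through a cv::SparseMat, which stores only the non-zero entries.
std::vector<float> nonzero_values(const cv::Mat& op)
{
    cv::SparseMat sparse(op);   // keeps only the non-zero entries
    std::vector<float> out;
    for (auto it = sparse.begin<float>(); it != sparse.end<float>(); ++it)
    {
        // it.node()->idx[0] and it.node()->idx[1] give the row/column if needed
        out.push_back(*it);
    }
    return out;
}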
OpenCV does support matrix expressions like A > B or A <= B and so on.
This is stated in the documentation of cv::Mat.
If you simply want to store values, the Mat object is probably not the best choice, since it was made for containing images.
In that case, use a std::vector instead of the cv::Mat, and call .push_back() whenever you find a non-zero element; the vector resizes dynamically.
If you're trying to create a new image, then you have to be specific about what kind of image you want, because if you don't know how many nonzero elements there are, how can you set the width and height? You might also end up with an odd number of elements.
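For the row-selection pattern in the question itself, a sketch (select_rows is a made-up helper; it assumes labels is a CV_32S column vector with one label per row of orig):

#include <opencv2/opencv.hpp>

// Build a matrix holding only the rows of orig whose label equals i,
// mirroring MATLAB's op = orig(mask, :); op ends up with fewer rows than orig.
cv::Mat select_rows(const cv::Mat& orig, const cv::Mat& labels, int i)
{
    cv::Mat op;
    for (int r = 0; r < orig.rows; ++r)
        if (labels.at<int>(r, 0) == i)
            op.push_back(orig.row(r));   // Mat::push_back appends a copy of the row
    return op;
}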

OpenCV Add columns to a matrix

In OpenCV 2 and later there is a method Mat::resize that lets you add any number of rows with a default value to your matrix. Is there an equivalent method for columns? And if not, what is the most efficient way to do this?
Thanks
Use cv::hconcat:
Mat mat;
Mat cols;
cv::hconcat(mat, cols, mat);  // appends the columns of 'cols' to the right of 'mat'
Worst case scenario: rotate the image by 90 degrees and use Mat::resize(), making columns become rows.
Since OpenCV stores the elements of a matrix row by row, one row after another, there is no direct method to increase the number of columns, but I can suggest two solutions.
First, use the following method (it copies fewer elements than other approaches); a similar method can also be used if you want to insert rows or columns somewhere other than at the end of the matrix.
void resizeCol(Mat& m, size_t sz, const Scalar& s)
{
    // Allocate a wider matrix filled with s, then copy the old data over its left part,
    // so the new columns keep the value s.
    Mat tm(m.rows, m.cols + sz, m.type());
    tm.setTo(s);
    m.copyTo(tm(Rect(Point(0, 0), m.size())));
    m = tm;
}
And the second solution: if you insist on avoiding data copies altogether, it is better to create your matrix with the largest number of rows and columns you will need, start the algorithm on a smaller submatrix, and then grow the matrix with the Mat::adjustROI method.
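A minimal sketch of that second approach (the sizes here are made up for illustration):

#include <opencv2/opencv.hpp>

int main()
{
    // Allocate the full-size matrix once; no reallocation or copy happens later.
    cv::Mat big(480, 1000, CV_8UC1, cv::Scalar(0));

    // Start the algorithm on a smaller view covering only the first 100 columns.
    cv::Mat roi = big(cv::Rect(0, 0, 100, big.rows));

    // ... process roi ...

    // Grow the view by 50 columns to the right; the underlying data is untouched.
    roi.adjustROI(0, 0, 0, 50);   // (dtop, dbottom, dleft, dright)

    // ... continue processing the larger roi ...
    return 0;
}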