What is the most efficient matrix representation in C++?

I hope that this question is not OT.
I'm implementing a VLAD encoder using the VLFeat implementation and SIFT descriptors from different implementations to compare them (OpenCV, VLFeat, OpenSIFT).
This is supposed to be a high-performance application in C++ (I know that SIFT is very inefficient, and I'm implementing a parallel version of it).
Now, VLAD wants as input a pointer to a set of contiguous descriptors (math vectors). The point is that usually these SIFT descriptors are represented as a matrix, since that makes them easier to manage.
So suppose that we have a matrix of 3 descriptors in 3 dimensions (I'm using these numbers for the sake of simplicity; actually it's thousands of descriptors in 128 dimensions):
1 2 3
4 5 6
7 8 9
I need to feed vl_vlad_encode with the pointer to:
1 2 3 4 5 6 7 8 9
A straightforward solution is saving the descriptors in a cv::Mat m object and then passing m.data to vl_vlad_encode.
However, I don't know if cv::Mat is an efficient matrix representation. For example, Eigen::Matrix is an alternative (I think it's easy to obtain the representation above using this object), but I don't know which implementation is faster/more efficient, or if there is any other reason why I should prefer one over the other.
Another possible alternative is using std::vector<std::vector<float>> v, but I don't know whether with v.data() I would obtain the representation above instead of:
1 2 3 *something* 4 5 6 *something* 7 8 9
Obviously *something* would mess up vl_vlad_encode.
Any other suggestion is more than welcome!

Unless you do some weird stuff (see here for details), data in a Mat are guaranteed to be continuous. You can think of a Mat as a lightweight wrapper over a float* (or other types) that allows easier access to the data. So it's as efficient as a pointer, but with a few nice-to-have abstractions.
If you need to efficiently load/save from/to file, you can save the Mat in binary format using matread and matwrite.
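For illustration, a minimal sketch of that approach (the helper name is made up here; the exact vl_vlad_encode parameter list is in the VLFeat docs):

#include <opencv2/core.hpp>

// Sketch: descriptors stored one per row in a CV_32F Mat.
// Returns the contiguous float buffer that vl_vlad_encode expects.
const float* contiguousDescriptors(const cv::Mat& descriptors) {
    CV_Assert(descriptors.type() == CV_32F);
    CV_Assert(descriptors.isContinuous()); // holds unless the Mat is a ROI/submatrix
    return descriptors.ptr<float>(0);      // rows are packed back to back
}

For a Mat m of N 128-dimensional descriptors, this pointer covers all N*128 floats in the row-major order shown above.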

std::vector<std::vector<float>> v is not going to perform very well without some effort, since the memory will not be contiguous.
Once you have your memory contiguous, be it float[], float[][] or std::array/std::vector, how well it performs will depend on how you iterate over your matrix. If access is random, the layout makes little difference; if your inner loop walks down columns, then it's better to have your data grouped by column rather than by row.
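As a sketch of the contiguous-storage idea, assuming row-major access (the names are illustrative): one flat buffer plus an indexing helper, so element (r, c) lives at r * cols + c and data.data() points at the gap-free sequence 1 2 3 4 5 6 7 8 9.

#include <cstddef>
#include <vector>

struct FlatMatrix {
    std::size_t rows, cols;
    std::vector<float> data;                 // one contiguous block, no gaps
    FlatMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c) {}
    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
};

For column-wise iteration you would swap the roles in the index computation (c * rows + r) to keep the inner loop contiguous.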

Related

C++ - std::vector safe multi-threading

I'm developing a program to calculate the determinant of a matrix (possibly big, up to 1000 rows).
Since it can be big, I use multi-threading when dim(M) > 250 (below 250 it computes in about 100 ms).
My idea is to split the matrix in 4 parts and Gauss eliminate each part simultaneously, then recollect the matrix and calculate the determinant.
I would like to know if it is safe to access one vector from multiple threads, given that it's guaranteed I will only ever access different parts of it.
Also, what tips are good to limit the memory usage?
I use vector<vector<double>>, so a 1000 x 1000 matrix of 8-byte doubles can be real trouble.
It is safe as long as you don't change its size.
The contents of the vector won't be moved unless you resize it. So as long as you access different parts of the memory, it is safe.
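A minimal sketch of the safe pattern: size the vector once up front, never resize, and give each thread a disjoint half-open range.

#include <cstddef>
#include <thread>
#include <vector>

int main() {
    std::vector<double> m(1000 * 1000);      // sized once, never resized
    auto fill = [&m](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) m[i] = double(i);
    };
    std::thread t1(fill, std::size_t(0), m.size() / 2);  // disjoint ranges:
    std::thread t2(fill, m.size() / 2, m.size());        // no data race
    t1.join();
    t2.join();
}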

Array for visited co-ordinates in c++

I am working on a game which has a map of 16000 x 9000 units. If I am at any point X,Y on the map, I can see up to a radius of 2000 units. I wanted something with which I could track whether I've visited a particular region or not. The main question: should I use an array of bools? bool visited[16000*9000] would be too large. So I wanted advice, thanks. I am new to Stack Overflow, sorry if I am not to the point.
If you need the discovered region to be circular (which your use of 'radius' implies), you have to use this huge array, yes.
If it does not have to be a perfect circle, then you can simply downsample: say you use a roughness of 10 units per block - then you only need an array of 1600x900 blocks, a factor-100 reduction compared to the perfect circle.
It would indeed be inefficient to use an array of bool types, mainly because the size of a bool in C++ can be luxuriously big. (On my platform, it's 8 bits long, which means that 7 bits of it are unused.) The C++ standard does not specify the value of sizeof(bool).
Do consider using a std::vector<bool> instead: this is an explicit specialisation of std::vector, and the C++ standard guarantees it is tightly packed, i.e. there is no wasted space. You might need a std::vector<std::vector<bool>> if you have difficulty acquiring one contiguous block of memory. This all said, some folk dislike the bool vector specialisation with a vengeance, so do consider this carefully before diving in. (There is a movement to consider scheduling it for deprecation!)
Or you could lump areas of your map together yourself, packing them into the bits of integral types such as unsigned.
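A sketch combining the two suggestions above (a block size of 10 is just the example roughness): a coarse grid stored in the packed vector<bool> specialisation.

#include <cstddef>
#include <vector>

class VisitedMap {
    static const int kBlock = 10;            // roughness, in map units
    int cols_;
    std::vector<bool> visited_;              // packed: roughly 1 bit per block
public:
    VisitedMap(int width, int height)
        : cols_(width / kBlock),
          visited_(std::size_t(width / kBlock) * (height / kBlock), false) {}
    void visit(int x, int y) { visited_[std::size_t(y / kBlock) * cols_ + x / kBlock] = true; }
    bool seen(int x, int y) const { return visited_[std::size_t(y / kBlock) * cols_ + x / kBlock]; }
};

// 16000 x 9000 units -> 1600 x 900 blocks -> about 176 KB,
// versus roughly 137 MB for bool visited[16000*9000].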

C++ performance - reading numbers from a file

I have a file which contains numbers separated by spaces (i.e. a matrix). For example:
1 2 3
4 5 6
I would like to read these numbers and store them in a two-dimensional array int**. I've found numerous solutions to this, but I don't know which of them gives the best performance.
Furthermore, I would like to ask if there is a possibility to read the mentioned file in parallel.
EDIT: The data I want to read is much bigger (I included the data above only as an example); I would like to store big matrices, possibly with rows of different lengths, in the mentioned array for further manipulation.
For the best performance when reading the file in parallel, you can use a couple of copies of the file. And you can use a row index to jump quickly to a particular row.
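As a single-threaded baseline, here is a sketch that avoids most iostream formatting overhead by parsing each line with strtol; since each line becomes its own vector, rows of different lengths are handled naturally (the function name is illustrative):

#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

std::vector<std::vector<int>> readMatrix(const std::string& path) {
    std::ifstream in(path);
    std::vector<std::vector<int>> rows;
    std::string line;
    while (std::getline(in, line)) {
        std::vector<int> row;
        const char* p = line.c_str();
        char* end = nullptr;
        // strtol advances `end` past each parsed number; stop when nothing parses.
        for (long v = std::strtol(p, &end, 10); p != end;
             v = std::strtol(p, &end, 10)) {
            row.push_back(int(v));
            p = end;
        }
        rows.push_back(std::move(row));
    }
    return rows;
}

Memory-mapping or chunked parallel parsing can go further, but measure this version first.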

Optimizing for 3D imaging processes in C++

I am working with 3D volumetric images, possibly large (256x256x256). I have 3 such volumes that I want to read in and operate on. Presently, each volume is stored as a text file of numbers, which I read in using ifstream. I save it as a matrix (this is a class I have written, using dynamic allocation of a 3D array). Then I perform operations on these 3 matrices: addition, multiplication and even Fourier transform. So far everything works, but it takes a hell of a lot of time, especially the Fourier transform, since it has 6 nested loops.
I want to know how I can speed this up. Also, does the fact that I have stored the images in text files make a difference? Should I save them as binary or in some other format that is easier/faster to read in? Is fstream the fastest way I can read in? I use the same 3 matrices each time without changing them. Does that make a difference? Also, is pointer to pointer to pointer the best way to store a 3D volume? If not, what else can I do?
Also, is pointer to pointer to pointer best way to store a 3d volume?
Nope, that's usually very inefficient.
If not what else can I do?
It's likely that you will get better performance if you store it in one contiguous block and use computed offsets into the block.
I'd usually use a structure like this:
#include <vector>

class DataBlock {
public:
    unsigned int nx;
    unsigned int ny;
    unsigned int nz;
    std::vector<double> data;

    DataBlock(unsigned int in_nx, unsigned int in_ny, unsigned int in_nz) :
        nx(in_nx), ny(in_ny), nz(in_nz), data(in_nx * in_ny * in_nz, 0)
    {}

    // You may want to make this check bounds in debug builds.
    double& at(unsigned int x, unsigned int y, unsigned int z) {
        return data[x + y * nx + z * nx * ny];
    }
    const double& at(unsigned int x, unsigned int y, unsigned int z) const {
        return data[x + y * nx + z * nx * ny];
    }

private:
    // Don't want this class copied, so remove the copy constructor and assignment.
    DataBlock(const DataBlock&);
    DataBlock& operator=(const DataBlock&);
};
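Usage is then straightforward; since x varies fastest in memory, inner loops over x walk the data contiguously:

DataBlock volume(256, 256, 256);
volume.at(10, 20, 30) = 1.5;   // x + y*nx + z*nx*ny into one flat vector
double v = volume.at(10, 20, 30);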
Storing a large (256³ elements) 3D image file as plain text is a waste of resources.
Without loss of generality, if you have a plain text file for your image and each line of your file consists of one value, you will have to read several characters until you find the end of the line (for a 3-digit number, that's 4 bytes: 3 bytes for the digits, 1 byte for the newline). Afterwards you have to convert these characters into a number. When using binary, you directly read a fixed number of bytes and you have your value. You could and should write and read it as a binary image.
There are several formats for doing so, the one I would recommend is the meta image file format of VTK. In this format, you have a plain text header file and a binary file with the actual image data. With the information from the header file you will know how large your image is and what datatype you will be using. In your program, you then directly read the binary data and save it to a 3D array.
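A sketch of that read step, assuming the dimensions were already parsed from the header and the voxels are stored as raw doubles (the file name and function are illustrative):

#include <cstddef>
#include <fstream>
#include <vector>

// Read nx*ny*nz doubles written as raw binary, e.g. the .raw half of a
// VTK MetaImage pair; the dimensions come from the plain text header.
std::vector<double> readVolume(const char* rawPath,
                               std::size_t nx, std::size_t ny, std::size_t nz) {
    std::vector<double> voxels(nx * ny * nz);
    std::ifstream in(rawPath, std::ios::binary);
    in.read(reinterpret_cast<char*>(voxels.data()),
            voxels.size() * sizeof(double));
    return voxels;
}

One read call replaces millions of string-to-number conversions.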
If you really want to speed things up, use CUDA or OpenCL which will be pretty fast for your applications.
There are several C++ libraries that can help you with writing, saving and manipulating image data, including the aforementioned VTK as well as ITK.
256³ is a rather large number. Parsing 256³ text strings will take a considerable amount of time. Using binary makes the reading/writing process much faster because it doesn't require converting numbers to/from strings, and it uses much less space. For example, to read the number 123 as a char from a text file the program has to read it as a string and convert it from decimal to binary using lots of multiplies by 10, whereas if you had written it directly as the binary value 0b01111011 you would only need to read that byte back into memory, with no conversion at all.
Using hexadecimal numbers may also increase reading speed, since each hex digit maps directly to a binary value, but if you need more speed, a binary file is the way to go. A single fread call is enough to load the whole 256³ bytes = 16 MB file into memory in less than 1 second. And when you're done, just fwrite it back to the file. To speed up further you can use SIMD (SSE/AVX), CUDA or another parallel-processing technique. You can improve the speed even more with multithreading, or by saving only the non-zero values, because in many cases most values will be 0.
Another reason may be that your array is large and each dimension is a power of 2. This has been discussed in many questions on SO:
Why is there huge performance hit in 2048x2048 versus 2047x2047 array multiplication?
Why is my program slow when looping over exactly 8192 elements?
Why is transposing a matrix of 512x512 much slower than transposing a matrix of 513x513?
You may consider changing the last dimension to 257 and trying again. Or, better, use another algorithm, like divide and conquer, that's more cache-friendly.
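A minimal sketch of the padding trick, assuming the fastest-varying dimension is the one being padded (that is the dimension whose size sets the stride, so padding it is what breaks the power-of-two aliasing):

#include <cstddef>
#include <vector>

const std::size_t N = 256;
const std::size_t STRIDE = N + 1;   // 257: consecutive rows no longer map
std::vector<double> vol(STRIDE * N * N);   // to the same cache sets

double& at(std::size_t x, std::size_t y, std::size_t z) {
    return vol[x + y * STRIDE + z * STRIDE * N];   // index with padded stride
}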
You should add timers around the load and the process so you know which is taking the most time, and focus your optimization efforts on it. If you control the file format, make one that is more efficient to read. If it is the processing, I'll echo what previous folks have said, investigate efficient memory layout as well as GPGPU computing. Good luck.

MPI_SCATTER Fortran Matrices by Rows

What is the best way to scatter a Fortran 90 matrix by its rows rather than its columns? That is, let's say I have a matrix a(4,50) and I want to MPI_SCATTER it onto two processes where each part is alocal(2,50), so that rank 0 has rows 1 and 2, and rank 1 has rows 3 and 4. Now, in C this is simple since arrays are row-major, but in Fortran 90 they are column-major.
I'm trying to avoid using TRANSPOSE to flip a before scattering (i.e., doubling the memory use), and I figure there must be a way in MPI to do this. Would it be MPI_TYPE_VECTOR? MPI_TYPE_CREATE_SUBARRAY?
Likewise, what if I have a 3d array b(4,50,3) and I want two scattered matrices of blocal(2,50,3) distributed as above?
Yes, MPI_TYPE_VECTOR and MPI_TYPE_CREATE_SUBARRAY are what you want: the former for your first problem, the latter for your second. Comment if you want me to write the calls for you!
Don't most of the MPI data transfer calls have a stride argument? Set it to the size of the data type times the height of the matrix and there you go...
I've taken a look at the MPI reference and there isn't an explicit argument for that, but if you go to example 5.12, it shows how to send strided ints with MPI_Scatterv and MPI_Gatherv.
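For reference, a rough sketch of the MPI_TYPE_VECTOR approach using the C bindings from C++ (the Fortran calls mirror these one-to-one). A column-major a(4,50) viewed as pairs of rows is 50 blocks of 2 elements with stride 4; resizing the type's extent to 2 doubles lets MPI_Scatter step from one pair of rows to the next:

#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::vector<double> a;                  // a(4,50), column-major; root only
    if (rank == 0) a.resize(4 * 50);        // ... fill on rank 0 ...

    MPI_Datatype rows, rowsResized;
    MPI_Type_vector(50, 2, 4, MPI_DOUBLE, &rows);        // 2 of every 4 rows
    MPI_Type_create_resized(rows, 0, 2 * sizeof(double), // item i starts at
                            &rowsResized);               // row 2*i
    MPI_Type_commit(&rowsResized);

    std::vector<double> alocal(2 * 50);     // alocal(2,50) is contiguous
    MPI_Scatter(a.data(), 1, rowsResized,
                alocal.data(), 2 * 50, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    MPI_Type_free(&rowsResized);
    MPI_Type_free(&rows);
    MPI_Finalize();
}

The 3D case b(4,50,3) is where MPI_TYPE_CREATE_SUBARRAY becomes the cleaner tool, since it describes the (2,50,3) sub-block directly.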