This question follows on from the problem I solved in this question. I have C++ code that processes frames from a camera and generates a matrix for each processed frame. I want to send each matrix to the MATLAB engine, so that at the end of the execution all the matrices are stored there.
I am confused about how to do this. I send a matrix in each iteration, but it is overwritten every time, so at the end I only have one. Here is a code example:
matrix.cpp
#include "helper.h"
mxArray *mat;
mat = mxCreateDoubleMatrix(13, 13, mxREAL);
memcpy(mxGetPr(mat),matrix.data, 13*13*sizeof(double));
engPutVariable(engine, "mat", mat);
I also tried to use a counter to dynamically name the different matrices, but it didn't work because the MATLAB engine requires the variables to be defined first. Any help will be appreciated. Thanks.
You can create a cell array in matlab workspace like this:
mwSize size = 10;
mxArray* cell = mxCreateCellArray(1, &size);
for(size_t i=0;i<10;i++)
{
mxArray *mat;
mat = mxCreateDoubleMatrix(13, 13, mxREAL);
memcpy(mxGetPr(mat),matrix.data, 13*13*sizeof(double));
mwIndex subscript = i;
mwIndex index = mxCalcSingleSubscript(cell, 1, &subscript);
mxSetCell(cell, index, mat);
}
engPutVariable(engine, "myCell", cell);
If you don't know the number of frames a priori, don't try to expand the mxArray in C; it is not convenient. You were already close. All your problems can be solved with:
engEvalString(engine, "your command here")
Read more here.
The simplest approach is something like:
engPutVariable(engine, "mat", mat);
engEvalString(engine, "frames{length(frames)+1} = mat;");
Don't do exactly that; it is an illustration and will be very slow. It is much better to preallocate, say, 1000 frames and then expand by another 1000 (or a more appropriate number) when needed. Even better is to avoid cell arrays, which are slow. Instead you could use a 3D array, such as:
frames = zeros(13,13,1000);
frames(:,:,i) = mat;
i = i + 1;
Again, preallocate in blocks. You get the idea. If you really need to be fast, you could build the 3D arrays in C and ship them to MATLAB when they fill.
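As a rough sketch of that last idea (building the 3D array on the C++ side), the frames can be appended to one contiguous buffer that grows a block of frames at a time. FrameStack and its members are hypothetical names, not part of the MATLAB API, and the layout assumes each frame is already stored column-major, as MATLAB expects. Once the buffer is full, its contents could be copied into an array created with mxCreateNumericArray (dimensions rows, cols, count) and sent with a single engPutVariable call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: collects fixed-size frames in one contiguous buffer
// laid out like a rows x cols x count array (column-major, frame after frame).
class FrameStack {
public:
    FrameStack(std::size_t rows, std::size_t cols, std::size_t block)
        : rows_(rows), frameLen_(rows * cols), block_(block), count_(0) {
        assert(frameLen_ > 0 && block_ > 0);
        data_.reserve(frameLen_ * block_);
    }

    // Append one frame given as rows*cols doubles in column-major order.
    void append(const double* frame) {
        if (data_.size() + frameLen_ > data_.capacity()) {
            // Grow by a whole block of frames rather than one frame at a time.
            data_.reserve(data_.capacity() + frameLen_ * block_);
        }
        data_.insert(data_.end(), frame, frame + frameLen_);
        ++count_;
    }

    std::size_t count() const { return count_; }
    const double* data() const { return data_.data(); }

    // Element (r, c) of frame k, using column-major indexing.
    double at(std::size_t r, std::size_t c, std::size_t k) const {
        assert(k < count_ && r < rows_ && c * rows_ + r < frameLen_);
        return data_[k * frameLen_ + c * rows_ + r];
    }

private:
    std::size_t rows_;
    std::size_t frameLen_;
    std::size_t block_;
    std::size_t count_;
    std::vector<double> data_;
};
```

With 13x13 frames you would construct FrameStack(13, 13, 1000) and call append(matrix.data) once per processed frame, assuming matrix.data points to 169 doubles.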
Maybe you can use a std::vector<mxArray*> from the standard library.
I have code that reads a 1024x1024 float matrix from disk, then picks some of its elements and processes the resulting matrix as follows.
// mask is a 1Kx1K matrix in which 1/64 of the elements are 1 and the rest are 0;
// it is a mask for Mat data
string filename = "filepath";
Mat data(1024,1024,CV_32F);
readMatrix(filename, data);
Mat smallMat(128,128,CV_32F);
getSmallerMat(data, mask, smallMat);
I read the float Mat from disk and fill smallMat using getSmallerMat(...), which is simply two for loops that check whether mask(i,j) == 1 and, if so, write the element to the next position in smallMat.
void readMatrix(string fpath,Mat& data){
FILE* fp = fopen(fpath.c_str(),"rb");
if (!fp) { perror("fopen"); return; } // do not read from a null stream
int size = 1024;
data.create(size,size,CV_32F);
float* buffer= new float[size]; // holds one row at a time
for(int i=0;i<size;++i) {
fread(buffer,sizeof(float),size,fp);
for(int j=0;j<size;++j){
data.at<float>(i,j)=buffer[j];
}
}
fclose(fp);
delete[] buffer; // allocated with new[], so release with delete[], not free
}
What I want to do is read only the matrix elements whose corresponding value in mask is equal to 1. My problem is how to pick the (i,j)-th element from the disk.
Reading the whole matrix and squeezing it takes 15 ms. I want to make it faster, but I haven't managed to.
Consider this picture to be my mask matrix. I want to read only the white pixels.
Thanks,
I am not sure that I understand the question correctly, but are you looking for a method to access data on the hard disk more quickly than via a stream? To find some specific matrix element (i,j) in your stream you need to read the whole file (in the worst case), i.e. the complexity is linear; this can't be helped.
However, if you actually know the position in the file exactly (i.e. if you use a fixed-length format for representing your doubles, etc.), seekg
http://www.cplusplus.com/reference/istream/istream/seekg/
should be faster than actually reading all characters until the desired position.
EDIT:
Given the discussion in the comments on other answers, I want to stress that using some seek in a file stream is O(N), hence multiple seeks for specific elements will be way slower than just reading the whole file. I am not aware of a method to access data stored on a hard disk in O(1). However, if all you ever need is matrices that are zero outside your mask, you should familiarize yourself with the concept of sparse matrices.
See e.g. https://en.wikipedia.org/wiki/Sparse_matrix and the documentation for your library, e.g. http://www.boost.org/doc/libs/1_39_0/libs/numeric/ublas/doc/matrix_sparse.htm
I am not sure whether I have understood your problem, but if you want to read the (i,j)-th element from a file that contains only float elements, you should be able to get it like this:
float get(int i, int j, int rowsize, FILE * fp) {
float retVal = -1.0f; // sentinel returned if the seek or read fails (-infinity, maybe?)
// remember the stream position so it can be restored
long lastPos = ftell(fp);
// fast-forward to element i*rowsize + j
if (fseek(fp, (long)(((size_t)i * rowsize + j) * sizeof(float)), SEEK_SET) == 0) {
if (fread(&retVal, sizeof(float), 1, fp) != 1) retVal = -1.0f;
}
// restore the previous position
fseek(fp, lastPos, SEEK_SET);
return retVal;
}
You should be able to read any file that contains fixed-size elements very quickly using fseek and some arithmetic relative to the start, the end, or the current file position. Check the fseek documentation for more details.
From your code it appears your matrix is stored in binary as a memory image of the floats. What you want is to go directly to the position on the disk where the (i,j) float is. You can compute the element index with the formula index = i*colWidth+j, where colWidth is 1024 in your case; the byte offset is then index*sizeof(float). You can use fseek and ftell to move to and query your position in the file opened by fopen.
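A minimal sketch of how that offset arithmetic could be combined with the mask, assuming the file is a raw row-major image of floats and the mask is available as a flat array of 0/1 bytes. readMaskedRuns is a hypothetical helper, not a library function. It seeks once per contiguous run of masked elements in a row instead of once per element; whether this beats one sequential read of the whole file depends on how the mask is laid out and on OS caching, so it is worth measuring.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: reads only the elements whose mask entry is non-zero
// from a row-major binary file of floats, one fseek/fread per contiguous run.
// On a short read or failed seek it returns what was read so far.
std::vector<float> readMaskedRuns(std::FILE* fp,
                                  const std::vector<unsigned char>& mask,
                                  int rows, int cols) {
    assert(fp != nullptr && rows > 0 && cols > 0);
    assert(mask.size() == static_cast<std::size_t>(rows) * cols);
    std::vector<float> out;
    for (int i = 0; i < rows; ++i) {
        const std::size_t rowStart = static_cast<std::size_t>(i) * cols;
        int j = 0;
        while (j < cols) {
            if (!mask[rowStart + j]) { ++j; continue; }
            const int start = j;                          // first element of the run
            while (j < cols && mask[rowStart + j]) ++j;   // extend the run
            const std::size_t runLen = static_cast<std::size_t>(j - start);
            // byte offset = element index * sizeof(float)
            const long offset =
                static_cast<long>((rowStart + start) * sizeof(float));
            if (std::fseek(fp, offset, SEEK_SET) != 0) return out;
            const std::size_t old = out.size();
            out.resize(old + runLen);
            const std::size_t got =
                std::fread(&out[old], sizeof(float), runLen, fp);
            if (got != runLen) { out.resize(old + got); return out; }
        }
    }
    return out;
}
```

The result holds the masked values in row-major order, which matches the order in which the two-loop getSmallerMat described in the question would visit them.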
I have a 2D matrix and I want to copy its values to a 1D array column by column ("vertically") in an efficient way, as follows.
Matrix (3x3)
[1 2 3;
4 5 6;
7 8 9]
myarray:
{1,4,7,2,5,8,3,6,9}
Brute force takes 0.25 sec for a 1000x750x3 image. I don't want to use vector because I pass myarray as input to another function (which I didn't write). So, is there a C++ or OpenCV function that I can use? Note that I'm using the OpenCV library.
Copying the matrix to an array is also fine: I can first take the transpose of the Mat and then copy it to an array.
cv::Mat transposed = myMat.t();
uchar* X = transposed.reshape(1,1).ptr<uchar>(0);
or
int* X = transposed.reshape(1,1).ptr<int>(0);
depending on your matrix type. It might copy data though.
You can optimize this to make it more cache friendly, i.e. you can copy blockwise, keeping track of the positions in myArray where the data should go. The point is that your brute-force approach will most likely make each access to the matrix miss the cache, which has a tremendous performance impact. Hence it is better to copy vertically/horizontally taking the cache line size into account.
See the idea below (I didn't test it, so it most likely has bugs, but it should make the idea clear).
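struct pixel
{
char r;
char g;
char b;
};
const size_t cachelinesize = 128/sizeof(pixel); // pixels per block, assuming a cache line of 128 bytes
array<array<pixel, 1000>, 750> matrice; // 750 rows of 1000 pixels
vector<pixel> vec(1000*750); // destination, filled column by column
const size_t rows = matrice.size();
const size_t cols = matrice[0].size();
for (size_t row = 0; row<rows; ++row)
{
for (size_t col = 0; col<cols; col+=cachelinesize)
{
// the last block of a row may be shorter than cachelinesize
for (size_t i = 0; i<cachelinesize && col+i<cols; ++i)
{
vec[(col+i)*rows+row]=matrice[row][col+i]; // element (row, col+i) goes to its column-major position
}
}
}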
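For comparison, here is a small self-contained sketch of the same blockwise idea on flat arrays, tiling both rows and columns so that reads and writes each stay within a small region. copyColumnMajorBlocked and the default tile size of 32 are assumptions made for illustration; the best tile size depends on the element size and the cache, so it should be tuned by measurement.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies a row-major rows x cols matrix into dst in
// column-major order, working on block x block tiles.
template <typename T>
void copyColumnMajorBlocked(const std::vector<T>& src, std::vector<T>& dst,
                            std::size_t rows, std::size_t cols,
                            std::size_t block = 32) {
    assert(block > 0);
    assert(src.size() == rows * cols);
    dst.resize(rows * cols);
    for (std::size_t r0 = 0; r0 < rows; r0 += block) {
        const std::size_t r1 = std::min(rows, r0 + block);
        for (std::size_t c0 = 0; c0 < cols; c0 += block) {
            const std::size_t c1 = std::min(cols, c0 + block);
            // Within a tile, walk down each column so writes to dst are contiguous.
            for (std::size_t c = c0; c < c1; ++c) {
                for (std::size_t r = r0; r < r1; ++r) {
                    dst[c * rows + r] = src[r * cols + c];
                }
            }
        }
    }
}
```

Applied to the 3x3 example from the question (1 to 9 stored row by row), dst ends up as 1,4,7,2,5,8,3,6,9, and dst.data() can be handed to a function that expects a plain array.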
If you are using the matrix before the vertical assignment/querying, then you can cache the necessary columns when you hit each one of the elements of columns.
//Multiplies and caches
doCalcButCacheVerticalsByTheWay(myMatrix,calcType,myMatrix2,cachedColumns);
instead of
doCalc(myMatrix,calcType,myMatrix2); //Multiplies
then use it like this:
...
tmpVariable=cachedColumns[i];
...
For example, upper function multiplies the matrix with another one, then when the necessary columns are reached, caching into a temporary array occurs so you can access elements of it later in a contiguous order.
I think Mat::reshape is what you want. It does not copy data.
I'm looking at a project involving online (streaming) data. I want to work with a sliding window of that data. For example, say that I want to hold 10 values in my vector. When value 11 comes in, I want to drop value 1, shift everything over, and then place value 11 where value 10 was.
The long way would be something like the following:
int n = 9;
thrust::device_vector<float> val;
val.resize(n+1,0);
// Shift left
for(int i=0; i < n; i++){
val[i] = val[i+1];
}
// add the new value to the last position
val[n] = newValue;
Is there a "fast" way to do this with thrust? The project I'm looking at will have around 500 vectors that will need this operation done simultaneously.
Thanks!
As I have said, Ring buffer is what you need. No need to shift there, only one counter and a fixed size array.
Let's think how we may deal with 500 of ring buffers.
If you want to have 500 (let it be 512) sliding windows and process them all on the GPU, then you might pack them into one big 2D texture, where each column is an array of samples for the same moment.
If you're getting new samples for each of the vector at once (I mean one new sample for each 512 buffers at one processing step), then this "ring texture" (like a cylinder) only needs to be updated once (upload the array of new samples at each step) and you need just one counter.
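To make the single-counter idea concrete, here is a host-side sketch in plain C++ (no GPU code). RingWindows is a hypothetical class: it stores all windows in one flat array, one "column" of samples per time step, and a single head index marks the slot that will be overwritten next. The same layout could be kept in device memory, with push replaced by one upload of the new column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: numVectors sliding windows of equal width that share
// one head counter. Slot s holds the samples of all vectors for one time step.
class RingWindows {
public:
    RingWindows(std::size_t numVectors, std::size_t width)
        : numVectors_(numVectors), width_(width), head_(0),
          data_(numVectors * width, 0.0f) {
        assert(numVectors > 0 && width > 0);
    }

    // Store one new sample per vector, overwriting the oldest time step.
    void push(const std::vector<float>& samples) {
        assert(samples.size() == numVectors_);
        for (std::size_t v = 0; v < numVectors_; ++v) {
            data_[head_ * numVectors_ + v] = samples[v];
        }
        head_ = (head_ + 1) % width_;  // nothing is shifted, only the counter moves
    }

    // Sample of vector v that is `age` steps old (age 0 is the newest).
    float at(std::size_t v, std::size_t age) const {
        assert(v < numVectors_ && age < width_);
        const std::size_t slot = (head_ + width_ - 1 - age) % width_;
        return data_[slot * numVectors_ + v];
    }

private:
    std::size_t numVectors_;
    std::size_t width_;
    std::size_t head_;
    std::vector<float> data_;
};
```

For the case in the question this would be RingWindows(500, 10), with one push per incoming batch of 500 values.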
I highly recommend using a different, yet still free, library for this problem. In 4 lines of ArrayFire code, you can do all 500 vectors, as follows:
array val = array(window_width, num_vectors);
val = shift(val, 0, 1);
array newValue = array(1,num_vectors);
val(span,end) = newValue;
I benchmarked against Thrust code for the same and ArrayFire is getting about a 10X speedup over Thrust.
Downside is that ArrayFire is not open source, but it is still free for this sort of problem.
What you want is simply thrust::copy. You can't do a shift in place in parallel, because you can't guarantee a value is read before it is written.
int n = 9;
thrust::device_vector<float> val_in(n+1);
thrust::device_vector<float> val_out(n+1);
thrust::copy(val_in.begin() + 1, val_in.end(), val_out.begin());
// add the new value to the last position
val_out[n] = newValue;
I am trying to write a bag-of-features image recognition system. One step in the algorithm is to take a large number of small image patches (say 7x7 or 11x11 pixels) and try to cluster them into groups that look similar. I get my patches from an image, turn them into gray-scale floating point image patches, and then try to get cvKMeans2 to cluster them for me. I think I am having problems formatting the input data such that KMeans2 returns coherent results. I have used KMeans for 2D and 3D clustering before but 49D clustering seems to be a different beast.
I keep getting garbage values for the returned clusters vector, so obviously this is a garbage in / garbage out type problem. Additionally the algorithm runs way faster than I think it should for such a huge data set.
In the code below the straight memcpy is only my latest attempt at getting the input data in the correct format, I spent a while using the built in OpenCV functions, but this is difficult when your base type is CV_32FC(49).
Can OpenCV 1.1's KMeans algorithm support this sort of high dimensional analysis?
Does someone know the correct method of copying from images to the K-Means input matrix?
Can someone point me to a free, Non-GPL KMeans algorithm I can use instead?
This isn't the best code as I am just trying to get things to work right now:
std::vector<int> DoKMeans(std::vector<IplImage *>& chunks){
// the size of one image patch, CELL_SIZE = 7
int chunk_size = CELL_SIZE*CELL_SIZE*sizeof(float);
// create the input data, CV_32FC(49) is 7x7 float object (I think)
CvMat* data = cvCreateMat(chunks.size(),1,CV_32FC(49) );
// Create a temporary vector to hold our data
// we'll copy into the matrix for KMeans
int rdsize = chunks.size()*CELL_SIZE*CELL_SIZE;
float * rawdata = new float[rdsize];
// Go through each image chunk and copy the
// pixel values into the raw data array.
vector<IplImage*>::iterator iter;
int k = 0;
for( iter = chunks.begin(); iter != chunks.end(); ++iter )
{
for( int i =0; i < CELL_SIZE; i++)
{
for( int j=0; j < CELL_SIZE; j++)
{
CvScalar val;
val = cvGet2D(*iter,i,j);
rawdata[k] = (float)val.val[0];
k++;
}
}
}
// Copy the data into the CvMat for KMeans
// I have tried various methods, but this is just the latest.
memcpy( data->data.ptr,rawdata,rdsize*sizeof(float));
// Create the output array
CvMat* results = cvCreateMat(chunks.size(),1,CV_32SC1);
// Do KMeans
int r = cvKMeans2(data, 128,results, cvTermCriteria(CV_TERMCRIT_EPS+CV_TERMCRIT_ITER, 1000, 0.1));
// Copy the grouping information to our output vector
vector<int> retVal;
for( int y = 0; y < chunks.size(); y++ )
{
CvScalar cvs = cvGet1D(results, y);
int g = (int)cvs.val[0];
retVal.push_back(g);
}
return retVal;
}
Thanks in advance!
Though I'm not familiar with "bag of features", have you considered using feature points like corner detectors and SIFT?
You might like to check out http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/ for another open source clustering package.
Using memcpy like this seems suspect, because when you do:
int rdsize = chunks.size()*CELL_SIZE*CELL_SIZE;
If CELL_SIZE and chunks.size() are very large you are creating something large in rdsize. If this is bigger than the largest storable integer you may have a problem.
Are you wanting to change "chunks" in this function?
I'm guessing that you don't as this is a K-means problem.
So try passing by reference to const here. (And generally speaking this is what you will want to be doing)
so instead of:
std::vector<int> DoKMeans(std::vector<IplImage *>& chunks)
it would be:
std::vector<int> DoKMeans(const std::vector<IplImage *>& chunks)
Also, in this case it is better to use static_cast than the old C-style casts (for example static_cast<float>(variable) as opposed to (float)variable).
Also you may want to delete "rawdata":
float * rawdata = new float[rdsize];
can be deleted with:
delete[] rawdata;
otherwise you may be leaking memory here.
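Putting those suggestions together, a sketch of the packing step might look like the following. packPatches is a hypothetical helper and the patches are assumed to be already converted to plain float vectors, so no OpenCV types are involved. It takes its input by reference to const, does the size arithmetic in size_t, and lets std::vector own the buffer so there is nothing to delete[].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: flattens equally sized patches into one row-major
// sample buffer (one row of patchLen floats per patch).
std::vector<float> packPatches(const std::vector<std::vector<float>>& patches,
                               std::size_t patchLen) {
    std::vector<float> samples;
    // size_t arithmetic avoids overflowing int for very large inputs.
    samples.reserve(patches.size() * patchLen);
    for (const std::vector<float>& p : patches) {
        assert(p.size() == patchLen);  // every patch must have the same length
        samples.insert(samples.end(), p.begin(), p.end());
    }
    return samples;  // freed automatically when it goes out of scope
}
```

The resulting buffer has patches.size() rows of patchLen values each and could then be copied into whatever matrix type the clustering routine expects.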
I need some C++/pointer help. When I create an RGB IplImage and I want to access i,j I use the following C++ class taken from: http://www.cs.iit.edu/~agam/cs512/lect-notes/opencv-intro/opencv-intro.html
template<class T> class Image
{
private:
IplImage* imgp;
public:
Image(IplImage* img=0) {imgp=img;}
~Image(){imgp=0;}
void operator=(IplImage* img) {imgp=img;}
inline T* operator[](const int rowIndx) {
return ((T *)(imgp->imageData + rowIndx*imgp->widthStep));}
};
typedef struct{
unsigned char b,g,r;
} RgbPixel;
typedef struct{
float b,g,r;
} RgbPixelFloat;
typedef Image<RgbPixel> RgbImage;
typedef Image<RgbPixelFloat> RgbImageFloat;
typedef Image<unsigned char> BwImage;
typedef Image<float> BwImageFloat;
I've been working with CUDA, so sometimes I have to put all the data into an array. I like to keep every channel in its own array; it seems easier to handle the data that way. So I would usually do something like this:
IplImage *image = cvLoadImage("whatever.tif");
RgbImageFloat img(image);
for(int i = 0; i < image->height; i++)
{
for(int j = 0; j < image->width; j++)
{
hostr[j*image->height+i] = img[i][j].r;
hostg[j*image->height+i] = img[i][j].g;
hostb[j*image->height+i] = img[i][j].b;
}
}
I would then copy my data to the device, do some stuff with it, get it back to the host and then loop, yet again, through the array assigning the data back to the IplImage and saving my results.
It seems like I'm looping too much; there has to be a faster, more efficient way to do this with pointers, but I'm lost. Is there a way I can simply use a pointer for every channel? I tried doing something like this but it didn't work:
float *hostr = &img[0][0].r;
float *hostg = &img[0][0].g;
float *hostb = &img[0][0].b;
Any suggestions? Thanks!
EDIT:
Thanks everyone for answering. Maybe I wasn't very clear in my question. I am familiar with how to access channels and their data. What I am interested in is increasing the performance and efficiency of completely copying data off the IplImage to a standard array, more along the lines of what csl said so far. The problem I see is that the data in an IplImage is arranged as "rgbrgbrgbrgb".
Firstly, if you're comfortable with C++, you should consider using OpenCV 2.0 which does away with different data types for images and matrices (IplImage* and CvMat*) and uses one structure (Mat) to handle both. Apart from automatic memory management and a truckload of useful routines to handle channels, etc. and some MATLAB-esque ones as well, it's really fun to use.
For your specific problem, you access the channels of an IplImage* with Mat, like this:
IplImage *image = cvLoadImage("lena.bmp");
Mat Lena(image);
vector<Mat> Channels;
split(Lena,Channels);
namedWindow("LR",CV_WINDOW_AUTOSIZE);
imshow("LR",Channels[0]);
waitKey();
Now you have the copies of each channel in the vector Channels.
If you don't want to use OpenCV2.0 and extract channels, note the following. OpenCV orders multi-channel images in the following manner:
x(1,1,1) x(1,1,2) x(1,1,3) x(1,2,1) x(1,2,2) x(1,2,3) ...
where x(i,j,k) = an element in row i of column j in channel k
Also, OpenCV pads its images, so don't forget to jump rows with widthStep, which accounts for these padding gaps. And along the lines of what csl said, advance your row pointer in the outer loop (using widthStep) and increment this pointer to access elements in a row.
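A sketch of that row-pointer walk, written against a raw byte buffer rather than an IplImage so that it stands alone. splitBgrFloat and Planes are hypothetical names; the parameters mirror the imageData, width, height and widthStep fields, and the image is assumed to hold three interleaved float channels in b, g, r order. The planes are written row by row, so the index calculation reduces to a running counter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// One flat array per channel, each holding width*height floats in row-major order.
struct Planes {
    std::vector<float> b, g, r;
};

// Hypothetical helper: de-interleaves a 3-channel float image whose rows are
// widthStep bytes apart (widthStep may include padding at the end of a row).
Planes splitBgrFloat(const unsigned char* imageData, int width, int height,
                     std::size_t widthStep) {
    assert(imageData != nullptr && width > 0 && height > 0);
    assert(widthStep >= static_cast<std::size_t>(width) * 3 * sizeof(float));
    const std::size_t n = static_cast<std::size_t>(width) * height;
    Planes out;
    out.b.resize(n);
    out.g.resize(n);
    out.r.resize(n);
    std::size_t k = 0;  // running output index, no per-pixel multiplication
    for (int i = 0; i < height; ++i) {
        // Jump to the start of row i using widthStep, not width.
        const unsigned char* row = imageData + i * widthStep;
        for (int j = 0; j < width; ++j, ++k) {
            float px[3];
            std::memcpy(px, row + j * 3 * sizeof(float), sizeof(px));
            out.b[k] = px[0];
            out.g[k] = px[1];
            out.r[k] = px[2];
        }
    }
    return out;
}
```

With an IplImage this would be called as splitBgrFloat(reinterpret_cast<const unsigned char*>(image->imageData), image->width, image->height, image->widthStep), assuming the image really stores 32-bit floats.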
NOTE:
Since you're using 2.0 now, you can bypass IplImage* with Mat Lena = imread("Lena.bmp");.
There is room for a lot of improvement here. So much, that you should read up on how people access bitmaps.
First of all, increase memory locality as much as possible. This will increase cache hits, and performance. I.e., don't use three separate arrays for each color channel. Store each together, since you probably will be working mostly on pixels.
Secondly, don't do that y*width calculation for every pixel. When done in an inner loop, it consumes a lot of cycles.
Lastly, if you just want a complete copy of the image, then you could simply do a memcpy(), which is very fast. I couldn't deduce if you converted from floats to integers, but if not, use memcpy() for non-overlapping regions.
If you wonder how you can do this with pointers (kind of pseudo-code, and also not tested):
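float *dst = hostg; // hostg is a flat array with one float per pixel
RgbPixelFloat *src = &img[0][0];
RgbPixelFloat *end = &img[HEIGHT-1][WIDTH-1] + 1; // one past the last pixel; assumes rows have no padding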
// copy green channel of whole image
while ( src != end ) {
*dst = src->g;
++dst;
++src;
}