For one of my projects, I took an interest in Caffe, and more generally in deep learning. After several hours, I managed to get Caffe installed on my computer, and I am now trying to make use of it.
So I have already loaded the network as follows:
std::string model_file = "/home/CXX/Desktop/caffemodel/deploy.prototxt";
std::string trained_file = "/home/CXX/Desktop/caffemodel/modelWeights.caffemodel";
Caffe::set_mode(Caffe::CPU);
boost::shared_ptr<Net<float>> net_;
net_.reset(new Net<float>(model_file, TEST));
net_->CopyTrainedLayersFrom(trained_file);
The loaded network and weights are not mine. Please find below the structure of the input and output layers:
name: "simple_conv-dense"
input: "data"
input_dim: 1
input_dim: 1
input_dim: 250
input_dim: 250
layer {
name: "conv1"
bottom: "data"
type: "Convolution"
top: "conv1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
The input is a single 250x250 depth "image" (values normalized between 0 and 1). The pre-processing is already handled, and my data is stored in a Matrix (personal library, pointers to std::vector elements), so that it can be accessed like a 2D array (data[i][j]).
The output of the network is organized in this order: [NbBlob][NbClass][outHeight][outWidth], giving in my case [1][46][250][250]
I have already written the code to retrieve the output:
Blob<float>* output_layer = net_->output_blobs()[0];
const float* begin = output_layer->cpu_data();
for (int k = 0; k < 46; k++)
    for (int h = 0; h < 250; h++)
        for (int w = 0; w < 250; w++) {
            currentprob = *(begin + ((k * 250 + h) * 250 + w));
        }
This code has been checked by summing the 46 pixel-wise class predictions, which, as expected, gives 1 for any single pixel.
My problem is that I do not know how to feed my data into the network. I first retrieve the input layer with this method:
Blob<float>* input_layer = net_->input_blobs()[0];
From the debugger, I know that input_layer has an attribute named capacity_ which has the expected value (62,500, being 250*250).
So here is my question: how can one feed data into the input layer? I have spent quite some time looking by myself, but I no longer have any idea where to look.
Please note that I am not using OpenCV, and that I have barely any background in deep learning (I am a Bachelor student).
Thank you for the time you might spend helping me. Any kind of help (documentation, pseudo-code, code, explanations) is very welcome.
PS: using namespace caffe;
EDIT: added more input layer info. Typos.
I would try to push the data directly into the net:
Blob<float>* input_layer = net_->input_blobs()[0];
float* input_data = input_layer->mutable_cpu_data(); // get pointer to Blob's data storage
for ( int i=0; i < 250; i++ ) {
for ( int j=0; j < 250; j++ ) {
input_data[i*250 + j] = data[i][j]; // I hope I did not flip anything here...
}
}
net_->forward(); // do forward pass
Depending on how your data is arranged, you might be able to replace the nested loop with a more elegant memcpy...
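For example, if every row of your Matrix is stored contiguously as floats, a row-wise memcpy might look like the sketch below (I am assuming &data[i][0] gives a pointer to the contiguous storage of row i; if your elements are doubles you still need the element-wise copy, since a conversion to float is required):
#include <cstring>
Blob<float>* input_layer = net_->input_blobs()[0];
float* input_data = input_layer->mutable_cpu_data();
for (int i = 0; i < 250; i++) {
    // copy one full row of 250 floats at once
    std::memcpy(input_data + i * 250, &data[i][0], 250 * sizeof(float));
}
net_->forward();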
Related
I'm writing code for SVM detection using OpenCV. For the training data, I have two matrices (positive and negative features) created by:
const size_t N=12;
std::vector<std::array<int,N>> matrixForTrainingDataPos;
std::vector<std::array<int,N>> matrixForTrainingDataNeg;
populated with 12 features for each image. I have 100 positive images and 140 negative images, so matrixForTrainingDataPos is [100][12] and matrixForTrainingDataNeg is [140][12]. Now I have to concatenate them to get:
float trainingData[240][12] = {--------};
Mat trainingDataMat(240, 12, CV_32FC1, trainingData);
I tried some operations such as push_back but did not succeed. I did, however, manage to build an array of 240 elements for the labels: 100 with 1 and 140 with -1, using two for loops. The next step is to save trainingData to an XML file so that, when the program is launched, it creates the file if it does not exist and otherwise skips all the trainingData processing that has already been done.
Can you help me?
Thanks!
int count = 0;
for(int i = 0; i < matrixForTrainingDataPos.size(); i++)
{
for (int j = 0; j < N; j++)
{
trainingData[count][j] = matrixForTrainingDataPos[i][j];
}
count++;
}
/* copy negative sample matrix data */
It works. The compiler did complain, though, so I had to declare the i and j variables as unsigned: for (unsigned i=0......
As for my second question: how do I save and load this matrix to/from an XML file on the first run, so that it does not have to be recalculated in later runs?
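For the save/load part, here is a minimal sketch using OpenCV's cv::FileStorage (the file name, the node names, and labelsMat, standing for your 240x1 label matrix, are just my assumptions):
#include <opencv2/core/core.hpp>
// first run: write the matrices
cv::FileStorage fsWrite("trainingData.xml", cv::FileStorage::WRITE);
fsWrite << "trainingData" << trainingDataMat;
fsWrite << "labels" << labelsMat;
fsWrite.release();
// later runs: read them back
cv::FileStorage fsRead("trainingData.xml", cv::FileStorage::READ);
cv::Mat loadedData, loadedLabels;
fsRead["trainingData"] >> loadedData;
fsRead["labels"] >> loadedLabels;
fsRead.release();
You can test fsRead.isOpened() (or simply whether the file exists) to decide whether to recompute the features or just load them.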
I'm writing a program with OpenCV and a stereo camera. I want to know which detected point in the first camera corresponds to which detected point in the second camera. The thing is, I have some detector, extractor and matcher methods, and following the OpenCV example I have an algorithm to filter the matches and draw only the good ones, but in my case the min_dist parameter depends on my trackbar position.
This is the code of the OpenCV example: http://docs.opencv.org/doc/tutorials/features2d/feature_flann_matcher/feature_flann_matcher.html#feature-flann-matcher
And here are the changes I made so that the minimum distance between matches can be varied.
//TrackBar position
dist_track = getTrackbarPos(nombreTrackbar, BUTTON_WINDOW);
cout <<"Posicion de la barra: " << dist_track << endl;
good_matches.clear();
//Obtain good_matches
for( int i = 0; i < descriptors[0].rows; i++ )
{ if( matches[i].distance <= coef*dist_track)
{ good_matches.push_back( matches[i]);}
}
The main thing is that when I put the trackbar at the beginning I get correct matches, but when the trackbar is at the end, the matches I find are not correct. In that case I find a lot of matches, but many of them are wrong.
Now I'm trying to do this properly. I want to use a mask in the drawMatches function to force the detected second-camera points to be near the epipolar line. Can someone tell me something about it?
Does someone know how to use the mask parameter to force the found matches to lie on the epipolar line?
Or how to create the mask parameter?
Thanks friends!
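For reference, the mask the matcher accepts is a CV_8UC1 matrix with one row per query (left) descriptor and one column per train (right) descriptor; a non-zero entry allows that pair to match. A rough sketch, where descriptors_left/right, keypoints_left/right, max_dist and distanceToEpipolarLine() are my own placeholder names rather than code from the question, could be:
cv::Mat mask = cv::Mat::zeros(descriptors_left.rows, descriptors_right.rows, CV_8UC1);
for (int i = 0; i < descriptors_left.rows; i++) {
    for (int j = 0; j < descriptors_right.rows; j++) {
        // allow the pair only if the right keypoint is close to the epipolar
        // line of the left keypoint (distanceToEpipolarLine is hypothetical)
        if (distanceToEpipolarLine(keypoints_left[i], keypoints_right[j]) < max_dist)
            mask.at<uchar>(i, j) = 1;
    }
}
matcher.match(descriptors_left, descriptors_right, matches, mask);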
Finally, I decided to change my approach. I'm trying to crop the original images and keep only the necessary information. I mean, I remove the parts of the photo that don't matter for my application and only keep the information that I'm going to use.
My idea is to use the epipolar lines of both cameras to determine the area of interest: I will calculate where the epipolar lines are in both images, then crop the images and keep only the information where the epipolar lines are.
Doing this I obtain two new images, and my idea is to pass the new images to the matcher method to see if I can obtain more successful matching.
Image before cropping:
Image after cropping:
However, I have a problem with the computation time. My code has a high computational cost and sometimes the program fails. The error says "Segmentation fault: 11".
"Bus error 10" appears if I remove the waitKey() line.
My code to copy the main image content into the second one is here:
for (int i = 0; i < RightEpipolarLines.rows; i++) {
    float m = -RightEpipolarLines(i, 0)/RightEpipolarLines(i, 1);
    float n = -RightEpipolarLines(i, 2)/RightEpipolarLines(i, 1);
    for (int x = 0; x < 480; x++) {
        float y_prima = m*x + n;
        int y = int (y_prima);
        (cut_image[0].at<float>(y, x)) = capture[0].at<float>(y, x);
    }
}
waitKey();
for (int i = 0; i < LeftEpipolarLines.rows; i++) {
    float m = LeftEpipolarLines(i, 0)/LeftEpipolarLines(i, 1);
    float n = -LeftEpipolarLines(i, 2)/LeftEpipolarLines(i, 1);
    for (int x = 0; x < 480; x++) {
        float y_prima = m*x + n;
        int y = int (y_prima);
        (cut_image[1].at<float>(y, x)) = capture[1].at<float>(y, x);
    }
}
waitKey();
Does someone know how to pass the information from the real capture to cut_image more efficiently? I would only like to copy the pixel information near the epipolar lines.
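As a side note, the segmentation fault is most likely y = m*x + n landing outside the image, so a bounds check before the .at<> access would be advisable. For the copy itself, one possible sketch (line_thickness, meaning how close to an epipolar line a pixel must be to be kept, is my assumption) is to draw the epipolar lines into a CV_8UC1 mask and let OpenCV copy only the masked pixels, which works for images of any type:
cv::Mat mask = cv::Mat::zeros(capture[0].size(), CV_8UC1);
int line_thickness = 5; // pixels around each epipolar line to keep (assumed value)
for (int i = 0; i < RightEpipolarLines.rows; i++) {
    float m = -RightEpipolarLines(i, 0)/RightEpipolarLines(i, 1);
    float n = -RightEpipolarLines(i, 2)/RightEpipolarLines(i, 1);
    // draw the full line; cv::line clips it to the image borders automatically
    cv::line(mask, cv::Point(0, cvRound(n)),
             cv::Point(capture[0].cols - 1, cvRound(m*(capture[0].cols - 1) + n)),
             cv::Scalar(255), line_thickness);
}
cut_image[0] = cv::Mat::zeros(capture[0].size(), capture[0].type());
capture[0].copyTo(cut_image[0], mask); // copies only the pixels where mask is non-zero
The same mask-based copy can then be repeated for the left image.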
I am using the v4l2 API to grab images from a Microsoft Lifecam and then transferring these images over TCP to a remote computer. I am also encoding the video frames to MPEG2VIDEO using the ffmpeg API. The recorded videos play too fast, which is probably because not enough frames have been captured and because of incorrect FPS settings.
The following is the code which converts a YUV422 source to an RGB888 image. This code fragment is the bottleneck in my code, as it takes nearly 100-150 ms to execute, which means I can't log more than 6-10 FPS at 1280 x 720 resolution. The CPU usage is at 100% as well.
for (int line = 0; line < image_height; line++) {
    for (int column = 0; column < image_width; column++) {
        *dst++ = CLAMP((double)*py + 1.402*((double)*pv - 128.0));                               // R - first byte
        *dst++ = CLAMP((double)*py - 0.344*((double)*pu - 128.0) - 0.714*((double)*pv - 128.0)); // G - next byte
        *dst++ = CLAMP((double)*py + 1.772*((double)*pu - 128.0));                               // B - next byte
        vid_frame->data[0][line * frame->linesize[0] + column] = *py;
        // increment py, pu, pv here
    }
}
'dst' is then compressed as jpeg and sent over TCP and 'vid_frame' is saved to the disk.
How can I make this code fragment faster so that I can get at least 30 FPS at 1280x720, compared to the present 5-6 FPS?
I've tried parallelizing the for loop across three threads using pthreads, processing one third of the rows in each thread.
for (int line = 0; line < image_height/3; line++) // thread 1
for (int line = image_height/3; line < 2*image_height/3; line++) // thread 2
for (int line = 2*image_height/3; line < image_height; line++) // thread 3
This gave me only a minor improvement of 20-30 milliseconds per frame.
What would be the best way to parallelize such loops? Can I use GPU computing or something like OpenMP? Would spawning some 100 threads to do the calculations help?
I also noticed higher frame rates with my laptop webcam as compared to the Microsoft USB Lifecam.
Here are other details:
Ubuntu 12.04, ffmpeg 2.6
AMD A8 quad-core processor with 6 GB RAM
Encoder settings:
codec: AV_CODEC_ID_MPEG2VIDEO
bitrate: 4000000
time_base: (AVRational){1, 20}
pix_fmt: AV_PIX_FMT_YUV420P
gop: 10
max_b_frames: 1
If all you care about is fps and not ms per frame (latency), another option would be a separate thread per frame.
Threading is not the only option for speed improvements. You could also perform integer operations as opposed to floating point. And SIMD is an option. Using an existing library like sws_scale will probably give you the best performance.
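For reference, a minimal sketch of using libswscale for the conversion might look like this (assuming the source is packed YUYV, which is what V4L2 usually delivers for YUV422, and that src and dst are the same buffers as in the question):
#include <libavutil/pixfmt.h>
#include <libswscale/swscale.h>
// created once, reused for every frame
struct SwsContext *ctx = sws_getContext(
    image_width, image_height, AV_PIX_FMT_YUYV422,   // source format
    image_width, image_height, AV_PIX_FMT_RGB24,     // destination format
    SWS_BILINEAR, NULL, NULL, NULL);
const uint8_t *src_planes[1] = { src };              // packed YUYV buffer
int src_strides[1] = { image_width * 2 };            // 2 bytes per pixel
uint8_t *dst_planes[1] = { dst };                    // packed RGB24 buffer
int dst_strides[1] = { image_width * 3 };            // 3 bytes per pixel
sws_scale(ctx, src_planes, src_strides, 0, image_height, dst_planes, dst_strides);
// sws_freeContext(ctx) when you are done with the stream
A second context with AV_PIX_FMT_YUV420P as the destination could fill vid_frame the same way, so both outputs come from SIMD-optimized code.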
Make sure you are compiling with -O3 (or -Os).
Make sure debug symbols are disabled.
Move repeated operations outside the loop, e.g.
// the compiler can't do this hoisting itself, because another thread could change frame->linesize[0]
int row = line * frame->linesize[0];
for (int column = 0; column < image_width; column++) {
...
vid_frame->data[0][row + column] = *py;
You can precompute tables, so there is no math in the loop:
void init() {
    for (int py = 0; py <= 255; ++py)
        for (int pv = 0; pv <= 255; ++pv)
            // R component for every (pv, py) pair; note py is the luma here
            ytable[pv][py] = CLAMP(py + 1.402*(pv - 128.0));
}
for (int column = 0; column < image_width; column++) {
*dst++ = ytable[*pv][*py];
Just to name a few options.
I think unless you want to reinvent the painful wheel, using pre-existing options (ffmpeg's libswscale or ffmpeg's scale filter, gstreamer's scale plugin, etc.) is a much better option.
But if you want to reinvent the wheel for whatever reason, show the code you used. For example, thread startup is expensive, so you'd want to create the threads before measuring your looptime and reuse threads from frame-to-frame. Better yet is frame-threading, but that adds latency. This is usually ok but depends on your use case. More importantly, don't write C code, learn to write x86 assembly (simd), all previously mentioned libraries use simd for such conversions, and that'll give you a 3-4x speedup (since it allows you to do 4-8 pixels instead of 1 per iteration).
You could build blocks of x lines and convert each block in a separate thread
do not mix integer and floating point arithmetic!
char x;
char y=((double)x*1.5); /* ouch casting double<->int is slow! */
char z=(x*3)>>1; /* fixed point arithmetic rulez */
use SIMD (though this would be easier if both input and output data were properly aligned...e.g. by using RGB8888 as output)
use OpenMP
An alternative that does not require coding the processing yourself would be to simply do your entire pipeline in a framework that does proper timestamping throughout (starting at image acquisition time) and is hopefully optimized enough to deal with large data, e.g. gstreamer.
Would something like this not work?
#pragma omp parallel for
for (int line = 0; line < image_height; line++) {
    for (int column = 0; column < image_width; column++) {
        dst[ ( image_width*line + column )*3     ] = CLAMP((double)*py + 1.402*((double)*pv - 128.0)); // R - first byte
        dst[ ( image_width*line + column )*3 + 1 ] = CLAMP((double)*py - 0.344*((double)*pu - 128.0) - 0.714*((double)*pv - 128.0)); // G - next byte
        dst[ ( image_width*line + column )*3 + 2 ] = CLAMP((double)*py + 1.772*((double)*pu - 128.0)); // B - next byte
        vid_frame->data[0][line * frame->linesize[0] + column] = *py;
        // increment py, pu, pv here
    }
}
Of course, you also have to handle the incrementing of py, pu, pv accordingly.
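Since the three pointers cannot simply be shared and incremented when iterations run in parallel, each iteration has to compute its own offsets. A sketch, assuming the source buffer src is packed YUYV (Y0 U Y1 V), which is what V4L2 typically delivers for YUV422, could be:
#pragma omp parallel for
for (int line = 0; line < image_height; line++) {
    for (int column = 0; column < image_width; column++) {
        int pixel = line * image_width + column;
        const unsigned char *py = src + 2 * pixel;            // Y of this pixel
        const unsigned char *pu = src + 4 * (pixel / 2) + 1;  // U shared by the pixel pair
        const unsigned char *pv = src + 4 * (pixel / 2) + 3;  // V shared by the pixel pair
        // ...same per-pixel math as above, writing to dst[pixel*3 + 0/1/2]
    }
}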
Pixel format conversion is usually performed using only integer arithmetic.
This avoids conversions between floating-point and integer values.
It also allows more effective use of the SIMD extensions of modern CPUs.
For example, here is code for converting YUV to BGR:
const int Y_ADJUST = 16;
const int UV_ADJUST = 128;
const int YUV_TO_BGR_AVERAGING_SHIFT = 13;
const int YUV_TO_BGR_ROUND_TERM = 1 << (YUV_TO_BGR_AVERAGING_SHIFT - 1);
const int Y_TO_RGB_WEIGHT = int(1.164*(1 << YUV_TO_BGR_AVERAGING_SHIFT) + 0.5);
const int U_TO_BLUE_WEIGHT = int(2.018*(1 << YUV_TO_BGR_AVERAGING_SHIFT) + 0.5);
const int U_TO_GREEN_WEIGHT = -int(0.391*(1 << YUV_TO_BGR_AVERAGING_SHIFT) + 0.5);
const int V_TO_GREEN_WEIGHT = -int(0.813*(1 << YUV_TO_BGR_AVERAGING_SHIFT) + 0.5);
const int V_TO_RED_WEIGHT = int(1.596*(1 << YUV_TO_BGR_AVERAGING_SHIFT) + 0.5);
inline int RestrictRange(int value, int min = 0, int max = 255)
{
return value < min ? min : (value > max ? max : value);
}
inline int YuvToBlue(int y, int u)
{
return RestrictRange((Y_TO_RGB_WEIGHT*(y - Y_ADJUST) +
U_TO_BLUE_WEIGHT*(u - UV_ADJUST) +
YUV_TO_BGR_ROUND_TERM) >> YUV_TO_BGR_AVERAGING_SHIFT);
}
inline int YuvToGreen(int y, int u, int v)
{
return RestrictRange((Y_TO_RGB_WEIGHT*(y - Y_ADJUST) +
U_TO_GREEN_WEIGHT*(u - UV_ADJUST) +
V_TO_GREEN_WEIGHT*(v - UV_ADJUST) +
YUV_TO_BGR_ROUND_TERM) >> YUV_TO_BGR_AVERAGING_SHIFT);
}
inline int YuvToRed(int y, int v)
{
return RestrictRange((Y_TO_RGB_WEIGHT*(y - Y_ADJUST) +
V_TO_RED_WEIGHT*(v - UV_ADJUST) +
YUV_TO_BGR_ROUND_TERM) >> YUV_TO_BGR_AVERAGING_SHIFT);
}
This code is taken from here (http://simd.sourceforge.net/). There you can also find versions optimized for different SIMD instruction sets.
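For completeness, a sketch of how these helpers could replace the floating-point math in the original loop (keeping the py/pu/pv pointers from the question) might be:
for (int line = 0; line < image_height; line++) {
    for (int column = 0; column < image_width; column++) {
        *dst++ = YuvToBlue(*py, *pu);
        *dst++ = YuvToGreen(*py, *pu, *pv);
        *dst++ = YuvToRed(*py, *pv);
        // increment py, pu, pv here, as in the original code
    }
}
This writes BGR order, so swap the three calls if you need RGB as in the question. Also note that these weights assume "studio" range luma (Y_ADJUST = 16 and the 1.164 weight), whereas the constants in the question assume full-range Y, so the output will differ slightly.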
I'm trying to partition a cv::Mat into smaller cv::Mats using OpenCV. I found this method online but I can't get it to work. I want to partition a cv::Mat of, say, 640 x 480 into blocks of, say, 32 x 32 and operate on each block individually as I go along.
Here is my code. curr_frame contains the total image as a cv::Mat. N_per_col and N_per_row contain the number of mb_sz x mb_sz blocks per column and row respectively.
void ClassName::partition( void )
{
for( i = 0; i < N_per_col; i += mb_sz )
{
for( j = 0; j < N_per_row; j += mb_sz )
{
cv::Mat tmp_img( curr_frame, cv::Rect( i, j, mb_sz, mb_sz ) );
// Do stuff with tmp_img here
}
}
}
This compiles fine but at runtime I get an image full of NULL pixels in tmp_img. curr_frame is definitely OK, as I can view it with imshow().
The documentation is not very clear on this, so any help would be greatly appreciated.
As I mentioned in the comments, the code is correct. To be sure, I tested it with OpenCV 2.4.1 and the result was as you would expect, so I guess the problem lies with something else not shown here.
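For reference, a common way to iterate over mb_sz x mb_sz blocks by pixel coordinates looks like the sketch below (note that cv::Rect takes (x, y, width, height), i.e. the column offset first); this is just an illustration of the ROI mechanism, not a claim about what is wrong in your setup:
for (int y = 0; y + mb_sz <= curr_frame.rows; y += mb_sz)
{
    for (int x = 0; x + mb_sz <= curr_frame.cols; x += mb_sz)
    {
        cv::Mat block = curr_frame(cv::Rect(x, y, mb_sz, mb_sz)); // a view, no data copied
        // do stuff with block here; use block.clone() if you need an independent copy
    }
}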
I'm trying to create a simple OCR application with SVM, openCV, C++ and Visual Studio 2008 (mfc app).
My training samples are binary images of machine-printed digits (0-9). I want to use DAGSVM for this multi-class problem, so I need to create 45 SVMs, each separating a pair of classes (SVM(0,1), SVM(0,2)... SVM(8,9)).
Here's how things are going:
SVM's parameters:
CvSVMParams params;
params.svm_type = CvSVM::C_SVC;
params.kernel_type = CvSVM::LINEAR;
params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 100, 1e-6);
Data of training images of class i are stored in matrix trainData[i] (each row is the pixels of a 28x28 image, which means the matrix has 784 cols).
When training each SVM, I create two matrices called curTrainData & curTrainLabel.
for (int i = 0; i < 9; i++)
for (int j = i+1; j < 10; j++)
{
curTrainData.create(trainData[i].rows + trainData[j].rows, 784, CV_32FC1);
curTrainLabel.create(curTrainData.rows, 1, CV_32FC1);
// merge 2 matrix: trainData[i] & trainData[j]
for (int k = 0; k < trainData[i].rows; k++)
{
curTrainLabel.at<float>(k, 0) = 1.0; // class of digit i
for (int l = 0; l < 784; l++)
curTrainData.at<float>(k,l) = trainData[i].at<float>(k,l);
}
for (int k = 0; k < trainData[j].rows; k++)
{
curTrainLabel.at<float>(k + trainData[i].rows, 0) = -1.0; // class of digit j
for (int l = 0; l < 784; l++)
curTrainData.at<float>(k + trainData[i].rows,l) = trainData[j].at<float>(k,l);
}
svms[i][j].train(curTrainData, curTrainLabel, Mat(), Mat(), params);
}
I get an error at the call to svms[i][j].train.... The full error is:
Unhandled exception at 0x75b5d36f in svm.exe: Microsoft C++ exception: cv::Exception at memory location 0x0022af8c..
To tell the truth, I don't fully understand the SVM implementation in OpenCV, and I can't find any example of it working with objects in images.
I would be really grateful if someone could tell me what is wrong :(
Update 09/03:
I was mistaken. The error comes from:
str.Format(_T("Results\trained_%d_%d.xml"), i, j);
svms[i][j].save(CT2A(str));
str is a CString variable.
It remains even if I change to:
svms[i][j].save("Results\trained.xml");
I've created the folder Results, and other files are written into it just fine (files from fopen(), imwrite()...). I don't know why I can't include the folder when it comes to this save method of the SVM.
If you use a backslash "\", you have to put "\\" instead (or you can use a forward slash "/").
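Applied to the code above, the corrected call would be, for example:
str.Format(_T("Results\\trained_%d_%d.xml"), i, j); // escaped backslash
// or equivalently:
str.Format(_T("Results/trained_%d_%d.xml"), i, j);  // forward slash
svms[i][j].save(CT2A(str));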