using OpenCV and SVM with images - c++

I am having difficulty with reading an image, extracting features for training, and testing on new images in OpenCV using SVMs. can someone please point me to a great link? I have looked at the OpenCV Introduction to Support Vector Machines. But it doesn't help with reading in images, and I am not sure how to incorporate it.
My goals are to classify pixels in an image. These pixel would belong to a curves. I understand forming the training matrix (for instance,
image A
1,1 1,2 1,3 1,4 1,5
2,1 2,2 2,3 2,4 2,5
3,1 3,2 3,3 3,4 3,5
I would form my training matrix as a [3][2]={ {1,1} {1,2} {1,3} {1,4} {1,5} {2,1} ..{} }
However, I am a little confuse about the labels. From my understanding, I have to specify which row (image) in the training matrix corresponds, which corresponds to a curve or non-curve. But, how can I label a training matrix row (image) if there are some pixels belonging to the curve and some not belonging to a curve. For example, my training matrix is [3][2]={ {1,1} {1,2} {1,3} {1,4} {1,5} {2,1} ..{} }, pixels {1,1} and {1,4} belong to the curve but the rest does not.

I've had to deal with this recently, and here's what I ended up doing to get SVM to work for images.
To train your SVM on a set of images, first you have to construct the training matrix for the SVM. This matrix is specified as follows: each row of the matrix corresponds to one image, and each element in that row corresponds to one feature of the class -- in this case, the color of the pixel at a certain point. Since your images are 2D, you will need to convert them to a 1D matrix. The length of each row will be the area of the images (note that the images must be the same size).
Let's say you wanted to train the SVM on 5 different images, and each image was 4x3 pixels. First you would have to initialize the training matrix. The number of rows in the matrix would be 5, and the number of columns would be the area of the image, 4*3 = 12.
int num_files = 5;
int img_area = 4*3;
Mat training_mat(num_files,img_area,CV_32FC1);
Ideally, num_files and img_area wouldn't be hardcoded, but obtained from looping through a directory and counting the number of images and taking the actual area of an image.
The next step is to "fill in" the rows of training_mat with the data from each image. Below is an example of how this mapping would work for one row.
I've numbered each element of the image matrix with where it should go in the corresponding row in the training matrix. For example, if that were the third image, this would be the third row in the training matrix.
You would have to loop through each image and set the value in the output matrix accordingly. Here's an example for multiple images:
As for how you would do this in code, you could use reshape(), but I've had issues with that due to matrices not being continuous. In my experience I've done something like this:
Mat img_mat = imread(imgname,0); // I used 0 for greyscale
int ii = 0; // Current column in training_mat
for (int i = 0; i<img_mat.rows; i++) {
for (int j = 0; j < img_mat.cols; j++) {
training_mat.at<float>(file_num,ii++) = img_mat.at<uchar>(i,j);
}
}
Do this for every training image (remembering to increment file_num). After this, you should have your training matrix set up properly to pass into the SVM functions. The rest of the steps should be very similar to examples online.
Note that while doing this, you also have to set up labels for each training image. So for example if you were classifying eyes and non-eyes based on images, you would need to specify which row in the training matrix corresponds to an eye and a non-eye. This is specified as a 1D matrix, where each element in the 1D matrix corresponds to each row in the 2D matrix. Pick values for each class (e.g., -1 for non-eye and 1 for eye) and set them in the labels matrix.
Mat labels(num_files,1,CV_32FC1);
So if the 3rd element in this labels matrix were -1, it means the 3rd row in the training matrix is in the "non-eye" class. You can set these values in the loop where you evaluate each image. One thing you could do is to sort the training data into separate directories for each class, and loop through the images in each directory, and set the labels based on the directory.
The next thing to do is set up your SVM parameters. These values will vary based on your project, but basically you would declare a CvSVMParams object and set the values:
CvSVMParams params;
params.svm_type = CvSVM::C_SVC;
params.kernel_type = CvSVM::POLY;
params.gamma = 3;
// ...etc
There are several examples online on how to set these parameters, like in the link you posted in the question.
Next, you create a CvSVM object and train it based on your data!
CvSVM svm;
svm.train(training_mat, labels, Mat(), Mat(), params);
Depending on how much data you have, this could take a long time. After it's done training, however, you can save the trained SVM so you don't have to retrain it every time.
svm.save("svm_filename"); // saving
svm.load("svm_filename"); // loading
To test your images using the trained SVM, simply read an image, convert it to a 1D matrix, and pass that in to svm.predict():
svm.predict(img_mat_1d);
It will return a value based on what you set as your labels (e.g., -1 or 1, based on my eye/non-eye example above). Alternatively, if you want to test more than one image at a time, you can create a matrix that has the same format as the training matrix defined earlier and pass that in as the argument. The return value will be different, though.
Good luck!

Related

Image sensor linear matrix coefficients (color reproduction), how are they applied?

I have some raw images to debayer then apply colour corrections/transforms to. I use OpenCV and C++, and for the image sensor used the linear matrix coefficients are:
1.32 -0.46 0.14
-0.36 1.25 0.11
0.08 -1.96 1.88
I am not sure how to apply these to the image. It's not clear to me what I am supposed to do with them and why.
Can anyone explain what these colour reproduction or colour matrix values are, and how to use them to process an image?
Thank you!
Your question is not clear because it seems you also don't know what to do.
"what I am supposed to do with them"
First thing coming to my mind, you can convolve image with that matrix by using filter2D. According to documentation filter2D:
Convolves an image with the kernel.
The function applies an arbitrary linear filter to an image. In-place
operation is supported. When the aperture is partially outside the
image, the function interpolates outlier pixel values according to the
specified border mode.
Here is the example code snippet hpw tp use it:
Mat output;
Mat kernelMatrix = (Mat_<double>(3, 3) << 1.32, -0.46, 0.14,
-0.36, 1.25, 0.11,
0.08, -1.96, 1.88);
filter2D(rawImage, output, -1, kernelMatrix);
Before debayering you have an array B (-ayer) of MxN filtered "graylevel" values. They are physically filtered in the sense that the the number of photons measured by each one of them is affected by the color filter on top of each sensor site.
After debayering you have an array C (-olor) of MxNx3 BGR values, obtained by (essentially) reindexing the B array. However, each of the 3 values at a (row, col) image location represents 3 physical measurements. This is not the final image because we still need to "convert" the physical measurements to numbers that are representative of color channels as perceived by a human (or, more generally, by the intended user, which could also be some kind of image processing software). That is, the physical values need to be mapped to a color space.
The 3x3 "color correction" matrix you have represents one possible mapping - a simple linear one. You need to apply it in turn to each BGR triple at all (row, col) pixel locations. For example (in python/numpy/cv2):
import numpy as np
def colorCorrect(img, M):
"""Applies a color correction M to a BGR image img"""
rows, cols, depth = img.shape
assert depth == 3
assert M.shape == (3, 3)
img_corr = np.zeros((rows, cols, 3), dtype=img.dtype)
for r in range(rows):
for c in range(cols):
img_corr[r, c, :] = M.dot(img[r, c, :])
return img_corr

How to align 2 images based on their content with OpenCV

I am totally new to OpenCV and I have started to dive into it. But I'd need a little bit of help.
So I want to combine these 2 images:
I would like the 2 images to match along their edges (ignoring the very right part of the image for now)
Can anyone please point me into the right direction? I have tried using the findTransformECC function. Here's my implementation:
cv::Mat im1 = [imageArray[1] CVMat3];
cv::Mat im2 = [imageArray[0] CVMat3];
// Convert images to gray scale;
cv::Mat im1_gray, im2_gray;
cvtColor(im1, im1_gray, CV_BGR2GRAY);
cvtColor(im2, im2_gray, CV_BGR2GRAY);
// Define the motion model
const int warp_mode = cv::MOTION_AFFINE;
// Set a 2x3 or 3x3 warp matrix depending on the motion model.
cv::Mat warp_matrix;
// Initialize the matrix to identity
if ( warp_mode == cv::MOTION_HOMOGRAPHY )
warp_matrix = cv::Mat::eye(3, 3, CV_32F);
else
warp_matrix = cv::Mat::eye(2, 3, CV_32F);
// Specify the number of iterations.
int number_of_iterations = 50;
// Specify the threshold of the increment
// in the correlation coefficient between two iterations
double termination_eps = 1e-10;
// Define termination criteria
cv::TermCriteria criteria (cv::TermCriteria::COUNT+cv::TermCriteria::EPS, number_of_iterations, termination_eps);
// Run the ECC algorithm. The results are stored in warp_matrix.
findTransformECC(
im1_gray,
im2_gray,
warp_matrix,
warp_mode,
criteria
);
// Storage for warped image.
cv::Mat im2_aligned;
if (warp_mode != cv::MOTION_HOMOGRAPHY)
// Use warpAffine for Translation, Euclidean and Affine
warpAffine(im2, im2_aligned, warp_matrix, im1.size(), cv::INTER_LINEAR + cv::WARP_INVERSE_MAP);
else
// Use warpPerspective for Homography
warpPerspective (im2, im2_aligned, warp_matrix, im1.size(),cv::INTER_LINEAR + cv::WARP_INVERSE_MAP);
UIImage* result = [UIImage imageWithCVMat:im2_aligned];
return result;
I have tried playing around with the termination_eps and number_of_iterations and increased/decreased those values, but they didn't really make a big difference.
So here's the result:
What can I do to improve my result?
EDIT: I have marked the problematic edges with red circles. The goal is to warp the bottom image and make it match with the lines from the image above:
I did a little bit of research and I'm afraid the findTransformECC function won't give me the result I'd like to have :-(
Something important to add:
I actually have an array of those image "stripes", 8 in this case, they all look similar to the images shown here and they all need to be processed to match the line. I have tried experimenting with the stitch function of OpenCV, but the results were horrible.
EDIT:
Here are the 3 source images:
The result should be something like this:
I transformed every image along the lines that should match. Lines that are too far away from each other can be ignored (the shadow and the piece of road on the right portion of the image)
By your images, it seems that they overlap. Since you said the stitch function didn't get you the desired results, implement your own stitching. I'm trying to do something close to that too. Here is a tutorial on how to implement it in c++: https://ramsrigoutham.com/2012/11/22/panorama-image-stitching-in-opencv/
You can use Hough algorithm with high threshold on two images and then compare the vertical lines on both of them - most of them should be shifted a bit, but keep the angle.
This is what I've got from running this algorithm on one of the pictures:
Filtering out horizontal lines should be easy(as they are represented as Vec4i), and then you can align the remaining lines together.
Here is the example of using it in OpenCV's documentation.
UPDATE: another thought. Aligning the lines together can be done with the concept similar to how cross-correlation function works. Doesn't matter if picture 1 has 10 lines, and picture 2 has 100 lines, position of shift with most lines aligned(which is, mostly, the maximum for CCF) should be pretty close to the answer, though this might require some tweaking - for example giving weight to every line based on its length, angle, etc. Computer vision never has a direct way, huh :)
UPDATE 2: I actually wonder if taking bottom pixels line of top image as an array 1 and top pixels line of bottom image as array 2 and running general CCF over them, then using its maximum as shift could work too... But I think it would be a known method if it worked good.

OpenCV Clustering Bag Of Words K-Means

Using SIFT Detector and Extractor, with FlannBased Matcher, and the Dictionary set up for the BOWKMeansTrainer like this:
TermCriteria termCrit(CV_TERMCRIT_ITER, 100, 0.001);
int dictionarySize = 15; // -- Same as number of images given in
int retries = 1;
int flags = KMEANS_PP_CENTERS;
BOWKMeansTrainer trainBowTrainer(dictionarySize, termCrit, retries, flags);
the array size of the Clustered Extracted Keypoints will come out as [128 x 15].
Then when using the BOWImgDescriptorExtractor as the Extractor on a different set of 15 Images, with the previously extracted array as its Vocabulary, the array comes out at [15 x 15].
Why?
I can't find that much on how this all actually works, rather just where to put it and what values to give.
The result should always be [n x 15] if you have n images and k=15.
But in the first run, you looked at the vocabular, not the feature representation of the first images. The 128 you are seeing there is the SIFT dimensionality; these are 15 "typical" SIFT vectors; they are not descriptions of your images.
You need to read up on the BoW model, and why the outcome should ways be a vector of length k (potentially sparse, i.e. with many 0s) for each image. I have the impression you expect this approach to produce one 128-dimensional feature vector for each image. Also, k=15 is probably too small; and the training data set is too small as well.

Hu moments and SVM does not work

I have come across one problem when trying to train data with SVM.
I get some different regions (set of connected pixels) from face images, and regions from eyes are very similar, so I want to use Hu moments for shape description and SVM for training.
But SVM does not work properly, method svm.predict evaluates afterwards everything as non-eye, moreover the same regions which were labeled and used in traning phase as eye, are evaluated as non-eye.
Feature data consists only of 7 Hu moments. I will post here some samples of source code in a moment, thanks in advance :)
Additional info:
input image:
http://i.stack.imgur.com/GyLO0.png
Setting up basic svm for 1 image:
int image_regions = 10;
Mat training_mat(image_regions ,7,CV_32FC1); // 7 hu moments
Mat labels(image_regions ,1,CV_32FC1); // for labels 1 (eye) and -1 (non eye)
// computing hu moments
Moments moments2=moments(croppedImage,false);
double hu[7];
HuMoments(moments2,hu);
// putting them into svm traning mat
for (int k=0;k<huCounter;k++)
training_mat.at<float>(counter,k) = hu[k]; // counter is current number of region
if (isEye(...))
{
labels.at<float>(counter,0)=1.0;
}
else
{
labels.at<float>(counter,0)=-1.0;
}
//I use the following:
CvSVM svm;
CvSVMParams params;
params.svm_type = CvSVM::C_SVC;
params.kernel_type = CvSVM::LINEAR;
params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 1000, 1e-6);
// ... do the above mentioned phase, and then:
svm.train(training_mat, labels, Mat(), Mat(), params);
I hope the following suggestions can help you…..
The simplest task is to use a clustering algorithm and try to cluster the data into two classes. If an algorithm like ‘k-means’ can do the job why make things complex by using SVM and Neural Nets. I suggest you use this technique because your feature vector dimension is of a very small size (7 Hu Moments) as well as your number of samples.
Perform feature Normalization (specified in point 4) to make sure the values fall in a limited range.
Check out “is your data really separable?” As your data is small, take a few samples from positive images and a few samples from negative images and plot the feature vectors. If you can visually see the difference surely any learning algorithm can do the job for you. As I said earlier simple tricks can do better than complex math.
Only if you then decide to use SVM you should know the following:
• As I can see from your code you are using a Linear SVM, may be your data is non-separable by a linear kernel. Try using some polynomial kernel or other kernels. There is one option bool CvSVM::train_auto in openCV just have a look.
• Try to check whether the feature vector values you are getting are proper values or not (make sure that they are not some garbage values).
• Also you can perform feature normalization “ZERO MEAN and UNIT VARIENCE” before you use it for training.
• Most importantly increase the number of images for training, both positively and negatively labeled.
• Last but not least SVM is not magic, at the end of the day it is just drawing a line between two sets of points. So don’t expect it to classify anything you give it as input.
If nothing works “Just improve your feature extraction technique”

Train skin pixels using Opencv CvNormalBayesClassifier

I'm very new to OpenCV. I am trying to use the CvNormalBayesClassifier to train my program to learn skin pixel colours.
Currently I have got around 20 human pictures (face/other body parts) under different light conditions and backgrounds. I have also got 20 corresponding responses in which the skin parts are marked red and everything else marked green.
I have problem understanding how to use the function
bool CvNormalBayesClassifier::train(const CvMat* _train_data, const CvMat* _response, const Cv*Mat _var_idx = 0, const CvMat* _sample_idx=0,, bool update=false);
How should I use the current two picture libraries I have got to prepare the values that can be passed in as _train_data and _responses?
Many thanks.
You need to put in train_data the pixel values from your training image, and in responses an index corresponding to the class of this pixel (e.g. 1 for class skin, 0 for class non-skin). var_idx and sample_idx can be left as is, they are used to mask out some of the descriptors or samples in your training set. Set update to true/false depending on wether you get all the descriptors (all the pixels of all your training images) at once in case you can let it to false, or you process your training images incrementally (which might be better for memory issues), in which case you need to update your model.
Let me clarify you with the code (not checked, and using the C++ interface to OpenCV which I strongly recommand instead of the old C)
int main(int argc, char **argv)
{
CvNormalBaseClassifier classifier;
for (int i = 0; i < argc; ++i) {
cv::Mat image = // read in your training image, say cv::imread(argv[i]);
// read your mask image
cv::Mat mask = ...
cv::Mat response = mask == CV_RGB(255,0,0); // little trick: you said red pixels in your mask correspond to skin, so pixels in responses are set to 1 if corresponding pixel in mask is red, 0 otherwise.
cv::Mat responseInt;
response.convertTo(responsesInt, CV_32S); // train expects a matrix of integers
image = image.reshape(0, image.rows*image.cols); // little trick number 2 convert your width x height, N channel image into a witdth*height row matrix by N columns, as each pixel should be considere as a training sample.
responsesInt = responsesInt.reshape(0, image.rows*image.cols); // the same, image and responses have the same number of rows (w*h).
classifier.train(image, responsesInt, 0, 0, true);
}
I did a google search on this class but didn't find much information, and actually even the official opencv document does not provide direct explanation on the parameters. But I did notice one thing in opencv document
The method trains the Normal Bayes classifier. It follows the
conventions of the generic CvStatModel::train() approach with the
following limitations:
which direct me to CvStatModel class and from there I found something useful. And probably you can also take a look on the book from page 471 which gives you more details of this class. The book is free from google Books.