I'm learning neural networks from this bytefish machine learning guide and code. I understand it well, but I would like to update the code at that link to use image pixel data instead of random values as the input data. In this section of the code:
cv::randu(trainingData,0,1);
cv::randu(testData,0,1);
the training and test matrices are filled with random data. Then label data is added to the classes matrices here:
cv::Mat trainingClasses = labelData(trainingData, eq);
cv::Mat testClasses = labelData(testData, eq);
using this function:
// label data with equation
cv::Mat labelData(cv::Mat points, int equation) {
    cv::Mat labels(points.rows, 1, CV_32FC1);
    for(int i = 0; i < points.rows; i++) {
        float x = points.at<float>(i,0);
        float y = points.at<float>(i,1);
        labels.at<float>(i, 0) = f(x, y, equation);
        // the f() function used above is just a switch
        // statement with 5 cases, e.g. one of the cases is:
        //   case 0:
        //       return y > sin(x*10) ? -1 : 1;
        //       break;
    }
    return labels;
}
Then points are plotted in a window here:
plot_binary(trainingData, trainingClasses, "Training Data");
plot_binary(testData, testClasses, "Test Data");
with this function:
// Plot Data and Class function
void plot_binary(cv::Mat& data, cv::Mat& classes, string name) {
    cv::Mat plot(size, size, CV_8UC3);
    plot.setTo(cv::Scalar(255.0,255.0,255.0));
    for(int i = 0; i < data.rows; i++) {
        float x = data.at<float>(i,0) * size;
        float y = data.at<float>(i,1) * size;
        if(classes.at<float>(i, 0) > 0) {
            cv::circle(plot, Point(x,y), 2, CV_RGB(255,0,0),1);
        } else {
            cv::circle(plot, Point(x,y), 2, CV_RGB(0,255,0),1);
        }
    }
    imshow(name, plot);
}
The plotted points, as I understand it, represent the input data labelled by the equations in the f() function, and are used by the predict functions to predict which points to plot in the mlp, knn, svm etc. functions. How do I update what is going on here to work with image pixel data? Any advice to get me farther would be appreciated.
"How do I update what is going on here to do something with Image pixel data" is a broad and generic question. May I ask in exchange: what do you want to do with "Image pixel data"?
Do you want an answer to: what can be done with "image pixel data" using machine learning algorithms like ANN, SVM, etc.?
The answer is a loooong list of things encompassing thousands of research papers and hundreds of PhD theses. Some examples include: supervised and/or un-supervised classification of images into labels/tags/categories based on features like image content, objects in image, patterns in image etc. The possibilities are endless. You may perhaps want to take a look at this: http://stuff.mit.edu/afs/athena/course/urop/profit/PDFS/EdwardTolson.pdf
Now, coming back to your original objective: "I would like to update the code at the previous link to use image pixel data instead of random values as the input data"...
The implementation technique would depend largely on what you want to do. I can cite one or two easy techniques for extracting feature vectors from an image, which can be fed into any machine learning algorithm of your choice...
Example 1:
You may start with using pixel intensity data as a feature vector. Here's how you may go ahead with it:
Load image using
Mat image = imread(argv[1], CV_LOAD_IMAGE_GRAYSCALE);
Resize image into a smaller area using resize. You may want to begin with small image sizes, like 8x8 or 10x10 pixels.
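For instance (a minimal sketch; the 8x8 target size is just the suggested starting point):
cv::Mat img;
cv::resize(image, img, cv::Size(8, 8)); // shrink so the feature vector stays small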
Loop through the image matrix and collect the pixel values, somewhat like this:
std::vector<float> features;
for(int row = 0; row < img.rows; ++row)
{
    const uchar* p = img.ptr(row);
    for(int col = 0; col < img.cols; ++col)
    {
        features.push_back(*p++); // p points to each pixel value in turn, assuming a CV_8UC1 greyscale image
    }
}
A collection of all the pixel values will give you a feature vector for that image.
Now suppose you have two classes of images. For each set of feature vectors you generate, you'll have to prepare (for supervised classification) a corresponding label Mat (somewhat like the example you've mentioned). It needs to contain the class label (say, 0 and 1) for each of the feature vectors in your feature Mat.
Now feed the feature vectors and label Mat to your machine learning code and see what happens.
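As a rough end-to-end sketch of those steps, using the old CvSVM API that also appears later in this thread (the file names, the 8x8 size and the two-class setup are all assumptions):
#include <opencv2/opencv.hpp>

// flatten one greyscale image into a 1 x 64 CV_32FC1 feature row
cv::Mat pixelFeatureRow(const cv::Mat& grey)
{
    cv::Mat small, f;
    cv::resize(grey, small, cv::Size(8, 8));
    small.convertTo(f, CV_32FC1, 1.0/255.0); // scale pixel values to [0,1]
    return f.reshape(1, 1);                  // 1 row, 64 columns
}

int main()
{
    // two images per class; the file names are made up for this sketch
    const char* files[] = { "classA_1.jpg", "classA_2.jpg", "classB_1.jpg", "classB_2.jpg" };
    float classLabels[] = { 0, 0, 1, 1 };
    cv::Mat features, labels;
    for(int i = 0; i < 4; i++)
    {
        cv::Mat img = cv::imread(files[i], CV_LOAD_IMAGE_GRAYSCALE);
        features.push_back(pixelFeatureRow(img)); // one row per image
        labels.push_back(classLabels[i]);         // one label per row
    }
    CvSVM svm;
    svm.train(features, labels, cv::Mat(), cv::Mat(), CvSVMParams());
    cv::Mat test = cv::imread("test.jpg", CV_LOAD_IMAGE_GRAYSCALE);
    float predictedClass = svm.predict(pixelFeatureRow(test));
    return 0;
}
The same features/labels pair can be fed to kNN, ANN etc. instead of the SVM; only the train/predict calls change.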
However, what image classification can achieve based on raw pixel data alone is quite limited. There are thousands of techniques for extracting image features, most of which are dependent on the application area.
Example 2:
I'll finish off with one more example for extracting feature vectors, which, in some cases, will prove to be more effective than simple image pixel values.
You may use the Histogram of Oriented Gradients descriptor for slightly better results. Use this:
cv::HOGDescriptor hog;
vector<float> descriptors;
hog.compute(mat, descriptors); // mat is your input image
The vector descriptors is your feature vector.
The HOG descriptor, when used with an SVM, provides a decent classification mechanism.
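As a sketch of how this plugs into training (assuming CV_32FC1 Mats trainFeatures and trainLabels accumulated as in the earlier sketch; the 64x128 resize matches the default HOGDescriptor window size, and the file name is a placeholder):
cv::HOGDescriptor hog;                          // default 64x128 detection window
cv::Mat img = cv::imread("car1.jpg", CV_LOAD_IMAGE_GRAYSCALE); // placeholder file
cv::resize(img, img, cv::Size(64, 128));        // one full window => one descriptor
std::vector<float> descriptors;
hog.compute(img, descriptors);
// wrap the descriptor as a 1 x N float row and append it to the training Mat
cv::Mat row(1, (int)descriptors.size(), CV_32FC1, &descriptors[0]);
trainFeatures.push_back(row.clone());           // clone: row borrows the vector's memory
trainLabels.push_back(1.0f);                    // this image's class label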
You can put the pixel data of an image into a Mat called trainingData, starting from something similar to the original labelData function:
cv::Mat labelData(cv::Mat points, int equation)
{
    cv::Mat labels(points.rows, 1, CV_32FC1);
    for(int i = 0; i < points.rows; i++)
    {
        float x = points.at<float>(i,0);
        float y = points.at<float>(i,1);
        labels.at<float>(i, 0) = f(x, y, equation);
    }
    return labels;
}
Now, instead of labelData, we're going to return a Mat of pixel data. One obvious way is to use the image itself as a feature vector. However, some machine learning algorithms in OpenCV, including ANN, SVM etc., require special formatting of input data.
You may try something like this:
cv::Mat trainingData(cv::Mat image)
{
    // note: at<float> assumes the image is CV_32FC1; convert a greyscale
    // image first with image.convertTo(image, CV_32FC1)
    cv::Mat trainingVector(image.rows*image.cols, 1, CV_32FC1);
    for(int i = 0; i < image.rows; i++)
    {
        for(int j = 0; j < image.cols; j++)
        {
            float valueOfPixel = image.at<float>(i,j);
            trainingVector.at<float>((i*image.cols)+j, 0) = valueOfPixel;
        }
    }
    return trainingVector;
}
(Please recheck the syntax of the code before using, I just typed it out here)
So, what the above block effectively does is change the 2D matrix of the image into a 1D array. Now, how and where you use it depends on your requirements.
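Incidentally, for a continuous Mat, OpenCV's built-in Mat::reshape does the same flattening without an explicit loop; convert to float first, since the function above fills a CV_32FC1 Mat:
cv::Mat floatImage, trainingVector;
image.convertTo(floatImage, CV_32FC1);
trainingVector = floatImage.reshape(1, image.rows * image.cols); // (rows*cols) x 1 column, no pixel copy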
Please make necessary modifications before invoking the machine learning modules.
Hope this answers your question.
Thanks.
Related
I am looking for a way to take an image and get masks of all objects in it by color. My goal is to be able to separate similarly colored objects into layers so I can further examine each layer. The plan is to use each mask against the original image to create a histogram of the colors in each object and determine the similarity with other objects in the image. If something is similar enough it will be combined with other objects to form a layer.
The problem is that I cannot find a function in OpenCV to find all objects in an image based on color contiguity. I am sure such an algorithm exists, but it seems to be evading me. Does anyone know of an algorithm or function like this?
The best method that I have found is k-means clustering. This separates the image into different layers by clustering the pixel colors into k groups. With this I am able to effectively split the image into several layers that are of similar color.
#define numClusters 7

cv::Mat src = cv::imread("img0.png");
cv::Mat kMeansSrc(src.rows * src.cols, 3, CV_32F);
// reshape the image to (src.rows*src.cols) x 3:
// cv::kmeans expects samples in rows with the 3 channels as columns,
// so this rearranges the image into (rows * columns, numChannels)
for( int y = 0; y < src.rows; y++ )
{
    for( int x = 0; x < src.cols; x++ )
    {
        for( int z = 0; z < 3; z++)
            kMeansSrc.at<float>(y + x*src.rows, z) = src.at<Vec3b>(y,x)[z];
    }
}
cv::Mat labels;
cv::Mat centers;
int attempts = 2;
// perform kmeans on kMeansSrc, where numClusters is defined previously as 7;
// end either when the desired accuracy is met or the maximum number of iterations is reached
cv::kmeans(kMeansSrc, numClusters, labels, cv::TermCriteria( CV_TERMCRIT_EPS+CV_TERMCRIT_ITER, 8, 1), attempts, KMEANS_PP_CENTERS, centers );
// create an array of numClusters grey values, one per cluster
int colors[numClusters];
for(int i = 0; i < numClusters; i++) {
    colors[i] = 255/(i+1);
}
std::vector<cv::Mat> layers;
for(int i = 0; i < numClusters; i++)
{
    layers.push_back(cv::Mat::zeros(src.rows,src.cols,CV_32F));
}
// use the labels to draw the layers:
// using the array of colors, draw the pixels onto each layer image
for( int y = 0; y < src.rows; y++ )
{
    for( int x = 0; x < src.cols; x++ )
    {
        int cluster_idx = labels.at<int>(y + x*src.rows,0);
        layers[cluster_idx].at<float>(y, x) = (float)(colors[cluster_idx]);
    }
}
std::vector<cv::Mat> srcLayers;
// use each layer to mask a portion of the original image;
// this leaves us with sections of similar color from the original image
for(int i = 0; i < numClusters; i++)
{
    layers[i].convertTo(layers[i], CV_8UC1);
    srcLayers.push_back(cv::Mat());
    src.copyTo(srcLayers[i], layers[i]);
}
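From here, the histogram step described in the question could look something like this for a given layer i (hue-only histogram; the bin count, range and comparison method are assumptions):
cv::Mat hsv;
cv::cvtColor(srcLayers[i], hsv, CV_BGR2HSV);
int channels[] = { 0 };                       // hue channel only
int histSize = 30;
float hueRange[] = { 0, 180 };
const float* ranges[] = { hueRange };
cv::MatND hist;
// layers[i] (the CV_8UC1 mask) excludes pixels outside this object
cv::calcHist(&hsv, 1, channels, layers[i], hist, 1, &histSize, ranges);
cv::normalize(hist, hist, 0, 1, cv::NORM_MINMAX);
// later, for two layers a and b:
// if(cv::compareHist(histA, histB, CV_COMP_CORREL) > similarityThreshold) merge them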
I suggest you convert the image to HSV space (Hue-Saturation-Value). Then make a histogram based on the hue value to find thresholds online, or define them beforehand (depending on whether this is a general problem or a specific one).
Create one-channel images for each layer you want to form (set them to black).
Then use the HSV image and mark a layer based on the threshold values. You might want to add some constant thresholds for value and saturation too (to avoid dark and light areas).
Does this make sense to you?
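As a minimal sketch of that idea (assuming the BGR input image is src; the threshold numbers are arbitrary placeholders):
cv::Mat hsv, layerMask;
cv::cvtColor(src, hsv, CV_BGR2HSV);
// one layer: a hue window, plus saturation/value floors to drop dark and washed-out pixels
cv::inRange(hsv, cv::Scalar(0, 60, 60), cv::Scalar(20, 255, 255), layerMask);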
I think that you should proceed with the following process:
Smooth your image if it has too much detail.
Find edges.
Find all contours.
Try to find the color of each contour. Let's say you want to keep all contours which are red, so keep only those contours.
Once you find the contours you want to keep, create a mask image based upon them.
Using the mask image, extract the required objects from the original image. (A rough sketch of these steps follows below.)
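Here is that sketch, assuming a BGR cv::Mat src (the Canny thresholds and the "is red" test are placeholders):
cv::Mat grey, edges, mask = cv::Mat::zeros(src.size(), CV_8UC1);
cv::cvtColor(src, grey, CV_BGR2GRAY);
cv::GaussianBlur(grey, grey, cv::Size(5, 5), 0);    // step 1: smooth
cv::Canny(grey, edges, 50, 150);                    // step 2: find edges
std::vector<std::vector<cv::Point> > contours;
cv::findContours(edges, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE); // step 3
for(size_t i = 0; i < contours.size(); i++)
{
    // step 4 goes here: check each contour's colour (e.g. cv::mean over a
    // single-contour mask) and skip the contours that aren't red
    cv::drawContours(mask, contours, (int)i, cv::Scalar(255), CV_FILLED);    // step 5
}
cv::Mat objects;
src.copyTo(objects, mask);                          // step 6: extract the objects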
Hi, with OpenCV C++, I want to do clustering to classify connected components based on area and height.
I understand the concept of clustering, but I am having a hard time implementing it in OpenCV C++.
In the OpenCV documentation
http://docs.opencv.org/modules/core/doc/clustering.html
there is a clustering method, kmeans:
double kmeans(InputArray data, int K, InputOutputArray bestLabels, TermCriteria criteria, int attempts, int flags, OutputArray centers=noArray() )
Most of the websites I searched just explain the concept and the parameters of the kmeans function in OpenCV C++, and most of them were copied from the OpenCV documentation website.
There is also a good example here, but it is implemented in Python:
http://docs.opencv.org/trunk/doc/py_tutorials/py_ml/py_kmeans/py_kmeans_opencv/py_kmeans_opencv.html?highlight=kmeans
As mentioned above, I have all the connected components and I can calculate the area and height of each one.
I want to use clustering to distinguish between the connected components.
For instance, with the k-means method I would use k=2.
Thanks.
I am posting a snippet; I hope this helps you.
The height and area of a component can be used as features for k-means. For each cluster, k-means will give you a center, i.e. one center value for the area and one for the height of the component.
Mat labels;
int attempts = 11;
Mat centers;
int no_of_features = 2; // (i.e. height, area)
Mat samples(no_of_connected_components, no_of_features, CV_32F);
int no_of_sub_classes = 1; // vary for more sub classes
for (int j = 0; j < no_of_connected_components; j++)
{
    for (int x = 0; x < no_of_features; x++)
    {
        // fill in the values (height, area) from your connected component labelling;
        // note: [j][x], not [j,x] -- the comma operator would index wrongly
        samples.at<float>(j, x) = connected_component_values[j][x];
    }
}
cv::kmeans(samples, no_of_sub_classes, labels, TermCriteria(CV_TERMCRIT_ITER | CV_TERMCRIT_EPS, 10000, 0.001), attempts, KMEANS_PP_CENTERS, centers);
for (int si_i = 0; si_i < no_of_sub_classes; si_i++)
{
    for (int si_j = 0; si_j < no_of_features; si_j++)
    {
        // index rows by cluster (si_i), columns by feature (si_j)
        KmeanTable[si_i][si_j] = centers.at<float>(si_i, si_j);
    }
}
Here I am storing the centers in the KmeanTable 2D array; you can use your own. Now, for each connected component, you can calculate the Euclidean distance from the centers.
The component whose features are closest to a center (smallest distance) is assigned to that cluster.
Check this out.
Except instead of iterating over x, y, and z, you'll iterate over components and properties (area and height).
This is my code for training a dataset of, for example, vehicles. Once it is fully trained, I want it to predict the data (a vehicle) from a video (.avi). How do I predict trained data from video, and how do I add that part to the code? I want it to count 1 when a vehicle is shown in the video and print that the object was detected, and if a second vehicle comes, increment the count to 2.
IplImage *img2;
cout<<"Vector quantization..."<<endl;
collectclasscentroids();
vector<Mat> descriptors = bowTrainer.getDescriptors();
int count=0;
for(vector<Mat>::iterator iter=descriptors.begin();iter!=descriptors.end();iter++)
{
    count += iter->rows;
}
cout<<"Clustering "<<count<<" features"<<endl;
//choosing cluster's centroids as dictionary's words
Mat dictionary = bowTrainer.cluster();
bowDE.setVocabulary(dictionary);
cout<<"extracting histograms in the form of BOW for each image "<<endl;
Mat labels(0, 1, CV_32FC1);
Mat trainingData(0, dictionarySize, CV_32FC1);
int k = 0;
vector<KeyPoint> keypoint1;
Mat bowDescriptor1;
//extracting histogram in the form of bow for each image
for(j = 1; j <= 4; j++)
    for(i = 1; i <= 60; i++)
    {
        sprintf( ch,"%s%d%s%d%s","train/",j," (",i,").jpg");
        const char* imageName = ch;
        img2 = cvLoadImage(imageName, 0);
        detector.detect(img2, keypoint1);
        bowDE.compute(img2, keypoint1, bowDescriptor1);
        trainingData.push_back(bowDescriptor1);
        labels.push_back((float) j);
    }
//Setting up SVM parameters
CvSVMParams params;
params.kernel_type = CvSVM::RBF;
params.svm_type = CvSVM::C_SVC;
params.gamma = 0.50625000000000009;
params.C = 312.50000000000000;
params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 100, 0.000001);
CvSVM svm;
printf("%s\n", "Training SVM classifier");
bool res = svm.train(trainingData, labels, cv::Mat(), cv::Mat(), params);
cout<<"Processing evaluation data..."<<endl;
Mat groundTruth(0, 1, CV_32FC1);
Mat evalData(0, dictionarySize, CV_32FC1);
k = 0;
vector<KeyPoint> keypoint2;
Mat bowDescriptor2;
Mat results(0, 1, CV_32FC1);
for(j = 1; j <= 4; j++)
    for(i = 1; i <= 60; i++)
    {
        sprintf( ch, "%s%d%s%d%s", "eval/", j, " (",i,").jpg");
        const char* imageName = ch;
        img2 = cvLoadImage(imageName,0);
        detector.detect(img2, keypoint2);
        bowDE.compute(img2, keypoint2, bowDescriptor2);
        evalData.push_back(bowDescriptor2);
        groundTruth.push_back((float) j);
        float response = svm.predict(bowDescriptor2);
        results.push_back(response);
    }
//calculate the number of unmatched classes
double errorRate = (double) countNonZero(groundTruth- results) / evalData.rows;
The question is: this code does not predict from video. I want to know how to predict from video, meaning I want to detect the vehicle in a movie; it should show 1 when it finds a vehicle in the movie.
For those who didn't understand the question:
I want to play a movie in the above code:
VideoCapture cap("movie.avi"); //movie.avi is with deleted background
Suppose I have trained data which contains vehicles, and "movie.avi" contains 5 vehicles; it should detect those vehicles in movie.avi and give me 5 as output.
How do I do this part in the above code?
From looking at your code setup
params.svm_type = CvSVM::C_SVC;
it appears that you train your classifier with more than two classes. A typical example in a traffic scenario could be cars/pedestrians/bikes/... However, you were asking for a way to detect cars only. Without a description of your training data and your video, it's hard to tell if your idea makes sense. I guess what the previous answers are assuming is the following:
You loop through each frame and want to output the number of cars in that frame. Thus, a frame may contain multiple cars, say 5. If you take the whole frame as input to the classifier, it might respond "car", but the setup is conceptually a little off: you cannot reliably retrieve the number of cars with this approach.
Instead, the suggestion is to try a sliding-window approach. This means, for example, you loop over each pixel of the frame and take the region around the pixel (called sub-window or region-of-interest) as input to the classifier. Assuming a fixed scale, the sub-window could have a size of 150x50px, matching your training data. You might fix the scale of the cars in your training data, but in real-world videos the cars will be of different sizes. In order to find a car at a different scale, let's say twice as large as in the training data, the typical approach is to scale the image (say by a factor of 2) and repeat the sliding-window approach.
By repeating this for all relevant scales, you end up with an algorithm that gives you the result of your classifier for each pixel location and each scale. This means you have three loops, or, in other words, there are three dimensions (image width, image height, scale). This is best understood as a three-dimensional pyramid. "Why a pyramid?" you might ask. Because each time the image is scaled (say by 2), the image gets smaller (or larger) and the next scale is an image of a different size (for example, half the size).
The pixel locations indicate the position of the car and the scale indicates the size of it. Now, if you have an N-class classifier, each slot in this pyramid will contain a number (1,...,N) indicating the class. If you had a binary classifier (car/no car), then you would end up with each slot containing 0 or 1. Even in this simple case, where you would be tempted to simply count the number of 1s and output the count as the number of cars, you still have the problem that there could be multiple responses for the same car. Thus, it would be better if you had a car detector that gives continuous responses between 0 and 1, so that you could find maxima in this pyramid. Each maximum would indicate a single car. This kind of detection is successfully used with corner features, where you detect corners of interest in a so-called scale-space pyramid.
To summarize, no matter if you are simplifying the problem to a binary classification problem ("car"/"no car"), or if you are sticking to the more difficult task of distinguishing between multiple classes ("car"/"animal"/"pedestrian"/...), you still have the problem of scale and location in each frame to solve.
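To make the three loops concrete, here is a bare-bones sketch (assuming a cv::Mat frame; the window size matches the 150x50 example above, while the stride of 10 and the scales 1, 2, 4 are arbitrary choices):
for(double scale = 1.0; scale <= 4.0; scale *= 2.0)
{
    cv::Mat level;
    cv::resize(frame, level, cv::Size(), 1.0/scale, 1.0/scale); // one pyramid level
    for(int y = 0; y + 50 <= level.rows; y += 10)               // slide the window
    {
        for(int x = 0; x + 150 <= level.cols; x += 10)
        {
            cv::Mat window = level(cv::Rect(x, y, 150, 50));
            // compute features for 'window' and run the classifier here;
            // a positive response means a car at (x*scale, y*scale),
            // of size (150*scale) x (50*scale) in the original frame
        }
    }
}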
The code you have for using images is written with OpenCV's C interface, so it's probably easiest to stick with that rather than use the C++ video interface.
In which case, something along these lines should work:
CvCapture *capture = cvCaptureFromFile("movie.avi");
IplImage *img = 0;
while((img = cvQueryFrame(capture)) != NULL)
{
    // Process image
    ...
}
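In the "Process image" spot, the same BOW + SVM pipeline from the training code can be applied to each frame, something like this (a sketch; vehicleCount is an int declared before the loop, and note that classifying the whole frame cannot reliably count several vehicles in one frame, as the sliding-window answer above explains):
vector<KeyPoint> frameKeypoints;
Mat frameDescriptor;
detector.detect(img, frameKeypoints);
bowDE.compute(img, frameKeypoints, frameDescriptor);
float response = svm.predict(frameDescriptor);
if(response == 1.0f)   // assuming class 1 was trained as "vehicle"
    vehicleCount++;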
You should implement a sliding-window approach. In each window, you should apply the SVM to get candidates. Then, once you've done it for the whole image, you should merge the candidates (if you detected an object, it is very likely that you'll detect it again at a shift of a few pixels; that's the meaning of candidates).
Take a look at the V&J (Viola-Jones) code in OpenCV or the latentSVM code (detection by parts) to see how it's done there.
By the way, I would use the latentSVM code (detection by parts) to detect vehicles. It has trained models for cars and for buses.
Good luck.
You need a detector, not a classifier. Take a look at Haar cascades, LBP cascades, latentSVM (as mentioned before) or the HOG detector.
I'll explain why. A detector usually scans the image with a sliding window, line by line, at several scales. In every window the detector solves the problem "object or not object". It may give you rough results, but very fast. Classifiers such as BOW work very slowly for this task. You should then apply classifiers to the regions found by the detector.
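For example, with a cascade detector the per-frame counting becomes straightforward (a sketch, assuming a cv::Mat frame grabbed from the video; "cars_cascade.xml" is a placeholder for a cascade actually trained on vehicles):
cv::CascadeClassifier carCascade;
carCascade.load("cars_cascade.xml");               // hypothetical trained cascade
std::vector<cv::Rect> cars;
carCascade.detectMultiScale(frame, cars, 1.1, 3);  // sliding window over several scales
int carCount = (int)cars.size();                   // detections in this frame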
I am currently trying to use the dsift algorithm of the vlfeat library. But no matter which values I create the filter with (sample step, bin size), it returns the same number of keypoints for every frame during an execution (consecutive frames from a camera, so they differ). The documentation on C or C++ usage is very thin, and I could not find any good examples for these languages.
Here is the relevant code:
// create filter
VlDsiftFilter* vlf = vl_dsift_new_basic(320, 240, 1, 3);

// transform image in cv::Mat to float vector
std::vector<float> imgvec;
for (int i = 0; i < img.rows; ++i){
    for (int j = 0; j < img.cols; ++j){
        imgvec.push_back(img.at<unsigned char>(i,j) / 255.0f);
    }
}
// call processing function of vl
vl_dsift_process(vlf, &imgvec[0]);
// echo number of keypoints found
std::cout << vl_dsift_get_keypoint_num(vlf) << std::endl;
"it returns the same number of keypoints for every frame during an execution"
This is normal with a dense SIFT implementation, since the number of extracted keypoints depends only on the input geometric parameters [1], i.e. the step and image size.
See the documentation:
The feature frames (keypoints) are indirectly specified by the sampling steps (vl_dsift_set_steps) and the sampling bounds (vl_dsift_set_bounds).
[1]: vl_dsift_get_keypoint_num returns self->numFrames that is only updated by _vl_dsift_update_buffers which uses geometrical information only (bounds, steps and bin sizes).
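For example, re-creating the filter with a coarser step changes the count; a small sketch against the same imgvec as in the question (step 4 instead of 1 is an arbitrary choice):
VlDsiftFilter* vlf4 = vl_dsift_new_basic(320, 240, 4, 3); // step 4 instead of 1
vl_dsift_process(vlf4, &imgvec[0]);
std::cout << vl_dsift_get_keypoint_num(vlf4) << std::endl; // fewer keypoints than with step 1
vl_dsift_delete(vlf4);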
I am working with OpenCV on Ubuntu. I am trying to do a PCA analysis of a single image. I take the 3-channel image and change it to a single-channel image with 3 columns and r*c rows, r and c being the rows and columns of the original image. When I try to display the reconstructed image after doing the back projection on the PCA, it gives me a green image. Here is my code:
Mat pcaset=cvCreateMat(image->height*image->width,image->nChannels,CV_8UC1);
for(int i=0;i<image->height;i++)
{
    for(int j=0;j<image->width;j++)
    {
        for(int k=0;k<image->nChannels;k++)
            (ptrpcaset+i*pcaset.step)[k]=((ptrimage+i*image->widthStep)[3*j+k]);
    }
}
int nEigens=3;
Mat databackprojected;
PCA pca(pcaset,Mat(),CV_PCA_DATA_AS_ROW,nEigens);
Mat dataprojected(pcaset.rows,nEigens,CV_8UC1);
pca.project(pcaset,dataprojected);
pca.backProject(dataprojected,databackprojected);
Mat backprojectnorm;//(databackprojected.rows,nEigens,CV_8UC1);
normalize(databackprojected,backprojectnorm,0,255,NORM_MINMAX,-1);
Mat finaldataafterreshaping(image->height,image->width,CV_8UC3);
uchar* finalptr=(uchar*)finaldataafterreshaping.data;
uchar* ptrnorm=(uchar*)backprojectnorm.data;
int x=0,y=0,i=0;
while(i<backprojectnorm.rows)
{
    while(x<image->height)
    {
        while(y<image->width)
        {
            for(int k=0;k<image->nChannels;k++)
            {
                (finalptr+x*finaldataafterreshaping.step)[3*y+k]=(ptrnorm+i*backprojectnorm.step)[k];
            }
            y=y+1;i=i+1;
        }
        x=x+1;y=0;
    }
}
imshow("Reconstructed data",finaldataafterreshaping);
imshow("Reconstructed data",finaldataafterreshaping);
You need to make the following changes:
(ptrpcaset+(j + i*image->width)*pcaset.step)[k]=((ptrimage+i*image->widthStep)[3*j+k]);
because you are not taking the j coordinate into account when you transform your data, so at the end you only save the last line of your image in the new matrix.
When you reshape your data, you need to do something like this:
float* val = (float*)&(ptrnorm+i*backprojectnorm.step)[(k*4)];
(finalptr+x*finaldataafterreshaping.step)[3*y+k]=*val;
because the matrix you get as a result is of type float and not uchar, so you need some kind of conversion. I am not sure if it is a good idea to do it this way, but it works. I would suggest that you have a look at the C++ API of OpenCV 2, which can handle these things in a much nicer way.
Also, the whole while(i<backprojectnorm.rows) loop is not needed.
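As a sketch of the C++ API route suggested above (nEigens = 3 as in the question; the input file name is a placeholder):
cv::Mat img = cv::imread("input.png");                        // hypothetical file
cv::Mat data;
img.reshape(1, img.rows * img.cols).convertTo(data, CV_32F);  // (r*c) x 3 float rows, one per pixel
cv::PCA pca(data, cv::Mat(), CV_PCA_DATA_AS_ROW, 3);
cv::Mat projected = pca.project(data);
cv::Mat back = pca.backProject(projected);
cv::normalize(back, back, 0, 255, cv::NORM_MINMAX);
back.convertTo(back, CV_8U);
cv::Mat reconstructed = back.reshape(3, img.rows);            // back to r x c with 3 channels
cv::imshow("Reconstructed data", reconstructed);
cv::waitKey(0);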