Creating training set from multiple images in Haar Cascade - python-2.7

I am currently working on detecting multiple fruits in a given image. For example, the image may contain bananas (yellow, red, and green), mangoes, oranges, etc. I was able to create a training set from only one image at a time using opencv_createsamples.
Sample Code:
C:\opencv\build\x64\vc14\bin\opencv_createsamples.exe -img redbanana.jpg -bg bg.txt -info info/info.lst -pngoutput info -maxxangle 0.5 -maxyangle 0.5 -maxzangle 0.5 -num 100
I have done the same for around 5 fruits, which creates a separate vec file for each fruit. It is tedious to do this for each fruit individually. Is there any way to create a training set from multiple images with a single vec file as the output?
Is there any other methodology for detecting multiple fruits in a given image?

A Haar classifier is ideally suited to quickly detecting one class of similar-looking objects, as outlined in the OpenCV documentation: http://docs.opencv.org/2.4/modules/objdetect/doc/cascade_classification.html. For example, the OpenCV repository (https://github.com/opencv/opencv) has a list of classifiers (https://github.com/opencv/opencv/tree/master/data/haarcascades) trained for specific classes of objects.
Unless the objects to be detected are similar (like faces with different features, or cars of different makes and models), training would be more effective with one classifier per fruit - e.g., one each for bananas, oranges, mangoes, etc.
To create a training vector based on multiple positive sample images (and for any other aspect of Haar classifier training), I'd recommend the steps covered at http://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html, in particular steps 5 and 6. In your case the positive images should include all types of bananas, oranges, mangoes, etc., including variations in color.
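A minimal sketch of that loop, assuming the positive images sit in a positives/ directory and bg.txt lists the background images (both names are placeholders): it simply runs opencv_createsamples once per positive image, writing one vec file per image. Merging the resulting vec files into a single one still needs a separate script (e.g. a mergevec-style tool), which OpenCV itself does not ship.

# Hypothetical helper: run opencv_createsamples once per positive image.
# Paths, sample counts and angles are illustrative assumptions.
import glob
import os
import subprocess

CREATESAMPLES = r"C:\opencv\build\x64\vc14\bin\opencv_createsamples.exe"

if not os.path.isdir("vecs"):
    os.makedirs("vecs")

for img in glob.glob("positives/*.jpg"):
    name = os.path.splitext(os.path.basename(img))[0]
    subprocess.check_call([
        CREATESAMPLES,
        "-img", img,
        "-bg", "bg.txt",
        "-vec", "vecs/%s.vec" % name,   # one vec file per positive image
        "-maxxangle", "0.5", "-maxyangle", "0.5", "-maxzangle", "0.5",
        "-num", "100",
    ])
# The vecs/*.vec files can then be merged with a third-party mergevec script
# before running opencv_traincascade.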

If you want to train the classifier with different variations of the same fruit, you can generate training samples from multiple images as described here.
However, do note that Haar classifiers work in greyscale and it is difficult to guarantee differentiation between objects like red and yellow bananas.
If you want multiple classes in one classifier, I recommend YOLO (You Only Look Once) or SSD (Single Shot multibox Detector).

Related

Setting up a CNN network in keras?

I am currently trying to implement a CNN that maps an input to an output.
The input consists of STFTs of audio files, and the output is a feature vector.
Due to the different lengths of the audio files, the total number of samples always differs, but each sample has a frame length of 25 ms with 10 ms overlap, giving an input of shape (x, 2050).
The output is a feature vector of shape (x, 13).
I thought a CNN seemed appropriate here, as each input frame of the STFT contains some information from the previous frame due to the overlap.
Is it possible in Keras to design a model that makes use of this, so that a convolutional sum is computed for each row of the matrix, and that is somehow aware of the 25 ms frame length and the 10 ms overlap?
Yes it is; see line 220 of this file [1]. This is an implementation of WaveNet in Keras using convolutions. Even though they've created wrapper layers, it should give you the intuition for how to model audio samples.
[1] https://github.com/basveeling/wavenet/blob/master/wavenet.py#L220
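As a starting point, here is a minimal sketch (not taken from the WaveNet code; the filter counts and kernel sizes are illustrative assumptions) of a Keras model that applies 1-D convolutions across the sequence of STFT frames, so each output row also sees its neighbouring, overlapping frames:

# Map a variable-length sequence of 2050-bin STFT frames to a
# 13-dimensional feature vector per frame. Layer sizes are illustrative.
from keras.models import Model
from keras.layers import Input, Conv1D

inp = Input(shape=(None, 2050))                     # (time_steps, 2050)
x = Conv1D(256, kernel_size=3, padding='same', activation='relu')(inp)
x = Conv1D(128, kernel_size=3, padding='same', activation='relu')(x)
out = Conv1D(13, kernel_size=1)(x)                  # (time_steps, 13)

model = Model(inp, out)
model.compile(optimizer='adam', loss='mse')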

Robust algorithm for detecting vehicles before stop line

I need to write a program that uses a camera to detect the presence of a vehicle inside a defined region of the road before the stop line at an intersection (just like an inductive loop). The output will be true or false based on whether a vehicle is visible in that region. The camera can be installed perpendicular to the road or above it. Currently I need an algorithm.
The following image shows a sample implementation for detecting vehicles at the intersection:
After some study in this field I realized this technique is background subtraction: the program models the background, and when a vehicle enters the area it is detected. But the definition says it detects moving vehicles, so what about cars that are stopped on the sensor 50%-60% of the time (when the signal light turns red)? Will they become part of the background? Will they be detected every time?
I've seen some algorithms in the background-subtraction field, like Mixture of Gaussians, but I doubt they work in this real-world situation because of the problem above.
Currently I have programmed an averaging method using OpenCV under Linux. The program calculates the average of the pixels inside that rectangle, saves the value in a buffer, computes the mode, and compares it with the current frame. But there are problems such as vehicle headlights at night, vehicle shadows during the day, and cars stopping on my sensor because of a red signal.
I would recommend detecting the vehicles directly rather than separating the foreground from the background. Background subtraction has many problems with lighting conditions and is a somewhat old-fashioned approach.
In OpenCV you can use, for example, a Haar or LBP cascade for fast and simple detection of vehicles. In OpenCV 3.1 there are two utilities (opencv_createsamples and opencv_traincascade) for training the detector.
Using the detector is simple, the same as in this tutorial. There are also some sources on the web where you can download an already pretrained cascade for car detection. The OpenCV detection code is short and easy to understand.
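For example, a minimal detection sketch in Python (the cascade and image file names below are placeholders, not files from this answer):

# Load a trained Haar/LBP cascade and mark detected vehicles in one frame.
import cv2

cascade = cv2.CascadeClassifier("cars.xml")        # your trained cascade (placeholder)
frame = cv2.imread("intersection.jpg")             # current camera frame (placeholder)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns bounding boxes (x, y, w, h) of detected objects
cars = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
for (x, y, w, h) in cars:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("detected.jpg", frame)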
You can find the examples on my blog. I also have a car dataset containing 2000 positive car samples. These samples are simply listed (with a bash script) into the list of positive samples, and then the createsamples and traincascade utilities are used. LBP cascades are a little bit faster, with comparable performance.
I have trained cascades both on Windows and under Linux; the only difference is in how the programs are run. The training data (the vec.vec and bg.dat files) have to be prepared with the createsamples utility. If you already have a dataset, preparing the training takes about 20 minutes; the problem is where to find the data. I have a dataset on my blog. Also try to understand the script. My -w 32 -h 64 parameters are for people detection; for cars something like -w 32 -h 32 is better.
Run opencv_traincascade (./opencv_traincascade on Linux) with parameters such as:
opencv_traincascade.exe -data v5 -vec vec.vec -bg bg.dat -numPos 540 -numNeg 700 -numStages 11 -numThreads 4 -stageType BOOST -featureType LBP -w 32 -h 64 -minHitRate 0.999995 -maxFalseAlarmRate 0.2 -maxDepth 10 -maxWeakCount 120 -mode ALL
I have also collected a dataset for training the detector.
You can also download the dataset from Dataset.

Best way to train a pedestrian detector using dlib

I am trying to train a pedestrian detector using dlib and the INRIA Person Dataset.
So far I have used 27 images; the training is fast but the results are unsatisfying (on other images pedestrians are rarely recognized). Here is the result of my training using the train_object_detector program that comes with dlib (in the /examples directory):
Saving trained detector to object_detector.svm
Testing detector on training data...
Test detector (precision,recall,AP): 1 0.653061 0.653061
Parameters used:
threads: 4
C: 1
eps: 0.01
target-size: 6400
detection window width: 47
detection window height: 137
upsample this many times : 0
I am aware that other images need to be added to the training in order to get better results, but before doing that I want to be sure of the meaning of every parameter printed in the result (precision, recall, AP, C, eps, ...). I am also wondering if you have any recommendations regarding the training: what images should I choose? How many images are needed? Do I need to annotate every object in the image? Do I need to ignore some regions in the image? ...
One last question: is there any trained detector (.svm file) that I can use to compare my results with?
Thank you for your answers
I am not familiar with dlib in particular, but let me tell you that you will not get good results with 27 images. In order to generalize well, your classifier needs to see many images with a variety of data. It won't do you any good to supply it with 10,000 images of the same person, wearing the same outfit. You want different people, clothing, settings, angles, and lighting. The INRIA dataset should cover most of those.
Your detection window dimensions and upsample settings will determine how large people must look in the image in order for your trained classifier to detect them reliably. Your settings will detect only people at 1 scale where they are around 137/47 pixels tall/wide. If you upsample even once, you'll be able to detect people at a smaller scale (upsampling makes the person look bigger than they are). I suggest you use a larger dataset and increase the upsampling number (by how much you upsample is another discussion - that appears to be built into the library). Things will take longer, but that is the nature of training classifiers - tweak parameters, retrain, compare the results.
For precision/recall I'll refer you to this wikipedia article. These are not parameters, but results of your classifier. You want both to be as close to 1 as possible.
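If it helps, here is a hedged sketch of the same kind of training run using dlib's Python API (the XML annotation files are placeholders; the option names mirror the parameters printed in your output):

# Train and evaluate a HOG+SVM detector with dlib's simple object detector.
import dlib

options = dlib.simple_object_detector_training_options()
options.C = 1                       # SVM regularisation; larger fits the training data harder
options.epsilon = 0.01              # stopping tolerance (the "eps" in your printout)
options.num_threads = 4
options.upsample_limit = 1          # upsample once to catch smaller pedestrians
options.add_left_right_image_flips = True   # cheap augmentation for symmetric objects
options.be_verbose = True

# training.xml / testing.xml are imglab-style annotation files (placeholders)
dlib.train_simple_object_detector("training.xml", "detector.svm", options)
print(dlib.test_simple_object_detector("testing.xml", "detector.svm"))   # precision, recall, AP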

How to tune a training schema for a different data set in Caffe?

Currently I am following the Caffe ImageNet example but applying it to my own training dataset. My dataset has about 2000 classes with about 10~50 images per class. I am classifying vehicle images, and the images are cropped to the front of the vehicle, so the images within each class have the same size and (almost) the same viewing angle.
I've tried the ImageNet schema, but it didn't seem to work well: after about 3000 iterations the accuracy dropped to 0. So I am wondering, is there a practical guide on how to tune the schema?
You can delete the last layer in the ImageNet model, add your own last layer with a different name (to fit your number of classes), give it a higher learning rate, and set a lower overall learning rate. There is an official fine-tuning example here: http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
However, if the accuracy was 0 you should check the model parameters first; perhaps there is an overflow.
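For reference, a minimal pycaffe sketch of that fine-tuning setup (the prototxt and caffemodel file names are placeholders; the renamed final layer and the per-layer learning rates are defined in the prototxt itself):

# Fine-tune: copy pretrained weights into a net whose final layer was renamed
# (and resized to your 2000 classes) in the train_val prototxt.
import caffe

caffe.set_mode_gpu()
solver = caffe.SGDSolver('solver.prototxt')          # placeholder path to your edited solver

# Layers whose names match the pretrained model receive its weights; the
# renamed last layer keeps its random initialisation and trains from scratch.
solver.net.copy_from('bvlc_reference_caffenet.caffemodel')
solver.solve()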

cvSVM training produces poor results for HOGDescriptor

My objective is to train an SVM and obtain support vectors which I can plug into OpenCV's HOGDescriptor for object detection.
I have gathered ~4000 positives and ~15000 negatives, and I train using the SVM provided by OpenCV. The results give me too many false positives (up to 20 per image). I would clip out the false positives and add them to the pool of negatives to retrain, and at times I would end up with even more false positives! I have tried adjusting the L2HysThreshold of my HOGDescriptor upwards to 300 without significant improvement. Is my pool of positives and negatives large enough?
The SVM training is also much faster than expected. I have tried feature vector sizes of 2916 and 12996, using grayscale images and color images in separate tries. SVM training has never taken longer than 20 minutes. I use auto_train. I am new to machine learning, but from what I hear, training with a dataset as large as mine should take at least a day, no?
I believe cvSVM is not doing much learning and, according to http://opencv-users.1802565.n2.nabble.com/training-a-HOG-descriptor-td6363437.html, it is not suited for this purpose. Does anyone with experience with cvSVM have more input on this?
I am considering using SVMLight (http://svmlight.joachims.org/), but it looks like there isn't a way to visualize the SVM hyperplane. What are my options?
I use OpenCV 2.4.3 and have tried the following setups for HOGDescriptor:
hog.winSize = cv::Size(100,100);
hog.cellSize = cv::Size(5,5);
hog.blockSize = cv::Size(10,10);
hog.blockStride = cv::Size(5,5);   // setup 1: 12996-dimensional feature vector

hog.winSize = cv::Size(100,100);
hog.cellSize = cv::Size(10,10);
hog.blockSize = cv::Size(20,20);
hog.blockStride = cv::Size(10,10); // setup 2: 2916-dimensional feature vector
Your first descriptor's dimensionality is far too large to be of any use. To form a reliable SVM hyperplane, you need at least as many positive and negative samples as descriptor dimensions, because ideally you need separating information in every dimension of the hyperplane.
The number of positive and negative samples should be more or less the same, unless you provide your SVM trainer with a bias parameter (which may not be available in cvSVM).
There is no guarantee that HOG is a good descriptor for the type of problem you are trying to solve. Can you visually confirm that the object you are trying to detect has a distinct shape with similar orientation in all samples? A single type of flower for example may have a unique shape, however many types of flowers together don't have the same unique shape. A bamboo has a unique shape but may not be distinguishable from other objects easily, or may not have the same orientation in all sample images.
cvSVM is normally not the tool used to train SVMs for OpenCV HOG. Use the binary form of SVMLight (not free for commercial purposes) or libSVM (OK for commercial purposes) instead:
1. Calculate HOGs for all samples using your C++/OpenCV code and write them to a text file in the correct input format for SVMLight/libSVM.
2. Use either of the programs to train a model with a linear kernel and the optimal C. Find the optimal C by searching for the best accuracy while changing C in a loop.
3. Calculate the detector vector (an N+1 dimensional vector, where N is the dimension of your descriptor): take all the support vectors, multiply each one by its alpha value, and sum them dimension by dimension to obtain an N-dimensional weight vector. As the last element, append -b, where b is the hyperplane bias (you can find it in the model file produced by SVMLight/libSVM).
4. Feed this N+1 dimensional detector to HOGDescriptor::setSVMDetector() and use HOGDescriptor::detect() or HOGDescriptor::detectMultiScale() for detection.
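A hedged Python sketch of steps 3 and 4, using scikit-learn's LinearSVC in place of SVMLight/libSVM purely for illustration (for a linear model, coef_ already equals the alpha-weighted sum of the support vectors; the feature and label files are placeholders):

import numpy as np
import cv2
from sklearn.svm import LinearSVC

# HOGDescriptor(winSize, blockSize, blockStride, cellSize, nbins)
# -> the 2916-dimensional setup from the question
hog = cv2.HOGDescriptor((100, 100), (20, 20), (10, 10), (10, 10), 9)

# X: one hog.compute(sample).ravel() row per sample; y: +1 / -1 labels
X, y = np.load('hog_features.npy'), np.load('labels.npy')   # placeholders

svm = LinearSVC(C=0.01)            # sweep C in a loop for the best accuracy
svm.fit(X, y)

# N weights plus the bias as the (N+1)-th element, as described above
detector = np.append(svm.coef_.ravel(), svm.intercept_).astype(np.float32)
hog.setSVMDetector(detector)

rects, weights = hog.detectMultiScale(cv2.imread('test.jpg'))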
I have had successful results using SVMLight to learn SVM models when training from OpenCV, but haven't used cvSVM, so can't compare.
The hogDraw function from http://vision.ucsd.edu/~pdollar/toolbox/doc/index.html will visualise your descriptor.