I'm trying to classify digits that appear at known positions in images, in C++, using an SVM.
For that, I sample over a rectangle at the known position of the digit and train against a ground truth.
I wonder how to choose the kernel of the SVM. I use the default linear kernel, but my intuition tells me that it might not be the best choice.
How could I choose the kernel?
You will need to tune the kernel (if you use a nonlinear one). This guide may be useful for you: A practical guide to SVM classification
Unfortunately there is no magic bullet for this, so experimentation is your best friend.
I would probably start with RBF, which tends to work decently in most cases, and I agree with your intuition that linear is probably not the best, although sometimes (especially when you have tons of data) it can give you good surprises :)
The problem I have found with RBF is that it tends to overfit the training set. That stops being an issue once you have a lot of data, but then a new problem arises: RBF tends to scale poorly, with slow training times on big datasets.
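If it helps, here is a minimal scikit-learn sketch of the kind of experiment I mean: a cross-validated grid search over a linear and an RBF kernel. The file names and parameter ranges are placeholders for your own sampled rectangles and labels; you would port the winning settings back to your C++ code.

```python
# Minimal sketch: compare a linear and an RBF kernel with cross-validated
# grid search. X should be your sampled pixel rectangles flattened to rows,
# y the ground-truth digit labels; the parameter ranges are only examples.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

X = np.load("digit_patches.npy")   # placeholder: (n_samples, n_pixels)
y = np.load("digit_labels.npy")    # placeholder: (n_samples,)

pipe = Pipeline([
    ("scale", StandardScaler()),   # SVMs are sensitive to feature scale
    ("svm", SVC()),
])

param_grid = [
    {"svm__kernel": ["linear"], "svm__C": [0.1, 1, 10, 100]},
    {"svm__kernel": ["rbf"], "svm__C": [0.1, 1, 10, 100],
     "svm__gamma": ["scale", 0.01, 0.001]},
]

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```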
Has anybody tried developing a SLAM system that uses deep-learned features instead of the classical AKAZE/ORB/SURF features?
Scanning recent computer vision conferences, there seem to be quite a few reports of successful use of neural nets to extract features and descriptors, and benchmarks indicate that they may be more robust than their classical computer vision equivalents. I suspect that extraction speed is an issue, but assuming one has a decent GPU (e.g. an NVidia 1050), is it even feasible to build a real-time SLAM system running at, say, 30 FPS on 640x480 grayscale images with deep-learned features?
This was a bit too long for a comment, so that's why I'm posting it as an answer.
I think it is feasible, but I don't see how this would be useful. Here is why (please correct me if I'm wrong):
In most SLAM pipelines, precision matters more than long-term robustness. You obviously need your feature detections/matchings to be precise to get reliable triangulation/bundle adjustment (or whatever equivalent scheme you use). However, the high level of robustness that neural networks provide is only required in systems that do relocalization/loop closure over long time intervals (e.g. relocalization across different seasons, etc.). Even in such scenarios, since you already have a GPU, I think it would be better to use a photometric (or even just geometric) model of the scene for localization.
We don't have any reliable noise models for the features detected by neural networks. I know there have been a few interesting works (Gal, Kendall, etc.) on propagating uncertainties through deep networks, but these methods seem a bit immature for deployment in SLAM systems.
Deep learning methods are usually good for initializing a system, and the solution they provide then needs to be refined. Their results depend too much on the training dataset and tend to be hit-and-miss in practice. So I think you could trust them for an initial guess, or for some extra constraints (e.g. in pose estimation: if you have a geometric algorithm that drifts over time, you can use the network's output to constrain it), but the absence of a noise model, as mentioned above, will make the fusion a bit difficult.
So yes, I think it is feasible, and with careful engineering and tuning you could probably produce a few interesting demos, but I wouldn't trust it in real life.
I'm working on a project related to people detection. I successfully implemented both a HOG+SVM classifier (with libSVM) and a cascade classifier (with OpenCV). The SVM classifier works really well: I tested it on a number of videos and it correctly detects people, with very few false positives and very few false negatives. The problem is the computational time: nearly 1.2-1.3 s over the entire image and 0.2-0.4 s over the foreground patches. Since I'm working on a project that must run in a near real-time environment, I switched to the cascade classifier to get a lower computational time.
So I trained many different cascade classifiers with OpenCV (opencv_traincascade). The output is good in terms of computational time (0.2-0.3 s over the entire image, a lot less when launched only over the foreground), so I achieved that goal, let's say. The problem here is the quality of detection: I'm getting a lot of false positives and a lot of false negatives. Since the only difference between the two methods is the base classifier used in OpenCV (decision trees or decision stumps; in any case no SVM, as far as I understand), I'm starting to think that my problem could be the base classifier (in some way, HOG features are best separated by hyperplanes, I guess).
Of course, the dataset used with libSVM and OpenCV is exactly the same, both for training and for testing. For the sake of completeness, I used nearly 9,000 positive samples and nearly 30,000 negative samples.
Here are my two questions:
Is it possible to change the base weak learner in the opencv_traincascade function? If yes, is the SVM one of the possible choices? If both answers are yes, how can I do such a thing? :)
Are there other computer vision or machine learning libraries that implement the SVM as a weak classifier and provide methods to train a cascade classifier? (Are those libraries suitable for use in conjunction with OpenCV?)
Thank you in advance as always!
Marco.
In principle a weak classifier can be anything, but the strength of AdaBoost-related methods is that they are able to obtain good results out of simple classifiers (they are called "weak" for a reason).
Using an SVM in an AdaBoost cascade is a bit of a contradiction: the former has no need for such a framework, since it can do its job by itself, and the latter is fast precisely because it takes advantage of weak classifiers.
Furthermore, I don't know of any study about it, and OpenCV doesn't support it: you would have to write the code yourself. It is a huge undertaking, and you probably won't get any interesting results.
Anyway, if you think HOG features are better suited to your task, OpenCV's traincascade has an option for them, besides Haar and LBP.
As to your second question, I'm not sure, but I'm fairly confident the answer is negative.
My advice is to try to get the most you can out of traincascade; for example, try increasing the number of samples if you can and compare the results.
This paper is quite good. It basically says that an SVM can be treated as a weak classifier if you train it on fewer samples (say, less than half of the training set); samples with higher weights have a higher chance of being picked to train the 'weak SVM'.
The source code is not widely available, unfortunately. If you want a quick prototype, use Python's scikit-learn and see if you can get desirable results before modifying OpenCV.
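If it helps, a rough sketch of that kind of prototype, assuming folders of fixed-size positive and negative crops (the folder names, window size, and C value are placeholders): OpenCV's HOG descriptor feeds a linear SVM in scikit-learn, which lets you check how separable your HOG features are before touching the cascade training tools.

```python
# Rough prototype: OpenCV HOG features + a linear SVM in scikit-learn.
# "positives" and "negatives" are placeholder folders of 64x128 crops.
import glob
import cv2
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

hog = cv2.HOGDescriptor()  # default 64x128 detection window

def features(folder):
    feats = []
    for path in glob.glob(folder + "/*.png"):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, (64, 128))        # match the HOG window
        feats.append(hog.compute(img).ravel())  # 3780-dim vector
    return feats

X_pos = features("positives")   # placeholder paths
X_neg = features("negatives")
X = np.vstack(X_pos + X_neg)
y = np.hstack([np.ones(len(X_pos)), np.zeros(len(X_neg))])

clf = LinearSVC(C=0.01)          # placeholder C, tune it
print(cross_val_score(clf, X, y, cv=5).mean())
```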
I need to detect a human in a video in real time. I guess it's not much different from detecting a human in a static image (except that video frames are usually much lower resolution). Can you guys point me in some direction?
I don't have any experience in the computer vision field, so any link, article, or video that could give me an introduction would be useful.
Any help is appreciated.
Thanks.
One of the most famous methods for human detection is the Histogram of Oriented Gradients (HoG) detector. This has been implemented in the OpenCV library and should be a good starting point.
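If it helps, here is a minimal sketch of that detector using OpenCV's Python bindings; the video path and the detectMultiScale parameters are placeholders you would tune for your own footage.

```python
# Minimal sketch: OpenCV's built-in HOG + linear SVM people detector on a video.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cap = cv2.VideoCapture("input.mp4")   # placeholder video path, or 0 for a webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # winStride/padding/scale trade speed against detection quality
    rects, weights = hog.detectMultiScale(frame, winStride=(8, 8),
                                          padding=(8, 8), scale=1.05)
    for (x, y, w, h) in rects:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("people", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```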
One way is to use HOG features. This first method is time-consuming, but it is a very successful human detection algorithm. The second way is to optimize the HOG algorithm by resizing the image; this gives more than a two-times speed-up in detecting humans in an image, while sacrificing some detection accuracy, mainly when persons are at the edges of the frame. The third way consists in adapting Haar features for human detection; this solution significantly reduces the computational cost at the expense of precision.
To evaluate the proposed method, we established a top-view human database. Experimental results demonstrated the effectiveness and efficiency of the proposed algorithm, which gives a good accuracy/execution-time ratio.
I'm writing an application that uses an SVM to do classification on some images (specifically these). My Matlab implementation works really well. Using a SIFT bag-of-words approach, I'm able to get near 100% accuracy with a linear kernel.
I need to implement this in C++ for speed/portability reasons, and so I've tried using both libsvm and dlib. I've tried multiple SVM types (c_svm, nu_svm, one_class) and multiple kernels (linear, polynomial, rbf). The best I've been able to achieve is around 50% accuracy, even on the same samples that I've trained on. I've confirmed that my feature generators are working, because when I export my C++-generated features to Matlab and train on those, I'm able to get near-perfect results again.
Is there something magical about Matlab's SVM implementation? Are there any common pitfalls or areas that I might look into that would explain the behavior I'm seeing? I know this is a little vague, but part of the problem is that I don't know where to go. Please let me know in the comments if there is other info I can provide that would be helpful.
There is nothing magical about the Matlab version of the libraries, other than that it runs in Matlab, which makes it harder to shoot yourself in the foot.
A checklist:
Are you normalizing your data, making all values lie between 0 and 1 (or between -1 and 1), either linearly or using the mean and the standard deviation?
Are you doing a parameter search for a good value of C (or C and gamma in the case of an RBF kernel)? Using cross-validation or a hold-out set?
Are you sure that you're handling NaN and all other floating-point nastiness? Matlab is very good at hiding this from you; C++, not so much.
Could it be that you're loading your data incorrectly, e.g. reading a "%s" into a double or something else that is adding noise to your input data?
Could it be that libsvm/dlib expects the data in row-major order and you're sending it in column-major (or the other way around)? Again, Matlab makes this almost impossible; C++, not so much.
32/64-bit nastiness: one version of the library, the executable compiled with the other?
Some other things:
Could it be that in Matlab you're somehow leaking the class (y) into the preprocessing? No one does this on purpose, but I've seen it happen. If you make almost any f(y) a feature, you'll get almost 100% every time.
Sometimes it helps to verify that everything is numerically identical by printing the data to a file before training, both in C++ and in Matlab.
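If it helps, here is a small Python sketch of those checks on an exported feature matrix (the file names are placeholders for however you dump your features); run the equivalent on the Matlab side and diff the two files.

```python
# Sketch of the checks above on an exported feature matrix: scale to [-1, 1],
# look for NaN/Inf, and dump to a text file you can diff against the Matlab side.
# "features.csv" is a placeholder for however you export your C++ features.
import numpy as np

X = np.loadtxt("features.csv", delimiter=",")   # placeholder export

print("NaNs:", np.isnan(X).any(), "Infs:", np.isinf(X).any())

# libsvm-style linear scaling of each feature (column) to [-1, 1]
lo, hi = X.min(axis=0), X.max(axis=0)
span = np.where(hi > lo, hi - lo, 1.0)          # avoid dividing by zero
X_scaled = 2.0 * (X - lo) / span - 1.0

np.savetxt("features_scaled.txt", X_scaled, fmt="%.6f")
```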
I'm very happy with libsvm using the RBF kernel. carlosdc pointed out the most common errors in the correct order :-). For libsvm: did you use the Python tools shipped with libsvm? If not, I recommend doing so. Write your feature vectors to a file (from Matlab and/or C++) and do a meta-training for the RBF kernel with easy.py. You get the parameters and a prediction for the generated model; if this prediction is OK, continue with C++. From training you also get a scaled feature file (min/max transformed to -1.0/1.0 for every feature); compare this to your C++ implementation as well.
Some libsvm issues: a nasty habit is (if I remember correctly) that values scaling to 0 (zero) are omitted in the scaled file. In grid.py there is a parameter "nr_local_worker" which defines the number of threads; you might wish to increase it.
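For example, something along these lines (the file name is a placeholder) loads a scaled file back as a dense matrix, so the omitted zero entries reappear when you compare against your C++ feature vectors.

```python
# Sketch: read a libsvm/svm-scale output file back as a dense matrix, so the
# zero features that the sparse format omits reappear for the comparison.
# "train.scale" is a placeholder for the file easy.py / svm-scale produced.
from sklearn.datasets import load_svmlight_file

X_sparse, y = load_svmlight_file("train.scale")
X = X_sparse.toarray()                  # omitted entries come back as 0.0
print(X.shape, X.min(), X.max())        # expect values in [-1, 1]
```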
I have implemented the k-nearest-neighbours algorithm in my system. The dataset consists of 26 classes, each with 100 samples. In my case K=7, and it was completely trial and error to get the best classification result.
I know that K should be chosen wisely to reduce noise in the classification. But what about the number of samples? Is there any general rule such as "the more samples, the better the result"? Does it depend on something?
Thank you for all your responses.
You could try considering whatever underlying mechanism is generating your data, or whatever background knowledge you have about the problem; that might give you an idea of the relative sizes of the noise and the true underlying variation. E.g. when predicting a favourite sports team from location I would expect more variation than when predicting a favourite sport, so I would use a smaller k. However, I don't know of much general guidance, except to use cross-validation.
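If it helps, here is a short scikit-learn sketch of both checks: cross-validating over k and looking at how accuracy changes with the number of training samples. The data files, the k range, and the training-size grid are placeholders.

```python
# Sketch: pick k by cross-validation and see how accuracy changes with the
# number of training samples. X, y are placeholders for your 26-class data.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, learning_curve

X = np.load("features.npy")    # placeholder: (n_samples, n_features)
y = np.load("labels.npy")      # placeholder: (n_samples,)

# cross-validate over a range of k values
for k in [1, 3, 5, 7, 9, 15]:
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k}: {score:.3f}")

# accuracy as a function of training-set size, for a fixed k
sizes, train_sc, val_sc = learning_curve(
    KNeighborsClassifier(n_neighbors=7), X, y, cv=5,
    train_sizes=np.linspace(0.2, 1.0, 5))
for n, s in zip(sizes, val_sc.mean(axis=1)):
    print(f"{n} samples: {s:.3f}")
```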