How can I apply an SVM or a deep neural network for image retrieval - computer-vision

After obtaining the image dataset, I build a feature database for all images. Each entry is a vector based on the mean and standard deviation of the RGB and HSV color models for a portion of the image. How can I use an SVM to retrieve related images from the database once a query image is given?
Also, how can I use unsupervised learning for the above problem?

Assuming your images are unlabeled, an SVM is not directly applicable: it is a supervised method that learns from labeled examples in order to predict class labels for new data, so you would first need labels for the dataset images. You would need another method to generate those labels, such as unsupervised learning, so an SVM does not seem relevant if you have only feature vectors and no class labels.
Neural networks can be trained on unlabeled data, but that is a rather complex approach and still a subject of academic research. You may want to consider a simpler machine learning approach such as k-Nearest Neighbors, which returns the k database samples closest to the query in your feature space. The algorithm is simple to implement and is available in many machine learning libraries; in Python, for example, you can use scikit-learn.
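To make the k-Nearest Neighbors idea concrete, here is a minimal sketch using only the Python standard library. It assumes each image is available as a list of (r, g, b) pixels; the function names `color_features` and `retrieve`, the toy file names, and the pixel values are all made up for illustration. It also averages hue naively even though hue is circular, which is a simplification.

```python
import colorsys
import heapq
import math
from statistics import mean, pstdev


def color_features(pixels):
    """Mean and standard deviation of each RGB and HSV channel.

    `pixels` is a list of (r, g, b) tuples with values in 0..255,
    e.g. the pixels of the image portion being indexed.
    Note: hue is circular, so a plain mean is only a rough summary.
    """
    rgb = [(r / 255, g / 255, b / 255) for r, g, b in pixels]
    hsv = [colorsys.rgb_to_hsv(*p) for p in rgb]
    features = []
    for channels in (zip(*rgb), zip(*hsv)):
        for values in channels:
            features += [mean(values), pstdev(values)]
    return features  # 12 numbers: 6 channels x (mean, sd)


def retrieve(query, database, k=3):
    """Return the names of the k database entries closest to the query."""
    nearest = heapq.nsmallest(
        k, database.items(), key=lambda item: math.dist(query, item[1])
    )
    return [name for name, _ in nearest]


# Hypothetical toy database: three tiny "images" of four pixels each.
images = {
    "red.png": [(250, 10, 10), (240, 20, 15), (255, 0, 0), (245, 20, 5)],
    "green.png": [(10, 250, 10), (20, 240, 15), (0, 255, 0), (5, 245, 20)],
    "blue.png": [(10, 10, 250), (20, 15, 240), (0, 0, 255), (5, 20, 245)],
}
database = {name: color_features(px) for name, px in images.items()}

# A reddish query image; the closest database entry should be the red one.
query = color_features([(230, 30, 30), (250, 15, 5), (240, 10, 0), (235, 25, 20)])
print(retrieve(query, database, k=2))
```

For a real database you would probably scale the features so that no single channel dominates the distance, and use a library implementation of nearest-neighbor search rather than this brute-force loop.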
I am unsure what type of images you are working with, but you might also want to explore using feature detector algorithms such as SIFT rather than just pixel intensities.

Related

Real-time object tracking in OpenCV

I have written an object classification program using BoW clustering and SVM classification algorithms. The program runs successfully. Now that I can classify the objects, I want to track them in real time by drawing a bounding rectangle/circle around them. I have done some research and come up with the following ideas.
1) Use homography with the training images from the training data directory. The problem with this approach is that the training image must be exactly the same as the test image. Since I'm not detecting specific objects, the test images are closely related to the training images but not necessarily an exact match. In homography we find a known object in a test scene. Please correct me if I am wrong about homography.
2) Use feature tracking. I'm planning to extract the SIFT features in the test images that are similar to those in the training images and then track them by drawing a bounding rectangle/circle. The issue here is how to know which features come from the object and which come from the environment. Is there any member function in the SVM class that can return the keypoints or region of interest used to classify the object?
Thank you

SVM training C++ OpenCV

I was under the impression the training data given to train an SVM consisted of image features, but after reading this post again, the training_mat that is given to the SVM in the example is just the img_mat flattened to 1-Dimension.
So my question is, when training an SVM, do you give it whole images in their entirety, row by row, or do you detect and extract the features, and then flatten a Mat of that into 1-Dimension?
You can extract features, or you can use pixel intensity values as the features. In this example, they have done the latter. In that case you end up with a very large number of features, many of which may not be useful. This makes it harder for the SVM training to converge, although it can still be possible. In my personal experience, an SVM works better if you extract a smaller number of "good" features that best describe your data. However, in recent years it has been shown that state-of-the-art estimators like deep neural networks (when used instead of an SVM) can perform very well using only the pixel intensity values as features. This has eliminated the need for feature extraction in the methods that have led to state-of-the-art results on public data sets (like ImageNet).
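The two options can be sketched as follows, in plain Python rather than the C++ OpenCV API the question uses. The names `flatten` and `simple_features` and the tiny images are hypothetical, and the "extracted features" here are deliberately trivial; the point is only the shape of the training matrix, with one row per image.

```python
def flatten(image):
    """Concatenate the rows of a 2-D image into one 1-D feature vector."""
    return [pixel for row in image for pixel in row]


def simple_features(image):
    """A hypothetical low-dimensional alternative: mean and range of intensities."""
    pixels = flatten(image)
    return [sum(pixels) / len(pixels), max(pixels) - min(pixels)]


# Two hypothetical 2x3 greyscale images, one label per image.
images = [
    [[0, 10, 20], [30, 40, 50]],
    [[200, 210, 220], [230, 240, 250]],
]
labels = [0, 1]

# Option 1: raw pixels, one row of width*height values per image.
training_mat = [flatten(img) for img in images]
# Option 2: extracted features, one short row per image.
feature_mat = [simple_features(img) for img in images]

print(training_mat[0])  # [0, 10, 20, 30, 40, 50]
print(feature_mat)      # [[25.0, 50], [225.0, 50]]
```

Either matrix, together with `labels`, is what would be handed to the SVM training call; the only difference is how many columns each row has.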

Face recognition using neural networks

I am doing a project on face recognition, for which I have already tried different methods such as eigenfaces, fisherfaces, LBP histograms and SURF. But these methods are not giving me accurate results. SURF gives good matches for exactly the same image, but I need to match an image of a person against different poses of that person (wearing glasses, side pose, face partly covered, etc.). LBP compares histograms of images, i.e., only color information, so it does not give good results when the lighting conditions vary a lot. I have heard about neural networks, but I don't know much about them. Is it possible to train the system very accurately using neural networks? If so, how can we do that?
According to this OpenCV page, there does seem to be some support for machine learning. That being said, the support does seem to be a bit limited.
What you could do would be to:
1) Use OpenCV to extract the face of the person.
2) Convert the image to greyscale.
3) Try to manipulate it so that the face is always the same size.
All of the above should be doable with OpenCV itself (I could be wrong, I haven't messed with OpenCV in a while), so that should save you some time.
Next, you take the image, as a bitmap maybe, and feed the bitmap as a vector to the neural network. Alternatively, as @MatthiasB recommended, you could feed the features instead of individual pixels. This would simplify the data being passed, thus making the network easier to train.
As for training, you manipulate these images as above, and then feed them to the network. If a person uses glasses occasionally, you could have cases of the same person with and without glasses, etc.
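The preprocessing described above (greyscale, fixed size, flatten to a vector) can be sketched in plain Python. This is only an illustration under stated assumptions: the face is assumed to be already cropped, the function names are made up, and in practice OpenCV or another image library would do the conversion and resizing.

```python
def to_grey(image):
    """Convert rows of (r, g, b) pixels to greyscale with the usual luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in image]


def resize_nearest(image, width, height):
    """Nearest-neighbour resize so every face ends up the same size."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def face_to_vector(face, size=4):
    """Greyscale, resize to size x size, scale to 0..1 and flatten row by row."""
    grey = resize_nearest(to_grey(face), size, size)
    return [value / 255 for row in grey for value in row]


# A hypothetical 2x2 colour crop standing in for a detected face.
face = [[(255, 255, 255), (0, 0, 0)], [(0, 0, 0), (255, 255, 255)]]
vector = face_to_vector(face)
print(len(vector))  # 16
```

Every face then yields a vector of the same length, which is what a network with a fixed number of inputs needs.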

Object Annotation in images with OpenCV

I am trying to develop an automatic (or semi-automatic) image annotator for my final year project with OpenCV. I have been studying many OpenCV resources and have come across cascade classification for training and detection purposes. I understood that part, and also tried the Face Detection tutorial provided with OpenCV. So, now I know how to train and detect objects.
However, I still cannot understand how I can annotate the objects present in the image.
For example, the system will show that this is an object, but I want the system to show that it is a ball. How can I accomplish that?
Thanks in advance.
A single binary classifier (detector) can separate objects into two classes:
positive - the object type the classifier was trained for,
and negative - all others.
If you need to detect several distinct classes, you should use one detector for each class, or you can train a multiclass classifier (a "one vs all" type of classifier, for example), but that usually works more slowly and with less accuracy (because a detector is better at searching for similar objects). You can also take a look at convolutional networks (by Yann LeCun).
This is a very hard task. I suggest simplifying it by using the latent SVM detector and limiting yourself to the models it supplies:
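The "one detector per class" idea can be sketched like this. It assumes, purely for illustration, that each detector returns a confidence between 0 and 1 for a candidate region; the lambda detectors, the dictionary regions, and the `threshold` parameter are hypothetical stand-ins for trained detectors and real image regions.

```python
def label_region(region, detectors, threshold=0.5):
    """Run every per-class detector and return the best label, or None."""
    scores = {label: detect(region) for label, detect in detectors.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Hypothetical stand-in detectors; each returns a confidence in 0..1.
detectors = {
    "ball": lambda region: 0.9 if region["round"] else 0.1,
    "box": lambda region: 0.8 if not region["round"] else 0.2,
}

print(label_region({"round": True}, detectors))   # ball
print(label_region({"round": False}, detectors))  # box
```

The returned label is the annotation for that region; a region that no detector accepts stays unlabeled.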
http://docs.opencv.org/modules/objdetect/doc/latent_svm.html

Object recognition using LDA and ORB with different sized training image.

I'm trying to build a lightweight object recognition system using ORB for feature extraction and LDA for classification. But I'm running into an issue due to the varying size of the extracted features.
These are my steps:
Extract keypoints using ORB.
Extract trainable features in the image by grouping the keypoints.
(example of what's being extracted: http://imgur.com/gaQWk)
Train the recognizer with the extracted features. (This is where problems arise)
Classify objects in an image from the wild.
If I attempt to create a generalized matrix using cv::gemm, I get an exception due to the varying sizes. My first thought was just to normalize all the images by resizing them, but this causes a lot of accuracy issues when objects have similar small features.
Is there any solution to this? Is LDA an appropriate method for this? I know it's commonly used with facial recognition algorithms such as fisherfaces.
LDA requires fixed-length features, as do most optimization and machine learning methods. You could resize the image patches to a fixed size, but that is probably not going to give a good feature. Normally people use a scale-invariant feature such as SIFT. You might also try a color histogram, or some variation of edge detection and spatial histogram binning such as a GIST vector.
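As a rough illustration of why a histogram helps here, the sketch below turns greyscale patches of different sizes into vectors of the same length. It uses a plain intensity histogram as a simple stand-in for a color histogram; the function name, the bin count, and the patch values are assumptions made for the example.

```python
def intensity_histogram(patch, bins=8):
    """Normalised histogram of 0..255 intensities; length is always `bins`."""
    counts = [0] * bins
    total = 0
    for row in patch:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


# Two hypothetical patches of different sizes.
small = [[0, 255], [128, 64]]
large = [[10, 20, 30, 40], [200, 210, 220, 230], [90, 100, 110, 120]]

print(len(intensity_histogram(small)), len(intensity_histogram(large)))  # 8 8
```

Because the output length depends only on `bins`, patches of any size can be stacked into one training matrix without resizing them first.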
It's hard to say if LDA is an appropriate method for this without knowing what you hope to accomplish. You might also look into using SVM, some form of boosting, or just plain nearest neighbor with a large training set.