I am working on a project that compares images of sample products against a reference product image. I have come up with two approaches, but I have run into problems with each of them.
Method 1: Remove the background, realign the images according to features, and then find the difference between the two images by subtraction.
Problem: I am thinking about using template matching to extract the region of interest and save it as a new picture. Is it possible to use template matching for extraction? The sample provided by OpenCV draws a frame or rectangle around the matched object, so it seems feasible to place the match at the center of a new picture. If it is possible, how can I make the matched region the center of a new picture? It seems a bit difficult because the matched rectangle may not be horizontal.
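For what it's worth, here is one possible sketch of that extraction step in Python/OpenCV. It assumes the product is not rotated relative to the template (matchTemplate is not rotation-invariant, which is exactly the difficulty mentioned above), and all file names are placeholders:

import cv2
import numpy as np

scene = cv2.imread("sample_product.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("reference_patch.png", cv2.IMREAD_GRAYSCALE)
h, w = template.shape

# Find the location where the template matches best.
result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)
x, y = max_loc

# Crop the matched region and paste it into the center of a new canvas.
roi = scene[y:y + h, x:x + w]
canvas = np.zeros((h * 2, w * 2), dtype=scene.dtype)
cy, cx = (canvas.shape[0] - h) // 2, (canvas.shape[1] - w) // 2
canvas[cy:cy + h, cx:cx + w] = roi
cv2.imwrite("matched_centered.png", canvas)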
Method 2: Cascade classifier training. It seems I can train a classifier to learn which images are good and which are bad.
Problem: However, the classifier detection sample provided by OpenCV runs on a video stream. Is it possible to run it on still images? Also, how can I adjust the error rate or precision of the classifier detection?
If you have any other feasible suggestions, please share them. Thanks for your attention!
I need your help with this problem: I want to create a program that differentiates between two forms (two images) with a camera, in real time. I want the detection to remain feasible even if the object is rotated by 90 or 180 degrees, for example. I have to use machine learning for this problem, but I am open to any proposal; I also do not have many images in the database.
Here are the methods I found, although I am not sure they will work:
1. Apply a Canny filter to extract contours.
2. Use feature extractors such as SIFT, Fourier descriptors, Haralick features, or the Hough transform to extract more details, which can be summarised in a short vector.
3. Then train an SVM or ANN on this vector (a rough sketch of this pipeline is shown right after the list).
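As a rough sketch of such a pipeline, here is one possible version: Hu moments of the Canny edge map stand in for the richer SIFT/Fourier/Haralick descriptors listed above, and the file names, labels, and Canny thresholds are made up.

import cv2
import numpy as np

def describe(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(img, 50, 150)
    # Hu moments of the edge map: a 7-value, rotation-invariant shape vector.
    hu = cv2.HuMoments(cv2.moments(edges)).flatten()
    # Log-scale the moments, a common trick since they span many orders of magnitude.
    return (-np.sign(hu) * np.log10(np.abs(hu) + 1e-30)).astype(np.float32)

train_paths = ["open_01.png", "open_02.png", "closed_01.png", "closed_02.png"]
labels = np.array([1, 1, 0, 0], dtype=np.int32)          # 1 = open, 0 = closed
features = np.vstack([describe(p) for p in train_paths])

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)
svm.setKernel(cv2.ml.SVM_LINEAR)
svm.train(features, cv2.ml.ROW_SAMPLE, labels)

_, prediction = svm.predict(describe("query.png").reshape(1, -1))
print("open" if int(prediction[0][0]) == 1 else "closed")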
The goal is to detect two cases: open or closed.
Also, I am not sure that contours are the best way to solve this problem, because the background changes a lot.
The original images are valves with different shapes; here is an example:
I've seen questions about detecting blurry images, but what about faded/grainy images? I have a large dataset of scanned passport-style portrait photos, and a number of them are old and therefore look faded and grainy (i.e. it is hard to recognize the person).
Image quality metrics like BRISQUE and blur detection [link] didn't work very well and were inconsistent. The criterion for classification would be whether the photo is good enough for an average person to tell who the person is from the image.
So I tried face detection (HOG, etc.), but it detects faces even in images where it's pretty much impossible to tell who the person is.
Ideally, I'm looking for a suggestion that is somewhat lightweight.
The first idea I would check is image histograms. It's especially straightforward in the case of grayscale images. My assumption is that quality photos have an intensity distribution close to normal, while grainy and faded photos do not. If histograms look similar across images within one group (it looks like you have enough examples to check), it's easy to classify a new image based on its histogram. You can also consider computing the histogram of just the image's center, i.e. the area containing the eyes, nose, and mouth; low-quality images may lose these details.
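A minimal sketch of that histogram check (grayscale input assumed; the spread threshold is something you would tune on your own examples):

import cv2
import numpy as np

def looks_faded(path, spread_threshold=40.0):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    hist /= hist.sum()
    # Faded scans squeeze most pixels into a narrow intensity band, so a small
    # spread (standard deviation) of the histogram is suspicious.
    levels = np.arange(256)
    mean = (hist * levels).sum()
    spread = np.sqrt((hist * (levels - mean) ** 2).sum())
    return spread < spread_threshold

print(looks_faded("portrait.jpg"))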
Another idea is to apply a low-pass filter to the image to remove noise, then compute some metric based on an edge detector (Sobel, Laplacian, Canny, etc.), or just try to find any edges other than the one around the hair.
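A sketch of that edge-based metric: smooth first to suppress grain, then measure how much edge response is left (the cutoff value is arbitrary and would need tuning):

import cv2

def edge_energy(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    smoothed = cv2.GaussianBlur(gray, (5, 5), 0)
    # Variance of the Laplacian is a common proxy for how much detail survives.
    return cv2.Laplacian(smoothed, cv2.CV_64F).var()

print(edge_energy("portrait.jpg") > 100.0)   # True = enough detail; 100.0 is a guess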
Another way is to average the good images and compare this template with new ones. A higher difference would mean the observed image is not a typical portrait. Or try face detection with a cascade-based detector.
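A sketch of the averaging idea, assuming the portraits are already cropped and registered to the same size (file names and the threshold are placeholders):

import cv2
import numpy as np

good = [cv2.imread(p, cv2.IMREAD_GRAYSCALE).astype(np.float32)
        for p in ["good_01.jpg", "good_02.jpg", "good_03.jpg"]]
template = np.mean(good, axis=0)

candidate = cv2.imread("candidate.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)
difference = np.abs(candidate - template).mean()
print("atypical portrait" if difference > 30.0 else "typical portrait")  # tune 30.0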
Or maybe some combination of these ideas will give a good result on your problem.
Sure, it's possible to train an NN classifier, but I think this specific problem can be solved without one.
I am a new user of OpenCV. I am currently working on a product inspection project with OpenCV. I was planning to extract the edges of a good product and of a bad product, then compare their edges, perhaps with a mean squared difference. However, even the first step of extracting the edges cleanly is already quite difficult.
Good sample: [image of the good product]
When I use Canny edge detection, the detected edge of the good product (the blue part of the picture) covers only part of the product, as shown below:
Edge of the good product: [image of the detected edges]
I also tried using adaptiveThreshold to make the greyscale picture clearer before running edge detection, but the detected edges were not as good as expected because of heavy noise.
Therefore, I would like to ask for a way to extract the edges, or any better way of comparing good and bad products with OpenCV. Sorry for the bad English above.
This task can be made simple if some assumptions like those below are valid. For example:
Products move along the same production line, and therefore the lighting across all product images stays the same.
The products lie on a flat surface parallel to the focal plane of the camera, so the rotation of objects will be only around the axis of the lens.
The lighting can be controlled (this will help you to get better edges, and is actually done in production-line image processing).
Since we cannot see the images that you added, it is a bit hard to see what exactly the situation is. But if the above assumptions are valid, image differencing combined with image signatures is one possible approach.
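For illustration, here is a minimal differencing sketch under those assumptions (the file names and both thresholds are placeholders to be tuned on real images):

import cv2

reference = cv2.imread("good_product.png", cv2.IMREAD_GRAYSCALE)
test = cv2.imread("test_product.png", cv2.IMREAD_GRAYSCALE)

# Pixel-wise difference against the known-good reference, then measure how much
# of the image actually differs.
diff = cv2.absdiff(reference, test)
_, mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
defect_ratio = cv2.countNonZero(mask) / mask.size
print("defective" if defect_ratio > 0.01 else "ok")   # more than 1% differing pixels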
Another possibility is to train a Haar Cascade classifier using good and bad products. With this, the edges and all that will be taken care of. But you will have to collect a lot of data and train the classifier.
I would like to compare a picture (using its descriptors) with thousands of pictures in a database in order to find a match, i.e. to tell whether two pictures show the same thing, even if one is rotated, a bit blurred, at a different scale, etc.
For example:
I saw on Stack Overflow that computing descriptors for each picture and comparing them one by one is a very long process.
I did some research and saw that I could use an algorithm based on Bag of Words.
I don't know exactly how it works yet, but it seems promising. However, I think (I may be mistaken) that it is only meant to detect what kind of object is in a picture, isn't it?
I would like to know whether, in your opinion, it would be a good solution for comparing one picture to thousands of pictures using descriptors like SIFT or SURF.
If so, do you have any suggestions about how I could do that?
Thanks.
Yes, it is possible. The only thing you have to pay attention to is the computational requirement, which can be a little overwhelming. If you can narrow the search, that usually helps.
To support my answer I will extract some examples from a recent work of ours. We aimed at recognizing a painting on a museum's wall using SIFT + RANSAC matching. We have a database of all the paintings in the museum and a SIFT descriptor for each one of them. We aim at recognizing the painting in a video, which can be recorded from a different perspective (all the templates are frontal) or under different lighting conditions. This image should give you an idea: on the left you can see the template and the current frame, the second image shows the SIFT matching, and the third shows the results after RANSAC.
Once you have the matching between your image and each SIFT descriptor in your database, you can compute a reprojection score, namely the ratio between matched points (after RANSAC) and the total number of keypoints. This can be repeated for each image, and the image with the best score can be declared the match.
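As a rough illustration of that scoring step, here is a sketch in Python/OpenCV (in recent builds SIFT is created with cv2.SIFT_create(); in older ones it lives in cv2.xfeatures2d):

import cv2
import numpy as np

sift = cv2.SIFT_create()
matcher = cv2.BFMatcher(cv2.NORM_L2)

def match_score(query_gray, template_gray):
    kq, dq = sift.detectAndCompute(query_gray, None)
    kt, dt = sift.detectAndCompute(template_gray, None)
    matches = matcher.knnMatch(dq, dt, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe's ratio test
    if len(good) < 4:
        return 0.0
    src = np.float32([kq[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kt[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    _, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if inlier_mask is None:
        return 0.0
    # Fraction of the query's keypoints that survive the geometric check.
    return float(inlier_mask.sum()) / len(kq)

# Score every template in the database this way; the best-scoring image is the match.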
We used this for paintings, but I think it can be generalized to any kind of image (the Android logo you posted in the question is a fair example, I think).
Hope this helps!
I would like to know how I can use OpenCV to detect an image in my video camera feed. The image can be any one of 500 images.
What I'm doing at the moment:
- (void)viewDidLoad
{
    [super viewDidLoad];
    // Do any additional setup after loading the view.
    self.videoCamera = [[CvVideoCamera alloc] initWithParentView:imageView];
    self.videoCamera.delegate = self;
    self.videoCamera.defaultAVCaptureDevicePosition = AVCaptureDevicePositionBack;
    self.videoCamera.defaultAVCaptureSessionPreset = AVCaptureSessionPresetHigh;
    self.videoCamera.defaultAVCaptureVideoOrientation = AVCaptureVideoOrientationPortrait;
    self.videoCamera.defaultFPS = 30;
    self.videoCamera.grayscaleMode = NO;
}
- (void)viewDidAppear:(BOOL)animated
{
    [super viewDidAppear:animated];
    [self.videoCamera start];
}
#pragma mark - Protocol CvVideoCameraDelegate
#ifdef __cplusplus
- (void)processImage:(cv::Mat &)image
{
    // Do some OpenCV stuff with the image
    cv::Mat image_copy;
    cvtColor(image, image_copy, CV_BGRA2BGR);
    // invert image (currently disabled)
    //bitwise_not(image_copy, image_copy);
    //cvtColor(image_copy, image, CV_BGR2BGRA);
}
#endif
The images that I would like to detect are small (2-5 KB). A few have text on them, but others are just signs. Here is an example:
Do you guys know how I can do that?
There are several things in here. I will break down your problem and point you towards some possible solutions.
Classification: Your main task consists of determining whether a certain image belongs to a class. This problem by itself can be decomposed into several sub-problems:
Feature representation: You need to decide how you are going to model your features, i.e. how you are going to represent each image in a feature space so you can train a classifier to separate the classes. The feature representation by itself is already a big design decision. You could (i) calculate a histogram of each image using n bins and train a classifier on it, or (ii) choose a sequence of random patch comparisons, as in a random forest. After training, you need to evaluate the performance of your algorithm to see how good your decision was.
There is a well-known problem called overfitting, which is when the classifier learns the training data so well that it cannot generalize. This can usually be avoided with cross-validation. If you are not familiar with the concepts of false positives and false negatives, take a look at this article.
Once you define your feature space, you need to choose an algorithm to train on that data, and this might be considered your biggest decision. New algorithms come out every day. To name a few classical ones: Naive Bayes, SVM, and Random Forests; more recently the community has obtained great results using deep learning. Each of these has its own typical use case (e.g. SVMs are great for binary classification), and you need to be familiar with the problem. You can start with simple assumptions, such as independence between random variables, and train a Naive Bayes classifier to try to separate your images.
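For example, a toy sketch of the histogram-plus-Naive-Bayes idea (32-bin grayscale histograms as features, scikit-learn's GaussianNB as the classifier; file names and labels are hypothetical, and real training needs far more images per class):

import cv2
import numpy as np
from sklearn.naive_bayes import GaussianNB

def histogram_feature(path, bins=32):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    hist = cv2.calcHist([gray], [0], None, [bins], [0, 256]).ravel()
    return hist / hist.sum()

paths = ["sign_a_1.png", "sign_a_2.png", "sign_b_1.png", "sign_b_2.png"]
labels = [0, 0, 1, 1]                      # class id per training image
features = np.vstack([histogram_feature(p) for p in paths])

model = GaussianNB().fit(features, labels)
print(model.predict([histogram_feature("query.png")]))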
Patches: Now, you mentioned that you would like to recognize the images with your webcam. If you are going to print the images and show them to the camera, you need to handle several things. It is necessary to define patches on your big image (the input from the webcam), build a feature representation for each patch, and classify it in the same way you did in the previous step. To do that, you could slide a window and classify all the patches to see whether they belong to the negative class or to one of the positive ones. There are other alternatives.
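A minimal sliding-window sketch (the patch size and stride are arbitrary, and classify() stands for whatever classifier you trained in the previous step):

import cv2

def sliding_windows(frame, patch_size=64, stride=32):
    h, w = frame.shape[:2]
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            yield (x, y), frame[y:y + patch_size, x:x + patch_size]

# for (x, y), patch in sliding_windows(gray_frame):
#     label = classify(patch)   # classify() = the classifier trained above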
Scale: Assuming that you are able to detect the location of the images in the big image and classify them, the next step is to relax the toy assumption of fixed scale. To handle multiple scales, you could use an image pyramid, which pretty much allows you to perform the detection at multiple resolutions. Alternative approaches could use keypoint detectors such as SIFT and SURF; inside SIFT there is an image pyramid which provides the scale invariance.
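A sketch of a simple pyramid loop (the scale factor and minimum size are placeholders):

import cv2

def pyramid(frame, scale=0.75, min_size=64):
    level = frame
    while min(level.shape[:2]) >= min_size:
        yield level
        level = cv2.resize(level, None, fx=scale, fy=scale,
                           interpolation=cv2.INTER_AREA)

# for level in pyramid(gray_frame):
#     ...run the sliding-window classification on `level`...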
Projection: So far we have assumed that your images are under orthographic projection, but most likely you will have slight perspective distortion, which will break the previous assumptions. One naive solution would be, for instance, to detect the corners of the white background of your image and rectify the image before building the feature vector for classification. If you use SIFT or SURF, you could avoid handling this explicitly. Nevertheless, if your input is going to be just square patches, as in ARToolKit, I would go for manual rectification.
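A sketch of that manual rectification, assuming you already have the four background corners from whatever corner detection you use (the output size is arbitrary):

import cv2
import numpy as np

def rectify(frame, corners, size=200):
    # corners: four (x, y) points ordered top-left, top-right, bottom-right, bottom-left
    target = np.float32([[0, 0], [size - 1, 0], [size - 1, size - 1], [0, size - 1]])
    H = cv2.getPerspectiveTransform(np.float32(corners), target)
    return cv2.warpPerspective(frame, H, (size, size))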
I hope I might have given you a better picture of your problem.
I would recommend using SURF for that, because the pictures can be at different distances from your camera, i.e. at changing scales. I ran a similar experiment and SURF worked as expected. But SURF is very difficult to adjust (and its operations are expensive); you should try different setups before you get the needed results.
Here is a link: http://docs.opencv.org/modules/nonfree/doc/feature_detection.html
A YouTube video (in C#, but it can give you an idea): http://www.youtube.com/watch?v=zjxWpKCQqJc
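For completeness, a small sketch of the adjustment mentioned above; it assumes an opencv-contrib build with the non-free xfeatures2d module enabled, and the hessianThreshold value is just a starting point:

import cv2

# hessianThreshold is the main knob: higher values keep only stronger blobs,
# giving fewer but more stable keypoints.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
gray = cv2.imread("sign.png", cv2.IMREAD_GRAYSCALE)
keypoints, descriptors = surf.detectAndCompute(gray, None)
print(len(keypoints))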
I might not be qualified enough to answer this question; the last time I seriously used OpenCV it was still version 1.1. But here are some thoughts on it, which I hope will help (currently I am interested in DIP and ML).
I think it will probably be an easier task if you only need to classify an image, i.e. if the image is exactly one of (or very similar to) your 500 images. For this you could use an SVM or some neural network (Felix already gave an excellent overview of that).
However, your problem seems to be that you first need to find this candidate image in your webcam frame, and you have little clue about its location beforehand (let us know whether that is the case; I think it is important).
If so, the harder problem is the detection/localization of your candidate image.
I don't have a general solution for that. The first thing I would do is check whether there is some common feature across your 500 images (e.g., whether all of them are enclosed by a red circle, or half of them contain a circle and half a rectangle). If so, the problem becomes simpler (it would be similar to the face detection problem, which has good solutions).
In other words, you first group the 500 images into a few groups with a common feature (by hand), detect the group first, then rescale and use the above-mentioned techniques to classify the image into a fine-grained result. This will be more computationally acceptable than trying to detect 500 images one by one.
BTW, this PPT gives a good visual idea of what is going on in feature extraction and image matching: http://courses.cs.washington.edu/courses/cse455/09wi/Lects/lect6.pdf
Detect vs. recognize: detecting the image is just finding it against the background, and from your comments I realized your signs may be surrounded by background. It would help your algorithm if you can somehow crop your signs out of the background (detection) before trying to recognize them. Recognition is the next stage, which presumes you can correctly classify the cropped image as one seen before.
If you need real-time speed and scale/rotation invariance, neither SIFT nor SURF will do this fast. Nowadays you can do much better if you shift the burden of image processing to a learning stage, as was done by Lepetit. In short, he subjected each pattern to a bunch of affine transformations and trained a binary classification tree to recognize each point correctly through a lot of binary comparison tests. Trees are extremely fast and the way to go, not to mention that most of the processing is done offline. This method is also more robust to out-of-plane rotations than SIFT or SURF. You will also learn about tree classification, which may help in your last processing stage.
Finally, the recognition stage is based not only on the number of matches but also on their geometric consistency. Since your signs look flat, I suggest finding either an affine or a homography transformation that has the most inliers when computed between matched points.
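A small sketch of that consistency check with a homography model (the reprojection tolerance is a placeholder; cv2.estimateAffinePartial2D could be used the same way for an affine model):

import cv2
import numpy as np

def consistent_fraction(src_pts, dst_pts, reproj_err=3.0):
    # src_pts / dst_pts: matched (x, y) coordinates from the two images, N >= 4
    src = np.float32(src_pts).reshape(-1, 1, 2)
    dst = np.float32(dst_pts).reshape(-1, 1, 2)
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, reproj_err)
    return 0.0 if mask is None else float(mask.sum()) / len(mask)

# Accept a recognition only if consistent_fraction(...) is high enough.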
Looking at your code, though, I realized that you may not be following any of these recommendations. A good starting point for you may be to read about decision trees and then play with some sample code (see mushroom.cpp in the above-mentioned link).