Target detection - algorithm suggestions - C++

I am trying to do image detection in C++. I have two images:
Image Scene: 1024x786
Person: 36x49
And I need to identify this particular person in the scene. I've tried using correlation, but the image is too noisy, so it doesn't give accurate results.
I've been thinking/researching methods that would best solve this task and these seem the most logical:
Gaussian filters
Convolution
FFT
Basically, I would like to reduce the noise in the images so that I can then use correlation to find the person more effectively.
I understand that an FFT will be hard to implement and/or may be slow, especially at the image size I'm using.
Could anyone offer any pointers to solving this? What would the best technique/algorithm be?

In Andrew Ng's Machine Learning class we did this exact problem using neural networks and a sliding window:
train a neural network to recognize the particular feature you're looking for, using data tagged with what each image contains and a 36x49 window (or whatever other size you want).
for recognizing a new image, take the 36x49 rectangle and slide it across the image, testing at each location. When you move to a new location, move the window right by a certain number of pixels, call it jump_size (say 5 pixels). When you reach the right-hand side of the image, go back to x = 0 and increment the window's y by jump_size. (A sketch of this scan follows the list.)
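A minimal sketch of that scan in C++ with OpenCV; classifyWindow is a hypothetical stand-in for your trained network's forward pass, and the confidence threshold is an assumption:

#include <opencv2/opencv.hpp>
#include <vector>

// Hypothetical classifier: returns the confidence that a window
// contains the target (e.g. your trained network's forward pass).
double classifyWindow(const cv::Mat& window);

// Slide a winW x winH window across the scene in steps of jump_size
// pixels and collect the locations where the classifier fires.
std::vector<cv::Rect> slidingWindowDetect(const cv::Mat& scene,
                                          int winW = 36, int winH = 49,
                                          int jump_size = 5,
                                          double threshold = 0.9)
{
    std::vector<cv::Rect> hits;
    for (int y = 0; y + winH <= scene.rows; y += jump_size)
        for (int x = 0; x + winW <= scene.cols; x += jump_size) {
            cv::Rect roi(x, y, winW, winH);
            if (classifyWindow(scene(roi)) > threshold)
                hits.push_back(roi);
        }
    return hits;
}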
Neural networks are good for this because the noise isn't a huge issue: you don't need to remove it. They're also good because they can recognize images similar to ones they have seen before but slightly different (the face is at a different angle, the lighting is slightly different, etc.).
Of course, the downside is that you need training data to do this. If you don't have a set of pre-tagged images, you might be out of luck - although if you have a Facebook account you could probably write a script to pull all of your and your friends' tagged photos and use those.

An FFT only makes sense once you have already sorted the image with a kd-tree or a hierarchical tree. I would suggest mapping the image's 2D RGB values to a 1D curve and reducing some complexity before doing a frequency analysis.

I do not have an exact algorithm to propose, because I have found that target detection methods depend greatly on the specific situation. Instead, I have some tips and advice. Here is what I would suggest: find a specific characteristic of your target and design your code around it.
For example, if you have access to the color image, use the fact that Wally doesn't have much green or blue in him. Subtract the average of the blue and green channels from the red channel and you'll have a much better starting point. (Apply the same operation to both the image and the target.) This will not work, though, if the noise is color-dependent (i.e. different in each channel).
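A minimal sketch of that channel trick with OpenCV (the equal weighting and the saturating 8-bit subtraction are assumptions):

#include <opencv2/opencv.hpp>

// Emphasize red regions by subtracting the mean of the blue and green
// channels from the red channel. Apply it to both scene and target.
cv::Mat redEmphasis(const cv::Mat& bgr)
{
    std::vector<cv::Mat> ch;
    cv::split(bgr, ch);                                  // ch[0]=B, ch[1]=G, ch[2]=R
    cv::Mat avgBG, result;
    cv::addWeighted(ch[0], 0.5, ch[1], 0.5, 0.0, avgBG); // (B + G) / 2
    cv::subtract(ch[2], avgBG, result);                  // R - (B+G)/2, clamped at 0
    return result;
}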
You could then use correlation on the transformed images with better results. The downside of correlation is that it only works with an exact cut-out of the first image... not very useful if you need to find the target in order to find the target! Instead, I suspect that an averaged version of your target (a combination of many Wally pictures) would work up to a point.
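For the correlation step itself, OpenCV's matchTemplate performs a normalized cross-correlation search; a minimal sketch, assuming scene and templ were preprocessed identically:

cv::Mat response;
cv::matchTemplate(scene, templ, response, cv::TM_CCOEFF_NORMED);
double maxVal;
cv::Point maxLoc;
cv::minMaxLoc(response, nullptr, &maxVal, nullptr, &maxLoc);
// maxLoc is the top-left corner of the best match; maxVal is its score.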
My final advice: in my personal experience of working with noisy images, spectral analysis is usually a good idea, because noise tends to contaminate one particular scale (which would hopefully be a different scale than Wally's!). In addition, correlation is mathematically equivalent to comparing the spectral characteristics of the image and the target.

Image segmentation: identify spots and scratches with irregular borders

A quick introduction: I do physics research, which includes experimental measurements and numerical simulations.
Below is an image produced by our theoretical model.
Without going into details, I will just say that the intensity and color here represent a simulated physical quantity.
The experimental results are below.
The measurement has more features and details, but it also has a lot of "invalid" data, represented by darker spots, scratches, and marks that have irregular borders and vary in size and shape. Nonetheless, by comparing these two pictures we can visually identify the "invalid" pixels in the second figure, and this is the problem I am trying to solve with a computer.
Simple thresholding by intensity won't work because the valid data can also vary in intensity. I was thinking about using a CNN, but then I realized that preparing a training dataset would be very tedious: there are a lot of small marks/spots that need to be labeled, and labeling them manually would take a lot of time.
Is there any other solution to this problem? Or maybe there is a pretrained neural network (or maybe an SVM?) that handles a similar problem?
Let's check all the options one by one, taking into account the following:
you have a very specific physical process
you need accurate results (both process-wise and geometry-wise)
CNNs
It will be hard to find a ready-to-use model for your specific process. Moreover, extra work will be needed to get accurate geometry out of it:
https://ai.facebook.com/blog/using-a-classical-rendering-technique-to-push-state-of-the-art-for-image-segmentation/
Background subtraction
Background subtraction requires a threshold, so for your examples and conditions it makes no sense. I produced two masks based on the subtracted background; see if you can spot the difference:
Color-based segmentation
With a properly defined threshold (let's assume we use delta_E) you can segment several areas of interest. For example, let's define three (a sketch follows the list):
bright red
red
black/dark red
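A rough sketch of delta_E-based segmentation with OpenCV: convert to CIE Lab and keep pixels within a given distance of a reference color. The reference color, the threshold, and the use of the simple CIE76 (Euclidean) distance are assumptions:

#include <opencv2/opencv.hpp>

// bgr: 8-bit color image; refLab: reference color in OpenCV's 8-bit Lab
// scaling; maxDeltaE: distance threshold (CIE76, Euclidean in Lab space).
cv::Mat segmentByColor(const cv::Mat& bgr, cv::Vec3f refLab, float maxDeltaE)
{
    cv::Mat lab;
    cv::cvtColor(bgr, lab, cv::COLOR_BGR2Lab);
    lab.convertTo(lab, CV_32FC3);

    cv::Mat mask(lab.size(), CV_8U);
    for (int y = 0; y < lab.rows; ++y)
        for (int x = 0; x < lab.cols; ++x) {
            cv::Vec3f p = lab.at<cv::Vec3f>(y, x);
            mask.at<uchar>(y, x) = cv::norm(p - refLab) < maxDeltaE ? 255 : 0;
        }
    return mask;
}

Pick a refLab value for each of the three areas (bright red, red, black/dark red) and combine the resulting masks.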
Let's compare before/after masks for these areas, plus an additional area of interest (before/after images omitted).
So color-based segmentation seems to be an option, but it is better to improve the input if possible. I hope this makes sense.

How to detect anomalies in OpenCV (C++) if thresholding is not good enough?

I have grayscale images like this:
I want to detect anomalies in this kind of image. In the first image (upper-left) I want to detect three dots; in the second (upper-right) there is a small dot and a "foggy area" (bottom-right); and in the last one there is an even smaller dot somewhere near the middle of the image.
Normal static thresholding doesn't work well for me, and Otsu's method isn't always the best choice either. Is there a better, more robust, or smarter way to detect anomalies like this? In MATLAB I was using something like Frangi filtering (eigenvalue filtering). Can anybody suggest a good processing algorithm for anomaly detection on surfaces like this?
EDIT: Added another image with marked anomalies:
Using @Tapio's top-hat filtering and contrast adjustment.
Since @Tapio gave us a great idea for increasing the contrast of anomalies on surfaces like the ones I asked about at the beginning, I'm sharing some of my results. I have an image like this:
Here is how I use top-hat filtering and contrast adjustment:
// Top-hat filtering: the difference between the image and its
// morphological opening; it makes small bright details stand out.
cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(3, 3), cv::Point(0, 0));
cv::Mat imgFiltered, imgAdjusted;
cv::morphologyEx(inputImage, imgFiltered, cv::MORPH_TOPHAT, kernel, cv::Point(0, 0), 3);
// Simple linear contrast stretch; the gain was tuned empirically.
imgAdjusted = imgFiltered * 7.2;
The result is here:
The question remains how to segment the anomalies from that last image. If anybody has an idea how to solve it, go for it! :)
You should take a look at bottom-hat filtering. It's defined as the difference between the morphological closing of the image and the image itself, and it makes small details such as the ones you are looking for flare out.
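A minimal sketch with OpenCV (the kernel shape and size are assumptions; the structuring element should be larger than the details you want to pop out):

// Bottom-hat (black-hat): morphological closing minus the original image.
// Small dark spots become bright peaks in the result.
cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(15, 15));
cv::Mat blackhat;
cv::morphologyEx(gray, blackhat, cv::MORPH_BLACKHAT, kernel);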
I adjusted the contrast to make both images visible. The anomalies are much more pronounced when looking at the intensities and are much easier to segment out.
Let's take a look at the first image:
The histogram values don't represent reality, due to scaling caused by the visualization tools I'm using; the relative distances, however, do. So now the thresholding range is much larger: the target has changed from a window to a barn door.
Global thresholding (intensity > 15):
Otsu's method worked poorly here; it segmented all the small details into the foreground.
After removing noise by morphological opening:
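A minimal sketch of that threshold-plus-opening step (the threshold follows the text above; the kernel size is an assumption, and blackhat is the filtered image from the previous sketch):

cv::Mat mask, cleaned;
cv::threshold(blackhat, mask, 15, 255, cv::THRESH_BINARY);  // intensity > 15
cv::Mat k = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(3, 3));
cv::morphologyEx(mask, cleaned, cv::MORPH_OPEN, k);         // drop small specks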
I also assumed that the black spots are the anomalies you are interested in. By setting the threshold lower you include more of the surface details; for example, the third image does not have any particularly interesting features to my eye, but that's for you to judge. As m3h0w said, it's a good heuristic to know that if something is hard for your eye to judge, it's probably impossible for the computer.
@skoda23, I would try unsharp masking with finely tuned parameters for the blurring part, so that the high frequencies get emphasized, and test it thoroughly so that no important information is lost in the process. Remember that it is usually not a good idea to expect the computer to do superhuman work: if a human has doubts about where the anomalies are, the computer will too. Thus it is important to preprocess the image first, so that the anomalies are obvious to the human eye. An alternative to unsharp masking (or an addition to it) might be CLAHE. But again: remember to fine-tune it very carefully - it might bring out the texture of the board too much and interfere with your task.
An alternative to basic thresholding or Otsu's would be adaptiveThreshold(), which might be a good idea since there is a difference in intensity values between the different regions you want to find.
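A minimal sketch (the block size and constant C are assumptions to tune):

cv::Mat binary;
cv::adaptiveThreshold(gray, binary, 255, cv::ADAPTIVE_THRESH_MEAN_C,
                      cv::THRESH_BINARY_INV, 31, 5);  // blockSize = 31, C = 5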
My second guess would be to first use fixed-value thresholding for the darkest dots and then try Sobel or Canny. There should be an optimal neighborhood size at which the texture of the board won't shine through as much as the anomalies do. You can also try blurring before edge detection (once you've detected the small defects with the thresholding).
Again: it is vital to experiment a lot at every step of this approach, because fine-tuning the parameters will be crucial to eventual success. I'd recommend making friends with the trackbar to speed up the process. Good luck!
You're basically dealing with the unfortunate fact that reality is analog. A threshold is a method to turn an analog range into a discrete (binary) one; any threshold will do that. So what exactly do you mean by a "good enough" threshold?
Let's park that thought for a second. I see lots of anomalies - sort of thin grey worms - that you apparently ignore. I'm applying a different threshold than you are. That may be reasonable, but you're applying domain knowledge that I don't have.
I suspect these grey worms will be throwing off your fixed-value thresholding. That's not to say the idea of a fixed threshold is bad: you can use it to find some artifacts and exclude those. Somewhat darkish patches will be missed, but they can be brought out by replacing each pixel with the median value of its neighborhood, using a neighborhood that's bigger than the width of those worms. In a dark patch this does little, but it wipes out small local variations.
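A minimal sketch of that median step (the kernel size is an assumption; choose it wider than the worms):

cv::Mat medianized;
cv::medianBlur(gray, medianized, 9);  // 9x9 neighborhood, edge-preserving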
I don't pretend these two types of abnormality are the only ones, but that is really an application-domain question, not a question of technique. E.g. you don't appear to have lighting artifacts (reflections), at least not in these three samples.

Prior-based image segmentation

I have a problem at hand in which my image is composed of strange objects that do not necessarily have closed contours (more like rivers and channels on a plain background).
I am also provided with a set of prior images of the same size, from different rivers, whose general orientation and structure match the river under study, though their position in the image might deviate.
I am looking for an image segmentation method (theory or practice; I am really looking for clues to start with) that can actually use my set of prior examples in segmenting my river. In my case there could be multiple rivers of the same general orientation present in the image.
I am also very interested in ways of statistically representing these complex structures. For example, if it were not a river image (a binary image) and I knew it had a Gaussian structure, then I could use the information in the covariance estimated from the examples. But in binary or ternary images I cannot.
Here is an outline for the image segmentation:
Sample a small region (possibly a rectangle) inside the river; the assumption is that it will belong to the foreground and provide a good estimate of its color distribution. You should have an algorithm that can find a small region inside the river with high confidence; this algorithm can probably be trained on the data you have.
Since you know little about the background, it would be ideal to choose pixels lying on the image frame as background pixels.
The idea is to use these pre-selected foreground and background pixels as seeds in a graph-cut segmentation algorithm. Selecting seeds is the most important part of graph-cut segmentation; once you have good seeds, the segmentation will be more or less correct. There is plenty of literature/code available online on how to do segmentation using graph cuts.
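A minimal sketch of this seeding strategy using OpenCV's GrabCut (a graph-cut based segmenter); seedRect, the border strip width, and the iteration count are assumptions:

#include <opencv2/opencv.hpp>

// img: 8-bit BGR image; seedRect: a rectangle known to lie inside the river.
cv::Mat segmentRiver(const cv::Mat& img, cv::Rect seedRect)
{
    // Start with everything marked "probably background".
    cv::Mat mask(img.size(), CV_8U, cv::Scalar(cv::GC_PR_BGD));
    // Pixels on the image frame are hard background seeds.
    mask(cv::Rect(0, 0, img.cols, 5)).setTo(cv::GC_BGD);
    mask(cv::Rect(0, img.rows - 5, img.cols, 5)).setTo(cv::GC_BGD);
    mask(cv::Rect(0, 0, 5, img.rows)).setTo(cv::GC_BGD);
    mask(cv::Rect(img.cols - 5, 0, 5, img.rows)).setTo(cv::GC_BGD);
    // The sampled region inside the river is a hard foreground seed.
    mask(seedRect).setTo(cv::GC_FGD);

    cv::Mat bgModel, fgModel;
    cv::grabCut(img, mask, cv::Rect(), bgModel, fgModel, 5, cv::GC_INIT_WITH_MASK);
    return (mask == cv::GC_FGD) | (mask == cv::GC_PR_FGD);  // river mask
}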

OpenCV detect image against a image set

I would like to know how I can use OpenCV to detect an image in my video camera feed. The image can be one of 500 images.
What I'm doing at the moment:
- (void)viewDidLoad
{
    [super viewDidLoad];
    // Do any additional setup after loading the view.
    self.videoCamera = [[CvVideoCamera alloc] initWithParentView:imageView];
    self.videoCamera.delegate = self;
    self.videoCamera.defaultAVCaptureDevicePosition = AVCaptureDevicePositionBack;
    self.videoCamera.defaultAVCaptureSessionPreset = AVCaptureSessionPresetHigh;
    self.videoCamera.defaultAVCaptureVideoOrientation = AVCaptureVideoOrientationPortrait;
    self.videoCamera.defaultFPS = 30;
    self.videoCamera.grayscaleMode = NO;
}

- (void)viewDidAppear:(BOOL)animated
{
    [super viewDidAppear:animated];
    [self.videoCamera start];
}

#pragma mark - Protocol CvVideoCameraDelegate

#ifdef __cplusplus
- (void)processImage:(cv::Mat&)image
{
    // Do some OpenCV stuff with the image
    cv::Mat image_copy;
    cvtColor(image, image_copy, CV_BGRA2BGR);
    // invert image
    //bitwise_not(image_copy, image_copy);
    //cvtColor(image_copy, image, CV_BGR2BGRA);
}
#endif
The images I would like to detect are small (2-5 KB). A few have text on them, but others are just signs. Here is an example:
Do you guys know how I can do that?
There are several things in here; I will break down your problem and point you towards some possible solutions.
Classification: Your main task consists of determining whether a certain image belongs to a class. This problem can itself be decomposed into several subproblems:
Feature representation: You need to decide how you are going to model your features, i.e. how you are going to represent each image in a feature space so you can train a classifier to separate the classes. The feature representation is itself a big design decision. You could, for instance, (i) compute a histogram of the image using n bins and train a classifier on it, or (ii) use a sequence of random patch comparisons, as in a random forest. Whatever you choose, after training you need to evaluate the performance of your algorithm to see how good your decision was.
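A minimal sketch of option (i), a normalized grayscale histogram feature with OpenCV (the bin count is an assumption):

#include <opencv2/opencv.hpp>

// gray: 8-bit grayscale image. Returns a 1 x nBins row vector that can
// be fed to a classifier.
cv::Mat histogramFeature(const cv::Mat& gray, int nBins = 32)
{
    cv::Mat hist;
    float range[] = {0, 256};
    const float* ranges[] = {range};
    int channels[] = {0};
    cv::calcHist(&gray, 1, channels, cv::Mat(), hist, 1, &nBins, ranges);
    cv::normalize(hist, hist, 1.0, 0.0, cv::NORM_L1);  // bins sum to 1
    return hist.reshape(1, 1);
}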
Beware the well-known problem of overfitting, which is when the model fits the training data so well that it cannot generalize. This can usually be mitigated with cross-validation. If you are not familiar with the concepts of false positives and false negatives, take a look at this article.
Once you define your feature space, you need to choose an algorithm to train on that data, and this might be your biggest decision. There are several algorithms coming out every day. To name a few classical ones: Naive Bayes, SVMs, Random Forests, and more recently the community has obtained great results using deep learning. Each of these has its own specific usage (e.g. SVMs are great for binary classification), and you need to be familiar with the problem. You could start with simple assumptions, such as independence between random variables, and train a Naive Bayes classifier to try to separate your images.
Patches: Now, you mentioned that you would like to recognize the images on your webcam. If you are going to print the images and show them in the video, you need to handle several things. You must define patches on the big image (the webcam input), build a feature representation for each patch, and classify each one the same way you did in the previous step. To do that, you could slide a window over the image and classify every patch to see whether it belongs to the negative class or to one of the positive ones. There are other alternatives.
Scale: Assuming you are able to detect the location of images in the big image and classify them, the next step is to relax the toy assumption of fixed scale. To handle multiple scales, you could use an image pyramid, which pretty much allows you to perform the detection in multiresolution. Alternative approaches could use keypoint detectors such as SIFT and SURF; inside SIFT there is an image pyramid which provides the scale invariance.
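A minimal sketch of pyramid-based multiscale detection; detectSingleScale is a hypothetical stand-in for the single-scale detector built in the previous step, and the minimum image size is an assumption:

#include <opencv2/opencv.hpp>
#include <vector>

// Hypothetical single-scale detector (e.g. the sliding-window classifier).
std::vector<cv::Rect> detectSingleScale(const cv::Mat& img);

// Run the detector at each pyramid level and map hits back to the
// original resolution.
std::vector<cv::Rect> detectMultiScale(cv::Mat img)
{
    std::vector<cv::Rect> hits;
    double scale = 1.0;
    while (img.cols >= 64 && img.rows >= 64) {
        for (const cv::Rect& r : detectSingleScale(img))
            hits.push_back(cv::Rect(int(r.x * scale), int(r.y * scale),
                                    int(r.width * scale), int(r.height * scale)));
        cv::Mat smaller;
        cv::pyrDown(img, smaller);  // halve the resolution
        img = smaller;
        scale *= 2.0;
    }
    return hits;
}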
Projection: So far we have assumed that your images are seen under orthographic projection, but most likely you will have slight perspective distortion, which breaks the previous assumptions. One naive solution would be, for instance, to detect the corners of the white background of your image and rectify the image before building the feature vector for classification. If you use SIFT or SURF, you could design a way of avoiding having to handle this explicitly. Nevertheless, if your input is going to be just square patches, as in ARToolKit, I would go for manual rectification.
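A minimal sketch of that rectification, assuming you can detect the four corners of the sign's white background (the output size and corner ordering are assumptions):

#include <opencv2/opencv.hpp>
#include <vector>

// corners: detected corners in the order TL, TR, BR, BL. Warps the sign
// to a fronto-parallel side x side square before feature extraction.
cv::Mat rectify(const cv::Mat& frame, const std::vector<cv::Point2f>& corners,
                int side = 128)
{
    std::vector<cv::Point2f> dst = {
        {0.0f, 0.0f}, {side - 1.0f, 0.0f},
        {side - 1.0f, side - 1.0f}, {0.0f, side - 1.0f}};
    cv::Mat H = cv::getPerspectiveTransform(corners, dst);
    cv::Mat out;
    cv::warpPerspective(frame, out, H, cv::Size(side, side));
    return out;
}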
I hope this gives you a better picture of your problem.
I would recommend using SURF for this, because the pictures can be at different distances from your camera, i.e. at changing scale. I ran a similar experiment and SURF worked just as expected. But SURF is difficult to tune (and its operations are expensive); you should try different setups before you get the needed results.
Here is a link: http://docs.opencv.org/modules/nonfree/doc/feature_detection.html
A YouTube video (in C#, but it can give you an idea): http://www.youtube.com/watch?v=zjxWpKCQqJc
I might not be qualified enough to answer this problem - the last time I seriously used OpenCV it was still 1.1 - but here are some thoughts that I hope will help (currently I am interested in DIP and ML).
I think it would probably be an easier task if you only needed to classify an image, i.e. if the image is just one of (or very similar to) your 500 images. For this you could use an SVM or some neural network (Felix already gave an excellent enumeration of those).
However, your problem seems to be that you first need to find this candidate image in your webcam view, whose location you know little about beforehand. (Let us know whether this is so; I think it is important.)
If so, the harder problem is the detection/localization of your candidate image.
I don't have a general solution for that. The first thing I would do is to see whether there is some common feature across your 500 images (e.g. whether all of them are enclosed by a red circle, or half of them contain a circle and half a rectangle). If so, the problem becomes simpler (it would be similar to the face detection problem, which has good solutions).
In other words, you would first classify the 500 images into a few groups with common features (by hand), detect the group first, then rescale and use the above-mentioned techniques to classify them into a fine-grained result. This would be far more computationally tractable than trying to detect 500 images one by one.
BTW, this ppt may help give a visual clue of what is going on in feature extraction and image matching: http://courses.cs.washington.edu/courses/cse455/09wi/Lects/lect6.pdf
Detect vs. recognize: detecting the image is just finding it against the background, and from your comments I realized your signs may be surrounded by background. It might help your algorithm if you can somehow crop your signs from the background (detect) before trying to recognize them. Recognition is the next stage, and it presumes you can classify the cropped image correctly as one seen before.
If you need real-time speed and scale/rotation invariance, neither SIFT nor SURF will do this fast. Nowadays you can do much better if you shift the burden of image processing to a learning stage, as was done by Lepetit. In short, he subjected each pattern to a bunch of affine transformations and trained a binary classification tree to recognize each point correctly by doing many binary comparison tests. Trees are extremely fast and the way to go, not to mention that most of the processing is done offline. This method is also more robust to off-plane rotations than SIFT or SURF. You will also learn about tree classification, which may help with your last processing stage.
Finally, the recognition stage is based not only on the number of matches but also on their geometric consistency. Since your signs look flat, I suggest finding either an affine or a homography transformation that has the most inliers when calculated between matched points.
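A minimal sketch of that consistency check with OpenCV (the RANSAC reprojection threshold and the minimum inlier count are assumptions):

// patternPts/framePts: matched keypoint locations in the template and frame.
std::vector<cv::Point2f> patternPts, framePts;
cv::Mat inlierMask;
cv::Mat H = cv::findHomography(patternPts, framePts, cv::RANSAC, 3.0, inlierMask);
int inliers = cv::countNonZero(inlierMask);
bool recognized = !H.empty() && inliers >= 10;  // accept only consistent matches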
Looking at your code, though, I realized that you may not be following any of these recommendations yet. A good starting point might be to read about decision trees and then play with some sample code (see mushroom.cpp in the above-mentioned link).

Tips for background subtraction in the face of noise

Background subtraction is an important primitive in computer vision. I'm looking at the different methods that have been developed, and I've begun thinking about how to perform background subtraction in the face of random salt-and-pepper noise.
In a system such as the Microsoft Kinect, the infrared camera gives off random noise pretty consistently. If you are trying to subtract the background from the depth view, how can you avoid problems with this random noise while still reliably subtracting the background?
As you already said, noise and other unsteady parts of your background can cause problems in segmentation - I mean lighting changes or other things moving in the background.
But if you're working on an indoor project this shouldn't be too big of an issue, except of course for the noise.
Besides subtracting the background from an image to segment the objects in it, you could also subtract two (or in some methods even three) consecutive frames from each other. If the camera is static, this leaves the parts that have changed, i.e. basically the objects that have moved, so it is an easy method for detecting moving objects.
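A minimal sketch of that frame differencing (the threshold value is an assumption):

// prevFrame, currFrame: consecutive grayscale frames from a static camera.
cv::Mat diff, motionMask;
cv::absdiff(prevFrame, currFrame, diff);
cv::threshold(diff, motionMask, 25, 255, cv::THRESH_BINARY);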
But with most methods you will probably see the noise you described. The easiest way to get rid of it is to use a median filter or morphological operators (opening) on the segmented binary image. This should effectively remove small parts and leave the nice big blobs of the objects.
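A minimal sketch of that cleanup on the binary mask (kernel sizes are assumptions):

cv::medianBlur(motionMask, motionMask, 5);  // kill salt-and-pepper speckles
cv::Mat k = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(5, 5));
cv::morphologyEx(motionMask, motionMask, cv::MORPH_OPEN, k);  // drop small parts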
Hope that helps...
Typically you do connected components (CC) in disparity space and then discard any CC below a certain size. The size threshold and the connectedness threshold (e.g. the maximum disparity difference between two adjacent pixels at which you still consider them connected) are your two parameters to play with.
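A minimal sketch of the size filtering with OpenCV's connected components (the minimum area is an assumption; disparity-aware connectedness would need a custom labeling pass):

cv::Mat labels, stats, centroids;
int n = cv::connectedComponentsWithStats(mask, labels, stats, centroids, 8);
cv::Mat filtered = cv::Mat::zeros(mask.size(), CV_8U);
for (int i = 1; i < n; ++i)                        // label 0 is the background
    if (stats.at<int>(i, cv::CC_STAT_AREA) >= 50)  // keep big-enough components
        filtered.setTo(255, labels == i);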
As @evident mentioned, a median filter is your ticket. That's the standard operator for getting rid of salt-and-pepper noise while preserving edges.
That said, I disagree with the suggestion that this be done on the segmented binary image. Median filtering is very low-level and should be applied to the raw data before any subsequent processing.