Any idea how I can get the smaller blobs belonging to the same vehicle count as 1 vehicle? Due to background subtraction, in the foreground mask, some of the blobs belonging to a vehicle are quite small, and hence filtering the blobs based on their size won't work.
Try filtering based on colorDistance() and comparing the mean color of the blobs in the image containing the vehicle against a control image of the background without the car in it. The SimpleCV docs have a tutorial specifically on this topic. That said, it may not always work as expected. Another possibility (it just occurred to me) is to sum the areas of the blobs of interest and check whether that sum exceeds a given threshold, rather than testing any single blob on its own.
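A minimal sketch of the "sum the areas" idea, in plain Python rather than SimpleCV. The blob areas and the threshold value are made-up numbers for illustration, and the function names are hypothetical; how you decide which blobs belong to the same region is left open.

```python
def total_blob_area(blob_areas):
    """Sum the pixel areas of all candidate blobs in one region."""
    return sum(blob_areas)


def is_vehicle(blob_areas, min_total_area):
    """Treat a group of blobs as one vehicle if their combined area is large enough."""
    return total_blob_area(blob_areas) >= min_total_area


# Hypothetical areas (in pixels) of the fragments found in one region of the mask.
fragments = [120, 85, 40, 310]
print(is_vehicle(fragments, min_total_area=500))
```

None of the fragments would pass a per-blob size filter of 500 pixels, but the group as a whole can still be tested against that threshold.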
I've seen questions about detecting blurry images, but what about faded or grainy images? I have a large dataset of scanned passport-style portrait photos, and a number of them are old and therefore look faded and grainy (i.e., it is hard to recognize the person).
Image quality metrics like BRISQUE and blur detection [link] didn't work so well and were inconsistent. The criteria for classification would be whether the photo was good enough for an average person to tell who the person was from the image.
So I tried face detection (HOG, etc.), but it detects faces even in images where it's pretty much impossible to tell who the person is.
Ideally, I'm looking for suggestions that are somewhat lightweight.
The first idea I would check is image histograms. This is especially straightforward for grayscale images. My assumption is that good-quality photos have an intensity distribution close to normal, while grainy and faded photos do not. If the histograms look similar across the images in one group (it looks like you have enough examples to check), it is easy to classify a new image based on its histogram. You can also consider computing the histogram of the image center only, i.e. just the area containing the eyes, nose and mouth. Low-quality images may lose these details.
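A rough sketch of the histogram comparison, assuming 8-bit grayscale images stored as lists of rows. The bin count, the center fraction and the use of histogram intersection as the similarity measure are my own choices for illustration, not something prescribed above.

```python
def gray_histogram(pixels, bins=16):
    """Normalized histogram of 8-bit grayscale values (0..255)."""
    hist = [0] * bins
    for row in pixels:
        for v in row:
            hist[v * bins // 256] += 1
    total = sum(hist)
    return [h / total for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means the normalized histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def center_crop(pixels, fraction=0.5):
    """Keep only the central part of the image (roughly eyes, nose and mouth)."""
    rows, cols = len(pixels), len(pixels[0])
    dr = int(rows * (1 - fraction) / 2)
    dc = int(cols * (1 - fraction) / 2)
    return [row[dc:cols - dc] for row in pixels[dr:rows - dr]]


# Toy data: a "good" image using the full intensity range and a "faded" one
# whose values are squeezed into a narrow band.
good = [[0, 16, 32, 48], [64, 80, 96, 112], [128, 144, 160, 176], [192, 208, 224, 240]]
faded = [[120, 130, 120, 130] for _ in range(4)]

reference = gray_histogram(good)
print(histogram_intersection(reference, gray_histogram(faded)))
```

In practice the reference histogram would be built from many known-good photos, and the cut-off on the similarity score would have to be tuned on labeled examples.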
Another idea is to apply a low-pass filter to the image to remove noise, then compute a metric based on an edge detector (Sobel, Laplace, Canny, etc.), or simply check whether any edges can be found other than the one around the hair.
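One possible reading of this idea, sketched in plain Python: a 3x3 mean filter stands in for the low-pass step and the mean Sobel gradient magnitude is used as the edge metric. Both choices are assumptions for illustration; a real pipeline would more likely use a library's Gaussian blur and Sobel operators.

```python
def box_blur(img):
    """3x3 mean filter (a simple low-pass filter); border pixels are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    return out


def sobel_edge_strength(img):
    """Mean Sobel gradient magnitude over the interior pixels."""
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]) \
                - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]) \
                - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1])
            total += (gx * gx + gy * gy) ** 0.5
            count += 1
    return total / count


# Toy data: a flat (featureless) image and one with a clear vertical step.
flat = [[100] * 6 for _ in range(6)]
step = [[0, 0, 0, 255, 255, 255] for _ in range(6)]

print(sobel_edge_strength(box_blur(flat)), sobel_edge_strength(box_blur(step)))
```

A photo whose facial details have faded away should score closer to the flat image than to the one with strong edges.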
Another way is to average the good images and compare this average with new ones. A larger difference means that the observed image is not a typical portrait. Or try face detection with a cascade-based detector.
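The averaging idea could look roughly like this. The sketch assumes all images are grayscale, have the same size and are already aligned (faces in about the same place); the mean absolute difference is just one possible distance.

```python
def mean_image(images):
    """Pixel-wise average of several equally sized grayscale images."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    return [[sum(img[y][x] for img in images) / n for x in range(w)]
            for y in range(h)]


def mean_abs_difference(img, reference):
    """Average absolute per-pixel difference between an image and the reference."""
    h, w = len(img), len(img[0])
    return sum(abs(img[y][x] - reference[y][x])
               for y in range(h) for x in range(w)) / (h * w)


# Toy 2x2 "portraits": two good ones and one washed-out candidate.
good_images = [[[10, 200], [200, 10]], [[30, 220], [180, 30]]]
candidate = [[120, 130], [125, 120]]

template = mean_image(good_images)
print(mean_abs_difference(candidate, template))
```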
Or maybe some combination of these ideas will give a good result on your problem.
Sure it's possible to train a NN classifier, but I think it's possible to solve that specific problem without it.
A little introduction on what I'm doing ...
For academic purposes I am creating an application in c++ using opencv for the detection of static objects in a scene.
The application is based on a combined approach of background subtraction and tracking, and the detection of events related to the abandonment of the objects works fine.
But at the moment I have a problem that I can't solve: I have to implement a finite state machine to detect the event of object removal, both before and after the entry of the object into the background.
To do this I was ordered by my superiors to use the edges of objects.
And now the problem.
After detecting a vehicle illegally parked along a road, I need to compare the edges of various images (the background captured at the time of the alarm, the current background, the current frame) to understand what the vehicle does (starts moving again, remains parked, or starts moving again after having entered the background).
I run these comparisons on the region of the scene that contains the vehicle (vehicles typically have different sizes). I extract the edges using the Canny algorithm, obtaining a binarized CV_8UC1 cv::Mat.
At this point I have to compare them.
I tried to detect the contours with findContours and compare them with matchShapes, but it does not seem to be the right way: I would have to compare each contour of the first image with every contour of the second, and in addition the two images to compare typically have a different number of contours (for example the original background and the current background, because the edges of the current background increase once the vehicle enters the background).
I also tried to create a new image in which each pixel is the absolute difference of the corresponding pixels of the other two. I then counted the white pixels of the difference image (wPx) and used this number for the comparison as follows: I set two thresholds (thr1 and thr2) and counted the pixels on the perimeter of the bounding rect of the vehicle (perim); if wPx < thr1*perim the images are considered equal, and if wPx > thr2*perim the images are different.
(I set the thresholds as percentages and multiply them by the perimeter of the bounding box to adapt the thresholds to the vehicle dimensions.)
This solution, however, does not seem very robust.
Do you have something simple to suggest?
Thank you very much in any case, more than once you StackOverflow users have helped me!
PS: THIS is an example of the images that I have to compare
The first is the background without the stationary vehicle; it contains the edges of the street;
the second is the original background, the one captured when the stationary vehicle is detected;
the third is the current background (which in this case is equal to the original, being the same frame, but will change later);
the fourth is the current frame of the video.
You may want to take a look at this paper: A Novel SIFT-Like-Based Approach for FIR-VS Images Registration. Aguilera et al. propose an Edge Oriented Histogram descriptor (EOH-SIFT).
The paper aims to register multispectral images (visible and infrared) to each other. Because of the different characteristics of the images, the authors first extract edges/contours in both images, which results in images similar to yours.
So, you can describe your image patches using this descriptor, illustrated in the following figure (taken from the above paper):
Subdivide your image patch into 4x4 zones
For each of the 16 subregions compose a histogram of contour's orientation (5 bins)
Put the histograms together into one descriptor vector of size 16x5=80 bins
Normalize the feature vector
So, every image you want to compare (in your case 4) is described by its 80-dimensional feature vector. You can compare them to each other by calculating and evaluating the Euclidean distance between them.
Note: Here a patch of size 80x80 or 100x100 (NxN) pixels is suggested. You may have to adjust the sizes to your image sizes.
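The four steps above can be sketched as follows, in plain Python for brevity even though the question uses C++. This is not the paper's implementation: how each edge pixel is assigned to one of the 5 bins is my own simplification (four line directions voted by the neighbouring edge pixels, plus one bin for "no clear orientation"), and the patch is assumed to be a square binary edge map whose side is divisible by 4.

```python
import math

GRID = 4  # the patch is split into 4x4 zones
BINS = 5  # 4 directional bins + 1 bin for "no clear orientation" (assumption)

# Offsets (dy, dx) of the two neighbours lying along each line direction.
DIRECTIONS = [
    ((0, -1), (0, 1)),    # horizontal
    ((1, -1), (-1, 1)),   # 45 degrees
    ((-1, 0), (1, 0)),    # vertical
    ((-1, -1), (1, 1)),   # 135 degrees
]


def orientation_bin(edges, y, x):
    """Pick the direction along which the edge pixel has the most edge neighbours."""
    votes = [sum(1 for dy, dx in pair if edges[y + dy][x + dx]) for pair in DIRECTIONS]
    best = max(votes)
    if best == 0 or votes.count(best) > 1:
        return BINS - 1  # isolated pixel or ambiguous direction
    return votes.index(best)


def eoh_descriptor(edges):
    """80-dimensional, L2-normalized descriptor of a square binary edge patch."""
    n = len(edges)
    cell = n // GRID
    desc = [0.0] * (GRID * GRID * BINS)
    for y in range(1, n - 1):          # border pixels are skipped for simplicity
        for x in range(1, n - 1):
            if edges[y][x]:
                zone = (y // cell) * GRID + (x // cell)
                desc[zone * BINS + orientation_bin(edges, y, x)] += 1.0
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc


def euclidean_distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Toy 8x8 edge maps: one vertical line and one horizontal line.
vertical = [[255 if x == 3 else 0 for x in range(8)] for y in range(8)]
horizontal = [[255 if y == 3 else 0 for x in range(8)] for y in range(8)]

print(euclidean_distance(eoh_descriptor(vertical), eoh_descriptor(horizontal)))
```

Since vehicle regions have different sizes, each Canny patch would first have to be resized to a common NxN size before computing the descriptor.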
I have a problem at hand in which my image is composed of strange objects which do not necessarily have closed contours (more like rivers and channels on a plain background).
I am also provided with a set of prior images of the same size from different rivers, whose general orientation and structure match the river under study, although their position in the image might deviate.
I am looking for an image segmentation method (theory or practice; I am really looking for clues to start with) which can actually use my set of prior examples in segmenting my river. In my case there could be multiple rivers of the same general orientation present in the image.
I am also very interested in ways of statistically representing these complex structures. For example, if it were not a river image (binary image), and I knew it had a Gaussian structure, then I could use the information in the covariance estimated from the examples. But in binary or ternary images, I cannot.
Here is an outline for image segmentation
Sample a small region (possibly a rectangle) inside the river. The assumption is that its pixels belong to the foreground and provide a good estimate of its color distribution. You should have an algorithm which can find a small region inside the river with high confidence; this algorithm can probably be trained on the data you have.
Since you know little about the background, it would be ideal to choose pixels lying on the image frame as background pixels.
The idea is to use these pre-selected foreground and background pixels as seeds in a graph cut algorithm for segmentation. Selecting seeds is the most important part of a graph cut algorithm for segmentation, once you have good seeds, the segmentation would be more or less correct. There is plenty of literature/code available online on how to do segmentation using graph cuts.
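As an illustration of the seeding step only (not the graph cut itself), here is a small sketch that builds a seed mask: the image frame is marked as background, a rectangle inside the river as foreground, and everything else is left undecided. The label values follow the mask convention used by OpenCV's grabCut (0 = background, 1 = foreground, 2 = probable background), so a mask like this could in principle be handed to a mask-initialized graph cut; the rectangle coordinates and the function name are hypothetical.

```python
GC_BGD, GC_FGD, GC_PR_BGD = 0, 1, 2  # sure background, sure foreground, undecided


def build_seed_mask(height, width, fg_rect, border=1):
    """Return a label mask with background seeds on the frame and
    foreground seeds inside fg_rect = (top, left, rect_height, rect_width)."""
    mask = [[GC_PR_BGD] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            on_frame = (y < border or y >= height - border or
                        x < border or x >= width - border)
            if on_frame:
                mask[y][x] = GC_BGD
    top, left, rect_h, rect_w = fg_rect
    for y in range(top, top + rect_h):
        for x in range(left, left + rect_w):
            mask[y][x] = GC_FGD
    return mask


# Hypothetical 6x8 image with a 2x2 region known to lie inside the river.
for row in build_seed_mask(6, 8, fg_rect=(2, 3, 2, 2)):
    print(row)
```

If a river can touch the image frame, the frame-as-background assumption breaks down and the border seeds would need to be chosen more carefully.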
I am trying to do image detection in C++. I have two images:
Image Scene: 1024x786
Person: 36x49
And I need to identify this particular person from the scene. I've tried to use Correlation but the image is too noisy and therefore doesn't give correct/accurate results.
I've been thinking/researching methods that would best solve this task and these seem the most logical:
Gaussian filters
Convolution
FFT
Basically, I would like to remove the noise from the images, so that I can then use Correlation to find the person more effectively.
I understand that an FFT will be hard to implement and/or may be slow especially with the size of the image I'm using.
Could anyone offer any pointers to solving this? What would the best technique/algorithm be?
In Andrew Ng's Machine Learning class we did this exact problem using neural networks and a sliding window:
train a neural network to recognize the particular feature you're looking for using data with tags for what the images are, using a 36x49 window (or whatever other size you want).
for recognizing a new image, take the 36x49 rectangle and slide it across the image, testing at each location. When you move to a new location, move the window right by a certain number of pixels, call it the jump_size (say 5 pixels). When you reach the right-hand side of the image, go back to 0 and increment the y of your window by jump_size.
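The sliding-window scan described above could be sketched like this, in Python for brevity even though the question is about C++. The score_window argument is a placeholder for the trained network (any callable that maps a window to a score); the toy scoring function at the bottom just sums pixel values.

```python
def sliding_windows(image_w, image_h, win_w, win_h, jump_size):
    """Yield the (x, y) top-left corner of every window position, row by row."""
    for y in range(0, image_h - win_h + 1, jump_size):
        for x in range(0, image_w - win_w + 1, jump_size):
            yield x, y


def best_match(image, win_w, win_h, jump_size, score_window):
    """Score every window and return (score, x, y) of the best one."""
    image_h, image_w = len(image), len(image[0])
    best = None
    for x, y in sliding_windows(image_w, image_h, win_w, win_h, jump_size):
        window = [row[x:x + win_w] for row in image[y:y + win_h]]
        score = score_window(window)
        if best is None or score > best[0]:
            best = (score, x, y)
    return best


# Toy image with a bright 2x2 patch; the "classifier" is just the pixel sum.
toy = [[0, 0, 0, 0],
       [0, 0, 9, 9],
       [0, 0, 9, 9],
       [0, 0, 0, 0]]
print(best_match(toy, win_w=2, win_h=2, jump_size=1,
                 score_window=lambda w: sum(map(sum, w))))
```

The jump_size trades speed for localization accuracy: a larger step means fewer windows to classify but a coarser estimate of where the person is.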
Neural networks are good for this because the noise isn't a huge issue: you don't need to remove it. It's also good because it can recognize images similar to ones it has seen before, but are slightly different (the face is at a different angle, the lighting is slightly different, etc.).
Of course, the downside is that you need the training data to do it. If you don't have a set of pre-tagged images then you might be out of luck - although if you have a Facebook account you can probably write a script to pull all of yours and your friends' tagged photos and use that.
An FFT only makes sense once you have already sorted the image with a k-d tree or a hierarchical tree. I would suggest mapping the image's 2D RGB values to a 1D curve to reduce some complexity before a frequency analysis.
I do not have an exact algorithm to propose because I have found that target detection methods depend greatly on the specific situation. Instead, I have some tips and advice. Here is what I would suggest: find a specific characteristic of your target and design your code around it.
For example, if you have access to the color image, use the fact that Wally doesn't have much green and blue color. Subtract the average of blue and green from the red image, and you'll have a much better starting point. (Apply the same operation on both the image and the target.) This will not work, though, if the noise is color-dependent (i.e., different on each color channel).
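A small sketch of that channel transform, assuming the image is available as rows of (r, g, b) tuples with 8-bit values. Clipping negative results to 0 is my own choice; keeping signed values would work just as well for correlation.

```python
def emphasize_red(rgb_image):
    """Return a single-channel image: red minus the average of green and blue,
    clipped to the 0..255 range."""
    return [[max(0, min(255, r - (g + b) // 2)) for (r, g, b) in row]
            for row in rgb_image]


# Toy pixels: strongly red, greenish-blue, and pure red.
toy = [[(200, 20, 40), (50, 100, 100), (255, 0, 0)]]
print(emphasize_red(toy))
```

The same function would be applied to both the scene and the target before correlating them.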
You could then use correlation on the transformed images with better results. The downside of correlation is that it will work only with an exact cut-out of the first image... Not very useful if you need to find the target to help you find the target! Instead, I suppose that an averaged version of your target (a combination of many Wally pictures) would work up to a point.
My final advice: in my personal experience of working with noisy images, spectral analysis is usually a good thing because the noise tends to contaminate only one particular scale (which would hopefully be a different scale than Wally's!). In addition, correlation is mathematically equivalent to comparing the spectral characteristics of your image and the target.
I am trying to write software for document management. First I input the blank invoice, then I feed in the other invoices filled with data. Using SIFT detectors I determine what type of invoice it is.
Then I want to remove the intersection of the two images. Basically this will keep only the filled-in information and remove the data common to both invoices. Is there a proper way to remove areas from the image?
There is a concept in image processing called the region of interest (ROI). It creates a pointer to a sub-region of the original image, which could help you read directly at x,y coordinates in the image.
Another possibility would be to subtract the original image. But depending on the quality of the filled-form picture, this might lead to other problems.
I was suggesting the ROI in the sense that you could create an ROI for every place where the form has input data and process only those specific regions.
I found a function that might help you, cvAbsDiff, which computes the per-pixel absolute difference between two images.
Here is a link that might help you understand how to use it
http://blog.damiles.com/?p=67
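To show what an absolute-difference step does, here is a plain-Python sketch on tiny grayscale images (in OpenCV itself the corresponding call would be cvAbsDiff in the C API). It assumes the filled invoice has already been aligned with the blank one, and the threshold of 30 is an arbitrary value for illustration.

```python
def abs_diff(img_a, img_b):
    """Per-pixel absolute difference of two equally sized grayscale images."""
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]


def changed_mask(img_a, img_b, threshold):
    """1 where the images differ by more than the threshold, 0 elsewhere."""
    return [[1 if d > threshold else 0 for d in row]
            for row in abs_diff(img_a, img_b)]


# Toy data: white paper (255) with one printed pixel (0) on the blank form;
# the filled form adds one dark pixel and a little scanner noise.
blank = [[255, 255, 255], [255, 0, 255]]
filled = [[255, 10, 255], [255, 0, 250]]

print(changed_mask(blank, filled, threshold=30))
```

The threshold is what absorbs small scanning differences; with poor-quality scans or imperfect alignment, the common parts of the form will leak into the mask.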