I have a question about preparing the dataset of positive samples for a cascaded classifier that will be used for object detection.
As positive samples, I have been given 3 sets of images:
1) a set of colored images in full size (about 1200x600) with a white background, with the object shown at a different angle in each image
2) the same images in grayscale with a white background, scaled down to the detection window size (60x60)
3) the same images in grayscale with a black background, scaled down to the detection window size (60x60)
My question is about set 1: should the background really be white? Should it not instead be an environment in which the object is likely to appear in the test dataset? Or should I have a fourth set where the images show the object in its natural environment? How does the environment figure into the training samples?
The background should be a typical environment of the object, because when you actually try to detect the objects, the search window will always include some of the background. The best thing is to crop the objects from natural images.
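As a rough illustration of cropping positives from natural images, here is a plain-Python sketch that crops a labelled bounding box (together with whatever background falls inside it) and rescales it to the 60x60 detection window using nearest-neighbour sampling. The helper names `crop` and `resize_nearest` are hypothetical, images are assumed to be lists of pixel rows, and in practice you would use your library's own resize (for example OpenCV's `cv2.resize`).

```python
def crop(img, x, y, w, h):
    """Return the w x h region whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in img[y:y + h]]


def resize_nearest(img, out_w, out_h):
    """Nearest-neighbour resize: each output pixel copies the source pixel it maps onto."""
    in_h, in_w = len(img), len(img[0])
    return [[img[(j * in_h) // out_h][(i * in_w) // out_w] for i in range(out_w)]
            for j in range(out_h)]


# Usage: a 4x4 toy "image" whose pixel value is 4*y + x.
image = [[4 * y + x for x in range(4)] for y in range(4)]
patch = crop(image, 1, 1, 2, 2)          # [[5, 6], [9, 10]]
window = resize_nearest(patch, 60, 60)   # 60x60 training sample
```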
If you use the trainCascadeObjectDetector function in MATLAB, you do not even have to crop the samples. It lets you specify multiple bounding boxes per image. You also do not have to worry about the size of the samples, because trainCascadeObjectDetector will resize them for you.
There is a very handy GUI app on the MATLAB File Exchange for labeling objects of interest in images, designed for use with trainCascadeObjectDetector.
Edit: a couple of other points. Your negative images should also contain backgrounds typically associated with your objects of interest. Here is a tutorial that explains how to prepare training data and how to set some of the parameters.
I am trying to implement motion detection in OpenCV C++. I tried various methods like MOG and optical flow, which work fine, but is there a way to eliminate constant movements in the scene, such as a rotating fan? I have OpenCV's accumulateWeighted() in mind but am not sure whether it works. Is there a better way to do it?
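For reference, `accumulateWeighted()` maintains a running average, updating the accumulator per pixel as dst = (1 - alpha) * dst + alpha * src. The plain-Python sketch below mimics that update on lists of pixel rows and thresholds the difference between a frame and the averaged background; the function names and the threshold value are hypothetical. Whether this suppresses a fan likely depends on alpha and on how strongly the fan's pixels change: a pixel that keeps flickering between two values may stay far from its own average and therefore still be flagged.

```python
def accumulate_weighted(src, dst, alpha):
    """Running-average update, same rule as cv::accumulateWeighted:
    dst = (1 - alpha) * dst + alpha * src, applied per pixel."""
    return [[(1 - alpha) * d + alpha * s for s, d in zip(row_s, row_d)]
            for row_s, row_d in zip(src, dst)]


def foreground_mask(frame, background, thresh):
    """1 where the frame differs from the running background by more than thresh."""
    return [[1 if abs(p - b) > thresh else 0 for p, b in zip(row_f, row_b)]
            for row_f, row_b in zip(frame, background)]


# Usage: start the background at the first frame, then update it with each new frame.
frames = [[[0, 0]], [[100, 0]], [[100, 0]]]
background = [[float(v) for v in row] for row in frames[0]]
for frame in frames[1:]:
    background = accumulate_weighted(frame, background, alpha=0.5)
print(background)                                   # [[75.0, 0.0]]
print(foreground_mask(frames[-1], background, 20))  # [[1, 0]]
```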
I don't have a fully robust solution, and I don't have much experience with video processing, but here is the idea I have come up with so far:
First, take a few pairs of consecutive frames from the video and convert them to grayscale for a more robust comparison.
Raster scan each image pair and compute the difference image by comparing corresponding pixels.
The resulting difference image shows the pixel locations that changed between the two frames of a pair. Cluster these pixel locations and draw a bounding box around each cluster, so that each bounding box marks an object that is translating or rotating.
Having applied this difference operation to several pairs, we get bounding boxes of rotating/translating objects in each difference image.
Now compare the bounding boxes across the difference images.
Compare the center of each bounding box in one difference image with the centers of the boxes in the other difference images. If a bounding box appears in all difference images with only a very slight variation in its center, the object it contains is moving in place (rotating like a fan, or swaying like leaves), and the remaining bounding boxes represent the objects that are actually translating through the video.
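The steps above can be sketched in plain Python as follows, assuming grayscale frames given as equally sized lists of pixel rows. Connected-component labelling stands in for the clustering step, and the names `diff_mask`, `bounding_boxes` and `split_motion`, as well as the thresholds `thresh` and `max_shift`, are hypothetical choices rather than tuned values. With OpenCV you would typically use `cv2.absdiff`, `cv2.threshold`, `cv2.findContours` and `cv2.boundingRect` for the same steps.

```python
from collections import deque


def diff_mask(a, b, thresh=25):
    """1 where the gray level changed by more than thresh between two frames."""
    return [[1 if abs(pa - pb) > thresh else 0 for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def bounding_boxes(mask):
    """Boxes (x0, y0, x1, y1) of 4-connected blobs of changed pixels, in raster order."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:  # breadth-first flood fill of one blob
                cx, cy = queue.popleft()
                x0, y0, x1, y1 = min(x0, cx), min(y0, cy), max(x1, cx), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def center(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def split_motion(frames, thresh=25, max_shift=1.5):
    """Split the boxes of the first difference image into 'in place' motion
    (a box with nearly the same center exists in every other difference image)
    and translating motion. Needs at least three frames."""
    diffs = [diff_mask(a, b, thresh) for a, b in zip(frames, frames[1:])]
    all_boxes = [bounding_boxes(m) for m in diffs]
    in_place, moving = [], []
    for box in all_boxes[0]:
        cx, cy = center(box)
        stable = all(
            any(abs(cx - ox) <= max_shift and abs(cy - oy) <= max_shift
                for ox, oy in map(center, boxes))
            for boxes in all_boxes[1:])
        (in_place if stable else moving).append(box)
    return in_place, moving
```

In this simplification a "fan" is any blob whose center stays put across all difference images; a real fan may produce blobs whose shape changes from frame to frame, which is why only the centers are compared.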
A little introduction to what I'm doing:
For academic purposes, I am creating an application in C++ using OpenCV to detect static objects in a scene.
The application is based on a combination of background subtraction and tracking, and the detection of events related to abandoned objects works fine.
But at the moment I have a problem that I can't solve: I have to implement a finite state machine to detect the event of object removal, both before and after the object has been absorbed into the background.
My supervisors told me to use the edges of the objects for this.
And now the problem.
After detecting a vehicle illegally parked along a road, I need to compare the edges of several images (the background captured at the time of the alarm, the current background, and the current frame) to determine what the vehicle does (drives away, stays parked, or drives away after having been absorbed into the background).
I run these comparisons on the region of the scene that contains the vehicle (vehicles come in different sizes), and I extract the edges with the Canny algorithm, which gives me a binary CV_8UC1 cv::Mat.
At this point I have to compare them.
I tried detecting the contours with findContours and comparing them with matchShapes, but it does not seem the right way: I would have to compare each contour of the first image with every contour of the second, and the two images typically have a different number of contours (for example the original background and the current background, because the current background gains edges once the vehicle is absorbed into it).
I also tried creating a new image in which each pixel is the absolute difference of the corresponding pixels in the other two, then counted the white pixels of the difference image (wPx) and used this number for the comparison: I set two thresholds (thr1 and thr2) and computed the perimeter of the vehicle's bounding rectangle (perim); if wPx < thr1*perim the images are considered equal, and if wPx > thr2*perim the images are different.
(I set the thresholds as percentages and multiply them by the perimeter of the bounding box to adapt them to the vehicle's size.)
This solution, however, does not seem very robust.
Can you suggest something simple?
Thank you very much in any case; you StackOverflow users have helped me more than once!
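A minimal sketch of the pixel-count test described above, assuming the rule is "same" when wPx < thr1*perim and "different" when wPx > thr2*perim, with perim taken as 2*(width + height) of the bounding rectangle. The "uncertain" result for the band between the two thresholds is an assumption, as are the function and parameter names. For 0/255 edge maps, counting the pixels where the maps differ gives the same number as counting the non-zero pixels of their absolute difference (`cv2.countNonZero(cv2.absdiff(a, b))` in OpenCV).

```python
def compare_edge_maps(edges_a, edges_b, box_w, box_h, thr1, thr2):
    """Compare two equally sized 0/255 edge maps cropped to the vehicle's box.

    thr1 and thr2 are fractions that get scaled by the box perimeter,
    so larger vehicles tolerate proportionally more differing edge pixels.
    """
    # wPx: number of pixels where exactly one of the two maps has an edge
    w_px = sum(1 for ra, rb in zip(edges_a, edges_b)
               for a, b in zip(ra, rb) if a != b)
    perim = 2 * (box_w + box_h)
    if w_px < thr1 * perim:
        return "same"
    if w_px > thr2 * perim:
        return "different"
    return "uncertain"  # hypothetical: the band between thr1 and thr2
```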
PS: THIS is an example of the images that I have to compare
The first is the background without the stationary vehicle; it contains the edges of the street;
the second is the original background, the one captured when the stationary vehicle is detected;
the third is the current background (in this case it is identical to the original background because it is the same frame, but it changes later);
the fourth is the current frame of the video;
You may want to take a look at this paper: A Novel SIFT-Like-Based Approach for FIR-VS Images Registration. Aguilera et al. propose an Edge Oriented Histogram descriptor (EOH-SIFT).
This paper aims to register multispectral images (visible and infrared) to each other. Because the two kinds of images have very different characteristics, the authors first extract edges/contours in both images, which results in images similar to yours.
So, you can describe your image patches using this descriptor, illustrated in the following figure (taken from the above paper):
Subdivide your image patch into 4x4 zones
For each of the 16 subregions, build a histogram of contour orientations (5 bins)
Put the histograms together into one descriptor vector of size 16x5=80 bins
Normalize the feature vector
So every image you want to compare (4 in your case) is described by its 80-dimensional feature vector. You can compare them to each other by computing the Euclidean distance between them.
Note: Here a patch of size 80x80 or 100x100 (NxN) pixels is suggested. You may have to adjust the sizes to your image sizes.
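A simplified plain-Python sketch of the 4x4 grid, 5-bin, 80-dimensional descriptor and the Euclidean comparison. It is a stand-in, not the paper's exact method: because the gradient direction of a one-pixel-wide binary edge line is not well defined, this version takes the grayscale patch, treats pixels whose central-difference gradient magnitude exceeds a hypothetical `mag_thresh` as edge pixels, folds their orientation into [0, pi) and splits it into 5 equal sectors. The paper's own orientation filtering may differ, and all names are hypothetical.

```python
import math


def eoh_descriptor(gray, grid=4, bins=5, mag_thresh=10.0):
    """grid*grid*bins edge-orientation histogram of a square grayscale patch
    (list of rows), L2-normalised."""
    n = len(gray)
    cell = n // grid
    hist = [0.0] * (grid * grid * bins)
    for y in range(1, n - 1):
        for x in range(1, n - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            if math.hypot(gx, gy) <= mag_thresh:
                continue  # not treated as an edge pixel
            theta = math.atan2(gy, gx) % math.pi           # orientation in [0, pi)
            b = min(int(theta / (math.pi / bins)), bins - 1)
            cy, cx = min(y // cell, grid - 1), min(x // cell, grid - 1)
            hist[(cy * grid + cx) * bins + b] += 1.0
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm else hist


def descriptor_distance(d1, d2):
    """Euclidean distance between two descriptors (smaller = more similar)."""
    return math.dist(d1, d2)
```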
I've been working on this for some time now and can't find a decent solution.
I use OpenCV for image processing and my workflow is something like this:
Take a picture of a TV.
Split the image into R, G, B planes (I'm starting to test with H, S, V too, and it seems somewhat promising).
For each plane, threshold the image over a range of values from 0 to 255.
Reduce noise, detect edges with Canny, find the contours and approximate them.
Select contours that contain the center of the image (I can assume that the center of the image is inside the TV screen).
Use convexHull and HoughLines to filter out and refine invalid contours.
Select contours within a certain area range (between 10% and 90% of the image area).
Keep only contours that have exactly 4 points.
But this is too slow (a loop over each channel (RGB), then a loop over the thresholds, etc.) and not good enough, as it fails to detect many TVs.
My base code is the squares.cpp example of the OpenCV framework.
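The centre, area and four-corner filters from the list above are cheap geometric tests, sketched here in plain Python under the assumption that each approximated contour is a list of (x, y) vertices. The function names are hypothetical; in OpenCV, `cv2.approxPolyDP`, `cv2.contourArea` and `cv2.pointPolygonTest` cover the same ground. Running the cheapest test (vertex count) first is a design choice that skips the area and point-in-polygon work for most candidates.

```python
def polygon_area(pts):
    """Shoelace formula for a simple polygon given as [(x, y), ...]."""
    s = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2


def contains_point(pts, px, py):
    """Ray-casting point-in-polygon test."""
    inside = False
    n = len(pts)
    for i in range(n):
        (x0, y0), (x1, y1) = pts[i], pts[(i + 1) % n]
        if (y0 > py) != (y1 > py):  # edge straddles the horizontal ray
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if px < x_cross:
                inside = not inside
    return inside


def screen_candidates(polys, img_w, img_h, min_frac=0.1, max_frac=0.9):
    """Keep quadrilaterals covering 10-90% of the image that contain its center."""
    total = img_w * img_h
    cx, cy = img_w / 2, img_h / 2
    return [p for p in polys
            if len(p) == 4
            and min_frac * total <= polygon_area(p) <= max_frac * total
            and contains_point(p, cx, cy)]
```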
The main problems in TV screen detection are:
Images that are half dark and half bright or have many dark/bright items on screen.
Elements on the screen that have the same color of the tv frame.
Blurry tv edges (in some cases).
I also have searched many SO questions/answers on Rectangle detection, but all are about detecting a white page on a dark background or a fixed color object on a contrast background.
My final goal is to implement this on Android/iOS for near-real time tv screen detection. My code takes up to 4 seconds on a Galaxy Nexus.
I hope someone can help. Thanks in advance!
Update 1: Using only Canny and HoughLines does not work, because there can be very many lines, and selecting the correct ones can be very difficult. I think some sort of "cleaning" of the image should be done first.
Update 2: This question is one of the closest to my problem, but it didn't work for the TV screen.
Hopefully these points provide some insight:
1)
If you can properly segment the image into foreground and background, then you can easily put a bounding box around the foreground. Graph cuts are very powerful methods for segmenting images, and OpenCV provides an easy-to-use implementation (GrabCut). For example, you provide some brush strokes that cover "foreground" and "background" pixels, and your image is converted into a graph that is cut optimally to separate the two. Here is a fun example:
http://docs.opencv.org/trunk/doc/py_tutorials/py_imgproc/py_grabcut/py_grabcut.html
This is a quick something I put together to illustrate its effectiveness:
2)
If you decide to continue down the edge detection route, then consider using Mathematical Morphology to "clean up" the lines you detect before trying to fit a bounding box or contour around the object.
http://en.wikipedia.org/wiki/Mathematical_morphology
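To illustrate the "clean up" idea, here is a plain-Python sketch of a morphological closing (dilation followed by erosion) with a 3x3 square structuring element on a 0/1 image; it bridges one-pixel gaps in detected lines. The function names are hypothetical, border pixels only consider in-image neighbours, and OpenCV provides the same operation as `cv2.morphologyEx` with `cv2.MORPH_CLOSE`.

```python
def _morph(img, reduce_fn):
    """Apply reduce_fn (max or min) over each pixel's in-bounds 3x3 neighbourhood."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = reduce_fn(img[ny][nx]
                                  for ny in range(y - 1, y + 2)
                                  for nx in range(x - 1, x + 2)
                                  if 0 <= ny < h and 0 <= nx < w)
    return out


def dilate(img):
    return _morph(img, max)


def erode(img):
    return _morph(img, min)


def close(img):
    """Closing: fills small gaps while keeping thin structures roughly in place."""
    return erode(dilate(img))


# Usage: a horizontal edge line with a one-pixel gap at x = 3.
broken = [[0] * 7, [0] * 7, [1, 1, 1, 0, 1, 1, 1], [0] * 7, [0] * 7]
closed = close(broken)  # row 2 becomes all ones, other rows stay zero
```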
3)
You could train on a dataset containing TVs and use the Viola-Jones algorithm for object detection. Traditionally it is used for face detection, but you can adapt it to TVs given enough data. For example, you could write a script that downloads images of living rooms with TVs as your positive class and living rooms without TVs as your negative class.
http://en.wikipedia.org/wiki/Viola%E2%80%93Jones_object_detection_framework
http://docs.opencv.org/trunk/doc/py_tutorials/py_objdetect/py_face_detection/py_face_detection.html
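Viola-Jones evaluates Haar-like rectangle features in constant time using an integral image. The plain-Python sketch below shows only that building block (not the boosted cascade itself), and its function names are hypothetical. For actual training, OpenCV ships the `opencv_traincascade` tool.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (one row/column of zero padding)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle at (x, y) using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle feature: left half minus right half (w should be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```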
4)
You could perform image registration using cross correlation, like this nice MATLAB example demonstrates:
http://www.mathworks.com/help/images/examples/registering-an-image-using-normalized-cross-correlation.html
As for the template TV image that would be slid across the search image, you could collect a bunch of pictures of TVs and create "Eigenscreens", similar to how Eigenfaces are used for face recognition, and generate an average TV image:
http://jeremykun.com/2011/07/27/eigenfaces/
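A plain-Python sketch of the two ideas in this point: averaging a set of equally sized TV images into one template, and sliding it over the search image while scoring each position with zero-mean normalized cross-correlation (the score OpenCV's `cv2.matchTemplate` computes with `cv2.TM_CCOEFF_NORMED`). The mean image is only the first ingredient of the Eigenfaces approach; no principal components are computed here. Function names are hypothetical, and a brute-force loop like this is far slower than the library version.

```python
import math


def mean_template(images):
    """Pixel-wise average of equally sized grayscale images."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    return [[sum(img[y][x] for img in images) / n for x in range(w)] for y in range(h)]


def ncc(patch, tmpl):
    """Zero-mean normalized cross-correlation in [-1, 1]; 0 if either input is flat."""
    a = [v for row in patch for v in row]
    b = [v for row in tmpl for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    den = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(x * y for x, y in zip(da, db)) / den if den else 0.0


def best_match(image, tmpl):
    """Return (score, (x, y)) of the best-scoring template position."""
    th, tw = len(tmpl), len(tmpl[0])
    best = (-2.0, (0, 0))
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            score = ncc(patch, tmpl)
            if score > best[0]:
                best = (score, (x, y))
    return best
```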
5)
OpenCV has plenty of fun tools for describing shape and structural features, which seems to be mainly what you're interested in. Worth a look if you haven't seen it already:
http://docs.opencv.org/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html
Best of luck.
I'm trying to use a PNG with an alpha channel to 'mask' the current frame from a video stream.
My PNG has black pixels in the areas that I don't want processed and is transparent (alpha) in the others; currently it's saved as a 4-colour image with 4 channels, but it might as well be a binary image.
I'm doing background subtraction and contour finding on the image, so I imagine that if I copy the black pixels from my 'mask' image onto the current frame, no contours would be found in the black areas. Is this a good approach? If so, how can I copy the black/non-transparent pixels from one cv::Mat on top of the other?
What you're describing sounds to me like the usage of an image mask. It's odd that you'd do it in the alpha channel, when so many methods available in the OpenCV libraries support masking. Rather than use the alpha channel, why not create a separate binary image with non-zero values everywhere you'd like to find contours?
Depending on which algorithms you use, you are correct in assuming that no contours would be found in the black areas. You also don't need to copy pixels selectively by hand, iterating byte by byte through the Mat structure: cv::Mat::copyTo(dst, mask) copies only the pixels where the mask is non-zero, and cv::Mat::setTo(value, mask) does the same for a constant value. Using the mask idea presented above with your pre-processing functions, and then sending the resulting binary image into findContours or similar, lets you take advantage of the already well-written and optimized code of the OpenCV library, and keep more of your hair on your head, where it belongs ;).
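A sketch of the masking idea, assuming the PNG is opaque black where pixels should be ignored and fully transparent (alpha 0) where they should be processed; this alpha convention is an assumption about your file. Images are lists of pixel rows and the function names are hypothetical. In OpenCV the last step corresponds to `frame.copyTo(out, mask)` in C++ or `cv2.bitwise_and(frame, frame, mask=mask)` in Python.

```python
def mask_from_rgba(rgba):
    """255 where the PNG is transparent (process), 0 where it is opaque (ignore).
    Alpha is the 4th channel in both RGBA and OpenCV's BGRA order."""
    return [[255 if px[3] == 0 else 0 for px in row] for row in rgba]


def apply_mask(frame, mask):
    """Keep frame pixels where the mask is non-zero and set the rest to 0."""
    return [[p if m else 0 for p, m in zip(row_f, row_m)]
            for row_f, row_m in zip(frame, mask)]


# Usage: left pixel is opaque black (ignore), right pixel is transparent (process).
png = [[(0, 0, 0, 255), (0, 0, 0, 0)]]
mask = mask_from_rgba(png)           # [[0, 255]]
masked = apply_mask([[7, 9]], mask)  # [[0, 9]]
```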
I am new to OpenCV. I would like to know whether we can compare two images (one made in Photoshop, i.e. the source image, and the other taken from the camera) and determine whether they are the same.
I tried to compare the images using template matching, but it does not work. What other procedures can we use for this kind of comparison?
Comparison of images can be done in different ways depending on which purpose you have in mind:
If you just want to check whether two images are approximately equal (up to a few luminance differences), with the same perspective and camera view, you can simply compute the pixel-by-pixel squared difference per color band. If the sum of squares over the two images is smaller than a threshold, the images match; otherwise they do not.
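A plain-Python sketch of this sum-of-squared-differences test, assuming both images are equally sized lists of rows of (r, g, b) tuples; the function names and the threshold are hypothetical. Dividing the sum by the number of pixels is one option for making the threshold independent of image size.

```python
def sum_squared_diff(img_a, img_b):
    """Sum over all pixels and colour bands of (a - b)^2."""
    return sum((ca - cb) ** 2
               for row_a, row_b in zip(img_a, img_b)
               for px_a, px_b in zip(row_a, row_b)
               for ca, cb in zip(px_a, px_b))


def images_match(img_a, img_b, threshold):
    """True when the summed squared difference is below the threshold."""
    return sum_squared_diff(img_a, img_b) < threshold


a = [[(10, 20, 30)]]
b = [[(12, 20, 27)]]
print(sum_squared_diff(a, b))  # 13
```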
If one image is a black-and-white (grayscale) variant of the other, the color image must first be converted to grayscale (see e.g. http://www.johndcook.com/blog/2009/08/24/algorithms-convert-color-grayscale). Afterwards, simply perform the step above.
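One of the methods covered in the linked post is a weighted "luminosity" average of the channels. This sketch uses the ITU-R BT.601 weights, which are the ones OpenCV's RGB-to-gray conversion uses; other weightings exist, and the function name is hypothetical. Images are lists of rows of (r, g, b) tuples.

```python
def to_gray(img):
    """Weighted luminosity conversion: 0.299 R + 0.587 G + 0.114 B."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in img]


gray = to_gray([[(255, 255, 255), (255, 0, 0)]])  # white stays ~255, pure red becomes ~76.2
```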
If one image is a subimage of the other, you need to register the two images. This means determining the scale, possible rotation and XY translation needed to lay the subimage onto the larger image (for registration methods, see: Pluim, J.P.W., Maintz, J.B.A., Viergever, M.A., Mutual-information-based registration of medical images: a survey, IEEE Transactions on Medical Imaging, 2003, Volume 22, Issue 8, pp. 986–1004).
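The survey cited above centres on maximizing the mutual information (MI) between the two images over candidate transforms. As a sketch of just the similarity measure, here is MI in bits computed from the joint histogram of raw intensity values; real implementations usually bin intensities first, and that simplification and the function name are assumptions. A registration loop would evaluate this for each candidate scale, rotation and translation and keep the best one. Because MI only depends on how intensities co-occur, relabelled intensities still score high, which is why it copes with images from different sources.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """MI in bits between two equally sized grayscale images (lists of rows)."""
    pa = [v for row in a for v in row]
    pb = [v for row in b for v in row]
    n = len(pa)
    joint = Counter(zip(pa, pb))
    count_a, count_b = Counter(pa), Counter(pb)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi
```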
If there are perspective differences, you need an algorithm that deskews one image to match the other as closely as possible. For ways of deskewing, see for example http://javaanpr.sourceforge.net/anpr.pdf from page 15 onwards.
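One common way to deskew is to estimate a perspective transform (homography) from four point correspondences, such as the corners of the object in both images, and warp one image with it. The plain-Python sketch below solves the standard 8x8 linear system for H with its bottom-right entry fixed to 1; it is a generic technique, not necessarily the method in the linked PDF, and the names are hypothetical. OpenCV offers `cv2.getPerspectiveTransform` and `cv2.warpPerspective` for the same job.

```python
def _solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a square system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def homography_from_4pts(src, dst):
    """3x3 H mapping each src (x, y) to dst (u, v), with H[2][2] = 1."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h6 x + h7 y + 1) = h0 x + h1 y + h2, and likewise for v
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = _solve(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(H, x, y):
    """Map a point through H, including the perspective division."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```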
Good luck!
You should try SIFT. Apply SIFT to your marker (the image saved in memory) to get keypoints with descriptors that are robust for recognition. Then detect keypoints in the camera frames (the FAST detector is a quick option) and compute their descriptors, so that you can find the keypoints that correspond to those of the marker in the camera image.
There are many threads on this topic:
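Finding the corresponding keypoints comes down to matching descriptor vectors. A common filter is the nearest-neighbour ratio test from Lowe's SIFT work: keep a match only if the closest descriptor is clearly closer than the second closest. The plain-Python sketch below shows brute-force matching with that test; descriptors are assumed to be tuples of floats, the 0.75 ratio is a typical but hypothetical choice, and in OpenCV the same idea is usually written with `cv2.BFMatcher` and `knnMatch(..., k=2)`.

```python
import math


def match_descriptors(query, train, ratio=0.75):
    """Return (query_index, train_index) pairs that pass the ratio test."""
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((math.dist(q, t), ti) for ti, t in enumerate(train))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((qi, dists[0][1]))
    return matches


marker = [(0, 0), (5, 5), (20, 0)]       # descriptors from the stored marker
frame = [(0, 1), (10, 10), (5, 4.9)]     # descriptors from the camera frame
print(match_descriptors(marker, frame))  # [(0, 0), (1, 2)]; (20, 0) is ambiguous
```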
How to get a rectangle around the target object using the features extracted by SIFT in OpenCV
How to search the image for an object with SIFT and OpenCV?
OpenCV - Object matching using SURF descriptors and BruteForceMatcher
Good luck