openFrameworks & Kinect: Improve human contour - c++

I am playing around with Kinect and I'm trying to get an as accurate as possible human contour.
So far I tried changing threshold values, blurring, etc... but I was wondering if there was an existing effective method of doing it.
I believe there are two main problems in order to get a good shape. One is that if keeps flickering all the time. The other, how is not a very good shape (hair not reflecting IR lighting, etc...).
Any reccomendations on how to proceed? At the moment I'm trying to average values of the most recent frames to stabilize for the first problem and I might try to convert the shape to a polygon and simplify it (however that's done).

One approach would be to improve the depth mask with a forground/background mask calculated from the RGB image and a classic background removal algorithm.
You could also work with the morphology functions of opencv to remove unwanted small parts of the mask or close holes.1

Related

Noise reduction OpenCV skindetection sample

I'm working on a facedetection app using segmentation of skinpixels with a predefined skinmodel in YCrCb space.
I'm loosely basing my algorithms of this report; http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=767122&tag=1, by Douglas Chai and King N. Ngan.
I first segment out all skin pixels (see left).
After that I perform some calculations to reduce the noise (see steps below). It results in a filtered bitmap 1/8th the size of original. Ideally this would be noise free both in face and background area, but it isn't. I have already tried to reduce it by using my density map and then checking neighbouring 3x3 area pixels and eroding/dilating pixel values depending on their neighbours. Then I resize this bitmap and apply the result as a mask on the original image (see right image for result, ignore my censorship).
My question is, what methods do you recommend for getting rid of the noise?
Also, are there any good methods to get smoother contours ? Ideally I would not like to use "find biggest contour and flood fill", preferably something more sophisticated.
There also seem to be bit of a displacement of the resized mask (it cuts of a bit too much on my right side of the face, and shows a bit too much on the left side). What can be causing this?
the easiest way to do smoother contours is to interpolate your data to a higher resolution using an interpolation scheme. You may look in openCV, that will result in smoother transitions between points.
I hope it will help a little bit. Good luck.

Dealing with noisy movement positions

I have positions of an object moving around an image. I think I'm detecting it the best I can where most of the time I'm detecting the center of the object. However, I'm still getting the odd detection of around the center caused by the frame rate not being fast enough, and the frame containing two positions of the object.
As I can't control the frame rate, how can I minimise the effects of the noise in the jittery positions.
As this is a common issue in computer vision, are there any filters in opencv to deal with noisy position data?
I asked for the comment by Berak to be made into an answer, but my request was "declined". So, yes, the answer that I found most useful was to use the Kalman filter which is implemented in opencv.

Pixel level image registration / alignment?

I'm trying to remove foreground from two images, here's a sample pair of images:
As you can see, the Budweiser bottle is removed from the scene before the second shot is taken.
These photos were captured from a pinhole camera (iPhone), and, the tricky part is I'm hand-holding the camera, so it cannot be guaranteed that the images are perfectly aligned pixel by pixel, so a simple minus-threshold method will not work.
Then, I've decided to perform image registration using findHomography and warpPerspective from OpenCV, here's the result image:
This image is warped with the matrix I've got from findHomography, it kind of improved the alignment quality, but still not that aligned so I can use a simple way to remove the foreground.
So, finally, I decided to implement a "fuzzy-minus" algorithm: for every pixel in image1, I'll look through a 7x7 neighbour in image2 (a 7 by 7 kernel?), using the minimal difference in grayscale as the result of minus, and threshold the result into binary image, here's what I've got:
And the result is still not good. Notice the white wholes in the bottle, this is produced due to similar grayscale value of foreground and background. So I'm not sure what to do now.
I can think of two ways to solve the problem, the first is to get a better aligned pair of images, and simply minus the pairs; the second is to use a more robust way to extract the foreground.
Can anyone give me some advice on how to deal with this kind of problem? I believe there should be some state-of-art algorithms or processing pipelines, but after googling around, I get nothing.
I'm using OpenCV with C++, it would be fantastic if you can tell me how to do it with these tools in hand.
Big big thanks in advance!
The problem is not in your algorithm. You are having problem because the two scenes were not taken from exactly the same angle, as shown in the animation below. This slight difference highlight the edges in the subtraction.
You need a static camera in order to apply this approach.
I suggest using mathematical morphology on the mask that you got to get rid of the artifacts.
Try applying both opening and closing to get rid of the black and the white small regions.
Mathematical Morphology
Mathematical Morphology in opencv
The difference between the two picture is pretty huge, so you will need to use a large structure element, but I don't think you will be able to get rid of the shadow.
For the two large strips in the background, you may try to use a horizontally shaped structure element as well.
Edit
Is it possible to produce a grayscale image instead of a binary image? if yes, you may try to experiment with the hat method for the shadow, but I am not sure about this point.
This is what I got using two different structure elements for closing THEN opening
Mat mask = imread("mask.jpg",CV_LOAD_IMAGE_GRAYSCALE);
morphologyEx(mask,mask,MORPH_CLOSE,getStructuringElement(CV_SHAPE_ELLIPSE,Size(50,10)));
morphologyEx(mask,mask,MORPH_OPEN,getStructuringElement(CV_SHAPE_ELLIPSE,Size(10,50)));
imshow("open",mask);
imwrite("maskopenclose.jpg",mask);
I would suggest optical flow for alignment and OpenCV's background subtraction algorithm:
http://docs.opencv.org/trunk/doc/tutorials/video/background_subtraction/background_subtraction.html
I suggest that instead of using findHomography try using some of openCV's stereo correspondence functions: http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
there is a sample code here: https://github.com/Itseez/opencv/blob/master/samples/cpp/stereo_calib.cpp

Target Detection - Algorithm suggestions

I am trying to do image detection in C++. I have two images:
Image Scene: 1024x786
Person: 36x49
And I need to identify this particular person from the scene. I've tried to use Correlation but the image is too noisy and therefore doesn't give correct/accurate results.
I've been thinking/researching methods that would best solve this task and these seem the most logical:
Gaussian filters
Convolution
FFT
Basically, I would like to move the noise around the images, so then I can use Correlation to find the person more effectively.
I understand that an FFT will be hard to implement and/or may be slow especially with the size of the image I'm using.
Could anyone offer any pointers to solving this? What would the best technique/algorithm be?
In Andrew Ng's Machine Learning class we did this exact problem using neural networks and a sliding window:
train a neural network to recognize the particular feature you're looking for using data with tags for what the images are, using a 36x49 window (or whatever other size you want).
for recognizing a new image, take the 36x49 rectangle and slide it across the image, testing at each location. When you move to a new location, move the window right by a certain number of pixels, call it the jump_size (say 5 pixels). When you reach the right-hand side of the image, go back to 0 and increment the y of your window by jump_size.
Neural networks are good for this because the noise isn't a huge issue: you don't need to remove it. It's also good because it can recognize images similar to ones it has seen before, but are slightly different (the face is at a different angle, the lighting is slightly different, etc.).
Of course, the downside is that you need the training data to do it. If you don't have a set of pre-tagged images then you might be out of luck - although if you have a Facebook account you can probably write a script to pull all of yours and your friends' tagged photos and use that.
A FFT does only make sense when you already have sort the image with kd-tree or a hierarchical tree. I would suggest to map the image 2d rgb values to a 1d curve and reducing some complexity before a frequency analysis.
I do not have an exact algorithm to propose because I have found that target detection method depend greatly on the specific situation. Instead, I have some tips and advices. Here is what I would suggest: find a specific characteristic of your target and design your code around it.
For example, if you have access to the color image, use the fact that Wally doesn't have much green and blue color. Subtract the average of blue and green from the red image, you'll have a much better starting point. (Apply the same operation on both the image and the target.) This will not work, though, if the noise is color-dependent (ie: is different on each color).
You could then use correlation on the transformed images with better result. The negative point of correlation is that it will work only with an exact cut-out of the first image... Not very useful if you need to find the target to help you find the target! Instead, I suppose that an averaged version of your target (a combination of many Wally pictures) would work up to some point.
My final advice: In my personal experience of working with noisy images, spectral analysis is usually a good thing because the noise tend to contaminate only one particular scale (which would hopefully be a different scale than Wally's!) In addition, correlation is mathematically equivalent to comparing the spectral characteristic of your image and the target.

Tips for background subtraction in the face of noise

Background subtraction is an important primitive in computer vision. I'm looking at different methods that have been developed, and I've begun thinking about how to perform background subtraction in the face of random, salt and pepper noise.
In a system such as the Microsoft Kinect, the infrared camera will give off random noise pretty consistently. If you are trying to background subtract from the depth view, how can you avoid an issue with this random noise while reliably subtracting the background?
as you already said, noise and other unsteady parts of your background might give problems in segmentation, I mean lighting changes or other moving stuff in the background.
But if you're working on some indoor-project this shouldn't be too big of an issue, except of course the noise thing.
Besides substracing the background from an image to segment the objects in it you could also try to subtract two (or in some methods even three) following frames from each other. If the camera is steady this should leave the parts that have changed, so basically the objects that have moved. So this is an easy method for detecting moving objects.
But in most operations you might use you probably will have that noise you described. Easiest way to get rid of it is by using Median Filter or Morpholocigal Operators (Opening) on the segmented binary image. This should effectively remove small parts and leave the nice big blobs of the objects.
Hope that helps...
typically you do connected components (cc) in disparity space and then kill any cc that have a small size. The threshold for size and for connectedness (e.g. what is the disparity difference between two adjacent pixel to still consider them connected) are your two parameters to play with (ivlad#lab126.com).
As #evident mentioned, median filter is your ticket. That's the standard operator for getting rid of salt-and-pepper noise while being edge-preserving.
That said, I disagree with his suggestion that this occur on the segmented binary image. Median filtering is very low-level and should be applied on the raw data before any subsequent processing.