I have positions of an object moving around an image. I think I'm detecting it as well as I can, and most of the time I detect the center of the object. However, I still get the odd detection away from the center, caused by the frame rate not being fast enough, so that a single frame contains two positions of the object.
As I can't control the frame rate, how can I minimise the effect of the noise in the jittery positions?
As this is a common issue in computer vision, are there any filters in OpenCV to deal with noisy position data?
I asked for the comment by Berak to be made into an answer, but my request was "declined". So, yes, the answer that I found most useful was to use the Kalman filter, which is implemented in OpenCV.
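For anyone landing here, this is a minimal sketch of the Kalman idea for this kind of position data, written in plain Python so it runs without OpenCV (OpenCV ships its own implementation as cv::KalmanFilter, cv2.KalmanFilter in Python). The constant-velocity model, the class and function names, and the noise values process_var and meas_var are assumptions for illustration and would need tuning on real tracks.

```python
class ConstantVelocityKalman1D:
    """Kalman filter for one coordinate with state [position, velocity]."""

    def __init__(self, process_var=1e-2, meas_var=4.0):
        self.x = None                        # position estimate
        self.v = 0.0                         # velocity estimate
        self.p = [[1.0, 0.0], [0.0, 1.0]]    # state covariance
        self.q = process_var                 # assumed process noise (diagonal)
        self.r = meas_var                    # assumed measurement noise

    def update(self, z, dt=1.0):
        if self.x is None:                   # first measurement initialises the state
            self.x = float(z)
            return self.x
        # Predict: x <- x + v*dt, P <- F P F^T + Q with F = [[1, dt], [0, 1]]
        self.x += self.v * dt
        (p00, p01), (p10, p11) = self.p
        n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + self.q
        n01 = p01 + dt * p11
        n10 = p10 + dt * p11
        n11 = p11 + self.q
        # Correct with the measured position (H = [1, 0])
        s = n00 + self.r
        k0, k1 = n00 / s, n10 / s
        innovation = z - self.x
        self.x += k0 * innovation
        self.v += k1 * innovation
        self.p = [[(1 - k0) * n00, (1 - k0) * n01],
                  [n10 - k1 * n00, n11 - k1 * n01]]
        return self.x


def smooth_track(points, process_var=1e-2, meas_var=4.0):
    """Filter a list of (x, y) detections, one independent filter per axis."""
    fx = ConstantVelocityKalman1D(process_var, meas_var)
    fy = ConstantVelocityKalman1D(process_var, meas_var)
    return [(fx.update(x), fy.update(y)) for x, y in points]
```

A larger meas_var makes the filter trust the detections less, so an off-center detection pulls the estimate less, at the cost of more lag when the object really changes direction.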
Related
I am at the start of developing software using OpenCV in Microsoft Visual 2010 Express. What I need to know before I get into coding is the procedure I have to follow.
Overview:
I want to develop software that detects simple boxing moves (such as left punch, right punch) and outputs the results.
Where I am struggling is which approach to take and how to tackle this development, i.e.:
Capture video footage and extract, let's say, every 5th frame for processing.
Do I have to extract and store this frame, and perhaps keep a REFERENCE image to subtract the captured frame from?
Once I capture a frame, what would be the best way to process it:
* Threshold it, then
* Detect the edges, then
* Smooth the edges using some filter, then
* Draw some BOUNDING boxes....?
What is your view on this, guys? Am I missing something, or are there better, simpler ways? Any suggestions?
Any answer will be much appreciated.
PS: it's not my homework :)
I'm not sure if analyzing only every 5th frame will be enough, because usually punches are so fast that they could be overlooked.
I assume what you actually want to find is fast forward (towards the camera) movements of fists.
In the case of OpenCV I would first start with such movements of faces, since the software package already provides some examples of how to do that.
To detect and track faces you can use CvHaarClassifierCascade, but since this won't be fast enough for real-time detection, continue tracking the found face with Lucas-Kanade. Just pick some good-to-track points inside the previously found face, remember their distance from an arbitrary face middle, and update it at each frame. See http://www.youtube.com/watch?v=zNqCNMefyV8 for an example of some random points tracked with Lucas-Kanade. Note that unlike faces, fists may not be so easy to track since their surface is rather uniform; better check the Lucas-Kanade demo in OpenCV.
Of course, with each frame the tracked face will drift away from the actual one, so once in a while re-run CvHaarClassifierCascade and interpolate your currently held face position towards its result.
You should be able to do the above for fists as well, but that will require training a classifier with pictures of fists (a classifier trained with faces is already provided in OpenCV).
Now, having the fists/face tracked, you may try observing what happens to the points: when someone punches, they move rapidly in some direction, while on a fist that remains still they don't move too much. So when you calculate the average movement of the individual points over recent frames, the higher the value, the bigger the chance that there was a punch. Alternatively, if you have somehow managed to track them accurately and the distance between them increases, the object is getting closer to the camera, and so a punch is likely.
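The two cues just described (average point movement, and growth of the distances between points) can be sketched roughly like this. It assumes the tracker already gives you the same points in the same order for every frame; the function names are made up for illustration and any decision thresholds are left to you.

```python
import math
from itertools import combinations


def mean_displacement(prev_pts, curr_pts):
    """Average distance each tracked point moved between two frames."""
    steps = [math.hypot(cx - px, cy - py)
             for (px, py), (cx, cy) in zip(prev_pts, curr_pts)]
    return sum(steps) / len(steps)


def mean_spread(pts):
    """Average distance between all pairs of tracked points in one frame."""
    dists = [math.hypot(ax - bx, ay - by)
             for (ax, ay), (bx, by) in combinations(pts, 2)]
    return sum(dists) / len(dists)


def punch_evidence(frames):
    """frames: list of point lists, oldest first (at least two frames).

    Returns (average speed in pixels per frame, spread growth ratio).
    A high speed suggests a fast movement; a ratio well above 1 suggests
    the tracked object got closer to the camera.
    """
    speeds = [mean_displacement(a, b) for a, b in zip(frames, frames[1:])]
    avg_speed = sum(speeds) / len(speeds)
    growth = mean_spread(frames[-1]) / mean_spread(frames[0])
    return avg_speed, growth
```

A fist that stays still gives a speed near 0 and a ratio near 1, so both values together are more informative than either alone.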
Note that without at least knowing the change in size of the fist in the picture, it might be hard to tell whether a hand movement was forward or backward, or whether the user was faking it by moving the fists left or right. You may have to come up with some specialized algorithm (maybe by trial and error) to detect that, such as, say, an increase in the number of skin-color pixels at the location where the fist was previously found.
What you are looking for is the research field of action recognition, e.g. www.nada.kth.se/cvap/actions/. A possible solution is the STIP (space-time interest points) method, www.di.ens.fr/~laptev/actions/. In the end, though, this is a tough job if you have to deal with occlusion or different points of view.
I'm using OpenCV to process pictures taken with a mobile phone. The pictures contain text, and they have small amounts of motion blur, which I need to remove.
What would be the most viable algorithm to use? So far I have tested Lucy-Richardson and Wiener deconvolution, but they did not yield satisfactory results.
Agree with @TheJuice, your problem lies in the PSF estimation. Usually, to be able to do this from a single frame, several assumptions need to be made about the factors leading to the blur (motion of the object, type of motion of the sensor, etc.).
You can find some pointers, especially on the monodimensional case, here. They use a filtering method that leaves mostly correlation from the blur, discarding spatial correlation of original image, and use this to deduce motion direction and thence the PSF. For small blurs you might be able to consider the motion as constant; otherwise you will have to use a more complex accelerated motion model.
Unfortunately, mobile phone blur is often a compound of CCD integration and non-linear motion (translation perpendicular to the line of sight, yaw from wrist motion, and rotation around the wrist), so Yitzhaky and Kopeika's method will probably only yield acceptable results in a minority of cases. I know there are methods to deal with that ("depth awareness" and others), but I have never had occasion to deal with them.
You can preview the results using photo recovery software such as Focus Magic; while such tools do not employ the YK estimator (the motion description is left to you), the remaining workflow is necessarily very similar. If your pictures are amenable to Focus Magic recovery, then the YK method will probably work. If they are not (or not enough, or not enough of them to be worthwhile), then there's no point even trying to implement it.
Motion blur is a difficult problem to overcome. The best results are gained when:
* The speed of the camera relative to the scene is known
* You have many pictures of the blurred object which you can correlate.
You do have one major advantage in that you are looking at text (which normally constitutes high contrast features). If you only apply deconvolution to high contrast (I know that the theory is often to exclude high contrast) areas of your image you should get results which may enable you to better recognise characters. Also a combination of sharpening/blurring filters pre/post processing may help.
I remember being impressed with this paper previously. Perhaps an adaptation of their implementation would be worth a go.
I think the estimation of your point-spread function is likely to be more important than the algorithm used. It depends on the kind of motion blur you're trying to remove. Linear motion is likely to be the easiest, but it is unlikely to be the kind you're trying to remove: I imagine it's non-linear, caused by hand movement during the exposure.
You cannot eliminate motion blur. The information is lost forever. What you are dealing with is a CCD that is recording multiple real objects to a single pixel, smearing them together. In other words if the pixel reads 56, you cannot magically determine that the actual reading should have been 37 at time 1, and 62 at time 2, and 43 at time 3.
Another way to look at this: imagine you have 5 pictures. You then use photoshop to blend the pictures together, averaging the value of each pixel. Can you now somehow from the blended picture tell what the original 5 pictures were? No, you cannot, because you do not have the information to do that.
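To make the blending argument concrete, here is a tiny sketch with made-up pixel values: two different sets of frames that average to exactly the same blended picture, so the blend alone cannot tell you which set produced it. The blend helper is just an illustration, not any library function.

```python
def blend(frames):
    """Average several equally sized 1-D frames pixel by pixel."""
    return [sum(pixels) / len(frames) for pixels in zip(*frames)]


# Two different "exposures" of a two-pixel scene (values are invented).
moving_edge = [[0, 100], [100, 0]]    # a bright spot that moved during the exposure
flat_gray = [[50, 50], [50, 50]]      # a uniform gray scene that did not move

# Both blend to the same picture.
same_result = blend(moving_edge) == blend(flat_gray)
```

Here same_result is True, which is the point of the argument: without extra assumptions about the scene or the motion, the originals are not uniquely determined by the blend.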
I am playing around with Kinect and I'm trying to get an as accurate as possible human contour.
So far I tried changing threshold values, blurring, etc... but I was wondering if there was an existing effective method of doing it.
I believe there are two main problems in getting a good shape. One is that it keeps flickering all the time. The other is that it is not a very good shape (hair not reflecting IR lighting, etc.).
Any recommendations on how to proceed? At the moment I'm trying to average the values of the most recent frames to stabilize the contour for the first problem, and I might try to convert the shape to a polygon and simplify it (however that's done).
One approach would be to improve the depth mask with a foreground/background mask calculated from the RGB image and a classic background removal algorithm.
You could also work with the morphology functions of OpenCV to remove unwanted small parts of the mask or close holes.
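A rough sketch of what those morphology operations do to a binary mask, in plain Python with a fixed 3x3 square neighbourhood (in OpenCV the corresponding call is cv2.morphologyEx with cv2.MORPH_OPEN or cv2.MORPH_CLOSE). The helper names and the border handling, where neighbours outside the image are simply ignored, are assumptions of this sketch.

```python
def _morph(mask, op):
    """Apply min (erosion) or max (dilation) over each pixel's 3x3 neighbourhood."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = op(window)
    return out


def erode(mask):
    return _morph(mask, min)


def dilate(mask):
    return _morph(mask, max)


def opening(mask):
    """Erode then dilate: removes specks smaller than the neighbourhood."""
    return dilate(erode(mask))


def closing(mask):
    """Dilate then erode: fills holes smaller than the neighbourhood."""
    return erode(dilate(mask))


def clean_mask(mask):
    """Fill small holes first, then drop small isolated parts."""
    return opening(closing(mask))
```

The order matters: opening a body mask that still has holes in it can eat away the region around each hole, which is why this sketch closes first.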
Hey, I am coding up a simple chess-playing robot's vision system. I am trying to improve on some previous research so that a camera and a standard chess set can be used, and both are allowed to move during the game. So far I can locate the board in an image acquired via webcam, and I want to detect moves by taking the difference of successive images to determine what has changed, then using previous information about the board occupancy to work out the move.
My problem is that I can't seem to reliably detect changes at the moment, my current pipeline goes like this:
Subtract two images -> histogram-equalize the difference image -> erode and dilate the difference image to remove minor changes -> make a binary copy and do a distance transform (DT) -> get the largest blob (corresponding to the highest value after DT) and flood fill that blob -> repeat until DT returns a value small enough to ignore the change.
I am coding all this in OpenCV and C++, but my flood fill always seems to fail to fill the blobs, so in most cases I just get one change detected. I have also tried using cv::inpaint, but that didn't help either. So my question is: am I just using the wrong approach, or could tuning somehow make the change detection more reliable? In case of the former, could people suggest alternative routes, preferably codable in C++/Python and/or OpenCV in a reasonable time?
Thanks
The problem of getting a fix on the board and detecting movement of pieces can be solved independently, assuming one does not move the board while also moving pieces around.
Some thoughts on how I would approach it:
Detecting the orientation of the board
You have to be able to handle the board being rotated in place, as well as moved around as long as some angle is maintained that lets you see the pieces. It would help if there were something on the board that you could easily identify (e.g. a marker on each corner) so that if you lose orientation (e.g. someone moves the board away from the camera completely) you could easily find it again.
In order to keep track of the board you need to model the position of the camera relative to the board in 3D space. This is the same problem as determining the location of a camera being moved around a fixed board, i.e. a problem of egomotion. Once you solve that you can move on to the next stage, which is detecting movement and tracking objects.
Detecting movement of pieces
This is probably the simpler part of the problem. There are lots of algorithms out there for object detection in video. I would only add that you can use "key" frames. What I mean by that is to identify those frames in which you see only the board, before and after a single move, i.e. you don't see the hand moving a piece or obscuring the pieces. Once you have the before/after frames you can figure out what moved and where it is positioned relative to the board.
You can probably get away with not being able to recognize the shape of each piece if you assume continuity (i.e. that you've tracked all movements since the initial arrangement of the board, which is well known).
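As a sketch of that continuity idea: if each key frame is reduced to the set of occupied squares, a plain move shows up as one square vacated and one square newly filled. The square naming ("e2" etc.) and the detect_move helper are assumptions for illustration; how you get the occupancy from the image is a separate problem.

```python
def detect_move(before, after):
    """Compare occupied squares in the key frames before and after a move.

    before, after: sets of square names such as {"e2", "d2"}.
    Returns (from_square, to_square); to_square is None when a square was
    vacated but none was newly filled (a capture onto an occupied square,
    which occupancy alone cannot locate). Returns None for anything else,
    e.g. castling or a noisy frame.
    """
    vacated = sorted(before - after)
    filled = sorted(after - before)
    if len(vacated) == 1 and len(filled) == 1:
        return vacated[0], filled[0]
    if len(vacated) == 1 and not filled:
        return vacated[0], None
    return None
```

For the capture case you would need one more cue, for example which occupied square changed colour between the two key frames.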
Background subtraction is an important primitive in computer vision. I'm looking at different methods that have been developed, and I've begun thinking about how to perform background subtraction in the face of random, salt and pepper noise.
In a system such as the Microsoft Kinect, the infrared camera will give off random noise pretty consistently. If you are trying to background subtract from the depth view, how can you avoid an issue with this random noise while reliably subtracting the background?
As you already said, noise and other unsteady parts of your background, such as lighting changes or other moving stuff in the background, might cause problems in segmentation.
But if you're working on some indoor project this shouldn't be too big of an issue, except of course for the noise.
Besides subtracting the background from an image to segment the objects in it, you could also try to subtract two (or in some methods even three) consecutive frames from each other. If the camera is steady this should leave the parts that have changed, so basically the objects that have moved. So this is an easy method for detecting moving objects.
But with most operations you might use, you will probably still have the noise you described. The easiest way to get rid of it is to use a median filter or morphological operators (opening) on the segmented binary image. This should effectively remove small parts and leave the nice big blobs of the objects.
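A small sketch of the two- and three-frame differencing idea on grayscale frames stored as lists of rows. The threshold value and the function names are assumptions for illustration, and a steady camera is taken for granted.

```python
def changed_mask(frame_a, frame_b, thresh=25):
    """1 where two grayscale frames differ by more than thresh, else 0."""
    return [[1 if abs(a - b) > thresh else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def three_frame_motion(f0, f1, f2, thresh=25):
    """Pixels that changed both from f0 to f1 and from f1 to f2.

    The AND of the two difference masks keeps the moving object at its
    position in the middle frame and drops the places it just left or
    is about to enter.
    """
    d01 = changed_mask(f0, f1, thresh)
    d12 = changed_mask(f1, f2, thresh)
    return [[p & q for p, q in zip(r01, r12)] for r01, r12 in zip(d01, d12)]
```

With only two frames the mask contains both the old and the new position of the object; the third frame is what removes that ghost.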
Hope that helps...
Typically you do connected components (cc) in disparity space and then kill any cc that has a small size. The threshold for size and the threshold for connectedness (e.g. what disparity difference between two adjacent pixels still lets you consider them connected) are your two parameters to play with (ivlad#lab126.com).
As @evident mentioned, the median filter is your ticket. That's the standard operator for getting rid of salt-and-pepper noise while preserving edges.
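A minimal sketch of that connected-components cleanup on a disparity (or depth) map stored as a list of rows. The two parameters from the answer appear as max_diff and min_size; the 4-connectivity, the use of 0 as the "no data" value, and the function name are assumptions of this sketch.

```python
from collections import deque


def remove_small_components(disp, max_diff=1, min_size=4, invalid=0):
    """Set to `invalid` every connected component smaller than min_size.

    Two 4-neighbours are connected when both are valid and their
    disparities differ by at most max_diff.
    """
    h, w = len(disp), len(disp[0])
    seen = [[False] * w for _ in range(h)]
    out = [row[:] for row in disp]
    for y in range(h):
        for x in range(w):
            if seen[y][x] or disp[y][x] == invalid:
                continue
            # Breadth-first flood fill of the component containing (y, x)
            seen[y][x] = True
            component = [(y, x)]
            queue = deque(component)
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and disp[ny][nx] != invalid
                            and abs(disp[ny][nx] - disp[cy][cx]) <= max_diff):
                        seen[ny][nx] = True
                        component.append((ny, nx))
                        queue.append((ny, nx))
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = invalid
    return out
```

Because connectedness depends on the disparity difference, a noisy pixel that touches a real surface but sits at a very different depth still ends up in its own tiny component and gets removed.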
That said, I disagree with his suggestion that this occur on the segmented binary image. Median filtering is very low-level and should be applied on the raw data before any subsequent processing.
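For reference, a plain-Python sketch of a 3x3 median filter applied to raw values (in OpenCV this is cv2.medianBlur). The border handling, where the window is simply clipped to the image, and the function name are assumptions of this sketch.

```python
def median_filter_3x3(image):
    """Replace each pixel by the median of its 3x3 neighbourhood.

    The neighbourhood is clipped at the image border; for an even number
    of samples the upper of the two middle values is used.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(image[j][i]
                            for j in range(max(0, y - 1), min(h, y + 2))
                            for i in range(max(0, x - 1), min(w, x + 2)))
            out[y][x] = window[len(window) // 2]
    return out


# A step edge (10 on the left, 200 on the right) with one "salt" and one
# "pepper" pixel; the values are invented for the example.
clean = [[10, 10, 10, 200, 200, 200] for _ in range(6)]
noisy = [row[:] for row in clean]
noisy[2][1] = 255   # salt
noisy[3][4] = 0     # pepper
restored = median_filter_3x3(noisy)
```

In this example restored equals clean: both outliers are gone and the step edge stays where it was, which is the edge-preserving behaviour mentioned above.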