Stationary/Moving Pool Ball Detection - C++

I am working on a project where I have to detect and track pool balls while players are playing. I tried various background subtraction techniques and was able to track the moving balls, but I was unable to detect the stationary balls at the same time, because they get absorbed into the background as the model updates. I can't stop the background model update, as it is necessary for changing lighting conditions. Any suggestions on how to achieve both goals?
[sample pool table image]
The work done so far:
Used OpenCV's built-in background subtraction algorithms such as MOG2, KNN and GSOC.
Implemented some algorithms myself, such as a GMM and a Kalman filter.
I was able to detect the moving balls/objects, but the stationary balls/objects get adapted into the background after being detected for some time.

Related

How to preprocess video for better OpenCV tracking?

I am trying to improve my webcam-based OpenCV mouse controller for disabled people (an MFC C++ application): https://preability.com/face-controlled-mouse/
The cursor moves when a person moves her/his head, clicks when the user smiles, etc.
The controller finds the face area, then uses goodFeaturesToTrack, cornerSubPix and calcOpticalFlowPyrLK.
In general, I managed to stabilize the cursor if the lighting is good.
What I use now:
Evaluating and filtering the direction of each corner point's movement.
Spreading the corner points all over the face area for cv::goodFeaturesToTrack() helped a little too.
EWMA (or Kalman) filter for the cursor position.
I’ve included equalizeHist() for the face ROI. The detector performed much better in low light conditions.
In addition, I tried OpenCV's morphology operations, without improvement.
However, the corner points still dance in uneven lighting.
I can see that a similar (old) program, eViacam, has a preprocessing module for the (also old) Creavision webcam, and its corner points are more stable.
Please advise: what can be done with the input Mat? Or how can the video be preprocessed with reasonable CPU load?
Now I can answer my own question. Christoph Rackwitz gave me some good advice:
don’t track the whole head. track each feature. and don’t use those trackers, they’re too complex. use MOSSE. it’s dumb but very precise, as long as the object (which should be a tiny feature on the face) doesn’t change much.
MOSSE approaches optical flow. methods to calculate optical flow work like MOSSE, except they use simpler math and smaller regions, hence the result is noisier. MOSSE uses a larger area (for a single track/point of course) and more sophisticated math, for a more accurate result.
When the MOSSE algorithm tracks the "corner points", the cursor moves much more smoothly. There was a slight issue with discrete movement, as the object rectangles all moved by the same whole number of pixels at the same time, so the cursor moved in leaps. So I had to apply a filter to each tracked point. Anyway, as you can see in the video, the CPU load did not increase compared to the Lucas-Kanade optical flow algorithm with filtering of only the cursor position. In good light, the difference is also very noticeable.
https://www.youtube.com/watch?v=WKwuas0GVkA
Lucas-Kanade optical flow:
goodFeaturesToTrack,
cornerSubPix,
calcOpticalFlowPyrLK,
cursor EWMA filter
MOSSE object tracking:
goodFeaturesToTrack,
cornerSubPix,
TrackerMOSSE,
all points EWMA filtration
And of course I had to remember to add tracking453.lib to the Linker settings when using the legacy Tracker. I spent half a day googling the "unresolved external symbol" LNK2001 error. For some reason, using a tracker from the core library (cv::Tracker) does not produce such a linker error, which is confusing.

Using OpenCV to touch and select object

I'm using the OpenCV framework in iOS Xcode (Objective-C). Is there a way I could process the image feed from the video camera and allow the user to touch an object on the screen, then use some functionality in OpenCV to highlight it?
Here is graphically what I mean. The first image shows an example of what the user might see in the video feed:
Then, when they tap on the screen, I want to use OpenCV feature/object detection to process the area they've tapped and highlight it. It would look something like this if they tapped the iPad:
Any ideas on how this would be achievable in Objective-C with OpenCV?
I can see quite easily how we could achieve this using trained templates of the iPad and matching them with OpenCV algorithms, but I want to make it dynamic, so users can touch anything on the screen and we'll take it from there.
Explanation: why we should use the segmentation approach
According to my understanding, the task you are trying to solve is segmentation of objects, regardless of their identity.
The object recognition approach is one way to do it, but it has two major downsides:
It requires you to train an object classifier and to collect a dataset containing a respectable number of examples of the objects you would like to recognize. If you choose a classifier which is already trained, it won't necessarily work on every type of object you would like to detect.
Most object recognition solutions find a bounding box around the recognized object, but they don't perform a complete segmentation of it. The segmentation part requires extra effort.
Therefore, I believe that the best approach for your case is to use an image segmentation algorithm. More precisely, we'll use the GrabCut segmentation algorithm.
The GrabCut algorithm
This is an iterative algorithm with two stages:
In the initial stage, the user specifies a bounding box around the object.
Given this bounding box, the algorithm estimates the color distributions of the foreground (the object) and the background using GMMs, followed by a graph-cut optimization that finds the optimal boundary between foreground and background.
In the next stage, the user may correct the segmentation if needed by supplying scribbles on the foreground and the background. The algorithm updates the model accordingly and performs a new segmentation based on the updated information.
Using this approach has pros and cons.
The pros:
The segmentation algorithm is easy to implement with OpenCV.
It enables the user to fix segmentation errors if needed.
It doesn't rely on collecting a dataset and training a classifier.
The main con is that you will need an extra source of information from the user besides a single tap on the screen. This information will be a bounding box around the object, and in some cases additional scribbles will be required to correct the segmentation.
Code
Luckily, there is an implementation of this algorithm in OpenCV. The Itseez team created a simple and easy-to-use sample for OpenCV's GrabCut algorithm, which can be found here: https://github.com/Itseez/opencv/blob/master/samples/cpp/grabcut.cpp
Application usage:
The application receives a path to an image file as a command-line argument. It renders the image on screen, and the user is required to supply an initial bounding rectangle.
The user can press 'n' to perform the segmentation for the current iteration, or press 'r' to revert the operation.
After choosing a rectangle, the segmentation is calculated. If the user wants to correct it, he may add foreground or background scribbles by pressing Shift+left-click and Ctrl+left-click, respectively.
Examples
Segmenting the iPod:
Segmenting the pen:
You can do this by training a classifier on iPad images using OpenCV's Haar cascade classifiers and then detecting the iPad in a given frame.
Then, based on the coordinates of the touch, check whether that area overlaps with the detected iPad region. If it does, draw a bounding box on the detected object; from there you can proceed to process the detected iPad image.
Repeat the above procedure for each of the objects that you want to detect.
The task you are trying to solve is called "object proposal" generation. It doesn't work very accurately yet, and these results are very new.
These two articles give you a good overview of methods for this:
https://pdollar.wordpress.com/2013/12/10/a-seismic-shift-in-object-detection/
https://pdollar.wordpress.com/2013/12/22/generating-object-proposals/
For state-of-the-art results, look at the latest CVPR papers on object proposals. Quite often they have code available to test.

I'm trying to use this method to detect moving objects. Can someone advise me on this?

I want to ask what kinds of problems there would be if I use this method to extract the foreground.
The precondition for using this method is that it runs on a fixed camera, so there will not be any movement of the camera position.
What I'm trying to do is below:
1. Read one frame from the camera and set this frame as the background image. This is done periodically.
2. Periodically subtract the frames that are read afterwards from the background image above. Then only moving things will be colored differently from the areas that are the same as the background image.
3. Isolate the moving objects using grayscale conversion, binarization and thresholding.
4. Iterate the above processes.
If I do this, would the probability of successfully detecting moving objects be high? If not, could you tell me why?
If you consider illumination changes (gradual or sudden) in the scene, you will see that your method does not work.
There are more robust solutions to these problems. One of them (maybe the best) is the Gaussian mixture model applied to background subtraction.
You can use BackgroundSubtractorMOG2 (an implementation of this GMM approach) in the OpenCV library.
Your scheme is quite adequate for cases where the camera is fixed and the background is stationary. Indoor and man-controlled scenes are more appropriate for this approach than outdoor and natural scenes. I've contributed to a detection system that worked on basically the same principles you suggest, but of course the details are crucial. A few remarks based on my experience:
Your initialization step can cause very slow convergence to a normal state. You set the background to the first frame, so pieces of background emerging from behind moving objects will be considered objects. A better approach is to take the median of the first N frames.
Simple subtraction may not be enough in cases of changing light conditions, etc. You may find a similarity criterion that suits your application better.
Simple thresholding on the difference image may not be enough. One simple improvement is to dilate the foreground, so that the background is not updated at pixels that were accidentally identified as background.
Your step 4 is unclear; I assumed you mean that you update the background only in those places that were identified as background in the last frame. Note that with such a simple approach, pixels that are actually background may be stuck forever with a "foreground" label, because you never update the background under them. There are many possible solutions to this.
There are many ways to solve this problem, and which method is most appropriate will really depend on the input images. It may be worth doing some reading on the topic.
The method you are suggesting may work, but it's a slightly non-standard approach to this problem. My main concern is that subtracting several images from the background could lead to saturation, and then you may lose some detail of the motion. It may be better to take the difference between consecutive images and then apply the binarization/thresholding to those.
Another (more complex) approach, which has worked for me in the past, is to take subregions of the image and cross-correlate them with the new image. The peak in this correlation can be used to identify the direction of movement; it's a useful approach if more than one thing is moving.
It may also be possible to combine the two approaches above, for example:
1. Subtract the second image from the background.
2. Threshold etc. to find the ROI where movement is occurring.
3. Use a pattern-matching approach to track subsequent movement, focused on the ROI detected above.
The best approach will depend on your application, but there are lots of papers on this topic.

Can a PrimeSense sensor detect two people hugging or standing back-to-back closely?

I am trying to detect two people hugging or standing back-to-back closely with a PrimeSense sensor. Currently I can track two people simultaneously and get their skeleton data when they stand at a distance, but if they hug or stand back-to-back closely, their skeletons are merged into one. Can anyone tell me what I should do to detect these actions (hug/back-to-back) between two people?
Platform: windows 7
OpenNI version: 1.5.7
NITE version: 1.5.2
Thanks.
I'm not sure OpenNI's scene segmentation alone will do, since, as you observed yourself, having two people close to each other causes their outlines to merge.
You would need to work around this merging issue with your own algorithm.
There are probably a few different ways to tackle this.
Here are a few hacky ideas that come to mind at the moment:
Idea #1
Use OpenNI's scene segmentation feature and analyze the user pixels even though they are merged. By looking at the distances between the outline (the outermost pixels) and the centroid, you should spot extremities (the largest distances), which would be the hands, feet and head. If your blob has two heads, it's probably two people hugging.
Idea #2
You know that when users get closer, they merge into a single user, which means new-user/lost-user events trigger in OpenNI. On top of this, you can keep track of each user's CoM (centre of mass, available in OpenNI). If the distance between the two users decreases a lot, and this is immediately followed by a new-user event (the merged blob) whose bounding box is bigger than either of the two users', then it's likely your users are hugging or very close to each other.
Idea #3
You can track only the upper-body skeleton profile for each user and detect a "hug" gesture. You can start with an initial crude pose detection, where the angles of the arms are within certain threshold values. Instead of detecting a pose, you could use DTW to detect a gesture. If at least one of the two users triggers this pose/gesture just before OpenNI detects the 'fused' users, then you might have detected a hug.
Idea #4
Using some of the ideas above (the fact that the merged blob will be larger than either of its members and will be detected just after the number of users decreases), you can run a hug-detecting Haar cascade on the RGB pixels belonging to the newly detected merged blob to confirm the hug.
Some of these ideas will be easier to implement than others, but it's important to keep false positives out. Ideally you would have a simple scene (no complex objects in the background) with decent lighting (cold artificial light is preferable; avoid anything in the infrared range, such as sunlight or incandescent light) to make your life easier. I've observed that with a complex background, sometimes even a single user can get merged with a background object.

Is it possible to detect a moving object in OpenCV?

I have been asked to write code that can detect ANY moving object using OpenCV. It will be used in an outdoor system. But any moving object? To my knowledge, it can detect pre-defined objects like humans, cars, balls, etc. I am not sure about this "any object" requirement, because trees also move in the wind, which is of no use to the system, and if the system is going to detect moving tree branches, moving water waves and useless stuff like that, it will be a big issue.
Is there any way in OpenCV to detect all useful moving objects, like humans, cars, vans and animals, and not useless things like moving tree branches or moving water waves?
Some people told me "pattern recognition" would help, but I have no time to get into it; I have only 4 months and I am not a math person. Anyway, if it can easily be used with video in OpenCV, then I can think about it.
No, you don't have to reinvent the wheel. There are plenty of examples on the net for detecting moving objects;
you can google for "motion detection".
The simple method to accomplish this is background subtraction: keep a reference to the previous frame and subtract the new frame from it. The subtracted image will contain information about the regions of motion, or anything that changed on screen (in the frame).
As for detecting the objects, you can rectify the regions according to the motion, specify a threshold value for the motion, and then grab the objects by binarization.
Look into background/foreground segmentation methods. They are used to segment out (detect) moving objects by using statistical methods to estimate the background. OpenCV version 2.4.5 offers a number of different implementations for background subtraction, namely:
BackgroundSubtractorMOG
BackgroundSubtractorMOG2
FGDStatModel
MOG_GPU
MOG2_GPU
VIBE_GPU (listed under non-free functionality)
GMG_GPU
There is demo source code, bgfg_segm.cpp, located in {opencv_folder}\samples\gpu. The demo shows usage and displays output for the segmentation classes (on the GPU). There is also a similar demo for the CPU; just look for it. The GPU-based classes offer real-time performance.
These approaches output objects as contours or as masks. After detection you can remove some false positives and noise by applying morphological operations, such as dilation and erosion. In addition, you can keep only contours whose area is large enough (so that leaves, which are small, get filtered out).