How to preprocess video for better OpenCV tracking? - c++

I am trying to improve my webcam based OpenCV mouse controller for disabled people (MFC C++ application): https://preability.com/face-controlled-mouse/
The cursor moves when a person moves their head, clicks when they smile, etc.
The controller finds the face area, then uses goodFeaturesToTrack, cornerSubPix and calcOpticalFlowPyrLK.
In general, I managed to stabilize the cursor when lighting is good.
What I use now:
Evaluating and filtering the direction of each corner point's movement.
Spreading the corner points all over the face area for cv::goodFeaturesToTrack() helped a little too.
EWMA (or Kalman) filter for the cursor position.
I've included equalizeHist() for the face ROI, and the detector performed much better in low-light conditions (a preprocessing sketch follows at the end of this question).
In addition, I tried OpenCV's morphology operations, without improvement.
However, the corner points still dance in uneven lighting.
I can see that the similar (and older) program eViacam has a preprocessing module for its Creavision webcam (also old), and its corner points are more stable.
Please advise: what can be done with the input Mat? Or how can the video be preprocessed with a reasonable CPU load?
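To illustrate, here is a minimal sketch of this kind of ROI preprocessing (hypothetical code with illustrative parameters, not my exact implementation; the CLAHE lines are an untested alternative):

    #include <opencv2/imgproc.hpp>

    void preprocessFaceRoi(const cv::Mat& frame, const cv::Rect& faceRect, cv::Mat& roiGray)
    {
        cv::cvtColor(frame(faceRect), roiGray, cv::COLOR_BGR2GRAY);  // work on the face ROI only
        cv::equalizeHist(roiGray, roiGray);                          // what I already do above

        // Untested alternative: CLAHE limits over-amplification of noise in unevenly lit
        // regions; 2.0 / 8x8 are illustrative defaults.
        // cv::Ptr<cv::CLAHE> clahe = cv::createCLAHE(2.0, cv::Size(8, 8));
        // clahe->apply(roiGray, roiGray);
    }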

Now I can answer my own question. Christoph Rackwitz gave me some good advice:
don’t track the whole head. track each feature. and don’t use those trackers, they’re too complex. use MOSSE. it’s dumb but very precise, as long as the object (which should be a tiny feature on the face) doesn’t change much.
MOSSE approaches optical flow. methods to calculate optical flow work like MOSSE, except they use simpler math and smaller regions, hence the result is noisier. MOSSE uses a larger area (for a single track/point of course) and more sophisticated math, for a more accurate result.
When the MOSSE algorithm tracks the "corner points", the cursor moves much more smoothly. There was a slight issue with discrete movement: the object rectangles all moved by the same number of pixels at the same time, so the cursor moved in leaps. I had to apply a filter to each tracked point. Anyway, as you can see in the video, the CPU load did not increase compared to the Lucas-Kanade optical flow algorithm with filtering of only the cursor position. In good light the difference is also very noticeable.
https://www.youtube.com/watch?v=WKwuas0GVkA
Lucas-Kanade optical flow:
goodFeaturesToTrack,
cornerSubPix,
calcOpticalFlowPyrLK,
cursor EWMA filter
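A rough sketch of that pipeline (illustrative parameter values, not my exact code):

    #include <opencv2/imgproc.hpp>
    #include <opencv2/video/tracking.hpp>
    #include <vector>

    void trackCorners(const cv::Mat& prevGray, const cv::Mat& gray,
                      std::vector<cv::Point2f>& prevPts, std::vector<cv::Point2f>& pts)
    {
        if (prevPts.empty()) {
            cv::goodFeaturesToTrack(prevGray, prevPts, 50, 0.01, 7);   // corners spread over the face ROI
            cv::cornerSubPix(prevGray, prevPts, cv::Size(5, 5), cv::Size(-1, -1),
                cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 20, 0.03));
        }
        std::vector<uchar> status;
        std::vector<float> err;
        cv::calcOpticalFlowPyrLK(prevGray, gray, prevPts, pts, status, err);
        // ...drop points with status == 0, average the per-point motion,
        //    and feed the resulting cursor position to the EWMA filter
    }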
MOSSE object tracking:
goodFeaturesToTrack,
cornerSubPix,
TrackerMOSSE,
all points EWMA filtration
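And a hedged sketch of the per-point MOSSE idea (assumes OpenCV 4.x with the opencv_contrib tracking module and its legacy API; the patch size and EWMA alpha are illustrative, not my exact values):

    #include <opencv2/tracking/tracking_legacy.hpp>  // cv::legacy::TrackerMOSSE (opencv_contrib)
    #include <vector>

    struct PointTrack {
        cv::Ptr<cv::legacy::TrackerMOSSE> tracker;
        cv::Point2f smoothed;   // EWMA-filtered centre of the tracked patch
    };

    void initTracks(const cv::Mat& frame, const std::vector<cv::Point2f>& corners,
                    std::vector<PointTrack>& tracks, int patch = 24)
    {
        for (const auto& p : corners) {
            PointTrack t;
            t.tracker = cv::legacy::TrackerMOSSE::create();
            t.tracker->init(frame, cv::Rect2d(p.x - patch / 2, p.y - patch / 2, patch, patch));
            t.smoothed = p;
            tracks.push_back(t);
        }
    }

    void updateTracks(const cv::Mat& frame, std::vector<PointTrack>& tracks, float alpha = 0.3f)
    {
        for (auto& t : tracks) {
            cv::Rect2d box;
            if (t.tracker->update(frame, box)) {
                cv::Point2f centre(float(box.x + box.width / 2), float(box.y + box.height / 2));
                t.smoothed = alpha * centre + (1.f - alpha) * t.smoothed;   // per-point EWMA
            }
        }
    }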
And of course I had to remember to add tracking453.lib to the Linker input when using the legacy Tracker. I spent half a day googling the "unresolved external symbol" LNK2001 error. For some reason, using a tracker from the core library (cv::Tracker) does not cause such a linker error, so it is confusing.

Related

I'm trying to use this method to detect a moving object. Can someone advise me on this?

I want to ask what kind of problems there would be if I used this method to extract the foreground.
The precondition is that it runs on a fixed camera, so there will not be any movement of the camera position.
What I'm trying to do is as follows:
1. Read one frame from the camera and set this frame as the background image. This is done periodically.
2. Periodically subtract the frames read afterwards from the background image above. Then only the moving things will be colored differently from the areas that are the same as the background image.
3. Isolate the moving object using grayscale conversion, binarization and thresholding.
4. Iterate the above processes.
If I do this, would the probability of successfully detecting the moving object be high? If not, could you tell me why?
If you consider illumination changes (gradual or sudden) in the scene, you will see that your method does not work.
There are more robust solutions to these problems. One of them (maybe the best) is the Gaussian Mixture Model applied to background subtraction.
You can use BackgroundSubtractorMOG2 (an implementation of GMM) in the OpenCV library.
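A minimal sketch of that approach (the parameter values are just OpenCV's defaults, not tuned for your scene):

    #include <opencv2/imgproc.hpp>
    #include <opencv2/video/background_segm.hpp>
    #include <opencv2/videoio.hpp>

    int main()
    {
        cv::VideoCapture cap(0);
        cv::Ptr<cv::BackgroundSubtractorMOG2> mog2 =
            cv::createBackgroundSubtractorMOG2(500, 16.0, true);   // history, varThreshold, detectShadows

        cv::Mat frame, fgMask;
        while (cap.read(frame)) {
            mog2->apply(frame, fgMask);                             // learn the background, get the foreground mask
            cv::threshold(fgMask, fgMask, 200, 255, cv::THRESH_BINARY);  // drop shadow pixels (marked as 127)
            cv::morphologyEx(fgMask, fgMask, cv::MORPH_OPEN,
                             cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(3, 3)));
            // ...find contours / blobs in fgMask to locate the moving objects
        }
        return 0;
    }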
Your scheme is quite adequate for cases where the camera is fixed and the background is stationary. Indoor and controlled scenes are better suited to this approach than outdoor and natural scenes. I've contributed to a detection system that worked basically on the same principles you suggested, but of course the details are crucial. A few remarks based on my experience:
Your initialization step can cause very slow convergence to a normal state. You set the background to the first frames, and then pieces of background revealed behind moving objects will be considered as objects. A better approach is to take the per-pixel median of the first N frames (see the sketch after these remarks).
Simple subtraction may not be enough in cases of changing lighting conditions, etc. You may find a similarity criterion better suited to your application.
Simple thresholding on the difference image may not be enough. A simple improvement is to dilate the foreground, so that the background is not updated on pixels that were accidentally identified as background.
Your step 4 is unclear; I assumed you mean that you update the background only in those places identified as background in the last frame. Note that with such a simple approach, pixels that are actually background may be stuck forever with a "foreground" labeling, as you never update the background under them. There are many possible solutions to this.
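A hedged sketch of the median-of-first-N-frames initialization mentioned in the first remark (N and the brute-force per-pixel median are illustrative; a real implementation would probably vectorize this):

    #include <opencv2/core.hpp>
    #include <opencv2/videoio.hpp>
    #include <algorithm>
    #include <vector>

    cv::Mat medianBackground(cv::VideoCapture& cap, int N = 25)
    {
        std::vector<cv::Mat> frames;
        cv::Mat frame;
        while ((int)frames.size() < N && cap.read(frame))
            frames.push_back(frame.clone());

        cv::Mat background = frames.front().clone();
        const int ch = background.channels();
        std::vector<uchar> values(frames.size());
        for (int y = 0; y < background.rows; ++y)
            for (int x = 0; x < background.cols * ch; ++x) {
                for (size_t i = 0; i < frames.size(); ++i)
                    values[i] = frames[i].ptr<uchar>(y)[x];          // same pixel/channel in every frame
                std::nth_element(values.begin(), values.begin() + values.size() / 2, values.end());
                background.ptr<uchar>(y)[x] = values[values.size() / 2];  // per-pixel median
            }
        return background;
    }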
There are many ways to solve this problem, and it will really depend on the input images which method is the most appropriate. It may be worth doing some reading on the topic.
The method you are suggesting may work, but it's a slightly non-standard approach to this problem. My main concern would be that subtracting several images from the background could lead to saturation, and then you may lose some detail of the motion. It may be better to take the difference between consecutive images and then apply binarization / thresholding to those images.
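A minimal sketch of that consecutive-frame differencing (the threshold value is illustrative):

    #include <opencv2/imgproc.hpp>

    void motionMask(const cv::Mat& prevGray, const cv::Mat& gray, cv::Mat& mask)
    {
        cv::Mat diff;
        cv::absdiff(prevGray, gray, diff);                       // per-pixel difference between consecutive frames
        cv::threshold(diff, mask, 25, 255, cv::THRESH_BINARY);   // binarize: moving pixels become white
    }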
Another (more complex) approach which has worked for me in the past is to take subregions of the image and then cross-correlate them with the new image. The peak in this correlation can be used to identify the direction of movement; it's a useful approach if more than one thing is moving.
It may also be possible to use a combination of the two approaches above, for example:
Subtract the second image from the background.
Threshold etc. to find the ROI where movement is occurring.
Use a pattern-matching approach to track subsequent movement, focused on the ROI detected above.
The best approach will depend on your application, but there are lots of papers on this topic.
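A hedged sketch of the cross-correlation / pattern-matching step, using matchTemplate as the correlation (the region size and matching method are illustrative):

    #include <opencv2/imgproc.hpp>

    // Cut a subregion from the previous frame and find where it best matches in the new
    // frame; the offset of the correlation peak gives the direction of movement.
    cv::Point2f estimateShift(const cv::Mat& prevGray, const cv::Mat& gray, cv::Rect region)
    {
        cv::Mat templ = prevGray(region);
        cv::Mat response;
        cv::matchTemplate(gray, templ, response, cv::TM_CCORR_NORMED);   // correlation surface

        cv::Point peak;
        cv::minMaxLoc(response, nullptr, nullptr, nullptr, &peak);       // location of the best match
        return cv::Point2f(float(peak.x - region.x), float(peak.y - region.y));  // displacement of the subregion
    }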

Generate an image that can be most easily detected by Computer Vision algorithms

Working on a small side project related to Computer Vision, mostly to play around with OpenCV. It led me to an interesting question:
Using feature detection to find known objects in an image isn't always easy: objects are hard to find, especially if the features of the target object aren't great.
But if I could choose ahead of time what it is I'm looking for, then in theory I could generate for myself an optimal image for detection. Any quality that makes feature detection hard would be absent, and all the qualities that make it easy would exist.
I suspect this sort of thought went into things like QR codes, but with the limitation that QR codes were meant to be simple and small.
So my question for you: How would you generate an optimal image for later recognition by a camera? What if you already know that certain problems like skew, or partial obscuring would occur?
Thanks very much
I think you need something like AR markers.
Take a look at the ArToolkit, ArToolkitPlus or ArUco libraries; they have marker generators and detectors.
And a paper about marker generation: http://www.uco.es/investiga/grupos/ava/sites/default/files/GarridoJurado2014.pdf
If you plan to use feature detection, then the marker should be specific to the feature detector used. Common practice in marker design is a good response to "corners", i.e. regions with high x and y gradients. You should also take the scaling of the target into account.
The simplest detection can be performed with blobs; it can be faster and more robust than feature points. For example, you can detect circular or rectangular blobs.
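A hedged sketch of generating and detecting such a marker with OpenCV's aruco module (this follows the pre-4.7 opencv_contrib interface; newer versions use cv::aruco::generateImageMarker and the ArucoDetector class instead):

    #include <opencv2/aruco.hpp>
    #include <vector>

    void arucoExample(const cv::Mat& cameraImage)
    {
        cv::Ptr<cv::aruco::Dictionary> dict =
            cv::aruco::getPredefinedDictionary(cv::aruco::DICT_6X6_250);

        // Generate a marker image you could print and place in the scene.
        cv::Mat marker;
        cv::aruco::drawMarker(dict, 23, 200, marker);      // id 23, 200x200 pixels

        // Detect markers in a camera image.
        std::vector<int> ids;
        std::vector<std::vector<cv::Point2f>> corners;
        cv::aruco::detectMarkers(cameraImage, dict, corners, ids);
    }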
Depending on the distance you want to see your markers from, the viewing conditions/backgrounds you typically use, and the camera resolution/noise, you should choose different images/targets. Under moderate perspective from a longer distance, a color target is pretty unique; see this:
https://surf-it.soe.ucsc.edu/sites/default/files/velado_report.pdf
At close distances, various bar/QR codes may be a good choice. Other than that, any flat, textured object will be easier to track using a homography than a 3D object.
http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_feature_homography/py_feature_homography.html
Even different views of 3D objects can be quickly learned and tracked by systems such as Predator:
https://www.youtube.com/watch?v=1GhNXHCQGsM
Then comes the whole field of hardware: structured light, synchronized markers, etc. Kinect, for example, uses a predefined pattern projected onto the surface to do stereo. This means it recognizes and matches millions of micro-patterns per second, creating a depth map from the matched correspondences. Note that one camera sees the pattern while another device, a projector, generates it, working as a virtual camera; see
http://article.wn.com/view/2013/11/17/Apple_to_buy_PrimeSense_technology_from_the_360s_Kinect/
The quickest way to demonstrate good tracking of a standard checkerboard pattern is to use the PnP functionality of OpenCV:
http://www.juergenwiki.de/work/wiki/lib/exe/fetch.php?media=public:cameracalibration_detecting_fieldcorners_of_a_chessboard.gif
This can literally be done by calling just two functions:
found = findChessboardCorners(src, chessboardSize, corners, camFlags);
drawChessboardCorners(dst, chessboardSize, corners, found);
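And, assuming you also have calibration data, the pose itself comes from solvePnP (hypothetical variables: objectPoints holds the known 3D corner positions of the board, cameraMatrix/distCoeffs come from camera calibration; needs <opencv2/calib3d.hpp>):

    if (found) {
        cv::Mat rvec, tvec;
        cv::solvePnP(objectPoints, corners, cameraMatrix, distCoeffs, rvec, tvec);  // board pose
    }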
To sum up, your question is very broad and there are multiple answers and solutions. Formulate your viewing conditions, camera specs, backgrounds, distances, and the amount of motion and perspective you expect to have indoors vs. outdoors, etc. There is no such thing as a general average case in computer vision!

Opencv Object tracking and count objects which passes ROI in video frame

I am working on an OpenCV application that needs to count any object whose motion can be detected by the camera. The camera is still, and I did the object tracking with OpenCV and cvBlob by referring to many tutorials.
I found some similar question:
Object counting
And I found this, which is similar:
http://labs.globant.com/uncategorized/peopletracker-people-and-object-tracking/
I am new to OpenCV and I've gone through the OpenCV documentation, but I couldn't find anything related to counting moving objects in video.
Can anyone please give me an idea of how to do this, especially the counting part? As I read in the article above, they count people who cross a virtual line. Is there a special algorithm to detect an object crossing the line?
Your question might be too broad when you are asking about a general technique that counts moving objects in video sequences. I will give some hints that might help you:
As usual in computer vision, there does not exist one specific way to solve your problem. Try to do some research on people detection, background extraction and motion detection to get a wider point of view.
State the user requirements of your system more clearly, namely: how many people can occur in the image frame? Things get complicated when you would like to track more than one person. Furthermore, can other moving objects appear in the image (e.g. animals)? If not, and only one person is supposed to be tracked, the answer to your problem is pretty easy; see the explanation below. If yes, you will have to do more research.
Usually you cannot find a direct solution to a computer vision problem in the OpenCV API; there is no method that directly solves the problem of people counting. But for sure there exist papers and references (usually scientific ones) which can be adapted to solve your problem. So there is no method that "counts people crossing a vertical line"; you have to solve the problem by merging several algorithms together.
In the link you have provided, one can see that they use some algorithm for background extraction which determines what is the non-moving background and what is the moving foreground (in our case, a walking person). We are not sure if they use something more (or more sophisticated), but background extraction is sufficient to start solving the problem.
And here is my contribution to the solution. Assuming only one person walks in front of the stably placed camera and no other object motion can be observed, do as follows:
Save a frame when no person is moving in front of the camera; it will be used later as the background reference.
In a loop, apply some background detector to extract the parts of the image representing motion (MOG, or you can even just calculate the difference between the background and the current frame, followed by a binary threshold and blob counting; see my answer here).
From the assumption, only one blob should be detected (if not, use some metric that chooses "the best one", for example the one with the maximum area). That blob is the person we would like to track. Knowing its position in the image, compare it to the position of the "vertical line": objects moving from left to right are exiting, and from right to left entering. A sketch of this crossing check follows below.
Remember that this solution will only work under the assumption we stated.
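A hedged sketch of the blob-and-line check from the last two steps (the foreground mask comes from whichever background detector you chose; lineX and the left/right convention are illustrative):

    #include <opencv2/imgproc.hpp>
    #include <algorithm>
    #include <vector>

    // Returns +1 when the blob centre crosses lineX from left to right (exiting),
    // -1 from right to left (entering), 0 otherwise. prevX keeps the previous centre
    // (initialize it to -1 before the first call).
    int updateCrossing(const cv::Mat& fgMask, int lineX, int& prevX)
    {
        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(fgMask.clone(), contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
        if (contours.empty()) return 0;

        // From the assumption, keep only the largest blob (the walking person).
        auto largest = std::max_element(contours.begin(), contours.end(),
            [](const std::vector<cv::Point>& a, const std::vector<cv::Point>& b)
            { return cv::contourArea(a) < cv::contourArea(b); });

        cv::Rect box = cv::boundingRect(*largest);
        int x = box.x + box.width / 2;
        int crossed = 0;
        if (prevX >= 0) {
            if (prevX < lineX && x >= lineX) crossed = +1;       // left -> right: exiting
            else if (prevX > lineX && x <= lineX) crossed = -1;  // right -> left: entering
        }
        prevX = x;
        return crossed;
    }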

OpenCv C++ record video when motion detected from cam

I am attempting to use straightforward motion detection code to detect movement from a camera. I'm using the OpenCV library, and I have some code that takes the difference between two frames to detect a change.
I have the difference frame working just fine and it's black when no motion is present.
The problem now is how I can detect that blackness in order to stop recording, and its absence in order to begin recording frames.
Thank you all.
A very simple thing to do is to sum the entire diff image into an integer. If that sum is above a threshold, you have movement. Then you can use a second threshold, and when the sum is below that limit, the movement has stopped.
You can also make the threshold only change the program state if some time has elapsed since the last state change, i.e. after movement is detected you don't check for lack of movement for 10 seconds.
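A hedged sketch of that sum-and-hysteresis logic (the two thresholds and the 10-second hold-off are illustrative values, not tuned ones):

    #include <opencv2/core.hpp>

    bool updateRecordingState(const cv::Mat& diff, bool recording, double& lastChange, double now)
    {
        double activity = cv::sum(diff)[0];     // total intensity of the difference image
        const double startThresh = 500000.0;    // start recording above this
        const double stopThresh  = 100000.0;    // stop recording below this (hysteresis)
        const double holdOff     = 10.0;        // seconds to ignore state changes after a switch

        if (now - lastChange < holdOff) return recording;

        if (!recording && activity > startThresh) { recording = true;  lastChange = now; }
        else if (recording && activity < stopThresh) { recording = false; lastChange = now; }
        return recording;
    }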
Take a look at the code of the free software Motion for inspiring ideas.
There are quite a few things to keep in mind for reliable motion detection, for example tolerating the slow changes caused by the sun's movement, or accepting momentary image glitches, which are especially common with the cheapest cameras.
From my limited experience, rather than just adding up all the differences, it works better to count the number of pixels whose variation exceeds a certain threshold.
Motion also offers masks, which let you, for example, ignore movements on a nearby road.
What about storing an all-black frame internally and reusing your same comparison code? If your difference frame differs (above a threshold) from the all-black frame, start recording.
This seems the most straightforward since you already have the image-processing algorithms down.

Improving camshift algorithm in open cv

I am using the CamShift algorithm of OpenCV for object tracking. The input is taken from a webcam, and the object is tracked between successive frames. How can I make the tracking more robust? If I move the object at a rapid rate, tracking fails. Also, when the object is not in the frame, there are false detections. How do I improve this?
Object tracking is an active research area in computer vision. There are numerous algorithms to do it, and none of them work 100% of the time.
If you need to track in real time, then you need something simple and fast. I am assuming that you have a way of segmenting a moving object from the background. Then you can compute a representation of the object, such as a color histogram, and compare it to the object you find in the next frame. You should also check that the object has not moved too far between frames. If you want to try more advanced motion tracking, then you should look up the Kalman filter.
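A hedged sketch of a constant-velocity cv::KalmanFilter for smoothing a tracked (x, y) position (the noise covariances are illustrative, not tuned):

    #include <opencv2/video/tracking.hpp>

    cv::KalmanFilter makePositionFilter(float x, float y)
    {
        cv::KalmanFilter kf(4, 2, 0);                    // state: x, y, vx, vy; measurement: x, y
        kf.transitionMatrix = (cv::Mat_<float>(4, 4) <<
            1, 0, 1, 0,
            0, 1, 0, 1,
            0, 0, 1, 0,
            0, 0, 0, 1);                                 // constant-velocity model
        cv::setIdentity(kf.measurementMatrix);
        cv::setIdentity(kf.processNoiseCov, cv::Scalar::all(1e-3));
        cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(1e-1));
        kf.statePost = (cv::Mat_<float>(4, 1) << x, y, 0, 0);
        return kf;
    }
    // Each frame: kf.predict() gives the expected position; when you actually find the
    // object, kf.correct(measurement) refines the estimate.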
Determining that an object is not in the frame is also a big problem. First, what kinds of objects are you trying to track? People? Cars? Dogs? You can build an object classifier, which would tell you whether or not the moving object in the frame is your object of interest, as opposed to noise or some other kind of object. A classifier can be something very simple, such as a constraint on size, or it can be very complicated. In the latter case you need to learn about features that can be computed, classification algorithms, such as support vector machines, and you would need to collect training images to train it.
In short, a reliable tracker is not an easy thing to build.
Suppose you find the object in the first two frames. From that information, you can extrapolate where you'd expect the object in the third frame. Instead of using a generic find-the-object algorithm, you can use a slower, more sophisticated (and thus hopefully more reliable) algorithm by limiting it to check in the vicinity that the extrapolation predicts. It may not be exactly where you expect (perhaps the velocity vector is changing), but you should certainly be able to reduce the area that's checked.
This should help reduce the number of times some other part of the frame is misidentified as the object (because you're looking at a smaller portion of the frame and because you're using a better feature detector).
Update the extrapolations based on what you find and iterate for the next frame.
If the object goes out of frame, you fall back to your generic feature detector, as you do with the first two frames, and try again to get a "lock" when the object returns to the view.
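A hedged sketch of that extrapolation (constant-velocity prediction and a search area of roughly twice the object size; both are illustrative choices):

    #include <opencv2/core.hpp>

    cv::Rect predictSearchRoi(cv::Point p1, cv::Point p2, cv::Size objSize, const cv::Mat& frame)
    {
        cv::Point predicted = p2 + (p2 - p1);                  // extrapolate from the last two detections
        cv::Rect roi(predicted.x - objSize.width, predicted.y - objSize.height,
                     2 * objSize.width, 2 * objSize.height);   // search area ~2x the object size
        return roi & cv::Rect(0, 0, frame.cols, frame.rows);   // clamp to the frame
    }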
Also, if you can, throw as much light into the physical scene as possible. If the scene is dim, the webcam will use a longer exposure time, leading to more motion blur on moving objects. Motion blur can make it very hard for the feature detectors (though it can give you information about direction and speed).
I've found that if you expand the border of the search window in CamShift, the algorithm becomes a bit more adaptive to fast-moving objects, although it can introduce some irregularities. Try just making the border of your window 10% bigger and see what happens.
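A hedged sketch of that window expansion (backProj is the back-projection image a CamShift loop already computes; the 10% figure follows the answer, the rest is illustrative):

    #include <opencv2/video/tracking.hpp>

    cv::RotatedRect trackCamShift(const cv::Mat& backProj, cv::Rect& window)
    {
        // Grow the previous search window by ~10% in each direction, clamped to the image.
        int dx = window.width / 10, dy = window.height / 10;
        window -= cv::Point(dx, dy);
        window += cv::Size(2 * dx, 2 * dy);
        window &= cv::Rect(0, 0, backProj.cols, backProj.rows);

        return cv::CamShift(backProj, window,
                            cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT, 10, 1));
    }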