I have been asked to write code that can detect ANY moving object using OpenCV. It will be used in an outdoor system. But any moving object? As far as I know, it can detect pre-defined objects like humans, cars, balls, etc. I am not sure about this "any object" part, because trees also move in the wind, which is of no use to the system, and if the system is going to detect moving tree branches, moving water waves, and useless stuff like that, it will be a big issue.
Is there any way in OpenCV to detect all the useful moving objects, like humans, cars, vans, animals, etc., and not useless things like moving tree branches, moving water waves, and so on?
Some people told me "pattern recognition" would help, but I have no time to pursue it; I only have 4 months and I am not a math person. Anyway, if it can easily be used with video in OpenCV, then I can think about it.
No, you don't have to reinvent the wheel. There are plenty of examples on the net for detecting moving objects; you can Google for "motion detection".
The simple way to accomplish this is background subtraction: keep the previous frame as a reference and subtract each new frame from it. The difference image contains the regions of motion, i.e. anything that changed on the screen (frame).
To detect the objects, binarize the difference image with a suitable threshold; you can then rectify the regions according to the motion and grab the objects from the binarized image.
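A minimal sketch of this frame-differencing idea, using plain Python lists as stand-in grayscale frames (in a real OpenCV program these would be numpy arrays and you would use `cv2.absdiff` and `cv2.threshold`; the threshold value here is illustrative):

```python
def motion_mask(background, frame, threshold):
    """Absolute difference between frames, binarized with a threshold.

    background, frame: 2D lists of grayscale values (0-255).
    Returns a 2D list with 255 where the change exceeds the
    threshold and 0 elsewhere.
    """
    return [
        [255 if abs(f - b) > threshold else 0
         for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]

# A 3x4 static background and a frame where one "object" appeared.
background = [[10, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
frame      = [[10, 10, 10, 10],
              [10, 200, 210, 10],
              [10, 10, 10, 10]]

mask = motion_mask(background, frame, threshold=30)
# Only the two changed pixels are marked as motion.
print(mask[1])  # [0, 255, 255, 0]
```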
Look into background/foreground segmentation methods. They are used to segment out (detect) moving objects by using statistical methods to estimate background. OpenCV version 2.4.5 offers a number of different implementations for background subtraction, namely
BackgroundSubtractorMOG
BackgroundSubtractorMOG2
FGDStatModel
MOG_GPU
MOG2_GPU
VIBE_GPU <- listed under non-free functionality
GMG_GPU
There is a demo source file, bgfg_segm.cpp, located in {opencv_folder}\samples\gpu. The demo shows usage and displays output for the segmentation classes (on the GPU). There is also a similar demo for the CPU; just look for it. The GPU-based classes offer real-time performance.
The approach will output objects as contours or as masks. After detection you can remove some false positives and noise by applying morphological operations, such as dilation and erosion. In addition, you can keep only contours whose area is large enough (so that leaves, which are small, might be filtered out).
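The area-filtering step can be sketched like this, with a tiny pure-Python connected-component pass standing in for OpenCV's `findContours`/`contourArea` (the mask values and area threshold are illustrative):

```python
from collections import deque

def filter_small_blobs(mask, min_area):
    """Keep only connected foreground regions with area >= min_area.

    mask: 2D list of 0/255 values. Returns a new mask with small
    blobs (e.g. fluttering leaves) erased.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] == 255 and not seen[sy][sx]:
                # Flood-fill one 4-connected blob.
                blob, queue = [], deque([(sy, sx)])
                seen[sy][sx] = True
                while queue:
                    y, x = queue.popleft()
                    blob.append((y, x))
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] == 255 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(blob) >= min_area:
                    for y, x in blob:
                        out[y][x] = 255
    return out

mask = [[255, 0,   0,   0],
        [0,   0, 255, 255],
        [0,   0, 255, 255]]
clean = filter_small_blobs(mask, min_area=3)
# The isolated single pixel is gone; the 4-pixel blob survives.
```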
Related
I want to ask what kinds of problems there would be if I used this method to extract the foreground.
The precondition for using this method is that the camera is fixed, so there will be no movement of the camera position.
What I am trying to do is below.
1. Read one frame from the camera and set this frame as the background image. This is done periodically.
2. Periodically subtract the frames read afterwards from the background image above. Then only the moving things will be colored differently from the areas that are the same as the background image.
3. Then isolate the moving object using grayscale conversion, binarization, and thresholding.
4. Iterate the above processes.
If I do this, would the probability of successfully detecting a moving object be high? If not, could you tell me why?
If you consider illumination changes (gradual or sudden) in the scene, you will see that your method does not work.
There are more robust solutions to these problems. One of them (maybe the best) is the Gaussian Mixture Model applied to background subtraction.
You can use BackgroundSubtractorMOG2 (implementation of GMM) in OpenCV library.
Your scheme is quite adequate for cases where the camera is fixed and the background is stationary. Indoor and man-controlled scenes are more suited to this approach than outdoor and natural scenes. I've contributed to a detection system that worked on basically the same principles you suggest, but of course the details are crucial. A few remarks based on my experience:
Your initialization step can cause very slow convergence to a normal state. If you set the background to the first frame, pieces of background appearing from behind moving objects will be considered objects. A better approach is to take the median of the first N frames.
Simple subtraction may not be enough under changing lighting conditions, etc. You may find a similarity criterion better suited to your application.
Simple thresholding of the difference image may not be enough. A simple improvement is to dilate the foreground, for the sake of not updating the background at pixels that were accidentally identified as foreground.
Your step 4 is unclear; I assumed you mean that you update the background only at those places identified as background in the last frame. Note that with such a simple approach, pixels that are actually background may be stuck forever with a "foreground" label, as you never update the background under them. There are many possible solutions to this.
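The median initialization and selective background update from the remarks above can be sketched as follows (pure Python with tiny lists as frames; the blending factor `alpha` is an illustrative choice, not part of the original answer):

```python
from statistics import median

def median_background(frames):
    """Per-pixel median over the first N frames: a moving object that
    covers a pixel in only a minority of frames does not corrupt it."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)]
            for y in range(h)]

def update_background(bg, frame, fg_mask, alpha=0.1):
    """Blend the new frame into the background, but only at pixels
    not labeled foreground, so objects are not absorbed into it."""
    return [[bg[y][x] if fg_mask[y][x] else
             (1 - alpha) * bg[y][x] + alpha * frame[y][x]
             for x in range(len(bg[0]))]
            for y in range(len(bg))]

# Three 1x3 frames: an object (value 200) covers the middle pixel in
# just one frame, so the median still recovers the true background.
frames = [[[10, 10, 10]], [[10, 200, 10]], [[10, 10, 10]]]
bg = median_background(frames)
print(bg[0])  # [10, 10, 10]
```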
There are many ways to solve this problem, and which method is most appropriate will really depend on the input images. It may be worth doing some reading on the topic.
The method you are suggesting may work, but it's a slightly non-standard approach to this problem. My main concern is that subtracting several images from the background could lead to saturation, and then you may lose some detail of the motion. It may be better to take the difference between consecutive images and then apply the binarization/thresholding to those difference images.
Another (more complex) approach which has worked for me in the past is to take subregions of the image and then cross-correlate them with the new image. The peak in this correlation can be used to identify the direction of movement; it's a useful approach if more than one thing is moving.
It may also be possible to use a combination of the two approaches above, for example:
Subtract the second image from the first (background) image.
Threshold, etc., to find the ROI where movement is occurring.
Use a pattern-matching approach to track subsequent movement, focused on the ROI detected above.
The best approach will depend on your application, but there are lots of papers on this topic.
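The cross-correlation idea can be sketched by brute force on tiny frames: slide one image over the other and take the offset with the highest correlation score. This is a toy version; a real implementation would use FFT-based correlation (e.g. `cv2.phaseCorrelate`) rather than the quadruple loop below:

```python
def best_shift(prev, curr, max_shift=2):
    """Find the (dy, dx) that best aligns prev with curr by
    maximizing the overlap product (brute-force cross-correlation)."""
    h, w = len(prev), len(prev[0])
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0
            for y in range(h):
                for x in range(w):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        score += prev[y][x] * curr[yy][xx]
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best

# A bright 2x2 blob moves one pixel to the right between frames.
prev = [[0, 9, 9, 0],
        [0, 9, 9, 0],
        [0, 0, 0, 0]]
curr = [[0, 0, 9, 9],
        [0, 0, 9, 9],
        [0, 0, 0, 0]]
print(best_shift(prev, curr))  # (0, 1): movement to the right
```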
I've built a classifier that has high accuracy in classifying certain objects of interest, given images that focus only on those objects.
However, when this same classifier is applied to an object detector that scans a larger image using selective search or sliding windows, the detector's performance is dismally low.
I don't understand why. Is this normal in computer vision? And what's the solution?
You would have to be more specific. What does "dismally low" mean? Are you seeing a lot of false positives? A lot of false negatives? Also, what kind of objects are you trying to detect?
One thing to keep in mind is when you do sliding-window object detection, you typically get multiple detections around the actual location of the object of interest. The usual way to deal with this issue is to merge the overlapping detections. For example, the Computer Vision System Toolbox for MATLAB provides selectStrongestBbox function for this.
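Merging the overlapping detections is usually done with greedy non-maximum suppression; the MATLAB function above does this for you, but in Python you would write something like the following sketch (box format and the overlap threshold are illustrative):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def merge_detections(boxes, scores, overlap_thresh=0.5):
    """Greedy non-maximum suppression: keep the strongest box,
    drop every box overlapping it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= overlap_thresh for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]

# Three detections around the same object plus one far away:
# only the strongest of the cluster and the distant box survive.
boxes = [(10, 10, 20, 20), (12, 11, 20, 20), (11, 12, 20, 20), (80, 80, 20, 20)]
scores = [0.9, 0.8, 0.7, 0.6]
print(merge_detections(boxes, scores))  # [(10, 10, 20, 20), (80, 80, 20, 20)]
```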
Another possible issue could be the variation in aspect ratio of the objects of interest. Typically, when you train a cascade classifier all the positive images are resized to be the same "training size" (e.g. 32x32 pixels). Then, when you do sliding windows, the aspect ratio of your window is the same as that of the training size. This results in good detection accuracy only if the aspect ratio of your objects of interest stays roughly the same. For example, this works well for faces. This will not work as well for a class of objects with varying aspect ratios, such as a category that includes both regular cars and stretched limousines.
And, of course, don't forget that Haar cascade detectors do not tolerate in-plane or out-of-plane rotation very well.
I'm working on a small side project related to computer vision, mostly to try playing around with OpenCV. It led me to an interesting question:
Using feature detection to find known objects in an image isn't always easy: objects are hard to find, especially if the features of the target object aren't great.
But if I could choose ahead of time what it is I'm looking for, then in theory I could generate for myself an optimal image for detection. Any quality that makes feature detection hard would be absent, and all the qualities that make it easy would exist.
I suspect this sort of thought went into things like QR codes, but with the limitation that QR codes were meant to be simple and small.
So my question for you: How would you generate an optimal image for later recognition by a camera? What if you already know that certain problems like skew, or partial obscuring would occur?
Thanks very much
I think you need something like AR markers.
Take a look at ArToolkit, ArToolkitPlus or Aruco libraries, they have marker generators and detectors.
And a paper about marker generation: http://www.uco.es/investiga/grupos/ava/sites/default/files/GarridoJurado2014.pdf
If you plan to use feature detection, then the marker should be specific to the feature detector used. Common practice in detector design is a good response to "corners" or regions with high x/y gradients. You should also take the scale of the target into account.
The simplest detection can be performed with blobs. It can be faster and more robust than feature points. For example, you can detect circular or rectangular blobs.
Depending on the distance from which you want to see your markers, the viewing conditions/backgrounds you typically have, and the camera resolution/noise, you should choose different images/targets. Under moderate perspective from a longer distance, a color target is pretty unique; see this:
https://surf-it.soe.ucsc.edu/sites/default/files/velado_report.pdf
At close distances, various bar/QR codes may be a good choice. Other than that, any flat textured object will be easy to track using homography, as opposed to 3D objects.
http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_feature_homography/py_feature_homography.html
Even different views of 3D objects can be quickly learned and tracked by systems such as Predator:
https://www.youtube.com/watch?v=1GhNXHCQGsM
Then comes the whole field of hardware: structured light, synchronized markers, etc. Kinect, for example, uses a predefined pattern projected onto the surface to do stereo. This means it recognizes and matches millions of micro-patterns per second, creating a depth map from the matched correspondences. Note that one camera sees the pattern while another device, a projector, generates it, working as a virtual camera; see
http://article.wn.com/view/2013/11/17/Apple_to_buy_PrimeSense_technology_from_the_360s_Kinect/
The quickest way to demonstrate good tracking of a standard checkerboard pattern is to use the PnP functionality of OpenCV:
http://www.juergenwiki.de/work/wiki/lib/exe/fetch.php?media=public:cameracalibration_detecting_fieldcorners_of_a_chessboard.gif
This can literally be done by calling just two functions:
found = findChessboardCorners(src, chessboardSize, corners, camFlags);
drawChessboardCorners(dst, chessboardSize, corners, found);
To sum up, your question is very broad, and there are multiple answers and solutions. Formulate your viewing conditions, camera specs, backgrounds, distances, and the amount of motion and perspective you expect to have indoors vs. outdoors. There is no such thing as a general average case in computer vision!
I am working on an OpenCV application that needs to count any object whose motion can be detected by the camera. The camera is fixed, and I have done the object tracking with OpenCV and cvBlob by following many tutorials.
I found some similar question:
Object counting
And I found this to be similar:
http://labs.globant.com/uncategorized/peopletracker-people-and-object-tracking/
I am new to OpenCV and I've gone through the OpenCV documentation, but I couldn't find anything related to counting moving objects in video.
Can anyone please give me an idea of how to do this, especially the counting part? As I read in the article above, they count people who cross a virtual line. Is there a special algorithm for detecting an object crossing the line?
Your question might be too broad when you ask about a general technique for counting moving objects in video sequences. I will give some hints that might help you:
As usual in computer vision, there is no single way to solve your problem. Try to do some research on people detection, background extraction, and motion detection to get a wider point of view.
State the user requirements of your system more clearly, namely: how many people can occur in the image frame? Things get complicated when you would like to track more than one person. Furthermore, can other moving objects (e.g. animals) appear in the image? If not, and only one person is supposed to be tracked, the answer to your problem is pretty easy; see the explanation below. If yes, you will have to do more research.
Usually you cannot find a direct solution to a computer vision problem in the OpenCV API; there is no method that directly solves the problem of counting people. But for sure there exist papers and references (usually scientific) that can be adapted to solve your problem. So there is no method that "counts people crossing a vertical line"; you have to solve the problem by merging several algorithms together.
In the link you provided, one can see that they use an algorithm for background extraction which determines what is the non-moving background and what is the moving foreground (in our case, a walking person). We are not sure whether they use something more sophisticated, but the information about background extraction is sufficient to start solving the problem.
And here is my contribution to the solution. Assuming only one person walks in front of a fixed camera and no other object motion can be observed, do the following:
Save a frame when no person is moving in front of the camera; it will be used later as the background reference.
In a loop, apply a background detector to extract the parts of the image representing motion (MOG, or you can even just compute the difference between the background and the current frame, followed by binary thresholding and blob counting; see my answer here).
From the assumption, only one blob should be detected (if not, use some metric to choose "the best one", for example the one with the maximum area). That blob is the person we would like to track. Knowing its position in the image, compare it to the position of the "vertical line": objects moving from left to right are exiting, and from right to left entering.
Remember that this solution will only work in case of the assumption we stated.
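The line-crossing check itself needs no special algorithm: track the blob centroid's x coordinate from frame to frame and test which side of the virtual line each consecutive pair falls on. A sketch (pure Python; the entering/exiting convention follows the answer above):

```python
def count_crossings(centroid_xs, line_x):
    """Count direction-labeled crossings of a vertical line.

    centroid_xs: x position of the tracked blob in each frame.
    Returns (entering, exiting): right-to-left counts as entering,
    left-to-right as exiting.
    """
    entering = exiting = 0
    for prev, curr in zip(centroid_xs, centroid_xs[1:]):
        if prev < line_x <= curr:
            exiting += 1
        elif curr < line_x <= prev:
            entering += 1
    return entering, exiting

# Blob walks left to right across x=100, then comes back.
xs = [60, 80, 99, 120, 140, 110, 90, 70]
print(count_crossings(xs, line_x=100))  # (1, 1)
```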
I am using camshift algorithm of opencv for object tracking. The input is being taken from a webcam and the object is tracked between successive frames. How can I make the tracking stronger? If I move the object at a rapid rate, tracking fails. Also when the object is not in the frame there are false detections. How do I improve this ?
Object tracking is an active research area in computer vision. There are numerous algorithms to do it, and none of them work 100% of the time.
If you need to track in real time, then you need something simple and fast. I am assuming that you have a way of segmenting a moving object from the background. Then you can compute a representation of the object, such as a color histogram, and compare it to the object you find in the next frame. You should also check that the object has not moved too far between frames. If you want to try more advanced motion tracking, then you should look up the Kalman filter.
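The histogram comparison can be sketched like this (pure Python; histogram intersection is one of several similarity measures, and `cv2.calcHist`/`cv2.compareHist` offer more, but the idea is the same):

```python
def histogram(pixels, bins=8):
    """Normalized grayscale histogram of a flat list of 0-255 values."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical distributions,
    smaller for dissimilar ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# The object's patch in frame 1 vs. two candidate patches in frame 2:
# the true match scores higher than the background clutter.
target    = histogram([200, 210, 220, 30, 40, 50])
candidate = histogram([205, 215, 225, 35, 45, 55])  # same object, shifted
clutter   = histogram([100, 110, 120, 130, 140, 150])

print(intersection(target, candidate) > intersection(target, clutter))  # True
```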
Determining that an object is not in the frame is also a big problem. First, what kinds of objects are you trying to track? People? Cars? Dogs? You can build an object classifier, which would tell you whether or not the moving object in the frame is your object of interest, as opposed to noise or some other kind of object. A classifier can be something very simple, such as a constraint on size, or it can be very complicated. In the latter case you need to learn about features that can be computed, classification algorithms, such as support vector machines, and you would need to collect training images to train it.
In short, a reliable tracker is not an easy thing to build.
Suppose you find the object in the first two frames. From that information, you can extrapolate where you'd expect the object in the third frame. Instead of using a generic find-the-object algorithm, you can use a slower, more sophisticated (and thus hopefully more reliable) algorithm by limiting it to check in the vicinity that the extrapolation predicts. It may not be exactly where you expect (perhaps the velocity vector is changing), but you should certainly be able to reduce the area that's checked.
This should help reduce the number of times some other part of the frame is misidentified as the object (because you're looking at a smaller portion of the frame and because you're using a better feature detector).
Update the extrapolations based on what you find and iterate for the next frame.
If the object goes out of frame, you fall back to your generic feature detector, as you do with the first two frames, and try again to get a "lock" when the object returns to the view.
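The extrapolation is just a constant-velocity prediction from the last two detections, and the reduced search area is a window around that prediction (the margin and frame size below are illustrative):

```python
def predict_position(p1, p2):
    """Constant-velocity extrapolation: given the object's position in
    the last two frames, predict where it will be in the next one."""
    (x1, y1), (x2, y2) = p1, p2
    return (2 * x2 - x1, 2 * y2 - y1)

def search_window(pred, margin, frame_w, frame_h):
    """Axis-aligned search region around the prediction, clamped to the
    frame, in which the slower, more reliable detector runs."""
    px, py = pred
    x0, y0 = max(0, px - margin), max(0, py - margin)
    x1, y1 = min(frame_w, px + margin), min(frame_h, py + margin)
    return (x0, y0, x1, y1)

# Object seen at (100, 50) then (110, 55): predict (120, 60) and
# search only a 40x40 neighborhood instead of the whole 640x480 frame.
pred = predict_position((100, 50), (110, 55))
print(pred)                               # (120, 60)
print(search_window(pred, 20, 640, 480))  # (100, 40, 140, 80)
```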
Also, if you can, throw as much light into the physical scene as possible. If the scene is dim, the webcam will use a longer exposure time, leading to more motion blur on moving objects. Motion blur can make it very hard for the feature detectors (though it can give you information about direction and speed).
I've found that if you expand the border of the search window in CamShift, it makes the algorithm a bit more adaptive to fast-moving objects, although it can introduce some irregularities. Try just making the border of your window 10% bigger and see what happens.
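That expansion can be sketched as follows (pure Python; in OpenCV the window is the `(x, y, w, h)` rect you pass to `cv2.CamShift`, and the 10% factor is the value suggested above):

```python
def expand_window(rect, frame_w, frame_h, factor=0.10):
    """Grow an (x, y, w, h) search window by `factor` on every side,
    clamped to the frame, so a fast-moving object is less likely to
    escape the window between frames."""
    x, y, w, h = rect
    dx, dy = int(w * factor), int(h * factor)
    nx, ny = max(0, x - dx), max(0, y - dy)
    nw = min(frame_w - nx, w + 2 * dx)
    nh = min(frame_h - ny, h + 2 * dy)
    return (nx, ny, nw, nh)

print(expand_window((100, 100, 50, 40), 640, 480))  # (95, 96, 60, 48)
```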