OpenCV object tracking and counting objects that pass an ROI in a video frame - C++

I am working on an OpenCV application that needs to count any object whose motion can be detected by the camera. The camera is stationary, and I implemented the object tracking with OpenCV and cvBlob by following many tutorials.
I found some similar question:
Object counting
And I found this, which was similar:
http://labs.globant.com/uncategorized/peopletracker-people-and-object-tracking/
I am new to OpenCV and I've gone through the OpenCV documentation, but I couldn't find anything related to counting moving objects in video.
Can anyone please give me an idea of how to do this, especially the counting part? As I read in the article above, they count people who cross a virtual line. Is there a special algorithm to detect an object crossing the line?

Your question might be too broad, since you are asking about a general technique for counting moving objects in video sequences. Here are some hints that might help you:
As usual in computer vision, there is no single way to solve your problem. Try to do some research on people detection, background extraction, and motion detection to get a wider point of view.
State the user requirements of your system more clearly, namely: how many people can appear in the frame at once? Things get complicated when you want to track more than one person. Furthermore, can other moving objects (e.g. animals) appear in the image? If not, and only one person is supposed to be tracked, the answer to your problem is pretty easy; see the explanation below. If yes, you will have to do more research.
Usually the OpenCV API does not contain a direct solution to a computer vision problem; in particular, there is no method that directly solves people counting. But there is surely some paper or reference (usually scientific) that can be adapted to your problem. So there is no method that "counts people crossing a vertical line"; you have to solve the problem by merging several algorithms.
In the link you provided, one can see that they use an algorithm for background extraction which determines what is the non-moving background and what is the moving foreground (in our case, a walking person). We cannot be sure whether they use anything more sophisticated, but background extraction is enough to start solving the problem.
And here is my contribution to the solution. Assuming only one person walks in front of the fixed camera and no other object motion can be observed, do the following:
Save a frame in which no person is moving in front of the camera; it will be used later as the background reference.
In a loop, apply a background detector to extract the parts of the image that represent motion (MOG, or you can even just compute the difference between the background and the current frame, followed by a binary threshold and blob counting; see my answer here).
Given the assumption, only one blob should be detected (if not, use some metric to choose "the best one", for example the one with the maximum area). That blob is the person we would like to track. Knowing its position in the image, compare it to the position of the "vertical line": objects moving from left to right are exiting and objects moving from right to left are entering.
Remember that this solution will only work under the assumptions we stated. A minimal sketch of the pipeline is given below.
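Here is a minimal C++ sketch of that pipeline, assuming a fixed camera, a single moving blob, and a vertical counting line in the middle of the frame; the threshold value, camera index, and variable names are illustrative only:

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <iostream>

int main() {
    cv::VideoCapture cap(0);
    if (!cap.isOpened()) return -1;

    cv::Mat frame, gray, background, diff, mask;
    cap >> frame;
    cv::cvtColor(frame, background, cv::COLOR_BGR2GRAY);  // reference background (empty scene)

    const int lineX = frame.cols / 2;   // position of the virtual vertical line
    int lastCx = -1, count = 0;

    while (cap.read(frame)) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        cv::absdiff(gray, background, diff);                      // motion = |current - background|
        cv::threshold(diff, mask, 30, 255, cv::THRESH_BINARY);    // binary motion mask
        cv::morphologyEx(mask, mask, cv::MORPH_OPEN, cv::Mat());  // remove small specks of noise

        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
        if (!contours.empty()) {
            // keep the largest blob: under our assumption it is the person
            auto biggest = std::max_element(contours.begin(), contours.end(),
                [](const std::vector<cv::Point>& a, const std::vector<cv::Point>& b) {
                    return cv::contourArea(a) < cv::contourArea(b); });
            cv::Moments m = cv::moments(*biggest);
            int cx = static_cast<int>(m.m10 / (m.m00 + 1e-5));    // blob centroid x
            // count a crossing when the centroid passes from one side of the line to the other
            if (lastCx >= 0 && (lastCx - lineX) * (cx - lineX) < 0) {
                ++count;
                std::cout << "crossings: " << count << std::endl;
            }
            lastCx = cx;
        }
    }
    return 0;
}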

Related

Combine tracking and detection

I'm currently working on a multiple object tracking problem. I think using tracking-by-detection is a good choice. However, I do not know how to combine the tracking and detection results so that detection can help improve the tracking results.
I'm using Faster R-CNN from the TensorFlow Object Detection API as a simple starting point for detection.
For tracking, I use the KCF algorithm from OpenCV.
Detection is unstable because every frame is processed independently by the model, while tracking is much more stable.
Although tracking is more stable, when the object moves the tracker cannot follow it, which is not accurate.
So I'm thinking of combining these two methods to make my result both stable and accurate.
I have a background in computer vision but I'm new to this field (multiple object tracking). Could anyone please give me some advice on how I should deal with this kind of problem?
Thanks a lot! :)
I have recently tried to use detection to track objects. The instability problem can be resolved by classic filtering techniques such as Kalman filtering (in signal processing, the measured point is also "unstable" due to noise). You can set a small region around the tracked object and try to find the same object in that region in the next frame. A "matched" relationship is established from that, and then you try to match the object from the next frame to the one after that... A trace can be built from this process. Any smoothing method can be employed to suppress noise in the predicted boxes. An example is shown below:
The transparent points are the detected trace points and the solid ones are the smoothed points.
The corresponding trace shown on the background:
Some tricks are also useful: if detection fails at some random position, you can set a "skip gate" to try to find a matching point in a later frame (in my experiment, 60 frames was not bad for 24 fps video). You will prefer recall over precision, since you can build a pretty long sequence and drop the short noisy sequences that come from false-alarm detections.
Reference code: https://github.com/yiyuezhuo/detection-tracking
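As an illustration of the Kalman filtering idea (not the code from the linked repository), here is a minimal C++ sketch that smooths detected box centers with OpenCV's cv::KalmanFilter using a constant-velocity model; the noise covariances and the fake measurements are placeholders:

#include <opencv2/opencv.hpp>

int main() {
    // state: [x, y, vx, vy], measurement: [x, y]
    cv::KalmanFilter kf(4, 2, 0);
    kf.transitionMatrix = (cv::Mat_<float>(4, 4) <<
        1, 0, 1, 0,
        0, 1, 0, 1,
        0, 0, 1, 0,
        0, 0, 0, 1);
    cv::setIdentity(kf.measurementMatrix);
    cv::setIdentity(kf.processNoiseCov, cv::Scalar::all(1e-3));
    cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(1e-1));
    cv::setIdentity(kf.errorCovPost, cv::Scalar::all(1));

    // For each frame: predict, then correct with the detector's box center
    // (fake measurements stand in for real detections here).
    for (int i = 0; i < 100; ++i) {
        cv::Mat prediction = kf.predict();               // predicted center for this frame
        cv::Mat measurement = (cv::Mat_<float>(2, 1) <<  // noisy detection (placeholder)
            100.0f + i, 50.0f + 0.5f * i);
        cv::Mat estimated = kf.correct(measurement);     // smoothed center
        // estimated.at<float>(0), estimated.at<float>(1) give the smoothed x, y
    }
    return 0;
}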
I think you should try the CSRT tracker from OpenCV, which is much more stable than KCF. For detection, you could run the detector every fixed number of frames and reinitialize the tracker with its output. This way you can fuse a tracker with the detector.
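A rough sketch of that fusion, assuming OpenCV 4.5+ with the opencv_contrib tracking module (where TrackerCSRT works with cv::Rect), and a hypothetical runDetector() stub standing in for the Faster R-CNN detector:

#include <opencv2/opencv.hpp>
#include <opencv2/tracking.hpp>  // requires the opencv_contrib tracking module

// Hypothetical detector stub: replace with your Faster R-CNN call.
cv::Rect runDetector(const cv::Mat& frame) {
    return cv::Rect(0, 0, 100, 100);  // placeholder detection
}

int main() {
    cv::VideoCapture cap("input.mp4");
    cv::Mat frame;
    if (!cap.read(frame)) return -1;

    cv::Rect box = runDetector(frame);                     // initial detection
    cv::Ptr<cv::TrackerCSRT> tracker = cv::TrackerCSRT::create();
    tracker->init(frame, box);

    const int reinitEvery = 30;                            // re-detect every N frames
    for (int i = 1; cap.read(frame); ++i) {
        if (i % reinitEvery == 0) {
            box = runDetector(frame);                      // refresh with the detector
            tracker = cv::TrackerCSRT::create();           // CSRT has no re-init, so recreate it
            tracker->init(frame, box);
        } else if (!tracker->update(frame, box)) {
            // tracker lost the object; fall back to detection on the next frame
        }
        cv::rectangle(frame, box, cv::Scalar(0, 255, 0), 2);
    }
    return 0;
}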

Using Bullet to detect collisions

I'm currently working on a simple project to detect a collision between two specific objects in a surgery scene. The problem is that I don't have a background in such problems, so I'm really a newbie and I don't yet know what to do. After a little research, I found the Bullet library, which can be used as a collision detection tool, but I'm not sure yet whether it suits my case. I already checked some examples where the developer creates the objects of interest manually, which led me to think that I should first detect the objects of interest and then launch the collision detection process.
In my case, I have two types of data:
A video of the operating room
A point cloud representing the room in 3D
I need to detect the collision between two objects in the scene. Is there any way to use Bullet to achieve such a thing? Is it common to use a video as input for a collision detection problem? (I'm wondering because I could not find many resources on it.)
I'm just starting, so this might be a fuzzy question; sorry in advance for any inconvenience.
EDITED:
I already checked it, but my point was to understand what options can be used before digging into the details. For me, a collision detection problem has two parts: the objects of interest (the two or more objects whose collision we're trying to detect) and the scene in which we will be trying to detect that collision. For the scene, the data I have comes in the two types mentioned above. So I was asking which type of data should be used as input for the Bullet collision process: should it be an image taken from the video, a list of 3D points, or something else?
I used Bullet half a year ago. I remember that you need to register objects with Bullet together with a collision shape. In the simplest case, your points could probably become small spheres. In the case of your video, you need a 3D representation; I do not understand 100% what you mean by detecting collisions on a "video". In any case, to use Bullet you need a collision shape associated with each object.
Further, you register a collision callback. This is a single function called for each detected collision. All callbacks are listed here: http://www.bulletphysics.org/mediawiki-1.5.8/index.php?title=Collision_Callbacks_and_Triggers
As the wiki says - and I implemented it this way - to detect a specific collision, you need to iterate over all resulting manifolds from Bullet manually. This is a little painful and, performance-wise, a strange approach. So you cannot register a specific callback for one specific object colliding with another specific object!
Once the objects are registered, you run the algorithm and then you can check all the manifolds in the callback.
To get started with Bullet, I used the Bullet Physics "Simplest Collision Example" question and the answers it had at that time.
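To make the shape registration and manifold iteration more concrete, here is a minimal collision-only sketch (no dynamics) for recent Bullet versions; the sphere shapes, radii, and positions are illustrative assumptions, not code from the answer above:

#include <btBulletCollisionCommon.h>
#include <iostream>

int main() {
    btDefaultCollisionConfiguration config;
    btCollisionDispatcher dispatcher(&config);
    btDbvtBroadphase broadphase;
    btCollisionWorld world(&dispatcher, &broadphase, &config);

    // each object of interest gets a collision shape; here small spheres stand in for them
    btSphereShape shapeA(0.5f), shapeB(0.5f);
    btCollisionObject objA, objB;
    objA.setCollisionShape(&shapeA);
    objB.setCollisionShape(&shapeB);

    btTransform tA, tB;
    tA.setIdentity(); tA.setOrigin(btVector3(0.0f, 0.0f, 0.0f));
    tB.setIdentity(); tB.setOrigin(btVector3(0.6f, 0.0f, 0.0f));  // overlapping spheres
    objA.setWorldTransform(tA);
    objB.setWorldTransform(tB);

    world.addCollisionObject(&objA);
    world.addCollisionObject(&objB);

    world.performDiscreteCollisionDetection();

    // iterate over all resulting manifolds and check which pair is in contact
    int numManifolds = world.getDispatcher()->getNumManifolds();
    for (int i = 0; i < numManifolds; ++i) {
        btPersistentManifold* manifold =
            world.getDispatcher()->getManifoldByIndexInternal(i);
        if (manifold->getNumContacts() > 0) {
            const btCollisionObject* a = manifold->getBody0();
            const btCollisionObject* b = manifold->getBody1();
            if ((a == &objA && b == &objB) || (a == &objB && b == &objA))
                std::cout << "objA and objB are colliding" << std::endl;
        }
    }
    return 0;
}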

I'm trying to use this method to detect moving objects. Can someone advise me on it?

I want to ask what kinds of problems there would be if I used this method to extract the foreground.
The precondition for using this method is that it runs on a fixed camera, so there will not be any movement of the camera position.
What I'm trying to do is the following:
Read one frame from the camera and set this frame as the background image. This is done periodically.
Periodically subtract the frames read afterwards from the background image above. Then only the moving things will be colored differently from the areas that are the same as the background image.
Then isolate the moving object using grayscale conversion, binarization, and thresholding.
Iterate the above processes.
If I do this, would the probability of successfully detecting a moving object be high? If not... could you tell me why?
If you consider illumination changes (gradual or sudden) in the scene, you will see that your method does not work.
There are more robust solutions to these problems. One of them (maybe the best) is the Gaussian Mixture Model applied to background subtraction.
You can use BackgroundSubtractorMOG2 (an implementation of GMM) from the OpenCV library.
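A minimal C++ usage sketch of BackgroundSubtractorMOG2 (OpenCV 3.x/4.x API); the history length, variance threshold, and shadow handling below are illustrative defaults:

#include <opencv2/opencv.hpp>

int main() {
    cv::VideoCapture cap(0);
    cv::Ptr<cv::BackgroundSubtractorMOG2> mog2 =
        cv::createBackgroundSubtractorMOG2(500, 16.0, true);  // history, varThreshold, detectShadows

    cv::Mat frame, fgMask;
    while (cap.read(frame)) {
        mog2->apply(frame, fgMask);                                   // update model, get foreground mask
        cv::threshold(fgMask, fgMask, 200, 255, cv::THRESH_BINARY);   // drop shadow pixels (marked as 127)
        cv::imshow("foreground", fgMask);
        if (cv::waitKey(30) == 27) break;                             // Esc to quit
    }
    return 0;
}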
Your scheme is quite adequate for cases where the camera is fixed and the background is stationary. Indoor and man-controlled scenes are better suited to this approach than outdoor and natural scenes. I've contributed to a detection system that worked basically on the same principles you suggested, but of course the details are crucial. A few remarks based on my experience:
Your initialization step can cause very slow convergence to a normal state. You set the background from the first frame, and then pieces of background revealed behind moving objects will be considered objects. A better approach is to take the median of the first N frames (see the sketch after these remarks).
Simple subtraction may not be enough in cases of changing light conditions, etc. You may find a similarity criterion that works better for your application.
Simple thresholding on the difference image may not be enough. A simple approach is to dilate the foreground so that the background is not updated on pixels that were accidentally identified as foreground.
Your step 4 is unclear; I assume you mean that you update the background only in those places that were identified as background in the last frame. Note that with such a simple approach, pixels that are actually background may be stuck forever with a "foreground" label, since you never update the background under them. There are many possible solutions to this.
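As an illustration of the median-initialization remark above, here is a simple (and deliberately unoptimized) C++ sketch that builds the background as the per-pixel median of the first N grayscale frames:

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

// Build a background image as the per-pixel median of the first N frames.
cv::Mat medianBackground(cv::VideoCapture& cap, int N) {
    std::vector<cv::Mat> frames;
    cv::Mat frame, gray;
    for (int i = 0; i < N && cap.read(frame); ++i) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        frames.push_back(gray.clone());
    }

    cv::Mat background(frames[0].size(), CV_8UC1);
    std::vector<uchar> values(frames.size());
    for (int y = 0; y < background.rows; ++y) {
        for (int x = 0; x < background.cols; ++x) {
            for (size_t k = 0; k < frames.size(); ++k)
                values[k] = frames[k].at<uchar>(y, x);
            // median of the pixel's values over the collected frames
            std::nth_element(values.begin(), values.begin() + values.size() / 2, values.end());
            background.at<uchar>(y, x) = values[values.size() / 2];
        }
    }
    return background;
}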
There are many ways to solve this problem, and which method is most appropriate will really depend on the input images. It may be worth doing some reading on the topic.
The method you are suggesting may work, but it's a slightly non-standard approach to this problem. My main concern would be that subtracting several images from the background could lead to saturation, and then you may lose some detail of the motion. It may be better to take the difference between consecutive images and then apply the binarization/thresholding to those images.
Another (more complex) approach that has worked for me in the past is to take subregions of the image and cross-correlate them with the new image. The peak of this correlation can be used to identify the direction of movement - it's a useful approach if more than one thing is moving.
It may also be possible to use a combination of the two approaches above, for example:
Subtract the second image from the background.
Threshold, etc., to find the ROI where movement is occurring.
Use a pattern-matching approach to track subsequent movement, focused on the ROI detected above.
The best approach will depend on your application, but there are lots of papers on this topic. A small sketch of the pattern-matching step is given below.
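As an illustration of step 3, a sketch that follows a previously detected ROI by normalized cross-correlation with cv::matchTemplate; the function name and the idea of passing in the ROI from the thresholding step are assumptions made for the example:

#include <opencv2/opencv.hpp>

// Track a previously detected ROI in the next frame by finding the cross-correlation peak.
cv::Rect trackByCorrelation(const cv::Mat& prevFrame, const cv::Mat& nextFrame,
                            const cv::Rect& prevRoi) {
    cv::Mat templ = prevFrame(prevRoi);                   // patch around the moving object
    cv::Mat result;
    cv::matchTemplate(nextFrame, templ, result, cv::TM_CCORR_NORMED);

    double maxVal;
    cv::Point maxLoc;
    cv::minMaxLoc(result, nullptr, &maxVal, nullptr, &maxLoc);  // correlation peak location
    return cv::Rect(maxLoc, prevRoi.size());              // new position of the ROI
}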

Is it possible to detect a moving object in OpenCV?

I have been asked to write code that can detect ANY moving object using OpenCV. It will be used in an outdoor system. But any moving object? To my knowledge it can detect pre-defined objects like humans, cars, balls, etc. I am not sure about "any object", because trees also move in the wind, which is of no use to the system, and if the system is going to detect moving tree branches, moving water waves, and useless stuff like that, it will be a big issue.
Is there any way in OpenCV to detect all useful moving objects like humans, cars, vans, animals, etc., and not useless things like moving tree branches, moving water waves, and so on?
Some people told me "pattern recognition" would help, but I have no time to get into it; I have only 4 months and I am not a math person. Anyway, if it can easily be used with video in OpenCV, then I can think about it.
No, you don't have to reinvent the wheel. There are plenty of examples on the net for detecting moving objects;
you can Google for motion detection.
The simple method to accomplish this is just background detection: keep the previous frame as a reference and subtract the new frame from it. The subtracted image will contain information about the regions of motion, or anything that changed on the screen (frame).
As for detecting the objects, you can rectify the regions according to the motion, specify a threshold value for the motion, and then grab the objects by binarization.
Look into background/foreground segmentation methods. They are used to segment out (detect) moving objects by using statistical methods to estimate the background. OpenCV version 2.4.5 offers a number of different implementations of background subtraction, namely:
BackgroundSubtractorMOG
BackgroundSubtractorMOG2
FGDStatModel
MOG_GPU
MOG2_GPU
VIBE_GPU <- listed under non-free functionality
GMG_GPU
There is demo source code bgfg_segm.cpp located in {opencv_folder}\samples\gpu. The demo shows usage and displays output for the segmentation classes (on the GPU). There is also a similar demo for the CPU; just look for it. The GPU-based classes offer real-time performance.
The approach will output objects as contours or as masks. After detection you can remove some false positives and noise by applying morphological operations, such as dilation and erosion. In addition, you can keep only the contours whose area is large enough (so that leaves, which are small, are filtered out).
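A small sketch of that post-processing step in C++: clean the foreground mask with morphology and keep only blobs above a minimum area (the kernel size and minArea value are scene-dependent assumptions):

#include <opencv2/opencv.hpp>
#include <vector>

std::vector<std::vector<cv::Point>> extractLargeBlobs(const cv::Mat& fgMask,
                                                      double minArea = 500.0) {
    cv::Mat clean;
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5));
    cv::morphologyEx(fgMask, clean, cv::MORPH_OPEN, kernel);   // erosion then dilation: removes specks
    cv::morphologyEx(clean, clean, cv::MORPH_CLOSE, kernel);   // fills small holes in the blobs

    std::vector<std::vector<cv::Point>> contours, kept;
    cv::findContours(clean, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    for (const auto& c : contours)
        if (cv::contourArea(c) >= minArea)                     // drop leaves, small noise, etc.
            kept.push_back(c);
    return kept;
}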

Improving the CamShift algorithm in OpenCV

I am using the CamShift algorithm from OpenCV for object tracking. The input is taken from a webcam and the object is tracked between successive frames. How can I make the tracking more robust? If I move the object rapidly, tracking fails. Also, when the object is not in the frame, there are false detections. How do I improve this?
Object tracking is an active research area in computer vision. There are numerous algorithms to do it, and none of them work 100% of the time.
If you need to track in real time, then you need something simple and fast. I am assuming that you have a way of segmenting a moving object from the background. Then you can compute a representation of the object, such as a color histogram, and compare it to the object you find in the next frame. You should also check that the object has not moved too far between frames. If you want to try more advanced motion tracking, then you should look up the Kalman filter.
Determining that an object is not in the frame is also a big problem. First, what kinds of objects are you trying to track? People? Cars? Dogs? You can build an object classifier, which would tell you whether or not the moving object in the frame is your object of interest, as opposed to noise or some other kind of object. A classifier can be something very simple, such as a constraint on size, or it can be very complicated. In the latter case you need to learn about features that can be computed, classification algorithms, such as support vector machines, and you would need to collect training images to train it.
In short, a reliable tracker is not an easy thing to build.
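To make the histogram-comparison idea above concrete, here is a small sketch that builds a hue histogram of a patch and compares it to the tracked object's histogram with cv::compareHist; the bin count and the 0.7 similarity threshold are assumed values:

#include <opencv2/opencv.hpp>

// Hue histogram of a BGR patch, normalized to [0, 1].
cv::Mat hueHistogram(const cv::Mat& bgrPatch) {
    cv::Mat hsv, hist;
    cv::cvtColor(bgrPatch, hsv, cv::COLOR_BGR2HSV);
    int histSize = 30;
    float hueRange[] = {0, 180};
    const float* ranges[] = {hueRange};
    int channels[] = {0};
    cv::calcHist(&hsv, 1, channels, cv::Mat(), hist, 1, &histSize, ranges);
    cv::normalize(hist, hist, 0, 1, cv::NORM_MINMAX);
    return hist;
}

// Decide whether a candidate region looks like the tracked object.
bool isSameObject(const cv::Mat& objectHist, const cv::Mat& candidateHist) {
    // correlation close to 1 means very similar color distributions
    double similarity = cv::compareHist(objectHist, candidateHist, cv::HISTCMP_CORREL);
    return similarity > 0.7;
}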
Suppose you find the object in the first two frames. From that information, you can extrapolate where you'd expect the object in the third frame. Instead of using a generic find-the-object algorithm, you can use a slower, more sophisticated (and thus hopefully more reliable) algorithm by limiting it to check in the vicinity that the extrapolation predicts. It may not be exactly where you expect (perhaps the velocity vector is changing), but you should certainly be able to reduce the area that's checked.
This should help reduce the number of times some other part of the frame is misidentified as the object (because you're looking at a smaller portion of the frame and because you're using a better feature detector).
Update the extrapolations based on what you find and iterate for the next frame.
If the object goes out of frame, you fall back to your generic feature detector, as you do with the first two frames, and try again to get a "lock" when the object returns to the view.
Also, if you can, throw as much light into the physical scene as possible. If the scene is dim, the webcam will use a longer exposure time, leading to more motion blur on moving objects. Motion blur can make it very hard for the feature detectors (though it can give you information about direction and speed).
I've found that if you expand the border of the search window in CamShift, it makes the algorithm a bit more adaptive to fast-moving objects, although it can introduce some irregularities. Try just making the border of your window 10% bigger and see what happens.
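A rough sketch of that tweak, assuming the back-projection image is computed elsewhere in the usual CamShift pipeline; the 10% margin follows the suggestion above:

#include <opencv2/opencv.hpp>

// Run CamShift after inflating the previous track window by ~10% on each side.
cv::RotatedRect camShiftWithMargin(const cv::Mat& backProj, cv::Rect& trackWindow) {
    int dx = trackWindow.width / 10, dy = trackWindow.height / 10;
    trackWindow -= cv::Point(dx, dy);                      // shift top-left corner out
    trackWindow += cv::Size(2 * dx, 2 * dy);               // grow width and height
    trackWindow &= cv::Rect(0, 0, backProj.cols, backProj.rows);  // clamp to image bounds

    return cv::CamShift(backProj, trackWindow,
                        cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT, 10, 1.0));
}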