I am new to OpenCV and I am trying to track some moving objects (e.g. cars) in a video. I have computed the optical flow and fed it to k-means as a kind of background subtraction, i.e. to separate moving objects from stationary ones. I have also used the intensity of the video as additional information. The following screenshots show the result of the flow and of the k-means segmentation, respectively:
The results are not good, but not bad either. How should I proceed from here? I am thinking of trying SURF feature detection and extraction. Any ideas are welcome.
It seems you are using dense optical flow. I would advise trying some feature detection (SURF, FAST, whatever) followed by sparse optical flow tracking (in my experience it works better than feature matching for this task). Then, once you have the feature correspondences over some frames, you can use the fundamental matrix, the trifocal tensor, plane+parallax or some other method to detect moving objects. You can later cluster the moving points into motion groups that represent different objects.
Also, it seems that your camera is fixed. In that case you can drop the motion detection step, consider only tracks with enough displacement, and then cluster those into motion groups.
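For the fixed-camera case, the "enough displacement" filter could look roughly like the sketch below. It is dependency-free C++ rather than OpenCV code; the `Point2` and `Track` types, the function name `movingTracks` and the threshold are all assumptions made for illustration. In practice the tracks would come from something like cv::calcOpticalFlowPyrLK.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal types: a track is the list of positions of one
// feature over consecutive frames.
struct Point2 { float x, y; };
using Track = std::vector<Point2>;

// Returns the indices of the tracks whose last position lies at least
// minDisplacement pixels away from their first position.
std::vector<std::size_t> movingTracks(const std::vector<Track>& tracks,
                                      float minDisplacement) {
    std::vector<std::size_t> moving;
    for (std::size_t i = 0; i < tracks.size(); ++i) {
        const Track& t = tracks[i];
        if (t.size() < 2) continue;  // need at least two observations
        const float dx = t.back().x - t.front().x;
        const float dy = t.back().y - t.front().y;
        if (std::hypot(dx, dy) >= minDisplacement) moving.push_back(i);
    }
    return moving;
}
```

The surviving tracks are the ones you would then cluster into motion groups. The threshold has to sit above the tracking jitter of static points, which depends on your video.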
I am using OpenCV with C++ and am a new user. I am interested in object detection problems. So far I have studied and implemented sparse optical flow (the Lucas-Kanade method) on a video from a stationary camera. After trying k-means and background subtraction, I have decided to move on to a more difficult problem: the moving camera.
From the documentation I have read so far, I could use cv::findHomography to find the inliers and outliers over the sequence of frames in my video, and then work out from the returned values which movement is caused by camera motion and which by object motion. In addition, I could use SURF features to track some objects and then decide which of them are good points.
However, I was wondering how I could put this theory into practice. For example, should I use the first frame as ground truth, detect some features in it using SURF, and then use findHomography for each remaining frame of the video? Any ideas/help is welcome!
Detecting moving objects from a moving camera is quite a challenging task and requires a solid understanding of multiple view geometry. There is also less information available on this topic than on, for example, structure from motion, so be warned!
Anyway, a homography matrix will not be a good choice for detecting moving objects (unless you are 100% sure that your background can be represented accurately enough by a flat surface). You should probably use a fundamental matrix or a trifocal tensor.
The fundamental matrix is computed from point correspondences between two frames. It associates points in one image with lines in the other image (so-called epipolar lines), and so it is independent of scene structure. After you have obtained the F matrix using some robust estimation method, like RANSAC or LMedS (RANSAC seems to be better for this kind of task), you can calculate an error for each point, for example its distance to the corresponding epipolar line. Objects that move independently of the scene will not be accurately described by the F matrix and will have a larger error. So the outliers of an F matrix calculated from image matches over two frames can be considered moving objects. One note though: objects that move along epipolar lines will not be detected by this approach, since their parallax can also be explained by some depth level.
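As a rough illustration of the per-point error, here is a dependency-free sketch of the point-to-epipolar-line distance. It assumes F is already known (for instance from cv::findFundamentalMat) and follows the convention x2^T F x1 = 0. The names `Mat3`, `epipolarDistance` and `isMotionOutlier`, and the pixel threshold, are assumptions for the example.

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Distance in pixels from (x2, y2) in the second image to the epipolar line
// l = F * (x1, y1, 1)^T induced by (x1, y1) in the first image.
double epipolarDistance(const Mat3& F, double x1, double y1,
                        double x2, double y2) {
    const double a = F[0][0] * x1 + F[0][1] * y1 + F[0][2];
    const double b = F[1][0] * x1 + F[1][1] * y1 + F[1][2];
    const double c = F[2][0] * x1 + F[2][1] * y1 + F[2][2];
    const double norm = std::sqrt(a * a + b * b);
    if (norm == 0.0) return 0.0;  // degenerate line, no usable constraint
    return std::fabs(a * x2 + b * y2 + c) / norm;
}

// A correspondence far from its epipolar line is a candidate moving point.
bool isMotionOutlier(const Mat3& F, double x1, double y1,
                     double x2, double y2, double thresholdPx) {
    return epipolarDistance(F, x1, y1, x2, y2) > thresholdPx;
}
```

With the F of a purely horizontal camera translation the epipolar lines are horizontal, so a point that only shifts along its row has zero error. That is exactly the ambiguity mentioned above: motion along the epipolar line is invisible to this test.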
The trifocal tensor does not have the depth/motion ambiguity for objects that move along epipolar lines, but it is harder to estimate and it is not included in OpenCV. It can be calculated from correspondences over three frames, and its usage can be conceptually described as triangulating a point from two views and then calculating the reprojection error in a third view.
As for the matching, I still think that LK tracking will be better than SURF matching if you work with video sequences, since in that case you don't need to consider very distant points as matches, and tracking is usually faster than detection plus matching.
For analysis we have a sequence of images or a movie. My aim is to create a semi-automatic face and eye detection for these sequences. The sequences consist of about 4000 images with a frontal capture of a person who is moving slightly. I want to process these images semi-automatically or manually to get the two or three ROIs of the face and eyes.
I tried OpenCV's cascade classifiers, but for my sequences they do not turn out to be robust (with manual control we need to reach a rate of 100%). The cascade classifiers do not give positions when, e.g., the person is looking slightly to the side.
Is there any semi-automatic approach out there for ImageJ, MATLAB or OpenCV/C++ to select/correct the ROIs manually when they are falsely detected, or to select templates for tracking?
If you are processing a movie, it is reasonable to assume that the motion between frames is low. The following is a possible approach.
1. Initialize the first frame manually (or get user input to confirm/edit the positions detected by the cascade classifiers).
2. For the next frame, check whether the detected features are too far from their previous positions. You can also check whether the positions of the different parts are moving in an illogical manner.
3. Stop and get the user to correct the points if the check in step 2 suggests errors.
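The distance check in the second step could be sketched as below. This is only an illustration: the `Roi` type, the function name `needsManualCheck` and the jump threshold are assumptions, and the "illogical movement" test is left out because it depends on how your parts relate to each other.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical ROI description: only the centre is used here.
struct Roi { float cx, cy; };

// Returns true when the frame should be shown to the user: a part is
// missing (or extra), or some part moved more than maxJump pixels since
// the previous accepted frame.
bool needsManualCheck(const std::vector<Roi>& previous,
                      const std::vector<Roi>& current,
                      float maxJump) {
    if (current.size() != previous.size()) return true;
    for (std::size_t i = 0; i < current.size(); ++i) {
        const float dx = current[i].cx - previous[i].cx;
        const float dy = current[i].cy - previous[i].cy;
        if (std::hypot(dx, dy) > maxJump) return true;
    }
    return false;
}
```

Frames for which this returns false can be accepted automatically, so the user only sees the suspicious ones.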
Note: With OpenCV cascades, face detection is generally accurate. But eye detection is not so accurate and you might not detect both eyes in some frames. Some projects use AAMs (Active Appearance Models) to robustly track a face, and this might work for you.
For face detection, try this list of 50+ APIs:
http://blog.mashape.com/post/53379410412/list-of-50-face-detection-recognition-apis
For eye detection, you can try the flandmark detector:
http://cmp.felk.cvut.cz/~uricamic/flandmark/
Or STASM:
http://www.milbo.users.sonic.net/stasm/
Basically, I want to detect an object and then track it in a video (frame by frame).
I can detect it in the first frame with, for example, ORB or SIFT. But for the next frames (or, say, the next XX frames) I would like to avoid calculating all the keypoints (ORB or SIFT) again to re-detect it.
Considering that I want to track it in a video in real time, what could I do?
A common option is to use a patch tracker. That means that you only search for each keypoint in an area of, for example, 8 pixels around its position in the previous frame. You can perform cv::matchTemplate() on an area surrounding the keypoint instead of using SIFT.
Performing a pyramidal search helps to improve the frame rate. You first search at a lower scale; if you cannot find the keypoint, you double the scale.
If the patch tracker fails because the image moves too fast, you just have to reinitialize the system by applying SIFT again. I would use FAST instead of SIFT: you can use SIFT for the marker, and then FAST to detect keypoints in real time, generating SIFT descriptors for them.
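To make the local search concrete, here is a small dependency-free sketch of a single-scale patch tracker. It uses a sum of squared differences as the matching score, which is an assumption on my part (cv::matchTemplate offers this and other scores). The `Gray` and `Match` types and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical 8-bit grayscale image stored row by row.
struct Gray {
    int width, height;
    std::vector<std::uint8_t> data;
    std::uint8_t at(int x, int y) const {
        return data[static_cast<std::size_t>(y) * width + x];
    }
};

struct Match { int x, y; long long ssd; };

// Sum of squared differences between the patch and the frame region whose
// top-left corner is (x, y). The region must lie inside the frame.
long long ssdAt(const Gray& frame, const Gray& patch, int x, int y) {
    long long sum = 0;
    for (int j = 0; j < patch.height; ++j)
        for (int i = 0; i < patch.width; ++i) {
            const int d = int(frame.at(x + i, y + j)) - int(patch.at(i, j));
            sum += static_cast<long long>(d) * d;
        }
    return sum;
}

// Searches only within +/- radius pixels of the previous top-left corner.
Match trackPatch(const Gray& frame, const Gray& patch,
                 int prevX, int prevY, int radius) {
    Match best{prevX, prevY, std::numeric_limits<long long>::max()};
    for (int y = prevY - radius; y <= prevY + radius; ++y)
        for (int x = prevX - radius; x <= prevX + radius; ++x) {
            if (x < 0 || y < 0 || x + patch.width > frame.width ||
                y + patch.height > frame.height)
                continue;  // candidate window falls outside the frame
            const long long s = ssdAt(frame, patch, x, y);
            if (s < best.ssd) best = Match{x, y, s};
        }
    return best;
}
```

A large `ssd` in the returned match is a reasonable signal that the patch was lost and that it is time to reinitialize with a full detection.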
Detecting and tracking objects in a video is a very large topic, and the way to go depends heavily on your application. There is no magic bullet!
Once you have the detection part working, you can try tracking by mean shift on a color likelihood (maybe in HSV color space) if the object you need to track is colored, or try template matching, among other options. You need to be more specific about your needs.
You can use optical flow for simple tracking. Here are the steps:
Find the corners of the moving object using the Harris corner detector or the SIFT feature detector.
Pass those corners, the previous image (in which you found the corners of the object to be tracked) and the next image to the optical flow function; it will compute the positions of the same corners in the next image.
NOTE: if you are dealing with problems like occlusion handling or multiple-people tracking, then optical flow alone cannot solve them. For those, a Kalman filter or particle filters need to be employed.
You can achieve almost perfect, real-time tracking using TLD or CMT. Once you have detected the object of interest, use its bounding box to initialize the tracker (TLD is also known as Predator).
You can read about CMT here:
https://www.gnebehay.com/cmt/
and about TLD here:
https://www.gnebehay.com/tld/
I am trying to do image detection in C++. I have two images:
Image Scene: 1024x786
Person: 36x49
And I need to identify this particular person from the scene. I've tried to use Correlation but the image is too noisy and therefore doesn't give correct/accurate results.
I've been thinking/researching methods that would best solve this task and these seem the most logical:
Gaussian filters
Convolution
FFT
Basically, I would like to remove the noise from the images, so that I can then use Correlation to find the person more effectively.
I understand that an FFT will be hard to implement and/or may be slow especially with the size of the image I'm using.
Could anyone offer any pointers to solving this? What would the best technique/algorithm be?
In Andrew Ng's Machine Learning class we did this exact problem using neural networks and a sliding window:
Train a neural network to recognize the particular feature you're looking for, using labelled data that says what each image contains and a 36x49 window (or whatever other size you want).
To recognize a new image, take the 36x49 rectangle and slide it across the image, testing at each location. When you move to a new location, move the window right by a certain number of pixels, call it the jump_size (say 5 pixels). When you reach the right-hand side of the image, go back to x = 0 and increment the y of your window by jump_size.
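The window enumeration itself is simple enough to sketch. The snippet below only generates the window positions; the classifier you would run on each window is not shown. The `Window` type and the function name `slidingWindows` are assumptions for the example.

```cpp
#include <vector>

struct Window { int x, y, width, height; };

// Lists every window position obtained by sliding a winW x winH rectangle
// across an imageW x imageH image in steps of jumpSize pixels,
// row by row from the top-left corner.
std::vector<Window> slidingWindows(int imageW, int imageH,
                                   int winW, int winH, int jumpSize) {
    std::vector<Window> windows;
    if (jumpSize <= 0 || winW > imageW || winH > imageH) return windows;
    for (int y = 0; y + winH <= imageH; y += jumpSize)
        for (int x = 0; x + winW <= imageW; x += jumpSize)
            windows.push_back(Window{x, y, winW, winH});
    return windows;
}
```

For the 1024x786 scene and the 36x49 window from the question, a jump_size of 5 gives 198 x 148 = 29304 windows to classify, which gives a feel for the cost of a smaller step.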
Neural networks are good for this because the noise isn't a huge issue: you don't need to remove it. It's also good because it can recognize images similar to ones it has seen before, but are slightly different (the face is at a different angle, the lighting is slightly different, etc.).
Of course, the downside is that you need the training data to do it. If you don't have a set of pre-tagged images then you might be out of luck - although if you have a Facebook account you can probably write a script to pull all of yours and your friends' tagged photos and use that.
An FFT only makes sense once you have already sorted the image with a k-d tree or a hierarchical tree. I would suggest mapping the image's 2D RGB values to a 1D curve and reducing some of the complexity before doing a frequency analysis.
I do not have an exact algorithm to propose, because I have found that target detection methods depend greatly on the specific situation. Instead, I have some tips and advice. Here is what I would suggest: find a specific characteristic of your target and design your code around it.
For example, if you have access to the color image, use the fact that Wally doesn't have much green and blue color. Subtract the average of blue and green from the red image, you'll have a much better starting point. (Apply the same operation on both the image and the target.) This will not work, though, if the noise is color-dependent (ie: is different on each color).
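A minimal sketch of that color transform, assuming 8-bit RGB pixels. The `Rgb` type and the function name are made up for the example, and clamping negative results to zero is my own choice rather than something stated above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

// For each pixel, subtracts the average of green and blue from red.
// Negative results are clamped to 0 (an assumption of this sketch).
std::vector<std::uint8_t> redMinusGreenBlue(const std::vector<Rgb>& pixels) {
    std::vector<std::uint8_t> out;
    out.reserve(pixels.size());
    for (const Rgb& p : pixels) {
        const int value = int(p.r) - (int(p.g) + int(p.b)) / 2;
        out.push_back(static_cast<std::uint8_t>(std::max(0, value)));
    }
    return out;
}
```

Run it on both the scene and the target, then correlate the two single-channel results.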
You could then use correlation on the transformed images with better results. The downside of correlation is that it only works with an exact cut-out of the first image... Not very useful if you need to have found the target already in order to find the target! Instead, I suppose that an averaged version of your target (a combination of many Wally pictures) would work up to a point.
My final advice: in my personal experience of working with noisy images, spectral analysis is usually a good thing, because the noise tends to contaminate only one particular scale (which would hopefully be a different scale than Wally's!). In addition, correlation is mathematically equivalent to comparing the spectral characteristics of your image and the target.
I am rather new to C++ and openFrameworks. I am beginning to play with manipulating objects using the Lucas-Kanade technique. I am having some success with pushing objects around, but unfortunately I cannot figure out how to rotate them properly, or even how to detect when rotational movement is occurring.
Does anyone have any pointers or tips they would like to share?
Many thanks,
N
Optical flow calculations won't on their own help you detect things like "rotational movement". Basically, all the optical flow calc is doing is looking at changes pixel-by-pixel, while what you mean by rotation is a larger aggregate of pixel change. An algorithm would need to detect something like "all the pixels on the edge of the object are flowing in a (counter-)clockwise direction". Very difficult to do, and I don't think there's anything in OpenFrameworks or OpenCV that will help you.
Are you trying to detect rotation of an object in the image, or rotation-like movements in the image that will affect a virtual object? If it's the former, I think there are OpenCV techniques for identifying objects and then tracking them, including things like rotation. I think the things to research are like "opencv object tracking" and "opencv object motion analysis".
Computing the 2x3 affine transformation matrix of your motion could be a solution. The affine transformation matrix captures translation and rotation as well as scaling. If you are using OpenCV, then cv::getAffineTransform is what you are looking for; it takes three pairs of corresponding points, so you can pass it three of your tracked feature points directly.
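To show how a rotation angle falls out of tracked points, here is a dependency-free sketch that fits a rotation plus uniform scale (a similarity, i.e. a restricted affine transform) to any number of point pairs by least squares. This is a swapped-in simplification, not what cv::getAffineTransform does internally. The `Pt` and `Similarity` types and the function name `estimateRotation` are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };
struct Similarity { double angleRad, scale; };

// Least-squares rotation angle and uniform scale mapping the `before`
// points onto the `after` points. Translation is removed by centring both
// point sets on their centroids.
Similarity estimateRotation(const std::vector<Pt>& before,
                            const std::vector<Pt>& after) {
    Similarity result{0.0, 1.0};
    const std::size_t n = before.size();
    if (n < 2 || after.size() != n) return result;
    double cx0 = 0, cy0 = 0, cx1 = 0, cy1 = 0;
    for (std::size_t i = 0; i < n; ++i) {
        cx0 += before[i].x; cy0 += before[i].y;
        cx1 += after[i].x;  cy1 += after[i].y;
    }
    cx0 /= n; cy0 /= n; cx1 /= n; cy1 /= n;
    double a = 0, b = 0, d = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const double x = before[i].x - cx0, y = before[i].y - cy0;
        const double u = after[i].x - cx1,  v = after[i].y - cy1;
        a += x * u + y * v;  // accumulates scale * cos(angle)
        b += x * v - y * u;  // accumulates scale * sin(angle)
        d += x * x + y * y;
    }
    if (d == 0.0) return result;  // all points coincide
    result.angleRad = std::atan2(b, a);
    result.scale = std::sqrt(a * a + b * b) / d;
    return result;
}
```

Feeding it the Lucas-Kanade point positions from two consecutive frames gives a per-frame rotation that you can apply to your virtual object. Note that with image coordinates (y pointing down) a positive angle looks clockwise on screen.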