Which object recognition algorithm should I use? - c++

I am pretty new to CV, so forgive my stupid questions...
What I want to do:
I want to recognize an RC plane in live video (for now it's only a recorded video).
What I have done so far:
Differences between frames
Convert it to grey scale
GaussianBlur
Threshold
findContours
Here are some example frames:
But there are also frames with noise, so there are more objects in the frame.
I thought I could do something like this:
Use some object recognition algorithm for every contour that has been found. And compute only the feature vector for each of these bounding rectangles.
Is it possible to compute SURF/SIFT/... only for a specific patch (smaller part) of the image?
Since it will be important that the algorithm is capable of processing real time video I think it will only be possible if I don't look at the whole image all the time?! Or maybe decide for example if there are more than 10 bounding rectangles I check the whole image instead of every rectangle.
Then I will look at the next frame and try to match my feature vector with the previous one. That way I will be able to trace my objects. Once these objects cross the red line in the middle of the picture it will trigger another event. But that's not important here.
I need to make sure that not every object which is crossing or behind that red line is triggering that event. So there need to be at least 2 or 3 consecutive frames which contain that object and if it crosses then and only then the event should be triggered.
There are so many variations of object recognition algorithms that I am a bit overwhelmed.
SIFT/SURF/ORB/... you get what I am saying.
Can anyone give me a hint which one I should choose, or whether what I am doing even makes sense?

Assuming the plane location doesn't change a lot from one frame to the next, I think you should look at object tracking instead of trying to estimate the location independently in each frame.
http://docs.opencv.org/modules/video/doc/motion_analysis_and_object_tracking.html
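Once an object is tracked from frame to frame, the rule from the question (only fire the event if the object has been present for 2 or 3 consecutive frames before it crosses the line) is mostly bookkeeping. A minimal sketch, assuming the red line is vertical at x = lineX and that some tracker already tells you per frame whether the object was seen and where; TrackState and updateAndCheckCrossing are hypothetical names, not OpenCV API:

```cpp
#include <cassert>

// Hypothetical per-object state (illustrative, not part of any library).
struct TrackState {
    int consecutiveFrames = 0; // frames in a row in which the object was seen
    double lastX = 0.0;        // x position in the previous frame
    bool fired = false;        // the event has already been triggered
};

// Update the track with this frame's observation.
// Returns true exactly once: when the object has been seen in at least
// minFrames consecutive frames and its x position moves across lineX.
bool updateAndCheckCrossing(TrackState& t, bool seen, double x,
                            double lineX, int minFrames)
{
    assert(minFrames >= 1);
    if (!seen) {                 // a missed frame resets the history
        t.consecutiveFrames = 0;
        return false;
    }
    const bool hadHistory = t.consecutiveFrames >= 1;
    const double prevX = t.lastX;
    ++t.consecutiveFrames;
    t.lastX = x;
    if (t.fired || !hadHistory || t.consecutiveFrames < minFrames) {
        return false;
    }
    // The object crossed if it is now on the other side of the line.
    const bool crossed = (prevX < lineX) != (x < lineX);
    if (crossed) {
        t.fired = true;
    }
    return crossed;
}
```

Noise blobs that show up for a single frame never accumulate enough history, so they cannot trigger the event even if they appear right on the line.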

Related

Can I track objects by mapping their coordinates from a sequence of images?

I have a video of simple moving dots (that sometimes overlap) that is saved as a sequence of images. At each image I detect all the dots and save their coordinates:
(snapshot 1 -> snapshot 2)
I would like to infer the trajectory of each dot. The dots move smoothly and not too fast from one frame to the other, but if for each point of the first image I just find their closest point of the next image it often fails to reconstruct the trajectory.
I tried OpenCV's multi-trackers, but the trackers very quickly lose their target by jumping to a different dot when the dots overlap. The detection works very nicely though.
The video and the objects to track are simple. I do not want to believe that I need to implement something more technical to accurately track these dots. Which is why I decided to ask here, I am out of ideas. Any tip or advice is appreciated... Thanks.

Detect a 2 x 3 Matrix of white dots in an image

I want to locate a service robot via infrared landmarks. The idea is to detect two landmarks, get the distance to each, and calculate the robot's position from this information (the positions of the landmarks are known).
For this I have built an artificial 2x3 matrix of IR LEDs, which are visible in the robot's infrared camera image (shown in the image below).
As the first step, I want to detect a single landmark in a picture and get its x-y coordinates. I can use these coordinates in the future to get the distance from the depth image provided.
My first approach was to convert the image to black and white. Then I tried to filter out different clusters of points (which I dilated and contoured first). I couldn't succeed with this method.
Now I wonder if there are any pattern recognition/computer vision methods which can help me to quite "easily" detect the pattern.
I've added a picture of the infrared image with the landmark in it and a converted black/white image.
a) Which method can help me to solve this problem?
b) Should I use a 3x3 Matrix or any other geometric form instead of the 2x3 Matrix ?
IR-Image
Black-White Image
A direct answer:
1) find all small circles in the image; 2) look among these small circles for ones that are the same size and close together, and, say, form parallel lines.
The reason for this approach is that you have coded the robot with a specific pattern of small objects. Therefore, look for the objects and then look for the pattern. (If the orientation and size wouldn't change, then you could just look for a sub-image within the larger image, but because it can, you need to look for elements of the pattern that remain consistent with motion in the 3D space, that is, the parallel lines.)
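The second step (grouping circles of the same size that lie close together) could look roughly like this. It is only a sketch: it assumes the circles were already detected and reduced to a centre and a radius, Blob and findDotGroup are made-up names, and the check for parallel lines is left out.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Blob { double x, y, r; }; // centre and radius of a detected small circle

// For each blob, collect the blobs (including itself) that have a similar
// radius and lie within maxDist of it. Returns the indices of the first group
// with exactly `count` members (6 for a 2x3 landmark), or an empty vector.
std::vector<std::size_t> findDotGroup(const std::vector<Blob>& blobs,
                                      std::size_t count,
                                      double maxDist,
                                      double radiusTol) // relative, e.g. 0.3 = 30%
{
    for (std::size_t i = 0; i < blobs.size(); ++i) {
        std::vector<std::size_t> group;
        for (std::size_t j = 0; j < blobs.size(); ++j) {
            const double d = std::hypot(blobs[j].x - blobs[i].x,
                                        blobs[j].y - blobs[i].y);
            const bool similarSize =
                std::fabs(blobs[j].r - blobs[i].r) <= radiusTol * blobs[i].r;
            if (d <= maxDist && similarSize) {
                group.push_back(j);
            }
        }
        if (group.size() == count) {
            return group;
        }
    }
    return {};
}
```

maxDist has to be at least the diagonal of the landmark as it appears in the image, so it depends on the expected distance to the robot.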
This will work in the example images, but to know whether it will work more generally, we need to know more than you have told us. It depends on whether the variation in the images of the matrix and the variation in the background allow the two to be distinguished this way. If not, you may need a cleverer algorithm or a different pattern of lights. In the extreme case, it's obvious that if you had another 2x3 matrix around, this would not be enough. It all depends on the variation of the object to be identified and the variation within the background scene, and because you don't tell us either of these things, it's hard to say what the best way is, what's good enough, what's a better way, etc.
If you have the choice, and here it sounds like you do, good data is better than clever analysis. For this problem, I'd call good data anything that clearly distinguishes the object from the background. You need to think of it this way: look at what the background is and at all the different perspectives on the lights that are possible, and make sure these can never be confused.
For example, if you have a lot of control over this, and enough time, temporal variations are often the easiest. Turning the lights (or a subset of the lights) on and off, etc., and then looking for the expected temporal variation is often the surest way to distinguish signal from noise. But really, this again is just making an assumption about the background and foreground (i.e., that the background won't vary with some particular time pattern).
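To make the blinking idea concrete, here is a small sketch that scores how well one candidate spot follows a known on/off pattern. It assumes you already sample the brightness of the spot once per frame and that the camera frames are synchronised with the pattern; blinkMatchScore is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction of frames in which a candidate's on/off state matches the expected
// blink pattern. `brightness` holds one sample per frame for one candidate
// spot; a sample above `threshold` counts as "on".
double blinkMatchScore(const std::vector<double>& brightness,
                       const std::vector<bool>& expectedOn,
                       double threshold)
{
    assert(brightness.size() == expectedOn.size());
    if (brightness.empty()) {
        return 0.0;
    }
    std::size_t matches = 0;
    for (std::size_t i = 0; i < brightness.size(); ++i) {
        const bool on = brightness[i] > threshold;
        if (on == expectedOn[i]) {
            ++matches;
        }
    }
    return static_cast<double>(matches) / static_cast<double>(brightness.size());
}
```

A constantly bright background light only agrees with an alternating pattern about half of the time, while the real LED scores close to 1.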

Track numbered markers in a video

I have a video which has frames as shown in my previous image in this question.
How do we detect points from a picture with a particular color on those points
I detected these markers and numbered them as shown in the image given below:
My problem is as follows. After I have detected markers in one frame, I need to detect them in another frame and find out how much each marker has moved from its previous location. However, when I run my code again on the second frame, I sometimes get a different numbering of the markers, so I cannot track markers from one image to the next. Detecting the markers in each image is also cumbersome and takes a lot of time for a video of around 200 frames.
How can I track these markers across images so that I know how much a particular marker has moved between frames? Or, more simply, how can I number these markers so that the numbering never changes, i.e. the marker numbered 60 remains marker number 60 from frame 1 to frame 200?
As a side question, is there a way to decrease the processing time so that I don't have to detect the face and eyes in each and every frame? (Please refer to the image in the link in my previous question; it makes things clearer.)
Maybe consider using an optical flow technique: http://robotics.stanford.edu/~dstavens/cs223b/ ?
Alternatively, try to divide your point cloud into smaller parts and then detect contours. You can divide it using lines or by using this simple idea (not tested or analysed):
1. Find the convex hull of all points (http://en.wikipedia.org/wiki/Convex_hull_algorithms) in your point cloud.
2. Points which are on the border form one group.
3. After processing the points in the group from step 2, delete them.
4. Go to step 1.
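The hull-peeling loop above can be sketched with the standard library alone. This is one possible reading of the idea, using Andrew's monotone chain for the hull; Pt, convexHull and peelLayers are illustrative names. Note that this version keeps only the hull vertices in each group, so points lying exactly on a hull edge fall into the next group.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Pt { double x, y; };

// Cross product of (a - o) and (b - o); positive for a counter-clockwise turn.
static double cross(const Pt& o, const Pt& a, const Pt& b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Andrew's monotone chain: hull vertices in counter-clockwise order.
std::vector<Pt> convexHull(std::vector<Pt> p) {
    std::sort(p.begin(), p.end(), [](const Pt& a, const Pt& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    if (p.size() < 3) return p;
    std::vector<Pt> h(2 * p.size());
    std::size_t k = 0;
    for (std::size_t i = 0; i < p.size(); ++i) {                  // lower hull
        while (k >= 2 && cross(h[k - 2], h[k - 1], p[i]) <= 0) --k;
        h[k++] = p[i];
    }
    for (std::size_t i = p.size() - 1, t = k + 1; i > 0; --i) {   // upper hull
        while (k >= t && cross(h[k - 2], h[k - 1], p[i - 1]) <= 0) --k;
        h[k++] = p[i - 1];
    }
    h.resize(k - 1); // the last point repeats the first one
    return h;
}

// Steps 1-4: take the hull as one group, delete its points, repeat.
std::vector<std::vector<Pt>> peelLayers(std::vector<Pt> pts) {
    std::vector<std::vector<Pt>> layers;
    while (!pts.empty()) {
        std::vector<Pt> hull = convexHull(pts);
        auto onHull = [&hull](const Pt& q) {
            return std::any_of(hull.begin(), hull.end(), [&q](const Pt& v) {
                return v.x == q.x && v.y == q.y;
            });
        };
        pts.erase(std::remove_if(pts.begin(), pts.end(), onHull), pts.end());
        layers.push_back(std::move(hull));
    }
    return layers;
}
```

Each layer is a much smaller set of markers than the full cloud, which is the point of the division.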
As a side question is there a way to actually decrease the processing time such that I don't have to detect the face and eyes in
each and every frame
There are a few easy things you can do to decrease processing time:
* Don't load the Haar cascade while processing each frame. Load it only once, before you start getting frames from the camera/video file.
* If you need to find only one face in each frame, use the CV_HAAR_FIND_BIGGEST_OBJECT flag: the search will return only one (the biggest) object. It should be much faster, because the search starts from the biggest window and, as soon as the Haar detector finds one object, it aborts the search and returns that object.
* Play with the parameters and try different cascades.
* Once you find a face in frame n, don't search the whole of frame n+1. Expand the rectangle in which you found the face in frame n and search only in this expanded rectangle. How much should you expand it? It depends on how fast the user can move his head ;) 50% is a big tolerance, but it's also slow. The best option is to find this value on your own.
* If your image won't change very much, you can skip detecting the face in most frames and just assume that it's in the same place as in the previous frame; just check whether the frame has changed much. The simplest method is the one described in "Motion detection using OpenCV" (as the author mentions, it's a good idea to apply a binary threshold to the result of the subtraction to ignore changes caused by noise). I used this method in my BSc thesis (an eye-tracking system) and it worked very well and improved the speed of the whole system. Note: it's a good idea to force a normal search (using the Haar cascade) from time to time (I decided to do this once every 3 frames, but you can try searching less often). This avoids the situation in which the user has moved outside the camera area and the system didn't notice.
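The last two tips boil down to a bit of rectangle arithmetic and a scheduling rule. A rough sketch, assuming the tolerance is applied on every side of the rectangle and that you already compute the fraction of changed pixels from the thresholded frame difference; Rect, expandAndClip and shouldRunDetector are hypothetical names, not OpenCV types:

```cpp
#include <algorithm>

struct Rect { int x, y, w, h; }; // top-left corner, width, height

// Grow `face` by `fraction` of its size on every side and clip to the frame.
Rect expandAndClip(const Rect& face, double fraction, int frameW, int frameH)
{
    const int dx = static_cast<int>(face.w * fraction);
    const int dy = static_cast<int>(face.h * fraction);
    const int x0 = std::max(0, face.x - dx);
    const int y0 = std::max(0, face.y - dy);
    const int x1 = std::min(frameW, face.x + face.w + dx);
    const int y1 = std::min(frameH, face.y + face.h + dy);
    return {x0, y0, x1 - x0, y1 - y0};
}

// Run the cascade search on every `forceEvery`-th frame, or whenever the
// fraction of changed pixels exceeds `maxChanged`; otherwise reuse the
// previous face position.
bool shouldRunDetector(int frameIndex, double changedFraction,
                       int forceEvery, double maxChanged)
{
    return frameIndex % forceEvery == 0 || changedFraction > maxChanged;
}
```

With forceEvery = 3 this matches the "once every 3 frames" setting mentioned above; the value of maxChanged is something you would have to tune on your own footage.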

Region of Interest Uniqueness and Identity

I'm currently working on a computer vision application with OpenCV. The application involves target identification and characteristic determination. Generally, I'm going to have a target cross into the visible region and slowly move through it in a couple of seconds. This should give me upwards of 50-60 frames from the camera in which I'll be able to find the target.
We have successfully implemented the detection algorithms using SWT and OCR (the targets all have alphanumeric identifiers, which makes them relatively easy to pick out). What I want to do is use as much of the data as possible from all 50-60 shots of each target. To do this, I need some way to identify that a particular ROI of image 2 contains the same target as another ROI from image 1.
What I'm asking for is a little advice from anyone who may have come across this before. How can I easily/quickly identify, within a reasonable error margin, that ROI #2 has the same target as ROI #1? My first instinct is something like this:
1. Detect targets in frame 1.
2. Calculate certain unique features of each of the targets in frame 1. Save.
3. Get frame 2.
4. Immediately look for ROIs which have the same features as those calc'd in step 2. Grab these and send them down the line for further processing, skipping step 5.
5. Detect new targets in frame 2.
6. Pass targets to a thread to calculate shape, color, GPS coordinates, etc.
7. Lather, rinse, repeat.
I'm thinking that SURF or SIFT features might be a way to accomplish this, but I'm concerned that they might have trouble identifying targets as the same from frame to frame due to distortion or color fade. I don't know how to set a threshold on SIFT/SURF features.
Thank you in advance for any light you can shed on this matter.
One thing you can do is locally equalize brightness and possibly saturation levels. If you aren't already using a color space such as YCrCb or HSV, I suggest you try one.
Can you assume that the object is not moving too fast? If you feed the previous position into the detection routine, you can decrease the size of the window you are looking at. The same goes for the speed and direction of movement.
I've successfully used histogram composition and shape descriptors of a region to detect it reliably; you can use that on its own or add it to a SURF/SIFT classifier.
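As an illustration of the histogram part, here is a small sketch that compares two ROIs by histogram intersection. It assumes each ROI has been flattened into a list of 8-bit values of one channel (hue, for example); the bin count and the names Histogram, makeHistogram and intersection are my own choices, not the exact method I used.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kBins = 16;
using Histogram = std::array<double, kBins>;

// Normalised histogram of 8-bit channel values taken from one ROI.
Histogram makeHistogram(const std::vector<std::uint8_t>& values) {
    Histogram h{};
    if (values.empty()) {
        return h;
    }
    for (std::uint8_t v : values) {
        h[v * kBins / 256] += 1.0;
    }
    for (double& bin : h) {
        bin /= static_cast<double>(values.size());
    }
    return h;
}

// Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones.
double intersection(const Histogram& a, const Histogram& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < kBins; ++i) {
        sum += std::min(a[i], b[i]);
    }
    return sum;
}
```

Two ROIs would then be treated as the same target when the intersection is above a threshold that you pick from your own data.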

OpenCV Developing Motion detection Software

I am at the start of developing software using OpenCV in Microsoft Visual 2010 Express. What I need to know before I get into coding is the procedure I have to follow.
Overview:
I want to develop software that detects simple boxing moves such as (Left punch, right punch) and outputs the results.
Where I am struggling is what approach I should take and how I should tackle this development, i.e.:
Capture video footage and be able to extract, let's say, every 5th frame for processing.
Do I have to extract and store this frame, and perhaps have a REFERENCE image to subtract the captured frame from?
Once I capture a frame, what would be the best way to process it:
* Threshold it, then
* Detect the edges, then
* Smooth the edges using some filter, then
* Draw some BOUNDING boxes....?
What is your view on this, guys? Am I missing something, or are there better, simpler ways? Any suggestions?
Any answer will be much appreciated.
P.S. It's not my homework :)
I'm not sure if analyzing only every 5th frame will be enough, because usually punches are so fast that they could be overlooked.
I assume what you actually want to find is fast forward (towards camera) movements of fists.
With OpenCV I would first start with such movements of faces, since the software package already provides some examples of how to do that.
To detect and track faces you can use CvHaarClassifierCascade, but since this won't be fast enough for runtime detection, continue tracking a face found this way with Lucas-Kanade. Just pick some good-to-track points inside the previously found face, remember their distance from an arbitrary face middle, and update it at each frame. See this guy http://www.youtube.com/watch?v=zNqCNMefyV8 for an example of just some random points tracked with Lucas-Kanade. Note that unlike faces, fists may not be so easy to track since their surface is rather uniform; better check the Lucas-Kanade demo in OpenCV.
Of course, with each frame the tracked face will drift away from the actual one, so once in a while re-run CvHaarClassifierCascade and interpolate your currently held face position toward its result.
You should be able to do the above for fists too, but that will require training a classifier with pictures of fists (a classifier trained with faces is already provided in OpenCV).
Now, having the fists/face tracked, you may try observing what happens to the points: when someone punches, they move rapidly in some direction, while on a fist that remains still they don't move too much. So when you calculate the average movement of the individual points over recent frames, the higher the value, the bigger the chance that there was a punch. Alternatively, if you've somehow managed to track them accurately and the distance between them increases, that means the object is closer to the camera, and so a punch is likely.
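Both measures are easy to compute once you have the tracked point positions for two frames. A minimal sketch, assuming the tracker gives you the same points in the same order in both frames; Point2, meanDisplacement and spread are illustrative names:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point2 { double x, y; };

// Average distance each tracked point moved between two frames.
double meanDisplacement(const std::vector<Point2>& prev,
                        const std::vector<Point2>& curr)
{
    assert(prev.size() == curr.size());
    if (prev.empty()) return 0.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < prev.size(); ++i) {
        sum += std::hypot(curr[i].x - prev[i].x, curr[i].y - prev[i].y);
    }
    return sum / static_cast<double>(prev.size());
}

// Mean distance of the points from their centroid. If this grows from one
// frame to the next, the points are moving apart, i.e. the object is
// probably getting closer to the camera.
double spread(const std::vector<Point2>& pts)
{
    if (pts.empty()) return 0.0;
    double cx = 0.0, cy = 0.0;
    for (const Point2& p : pts) { cx += p.x; cy += p.y; }
    cx /= static_cast<double>(pts.size());
    cy /= static_cast<double>(pts.size());
    double sum = 0.0;
    for (const Point2& p : pts) {
        sum += std::hypot(p.x - cx, p.y - cy);
    }
    return sum / static_cast<double>(pts.size());
}
```

A large meanDisplacement together with spread(curr) / spread(prev) clearly above 1 would be a candidate punch; the thresholds need trial and error.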
Note that without at least knowing change of a size of the fist in picture, it might be hard to distinguish if a movement of hand was forward or backward, or if the user was faking it by moving fists left or right. You may have to come up with some specialized algorithm (maybe with trial and error) to detect that, like say, increase a number of screen color pixels in location that previously fist was found.
What you are looking for is the research field of action recognition, e.g. www.nada.kth.se/cvap/actions/ . A possible solution is e.g. the STIP (space-time interest points) method: www.di.ens.fr/~laptev/actions/ . But in the end this is a tough job if you have to deal with occlusion or different points of view.