Track numbered markers in a video - C++

I have a video whose frames look like the image in my previous question, "How do we detect points from a picture with a particular color on those points".
I detected these markers and numbered them as shown in the image given below:
My problem is as follows. After I have detected the markers in one frame, I need to detect them in the next frame and find out how much each marker has moved from its previous location. However, when I run my code again on the second frame, I sometimes get a different numbering of the markers, so I am not able to track a marker from one image to the next. Detecting the markers in every image is also cumbersome and takes a lot of time for a video with around 200 frames.
How can I track these markers across images so that I know how much a particular marker has moved between frames? Or, more simply, how can I number these markers so that the numbering never changes, i.e. the marker numbered 60 remains marker number 60 from frame 1 to frame 200?
As a side question, is there a way to decrease the processing time so that I don't have to detect the face and eyes in each and every frame? (Please refer to the image linked in my previous question; it makes things clearer.)

Maybe consider using an optical flow technique - http://robotics.stanford.edu/~dstavens/cs223b/ ?
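For example, here is a minimal sketch (assuming the markers detected in the previous frame are available as points, and using illustrative names) of propagating marker positions to the next frame with pyramidal Lucas-Kanade optical flow, which keeps the original numbering by construction:

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/video/tracking.hpp>
#include <vector>

// prevGray/currGray: two consecutive frames converted to grayscale.
// markers: marker centres detected in the previous frame; indices are
// preserved, so marker number 60 stays marker number 60.
std::vector<cv::Point2f> trackMarkers(const cv::Mat &prevGray,
                                      const cv::Mat &currGray,
                                      const std::vector<cv::Point2f> &markers)
{
    std::vector<cv::Point2f> tracked;
    std::vector<uchar> status;
    std::vector<float> err;

    cv::calcOpticalFlowPyrLK(prevGray, currGray, markers, tracked,
                             status, err, cv::Size(21, 21), 3);

    for (size_t i = 0; i < markers.size(); ++i)
    {
        if (!status[i])
            continue;                                  // marker i was lost; re-detect it
        cv::Point2f motion = tracked[i] - markers[i];  // how far marker i moved
        (void)motion;                                  // use as needed
    }
    return tracked;
}
```

This also helps with the speed concern for the markers themselves: you only run the full detection once (or whenever too many points are lost) and track afterwards.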
Alternatively, try to divide your point cloud into smaller parts and then detect contours. You can divide it using lines, or by using this simple idea (not tested or analysed; a rough sketch follows the list):
1. Find the convex hull of all points in your point cloud (http://en.wikipedia.org/wiki/Convex_hull_algorithms).
2. The points which lie on the hull border form one group.
3. After processing the points from the group found in step 2, delete them.
4. Go back to step 1.
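A rough, untested sketch of that "peeling" loop using cv::convexHull (point type and function name are only illustrative):

```cpp
#include <opencv2/imgproc.hpp>
#include <vector>

// Repeatedly take the convex hull of the remaining points as one group,
// remove those points, and continue until nothing is left.
std::vector<std::vector<cv::Point2f>> peelIntoGroups(std::vector<cv::Point2f> pts)
{
    std::vector<std::vector<cv::Point2f>> groups;
    while (pts.size() > 2)
    {
        std::vector<int> hullIdx;
        cv::convexHull(pts, hullIdx);                // indices of the border points

        std::vector<bool> onHull(pts.size(), false);
        for (int i : hullIdx) onHull[i] = true;

        std::vector<cv::Point2f> group, remaining;
        for (size_t i = 0; i < pts.size(); ++i)
            (onHull[i] ? group : remaining).push_back(pts[i]);

        groups.push_back(group);
        pts.swap(remaining);
    }
    if (!pts.empty()) groups.push_back(pts);         // leftover interior points
    return groups;
}
```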
Regarding the side question - "is there a way to decrease the processing time so that I don't have to detect the face and eyes in each and every frame?":
There are a few easy things you can do to decrease processing time:
* Don't load the Haar cascade while processing each frame - load it only once, before you start reading frames from the camera/video file.
* If you need to find only one face in each frame, use the CV_HAAR_FIND_BIGGEST_OBJECT flag - the search will return only one (the biggest) object. It should be much faster, because the search starts from the biggest window, and additionally, once the Haar detector finds an object it aborts the search and returns that object.
* Play with the parameters and try different cascades.
* Once you have found the face in frame n, don't search the whole of frame n+1 - expand the rectangle in which you found the face in frame n and search only within this expanded rectangle (see the sketch after this list). How much should you expand it? It depends on how fast the user can move their head ;) 50% is a generous tolerance, but it is also slow. The best option is to tune this value on your own.
* If your image won't change very much, you can skip detecting the face in most frames and just assume that it is in the same place as in the previous frame - just check whether the frame has changed much. The simplest method is motion detection using OpenCV (as the author mentions, it's a good idea to apply a binary threshold to the result of the subtraction to ignore changes caused by noise). I used this method in my BSc thesis (an eye-tracking system) and it worked very well and improved the speed of the whole system. Note - it's a good idea to force a normal (Haar cascade) search from time to time (I decided to do this once every 3 frames, but you can try searching less often) - it prevents the situation in which the user has moved outside the camera area and the system hasn't noticed it.
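Here is a rough sketch of the "search only in an expanded rectangle" idea; the 50% expansion, the CascadeClassifier usage and all names are illustrative, not a tuned implementation:

```cpp
#include <opencv2/objdetect.hpp>
#include <vector>

// Look for the face only inside an expanded copy of the previous frame's face rectangle.
cv::Rect detectNearPrevious(cv::CascadeClassifier &faceCascade,
                            const cv::Mat &grayFrame, cv::Rect prevFace)
{
    // Expand the previous rectangle by 50% on every side, clipped to the frame.
    cv::Rect roi = prevFace + cv::Size(prevFace.width, prevFace.height);
    roi -= cv::Point(prevFace.width / 2, prevFace.height / 2);
    roi &= cv::Rect(0, 0, grayFrame.cols, grayFrame.rows);

    std::vector<cv::Rect> faces;
    faceCascade.detectMultiScale(grayFrame(roi), faces, 1.1, 3,
                                 cv::CASCADE_FIND_BIGGEST_OBJECT);
    if (faces.empty())
        return cv::Rect();                // caller should fall back to a full-frame search

    return faces[0] + roi.tl();           // convert back to full-frame coordinates
}
```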

Related

How to process data at less than the camera's frames-per-second rate?

I am not sure how to put my question properly, so here it goes.
I am running an object detection algorithm at 40 frames per second (fps) on a camera which acts as an 'eye' on a robot. I then process the information received from the algorithm and pass the resulting actions to my robot.
The issue is that each time the algorithm runs it gives me a slightly different reading. I guess that is because, as it processes data 40 times per second, it will always give new information. But I don't need new information if my robot doesn't move, as most of the objects are in the same position as in the previous frame.
My question: how can I enhance my algorithm so that it only gives me information when there is a change in object positions, for example by comparing the last frame's reading with the current frame's reading?
I think you should try to find the motion estimation of the image; I think MPEG-4 video uses an algorithm like that.
http://www.img.lx.it.pt/~fp/cav/Additional_material/MPEG4_video.pdf
But if you don't want something so sophisticated, and you just want to see whether the second image is the same as the first one, just subtract them and look at the difference. You can also use a Gaussian filter to cut the high frequencies before subtracting, and apply a threshold to decide whether you want to do the processing or not.
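A minimal sketch of that check, assuming grayscale frames; the blur size and the two thresholds (pixel difference of 25, 1% of changed pixels) are placeholder values to tune:

```cpp
#include <opencv2/imgproc.hpp>

// Returns true if the current frame differs enough from the previous one
// to be worth re-running the object detector.
bool frameHasChanged(const cv::Mat &prevGray, const cv::Mat &currGray)
{
    cv::Mat a, b, diff, mask;

    // Blur to suppress high-frequency noise before comparing.
    cv::GaussianBlur(prevGray, a, cv::Size(5, 5), 0);
    cv::GaussianBlur(currGray, b, cv::Size(5, 5), 0);

    cv::absdiff(a, b, diff);
    cv::threshold(diff, mask, 25, 255, cv::THRESH_BINARY);  // ignore small differences

    double changedFraction = static_cast<double>(cv::countNonZero(mask)) / mask.total();
    return changedFraction > 0.01;                          // more than ~1% of pixels changed
}
```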

Which object recognition algorithm should I use?

I am pretty new to CV, so forgive my stupid questions...
What I want to do:
I want to recognize an RC plane in live video (for now it's only a recorded video).
What I have done so far:
Differences between frames
Convert it to grey scale
GaussianBlur
Threshold
findContours
Here are some example frames:
But there are also frames with noise, so there are more objects in the frame.
I thought I could do something like this:
Use some object recognition algorithm for every contour that has been found, and compute the feature vector only for each of these bounding rectangles.
Is it possible to compute SURF/SIFT/... only for a specific patch (a smaller part) of the image?
Since it is important that the algorithm can process video in real time, I think this will only be possible if I don't look at the whole image all the time?! Or maybe decide, for example, that if there are more than 10 bounding rectangles I check the whole image instead of every rectangle.
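Something like this is what I have in mind - a minimal sketch of computing descriptors only on the patch inside one bounding rectangle (ORB is used here purely as a freely available stand-in for SIFT/SURF, and all names are illustrative):

```cpp
#include <opencv2/features2d.hpp>
#include <vector>

// Compute keypoints/descriptors only on the patch inside one bounding rectangle.
void describePatch(const cv::Mat &grayFrame, const cv::Rect &box,
                   std::vector<cv::KeyPoint> &keypoints, cv::Mat &descriptors)
{
    static cv::Ptr<cv::ORB> orb = cv::ORB::create();

    cv::Mat patch = grayFrame(box);                  // a view of the ROI, no copy
    orb->detectAndCompute(patch, cv::noArray(), keypoints, descriptors);

    // Keypoint coordinates are relative to the patch; shift them back into
    // full-frame coordinates so later matching/tracing can use them directly.
    for (cv::KeyPoint &kp : keypoints)
        kp.pt += cv::Point2f(static_cast<float>(box.x), static_cast<float>(box.y));
}
```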
Then I will look at the next frame and try to match the feature vectors with those of the previous frame. That way I will be able to trace my objects. Once one of these objects crosses the red line in the middle of the picture it will trigger another event. But that's not important here.
I need to make sure that not every object which crosses or sits behind that red line triggers that event. So the object needs to be present in at least 2 or 3 consecutive frames, and only if it then crosses the line should the event be triggered.
There are so many variations of object recognition algorithms that I am a bit overwhelmed.
SIFT/SURF/ORB/... you get what I am saying.
Can anyone give me a hint which one I should choose, or tell me whether what I am doing even makes sense?
Assuming the plane location doesn't change a lot from one frame to the next, I think you should look at object tracking instead of trying to estimate the location independently in each frame.
http://docs.opencv.org/modules/video/doc/motion_analysis_and_object_tracking.html
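As one concrete illustration of the tracking approach from that module, here is a rough mean-shift sketch; it assumes a hue histogram of the plane ("targetHist") was computed once when the plane was first detected, and the names are illustrative:

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/video/tracking.hpp>

// Update the plane's bounding box in the current frame with mean shift,
// given a hue histogram of the target computed at detection time.
cv::Rect trackWithMeanShift(const cv::Mat &frameBGR, cv::Rect box, const cv::Mat &targetHist)
{
    cv::Mat hsv, backProj;
    cv::cvtColor(frameBGR, hsv, cv::COLOR_BGR2HSV);

    // Back-project the target's hue histogram into the current frame.
    const int channels[] = {0};
    float hueRange[] = {0, 180};
    const float *ranges[] = {hueRange};
    cv::calcBackProject(&hsv, 1, channels, targetHist, backProj, ranges);

    // Shift the search window towards the densest region of the back projection.
    cv::meanShift(backProj, box,
                  cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT, 10, 1));
    return box;
}
```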

Can I access a camera frame in two functions running in parallel?

I am working on a face detection/recognition project in OpenCV C++. The code runs really slowly; there is a lag between the real camera feed and the processed feed, and I don't want that lag to be visible to the user.
So can I have one function which just reads a frame from the camera and displays it, while all the detection/recognition work is done in other functions running in parallel?
I also want my result to be visible on the screen (a box around the face with the necessary details), so can I transfer this data between the functions? Can I create a vector of the Rect datatype containing all this rectangle data, which can be accessed by all the functions both to push new faces and to display them?
I am just searching for a solution to this problem. I know little about parallel computing; if there is any other alternative please give details.
Thanks,
Rishi
Yes, you need to run the face detection and recognition code in a separate thread. First you need to copy the frame in order to use it on the other thread.
Using a vector of Rect will be convenient, but you need to lock a mutex when you use the vector, to prevent problems with parallel access to the same data, and you need to lock the mutex while copying the frame.
I should note that if your face detection and recognition code runs very slowly, it will never give you an up-to-date result: the rectangles will be displaced.
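A compact sketch of that arrangement with std::thread and std::mutex; the cascade file name, window name and detection call are placeholders for your actual detection/recognition code:

```cpp
#include <opencv2/objdetect.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/videoio.hpp>
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

std::mutex dataMutex;
cv::Mat sharedFrame;                // latest frame, copied for the worker thread
std::vector<cv::Rect> sharedFaces;  // latest detection results

void detectionWorker(cv::CascadeClassifier &cascade, std::atomic<bool> &running)
{
    while (running)
    {
        cv::Mat frame;
        {
            std::lock_guard<std::mutex> lock(dataMutex);
            if (!sharedFrame.empty()) sharedFrame.copyTo(frame);
        }
        if (frame.empty()) continue;

        cv::Mat gray;
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        std::vector<cv::Rect> faces;
        cascade.detectMultiScale(gray, faces);      // slow work happens outside the lock

        std::lock_guard<std::mutex> lock(dataMutex);
        sharedFaces = faces;
    }
}

int main()
{
    cv::CascadeClassifier cascade("haarcascade_frontalface_default.xml"); // path is illustrative
    std::atomic<bool> running(true);
    std::thread worker(detectionWorker, std::ref(cascade), std::ref(running));

    cv::VideoCapture cap(0);
    cv::Mat frame;
    while (cap.read(frame))
    {
        std::vector<cv::Rect> faces;
        {
            std::lock_guard<std::mutex> lock(dataMutex);
            frame.copyTo(sharedFrame);              // hand the newest frame to the worker
            faces = sharedFaces;                    // take the latest (possibly older) results
        }
        for (const cv::Rect &r : faces)
            cv::rectangle(frame, r, cv::Scalar(0, 255, 0), 2);
        cv::imshow("camera", frame);
        if (cv::waitKey(1) == 27) break;            // ESC quits
    }
    running = false;
    worker.join();
    return 0;
}
```

The display loop never blocks on the slow detection, so the shown feed keeps up with the camera; the rectangles just trail behind by however long the detection takes.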
First of all, note one thing - there will always be some lag. Even if you just display the video from the camera (without any processing), it will be a bit delayed.
It is also important to optimize the face detection itself; parallel computing won't fix all your problems. Here I've written a bit about that (although it's mostly about eye detection within a face). Another technique worth trying is to check whether the region (the part of the image) in which you found the face in the last frame has changed or not. The general idea is quite simple - subtract that region of the new (current) frame from the same region of the old (previous) frame. Then apply a binary threshold to the result (you need to find the threshold value yourself by trying different values - I'm not sure, but I think I used something around 30; don't use too small a value, because there is always some difference between two frames due to noise and small changes in lighting, etc.). Then count all non-zero pixels, divide this number by the total number of pixels in the region (= width * height) and multiply by 100. This number is the percentage of changed pixels. If this value is small, you don't have to analyse the current frame; you can just assume that the results of the analysis of the previous frame are still valid and correct. Note that this technique only works well if the background isn't changing quickly (like, for example, trees or water).
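A small sketch of that percentage-of-changed-pixels check on the previous face region; the threshold of 30 is the rough value mentioned above:

```cpp
#include <opencv2/imgproc.hpp>

// Percentage of pixels inside `region` that changed between two grayscale frames.
double changedPercent(const cv::Mat &prevGray, const cv::Mat &currGray,
                      const cv::Rect &region)
{
    cv::Mat diff, mask;
    cv::absdiff(prevGray(region), currGray(region), diff);
    cv::threshold(diff, mask, 30, 255, cv::THRESH_BINARY);   // ignore noise-level differences
    return 100.0 * cv::countNonZero(mask) / (region.width * region.height);
}

// Usage idea: if changedPercent(...) is below a few percent, skip the Haar
// search and reuse the face rectangle from the previous frame.
```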

Region of Interest Uniqueness and Identity

I'm currently working on a computer vision application with OpenCV. The application involves target identification and characteristic determination. Generally, a target will cross into the visible region and slowly move through it in a couple of seconds. This should give me upwards of 50-60 frames from the camera in which I'll be able to find the target.
We have successfully implemented the detection algorithms using SWT and OCR (the targets all have alphanumeric identifiers, which makes them relatively easy to pick out). What I want to do is use as much of the data as possible from all 50-60 shots of each target. To do this, I need some way to identify that a particular ROI of image 2 contains the same target as another ROI from image 1.
What I'm asking for is a little advice from anyone who may have come across this before. How can I easily/quickly identify, within a reasonable error margin, that ROI #2 contains the same target as ROI #1? My first instinct is something like this:
1. Detect targets in frame 1.
2. Calculate certain unique features of each of the targets in frame 1. Save them.
3. Get frame 2.
4. Immediately look for ROIs which have the same features as those calculated in step 2. Grab these and send them down the line for further processing, skipping step 5.
5. Detect new targets in frame 2.
6. Pass targets to a thread to calculate shape, color, GPS coordinates, etc.
7. Lather, rinse, repeat.
I'm thinking that SURF or SIFT features might be a way to accomplish this, but I'm concerned that they might have trouble identifying targets as the same from frame to frame due to distortion or color fade. I don't know how to set a threshold on SIFT/SURF features.
Thank you in advance for any light you can shed on this matter.
One thing you can do is locally equalize the brightness and possibly saturation levels. If you aren't using a color space such as YCrCb or HSV, I suggest you try them.
Can you assume that the object is not moving too fast? If you feed the previous position into the detection routine, you can decrease the size of the window you are looking at. The same goes for the speed and direction of movement.
I've successfully used histogram comparison and shape descriptors of a region in order to reliably detect it; you can use that on its own, or add it to a SURF/SIFT classifier.
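For instance, here is a minimal sketch of comparing the hue histograms of two ROIs with cv::compareHist to decide whether they hold the same target; the bin count and the 0.8 threshold are illustrative values to tune on real data:

```cpp
#include <opencv2/imgproc.hpp>

// Compare the hue histograms of two ROIs; a high correlation suggests
// they contain the same target.
bool sameTarget(const cv::Mat &roi1BGR, const cv::Mat &roi2BGR)
{
    auto hueHist = [](const cv::Mat &roiBGR) {
        cv::Mat hsv, hist;
        cv::cvtColor(roiBGR, hsv, cv::COLOR_BGR2HSV);
        int channels[] = {0};
        int histSize[] = {30};
        float hueRange[] = {0, 180};
        const float *ranges[] = {hueRange};
        cv::calcHist(&hsv, 1, channels, cv::Mat(), hist, 1, histSize, ranges);
        cv::normalize(hist, hist, 1.0, 0.0, cv::NORM_L1);    // make histograms comparable
        return hist;
    };

    double similarity = cv::compareHist(hueHist(roi1BGR), hueHist(roi2BGR),
                                        cv::HISTCMP_CORREL);
    return similarity > 0.8;
}
```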

OpenCV: Developing Motion Detection Software

I am at the start of developing software using OpenCV in Microsoft Visual Studio 2010 Express. What I need to know before I get into coding is the procedure I have to follow.
Overview:
I want to develop software that detects simple boxing moves, such as a left punch or a right punch, and outputs the results.
Now, where I am struggling is what approach I should take and how I should tackle this development, i.e.
Capture video footage and be able to extract, let's say, every 5th frame for processing.
Do I have to extract and store this frame, and perhaps keep a REFERENCE image to subtract the captured frame from?
Once I capture a frame, what would be the best way to process it:
* Threshold it, then
* Detect the edges, then
* Smooth the edges using some filter, then
* Draw some BOUNDING boxes....?
What is your view on this, guys? Am I missing something, or are there better, simpler ways? Any suggestions?
Any answer will be much appreciated.
PS: it's not my homework :)
I'm not sure if analyzing only every 5th frame will be enough, because punches are usually so fast that they could be missed.
I assume what you actually want to find is fast forward (towards the camera) movements of the fists.
In the case of OpenCV I would first start with such movements of faces, since some examples of how to do that are already provided with the software package.
To detect and track faces you can use CvHaarClassifierCascade, but since this won't be fast enough for run-time detection, continue tracking the found face with Lucas-Kanade. Just pick some good-to-track points inside the previously found face, remember their distance from an arbitrary face centre, and update it in each frame. See this video http://www.youtube.com/watch?v=zNqCNMefyV8 - an example of some random points tracked with Lucas-Kanade. Note that unlike faces, fists may not be so easy to track since their surface is rather uniform; better check the Lucas-Kanade demo in OpenCV.
Of course, with each frame the tracked face will drift away, so once in a while re-run CvHaarClassifierCascade and interpolate your currently held face position towards its result.
You should be able to do the above for fists as well, but that will require training a classifier with pictures of fists (a classifier trained on faces is already provided with OpenCV).
Now, having the fists/face tracked, you may try observing what happens to the points - when someone punches, they move rapidly in some direction, while on a fist that remains still they don't move much. So, when you calculate the average movement of the individual points over recent frames, the higher the value, the bigger the chance that there was a punch. Alternatively, if you have somehow managed to track the points accurately, an increasing distance between them means the object is getting closer to the camera - and so a punch is likely.
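A tiny sketch of that average-movement check over the Lucas-Kanade-tracked points; the example threshold in the comment is a placeholder to tune:

```cpp
#include <opencv2/core.hpp>
#include <cmath>
#include <vector>

// Average displacement (in pixels) of the tracked points between two frames.
// A large value over a few consecutive frames suggests a punch.
double averageMovement(const std::vector<cv::Point2f> &prevPts,
                       const std::vector<cv::Point2f> &currPts)
{
    if (prevPts.empty() || prevPts.size() != currPts.size())
        return 0.0;

    double total = 0.0;
    for (size_t i = 0; i < prevPts.size(); ++i)
    {
        cv::Point2f d = currPts[i] - prevPts[i];
        total += std::sqrt(static_cast<double>(d.x * d.x + d.y * d.y));
    }
    return total / prevPts.size();
}

// e.g. if (averageMovement(prev, curr) > 15.0) { /* likely a punch */ }
```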
Note that without at least knowing how the size of the fist changes in the picture, it might be hard to distinguish whether a hand movement was forward or backward, or whether the user faked it by moving the fists left or right. You may have to come up with some specialized algorithm (maybe by trial and error) to detect that, like, say, checking whether the number of fist-coloured pixels increases in the location where the fist was previously found.
What you are looking for is the research field of action recognition, e.g. www.nada.kth.se/cvap/actions/. A possible solution is, for example, the STIP (space-time interest points) method, www.di.ens.fr/~laptev/actions/. But ultimately this is a tough job if you have to deal with occlusion or different points of view.