i am working on face detection - recognition project in opencv c++ , the code works really slow , there is a lag between the real camera feed and the processed feed , i dont want that lag to be visible to the user .
so can i have a function which just reads a frame from camera and displays it . and all the detection/recognition work can be done on other functions running in parallel ?
also i want my result to be visible on the screen ( a box around the face with necessary details) so can i transfer this data across functions . can i create a vector of Rect datatype which contains all these rectangle data , which can be accessed by all the functions to push new faces and to display them?
i am just searching for a solution to this problem , i know little about parallel computing , if there is any other alternative please give details
thanks
Rishi
Yes, you need to run face detection and recognition code in a separate thread. First you need to copy frame to use it on another thread.
Using vector of Rect will be convinient. But you need to lock mutex when you use vector to prevent problems with parallel access to the same data. And you need to lock mutex while copying frame.
I should note that if your face detection and recognition code runs very slowly, it will never give you up-to-date result: rectangles will be displaced.
First of all note one thing - there will be always some lag. Even if you just display image video from the camera (without any processing) it will be a bit delayed.
It's also important to optimize the process of face detection, parallel computing won't fix all you problems. Here i've written a bit about that (but it's mostly about eye detection within face). Anther technique which it's worth trying is checking whether region (part of image) in which you have found face in last frame have changed or not. General idea is quite simple - subtract region of new (actual) frame from the same region of old (previous one) frame. Then on the result image use binary threshold operation (you need to find threshold value on you own by trying different values - i'm not sure, but i think that i've used something about 30 - don't use too small value, because there is always some difference between two frames, because of noise and little changes in lighting etc). Then count all non-zero pixels and divide this number by number off all pixels of this region ( = width * height ) and multiply by 100. This number will be percentage of changed pixels. If this value is small, you don't have analyze current frame, you can just assume that results of analysis from previous frame are still actual and correct. Note that this technique is working fine only if background isn't changing quickly (like for example trees or water).
Related
i am not sure of how to put my question properly so here it goes.
I am running an object detection algorithm which runs at 40 frame per seconds (fps) and fitted on a camera which acts as an 'eye' on a robot. Then, I process the information which is received from the algorithm and pass the actions to my robot.
The issue is each time, the algorithm runs, it gives me slightly new reading. I guess its because as it processes data every 40 times per second, it will give new information. But I don't need new information if my robot doesn't move as most of the objects are at the same position at the previous frame.
My question, how can i only enhance my algorithm to only give me information each time if there is a change in object positions? by comparing last frame reading with current frame reading for example
I think you should try to find the motion estimation of the image ,I think MPG-4 video is using an algorithm like that.
http://www.img.lx.it.pt/~fp/cav/Additional_material/MPEG4_video.pdf
But if you don't want something so sophisticated and you just want to be see if the second image is the sane with the first one just substract them and see the differance. You can also use a Gaussian filter to cut the high frequencies and subtract them and also put a threshhold to check if you want do the procesing or not
I am pretty new to CV, so forgive my stupid questions...
What I want to do:
I want to recognize a RC plane in live video (for now its only a recorded video).
What I have done so far:
Differences between frames
Convert it to grey scale
GaussianBlur
Threshold
findContours
Here are some example frames:
But there are also frames with noise, so there are more objects in the frame.
I thought I could do something like this:
Use some object recognition algorithm for every contour that has been found. And compute only the feature vector for each of these bounding rectangles.
Is it possible to compute SURF/SIFT/... only for a specific patch (smaller part) of the image?
Since it will be important that the algorithm is capable of processing real time video I think it will only be possible if I don't look at the whole image all the time?! Or maybe decide for example if there are more than 10 bounding rectangles I check the whole image instead of every rectangle.
Then I will look at the next frame and try to match my feature vector with the previous one. That way I will be able to trace my objects. Once these objects cross the red line in the middle of the picture it will trigger another event. But that's not important here.
I need to make sure that not every object which is crossing or behind that red line is triggering that event. So there need to be at least 2 or 3 consecutive frames which contain that object and if it crosses then and only then the event should be triggered.
There are so many variations of object recognition algorithms, I am bit overwhelmed.
Sift/Surf/Orb/... you get what I am saying.
Can anyone give me a hint which one I should chose or if what I am doing is even making sense?
Assuming the plane location doesn't change a lot from one frame to the next, I think you should look at object tracking instead of trying to estimate the location independently in each frame.
http://docs.opencv.org/modules/video/doc/motion_analysis_and_object_tracking.html
I have a video which has frames as shown in my previous image in this question.
How do we detect points from a picture with a particular color on those points
I detected these markers and numbered them as shown in the image given below:
My problem is as follows. After I have detected markers in one frame I need to detect them in another frame and find out how much the marker has moved from its previous location. However on using my code again on the second frame I sometimes in some frames get a different numbering among markers and hence I am not able to track markers from one image to another. Also detecting the markers in each image becomes a cumbersome task and takes a lot of time for a video which has around 200 frames.
How can I track these markers over images so as to know how much a particular marker has moved between frames or simply how can I number these markers such that the numbering never changes viz, the marker numbered 60 remains marker number 60 from frame 1 to frame 200.
As a side question is there a way to actually decrease the processing time such that I don't have to detect the face and eyes in each and every frame (Please refer to the image given in the link in my previous question it makes things clearer).
My problem is as follows. After I have detected markers in one frame I
need to detect them in another frame and find out how much the marker
has moved from its previous location. However on using my code again
on the second frame I sometimes in some frames get a different
numbering among markers and hence I am not able to track markers from
one image to another. Also detecting the markers in each image becomes
a cumbersome task and takes a lot of time for a video which has around
200 frames.
How can I track these markers over images so as to know how much a
particular marker has moved between frames or simply how can I number
these markers such that the numbering never changes viz, the marker
numbered 60 remains marker number 60 from frame 1 to frame 200.
Maybe consider using optical flow technique - http://robotics.stanford.edu/~dstavens/cs223b/ ?
Alternatively try to divide your points cloud into smaller parts and than detect contours. You can divide it using lines or by using this simple idea (not tested or analysed):
Find convex hull of all points (http://en.wikipedia.org/wiki/Convex_hull_algorithms) from your point cloud.
Points which are on the border are in one group.
After processing points from group from point 2, delete them.
Go to point 1.
As a side question is there a way to actually decrease the processing time such that I don't have to detect the face and eyes in
each and every frame
There are few easy things you can do to decrease processing time:
Don't load haar cascade during processing each frame - load it only once, before starting getting frames from camera/video file.
if need to find only one face in each frame, use CV_HAAR_FIND_BIGGEST_OBJECT flag - searching will return only one (the biggest) object. It should be much faster, because search will start from the biggest window and additionally when haar detector find one object it will abort searching and return this object.
play with parameters and check different cascades
once you find face in frame number n than in frame number n+1 don't perform search in whole frame - expand rectangle in which you found face in n frame and search only in this expanded rectangle. How much you should expand it? It depends on how fast user can move his head ;) 50% is big tolerance, but also it's slow. The best option is to find this value on your own.
if your image won't change very much you can skip detecting face in most of frames and just assume that it's in the same place as in previous frame - just check whether frame has changed much. The simplest method is Motion detection using OpenCV (as the author mentioned - it's good idea to use binary threshold on the result of subtraction to ignore changes occurring because of noise). I've used this method in my BSc thesis (Eyetracking system) and it worked very well and improved speed of whole system. Note - it's good idea to force normal (using haar cascade) search from time to time (i've decided to do this once per each 3 frames, but you can try with searching less often) - it will allow you to avoid situation in which used has moved outside camera area and the system didn't noticed it.
I am at the start of developing a software using OpenCV in Microsoft Visual 2010 Express. Now what I need to know before i get into coding is the procedures i have to follow.
Overview:
I want to develop software that detects simple boxing moves such as (Left punch, right punch) and outputs the results.
Now where am struggling is what approach should i take how should i tackle this development i.e.
Capture Video Footage and be able to extract lets say every 5th frame for processing.
Do i have to extract and store this frame perhaps have a REFERENCE image to subtract the capture frame from it.
Once i capture a frame what would be the best way to process it:
* Threshold it, then
* Detect the edges, then
* Smooth the edges using some filter, then
* Draw some BOUNDING boxes....?
What is your view on this guys or am i missing something or are there better simpler ways...? Any suggestions...?
Any answer will be much appreciated
Ps...its not my homework :)
I'm not sure if analyzing only every 5th frame will be enough, because usually punches are so fast that they could be overlooked.
I assume what you actually want to find is fast forward (towards camera) movements of fists.
In case of OpenCV I would first start off with such movements of faces, since some examples are already provided on how to do that in software package.
To detect and track faces you can use CvHaarClassifierCascade, but since this won't be fast enough for runtime detection, continue tracking such found face with Lukas-Kanade. Just pick some good-to-track points inside previously found face, remember their distance from arbitrary face middle, and at each frame update it. See this guy http://www.youtube.com/watch?v=zNqCNMefyV8 - example of just some random points tracked with Lukas-Kanade. Note that unlike faces, fists may not be so easy to track since their surface is rather uniform, better check Lukas-Kanade demo in OpenCV.
Of course with each frame actual face will drift away, once in a while re-run CvHaarClassifierCascade and interpolate to it your currently held face position.
You should be able to do above for fists also, but that will require training classifier with pictures of fists (classifier trained with faces is already provided in OpenCV).
Now having fists/face tracked you may try observing what happens to the points - when someone punches they move rapidly in some direction, while on the fist that remains still they don't move to much. And so, when you calculate average movement of single points in recent frames, the higher the value, the bigger chance that there was a punch. Alternatively, if somehow you've managed to track them accurately, if distance between each of them increases, that means object is closer to camera - and so a likely punch.
Note that without at least knowing change of a size of the fist in picture, it might be hard to distinguish if a movement of hand was forward or backward, or if the user was faking it by moving fists left or right. You may have to come up with some specialized algorithm (maybe with trial and error) to detect that, like say, increase a number of screen color pixels in location that previously fist was found.
What you are looking for is the research field of action recognition e.g. www.nada.kth.se/cvap/actions/ or an possible solution is e.g the STIP ( Space-time interest points) method www.di.ens.fr/~laptev/actions/ . But finally this is a tough job if you have to deal with occlusion or different point of views.
I am working on a microscope that streams live images via a built-in video camera to a PC, where further image processing can be performed on the streamed image. Any processing done on the streamed image must be done in "real-time" (minimal frames dropped).
We take the average of a series of static images to counter random noise from the camera to improve the output of some of our image processing routines.
My question is: how do I know if the image is no longer static - either the sample under inspection has moved or rotated/camera zoom-in or out - so I can reset the image series used for averaging?
I looked through some of the threads, and some ideas that seemed interesting:
Note: using Windows, C++ and Intel IPP. With IPP the image is a byte array (Ipp8u).
1. Hash the images, and compare the hashes (normal hash or perceptual hash?)
2. Use normalized cross correlation (IPP has many variations - which to use?)
Which do you guys think is suitable for my situation (speed)?
If you camera doesn't shake, you can, as inVader said, subtract images. Then a sum of absolute values of all pixels of the difference image is sometimes enough to tell if images are the same or different. However, if your noise, lighting level, etc... varies, this will not give you a good enough S/N ratio.
And in noizy conditions normal hashes are even more useless.
The best would be to identify that some features of your object has changed, like it's boundary (if it's regular) or it's mass center (if it's irregular). If you have a boundary position, you'll need to analyze just one line of pixels, perpendicular to that boundary, to tell that boundary has moved.
Mass center position may be a subject to frequent false-negative responses, but adding a total mass and/or moment of inertia may help.
If the camera shakes, you may have to align images before comparing (depending on comparison method and required accuracy, a single pixel misalignment might be huge), and that's where cross-correlation helps.
And further, you doesn't have to analyze each image. You can skip one, and if the next differs, discard both of them. Here you have twice as much time to analyze an image.
And if you are averaging images, you might just define an optimal amount of images you need and compare just the first and the last image in the sequence.
So, simplest thing to try would be to take subsequent images, subtract them from each other and have a look at the difference. Then define some rules including local and global thresholds for the difference in which two images are considered equal. Simple subtraction of bitmap/array data, looking for maxima and calculating the average differnce across the whole thing should be ne problem to do in real time.
If there are varying light conditions or something moving in a predictable way(like a door opening and closing), then something more powerful, albeit slower, like gaussian mixture models for background modeling, might be worth looking into, click here. It is quite compute intensive, but can be parallelized pretty easily.
Motion detection algorithms is what is used.
http://www.codeproject.com/Articles/10248/Motion-Detection-Algorithms
http://www.codeproject.com/Articles/22243/Real-Time-Object-Tracker-in-C
First of all I would take a series of images at a slow fps rate and downsample those images to make them smaller, not too much but enough to speed up the process.
Now you have several options:
You could make a sum of absolute differences of the two images by subtracting them and use a threshold to value if the image has changed.
If you want to speed it up even further I would suggest doing a progressive SAD using a small kernel and moving from the top of the image to the bottom. You can value the complessive amount of differences during the process and eventually stop when you are satisfied.