Semi-automatic face & eye detection - C++

For analysis we have a sequence of images or a movie. My aim is to create a semi-automatic face and eye detection for these sequences. The sequences consist of about 4000 images with a frontal capture of a person slightly moving. I want to process these images semi-automatically or manually to get the two/three ROIs of the face and eyes.
I tried OpenCV's cascade classifiers, but for my sequences they do not turn out to be robust (with manual control we need to reach a rate of 100%). The cascade classifiers do not return positions, e.g. when the person is looking slightly to the side.
Is there any semi-automatic approach out there for ImageJ, MATLAB, or OpenCV/C++ to select/correct the ROIs manually if falsely detected, or to select templates for tracking?

If you are processing a movie, it is reasonable to assume that the motion between frames is low. The following is a possible approach:
1) Initialize the first frame manually (or get user input to confirm/edit the positions detected by the cascade classifiers).
2) For the next frame, check whether the detected features are too far from the previous positions. You can also check whether the positions of the different parts move relative to each other in an illogical manner.
3) Stop and get the user to correct the points if the checks in step 2 suggest errors.
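A minimal C++ sketch of this loop, assuming a standard OpenCV Haar cascade for the automatic part and cv::selectROI for the manual step; the cascade/sequence file names and the maxJump motion limit are placeholders to adapt to your data:

```cpp
#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap("sequence.avi");   // or an image-sequence pattern like "img_%04d.png"
    cv::CascadeClassifier faceCascade("haarcascade_frontalface_default.xml");

    cv::Mat frame;
    cap >> frame;
    // Step 1: initialize the first frame manually.
    cv::Rect faceRoi = cv::selectROI("init", frame);
    const double maxJump = 0.25 * faceRoi.width;   // assumed limit on inter-frame motion

    while (cap.read(frame))
    {
        std::vector<cv::Rect> faces;
        faceCascade.detectMultiScale(frame, faces, 1.1, 3, 0, cv::Size(80, 80));

        // Step 2: accept a detection only if it stays close to the previous position.
        bool ok = false;
        for (const cv::Rect& f : faces)
        {
            cv::Point2d prev(faceRoi.x + faceRoi.width / 2.0, faceRoi.y + faceRoi.height / 2.0);
            cv::Point2d cur(f.x + f.width / 2.0, f.y + f.height / 2.0);
            if (cv::norm(prev - cur) < maxJump) { faceRoi = f; ok = true; break; }
        }

        // Step 3: fall back to the user when the detector fails or jumps too far.
        if (!ok)
            faceRoi = cv::selectROI("correct", frame);
    }
    return 0;
}
```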
Note: with OpenCV cascades, face detection is generally accurate, but eye detection is not, and you might not detect both eyes in some frames. Some projects use AAMs (Active Appearance Models) to robustly track a face, and this might work for you.

For face detection, try this list of 50+ APIs:
http://blog.mashape.com/post/53379410412/list-of-50-face-detection-recognition-apis
For eye detection, you can try the flandmark detector:
http://cmp.felk.cvut.cz/~uricamic/flandmark/
Or STASM:
http://www.milbo.users.sonic.net/stasm/

Related

What is the best method to train faces with OpenCV's FaceRecognizer to get the best results?

I have tried many tutorials to implement face recognition in OpenCV 3.2 using the FaceRecognizer class in the face module, but I did not get the results I was hoping for.
So I want to ask: what is the best way, or what conditions should be taken care of during training and recognition?
What I have done to improve the accuracy:
Create (at least) 10 training faces per person, in the best possible quality, size, and angle.
Try to fit the face in the image.
Equalize the histogram of the images.
I then tried all three face recognizers (EigenFaceRecognizer, FisherFaceRecognizer, LBPHFaceRecognizer); the results were all the same, and the recognition rate was very low. I trained on only three persons, and it still cannot recognize them well (the first person is recognized as the second, and so on).
Questions:
Must the training and the recognition images come from the same camera?
Should the training images be cropped manually (Photoshop -> read images, then train), or must this be done programmatically (detect -> crop -> resize, then train)?
What are the best parameters for each face recognizer (int num_components, double threshold)?
And how can the training algorithm be set to return -1 when it is an unknown person?
Expanding on my comment, Chapter 8 in Mastering OpenCV provides really helpful tips for pre-processing faces to aid the recognition process, such as:
Taking a sample only when both eyes are detected (via Haar cascade).
Geometrical transformation and cropping: This process would include scaling, rotating, and translating the images so that the eyes are aligned, followed by the removal of the forehead, chin, ears, and background from the face image.
Separate histogram equalization for left and right sides: This process standardizes the brightness and contrast on both the left- and right-hand sides of the face independently.
Smoothing: This process reduces the image noise using a bilateral filter.
Elliptical mask: The elliptical mask removes some remaining hair and background from the face image.
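As an illustration (not the book's exact code), a short OpenCV/C++ sketch of the last three steps applied to an already aligned grayscale face crop; the filter parameters and ellipse size are assumptions:

```cpp
#include <opencv2/opencv.hpp>

cv::Mat preprocessFace(const cv::Mat& faceGray)   // e.g. a 70x70 aligned grayscale crop
{
    cv::Mat face = faceGray.clone();
    const int w = face.cols, h = face.rows;

    // Separate histogram equalization for the left and right halves,
    // so one-sided lighting does not dominate the whole face.
    cv::Mat left  = face(cv::Rect(0, 0, w / 2, h));
    cv::Mat right = face(cv::Rect(w / 2, 0, w - w / 2, h));
    cv::equalizeHist(left, left);
    cv::equalizeHist(right, right);

    // Smoothing: a bilateral filter reduces pixel noise while keeping edges.
    cv::Mat smoothed;
    cv::bilateralFilter(face, smoothed, 0, 20.0, 2.0);

    // Elliptical mask: keep the face oval, blank out hair/background in the corners.
    cv::Mat mask = cv::Mat::zeros(face.size(), CV_8U);
    cv::ellipse(mask, cv::Point(w / 2, h / 2),
                cv::Size(cvRound(w * 0.45), cvRound(h * 0.55)),
                0, 0, 360, cv::Scalar(255), cv::FILLED);
    cv::Mat out(face.size(), CV_8U, cv::Scalar(128));   // neutral gray background
    smoothed.copyTo(out, mask);
    return out;
}
```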
I've added a hacky load/save to my fork of the example code, feel free to try it/tweak it as you need it. Currently it's very limited, but it's a start.
Additionally, you should also check OpenFace and its DNN face recognizer.
I haven't played with that yet so can't provide details, but it looks really cool.

Dynamic background separation and reliable circle detection with OpenCV

I am attempting to detect coloured tennis balls on a similarly coloured background. I am using OpenCV and C++.
This is the test image I am working with:
http://i.stack.imgur.com/yXmO4.jpg
I have tried using multiple edge detectors: Sobel, Laplace, and Canny. All three detect the white line, but when the threshold is low enough to detect the edge of the tennis ball, there is too much noise in the output.
I have also tried the Hough circle transform, but as it is based on Canny it isn't effective.
I cannot use background subtraction because the background can move. I also cannot modify the threshold values as lighting conditions may create gradients within the tennis ball.
I feel my only options are to template match or to detect the white line; however, I would like to avoid this if possible.
Do you have any suggestions?
I had to tilt my screen to spot the tennis ball myself. It's a hard image.
That said, the default OpenCV implementation of the Hough transform uses the Canny edge detector, but it's not the only possible implementation. For these harder cases, you might need to reimplement it yourself.
You can certainly run the Hough algorithm repeatedly with different settings for the edge detection to generate multiple candidates, as sketched below. Besides comparing candidates directly, you can also check that each candidate has a dominant texture (after local shading corrections) and possibly a stripe. But that might be very tricky if those tennis balls are actually captured in flight, i.e. moving.
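A minimal sketch of that multiple-candidates idea using cv::HoughCircles; the list of Canny thresholds, the accumulator threshold, and the radius limits are placeholders to tune for your footage:

```cpp
#include <opencv2/opencv.hpp>

std::vector<cv::Vec3f> houghCandidates(const cv::Mat& gray)
{
    cv::Mat blurred;
    cv::GaussianBlur(gray, blurred, cv::Size(7, 7), 1.5);    // Hough works better on a smoothed input

    std::vector<cv::Vec3f> all;
    for (double cannyHigh : {200.0, 120.0, 60.0})            // progressively more permissive edges
    {
        std::vector<cv::Vec3f> circles;
        cv::HoughCircles(blurred, circles, cv::HOUGH_GRADIENT,
                         1,                 // accumulator resolution = image resolution
                         blurred.rows / 8,  // minimum distance between circle centres
                         cannyHigh,         // upper threshold of the internal Canny stage
                         25,                // accumulator threshold: lower -> more (noisier) circles
                         10, 60);           // assumed min/max ball radius in pixels
        all.insert(all.end(), circles.begin(), circles.end());
    }
    return all;   // candidates still need verification (texture / stripe checks)
}
```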
What are you doing to the color image BEFORE the edge detection? Simply converting it to gray?
In my experience colorful balls pop out best when you use the HSV color space. Then you would have to decide which channel gives the best results.
Transforming the image to a different feature space might also work better than relying on colour alone. Maybe try LBP, which responds to texture, then do PCA on the result to reduce the feature space to a single-channel image, and try the Hough transform on that.
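To illustrate the HSV suggestion above, a small sketch; which channel actually makes the ball pop out is an assumption you would have to verify on your own images:

```cpp
#include <opencv2/opencv.hpp>

cv::Mat ballChannel(const cv::Mat& bgr)
{
    cv::Mat hsv;
    cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);

    std::vector<cv::Mat> ch;
    cv::split(hsv, ch);                 // ch[0] = hue, ch[1] = saturation, ch[2] = value

    // Often the saturation channel separates a coloured ball from a dull background;
    // inspect all three channels and pick whichever gives the cleanest contrast.
    cv::Mat blurred;
    cv::GaussianBlur(ch[1], blurred, cv::Size(7, 7), 1.5);
    return blurred;                     // feed this into the edge/Hough stage instead of plain gray
}
```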

Region of Interest Uniqueness and Identity

I'm currently working on a computer vision application with OpenCV. The application involves target identification and characteristic determination. Generally, I'm going to have a target cross into the visible region and slowly move through it in a couple of seconds. This should give me upwards of 50-60 frames from the camera in which I'll be able to find the target.
We have successfully implemented the detection algorithms using SWT and OCR (the targets all have alphanumeric identifiers, which makes them relatively easy to pick out). What I want to do is use as much of the data as possible from all 50-60 shots of each target. To do this, I need some way to identify that a particular ROI of image 2 contains the same target as another ROI from image 1.
What I'm asking for is a little advice from anyone who may have come across this before. How can I easily/quickly identify, within a reasonable error margin, that ROI #2 contains the same target as ROI #1? My first instinct is something like this:
1) Detect targets in frame 1.
2) Calculate certain unique features of each of the targets in frame 1. Save them.
3) Get frame 2.
4) Immediately look for ROIs which have the same features as those calculated in step 2. Grab these and send them down the line for further processing, skipping step 5.
5) Detect new targets in frame 2.
6) Pass targets to a thread to calculate shape, color, GPS coordinates, etc.
7) Lather, rinse, repeat.
I'm thinking that SURF or SIFT features might be a way to accomplish this, but I'm concerned that they might have trouble identifying targets as the same from frame to frame due to distortion or color fade. I don't know how to set a threshold on SIFT/SURF features.
Thank you in advance for any light you can shed on this matter.
One thing you can do is locally equalize brightness and possibly saturation levels. If you aren't using an advanced space such as YCrCb or HSV, I suggest you try them.
Can you assume that the object is not moving too fast? If you feed the previous position into the detection routine, you can decrease the size of the window you are looking at. The same goes for the speed and direction of movement.
I've successfully used region histograms and shape descriptors in order to reliably re-detect a region; you can use that on its own or add it to a SURF/SIFT classifier.
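As a rough sketch of the histogram part of that idea (not the exact descriptors mentioned above), ROIs could be matched across frames with cv::compareHist on hue/saturation histograms; the 0.7 similarity threshold is an arbitrary placeholder to tune on real data:

```cpp
#include <opencv2/opencv.hpp>

cv::Mat hsHistogram(const cv::Mat& roiBgr)
{
    cv::Mat hsv, hist;
    cv::cvtColor(roiBgr, hsv, cv::COLOR_BGR2HSV);
    int histSize[] = {30, 32};                        // hue and saturation bins
    float hRange[] = {0, 180}, sRange[] = {0, 256};
    const float* ranges[] = {hRange, sRange};
    int channels[] = {0, 1};
    cv::calcHist(&hsv, 1, channels, cv::Mat(), hist, 2, histSize, ranges);
    cv::normalize(hist, hist, 0, 1, cv::NORM_MINMAX); // make ROIs of different sizes comparable
    return hist;
}

bool sameTarget(const cv::Mat& roiPrev, const cv::Mat& roiCur)
{
    double similarity = cv::compareHist(hsHistogram(roiPrev), hsHistogram(roiCur),
                                        cv::HISTCMP_CORREL);   // 1.0 means identical histograms
    return similarity > 0.7;                                   // assumed threshold
}
```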

After having calculated SIFT or ORB on a frame, how do I track the object in a video in real time?

Basically I want to detect an object and then track it in a video (frame by frame).
I can detect it on the first frame with, for example, ORB or SIFT. But for the next frames (or say the next XX frames) I would like to avoid recalculating all the keypoints (ORB or SIFT) to detect it again.
Considering I want to track it in a video in real time, what could I do?
A common option is using a patch tracker. That means you just search for each keypoint in an area of, for example, 8 pixels around its position in the previous frame. You can perform cv::matchTemplate() on an area surrounding the keypoint instead of using SIFT.
Performing a pyramidal search helps to improve the frame rate: you first search at a lower scale, and if you cannot find the keypoint you double the scale.
If the patch tracker fails because the image moves too fast, you just have to reinitialize the system by applying SIFT again. I would use FAST instead of SIFT there: keep SIFT for the marker, and use FAST to detect keypoints in real time, then generate SIFT descriptors for them.
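A minimal sketch of such a patch tracker with cv::matchTemplate; the patch and search-window sizes are assumptions to tune, and border clipping is kept simple:

```cpp
#include <opencv2/opencv.hpp>

// Find the new position of one keypoint by matching a small patch from the
// previous frame inside a search window around its old position.
cv::Point2f trackPatch(const cv::Mat& prevGray, const cv::Mat& curGray, cv::Point2f prevPt)
{
    const int patch = 15, search = 31;   // assumed sizes in pixels

    cv::Rect patchRect(cvRound(prevPt.x) - patch / 2, cvRound(prevPt.y) - patch / 2, patch, patch);
    cv::Rect searchRect(cvRound(prevPt.x) - search / 2, cvRound(prevPt.y) - search / 2, search, search);
    patchRect  &= cv::Rect(0, 0, prevGray.cols, prevGray.rows);   // clip to image bounds
    searchRect &= cv::Rect(0, 0, curGray.cols, curGray.rows);

    cv::Mat result;
    cv::matchTemplate(curGray(searchRect), prevGray(patchRect), result, cv::TM_CCOEFF_NORMED);

    double maxVal; cv::Point maxLoc;
    cv::minMaxLoc(result, nullptr, &maxVal, nullptr, &maxLoc);

    // If maxVal is low the patch was probably lost and you re-detect (FAST/SIFT) instead.
    return cv::Point2f(searchRect.x + maxLoc.x + patchRect.width / 2.0f,
                       searchRect.y + maxLoc.y + patchRect.height / 2.0f);
}
```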
Detecting and tracking an object in a video is a very large topic, and the way to go depends highly on your application. There is no magic bullet!
If you have the detection part working, you can try tracking by mean shift on a colour likelihood (maybe in HSV colour space) if the object you need to track is coloured, or try template matching, or something else entirely. You need to be more specific about your needs.
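For the colour-likelihood route, a small sketch using histogram back-projection plus cv::CamShift, assuming the detection step already produced an initial bounding box:

```cpp
#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap(0);
    cv::Mat frame;
    cap >> frame;
    cv::Rect box = cv::selectROI("init", frame);   // or the box from your detector

    // Hue histogram of the object as its colour model.
    cv::Mat hsv, hist;
    cv::cvtColor(frame(box), hsv, cv::COLOR_BGR2HSV);
    int histSize = 30;
    float hRange[] = {0, 180};
    const float* ranges[] = {hRange};
    int channels[] = {0};
    cv::calcHist(&hsv, 1, channels, cv::Mat(), hist, 1, &histSize, ranges);
    cv::normalize(hist, hist, 0, 255, cv::NORM_MINMAX);

    while (cap.read(frame))
    {
        cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
        cv::Mat backProj;
        cv::calcBackProject(&hsv, 1, channels, hist, backProj, ranges);
        cv::RotatedRect r = cv::CamShift(backProj, box,
            cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT, 10, 1));
        cv::rectangle(frame, r.boundingRect(), cv::Scalar(0, 255, 0), 2);
        cv::imshow("track", frame);
        if (cv::waitKey(30) == 27) break;   // ESC quits
    }
    return 0;
}
```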
You can use optical flow for simple tracking; here are the steps to do it (a sketch follows below):
Find the corners of the moving object using the Harris corner detector or a SIFT feature detector.
Give those corners, the previous image (in which you found the corners of the object to be tracked), and the next image to the optical flow function; it will compute the positions of the same corners in the next image.
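A minimal sketch of these two steps with cv::goodFeaturesToTrack (Shi-Tomasi/Harris corners) and pyramidal Lucas-Kanade optical flow; the corner count and quality parameters are placeholders:

```cpp
#include <opencv2/opencv.hpp>

// Track object corners from the previous frame into the next frame.
void trackCorners(const cv::Mat& prevGray, const cv::Mat& nextGray,
                  std::vector<cv::Point2f>& pts)   // in: old corners, out: new corners
{
    if (pts.empty())
        cv::goodFeaturesToTrack(prevGray, pts, 100, 0.01, 10);   // initial corner detection

    std::vector<cv::Point2f> nextPts;
    std::vector<uchar> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, nextGray, pts, nextPts, status, err);

    // Keep only the points that were successfully tracked.
    std::vector<cv::Point2f> kept;
    for (size_t i = 0; i < nextPts.size(); ++i)
        if (status[i])
            kept.push_back(nextPts[i]);
    pts = kept;
}
```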
Note: if you are addressing problems like occlusion handling or multi-person tracking, then optical flow alone can't solve them; for that, a Kalman filter or particle filters need to be employed.
You can achieve almost perfect, real-time tracking using TLD or CMT. Once you detect the object of interest, use that bounding box to initiate Predator (TLD) tracking.
You can find about CMT here
https://www.gnebehay.com/cmt/
and TLD here
https://www.gnebehay.com/tld/

Refining Haar detection

I'm trying to make a hand detection program using OpenCV and a Haar cascade. It works quite well, but it's very jerky. So I'm wondering whether the problem is a Haar file that is too 'cheap', or whether there's a way to refine the detection using contours or feature detection (or maybe some other technique).
What I would like to achieve is the same as this face detection, but for hands: Face Detection (see FaceOSC).
Thanks a lot.
EDIT: here is the kind of thing I would like to do: Hand extraction. It seems that he performs it with contour detection, but how do I find the hand?
The Hand Extraction video you linked is based on skin colour detection and convex hull finding:
1) Convert the image to YCrCb (or HSV).
2) Threshold the image so that the hand becomes white and everything else black.
3) Remove noise.
4) Find the center of the hand (if you like).
5) Use the convex hull to find the sharpest points, which will be the fingertips.
You can get full details from this paper.
Anyway, there is no need for Haar cascades here.
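A rough OpenCV/C++ sketch of the steps above; the YCrCb skin-colour bounds and the kernel size are commonly quoted guesses rather than values from the paper, so tune them for your lighting:

```cpp
#include <algorithm>
#include <opencv2/opencv.hpp>

std::vector<cv::Point> handHull(const cv::Mat& bgr)
{
    // 1-2) Convert to YCrCb and threshold the skin-colour range to a binary mask.
    cv::Mat ycrcb, mask;
    cv::cvtColor(bgr, ycrcb, cv::COLOR_BGR2YCrCb);
    cv::inRange(ycrcb, cv::Scalar(0, 133, 77), cv::Scalar(255, 173, 127), mask);

    // 3) Remove noise with morphological opening and closing.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5));
    cv::morphologyEx(mask, mask, cv::MORPH_OPEN, kernel);
    cv::morphologyEx(mask, mask, cv::MORPH_CLOSE, kernel);

    // Take the largest skin-coloured contour as the hand.
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    if (contours.empty())
        return {};
    auto largest = std::max_element(contours.begin(), contours.end(),
        [](const std::vector<cv::Point>& a, const std::vector<cv::Point>& b)
        { return cv::contourArea(a) < cv::contourArea(b); });

    // 4-5) The centre can be taken from cv::moments(*largest); the convex hull's
    // sharpest points are fingertip candidates.
    std::vector<cv::Point> hull;
    cv::convexHull(*largest, hull);
    return hull;
}
```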
Obviously, if the Haar-classifier-based detection results are 'jerky', which in my opinion means the detection is not stable and jumps around the image, then the problem is the quality of the classifier.
As long as there are enough positive/negative samples, let's say 5k/5k, the results should already be quite robust. Based on my experience, I used 700 positive hand-gesture samples and 1200 negative samples, and the results seemed satisfactory to some extent. But after I used another group of 8000 positive samples and 10200 negative samples with different features included, the results were even worse than before.
So I would suggest that you carefully revisit your training samples: the ratio, the content features, and the colours.