Pose independent face detection - c++

I'm working on a project where I need to detect faces in very messy videos (recorded from an egocentric point of view, so you can imagine..). Faces can have angles of yaw that variate between -90 and +90, pitch with almost the same variation (well, a bit lower due to the human body constraints..) and possibly some roll variations too.
I've spent a lot of time searching for some pose independent face detector. In my project I'm using OpenCV but OpenCV face detector is not even close to the detection rate I need. It has very good results on frontal faces but almost zero results on profile faces. Using haarcascade .xml files trained on profile images doesn't really help. Combining frontal and profile cascades yield slightly better results but still, not even close to what I need.
Training my own haarcascade will be my very last resource since the huge computational (or time) requirements.
By now, what I'm asking is any help or any advice regarding this matter.
The requirements for a face detector I could use are:
very good detection rate. I don't mind a very high false positive rate since using some temporal consistency in my video I'll probably be able to get rid of the majority of them
written in c++, or that could work in a c++ application
Real time is not an issue by now, detection rate is everything I care right now.
I've seen many papers achieving these results but i couldn't find any code that I could use.
I sincerely thank for any help that you'll be able to provide.

perhaps not an answer but too long to put into comment.
you can use opencv_traincascade.exe to train a new detector that can detect a wider variety of poses. this post may be of help. http://note.sonots.com/SciSoftware/haartraining.html. i have managed to trained a detector that is sensitive within -50:+50 yaw by using feret data set. for my case, we did not want to detect purely side faces so training data is prepared accordingly. since feret already provides convenient pose variations it might be possible to train a detector somewhat close to your specification. time is not an issue if you are using lbp features, training completes in 4-5 hours at most and it goes even faster(15-30min) by setting appropriate parameters and using fewer training data(useful for ascertaining whether the detector is going to produce the output you expected).

Related

Solution for object detectors that are far worse than object classifiers

I've built a classifier that has high accuracy in classifying certain objects of interest, given images that focus only on those objects.
However, when this same classifier is applied to an object detector that scans a larger image using selective search or sliding windows, the detector's performance is dismally low.
I don't understand why. Is this normal in computer vision? And what's the solution?
You would have to be more specific. What does "dismally low" mean? Are you seeing a lot of false positives? A lot of false negatives? Also, what kind of objects are you trying to detect?
One thing to keep in mind is when you do sliding-window object detection, you typically get multiple detections around the actual location of the object of interest. The usual way to deal with this issue is to merge the overlapping detections. For example, the Computer Vision System Toolbox for MATLAB provides selectStrongestBbox function for this.
Another possible issue could be the variation in aspect ratio of the objects of interest. Typically, when you train a cascade classifier all the positive images are resized to be the same "training size" (e.g. 32x32 pixels). Then, when you do sliding windows, the aspect ratio of your window is the same as that of the training size. This results in good detection accuracy only if the aspect ratio of your objects of interest stays roughly the same. For example, this works well for faces. This will not work as well for a class of objects with varying aspect ratios, such as a category that includes both regular cars and stretched limousines.
And, of course, don't forget that haar cascade detectors do not tolerate in-plane or out-of-plane rotation very well.

Generate an image that can be most easily detected by Computer Vision algorithms

Working on a small side project related to Computer Vision, mostly to try playing around with OpenCV. It lead me to an interesting question:
Using feature detection to find known objects in an image isn't always easy- objects are hard to find, especially if the features of the target object aren't great.
But if I could choose ahead of time what it is I'm looking for, then in theory I could generate for myself an optimal image for detection. Any quality that makes feature detection hard would be absent, and all the qualities that make it easy would exist.
I suspect this sort of thought went into things like QR codes, but with the limitations that they wanted QR codes to be simple, and small.
So my question for you: How would you generate an optimal image for later recognition by a camera? What if you already know that certain problems like skew, or partial obscuring would occur?
Thanks very much
I think you need something like AR markers.
Take a look at ArToolkit, ArToolkitPlus or Aruco libraries, they have marker generators and detectors.
And papeer about marker generation: http://www.uco.es/investiga/grupos/ava/sites/default/files/GarridoJurado2014.pdf
If you plan to use feature detection, than marker should be specific to used feature detector. Common practice for detector design is good response to "corners" or regions with high x,y gradients. Also you should note the scaling of target.
The simplest detection can be performed with BLOBS. It can be faster and more robust than feature points. For example you can detect circular blobs or rectangular.
Depending on the distance you want to see your markers from and viewing conditions/backgrounds you typically use and camera resolution/noise you should choose different images/targets. Under moderate perspective from a longer distance a color target is pretty unique, see this:
https://surf-it.soe.ucsc.edu/sites/default/files/velado_report.pdf
at close distances various bar/QR codes may be a good choice. Other than that any flat textured object will be easy to track using homography as opposed to 3D objects.
http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_feature_homography/py_feature_homography.html
Even different views of 3d objects can be quickly learned and tracked by such systems as Predator:
https://www.youtube.com/watch?v=1GhNXHCQGsM
then comes the whole field of hardware, structured light, synchronized markers, etc, etc. Kinect, for example, uses a predefined pattern projected on the surface to do stereo. This means it recognizes and matches million of micro patterns per second creating a depth map from the matched correspondences. Note that one camera sees the pattern and while another device - a projector generates it working as a virtual camera, see
http://article.wn.com/view/2013/11/17/Apple_to_buy_PrimeSense_technology_from_the_360s_Kinect/
The quickest way to demonstrate good tracking of a standard checkerboard pattern is to use pNp function of open cv:
http://www.juergenwiki.de/work/wiki/lib/exe/fetch.php?media=public:cameracalibration_detecting_fieldcorners_of_a_chessboard.gif
this literally can be done by calling just two functions
found = findChessboardCorners(src, chessboardSize, corners, camFlags);
drawChessCornersDots(dst, chessboardSize, corners, found);
To sum up, your question is very broad and there are multiple answers and solutions. Formulate your viewing condition, camera specs, backgrounds, distances, amount of motion and perspective you expect to have indoors vs outdoors, etc. There is no such a thing as a general average case in computer vision!

Is Eigenface best for moving object face recognition

I am doing a project on face recognition from CCTV cameras, I want to recognize each individual faces. I think eigenface method is best for face recognition. But when we use eigenface method for moving object face recognition, is there any problem? Can we recognize individuals perfectly? Since it is not still image, I am really confused to select a method.
Please help me to know whether this method is ok, otherwise suggest a better alternative.
Short answer: Typically those computer vision techniques used in image analysis can be used in video analysis, too. Videos just give you more information (esp. the temporal information.) For example, you could do face recognition using multiple frames, and between each frame you do object tracking. Associating multiple frames typically give you higher accuracy.
IMO, the most difficult problems are: you're more likely to face viewing angle, calibration problems, and lighting condition problems, in which you will need accurate face detection technique, or more training data in order to recognize faces under viewing angles and lighting conditions. Eigen face based approach relies on an accurate position of faces, eyes, and so on. Otherwise, you are likely to mix different features in the same vector. But again, this problem also exists in face recognition under still image.
To sum up, video content only gives you more information. If you don't really want to associated frames and consider temporal information, video is just a collection of still images :)

Detect and extract face from an image

I have been trying to do the following -
When a user uploads an Image in my web app, I'd like to detect his/her face in it and extract face (from forehead to chin and cheek to cheek) from it.
I tried OpenCV/C++ face detection using Haar Cascade but problem with it is that it gives a probability of where the face would be because of which either background of image comes inside the ROI or even the complete face doesn't come in the ROI.
I also want to detect eye inside the face and while using the above technique, the eye detection isn't that accurate.
I've read up on a new technique called Active Appearance Model (AAM). The blogs where I read up about this show that this is exactly what I want but I am lost on how to implement this.
My queries are -
Is using AAM a good idea for face detection and face feature detection.
Are there any other techniques for doing the same.
Any help on any of these is much appreciated.
Thanks !
As you noticed OpenCV's implementation of face detection is not state-of-the-art. It is a very good and robust implementation but you can do better.
Recently, Zhu and Ramanan (CVPR 2012) had intoduced Face detection, pose estimation and landmark localization in the wild which is considered to be one of the leading algorithms for face detection in recent years.
Their algorithm is capable of detecting faces both frontal and profile views AND identifying keypoints on the detected face such as eyes nose and mouth.
The authors were kind enough to publish their code along with learned models, it is a Matlab implementation but the main computations are done in C++, so it should not be too difficult to make a standalone C++ implementation of thier method.

Training the HOG descriptor for people detection

First I have tried the default people detector in the OpenCV library.
HOGDescriptor hog;
hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector());
hog.detectMultiScale(img, found, 0, Size(8,8), Size(0,0), 1.05, 2);
Although it returns positive matches in a indoor environment with a webcam, they are very rare. So I trained the descriptor with INRIA dataset's negative and positive images but this time the false positives are far too many. I am not trying to lower false matches to zero, it would be enough to lower them to a reasonable level. What should I do?
Another issue is that I think the people in my sample videos are too far away to be easily distinguishable as human images. I have tried reducing the cell size but am not sure this is the right approach. What can be done about this?
Images would be helpful to you but due to reputation I can't post them.
Thanks
Check the opencv [doc]: http://docs.opencv.org/modules/gpu/doc/object_detection.html#gpu-hogdescriptor-detectmultiscale it seems your not using the interface correctly.
Did you do an evaluation of your trained SVM and observed a bad detection rate there as well? If yes you need to play a bit with the training parameters or input data. As far as I remember the INRIA set included people and non-people images but only the positive patches where exactly defined. When I trained a hog classifier the selection of negative samples had a lot of influence. Oh and did you use boosting? IIRC boosting provided a large performance gain in the original paper.