face tracking or "dynamic recognition" - computer-vision

What is the best approach for face detection/tracking in the following scenario:
when a person enters the scene/frame, he/she should be detected and then recognized in every subsequent frame until he/she leaves the scene.
It should also work for multiple people at once.
I have experience with Viola-Jones detection and Fisherfaces recognition. But I've only used Fisherfaces with a previously prepared training set, and now I need something that works for any person who enters the scene.

I am also interested in different solutions.
I used OpenCV face detection for multiple faces together with the Rekognition API (http://rekognition.com): I pushed the detected faces to it and retrained the dataset frequently. This was lightweight on our side, but I am sure there are more robust solutions.

Have you tried VideoSurveillance, also known as the OpenCV blob tracker?
It's a motion-based tracker with data association across frames (1). If you want to replace motion with face detection, you must adjust the code by replacing the foreground mask with detection responses. This approach is called tracking-by-detection in the literature.
(1) "Appearance Models for Occlusion Handling", Andrew Senior et al.
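The data-association step can be sketched in isolation. The following is a minimal illustration, not the blob tracker's actual code: it greedily matches each new face detection to the existing track with the highest box overlap (IoU). The `Box` type, the `associate` function and the 0.3 overlap threshold are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Axis-aligned box in pixel coordinates (hypothetical type for this sketch).
struct Box { float x, y, w, h; };

// Intersection-over-union of two boxes; 0 when they do not overlap.
float iou(const Box& a, const Box& b) {
    float x1 = std::max(a.x, b.x);
    float y1 = std::max(a.y, b.y);
    float x2 = std::min(a.x + a.w, b.x + b.w);
    float y2 = std::min(a.y + a.h, b.y + b.h);
    float inter = std::max(0.0f, x2 - x1) * std::max(0.0f, y2 - y1);
    float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// For each detection, returns the index of the matched track,
// or -1 when nothing overlaps enough (i.e. a new face entered the scene).
// Greedy matching: simple, but not guaranteed to be globally optimal.
std::vector<int> associate(const std::vector<Box>& tracks,
                           const std::vector<Box>& detections,
                           float minIou = 0.3f) {
    assert(minIou > 0.0f);
    std::vector<int> match(detections.size(), -1);
    std::vector<bool> used(tracks.size(), false);
    for (std::size_t d = 0; d < detections.size(); ++d) {
        float best = minIou;
        int bestTrack = -1;
        for (std::size_t t = 0; t < tracks.size(); ++t) {
            if (used[t]) continue;  // each track takes at most one detection
            float score = iou(tracks[t], detections[d]);
            if (score >= best) {
                best = score;
                bestTrack = static_cast<int>(t);
            }
        }
        if (bestTrack >= 0) {
            used[bestTrack] = true;
            match[d] = bestTrack;
        }
    }
    return match;
}
```

A detection that comes back as -1 would start a new track, which is the point where recognition of a newly arrived person would be triggered.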

Related

Stitching a full spherical mosaic using only a smartphone and sensor data?

I'm really interested in the Google Street View mobile application, which integrates a method to create a fully functional spherical panorama using only your smartphone camera. (Here's the procedure for anyone interested: https://www.youtube.com/watch?v=NPs3eIiWRaw)
What strikes me most is that it always manages to create the full sphere, even when stitching a featureless, nearly monochrome blue sky or ceiling, which makes me think they're not using feature-based matching.
Is it possible to get a decent-quality full spherical mosaic without feature-based matching, using only sensor data? Are smartphone sensors precise enough? What library could be used to do this? OpenCV? Something else?
Thanks!
Features are needed for registration. In the app, the clever UI ensures they already know where each photo sits relative to the sphere, so in the extreme case all they have to do is reproject/warp and blend. No additional geometry processing is needed.
I would assume that they do try some small corrections to improve the registration, but even if these fail, they can fall back on the sensor-based estimates acquired at capture time.
This is a case where a clever UI makes the vision problem significantly easier.
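To make the "they already know where each photo is" point concrete, here is a minimal sketch of placing a photo on an equirectangular panorama from orientation alone. It is an assumption about how sensor-only placement could work, not a description of what the app does; `Pixel`, `orientationToEquirect` and the yaw/pitch convention (radians, yaw 0 at the panorama centre, positive pitch upwards) are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Position on the panorama, in pixels (hypothetical type for this sketch).
struct Pixel { double u, v; };

// Maps the camera's viewing direction (from the phone's orientation sensors)
// to the point of an equirectangular panorama where the photo centre lands.
Pixel orientationToEquirect(double yawRad, double pitchRad, int width, int height) {
    assert(width > 0 && height > 0);
    const double pi = 3.14159265358979323846;
    // Wrap yaw into [-pi, pi).
    double yaw = std::fmod(yawRad + pi, 2.0 * pi);
    if (yaw < 0.0) yaw += 2.0 * pi;
    yaw -= pi;
    // Clamp pitch to the valid range [-pi/2, pi/2].
    double pitch = std::max(-pi / 2.0, std::min(pi / 2.0, pitchRad));
    // Longitude maps linearly to columns, latitude to rows (top row = straight up).
    return { (yaw / (2.0 * pi) + 0.5) * width,
             (0.5 - pitch / pi) * height };
}
```

Each photo would then be warped around that point according to the camera's field of view and blended with its neighbours; any feature-based refinement is an optional extra on top.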

Generate an image that can be most easily detected by Computer Vision algorithms

I'm working on a small side project related to computer vision, mostly to try playing around with OpenCV. It led me to an interesting question:
Using feature detection to find known objects in an image isn't always easy: objects are hard to find, especially if the features of the target object aren't great.
But if I could choose ahead of time what it is I'm looking for, then in theory I could generate for myself an optimal image for detection. Any quality that makes feature detection hard would be absent, and all the qualities that make it easy would exist.
I suspect this sort of thought went into things like QR codes, but with the limitations that they wanted QR codes to be simple, and small.
So my question for you: How would you generate an optimal image for later recognition by a camera? What if you already know that certain problems, like skew or partial occlusion, will occur?
Thanks very much
I think you need something like AR markers.
Take a look at the ARToolKit, ARToolKitPlus or ArUco libraries; they have marker generators and detectors.
And a paper about marker generation: http://www.uco.es/investiga/grupos/ava/sites/default/files/GarridoJurado2014.pdf
If you plan to use feature detection, then the marker should be designed for the specific feature detector used. Detectors are commonly designed to respond well to "corners", i.e. regions with high gradients in both x and y. You should also take the scale of the target into account.
The simplest detection can be performed with blobs, which can be faster and more robust than feature points. For example, you can detect circular or rectangular blobs.
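One property a square binary marker needs is that it cannot be confused with a rotated copy of itself, otherwise the detector cannot recover orientation. The sketch below scores a candidate pattern by its smallest Hamming distance to its own 90, 180 and 270 degree rotations. It is only an illustration of that idea under my own assumptions; `Grid` and `rotationSelfDistance` are hypothetical names, not part of any of the libraries above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Square grid of 0/1 cells describing a marker's inner pattern.
using Grid = std::vector<std::vector<int>>;

// Returns the grid rotated by 90 degrees clockwise.
Grid rotate90(const Grid& g) {
    std::size_t n = g.size();
    Grid r(n, std::vector<int>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            r[j][n - 1 - i] = g[i][j];
    return r;
}

// Number of cells in which two equally sized grids differ.
int hamming(const Grid& a, const Grid& b) {
    int d = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < a[i].size(); ++j)
            if (a[i][j] != b[i][j]) ++d;
    return d;
}

// Smallest distance between the pattern and its three rotated copies.
// 0 means some rotation looks identical, so orientation is ambiguous.
int rotationSelfDistance(const Grid& g) {
    assert(!g.empty() && g.size() == g[0].size());
    Grid r = g;
    int best = static_cast<int>(g.size() * g.size());
    for (int k = 0; k < 3; ++k) {
        r = rotate90(r);
        best = std::min(best, hamming(g, r));
    }
    return best;
}
```

A generator could draw random patterns and keep only those whose score exceeds some threshold; a plain checkerboard scores 0 because it is unchanged by a 180 degree rotation.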
Depending on the distance you want to see your markers from, the viewing conditions and backgrounds you typically have, and the camera resolution and noise, you should choose different images/targets. Under moderate perspective from a longer distance, a color target is pretty distinctive, see:
https://surf-it.soe.ucsc.edu/sites/default/files/velado_report.pdf
At close distances, various bar/QR codes may be a good choice. Other than that, any flat textured object is easier to track (using a homography) than a 3D object:
http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_feature_homography/py_feature_homography.html
Even different views of 3D objects can be quickly learned and tracked by systems such as Predator:
https://www.youtube.com/watch?v=1GhNXHCQGsM
Then comes the whole field of hardware: structured light, synchronized markers, etc. Kinect, for example, projects a predefined pattern onto the surface to do stereo. This means it recognizes and matches millions of micro-patterns per second, creating a depth map from the matched correspondences. Note that one camera sees the pattern while another device, a projector, generates it and acts as a virtual camera, see
http://article.wn.com/view/2013/11/17/Apple_to_buy_PrimeSense_technology_from_the_360s_Kinect/
The quickest way to demonstrate good tracking is to use a standard checkerboard pattern with OpenCV's chessboard detector (the detected corners can then be passed to solvePnP to get the pose):
http://www.juergenwiki.de/work/wiki/lib/exe/fetch.php?media=public:cameracalibration_detecting_fieldcorners_of_a_chessboard.gif
This can literally be done by calling just two functions:
found = findChessboardCorners(src, chessboardSize, corners, camFlags);  // true if the full pattern was located
drawChessboardCorners(dst, chessboardSize, corners, found);  // overlay the detected corners on dst
To sum up, your question is very broad and there are multiple answers and solutions. Specify your viewing conditions, camera specs, backgrounds, distances, and the amount of motion and perspective you expect, indoors vs. outdoors, etc. There is no such thing as a general average case in computer vision!

Is Eigenface best for moving object face recognition

I am doing a project on face recognition from CCTV cameras, and I want to recognize each individual face. I think the eigenface method is best for face recognition. But is there any problem with using the eigenface method to recognize the faces of moving people? Can we recognize individuals reliably? Since the input is not a still image, I am really unsure which method to select.
Please help me understand whether this method is OK, or otherwise suggest a better alternative.
Short answer: Typically, the computer vision techniques used in image analysis can be used in video analysis too. Videos just give you more information (especially temporal information). For example, you could do face recognition using multiple frames, and track the object between frames. Associating multiple frames typically gives you higher accuracy.
IMO, the most difficult problems are viewing angle, calibration and lighting conditions, which you are more likely to run into with CCTV footage. To handle them you will need an accurate face detection technique, or more training data so that faces can be recognized under different viewing angles and lighting conditions. The eigenface approach relies on accurate positions of the face, eyes, and so on. Otherwise, you are likely to mix different features in the same vector. But again, this problem also exists in face recognition on still images.
To sum up, video content only gives you more information. If you don't want to associate frames and use the temporal information, a video is just a collection of still images :)
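As a minimal sketch of "recognition using multiple frames": once a tracker has linked the detections of one person across frames, the per-frame recognition results can be combined by a simple majority vote. The function name and the use of string labels are assumptions for illustration; any per-frame recognizer (eigenfaces or otherwise) could supply the labels.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Combines the per-frame labels predicted for one tracked face.
// Returns the most frequent label; ties go to the label that sorts first,
// because std::map iterates its keys in ascending order.
std::string majorityLabel(const std::vector<std::string>& perFrameLabels) {
    assert(!perFrameLabels.empty());
    std::map<std::string, int> counts;
    for (const std::string& label : perFrameLabels) ++counts[label];
    std::string best;
    int bestCount = 0;
    for (const auto& entry : counts) {
        if (entry.second > bestCount) {
            best = entry.first;
            bestCount = entry.second;
        }
    }
    return best;
}
```

A single badly lit or off-angle frame is then outvoted by the frames in which the face was seen more clearly.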

Detect and extract face from an image

I have been trying to do the following -
When a user uploads an image in my web app, I'd like to detect his/her face in it and extract the face (from forehead to chin and cheek to cheek).
I tried OpenCV/C++ face detection using a Haar cascade, but the problem is that it only gives an approximate region for the face, so either part of the background ends up inside the ROI or the complete face doesn't fit in the ROI.
I also want to detect the eyes inside the face, and with the above technique the eye detection isn't that accurate.
I've read up on a new technique called Active Appearance Model (AAM). The blogs where I read up about this show that this is exactly what I want but I am lost on how to implement this.
My queries are -
Is using AAM a good idea for face detection and facial feature detection?
Are there any other techniques for doing the same?
Any help on any of these is much appreciated.
Thanks !
As you noticed, OpenCV's implementation of face detection is not state-of-the-art. It is a very good and robust implementation, but you can do better.
Recently, Zhu and Ramanan (CVPR 2012) introduced "Face detection, pose estimation and landmark localization in the wild", which is considered to be one of the leading algorithms for face detection in recent years.
Their algorithm is capable of detecting faces in both frontal and profile views AND identifying keypoints on the detected face such as the eyes, nose and mouth.
The authors were kind enough to publish their code along with learned models. It is a Matlab implementation, but the main computations are done in C++, so it should not be too difficult to make a standalone C++ implementation of their method.
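Once keypoints are available, the crop asked for in the question can be taken from them rather than from the detector's rectangle. A minimal sketch, assuming the landmark localizer returns 2D points: take their bounding box, pad it by a relative margin and clamp it to the image. `Point`, `Rect` and `cropFromLandmarks` are hypothetical names, not part of the published code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Point { float x, y; };    // landmark position in pixels
struct Rect { int x, y, w, h; }; // crop rectangle in pixels

// Bounding box of the landmarks, padded on every side by `margin` times the
// box size and clamped to an imgW x imgH image.
Rect cropFromLandmarks(const std::vector<Point>& pts, float margin, int imgW, int imgH) {
    assert(!pts.empty() && imgW > 0 && imgH > 0);
    float minX = pts[0].x, maxX = pts[0].x;
    float minY = pts[0].y, maxY = pts[0].y;
    for (const Point& p : pts) {
        minX = std::min(minX, p.x);
        maxX = std::max(maxX, p.x);
        minY = std::min(minY, p.y);
        maxY = std::max(maxY, p.y);
    }
    float padX = (maxX - minX) * margin;
    float padY = (maxY - minY) * margin;
    int x0 = std::max(0, static_cast<int>(std::floor(minX - padX)));
    int y0 = std::max(0, static_cast<int>(std::floor(minY - padY)));
    int x1 = std::min(imgW, static_cast<int>(std::ceil(maxX + padX)));
    int y1 = std::min(imgH, static_cast<int>(std::ceil(maxY + padY)));
    // Width and height are clamped at 0 in case all landmarks lie outside the image.
    return { x0, y0, std::max(0, x1 - x0), std::max(0, y1 - y0) };
}
```

The margin would need tuning: landmarks around the eyes, nose and mouth do not reach the forehead or chin, so some padding is needed to cover the whole face.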

Pose independent face detection

I'm working on a project where I need to detect faces in very messy videos (recorded from an egocentric point of view, so you can imagine...). Faces can have yaw angles that vary between -90 and +90 degrees, pitch with almost the same range (a bit lower due to the constraints of the human body) and possibly some roll as well.
I've spent a lot of time searching for a pose-independent face detector. In my project I'm using OpenCV, but the OpenCV face detector is not even close to the detection rate I need. It gives very good results on frontal faces but almost none on profile faces. Using haarcascade .xml files trained on profile images doesn't really help. Combining frontal and profile cascades yields slightly better results, but still not even close to what I need.
Training my own Haar cascade will be my very last resort because of the huge computational (or time) requirements.
For now, I'm asking for any help or advice on this matter.
The requirements for a face detector I could use are:
a very good detection rate. I don't mind a very high false positive rate, since by using some temporal consistency in my video I'll probably be able to get rid of the majority of them
written in C++, or usable from a C++ application
Real-time performance is not an issue for now; detection rate is all I care about right now.
I've seen many papers achieving these results, but I couldn't find any code that I could use.
I sincerely thank you for any help you are able to provide.
Perhaps not an answer, but too long to put into a comment.
You can use opencv_traincascade.exe to train a new detector that can detect a wider variety of poses. This post may be of help: http://note.sonots.com/SciSoftware/haartraining.html I have managed to train a detector that is sensitive to yaw within -50:+50 degrees by using the FERET data set. In my case, we did not want to detect purely side-on faces, so the training data was prepared accordingly. Since FERET already provides convenient pose variations, it might be possible to train a detector somewhat close to your specification. Time is not an issue if you use LBP features: training completes in 4-5 hours at most, and it goes even faster (15-30 min) if you set appropriate parameters and use less training data (useful for checking whether the detector is going to produce the output you expect).
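The temporal-consistency filtering mentioned in the question, which is what makes a high false positive rate tolerable, could be sketched as follows: a candidate face is only accepted after it has been detected in several consecutive frames. The struct name and the idea of passing `minHits` per update are assumptions for illustration; linking a detection to the same candidate across frames is assumed to be done elsewhere.

```cpp
#include <cassert>

// State kept for one candidate face location that is followed across frames.
struct TrackCandidate {
    int consecutiveHits = 0;  // frames in a row with a detection at this location
    bool confirmed = false;   // stays true once the candidate has been accepted

    // Call once per frame; detectedThisFrame says whether a detection was
    // linked to this candidate in the current frame.
    void update(bool detectedThisFrame, int minHits) {
        assert(minHits > 0);
        consecutiveHits = detectedThisFrame ? consecutiveHits + 1 : 0;
        if (consecutiveHits >= minHits) confirmed = true;
    }
};
```

Spurious detections that flicker on for a frame or two never reach `minHits` and are dropped, while a real face that persists gets confirmed.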