Dlib frontal face detection for small faces - computer-vision

I am using Dlib's frontal face detector to detect faces in an image; however, it cannot detect faces smaller than 80 by 80 pixels.
Dlib's example in face_detection_ex.cpp upsamples the input image using pyramid_up() to increase the face sizes. However, this makes the algorithm much slower because it has to search a larger image.
I wonder if anyone knows a solution for this problem.

Dlib's face detector is trained on 80x80 faces. If you want to detect smaller faces, you have two options:
increase the resolution to make the faces bigger. You can use pyramid_up or any other way, like cv::resize. And you don't have to increase the resolution 2x; maybe 1.5x will be enough - that's up to you (see the sketch below)
train a new face detector that works on small faces - dlib has samples for the training process
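For the first option, here is a minimal sketch of upscaling by 1.5x and mapping the detections back to the original coordinates; it assumes dlib's resize_image, and the input file name is hypothetical:

    #include <dlib/image_processing/frontal_face_detector.h>
    #include <dlib/image_transforms.h>
    #include <dlib/image_io.h>
    #include <dlib/array2d.h>
    #include <cmath>
    #include <iostream>

    int main()
    {
        dlib::frontal_face_detector detector = dlib::get_frontal_face_detector();

        dlib::array2d<unsigned char> img;
        dlib::load_image(img, "faces.jpg");  // hypothetical input file

        const double scale = 1.5;            // lifts ~55 px faces above the 80 px minimum
        dlib::resize_image(scale, img);      // in-place upscale

        for (const dlib::rectangle& det : detector(img))
        {
            // Map the detection back to the original image coordinates.
            dlib::rectangle orig(std::lround(det.left()   / scale),
                                 std::lround(det.top()    / scale),
                                 std::lround(det.right()  / scale),
                                 std::lround(det.bottom() / scale));
            std::cout << "face at " << orig << std::endl;
        }
    }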
The next part of your question is about the face detector's performance. Yes, it depends on resolution: if you want to detect 20x20 faces in a 13 MP image, it will be slow. To make it work fast you have these options:
reduce the number of pixels the detector has to process - use the right scale and a region of interest
use greyscale images
reduce the number of scale steps in the scanning process
follow the recommendations from the dlib FAQ. I can only add that MinGW/GCC code runs about 20% faster than MSVC, and the Android/ARM code does not use SIMD instructions
for videos: apply motion detection and detect only in the changed areas (crop them manually and detect in the cropped area - see the sketch below), and run frames in separate threads to use all CPU cores
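A sketch of the region-of-interest idea for videos, assuming the ROI comes from your motion detector; it uses dlib's cv_image wrapper so an OpenCV crop can be fed to the dlib detector without copying pixels:

    #include <dlib/image_processing/frontal_face_detector.h>
    #include <dlib/opencv.h>
    #include <opencv2/opencv.hpp>
    #include <vector>

    std::vector<dlib::rectangle> detect_in_roi(
        dlib::frontal_face_detector& detector,
        const cv::Mat& gray_frame,      // greyscale already saves work
        const cv::Rect& roi)            // region reported by motion detection
    {
        cv::Mat crop = gray_frame(roi); // a view, no pixel copy
        dlib::cv_image<unsigned char> dlib_view(crop);

        std::vector<dlib::rectangle> faces = detector(dlib_view);
        for (auto& r : faces)           // shift back to full-frame coordinates
            r = dlib::translate_rect(r, roi.x, roi.y);
        return faces;
    }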

Related

Get pixel coordinates above a threshold in OpenCV C++

I have a gray image (648*488 pixels) and I want to get the coordinates of the pixels above a threshold. I need this to be really fast, so I want to know if there is a function in OpenCV to do it.
You could start with the OpenCV forEach method in the Mat class.
It uses one of several parallel frameworks for multithreading.
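A minimal sketch with forEach; note that the lambda may run on several threads at once, so the shared output vector is guarded by a mutex (per-thread vectors merged afterwards would avoid the lock):

    #include <opencv2/core.hpp>
    #include <mutex>
    #include <vector>

    std::vector<cv::Point> points_above(const cv::Mat& gray, uchar thresh)
    {
        std::vector<cv::Point> pts;
        std::mutex mtx;
        gray.forEach<uchar>([&](uchar pixel, const int* pos) {
            if (pixel > thresh) {
                std::lock_guard<std::mutex> lock(mtx);
                pts.emplace_back(pos[1], pos[0]); // pos = {row, col}
            }
        });
        return pts;
    }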
If it has to be even faster, access the memory directly and threshold several pixels at a time with SIMD instructions.
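For the direct-access route, a sketch using row pointers; the inner comparison is simple enough that optimizing compilers will typically auto-vectorize it with SIMD:

    #include <opencv2/core.hpp>
    #include <vector>

    std::vector<cv::Point> points_above_raw(const cv::Mat& gray, uchar thresh)
    {
        std::vector<cv::Point> pts;
        for (int y = 0; y < gray.rows; ++y) {
            const uchar* row = gray.ptr<uchar>(y);
            for (int x = 0; x < gray.cols; ++x)
                if (row[x] > thresh)
                    pts.emplace_back(x, y);
        }
        return pts;
    }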
You can also use GPUs (Cuda / OpenCL).

How to segment anomalies on a glossy surface with OpenCV (C++)

I have an image of a glossy surface:
My goal is to detect anomalies on that image. Here is the same image with the anomalies marked:
As we can see from the images above, the anomalies have poor contrast (or at least not the best), and they also change from image to image in shape, contrast, orientation...
I tried to increase the anomalies' contrast by using top-hat filtering. The result is here:
Now that the anomalies are much more visible in the image, I want to segment them out. The aim is to binarize the image and use the connectedComponents function to calculate the anomalies' areas, dimensions, positions...
What kind of segmentation do you suggest? What would be the best way to binarize the image? Should I even use top-hat filtering to increase the anomalies' contrast, or should I try to segment them directly from the first image?
You can:
try several top-hat filters of different sizes and parameters, to see which one highlights the anomalies best while suppressing the fingerprints,
or
go directly to thresholding and adjust the parameters there to make sure that none of the anomalies are lost in the process. Then use the features of the connected components to extract the actual anomalies (a sketch of this option follows below).
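A sketch of the second option; the kernel size, Otsu thresholding, file name, and minimum-area cutoff are assumptions to tune per image:

    #include <opencv2/opencv.hpp>
    #include <iostream>

    int main()
    {
        cv::Mat gray = cv::imread("surface.png", cv::IMREAD_GRAYSCALE); // hypothetical file
        if (gray.empty()) return 1;

        // Top-hat keeps small bright structures and suppresses the background.
        cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(15, 15));
        cv::Mat tophat;
        cv::morphologyEx(gray, tophat, cv::MORPH_TOPHAT, kernel);

        // Otsu picks the threshold automatically; adjust manually if it fails.
        cv::Mat bin;
        cv::threshold(tophat, bin, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);

        // Connected components give area/position features per candidate blob.
        cv::Mat labels, stats, centroids;
        int n = cv::connectedComponentsWithStats(bin, labels, stats, centroids);
        for (int i = 1; i < n; ++i)   // label 0 is the background
        {
            int area = stats.at<int>(i, cv::CC_STAT_AREA);
            if (area > 20)            // assumed minimum anomaly size
                std::cout << "anomaly " << i << " area=" << area << std::endl;
        }
    }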
An increasingly popular approach is to train a deep neural network on lots of images of anomalies and then use the network to identify them.

Why is there a difference between OpenCV's scale change implementation of detectMultiScale between the cascade classifier and HOGDescriptor?

I know the gist of how detectMultiScale in OpenCV works, i.e. you have an image and a detection window; the detection window scans the image, and feature calculations are done on the pixels inside the window at each position to determine whether a detection occurred or not.
However, from OpenCV's documentation it would seem that the manner in which the scaling (to detect objects of different sizes) takes place differs depending on whether you are using a cascade classifier or the HOGDescriptor.
The OpenCV documentation states that the cascade classifier's detectMultiScale uses a scaleFactor to REDUCE THE IMAGE SIZE in which detection takes place until it is smaller than the detection window, while the HOGDescriptor's detectMultiScale has a scale factor (scale0) which INCREASES THE DETECTION WINDOW until it is the size of the image in which detections are checked.
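An illustrative sketch of the two strategies as described; this is not OpenCV's actual code (internally, I believe the HOG path also rescales the image so the window stays aligned with its block grid, which has the equivalent effect):

    #include <opencv2/opencv.hpp>

    void cascade_style(const cv::Mat& img, const cv::Size& window, double scaleFactor)
    {
        // Cascade classifier: shrink the IMAGE until it is smaller than the window.
        cv::Mat scaled = img.clone();
        while (scaled.cols >= window.width && scaled.rows >= window.height)
        {
            // ... slide the fixed-size `window` over `scaled` and classify ...
            cv::resize(scaled, scaled, cv::Size(), 1.0 / scaleFactor, 1.0 / scaleFactor);
        }
    }

    void hog_style(const cv::Mat& img, cv::Size window, double scale0)
    {
        // HOGDescriptor: grow the detection WINDOW until it exceeds the image.
        while (window.width <= img.cols && window.height <= img.rows)
        {
            // ... slide `window` over `img` and classify ...
            window.width  = cvRound(window.width  * scale0);
            window.height = cvRound(window.height * scale0);
        }
    }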
Why is there a difference between the two? Is one implementation better than the other?
Currently I have trained both a cascade classifier with HOG features and an SVM with HOG features (HOGDescriptor) in OpenCV 2.4.8.
Thank you in advance

Neural network topology for object recognition on aerial photos (computer vision)

My objective is to recognize the footprints of buildings on aerial photos. Having heard about recent progress in machine vision (the ImageNet Large Scale Visual Recognition Challenges), I thought I could (at least) try to use neural networks for this task.
Can anybody give me an idea of what the topology of such a network should be? I guess it should have as many outputs as inputs (which means all the pixels in the picture), since I want to recognize the outlines of buildings with their (at least approximate) placement in the picture.
I guess the input pictures should be of a standard size, with each pixel normalized to grey scale or the YUV color space (one value per color), and maybe of normalized resolution (each pixel should represent a fixed real-world size). I am not sure if the picture should be preprocessed in any other way before feeding it into the net, maybe by extracting the edges first?
The tricky part is how the outputs should be represented and how to train the net. Using just, e.g., output = 0 for pixels within a building footprint and 1 for pixels outside it might not be the best idea. Maybe I should teach the network to recognize the edges of buildings instead, so the pixels which represent building edges get 1's and the rest of the pixels get 0's?
Can anybody throw in some suggestions about network topology/inputs/outputs formats?
Or maybe this task is hopelessly difficult and I have zero chance of solving it?
I think we need a better definition of "buildings". If you want to do building "detection", that is, detect the presence of a building of any shape/size, this is difficult for a cascade classifier. You can try the following, though:
Partition a set of known images into fixed-size blocks.
Label each block as "building", "not building", or "boundary" (includes portions of both).
Extract basic features like intensity histograms, edges, Hough lines, HOG, etc.
Train SVM classifiers based on these features (you can try other classifiers too, but I recommend SVM from experience).
Now you can partition your images again and use the trained classifier to get the results. The per-block results will have to be combined to identify buildings.
This will still need some testing to get the parameters (size of histograms, parameters of the SVM classifier, etc.) right. A sketch of the pipeline is given below.
I have used this approach to detect "food" regions on images. The accuracy was below 70%, but my guess is that it will be better for buildings.
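A minimal sketch of that pipeline, assuming 64x64 blocks, HOG features, and the OpenCV 3.x ml module; all parameters and file names are placeholders, and loading the labeled blocks is omitted:

    #include <opencv2/opencv.hpp>
    #include <opencv2/ml.hpp>
    #include <vector>

    // Compute a HOG descriptor for one fixed-size block.
    std::vector<float> block_features(const cv::Mat& block)
    {
        static cv::HOGDescriptor hog(
            cv::Size(64, 64),   // winSize == block size (assumed)
            cv::Size(16, 16), cv::Size(8, 8), cv::Size(8, 8), 9);
        std::vector<float> desc;
        hog.compute(block, desc);
        return desc;
    }

    int main()
    {
        // One row per labeled block: CV_32F features in `samples`, CV_32S
        // class in `labels` (e.g. 0 = not building, 1 = building, 2 = boundary).
        cv::Mat samples, labels;
        // ... fill samples/labels from your partitioned images via block_features() ...
        if (samples.empty()) return 0;

        cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
        svm->setType(cv::ml::SVM::C_SVC);
        svm->setKernel(cv::ml::SVM::RBF);
        svm->trainAuto(cv::ml::TrainData::create(samples, cv::ml::ROW_SAMPLE, labels));
        svm->save("building_blocks.svm");  // hypothetical output path
    }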

Pose independent face detection

I'm working on a project where I need to detect faces in very messy videos (recorded from an egocentric point of view, so you can imagine...). Faces can have yaw angles that vary between -90 and +90 degrees, pitch with almost the same variation (well, a bit less due to human body constraints...), and possibly some roll variation too.
I've spent a lot of time searching for a pose-independent face detector. In my project I'm using OpenCV, but the OpenCV face detector is not even close to the detection rate I need. It has very good results on frontal faces but almost zero results on profile faces. Using haarcascade .xml files trained on profile images doesn't really help. Combining frontal and profile cascades yields slightly better results, but still not even close to what I need.
Training my own haarcascade will be my very last resort, given the huge computational (or time) requirements.
For now, what I'm asking for is any help or advice regarding this matter.
The requirements for a face detector I could use are:
a very good detection rate. I don't mind a very high false positive rate, since by using some temporal consistency in my video I'll probably be able to get rid of the majority of them
written in C++, or something that can work in a C++ application
Real time is not an issue for now; detection rate is everything I care about right now.
I've seen many papers achieving these results, but I couldn't find any code that I could use.
I sincerely thank you for any help you'll be able to provide.
Perhaps not an answer, but too long to put into a comment.
You can use opencv_traincascade.exe to train a new detector that can detect a wider variety of poses. This post may be of help: http://note.sonots.com/SciSoftware/haartraining.html. I have managed to train a detector that is sensitive within -50:+50 degrees of yaw by using the FERET data set. In my case, we did not want to detect purely side-on faces, so the training data was prepared accordingly. Since FERET already provides convenient pose variations, it might be possible to train a detector somewhat close to your specification. Time is not an issue if you use LBP features: training completes in 4-5 hours at most, and it goes even faster (15-30 min) by setting appropriate parameters and using less training data (useful for checking whether the detector will produce the output you expect). A hedged example invocation is sketched below.