Face recognition using neural networks - c++

I am doing a project on face recognition, for that I have already used different methods like eigenface, fisherface, LBP histograms and surf. But these methods are not giving me an accurate result. Surf gives good matches for exact same images, but I need to match one image with it's own different poses(wearing glasses,side pose,if somebody is covering his face) etc. LBP compares histogram of images, i.e., only color informations. So when there is high variation on lighting condition it is not showing good results. So I heard about neural networks, but I don't know much about that. Is it possible to train the system very accurately by using neural networks. If possible how can we do that?

According to this OpenCV page, there does seem to be some support for machine learning. That being said, the support does seem to be a bit limited.
What you could do, would be to:
User OpenCV to extract the face of the person.
Change the image to grey scale.
Try to manipulate so that the face is always the same size.
All the above should be doable with OpenCV itself (could be wrong, haven't messed with OpenCV in a while) so that should save you some time.
Next, you take the image, as a bitmap maybe, and feed the bitmap as a vector to the neural network. Alternatively, as #MatthiasB recommended, you could feed the features instead of individual pixels. This would simplify the data being passed, thus making the network easier to train.
As for training, you manipulate these images as above, and then feed them to the network. If a person uses glasses occasionally, you could have cases of the same person with and without glasses, etc.

Related

How to find logos on a website screenshot

I'm looking for a way to check if a given logo appears on a screenshot of a webpage. So basically, I need to be able to find a small predefined image on a larger image that may or may not contain the smaller image. A match could be of a different scale, somewhat different colors. I need to judge occurrence similarity as well. Need some pointers for what to look at, I've never worked with computer vision before.
Simplest yet not simple way to do it is a normal CNN trained on augmented dataset of the logos.
Trying to keep the answer short, Just make a cnn in tensorflow and train your model on tons images of logos with labels on each training image, It's a simple task and a not-very-crafty CNN must be able to get your work done.
CNN- Convolutional Neural Network
Reference : https://etasr.com/index.php/ETASR/article/view/3919

Extracting the descriptors of a bunch of sample images and training an SVM in OpenCV

I have 4 types of symbols of musical notes of the same color: Whole note, half note, Crotchet and quaver. I need to classify an image and tell if it has one of this symbols (just one for now) and which one. for example, if i have an image with just the musical staff (but nothing else in it) it should tell me that the image is empty, but if i have an image with a Half note symbol in it, it should tell me something like "it is a half note".
Suppose i have 20 sample images for each possible symbol and 20 with the base case (nothing in it), i want to train a SVM to classify any input image. I've read about how i could do it, but i still have certain doubts. i think the process is something like this (and please correct me if i'm wrong):
extract the descriptors of all the sample images.
put those descriptors inside different Mat Objects (one for each symbol).
feed those Mats to the SVM to train it.
Use the SVM to classify the images.
i have specific doubts about what i think is the process:
is what i described the correct process for what i need to do?
should i pre-process the sample images (say extract the background and apply canny edges) before i feed them to the descriptor extractor? o can i leave them as they are?
i have read about three methods of extracting the descriptors: HOG, BOW (Bag of Words) and SIFT. i think they all do what i need but i don't know which one to use. i see that HOG is mostly (if not all times) for face and pedestrians detection and i don't know if it could be used for my case. Any advice of which one should use?
how many sample images should i have for every case?
i dont need specific details of the implementation, but i do need answers to these questions, thank you in advance
I'm not an expert of SIFT and BOW but I know something about HOG and SVM.
1 Is what i described the correct process for what i need to do?
If you are using OpenCV and HOG no that is not correct. Have a look to the sample code for HOG in OpenCV samples and you will find that, once extracted, the descriptors directly feed the SVM without filling a MAT element.
2 should i pre-process the sample images (say extract the background and apply canny edges) before i feed them to the descriptor extractor? o can i leave them as they are?
This is not mandatory. Preprocessing has been proved to be very useful but for your simple case you wont need it. On the other hand, if your wall presents draws, stickers or something that can confuse the detector then yes. It can be a good solution to decrease the number of false positives.
3 i have read about three methods of extracting the descriptors: HOG, BOW (Bag of Words) and SIFT. i think they all do what i need but i don't know which one to use. i see that HOG is mostly (if not all times) for face and pedestrians detection and i don't know if it could be used for my case. Any advice of which one should use?
I have direct knowledge only of HOG. You can easily implement your own detector with HOG without any problem, I'm currently using it for traffic signs. Pay attention to the detection window that you want to use. You can leave all the other parameters as they are, it will work for simple cases.
4 how many sample images should i have for every case?
Once again it depends on the situation. I would say that 200 images (try also with less) for class will do the trick but you can always increase the number by applying some transformation on the positives. Try to flip, saturate or blur the images.
Some more considerations. I think that you can work with grey scale images due to the fact that color is not important to distinguish the notes (all the same color right?). If you have problem with false positives you can try to use the HSV color space to filter out patches that you will then use to detect the notes (it really works well with red!!). The easiest way to train your SVM is using a linear kernel and then train a model for each class.

Neural network topology for object recognition on aerial photos (computer vision)

My objective is to recognize the footprints of buildings on aerial photos. Having heard about recent progress in machine vision (ImageNet Large Scale Visual Recognition Challenges) I though I could (at least) try to use neural networks for this task.
Can anybody give me the idea what should be the topology of such a network? I guess it should have as many outputs as inputs (which means all the pixels in picture) since I want to recognize the outlines of buildings with their (at least approximate) placement on the picture.
I guess the input pictures should be of standard size, with each pixel normalized to grey scale or YUV color space (1 value per color) and maybe normalized resolution (each pixel should represent fixed size in reality). I am not sure if the picture could be preprocessed in any other way before inputting into net, maybe by extracting the edges first?
The tricky part is how the outputs should be represented and how to train the net. Using just e.g. output=0 for the pixel within building footprint and 1 for the pixel outside of it, might not be the best idea. Maybe I should teach the network to recognize edges of the building instead so the pixels which represent building edges should have 1's and 0's for the rest of pixels?
Can anybody throw in some suggestions about network topology/inputs/outputs formats?
Or maybe this task is hopelessly difficult and I have 0 chances to solve it?
I think we need a better definition of "buildings". If you want to do building "detection", that is detect the presence of a building of any shape/size, this is difficult for a cascade classifier. You can try the following, though:
Partition a set of known images to fixed-size blocks.
Label each block as "building", "not building", or
"boundary(includes portions
of both)"
Extract basic features like intensity histograms, edges,
hough lines, HOG, etc.
Train SVM classifiers based on these features (you can try others, too, but I recommend SVM by experience).
Now you can partition your images again and use the trained classifier to get the results. The results will have to be combined to identify buildings.
This will still need some testing to get the parameters(size of histograms, parameters of SVM classifier etc.) right.
I have used this approach to detect "food" regions on images. The accuracy was below 70%, but my guess is that it will be better for buildings.

Is Eigenface best for moving object face recognition

I am doing a project on face recognition from CCTV cameras, I want to recognize each individual faces. I think eigenface method is best for face recognition. But when we use eigenface method for moving object face recognition, is there any problem? Can we recognize individuals perfectly? Since it is not still image, I am really confused to select a method.
Please help me to know whether this method is ok, otherwise suggest a better alternative.
Short answer: Typically those computer vision techniques used in image analysis can be used in video analysis, too. Videos just give you more information (esp. the temporal information.) For example, you could do face recognition using multiple frames, and between each frame you do object tracking. Associating multiple frames typically give you higher accuracy.
IMO, the most difficult problems are: you're more likely to face viewing angle, calibration problems, and lighting condition problems, in which you will need accurate face detection technique, or more training data in order to recognize faces under viewing angles and lighting conditions. Eigen face based approach relies on an accurate position of faces, eyes, and so on. Otherwise, you are likely to mix different features in the same vector. But again, this problem also exists in face recognition under still image.
To sum up, video content only gives you more information. If you don't really want to associated frames and consider temporal information, video is just a collection of still images :)

Assessing the quality of an image with respect to compression?

I have images that I am using for a computer vision task. The task is sensitive to image quality. I'd like to remove all images that are below a certain threshold, but I am unsure if there is any method/heuristic to automatically detect images that are heavily compressed via JPEG. Anyone have an idea?
Image Quality Assessment is a rapidly developing research field. As you don't mention being able to access the original (uncompressed) images, you are interested in no reference image quality assessment. This is actually a pretty hard problem, but here are some points to get you started:
Since you mention JPEG, there are two major degradation features that manifest themselves in JPEG-compressed images: blocking and blurring
No-reference image quality assessment metrics typically look for those two features
Blocking is fairly easy to pick up, as it appears only on macroblock boundaries. Macroblocks are a fixed size -- 8x8 or 16x16 depending on what the image was encoded with
Blurring is a bit more difficult. It occurs because higher frequencies in the image have been attenuated (removed). You can break up the image into blocks, DCT (Discrete Cosine Transform) each block and look at the high-frequency components of the DCT result. If the high-frequency components are lacking for a majority of blocks, then you are probably looking at a blurry image
Another approach to blur detection is to measure the average width of edges of the image. Perform Sobel edge detection on the image and then measure the distance between local minima/maxima on each side of the edge. Google for "A no-reference perceptual blur metric" by Marziliano -- it's a famous approach. "No Reference Block Based Blur Detection" by Debing is a more recent paper
Regardless of what metric you use, think about how you will deal with false positives/negatives. As opposed to simple thresholding, I'd use the metric result to sort the images and then snip the end of the list that looks like it contains only blurry images.
Your task will be a lot simpler if your image set contains fairly similar content (e.g. faces only). This is because the image quality assessment metrics
can often be influenced by image content, unfortunately.
Google Scholar is truly your friend here. I wish I could give you a concrete solution, but I don't have one yet -- if I did, I'd be a very successful Masters student.
UPDATE:
Just thought of another idea: for each image, re-compress the image with JPEG and examine the change in file size before and after re-compression. If the file size after re-compression is significantly smaller than before, then it's likely the image is not heavily compressed, because it had some significant detail that was removed by re-compression. Otherwise (very little difference or file size after re-compression is greater) it is likely that the image was heavily compressed.
The use of the quality setting during re-compression will allow you to determine what exactly heavily compressed means.
If you're on Linux, this shouldn't be too hard to implement using bash and imageMagick's convert utility.
You can try other variations of this approach:
Instead of JPEG compression, try another form of degradation, such as Gaussian blurring
Instead of merely comparing file-sizes, try a full reference metric such as SSIM -- there's an OpenCV implementation freely available. Other implementations (e.g. Matlab, C#) also exist, so look around.
Let me know how you go.
I had many photos shot to an ancient book (so similar layout, two pages per image), but some were much blurred, to the point that the text could not be read. I searched for a ready-made batch script to find the most blurred one, but I didn't find any useful, so I used another part of script got on the net (based on ImageMagick, but no longer working; I couldn't retrieve the author for the credits!), useful to assessing the blur level of a single image, tweaked it, and automatised it over a whole folder. I uploaded here:
https://gist.github.com/888239
hoping it will be useful for someone else. It works on a Linux system, and uses ImageMagick (and some usually command line installed tools, as gawk, sort, grep, etc.).
One simple heuristic could be to look at width * height * color depth < sigma * file size. You would have to determine a good value for sigma, of course. sigma would be dependent on the expected entropy of the images you are looking at.