OpenCV DNN face detection in UWP/C++: bad results - c++

I'm using OpenCV and Caffe to perform face detection on some images I receive from a stream. First, I tried it in Python:
prototxt_file = 'deploy.prototxt'
weights_file = 'res10_300x300_ssd_iter_140000.caffemodel'
dnn = cv2.dnn.readNetFromCaffe(prototxt_file, weights_file)

for image in images:
    blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0, (300, 300),
                                 (104.0, 177.0, 123.0))
    dnn.setInput(blob)
    detections = dnn.forward()
    for i in range(0, detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        box = detections[0, 0, i, 3:7]
        if confidence > 0.5:
            # Do something
This works quite well. Now, I want to do the same within a C++ Windows UWP App, so I compiled OpenCV from source for UWP (tried with versions 3.4.1 and 4.3.0). After going through this example I tried the following:
std::string caffeConfigFilePath = "deploy.prototxt";
std::string caffeWeightFilePath = "res10_300x300_ssd_iter_140000.caffemodel";
net = cv::dnn::readNetFromCaffe(caffeConfigFilePath, caffeWeightFilePath);

for (const auto& image : images)
{
    cv::Mat imageResized, imageBlob;
    std::vector<cv::Mat> outs;
    cv::resize(image, imageResized, cv::Size(300, 300));
    cv::dnn::blobFromImage(imageResized, imageBlob, 1, cv::Size(300, 300),
                           (104.0, 177.0, 123.0));
    net.setInput(imageBlob, "data");
    net.forward(outs, "detection_out");
    CV_Assert(outs.size() > 0);
    for (size_t k = 0; k < outs.size(); k++)
    {
        float* data = (float*)outs[k].data;
        for (size_t i = 0; i < outs[k].total(); i += 7)
        {
            float confidence = data[i + 2];
            if (confidence > 0.5)
            {
                // Do something
            }
        }
    }
}
This gives me very bad results. I get a lot of detections with a confidence of 1.0, covering the entire image. The face itself, however, is not detected. So I thought I might be reading the output wrong. I also tried the code posted with this question, but the results are the same. I checked everything I could think of (input images in the right format, model correctly loaded, etc.) but could not identify the error.
Since the DNN module is usually not included in an OpenCV UWP build (I had to comment out some lines in the CMake configuration, but then it compiled without errors), could it be that using it is simply not possible from a UWP app? What else could be the reason the code works in Python, but almost identical code does not work in C++?
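For reference, a minimal sketch of how the detection_out blob of this SSD model is commonly parsed in C++, assuming the standard 1x1xNx7 output layout and reusing the net, imageResized and image variables from the code above. Note also that the mean values need to be wrapped in cv::Scalar; a bare (104.0, 177.0, 123.0) in C++ is a comma expression that collapses to a single value:
// Sketch only: parse the [1 x 1 x N x 7] SSD output of the res10 face detector.
cv::Mat blob = cv::dnn::blobFromImage(imageResized, 1.0, cv::Size(300, 300),
                                      cv::Scalar(104.0, 177.0, 123.0)); // cv::Scalar, not a comma expression
net.setInput(blob);
cv::Mat detection = net.forward("detection_out");

// Reinterpret the 4-D blob as an N x 7 matrix: [image_id, label, confidence, x1, y1, x2, y2]
cv::Mat detectionMat(detection.size[2], detection.size[3], CV_32F, detection.ptr<float>());
for (int i = 0; i < detectionMat.rows; ++i)
{
    float confidence = detectionMat.at<float>(i, 2);
    if (confidence > 0.5f)
    {
        int x1 = static_cast<int>(detectionMat.at<float>(i, 3) * image.cols);
        int y1 = static_cast<int>(detectionMat.at<float>(i, 4) * image.rows);
        int x2 = static_cast<int>(detectionMat.at<float>(i, 5) * image.cols);
        int y2 = static_cast<int>(detectionMat.at<float>(i, 6) * image.rows);
        // do something with the box (x1, y1) - (x2, y2)
    }
}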

Related

Why Opencv DNN based (caffe) face detector failed to find faces?

Using OpenCV 4.2.0 in C++ (VS 2019), I created a project which performs face detection on a given image. I used OpenCV's DNN face detector, which uses the res10_300x300_ssd_iter_140000_fp16.caffemodel model to detect faces. Below is the code of that function:
// variables which are used in the function
const double inScaleFactor = 1.0;
const cv::Scalar meanVal = cv::Scalar(104.0, 177.0, 123.0);
const size_t inWidth = 300;
const size_t inHeight = 300;

std::vector<FaceDetectionResult> namespace_name::FaceDetection::detectFaceByOpenCVDNN(std::string filename, FaceDetectionModel model)
{
    Net net;
    cv::Mat frame = cv::imread(filename);
    cv::Mat inputBlob;
    std::vector<FaceDetectionResult> vec;

    if (frame.empty())
        throw std::exception("provided image file is not found or unable to open.");

    int frameHeight = frame.rows;
    int frameWidth = frame.cols;

    if (model == FaceDetectionModel::CAFFE)
    {
        net = cv::dnn::readNetFromCaffe(caffeConfigFile, caffeWeightFile);
        inputBlob = cv::dnn::blobFromImage(frame, inScaleFactor, cv::Size(inWidth, inHeight), meanVal, false, false);
    }
    else
    {
        net = cv::dnn::readNetFromTensorflow(tensorflowWeightFile, tensorflowConfigFile);
        inputBlob = cv::dnn::blobFromImage(frame, inScaleFactor, cv::Size(inWidth, inHeight), meanVal, true, false);
    }

    net.setInput(inputBlob, "data");
    cv::Mat detection = net.forward("detection_out");
    cv::Mat detectionMat(detection.size[2], detection.size[3], CV_32F, detection.ptr<float>());

    for (int i = 0; i < detectionMat.rows; i++)
    {
        if (detectionMat.at<float>(i, 2) >= 0.5)
        {
            FaceDetectionResult res;
            res.faceDetected = true;
            res.confidence = detectionMat.at<float>(i, 2);
            res.x1 = static_cast<int>(detectionMat.at<float>(i, 3) * frameWidth);
            res.y1 = static_cast<int>(detectionMat.at<float>(i, 4) * frameHeight);
            res.x2 = static_cast<int>(detectionMat.at<float>(i, 5) * frameWidth);
            res.y2 = static_cast<int>(detectionMat.at<float>(i, 6) * frameHeight);
            vec.push_back(res);
        }
#ifdef aDEBUG
        else
        {
            cout << detectionMat.at<float>(i, 2) << endl;
        }
#endif
    }
    return vec;
}
In the above code, after face detection I assign the confidence and coordinates of the detected face to a custom class FaceDetectionResult, which is a simple class with bool, int, and float members as required.
The function detects faces in the given image, but while playing with it I am comparing it with dlib's HOG+SVM face detector: first I do face detection with dlib, and then the same image path is passed to this function.
I found some images where dlib can easily find faces but OpenCV does not find a single face; for example, look at the image below:
As you can see, HOG+SVM detected 46 faces in approximately 3 seconds. If I pass the same image to the above function, OpenCV does not detect a single face in it. Why? Do I need any enhancements in the above code? I am not saying that the function does not detect faces for any image (it does), but for some images (like the one above) it could not.
For reference:
I used this Python program to detect faces using dlib: https://pastebin.com/9rt9reNY
After a deep search, unfortunately I couldn't find a good explanation for this problem. The reason I tried cropping the image is that I assumed there might be a maximum limit on the number of detected faces. It is also not about occlusion.
I tried some example images which include more than roughly 20 faces and the results were the same, but when I cropped those images (decreasing the number of faces), the program was able to find the faces. This is also not about the resolution (size) of the image, because the images I tried had different sizes.
I also changed and tried all the parameters (iteration number, confidence threshold, etc.), but the result still wasn't the desired one.
My assumption, but not the answer:
The program does not find the faces if the image includes more than some maximum number (approximately 20).
As a workaround, we can divide the source image into two parts, find the rectangles for each part, and then paste them back onto the source image (a rough sketch is shown below).
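A minimal sketch of that workaround, assuming a detectFaces(const cv::Mat&) helper that wraps the DNN call above and returns boxes in the coordinates of the image it was given (the helper name and the split into a top and bottom half are my own illustration, not tested code):
// Sketch only: run the detector separately on the top and bottom halves,
// then shift the boxes from the bottom half back into full-image coordinates.
cv::Mat top = frame(cv::Rect(0, 0, frame.cols, frame.rows / 2));
cv::Mat bottom = frame(cv::Rect(0, frame.rows / 2, frame.cols, frame.rows - frame.rows / 2));

std::vector<cv::Rect> boxes = detectFaces(top);            // hypothetical helper
for (cv::Rect box : detectFaces(bottom))
{
    box.y += frame.rows / 2;                               // offset back to source-image coordinates
    boxes.push_back(box);
}

for (const cv::Rect& box : boxes)
    cv::rectangle(frame, box, cv::Scalar(0, 255, 0), 2);   // draw the results on the source image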
Note: After digging deeply on the internet, I couldn't find a topic related to this problem. I am also curious about the main reason causing this issue, so any help will be appreciated. This post only includes my experiences and assumptions.
Change this line:
inputBlob = cv::dnn::blobFromImage(frame, inScaleFactor, cv::Size(inWidth, inHeight), meanVal, false, false);
to this one:
inputBlob = cv::dnn::blobFromImage(frame, inScaleFactor, cv::Size(frameWidth, frameHeight), meanVal, false, false);
That is, pass the actual frame size instead of the fixed 300x300 input; presumably this gives the network a larger input in which small faces remain detectable.

Difference between two photos in tolerance variable

I have two photos of the same scene: an empty background shot and one with a man in it (the images are omitted here).
I am computing the differences between these photos, but the differences include changes in lighting, camera shake, etc. I only want to see the man in the difference photo. I hard-coded a threshold value and it works for this pair, but the same threshold does not work for other photos. I can't show the failing examples because of my reputation on Stack Overflow; you can run my code on other photos and see the problems. My code is given below. How else can I choose this threshold?
#include <Windows.h>
#include <opencv\highgui.h>
#include <iostream>
#include <opencv2\opencv.hpp>

using namespace cv;
using namespace std;

int main() {
    Mat siyah;
    Mat resim = imread("C:/Users/toshiba/Desktop/z.jpg", CV_LOAD_IMAGE_GRAYSCALE);
    Mat resim2 = imread("C:/Users/toshiba/Desktop/t.jpg", CV_LOAD_IMAGE_GRAYSCALE);
    if (resim.empty() || resim2.empty())
    {
        cout << "Could not open file" << "\n";
        return 0;
    }
    for (int i = 0; i < resim.rows; i++)
    {
        for (int j = 0; j < resim.cols; j++)
        {
            // binarize the per-pixel difference with a fixed threshold of 30
            if (resim.data[resim.channels() * (resim.cols * i + j)] -
                resim2.data[resim2.channels() * (resim2.cols * i + j)] > 30) {
                resim.data[resim.channels() * (resim.cols * i + j)] = 255;
            }
            else
                resim.data[resim.channels() * (resim.cols * i + j)] = 0;
            //inRange(resim, 150, 255, siyah);
        }
    }
    //inRange(resim, 150, 255, siyah);
    namedWindow("Resim", CV_WINDOW_NORMAL);
    imshow("Resim", resim);
    waitKey();
    system("PAUSE");
    waitKey();
    return 0;
}
If your background is always the same and the pictures that include an object occur rarely enough, then you can update your reference image very often, so that the changes in lighting between your reference image and the image being analyzed are always small. You could then measure or compute a threshold that will work most of the time when computing the image difference. I am not sure why your camera is moving; is it not fixed?
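A minimal sketch of one way to keep such a reference image updated, using a running average via cv::accumulateWeighted (this is my own illustration of the idea, not the answerer's code; the update rate alpha and the threshold of 30 are assumed values):
// Sketch only: maintain a slowly updated reference (background) image.
// 'current' is the latest grayscale frame from the camera (CV_8UC1).
cv::Mat reference;                 // running average, CV_32FC1, kept across frames
const double alpha = 0.05;         // assumed update rate; smaller means slower adaptation

if (reference.empty())
    current.convertTo(reference, CV_32F);
else
    cv::accumulateWeighted(current, reference, alpha);    // reference = (1 - alpha) * reference + alpha * current

// compare the current frame against the up-to-date reference
cv::Mat referenceU8, diff;
reference.convertTo(referenceU8, CV_8U);
cv::absdiff(current, referenceU8, diff);
cv::threshold(diff, diff, 30, 255, cv::THRESH_BINARY);    // threshold chosen empirically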
I made the following code with Otsu thresholding and GrabCut algorithm. It doesn't use any pre-set threshold values, but I am still not sure how well it will perform for other images (maybe if you provide several more pictures to test with). The code is in Python, but it mostly consists of calling OpenCV functions and filling matrices, so should be easy to convert to C++ or whatever. The result for your image:
Using just Otsu alone on the difference gives the following mask:
The legs are fine but the rest is messed up. But there seem to be no false positives, so I took the mask as definite foreground, everything else as probable background and fed it to GrabCut.
import cv2
import numpy as np
#read the empty background image and the image with the guy in it,
#convert them to float32 so we don't get integer overflow
img_empty = cv2.imread("000_empty.png", 0).astype(np.float32)
img_guy = cv2.imread("001_guy.jpg", 0).astype(np.float32)
#absolute difference -> back to uint8 for thresholding etc.
diff = np.abs(img_empty - img_guy).astype(np.uint8)
#otsu thresholding
ret2, th = cv2.threshold(diff, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
#read our image again for GrabCut
img = cv2.imread("001_guy.jpg")
#fill GrabCut mask
mask = np.zeros(th.shape, np.uint8)
mask[th == 255] = cv2.GC_FGD #the value is GC_FGD (foreground) when our thresholded value is 255
mask[th == 0] = cv2.GC_PR_BGD #GC_PR_BGD (probable background) otherwise
#some internal stuff for GrabCut...
bgdModel = np.zeros((1,65),np.float64)
fgdModel = np.zeros((1,65),np.float64)
#run GrabCut
cv2.grabCut(img, mask, (0, 0, 1365, 767), bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_MASK)
#convert the `mask` we got from GrabCut into a binary mask,
#then apply it to the original image
mask2 = np.where((mask==2)|(mask==0),0,1).astype('uint8')
img = img*mask2[:,:,np.newaxis]
#save the results
cv2.imwrite("003_grabcut.jpg", img)

opencv neural network, incorrect predict

I'm trying to create a neural network in C++ with OpenCV. The aim is recognition of road signs. I have created the network as shown below, but it predicts badly and returns strange results:
Sample images from the training selection look like this:
Can someone help?
void trainNN() {
    char* templates_directory[] = {
        "speed50ver1\\",
        "speed60ver1\\",
        "speed70ver1\\",
        "speed80ver1\\"
    };
    int const numFilesChars[] = { 213, 100, 385, 163 };
    char const strCharacters[] = { '5', '6', '7', '8' };
    Mat trainingData;
    Mat trainingLabels(0, 0, CV_32S);
    int const numCharacters = 4;

    // load images from directory
    for (int i = 0; i != numCharacters; ++i) {
        int numFiles = numFilesChars[i];
        DIR *dir;
        struct dirent *ent;
        char* s1 = templates_directory[i];
        if ((dir = opendir(s1)) != NULL) {
            Size size(80, 80);
            while ((ent = readdir(dir)) != NULL) {
                string s = s1;
                s.append(ent->d_name);
                if (s.substr(s.find_last_of(".") + 1) == "jpg") {
                    Mat img = imread(s, 0);
                    Mat img_mat;
                    resize(img, img_mat, size);
                    Mat new_img = img_mat.reshape(1, 1);
                    trainingData.push_back(new_img);
                    trainingLabels.push_back(i);
                }
            }
            int b = 0;
            closedir(dir);
        } else {
            /* could not open directory */
            perror("");
        }
    }

    trainingData.convertTo(trainingData, CV_32FC1);
    Mat trainClasses(trainingData.rows, numCharacters, CV_32FC1);
    for (int i = 0; i != trainClasses.rows; ++i) {
        int const labels = *trainingLabels.ptr<int>(i);
        auto train_ptr = trainClasses.ptr<float>(i);
        for (int k = 0; k != trainClasses.cols; ++k) {
            *train_ptr = k != labels ? 0 : 1;
            ++train_ptr;
        }
    }

    int layers_d[] = { trainingData.cols, 10, numCharacters };
    Mat layers(1, 3, CV_32SC1, layers_d);
    ann.create(layers, CvANN_MLP::SIGMOID_SYM, 1, 1);

    CvANN_MLP_TrainParams params = CvANN_MLP_TrainParams(
        // terminate the training after either 1000
        // iterations or a very small change in the
        // network weights below the specified value
        cvTermCriteria(CV_TERMCRIT_ITER + CV_TERMCRIT_EPS, 1000, 0.000001),
        // use backpropagation for training
        CvANN_MLP_TrainParams::BACKPROP,
        // coefficients for backpropagation training
        // (refer to the manual)
        0.1,
        0.1);

    int iterations = ann.train(trainingData, trainClasses, cv::Mat(), cv::Mat(), params);
    CvFileStorage* storage = cvOpenFileStorage("neural_network_2.xml", 0, CV_STORAGE_WRITE);
    ann.write(storage, "digit_recognition");
    cvReleaseFileStorage(&storage);
}
void analysis(char* file, bool a) {
    //trainNN(a);
    read_nn();
    // load image
    Mat img = imread(file, 0);
    Size my_size(80, 80);
    resize(img, img, my_size);
    Mat r_img = img.reshape(1, 1);
    r_img.convertTo(r_img, CV_32FC1);
    Mat classOut(1, 4, CV_32FC1);
    ann.predict(r_img, classOut);
    double min1, max1;
    cv::Point min_loc, max_loc;
    minMaxLoc(classOut, &min1, &max1, &min_loc, &max_loc);
    int x = max_loc.x;
    // create windows
    namedWindow("Original Image", CV_WINDOW_AUTOSIZE);
    imshow("Original Image", img);
    waitKey(0); // wait for key press
    img.release();
    r_img.release();
    destroyAllWindows(); // destroy all open windows
}
Strange results: for this input the answer is 3 (I have only 4 classes: speed limit 50, 60, 70, 80), which is correct for the speed limit 80 sign.
But for the rest of the inputs the results are incorrect. They are the same for the 50, 60, and 70 signs: max1 = min1 = 1.02631... (as in the first picture). It's strange.
I have adapted your code to train a classifier on 4 hand positions (since that's the image data I have). I kept your logic as similar as possible, only changing what was absolutely necessary to make it run on my Windows machine on my images. Long story short, there is nothing fundamentally wrong with your code - I don't see the failure mode you described.
One thing you left out was the code for read_nn(). I assume that just does something like the following:
ann.load("neural_network_2.xml");
Anyway, my suspicion is that either your neural network is not converging at all or it's badly overfitting. Perhaps there's not enough variation in the training data. Are you running analysis() on separate test data that the ANN wasn't trained on? If so, is the ANN able to predict training data properly at least?
EDIT: OK, I just downloaded your image data and tried it out and saw the same behavior. After some analysis, it looks like your ANN is not converging. The training operation exits after only about 250 iterations, even if you specify only CV_TERMCRIT_ITER for the cvTermCriteria. After increasing your hidden layer size from 10 to 20, I saw a marked improvement, with 212, 72, 94, and 143 of the images classified correctly for the classes 50, 60, 70, and 80 respectively. That's not very good, but it demonstrates that you're on the right track.
Basically, the network architecture is not expressive enough to adequately model the problem you're trying to solve, so the network weights never converge and it abandons the backprop early. For one class, you may see some success, but I believe that's largely a function of the lack of shuffling of training data. If it stops after having just trained on a couple hundred very similar images, it may be able to manage to classify those correctly.
In short, I would recommend doing the following:
Build a way to test the results - e.g.: create a function to run prediction on all training data, and ideally set aside some images as a validation set in order to also confirm that the model is not overfitting the training data.
Shuffle the training data prior to training. Otherwise, backprop will not converge as easily (a small sketch of one way to do this follows this list of suggestions).
Experiment with different architectures such as more than one hidden layer with varying sizes.
Really, this is a problem that would benefit dramatically from using a Convolutional Neural Net, but OpenCV's machine learning facilities are pretty limited. Ultimately, if you're serious about creating ANNs, you might want to investigate some more robust tools. I personally use Tensorflow, but I've heard good things about Theano as well.
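Regarding the shuffling recommendation above, a minimal sketch of one way to shuffle the rows of trainingData and trainClasses together (my own illustration, assuming both matrices have one sample per row as in the code above):
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Sketch only: shuffle paired training rows with the same permutation.
std::vector<int> indices(trainingData.rows);
std::iota(indices.begin(), indices.end(), 0);          // 0, 1, 2, ...
std::shuffle(indices.begin(), indices.end(), std::mt19937{ std::random_device{}() });

cv::Mat shuffledData(trainingData.size(), trainingData.type());
cv::Mat shuffledClasses(trainClasses.size(), trainClasses.type());
for (int row = 0; row < trainingData.rows; ++row)
{
    trainingData.row(indices[row]).copyTo(shuffledData.row(row));
    trainClasses.row(indices[row]).copyTo(shuffledClasses.row(row));
}
// train on shuffledData / shuffledClasses instead of the originals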
I've only implemented NN with OpenCV for boolean classification, but I think that for a task where you need to classify more than two distinct classes this might also apply:
"If you are using the default cvANN_MLP::SIGMOID_SYM activation function then the output should be in the range [-1,1], instead of [0,1], for optimal results."
So, where you do:
*train_ptr = k != labels ? 0 : 1;
You might want to try:
*train_ptr = k != labels ? -1 : 1;
Disregard if I'm way off track here.

OpenCV 2.4.11 SVM Prediction from Test Images

I am currently working on training and testing greyscale images. So far I've trained the images using the svm.train() method.
However, I fail at testing the images. So far, my code for testing the images is:
for (int i = 0; i < test_files.size(); i++) {
    temporary_image = imread(test_files[i], 0);
    Mat image1d(1, temporary_image.cols, CV_32FC1);
    //Mat row_image = temporary_distance.reshape(1, 1);
    float result = svm.predict(image1d);
    printf("\n%f\n", result);
}
Could you please tell me how I can fix the problem?
svm.predict(image1d) --> this is the call that gives the error.
Whether I write float result = svm.predict(image1d) or just svm.predict(image1d), the same problem occurs.
Before asking this question I read:
Error on SVM using images
using OpenCV and SVM with images
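For reference, a minimal sketch of how a grayscale test image is commonly prepared for CvSVM::predict in OpenCV 2.4 (this assumes the SVM was trained on rows of flattened, CV_32FC1 pixel data of a fixed size; the 64x64 size here is just an example and must match whatever was used for training):
// Sketch only: the test sample must have the same layout and type as a training row.
cv::Mat temporary_image = cv::imread(test_files[i], 0);           // grayscale
cv::resize(temporary_image, temporary_image, cv::Size(64, 64));   // same size as the training images (assumption)

cv::Mat image1d = temporary_image.reshape(1, 1);                  // flatten to a single row
image1d.convertTo(image1d, CV_32FC1);                             // SVM expects 32-bit float features

float result = svm.predict(image1d);
printf("\n%f\n", result);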

Weird behaviour of cv::circle

I'm trying to draw a couple of circles with OpenCV 3 onto an image which is obtained from the Kinect v2 sensor.
There seems to be a strange bug with cv::circle or I don't understand how the function works. Let's look at some code:
if (kinectDataManager.IsColorStreamEnabled())
{
    cv::Mat colorFrameMat = kinectDataManager.GetFrame().GetColorFrame();
    cv::imshow("Color", colorFrameMat);
}
This code works perfectly fine, and using the ImageWatch Visual Studio Plugin for inspecting OpenCV images, I can see that the colorFrameMat matrix is not corrupted.
Let's look at some more code:
if (kinectDataManager.IsColorStreamEnabled())
{
    cv::Mat colorFrameMat = kinectDataManager.GetFrame().GetColorFrame();
    int radius = 2;
    int y = 1068;
    for (int x = 0; x < 1920; ++x)
    {
        cv::circle(colorFrameMat, cv::Point(x, y), radius, cv::Scalar(255, 0, 0), -1, CV_AA);
    }
    cv::imshow("Color", colorFrameMat);
}
After the loop execution has finished, the ImageWatch plugin reveals that the last rows of the image are missing. Strangely, the program still executes. However, for different values of y, the program crashes due to access violations, e.g. for y = 1067, the program crashes for x = 1917. For y = 1069, it crashes at x = 988.
Does anyone have an idea what the issue might be?
EDIT:
The ImageWatch plugin of course reveals that the last rows are missing, as circles are drawn at these positions from left to right, sorry for the mistake!!
EDIT2:
After storing one frame and reading it in, the cv::circle method with the identical code works fine:
cv::Mat test = cv::imread("test.jpg", CV_LOAD_IMAGE_COLOR);
cv::namedWindow("test", CV_WINDOW_NORMAL);
int radius = 10;
int y = 1067;
for (int x = 0; x < 1920; ++x)
{
    cv::circle(test, cv::Point(x, y), radius, cv::Scalar(0, 0, 255, 255), -1, CV_AA);
}
cv::imshow("test", test);
cv::waitKey(0);
The Kinect SDK only provides functionality to read a 4-channel image (i.e. RGBA); however, the cv::circle function seems to crash in a strange way for this kind of image. By dropping the alpha channel with a call to cv::cvtColor, I could resolve the issue.
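A minimal sketch of the alpha-channel drop described above (assuming the frame arrives as BGRA; the conversion code is my own illustration, not the poster's):
// Sketch only: convert the 4-channel Kinect frame to 3 channels before drawing.
cv::Mat colorFrameBGR;
cv::cvtColor(colorFrameMat, colorFrameBGR, cv::COLOR_BGRA2BGR);
cv::circle(colorFrameBGR, cv::Point(x, y), radius, cv::Scalar(255, 0, 0), -1, CV_AA);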
Since you say the image looks fine before the cv::circle call it is possible that GetColorFrame() returns data that changes while the loop is running.
Try:
a GetColorFrame().clone() to see if this fixes the issue, or
change the way GetColorFrame works.