OpenCV neural network for image processing - C++

I am new to the AI world and am trying to get some practice.
It looks like I need advice from someone with more experience.
Let's say I need to get rid of image defects (the actual task is trickier).
I hope that a trained NN will be able to interpolate the defect area.
For this reason I am trying to create a simple neural network.
Its input is a grayscale image with a defect (72*54), and its target output is the same image without the defect.
The hidden layer has 2*72*54 neurons.
Main piece of code:
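#include <iostream>
#include <vector>
#include <opencv2/ml.hpp>
// trainData / resData: CV_32F matrices, one flattened image per row (defective / defect-free)
cv::Ptr<cv::ml::ANN_MLP> ann = cv::ml::ANN_MLP::create();
int inputsCount = imageSizes.width * imageSizes.height;
std::vector<int> layerSizes = { inputsCount, inputsCount * 2, inputsCount };
ann->setLayerSizes(layerSizes); // input, hidden and output layer sizes
cv::TermCriteria tc(cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS, 50, 0.1);
ann->setTermCriteria(tc); // apply the termination criteria to the model
ann->setTrainMethod(cv::ml::ANN_MLP::BACKPROP, 0.0001);
std::cout << "Result : " << ann->train(trainData, cv::ml::ROW_SAMPLE, resData) << std::endl;
cv::Mat predicted;
ann->predict(trainData, predicted);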
My training dataset looks like this: [image]
Trained on a 10-item dataset, the NN gives bad results even on those same inputs. I tried different parameters.
But trained on only 2 images, the NN produces output close to the target (on the training data).
I suppose that it is not an inappropriate approach, but the solution is not so easy.
Maybe someone has some advice about the parameters, the neural network architecture, or the whole approach.

It seems that the termination criteria were fine for just two samples but were not good enough when training with a larger number of samples. Do try adjusting them, and also the learning rate.
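As a rough illustration of why the criteria matter, here is a toy C++ sketch (not OpenCV code): plain gradient descent on a one-parameter loss, stopped the way a MAX_ITER + EPS criterion stops, i.e. after maxIter iterations or as soon as the loss changes by less than eps between iterations. The loss and the names `train` and `RunResult` are made up for the example; it does not reproduce ANN_MLP's internals, only the stopping logic.

```cpp
#include <cassert>
#include <cmath>

// Result of a toy training run: how many iterations ran and the final loss.
struct RunResult {
    int iterations;
    double loss;
};

// Gradient descent on L(w) = (w - 3)^2, stopped like a MAX_ITER + EPS
// criterion: after maxIter iterations OR once the loss changes by less than eps.
RunResult train(double learningRate, int maxIter, double eps) {
    double w = 0.0;
    double loss = (w - 3.0) * (w - 3.0);
    int it = 0;
    while (it < maxIter) {                      // MAX_ITER part of the criterion
        double grad = 2.0 * (w - 3.0);          // dL/dw
        w -= learningRate * grad;               // gradient step
        double newLoss = (w - 3.0) * (w - 3.0);
        ++it;
        bool converged = std::fabs(loss - newLoss) < eps;
        loss = newLoss;
        if (converged) break;                   // EPS part of the criterion
    }
    return {it, loss};
}
```

With eps = 0.1 (the value used in the question) and a learning rate of 0.1, the run stops after 9 iterations with the loss still around 0.16; with a much smaller eps and a larger iteration cap, the same learning rate keeps going until the loss is essentially zero.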
Judging by the quality of the pixels that have been restored properly, the network architecture seems to be fine for this task. Once the network works well on 10 samples, I strongly recommend adding more training samples.

The chief problem is that you have way too little data for the given network.
Your NN is fully connected. The weights for pixel 0,0 are entirely separate from those of pixel 1,0, and pixel 0,1 again has different weights. And with so many nodes, you have a lot of weights. So while you have plenty of pixels in 10 images, you have nowhere near enough pixels for all the weights.
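To put numbers on this, the sketch below counts the parameters of the network from the question, assuming one bias per hidden and output neuron. The function name is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of trainable parameters in a fully connected MLP: every neuron in
// layer i+1 has one weight per neuron in layer i, plus one bias.
long long mlpParamCount(const std::vector<long long>& layerSizes) {
    long long total = 0;
    for (std::size_t i = 0; i + 1 < layerSizes.size(); ++i)
        total += (layerSizes[i] + 1) * layerSizes[i + 1];
    return total;
}
```

For the 72x54 network, mlpParamCount({3888, 7776, 3888}) gives 60,477,840 parameters, while 10 training images supply only 10 * 3888 = 38,880 target pixel values.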
A Convolutional Neural Network has far fewer weights, as many of its weights are reused. That means that during training, these weights are trained by multiple pixels from each training image.
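A minimal sketch of the weight sharing described above: a "valid" 2-D convolution in plain C++, where a single small kernel is applied at every image position. This is only the forward pass of one convolutional filter, not a full CNN; `conv2dValid` and `Grid` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// "Valid" 2-D convolution (cross-correlation, as CNN layers compute it).
// The same kernel weights are reused at every output position, so a 3x3
// kernel has only 9 weights however large the image is.
// Assumes the image is at least as large as the kernel in both dimensions.
Grid conv2dValid(const Grid& img, const Grid& kernel) {
    const std::size_t kh = kernel.size(), kw = kernel[0].size();
    const std::size_t oh = img.size() - kh + 1, ow = img[0].size() - kw + 1;
    Grid out(oh, std::vector<double>(ow, 0.0));
    for (std::size_t y = 0; y < oh; ++y)
        for (std::size_t x = 0; x < ow; ++x)
            for (std::size_t i = 0; i < kh; ++i)
                for (std::size_t j = 0; j < kw; ++j)
                    out[y][x] += kernel[i][j] * img[y + i][x + j]; // shared weights
    return out;
}
```

Whatever the image size, a 3x3 filter has just 9 weights (plus a bias in a real layer), and every output position contributes to their gradient during training.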
Not that I'd expect this to work well with just 10 images. Human expectations are derived from years of human vision, literally billions of images.


The prediction figures I got while training a semantic segmentation model

I have tried to experiment with some existing semantic segmentation code on the ultrasound nerve dataset.
The implementation is based on the U-Net architecture.
During training, I capture the validation plot for each epoch. In the following figures, the left image is the raw image, the middle one is the ground truth and the right one is the prediction (the probability map).
As shown in the following figures, the prediction at epoch 0 is all black; then it seems to start capturing some of the distribution of the original image; then it becomes all black again.
I just want to know how to explain the training process based on these plots: why does it go back to the result of the first epoch after several training epochs?
Besides, why does the predicted result tend to reproduce the distribution of the original image during training?
Are there any insights that can be derived from these training observations?
I generally followed the tutorial to generate the training set (I use the same create_train_data function as in the tutorial).
The only difference is that I add a background channel, so that the mask image has shape (1, image_row, image_col, 2):
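import os
from skimage import io  # assumed source of io.imread
img_mask = io.imread(os.path.join(raw_data_path, image_mask_name))
img_mask = img_mask // 255            # 0/255 -> 0/1 foreground
img_mask_background = 1 - img_mask    # complementary background channel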
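For reference, the mask conversion described above (foreground = mask // 255, background = 1 - foreground, stacked as the last axis) looks like this in C++. The channel order (foreground first), the flat row-major (row, col, channel) layout and the function name are assumptions for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Turns a binary mask stored as 0/255 bytes into a channel-last, two-channel
// float mask: channel 0 = foreground (mask // 255), channel 1 = background
// (1 - foreground). Output layout is row-major (row, col, channel).
std::vector<float> toTwoChannelMask(const std::vector<std::uint8_t>& mask) {
    std::vector<float> out;
    out.reserve(mask.size() * 2);
    for (std::uint8_t v : mask) {
        float fg = static_cast<float>(v / 255);  // integer division, like // in NumPy
        out.push_back(fg);                       // foreground channel
        out.push_back(1.0f - fg);                // background channel
    }
    return out;
}
```

Each pixel's two channels sum to 1, which is what a two-class softmax output is compared against.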
After loading the .npy file generated above, I normalize the raw training images:
import os
import numpy as np
imgs_train = np.load(os.path.join(train_data_path, "imgs_train.npy"))
imgs_mask_train = np.load(os.path.join(train_data_path, "imgs_mask_train.npy"))
imgs_train = imgs_train.astype('float32')
mean = np.mean(imgs_train)  # mean for data centering
std = np.std(imgs_train)    # std for data normalization
imgs_train -= mean
imgs_train /= std
I followed this implementation to train the model. I did not change anything except this line:
self.learning_rate_node = tf.train.exponential_decay(learning_rate=learning_rate,
I changed it to:
global_step = global_step*self.batch_size
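One thing worth checking in this setup: tf.train.exponential_decay computes learning_rate * decay_rate ^ (global_step / decay_steps) (without staircase), so multiplying global_step by batch_size makes the learning rate decay batch_size times faster per training step. The sketch below reproduces that formula in C++; the function name and the numbers in the usage are illustrative, not taken from the implementation being followed.

```cpp
#include <cassert>
#include <cmath>

// Same formula as tf.train.exponential_decay with staircase=False:
//   lr = lr0 * decayRate ^ (globalStep / decaySteps)
double exponentialDecay(double lr0, double decayRate, double globalStep, double decaySteps) {
    return lr0 * std::pow(decayRate, globalStep / decaySteps);
}
```

For example, with lr0 = 0.2, decayRate = 0.95 and decaySteps = 100, the rate reaches 0.19 at step 100 without scaling, but already at step 25 when the step is multiplied by a batch size of 4.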
[Figures: epoch 0, epoch 4, epoch 12, epoch 16]

TensorFlow for audio signal processing - detecting feature intensities and delays

For my studies I need to train a deep NN to identify certain sounds and their delays. We have 1x25K sample points (microphone output) and need to quantify the events and their intensity.
To simplify the model so that it looks more like the MNIST training procedure, for now we use classification for the quantification (if there are two events with intensities 5 and 3, the output would be 8 plus the delays vector).
We tried to feed the data [trainNum, 25000] to a 3-layer NN with 250, 100 and 50 neurons and AdamOptimizer, with a three-class output encoded as 100/010/001 [trainNum, 3]. The cost does not decrease from 400 and the accuracy is 30%.
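For a sense of scale: a three-class softmax classifier that assigns equal scores to all classes has a per-sample cross-entropy of ln 3 ≈ 1.10, and on balanced classes its accuracy is about 33%, so 30% accuracy is roughly chance level. The C++ sketch below shows the one-hot encoding (100/010/001) and a numerically stable softmax cross-entropy for a single sample; it is not TensorFlow code, and the function names are made up.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One-hot target for class c out of n classes, e.g. class 1 of 3 -> {0, 1, 0}.
std::vector<double> oneHot(std::size_t c, std::size_t n) {
    std::vector<double> v(n, 0.0);
    v[c] = 1.0;
    return v;
}

// Softmax followed by cross-entropy against a one-hot target, for one sample.
double softmaxCrossEntropy(const std::vector<double>& logits, const std::vector<double>& target) {
    double maxLogit = logits[0];
    for (double z : logits) if (z > maxLogit) maxLogit = z;  // subtract max for numerical stability
    double sum = 0.0;
    for (double z : logits) sum += std::exp(z - maxLogit);
    double loss = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        double logProb = (logits[i] - maxLogit) - std::log(sum);  // log softmax
        loss -= target[i] * logProb;
    }
    return loss;
}
```

If the per-sample loss stays near ln 3 throughout training, the network is effectively not using its inputs.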
I would appreciate any help and comments.
Additional information: 2700 samples, 270 batches, 10 epochs. I used the following tutorial and changed the data from MNIST to our sound data.
Thank you in advance
All the best,

Backpropagation in a 2-Dimensional Neural Network - C++

I am learning about two-dimensional neural networks, so I am facing many obstacles, but I believe it is worth it and I am really enjoying this learning process.
Here's my plan: make a 2-D NN recognize images of digits. The images are 5 by 3 grids, and I prepared 10 images, from zero to nine. For example, this would be number 7:
1 1 1
0 0 1
0 0 1
0 0 1
0 0 1
Number 7 has indexes 0,1,2,5,8,11,14 as 1s (or, equivalently, 3,4,6,7,9,10,12,13 as 0s), and so on. Therefore, my input layer will be a 5 by 3 neuron layer, and I will feed it only zeros or ones (nothing in between; which indexes are set depends on the image I am feeding the layer).
My output layer, however, will be a one-dimensional layer of 10 neurons. Depending on which digit was recognized, a certain neuron will output a value of one and the rest should be zeros (should not fire).
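Assuming the 5 by 3 input layer is fully connected to the output layer, it behaves like a flat 15-element input. Here is a minimal C++ sketch of that encoding, plus one common way to read the answer off the 10 output neurons: take the index of the largest output (argmax) instead of requiring exactly 0 or 1. Both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flattens a 5x3 digit grid (row-major, indexes 0..14) into the 15 input
// values, setting the listed indexes to 1 and all others to 0.
// Indexes are assumed to be within 0..14.
std::vector<double> digitInput(const std::vector<int>& onIndexes) {
    std::vector<double> in(15, 0.0);
    for (int idx : onIndexes) in[idx] = 1.0;
    return in;
}

// Reads the recognized digit off the 10 output neurons as the index of the
// largest activation (argmax), so outputs never have to be exactly 0 or 1.
int predictedDigit(const std::vector<double>& outputs) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < outputs.size(); ++i)
        if (outputs[i] > outputs[best]) best = i;
    return static_cast<int>(best);
}
```

For example, digitInput({0, 1, 2, 5, 8, 11, 14}) builds the input for the digit 7 from the indexes listed above.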
I am done implementing everything, but I have a problem with the computation and would really appreciate any help. I am getting an extremely high error rate and extremely low (negative) output values on all output neurons, and the values (error and output) do not change even on the 10,000th pass.
I would like to go further and post my backpropagation methods, since I believe the problem is there. However, to break down my work, I would like to hear some comments first; I want to know whether my design is workable.
Does my plan make sense?
All the posts talk about ranges (0 -> 1, -1 -> +1, 0.01 -> 0.5, etc.). Will it work with either 0 or 1 on the output layer rather than a range? If yes, how can I control that?
I am using the hyperbolic tangent as my transfer function. Does it make a difference whether I use this, sigmoid, or other functions?
Any ideas/comments/guidance are appreciated, and thanks in advance.
Well, from the description above, I think the design and approach are correct! Regarding the choice of activation function, remember that these functions help to identify the neurons with the largest activation; also, their algebraic properties, such as an easy derivative, help with the definition of backpropagation. Taking this into account, you should not worry about your choice of activation function.
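To make the "easy derivative" point concrete: for both sigmoid and tanh, the derivative can be computed from the neuron's own output, which backpropagation already has at hand. Note also the output ranges: sigmoid lies in (0, 1) and tanh in (-1, 1), so neither ever outputs exactly 1. The function names below are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Logistic sigmoid, output range (0, 1).
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Both derivatives can be written in terms of the function's own output y,
// which is what makes them cheap to use during backpropagation.
double sigmoidPrimeFromOutput(double y) { return y * (1.0 - y); }  // y = sigmoid(x)
double tanhPrimeFromOutput(double y) { return 1.0 - y * y; }       // y = tanh(x)
```

At x = 0 the tanh derivative is 1 while the sigmoid derivative is only 0.25, which is one practical difference between the two.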
The ranges that you mention above correspond to scaling of the input; it is better to have your input images in the range 0 to 1. This helps to scale the error surface and improves the speed and convergence of the optimization process. Because your input set is composed of images, and each image is composed of pixels, the minimum and maximum values that a pixel can attain are 0 and 255, respectively. To scale your input in this example, it is essential to divide each value by 255.
Now, regarding the training problems: have you checked whether your gradient calculation routine is correct, i.e., by using the cost function J and evaluating it? If not, try generating a toy vector theta that contains all the weight matrices involved in your neural network, and evaluate the gradient at each point using the definition of the gradient. Sorry for the MATLAB example, but it should be easy to port to C++:
% J: function handle returning the cost for an unrolled parameter vector theta
numgrad = zeros(size(theta));
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
    % Set perturbation vector
    perturb(p) = e;
    loss1 = J(theta - perturb);
    loss2 = J(theta + perturb);
    % Compute numerical gradient
    numgrad(p) = (loss2 - loss1) / (2*e);
    perturb(p) = 0;
end
After evaluating the function, compare the numerical gradient with the gradient calculated by backpropagation. If the difference between the two is less than 3e-9 for every element, your implementation should be correct.
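A C++ port of the gradient check above, as a sketch. J is passed as a std::function over the unrolled parameter vector. The relativeDifference helper is an addition, not part of the original answer: it is one common way to compare the two gradients, and it assumes they are not both zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// Central-difference numerical gradient, mirroring the MATLAB loop:
// numgrad(p) = (J(theta + e*u_p) - J(theta - e*u_p)) / (2e).
Vec numericalGradient(const std::function<double(const Vec&)>& J, Vec theta, double e = 1e-4) {
    Vec numgrad(theta.size(), 0.0);
    for (std::size_t p = 0; p < theta.size(); ++p) {
        const double orig = theta[p];
        theta[p] = orig - e;
        const double loss1 = J(theta);
        theta[p] = orig + e;
        const double loss2 = J(theta);
        theta[p] = orig;                      // restore, like perturb(p) = 0
        numgrad[p] = (loss2 - loss1) / (2.0 * e);
    }
    return numgrad;
}

// Relative difference ||a - b|| / ||a + b|| between two gradients.
double relativeDifference(const Vec& a, const Vec& b) {
    double diff = 0.0, sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        diff += (a[i] - b[i]) * (a[i] - b[i]);
        sum += (a[i] + b[i]) * (a[i] + b[i]);
    }
    return std::sqrt(diff) / std::sqrt(sum);
}
```

With the toy cost J(t) = t0^2 + t1^2 + t0*t1, whose exact gradient is (2*t0 + t1, 2*t1 + t0), the numerical and analytic gradients agree to well below 1e-8, while a deliberately wrong gradient gives a clearly larger difference.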
I recommend checking out the UFLDL tutorials offered by the Stanford Artificial Intelligence Laboratory; there you can find a lot of information on neural networks and their paradigms. It's worth a look!

OpenCV Face Recognition strange result

I have been using OpenCV's SVM and RF for a multi-class face recognition problem with 11 classes and only 5 images per class. I used two kinds of features: first a toy intensity feature (each image simply resized to 32x32 grayscale), and second another toy feature using Tan Triggs preprocessing (link). Here is the feature code:
void Feature::makeFeature(cv::Mat &image, cv::Mat &result)
{
    cv::resize( image, image, cv::Size(32, 32), 0, 0, cv::INTER_CUBIC );
    cv::equalizeHist(image, image);
    // Images must be aligned - Only pitch executed, yaw and roll assumed negligible
    algmt->getAlignedImage( image, image ); // image alignment
    // tan triggs
    tan_triggs_preprocessing(image, result);
    result = result.reshape(0, 1); // make a single row vector, needed for the training samples matrix
    // if plain intensity
    // image.copyTo(result);
    // result.convertTo(result, CV_32F, 1.0f/255.0f);
    // result = result.reshape(0, 1); // make a single row vector, needed for the training samples matrix
}
The tan_triggs_preprocessing function is the same as the Tan Triggs preprocessing function given in the link, with one added step: I normalized the result to the range 0 to 1.
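The added normalization step is presumably a min-max rescaling to [0, 1]; here is a sketch of that operation in plain C++ (in OpenCV, cv::normalize(src, dst, 0, 1, cv::NORM_MINMAX) does the same for a cv::Mat). The handling of a constant input is a choice made for the sketch, and the function name is made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Rescales values linearly to [0, 1]: (v - min) / (max - min).
std::vector<float> minMaxNormalize(const std::vector<float>& v) {
    std::vector<float> out(v.size(), 0.0f);
    if (v.empty()) return out;
    const auto mm = std::minmax_element(v.begin(), v.end());
    const float lo = *mm.first;
    const float range = *mm.second - lo;
    if (range == 0.0f) return out;  // constant input: avoid dividing by zero
    for (std::size_t i = 0; i < v.size(); ++i) out[i] = (v[i] - lo) / range;
    return out;
}
```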
The test results for both were not very good, as expected, but then I made a silly mistake and discovered something strange: when I accidentally gave the training directory as input for both training and testing, I got 100% accuracy with the plain intensity feature, but the Tan Triggs feature gives the following result:
SVM Training Complete
Total number of correct: 51 and accuracy: 92.7273
RF Training Complete
Total number of correct: 53 and accuracy: 96.3636
I do know that, however much you overfit, the result should be perfect when the training set is used as the test set. Everything else is standard; both the SVM and RF are set up as in the OpenCV examples. Besides, I get 100% for the plain intensity feature, so I must be mucking something up when using Tan Triggs. Does anyone have any idea what mistake I am making?
I have used other complex features like LTPs and LQPs without issue, but this preprocessing method is something I want to use. I use the Jain-Learned Miller congealing algorithm for alignment, as I assume frontal faces for face recognition (no pose correction).

Hu moments and SVM do not work

I have come across a problem when trying to train on data with an SVM.
I get several different regions (sets of connected pixels) from face images, and the regions from the eyes are very similar, so I want to use Hu moments for shape description and an SVM for training.
But the SVM does not work properly: svm.predict afterwards classifies everything as non-eye; even the same regions that were labeled as eye and used in the training phase are classified as non-eye.
The feature data consists of only the 7 Hu moments. I will post some samples of the source code here in a moment; thanks in advance :)
Additional info:
Input image: [image]
Setting up a basic SVM for 1 image:
int image_regions = 10;
Mat training_mat(image_regions, 7, CV_32FC1); // 7 Hu moments per region
Mat labels(image_regions, 1, CV_32FC1);       // labels: 1 (eye) and -1 (non-eye)
// computing Hu moments
Moments moments2 = moments(croppedImage, false);
double hu[7];
HuMoments(moments2, hu);
// putting them into the SVM training mat
for (int k = 0; k < 7; k++)
    training_mat.at<float>(counter, k) = (float)hu[k]; // counter is the current region number
if (isEye(...))
    labels.at<float>(counter, 0) = 1.0f;
else
    labels.at<float>(counter, 0) = -1.0f;
// I use the following:
CvSVM svm;
CvSVMParams params;
params.svm_type = CvSVM::C_SVC;
params.kernel_type = CvSVM::LINEAR;
params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 1000, 1e-6);
// ... do the above mentioned phase, and then:
svm.train(training_mat, labels, Mat(), Mat(), params);
I hope the following suggestions can help you...
The simplest approach is to use a clustering algorithm and try to cluster the data into two classes. If an algorithm like k-means can do the job, why make things complex by using SVMs and neural nets? I suggest this technique because your feature vector dimension is very small (7 Hu moments), as is your number of samples.
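A minimal sketch of the clustering idea: two-cluster k-means (Lloyd's algorithm) in plain C++. Seeding the centroids with the first and last sample is only for determinism; OpenCV also provides cv::kmeans, which supports random and k-means++ seeding. With Hu moments, normalize the features first, otherwise the largest-magnitude moments dominate the distances. The names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance between two feature vectors of equal length.
double sqDist(const Point& a, const Point& b) {
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
}

// Lloyd's k-means with k = 2; returns a cluster label (0 or 1) per sample.
// Assumes pts is non-empty and all points have the same dimension.
std::vector<int> kmeans2(const std::vector<Point>& pts, int maxIter = 100) {
    std::vector<Point> c = {pts.front(), pts.back()};  // deterministic seeding
    std::vector<int> label(pts.size(), -1);
    for (int it = 0; it < maxIter; ++it) {
        bool changed = false;
        // Assignment step: attach each sample to its nearest centroid.
        for (std::size_t i = 0; i < pts.size(); ++i) {
            int best = sqDist(pts[i], c[0]) <= sqDist(pts[i], c[1]) ? 0 : 1;
            if (best != label[i]) { label[i] = best; changed = true; }
        }
        if (!changed) break;  // converged: no sample switched cluster
        // Update step: move each centroid to the mean of its members.
        for (int j = 0; j < 2; ++j) {
            Point sum(pts[0].size(), 0.0);
            int n = 0;
            for (std::size_t i = 0; i < pts.size(); ++i)
                if (label[i] == j) {
                    for (std::size_t d = 0; d < sum.size(); ++d) sum[d] += pts[i][d];
                    ++n;
                }
            if (n > 0) {
                for (double& v : sum) v /= n;
                c[j] = sum;
            }
        }
    }
    return label;
}
```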
Perform feature normalization (see the ZERO MEAN and UNIT VARIANCE bullet below) to make sure the values fall in a limited range.
Check whether your data is really separable. As your data set is small, take a few samples from positive images and a few from negative images and plot the feature vectors. If you can visually see the difference, surely any learning algorithm can do the job for you. As I said earlier, simple tricks can do better than complex math.
Only if you then decide to use an SVM, keep the following in mind:
• As I can see from your code, you are using a linear SVM; maybe your data is not separable with a linear kernel. Try a polynomial kernel or other kernels. There is also CvSVM::train_auto in OpenCV; just have a look.
• Check whether the feature vector values you are getting are proper values (make sure they are not garbage values).
• You can also perform feature normalization (ZERO MEAN and UNIT VARIANCE) before using the features for training.
• Most importantly, increase the number of training images, both positively and negatively labeled.
• Last but not least, an SVM is not magic: at the end of the day it is just drawing a line between two sets of points. So don't expect it to classify anything you give it as input.
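The zero-mean, unit-variance normalization from the list above, as a C++ sketch over a samples-by-features matrix (one row per region, one column per Hu moment, whose magnitudes can differ by several orders). The mean and standard deviation are computed on the training set and returned so the identical transform can be applied to samples before predict; the names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // rows = samples, cols = features

// Standardizes each feature column to zero mean and unit variance in place,
// returning the per-column mean and (population) standard deviation.
void standardizeColumns(Matrix& X, std::vector<double>& mean, std::vector<double>& stdev) {
    const std::size_t n = X.size(), d = X[0].size();
    mean.assign(d, 0.0);
    stdev.assign(d, 0.0);
    for (const auto& row : X)
        for (std::size_t j = 0; j < d; ++j) mean[j] += row[j] / n;
    for (const auto& row : X)
        for (std::size_t j = 0; j < d; ++j) stdev[j] += (row[j] - mean[j]) * (row[j] - mean[j]) / n;
    for (std::size_t j = 0; j < d; ++j) {
        stdev[j] = std::sqrt(stdev[j]);
        if (stdev[j] == 0.0) stdev[j] = 1.0;  // constant feature: only centre it
        for (auto& row : X) row[j] = (row[j] - mean[j]) / stdev[j];
    }
}
```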
If nothing works, “just improve your feature extraction technique”.