Tensorflow for audio signal processing - detecting features intensity and delayes - python-2.7

For my studies I need to train a deep NN to identify certain sounds and their delays. We have 1X25K sample points (microphone output) and need quantification of events and their intensity.
In order to simplify the model to look more like the MNIST training procedure, for now we use the classification for the quantification (if there are two events each with intensity of 5 and 3, the output would be 8 and the delays vector).
we tried to throw the data [trainNum, 25000] to a 3 layered NN with 250, 100 and 50 neurons and adamoptimizer for three classes output as 100\ 010\001 [trainNum, 3] . The cost is not reducing from 400 and accuracy is 30%.
Please would appreciate any help and comments.
additional information: 2700 samples, 270 batches, 10 epochs. Used the following tutorial and changed the data from the MNIST to out sound data - https://pythonprogramming.net/tensorflow-neural-network-session-machine-learning-tutorial/
Thank you in advance
All the best,
AA

Related

OpenCV Neural network for images processing

I new in AI world and try some practice.
It looks like I need some third-party experience.
Let's say I need to get rid of image defects (actually the task more tricky).
I hope that trained NN will be able to interpolate defect area.
For these reasons I try to create simple neural network.
It has input : grayscale image with deffect(72*54) and the same image with no defect.
Hidden layer has 2*72*54 neurons.
Main piece of code
cv::Ptr<cv::ml::ANN_MLP> ann = cv::ml::ANN_MLP::create();
int inputsCount = imageSizes.width * imageSizes.height;
std::vector<int> layerSizes = { inputsCount, inputsCount * 2, inputsCount};
ann->setLayerSizes(layerSizes);
ann->setActivationFunction(cv::ml::ANN_MLP::SIGMOID_SYM);
cv::TermCriteria tc(cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS, 50, 0.1);
ann->setTermCriteria(tc);
ann->setTrainMethod(cv::ml::ANN_MLP::BACKPROP, 0.0001);
std::cout << "Result : " << ann->train(trainData, cv::ml::ROW_SAMPLE, resData) << std::endl;
ann->predict(trainData, predicted);
My training dataset looks like
Trained on 10 items dataset NN gives bad results on this(same) inputs. I tried different params
But trained on only 2 images NN gets close output (on trained data).
I suppose that it's not inappropriate approach and solution is not so easy.
Maybe someone has some advice about parameters or neural network architecture or whole approach.
It seems that the termination criteria were fine for just two samples but were not good enough when training with a larger number of samples. Do try adjusting them, and also the learning rate.
Judging by the quality of the pixels that have been restored properly, the network architecture seems to be fine for this task. Once the network works well on 10 samples, I strongly recommend adding more training samples.
The chief problem is that you have way to little data for the given network.
Your NN is fully connected. The weights for pixel 0,0 are entirely separate from those of pixel 1,0, and pixel 0,1 has again different weights. And you have a lot of weights, with so many nodes. So while you have plenty of pixels in 10 images, you have nowhere near enough pixels for all the weights.
A Convolutional Neural Network has far less weights, as many of its weights are reused. That means that in training, these weights are trained by multiple pixels from each training image.
Not that I'd expect this to work well with just 10 images. The human expectation is derived from years of human vision, literally billions of images.

Number of neurons in AlexNet

In AlexNet,the image data is 3*224*224.
The first convolutional layer filters the image with 96 kernels of size 11*11*3 with a stride of 4 piexels.
I have doubt with the first layer's output neurons count.
In my opinion,the input is 224*224*3=150528,then the output should be 55*55*96=290400
But in the paper,they described the output is 253440
How to calculate the number of this layer's neurons?
It seems like the input size is 227x227, without padding. I also think that what they mention in the paper is a mistake. Have look at this link.
http://cs231n.github.io/convolutional-networks/
It mentions:
The Krizhevsky et al. architecture that won the ImageNet challenge in 2012 accepted images of size [227x227x3]. On the first Convolutional Layer, it used neurons with receptive field size F=11, stride S=4 and no zero padding P=0. Since (227 - 11)/4 + 1 = 55, and since the Conv layer had a depth of K=96, the Conv layer output volume had size [55x55x96]. Each of the 555596 neurons in this volume was connected to a region of size [11x11x3] in the input volume. Moreover, all 96 neurons in each depth column are connected to the same [11x11x3] region of the input, but of course with different weights. As a fun aside, if you read the actual paper it claims that the input images were 224x224, which is surely incorrect because (224 - 11)/4 + 1 is quite clearly not an integer. This has confused many people in the history of ConvNets and little is known about what happened. My own best guess is that Alex used zero-padding of 3 extra pixels that he does not mention in the paper.
I also believe this is a mistake by the author, I found a proof in the courseware of stanford cs231n, in the 10th and 11th page, you can find that the output size of the first conv is 290400.

Backpropagation 2-Dimensional Neuron Network C++

I am learning about Two Dimensional Neuron Network so I am facing many obstacles but I believe it is worth it and I am really enjoying this learning process.
Here's my plan: To make a 2-D NN work on recognizing images of digits. Images are 5 by 3 grids and I prepared 10 images from zero to nine. For Example this would be number 7:
Number 7 has indexes 0,1,2,5,8,11,14 as 1s (or 3,4,6,7,9,10,12,13 as 0s doesn't matter) and so on. Therefore, my input layer will be a 5 by 3 neuron layer and I will be feeding it zeros OR ones only (not in between and the indexes depends on which image I am feeding the layer).
My output layer however will be one dimensional layer of 10 neurons. Depends on which digit was recognized, a certain neuron will fire a value of one and the rest should be zeros (shouldn't fire).
I am done with implementing everything, I have a problem in computing though and I would really appreciate any help. I am getting an extremely high error rate and an extremely low (negative) output values on all output neurons and values (error and output) do not change even on the 10,000th pass.
I would love to go further and post my Backpropagation methods since I believe the problem is in it. However to break down my work I would love to hear some comments first, I want to know if my design is approachable.
Does my plan make sense?
All the posts are speaking about ranges ( 0->1, -1 ->+1, 0.01 -> 0.5 etc ), will it work for either { 0 | .OR. | 1 } on the output layer and not a range? if yes, how can I control that?
I am using TanHyperbolic as my transfer function. Does it make a difference between this and sigmoid, other functions.. etc?
Any ideas/comments/guidance are appreciated and thanks in advance
Well, by the description given above, I think that the design and approach taken it's correct! With respect to the choice of the activation function, remember that those functions help to get the neurons which have the largest activation number, also, their algebraic properties, such as an easy derivative, help with the definition of Backpropagation. Taking this into account, you should not worry about your choice of activation function.
The ranges that you mention above, correspond to a process of scaling of the input, it is better to have your input images in range 0 to 1. This helps to scale the error surface and help with the speed and convergence of the optimization process. Because your input set is composed of images, and each image is composed of pixels, the minimum value and and the maximum value that a pixel can attain is 0 and 255, respectively. To scale your input in this example, it is essential to divide each value by 255.
Now, with respect to the training problems, Have you tried checking if your gradient calculation routine is correct? i.e., by using the cost function, and evaluating the cost function, J? If not, try generating a toy vector theta that contains all the weight matrices involved in your neural network, and evaluate the gradient at each point, by using the definition of gradient, sorry for the Matlab example, but it should be easy to port to C++:
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
% Set perturbation vector
perturb(p) = e;
loss1 = J(theta - perturb);
loss2 = J(theta + perturb);
% Compute Numerical Gradient
numgrad(p) = (loss2 - loss1) / (2*e);
perturb(p) = 0;
end
After evaluating the function, compare the numerical gradient, with the gradient calculated by using backpropagation. If the difference between each calculation is less than 3e-9, then your implementation shall be correct.
I recommend to checkout the UFLDL tutorials offered by the Stanford Artificial Intelligence Laboratory, there you can find a lot of information related to neural networks and its paradigms, it's worth to take look at it!
http://ufldl.stanford.edu/wiki/index.php/Main_Page
http://ufldl.stanford.edu/tutorial/

Digital signal decimation using gnuradio lib

I write application where I must process digital signal - array of double. I must the signal decimate, filter etc.. I found a project gnuradio where are functions for this problem. But I can't figure how to use them correctly.
I need signal decimate (for example from 250Hz to 200Hz). The function should be similar to resample function in Matlab. I found, the classes for it are:
rational_resampler_base_fff Class source
fir_filter_fff Class source
...
Unfortunately I can't figure how to use them.
gnuradio and shared library I have installed
Thanks for any advice
EDIT to #jcoppens
Thank you very much for you help.
But I must process signal in my code. I find classes in gnuradio which can solve my problem, but I need help how set them.
Functions which I must set are:
low_pass(doub gain, doub sampling_freq, doub cutoff_freq, doub transition_width, window, beta)
where:
use "window method" to design a low-pass FIR filter
gain: overall gain of filter (typically 1.0)
sampling_freq: sampling freq (Hz)
cutoff_freq: center of transition band (Hz)
transition_width: width of transition band (Hz).
The normalized width of the transition band is what sets the number of taps required. Narrow –> more taps
window_type: What kind of window to use. Determines maximum attenuation and passband ripple.
beta: parameter for Kaiser window
I know, I must use window = KAISER and beta = 5, but for the rest I'm not sure.
The func which I use are: low_pass and pfb_arb_resampler_fff::filter
UPDATE:
I solved the resampling using libsamplerate
I need signal decimate (for example from 250Hz to 200Hz)
WARNING: I expressed the original introductory paragraph incorrectly - my apologies.
As 250 Hz is not related directly to 200 Hz, you have to do some tricks to convert 250Hz into 200Hz. Inserting 4 interpolated samples in between the 250Hz samples, lowers the frequency to 50Hz. Then you can raise the frequency to 200Hz again by decimating by a factor 4.
For this you need the "Rational Resampler", where you can define the subsample and decimate factors. Something like this:
This means you would have to do something similar if you use the library. Maybe it's even simpler to do it without the library. Interpolate linearly between the 250 Hz samples (i.e. insert 4 extra samples between each), then decimate by selecting each 4th sample.
Note: There is a Signal Processing forum on stackexchange - maybe this question might fall in that category...
More information: If you only have to resample your input data, and you do not need the actual gnuradio program, then have a look at this document:
https://ccrma.stanford.edu/~jos/resample/resample.pdf
There are several links to other documents, and a link to libresample, libresample4, and others, which may be of use to you. Another, very interesting, page is:
http://www.dspguru.com/dsp/faqs/multirate/resampling
Finally, from the same source as the pdf above, check their snd program. It may solve your problem without writing any software. It can load floating point samples, resample, and save again:
http://ccrma.stanford.edu/planetccrma/software/soundapps.html#SECTION00062100000000000000
EDIT: And yet another solution - maybe the simplest of all: Use Matlab (or the free Octave version):
pkg load signal
t = linspace(0, 10*pi, 50); % Generate a timeline - 5 cycles
s = sin(t); % and the sines -> 250 Hz
tr = resample(s, 5, 4); % Convert to 200 Hz
plot(t, s, 'r') % Plot 250 Hz in red
hold on
plot(t, tr(1:50)) % and resampled in blue
Will give you:

How to find/detect optimal parameters of a Grid Search in Libsvm+Weka?

I'm trying to use SVM with Weka framework. So i'm using Libsvm. I'm new to SVM and reading the guide on the site of Libsvm I read that is possible to discover optimal parameters for SVM (cost and gamma) using GridSearch. So i choose Grid Search on Weka and I obtained a bad classification results (TN rate around 1%). So how do I have to interpret these results? If using optimal parameter I got bad results is there no chance for me to get better classification?What I mean is: Grid Search give me the Best results that i can obtain using SVM?
My dataset is formed by 1124 instances (89% negative class, 11% positive class) and there are 31 attributes (2 of them are nominal others are numeric). I'm using a cross validation (10-fold) on the whole dataset to test the model.
I tried to use GridSearch (I normalized each attribute values between 0 and 1, no features selection but I change class value from 0 and 1 to 1 and -1 accroding to SVM theory but T don't know if it useful) with these parameters: cost from 1 to 18 with 1.0 step and gamma from -5 to 10 with 1.0 step. Results are sensitivity 93,6% and specificity 64.8% but these takes around 1 hour to complete computation!!
I'd like to get better results compared with decision tree. Using Features Selection (Info Gain ranking) + SMOTE oversampling + Cost Sensitive Learning I obtained sensitivity 91% and specificity 80%. Is there a way to tune SVM without trying every possible range of values for cost and gamma?