Can I do deep learning if the image size of the paired images is different?

How do I write a deep learning program in which 1000 pictures of 29001 * 499 learn the corresponding 1000 pictures of 29001 * 936, so that the former gain more information?
I tried the SRCNN algorithm, but the generated .h5 file is too large. Is there any other algorithm I can use?

Related

OpenCV neural network for image processing

I am new to the AI world and am trying some practice exercises.
It looks like I need some third-party experience.
Let's say I need to get rid of image defects (the actual task is trickier).
I hope that the trained NN will be able to interpolate the defect area.
To that end, I am trying to create a simple neural network.
Its input is a grayscale image with a defect (72*54), and the target is the same image with no defect.
The hidden layer has 2*72*54 neurons.
The main piece of code:
#include <opencv2/ml.hpp>
#include <iostream>
#include <vector>

cv::Ptr<cv::ml::ANN_MLP> ann = cv::ml::ANN_MLP::create();
// One input and one output neuron per pixel; the hidden layer is twice that.
int inputsCount = imageSizes.width * imageSizes.height;
std::vector<int> layerSizes = { inputsCount, inputsCount * 2, inputsCount };
ann->setLayerSizes(layerSizes);
ann->setActivationFunction(cv::ml::ANN_MLP::SIGMOID_SYM);
// Stop after 50 iterations or once the error change falls below 0.1.
cv::TermCriteria tc(cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS, 50, 0.1);
ann->setTermCriteria(tc);
ann->setTrainMethod(cv::ml::ANN_MLP::BACKPROP, 0.0001);
// trainData holds one flattened defect image per row; resData the matching clean images.
std::cout << "Result : " << ann->train(trainData, cv::ml::ROW_SAMPLE, resData) << std::endl;
ann->predict(trainData, predicted);
My training dataset looks like this (image of the paired samples omitted).
Trained on a 10-item dataset, the NN gives bad results on these (same) inputs. I tried different parameters.
But trained on only 2 images, the NN gets close to the expected output (on the training data).
I suppose the approach itself is not inappropriate and that the solution is simply not so easy.
Maybe someone has some advice about the parameters, the neural network architecture, or the whole approach.
It seems that the termination criteria were fine for just two samples but were not good enough when training with a larger number of samples. Do try adjusting them, and also the learning rate.
Judging by the quality of the pixels that have been restored properly, the network architecture seems to be fine for this task. Once the network works well on 10 samples, I strongly recommend adding more training samples.
The chief problem is that you have way too little data for the given network.
Your NN is fully connected. The weights for pixel 0,0 are entirely separate from those of pixel 1,0, and pixel 0,1 has yet another set of weights. And with so many nodes you have a lot of weights: 3888 inputs times 7776 hidden neurons, plus 7776 hidden times 3888 outputs, is roughly 60 million weights. So while you have plenty of pixels in 10 images, you have nowhere near enough pixels for all those weights.
A Convolutional Neural Network has far fewer weights, as many of its weights are reused. That means that in training, each weight is trained by multiple pixels from every training image.
Not that I'd expect this to work well with just 10 images. The human expectation is derived from years of human vision, literally billions of images.
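For illustration, here is a minimal convolutional sketch of the same defect-interpolation task, assuming Keras rather than cv::ml (layer sizes here are arbitrary):
from tensorflow import keras
from tensorflow.keras import layers

# 72x54 grayscale image in, reconstructed image out. Each 3x3 kernel is
# reused at every pixel position, so the model has only a few thousand
# weights instead of the ~60 million of the fully connected MLP above.
model = keras.Sequential([
    keras.Input(shape=(54, 72, 1)),  # height, width, channels; values scaled to [0, 1]
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.Conv2D(1, 3, padding="same", activation="sigmoid"),
])
model.compile(optimizer="adam", loss="mse")
# model.fit(defect_images, clean_images, epochs=100, batch_size=2)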

TensorFlow for audio signal processing - detecting feature intensities and delays

For my studies I need to train a deep NN to identify certain sounds and their delays. We have 1 x 25K sample points (microphone output) and need a quantification of the events and their intensity.
In order to simplify the model so that it looks more like the MNIST training procedure, for now we use classification for the quantification (if there are two events with intensities of 5 and 3, the output would be 8, plus the delays vector).
We tried feeding the data [trainNum, 25000] into a 3-layer NN with 250, 100, and 50 neurons and the Adam optimizer, with a three-class one-hot output (100/010/001, i.e. [trainNum, 3]); a sketch of this setup appears below. The cost does not drop below 400 and the accuracy is 30%.
We would appreciate any help and comments.
Additional information: 2700 samples, 270 batches, 10 epochs. We used the following tutorial and changed the data from MNIST to our sound data: https://pythonprogramming.net/tensorflow-neural-network-session-machine-learning-tutorial/
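For reference, a rough modern-Keras equivalent of the setup described above (the linked tutorial uses the older TF 1.x API; names and details here are illustrative):
from tensorflow import keras
from tensorflow.keras import layers

# Raw 25000-point waveforms in, three one-hot classes out.
model = keras.Sequential([
    keras.Input(shape=(25000,)),
    layers.Dense(250, activation="relu"),
    layers.Dense(100, activation="relu"),
    layers.Dense(50, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",                 # Adam optimizer
              loss="categorical_crossentropy",  # one-hot labels
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=10, batch_size=10)  # 2700 samples -> 270 batches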
Thank you in advance
All the best,
AA

Why two .wav files that should have the same pitch don't

This is for a Python computational physics class. We are given two .wav files that contain recordings of a harp and a piano playing the same note. We are supposed to "load the files and take the FFT of the amplitude. From the FFT determine the frequency of the fundamental for both instruments to 4 sig figs."
Here is what I have done.
import scipy.io.wavfile as sciwav
import matplotlib.pyplot as plt
from numpy.fft import rfft

# Import data from the .wav files. read() returns the sampling rate
# and the data as an array.
harp_rate, harp_data = sciwav.read('/Users/williamweiss2/Desktop/Test2/harp.wav', mmap=False)
piano_rate, piano_data = sciwav.read('/Users/williamweiss2/Desktop/Test 2/piano.wav', mmap=False)

# Perform the FFT on both sets of data and graph the magnitude
# to find the index of the first harmonic.
plt.figure(1)
p = rfft(piano_data)
h = rfft(harp_data)
plt.subplot(121)
plt.plot(abs(p), 'b')
plt.title('Piano FFT')
plt.xlim(0, 100000)
plt.subplot(122)
plt.plot(abs(h), 'g')
plt.title('Harp FFT')
plt.show()
This all works just fine. Now, to find the frequency of the note played, this is what I was taught to do:
x value of the first spike in the FFT graph = index.
deltaF = sampling rate / number of samples.
index * deltaF = frequency of the note played.
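In code, those steps look roughly like this (assuming NumPy, reusing p, piano_rate, and piano_data from the script above, and assuming the tallest spike is the fundamental):
import numpy as np

index = np.argmax(np.abs(p))            # x value of the tallest spike in the FFT graph
delta_f = piano_rate / len(piano_data)  # sampling rate / number of samples
print(index * delta_f)                  # frequency of the note played, in Hz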
I followed these steps and got two drastically different notes. Does anyone see a misstep in my process? Any ideas are appreciated even if they go over my head. I am just a junior getting a Physics degree. Thanks very much in advance.

Is there any way to determine the width and height of an RGB values array?

I have an array of RGB values whose raw size differs each time. I'm trying to determine which width/height would be most suitable for it.
The idea is that I'm getting raw files and I want to display the file data as a BMP image (e.g. Hex Workshop has that feature, called Data Visualizer).
Any suggestions?
Regards.
Find the divisors of the pixel array size.
For instance, if your array contains 243 pixels, divisors are 1, 3, 9, 27, 81 and 243. It means that your image is either 1x243, 3x81, 9x27, 27x9, 81x3 or 243x1.
You can only guess which is the right one by analyzing the image content: vertical or horizontal features, recurring patterns, common aspect ratios, etc.
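A minimal sketch of that enumeration in Python (the function name is illustrative; if the array holds interleaved R, G, B bytes, divide its length by 3 first to get the pixel count):
def candidate_dimensions(n_pixels):
    # Return all (width, height) pairs whose product is n_pixels.
    return [(w, n_pixels // w) for w in range(1, n_pixels + 1) if n_pixels % w == 0]

print(candidate_dimensions(243))
# [(1, 243), (3, 81), (9, 27), (27, 9), (81, 3), (243, 1)]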

How to get a list of notes present in a wav file?

I am writing a program to help people learn guitar. To do this, I need to be able to look at a sample of time and see what note(s) they played. I looked at FFTW but I don't understand how to get this to work. I also tried to figure out the Goertzel algorithm but it seems like that is just for single-frequency notes like dial tones (not sure about that though). To be clear, I do need to be able to detect multiple notes (to see if a chord is played), but it doesn't matter too much if a few harmonics get in there too.
I'm coding this in C++, and would prefer a solution that is cross-platform.
UPDATE: I've realized it isn't so important to detect specific notes; what I really need is to check that certain frequencies are present, and that others aren't. For example, if someone plays a C, I want to check that a C frequency is present (about 262 Hz), as well as probably 524 Hz and 786 Hz, and check that nearby notes that are not near in the overtone series (like B and D) are not present.
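Though the question asks for C++, a quick Python sketch of that frequency-presence check may clarify the idea (sig, rate, and the thresholds are all illustrative; sig is assumed to be a mono float array sampled at rate Hz):
import numpy as np

def magnitude_at(sig, rate, freq):
    # Magnitude of the FFT spectrum at the bin nearest to freq.
    spectrum = np.abs(np.fft.rfft(sig))
    bin_index = int(round(freq * len(sig) / rate))
    return spectrum[bin_index]

# C present (fundamental plus first overtones), while nearby B (~247 Hz)
# and D (~294 Hz) are absent:
# is_c = (all(magnitude_at(sig, rate, f) > 100 for f in (262, 524, 786))
#         and all(magnitude_at(sig, rate, f) < 10 for f in (247, 294)))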
Notes are not present in a wav file. Sampled sound is.
Humans might perceive some notes that might have been played to create the sound in some wav file, but doing automatic polyphonic pitch estimation/recognition from recorded sound into transcribed music for rich and complex waveforms, such as produced by guitars, still appears to be an advanced research topic.
When possible for certain very restricted types of music sounds, some non-trivial DSP will be involved. FFTW might be useful for a small part of the more sophisticated DSP processing needed for pitch estimation, Goertzel filtering less so.
I can't point you to specifics, but I believe what you need is a Fourier transform to detect the frequencies you're looking for.
What about this PDF? http://miracle.otago.ac.nz/tartini/papers/A_Smarter_Way_to_Find_Pitch.pdf
The problem with the FFT is that if you do a 256-sample FFT, you get only 256 outputs. Essentially, this means it divides your frequency space, which contains an infinite number of frequencies, into a limited set of frequencies.
This is because if you only check 256 samples (256 can be replaced by N, the number of samples used for the FFT), any two frequencies that differ by a multiple of 256 will look the same.
In other words, suppose you check 256 evenly spaced samples, taken at times 0, 1/256, 2/256, 3/256, ..., 255/256. Then the two signals sin(2 pi 80 x), which has a frequency of 80 cycles/sec, and sin(2 pi (80 + 9*256) x), which has a frequency of 80 + 9*256 cycles/sec, will have exactly the same samples.
Here, 9 can be replaced by k, any integer multiple: 1, 2, 3, 4, 5, etc. You can replace 256 (N) with any sample count as well.
As an example, sampling both at 200/256, one of the sample times, we have:
sin(2 pi (80 + 9*256) (200/256)) = sin(2 pi 80 (200/256) + 2 pi * 9 * 200)
Because multiples of 2 pi don't affect sin, this is the same as
sin(2 pi 80 (200/256)).
More generically,
sin(2 pi (M + k*N) j/N) = sin(2 pi M (j/N) + 2 pi k*j) = sin(2 pi M (j/N)),
where j is any integer 0, ..., N-1; N is the number of samples; j/N is the time of the sample; M is the number of cycles per second; and k is any integer ..., -2, -1, 0, 1, 2, ...
From Nyquist sampling, if you want to distinguish -128, -127, -126, -125, ..., 125, 126, 127 cycles per second, you would take 256 samples/sec. Taking 256 samples/sec means distinguishing 256 frequencies. However, 0 cycles/sec, 256 cycles/sec, 512 cycles/sec, and 1024 cycles/sec would all look the same.
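This aliasing is easy to verify numerically; a quick check in Python (assuming NumPy):
import numpy as np

N = 256
t = np.arange(N) / N                           # sample times 0, 1/256, ..., 255/256
low = np.sin(2 * np.pi * 80 * t)               # 80 cycles/sec
high = np.sin(2 * np.pi * (80 + 9 * 256) * t)  # 80 + 9*256 cycles/sec
print(np.allclose(low, high))                  # True: the sampled values coincide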