I watched this awesome video by Dave Miller on making a neural network from scratch in C++ here: https://vimeo.com/19569529
Here is the full source code referenced in the video: http://inkdrop.net/dave/docs/neural-net-tutorial.cpp
It uses mean squared error as the cost function. I'm interested in using a neural network for binary classification, though, so I would like to use cross-entropy as the cost function instead. I was hoping to add that to this code if possible, since I've already been playing around with it.
How would that be applied specifically here?
Would the only difference be in how the error is calculated for the output layer, or do the equations change all the way through backpropagation?
Does anything change at all? Is MSE versus cross-entropy used solely to get an idea of the overall error, and not independently relevant to backpropagation?
Edit for clarity:
Here are the relevant functions.
// Output layer - seems like the error is just the target value minus the calculated value
void Neuron::calcOutputGradients(double targetVal)
{
    double delta = targetVal - m_outputVal;
    m_gradient = delta * Neuron::transferFunctionDerivative(m_outputVal);
}

double Neuron::sumDOW(const Layer &nextLayer) const
{
    double sum = 0.0;
    // Sum our contributions of the errors at the nodes we feed.
    for (unsigned n = 0; n < nextLayer.size() - 1; ++n) {
        sum += m_outputWeights[n].weight * nextLayer[n].m_gradient;
    }
    return sum;
}

void Neuron::calcHiddenGradients(const Layer &nextLayer)
{
    double dow = sumDOW(nextLayer);
    m_gradient = dow * Neuron::transferFunctionDerivative(m_outputVal);
}

void Neuron::updateInputWeights(Layer &prevLayer)
{
    // The weights to be updated are in the Connection container
    // in the neurons in the preceding layer.
    for (unsigned n = 0; n < prevLayer.size(); ++n) {
        Neuron &neuron = prevLayer[n];
        double oldDeltaWeight = neuron.m_outputWeights[m_myIndex].deltaWeight;
        // Calculate the new weight for the neuron, with momentum.
        double newDeltaWeight = eta * neuron.getOutputVal() * m_gradient + alpha * oldDeltaWeight;
        neuron.m_outputWeights[m_myIndex].deltaWeight = newDeltaWeight;
        neuron.m_outputWeights[m_myIndex].weight += newDeltaWeight;
    }
}
Finally found the answer here: https://visualstudiomagazine.com/articles/2014/04/01/neural-network-cross-entropy-error.aspx
You only have to change how the error at the output layer is calculated.
The relevant function to be changed is:
void Neuron::calcOutputGradients(double targetVal)
For mean squared error, use:
double delta = targetVal - m_outputVal;
m_gradient = delta * Neuron::transferFunctionDerivative(m_outputVal);
For cross-entropy, just use:
m_gradient = targetVal - m_outputVal;
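Putting it together, the changed function would look something like this (a sketch; note that the cancellation of the derivative term assumes a sigmoid/softmax output layer, while the tutorial's default transfer function is tanh):
void Neuron::calcOutputGradients(double targetVal)
{
    // Cross-entropy cost: the transfer-function derivative cancels out
    // of the output-layer gradient, leaving just the raw error.
    m_gradient = targetVal - m_outputVal;
}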
Related
I am trying to recognise a sequence of audio frames on an embedded system, an audio frame being a frequency, or an interpolation of two frequencies, played for a variable amount of time. I know the sounds I am trying to recognise (i.e. the start and end frequencies, which are linearly interpolated, and the duration of each audio frame), but they are produced by another embedded system, so the microphone and speaker are cheap and somewhat inaccurate. The output is a square wave. Any suggestions on how to go about doing this?
What I am trying to do now is to use an FFT to get the magnitudes of all frequencies, detect the peaks, look at the detections from duration/2 ms ago to check whether they roughly match an audio frame, and finally check whether any of the sounds I am looking for matches the sequence.
So far I have used the FFT to process the microphone input (after applying a Hann window) and then assigned each frequency bin a coefficient for how likely it is to be a peak, based on how many standard deviations it is away from the mean. This hasn't worked well, since it detected peaks even when the room was silent. Any ideas on how to detect the peaks more accurately? Also, I think there are a lot of harmonics because of the square wave / interpolation. Can I use the harmonic product spectrum if the peaks don't line up exactly at double the frequency?
Here I graphed noise (an almost silent room) and a tone somewhere in the interpolation between 2226 and 1624 Hz.
https://i.stack.imgur.com/R5Gs2.png
I sample every 91 microseconds -> 10989 Hz. Should I sample more often?
I've added samples here of how the interpolation sounds when recorded on my laptop and on the embedded system.
https://easyupload.io/m/5l72b0
#define MIC_SAMPLE_RATE 10989 // Hz
#define AUDIO_SAMPLES_NUMBER 1024

MicroBitAudioProcessor::MicroBitAudioProcessor(DataSource& source) : audiostream(source)
{
    arm_rfft_fast_init_f32(&fft_instance, AUDIO_SAMPLES_NUMBER);
    buf = (float *)malloc(sizeof(float) * (AUDIO_SAMPLES_NUMBER * 2));
    output = (float *)malloc(sizeof(float) * AUDIO_SAMPLES_NUMBER);
    mag = (float *)malloc(sizeof(float) * AUDIO_SAMPLES_NUMBER / 2);
}

// Hann window coefficient for sample i
float henn(int i){
    return 0.5 * (1 - arm_cos_f32(2 * 3.14159265 * i / AUDIO_SAMPLES_NUMBER));
}

int MicroBitAudioProcessor::pullRequest()
{
    int s;
    int result;
    auto mic_samples = audiostream.pull();
    if (!recording)
        return DEVICE_OK;
    int8_t *data = (int8_t *) &mic_samples[0];
    int samples = mic_samples.length() / 2;
    for (int i = 0; i < samples; i++)
    {
        s = (int) *data;
        result = s;
        data++;
        buf[(position++)] = (float)result;
        if (position % AUDIO_SAMPLES_NUMBER == 0)
        {
            position = 0;
            float maxValue = 0;
            uint32_t index = 0;
            // Apply a Hann window
            for (int i = 0; i < AUDIO_SAMPLES_NUMBER; i++)
                buf[i] *= henn(i);
            arm_rfft_fast_f32(&fft_instance, buf, output, 0);
            arm_cmplx_mag_f32(output, mag, AUDIO_SAMPLES_NUMBER / 2);
        }
    }
    return DEVICE_OK;
}

uint32_t frequencyToIndex(int freq) {
    return (freq / ((uint32_t)MIC_SAMPLE_RATE / AUDIO_SAMPLES_NUMBER));
}

float MicroBitAudioProcessor::getFrequencyIntensity(int freq){
    uint32_t index = frequencyToIndex(freq);
    if (index <= 0 || index >= (AUDIO_SAMPLES_NUMBER / 2) - 1) return 0;
    return mag[index];
}
I want to write audio code in C++ for my microcontroller-based synthesizer that will let me generate a sampled square wave signal using the Fourier series equation.
My question in general is: is there a way to set an "unknown" variable like "x" inside a sine equation, and change its value afterwards?
What do I mean by that:
If you take a look at the code I've written so far, you'll see the following:
void SquareWave(int mHarmonics){
    char x;
    for(int k = 0; k <= mHarmonics; k++){
        mFourier += 1/((2*k)+1)*sin(((2*k)+1)*2*M_PI*x/SAMPLES_TOTAL);
    }
    for(x = (int)0; x < SAMPLES_TOTAL; x++){
        mWave[x] = mFourier;
    }
}
Inside the first for loop, mFourier sums up weighted sine signals depending on the number of harmonics, mHarmonics. So a note on my keyboard should set up the harmonic spectrum automatically.
In this equation I've declared x as a character, and now we get to the center of my problem: I want x to be an "unknown" variable whose value I set afterwards, because if x were an integer it would have some default value like 0, which would make the whole equation incorrect.
In the bottom loop I want to write this Fourier series sum into an array, mWave, which will be the resulting output. Is there a possibility to give the sum to mWave[x], where x is an "unknown" multiplier inside the sine signal first, and then change its values afterwards inside the second loop?
Sorry if this is a stupid question; I don't have much experience with C++, but I try to learn it by making these stupid mistakes!
Cheers
@Useless told you what to do, but I am going to try to spell it out for you.
This is how I would do it:
#include <cmath>
#include <vector>

/**
 * Perform a rectangular window in the frequency domain of a time domain
 * square wave. This should be a sinc impulse response.
 *
 * @param x The time domain sample within the period of the signal.
 * @param harmonic_count The number of harmonics to aggregate in the result.
 * @param sample_count The number of samples across the square wave period.
 *
 * @return double The time domain result of the combined harmonics at point x.
 */
double box_car(unsigned int x,
               unsigned int harmonic_count,
               unsigned int sample_count)
{
    double mFourier = 0.0;
    for (unsigned int k = 0; k <= harmonic_count; k++)
    {
        mFourier += 1.0 / ((2 * k) + 1) * sin(((2 * k) + 1) * 2.0 * M_PI * x / sample_count);
    }
    return mFourier;
}
/**
 * Calculate the square wave samples across the time domain, where the
 * samples are filtered to only include harmonic_count harmonics.
 *
 * @param harmonic_count The number of harmonics to aggregate in the result.
 * @param sample_count The number of samples across the square wave period.
 *
 * @return std::vector<double> The time domain samples.
 */
std::vector<double> box_car_samples(unsigned int harmonic_count,
                                    unsigned int sample_count)
{
    std::vector<double> square_wave;
    for (unsigned int x = 0; x < sample_count; x++)
    {
        double sample = box_car(x, harmonic_count, sample_count);
        square_wave.push_back(sample);
    }
    return square_wave;
}
So mWave[x] is returned as a std::vector of doubles (floating point).
The function box_car_samples() is f(k, x) as stated before.
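For example, a small driver (my own illustration, not part of the answer above) might look like:
#include <cstdio>

int main()
{
    // 7 harmonics over a 256-sample period; the partial sums approach
    // a square wave of amplitude pi/4.
    std::vector<double> wave = box_car_samples(7, 256);
    for (double s : wave)
        printf("%f\n", s);
    return 0;
}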
Since I can't use vectors with the Arduino IDE anyway, I've tried the following solution:
...
void ComputeBandlimitedSquareWave(int mHarmonics){
    for(int i = 0; i < sample_count; i++){
        mWavetable[i] = ComputeFourierSeriesSquare(x);
        if (x < sample_count) x++;
    }
}

float ComputeFourierSeriesSquare(int x){
    for(int k = 0; k <= mHarmonics; k++){
        mFourier += 1/((2*k)+1)*sin(((2*k)+1)*2*M_PI*x/sample_count);
        return mFourier;
    }
}
...
A minute ago I thought this must be right, but my monitors prove me wrong...
It sounds like a completely messed-up sum of signals at first, but after about 2 seconds the true characteristic square wave sound comes through. I'm trying to figure out what I'm overlooking and will keep you updated if I can isolate that last part coming through my speakers, because it actually has a really decent sound. Only the messy overlays at the beginning are making me desperate right now...
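Looking at the snippet above, two likely culprits are that mFourier is never reset between samples (every call accumulates onto the previous sum) and that the return sits inside the for loop, so only one term is added per call; 1/((2*k)+1) is also integer division, which is zero for every k >= 1. A minimal corrected sketch, assuming mHarmonics and sample_count are members as above:
float ComputeFourierSeriesSquare(int x){
    float sum = 0.0f; // start from zero for every sample
    for(int k = 0; k <= mHarmonics; k++){
        // 1.0f forces floating-point division; in integers 1/((2*k)+1) is 0 for k >= 1
        sum += 1.0f / ((2*k)+1) * sin(((2*k)+1)*2*M_PI*x/(float)sample_count);
    }
    return sum; // return after the loop, not inside it
}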
I have a function to detect peaks in real-time data. The algorithm is mentioned in this thread, and it looks like this:
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

std::vector<int> smoothedZScore(std::vector<float> input)
{
    // lag 5 for the smoothing functions
    int lag = 5;
    // 3.5 standard deviations for signal
    float threshold = 3.5;
    // between 0 and 1, where 1 is normal influence, 0.5 is half
    float influence = .5;

    if (input.size() <= (size_t)(lag + 2))
    {
        std::vector<int> emptyVec;
        return emptyVec;
    }

    // Initialise variables
    std::vector<int> signal(input.size(), 0);
    std::vector<float> filteredY(input.size(), 0.0);
    std::vector<float> avgFilter(input.size(), 0.0);
    std::vector<float> stdFilter(input.size(), 0.0);

    // Seed the filtered series with the first raw samples.
    for (int j = 0; j <= lag; j++)
        filteredY[j] = input[j];

    std::vector<float> subVecStart(input.begin(), input.begin() + lag);
    double sum = std::accumulate(std::begin(subVecStart), std::end(subVecStart), 0.0);
    double mean = sum / subVecStart.size();

    double accum = 0.0;
    std::for_each(std::begin(subVecStart), std::end(subVecStart), [&](const double d) {
        accum += (d - mean) * (d - mean);
    });
    double stdev = std::sqrt(accum / (subVecStart.size() - 1));

    avgFilter[lag] = mean;
    stdFilter[lag] = stdev;

    for (size_t i = lag + 1; i < input.size(); i++)
    {
        if (std::abs(input[i] - avgFilter[i - 1]) > threshold * stdFilter[i - 1])
        {
            if (input[i] > avgFilter[i - 1])
            {
                signal[i] = 1; // positive signal
            }
            else
            {
                signal[i] = -1; // negative signal
            }
            // Make influence lower
            filteredY[i] = influence * input[i] + (1 - influence) * filteredY[i - 1];
        }
        else
        {
            signal[i] = 0; // no signal
            filteredY[i] = input[i];
        }

        // Adjust the filters: recompute the mean and standard deviation
        // over the last `lag` filtered samples.
        std::vector<float> subVec(filteredY.begin() + i - lag, filteredY.begin() + i);
        double subSum = std::accumulate(subVec.begin(), subVec.end(), 0.0);
        double subMean = subSum / subVec.size();
        double subAccum = 0.0;
        for (float v : subVec)
            subAccum += (v - subMean) * (v - subMean);
        avgFilter[i] = subMean;
        stdFilter[i] = std::sqrt(subAccum / (subVec.size() - 1));
    }

    return signal;
}
In my code, I'm reading real-time 3-axis accelerometer values from an IMU sensor and displaying them as a graph. I need to detect the peaks of the signal using the above algorithm. I added the function to my code.
Let's say the real-time values are the following:
double x = sample->acceleration_g[0];
double y = sample->acceleration_g[1];
double z = sample->acceleration_g[2];
How do I pass these values to the above function and detect the peaks?
I tried calling this:
smoothedZScore(x)
but it gives me an error:
settings.cpp:230:40: error: no matching function for call to 'smoothedZScore'
settings.cpp:92:18: note: candidate function not viable: no known conversion from 'double' to 'std::vector<float>' for 1st argument
EDIT
The algorithm needs a minimum of 7 samples to feed in. So I guess I may need to store my realtime data in a buffer.
But I'm having difficulty understanding how to store the samples in a buffer and apply the peak detection algorithm to them.
Can you show me a possible solution to this?
You will need to rewrite the algorithm. Your problem isn't just a real-time problem; you also need a causal solution. The function you have is not causal.
Practically speaking, you will need a class, and that class will need to incrementally calculate the standard deviation.
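To make that concrete, here is a minimal sketch of such a class (the name and structure are mine; it uses the same lag / threshold / influence parameters as the function above, and maintains a running sum and sum of squares so the mean and standard deviation cost O(1) per sample):
#include <cmath>
#include <cstddef>
#include <vector>

// Causal, sample-by-sample version of the smoothed z-score filter.
class RealTimePeakDetector {
public:
    RealTimePeakDetector(std::size_t lag, float threshold, float influence)
        : lag(lag), threshold(threshold), influence(influence) {}

    // Feed one sample; returns +1 (positive peak), -1 (negative peak) or 0.
    int update(float sample) {
        if (window.size() < lag) {
            push(sample); // still warming up: no signal yet
            return 0;
        }
        float mean = sum / lag;
        float var = sumSq / lag - mean * mean;
        float stddev = std::sqrt(var > 0 ? var : 0);

        int signal = 0;
        float filtered = sample;
        if (std::abs(sample - mean) > threshold * stddev) {
            signal = sample > mean ? 1 : -1;
            // Dampen the influence of the peak on the running statistics.
            filtered = influence * sample + (1 - influence) * window.back();
        }
        push(filtered);
        return signal;
    }

private:
    void push(float v) {
        window.push_back(v);
        sum += v;
        sumSq += v * v;
        if (window.size() > lag) {
            float old = window.front();
            window.erase(window.begin());
            sum -= old;
            sumSq -= old * old;
        }
    }

    std::size_t lag;
    float threshold, influence;
    std::vector<float> window; // last `lag` filtered samples
    double sum = 0.0, sumSq = 0.0;
};
You would keep one detector per axis and feed each sample as it arrives, e.g. RealTimePeakDetector xDetector(5, 3.5f, 0.5f); and then int peak = xDetector.update(sample->acceleration_g[0]); inside the read loop.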
I have successfully implemented stochastic backpropagation and am trying to increase its accuracy. I've noticed that batched backpropagation seems to be more popular, so I wanted to try it and see whether it improves the network's accuracy; however, I can't figure out how to implement it. By "batched backpropagation" I mean backpropagation where the weights and biases are only updated after the completion of a mini-batch or epoch, instead of after each input.
My understanding is that you sum up the changes that need to be made to each weight and bias, and apply them at the end of the batch of training examples. I changed basically nothing from my original stochastic backprop code, except that instead of applying the change directly to the weights and biases, I apply it to a buffer which is then used to update the weights and biases later. Or am I supposed to sum up the cost from each training example and run backpropagation at the end of the batch? If that is the case, what do I use for the intermediate results (the output vectors of each layer) if the cost is a combination of the costs for a batch of inputs?
// Called after each calculation on a training example
void ML::NeuralNetwork::learnBatch(const Matrix &calc, const Matrix &real) const {
    ML::Matrix cost = 2 * (calc - real);
    for (int i = weights.size() - 1; i >= 0; --i) {
        // Each element in results is the column vector output for each layer
        // elementMultiply() returns the Hadamard product
        ML::Matrix dCdB = cost.elementMultiply(ML::sigDerivative(weights[i] * results[i] + biases[i]));
        ML::Matrix dCdW = dCdB * results[i].transpose();
        cost = weights[i].transpose() * dCdB;
        sumWeights[i] += learningRate * dCdW; // scalar multiplication
        sumBiases[i] += learningRate * dCdB;
        /* Original code:
         * weights[i] -= learningRate * dCdW;
         * biases[i] -= learningRate * dCdB;
         */
    }
}

// Called at the end of a batch
void ML::NeuralNetwork::update() {
    for (int i = 0; i < weights.size(); ++i) {
        weights[i] -= sumWeights[i];
        biases[i] -= sumBiases[i];
        // Set all elements in the matrices to 0
        sumWeights[i].zero();
        sumBiases[i].zero();
    }
}
Besides the addition of the update() function, I really haven't changed much from my working stochastic backprop code. With my current batch backprop code the neural network never learns, and it consistently gets 0 correct outputs even after iterating over 200 batches. Is there something I'm not understanding?
All help will be greatly appreciated.
In batch backpropagation, you sum the contributions of the backpropagation of each sample.
In other words, the resulting gradient is the sum of the gradients of the individual samples.
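A sketch of how the pieces might fit together per mini-batch, reusing the question's names (the feedForward() call and the scalar-times-Matrix operations are assumptions on my part, and dividing by the batch size to average the summed gradients is a common convention rather than a requirement):
void ML::NeuralNetwork::trainBatch(const std::vector<Matrix> &inputs,
                                   const std::vector<Matrix> &targets) {
    // Accumulate each sample's gradient using that sample's own
    // forward-pass activations in `results`.
    for (std::size_t s = 0; s < inputs.size(); ++s) {
        Matrix calc = feedForward(inputs[s]); // assumed to fill `results`
        learnBatch(calc, targets[s]);         // sums into sumWeights / sumBiases
    }
    // Apply the summed updates once; averaging keeps the step size
    // comparable to the stochastic version.
    double scale = 1.0 / inputs.size();
    for (std::size_t i = 0; i < weights.size(); ++i) {
        weights[i] -= scale * sumWeights[i];
        biases[i] -= scale * sumBiases[i];
        sumWeights[i].zero();
        sumBiases[i].zero();
    }
}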
I'm currently trying to display an audio spectrum using FFTW3 and SFML. I've followed the directions found here and looked at numerous references on FFTs, spectrums, and FFTW, yet somehow my bars are almost all aligned to the left, as shown below. Another issue I'm having is that I can't find information on what the scale of the FFT output is; currently I'm dividing it by 64, yet it still occasionally goes beyond that. And further, I have found no information on why the output from FFTW has to be the same size as the input. So my questions are:
Why is the majority of my spectrum aligned to the left unlike the image below mine?
Why isn't the output between 0.0 and 1.0?
Why is the input sample count tied to the FFT output count?
What I get:
What I'm looking for:
const int bufferSize = 256 * 8;

void init() {
    sampleCount = (int)buffer.getSampleCount();
    channelCount = (int)buffer.getChannelCount();
    for (int i = 0; i < bufferSize; i++) {
        window.push_back(0.54f - 0.46f * cos(2.0f * GMath::PI * (float)i / (float)bufferSize));
    }
    plan = fftwf_plan_dft_1d(bufferSize, signal, results, FFTW_FORWARD, FFTW_ESTIMATE);
}

void update() {
    int mark = (int)(sound.getPlayingOffset().asSeconds() * sampleRate);
    for (int i = 0; i < bufferSize; i++) {
        float s = 0.0f;
        if (i + mark < sampleCount) {
            s = (float)buffer.getSamples()[(i + mark) * channelCount] / (float)SHRT_MAX * window[i];
        }
        signal[i][0] = s;
        signal[i][1] = 0.0f;
    }
}

void draw() {
    int inc = bufferSize / 2 / size.x;
    int y = size.y - 1;
    int max = size.y;
    for (int i = 0; i < size.x; i++) {
        float total = 0.0f;
        for (int j = 0; j < inc; j++) {
            int index = i * inc + j;
            total += std::sqrt(results[index][0] * results[index][0] + results[index][1] * results[index][1]);
        }
        total /= (float)(inc * 64);
        Rectangle2I rect = Rectangle2I(i, y, 1, -(int)(total * max)).absRect();
        g->setPixel(rect, Pixel(254, toColor(BLACK, GREEN)));
    }
}
All of your questions relate to FFT theory. Study the properties of the FFT in any standard text/reference book and you will be able to answer them yourself.
The least you can start from is here:
https://en.wikipedia.org/wiki/Fast_Fourier_transform.
Many FFT implementations are energy preserving. That means the scale of the output is linearly related to the scale and/or size of the input.
An FFT is a DFT, and a DFT is a square matrix transform, so the number of outputs will always equal the number of inputs (or half that, ignoring the redundant complex-conjugate half given strictly real input), unless some outputs are thrown away; if they were not equal, it would not be an FFT. If you want fewer outputs, there are ways to downsample the FFT output or post-process it in other ways.
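To make the scaling point concrete against the code in the question (a sketch, reusing the results buffer and bufferSize from above): FFTW's transforms are unnormalized, so a unit-amplitude sine concentrates a magnitude of roughly N/2 in its bin. Dividing by bufferSize / 2 rather than a fixed 64 therefore keeps the bars in roughly 0..1:
// Convert raw FFTW output to magnitudes normalized to roughly 0..1.
for (int i = 0; i < bufferSize / 2; i++) {
    float mag = std::sqrt(results[i][0] * results[i][0] + results[i][1] * results[i][1]);
    // Unnormalized DFT: a unit sine yields ~N/2 in its bin, further scaled
    // by the window's average gain (about 0.54 for the Hamming window above).
    mag /= bufferSize / 2.0f;
    // ... draw `mag` ...
}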