Identifying / generating a waveform in C++

I would like to code something that could take some sort of input and identify it as a square wave, triangle wave, or some other waveform. I also need some way of generating said waves.
I do have experience with C/C++; however, I'm not sure how I would approach simulating all of this. Eventually, I would like to translate it to a microcontroller program that reads an analog input to determine the waveform.
EDIT: Sorry, I should have mentioned that the wave will be at a known frequency and the amplitude will be unknown.

Generating the waves is significantly easier than identifying them. I have a small project that does some wave generation; here's an example from it:
float amplitude;
switch (sound->wavetype)
{
case LA_SQUARE:
    // High for the first half of the cycle, low for the second.
    amplitude = sound->theta > .5 ? 1.0 : -1.0;
    break;
case LA_SINE:
    amplitude = sin(2 * PI * sound->theta);
    break;
case LA_TRIANGLE:
    // Ramps from 1 down to -1 over the first half of the cycle, then back up.
    amplitude = sound->theta > .5 ? 4 * sound->theta - 3 : -4 * sound->theta + 1;
    break;
case LA_SAWTOOTH:
    amplitude = 2 * sound->theta - 1.0;
    break;
case LA_NOISE:
    // Scale rand() into [-1, 1] so the noise matches the other branches' range.
    amplitude = 2.0f * ((float)rand() / RAND_MAX) - 1.0f;
    break;
default:
    amplitude = 0.0f;  // avoid using amplitude uninitialized
}
theta here is updated at every frame, advancing along the waveform at a rate set by the frequency of the wave you're creating.
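For concreteness, the update is typically a phase accumulator along these lines (sound->frequency and sampleRate are assumed names here, not necessarily those used in the project):

// Advance the normalized phase in [0, 1) by one sample's worth of the wave.
sound->theta += sound->frequency / sampleRate;
if (sound->theta >= 1.0f)
    sound->theta -= 1.0f;  // wrap around at the end of each cycle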
As for identifying waves: if you know you're just going to be getting simple, unmixed square, triangle, or sine waves, you can probably get away with some simple tests. Look at the change in amplitude between any two adjacent points along the wave. If the amplitudes are the same almost everywhere, it's a square wave. If they're changing linearly (that is, if the change in amplitude is constant), you've got a triangle wave (or a sawtooth, if you're making that distinction). Otherwise, it's a sine wave; a sketch of this test follows below. Keep in mind this check only works if you're expecting just those types of wave and they're not being mixed. There are a few other edge cases I can think of, but I'll let you worry about those.
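A minimal sketch of that test, assuming samples holds evenly spaced readings of a single, unmixed wave, and eps is a made-up noise tolerance you'd tune for your ADC:

#include <cmath>
#include <vector>

enum WaveType { WAVE_SQUARE, WAVE_TRIANGLE, WAVE_SINE };

// Classify by looking at first differences: a square wave is flat almost
// everywhere, a triangle's first difference is piecewise constant, and
// anything else is treated as a sine.
WaveType classify(const std::vector<float>& samples, float eps)
{
    std::vector<float> diff;
    for (size_t i = 1; i < samples.size(); ++i)
        diff.push_back(samples[i] - samples[i - 1]);

    // Count near-zero first differences (flat) and near-zero second
    // differences (linear segments).
    size_t flat = 0, linear = 0;
    for (size_t i = 1; i < diff.size(); ++i) {
        if (std::fabs(diff[i]) < eps)             ++flat;
        if (std::fabs(diff[i] - diff[i-1]) < eps) ++linear;
    }

    // A square wave is flat except at the two edges per cycle; a
    // triangle changes slope only at its peaks. The 90% thresholds are
    // illustrative, not tuned values.
    if (flat   > diff.size() * 9 / 10) return WAVE_SQUARE;
    if (linear > diff.size() * 9 / 10) return WAVE_TRIANGLE;
    return WAVE_SINE;
}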
If you're doing anything fancier, you're probably going to need a book that specializes in this sort of thing, like the one suggested in the comments section.

Start with MATLAB or the free GNU Octave work-alike. You can generate arrays with the desired waveforms and write appropriate functions to decode/identify them. When you have the details worked out, grab a copy of the FFTW ("Fastest Fourier Transform in the West") library to handle the FFT/IFFT routines for your C/C++ code. MATLAB's Signal Processing Toolbox has lots of useful tools for achieving your objective.

On identifying waveforms: if you know the frequency, you can do quite a lot by using discrete gradients, as Alex suggests in his answer.
Another method would be to use an interpolation technique and look at the coefficients. Still another would be a fast Fourier transform. These last two are computationally more intense, but also much more accurate, especially when identifying more complex waveforms. You would have to check whether your microcontroller is fast enough or, if you are lucky, has hardware FFT support.
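In fact, since the frequency is known, even a couple of single-bin DFT evaluations can separate the classic shapes: a sine has essentially only the fundamental, a square's odd harmonics fall off as 1/n, and a triangle's as 1/n². A rough sketch (the naive O(N)-per-bin DFT and the choice of the 3rd harmonic are illustrative assumptions, not a tuned classifier):

#include <cmath>
#include <vector>

const double kPi = 3.14159265358979323846;

// Magnitude of a single DFT bin, evaluated directly (O(N) per bin).
// freq is the frequency of interest in Hz, fs the sample rate.
float binMagnitude(const std::vector<float>& x, float freq, float fs)
{
    double re = 0.0, im = 0.0;
    for (size_t n = 0; n < x.size(); ++n) {
        double w = 2.0 * kPi * freq * n / fs;
        re += x[n] * std::cos(w);
        im -= x[n] * std::sin(w);
    }
    return (float)(std::sqrt(re * re + im * im) / x.size());
}

// Ratio of the 3rd harmonic to the fundamental f0: near 0 for a sine,
// near 1/3 for a square wave, near 1/9 for a triangle wave.
float harmonicRatio(const std::vector<float>& x, float f0, float fs)
{
    return binMagnitude(x, 3 * f0, fs) / binMagnitude(x, f0, fs);
}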

C++ mathematical function generation

In working on a project I came across the need to generate various waves, accurately. I thought that a simple sine wave would be the easiest to begin with, but it appears that I am mistaken. I made a simple program that generates a vector of samples and then plays those samples back so that the user hears the wave, as a test. Here is the relevant code:
vector<short> genSineWaveSample(int nsamples, float freq, float amp) {
    vector<short> samples;
    for (float i = 0; i <= nsamples; i++) {
        samples.push_back(amp * sinx15(freq * i));
    }
    return samples;
}
I'm not sure what the issue with this is. I understand that there could be some issue with the vector being made of shorts, but that's what my audio framework wants, and I am inexperienced with that kind of library, so I don't know what to expect.
The symptoms are as follows:
- The frequency is not correct: given freq = 440, A4 is not the note played back.
- Strange distortion: most frequencies do not generate a clean wave. 220, 440, and 880 are clean; most others are distorted.
- Most frequencies are shifted upwards considerably.
Can anyone give advice as to what I may be doing wrong?
Here's what I've tried so far:
- Making my own sine function for greater accuracy: I used a 15th-degree Taylor series expansion for sin(x).
- Changing the sample rate, anything from 256 to 44100: no change can be heard given the above errors; the waves are simply more distorted.
Thank you. If there is any information that can help you, I'd be obliged to provide it.
I suspect that you are passing incorrect values to your sinx15 function. If you are familiar with the basics of signal processing, the Nyquist rate is the minimum rate at which you can faithfully reconstruct (or, in your case, construct) a sampled signal; it is defined as 2x the highest frequency component present in the signal.
What this means for your program is that you need at least 2 values per cycle of the highest frequency you want to reproduce. At 20 kHz you'd need 40,000 samples per second. It looks like you are just packing a vector with values and letting the playback program sort out the timing.
We will assume you use 44.1 kHz as your playback sampling frequency. This means that a snippet of code producing one second of a 1 kHz wave would look like:
// One second of samples at a 44.1 kHz sampling rate.
std::vector<float> wave(44100);
for (int i = 0; i < 44100; i++)
{
    wave[i] = sin(2 * pi * i * (frequency / 44100) + pi / 2); // sin is in radians, frequency in Hz
}
You need to divide the frequency by the sampling rate, not just multiply by it. To see why, take the case where a frequency of 22,050 Hz is passed. For i = 0, you get sin(pi/2) = 1. For i = 1, sin(3pi/2) = -1, and so on and so forth. This gives you a repeating sequence of 1, -1, 1, -1..., which is the correct representation of a 22,050 Hz wave sampled at 44.1 kHz. This works as you go down in frequency, though you get more and more samples per cycle. Interestingly, this does not make a difference: a sine wave sampled at 2 samples per cycle is recreated just as accurately as one sampled 1,000 times per cycle. This doesn't take noise into account, but for most purposes it works well enough.
I would suggest looking into the basics of digital signal processing, as it is a very interesting field and very useful to understand.
Edit: This assumes all of those parameters are evaluated as floating point numbers.
Fundamentally, you're missing a piece of information. You don't specify the amount of time over which you want your samples taken, which is to say, the rate at which they will be played back by your system. Given some playback rate sampleRate, something roughly in this direction will get you closer, for now, though.
samples.push_back(amp * std::sin(2 * M_PI * freq * i / sampleRate));
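Putting the answers together, a corrected version of the original function might look like the following sketch (the sampleRate parameter is the piece that was missing; amp is assumed to already be scaled to the short range):

#include <cmath>
#include <vector>

// The phase advances by 2*pi*freq/sampleRate per sample, so freq = 440
// really plays A4 when the buffer is rendered at sampleRate Hz.
std::vector<short> genSineWaveSample(int nsamples, float freq,
                                     float amp, float sampleRate)
{
    const float twoPi = 6.28318530718f;
    std::vector<short> samples;
    samples.reserve(nsamples);
    for (int i = 0; i < nsamples; ++i)
        samples.push_back(static_cast<short>(
            amp * std::sin(twoPi * freq * i / sampleRate)));
    return samples;
}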

Backpropagation 2-Dimensional Neural Network C++

I am learning about two-dimensional neural networks, so I am facing many obstacles, but I believe it is worth it and I am really enjoying the learning process.
Here's my plan: to make a 2-D NN recognize images of digits. Images are 5 by 3 grids, and I prepared 10 images, from zero to nine. For example, this would be number 7:
Number 7 has indexes 0,1,2,5,8,11,14 set to 1 (or, equivalently, 3,4,6,7,9,10,12,13 set to 0), and so on. Therefore, my input layer will be a 5 by 3 neuron layer and I will be feeding it only zeros or ones (nothing in between; which indexes are set depends on which image I am feeding the layer).
My output layer, however, will be a one-dimensional layer of 10 neurons. Depending on which digit is recognized, one neuron should fire a value of one and the rest should stay at zero (not fire).
I am done with implementing everything, but I have a problem in the computation and would really appreciate any help: I am getting an extremely high error and extremely low (negative) output values on all output neurons, and the values (error and output) do not change even on the 10,000th pass.
I would love to go further and post my backpropagation methods, since I believe the problem is in them. However, to break down my work, I would love to hear some comments first; I want to know if my design is workable.
Does my plan make sense?
All the posts speak about ranges (0 to 1, -1 to +1, 0.01 to 0.5, etc.). Will it work with a hard 0-or-1 on the output layer rather than a range? If yes, how can I enforce that?
I am using the hyperbolic tangent as my transfer function. Does it make a difference compared to the sigmoid or other functions?
Any ideas/comments/guidance are appreciated, and thanks in advance.
Well, by the description given above, I think that the design and approach taken are correct! With respect to the choice of the activation function, remember that those functions help to pick out the neurons with the largest activation, and that their algebraic properties, such as an easy derivative, simplify the definition of backpropagation. Taking this into account, you should not worry about your choice of activation function.
The ranges that you mention above correspond to a scaling of the input; it is better to have your input images in the range 0 to 1. This scales the error surface and helps with the speed and convergence of the optimization process. Because your input set is composed of images, and each image is composed of pixels, the minimum and maximum values that a pixel can attain are 0 and 255, respectively. To scale your input in such a case, divide each value by 255.
Now, with respect to the training problems: have you tried checking whether your gradient calculation routine is correct, i.e., by evaluating the cost function J directly? If not, try generating a toy vector theta that contains all the weight matrices involved in your neural network, and evaluate the gradient at each point using the definition of the gradient. Sorry for the Matlab example, but it should be easy to port to C++:
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
    % Set perturbation vector
    perturb(p) = e;
    loss1 = J(theta - perturb);
    loss2 = J(theta + perturb);
    % Compute numerical gradient by central differences
    numgrad(p) = (loss2 - loss1) / (2*e);
    perturb(p) = 0;
end
After evaluating the function, compare the numerical gradient with the gradient calculated by backpropagation. If the difference at each position is less than 3e-9, your implementation should be correct.
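For reference, the same check ports to C++ along these lines (costJ is a placeholder for whatever evaluates your network's cost on a flattened weight vector; treat this as a sketch, not a drop-in):

#include <functional>
#include <vector>

// Numerical gradient by central differences, mirroring the Matlab loop
// above. costJ evaluates the network's cost for a flattened theta.
std::vector<double> numericalGradient(
    const std::function<double(const std::vector<double>&)>& costJ,
    std::vector<double> theta)
{
    const double e = 1e-4;
    std::vector<double> numgrad(theta.size());
    for (size_t p = 0; p < theta.size(); ++p) {
        double saved = theta[p];
        theta[p] = saved - e;
        double loss1 = costJ(theta);   // J(theta - perturb)
        theta[p] = saved + e;
        double loss2 = costJ(theta);   // J(theta + perturb)
        theta[p] = saved;              // restore before the next weight
        numgrad[p] = (loss2 - loss1) / (2 * e);
    }
    return numgrad;
}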
I recommend checking out the UFLDL tutorials offered by the Stanford Artificial Intelligence Laboratory; there you can find a lot of information on neural networks and their paradigms. It's worth taking a look:
http://ufldl.stanford.edu/wiki/index.php/Main_Page
http://ufldl.stanford.edu/tutorial/

Desert fractal OpenGL

We're trying to generate a 3D world using 2D Perlin noise (with a recursive/fractal technique). We have generated mountains and valleys quite well, but now we are having problems with deserts and dunes: we have only worked with persistence and octaves, and we aren't able to produce the classic shape of a dune. Has anybody experienced this already? Any solution, ideally still using Perlin noise, or other algorithms that would allow this?
You could give Musgrave's ridged multifractal a try. It gives nice ridged structures, and you can reuse your existing noise functions for it.
The C reference implementation for it is here.
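In case the reference is hard to track down, the core of the idea is small enough to sketch; this assumes an existing perlin2d(x, y) returning values in roughly [-1, 1], and it simplifies Musgrave's parameters:

#include <cmath>

float perlin2d(float x, float y);  // your existing noise function

// Simplified ridged multifractal after Musgrave: fold the noise around
// zero with 1 - |n|, square it to sharpen the ridges, and let each
// octave be damped by the previous octave's signal.
float ridged(float x, float y, int octaves)
{
    float sum = 0.0f, freq = 1.0f, weight = 1.0f;
    for (int i = 0; i < octaves; ++i) {
        float n = 1.0f - std::fabs(perlin2d(x * freq, y * freq));
        n *= n * weight;                     // sharpen and damp
        weight = std::fmin(1.0f, n * 2.0f);  // feed into the next octave
        sum  += n / freq;                    // 1/f amplitude falloff
        freq *= 2.0f;
    }
    return sum;
}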
A few observations about dunes:
- Dunes are lopsided: .='\ as a cross-section. You may want to use an initial shape of that kind.
- They are regular, like waves in the sea; not complete noise.
- They are elongated in the wind direction.
I didn't use the first property, but I have made great dunes by multiplying two 1-D Perlin noises together, or even two sin/parabola functions, where both are aligned to one axis (say Z) and have a small low-frequency sine or noise wobbling them along the X axis, so they aren't perfectly aligned.
Try this:
dunes = sin( X + 1dperlin(Z) * .2 ) * sin( X + 1dperlin(Z+432) * .2 );
or, to test it without a noise function:
dunes = sin( X + sin(Z) * .2 ) (plus, times, or divided by) sin( X + sin(Z+432) * .2 );
The 0.2 makes the dunes about 10 times longer than they are wide; the effect is like two straight water waves meeting at almost the same angle, plus an uncertainty in the angle supplied by the noise.
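In C++ that heightfield might look like the following sketch, with perlin1d standing in for whatever 1-D noise you already have:

#include <cmath>

float perlin1d(float t);  // your existing 1-D noise, roughly in [-1, 1]

// Two slightly misaligned wave trains multiplied together, as described
// above; the 0.2 keeps the wobble small so the dunes stay elongated
// along the Z axis.
float duneHeight(float x, float z)
{
    float a = std::sin(x + perlin1d(z)          * 0.2f);
    float b = std::sin(x + perlin1d(z + 432.0f) * 0.2f);
    return a * b;
}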
Maybe turbulence alone is enough for what you need. Try playing with turbulence by using the absolute value of your octaves' return values instead of the raw values. You can also evaluate your noise and your turbulence separately and combine them to mix both effects in some areas.

Runtime Sound Generation in C++ on Windows

How might one generate audio at runtime using C++? I'm just looking for a starting point. Someone on a forum suggested I try to make a program play a square wave of a given frequency and amplitude.
I've heard that modern computers encode audio using PCM samples: at a given rate (e.g. 48 kHz), the amplitude of the sound is recorded at a given resolution (e.g. 16 bits). If I generate such a sample, how do I get my speakers to play it? I'm currently using Windows. I'd prefer to avoid any additional libraries if at all possible, but I'd settle for a very light one.
Here is my attempt to generate a square wave sample using this principle:
signed short* Generate_Square_Wave(
    signed short a_amplitude,
    signed short a_frequency,
    signed short a_sample_rate)
{
    signed short* sample = new signed short[a_sample_rate];
    for (signed short c = 0; c == a_sample_rate; c++)
    {
        if (c % a_frequency < a_frequency / 2)
            sample[c] = a_amplitude;
        else
            sample[c] = -a_amplitude;
    }
    return sample;
}
Am I doing this correctly? If so, what do I do with the generated sample to get my speakers to play it?
Your loop condition has to be c < a_sample_rate; as written, c == a_sample_rate means the loop body never runs at all. Note also that a signed short can't even represent a typical sample rate like 44100, so c and the parameters should be a wider type such as int.
To output the sound you call waveOutOpen and other waveOut... functions. They are all listed here:
http://msdn.microsoft.com/en-us/library/windows/desktop/dd743834(v=vs.85).aspx
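A minimal sketch of that call sequence (open, prepare a header, write, close), playing one second of 16-bit mono; error checking is omitted for brevity, and you must link winmm.lib:

#include <windows.h>
#include <mmsystem.h>
#include <vector>
#pragma comment(lib, "winmm.lib")

int main()
{
    const int   sampleRate = 44100;
    const int   frequency  = 440;
    const short amplitude  = 16000;

    // Naive (non-band-limited) square wave samples.
    std::vector<short> samples(sampleRate);
    const int period = sampleRate / frequency;
    for (int i = 0; i < sampleRate; ++i)
        samples[i] = (i % period < period / 2) ? amplitude : (short)-amplitude;

    WAVEFORMATEX wfx = {};
    wfx.wFormatTag      = WAVE_FORMAT_PCM;
    wfx.nChannels       = 1;
    wfx.nSamplesPerSec  = sampleRate;
    wfx.wBitsPerSample  = 16;
    wfx.nBlockAlign     = wfx.nChannels * wfx.wBitsPerSample / 8;
    wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;

    HWAVEOUT hwo;
    waveOutOpen(&hwo, WAVE_MAPPER, &wfx, 0, 0, CALLBACK_NULL);

    WAVEHDR hdr = {};
    hdr.lpData         = reinterpret_cast<LPSTR>(samples.data());
    hdr.dwBufferLength = static_cast<DWORD>(samples.size() * sizeof(short));
    waveOutPrepareHeader(hwo, &hdr, sizeof(hdr));
    waveOutWrite(hwo, &hdr, sizeof(hdr));

    Sleep(1100);  // crude wait for the one-second buffer to finish

    waveOutUnprepareHeader(hwo, &hdr, sizeof(hdr));
    waveOutClose(hwo);
    return 0;
}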
The code you are using generates a wave that is truly square, binary kind of square; in short, the type of waveform that does not exist in real life. In reality, most (pretty sure all) of the sounds you hear are a combination of sine waves at different frequencies.
Because your samples are created the way they are, they will produce aliasing, where a frequency above the Nyquist limit masquerades as a lower frequency, causing audio artefacts. To demonstrate this to yourself, write a little program that sweeps the frequency of your code from 20 Hz to 20,000 Hz. You will hear that the sound does not rise smoothly in pitch as the frequency increases; you will hear artefacts.
Wikipedia has an excellent article on square waves: https://en.m.wikipedia.org/wiki/Square_wave
One way to generate a square wave is to perform an inverse fast Fourier transform, which transforms a series of frequency measurements into a series of time-based samples. Generating a square wave is then a matter of supplying the routine with the amplitudes of the sine waves, at different frequencies, that make up a square wave; the output is a buffer holding a single cycle of the waveform.
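The same additive idea can also be done directly in the time domain. As a sketch: sum the square wave's odd sine harmonics, stopping below the Nyquist limit, which avoids the aliasing described above at the cost of more computation:

#include <cmath>
#include <vector>

// Band-limited square wave: sum sin(2*pi*k*f*t)/k over odd k, stopping
// before k*f crosses the Nyquist frequency (sampleRate / 2).
std::vector<float> bandLimitedSquare(float freq, int sampleRate, int nsamples)
{
    const float twoPi = 6.28318530718f;
    std::vector<float> out(nsamples, 0.0f);
    for (int k = 1; k * freq < sampleRate / 2.0f; k += 2)
        for (int i = 0; i < nsamples; ++i)
            out[i] += std::sin(twoPi * k * freq * i / sampleRate) / k;
    // The Fourier series of a unit square wave carries a 4/pi factor,
    // so scale the sum back toward [-1, 1].
    for (float& s : out) s *= 4.0f / 3.14159265f;
    return out;
}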
Generating audio waves this way is computationally expensive, so what is often done is to precompute arrays of audio samples and play them back at varying speeds to produce different frequencies. This is called wavetable synthesis.
Have a look at the following link:
https://www.earlevel.com/main/2012/05/04/a-wavetable-oscillator%E2%80%94part-1/
And some more about band-limiting a signal and why it's necessary:
https://dsp.stackexchange.com/questions/22652/why-band-limit-a-signal

Converting an FFT to a spectrogram

I have an audio file and I am iterating through the file and taking 512 samples at each step and then passing them through an FFT.
I get the data out as a block 514 floats long (using IPP's ippsFFTFwd_RToCCS_32f_I), with the real and imaginary components interleaved.
My problem is what to do with these complex numbers once I have them. At the moment, for each value, I'm doing:
const float realValue = buffer[(y * 2) + 0];
const float imagValue = buffer[(y * 2) + 1];
const float value = sqrt( (realValue * realValue) + (imagValue * imagValue) );
This gives something slightly usable, but I'd rather have some way of getting the values out in the range 0 to 1. The problem with the above is that the peaks come back at around 9 or more, so things get viciously saturated, and then other parts of the spectrogram barely show up, despite appearing quite strong when I run the audio through Audition's spectrogram. I fully admit I'm not 100% sure what the data returned by the FFT is (other than that it represents the frequency content of the 512-sample block I'm passing in). My understanding is especially lacking on what exactly the complex number represents.
Any advice and help would be much appreciated!
Edit: Just to clarify, my big problem is that the FFT values returned are meaningless without some idea of what the scale is. Can someone point me towards working out that scale?
Edit2: I get really nice looking results by doing the following:
size_t count2 = 0;
size_t max2 = kFFTSize + 2;
while (count2 < max2)
{
    const float realValue = buffer[count2 + 0];
    const float imagValue = buffer[count2 + 1];
    const float value = (log10f(sqrtf((realValue * realValue) + (imagValue * imagValue)) * rcpVerticalZoom) + 1.0f) * 0.5f;
    buffer[count2 >> 1] = value;
    count2 += 2;
}
To my eye this even looks better than most other spectrogram implementations I have looked at.
Is there anything MAJORLY wrong with what I'm doing?
The usual thing to do to get all of an FFT visible is to take the logarithm of the magnitude.
So, the position in the output buffer tells you what frequency was detected. The magnitude (L2 norm) of the complex number tells you how strong the detected frequency was, and the phase (arctangent) gives you information that is a lot more important in image processing than in audio. Because the FFT is discrete, the frequencies run from 0 to the Nyquist frequency. In images, the first term (the DC component) is usually the largest, and so a good candidate for use in normalization, if that is your aim. I don't know whether that also holds for audio (I doubt it).
For each window of 512 samples, you compute the magnitude of the FFT as you did. Each value represents the magnitude of the corresponding frequency present in the signal:
mag
 ^
 |  !               !
 |  !   !       !   !
 |  !   !   !   !   !
 +--!---!---!---!---!---> freq
 0          Fs/2        Fs
Now we need to figure out the frequencies.
Since the input signal is real-valued, the FFT is symmetric around the middle (the Nyquist component), with the first term being the DC component. Knowing the signal's sampling frequency Fs, the Nyquist frequency is Fs/2, and the frequency corresponding to index k is k*Fs/512.
So for each window of length 512 we get the magnitudes at specific frequencies. Those magnitudes, taken over consecutive windows, form the spectrogram.
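As a sketch of that bookkeeping, here is how one interleaved re/im buffer (the layout described in the question, N = 512) becomes a single spectrogram column, with bin k sitting at k*Fs/N Hz:

#include <cmath>
#include <vector>

// One spectrogram column from an interleaved re/im buffer holding
// N/2 + 1 complex bins. Bin k holds the magnitude of frequency k*fs/N.
std::vector<float> spectrogramColumn(const float* buffer, int N)
{
    std::vector<float> mags(N / 2 + 1);
    for (int k = 0; k <= N / 2; ++k) {
        float re = buffer[2 * k];
        float im = buffer[2 * k + 1];
        mags[k] = std::sqrt(re * re + im * im);
    }
    return mags;
}

// Frequency in Hz of bin k for a window of N samples at rate fs.
inline float binFrequency(int k, int N, float fs) { return k * fs / N; }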
Just so people know, I've done a LOT of work on this whole problem. The main thing I've discovered is that the FFT requires normalisation after you perform it.
To do this, you average all the values of your window vector together to get a value somewhat less than 1 (or 1 if you are using a rectangular window). You then divide that number by the number of frequency bins you have after the FFT.
Finally, you divide the actual number returned by the FFT by that normalisation number. Your amplitude values should now be in the -Inf to 1 range. Log them, etc., as you please; you will still be working with a known range.
There are a few things that I think you will find helpful.
The forward FT tends to give larger numbers in the output than in the input. You can think of it as all of the intensity at a certain frequency being collected at one place rather than distributed through the dataset. Does this matter? Probably not, because you can always scale the data to fit your needs. I once wrote an integer-based FFT/IFFT pair, and each pass required rescaling to prevent integer overflow.
The real data that form your input are converted into something that is almost complex. As it turns out, buffer[0] and buffer[n/2] are purely real and independent. There is a good discussion of this here.
The input data are sound intensity values taken over time, equally spaced. They are said to be, appropriately enough, in the time domain. The output of the FT is said to be in the frequency domain because the horizontal axis is frequency. The vertical scale remains intensity. Although it isn't obvious from the input data, there is phase information in the input as well: although all of the sound is sinusoidal, there is nothing that fixes the phases of the sine waves. This phase information appears in the frequency domain as the phases of the individual complex numbers, but often we don't care about it (and often we do, too!). It just depends upon what you are doing. The calculation
const float value = sqrt((realValue * realValue) + (imagValue * imagValue));
retrieves the intensity information but discards the phase information. Taking the logarithm essentially just damps down the big peaks.
Hope this is helpful.
If you are getting strange results, one thing to check is the documentation for the FFT library to see how the output is packed. Some routines use a packed format where the real and imaginary values are interleaved; others may begin at the N/2 element and wrap around.
For a sanity check, I would suggest creating sample data with known characteristics, e.g. Fs/2 and Fs/4 (Fs = sample frequency), and comparing the output of the FFT routine with what you'd expect. Try creating both a sine and a cosine at the same frequency: these should have the same magnitude in the spectrum but different phases (i.e. the realValue/imagValue will differ, but the sum of squares should be the same).
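A sketch of that sanity check, with runFFT standing in as a placeholder for your IPP call: fill a buffer with a sine or a cosine at Fs/4 and confirm that the sum of squares at bin N/4 matches for both:

#include <cmath>
#include <vector>

// Placeholder for your actual in-place FFT (e.g. ippsFFTFwd_RToCCS_32f_I).
void runFFT(std::vector<float>& buffer);

// Generate N samples of a sinusoid at fs/4 and report the squared
// magnitude of the bin where all the energy should land (k = N/4).
float energyAtQuarterFs(int N, bool useCosine)
{
    std::vector<float> buf(N + 2, 0.0f);  // room for N/2 + 1 complex bins
    for (int i = 0; i < N; ++i) {
        float phase = 2.0f * 3.14159265f * 0.25f * i;  // f = fs/4
        buf[i] = useCosine ? std::cos(phase) : std::sin(phase);
    }
    runFFT(buf);
    int k = N / 4;
    float re = buf[2 * k], im = buf[2 * k + 1];
    return re * re + im * im;  // should match for sine and cosine
}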
If you're intending to use the FFT, though, you really need to know how it works mathematically; otherwise you're likely to run into other strange problems, such as aliasing.