In working on a project I came across the need to generate various waves, accurately. I thought that a simple sine wave would be the easiest to begin with, but it appears that I am mistaken. I made a simple program that generates a vector of samples and then plays those samples back so that the user hears the wave, as a test. Here is the relevant code:
vector<short> genSineWaveSample(int nsamples, float freq, float amp) {
vector<short> samples;
for(float i = 0; i <= nsamples; i++) {
samples.push_back(amp * sinx15(freq*i));
}
return samples;
}
I'm not sure what the issue with this is. I understand that there could be some issue with the vector being made of shorts, but that's what my audio framework wants, and I am inexperienced with that kind of library and so do not know what to expect.
The symptoms are as follows:
Frequency not correct: given freq=440, A4 is not the note played back.
Strange distortion: most frequencies do not generate a clean wave. 220, 440 and 880 are all clean; most others are distorted.
Most frequencies are shifted upwards considerably.
Can anyone give advice as to what I may be doing wrong?
Here's what I've tried so far:
Making my own sine function, for greater accuracy, using a 15th-degree Taylor series expansion of sin(x).
Changing the sample rate, anywhere from 256 to 44100: no change can be heard in the errors above; the waves are simply more distorted.
Thank you. If there is any information that can help you, I'd be obliged to provide it.
I suspect that you are passing incorrect values to your sinx15 function. If you are familiar with the basics of signal processing: the Nyquist rate is the minimum sampling rate at which you can faithfully reconstruct (or in your case, construct) a sampled signal. It is defined as twice the highest frequency component present in the signal.
What this means for your program is that you need at least 2 values per cycle of the highest frequency you want to reproduce. At 20 kHz you'd need 40,000 samples per second. It looks like you are just packing a vector with values and letting the playback program sort out the timing.
We will assume you use 44.1 kHz as your playback sampling frequency. This means that a snippet of code producing one second of a 1 kHz wave would look like this:
#include <cmath>
#include <vector>
const double pi = 3.14159265358979323846;
const double frequency = 1000.0;  // Hz
std::vector<double> wave(44100);  // one second of samples at 44.1 kHz
for (int i = 0; i < 44100; i++)
{
    wave[i] = std::sin(2 * pi * i * (frequency / 44100.0) + pi / 2);  // sin takes radians; frequency in Hz
}
You need to divide by the sample rate (and scale by 2π), not just multiply by the frequency. To see this, take the case where a frequency of 22,050 Hz is passed. For i = 0, you get sin(π/2) = 1. For i = 1, sin(3π/2) = -1, and so on. This gives you a repeating sequence of 1, -1, 1, -1..., which is the correct representation of a 22,050 Hz wave sampled at 44.1 kHz. This works as you go down in frequency, but you get more and more samples per cycle. Interestingly, though, this does not make a difference in theory: a sine wave sampled at 2 samples per cycle can be recreated just as accurately as one sampled 1000 times per cycle. This doesn't take noise into account, but for most purposes it works well enough.
I would suggest looking into the basics of digital signal processing, as it is a very interesting field and very useful to understand.
Edit: This assumes all of those parameters are evaluated as floating point numbers.
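Putting those points together, here is a minimal C++ sketch of the question's genSineWaveSample with the sample rate made explicit. The sampleRate parameter is an assumption (it must match the rate the audio framework actually plays the buffer at), std::sin replaces the custom sinx15, and the loop uses an integer index with < so it produces exactly nsamples values.

```cpp
#include <cmath>
#include <vector>

// Generates nsamples of amp * sin(2*pi*freq*t), where t = i / sampleRate seconds.
// sampleRate is an assumed extra parameter: if it differs from the playback
// rate, the pitch will be wrong. amp should not exceed 32767 for 16-bit output.
std::vector<short> genSineWaveSample(int nsamples, double freq, double amp,
                                     double sampleRate) {
    const double twoPi = 6.28318530717958647692;
    std::vector<short> samples;
    samples.reserve(nsamples > 0 ? nsamples : 0);
    for (int i = 0; i < nsamples; ++i) {                   // integer index, no off-by-one
        double phase = twoPi * freq * i / sampleRate;      // phase in radians
        samples.push_back(static_cast<short>(std::lround(amp * std::sin(phase))));
    }
    return samples;
}
```

For example, genSineWaveSample(44100, 440.0, 8000.0, 44100.0) would give one second of A4 at 44.1 kHz.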
Fundamentally, you're missing a piece of information. You don't specify the amount of time over which you want your samples taken. This could also be thought of as the rate at which the samples will be played by your system. Something roughly in this direction will get you closer, for now, though.
samples.push_back(amp * std::sin(2 * M_PI * freq * i / sampleRate)); // sampleRate: playback rate in samples per second
I'm trying to figure out the amplitude of each frequency of sound captured by the microphone, just like this example: https://developer.apple.com/documentation/accelerate/visualizing_sound_as_an_audio_spectrogram
I capture samples from the microphone into a sample buffer, copy them to a circular buffer, and then perform a forward DCT on them, like this:
func processData(values: [Int16]) {
vDSP.convertElements(of: values,
to: &timeDomainBuffer)
vDSP.multiply(timeDomainBuffer,
hanningWindow,
result: &timeDomainBuffer)
forwardDCT.transform(timeDomainBuffer,
result: &frequencyDomainBuffer)
vDSP.absolute(frequencyDomainBuffer,
result: &frequencyDomainBuffer)
vDSP.convert(amplitude: frequencyDomainBuffer,
toDecibels: &frequencyDomainBuffer,
zeroReference: Float(Microphone.sampleCount))
if frequencyDomainValues.count > Microphone.sampleCount {
frequencyDomainValues.removeFirst(Microphone.sampleCount)
}
frequencyDomainValues.append(contentsOf: frequencyDomainBuffer)
}
timeDomainBuffer is a Float array containing sampleCount samples, while frequencyDomainBuffer holds the amplitude of each frequency: the frequency is given by the array index, and the value is the amplitude at that frequency.
I'm trying to get the amplitude of each frequency, like this:
for index in frequencyDomainBuffer.indices {
    let frequency = Double(index) * (AVAudioSession().sampleRate / Double(Microphone.sampleCount) / 2)
}
I assumed the index into frequencyDomainBuffer is linear in the actual frequency, so multiplying it by sampleRate / (2 × sampleCount) should give the frequency (sampleCount is the timeDomainBuffer length).
The result is correct when running on my iPad, but the frequencies come out about 10% higher on my iPhone.
Can AVAudioSession().sampleRate be relied on for iPhone?
Of course I can add a condition like if iPhone, but I'd like to know why and will it be correct on other devices I haven't tested on?
If you're seeing a consistent 10% difference, I'm betting it's actually an 8.8% difference. I haven't studied your code, but I'd look for a hard-coded 44.1 kHz somewhere. The sample rate on iPhones is generally 48 kHz.
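A worked version of that arithmetic, assuming the DCT bin mapping from the question (bin k at k·fs/(2N), with N = sampleCount). The computed frequency is proportional to whatever rate is plugged in, so using one rate while the audio is captured at the other scales every result by their ratio:

$$
f_k = k\,\frac{f_s}{2N}, \qquad \frac{48000}{44100} \approx 1.0884, \qquad \frac{44100}{48000} = 0.91875
$$

So plugging in 48 kHz when the audio is really at 44.1 kHz reads about 8.8% high, and the reverse mismatch reads about 8.1% low.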
Remember also that the bins are (as you suspect) proportional to the sampling rate, so at different sampling rates the centers of the bins are different. Depending on the number of bins you're using, this could represent a large difference (not really an "error", since the bins are ranges, but if you assume each value is precisely at the center frequency, this could account for your 10%).
I'm having trouble making an audio visualizer look accurate. The bins that have a significant amount of sound tend to draw correctly, but the problem I'm having is that all the frequencies with no significant sound seem to be coming back with a value that usually bounces between -60 dB and -40 dB. This forms a flat bouncing line (usually in the higher frequencies).
I want to display 512 bins or less at 30 frames per second. I've been reading up on FFT and audio non stop for a couple weeks now, and so far my process has been:
Load PCM data from a WAV file. This comes in as 44100 samples per second, with a range of ±32767. I'm assuming I treat these as real numbers when passing them to the FFT.
Divide these samples up into 1470 per frame (44100 / 30). 446 samples of each frame are ignored, since only 1024 go to the FFT.
Take 1024 samples and apply a Hann window.
Pass the samples to FFT as an array of real[1024] as well as another array of the same size filled with zeros for the imaginary part.
Get the magnitude by looping through the (samples/2) bins and do a sqrt(real[i]*real[i] + img[i]*img[i]).
Take 20 * log(magnitude) to get the decibel level of each bin.
Draw a rectangle for each bin. Draw these bins for each frame.
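The code further down calls an applyHannWindow helper that isn't shown. One plausible C++ version is sketched here, assuming the symmetric Hann definition w[i] = 0.5·(1 − cos(2πi/(N−1))); the periodic form (dividing by N instead of N−1) is also common for spectral analysis.

```cpp
#include <cmath>
#include <cstddef>

// Multiplies data[0..n-1] in place by a symmetric Hann window:
// w[i] = 0.5 * (1 - cos(2*pi*i / (n - 1))), which is 0 at both ends
// and 1 in the middle. Hypothetical stand-in for the question's helper.
void applyHannWindow(double* data, std::size_t n) {
    if (n < 2) return;  // the formula needs at least 2 points
    const double twoPi = 6.28318530717958647692;
    for (std::size_t i = 0; i < n; ++i) {
        double w = 0.5 * (1.0 - std::cos(twoPi * static_cast<double>(i) /
                                         static_cast<double>(n - 1)));
        data[i] *= w;
    }
}
```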
I've tested it with a couple of songs, and a WAV file I generated that just plays a 440 Hz tone. With that file, I do get a spike at the 440 Hz bin, but all the other bins form a line that isn't much shorter than the 440 Hz bin. Also, every other frame, the bins apart from 440 Hz look like a graphed log function with a dip at some other bin.
I'm writing this in C++, using STK to load only the left channel from the audio file:
//put every sample in the song into a temporary vector
for (int i = 0; i < stkObject->getSize(); i++)
{
standardVector.push_back(stkObject->tick(LEFT));
}
I'm using FFTReal to perform the FFT:
std::vector<std::vector <double> > leftChannelData;
int numberOfFrames = stkObject->getSize()/samplesPerFrame;
leftChannelData.resize(numberOfFrames);
for(int i = 0; i < numberOfFrames; i++)
{
for(int j = 0; j < FFT_SAMPLE_LENGTH; j++)
{
real[j] = standardVector[j + (i*samplesPerFrame)];
}
applyHannWindow(real, FFT_SAMPLE_LENGTH);
fft_object.do_fft(imaginary,real);
//FFTReal instructions say to run this after an fft
fft_object.rescale(real);
leftChannelData[i].resize(FFT_SAMPLE_LENGTH/2);
for (int j = 0; j < FFT_SAMPLE_LENGTH/2; j++)
{
double magnitude = sqrt(real[j]*real[j] + imaginary[j]*imaginary[j]);
double dbValue = 20 * log(magnitude/maxMagnitude);
leftChannelData[i].at(j) = dbValue;
}
}
I'm at a loss as to what's causing this. I've tried various ways to pull those 446 samples I'm ignoring, but the results don't seem to change. I think I may be doing something fundamentally wrong. I've tried normalizing the pcm data before handing it to the fft and I've tried normalizing the magnitude before finding the decibels, but it doesn't seem to be working. Any thoughts?
EDIT: I don't see any difference between log(magnitude) and log(magnitude/maxMagnitude). All it seems to do is shift all of the bins' values evenly downwards.
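That shift is expected: dividing by a constant inside the logarithm subtracts a constant outside it, so with M = maxMagnitude fixed, every bin moves down by the same amount:

$$
20\log_{10}\frac{m}{M} = 20\log_{10} m - 20\log_{10} M
$$

The identity holds in any base, but decibels are defined with log10, whereas the unqualified log called in the C++ code above is the natural logarithm; 20 * log(x) is therefore about 2.3 times larger in magnitude than the dB value 20 * log10(x).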
EDIT2: Screenshots for comparison: a song playing low sounds with log(mag), and the same song with log(mag/maxMag).
Again, log(mag) and log(mag/maxMag) generally look the same, except that the latter's values are all negative. As MSalters said, the decibel value can approach minus infinity, so I can clamp those values to -100 dB, then take log(mag/maxMag) and add 100. That way the rectangles' heights range from 0 to 100 instead of -100 to 0.
Is this what I should do? I've tried this, but it still looks wrong. Maybe it's just a scaling issue? When I do this, a lot of the bars don't make it above the line when it sounds like they should. And if they do make it above 0, they do so just barely.
You have to understand that you're not taking the Fourier transform of an infinite signal, but the FT of a windowed version of it. And your window isn't even a plain Hann window: discarding 446 points is effectively a rectangular window function. The FTs of both window functions will show up in your output.
Secondly, the dB scale is logarithmic. That means it can indeed go quite low in the absence of a signal. You mention -60 dB, but it could in fact reach minus infinity. The only thing that would save you from that is the window function, which will introduce smear at about -110 dB.
The noise (stop-band ripple) produced by a quantized von Hann window of length 1024 could well be around -40 to -60 dB. So one strategy is to just set a threshold, and ignore (don't plot) all values below that threshold.
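A C++ sketch of that threshold idea combined with the clamp-and-shift from the question's EDIT2. The floor and threshold values are assumptions to tune by eye, the function name is hypothetical, and it uses log10 because that is what the decibel definition calls for.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Converts FFT magnitudes to bar heights for display.
// floorDb (e.g. -100) is an assumed display floor: quieter values are clamped
// to it, so drawn heights run from 0 (at floorDb) up to -floorDb (at 0 dB).
// thresholdDb (e.g. -60) is an assumed cutoff: quieter bins get height 0.
// Assumes maxMagnitude > 0.
std::vector<double> magnitudesToBarHeights(const std::vector<double>& mags,
                                           double maxMagnitude,
                                           double floorDb,
                                           double thresholdDb) {
    std::vector<double> heights;
    heights.reserve(mags.size());
    for (double m : mags) {
        // log10, not log. A zero magnitude gives -inf, which the clamp handles.
        double db = 20.0 * std::log10(m / maxMagnitude);
        db = std::max(db, floorDb);           // clamp to the display floor
        if (db < thresholdDb) {
            heights.push_back(0.0);           // below threshold: don't draw
        } else {
            heights.push_back(db - floorDb);  // shift so floorDb maps to 0
        }
    }
    return heights;
}
```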
Also, try removing the rescale(real) call, as that could distort your complex vector before you take the log magnitude.
Also, make sure you are actually loading the audio samples into your real vector correctly (sign, number of bits and endianness).
I am currently taking a class in school, and I have to code an FIR/IIR filter in C/C++.
A 2 kHz sine wave with added white noise is used as the input to the filter. By feeding that noisy sine wave to my C/C++ code, I need to observe a clean sine wave at the output. It's all done at the software level.
My problem is that I don't know how to deal with the input and output of the sine wave. For example, I don't know what file format I can or need to use, I don't know how to generate the sine waveform, etc.
This might be a very trivial question, but I have no clue where to begin.
Does anyone have any experience in this type of question or have any tips?
Any help would be really appreciated.
Generating the sine wave at 2 kHz means that you want to generate values over time that, when graphed, follow a sine wave. Pick an amplitude (you didn't mention one), and pick your sample rate. See the graph here (http://en.wikipedia.org/wiki/Sine_wave); you want values that, when plotted, follow the sine wave graphed in 2D with the X axis being time and the Y axis being the amplitude of the value you are measuring. You need to choose:
amplitude (volts, degrees, pascals, milliamps, etc)
frequency (2kHz, that is 2000 sine waves/second)
sample rate (how many samples do you want per second)
Suppose you generate a file that has a time value and an amplitude measurement, which you would want to scale to your amplitude (more on this later). A device might give an 8-bit or 16-bit digital reading, which represents either an absolute or a logarithmic measurement against some scale.
struct sample
{
long usec; //microseconds (1/1,000,000 second)
short value; //many devices give a value between 0 and 255
};
Suppose you generate exactly 2000 samples/second. If you were actually measuring an external value, you would get the same value every time (see that?), which when graphed would look like a straight line.
So you want a sample rate higher than the frequency. Suppose you sample at 2x the frequency. Then you would see points 180 degrees apart on the sine wave, which might be peaks, up or down slopes, or points where the sine wave crosses zero. A sample rate 4x the frequency would show a triangle-like pattern when the points are joined. And as you increase the number of samples, your graph looks closer to the actual sine wave. This is similar to the pixelation you see in 8-bit game sprites.
How many samples for any given sine wave would you think would give you a good approximation of a sine wave? 8? 16? 100? 500? Suppose you sampled 1,000,000 times per second, then you would have 1,000,000/2,000 = 500 samples per sine wave.
pick your sample rate (e.g. 500 samples per cycle, which is 1,000,000 samples/second at 2 kHz)
define your frequency (2000 Hz)
decide how long to record your samples (5 seconds?)
define your amplitude (the device measures 0-255, but what is the measured max?)
Here is code to generate some samples:
#include <cmath>
#include <cstdlib>   // drand48() (POSIX)
#define MAXJITTER (10)   // max timing jitter, usec
#define MAXNOISE (20)    // max measurement noise, device units
const double PI = 3.14159265358979323846;
int
generate_samples( long duration,   //duration in microseconds
                  int amplitude,   //scaled peak measurement from device
                  int frequency,   //Hz > 0
                  int samplerate ) //how many samples/second > 0
{
    long sdelay;                      //sample delay in usec
    if(frequency<1) frequency=1;      //keep frequency positive
    if(samplerate<1) samplerate=1;    //avoid division by zero
    sdelay = 1000000/samplerate;      //usec delay between each sample
    if(sdelay<1) sdelay=1;            //avoid an endless loop above 1 MHz
    sample m;
    int jitter, noise;                //introduce noise here
    int count = 0;
    for( long ts=0; ts<duration; ts+=sdelay ) //in usec (microseconds)
    {
        //jitter, sample not taken exactly every sdelay
        jitter = drand48()*MAXJITTER - (MAXJITTER/2); // +/-1/2 MAXJITTER
        //noise is mismeasurement
        noise = drand48()*MAXNOISE - (MAXNOISE/2);    // +/-1/2 MAXNOISE
        m.usec = ts + jitter;
        //2PI radians per cycle: phase = 2*PI * frequency * (time in seconds)
        double phase = 2*PI * frequency * (ts / 1000000.0);
        m.value = (short)(amplitude * std::sin( phase ) + noise);
        //write m to file or save m to array/vector
        count++;
    }
    return count; //number of samples generated
}
First, generate some samples:
generate_samples( 5*1000000, 100, 2000, 2000*50 );
You could graph the samples generated as a view of the noisy signal.
The above should answer many of your questions about how to record measurements and what format is typically used. It also shows how to step through multiple periods of a sine wave, generate samples with jitter and noise, and record samples over some time duration.
Building your filter is a second issue. Writing the code to emulate the filter(s) described below is left as an exercise, or for a second question as you gain more understanding:
http://en.wikipedia.org/wiki/Finite_impulse_response
http://en.wikipedia.org/wiki/Infinite_impulse_response
The generated sample of the signal (above) would be fed into the code you write to build the filter. Expect that the output of the filter would be a new set of samples, perhaps with jitter, but expect that your filter would eliminate at least some of the noise. You would then be able to graph the samples produced by the filter.
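As a starting point only, here is a minimal direct-form FIR sketch in C++: each output is a weighted sum of the current and previous inputs, y[n] = Σ h[k]·x[n−k]. The coefficients h are an assumption left to you; the two-tap average in the check below is chosen because it is easy to verify by hand, not because it is a suitable filter for a 2 kHz tone.

```cpp
#include <cstddef>
#include <vector>

// Direct-form FIR filter: y[n] = sum over k of h[k] * x[n - k].
// Input samples before the start of x are treated as zero.
std::vector<double> firFilter(const std::vector<double>& x,
                              const std::vector<double>& h) {
    std::vector<double> y(x.size(), 0.0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        double acc = 0.0;
        for (std::size_t k = 0; k < h.size() && k <= n; ++k) {
            acc += h[k] * x[n - k];   // weight the k-th most recent input
        }
        y[n] = acc;
    }
    return y;
}
```

An IIR filter differs in that its sum also includes previous outputs y[n−k], which is why it can have an infinitely long impulse response.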
You might consider converting the samples into a comma-delimited file, which would let you load them into Excel and graph them. It might also help if you described your electronics background, your trig knowledge, how much you know about filters, etc.
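A small C++ sketch of that comma-delimited output. The Sample struct here is a hypothetical mirror of the answer's struct sample (timestamp in microseconds plus a value), so the snippet stands alone.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record mirroring the answer's struct sample.
struct Sample {
    long usec;    // timestamp in microseconds
    short value;  // measured value
};

// Builds CSV text with a header row and one "usec,value" line per sample,
// ready to be written to a .csv file and opened in Excel.
std::string toCsv(const std::vector<Sample>& samples) {
    std::ostringstream out;
    out << "usec,value\n";
    for (const Sample& s : samples) {
        out << s.usec << ',' << s.value << '\n';
    }
    return out.str();
}
```

Writing the result to disk is then a matter of streaming the string into a std::ofstream.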
Good luck!
How might one generate audio at runtime using C++? I'm just looking for a starting point. Someone on a forum suggested I try to make a program play a square wave of a given frequency and amplitude.
I've heard that modern computers encode audio using PCM samples: at a given rate (e.g. 48 kHz), the amplitude of a sound is recorded at a given resolution (e.g. 16 bits). If I generate such a sample, how do I get my speakers to play it? I'm currently using Windows. I'd prefer to avoid any additional libraries if at all possible, but I'd settle for a very light one.
Here is my attempt to generate a square wave sample using this principle:
signed short* Generate_Square_Wave(
signed short a_amplitude ,
signed short a_frequency ,
signed short a_sample_rate )
{
signed short* sample = new signed short[a_sample_rate];
for( signed short c = 0; c == a_sample_rate; c++ )
{
if( c % a_frequency < a_frequency / 2 )
sample[c] = a_amplitude;
else
sample[c] = -a_amplitude;
}
return sample;
}
Am I doing this correctly? If so, what do I do with the generated sample to get my speakers to play it?
Your loop condition has to be c < a_sample_rate: with c == a_sample_rate the loop body never runs (c starts at 0), and c <= a_sample_rate would overrun the buffer.
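A corrected C++ sketch of the question's generator, assuming the goal is one second of samples. Besides the loop condition, it changes a few other things: the sample rate is an int because 44100 does not fit in a signed short (maximum 32767), the period is measured in samples as sample rate / frequency instead of using the frequency itself in the modulo, and std::vector replaces the raw new[] so nothing needs delete[]. It is still a naive square wave, so the aliasing described in the next answer applies.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One second of a naive (not band-limited) square wave.
// The period is measured in samples (sampleRate / frequency); the first
// half of each period is +amplitude and the second half -amplitude.
std::vector<short> generateSquareWave(short amplitude, int frequency, int sampleRate) {
    if (sampleRate <= 0) return {};                 // nothing to generate
    std::vector<short> samples(static_cast<std::size_t>(sampleRate), 0);
    if (frequency <= 0) return samples;             // silence for invalid input
    const double period = static_cast<double>(sampleRate) / frequency;
    for (int c = 0; c < sampleRate; ++c) {          // c < sampleRate stays in bounds
        double posInPeriod = std::fmod(static_cast<double>(c), period);
        samples[static_cast<std::size_t>(c)] =
            (posInPeriod < period / 2) ? amplitude : static_cast<short>(-amplitude);
    }
    return samples;
}
```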
To output the sound you call waveOutOpen and other waveOut... functions. They are all listed here:
http://msdn.microsoft.com/en-us/library/windows/desktop/dd743834(v=vs.85).aspx
The code you are using generates a wave that is truly square, binary square: the type of waveform that does not exist in real life. In reality most (pretty sure all) of the sounds you hear are combinations of sine waves at different frequencies.
Because of the way your samples are created, they will produce aliasing, where a higher frequency masquerades as a lower one, causing audio artefacts. To demonstrate this to yourself, write a little program which sweeps the frequency of your code from 20 to 20,000 Hz. You will hear that the sound does not rise smoothly in pitch; you will hear artefacts.
Wikipedia has an excellent article on square waves: https://en.m.wikipedia.org/wiki/Square_wave
One way to generate a square wave is to perform an inverse Fast Fourier Transform, which transforms a series of frequency measurements into a series of time-based samples. Generating a square wave is then a matter of supplying the routine with the measurements of the sine waves at different frequencies that make up a square wave; the output is a buffer with a single cycle of the waveform.
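A related technique, sketched here in C++ instead of an inverse FFT, is additive synthesis: sum the square wave's Fourier series, (4/π)·Σ sin(k·θ)/k over odd k, directly, and stop before any harmonic reaches the Nyquist frequency so nothing aliases. The function name and the choice of float output in roughly [-1, 1] are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One cycle (tableSize samples) of a band-limited square wave, built by
// additive synthesis: (4/pi) * sum over odd k of sin(k * theta) / k.
// maxHarmonic caps the series; choose the highest odd k for which
// k * fundamental stays below sampleRate / 2 so no harmonic aliases.
std::vector<float> bandLimitedSquareCycle(int tableSize, int maxHarmonic) {
    const double pi = 3.14159265358979323846;
    if (tableSize <= 0) return {};
    std::vector<float> table(static_cast<std::size_t>(tableSize), 0.0f);
    for (int n = 0; n < tableSize; ++n) {
        double theta = 2.0 * pi * n / tableSize;       // phase within the cycle
        double sum = 0.0;
        for (int k = 1; k <= maxHarmonic; k += 2) {    // odd harmonics only
            sum += std::sin(k * theta) / k;
        }
        table[static_cast<std::size_t>(n)] = static_cast<float>(4.0 / pi * sum);
    }
    return table;
}
```

The result ripples slightly near each edge (the Gibbs phenomenon) instead of jumping instantly, which is the price of keeping it band-limited.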
Generating audio waves is computationally expensive, so what is often done is to generate arrays of audio samples and play them back at varying speeds to produce different frequencies. This is called wavetable synthesis.
Have a look at the following link:
https://www.earlevel.com/main/2012/05/04/a-wavetable-oscillator%E2%80%94part-1/
And some more about band limiting a signal and why it’s necessary:
https://dsp.stackexchange.com/questions/22652/why-band-limit-a-signal
I have an audio file and I am iterating through the file and taking 512 samples at each step and then passing them through an FFT.
I have the data out as a block 514 floats long (Using IPP's ippsFFTFwd_RToCCS_32f_I) with real and imaginary components interleaved.
My problem is: what do I do with these complex numbers once I have them? At the moment I'm doing this for each value:
const float realValue = buffer[(y * 2) + 0];
const float imagValue = buffer[(y * 2) + 1];
const float value = sqrt( (realValue * realValue) + (imagValue * imagValue) );
This gives something slightly usable, but I'd rather get the values out in the range 0 to 1. The problem with the above is that the peaks come back as around 9 or more. This means things get viciously saturated, and other parts of the spectrogram barely show up even though they appear quite strong when I run the audio through Audition's spectrogram. I fully admit I'm not 100% sure what the data returned by the FFT is (other than that it represents the frequency values of the 512-sample block I'm passing in). In particular, my understanding is lacking on what exactly the complex number represents.
Any advice and help would be much appreciated!
Edit: Just to clarify. My big problem is that the FFT values returned are meaningless without some idea of what the scale is. Can someone point me towards working out that scale?
Edit2: I get really nice looking results by doing the following:
size_t count2 = 0;
size_t max2 = kFFTSize + 2;
while( count2 < max2 )
{
const float realValue = buffer[(count2) + 0];
const float imagValue = buffer[(count2) + 1];
const float value = (log10f( sqrtf( (realValue * realValue) + (imagValue * imagValue) ) * rcpVerticalZoom ) + 1.0f) * 0.5f;
buffer[count2 >> 1] = value;
count2 += 2;
}
To my eye this even looks better than most other spectrogram implementations I have looked at.
Is there anything MAJORLY wrong with what I'm doing?
The usual thing to do to get all of an FFT visible is to take the logarithm of the magnitude.
So, the position in the output buffer tells you what frequency was detected. The magnitude (L2 norm) of the complex number tells you how strong that frequency was, and the phase (arctangent) gives you information that is a lot more important in image space than in audio space. For a real input, the distinct frequencies run from 0 up to the Nyquist frequency. In images the first term (DC) is usually the largest, and so a good candidate for use in normalization if that is your aim. I don't know if that is also true for audio (I doubt it).
For each window of 512 samples, you compute the magnitude of the FFT as you did. Each value represents the magnitude of the corresponding frequency present in the signal.
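A small C++ sketch of extracting both quantities from an interleaved (re, im, re, im, ...) buffer, the layout the question describes, using std::complex. The function name is hypothetical.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Splits interleaved FFT output (2 floats per bin) into per-bin
// magnitude (|z|, the L2 norm) and phase (arg z = atan2(im, re), radians).
void magnitudeAndPhase(const std::vector<float>& interleaved,
                       std::vector<float>& magnitude,
                       std::vector<float>& phase) {
    const std::size_t bins = interleaved.size() / 2;
    magnitude.resize(bins);
    phase.resize(bins);
    for (std::size_t k = 0; k < bins; ++k) {
        std::complex<float> z(interleaved[2 * k], interleaved[2 * k + 1]);
        magnitude[k] = std::abs(z);  // sqrt(re*re + im*im)
        phase[k] = std::arg(z);      // angle of the complex value
    }
}
```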
mag
/\
|
| ! !
| ! ! !
+--!---!----!----!---!--> freq
0 Fs/2 Fs
Now we need to figure out the frequencies.
Since the input signal is real-valued, the FFT magnitude is symmetric around the middle (the Nyquist component), with the first term being the DC component. Knowing the sampling frequency Fs, the Nyquist frequency is Fs/2, and therefore for index k the corresponding frequency is k*Fs/512.
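A C++ sketch of that index-to-frequency mapping for a general FFT size N, plus the inverse for finding which bin a given frequency falls in. Both function names are hypothetical, and rounding to the nearest bin is a choice, since each bin covers a range.

```cpp
#include <cmath>
#include <cstddef>

// Centre frequency (Hz) of bin k for an N-point FFT at sample rate fs: k * fs / N.
// For real input, bins 0..N/2 run from DC up to Nyquist (fs / 2).
double binFrequency(std::size_t k, std::size_t fftSize, double sampleRate) {
    return static_cast<double>(k) * sampleRate / static_cast<double>(fftSize);
}

// Index of the bin whose centre is closest to freq.
std::size_t nearestBin(double freq, std::size_t fftSize, double sampleRate) {
    return static_cast<std::size_t>(
        std::lround(freq * static_cast<double>(fftSize) / sampleRate));
}
```

With N = 512 at 44.1 kHz, the bins are about 86.13 Hz apart, so a 440 Hz tone lands closest to bin 5.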
So for each window of length 512, we get the magnitudes at specified frequency. The group of those over consecutive windows form the spectrogram.
Just so people know I've done a LOT of work on this whole problem. The main thing I've discovered is that the FFT requires normalisation after doing it.
To do this you average all the values of your window vector together to get a value somewhat less than 1 (or exactly 1 if you are using a rectangular window). You then multiply that number by the number of frequency bins you have after the FFT (N/2 for an N-point real FFT).
Finally, you divide the magnitudes returned by the FFT by this normalisation number. Your amplitude values should now be in the 0 to 1 range (-Inf to 0 dB once you take 20·log10). Log, etc., as you please; you will still be working with a known range.
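A C++ sketch of that normalisation, assuming a one-sided magnitude spectrum and a sine that falls exactly on a bin centre: the factor is sum(window) / 2, i.e. the window's mean multiplied by N/2 bins. The naive single-bin DFT is included only so the factor can be checked without an FFT library; both function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// For a sine of amplitude A centred on bin k, |X[k]| is about
// A * sum(window) / 2 = A * mean(window) * (N / 2), so dividing by this
// factor maps a full-scale (A = 1) sine to about 1.0.
double normalisationFactor(const std::vector<double>& window) {
    double sum = std::accumulate(window.begin(), window.end(), 0.0);
    return sum / 2.0;
}

// Naive DFT magnitude of a single bin k, used here only as a reference;
// a real program would take |X[k]| from its FFT library instead.
double dftMagnitude(const std::vector<double>& x, std::size_t k) {
    const double twoPi = 6.28318530717958647692;
    const std::size_t n = x.size();
    double re = 0.0, im = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double angle = twoPi * static_cast<double>((k * i) % n) / static_cast<double>(n);
        re += x[i] * std::cos(angle);
        im -= x[i] * std::sin(angle);
    }
    return std::sqrt(re * re + im * im);
}
```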
There are a few things that I think you will find helpful.
The forward FT will tend to give larger numbers in the output than in the input. You can think of it as all of the intensity at a certain frequency being displayed at one place rather than being distributed through the dataset. Does this matter? Probably not because you can always scale the data to fit your needs. I once wrote an integer based FFT/IFFT pair and each pass required rescaling to prevent integer overflow.
The real data that are your input are converted into something that is almost complex. As it turns out buffer[0] and buffer[n/2] are real and independent. There is a good discussion of it here.
The input data are sound intensity values taken over time, equally spaced. They are said to be, appropriately enough, in the time domain. The output of the FT is said to be in the frequency domain because the horizontal axis is frequency. The vertical scale remains intensity. Although it isn't obvious from the input data, there is phase information in the input as well. Although all of the sound is sinusoidal, there is nothing that fixes the phases of the sine waves. This phase information appears in the frequency domain as the phases of the individual complex numbers, but often we don't care about it (and often we do too!). It just depends upon what you are doing. The calculation
const float value = sqrt((realValue * realValue) + (imagValue * imagValue));
retrieves the intensity information but discards the phase information. Taking the logarithm essentially just dampens the big peaks.
Hope this is helpful.
If you are getting strange results then one thing to check is the documentation for the FFT library to see how the output is packed. Some routines use a packed format where real/imaginary values are interleaved, or they may begin at the N/2 element and wrap around.
For a sanity check, I would suggest creating sample data with known characteristics, e.g. Fs/2 or Fs/4 (Fs = sample frequency), and comparing the output of the FFT routine with what you'd expect. Try creating both a sine and a cosine at the same frequency, as these should have the same magnitude in the spectrum but different phases (i.e. realValue/imagValue will differ, but the sum of squares should be the same).
If you intend to use the FFT, though, you really need to know how it works mathematically; otherwise you're likely to run into other strange problems, such as aliasing.
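A C++ sketch of that sanity check, using a deliberately naive O(N²) DFT as a trusted reference (it is not an FFT and is only meant for small test sizes). The function names are hypothetical. With N = 16 and 4 cycles (a tone at Fs/4), the sine and the cosine land in the same bin with the same magnitude, N/2 = 8, but the cosine's value is real (about 8 + 0i) and the sine's is imaginary (about 0 - 8i).

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive DFT: X[k] = sum over i of x[i] * exp(-2*pi*j*k*i/N).
// Slow but simple enough to trust when checking an FFT library's output.
std::vector<std::complex<double>> naiveDft(const std::vector<double>& x) {
    const double twoPi = 6.28318530717958647692;
    const std::size_t n = x.size();
    std::vector<std::complex<double>> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t i = 0; i < n; ++i) {
            double angle = -twoPi * static_cast<double>((k * i) % n) / static_cast<double>(n);
            acc += x[i] * std::polar(1.0, angle);
        }
        out[k] = acc;
    }
    return out;
}

// Test tone with a whole number of cycles in n samples, e.g. cycles = n / 4 for Fs/4.
std::vector<double> tone(std::size_t n, std::size_t cycles, bool cosine) {
    const double twoPi = 6.28318530717958647692;
    std::vector<double> x(n);
    for (std::size_t i = 0; i < n; ++i) {
        double phase = twoPi * static_cast<double>(cycles * i) / static_cast<double>(n);
        x[i] = cosine ? std::cos(phase) : std::sin(phase);
    }
    return x;
}
```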