Drawing audio spectrum with Bass library - c++

How can I draw a spectrum for a given audio file with the Bass library?
I mean the chart similar to what Audacity generates:
I know that I can get the FFT data for given time t (when I play the audio) with:
float fft[1024];
BASS_ChannelGetData(chan, fft, BASS_DATA_FFT2048); // get the FFT data
That way I get 1024 values in the array for each time t. Am I right that the values in that array are signal amplitudes (dB)? If so, how is the frequency (Hz) associated with those values? By the index?
I am a programmer, but I am not experienced with audio processing at all. So I don't know what to do with the data I have to plot the needed spectrum.
I am working with C++ version, but examples in other languages are just fine (I can convert them).

From the documentation, that flag will cause the FFT magnitude to be computed, and from the sounds of it, it is the linear magnitude.
dB = 10 * log10(intensity);
dB = 20 * log10(pressure);
(I'm not sure whether audio file samples are a measurement of intensity or pressure. What's a microphone output linearly related to?)
Also, it indicates that the length of the input and the length of the FFT match, but half the FFT (corresponding to negative frequencies) is discarded. Therefore the highest FFT frequency will be one-half the sampling frequency. This occurs at N/2. The docs actually say:
For example, with a 2048 sample FFT, there will be 1024 floating-point values returned. If the BASS_DATA_FIXED flag is used, then the FFT values will be in 8.24 fixed-point form rather than floating-point. Each value, or "bin", ranges from 0 to 1 (can actually go higher if the sample data is floating-point and not clipped). The 1st bin contains the DC component, the 2nd contains the amplitude at 1/2048 of the channel's sample rate, followed by the amplitude at 2/2048, 3/2048, etc.
That seems pretty clear.
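As a minimal sketch of the mapping the quoted documentation describes: bin i sits at i/2048 of the channel's sample rate, and a linear amplitude can be turned into dB for plotting. The helper names binFrequencyHz and magnitudeToDb are hypothetical, not part of BASS, and the 20 * log10 form assumes the bins are amplitudes (which is the word the documentation uses) rather than intensities. The -100 dB floor is an arbitrary choice to avoid log10(0).

```cpp
#include <cmath>
#include <cstddef>

// Centre frequency in Hz of FFT bin `index`, for an FFT computed over
// `fftSize` input samples (2048 for BASS_DATA_FFT2048).
double binFrequencyHz(std::size_t index, double sampleRate, std::size_t fftSize) {
    return static_cast<double>(index) * sampleRate / static_cast<double>(fftSize);
}

// Convert a linear amplitude to decibels, clamped to `floorDb` so that
// silent bins (amplitude 0) do not produce -infinity.
double magnitudeToDb(double magnitude, double floorDb = -100.0) {
    if (magnitude <= 0.0) return floorDb;
    const double db = 20.0 * std::log10(magnitude);
    return db < floorDb ? floorDb : db;
}
```

To plot one frame, loop i from 0 to 1023 over the fft array, using binFrequencyHz(i, sampleRate, 2048) as the x value and magnitudeToDb(fft[i]) as the y value. The sample rate has to come from the channel itself.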

Related

Swift. frequency of sound got from vDSP.DCT output differs from iPhone and iPad

I'm trying to figure out the amplitude of each frequency of sound captured by microphone.
Just like this example https://developer.apple.com/documentation/accelerate/visualizing_sound_as_an_audio_spectrogram
I captured samples from the microphone into a sample buffer, copied them to a circular buffer, and then performed a forward DCT on them, like this:
func processData(values: [Int16]) {
vDSP.convertElements(of: values,
to: &timeDomainBuffer)
vDSP.multiply(timeDomainBuffer,
hanningWindow,
result: &timeDomainBuffer)
forwardDCT.transform(timeDomainBuffer,
result: &frequencyDomainBuffer)
vDSP.absolute(frequencyDomainBuffer,
result: &frequencyDomainBuffer)
vDSP.convert(amplitude: frequencyDomainBuffer,
toDecibels: &frequencyDomainBuffer,
zeroReference: Float(Microphone.sampleCount))
if frequencyDomainValues.count > Microphone.sampleCount {
frequencyDomainValues.removeFirst(Microphone.sampleCount)
}
frequencyDomainValues.append(contentsOf: frequencyDomainBuffer)
}
timeDomainBuffer is the float16 array containing sampleCount samples,
while frequencyDomainBuffer holds the amplitude of each frequency: the array index denotes the frequency and the value at that index is the amplitude of that frequency.
I'm trying to get the amplitude of each frequency, like this:
for index in frequencyDomainBuffer{
let frequency = index * (AVAudioSession().sampleRate/Double(Microphone.sampleCount)/2)
}
I assumed the index of frequencyDomainBuffer is linear in the actual frequency, so sampleRate divided by half of sampleCount would be correct. (sampleCount is the timeDomainBuffer length.)
The result is correct when running on my iPad, but the frequency comes out 10% higher on iPhone.
I doubt whether AVAudioSession().sampleRate can be used on iPhone.
Of course I can add a condition like "if iPhone", but I'd like to know why this happens, and whether it will be correct on other devices I haven't tested.
If you're seeing a consistent 10% difference, I'm betting it's actually an 8.8% difference (48 / 44.1 ≈ 1.088). I haven't studied your code, but I'd look for a hard-coded 44.1kHz somewhere. The sample rate on iPhones is generally 48kHz.
Remember also that the bins are (as you suspect) proportional to the sampling rate. So at different sampling rates the center of the bins are different. Depending on the number of bins you're using, this could represent a large difference (not really an "error" since the bins are ranges, but if you assume it's precisely the center frequency, this could match your 10%).

Compute FFT in frequency axis when signal is in rawData in Matlab

I have a 10 MHz signal sampled at 100 MS/s. How do I compute the FFT in MATLAB in terms of frequency when my signal is in rawData (the length of rawData is 100000)? Also:
What is the optimum length of NFFT (i.e., on what factors does NFFT depend)?
Why does my amplitude (Y axis) change with NFFT?
What is the difference between NFFT, N and L? How do I compute the length of a signal?
How do I separate noise and signal within a single signal (which is in rawData)?
Here is my code,
t=(1:40);
f=10e6;
fs=100e6;
NFFT=1024;
y=abs(rawData(1:1000,2));
X=abs(fft(y,NFFT));
f=[-fs/2:fs/NFFT:(fs/2-fs/NFFT)];
subplot(1,1,1);
semilogy(f(513:1024),X(513:1024));
axis([0 10e6 0 10]);
As you can find the corresponding frequencies in another post, I will just answer your other questions:
Including all your data is most of the time the best option. fft just truncates your input data to the requested length, which is probably not what you want. If you know the period of your input signal, you can truncate it to include a whole number of periods. If you don't know it, a window (e.g. Hanning) may be interesting.
If you increase NFFT, you use more data in your fft calculation, which may change the amplitude for a given frequency slightly. You also calculate the amplitude at more frequencies between 0 and Fs/2 (half of the sampling frequency).
The question is not clear; please provide the definitions of N and L.
It depends on your application. If the noise is at the same frequency as your signal, you are not able to separate it. Otherwise, you can use a filter (e.g. bandpass) to extract the frequencies of interest.
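One more reason the Y axis moves with the transform length: an unnormalized DFT sums over all input samples, so the peak of a pure tone grows in proportion to the number of samples used. The C++ sketch below (not MATLAB; the names dftBinMagnitude and makeSine are hypothetical) evaluates a single DFT bin straight from the definition to show this. It assumes the tone falls exactly on a bin, which holds for 10 MHz at 100 MS/s when the length is a multiple of 10.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Magnitude of DFT bin k of x, computed directly from the definition.
// No normalization is applied, matching the usual fft() convention.
double dftBinMagnitude(const std::vector<double>& x, std::size_t k) {
    const double pi = std::acos(-1.0);
    const std::size_t n = x.size();
    std::complex<double> sum(0.0, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        const double angle = -2.0 * pi * static_cast<double>(k) *
                             static_cast<double>(i) / static_cast<double>(n);
        sum += x[i] * std::complex<double>(std::cos(angle), std::sin(angle));
    }
    return std::abs(sum);
}

// n samples of a sine wave with the given amplitude and frequency.
std::vector<double> makeSine(std::size_t n, double amplitude,
                             double freqHz, double sampleRateHz) {
    const double pi = std::acos(-1.0);
    std::vector<double> x(n);
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = amplitude * std::sin(2.0 * pi * freqHz *
                                    static_cast<double>(i) / sampleRateHz);
    }
    return x;
}
```

For a sine of amplitude A that lands exactly on a bin, the raw magnitude is A * N / 2, so 1000 samples give twice the peak that 500 samples give. Multiplying by 2 / N recovers A regardless of length. When the tone does not land on a bin, leakage spreads the energy over neighbouring bins and the peak is lower.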

C++ mathematical function generation

In working on a project I came across the need to generate various waves, accurately. I thought that a simple sine wave would be the easiest to begin with, but it appears that I am mistaken. I made a simple program that generates a vector of samples and then plays those samples back so that the user hears the wave, as a test. Here is the relevant code:
vector<short> genSineWaveSample(int nsamples, float freq, float amp) {
vector<short> samples;
for(float i = 0; i <= nsamples; i++) {
samples.push_back(amp * sinx15(freq*i));
}
return samples;
}
I'm not sure what the issue with this is. I understand that there could be some issue with the vector being made of shorts, but that's what my audio framework wants, and I am inexperienced with that kind of library and so do not know what to expect.
The symptoms are as follows:
frequency not correct
ie: given freq=440, A4 is not the note played back
strange distortion
Most frequencies do not generate a clean wave. 220, 440, 880 are all clean, most others are distorted
Most frequencies are shifted upwards considerably
Can anyone give advice as to what I may be doing wrong?
Here's what I've tried so far:
Making my own sine function, for greater accuracy.
I used a 15th degree Taylor Series expansion for sin(x)
Changed the sample rate, anything from 256 to 44100, no change can be heard given the above errors, the waves are simply more distorted.
Thank you. If there is any information that can help you, I'd be obliged to provide it.
I suspect that you are passing incorrect values to your sinx15 function. If you are familiar with the basics of signal processing, the Nyquist rate is the minimum sampling rate at which you can faithfully reconstruct (or in your case construct) a sampled signal. It is defined as 2x the highest frequency component present in the signal.
What this means for your program is that you need at least 2 values per cycle of the highest frequency you want to reproduce. At 20 kHz you'd need 40,000 samples per second. It looks like you are just packing a vector with values and letting the playback program sort out the timing.
We will assume you use 44.1 kHz as your playback sampling frequency. This means that a snippet of code producing one second of a wave at a given frequency (1 kHz, say) would look like
// requires <cmath> and <vector>; frequency is a double, in Hz
std::vector<double> wave(44100); // one second of samples at 44.1 kHz
for (int i = 0; i < 44100; i++)
{
wave[i] = std::sin(2 * M_PI * i * (frequency / 44100.0) + M_PI / 2); // sin takes radians
}
The sample index has to be scaled by the frequency divided by the sample rate, not by the frequency alone. To see this, take the case where a frequency of 22,050 Hz is passed. For i = 0, you get sin(pi/2) = 1. For i = 1, sin(3pi/2) = -1, and so on and so forth. This gives you a repeating sequence of 1, -1, 1, -1... which is the correct representation of a 22,050 Hz wave sampled at 44.1 kHz. This works as you go down in frequency, but you get more and more samples per cycle. Interestingly, though, this does not make a difference. A sine wave sampled at 2 samples per cycle is just as accurately recreated as one that is sampled 1000 times per cycle. This doesn't take into account noise but for most purposes works well enough.
I would suggest looking into the basics of digital signal processing, as it is a very interesting field and very useful to understand.
Edit: This assumes all of those parameters are evaluated as floating point numbers.
Fundamentally, you're missing a piece of information. You don't specify the amount of time over which you want your samples taken. This could also be thought of as the rate at which the samples will be played by your system. Something roughly in this direction will get you closer, for now, though.
samples.push_back(amp * std::sin(M_PI / freq *i));
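Both answers point at the same missing ingredient, the playback sample rate. A minimal sketch of the question's generator with that parameter added might look like the following. The name genSineWave and the sampleRateHz parameter are assumptions introduced here, and std::sin replaces the hand-written Taylor series. The caller is assumed to keep amp within the 16-bit range (at most 32767).

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Generate nsamples of a sine wave at freqHz, intended for playback at
// sampleRateHz. amp is the peak value and must fit in a 16-bit sample.
std::vector<std::int16_t> genSineWave(std::size_t nsamples, double freqHz,
                                      double amp, double sampleRateHz) {
    const double pi = std::acos(-1.0);
    std::vector<std::int16_t> samples;
    samples.reserve(nsamples);
    for (std::size_t i = 0; i < nsamples; ++i) {
        // Phase advances by 2*pi*freq/sampleRate radians per sample.
        const double phase = 2.0 * pi * freqHz * static_cast<double>(i) / sampleRateHz;
        samples.push_back(static_cast<std::int16_t>(std::lround(amp * std::sin(phase))));
    }
    return samples;
}
```

For example, genSineWave(44100, 440.0, 10000.0, 44100.0) yields one second of A4, provided the audio framework really plays the buffer back at 44100 samples per second. If the playback rate differs from the rate used here, the pitch shifts by the ratio of the two.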

Length of FFT and IFFT

I have some signals which I add up to a larger signal, where each signal is located in a different frequency region.
Now, I perform the FFT operation on the big signal with FFTW and cut the concrete FFT bins (where the signals are located) out.
For example: The big signal is FFT transformed with 1024 points,
the sample rate of the signal is fs=200000.
I calculate the concrete bin positions for given start and stop frequencies in the following way:
tIndex.iStartPos = (int64_t) ((tFreqs.i64fstart) / (mSampleRate / uFFTLen));
and e.g. I get for the first signal to be cut out 16 bins.
Now I do the IFFT transformation again with FFTW and get the 16 complex values back (because I reserved the vector for 16 bins).
But when I compare the extracted signal with the original small signal in MATLAB, then I can see that the original signal (is a wav-File) has xxxxx data and my signal (which I saved as raw binary file) has only 16 complex values.
So how do I obtain the length of the IFFT operation to be correctly transformed? What is wrong here?
EDIT
The logic itself is split over 3 programs, each line is in a multithreaded environment. For that reason I post here some pseudo-code:
ReadWavFile(); //returns the signal data and the RIFF/FMT header information
CalculateFFT_using_CUFFTW(); //calculates FFT with user given parameters, like FFT length, polyphase factor, and applies polyphased window to reduce leakage effect
GetFFTData(); //copy/get FFT data from CUDA device
SendDataToSignalDetector(); //detects signals and returns center frequency and bandwidth for each signal
Freq2Index(); // calculates positions with the returned data from the signal detector
CutConcreteBins(position);
AddPaddingZeroToConcreteBins(); // adds zeros till next power of 2
ApplyPolyphaseAndWindow(); //appends the signal itself polyphase-factor times and applies polyphased window
PerformIFFT_using_FFTW();
NormalizeFFTData();
Save2BinaryFile();
-->Then analyse the data in MATLAB (currently in progress).
If you have a real signal consisting of 1024 samples, the contribution from the 16 frequency bins of interest could be obtained by multiplying the frequency spectrum by a rectangular window then taking the IFFT. This essentially amounts to:
filling a buffer with zeros before and after the frequency bins of interest
copying the frequency bins of interest at the same locations in that buffer
if using a full-spectrum representation (if you are using fftw_plan_dft_1d(..., FFTW_BACKWARD,... for the inverse transform), computing the Hermitian symmetry for the upper half of the spectrum (or simply use a half-spectrum representation and perform the inverse transform through fftw_plan_dft_c2r_1d).
That said, you would get a better frequency decomposition by using specially designed filters instead of just using a rectangular window in the frequency domain.
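The three steps above can be sketched without any FFT library as follows. The names Spectrum, freqToBin and isolateBins are hypothetical, and the sketch assumes a full-length complex spectrum of a real signal with the bins of interest strictly between DC and N/2. The resulting buffer would then be handed to a full-length inverse transform, for instance an FFTW plan created with FFTW_BACKWARD.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using Spectrum = std::vector<std::complex<double>>;

// Index of the bin that contains freqHz, for an FFT of fftLen points.
std::size_t freqToBin(double freqHz, double sampleRateHz, std::size_t fftLen) {
    return static_cast<std::size_t>(freqHz / (sampleRateHz / static_cast<double>(fftLen)));
}

// Keep bins [start, start + count) of `full`, zero everything else, and
// fill in the mirrored upper-half bins with complex conjugates so that the
// inverse transform is real.
// Assumes 0 < start and start + count <= full.size() / 2.
Spectrum isolateBins(const Spectrum& full, std::size_t start, std::size_t count) {
    const std::size_t n = full.size();
    Spectrum out(n, std::complex<double>(0.0, 0.0));
    for (std::size_t k = start; k < start + count; ++k) {
        out[k] = full[k];                 // same location as in the input
        out[n - k] = std::conj(full[k]);  // Hermitian symmetry
    }
    return out;
}
```

Because the output keeps all N bins, the inverse transform returns N time-domain samples, the same length as the big input signal, instead of 16 values.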
The output length of the FT is equal to the input length. I don't know how you got to 16 bins; the FT of 1024 inputs is 1024 bins. Now for a real input (not complex) the 1024 bins are mirror images of each other (as complex conjugates) around bin 512, so your FFT library may return only the lower half for a real input (FFTW's real-input transforms return N/2+1 = 513 bins). Still, that's more than 16 bins.
You'll probably need to fill all 1024 bins when doing the IFFT, as it generally doesn't assume that its output will become a real signal. But that's just a matter of mirroring the lower 512 bins then.

audio waveform to Integer sequence

I need to create an integer sequence from an audio file. I was checking the waveform libraries, as they draw a linear graph. But I am searching for the key information: what is the source of the integers used to draw the graph? Is it amplitude? Frequency? Or something else? There are libraries available, but I need to know what unit of information I need to extract to have data that I can feed to a graph. However, drawing a graph is not my objective. I just want that raw integer array.
Of course, it's the amplitudes that you need to get a wave oscillogram, and that's the way PCM data are stored in wav files, for example (the data which come directly after the file header). Note that there are 8-bit and 16-bit formats; the latter may also be big-endian or little-endian depending on the byte order (just to keep you aware of it).
Audio is simply a curve. When you plot it with time across the X axis, the Y axis is amplitude, similar to plotting a sin math function. Each point on the curve is a number which gets stored in the audio file. In WAV format this number is typically a 16-bit signed integer, so ignoring the 44-byte header, the rest of the file is just a sequence of these integer numbers. When this curve varies up and down quickly over time, the frequency is higher than if the curve varies more slowly over time. If you download the audio workbench application Audacity, you can view this curve for any audio file (WAV, mp3, ...).
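As a rough sketch of extracting that integer array in C++: the function below turns the raw bytes of a file into 16-bit samples. The name decodePcm16 is hypothetical, and the sketch assumes uncompressed 16-bit little-endian PCM with a fixed header size (44 bytes by default). Real WAV files can carry extra chunks, so a robust reader would parse the header rather than skip a fixed number of bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode 16-bit little-endian signed PCM samples from `bytes`, skipping the
// first `headerBytes` bytes. A trailing odd byte, if any, is ignored.
std::vector<std::int16_t> decodePcm16(const std::vector<std::uint8_t>& bytes,
                                      std::size_t headerBytes = 44) {
    std::vector<std::int16_t> samples;
    if (bytes.size() < headerBytes) return samples;
    samples.reserve((bytes.size() - headerBytes) / 2);
    for (std::size_t i = headerBytes; i + 1 < bytes.size(); i += 2) {
        // Low byte first, then high byte.
        const std::uint16_t raw =
            static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8));
        samples.push_back(static_cast<std::int16_t>(raw));
    }
    return samples;
}
```

For a stereo file the samples are interleaved (left, right, left, right, ...), so every second value belongs to the same channel.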