I'm working on my first GStreamer project of my own and I'm trying to get a grasp of it.
So I have a stream of arrays of size N that I generate. Each array consists of unsigned 8-bit integers representing amplitudes between 0 and 255. I want to create a soundsrc that takes this data stream and generates N overlaid frequencies, with the amplitude of each frequency taken from the data stream. How do I do this in broad terms?
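In broad terms, what is described here is additive synthesis: one sine wave per array entry, each scaled by its entry, all summed together. The sketch below is independent of GStreamer and only shows the arithmetic. The function name `synthesizeBlock`, the `freqsHz` table, and the choice to divide the sum by N to keep it within [-1, 1] are assumptions made for illustration, not part of any GStreamer API.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: mixes one sine per array entry into a block of samples.
// freqsHz[i] is the (assumed) frequency assigned to amplitudes[i].
// startSample lets consecutive blocks continue the same time axis.
std::vector<float> synthesizeBlock(const std::vector<std::uint8_t>& amplitudes,
                                   const std::vector<double>& freqsHz,
                                   double sampleRate,
                                   std::size_t numSamples,
                                   std::size_t startSample = 0)
{
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<float> out(numSamples, 0.0f);
    const std::size_t n =
        amplitudes.size() < freqsHz.size() ? amplitudes.size() : freqsHz.size();
    if (n == 0) return out;

    for (std::size_t s = 0; s < numSamples; ++s) {
        const double t = static_cast<double>(startSample + s) / sampleRate;
        double acc = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double gain = amplitudes[i] / 255.0;  // map 0..255 to 0..1
            acc += gain * std::sin(twoPi * freqsHz[i] * t);
        }
        // Dividing by n keeps the sum inside [-1, 1] even if all gains are 1.
        out[s] = static_cast<float>(acc / static_cast<double>(n));
    }
    return out;
}
```

Each new array from the stream would produce the next block of samples; passing a running `startSample` avoids restarting every sine at phase zero.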
Let's say I want to play a sine wave using WASAPI.
Will the data I write into the AudioClient buffer always just be samples between -1 and 1, or does it differ between PCM and IEEE_Float formats (and other formats, for that matter)?
Thanks.
Right now I'm just using 1 to -1, but I want to know whether I need to write my buffer-filling code differently for each format.
MEDIASUBTYPE_IEEE_FLOAT / WAVE_FORMAT_IEEE_FLOAT audio types operate with floating-point values in the [-1, +1] range.
MEDIASUBTYPE_PCM / WAVE_FORMAT_PCM has integer values:
8-bit samples are stored as unsigned bytes, ranging from 0 to 255. 16-bit samples are stored as 2's-complement signed integers, ranging from -32768 to 32767.
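So the same sine wave has to be scaled differently depending on the format. A minimal sketch of the conversions, assuming samples are generated as floats in [-1, +1] first; the helper names are hypothetical, and scaling by 32767 and 127 (rather than 32768 and 128) is one common convention, not the only one:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Clamp a sample to the nominal floating-point range [-1, +1].
inline float clampUnit(float x) {
    return std::max(-1.0f, std::min(1.0f, x));
}

// 16-bit PCM is signed: +1.0 maps to 32767, -1.0 to -32767, silence to 0.
inline std::int16_t floatToPcm16(float x) {
    return static_cast<std::int16_t>(std::lround(clampUnit(x) * 32767.0f));
}

// 8-bit PCM is unsigned: silence sits at 128, +1.0 maps to 255, -1.0 to 1.
inline std::uint8_t floatToPcm8(float x) {
    return static_cast<std::uint8_t>(std::lround(clampUnit(x) * 127.0f) + 128);
}
```

For a float format the generated value can be written to the buffer unchanged; for the PCM formats it goes through one of these helpers first.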
You will also find good references here: How to handle asymmetry of WAV data?
I have some signals which I add up to a larger signal, where each signal is located in a different frequency region.
Now I perform an FFT on the big signal with FFTW and cut out the specific FFT bins where the signals are located.
For example: the big signal is transformed with a 1024-point FFT,
and the sample rate of the signal is fs=200000.
I calculate the bin positions for given start and stop frequencies in the following way:
tIndex.iStartPos = (int64_t) ((tFreqs.i64fstart) / (mSampleRate / uFFTLen));
For example, for the first signal I get 16 bins to cut out.
Now I do the IFFT with FFTW and get 16 complex values back (because I reserved the vector for 16 bins).
But when I compare the extracted signal with the original small signal in MATLAB, I can see that the original signal (a WAV file) has xxxxx data, while my signal (which I saved as a raw binary file) has only 16 complex values.
So how do I choose the length of the IFFT so that the signal is transformed correctly? What is wrong here?
EDIT
The logic itself is split over 3 programs, and each step runs in a multithreaded environment. For that reason I am posting some pseudo-code here:
ReadWavFile(); //returns the signal data and the RIFF/FMT header information
CalculateFFT_using_CUFFTW(); //calculates FFT with user given parameters, like FFT length, polyphase factor, and applies polyphased window to reduce leakage effect
GetFFTData(); //copy/get FFT data from CUDA device
SendDataToSignalDetector(); //detects signals and returns center frequency and bandwidth for each signal
Freq2Index(); // calculates positions with the returned data from the signal detector
CutConcreteBins(position);
AddPaddingZeroToConcreteBins(); // adds zeros till next power of 2
ApplyPolyphaseAndWindow(); //appends the signal itself polyphase-factor times and applies polyphased window
PerformIFFT_using_FFTW();
NormalizeFFTData();
Save2BinaryFile();
--> Then analyse the data in MATLAB (this is a work in progress at the moment).
If you have a real signal consisting of 1024 samples, the contribution from the 16 frequency bins of interest could be obtained by multiplying the frequency spectrum by a rectangular window then taking the IFFT. This essentially amounts to:
filling a buffer with zeros before and after the frequency bins of interest
copying the frequency bins of interest at the same locations in that buffer
if using a full-spectrum representation (that is, if you are using fftw_plan_dft_1d(..., FFTW_BACKWARD, ...) for the inverse transform), computing the Hermitian symmetry for the upper half of the spectrum (or simply using a half-spectrum representation and performing the inverse transform through fftw_plan_dft_c2r_1d).
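The three steps above can be sketched as follows. To keep the example self-contained it uses a naive O(N^2) DFT as a stand-in for FFTW, so `dft` and `extractBand` are hypothetical helpers, not FFTW calls. The point is only that the masked spectrum keeps the full length N, so the inverse transform returns N real samples rather than 16.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive DFT standing in for an FFT library.
// sign = -1 gives the forward transform, sign = +1 the unnormalized inverse.
std::vector<cplx> dft(const std::vector<cplx>& in, int sign) {
    const std::size_t n = in.size();
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx acc(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double phase = sign * twoPi *
                static_cast<double>((k * t) % n) / static_cast<double>(n);
            acc += in[t] * std::polar(1.0, phase);
        }
        out[k] = acc;
    }
    return out;
}

// Keeps bins [first, last] (lower half of the spectrum) of a real signal,
// zeroes everything else, restores Hermitian symmetry, and transforms back.
// The result has the same length as the input signal.
std::vector<double> extractBand(const std::vector<double>& signal,
                                std::size_t first, std::size_t last) {
    const std::size_t n = signal.size();
    const std::vector<cplx> x(signal.begin(), signal.end());
    const std::vector<cplx> spectrum = dft(x, -1);

    std::vector<cplx> masked(n, cplx(0.0, 0.0));        // zeros everywhere
    for (std::size_t k = first; k <= last && k <= n / 2; ++k) {
        masked[k] = spectrum[k];                        // bin of interest
        if (k != 0 && 2 * k != n)                       // DC and Nyquist have no mirror
            masked[n - k] = std::conj(spectrum[k]);     // Hermitian mirror
    }

    const std::vector<cplx> y = dft(masked, +1);
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = y[i].real() / static_cast<double>(n);  // normalize the inverse
    return out;
}
```

With FFTW the same masked buffer would be handed to the backward plan instead of `dft`; the normalization by N is still needed because FFTW's transforms are unnormalized.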
That said, you would get a better frequency decomposition by using specially designed filters instead of just using a rectangular window in the frequency domain.
The output length of the FT is equal to the input length. I don't know how you got to 16 bins; the FT of 1024 inputs is 1024 bins. Now for a real input (not complex) the upper half of the 1024 bins mirrors the lower half around bin 512 (as complex conjugates), so your FFT library may return only the lower 513 bins (0 to 512) for a real input. Still, that's more than 16 bins.
You'll probably need to fill all 1024 bins when doing the IFFT, as it generally doesn't assume that its output will become a real signal. But that's just a matter of mirroring the lower half then (taking the complex conjugate of each mirrored bin).
How can I draw a spectrum for a given audio file with the BASS library?
I mean the chart similar to what Audacity generates:
I know that I can get the FFT data for given time t (when I play the audio) with:
float fft[1024];
BASS_ChannelGetData(chan, fft, BASS_DATA_FFT2048); // get the FFT data
That way I get 1024 values in the array for each time t. Am I right that the values in that array are signal amplitudes (dB)? If so, how is the frequency (Hz) associated with those values? By the index?
I am a programmer, but I am not experienced with audio processing at all, so I don't know what to do with the data I have in order to plot the spectrum I need.
I am working with C++ version, but examples in other languages are just fine (I can convert them).
From the documentation, that flag will cause the FFT magnitude to be computed, and from the sounds of it, it is the linear magnitude.
dB = 10 * log10(intensity);
dB = 20 * log10(pressure);
(I'm not sure whether audio file samples are a measurement of intensity or pressure. What's a microphone output linearly related to?)
Also, it indicates that the length of the input and the length of the FFT match, but half of the FFT (corresponding to negative frequencies) is discarded. Therefore the highest FFT frequency will be one-half the sampling frequency. This occurs at N/2. The docs actually say:
For example, with a 2048 sample FFT, there will be 1024 floating-point values returned. If the BASS_DATA_FIXED flag is used, then the FFT values will be in 8.24 fixed-point form rather than floating-point. Each value, or "bin", ranges from 0 to 1 (can actually go higher if the sample data is floating-point and not clipped). The 1st bin contains the DC component, the 2nd contains the amplitude at 1/2048 of the channel's sample rate, followed by the amplitude at 2/2048, 3/2048, etc.
That seems pretty clear.
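Put into code, the mapping from array index to frequency and from linear value to dB could look like the sketch below. It contains no BASS calls; `binToHz` and `amplitudeToDb` are hypothetical helpers. It assumes the returned values are linear amplitudes, so the 20 * log10 form is used, and it assumes a floor of -120 dB to avoid taking the logarithm of zero.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Frequency in Hz of bin `bin` for an FFT of length fftSize
// (fftSize would be 2048 for BASS_DATA_FFT2048, giving bins 0..1023).
inline double binToHz(std::size_t bin, std::size_t fftSize, double sampleRate) {
    return static_cast<double>(bin) * sampleRate / static_cast<double>(fftSize);
}

// Converts a linear amplitude to decibels, clamped at floorDb (assumed value)
// so that silent bins do not produce -infinity.
inline double amplitudeToDb(double amplitude, double floorDb = -120.0) {
    if (amplitude <= 0.0) return floorDb;
    return std::max(floorDb, 20.0 * std::log10(amplitude));
}
```

Plotting `amplitudeToDb(fft[k])` against `binToHz(k, 2048, sampleRate)` for k = 0..1023 gives one spectrum for the moment the data was fetched; a chart for a whole file would need these values accumulated or averaged over many such fetches.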
In other words, I am trying to play a .wav file, and to do this I need to know each frequency and how long it lasts; the API I am using has a method that takes as a parameter a vector with two fields (frequency and time)!
I tried to use a fast Fourier transform, but it gives me frequency and magnitude!
mag
/\
|
|
-|------> freq
But I need something like this:
freq
/\
|
|
-|------>time
I want to know if it is possible to get this information from a wav file!
A digital audio signal is a series of (amplitude, time) pairs.
Or you can say it is a function of time.
If you take a sequence of an audio signal and perform a Fourier transform (DFT/FFT) on this sequence, you will get a new sequence which contains (amplitude, frequency) pairs.
Or you can say it is a function of frequency.
This sequence describes the properties of the signal in the frequency domain.
It does not contain any time information at all.
I guess what you want is a function that describes how the frequency components of an audio signal change over time. This cannot be done with a single FFT.
What you can do is:
Take N samples of the audio data stream, samples (0, ..., N-1)
Perform a FFT
Take another N samples of the audio data stream, (m, ..., m+N-1) with m << N
Perform a FFT
Take another N samples of the audio data stream, (2m, ..., 2m+N-1)
Perform a FFT
and so on
If your sampling time is ts, you will get a new frequency analysis after T = m*ts.
Maybe that is what you want.
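A minimal sketch of that sliding analysis, reduced to one (time, frequency) pair per frame by picking the strongest bin. It uses a naive DFT per frame instead of an FFT library to stay self-contained; the function name `dominantFrequencyOverTime` and the "strongest bin" rule are assumptions for illustration, and real audio usually has many components per frame, so this is a simplification.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

// For each frame of frameLen samples, advanced by hop samples (the "m" above),
// returns (frame start time in seconds, frequency in Hz of the strongest bin).
std::vector<std::pair<double, double>> dominantFrequencyOverTime(
    const std::vector<double>& samples, double sampleRate,
    std::size_t frameLen, std::size_t hop)
{
    std::vector<std::pair<double, double>> result;
    if (frameLen < 2 || hop == 0) return result;
    const double twoPi = 2.0 * std::acos(-1.0);

    for (std::size_t start = 0; start + frameLen <= samples.size(); start += hop) {
        std::size_t bestBin = 0;
        double bestMag = -1.0;
        // Only bins 1..frameLen/2 are examined: DC is skipped and the upper
        // half mirrors the lower half for real input.
        for (std::size_t k = 1; k <= frameLen / 2; ++k) {
            std::complex<double> acc(0.0, 0.0);
            for (std::size_t t = 0; t < frameLen; ++t) {
                const double phase = -twoPi *
                    static_cast<double>((k * t) % frameLen) /
                    static_cast<double>(frameLen);
                acc += samples[start + t] * std::polar(1.0, phase);
            }
            const double mag = std::abs(acc);
            if (mag > bestMag) { bestMag = mag; bestBin = k; }
        }
        result.emplace_back(static_cast<double>(start) / sampleRate,
                            static_cast<double>(bestBin) * sampleRate /
                                static_cast<double>(frameLen));
    }
    return result;
}
```

The frequency resolution is sampleRate / frameLen and the time resolution is hop / sampleRate, so a longer frame gives finer frequencies but blurs when each one starts and stops.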
I need to create an integer sequence from an audio file. I was checking the waveform libraries, as they draw a linear graph. But I am searching for the key information: what is the source of the integers used to draw the graph? Is it amplitude, frequency, or something else? There are libraries available, but I need to know what unit of information I need to extract to get data that I can feed to a graph. However, drawing a graph is not my objective. I just want that raw integer array.
Of course, it's the amplitudes that you need to draw a waveform (oscillogram), and that is how PCM data are stored in wav files, for example (the data come directly after the file header). Note that there are 8-bit and 16-bit formats, and the latter may be big-endian or little-endian depending on the byte order (just to keep you aware of it).
Audio is simply a curve: when you plot it with time along the X axis, the Y axis is amplitude, similar to plotting a sine function. Each point on the curve is a number that gets stored in the audio file. In the WAV format this number is typically a 16-bit signed integer, so ignoring the header (44 bytes in the simplest case), the rest of the file is just a sequence of these integers. When this curve varies up and down quickly over time, the frequency is higher than if the curve varies more slowly over time. If you download the audio workbench application Audacity, you can view this curve for any audio file (WAV, mp3, ...).
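A minimal sketch of getting that raw integer array out of the file's bytes, assuming 16-bit signed little-endian mono PCM and a sample data offset of 44 bytes. Both are assumptions: the function name `decodePcm16le` is hypothetical, and real files can carry extra chunks, so a robust reader would walk the RIFF chunks to find the data instead of hard-coding 44.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes 16-bit little-endian PCM bytes into signed amplitude values.
// dataOffset is where the sample data starts; 44 is only the size of the
// simplest possible WAV header and is an assumption here.
std::vector<std::int16_t> decodePcm16le(const std::vector<std::uint8_t>& bytes,
                                        std::size_t dataOffset = 44)
{
    std::vector<std::int16_t> samples;
    for (std::size_t i = dataOffset; i + 1 < bytes.size(); i += 2) {
        // Low byte first, then high byte.
        const int raw = bytes[i] | (bytes[i + 1] << 8);
        // Values of 0x8000 and above represent negative amplitudes.
        const int value = (raw >= 0x8000) ? raw - 0x10000 : raw;
        samples.push_back(static_cast<std::int16_t>(value));
    }
    return samples;
}
```

The resulting vector is the sequence of amplitudes in playback order; for a stereo file the left and right samples would alternate in it.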