I'm reconstructing a signal from the amplitude, frequency and phase obtained from an FFT. After the FFT, I pick some of its frequency bins and reconstruct the time-domain signal from that data. I know the IFFT is meant for this, but I don't want to use the IFFT.
The reconstruction seems fine, but there is a time lag between the two signals. This image shows the problem: the black curve is the original signal and the red one is the reconstruction.
If I understand correctly, the amplitude of frequency bin t is sqrt(real[t]*real[t] + imag[t]*imag[t]) and the phase is atan2(imag[t], real[t]).
So I used the formula amplitude * cos(2*π*x / frequency + phase) for each frequency bin and summed the regenerated waves. As far as I know, this should produce a signal that matches the original. But it always ends up with some time lag relative to the original signal.
I suspect the phase, but it is simple to calculate and seems to be working correctly. If it were wrong, the reconstructed signal would not match the shape of the original.
This is the code that generates the cosine wave. I generate the cosine as sin(x + π/2).
std::vector<short> encodeSineWavePCM(
const double frequency,
const double amplitude,
const double offSetPhase)
{
const double pi = 3.1415926535897932384626;
const int N = 44100; // 1 sec length wave
std::vector<short> s(N);
const double wavelength = 1.0 * N / frequency;
const double unitlength = 2 * pi / wavelength;
for (int i = 0; i<N; i ++) {
double val = sin(offSetPhase + i * unitlength);
val *= amplitude;
s[i] = (short)val;
}
return s;
}
What am I missing?
Quite normal. You're doing a frame-by-frame transform. That means the FFT frame is produced after one time frame. When transforming back, you have the inverse effect: your time frame starts after the FFT frame has been decoded.
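For comparison, here is a minimal sketch of a sum-of-cosines reconstruction that does line up with its input. It is not the asker's code: the helper names naiveDft and reconstructFromBins are made up for illustration, and a naive DFT stands in for whatever FFT library is actually used. It assumes the sample index starts at 0 at the first sample of the analysed frame and that bin k completes k cycles per frame. Bins other than DC and Nyquist are doubled because each has a mirror image in the upper half of the spectrum.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(N^2) forward DFT, used only to keep the sketch free of external libraries.
std::vector<std::complex<double>> naiveDft(const std::vector<double>& x) {
    const double pi = std::acos(-1.0);
    const std::size_t n = x.size();
    std::vector<std::complex<double>> X(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = -2.0 * pi * double(k) * double(t) / double(n);
            sum += x[t] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        X[k] = sum;
    }
    return X;
}

// Rebuild a real signal by summing one cosine per bin 0..N/2.
std::vector<double> reconstructFromBins(const std::vector<std::complex<double>>& X) {
    const double pi = std::acos(-1.0);
    const std::size_t n = X.size();
    std::vector<double> y(n, 0.0);
    for (std::size_t k = 0; k <= n / 2; ++k) {
        // DC and (for even N) Nyquist occur once; every other bin has a mirror image.
        const bool unpaired = (k == 0) || (n % 2 == 0 && k == n / 2);
        const double amplitude = std::abs(X[k]) * (unpaired ? 1.0 : 2.0) / double(n);
        const double phase = std::arg(X[k]);  // equivalent to atan2(imag, real)
        for (std::size_t t = 0; t < n; ++t) {
            y[t] += amplitude * std::cos(2.0 * pi * double(k) * double(t) / double(n) + phase);
        }
    }
    return y;
}
```

Calling reconstructFromBins(naiveDft(x)) on one frame should return that frame to within rounding error. If a reconstruction has the right shape but is shifted, one thing to compare against this sketch is where the time index of each frame starts.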
When I use Intel IPP's ippsFFTFwd_RToCCS_64f followed by ippsMagnitude_64fc, I get a massive peak at index zero of the magnitudes array.
My sine wave is long, and the main component I am interested in lies somewhere between 0.15 Hz and 0.25 Hz. I sample at 500 Hz. If I subtract the mean from the signal before the FFT, the zero component becomes really small and the peak disappears. Below is a picture of the head of the magnitudes array:
Also, the magnitudes seem to be scaled to 10 times what I see in the time series of the signal, e.g. if the amplitude is 29, it is 290 in the magnitudes.
I am not sure why this is so, and my questions are: 1. Do I really need to address the zero index peak with mean reduction? 2. Where does this scale of 10 come from?
void CalculateForwardTransform(array<double> ^signal, array<double> ^transformedSignal, array<double> ^magnitudes)
{
// source signal
pin_ptr<double> pinnedSignal = &signal[0];
double *pSignal = pinnedSignal;
int order = (int)Math::Round(Math::Log(signal->Length, 2));
// get sizes
int sizeSpec = 0, sizeInit = 0, sizeBuf = 0;
int status = ippsFFTGetSize_R_64f(order, IPP_FFT_DIV_INV_BY_N, ippAlgHintNone, &sizeSpec, &sizeInit, &sizeBuf);
// memory allocation
IppsFFTSpec_R_64f* pSpec;
Ipp8u *pSpecMem = (Ipp8u*)ippMalloc(sizeSpec);
Ipp8u *pMemInit = (Ipp8u*)ippMalloc(sizeInit);
// FFT specification structure initialized
status = ippsFFTInit_R_64f(&pSpec, order, IPP_FFT_DIV_INV_BY_N, ippAlgHintNone, pSpecMem, pMemInit);
// transform
pin_ptr<double> pinnedTransformedSignal = &transformedSignal[0];
double *pDst = pinnedTransformedSignal;
Ipp8u *pBuffer = (Ipp8u*)ippMalloc(sizeBuf);
status = ippsFFTFwd_RToCCS_64f(pSignal, pDst, pSpec, pBuffer);
// get magnitudes
pin_ptr<double> pinnedMagnitudes = &magnitudes[0];
double *pMagn = pinnedMagnitudes;
status = ippsMagnitude_64fc((Ipp64fc*)pDst, pMagn, magnitudes->Length); // magnitudes is half of signal len
// free memory
ippFree(pSpecMem);
ippFree(pMemInit);
ippFree(pBuffer);
}
Do I really need to address the zero index peak with mean reduction?
For low frequency signal analysis a small bias can really interfere (especially due to spectral leakage). For sake of illustration, consider the following low-frequency signal tone and another one with a constant bias tone_with_bias:
fs = 1;
f0 = 0.15;
for (int i = 0; i < N; i++)
{
tone[i] = 0.001*cos(2*M_PI*i*f0/fs);
tone_with_bias[i] = 1 + tone[i];
}
If we plot the frequency spectrum of a 100 sample window of these signals, you should notice that the spectrum of tone_with_bias completely drowns out the spectrum of tone:
So yes it's better if you can remove that bias. However, it should be emphasized that this is possible provided that you know the nature of the bias. If you know that the bias is indeed a constant, removing it will reveal the low-frequency component. Otherwise, removing the mean from the signal may not achieve the desired effect if the bias is just a very low-frequency component.
Where does this scale of 10 come from?
Scaling of the magnitude by the FFT should be expected. As described in this answer, a real sinusoid comes out scaled by approximately 0.5*N (where N is the FFT size). If you were processing a small chunk of 20 samples, you would get exactly such a factor of 10. If you scale the output of the FFT by 2/N (or equivalently scale by 2 while also using the IPP_FFT_DIV_FWD_BY_N flag), you should get results with magnitudes similar to those of the time-domain signal.
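To illustrate the 2/N factor without IPP, here is a rough sketch that evaluates a single DFT bin directly and rescales it. The function name scaledBinMagnitude is hypothetical. The 2/N factor applies to bins strictly between DC and Nyquist, and to a sinusoid that completes a whole number of cycles in the frame.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Magnitude of DFT bin k of a real signal, rescaled by 2/N so that a sinusoid
// of amplitude A centred on bin k reads back as approximately A.
double scaledBinMagnitude(const std::vector<double>& x, std::size_t k) {
    const double pi = std::acos(-1.0);
    const std::size_t n = x.size();
    std::complex<double> sum(0.0, 0.0);
    for (std::size_t t = 0; t < n; ++t) {
        const double angle = -2.0 * pi * double(k) * double(t) / double(n);
        sum += x[t] * std::complex<double>(std::cos(angle), std::sin(angle));
    }
    // std::abs(sum) alone is about A * N / 2, i.e. 10 * A for N = 20.
    return 2.0 * std::abs(sum) / double(n);
}
```

For a 20-sample frame holding a cosine of amplitude 29, the unscaled magnitude is about 290 and the rescaled value is about 29, which matches the numbers in the question.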
I am currently creating a C code, which takes as an input a wav file (specifically just one channel of the original wav file), and it performs the short-time Fourier transform.
The main part of the code is this one:
stft_data = (fftw_complex*)(fftw_malloc(sizeof(fftw_complex)*windowSize));
fft_result= (fftw_complex*)(fftw_malloc(sizeof(fftw_complex)*windowSize));
storage = (fftw_complex*)(fftw_malloc(sizeof(fftw_complex)*storage_capacity));
//define the fftw plan
fftw_plan plan_forward;
plan_forward = fftw_plan_dft_1d(windowSize, stft_data, fft_result, FFTW_FORWARD, FFTW_ESTIMATE);
//integer indexes
int i,counter ;
counter = 0 ;
//create a Hamming window
double hamming_result[windowSize];
hamming(windowSize, hamming_result);
//implement the stft position indexes
int chunkPosition = 0; //actual chunk position
int readIndex ; //read the index of the wav file
while (chunkPosition < wav_length ){
//read the window
for(i=0; i<windowSize; i++){
readIndex = chunkPosition + i;
if (readIndex < wav_length){
stft_data[i] = wav_data[readIndex]*hamming_result[i]*_Complex_I + 0.0*I;
}
else{
//if we are beyond the wav_length
stft_data[i] = 0.0*_Complex_I + 0.0*I;//padding
break;
}
}
//compute the fft
fftw_execute(plan_forward);
//store the stft in a data structure
for (i=0; i<windowSize;i++)
{
//printf("RE: %.2f IM: %.2f\n", creal(fft_result[i]),cimag(fft_result[i]));
storage[counter] = creal(fft_result[i]) + cimag(fft_result[i]);
counter+=1;
}
//update indexes
chunkPosition += hop_size;
printf("Chunk Position %d\n", chunkPosition);
printf("Counter position %d\n", counter);
printf("Fourier transform done\n");
}
Once the FFT of the selected window has been computed, I store its real and imaginary parts in the storage variable.
After that I would like to compute the cross-correlation between the data points of each of the N windows I end up with.
As an example, I would like to compute the correlation between the first data point of the first window (storage[0]) and the first element of the second window (storage[windowSize+1]).
However, I am running into problems and do not get reasonable values. According to what I have studied, correlation in Fourier space is just the complex multiplication of two Fourier terms. Thus,
what I am doing is something like:
correlation = storage[0]*conj(storage[windowSize+1]);
However, I get huge values, which makes me wonder whether I am really computing a correlation.
Where am I going wrong?
How should I scale my correlation results?
How should I compute the correlation with the Fourier values?
And then, how should I plot the Fourier values I get from the FFTW3 calculations? Should I shift all the values, or are they already shifted?
Thanks very much
The line storage[counter] = creal(fft_result[i]) + cimag(fft_result[i]); makes storage purely real. Since computing correlation = storage[0]*conj(storage[windowSize+1]); is the next step in the computation of the cross correlation, there is a problem. Indeed, there is no point in conjugating a real number.
Trying storage[counter] = fft_result[i]; could partly resolve the issue.
In addition, correlation = storage[0]*conj(storage[windowSize+1]); should be modified to correlation = storage[0]*conj(storage[windowSize]);
By performing correlation = storage[0]*conj(storage[windowSize]);, the value at index [0] of the DFT of the correlation is obtained. Indeed, storage[0] corresponds to the average of the first frame, while storage[windowSize] corresponds to the average of the second frame. These values are not equal to the averages but much larger, since each is scaled by the frame length windowSize.
To compute the correlation between the two signals, the next step should be:
for (i=0; i<windowSize;i++)
{
dftofcorrelation[i] = storage[i]*conj(storage[i+windowSize]);
}
Then, the inverse DFT must be applied to the array dftofcorrelation to get the correlation as an array. It must be kept in mind that neither the forward nor the backward DFT of FFTW include any scaling, see what FFTW really computes:
fftw_execute(plan_backward);
If two scalars are to be retained from this correlation array, they are its maximum (high if the signals are similar up to a delay) and the index of that maximum, which is the estimated time offset between the two signals.
The overall scaling induced by FFTW is a power of windowSize (windowSize^3?). It can be checked by computing the autocorrelation of a uniform signal (which is uniform).
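The whole chain (forward DFT of both frames, bin-by-bin product with the conjugate, backward DFT) can be sketched as follows. This is not FFTW code. A naive DFT with the same sign and no-scaling conventions stands in for the plans, and the names dft and circularCrossCorrelation are made up for the example. Dividing by the frame length after the unscaled backward transform gives the plain circular correlation sum over t of a[t+m]*b[t].

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive unscaled DFT: sign = -1 is the forward transform, sign = +1 the backward one.
std::vector<cd> dft(const std::vector<cd>& in, int sign) {
    const double pi = std::acos(-1.0);
    const std::size_t n = in.size();
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cd sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = sign * 2.0 * pi * double(k) * double(t) / double(n);
            sum += in[t] * cd(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}

// Circular cross-correlation of two real frames of equal length:
// result[m] = sum over t of a[(t + m) mod N] * b[t].
std::vector<double> circularCrossCorrelation(const std::vector<double>& a,
                                             const std::vector<double>& b) {
    const std::size_t n = a.size();  // assumes b.size() == a.size()
    const std::vector<cd> fa = dft(std::vector<cd>(a.begin(), a.end()), -1);
    const std::vector<cd> fb = dft(std::vector<cd>(b.begin(), b.end()), -1);
    std::vector<cd> product(n);
    for (std::size_t i = 0; i < n; ++i) {
        product[i] = fa[i] * std::conj(fb[i]);
    }
    const std::vector<cd> back = dft(product, +1);
    std::vector<double> corr(n);
    for (std::size_t m = 0; m < n; ++m) {
        corr[m] = back[m].real() / double(n);  // undo the factor N left by the unscaled backward DFT
    }
    return corr;
}
```

If a is a copy of b circularly delayed by d samples, the largest entry of the result should sit at index d.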
I am using FFTW to create a spectrum analyzer in C++.
After applying any window function to an input signal, the output amplitude suddenly seems to scale with frequency.
Rectangular Window
Exact-Blackman
Graphs are scaled logarithmically, with a sampling frequency of 44100 Hz. All harmonics are generated at the same level, peaking at 0 dB as seen in the rectangular case. The Exact-Blackman window was amplified by 7.35 dB to attempt to make up for processing gain.
Here is my code for generating the input table...
freq = 1378.125f;
for (int i = 0; i < FFT_LOGICAL_SIZE; i++)
{
float term = 2 * PI * i / FFT_ORDER;
for (int h = 1; freq * h < FREQ_NYQST; h+=1) // Harmonics up to Nyquist
{
fftInput[i] += sinf(freq * h * K_PI * i / K_SAMPLE_RATE); // Generate sine
fftInput[i] *= (7938 / 18608.f) - ((9240 / 18608.f) * cosf(term)) + ((1430 / 18608.f) * cosf(term * 2)); // Exact-Blackman window
}
}
fftwf_execute(fftwR2CPlan);
Increasing or decreasing the window size changes nothing. I tested with the Hamming window as well, same problem.
Here is my code for grabbing the output.
float val; // Used elsewhere
for (int i = 1; i < K_FFT_COMPLEX_BINS_NOLAST; i++) // Skips the DC and Nyquist bins
{
real = fftOutput[i][0];
complex = fftOutput[i][1];
// Grabs the values and scales based on the window size
val = sqrtf(real * real + complex * complex) / FFT_LOGICAL_SIZE_OVER_2;
val *= powf(20, 7.35f / 20); // Only applied during Exact-Blackman test
}
Curiously, I attempted the following to try to flatten out the response in the Exact-Blackman case. This scaling back down resulted in a nearly, but still not perfectly flat response. Neat, but still doesn't explain to me why this is happening.
float x = (float)(FFT_COMPLEX_BINS - i) / FFT_COMPLEX_BINS; // Linear from 0 to 1
x = log10f((x * 9) + 1.3591409f); // Now logarithmic from 0 to 1, offset by half of Euler's constant
val = sqrt(real * real + complex * complex) / (FFT_LOGICAL_SIZE_OVER_2 / x); // Division by x added to this line
Might be a bug. You seem to be applying your window function multiple times per sample. Any windowing should be removed from your input compositing loop and applied to the input vector just once, right before the FFT.
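A minimal sketch of that restructuring, with the harmonics summed first and the window multiplied in once per sample afterwards. The function name buildWindowedHarmonics and its parameters are invented for the example, the Exact-Blackman coefficients are the ones from the question, and the sine argument uses 2*pi*f*i/fs, which is an assumption about what K_PI is meant to hold.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sum all harmonics below Nyquist first, then apply the window exactly once per sample.
std::vector<float> buildWindowedHarmonics(std::size_t n, double sampleRate, double fundamental) {
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<float> input(n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        double sample = 0.0;
        for (int h = 1; fundamental * h < sampleRate / 2.0; ++h) {
            sample += std::sin(twoPi * fundamental * h * double(i) / sampleRate);
        }
        // Exact-Blackman window, evaluated outside the harmonic loop.
        const double term = twoPi * double(i) / double(n);
        const double window = (7938.0 / 18608.0)
                            - (9240.0 / 18608.0) * std::cos(term)
                            + (1430.0 / 18608.0) * std::cos(2.0 * term);
        input[i] = static_cast<float>(sample * window);
    }
    return input;
}
```

With bin-centred harmonics spaced several bins apart, every harmonic should then come out at the same magnitude, about (7938/18608)*N/2 before any further scaling.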
I was not able to reproduce this because I do not have the library on hand. However, it may be a consequence of spectral leakage: https://en.wikipedia.org/wiki/Spectral_leakage
This is an inevitable consequence of window functions as well as of sampling. If you look at the trade-offs section of that article, a window can be chosen to suit a wide range of frequencies or to focus on a particular one. Since the frequency of your signal is increasing, perhaps the lower-frequency content outside your target is more subject to spectral leakage.
I am using fftw to analyse the frequency spectrum of audio arriving at the computer's mic input. I use the portaudio C++ libraries to capture windows of time-domain audio data, and then fftw to do a real-to-complex (r2c) transformation of this data to the frequency domain. Below is the function I call every time I receive a block of data.
The sample rate is 44100 samples per second, the sample type is short (signed 16-bit integer), and I take 250 ms blocks of data in each window. The FFT resolution is therefore 4 Hz.
The problem is that I'm not sure how to interpret the data I receive after the transformation. When no audio is played, I get amplitudes of around 1000 to 4000 for every frequency component; as soon as audio is played, from an instrument for example, all of the amplitudes go negative.
I have tried normalising before the FFT by dividing by the average power, and then the data makes more sense. All amplitudes are between 200 and 500 when nothing is played; then, for example, if I play a tone of 76 Hz, the amplitude of that component increases to around 2000. So that is along the lines of what I expect, but I am still not sure whether this process can be implemented better.
My question is: am I doing the right thing here? Must the data be normalised, and am I doing it right? Why am I still getting high amplitudes at frequencies that are not being played? Does anyone have experience of doing something similar and can give some tips? Many thanks in advance.
void AudioProcessor::GetFFT(void* inputData, void* freqSpectrum)
{
double* input = (double*)inputData;
short* freq_spectrum = (short*)freqSpectrum;
fftPlan = fftw_plan_dft_r2c_1d(FRAMES_PER_BUFFER, input, complexOut, FFTW_ESTIMATE);
fftw_execute(fftPlan);
////
for (int k = 0; k < (FRAMES_PER_BUFFER + 1) / 2; ++k)
{
freq_spectrum[k] = (short)(sqrt(complexOut[k][0] * complexOut[k][0] + complexOut[k][1] * complexOut[k][1]));
}
if (FRAMES_PER_BUFFER % 2 == 0) /* frames per buffer is even number */
{
freq_spectrum[FRAMES_PER_BUFFER / 2] = (short)(sqrt(complexOut[FRAMES_PER_BUFFER / 2][0] * complexOut[FRAMES_PER_BUFFER / 2][0] + complexOut[FRAMES_PER_BUFFER / 2][1] * complexOut[FRAMES_PER_BUFFER / 2][1])); /* Nyquist freq. */
}
}
I'm trying to analyse the audio quality of a file by detecting the highest frequency present (compressed audio will generally be filtered to something less than 20KHz).
I'm reading WAV file data using a class from the soundstretch library which returns PCM samples as floats, then performing FFT on those samples with the fftw3 library. Then for each frequency (rounded to the nearest KHz), I am totalling up the amplitude for that frequency.
So for a low quality file that doesn't contain frequencies above 16KHz, I would expect there to be none or very little amplitude above 16KHz, however I'm not getting the results I would expect. Below is my code:
#include <iostream>
#include <math.h>
#include <fftw3.h>
#include <soundtouch/SoundTouch.h>
#include "include/WavFile.h"
using namespace std;
using namespace soundtouch;
#define BUFF_SIZE 6720
#define MAX_FREQ 22//KHz
static float freqMagnitude[MAX_FREQ];
static void calculateFrequencies(fftw_complex *data, size_t len, int Fs) {
for (int i = 0; i < len; i++) {
int re, im;
float freq, magnitude;
int index;
re = data[i][0];
im = data[i][1];
magnitude = sqrt(re * re + im * im);
freq = i * Fs / len;
index = freq / 1000;//round(freq);
if (index <= MAX_FREQ) {
freqMagnitude[index] += magnitude;
}
}
}
int main(int argc, char *argv[]) {
if (argc < 2) {
cout << "Incorrect args" << endl;
return -1;
}
SAMPLETYPE sampleBuffer[BUFF_SIZE];
WavInFile inFile(argv[1]);
fftw_complex *in, *out;
fftw_plan p;
in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * BUFF_SIZE);
out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * BUFF_SIZE);
p = fftw_plan_dft_1d(BUFF_SIZE, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
while (inFile.eof() == 0) {
size_t samplesRead = inFile.read(sampleBuffer, BUFF_SIZE);
for (int i = 0; i < BUFF_SIZE; i++) {
in[i][0] = (double) sampleBuffer[i];
}
fftw_execute(p); /* repeat as needed */
calculateFrequencies(out, samplesRead, inFile.getSampleRate());
}
for (int i = 0; i < MAX_FREQ; i += 2) {
cout << i << "KHz magnitude: " << freqMagnitude[i] << std::endl;
}
fftw_destroy_plan(p);
fftw_free(in);
fftw_free(out);
}
Can compile with: - (you'll need the soundtouch library and fftw3 library)
g++ -g -Wall MP3.cpp include/WavFile.cpp -lfftw3 -lm -lsoundtouch -I/usr/local/include -L/usr/local/lib
And here is the spectral analysis of the file I am testing on:
As you can see it's clipped at 16KHz, however my results are as follows:
0KHz magnitude: 4.61044e+07
2KHz magnitude: 5.26959e+06
4KHz magnitude: 4.68766e+06
6KHz magnitude: 4.12703e+06
8KHz magnitude: 12239.6
10KHz magnitude: 456
12KHz magnitude: 3
14KHz magnitude: 650468
16KHz magnitude: 1.83266e+06
18KHz magnitude: 1.40232e+06
20KHz magnitude: 1.1477e+06
I would expect there to be no amplitude over 16KHz, am I doing this right?
Is my calculation for frequency correct? (I robbed it off another stackoverflow answer)
Could it be something to do with there being 2 channels and I'm not separating the channels?
Cheers for any help guys.
You are likely measuring the interleave difference between two stereo channels, which can include high frequencies due to unequal mix and pan. Try again with the channels separated or mixed down to mono, and use a smooth window function to reduce FFT aperture edge artifacts, which can also introduce a small amount of high frequency noise due to your rectangular window.
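A rough sketch of both suggestions follows. It assumes the samples arrive interleaved as L, R, L, R, which is worth verifying for the WAV reader in use. The function names mixToMono and applyHannWindow are illustrative, and the Hann window is just one common choice of smooth window.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Mix interleaved stereo (L, R, L, R, ...) down to mono by averaging each frame.
std::vector<double> mixToMono(const std::vector<float>& interleaved) {
    std::vector<double> mono(interleaved.size() / 2);
    for (std::size_t i = 0; i < mono.size(); ++i) {
        mono[i] = 0.5 * (double(interleaved[2 * i]) + double(interleaved[2 * i + 1]));
    }
    return mono;
}

// Multiply the block by a Hann window to taper its edges before the FFT.
void applyHannWindow(std::vector<double>& block) {
    const double pi = std::acos(-1.0);
    const std::size_t n = block.size();
    if (n < 2) return;
    for (std::size_t i = 0; i < n; ++i) {
        block[i] *= 0.5 * (1.0 - std::cos(2.0 * pi * double(i) / double(n - 1)));
    }
}
```

The mono block, windowed once, is then what goes into the real part of the FFT input, with the imaginary part set to zero.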
A fundamental requirement of the FFT is that the samples are equally spaced in time and belong to one coherent series.
In your case a stereo signal supplies the FFT algorithm with double the number of samples, uncorrelated with one another. What is seen mathematically is the natural phase difference between the two channels but, more importantly, two adjacent samples that, being unrelated, can differ so much that they wrongly represent a square wave (in the time domain this would appear as an extremely high signal slew rate).
As a solution, you have to separate the two channels and perform the FFT on a single series of samples, or run two separate FFTs.
I don't think there is an aliasing problem, because aliasing is normally related to the sampling process and is prevented with an analog filter whose passband lies below half the sampling frequency (a Nyquist or anti-alias filter). If this filtering is missing, there is almost no way to remove the ghosts (alias spectrum) afterwards.
I speak as someone with very slight real-world experience and book learning over a decade ago so this answer might be evidence of a little knowledge being a dangerous thing but I think the problem you're seeing is just aliasing.
Imagine a perfect square wave. You've never heard a perfect square wave because it would require a sound source instantly to transition from one position to another, while still pushing air particles about.
You also can't describe a square wave with a finite number of harmonics. However, you can trivially describe a square wave with any frequency of PCM audio. Therefore any source PCM audio can appear to contain an infinite number of harmonics.
What you can probably do is just sit atop Nyquist and say that if the input audio is sampled at N Hz then the highest-frequency part that can be actual signal is at N/2 Hz; therefore you can resample the input wave down to twice the first frequency at or below N/2 Hz that shows significant signal without losing meaningful content.