FFT window causing unequal amplification across frequency spectrum - c++

I am using FFTW to create a spectrum analyzer in C++.
After applying any window function to an input signal, the output amplitude suddenly seems to scale with frequency.
Rectangular Window
Exact-Blackman
Graphs are scaled logarithmically with a sampling frequency of 44100 Hz. All harmonics are generated at the same level, peaking at 0 dB, as seen in the rectangular case. The Exact-Blackman output was amplified by 7.35 dB to attempt to make up for the window's processing gain.
Here is my code for generating the input table...
freq = 1378.125f;
for (int i = 0; i < FFT_LOGICAL_SIZE; i++)
{
    float term = 2 * PI * i / FFT_ORDER;
    for (int h = 1; freq * h < FREQ_NYQST; h += 1) // Harmonics up to Nyquist
    {
        fftInput[i] += sinf(freq * h * K_PI * i / K_SAMPLE_RATE); // Generate sine
        fftInput[i] *= (7938 / 18608.f) - ((9240 / 18608.f) * cosf(term)) + ((1430 / 18608.f) * cosf(term * 2)); // Exact-Blackman window
    }
}
fftwf_execute(fftwR2CPlan);
Increasing or decreasing the window size changes nothing. I tested with the Hamming window as well, same problem.
Here is my code for grabbing the output.
float val; // Used elsewhere
for (int i = 1; i < K_FFT_COMPLEX_BINS_NOLAST; i++) // Skips the DC and Nyquist bins
{
    real = fftOutput[i][0];
    complex = fftOutput[i][1];
    // Grabs the values and scales based on the window size
    val = sqrtf(real * real + complex * complex) / FFT_LOGICAL_SIZE_OVER_2;
    val *= powf(10.f, 7.35f / 20.f); // Convert 7.35 dB to a linear gain; only applied during the Exact-Blackman test
}
Curiously, I attempted the following to try to flatten out the response in the Exact-Blackman case. Scaling back down like this resulted in a nearly, but still not perfectly, flat response. Neat, but it still doesn't explain to me why this is happening.
float x = (float)(FFT_COMPLEX_BINS - i) / FFT_COMPLEX_BINS; // Linear from 0 to 1
x = log10f((x * 9) + 1.3591409f); // Roughly logarithmic from 0 to 1, offset by half of Euler's number e
val = sqrt(real * real + complex * complex) / (FFT_LOGICAL_SIZE_OVER_2 / x); // Division by x added to this line

This might be a bug: you seem to be applying your window function multiple times per sample, once per harmonic, inside the inner loop. Any windowing should be removed from your input compositing loop and applied to the input vector just once, after the harmonics are summed and right before the FFT.
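For illustration, a minimal sketch of that restructuring, reusing the constants and the sine expression from the question unchanged (PI, K_PI, FFT_LOGICAL_SIZE, FFT_ORDER, FREQ_NYQST and K_SAMPLE_RATE are assumed to be defined as in the original code):
// Sum the harmonics first, with no windowing inside the loop
for (int i = 0; i < FFT_LOGICAL_SIZE; i++)
{
    fftInput[i] = 0.0f;
    for (int h = 1; freq * h < FREQ_NYQST; h++)
        fftInput[i] += sinf(freq * h * K_PI * i / K_SAMPLE_RATE);
}
// Then apply the Exact-Blackman window exactly once per sample
for (int i = 0; i < FFT_LOGICAL_SIZE; i++)
{
    float term = 2 * PI * i / FFT_ORDER;
    fftInput[i] *= (7938 / 18608.f) - ((9240 / 18608.f) * cosf(term)) + ((1430 / 18608.f) * cosf(term * 2));
}
fftwf_execute(fftwR2CPlan);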

I was not able to reproduce the code because I do not have the library on hand. However, this may be a consequence of spectral leakage. https://en.wikipedia.org/wiki/Spectral_leakage
This is an inevitability of window functions as well as of sampling. If you look at the trade-offs section of that article, the type of window can either be adaptive over a wide range of frequencies or focused on a particular one. Since the frequency of your signal is increasing, perhaps the lower-frequency content outside your target is more subject to spectral leakage.

Related

Intel integrated performance primitives Fourier Transform magnitudes

When I use Intel IPP's ippsFFTFwd_RToCCS_64f and then ippsMagnitude_64fc, I get a massive peak at index zero of the magnitudes array.
My sine wave is long, and the main component I am interested in is somewhere between 0.15 Hz and 0.25 Hz. I sample it at a 500 Hz sampling frequency. If I subtract the mean from the signal before the FFT, I no longer get that peak but a really small zero component. Below is a pic of the head of the magnitudes array:
Also, the magnitude scaling seems to be 10 times the amplitude I see in the time series of the signal, e.g. if the amplitude is 29, in the magnitudes it is 290.
I am not sure why this is so, and my questions are: 1. Do I really need to address the zero-index peak with mean subtraction? 2. Where does this scale of 10 come from?
void CalculateForwardTransform(array<double> ^signal, array<double> ^transformedSignal, array<double> ^magnitudes)
{
    // source signal
    pin_ptr<double> pinnedSignal = &signal[0];
    double *pSignal = pinnedSignal;
    int order = (int)Math::Round(Math::Log(signal->Length, 2));
    // get sizes
    int sizeSpec = 0, sizeInit = 0, sizeBuf = 0;
    int status = ippsFFTGetSize_R_64f(order, IPP_FFT_DIV_INV_BY_N, ippAlgHintNone, &sizeSpec, &sizeInit, &sizeBuf);
    // memory allocation
    IppsFFTSpec_R_64f* pSpec;
    Ipp8u *pSpecMem = (Ipp8u*)ippMalloc(sizeSpec);
    Ipp8u *pMemInit = (Ipp8u*)ippMalloc(sizeInit);
    // FFT specification structure initialized
    status = ippsFFTInit_R_64f(&pSpec, order, IPP_FFT_DIV_INV_BY_N, ippAlgHintNone, pSpecMem, pMemInit);
    // transform
    pin_ptr<double> pinnedTransformedSignal = &transformedSignal[0];
    double *pDst = pinnedTransformedSignal;
    Ipp8u *pBuffer = (Ipp8u*)ippMalloc(sizeBuf);
    status = ippsFFTFwd_RToCCS_64f(pSignal, pDst, pSpec, pBuffer);
    // get magnitudes
    pin_ptr<double> pinnedMagnitudes = &magnitudes[0];
    double *pMagn = pinnedMagnitudes;
    status = ippsMagnitude_64fc((Ipp64fc*)pDst, pMagn, magnitudes->Length); // magnitudes is half of signal len
    // free memory
    ippFree(pSpecMem);
    ippFree(pMemInit);
    ippFree(pBuffer);
}
Do I really need to address the zero-index peak with mean subtraction?
For low-frequency signal analysis, a small bias can really interfere (especially due to spectral leakage). For the sake of illustration, consider the following low-frequency signal tone, and another one with a constant bias, tone_with_bias:
const double fs = 1;
const double f0 = 0.15;
for (int i = 0; i < N; i++)
{
    tone[i] = 0.001*cos(2*M_PI*i*f0/fs);
    tone_with_bias[i] = 1 + tone[i];
}
If we plot the frequency spectrum of a 100-sample window of these signals, you should notice that the spectrum of tone_with_bias completely drowns out the spectrum of tone:
So yes, it's better if you can remove that bias. However, it should be emphasized that this is only possible provided that you know the nature of the bias. If you know that the bias is indeed a constant, removing it will reveal the low-frequency component. Otherwise, removing the mean from the signal may not achieve the desired effect if the bias is itself a very low-frequency component.
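A minimal sketch of that mean subtraction, reusing pSignal and signal->Length from the posted function and placed before the ippsFFTFwd_RToCCS_64f call:
// Estimate the constant bias as the sample mean and subtract it
double mean = 0.0;
for (int i = 0; i < signal->Length; i++)
    mean += pSignal[i];
mean /= signal->Length;
for (int i = 0; i < signal->Length; i++)
    pSignal[i] -= mean; // the zero-index (DC) magnitude should now be close to zero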
Where does this scale of 10 come from?
Scaling of the magnitude by approximately 0.5*N (where N is the FFT size) should be expected from the FFT, as described in this answer. If you were processing a small chunk of 20 samples, then you would get exactly such a factor-of-10 scaling. If you scale the output of the FFT by 2/N (or, equivalently, scale by 2 while also using the IPP_FFT_DIV_FWD_BY_N flag), you should get results with magnitudes similar to those of the time-domain signal.
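A minimal sketch of that post-scaling, e.g. right after the ippsMagnitude_64fc call in the posted function (this is plain arithmetic, not an IPP call):
// Scale magnitudes by 2/N so a sine of time-domain amplitude A shows up as roughly A
int N = signal->Length; // FFT length used above
for (int k = 0; k < magnitudes->Length; k++)
    magnitudes[k] *= 2.0 / N;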

Fast, good quality pixel interpolation for extreme image downscaling

In my program, I am downscaling an image of 500px or larger to an extreme level of approx 16px-32px. The source image is user-specified so I do not have control over its size. As you can imagine, few pixel interpolations hold up and inevitably the result is heavily aliased.
I've tried bilinear, bicubic and square average sampling. The square average sampling actually provides the most decent results but the smaller it gets, the larger the sampling radius has to be. As a result, it gets quite slow - slower than the other interpolation methods.
I have also tried an adaptive square average sampling so that the smaller it gets the greater the sampling radius, while the closer it is to its original size, the smaller the sampling radius. However, it produces problems and I am not convinced this is the best approach.
So the question is: What is the recommended type of pixel interpolation that is fast and works well on such extreme levels of downscaling?
I do not wish to use a library so I will need something that I can code by hand and isn't too complex. I am working in C++ with VS 2012.
Here's some example code I've tried, as requested (hopefully without errors from my pseudo-code cut and paste). This performs a 7x7 average downscale; although it gives a better result than bilinear or bicubic interpolation, it also takes quite a performance hit:
// Sizing control
ctl(0): "Resize",Range=(0,800),Val=100
// Variables
float fracx,fracy;
int Xnew,Ynew,p,q,Calc;
int x,y,z,p1,q1,i,j;
//New image dimensions
Xnew=image->width*ctl(0)/100;
Ynew=image->height*ctl(0)/100;
for (y=0; y<image->height; y++){ // rows
    for (x=0; x<image->width; x++){ // columns
        p1=(int)x*image->width/Xnew;
        q1=(int)y*image->height/Ynew;
        for (z=0; z<3; z++){ // channels
            Calc=0; // reset accumulator for this pixel/channel
            for (i=-3;i<=3;i++) {
                for (j=-3;j<=3;j++) {
                    Calc += (int)(src(p1-i,q1-j,z));
                } //j
            } //i
            Calc /= 49;
            pset(x, y, z, Calc);
        } // channels
    } // columns
} // rows
Thanks!
The first point is to use pointers to your data. Never use indexed access at every pixel: when you write src(p1-i,q1-j,z) or pset(x, y, z, Calc), how much computation is being done behind the scenes? Use pointers to the data and manipulate those directly.
Second: your algorithm is wrong. You don't want an average filter; you want to lay a grid over your source image and, for every grid cell, compute the average of the source pixels it covers and put it in the corresponding pixel of the output image.
The specific solution should be tailored to your data representation, but it could be something like this:
// requires <vector>, <algorithm>, <functional>, <cstring>, <cstdint>
std::vector<uint32_t> accum(Xnew);
std::vector<uint32_t> count(Xnew);
uint32_t *paccum, *pcount;
uint8_t* pin = /*pointer to input data*/;
uint8_t* pout = /*pointer to output data*/;
for (int dr = 0, sr = 0, w = image->width, h = image->height; sr < h; ++dr) {
    memset(paccum = accum.data(), 0, Xnew * sizeof(uint32_t));
    memset(pcount = count.data(), 0, Xnew * sizeof(uint32_t));
    while (sr * Ynew / h == dr) {
        paccum = accum.data();
        pcount = count.data();
        for (int dc = 0, sc = 0; sc < w; ++sc) {
            *paccum += *pin; // accumulate this source pixel into the current output cell
            *pcount += 1;
            ++pin;
            if (sc * Xnew / w > dc) { // moved into the next output column
                ++dc;
                ++paccum;
                ++pcount;
            }
        }
        sr++;
    }
    // divide each cell's sum by its pixel count and write one output row
    std::transform(begin(accum), end(accum), begin(count), pout, std::divides<uint32_t>());
    pout += Xnew;
}
This was written using my own library (still in development) and it seems to work, but I later changed the variable names to make it simpler here, so I don't guarantee anything!
The idea is to keep a local buffer of 32-bit ints which can hold the partial sums of all the source pixels in the rows that fall into one row of the output image. Then you divide each sum by its cell count and write the result to the output image.
The first thing you should do is to set up a performance evaluation system to measure how much any change impacts performance.
As said previously, you should not use indexes but pointers, for a (probably) substantial speed-up, and you should not simply average, since a basic averaging of pixels is essentially a blur filter.
I would highly advise you to rework your code to use "kernels". A kernel is the matrix giving the weight of each pixel used. That way, you will be able to test different strategies and optimize for quality.
Example of kernels:
https://en.wikipedia.org/wiki/Kernel_(image_processing)
Upsampling/downsampling kernel:
http://www.johncostella.com/magic/
Note: from the posted code you are actually applying a 7x7 box kernel, i.e. a 7x7 matrix of ones scaled by 1/49. The equivalent 3x3 kernel would be:
[1 1 1]
[1 1 1] * 1/9
[1 1 1]
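For illustration only, here is a minimal sketch of applying such a normalized n x n kernel to one output pixel; the single-channel float buffer src (width x height, row-major) and the helper name applyKernel are hypothetical, not part of the original code:
#include <algorithm> // std::min, std::max

// Apply an n x n kernel (weights sum to 1) centered on (cx, cy)
float applyKernel(const float* src, int width, int height,
                  int cx, int cy, const float* kernel, int n)
{
    const int r = n / 2;
    float acc = 0.0f;
    for (int ky = -r; ky <= r; ++ky)
        for (int kx = -r; kx <= r; ++kx)
        {
            // Clamp to the image border so we never read out of bounds
            int sx = std::min(std::max(cx + kx, 0), width - 1);
            int sy = std::min(std::max(cy + ky, 0), height - 1);
            acc += src[sy * width + sx] * kernel[(ky + r) * n + (kx + r)];
        }
    return acc;
}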

fftw analysing frequencies from mic input on pc

I am using FFTW to analyse the frequency spectrum of audio input to a computer from the mic input. I am using the PortAudio C++ libraries to capture windows of time-domain audio data and then FFTW to do a real-to-complex (r2c) transformation of this data to the frequency domain. Below is the function which I call every time I receive a block of data.
The sample rate is 44100 samples per second, the sample type is short (signed 16-bit integer), and I am taking 250 ms blocks of data in each window. The FFT resolution is therefore 4 Hz.
The problem is, I'm not sure how to interpret the data which I am receiving after the transformation. When no audio is played, I am getting amplitudes of around 1000 to 4000 for every frequency component; as soon as audio is played, from an instrument for example, all of the amplitudes go negative.
I have tried doing a normalisation before the FFT, by dividing by the average power, and then the data makes more sense. All amplitudes are from 200 to 500 when nothing is played; then, for example, if I play a tone of 76 Hz, the amplitude of this component increases to around 2000. So that is something along the lines of what I expect, but I'm still not sure if this process can be implemented better.
My question is, am I doing the right thing here? Must the data be normalised, and am I doing it right? Why am I still receiving high amplitudes on the frequencies that are not being played? Has anyone any experience of doing something similar who could maybe give some tips? Many thanks in advance.
void AudioProcessor::GetFFT(void* inputData, void* freqSpectrum)
{
    double* input = (double*)inputData;
    short* freq_spectrum = (short*)freqSpectrum;
    fftPlan = fftw_plan_dft_r2c_1d(FRAMES_PER_BUFFER, input, complexOut, FFTW_ESTIMATE);
    fftw_execute(fftPlan);
    for (int k = 0; k < (FRAMES_PER_BUFFER + 1) / 2; ++k)
    {
        freq_spectrum[k] = (short)(sqrt(complexOut[k][0] * complexOut[k][0] + complexOut[k][1] * complexOut[k][1]));
    }
    if (FRAMES_PER_BUFFER % 2 == 0) /* frames per buffer is even number */
    {
        freq_spectrum[FRAMES_PER_BUFFER / 2] = (short)(sqrt(complexOut[FRAMES_PER_BUFFER / 2][0] * complexOut[FRAMES_PER_BUFFER / 2][0] + complexOut[FRAMES_PER_BUFFER / 2][1] * complexOut[FRAMES_PER_BUFFER / 2][1])); /* Nyquist freq. */
    }
}
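For reference, a minimal sketch of the normalisation described above (dividing the block by its average power), reusing input and FRAMES_PER_BUFFER from the posted function and placed just before fftw_execute; whether this is the best normalisation is exactly what the question asks:
// Divide the time-domain block by its average power before the FFT (illustrative sketch)
double avgPower = 0.0;
for (int n = 0; n < FRAMES_PER_BUFFER; ++n)
    avgPower += input[n] * input[n];
avgPower /= FRAMES_PER_BUFFER; // mean square of the block
if (avgPower > 0.0)
{
    for (int n = 0; n < FRAMES_PER_BUFFER; ++n)
        input[n] /= avgPower;
}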

reconstruct signal from fft

I'm reconstructing a signal from the amplitude, frequency and phase obtained from an FFT. After I do the FFT, I pick some of its frequencies and reconstruct the time-domain signal from those FFT data. I know the IFFT is for this, but I don't want to use the IFFT.
Reconstruction seems fine, but there's some time lag between the two signals. This image shows the problem: the black one is the original signal and the red one is the reconstruction.
If I understand correctly, the amplitude of frequency bin t is sqrt(real[t]*real[t] + imag[t]*imag[t]) and the phase is atan2(imag[t], real[t]).
So, I used the formula amplitude * cos(2*π*x / frequency + phase) for each frequency bin, and I summed those regenerated waves. As far as I know, this should reproduce a signal that fits the original. But it always ends up with some time lag from the original signal.
Yeah, I think it's about phase, but that's so simple to calculate and it's working correctly. If it had an error, the reconstructed signal would not fit the original signal in shape.
This is the code to generate a cosine wave. I generate the cosine from sin(x + π/2).
std::vector<short> encodeSineWavePCM(
    const double frequency,
    const double amplitude,
    const double offSetPhase)
{
    const double pi = 3.1415926535897932384626;
    const int N = 44100; // 1 sec length wave
    std::vector<short> s(N);
    const double wavelength = 1.0 * N / frequency;
    const double unitlength = 2 * pi / wavelength;
    for (int i = 0; i < N; i++) {
        double val = sin(offSetPhase + i * unitlength);
        val *= amplitude;
        s[i] = (short)val;
    }
    return s;
}
What am I missing?
Quite normal. You're doing a frame-by-frame transform, which means each FFT frame is produced only after a full time frame has been collected. When you transform back, you get the inverse effect: the reconstructed time frame starts after the FFT frame has been decoded, which shows up as the time lag you see.

Linear sampled Gaussian blur quality issue

I recently implemented a linear sampled gaussian blur based on this article: Linear Sampled Gaussian Blur
It generally came out well; however, it appears there is slight aliasing on text and thinner collections of pixels. I'm pretty stumped as to what is causing this. Is it an issue with my shader or weight calculations, or is it an inherent drawback of using this method?
I'd like to add that I don't run into this issue when I sample each pixel regularly instead of using bilinear filtering.
Any insights are much appreciated. Here's a code sample of how I work out my weights:
int support = int(sigma * 3.0f);
float total = 0.0f;
weights.push_back(exp(-(0*0)/(2*sigma*sigma))/(sqrt(2*constants::pi)*sigma));
total += weights.back();
offsets.push_back(0);
for (int i = 1; i <= support; i++)
{
    float w1 = exp(-(i*i)/(2*sigma*sigma))/(sqrt(2*constants::pi)*sigma);
    float w2 = exp(-((i+1)*(i+1))/(2*sigma*sigma))/(sqrt(2*constants::pi)*sigma);
    weights.push_back(w1 + w2);
    total += 2.0f * weights[i];
    offsets.push_back((i * w1 + (i + 1) * w2) / weights[i]);
}
for (int i = 0; i <= support; i++) // normalize all support+1 weights
{
    weights[i] /= total;
}
And here is the fragment shader (there is another vertical version of this shader too):
void main()
{
    vec3 acc = texture2D(tex_object, v_tex_coord.st).rgb * weights[0];
    for (int i = 1; i < NUM_SAMPLES; i++)
    {
        acc += texture2D(tex_object, (v_tex_coord.st + (vec2(offsets[i], 0.0) / tex_size))).rgb * weights[i];
        acc += texture2D(tex_object, (v_tex_coord.st - (vec2(offsets[i], 0.0) / tex_size))).rgb * weights[i];
    }
    gl_FragColor = vec4(acc, 1.0);
}
Here is a screenshot depicting the issue:
This looks like a correct gaussian blur to me. The extent to which text is disrupted depends on your sigma. What value are you using?
Also I would check the scaling matrix for the projection you are using.
If you want to blur without affecting text and thin pixel lines, you might think of:
compositing the result with the output of a mild high-pass filter
using a smaller sigma
changing the shape of the kernel so it's not Gaussian: rather than exp(-i*i/(s*s)), you might try a function with higher excess kurtosis. You could try a linear up/down (triangular) falloff (see the sketch below), or one of the functions listed on this page: http://en.wikipedia.org/wiki/Kurtosis . They will all lead to blurs that disrupt fine detail to varying degrees.
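As an illustration of that last option, a minimal sketch of building weights with a linear (triangular) falloff instead of the Gaussian one; support is assumed to be computed as in the question's code, and normalization would follow as before:
// Triangular (linear up/down) falloff: weight 1 at the center, tapering linearly towards zero
std::vector<float> triWeights;
for (int i = 0; i <= support; i++)
    triWeights.push_back(1.0f - float(i) / float(support + 1));
// Normalize afterwards (center counted once, the others twice), as in the original code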
This is an inherent issue with the bilinear filtering. It's unavoidable.