I am using CUDA's cuFFT to process data I receive from a hydrophone (500,000 integers per second at 250 Hz, high and low channels). Here is a basic example of how cuFFT works:
void runTest(int argc, char** argv)
{
printf("[1DCUFFT] is starting...\n");
cufftComplex* h_signal = (cufftComplex*)malloc(sizeof(cufftComplex)* SIGNAL_SIZE);
// Allocate host memory for the signal
//Complex* h_signal = (Complex*)malloc(sizeof(Complex) * SIGNAL_SIZE);
// Initialize the memory for the signal
for (unsigned int i = 0; i < SIGNAL_SIZE; ++i) {
h_signal[i].x = rand() / (float)RAND_MAX;
h_signal[i].y = 0;
}
int mem_size = sizeof(cufftComplex)* SIGNAL_SIZE;
// Allocate device memory for signal
cufftComplex* d_signal;
cudaMalloc((void**)&d_signal, mem_size);
// Copy host memory to device
cudaMemcpy(d_signal, h_signal, mem_size,
cudaMemcpyHostToDevice);
// CUFFT plan
cufftHandle plan;
// The transform size is the number of elements, not the size in bytes
cufftPlan1d(&plan, SIGNAL_SIZE, CUFFT_C2C, 1);
// Transform signal
printf("Transforming signal cufftExecC2C\n");
cufftExecC2C(plan, (cufftComplex *)d_signal, (cufftComplex *)d_signal, CUFFT_FORWARD);
// Transform signal back
printf("Transforming signal back cufftExecC2C\n");
cufftExecC2C(plan, (cufftComplex *)d_signal, (cufftComplex *)d_signal, CUFFT_INVERSE);
// Copy device memory to host
cufftComplex* h_inverse_signal = (cufftComplex*)malloc(sizeof(cufftComplex)* SIGNAL_SIZE);
cudaMemcpy(h_inverse_signal, d_signal, mem_size,
cudaMemcpyDeviceToHost);
for (int i = 0; i < SIGNAL_SIZE; i++){
h_inverse_signal[i].x = h_inverse_signal[i].x / (float)SIGNAL_SIZE;
h_inverse_signal[i].y = h_inverse_signal[i].y / (float)SIGNAL_SIZE;
printf("first : %f %f after %f %f \n", h_signal[i].x, h_signal[i].y, h_inverse_signal[i].x, h_inverse_signal[i].y);
}
//Destroy CUFFT context
cufftDestroy(plan);
// cleanup memory
free(h_signal);
free(h_inverse_signal);
cudaFree(d_signal);
cudaDeviceReset();
}
Now all I want to know is: how do I set the frequency of the FFT (cuFFT) to 250 Hz?
Thanks
James
You don't. The FFT of N points is the same, regardless of the frequency at which those N points were sampled.
Also, 500,000 integers per second is a 500,000 Hz sample rate, i.e. 500 kHz. That gives you a Nyquist limit of 250 kHz.
If I understand you right, you just need to know which element in the output vector is 250Hz.
The FFT gives you every frequency that the length and time resolution of your time vector can support.
The simple rules are:
- frequency range = 1/time resolution.
- frequency resolution = 1/time length.
In addition, the FFT of a real function (a time vector with no imaginary part) yields a symmetric spectrum with redundancy. The spectrum runs from -1/2 of the frequency range to +1/2 of the frequency range. The negative-frequency data can be discarded in the case of a real time vector. It's a little more complicated, though: the standard FFT output (often computed in place) gives you the positive frequencies first, then the negative frequencies. Since you are only interested in the positive frequencies, the second half of the FFT vector can be discarded. In your case, assuming you transform one second of data (500,000 points), just ignore data above index 250k.
For such a one-second record the frequencies span from -250 kHz to 250 kHz with a resolution of 1 Hz, but because of the above, the first 250k points are actually the positive frequencies, at a separation of 1 Hz.
So take the point at index 250 in the (unshifted, i.e. raw) FFT and you have the signal at 250 Hz. I would plot the data from index 0 to around 500 to see how broad the peak around 250 Hz is. The signal strength is the integral over those non-zero frequencies (non-zero applied loosely here to mean everything above the noise). The signal width indicates the modulation being applied to the signal (which could include other measurement artifacts). If the signal is shifted from 250 Hz, you might have a Doppler shift (either your source or you are moving).
If you are only interested in a finite frequency range, it might be faster to calculate the Fourier integral (O(n^2)) just for those few frequency points. Generally people use the FFT because it is O(n*log(n)), but if you need only say 10 frequency points then O(10*n) is not much different.
Related
When I use Intel IPP's ippsFFTFwd_RToCCS_64f followed by ippsMagnitude_64fc, I get a massive peak at index zero of the magnitudes array.
My sine wave is long, and the main component I am interested in lies somewhere between 0.15 Hz and 0.25 Hz. I sample at 500 Hz. If I subtract the mean from the signal before the FFT, the zero component becomes really small and the peak disappears. Below is a picture of the head of the magnitudes array:
Also, the magnitudes seem to be 10 times the amplitude I see in the time series of the signal; e.g. if the amplitude is 29, the magnitude is 290.
I am not sure why this is so. My questions are: 1. Do I really need to address the zero-index peak by subtracting the mean? 2. Where does this scale of 10 come from?
void CalculateForwardTransform(array<double> ^signal, array<double> ^transformedSignal, array<double> ^magnitudes)
{
// source signal
pin_ptr<double> pinnedSignal = &signal[0];
double *pSignal = pinnedSignal;
int order = (int)Math::Round(Math::Log(signal->Length, 2));
// get sizes
int sizeSpec = 0, sizeInit = 0, sizeBuf = 0;
int status = ippsFFTGetSize_R_64f(order, IPP_FFT_DIV_INV_BY_N, ippAlgHintNone, &sizeSpec, &sizeInit, &sizeBuf);
// memory allocation
IppsFFTSpec_R_64f* pSpec;
Ipp8u *pSpecMem = (Ipp8u*)ippMalloc(sizeSpec);
Ipp8u *pMemInit = (Ipp8u*)ippMalloc(sizeInit);
// FFT specification structure initialized
status = ippsFFTInit_R_64f(&pSpec, order, IPP_FFT_DIV_INV_BY_N, ippAlgHintNone, pSpecMem, pMemInit);
// transform
pin_ptr<double> pinnedTransformedSignal = &transformedSignal[0];
double *pDst = pinnedTransformedSignal;
Ipp8u *pBuffer = (Ipp8u*)ippMalloc(sizeBuf);
status = ippsFFTFwd_RToCCS_64f(pSignal, pDst, pSpec, pBuffer);
// get magnitudes
pin_ptr<double> pinnedMagnitudes = &magnitudes[0];
double *pMagn = pinnedMagnitudes;
status = ippsMagnitude_64fc((Ipp64fc*)pDst, pMagn, magnitudes->Length); // magnitudes is half of signal len
// free memory
ippFree(pSpecMem);
ippFree(pMemInit);
ippFree(pBuffer);
}
Do I really need to address the zero index peak with mean reduction?
For low frequency signal analysis a small bias can really interfere (especially due to spectral leakage). For sake of illustration, consider the following low-frequency signal tone and another one with a constant bias tone_with_bias:
fs = 1;
f0 = 0.15;
for (int i = 0; i < N; i++)
{
tone[i] = 0.001*cos(2*M_PI*i*f0/fs);
tone_with_bias[i] = 1 + tone[i];
}
If we plot the frequency spectrum of a 100 sample window of these signals, you should notice that the spectrum of tone_with_bias completely drowns out the spectrum of tone:
So yes it's better if you can remove that bias. However, it should be emphasized that this is possible provided that you know the nature of the bias. If you know that the bias is indeed a constant, removing it will reveal the low-frequency component. Otherwise, removing the mean from the signal may not achieve the desired effect if the bias is just a very low-frequency component.
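For the constant-bias case, removing the mean is a one-pass operation. The helper below is a minimal sketch in plain C (the name remove_mean is hypothetical, not an IPP function); it is only appropriate when the bias really is constant over the window.

```c
#include <stddef.h>

/* Subtracts the arithmetic mean from x in place and returns the mean
   that was removed. Assumes the bias is constant over the window. */
double remove_mean(double *x, size_t n)
{
    if (n == 0)
        return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i];
    double mean = sum / (double)n;
    for (size_t i = 0; i < n; ++i)
        x[i] -= mean;
    return mean;
}
```

Because bin 0 of the DFT is the sum of the samples, calling this before the forward transform drives the zero-index magnitude to (nearly) zero.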
Where does this scale of 10 come?
Scaling of the magnitude by the FFT should be expected: as described in this answer, a sinusoid's magnitude is scaled by approximately 0.5*N (where N is the FFT size). If you were processing a small chunk of 20 samples, you would get exactly such a factor of 10. If you scale the output of the FFT by 2/N (or equivalently scale by 2 while also using the IPP_FFT_DIV_FWD_BY_N flag), you should get results with magnitudes similar to those of the time-domain signal.
I am currently writing C code that takes a wav file as input (specifically, just one channel of the original wav file) and performs the short-time Fourier transform.
The main part of the code is this one:
stft_data = (fftw_complex*)(fftw_malloc(sizeof(fftw_complex)*windowSize));
fft_result= (fftw_complex*)(fftw_malloc(sizeof(fftw_complex)*windowSize));
storage = (fftw_complex*)(fftw_malloc(sizeof(fftw_complex)*storage_capacity));
//define the fftw plan
fftw_plan plan_forward;
plan_forward = fftw_plan_dft_1d(windowSize, stft_data, fft_result, FFTW_FORWARD, FFTW_ESTIMATE);
//integer indexes
int i,counter ;
counter = 0 ;
//create a Hamming window
double hamming_result[windowSize];
hamming(windowSize, hamming_result);
//implement the stft position indexes
int chunkPosition = 0; //actual chunk position
int readIndex ; //read the index of the wav file
while (chunkPosition < wav_length ){
//read the window
for(i=0; i<windowSize; i++){
readIndex = chunkPosition + i;
if (readIndex < wav_length){
stft_data[i] = wav_data[readIndex]*hamming_result[i]*_Complex_I + 0.0*I;
}
else{
//if we are beyond the wav_length
stft_data[i] = 0.0*_Complex_I + 0.0*I;//padding
break;
}
}
//compute the fft
fftw_execute(plan_forward);
//store the stft in a data structure
for (i=0; i<windowSize;i++)
{
//printf("RE: %.2f IM: %.2f\n", creal(fft_result[i]),cimag(fft_result[i]));
storage[counter] = creal(fft_result[i]) + cimag(fft_result[i]);
counter+=1;
}
//update indexes
chunkPosition += hop_size;
printf("Chunk Position %d\n", chunkPosition);
printf("Counter position %d\n", counter);
printf("Fourier transform done\n");
}
Once the FFT has been computed on the selected window, I store the FFT's real and imaginary parts in a storage variable.
After that, I would like to compute the cross-correlation among the data points in each of the N windows I end up with.
As an example, I would like to compute the correlation between the first data point of the first window ( storage[0] ) and the first element of the second window (storage[windowSize+1]).
However, I am facing some problems and don't get reasonable values. According to what I studied, correlation in Fourier space is just the complex multiplication of two Fourier terms. Thus,
what I am doing is something like:
correlation = storage[0]*conj(storage[windowSize+1]);
However, I get huge values, which makes me wonder whether I am really computing a correlation.
Where am I wrong?
How should I scale my correlation results?
How should I compute the correlation with the Fourier values?
And then, how should I plot the Fourier values I get from the FFTW3 calculations? Should I shift all the values, or are they already shifted?
Thanks very much
The line storage[counter] = creal(fft_result[i]) + cimag(fft_result[i]); makes storage purely real. Since computing correlation = storage[0]*conj(storage[windowSize+1]); is the next step in the computation of the cross correlation, there is a problem. Indeed, there is no point in conjugating a real number.
Trying storage[counter] = fft_result[i]; could partly resolve the issue.
In addition, correlation = storage[0]*conj(storage[windowSize+1]); should be modified to correlation = storage[0]*conj(storage[windowSize]);
By performing correlation = storage[0]*conj(storage[windowSize]);, the value at index [0] of the DFT of the correlation is obtained. Indeed, storage[0] corresponds to the average of the first frame, while storage[windowSize] corresponds to the average of the second frame. They are not equal to the averages but much larger, since each is the sum over the frame, i.e. the average scaled by the frame length windowSize.
To compute the correlation between the two signals, the next step should be:
for (i=0; i<windowSize;i++)
{
dftofcorrelation[i]=storage[i]*conj(storage[i+windowSize]);
}
Then, the inverse DFT must be applied to the array dftofcorrelation to get the correlation as an array. It must be kept in mind that neither the forward nor the backward DFT of FFTW include any scaling, see what FFTW really computes:
fftw_execute(plan_backward);
If two scalars are to be retained from this correlation array, they are its maximum (high if the signals are similar up to a delay) and the index of that maximum, which is the estimated time offset between the two signals.
The overall scaling induced by FFTW is a power of windowSize: since neither transform is normalized, the result is windowSize times the raw circular correlation sum (or windowSize^2 times a correlation defined as an average). It can be checked by computing the autocorrelation of a uniform signal, which is itself uniform.
I am using FFTW to analyse the frequency spectrum of audio coming into a computer from the mic input. I use the PortAudio C++ library to capture windows of time-domain audio data and then FFTW to do a real-to-complex (r2c) transform of this data to the frequency domain. Below is the function I call every time I receive a block of data.
The sample rate is 44100 samples per second, the sample type is short (signed 16-bit integer), and I take 250 ms blocks of data in each window. The FFT resolution is therefore 4 Hz.
The problem is that I'm not sure how to interpret the data I receive after the transformation. When no audio is played, I get amplitudes of around 1000 to 4000 for every frequency component; as soon as audio is played, from an instrument for example, all of the amplitudes go negative.
I have tried normalising before the FFT by dividing by the average power, and then the data makes more sense. All amplitudes are between 200 and 500 when nothing is played; then, for example, if I play a 76 Hz tone, the amplitude of that component increases to around 2000. That is along the lines of what I expect, but I am still not sure whether this process can be implemented better.
My question is: am I doing the right thing here? Must the data be normalised, and am I doing it right? Why am I still receiving high amplitudes at frequencies that are not being played? Does anyone have experience doing something similar who could give some tips? Many thanks in advance.
void AudioProcessor::GetFFT(void* inputData, void* freqSpectrum)
{
double* input = (double*)inputData;
short* freq_spectrum = (short*)freqSpectrum;
fftPlan = fftw_plan_dft_r2c_1d(FRAMES_PER_BUFFER, input, complexOut, FFTW_ESTIMATE);
fftw_execute(fftPlan);
////
for (int k = 0; k < (FRAMES_PER_BUFFER + 1) / 2; ++k)
{
freq_spectrum[k] = (short)(sqrt(complexOut[k][0] * complexOut[k][0] + complexOut[k][1] * complexOut[k][1]));
}
if (FRAMES_PER_BUFFER % 2 == 0) /* frames per buffer is even number */
{
freq_spectrum[FRAMES_PER_BUFFER / 2] = (short)(sqrt(complexOut[FRAMES_PER_BUFFER / 2][0] * complexOut[FRAMES_PER_BUFFER / 2][0] + complexOut[FRAMES_PER_BUFFER / 2][1] * complexOut[FRAMES_PER_BUFFER / 2][1])); /* Nyquist freq. */
}
}
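Two things in the function above may be worth ruling out, though this is a guess rather than a confirmed diagnosis. If the capture buffer really holds 16-bit samples, casting the pointer to double* does not convert them, and casting a magnitude larger than 32767 to short is out of range. The helpers below are a hypothetical sketch of explicit conversions in plain C; they are not part of PortAudio or FFTW.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts signed 16-bit PCM samples to doubles in [-1.0, 1.0). */
void pcm16_to_double(const int16_t *in, double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (double)in[i] / 32768.0;
}

/* Clamps a value to the int16_t range before converting, so that large
   magnitudes saturate instead of producing an out-of-range conversion. */
int16_t clamp_to_int16(double v)
{
    if (v > 32767.0)
        return INT16_MAX;
    if (v < -32768.0)
        return INT16_MIN;
    return (int16_t)v;
}
```

Under this assumption, the converted double buffer would be the input to the r2c plan, and the magnitudes would either stay in double precision or pass through clamp_to_int16 before being stored as short.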
I'm reconstructing a signal from the amplitude, frequency, and phase obtained from an FFT. After the FFT, I pick some of its frequencies and reconstruct the time-domain signal from that FFT data. I know the IFFT exists for this, but I don't want to use it.
The reconstruction seems fine, but there's some time lag between the two signals. This image shows the problem: the black curve is the original signal and the red one is the reconstruction.
If I understand correctly, the amplitude of frequency bin t is sqrt(real[t]*real[t] + imag[t]*imag[t]) and the phase is atan2(imag[t], real[t]).
So I used the formula amplitude * cos(2*π*x / frequency + phase) for each frequency bin and summed the regenerated waves. As far as I know, this should generate a signal that matches the original. But it always ends up with some time lag relative to the original signal.
Yes, I think it's about phase, but that is simple to calculate and appears to work correctly. If it were wrong, the reconstructed signal would not match the original in shape.
This is the code that generates the cosine wave. I generate the cosine as sin(x + π/2).
std::vector<short> encodeSineWavePCM(
const double frequency,
const double amplitude,
const double offSetPhase)
{
const double pi = 3.1415926535897932384626;
const int N = 44100; // 1 sec length wave
std::vector<short> s(N);
const double wavelength = 1.0 * N / frequency;
const double unitlength = 2 * pi / wavelength;
for (int i = 0; i<N; i ++) {
double val = sin(offSetPhase + i * unitlength);
val *= amplitude;
s[i] = (short)val;
}
return s;
}
What am I missing?
Quite normal. You're doing a frame-by-frame transform. That means the FFT frame is produced after one time frame. When transforming back, you have the inverse effect: your time frame starts after the FFT frame has been decoded.
I've written a program that generates a sine-wave at a user-specified frequency, and plays it on a 96kHz audio channel. To save a few CPU cycles I employ the old trick of pre-rendering a short section of audio into a buffer, and then playing back the buffer in a loop, so that I can avoid calling the sin() function 96000 times per second for the duration of the program and just do simple memory-copying instead.
My problem is efficiently determining what the minimum usable size of this pre-rendered buffer would be. For some frequencies it is easy -- for example, an 8kHz sine wave can be perfectly represented by generating a 12-sample buffer and playing it in a loop, because (8000*12 == 96000). For other frequencies, however, a single cycle of the sine wave requires a non-integral number of samples to represent, and therefore looping a single cycle's worth of samples would cause unacceptable glitching.
For some of those frequencies, however, it's possible to get around that problem by pre-rendering more than one cycle of the sine wave and looping that -- if I can figure out how many cycles are required so that the number of cycles present in the buffer will be integral, while also guaranteeing that the number of samples in the buffer are integral. For example, a sine-wave frequency of 12.8kHz translates to a single-cycle buffer-size of 7.5 samples, which won't loop cleanly, but if I render two consecutive cycles of the sine wave into a 15-sample buffer, then I can cleanly loop the result.
My current approach to solving this issue is brute force: I try all possible cycle-counts and see if any of them result in a buffer size with an integral number of samples in it. I think that approach is unsatisfactory for the following reasons:
1) It's very inefficient. For example, the program shown below (which prints buffer-size results for 480,000 possible frequency values between 0Hz and 48kHz) takes 35 minutes to complete on my 2.7GHz machine. I think there must be a much faster way to do this.
2) I suspect that the results are not 100% accurate, due to floating-point errors.
3) The algorithm gives up if it can't find an acceptable buffer size less than 10 seconds long. (I could make the limit higher, but of course that would make the algorithm even slower).
So, is there any way to calculate the minimum-usable-buffer-size analytically, preferably in O(1) time? It seems like it should be easy, but I haven't been able to figure out what kind of math I should use.
Thanks in advance for any advice!
#include <stdio.h>
#include <math.h>
static const long long SAMPLES_PER_SECOND = 96000;
static const long long MAX_ALLOWED_BUFFER_SIZE_SAMPLES = (SAMPLES_PER_SECOND * 10);
// Returns the number of cycles the pre-render buffer must hold to properly
// loop a sine wave at the given frequency, or -1 on failure.
static int GetNumCyclesNeededForPreRenderedBuffer(float freqHz)
{
double oneCycleLengthSamples = SAMPLES_PER_SECOND/freqHz;
for (int count=1; (count*oneCycleLengthSamples) < MAX_ALLOWED_BUFFER_SIZE_SAMPLES; count++)
{
double remainder = fmod(oneCycleLengthSamples*count, 1.0);
if (remainder > 0.5) remainder = 1.0-remainder;
if (remainder <= 0.0) return count;
}
return -1;
}
int main(int, char **)
{
for (int i=0; i<48000*10; i++)
{
double freqHz = ((double)i)/10.0f;
int numCyclesNeeded = GetNumCyclesNeededForPreRenderedBuffer(freqHz);
if (numCyclesNeeded >= 0)
{
double oneCycleLengthSamples = SAMPLES_PER_SECOND/freqHz;
printf("For %.1fHz, use a pre-render-buffer size of %f samples (%i cycles, %f samples/cycle)\n", freqHz, (numCyclesNeeded*oneCycleLengthSamples), numCyclesNeeded, oneCycleLengthSamples);
}
else printf("For %.1fHz, there was no suitable pre-render-buffer size under the allowed limit!\n", freqHz);
}
return 0;
}
number_of_cycles/size_of_buffer = frequency/samples_per_second
This implies that if you can simplify your frequency/samples_per_second fraction, you can find the size of your buffer and the number of cycles in the buffer. If frequency and samples_per_second are integers, you can simplify the fraction by finding the greatest common divisor, otherwise you can use the method of continued fractions.
Example:
Say your frequency is 1234.5, and your samples_per_second is 96000. We can make these into two integers by multiplying by 10, so we get the ratio:
frequency/samples_per_second = 12345/960000
The greatest common divisor is 15, so it can be reduced to 823/64000.
So you would need 823 cycles in a 64000 sample buffer to reproduce the frequency exactly.
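The reduction above can be sketched in C with Euclid's algorithm. The function names are made up for the example, and it assumes frequencies are given in tenths of a hertz, as in the question's test loop, so both sides of the ratio are integers.

```c
#include <stdint.h>

/* Greatest common divisor by Euclid's algorithm. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Smallest loopable buffer for a sine wave of freq_tenths_hz / 10 Hz.
   Writes the cycle count and the buffer length in samples. Returns 0 on
   success, or -1 if the frequency is zero. */
int min_loop_buffer(uint64_t freq_tenths_hz, uint64_t samples_per_second,
                    uint64_t *num_cycles, uint64_t *num_samples)
{
    if (freq_tenths_hz == 0)
        return -1;
    uint64_t denom = samples_per_second * 10;   /* same scale as the numerator */
    uint64_t g = gcd_u64(freq_tenths_hz, denom);
    *num_cycles = freq_tenths_hz / g;
    *num_samples = denom / g;
    return 0;
}
```

For 1234.5 Hz at 96000 samples per second this yields 823 cycles in 64000 samples, and for 12.8 kHz it yields 2 cycles in 15 samples. Euclid's algorithm is logarithmic in the inputs rather than strictly O(1), but it avoids both the brute-force search and the floating-point remainder test.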