Weird but close FFT and IFFT of image in C++

I wrote a program that loads, saves, and performs the FFT and IFFT on black-and-white PNG images. After much debugging headache, I finally got some coherent output, only to find that it distorted the original image.
input:
fft:
ifft:
As far as I have tested, the pixel data in each array is stored and converted correctly. Pixels are stored in two arrays: 'data', which contains the b/w value of each pixel, and 'complex_data', which is twice as long as 'data' and stores the real (b/w) value and imaginary part of each pixel in alternating indices. My FFT algorithm operates on an array structured like 'complex_data'. After code to read commands from the user, here's the code in question:
if (cmd == "fft")
{
    if (height > width) size = height;
    else size = width;
    N = (int)pow(2.0, ceil(log((double)size)/log(2.0)));
    temp_data = (double*) malloc(sizeof(double) * width * 2); // array to hold each row of the image for processing in FFT()
    for (i = 0; i < (int) height; i++)
    {
        for (j = 0; j < (int) width; j++)
        {
            temp_data[j*2]   = complex_data[(i*width*2)+(j*2)];
            temp_data[j*2+1] = complex_data[(i*width*2)+(j*2)+1];
        }
        FFT(temp_data, N, 1);
        for (j = 0; j < (int) width; j++)
        {
            complex_data[(i*width*2)+(j*2)]   = temp_data[j*2];
            complex_data[(i*width*2)+(j*2)+1] = temp_data[j*2+1];
        }
    }
    transpose(complex_data, width, height); // tested
    free(temp_data);
    temp_data = (double*) malloc(sizeof(double) * height * 2);
    for (i = 0; i < (int) width; i++)
    {
        for (j = 0; j < (int) height; j++)
        {
            temp_data[j*2]   = complex_data[(i*height*2)+(j*2)];
            temp_data[j*2+1] = complex_data[(i*height*2)+(j*2)+1];
        }
        FFT(temp_data, N, 1);
        for (j = 0; j < (int) height; j++)
        {
            complex_data[(i*height*2)+(j*2)]   = temp_data[j*2];
            complex_data[(i*height*2)+(j*2)+1] = temp_data[j*2+1];
        }
    }
    transpose(complex_data, height, width);
    free(temp_data);
    free(data);
    data = complex_to_real(complex_data, image.size()/4); // tested
    image = bw_data_to_vector(data, image.size()/4);      // tested
    cout << "*** fft success ***" << endl << endl;
}
void FFT(double* data, unsigned long nn, int f_or_b) // f_or_b is 1 for fft, -1 for ifft
{
    unsigned long n, mmax, m, j, istep, i;
    double wtemp, w_real, wp_real, wp_imaginary, w_imaginary, theta;
    double temp_real, temp_imaginary;

    // reverse-binary reindexing to separate even and odd indices
    // and to allow us to compute the FFT in place
    n = nn<<1;
    j = 1;
    for (i = 1; i < n; i += 2) {
        if (j > i) {
            swap(data[j-1], data[i-1]);
            swap(data[j], data[i]);
        }
        m = nn;
        while (m >= 2 && j > m) {
            j -= m;
            m >>= 1;
        }
        j += m;
    }

    // here begins the Danielson-Lanczos section
    mmax = 2;
    while (n > mmax) {
        istep = mmax<<1;
        theta = f_or_b * (2 * M_PI / mmax);
        wtemp = sin(0.5 * theta);
        wp_real = -2.0 * wtemp * wtemp;
        wp_imaginary = sin(theta);
        w_real = 1.0;
        w_imaginary = 0.0;
        for (m = 1; m < mmax; m += 2) {
            for (i = m; i <= n; i += istep) {
                j = i + mmax;
                temp_real      = w_real * data[j-1] - w_imaginary * data[j];
                temp_imaginary = w_real * data[j]   + w_imaginary * data[j-1];
                data[j-1] = data[i-1] - temp_real;
                data[j]   = data[i]   - temp_imaginary;
                data[i-1] += temp_real;
                data[i]   += temp_imaginary;
            }
            wtemp = w_real;
            w_real += w_real * wp_real - w_imaginary * wp_imaginary;
            w_imaginary += w_imaginary * wp_real + wtemp * wp_imaginary;
        }
        mmax = istep;
    }
}
My IFFT is the same, only with f_or_b set to -1 instead of 1. My program calls FFT() on each row, transposes the image, calls FFT() on each row again, then transposes back. Is there maybe an error with my indexing?
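For reference, the row/column decomposition being described can be sketched like this (a simplified illustration, not the poster's code: std::complex replaces the interleaved array, fft1d stands in for any working in-place 1D FFT, and gathering each column into a temporary buffer replaces the explicit transpose):

#include <complex>
#include <vector>

// Sketch: 2D FFT as 1D FFTs over every row, then over every column.
// fft1d is assumed to be a working in-place 1D FFT (power-of-two length).
void fft2d(std::vector<std::complex<double>> &img, int width, int height,
           void (*fft1d)(std::complex<double> *, int))
{
    for (int y = 0; y < height; ++y)          // transform every row
        fft1d(&img[(size_t)y * width], width);

    std::vector<std::complex<double>> col(height);
    for (int x = 0; x < width; ++x) {         // transform every column
        for (int y = 0; y < height; ++y)
            col[y] = img[(size_t)y * width + x];
        fft1d(col.data(), height);
        for (int y = 0; y < height; ++y)
            img[(size_t)y * width + x] = col[y];
    }
}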

Not an actual answer, since this question is debug-only, so some hints instead:
Your results are really bad.
It should look like this:
The first line is the actual DFFT result. Re, Im, and Power are amplified by a constant, otherwise you would see a black image. The last image is the IDFFT of the original, non-amplified Re, Im result.
The second line is the same, but the DFFT result is wrapped by half the image size in both x and y, to match the common presentation in most DIP/CV texts.
As you can see, if you IDFFT the wrapped result back, the result is not correct (checkerboard mask).
You have just a single image as the DFFT result.
Is it the power spectrum? Or did you forget to include the imaginary part, in the view only, or perhaps also somewhere in the computation?
Is your 1D DFFT working?
For real data the result should be symmetric.
Check the links from my comment and compare the results for some sample 1D array.
Debug/repair your 1D FFT first, and only then move to the next level.
Do not forget to test both real and complex data ...
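A brute-force O(N^2) DFT makes a handy reference for that comparison (a debugging sketch; naiveDFT is a hypothetical helper, not part of the poster's code):

#include <cmath>
#include <complex>
#include <vector>

// Brute-force DFT for validating an FFT on small inputs.
// For purely real input, expect X[k] == conj(X[N-k]).
std::vector<std::complex<double>> naiveDFT(const std::vector<std::complex<double>> &x)
{
    const size_t N = x.size();
    std::vector<std::complex<double>> X(N);
    for (size_t k = 0; k < N; ++k)
        for (size_t n = 0; n < N; ++n)
            X[k] += x[n] * std::polar(1.0, -2.0 * M_PI * double(k * n) / double(N));
    return X;
}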
Your IDFFT looks black-and-white (no gray), saturated.
Did you amplify the DFFT result to see the image and then use that for the IDFFT, instead of the original DFFT result?
Also check that you do not round to integers somewhere along the computation.
Beware of (I)DFFT overflows/underflows.
If your image's pixel intensities are big and the image resolution is too, your computation could lose precision. I have never seen this with ordinary images, but if your image is HDR then it is possible. This is a common problem with convolution computed by DFFT for big polynomials.

Thank you everyone for your opinions. All that stuff about memory corruption, while it makes a point, is not the root of the problem. The sizes of the buffers I'm mallocing are not overly large, and I am freeing them in the right places; I had a lot of practice with this while learning C. The problem was not the FFT algorithm either, nor even my 2D implementation of it.
All I missed was the scaling by 1/(M*N) at the very end of my IFFT code. Because the image is 512x512, I needed to scale my IFFT output by 1/(512*512). Also, my FFT looks like white noise because the pixel data was not rescaled to fit between 0 and 255.
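In code, the fix is just a final scaling pass over the inverse-transform output (a sketch using the arrays named in the question; width and height are the image dimensions):

// After the inverse pass: scale every real/imaginary component by 1/(M*N).
const double scale = 1.0 / (double)(width * height); // 1/(512*512) here
for (int k = 0; k < width * height * 2; ++k)
    complex_data[k] *= scale;
// Then rescale the recovered real values into 0..255 before display.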

Suggest you look at the article http://www.yolinux.com/TUTORIALS/C++MemoryCorruptionAndMemoryLeaks.html
Christophe has a good point, but he is wrong about this not being related to the problem: malloc, unlike new, neither initialises the memory it returns nor enforces a data type, which can result in all the problems listed below.
Possible causes are:
Sign of a number changing somewhere. I have seen similar issues when a platform invoke has been used on a DLL and a value is passed by value instead of by reference. It is caused by memory not necessarily being empty, so when your image data enters it, stale bits get combined with its values. I would suggest that you make sure memory is zeroed before you put your image data there.
Memory rotating right (ROR in assembly language) or left (ROL). This will occur if mismatched data types are used, e.g. a signed value entering an unsigned data type, or if the number of bits differs from one variable to another.
Data being lost due to an unsigned value entering a signed variable. Outcomes are one bit being lost (because it will be used to determine negative or positive), or, at the extremes, if two's complement takes place, the number becoming inverted in meaning; look up two's complement on Wikipedia.
Also see how memory should be cleared/assigned before use: http://www.cprogramming.com/tutorial/memory_debugging_parallel_inspector.html

Related

Character recognition from an image in C++

*Note: while this post is pretty much asking about bilinear interpolation, I kept the title more general and included extra information in case someone has any ideas on how I can possibly do this better.
I have been having trouble implementing a way to identify letters from an image in order to create a word-search solving program. For mainly educational, but also portability, purposes I have been attempting this without the use of a library. It can be assumed that the image the characters will be picked off of contains nothing else but the puzzle. Although this page is only recognizing a small set of characters, I have been using it to guide my efforts, along with this one as well.
As the article suggested, I have an image of each letter scaled down to 5x5 to compare each unknown letter to. I have had the best success by scaling down the unknown to 5x5 using bilinear resampling and summing the squares of the difference in intensity of each corresponding pixel in the known and unknown images. To attempt to get more accurate results I also added the square of the difference in width:height ratios, and the white:black pixel ratios of the top half and bottom half of each image. The known image with the closest "difference score" to the unknown image is then considered the unknown letter.
The problem is that this seems to have only about 50% accuracy. To improve this I tried using larger samples (instead of 5x5 I tried 15x15), but this proved even less effective. I also tried to go through the known and unknown images, look for features and shapes, and determine a match based on two images having about the same number of the same features. For example, shapes like the following were identified and counted up (where ■ represents a black pixel). This proved less effective than the original method.
■ ■ ■ ■
■ ■
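For concreteness, the pixel-difference scoring described above boils down to a sum of squared differences over the two equally sized thumbnails; a minimal sketch:

// Sum of squared intensity differences between two equally sized
// grayscale thumbnails (5x5 here); a lower score means a closer match.
double ssdScore(const unsigned char *known, const unsigned char *unknown, int pixelCount)
{
    double score = 0.0;
    for (int i = 0; i < pixelCount; ++i) {
        double d = (double)known[i] - (double)unknown[i];
        score += d * d;
    }
    return score;
}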
So here is an example: the following image gets loaded:
The program then converts it to monochrome by determining whether each pixel's intensity is above or below the average intensity of an 11x11 square around it (using a summed-area table), fixes the skew, and picks out the letters by identifying areas of relatively equal spacing. I then use the intersecting horizontal and vertical spaces to get a general idea of where each character is. Next, I make sure that the entire letter is contained in each square picked out by going line by line, above, below, left, and right of the original square until the square's border detects no dark pixels on it.
Then I take each letter, resample it, and compare it to the known images.
*Note: the known samples use Arial font, size 12, rescaled in Photoshop to 5x5 using bilinear interpolation.
Here is an example of a successful match:
The following letter is picked out:
scaled down to:
which looks like
from afar. This is successfully matched to the known N sample:
Here is a failed match:
is picked out and scaled down to:
which, to no real surprise, does not match the known R sample.
I changed how the images are picked out so that the letter is not cut off, as you can see in the above images, so I believe the issue comes from scaling the images down. Currently I am using bilinear interpolation to resample the image. To understand how exactly this works with downsampling, I referred to the second answer in this post and came up with the following code. I have previously tested that this code works (at least to a "this looks ok" point), so it could be a combination of factors causing problems.
void Image::scaleTo(int width, int height)
{
    int originalWidth = this->width;
    int originalHeight = this->height;
    Image * originalData = new Image(this->width, this->height, 0, 0);
    for (int i = 0; i < this->width * this->height; i++) {
        int x = i % this->width;
        int y = i / this->width;
        originalData->setPixel(x, y, this->getPixel(x, y));
    }
    this->resize(width, height); // simply resizes the image; after the resize it is just a black bmp.
    double factorX = (double)originalWidth / width;
    double factorY = (double)originalHeight / height;
    float * xCenters = new float[originalWidth]; // the following stores the "centers" of each pixel.
    float * yCenters = new float[originalHeight];
    float * newXCenters = new float[width];
    float * newYCenters = new float[height];
    // 1 represents one of the originally sized pixel's side length
    for (int i = 0; i < originalWidth; i++)
        xCenters[i] = i + 0.5;
    for (int i = 0; i < width; i++)
        newXCenters[i] = (factorX * i) + (factorX / 2.0);
    for (int i = 0; i < height; i++)
        newYCenters[i] = (factorY * i) + (factorY / 2.0);
    for (int i = 0; i < originalHeight; i++)
        yCenters[i] = i + 0.5;
    /*  p[0]   p[1]
            p
        p[2]   p[3]  */
    // the following will find the closest points to the sampled pixel that still remain in this order
    for (int x = 0; x < width; x++) {
        for (int y = 0; y < height; y++) {
            POINT p[4]; // POINT used is the Win32 struct POINT
            float pDists[4] = { FLT_MAX, FLT_MAX, FLT_MAX, FLT_MAX };
            float xDists[4];
            float yDists[4];
            for (int i = 0; i < originalWidth; i++) {
                for (int j = 0; j < originalHeight; j++) {
                    float xDist = abs(xCenters[i] - newXCenters[x]);
                    float yDist = abs(yCenters[j] - newYCenters[y]);
                    float dist = sqrt(xDist * xDist + yDist * yDist);
                    if (xCenters[i] < newXCenters[x] && yCenters[j] < newYCenters[y] && dist < pDists[0]) {
                        p[0] = { i, j };
                        pDists[0] = dist;
                        xDists[0] = xDist;
                        yDists[0] = yDist;
                    }
                    else if (xCenters[i] > newXCenters[x] && yCenters[j] < newYCenters[y] && dist < pDists[1]) {
                        p[1] = { i, j };
                        pDists[1] = dist;
                        xDists[1] = xDist;
                        yDists[1] = yDist;
                    }
                    else if (xCenters[i] < newXCenters[x] && yCenters[j] > newYCenters[y] && dist < pDists[2]) {
                        p[2] = { i, j };
                        pDists[2] = dist;
                        xDists[2] = xDist;
                        yDists[2] = yDist;
                    }
                    else if (xCenters[i] > newXCenters[x] && yCenters[j] > newYCenters[y] && dist < pDists[3]) {
                        p[3] = { i, j };
                        pDists[3] = dist;
                        xDists[3] = xDist;
                        yDists[3] = yDist;
                    }
                }
            }
            // channel is a typedef for unsigned char
            // getOPixel(point) is a macro for originalData->getPixel(point.x, point.y)
            float r1 = (xDists[3] / (xDists[2] + xDists[3])) * getOPixel(p[2]).r + (xDists[2] / (xDists[2] + xDists[3])) * getOPixel(p[3]).r;
            float r2 = (xDists[1] / (xDists[0] + xDists[1])) * getOPixel(p[0]).r + (xDists[0] / (xDists[0] + xDists[1])) * getOPixel(p[1]).r;
            float interpolated = (yDists[0] / (yDists[0] + yDists[3])) * r1 + (yDists[3] / (yDists[0] + yDists[3])) * r2;
            channel r = (channel)round(interpolated);
            r1 = (xDists[3] / (xDists[2] + xDists[3])) * getOPixel(p[2]).g + (xDists[2] / (xDists[2] + xDists[3])) * getOPixel(p[3]).g; // yDist[3]
            r2 = (xDists[1] / (xDists[0] + xDists[1])) * getOPixel(p[0]).g + (xDists[0] / (xDists[0] + xDists[1])) * getOPixel(p[1]).g; // yDist[0]
            interpolated = (yDists[0] / (yDists[0] + yDists[3])) * r1 + (yDists[3] / (yDists[0] + yDists[3])) * r2;
            channel g = (channel)round(interpolated);
            r1 = (xDists[3] / (xDists[2] + xDists[3])) * getOPixel(p[2]).b + (xDists[2] / (xDists[2] + xDists[3])) * getOPixel(p[3]).b; // yDist[3]
            r2 = (xDists[1] / (xDists[0] + xDists[1])) * getOPixel(p[0]).b + (xDists[0] / (xDists[0] + xDists[1])) * getOPixel(p[1]).b; // yDist[0]
            interpolated = (yDists[0] / (yDists[0] + yDists[3])) * r1 + (yDists[3] / (yDists[0] + yDists[3])) * r2;
            channel b = (channel)round(interpolated);
            this->setPixel(x, y, { r, g, b });
        }
    }
    delete[] xCenters;
    delete[] yCenters;
    delete[] newXCenters;
    delete[] newYCenters;
    delete originalData;
}
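As a point of comparison, bilinear sampling is usually written without the inner search over every source pixel: map each destination centre back into source coordinates and index the four neighbours directly. A sketch under the assumption of a single-channel getPixel/setPixel API (not the poster's exact Image class):

#include <algorithm>
#include <cmath>

// Direct bilinear resampling: O(dstW * dstH) instead of an
// O(dstW * dstH * srcW * srcH) neighbour search.
void scaleBilinear(const Image &src, Image &dst, int srcW, int srcH, int dstW, int dstH)
{
    const double fx = (double)srcW / dstW;
    const double fy = (double)srcH / dstH;
    for (int y = 0; y < dstH; ++y) {
        for (int x = 0; x < dstW; ++x) {
            // Destination pixel centre mapped into source coordinates.
            double sx = std::max(0.0, (x + 0.5) * fx - 0.5);
            double sy = std::max(0.0, (y + 0.5) * fy - 0.5);
            int x0 = std::min((int)sx, srcW - 1), x1 = std::min(x0 + 1, srcW - 1);
            int y0 = std::min((int)sy, srcH - 1), y1 = std::min(y0 + 1, srcH - 1);
            double tx = sx - x0, ty = sy - y0;
            double top = (1 - tx) * src.getPixel(x0, y0) + tx * src.getPixel(x1, y0);
            double bot = (1 - tx) * src.getPixel(x0, y1) + tx * src.getPixel(x1, y1);
            dst.setPixel(x, y, (unsigned char)std::round((1 - ty) * top + ty * bot));
        }
    }
}

Note that for a heavy downscale (e.g. to 5x5), plain bilinear only looks at four source pixels per output and discards most of the letter; area averaging (box filtering) over each destination cell tends to preserve more shape, which may matter for the matching accuracy.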
I have utmost respect for anyone even remotely willing to sift through this to try and help. Any and all suggestions will be extremely appreciated.
UPDATE:
So, as suggested, I started augmenting the known data set with scaled-down letters from word searches. This greatly improved accuracy, from about 50% to 70% (percentages calculated from a very small sample size, so take the numbers lightly). Basically I'm using the original set of chars as a base (this original set was actually the most accurate out of the other sets I've tried, e.g. a set calculated using the same resampling algorithm, a set using a different font, etc.), and I am manually adding knowns to that set. I basically manually assign the first 20 or so images picked out in a search their corresponding letter and save that into the known-set folder. I still am choosing the closest match out of the entire known set. Would this still be a good method, or should some kind of change be made? I also implemented a feature where, if a letter is about a 90% match with a known letter, I assume the match is correct and add the current "unknown" to the list of knowns. I could see this going both ways: I feel like it could either (a) make the program more accurate over time or (b) solidify the original guess and possibly make the program less accurate over time. I have actually not noticed this cause a change (either for the better or for the worse). Am I on the right track with this? I'm not going to call this solved just yet, until I get accuracy just a little higher and test the program on more examples.

FFT Spectrum not displaying correctly

I'm currently trying to display an audio spectrum using FFTW3 and SFML. I've followed the directions found here and looked at numerous references on FFT, spectrums, and FFTW, yet somehow my bars are almost all aligned to the left like below. Another issue I'm having is that I can't find information on what the scale of the FFT output is. Currently I'm dividing it by 64, yet it still reaches beyond that occasionally. And further still, I have found no information on why the output from FFTW has to be the same size as the input. So my questions are:
Why is the majority of my spectrum aligned to the left unlike the image below mine?
Why isn't the output between 0.0 and 1.0?
Why is the input sample count related to the fft output count?
What I get:
What I'm looking for:
const int bufferSize = 256 * 8;

void init() {
    sampleCount = (int)buffer.getSampleCount();
    channelCount = (int)buffer.getChannelCount();
    for (int i = 0; i < bufferSize; i++) {
        window.push_back(0.54f - 0.46f * cos(2.0f * GMath::PI * (float)i / (float)bufferSize));
    }
    plan = fftwf_plan_dft_1d(bufferSize, signal, results, FFTW_FORWARD, FFTW_ESTIMATE);
}

void update() {
    int mark = (int)(sound.getPlayingOffset().asSeconds() * sampleRate);
    for (int i = 0; i < bufferSize; i++) {
        float s = 0.0f;
        if (i + mark < sampleCount) {
            s = (float)buffer.getSamples()[(i + mark) * channelCount] / (float)SHRT_MAX * window[i];
        }
        signal[i][0] = s;
        signal[i][1] = 0.0f;
    }
}

void draw() {
    int inc = bufferSize / 2 / size.x;
    int y = size.y - 1;
    int max = size.y;
    for (int i = 0; i < size.x; i++) {
        float total = 0.0f;
        for (int j = 0; j < inc; j++) {
            int index = i * inc + j;
            total += std::sqrt(results[index][0] * results[index][0] + results[index][1] * results[index][1]);
        }
        total /= (float)(inc * 64);
        Rectangle2I rect = Rectangle2I(i, y, 1, -(int)(total * max)).absRect();
        g->setPixel(rect, Pixel(254, toColor(BLACK, GREEN)));
    }
}
All of your questions are related to FFT theory. Study the properties of the FFT from any standard text/reference book and you will be able to answer your questions all by yourself.
The least you can start from is here:
https://en.wikipedia.org/wiki/Fast_Fourier_transform
Many FFT implementations are energy preserving. That means the scale of the output is linearly related to the scale and/or size of the input.
An FFT is a DFT, which is a square matrix transform. So the number of outputs will always be equal to the number of inputs (or half that, by ignoring the redundant complex-conjugate half given strictly real input), unless some outputs are thrown away. If not, it's not an FFT. If you want fewer outputs, there are ways to downsample the FFT output or post-process it in other ways.
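For example, with FFTW (whose transforms are unnormalized), a real-to-complex plan of N samples yields N/2+1 bins, and one common convention is to divide the forward output by N. A sketch (forwardNormalized is a hypothetical helper name):

#include <fftw3.h>

// One normalization convention: divide the forward output by N.
// A full-scale unit sine then peaks at a magnitude of about 0.5 in its bin.
void forwardNormalized(double *samples, fftw_complex *bins, int N)
{
    fftw_plan p = fftw_plan_dft_r2c_1d(N, samples, bins, FFTW_ESTIMATE);
    fftw_execute(p);
    for (int k = 0; k <= N / 2; ++k) { // r2c yields N/2+1 bins
        bins[k][0] /= N; // real part
        bins[k][1] /= N; // imaginary part
    }
    fftw_destroy_plan(p);
}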

generating correct spectrogram using fftw and window function

For a project I need to be able to generate a spectrogram from a .WAV file. I've read the following should be done:
Get N (transform size) samples
Apply a window function
Do a Fast Fourier Transform using the samples
Normalise the output
Generate spectrogram
On the image below you see two spectrograms of a 10000 Hz sine wave, both using the Hanning window function. On the left you see a spectrogram generated by Audacity, and on the right my version. As you can see, my version has a lot more lines/noise. Is this leakage into different bins? How would I get a clear image like the one Audacity generates? Should I do some post-processing? I have not yet done any normalisation because I do not fully understand how to do so.
update
I found this tutorial explaining how to generate a spectrogram in c++. I compiled the source to see what differences I could find.
My math is very rusty to be honest so I'm not sure what the normalisation does here:
for(i = 0; i < half; i++){
    out[i][0] *= (2./transform_size);
    out[i][6] *= (2./transform_size);
    processed[i] = out[i][0]*out[i][0] + out[i][7]*out[i][8];
    // sets values between 0 and 1?
    processed[i] = 10. * (log(processed[i] + 1e-6)/log(10)) / -60.;
}
after doing this I got this image (btw I've inverted the colors):
I then took a look at the difference between the input samples provided by my sound library and those of the tutorial. Mine were way higher, so I manually normalised them by dividing by the factor 32767.9. I then got this image, which looks pretty OK I think. But dividing by this number seems wrong, and I would like to see a different solution.
Here is the full relevant source code.
void Spectrogram::process(){
    int i;
    int transform_size = 1024;
    int half = transform_size/2;
    int step_size = transform_size/2;
    double in[transform_size];
    double processed[half];
    fftw_complex *out;
    fftw_plan p;
    out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * transform_size);
    for(int x=0; x < wavFile->getSamples()/step_size; x++){
        int j = 0;
        for(i = step_size*x; i < (x * step_size) + transform_size - 1; i++, j++){
            in[j] = wavFile->getSample(i)/32767.9;
        }
        // apply window function
        for(i = 0; i < transform_size; i++){
            in[i] *= windowHanning(i, transform_size);
            // in[i] *= windowBlackmanHarris(i, transform_size);
        }
        p = fftw_plan_dft_r2c_1d(transform_size, in, out, FFTW_ESTIMATE);
        fftw_execute(p); /* repeat as needed */
        for(i = 0; i < half; i++){
            out[i][0] *= (2./transform_size);
            out[i][11] *= (2./transform_size);
            processed[i] = out[i][0]*out[i][0] + out[i][12]*out[i][13];
            processed[i] = 10. * (log(processed[i] + 1e-6)/log(10)) / -60.;
        }
        for (i = 0; i < half; i++){
            if(processed[i] > 0.99)
                processed[i] = 1;
            In->setPixel(x,(half-1)-i,processed[i]*255);
        }
    }
    fftw_destroy_plan(p);
    fftw_free(out);
}
This is not exactly an answer as to what is wrong, but rather a step-by-step procedure to debug this.
What do you think this line does? processed[i] = out[i][0]*out[i][0] + out[i][12]*out[i][13] Likely that is incorrect: fftw_complex is typedef double fftw_complex[2], so you only have out[i][0] and out[i][1], where the first is the real and the second the imaginary part of the result for that bin. If the array is contiguous in memory (which it is), then out[i][12] is likely the same as out[i+6][0] and so forth. Some of these will go past the end of the array, adding random values.
Is your window function correct? Print out windowHanning(i, transform_size) for every i and compare with a reference version (for example numpy.hanning or the MATLAB equivalent). This is the most likely cause; what you see looks like a bad window function, kind of.
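For that comparison, a symmetric Hann window reference (matching numpy.hanning; the periodic variant often used for spectral analysis divides by n instead of n-1) might look like this:

#include <cmath>

// Symmetric Hann window, as in numpy.hanning(n).
double hannReference(int i, int n)
{
    return 0.5 * (1.0 - std::cos(2.0 * M_PI * i / (n - 1)));
}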
Print out processed and compare with a reference version (given the same input; of course you'd have to print the input and reformat it to feed into pylab/MATLAB etc.). However, the -60 and 1e-6 are fudge factors which you don't want; the same effect is better done in a different way. Calculate like this:
power_in_db[i] = 10 * log(out[i][0]*out[i][0] + out[i][1]*out[i][1])/log(10)
Print out the values of power_in_db[i] for the same i but for all x (a horizontal line). Are they approximately the same?
If everything so far is good, the remaining suspect is setting the pixel values. Be very explicit about clipping to range, scaling and rounding.
int pixel_value = (int)round( 255 * (power_in_db[i] - min_db) / (max_db - min_db) );
if (pixel_value < 0) { pixel_value = 0; }
if (pixel_value > 255) { pixel_value = 255; }
Here, again, print out the values in a horizontal line, and compare with the grayscale values in your pgm (by hand, using the colorpicker in photoshop or gimp or similar).
At this point, you will have validated everything from end to end, and likely found the bug.
The code you produced was almost correct, so you didn't leave me much to correct:
void Spectrogram::process(){
    int transform_size = 1024;
    int half = transform_size/2;
    int step_size = transform_size/2;
    double in[transform_size];
    double processed[half];
    fftw_complex *out;
    fftw_plan p;
    out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * transform_size);

    for (int x=0; x < wavFile->getSamples()/step_size; x++) {
        // Fill the transformation array with a sample frame and apply the window function.
        // Normalization is performed later
        // (One error was here: you didn't set the last value of the array in)
        for (int j = 0, i = x * step_size; i < x * step_size + transform_size; i++, j++)
            in[j] = wavFile->getSample(i) * windowHanning(j, transform_size);

        p = fftw_plan_dft_r2c_1d(transform_size, in, out, FFTW_ESTIMATE);
        fftw_execute(p); /* repeat as needed */

        for (int i=0; i < half; i++) {
            // (Here were some flaws concerning the access of the complex values)
            out[i][0] *= (2./transform_size); // real part
            out[i][1] *= (2./transform_size); // imaginary part
            processed[i] = out[i][0]*out[i][0] + out[i][1]*out[i][1]; // power spectrum
            processed[i] = 10./log(10.) * log(processed[i] + 1e-6);   // dB
            // The resulting spectral values in 'processed' are in dB and related to a maximum
            // value of about 96dB. Normalization to a value range between 0 and 1 can be done
            // in several ways. I would suggest to set values below 0dB to 0dB and divide by 96dB.
            // Transform all dB values to a range between 0 and 1:
            if (processed[i] <= 0) {
                processed[i] = 0;
            } else {
                processed[i] /= 96.; // Reduce the divisor if you prefer darker peaks
                if (processed[i] > 1)
                    processed[i] = 1;
            }
            In->setPixel(x,(half-1)-i,processed[i]*255);
        }
        // This should be called each time fftw_plan_dft_r2c_1d()
        // was called to avoid a memory leak:
        fftw_destroy_plan(p);
    }
    fftw_free(out);
}
The two corrected bugs were most probably responsible for the slight variation of successive transformation results. The Hanning window is very well suited to minimize the "noise", so a different window would not have solved the problem. (Actually @Alex I already pointed to the 2nd bug in his point 2. But his point 3 would add a -Inf bug, as log(0) is not defined, which can happen if your wave file contains a stretch of exact 0-values. To avoid this, the constant 1e-6 is good enough.)
Not asked, but there are some optimizations (a sketch follows this list):
put p = fftw_plan_dft_r2c_1d(transform_size, in, out, FFTW_ESTIMATE); outside the main loop,
precalculate the window function outside the main loop,
abandon the array processed and just use a temporary variable to hold one spectral line at a time,
the two multiplications of out[i][0] and out[i][1] can be abandoned in favour of one multiplication with a constant in the following line. I left this (and other things) for you to improve.
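A sketch of the first two items, reusing the names from the corrected code above (in, out, transform_size, step_size, wavFile, and windowHanning are assumed from that context):

#include <vector>

// Hoist plan creation and window computation out of the frame loop.
// (With FFTW_ESTIMATE, planning does not touch the in/out buffers,
// so the plan can safely be created before `in` is filled.)
std::vector<double> window(transform_size);
for (int i = 0; i < transform_size; i++)
    window[i] = windowHanning(i, transform_size);

fftw_plan p = fftw_plan_dft_r2c_1d(transform_size, in, out, FFTW_ESTIMATE);
for (int x = 0; x < wavFile->getSamples()/step_size; x++) {
    for (int j = 0; j < transform_size; j++)
        in[j] = wavFile->getSample(x * step_size + j) * window[j];
    fftw_execute(p);
    // ... scale, convert to dB and draw as above ...
}
fftw_destroy_plan(p); // now needed only once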
Thanks to @Maxime Coorevits, a memory leak could additionally be avoided: "Each time you call fftw_plan_dft_r2c_1d(), memory is allocated by FFTW3. In your code, you only call fftw_destroy_plan() outside the outer loop. But in fact, you need to call this each time you request a plan."
Audacity typically doesn't map one frequency bin to one horizontal line, nor one sample period to one vertical line. The visual effect in Audacity may be due to resampling of the spectrogram picture in order to fit the drawing area.

iOS - C/C++ - Speed up Integral Image calculation

I have a method which calculates an integral image (description here) commonly used in computer vision applications.
float *Integral(unsigned char *grayscaleSource, int height, int width, int widthStep)
{
    // convert the image to single channel 32f
    unsigned char *img = grayscaleSource;

    // set up variables for data access
    int step = widthStep/sizeof(float);
    uint8_t *data = (uint8_t *)img;
    float *i_data = (float *)malloc(height * width * sizeof(float));

    // first row only
    float rs = 0.0f;
    for(int j=0; j<width; j++)
    {
        rs += (float)data[j];
        i_data[j] = rs;
    }
    // remaining cells are sum above and to the left
    for(int i=1; i<height; ++i)
    {
        rs = 0.0f;
        for(int j=0; j<width; ++j)
        {
            rs += data[i*step+j];
            i_data[i*step+j] = rs + i_data[(i-1)*step+j];
        }
    }
    // return the integral image
    return i_data;
}
I am trying to make it as fast as possible. It seems to me like this should be able to take advantage of Apple's Accelerate.framework, or perhaps ARM's NEON intrinsics, but I can't see exactly how. It seems like that nested loop is potentially quite slow (for real-time applications at least).
Does anyone think this is possible to speed up using any other techniques?
You can certainly vectorize the row-by-row summation: that is vDSP_vadd(). The horizontal direction is vDSP_vrsum().
If you want to write your own vector code, the horizontal sum might be sped up by something like psadbw, but that is Intel. Also, take a look at prefix-sum algorithms, which are famously parallelizable.
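To make the structure concrete, the same computation can be split into two plain running-sum passes (rows, then columns), which is the shape that vDSP-style routines, NEON code, or an autovectorizer handle well. A sketch that ignores the widthStep padding of the original:

#include <cstdlib>

// Integral image as two prefix-sum passes; same result as the code above
// (and the same float accumulation caveats), but each pass is a simple
// running sum that is easy to vectorize.
float *IntegralTwoPass(const unsigned char *src, int height, int width)
{
    float *out = (float *)malloc((size_t)height * width * sizeof(float));
    // Pass 1: running sum along each row.
    for (int i = 0; i < height; ++i) {
        float rs = 0.0f;
        for (int j = 0; j < width; ++j) {
            rs += src[i * width + j];
            out[i * width + j] = rs;
        }
    }
    // Pass 2: running sum down each column.
    for (int i = 1; i < height; ++i)
        for (int j = 0; j < width; ++j)
            out[i * width + j] += out[(i - 1) * width + j];
    return out;
}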

FFT of an image

I have an assignment about FFTW, and I was trying to write a small program to create an FFT of an image. I am using CImg to read and write images. But all I get is a dark image with a single white dot :(
I'm most likely doing this the wrong way and I would appreciate if someone could explain how this should be done. I don't need the code, I just need to know what is the right way to do this.
Here is my code:
CImg<double> input("test3.bmp");
CImg<double> image_fft(input, false);
unsigned int nx = input.dimx(), ny = input.dimy();
size_t align = sizeof(Complex);
array2<Complex> in(nx, ny, align);
fft2d Forward(-1, in);

for (int i = 0; i < input.dimx(); ++i) {
    for (int j = 0; j < input.dimy(); ++j) {
        in(i,j) = input(i,j);
    }
}
Forward.fft(in);

for (int i = 0; i < input.dimx(); ++i) {
    for (int j = 0; j < input.dimy(); ++j) {
        image_fft(i,j,0) = image_fft(i,j,1) = image_fft(i,j,2) = std::abs(in(i,j));
    }
}
image_fft.normalize(0, 255);
image_fft.save("test.bmp");
You need to take the log of the magnitude. The single white dot is the base value (0 Hz, DC, whatever you want to call it), so it will almost always be by far the largest component of any image you take (since pixel values cannot be negative, the DC value will always be positive and large).
What you need to do is calculate the log (ln, or any logarithmic calculation) of the magnitude of each point, after you've converted from complex to magnitude/phase form (phasor notation, iirc), before you normalize.
Please note that the values are there; they are just REALLY small compared to the DC value. Taking the log (which makes smaller values bigger by a lot, and bigger values only slightly larger) will make the other frequencies visible.
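Concretely, reusing the variables from the question's code, the display loop becomes something like this (a sketch; std::log1p(x) computes log(1 + x) and avoids log(0) on empty bins):

#include <cmath>

for (int i = 0; i < input.dimx(); ++i) {
    for (int j = 0; j < input.dimy(); ++j) {
        // Log-compress the magnitude so the DC term no longer dominates.
        double mag = std::log1p(std::abs(in(i, j)));
        image_fft(i, j, 0) = image_fft(i, j, 1) = image_fft(i, j, 2) = mag;
    }
}
image_fft.normalize(0, 255);
image_fft.save("test.bmp");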