FFT : FFTW Matlab FFT2 mystery - c++

I inherited an old Fortran code with an FFT subroutine, and I am unable to trace the source of that program. The only thing I know is that there is a call to ff2prp() and a call to fft2() to perform the 2D forward and backward DFT. In order to find out what the code is doing, I took the DFT of a 4x4 2D array (matrix), and the results are very different from the Matlab and FFTW results.
Question: Can someone tell what the code is doing by looking at the output? The input and output are both real arrays.
Input array
0.20000 0.30000 1.00000 1.20000
0.00000 12.00000 5.00000 1.30000
0.30000 0.30000 1.00000 1.40000
0.00000 0.00000 0.00000 1.50000
After forward FFT with the fft2() Fortran routine
0.16875 -0.01875 -0.05000 0.05625
0.00000 12.00000 5.00000 1.30000
0.30000 0.30000 1.00000 1.40000
0.00000 0.00000 0.00000 1.50000
Matlab output performing DCT: dct2(input)
6.3750 -0.8429 -3.4250 -2.4922
2.4620 0.6181 -2.6356 -0.9887
-4.2750 -0.9798 4.2250 2.2730
-4.8352 -1.2387 5.0695 3.4819
Output from C++ code using FFTW library.
DCT from FFTW
(6.3750, 0.00) (-0.8429, 0.00) (-3.4250, 0.00) (-2.4922, 0.00)
(2.4620, 0.00) (0.6181, 0.00) (-2.6356, 0.00) (-0.9887, 0.00)
(-4.2750, 0.00) (-0.9798, 0.00) (4.2250, 0.00) (2.2730, 0.00)
(-4.8352, 0.00) (-1.2387, 0.00) (5.0695, 0.00) (3.4819, 0.00)
Forward FFT with Matlab - fft2(input)
25.5000 + 0.0000i -6.5000 - 7.2000i -10.5000 + 0.0000i -6.5000 + 7.2000i
-0.3000 -16.8000i -12.3000 + 4.8000i 0.1000 + 6.8000i 12.1000 + 5.2000i
-14.1000 + 0.0000i 3.5000 +11.2000i 9.1000 + 0.0000i 3.5000 -11.2000i
-0.3000 +16.8000i 12.1000 - 5.2000i 0.1000 - 6.8000i -12.3000 - 4.8000i
Forward FFT with FFTW
(25.50, 0.00) (-6.50, -7.20) (-10.50, 0.00) (-6.50, 7.20)
(-0.30, -16.80) (-12.30, 4.80) (0.10, 6.80) (12.10, 5.20)
(-14.10, 0.00) (3.50, 11.20) (9.10, 0.00) (3.50, -11.20)
(-0.30, 16.80) (12.10, -5.20) (0.10, -6.80) (-12.30, -4.80)
As you can see, the outputs of Matlab and FFTW agree with each other, but not with the output of the Fortran code.
I would like to use FFTW, but the results are different because of the FFT. I can't figure out what FFT the Fortran program is computing. Can anyone tell by looking at the output?

As far as I can tell, fft2 seems to have computed the 1D FFT of the first row (leaving all 3 others unchanged), with the result being scaled by 1/16 and packed in r0, r2, r1, i1 format.
In other words the output can be constructed in Matlab using:
input = [0.2 0.3 1 1.2;0 12 5 1.3;0.3 0.3 1 1.4;0 0 0 1.5];
N = size(input,2);
A = fft(input(1,:))/16;
B = reshape([real(A);imag(A)],1,2*N);
B(2) = B(N+1);
output = [B(1:N);input(2:size(input,1),:)];
If you have some reason to believe that fft2 should be computing 2D FFTs, then there might be some problem in the way you pass your data to this routine which causes incorrect results. Also, additional test cases for different input sizes (or how you call ff2prp) may provide more insight about the choice of scaling factor (e.g. is it 1/N^2, 1/(4N), or something else).
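For what it's worth, the packing hypothesis is easy to check from C++ as well. Below is a minimal sketch (assuming FFTW 3, linked with -lfftw3) that reproduces the Fortran routine's first output row from a 1D real-to-complex transform of the first input row:

#include <fftw3.h>
#include <cstdio>

int main() {
    const int N = 4;
    // First row of the 4x4 input above.
    double row[N] = {0.2, 0.3, 1.0, 1.2};
    fftw_complex out[N/2 + 1];
    fftw_plan p = fftw_plan_dft_r2c_1d(N, row, out, FFTW_ESTIMATE);
    fftw_execute(p);
    // Scale by 1/16 and pack as r0, r2, r1, i1, as hypothesized above.
    double packed[N] = {out[0][0] / 16.0, out[2][0] / 16.0,
                        out[1][0] / 16.0, out[1][1] / 16.0};
    for (int j = 0; j < N; ++j)
        printf("%.5f ", packed[j]);
    printf("\n");  // expected: 0.16875 -0.01875 -0.05000 0.05625
    fftw_destroy_plan(p);
    return 0;
}

If this prints the Fortran routine's first row (it should, given the Matlab reconstruction above), then the remaining question is only why fft2 transformed a single row, which again points at how the data is passed in.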

Related

Using atan2 to turn a range from -1 to 1 into degrees

I'm trying to use atan2 to turn a range of -1 to 1 into radians and then from radians into degrees.
However, atan2(0, 1) is equal to 0 when it should equal 90.0. What am I doing wrong here?
float radians = atan2(0, 1);
float degrees = (radians * 180 / PI);
if (radians < 0)
{
    degrees += 360;
}
Edit: Okay, so I've plugged in the values the right way round this time.
float xaxisval = controller->left_stick_x_axis();
float yaxisval = controller->left_stick_y_axis();
// plug values into atan2
float radians = atan2(yaxisval, xaxisval);
float degrees = (radians * 180 / PI);
if (radians < 0)
{
    degrees += 360;
}
For context, xaxisval and yaxisval grab values from an analog stick with a maximum value of 1 to the right and a minimum value of -1 to the left. So when I press the analog stick to the right, yaxisval is equal to 0 and xaxisval is equal to 1.
This should return 90 degrees, if you imagine the analog stick as a full 360-degree circle: up is 360/0, right is 90, down is 180, left is 270, etc.
I stuck these values into the debugger and this is what it returned.
xaxisval: 1.00000
yaxisval: 0.00000
degrees: 0.00000
However, I want this direction to come up as 90 degrees; everything seems to have shifted by 90 degrees. I tested the down position and it was equal to 90. Any suggestions?
Debugging Output:
Joystick Up position
xaxisval: 0.00000
yaxisval: -1.00000
degrees: 270.00000
Joystick Right position
xaxisval: 1.00000
yaxisval: 0.00000
degrees: 0.00000
Joystick Down position
xaxisval: 0.00000
yaxisval: 1.00000
degrees: 90.00000
Joystick Left position
xaxisval: -1.00000
yaxisval: 0.00000
degrees: 180.00000
Joystick North East position
xaxisval: 0.929412
yaxisval: 0.592157
degrees: 327.497528
You're passing in the arguments in the wrong order. std::atan2 expects the arguments in the order y,x, not x,y.
Yes, that is incredibly dumb, but it's related to how the tangent function is defined in the first place (which is defined as a ratio of the y-component to the x-component, not the other way around), and like many notational mistakes in mathematics, inertia set in thousands of years ago and you can't push back against it without being a crank.
So write your code like this:
float radians = atan2(1, 0);
Or, if you want everything as explicit as possible:
float x = 0, y = 1;
float radians = atan2(y, x); //Yes, that's the correct order, don't # me
And you'll get the results you expect.
Your second problem is that the angles atan2 produces don't match up with the directions you want. What you want is a circle where the top is 0°, the right side is 90°, the bottom is 180°, and the left side is 270°. Punching the values into atan2 is instead going to produce values where the right side is 0°, up is 90°, left is 180°, and down is 270°.
Also, in comparing with my own hardware, my y-axis is flipped compared to yours. Mine is y+↑, whereas your setup appears to be y-↑
So if you want to transform the normal atan2 rotation into the rotation you want, you'll need to transform it like this:
float radians = atan2(yaxisval, xaxisval);
float degrees = (radians * 180 / PI);
degrees = 90 - degrees;
if (degrees < 0)
    degrees += 360;
Then, all you have to do from there is possibly transform the y-axis depending on whether you expect the joystick pushed up to return a positive or negative value. That's up to the domain of your program.
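Putting it all together, here is a minimal self-contained sketch of that mapping (the function name stick_degrees is hypothetical). It assumes the y+ = up convention from above; if your stick reports y- for up, as your debugging output suggests, negate yaxisval before calling it:

#include <cmath>

// Maps stick axes in [-1, 1] to compass-style degrees:
// up = 0, right = 90, down = 180, left = 270 (assumes y+ = up).
float stick_degrees(float xaxisval, float yaxisval) {
    const float PI = 3.14159265358979f;
    float radians = std::atan2(yaxisval, xaxisval);  // y first, then x
    float degrees = radians * 180.0f / PI;
    degrees = 90.0f - degrees;  // rotate so that up maps to 0 degrees
    if (degrees < 0.0f)
        degrees += 360.0f;
    return degrees;
}

For example, stick_degrees(1.0f, 0.0f) returns 90 (right) and stick_degrees(0.0f, 1.0f) returns 0 (up).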

Math: Average out lines in polar coordinate system (c++ opencv)

I am using OpenCV for some line detection with HoughLines. Then I look for their intersections.
This is the end result:
http://i.imgur.com/PaGw8RI.png
(green dots being the intersections and red lines being the raw lines after houghlines operation)
As you can see, a lot of lines are detected, and to compute the intersections every line is compared with every other line, which greatly increases the processing time.
I am looking to reduce the number of lines by averaging out similar lines after the initial HoughLines operation.
The problem is that HoughLines outputs the data in a polar coordinate system, and so far I have not been able to find any similar code or a mathematical equation to do this.
Any help would be much appreciated.
Edit: added the (R, Phi) values, sorted according to the line each belongs to.
R Phi -11.000 , 3.124
R Phi 15.000 , 0.000
R Phi 13.000 , 0.000
R Phi 22.000 , 0.000
R Phi -18.000 , 3.124
R Phi -9.000 , 3.107
R Phi -10.000 , 3.089
R Phi -7.000 , 3.089
R Phi 19.000 , 0.017
R Phi -6.000 , 3.107
R Phi -4.000 , 3.072
R Phi -14.000 , 3.107
R Phi 27.000 , 0.017
R Phi 172.000 , 1.553
R Phi 165.000 , 1.553
R Phi 173.000 , 1.536
R Phi 170.000 , 1.571
R Phi 166.000 , 1.536
R Phi -163.000 , 3.107
R Phi 169.000 , 0.017
R Phi 172.000 , 0.035
R Phi -165.000 , 3.124
R Phi -159.000 , 3.124
R Phi 165.000 , 0.000
R Phi 167.000 , 0.000
R Phi 167.000 , 0.035
R Phi -155.000 , 3.107
R Phi 313.000 , 1.571
R Phi 319.000 , 1.536
R Phi 312.000 , 1.588
R Phi 315.000 , 1.553
R Phi 317.000 , 1.553
R Phi 24.000 , 1.536
R Phi 26.000 , 1.518
R Phi 22.000 , 1.553
An average or delta would work fine, I guess, but I need to understand why the negative values have a different theta. In practice the difference appears to be Pi, so for every negative value I could go with abs(R) and Pi-Phi.
However, I need to know if this is a 100% foolproof solution.
EDIT 2: After testing I am sure that I was not exactly right here... the lines plainly switched places...
From what I gather from the documentation, the HoughLines function returns a parametrization of the lines in polar coordinates. So the (r, phi) tuple describes the shortest distance of the line to the origin and the angle between the x axis and the line from the origin to the point of shortest distance. The sketch in the function documentation illustrates the situation (there is a right angle between the blue and red lines).
Now, if two lines are similar, they may be tilted a bit against each other and their distance of closest approach to the origin will vary slightly. So, you can just coalesce detected lines that differ by less than a DeltaR and a DeltaPhi (simultaneously), e.g. by taking the means of the parameters. You can also perform a weighted average if you have some kind of measure of how trustworthy the detected lines are. How big the tolerances should be depends very much on your coordinate system and your application. Of course, as albemala said, raising the threshold of the algorithm to produce less false lines in the first place will help with getting more precise results.
Edit: The above assumes that there are no ambiguities in the parameters. As this is not the case here, you have to fold r to [0,∞) and phi to [0,2π). If r is negative flip its sign and add π to phi, then add a multiple of 2π to phi so that the result is in the range 0 <= phi < 2π. You will always have a branch cut somewhere, where phi jumps by 2π. This complicates the comparison of angles and their average:
You have to compare angles "modulo 2π", i.e. for two angles phi1, phi2 take the smaller of abs(phi1-phi2) and abs(abs(phi1-phi2)-2π) as their difference.
After you identified the lines that should be coalesced you have to bring their phi values numerically close together for the average, e.g. by adding 2π to the values smaller than π. You can then fold the averaged value back into the range [0,2π).
Edit 2: Values of phi less than π combined with a negative r can only be caused by lines going through sectors 2 (x<0, y>0), 3 (x<0, y<0) and 4 (x>0, y<0) of the coordinate system. These lines will never be visible in sector 1 (x>0, y>0). Therefore, if you only consider lines through this sector, phi - π will be in the range [0,2π) and you can save the folding step. You still have to cope with the branch cut at phi = 0 when averaging and measuring angle differences, though.
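To make the above concrete, here is a minimal C++ sketch of the folding and coalescing. The greedy grouping strategy and the deltaR/deltaPhi tolerances are assumptions to be tuned for your application; this is not an OpenCV API:

#include <algorithm>
#include <cmath>
#include <vector>

struct PolarLine { float r, phi; };

const float PI_F = 3.14159265f;

// Fold (r, phi) so that r >= 0 and 0 <= phi < 2*pi, as described above.
PolarLine normalized(PolarLine l) {
    if (l.r < 0.0f) { l.r = -l.r; l.phi += PI_F; }
    l.phi = std::fmod(l.phi, 2.0f * PI_F);
    if (l.phi < 0.0f) l.phi += 2.0f * PI_F;
    return l;
}

// Angle difference "modulo 2*pi": the smaller of the two arcs.
float phiDistance(float a, float b) {
    float d = std::fabs(a - b);
    return std::min(d, 2.0f * PI_F - d);
}

// Greedy coalescing: each line joins the first group whose running mean is
// within deltaR and deltaPhi (simultaneously); otherwise it starts a new group.
std::vector<PolarLine> coalesce(const std::vector<PolarLine> &lines,
                                float deltaR, float deltaPhi) {
    std::vector<PolarLine> groups;
    std::vector<int> counts;
    for (PolarLine raw : lines) {
        PolarLine l = normalized(raw);
        bool merged = false;
        for (std::size_t g = 0; g < groups.size(); ++g) {
            if (std::fabs(groups[g].r - l.r) < deltaR &&
                phiDistance(groups[g].phi, l.phi) < deltaPhi) {
                // Bring phi numerically close to the group's phi before
                // averaging, so values straddling the 0/2*pi cut average correctly.
                if (groups[g].phi - l.phi > PI_F) l.phi += 2.0f * PI_F;
                if (l.phi - groups[g].phi > PI_F) l.phi -= 2.0f * PI_F;
                groups[g].r = (groups[g].r * counts[g] + l.r) / (counts[g] + 1);
                groups[g].phi = (groups[g].phi * counts[g] + l.phi) / (counts[g] + 1);
                ++counts[g];
                merged = true;
                break;
            }
        }
        if (!merged) { groups.push_back(l); counts.push_back(1); }
    }
    for (PolarLine &g : groups) g = normalized(g);
    return groups;
}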

Incorrect results with invert in opencv

I am using the OpenCV library in C++ for matrix inversion. I use the invert function with the DECOMP_SVD flag. For matrices which are not singular, it computes the inverse using the SVD method.
However, it gives me an incorrect answer for singular matrices (determinant = 0) when I compare it to the output of Matlab for the same inversion.
The answer is off by a margin of 1e+4!
The methods I have used in Matlab are pinv() and svd().
pinv() uses the Moore-Penrose method.
Need help
Thanks in advance!
Example :
original =
0.2667 0.0667 -1.3333 2.2222
0.0667 0.0667 -0.0000 0.8889
-1.3333 -0.0000 8.8889 -8.8889
2.2222 0.8889 -8.8889 20.7408
Inverse from matlab =
1.0e+04 *
9.8888 -0.0000 0.7417 -0.7417
-0.0000 9.8888 -0.7417 -0.7417
0.7417 -0.7417 0.1113 0.0000
-0.7417 -0.7417 0.0000 0.1113
Your matrix is ill-conditioned (weak main diagonal).
Try increasing the main diagonal elements; I think the error should then decrease.
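Alternatively, if you want behavior closer to MATLAB's pinv(), you can compute the Moore-Penrose pseudo-inverse yourself and drop the near-zero singular values explicitly. Here is a minimal sketch, assuming a CV_64F matrix; the tolerance is an assumption you must pick (MATLAB's default is derived from the largest singular value and machine epsilon):

#include <opencv2/core.hpp>

// Moore-Penrose pseudo-inverse with an explicit singular-value cutoff,
// similar in spirit to MATLAB's pinv(). Assumes A has type CV_64F.
cv::Mat pinvWithTolerance(const cv::Mat &A, double tol) {
    cv::Mat w, u, vt;
    cv::SVD::compute(A, w, u, vt);
    cv::Mat wInv = cv::Mat::zeros(vt.rows, u.cols, CV_64F);
    for (int i = 0; i < w.rows; ++i) {
        double s = w.at<double>(i);
        if (s > tol)                      // drop near-zero singular values
            wInv.at<double>(i, i) = 1.0 / s;
    }
    return vt.t() * wInv * u.t();         // A+ = V * W+ * U^T
}

For an exactly singular matrix like the 4x4 example above, the cutoff is what distinguishes a well-behaved pseudo-inverse from dividing by a zero (or tiny) singular value.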

Hilbert Transform (Analytical Signal) using Apple's Accelerate Framework?

I am having issues getting a Matlab-equivalent Hilbert transform in C++ using Apple's Accelerate Framework. I have been able to get vDSP's FFT algorithm working and, with the help of Paul R's post, have managed to get the same outcome as Matlab.
I have read both this Stack Overflow question by Jordan and the Matlab algorithm description (under the 'Algorithms' sub-heading). To sum the algorithm up in 3 stages:
1. Take the forward FFT of the input.
2. Zero the reflection frequencies and double the frequencies between DC and Nyquist.
3. Take the inverse FFT of the modified forward FFT output.
Below are the outputs of both Matlab and C++ for each stage. The examples use the following arrays:
Matlab: m = [1 2 3 4 5 6 7 8];
C++: float m[] = {1,2,3,4,5,6,7,8};
Matlab Example
Stage 1:
36.0000 + 0.0000i
-4.0000 + 9.6569i
-4.0000 + 4.0000i
-4.0000 + 1.6569i
-4.0000 + 0.0000i
-4.0000 - 1.6569i
-4.0000 - 4.0000i
-4.0000 - 9.6569i
Stage 2:
36.0000 + 0.0000i
-8.0000 + 19.3137i
-8.0000 + 8.0000i
-8.0000 + 3.3137i
-4.0000 + 0.0000i
0.0000 + 0.0000i
0.0000 + 0.0000i
0.0000 + 0.0000i
Stage 3:
1.0000 + 3.8284i
2.0000 - 1.0000i
3.0000 - 1.0000i
4.0000 - 1.8284i
5.0000 - 1.8284i
6.0000 - 1.0000i
7.0000 - 1.0000i
8.0000 + 3.8284i
C++ Example (with Apple's Accelerate Framework)
Stage 1:
Real: 36.0000, Imag: 0.0000
Real: -4.0000, Imag: 9.6569
Real: -4.0000, Imag: 4.0000
Real: -4.0000, Imag: 1.6569
Real: -4.0000, Imag: 0.0000
Stage 2:
Real: 36.0000, Imag: 0.0000
Real: -8.0000, Imag: 19.3137
Real: -8.0000, Imag: 8.0000
Real: -8.0000, Imag: 3.3137
Real: -4.0000, Imag: 0.0000
Stage 3:
Real: -2.0000, Imag: -1.0000
Real: 2.0000, Imag: 3.0000
Real: 6.0000, Imag: 7.0000
Real: 10.0000, Imag: 11.0000
It's quite clear that the end results are not the same. I am looking to be able to compute the Matlab 'Stage 3' results (or at least the imaginary parts) but I am unsure how to go about it, I've tried everything I can think of with no success. I have a feeling that it's because I'm not zeroing out the reflection frequencies in the Apple Accelerate version as they are not provided (due to being calculated from the frequencies between DC and Nyquist) - so I think the FFT is just taking the conjugate of the doubled frequencies as the reflection values, which is wrong. If anyone could help me overcome this issue I would greatly appreciate it.
Code
void hilbert(std::vector<float> &data, std::vector<float> &hilbertResult){
    // Note: workspace, i and dataSize_2 are class members in the original code.
    // Set variable.
    dataSize_2 = data.size() * 0.5;
    // Clear and resize vectors.
    workspace.clear();
    hilbertResult.clear();
    workspace.resize(data.size()/2+1, 0.0f);
    hilbertResult.resize(data.size(), 0.0f);
    // Copy data into the hilbertResult vector.
    std::copy(data.begin(), data.end(), hilbertResult.begin());
    // Perform forward FFT (wrapper around vDSP, described below).
    fft(hilbertResult, hilbertResult.size(), FFT_FORWARD);
    // Fill workspace with 1s and 2s (to double frequencies between DC and Nyquist).
    workspace.at(0) = workspace.at(dataSize_2) = 1.0f;
    for (i = 1; i < dataSize_2; i++)
        workspace.at(i) = 2.0f;
    // Multiply forward FFT output by workspace vector.
    for (i = 0; i < workspace.size(); i++) {
        hilbertResult.at(i*2) *= workspace.at(i);
        hilbertResult.at(i*2+1) *= workspace.at(i);
    }
    // Perform inverse FFT.
    fft(hilbertResult, hilbertResult.size(), FFT_INVERSE);
    // Print the result (interleaved real/imaginary pairs).
    for (i = 0; i < hilbertResult.size()/2; i++)
        printf("Real: %.4f, Imag: %.4f\n", hilbertResult.at(i*2), hilbertResult.at(i*2+1));
}
The code above is what I used to get the 'Stage 3' C++ (with Apple's Accelerate Framework) result. The fft(..) method for the forward FFT performs the vDSP FFT routine followed by a scale of 0.5, and the output is then unpacked (as per Paul R's post). When the inverse FFT is performed, the data is packed, scaled by 2.0, FFT'd using vDSP and finally scaled by 1/(2*N).
So the main problem (as far as I can tell, as your code sample doesn't include the actual calls into vDSP) is that you’re attempting to use the real-to-complex FFT routines for step three, which is fundamentally a complex-to-complex inverse FFT (as evidenced by the fact that your Matlab results have non-zero imaginary parts).
Here’s a simple C implementation using vDSP that matches your expected Matlab output (I used the more modern vDSP_DFT routines, which should generally be preferred to the older fft routines, but otherwise this is very similar to what you’re doing, and illustrates the need for a real-to-complex forward transform, but complex-to-complex inverse transform):
#include <Accelerate/Accelerate.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
    const vDSP_Length n = 8;
    vDSP_DFT_SetupD forward = vDSP_DFT_zrop_CreateSetupD(NULL, n, vDSP_DFT_FORWARD);
    vDSP_DFT_SetupD inverse = vDSP_DFT_zop_CreateSetupD(forward, n, vDSP_DFT_INVERSE);
    // Looks like a typo? It isn't. The real-to-complex DFT takes its input
    // separated into the even- and odd-indexed elements. Since the real signal
    // is [ 1, 2, 3, ... ], signal[0] is 1, signal[2] is 3, and so on for the
    // even indices.
    double even[n/2] = { 1, 3, 5, 7 };
    double odd[n/2] = { 2, 4, 6, 8 };
    double real[n] = { 0 };
    double imag[n] = { 0 };
    vDSP_DFT_ExecuteD(forward, even, odd, real, imag);
    // At this point, we have the forward real-to-complex DFT, which agrees with
    // MATLAB up to a factor of two. Since we want to double all but DC and NY
    // as part of the Hilbert transform anyway, I'm not going to bother to
    // unscale the rest of the frequencies -- they're already the values that
    // we really want. So we just need to move NY into the "right place",
    // and scale DC and NY by 0.5. The reflection frequencies are already
    // zeroed out because the real-to-complex DFT only writes to the first n/2
    // elements of real and imag.
    real[0] *= 0.5; real[n/2] = 0.5*imag[0]; imag[0] = 0.0;
    printf("Stage 2:\n");
    for (int i=0; i<n; ++i) printf("%f%+fi\n", real[i], imag[i]);
    double hilbert[2*n];
    double *hilbertreal = &hilbert[0];
    double *hilbertimag = &hilbert[n];
    vDSP_DFT_ExecuteD(inverse, real, imag, hilbertreal, hilbertimag);
    // Now we have the completed hilbert transform up to a scale factor of n.
    // We can unscale using vDSP_vsmulD.
    double scale = 1.0/n; vDSP_vsmulD(hilbert, 1, &scale, hilbert, 1, 2*n);
    printf("Stage 3:\n");
    for (int i=0; i<n; ++i) printf("%f%+fi\n", hilbertreal[i], hilbertimag[i]);
    vDSP_DFT_DestroySetupD(inverse);
    vDSP_DFT_DestroySetupD(forward);
    return 0;
}
Note that if you already have your DFT setups built and your storage allocated, the computation is extremely straightforward; the “real work” is just:
vDSP_DFT_ExecuteD(forward, even, odd, real, imag);
real[0] *= 0.5; real[n/2] = 0.5*imag[0]; imag[0] = 0.0;
vDSP_DFT_ExecuteD(inverse, real, imag, hilbertreal, hilbertimag);
double scale = 1.0/n; vDSP_vsmulD(hilbert, 1, &scale, hilbert, 1, 2*n);
Sample output:
Stage 2:
36.000000+0.000000i
-8.000000+19.313708i
-8.000000+8.000000i
-8.000000+3.313708i
-4.000000+0.000000i
0.000000+0.000000i
0.000000+0.000000i
0.000000+0.000000i
Stage 3:
1.000000+3.828427i
2.000000-1.000000i
3.000000-1.000000i
4.000000-1.828427i
5.000000-1.828427i
6.000000-1.000000i
7.000000-1.000000i
8.000000+3.828427i

glsl performance: tan(acos(x)) vs sqrt(1-x*x)/x

I am writing a GLSL fragment shader in which I use shadow mapping. Following this tutorial http://www.opengl-tutorial.org/intermediate-tutorials/tutorial-16-shadow-mapping/ , I wrote this line to evaluate the shadow bias to avoid shadow acne:
float bias = 0.005 * tan( acos ( N_L_dot ) );
But I know from math that
tan(acos(x)) = sqrt(1 - x^2) / x
Would it be faster to use this identity instead of tan and acos? In practice, that means using this line of code:
float bias = 0.005 * sqrt ( 1.f - N_L_dot * N_L_dot ) / N_L_dot ;
I think my question is something like "Is the gpu faster at doing sqrt and divisions or tan and acos?"
...or am I missing something?
AMD's GPU ShaderAnalyzer showed that float bias = 0.005 * sqrt ( 1.f - N_L_dot * N_L_dot ) / N_L_dot ;
generates fewer instructions in the shader assembly (4 instructions, estimated at 4 clock cycles),
whereas float bias = 0.005 * tan( acos ( N_L_dot ) ); generates 15 instructions, estimated at 8 clock cycles to complete.
I ran the two methods against the Radeon HD 6450 assembly output, but the results seemed to track well across the different Radeon HD cards.
Looks like the sqrt method will generally perform better.
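As a side note, the identity itself is easy to sanity-check numerically on the CPU. A throwaway C++ snippet (not GLSL), just to confirm the two expressions agree for x in (0, 1):

#include <cmath>
#include <cstdio>

int main() {
    // Compare tan(acos(x)) with sqrt(1 - x*x) / x over (0, 1).
    for (double x = 0.1; x <= 0.95; x += 0.1) {
        double lhs = std::tan(std::acos(x));
        double rhs = std::sqrt(1.0 - x * x) / x;
        std::printf("x=%.2f  tan(acos(x))=%.9f  sqrt(1-x*x)/x=%.9f\n",
                    x, lhs, rhs);
    }
    return 0;
}

Note that both expressions blow up as N_L_dot approaches 0 (grazing angles), so clamping the bias is a good idea either way.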