I am trying to implement the FFT algorithm in C. I wrote code based on the function "four1" from the book "Numerical Recipes in C". I know that using an external library such as FFTW would be more efficient, but I wanted to try this as a first approach. However, I am getting an error at runtime.
After debugging for a while, I decided to copy the exact function provided in the book, but I still have the same problem. The problem seems to be in the following statements:
tempr = wr * data[j] - wi * data[j + 1];
tempi = wr * data[j + 1] + wi * data[j];
and
data[j + 1] = data[i + 1] - tempi;
Here j is sometimes as high as the last index of the array, so indexing with j + 1 reads past the end.
As I said, I didn't change anything in the code, so I am very surprised that it is not working for me; the book is a well-known reference for numerical methods in C, and I doubt there are errors in it. Also, I have found some questions regarding the same code example, but none of them seemed to have the same issue (see C: Numerical Recipies (FFT), for example). What am I doing wrong?
Here is the code:
#include <cmath>
#include <iostream>
#include <stdio.h>
using namespace std;
#define SWAP(a,b) tempr=(a);(a)=(b);(b)=tempr
void four1(double* data, unsigned long nn, int isign)
{
unsigned long n, mmax, m, j, istep, i;
double wtemp, wr, wpr, wpi, wi, theta;
double tempr, tempi;
n = nn << 1;
j = 1;
for (i = 1; i < n; i += 2) {
if (j > i) {
SWAP(data[j], data[i]);
SWAP(data[j + 1], data[i + 1]);
}
m = n >> 1;
while (m >= 2 && j > m) {
j -= m;
m >>= 1;
}
j += m;
}
mmax = 2;
while (n > mmax) {
istep = mmax << 1;
theta = isign * (6.28318530717959 / mmax);
wtemp = sin(0.5 * theta);
wpr = -2.0 * wtemp * wtemp;
wpi = sin(theta);
wr = 1.0;
wi = 0.0;
for (m = 1; m < mmax; m += 2) {
for (i = m; i <= n; i += istep) {
j = i + mmax;
tempr = wr * data[j] - wi * data[j + 1];
tempi = wr * data[j + 1] + wi * data[j];
data[j] = data[i] - tempr;
data[j + 1] = data[i + 1] - tempi;
data[i] += tempr;
data[i + 1] += tempi;
}
wr = (wtemp = wr) * wpr - wi * wpi + wr;
wi = wi * wpr + wtemp * wpi + wi;
}
mmax = istep;
}
}
#undef SWAP
int main()
{
// Testing with random data
double data[] = {1, 1, 2, 0, 1, 3, 4, 0};
four1(data, 4, 1);
for (int i = 0; i < 7; i++) {
cout << data[i] << " ";
}
}
The first 2 editions of Numerical Recipes in C use the unusual (for C) convention that arrays are 1-based. (This was probably because the Fortran (1-based) version came first and the translation to C was done without regard to conventions.)
You should read section 1.2, Some C Conventions for Scientific Computing, specifically the paragraphs on Vectors and One-Dimensional Arrays. As well as trying to justify their 1-based decision, this section explains how to adapt pointers to match their code.
In your case, this should work -
int main()
{
// Testing with random data
double data[] = {1, 1, 2, 0, 1, 3, 4, 0};
double *data1based = data - 1;
four1(data1based, 4, 1);
for (int i = 0; i < 8; i++) {
cout << data[i] << " ";
}
}
However, as @Some programmer dude mentions in the comments, the workaround advocated by the book is undefined behaviour, as data1based points outside the bounds of the data array.
Whilst this may well work in practice, an alternative, non-UB workaround would be to change your interpretation to match their conventions -
int main()
{
// Testing with random data
double data[] = { -1 /*dummy value*/, 1, 1, 2, 0, 1, 3, 4, 0};
four1(data, 4, 1);
for (int i = 1; i < 9; i++) {
cout << data[i] << " ";
}
}
I'd be very wary of this becoming contagious though and infecting your code too widely.
The third edition tacitly recognised this 'mistake' and, as part of supporting C++ and standard library collections, switched to use the C & C++ conventions of zero-based arrays.
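If you would rather keep zero-based arrays in the rest of your program, one option is to hide the 1-based convention behind a small wrapper. The following is only a sketch: fft_zero_based is a hypothetical helper name (not from the book), and it assumes the number of complex samples is a power of two, as four1 requires. It copies the input into a buffer with one unused leading slot, so no out-of-bounds pointer is ever formed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// four1 as in the question (same logic): expects data[1..2*nn], 1-based.
void four1(double* data, unsigned long nn, int isign)
{
    const unsigned long n = nn << 1;
    unsigned long j = 1;
    for (unsigned long i = 1; i < n; i += 2) {            // bit-reversal reordering
        if (j > i) {
            std::swap(data[j], data[i]);
            std::swap(data[j + 1], data[i + 1]);
        }
        unsigned long m = n >> 1;
        while (m >= 2 && j > m) {
            j -= m;
            m >>= 1;
        }
        j += m;
    }
    for (unsigned long mmax = 2; n > mmax; mmax <<= 1) {  // butterfly stages
        const unsigned long istep = mmax << 1;
        const double theta = isign * (6.28318530717959 / mmax);
        const double wtemp = std::sin(0.5 * theta);
        const double wpr = -2.0 * wtemp * wtemp;
        const double wpi = std::sin(theta);
        double wr = 1.0, wi = 0.0;
        for (unsigned long m = 1; m < mmax; m += 2) {
            for (unsigned long i = m; i <= n; i += istep) {
                const unsigned long k = i + mmax;
                const double tempr = wr * data[k] - wi * data[k + 1];
                const double tempi = wr * data[k + 1] + wi * data[k];
                data[k] = data[i] - tempr;
                data[k + 1] = data[i + 1] - tempi;
                data[i] += tempr;
                data[i + 1] += tempi;
            }
            const double old_wr = wr;                     // trigonometric recurrence
            wr = old_wr * wpr - wi * wpi + old_wr;
            wi = wi * wpr + old_wr * wpi + wi;
        }
    }
}

// Hypothetical wrapper: data holds interleaved (re, im) pairs, zero-based.
void fft_zero_based(std::vector<double>& data, int isign)
{
    assert(data.size() % 2 == 0);
    const unsigned long nn = static_cast<unsigned long>(data.size() / 2);
    assert(nn != 0 && (nn & (nn - 1)) == 0);              // power of two
    std::vector<double> padded(data.size() + 1, 0.0);     // padded[0] is never read
    std::copy(data.begin(), data.end(), padded.begin() + 1);
    four1(padded.data(), nn, isign);
    std::copy(padded.begin() + 1, padded.end(), data.begin());
}
```

The extra copy costs a little time and memory, but the 1-based indexing stays confined to one function.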
I'm trying to create a convolution function but I'm having trouble during the access to the kernel data (cv::Mat).
I create the 3x3 kernel:
cv::Mat krn(3, 3, CV_32FC1);
krn.setTo(1);
krn = krn/9;
And I try to loop over it. Here image is the Mat to which I want to apply the convolution operator, and output will hold the result of the convolution:
for (int r = 0; r < image.rows - krn.rows; ++r) {
for (int c = 0; c < image.cols - krn.cols; ++c) {
int sum = 0;
for (int rs = 0; rs < krn.rows; ++rs) {
for (int cs = 0; cs < krn.cols; ++cs) {
sum += krn.data[rs * krn.cols + cs] * image.data[(r + rs) * image.cols + c + cs];
}
}
output.data[(r+1)*src.cols + c + 1]=sum; // assuming 3x3 kernel
}
}
However, the output is not as desired (only random black and white pixels).
However, if I change my code this way:
for (int r = 0; r < image.rows - krn.rows; ++r) {
for (int c = 0; c < image.cols - krn.cols; ++c) {
int sum = 0;
for (int rs = 0; rs < krn.rows; ++rs) {
for (int cs = 0; cs < krn.cols; ++cs) {
sum += 0.11 * image.data[(r + rs) * image.cols + c + cs]; // CHANGE HERE
}
}
output.data[(r+1)*src.cols + c + 1]=sum; // assuming 3x3 kernel
}
}
Using 0.11 instead of the kernel values seems to give the correct output.
For this reason I think I'm doing something wrong accessing the kernel's data.
P.S: I cannot use krn.at<float>(rs,cs).
Thanks!
Instead of needlessly using memcpy, you can just cast the pointer. I'll use a C-style cast because why not.
cv::Mat krn = 1 / (cv::Mat_<float>(3,3) <<
1, 2, 3,
4, 5, 6,
7, 8, 9);
for (int i = 0; i < krn.rows; i += 1)
{
for (int j = 0; j < krn.cols; j += 1)
{
// to see clearly what's happening
uint8_t *byteptr = krn.data + krn.step[0] * i + krn.step[1] * j;
float *floatptr = (float*) byteptr;
// or in one step:
// float *floatptr = (float*) (krn.data + krn.step[0] * i + krn.step[1] * j);
cout << "krn.at<float>(" << i << "," << j << ") = " << (*floatptr) << endl;
}
}
krn.at<float>(0,0) = 1
krn.at<float>(0,1) = 0.5
krn.at<float>(0,2) = 0.333333
krn.at<float>(1,0) = 0.25
krn.at<float>(1,1) = 0.2
krn.at<float>(1,2) = 0.166667
krn.at<float>(2,0) = 0.142857
krn.at<float>(2,1) = 0.125
krn.at<float>(2,2) = 0.111111
Note that pointer arithmetic may not be obvious. If you have a uint8_t*, adding 1 moves it by one uint8_t (one byte), and if you have a float*, adding 1 moves it by one float, which is four bytes. The step[] array contains offsets expressed in bytes.
Consult the documentation for details, which include information on the step[] array that contains the strides/steps to calculate the offset given a tuple of indices into the matrix.
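To make the byte-versus-element arithmetic concrete without depending on OpenCV, here is a minimal sketch that mimics the data/step layout with a plain byte buffer. ByteMatrix, make_matrix, byte_offset, store and load are hypothetical names invented for this illustration; they are not OpenCV types or functions. The row stride deliberately includes padding, which is the case where data[r * cols + c] style indexing goes wrong.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a matrix header: raw bytes plus a row stride in bytes.
struct ByteMatrix {
    std::size_t rows;
    std::size_t cols;
    std::size_t row_step;                  // bytes from one row start to the next
    std::vector<std::uint8_t> bytes;
};

ByteMatrix make_matrix(std::size_t rows, std::size_t cols, std::size_t padding_bytes)
{
    const std::size_t row_step = cols * sizeof(float) + padding_bytes;
    return ByteMatrix{rows, cols, row_step,
                      std::vector<std::uint8_t>(rows * row_step, 0)};
}

// Offsets are computed in bytes: row stride for rows, sizeof(float) for columns.
std::size_t byte_offset(const ByteMatrix& m, std::size_t r, std::size_t c)
{
    assert(r < m.rows && c < m.cols);
    return r * m.row_step + c * sizeof(float);
}

void store(ByteMatrix& m, std::size_t r, std::size_t c, float v)
{
    std::memcpy(m.bytes.data() + byte_offset(m, r, c), &v, sizeof v);
}

float load(const ByteMatrix& m, std::size_t r, std::size_t c)
{
    float v;
    std::memcpy(&v, m.bytes.data() + byte_offset(m, r, c), sizeof v);
    return v;
}
```

With a 3x3 float matrix and 4 padding bytes per row, row 1 starts at byte 16 rather than byte 12, so any index formula that ignores the stride reads the wrong bytes.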
cv::Mat::data is a pointer of type uchar*.
With data[y * cols + x] you access a single byte of the float values stored in krn. To get whole float values, use the at method template:
krn.at<float>(rs,cs)
Consider changing the type of the sum variable to a floating-point type. Otherwise you may lose partial results when calculating the convolution.
So, if you cannot use at, just read 4 bytes from data pointer:
float v = 0.0;
memcpy(&v, krn.data + (rs * krn.step + cs * sizeof(float)), sizeof(float));
Here step is the total number of bytes occupied by one row of the Mat.
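The effect of the accumulator type can be shown on plain arrays. This is a sketch under stated assumptions, not OpenCV code: convolve3x3 is a hypothetical helper, the image is a tightly packed 8-bit buffer, the kernel is nine floats in row-major order, and border pixels simply keep their input value. Like the loops in the question, it does not flip the kernel, so strictly speaking it computes a correlation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: 3x3 filter over a packed 8-bit image with a float kernel.
std::vector<std::uint8_t> convolve3x3(const std::vector<std::uint8_t>& image,
                                      std::size_t rows, std::size_t cols,
                                      const std::vector<float>& kernel)
{
    assert(image.size() == rows * cols);
    assert(kernel.size() == 9);
    std::vector<std::uint8_t> output(image);   // assumption: borders copy the input
    for (std::size_t r = 0; r + 3 <= rows; ++r) {
        for (std::size_t c = 0; c + 3 <= cols; ++c) {
            float sum = 0.0f;                  // float, so fractional terms survive
            for (std::size_t rs = 0; rs < 3; ++rs) {
                for (std::size_t cs = 0; cs < 3; ++cs) {
                    sum += kernel[rs * 3 + cs] * image[(r + rs) * cols + c + cs];
                }
            }
            // Clamp and round before storing back into 8 bits.
            const float clamped = std::min(255.0f, std::max(0.0f, sum));
            output[(r + 1) * cols + c + 1] =
                static_cast<std::uint8_t>(std::lround(clamped));
        }
    }
    return output;
}
```

With an int accumulator, each product such as 0.11 * 5 would be truncated toward zero as it is added, which is the loss of partial results mentioned above.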
Firstly, I would like to mention that I am a complete beginner when it comes to coding, let alone C++, so bear with me, as I need complete guidance. My task is to implement the Lanczos algorithm for the case of a 1-D anharmonic oscillator in C++, with reference to the linked paper, Analytical Lanczos method.
The paper offers a step by step guide for the implementation of the algorithm:
Step by step guide here
with the initial trial function being: Psi_1 = (1 + x^2) * exp(-x^2 - 1/4 * x^4).
The paper also contains code in MATHEMATICA for this particular case. Mathematica code
Here is my attempt, which is far from finished; I wanted to make sure I am on the right path with regard to the programming logic. There are still plenty of errors. (Please excuse the lack of fundamentals here, I am only a beginner. Thank you very much.)
int main() {
//Grid parameters.
const int Rmin = 1, Rmax = 31, nx = 300;//Grid length and stepsize.
double dx = (Rmax- Rmin) / nx; //Delta x.
double a, b;
std::vector<double> x, psi_1;
for (int j = 1; j < 64; ++j) { //Corresponds to each successive Lanczos Vector.
for (int i = Rmin; i < nx + 1; i++) { //Defining the Hamiltonian on the grid.
x[i] = (nx / 2) + i;
psi_1[i] = (1 + pow(x[i] * dx, 2)) * exp(pow(-x[i] * dx, 2) - (1 / 4 * pow(x[i] * dx, 4 )) //Trial wavefunction.
H[i] = ((PSI[j][i + 1] - 2 * PSI[j][i] + PSI[j][i - 1]) / pow(dx, 2)) + PSI[j][i] * 1/2 * pow(x[i] * dx, 2) + PSI[j][i] * 2 * pow(x[i] * dx, 4) + PSI[j][i] * 1/2 * pow(x[i], 6); //Hamiltonian. ****
//First Lanczos step.
PSI[1][i] = psi_1[i]
}
//Normalisation of the wavefunction (b).
double b[j] = 0.0;
for (int i = Rmin; i < nx + 1; i++) {
PSI[1][i] = psi_1[i];
b[j] += abs(pow(PSI[j][i], 2));
}
b[j] = b[j] * dx;
for (int i = Rmin; i < nx + 1; i++) {
PSI[j] = PSI[j] / sqrt(b[j]);
}
//Expectation values (a). Main diagonal of the Hamiltonian matrix.
double a[j] = 0.0;
for (int i = Rmin; i < nx + 1; i++) {
a[j] += PSI[j] * H[i] * PSI[j] * dx
}
//Recursive expression.
PSI[j] = H[i] * PSI[j-1] - PSI[j-1] * a[j-1] - PSI[j-2] * b[j-1]
//Lanczos Matrix.
LanczosMatrix[R][C] =
for (int R = 1; R < 64; R++) {
row[R] =
}
}
I have yet to finish the code, but some experienced guidance would be greatly appreciated! (also, the code has to be cleaned up greatly, but this was an attempt to get the general idea down first.)
My code does not work for Gaussian elimination on a matrix. The core code seems OK, but it is missing some final touch that I honestly cannot identify. It would be great if someone could point out the mistake.
Basically, when I input a square 3x3 matrix filled with 3s, I get back (3, 3, 3, 0, -3, -3, 0, 0, 3), but it should be (3, 3, 3, 0, 0, 0, 0, 0, 0).
n is number of rows of matrix and m is number of columns.
All elements of the matrix are stored in a single one-dimensional array called entries.
My code below for GaussElimination basically starts by placing the row with the largest first element in the top row. After that I eliminate the elements directly below the pivot.
Matrix Matrix::GaussElim() const {
double maxEle;
int maxRow;
for (int i = 1; i <= m; i++) {
maxEle = fabs(entries[i-1]);
maxRow = i;
for (int k = i+1; k <= m; k++) {
if (fabs(entries[(k - 1) * n + i - 1]) > maxEle) {
maxEle = entries[(k - 1) * n + i - 1];
maxRow = k;
}
}
for (int a = 1; a <= m; a++) {
swap(entries[(i - 1) * m + a - 1], entries[(maxRow - 1) * m + a - 1]);
}
for (int b = i + 1; b <= n; b++) {
double c = -(entries[(b - 1) * m + i - 1]) / entries[(i - 1) * m + i - 1];
for (int d = i; d <= n; d++) {
if (i == d) {
entries[(b - 1) * m + d - 1] = 0;
}
else {
entries[(b - 1) * m + d - 1] = c * entries[(i - 1) * m + d - 1];
}
}
}
}
Matrix Result(n, m, entries);
return Result;
}
For starters, I'd suggest dropping the habit of starting loops at 1 instead of the more idiomatic 0; it would simplify all of the formulas.
That said, this statement
else {
entries[(b - 1) * m + d - 1] = c * entries[(i - 1) * m + d - 1];
// ^^^
}
looks suspicious. There should be a += (or a -=, depending on how you choose the sign of the multiplier c).
Another source of unexpected results is the way chosen to calculate the constant c:
double c = -(entries[(b - 1) * m + i - 1]) / entries[(i - 1) * m + i - 1];
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Even in case of partial pivoting, that value could be zero (or too small), due to the nature of the starting matrix, like in the posted example, or to numerical errors. In those cases, it would be preferable to just zero out all the remaining elements of the matrix.
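Putting both points together, a zero-based version could look like the sketch below. It is not the asker's Matrix class: gauss_eliminate is a hypothetical free function working on a row-major std::vector, and eps is an assumed tolerance for treating a pivot as zero. When no usable pivot exists in a column, this sketch zeroes the remaining entries of that column and moves on, which is one possible way to handle the degenerate case.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: in-place forward elimination with partial pivoting.
// a holds a rows x cols matrix in row-major order.
void gauss_eliminate(std::vector<double>& a, std::size_t rows, std::size_t cols,
                     double eps = 1e-12)
{
    assert(a.size() == rows * cols);
    std::size_t pivot_row = 0;
    for (std::size_t col = 0; col < cols && pivot_row < rows; ++col) {
        // Find the row (at or below pivot_row) with the largest |entry| in this column.
        std::size_t best = pivot_row;
        for (std::size_t r = pivot_row + 1; r < rows; ++r) {
            if (std::fabs(a[r * cols + col]) > std::fabs(a[best * cols + col])) {
                best = r;
            }
        }
        if (std::fabs(a[best * cols + col]) <= eps) {
            // No usable pivot: clear the column and try the next one.
            for (std::size_t r = pivot_row; r < rows; ++r) a[r * cols + col] = 0.0;
            continue;
        }
        if (best != pivot_row) {
            for (std::size_t k = 0; k < cols; ++k) {
                std::swap(a[pivot_row * cols + k], a[best * cols + k]);
            }
        }
        for (std::size_t r = pivot_row + 1; r < rows; ++r) {
            const double factor = a[r * cols + col] / a[pivot_row * cols + col];
            a[r * cols + col] = 0.0;
            for (std::size_t k = col + 1; k < cols; ++k) {
                a[r * cols + k] -= factor * a[pivot_row * cols + k];  // note the -=
            }
        }
        ++pivot_row;
    }
}
```

For the 3x3 matrix filled with 3s, the first column eliminates rows 2 and 3 completely, and the remaining columns have no pivot, so the result is (3, 3, 3, 0, 0, 0, 0, 0, 0).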
What is an elegant algorithm to mix the elements two by two in two arrays (of potentially differing sizes) so that the items are drawn in an alternating fashion from each array, with the leftovers added to the end?
E.g.
Array 1: 0, 2, 4, 6
Array 2: 1, 3, 5, 7
Mixed array: 0, 2, 1, 3, 4, 6, 5, 7
Don't worry about null checking or any other edge cases, I'll handle those.
Here is my solution but it does not work properly:
for (i = 0; i < N; i++) {
arr[2 * i + 0] = A[i];
arr[2 * i + 1] = A[i+1];
arr[2 * i + 0] = B[i];
arr[2 * i + 1] = B[i+1];
}
It is very fiddly to calculate the array indices explicitly, especially if your arrays can be of different and possibly odd lengths. It is easier if you keep three separate indices, one for each array:
int pairwise(int c[], const int a[], size_t alen, const int b[], size_t blen)
{
size_t i = 0; // index into a
size_t j = 0; // index into b
size_t k = 0; // index into c
while (i < alen || j < blen) {
if (i < alen) c[k++] = a[i++];
if (i < alen) c[k++] = a[i++];
if (j < blen) c[k++] = b[j++];
if (j < blen) c[k++] = b[j++];
}
return k;
}
The returned value k will be equal to alen + blen, which is the implicit dimension of the result array c. Because the availability of a next item is checked for each array operation, this code works for arrays of different lengths and when the arrays have an odd number of elements.
You can use the code like this:
#define countof(x) (sizeof(x) / sizeof(*x))
int main()
{
int a[] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
int b[] = {-1, -2, -3, -4, -5, -6};
int c[countof(a) + countof(b)];
int i, n;
n = pairwise(c, a, countof(a), b, countof(b));
for (i = 0; i < n; i++) {
if (i) printf(", ");
printf("%d", c[i]);
}
puts("");
return 0;
}
(The example is in C, not C++, but your code doesn't use any of C++'s containers such as vector, so I've used plain old int arrays with explicit dimensions, which are the same in C and C++.)
Some notes on the loop you have:
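If you do want C++ containers, the same three-index idea carries over directly. The following is only a sketch: mix_in_chunks is a hypothetical name, and the chunk parameter (defaulting to 2) is an addition for illustration that generalises "two by two".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: take up to `chunk` items from a, then up to `chunk`
// items from b, and repeat until both inputs are exhausted.
std::vector<int> mix_in_chunks(const std::vector<int>& a, const std::vector<int>& b,
                               std::size_t chunk = 2)
{
    assert(chunk > 0);
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0;  // index into a
    std::size_t j = 0;  // index into b
    while (i < a.size() || j < b.size()) {
        for (std::size_t n = 0; n < chunk && i < a.size(); ++n) out.push_back(a[i++]);
        for (std::size_t n = 0; n < chunk && j < b.size(); ++n) out.push_back(b[j++]);
    }
    return out;
}
```

Because every push is guarded by a bounds check, leftovers from the longer input end up at the end of the result, as in the C version.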
You use the same position in the result array arr to assign two values to it (one from A and one from B).
The calculation for the index is possibly more complex than it needs to be, consider using two indexers given the two ways you are indexing over the arrays.
I would propose you use a loop that has two indexers (i and j) and explicitly fills four elements of the result per iteration (i.e. two positions for each input array). In each iteration you increment the indexers appropriately (by 4 for the output array and by 2 for the input arrays).
#include <iostream>
int main()
{
using namespace std;
constexpr int N = 4;
int A[N] = {2, 4, 6, 8};
int B[N] = {1, 3, 5, 7};
int arr[N*2];
for (auto i = 0, j=0; i < N*2; i+=4, j+=2) {
arr[i + 0] = A[j];
arr[i + 1] = A[j+1];
arr[i + 2] = B[j];
arr[i + 3] = B[j+1];
}
for (auto i =0; i < N*2; ++i) {
cout << arr[i] << ",";
}
cout << endl;
}
Note: you mention you take care of corner cases, so the code here requires the input arrays to be of the same length and that the length is even.
Try this:
for (i = 0; i < N; i += 2) {
arr[2 * i + 0] = A[i];
arr[2 * i + 1] = A[i+1];
arr[2 * i + 2] = B[i];
arr[2 * i + 3] = B[i+1];
}
I didn't consider any corner cases; this just fixes your concept. For example, you should still check whether any array index goes out of bounds. You can run live here.
It should look like this.
for (i = 0; i < N; i+=2) {
arr[2 * i + 0] = A[i];
arr[2 * i + 1] = A[i+1];
arr[2 * i + 2] = B[i];
arr[2 * i + 3] = B[i+1];
}
I've been using OpenCV to do some block matching and I've noticed its sum of squared differences code is very fast compared to a straightforward for loop like this:
int SSD = 0;
for(int i =0; i < arraySize; i++)
SSD += (array1[i] - array2[i] )*(array1[i] - array2[i]);
If I look at the source code to see where the heavy lifting happens, the OpenCV folks have their for loops do 4 squared difference calculations in each iteration of the loop. The function that does the block matching looks like this:
int64
icvCmpBlocksL2_8u_C1( const uchar * vec1, const uchar * vec2, int len )
{
int i, s = 0;
int64 sum = 0;
for( i = 0; i <= len - 4; i += 4 )
{
int v = vec1[i] - vec2[i];
int e = v * v;
v = vec1[i + 1] - vec2[i + 1];
e += v * v;
v = vec1[i + 2] - vec2[i + 2];
e += v * v;
v = vec1[i + 3] - vec2[i + 3];
e += v * v;
sum += e;
}
for( ; i < len; i++ )
{
int v = vec1[i] - vec2[i];
s += v * v;
}
return sum + s;
}
This calculation is for unsigned 8 bit integers. They perform a similar calculation for 32-bit floats in this function:
double
icvCmpBlocksL2_32f_C1( const float *vec1, const float *vec2, int len )
{
double sum = 0;
int i;
for( i = 0; i <= len - 4; i += 4 )
{
double v0 = vec1[i] - vec2[i];
double v1 = vec1[i + 1] - vec2[i + 1];
double v2 = vec1[i + 2] - vec2[i + 2];
double v3 = vec1[i + 3] - vec2[i + 3];
sum += v0 * v0 + v1 * v1 + v2 * v2 + v3 * v3;
}
for( ; i < len; i++ )
{
double v = vec1[i] - vec2[i];
sum += v * v;
}
return sum;
}
I was wondering whether breaking a loop up into chunks of 4 like this might speed up the code. I should add that there is no multithreading occurring in this code.
My guess is that this is just a simple implementation of unrolling the loop - it saves 3 additions and 3 compares on each pass of the loop, which can be a great savings if, for example, checking len involves a cache miss. The downside is that this optimization adds code complexity (e.g. the additional for loop at the end to finish the loop for the len % 4 items left if the length is not evenly divisible by 4) and, of course, it's an architecture-dependent optimization whose magnitude of improvement will vary by hardware/compiler/etc...
Still, it's straightforward to follow compared to most optimizations and will probably result in some sort of performance increase regardless of the architecture, so it's low risk to just throw it in there and hope for the best. Since OpenCV is such a well-supported chunk of code, I'm sure that someone instrumented these chunks of code and found them to be well worth it - as you yourself have done.
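To see the pattern in isolation, here is a minimal sketch of the same computation written both ways. ssd_simple and ssd_unrolled are hypothetical names, and std::vector is used only for convenience. Whether the unrolled form is actually faster is something you would have to measure, since an optimizing compiler may unroll or vectorise the simple loop on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Straightforward version: one element per loop iteration.
std::int64_t ssd_simple(const std::vector<std::uint8_t>& a,
                        const std::vector<std::uint8_t>& b)
{
    assert(a.size() == b.size());
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const int v = a[i] - b[i];
        sum += v * v;
    }
    return sum;
}

// Manually unrolled by 4: one loop test and one index update per four elements.
std::int64_t ssd_unrolled(const std::vector<std::uint8_t>& a,
                          const std::vector<std::uint8_t>& b)
{
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::int64_t sum = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        const int v0 = a[i] - b[i];
        const int v1 = a[i + 1] - b[i + 1];
        const int v2 = a[i + 2] - b[i + 2];
        const int v3 = a[i + 3] - b[i + 3];
        sum += v0 * v0 + v1 * v1 + v2 * v2 + v3 * v3;
    }
    for (; i < n; ++i) {  // remainder: the last n % 4 elements
        const int v = a[i] - b[i];
        sum += v * v;
    }
    return sum;
}
```

Both functions return the same value for any pair of equal-length inputs; only the loop overhead differs.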
There is one obvious optimisation of your code, viz:
int SSD = 0;
for(int i = 0; i < arraySize; i++)
{
int v = array1[i] - array2[i];
SSD += v * v;
}