I am learning to use a C++ library (FINUFFT) to perform non-uniform FFTs (NUFFT). The library provides 3 types of NUFFT.
Type 1: forward transform from a non-uniform x grid to a uniform k-space grid.
Type 2: backward transform from a uniform k-space grid to a non-uniform x grid.
Type 3: transform from a non-uniform grid to a non-uniform grid.
I tested the library in 1D by applying a Type 1 NUFFT to the test function sin(x) on -pi to pi, transforming the result back with a Type 2 NUFFT, and comparing the output with sin(x). On a uniform x grid the error is very small. Unfortunately, the error is very large when the same test is done on a non-uniform x grid.
I see two possibilities:
My use of the NUFFT is incorrect. The code is rather simple, though, so I doubt this is the case.
The author mentions that Type 2 is NOT the inverse of Type 1, so I believe that might be the problem. Since I am not an expert in NUFFT, I wonder if there is an alternative way to perform a forward/backward test with NUFFT?
My goal is to develop an FFT Poisson solver on an irregular mesh, so I need to perform the NUFFT forward and backward, and it is therefore important to overcome this problem. Suggestions that do not use FINUFFT are also welcome.
Thank you for reading.
The code is here for those who are interested.
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <complex>
#include <fftw3.h>
#include <functional>
#include "finufft/src/finufft.h"
using namespace std;
int main()
{
double pi = 3.14159265359;
int N = 128*2;
int i;
double X[N];
double L = 2*pi;
double dx = L/(N);
nufft_opts opts; finufft_default_opts(&opts);
complex<double> R = complex<double>(1.0,0.0); // the real unit
complex<double> in1[N], out1[N], out2[N];
for(i = 0; i < N; i++) {
//X[i] = -(L/2) + i*dx ; // uniform grid
X[i] = -(L/2) + pow(double(i)/N,7.0)*L; //non-uniform grid
in1[i] = sin(X[i])*R ;}
int ier = finufft1d1(N,X,in1,-1,1e-10,N,out1,opts); // type-1 NUFFT
int ier2 = finufft1d2(N,X,out2,+1,1e-10,N,out1,opts); // type-2 NUFFT
// checking the error
double erl1 = 0.;
for ( i = 0; i < N; i++) {
erl1 += fabs( in1[i].real() - out2[i].real()/(N))*dx;
}
std::cout<< erl1 <<" " << ier << " "<< ier2<< std::endl ; // error
return 0;
}
The developers have since updated their documentation with an example that answers exactly my question: https://finufft.readthedocs.io/en/latest/examples.html#periodic-poisson-solve-on-non-cartesian-quadrature-grid. In brief, their NUFFT code is NOT well suited to a fully adaptive scheme, but I will still give an answer and code here for completeness.
There are two ingredients missing in my code.
(1) I need to multiply the function, sin(x), by a weight before applying the NUFFT. In their 2D example the weight comes from the determinant of the Jacobian, so in 1D it is simply the derivative of the nonuniform coordinate with respect to the uniform coordinate, dx/dksi.
(2) Nk, the number of Fourier modes, must be smaller than N, the number of points.
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <complex>
#include <fftw3.h>
#include <functional>
#include "finufft/src/finufft.h"
using namespace std;
int main()
{
double pi = 3.14159265359;
int N = 128*2;
int Nk = 32; // smaller than N
int i;
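double X[N], ksi[N]; // nonuniform grid and uniform coordinate
double *dX; // weights dX/dksi, assuming der() below returns a double*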
double L = 2*pi;
double dx = L/(N);
nufft_opts opts; finufft_default_opts(&opts);
complex<double> R = complex<double>(1.0,0.0); // the real unit
complex<double> in1[N], out1[N], out2[N];
for(i = 0; i < N; i++) {
ksi[i] = -(L/2) + i*dx ; //uniform grid
X[i] = -(L/2) + pow(double(i)/(N-1),6)*L; //nonuniform grid
}
dX = der(ksi,X,1); // your own derivative code
for(i = 0; i < N; i++) {
in1[i] = sin(X[i]) * dX[i] * R ; // applying weight
}
int ier = finufft1d1(N,X,in1,-1,1e-10,Nk,out1,opts); // type-1 NUFFT
int ier2 = finufft1d2(N,X,out2,+1,1e-10,Nk,out1,opts); // type-2 NUFFT
// checking the error
double erl1 = 0.;
for ( i = 0; i < N; i++) {
erl1 += fabs( sin(X[i]) - out2[i].real()/(N))*dx; // compare with the unweighted sin(x): in1 now carries the weight
}
std::cout<< erl1 <<" " << ier << " "<< ier2<< std::endl ; // error
return 0;
}
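The weighting in ingredient (1) can be sanity-checked without the NUFFT library. What follows is only a minimal sketch under stated assumptions: the map x(s) = -pi + 2*pi*s^6 is a hypothetical choice that resembles the grid above, the nodes are midpoints in s, and map_x, map_dx and weighted_sum are illustrative names, not FINUFFT functions. It shows that multiplying the samples by dx/ds (times the uniform spacing) turns a plain sum over nonuniform points into an approximation of the integral.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

const double kPi = 3.14159265358979323846;

// Hypothetical map from a uniform coordinate s in [0,1] to a nonuniform
// x in [-pi, pi]; the power 6 mirrors the grid used in the code above.
double map_x(double s)  { return -kPi + 2.0 * kPi * std::pow(s, 6.0); }
// Derivative dx/ds of the map: the 1D analogue of the Jacobian determinant.
double map_dx(double s) { return 12.0 * kPi * std::pow(s, 5.0); }

// Approximates the integral of f over [-pi, pi] by sum_j f(x_j) * w_j,
// with weights w_j = (dx/ds)(s_j) * ds evaluated at midpoint nodes s_j.
template <class F>
double weighted_sum(F f, std::size_t n) {
    assert(n > 0);
    const double ds = 1.0 / static_cast<double>(n);
    double total = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
        const double s = (static_cast<double>(j) + 0.5) * ds;
        total += f(map_x(s)) * map_dx(s) * ds;
    }
    return total;
}

// The exact integral of sin^2(x) over one period is pi.
double sin_squared_integral(std::size_t n) {
    return weighted_sum([](double x) { return std::sin(x) * std::sin(x); }, n);
}
```

Calling sin_squared_integral(256) should come out close to pi, whereas the same sum without the map_dx factor would not approximate the integral at all.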
The following code generates random numbers for a Monte Carlo simulation. I need to get exactly the same sum on every run, but that does not happen even though I have fixed the seed. I would appreciate it if anyone could point out the problem with this code.
#include <cmath>
#include <random>
#include <iostream>
#include <chrono>
#include <cfloat>
#include <iomanip>
#include <cstdlib>
#include <omp.h>
#include <trng/yarn2.hpp>
#include <trng/mt19937_64.hpp>
#include <trng/uniform01_dist.hpp>
using namespace std;
using namespace chrono;
const double landa = 1;
const double exact_solution = landa / (pow(landa, 2) + 1);
double function(double x) {
return cos(x) / landa;
}
int main() {
int rank;
const int N = 1000000;
double sum = 0.0;
trng::yarn2 r[6];
for (int i = 0; i <6; i++)
{
r[i].seed(0);
}
for (int i = 0; i < 6; i++)
{
r[i].split(6,i);
}
trng::uniform01_dist<double> u;
auto start = high_resolution_clock::now();
#pragma omp parallel num_threads(6)
{
rank=omp_get_thread_num();
#pragma omp for reduction (+: sum)
for (int i = 0; i<N; ++i) {
//double x = distribution(g);
double x= u(r[rank]);
x = (-1.0 / landa) * log(1.0 - x);
sum = sum+function(x);
}
}
double app = sum / static_cast<double> (N);
auto end = high_resolution_clock::now();
auto diff=duration_cast<milliseconds>(end-start);
cout << "Approximation is: " <<setprecision(17) << app << "\t"<<"Time: "<< setprecision(17) << diff.count()<<" Error: "<<(app-exact_solution)<< endl;
return 0;
}
TL;DR The problem is two-fold:
Floating point addition is not associative;
You are generating different random numbers for each thread.
First, you have a race condition on rank=omp_get_thread_num();. The variable rank is shared among all threads. To fix that, declare rank inside the parallel region, which makes it private to each thread.
#pragma omp parallel num_threads(6)
{
int rank=omp_get_thread_num();
...
}
In your code, you should not expect the value of sum to be the same for different numbers of threads. Why? Because you are adding doubles in parallel:
double sum = 0.0;
...
#pragma omp for reduction (+: sum)
for (int i = 0; i<N; ++i) {
//double x = distribution(g);
double x= u(r[rank]);
x = (-1.0 / landa) * log(1.0 - x);
sum = sum+function(x);
}
and in What Every Computer Scientist Should Know About Floating-Point Arithmetic one can read:
Another grey area concerns the interpretation of parentheses. Due to roundoff errors, the associative laws of algebra do not necessarily hold for floating-point numbers. For example, the expression (x+y)+z has a totally different answer than x+(y+z) when x = 1e30, y = -1e30 and z = 1 (it is 1 in the former case, 0 in the latter).
Hence, floating-point addition is not associative, which is why you might get different sum values for different numbers of threads.
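To make the effect of the thread count concrete, here is a small serial sketch. It is only a model, and it rests on an assumption: chunked_sum (a hypothetical helper, not an OpenMP call) splits the input into contiguous chunks, sums each chunk separately, and then adds the partial sums in order. A real OpenMP reduction may combine the partial sums in a different order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums `values` in `chunks` contiguous pieces and then adds the partial sums.
// This imitates how a reduction over `chunks` threads groups the additions.
double chunked_sum(const std::vector<double>& values, std::size_t chunks) {
    assert(chunks > 0);
    const std::size_t n = values.size();
    double total = 0.0;
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t begin = c * n / chunks;
        const std::size_t end = (c + 1) * n / chunks;
        double partial = 0.0;  // what one thread would accumulate
        for (std::size_t i = begin; i < end; ++i) {
            partial += values[i];
        }
        total += partial;
    }
    return total;
}
```

With the input {1e30, 1.0, -1e30, 1.0}, one chunk gives 1 while two chunks give 0, even though the data are identical: only the grouping of the additions changed.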
You are generating different random values per thread:
for (int i = 0; i < 6; i++)
{
r[i].split(6,i);
}
Consequently, for a different number of threads, the variable sum gets a different result as well.
As kindly pointed out by jérôme-richard in the comments:
Note that a more precise algorithm like Kahan summation can significantly reduce the rounding issue while still being relatively fast.
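For reference, a minimal sketch of Kahan (compensated) summation next to a plain loop. The helper names and the test input (1.0 followed by many copies of 1e-16) are my own, chosen only to make the difference visible. Compile without -ffast-math, since that flag allows the compiler to optimize the compensation away.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain left-to-right summation.
double naive_sum(const std::vector<double>& values) {
    double sum = 0.0;
    for (double v : values) sum += v;
    return sum;
}

// Kahan summation: `compensation` carries the low-order bits that were
// lost when the previous term was added to the running sum.
double kahan_sum(const std::vector<double>& values) {
    double sum = 0.0;
    double compensation = 0.0;
    for (double v : values) {
        const double y = v - compensation;  // corrected next term
        const double t = sum + y;           // low bits of y may be lost here
        compensation = (t - sum) - y;       // recover what was lost
        sum = t;
    }
    return sum;
}

// Mean with compensated summation, as in a Monte Carlo estimate sum / N.
double kahan_mean(const std::vector<double>& values) {
    assert(!values.empty());
    return kahan_sum(values) / static_cast<double>(values.size());
}

// Hypothetical test input: 1.0 followed by `count` copies of 1e-16.
std::vector<double> one_plus_tiny(std::size_t count) {
    std::vector<double> values(count + 1, 1e-16);
    values[0] = 1.0;
    return values;
}
```

On one_plus_tiny(1000000) the plain loop never moves away from 1.0, because each 1e-16 is below half the spacing of doubles near 1, while the compensated sum ends up close to 1 + 1e-10. Note that this reduces the rounding error; it does not by itself guarantee bit-identical results across thread counts.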
Could someone please help me fix my program and explain why it's not working?
It's supposed to generate n points with 2 coordinates each, both of which are random numbers. The values themselves are random but have to be scaled to the interval from 0 to some chosen value k. All the points have to be at least some radius apart from each other, which is taken to be 1.
For some reason my program doesn't even start. When I run it, Windows just says that the program is not responding and is trying to diagnose the problem.
Please simplify your explanation as much as possible since I'm a complete beginner and probably won't understand otherwise. Thanks a bunch in advance.
#include <iostream>
#include <vector>
#include <cstdlib>
#include <cmath>
#include <fstream>
using namespace std;
int main()
{
int n=5;
int k=100;
vector<vector<double>> a(n, vector<double> (2));
srand(132);
//a[0][1]=k*((float(rand()))/RAND_MAX);
//a[0][0]=k*((float(rand()))/RAND_MAX);
for(int i=0; i<n;){
a[i][0]=k*((float(rand()))/RAND_MAX);
a[i][1]=k*((float(rand()))/RAND_MAX);
for (int j=0; j<n; j+=1){
if (sqrt(pow((a[i][1]-a[j][1]),2)+pow((a[i][0]-a[j][0]),2))<=1){
i=i;
break;}
else if(j==n-1){
cout << a[i][0] << " " << a[i][1] << endl;
i+=1;}
}}
return 0;
}
Your code lacks structure. That is why it is hard to understand, as you have now learned, even for you.
I think a good start would be to write a class for a point and two functions, one for random coordinates and one for the distance between points. Then everything, especially the double loop, becomes much easier to read and debug.
Look at this:
#include <iostream>
#include <vector>
#include <cmath>
#include <cstdlib> // rand, srand, RAND_MAX
using namespace std;
struct Point
{
Point() = default;
float x;
float y;
};
float scaled_random(int k)
{
return k*((float(rand()))/RAND_MAX);
}
float distance(const Point& a, const Point& b)
{
return sqrt(pow(a.y-b.y,2)+pow(a.x-b.x,2));
}
int main()
{
int n = 5;
int k = 100;
vector<Point> a(n);
srand(132);
for (int i=0; i<n; ) {
a[i].x = scaled_random(k);
a[i].y = scaled_random(k);
for (int j=0; j<n; j+=1) {
if (distance(a[i], a[j]) <= 1) {
i = i;
break;
} else if (j == n-1) {
cout << a[i].x << " " << a[i].y << endl;
i += 1;
}
}
}
return 0;
}
The issue is still there, but the code now has more structure and better formatting, and the superfluous includes have been removed.
Maybe you can spot the problem yourself more easily this way.
The first time through your code, i and j are both zero, so a[i][1] - a[j][1] and a[i][0] - a[j][0] are both zero and the distance check succeeds. The inner loop breaks without incrementing i, the outer loop starts again with the same i, and the result is an infinite loop.
Checking i != j fixes the problem:
if (i != j && sqrt(pow((a[i][1] - a[j][1]), 2) + pow((a[i][0] - a[j][0]), 2)) <= 1) {
Your code might be better structured as:
#include <iostream>
#include <vector>
#include <cstdlib>
#include <cmath>
#include <algorithm>
int main()
{
int n = 5;
int k = 100;
std::vector<std::vector<double>> a(n, std::vector<double>(2));
srand(132);
for (int i = 0; i < n; i++) {
auto end = a.begin() + i;
do
{
a[i][0] = k * ((float(rand())) / RAND_MAX);
a[i][1] = k * ((float(rand())) / RAND_MAX);
}
while (end != std::find_if(a.begin(), end, [&](const std::vector<double>& element)
{
return sqrt(pow((a[i][1] - element[1]), 2) + pow((a[i][0] - element[0]), 2)) <= 1;
}));
std::cout << a[i][0] << " " << a[i][1] << "\n";
}
return 0;
}
With this code, only the points before i are checked each time rather than all of the points.
rand should be avoided in modern C++, see Why is the use of rand() considered bad?
As your inner vectors always have 2 elements, it would be better to use std::pair or std::array.
pow may be quite an inefficient way to square a number. The sqrt can be avoided by comparing the squared distance against the squared radius instead.
Applying the above points, your code could become:
#include <iostream>
#include <vector>
#include <cstdlib>
#include <cmath>
#include <algorithm>
#include <array>
#include <random>
using point = std::array<double, 2>;
double distanceSquared(const point& a, const point& b)
{
auto d0 = a[0] - b[0];
auto d1 = a[1] - b[1];
return d0 * d0 + d1 * d1;
}
int main()
{
int n = 5;
int k = 100;
std::vector<point> a(n);
std::random_device rd;
std::mt19937_64 engine(rd());
std::uniform_real_distribution<double> dist(0, k);
for (int i = 0; i < n; i++) {
auto end = a.begin() + i;
do
{
a[i][0] = dist(engine);
a[i][1] = dist(engine);
}
while (end != std::find_if(a.begin(), end, [&](const point& element)
{
return distanceSquared(a[i], element) <= 1;
}));
std::cout << a[i][0] << " " << a[i][1] << "\n";
}
return 0;
}
I have a small program for manipulating a sparse matrix in C++. It works perfectly fine except that it takes too much time. Since I'm doing this manipulation over and over, it is critical to speed it up. I'd appreciate any ideas. Thanks.
#include <stdio.h> /* printf, scanf, puts, NULL */
#include <stdlib.h> /* srand, rand */
#include <time.h> /* time */
#include <iostream> /* cout, fixed, scientific */
#include <string>
#include <cmath>
#include <vector>
#include <list>
#include <string>
#include <sstream> /* SJW 08/09/2010 */
#include <fstream>
#include <Eigen/Dense>
#include <Eigen/Sparse>
using namespace Eigen;
using namespace std;
SparseMatrix<double> MatMaker (int n1, int n2, double prob)
{
MatrixXd A = (MatrixXd::Random(n1, n2) + MatrixXd::Ones(n1, n2))/2;
A = (A.array() > prob).select(0, A);
return A.sparseView();
}
////////////////This needs to be optimized/////////////////////
int SD_func(SparseMatrix<double> &W, VectorXd &STvec, SparseMatrix<double> &Wo, int tauR, int tauD)
{
W = W + 1/tauR*(Wo - W);
for (int k = 0; k < W.outerSize(); ++k)
for (SparseMatrix<double>::InnerIterator it(W, k); it; ++it)
W.coeffRef(it.row(),it.col()) = it.value() * (1-STvec(it.col())/tauD);
return 1;
}
int main ()
{
SparseMatrix<double> Wo = MatMaker(5000, 5000, 0.1);
SparseMatrix<double> W = MatMaker(5000, 5000, 0.1);
VectorXd STvec = VectorXd::Random(5000);
clock_t tsd1,tsd2;
float Timesd = 0.0;
tsd1 = clock();
///////////////////////////////// Any way to speed up this function???????
SD_func(W, STvec, Wo, 8000, 50);
//////////////////////////////// ??????????
tsd2 = clock();
Timesd += (tsd2 - tsd1);
cout<<"SD time: " << Timesd / CLOCKS_PER_SEC << " s" << endl;
return 0;
}
The most critical performance improvement (IMO) you can make is to not use W.coeffRef(it.row(),it.col()). It performs a binary search in W for the element each time. As you are already using SparseMatrix<double>::InnerIterator it(W, k); it is very simple to change your function to skip the binary search:
int SD_func_2(SparseMatrix<double> &W, VectorXd &STvec, SparseMatrix<double> &Wo, int tauR, int tauD)
{
W = W + 1.0/tauR*(Wo - W); // 1.0/tauR: with int tauR, 1/tauR is integer division and yields 0
double tauDInv = 1./tauD;
for (int k = 0; k < W.outerSize(); ++k)
for (SparseMatrix<double>::InnerIterator it(W, k); it; ++it)
it.valueRef() *= (1-STvec(it.col())*tauDInv);
return 1;
}
This results in a roughly 3x speedup. Note that I've incorporated @dshin's comment that multiplying is faster than dividing; however, about 90% of the improvement comes from removing the binary search and only about 10% from replacing the division with a multiplication.
I have been beating my head against the wall on this DFT. It should print out: 8,0,0,0,0,0,0,0 but instead I get 8 and then very very tiny numbers. Are these rounding errors? Is there anything I can do? My Radix2 FFT gives correct results, it seems silly a DFT could not also work.
I started with complex numbers so I know there is a good bit missing, I tried to strip it down to illustrate the problem.
#include <cstdlib>
#include <math.h>
#include <iostream>
#include <complex>
#include <cassert>
#define SIZE 8
#define M_PI 3.14159265358979323846
void fft(const double src[], double dst[], const unsigned int n)
{
for(int i=0; i < SIZE; i++)
{
const double ph = -(2*M_PI) / n;
const int gid = i;
double res = 0.0f;
for (int k = 0; k < n; k++) {
double t = src[k];
const double val = ph * k * gid;
double cs = cos(val);
double sn = sin(val);
res += ((t * cs) - (t * sn));
int a = 1;
}
dst[i] = res;
std::cout << dst[i] << std::endl;
}
}
int main(void)
{
double array1[SIZE];
double array2[SIZE];
for(int i=0; i < SIZE; i++){
array1[i] = 1;
array2[i] = 0;
}
fft(array1, array2, SIZE);
return 666;
}
An FFT can actually produce more accurate results than a straight DFT calculation, as the fewer arithmetic ops usually allow fewer opportunities for arithmetic quantization errors to accumulate. There's a paper by one of the FFTW authors on this topic.
Since the DFT/FFT deal with a transcendental basis function, the results will never (except perhaps in a few special cases, or by lucky accident) be exactly correct using any non-symbolic and finite computer number format. So values very close (within a few LSB) to zero should simply be ignored as noise, or considered to be the same as zero.
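As an illustration of treating near-zero bins as zero, here is a minimal sketch of a direct complex DFT with a tolerance-based comparison. The function names and the tolerance are my own choices, not part of the question's code, and this is a plain O(n^2) DFT rather than an FFT.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> Complex;

// Direct DFT of a real signal: X[k] = sum_j src[j] * exp(-2*pi*i*j*k/n).
std::vector<Complex> dft(const std::vector<double>& src) {
    const double pi = std::acos(-1.0);
    const std::size_t n = src.size();
    std::vector<Complex> dst(n);
    for (std::size_t k = 0; k < n; ++k) {
        Complex acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = -2.0 * pi * static_cast<double>(j * k)
                                 / static_cast<double>(n);
            acc += src[j] * std::polar(1.0, angle);
        }
        dst[k] = acc;
    }
    return dst;
}

// Compares two complex values up to an absolute tolerance.
bool nearly_equal(Complex a, Complex b, double tol) {
    return std::abs(a - b) <= tol;
}

// Replaces every bin whose magnitude is below `tol` with exact zero.
std::vector<Complex> snap_to_zero(std::vector<Complex> bins, double tol) {
    assert(tol >= 0.0);
    for (std::size_t k = 0; k < bins.size(); ++k) {
        if (std::abs(bins[k]) < tol) bins[k] = Complex(0.0, 0.0);
    }
    return bins;
}
```

For eight ones as input, bin 0 is 8 up to rounding and the remaining bins are tiny values that snap_to_zero with a tolerance such as 1e-12 turns into exact zeros. A suitable tolerance depends on the signal size and magnitude.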
I wrote a multithreaded simulated annealing program, but it's not running. I am not sure whether the code is correct. It compiles, but when I run it, it crashes with a runtime error.
#include <stdio.h>
#include <time.h>
#include <iostream>
#include <stdlib.h>
#include <math.h>
#include <string>
#include <vector>
#include <algorithm>
#include <fstream>
#include <ctime>
#include <windows.h>
#include <process.h>
using namespace std;
typedef vector<double> Layer; //defines a vector type
typedef struct {
Layer Solution1;
double temp1;
double coolingrate1;
int MCL1;
int prob1;
}t;
//void SA(Layer Solution, double temp, double coolingrate, int MCL, int prob){
double Rand_NormalDistri(double mean, double stddev) {
//Random Number from Normal Distribution
static double n2 = 0.0;
static int n2_cached = 0;
if (!n2_cached) {
// choose a point x,y in the unit circle uniformly at random
double x, y, r;
do {
// scale two random integers to doubles between -1 and 1
x = 2.0*rand()/RAND_MAX - 1;
y = 2.0*rand()/RAND_MAX - 1;
r = x*x + y*y;
} while (r == 0.0 || r > 1.0);
{
// Apply Box-Muller transform on x, y
double d = sqrt(-2.0*log(r)/r);
double n1 = x*d;
n2 = y*d;
// scale and translate to get desired mean and standard deviation
double result = n1*stddev + mean;
n2_cached = 1;
return result;
}
} else {
n2_cached = 0;
return n2*stddev + mean;
}
}
double FitnessFunc(Layer x, int ProbNum)
{
int i,j,k;
double z;
double fit = 0;
double sumSCH;
if(ProbNum==1){
// Ellipsoidal function
for(j=0;j< x.size();j++)
fit+=((j+1)*(x[j]*x[j]));
}
else if(ProbNum==2){
// Schwefel's function
for(j=0; j< x.size(); j++)
{
sumSCH=0;
for(i=0; i<j; i++)
sumSCH += x[i];
fit += sumSCH * sumSCH;
}
}
else if(ProbNum==3){
// Rosenbrock's function
for(j=0; j< x.size()-1; j++)
fit += 100.0*(x[j]*x[j] - x[j+1])*(x[j]*x[j] - x[j+1]) + (x[j]-1.0)*(x[j]-1.0);
}
return fit;
}
double probl(double energychange, double temp){
double a;
a= (-energychange)/temp;
return double(min(1.0,exp(a)));
}
int random (int min, int max){
int n = max - min + 1;
int remainder = RAND_MAX % n;
int x;
do{
x = rand();
}while (x >= RAND_MAX - remainder);
return min + x % n;
}
//void SA(Layer Solution, double temp, double coolingrate, int MCL, int prob){
void SA(void *param){
t *args = (t*) param;
Layer Solution = args->Solution1;
double temp = args->temp1;
double coolingrate = args->coolingrate1;
int MCL = args->MCL1;
int prob = args->prob1;
double Energy;
double EnergyNew;
double EnergyChange;
Layer SolutionNew(50);
Energy = FitnessFunc(Solution, prob);
while (temp > 0.01){
for ( int i = 0; i < MCL; i++){
for (int j = 0 ; j < SolutionNew.size(); j++){
SolutionNew[j] = Rand_NormalDistri(5, 1);
}
EnergyNew = FitnessFunc(SolutionNew, prob);
EnergyChange = EnergyNew - Energy;
if(EnergyChange <= 0){
Solution = SolutionNew;
Energy = EnergyNew;
}
if(probl(EnergyChange ,temp ) > random(0,1)){
//cout<<SolutionNew[i]<<endl;
Solution = SolutionNew;
Energy = EnergyNew;
cout << temp << "=" << Energy << endl;
}
}
temp = temp * coolingrate;
}
}
int main ()
{
srand ( time(NULL) ); //seed for getting different numbers each time the prog is run
Layer SearchSpace(50); //declare a vector of 50 dimensions
//for(int a = 0;a < 10; a++){
for (int i = 0 ; i < SearchSpace.size(); i++){
SearchSpace[i] = Rand_NormalDistri(5, 1);
}
t *arg1;
arg1 = (t *)malloc(sizeof(t));
arg1->Solution1 = SearchSpace;
arg1->temp1 = 1000;
arg1->coolingrate1 = 0.01;
arg1->MCL1 = 100;
arg1->prob1 = 3;
//cout << "Test " << ""<<endl;
_beginthread( SA, 0, (void*) arg1);
Sleep( 100 );
//SA(SearchSpace, 1000, 0.01, 100, 3);
//}
return 0;
}
Please help.
Thanks
Avinesh
As leftaroundabout pointed out, you're using malloc in C++ code. This is the source of your crash.
malloc allocates a block of memory, but since it was designed for C, it doesn't call any C++ constructors. In this case, the vector<double> is never properly constructed. When
arg1->Solution1 = SearchSpace;
is called, the member variable "Solution1" is in an undefined state and the assignment operator crashes.
Instead of malloc try
arg1 = new t;
This will accomplish roughly the same thing but the "new" keyword also calls any necessary constructors to ensure the vector<double> is properly initialized.
This also brings up another minor issue: the memory you've allocated with new also needs to be deleted somewhere. In this case, since arg1 is passed to another thread, it should probably be cleaned up like
delete args;
by your "SA" function after it's done with the args variable.
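One way to sketch this ownership hand-off, assuming a simplified stand-in for the question's struct (AnnealArgs, make_args and consume_args are hypothetical names): the creator builds the arguments with new, so the vector is constructed, and the thread entry function wraps the raw pointer in a std::unique_ptr so that it is deleted when the function returns.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the question's argument struct `t`.
struct AnnealArgs {
    std::vector<double> solution;
    double temp;
    double cooling_rate;
    int mcl;
    int prob;
};

// Allocates the arguments with new, so the vector member is constructed
// before it is assigned to. The numeric values are those from the question.
std::unique_ptr<AnnealArgs> make_args(const std::vector<double>& start) {
    assert(!start.empty());
    std::unique_ptr<AnnealArgs> args(new AnnealArgs());
    args->solution = start;
    args->temp = 1000.0;
    args->cooling_rate = 0.01;
    args->mcl = 100;
    args->prob = 3;
    return args;
}

// Entry point in the style of a void* thread function: it takes ownership
// of the raw pointer and deletes it automatically on return.
// Returning the size is a placeholder for the real work.
std::size_t consume_args(void* param) {
    std::unique_ptr<AnnealArgs> args(static_cast<AnnealArgs*>(param));
    return args->solution.size();
}
```

The caller would pass make_args(SearchSpace).release() as the void* argument, so exactly one side owns the memory at any time.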
While I don't know the actual cause for your crashes I'm not really surprised that you end up in trouble. For instance, those "cached" static variables in Rand_NormalDistri are obviously vulnerable to data races. Why don't you use std::normal_distribution? It's almost always a good idea to use standard library routines when they're available, and even more so when you need to consider multithreading trickiness.
Even worse, you're heavily mixing C and C++. malloc is something you should virtually never use in C++ code – it doesn't know about RAII, which is one of the few intrinsically safe things you can cling onto in C++.
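The std::normal_distribution suggestion above could look roughly like the following sketch. The function names and the seed handling are my own assumptions: each caller (for example, each thread) creates its own engine, so there is no shared static state to race on.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draws `count` samples from N(mean, stddev^2) using an engine that is
// local to this call, so concurrent callers do not share any state.
std::vector<double> normal_samples(unsigned seed, std::size_t count,
                                   double mean, double stddev) {
    assert(stddev > 0.0);
    std::mt19937_64 engine(seed);
    std::normal_distribution<double> dist(mean, stddev);
    std::vector<double> samples(count);
    for (std::size_t i = 0; i < count; ++i) {
        samples[i] = dist(engine);
    }
    return samples;
}

// Arithmetic mean of the samples, used here only as a sanity check.
double sample_mean(const std::vector<double>& samples) {
    assert(!samples.empty());
    double sum = 0.0;
    for (double s : samples) sum += s;
    return sum / static_cast<double>(samples.size());
}
```

In a threaded program each thread would call this with a different seed, for instance one derived from its thread index. The exact sample values depend on the standard library implementation, so only statistical properties such as the mean should be relied on.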