I have a quadratic programming problem that I am solving with qpOASES. It involves a matrix X that I need to precondition, so I am using Armadillo's arma::pinv routine to calculate the Moore-Penrose pseudoinverse.
The problem: I write the matrix X to a file and then read it in a separate program (say test.cpp) that does not depend on qpOASES in any way. There, pinv runs fine.
#include <iostream>
#include <fstream>
#include <armadillo>
#include <string>
using namespace std;
using namespace arma;
int main(){
// Read design matrix.
int NRows = 199;
int NFields = 26;
string flname_in = "chol_out_2_data";
ifstream myin(flname_in.c_str()); // open the file holding the matrix
mat A (NRows,NFields);
for (int i=0; i < NRows; ++i)
for (int j=0; j < NFields; ++j)
myin >> A(i,j) ;
// Calculate pseudoinverse
mat M;
pinv(M,A); // <========= THIS fails when I use flag: -lqpOASES
}
When I include the same routine in the file where I perform the QP optimization (say true_QP.cpp), I get a runtime error because pinv fails to calculate the pseudoinverse. I've done extensive tests: the file is read in correctly and the values are the same.
I tracked the problem down to a conflict as follows: I compiled the program that does not depend on qpOASES in any way (test.cpp, as described above) with the additional flag -lqpOASES, and then the code gives a runtime error.
That is, compile:
g++ test.cpp -o test.xxx -larmadillo
runs fine:
./test.xxx
compile:
g++ test.cpp -o test.xxx -larmadillo -lqpOASES
throws exception (due to failure of calculating pinv):
./test.xxx
Therefore I suspect some conflict: it seems that linking with -lqpOASES also affects something in Armadillo. Any ideas? Is there some dependency on LAPACK/BLAS, or some internal flag, that may change the setup of Armadillo? Thank you for your time.
Here is the documentation for the arma::pinv function:
http://arma.sourceforge.net/docs.html#pinv
I have resolved the issue by calculating pinv from Eigen, instead of Armadillo.
The function definition I used for Eigen, based on this bug report:
http://eigen.tuxfamily.org/bz/show_bug.cgi?id=257
is:
template<typename _Matrix_Type_>
Eigen::MatrixXd pinv(const _Matrix_Type_ &a, double epsilon =std::numeric_limits<double>::epsilon())
{
Eigen::JacobiSVD< _Matrix_Type_ > svd(a ,Eigen::ComputeThinU | Eigen::ComputeThinV);
double tolerance = epsilon * std::max(a.cols(), a.rows()) *svd.singularValues().array().abs()(0);
return
svd.matrixV() * (svd.singularValues().array().abs() > tolerance).select(svd.singularValues().array().inverse(), 0).matrix().asDiagonal() * svd.matrixU().adjoint();
}
I am new to C++ and the Eigen library. I want to perform an LU decomposition (partial pivoting) on a 1815 x 1815 matrix with complex entries. However, the performance of my code is bad: the LU decomposition takes 77.2852 seconds, compared to only 0.140946 seconds in MATLAB. Please find the code below. Any advice on how I can improve it? Note that in the first part of the code I import the matrix from a file with entries of the form a + bi, where a and b are real numbers. The matrix file was generated from MATLAB. Thank you.
#include <iostream>
#include <Eigen/Dense>
#include <fstream>
#include <complex>
#include <string>
#include <sstream>
#include <chrono>
using namespace std;
using namespace std::chrono;
using namespace Eigen;
int main(){
int mat_sz = 1815; // size of matrix
MatrixXcd c_mat(mat_sz,mat_sz); // initialize eigen matrix
double re, im;
char sign;
string entry;
ifstream myFile("A_mat"); // format of entries: a+bi, where 'a' and 'b' are real numbers
//Import and assign matrix to an Eigen matrix
for (int i = 0; i < mat_sz; i++){
for (int j = 0; j < mat_sz; j++){
myFile >> entry;
stringstream stream(entry);
stream >> re >> sign >> im;
c_mat(i,j) = {re, (sign == '-') ? -im : im}; // Assigning matrix entries
}
}
// LU Decomposition
auto start = high_resolution_clock::now();
c_mat.partialPivLu(); // Compute the LU decomposition with partial pivoting
auto stop = high_resolution_clock::now();
auto duration = duration_cast<microseconds>(stop - start);
double million = 1000000;
cout << "Time taken by function: " << duration.count()/million << " seconds" << endl;
}
I'll summarize the comments into an answer.
When you feel that Eigen is running slowly, there is a list of things that should be verified.
Are optimizations turned on?
Eigen is a template-heavy library that does a lot of compile-time checks that should be optimized out. If optimizations are not on, none of it gets inlined and many pointless function calls are made. Turning on even the lowest level of optimizations usually alleviates most of this (-O1 or higher in gcc/clang, /O1 or higher in MSVC). General notes on optimizations can be found here.
Am I utilizing all the hardware options?
A lot of code in Eigen can be vectorized if allowed. Make sure that this is enabled with flags turning on SSE/AVX/etc. if the target hardware supports it. Enable FMA if available as well. There's a placeholder doc here.
Enable multithreading
If your process/hardware allow, consider enabling OpenMP to allow Eigen to utilize multiple cores for some of the operations.
Use the right precision
In many applications, only the first few digits matter. If this is the case in your application, consider using single precision instead of double precision.
Link to a fine tuned library
In the end, Eigen spits out some finely built C++ code and relies on the compiler to handle most of the optimizations itself. In some cases, a more finely tuned library such as MKL may improve performance. Eigen can link to MKL to squeeze a bit more speed out of the hardware.
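Before comparing timings, it helps to measure in a repeatable way with the checks above applied. The following is only a sketch: time_seconds is a made-up helper (not part of Eigen), and the compile line in the comment is one plausible GCC/Clang combination of the flags discussed above, not a recommendation for every machine.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not part of Eigen): runs `work` once and returns the
// elapsed wall-clock time in seconds, measured with a monotonic clock.
//
// One possible build line for the benchmark (assumed file name bench.cpp):
//   g++ -O2 -march=native -fopenmp -DNDEBUG bench.cpp -o bench
// -O2 turns on optimizations, -march=native enables the SIMD/FMA
// instructions of the build machine, -fopenmp enables OpenMP, and
// -DNDEBUG disables assertions.
template <typename Callable>
double time_seconds(Callable&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Callable>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

With the code from the question, the call could look like time_seconds([&] { auto lu = c_mat.partialPivLu(); }), run once for each set of compiler flags being compared.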
I'm trying to write an R wrapper for the FINUFFT routines for calculating the FFT of an unevenly sampled series. I have virtually no experience with C/C++, so I'm working from an example that compares the traditional Fourier transform to the NUFFT. The example code follows.
// this is all you must include for the finufft lib...
#include "finufft.h"
#include <complex>
// also needed for this example...
#include <stdio.h>
#include <stdlib.h>
using namespace std;
int main(int argc, char* argv[])
/* Simple example of calling the FINUFFT library from C++, using plain
arrays of C++ complex numbers, with a math test. Barnett 3/10/17
Double-precision version (see example1d1f for single-precision)
Compile with:
g++ -fopenmp example1d1.cpp -I ../src ../lib-static/libfinufft.a -o example1d1 -lfftw3 -lfftw3_omp -lm
or if you have built a single-core version:
g++ example1d1.cpp -I ../src ../lib-static/libfinufft.a -o example1d1 -lfftw3 -lm
Usage: ./example1d1
*/
{
int M = 1e6; // number of nonuniform points
int N = 1e6; // number of modes
double acc = 1e-9; // desired accuracy
nufft_opts opts; finufft_default_opts(&opts);
complex<double> I = complex<double>(0.0,1.0); // the imaginary unit
// generate some random nonuniform points (x) and complex strengths (c):
double *x = (double *)malloc(sizeof(double)*M);
complex<double>* c = (complex<double>*)malloc(sizeof(complex<double>)*M);
for (int j=0; j<M; ++j) {
x[j] = M_PI*(2*((double)rand()/RAND_MAX)-1); // uniform random in [-pi,pi)
c[j] = 2*((double)rand()/RAND_MAX)-1 + I*(2*((double)rand()/RAND_MAX)-1);
}
// allocate output array for the Fourier modes:
complex<double>* F = (complex<double>*)malloc(sizeof(complex<double>)*N);
// call the NUFFT (with iflag=+1): note N and M are typecast to BIGINT
int ier = finufft1d1(M,x,c,+1,acc,N,F,opts);
int n = 142519; // check the answer just for this mode...
complex<double> Ftest = complex<double>(0,0);
for (int j=0; j<M; ++j)
Ftest += c[j] * exp(I*(double)n*x[j]);
int nout = n+N/2; // index in output array for freq mode n
double Fmax = 0.0; // compute inf norm of F
for (int m=0; m<N; ++m) {
double aF = abs(F[m]);
if (aF>Fmax) Fmax=aF;
}
double err = abs(F[nout] - Ftest)/Fmax;
printf("1D type-1 NUFFT done. ier=%d, err in F[%d] rel to max(F) is %.3g\n",ier,n,err);
free(x); free(c); free(F);
return ier;
}
Much of this I don't need, such as generating the test series and comparing to the traditional FFT. Further, I want to return the values of the transform, not just an error code indicating success. Below is my code.
#include "finufft.h"
#include <complex>
#include <Rcpp.h>
#include <stdlib.h>
using namespace Rcpp;
using namespace std;
// [[Rcpp::export]]
ComplexVector finufft(int M, NumericVector x, ComplexVector c, int N) {
// From example code for finufft, sets precision and default options
double acc = 1e-9;
nufft_opts opts; finufft_default_opts(&opts);
// allocate output array for the finufft routine:
complex<double>* F = (complex<double>*)malloc(sizeof(complex<double>)*N);
// Change vector inputs from R types to C++ types
double* xd = as< double* >(x);
complex<double>* cd = as< complex<double>* >(c);
// call the NUFFT (with iflag=-1): note N and M are typecast to BIGINT
int ier = finufft1d1(M,xd,cd,-1,acc,N,F,opts);
ComplexVector Fd = as<ComplexVector>(*F);
return Fd;
}
When I try to source this in RStudio, I get the error "no matching function for call to 'as(std::complex<double>*&)'", pointing to the line declaring Fd towards the end. I believe the error indicates that either the function 'as' isn't defined (which I know is false), or the argument to 'as' isn't the correct type. The examples here include one using 'as' to convert to a NumericVector, so unless there's some complication with complex values I don't see why it should be a problem here.
I know there are potential problems using two namespaces, but I don't believe that's the issue here. My best guess is that there's an issue with how I'm trying to use pointers, but I lack the experience to identify it and I can't find any similar examples online to guide me.
Rcpp::as<T> converts from an R data type (SEXP) to a C++ data type, e.g. Rcpp::ComplexVector. This does not fit your situation, where you try to convert from a C-style array to C++. Fortunately Rcpp::Vector, which is the basis for Rcpp::ComplexVector, has a constructor for this task: Vector (InputIterator first, InputIterator last). For the other direction (going from C++ to C-style array) you can use vector.begin() or &vector[0].
However, one needs a reinterpret_cast to convert between Rcomplex* and std::complex<double>*. That should cause no problems, though, since Rcomplex (a.k.a. complex double in C) and std::complex<double> are compatible.
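To illustrate why such a cast is harmless, here is a minimal sketch in plain C++ (no Rcpp). It relies on the layout guarantee the C++ standard gives std::complex<double>: an array of complex values can be read as interleaved (real, imaginary) doubles. The helper names as_interleaved and sum_real_parts are made up for this illustration, and the sketch only assumes that Rcomplex stores its two doubles in the same order.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: view a buffer of std::complex<double> as interleaved
// doubles (re0, im0, re1, im1, ...). The C++ standard guarantees this layout
// for std::complex<T>.
inline const double* as_interleaved(const std::vector<std::complex<double>>& v) {
    return reinterpret_cast<const double*>(v.data());
}

// Sum of all real parts, read through the interleaved view.
inline double sum_real_parts(const std::vector<std::complex<double>>& v) {
    const double* p = as_interleaved(v);
    double sum = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        sum += p[2 * i];  // element 2*i is the real part of v[i]
    }
    return sum;
}
```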
A minimal example:
#include <Rcpp.h>
#include <complex>
using namespace Rcpp;
// [[Rcpp::export]]
ComplexVector foo(ComplexVector v) {
std::complex<double>* F = reinterpret_cast<std::complex<double>*>(v.begin());
int N = v.length();
// do something with F
ComplexVector Fd(reinterpret_cast<Rcomplex*>(F),
reinterpret_cast<Rcomplex*>(F + N));
return Fd;
}
/*** R
set.seed(42)
foo(runif(4)*(1+1i))
*/
Result:
> Rcpp::sourceCpp('56675308/code.cpp')
> set.seed(42)
> foo(runif(4)*(1+1i))
[1] 0.9148060+0.9148060i 0.9370754+0.9370754i 0.2861395+0.2861395i 0.8304476+0.8304476i
BTW, you can move these reinterpret_casts out of sight by using std::vector<std::complex<double>> as argument and return types for your function. Rcpp does the rest for you. This also helps getting rid of the naked malloc:
#include <Rcpp.h>
// dummy function with reduced signature
int finufft1d1(int M, double *xd, std::complex<double> *cd, int N, std::complex<double> *Fd) {
return 0;
}
// [[Rcpp::export]]
std::vector<std::complex<double>> finufft(int M,
std::vector<double> x,
std::vector<std::complex<double>> c,
int N) {
// allocate output array for the finufft routine:
std::vector<std::complex<double>> F(N);
// Change vector inputs from R types to C++ types
double* xd = x.data();
std::complex<double>* cd = c.data();
std::complex<double>* Fd = F.data();
int ier = finufft1d1(M, xd, cd, N, Fd);
return F;
}
I am trying to implement a DFT using the fftw library (link to FFTW documentation).
All the libraries have been correctly linked, and the project builds just fine. However, the code doesn't run the moment any function from the fftw library is called.
#include <iostream>
#include <fftw3.h>
using namespace std;
int main() {
int vectorSize = 100;
cout << vectorSize << endl;
fftw_complex vec[vectorSize], vecOut[vectorSize];
for(int i = 0; i < vectorSize; i++) {
vec[i][0] = i;
vec[i][1] = 1;
}
// Call to function to create an FFT plan
fftw_plan plan = fftw_plan_dft_1d(vectorSize, vec, vecOut, FFTW_FORWARD, FFTW_ESTIMATE);
cout << "test" << endl;
return 0;
}
If I comment out the line where the fftw_plan is instantiated, the code outputs 100 and "test" as expected. There are no issues in the build, as far as I can tell. I haven't really been able to find any post which describes a similar problem.
I am running this in Eclipse, using MinGW and the 32-bit version of the pre-compiled binary available for Windows (download link).
Any help would be really appreciated :)
FFTW requires its input and output arrays to be 16-byte aligned. When you declare the arrays on the stack, this can't be guaranteed, so you need to allocate them with fftw_malloc or a similar function. Also, your code only creates the plan but never executes it (with fftw_execute), so no FFT is actually carried out on the input data.
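The alignment of a buffer can be checked directly. The following is a minimal sketch in standard C++ only; it is not FFTW code, and the names is_aligned and AlignedBuffer are made up for illustration. With FFTW itself, fftw_malloc (paired with fftw_free) remains the route suggested above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if `p` sits on a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Stand-in for an array of fftw_complex (pairs of doubles). alignas(16)
// asks the compiler to place the buffer on a 16-byte boundary, which a
// plain stack array of doubles is not guaranteed to get.
struct AlignedBuffer {
    alignas(16) double data[8][2];
};
```

A check such as is_aligned(vec, 16) on the arrays from the question would show whether the stack arrays happened to land on a 16-byte boundary in a given build.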
My code computes exp over a large data array in a loop, and when I profiled it with perf I got the following message:
Symbols conflicting in multiple files
/build/buildd/eglibc-2.19/math/../sysdeps/ieee754/dbl-64/e_exp.c
/build/buildd/eglibc-2.19/math/../sysdeps/ieee754/dbl-64/w_exp.c
I am using C++, including "math.h", and compiling with the flag -std=c++11.
I am not sure why this conflict happens!
A simple snippet that reproduces the issue is below; compile it (release configuration) and run perf:
#include <iostream>
#include <math.h>
using namespace std;
int main()
{
double d = 0;
double e = 0;
for (int i=0;i<8192;i++)
for (int j=0;j<=255;j++)
{
e = ((j/255.0)-0.5)*((j/255.0)-0.5); // 255.0 avoids integer division
e = exp(e);
d = d + e;
}
cout<<d;
return 0;
}
I am using Eclipse to run profiling with perf, and the messages appear in the Perf profile view.
Please help. It appears that the conflicts change with the magnitude of the value that goes into exp. Thank you.
Adding more information after a chat exchange with Alfred:
If we use the command line instead of Eclipse, perf report shows two calls to exp:
1. ieee754_exp_sse2
2. __GI__exp
These two are shown as conflicts in the Eclipse GUI, with an error marked on these lines.
I have a simple program, which generates (using Boost) some initial velocities and position, and calculates the time it takes to propagate a certain distance. Based on the transverse distances (x, y), the final axial (z) velocity is added to a vector. Here is the simple program:
#include <iostream>
#include <cmath>
#include <ctime>
#include <vector>
#include <boost/random.hpp>
#include <boost/random/normal_distribution.hpp>
using namespace std;
int main()
{
boost::mt19937 engine(static_cast<unsigned int>(time(0)));
boost::normal_distribution<double> nd(0.0, 1.0);
boost::variate_generator< boost::mt19937, boost::normal_distribution<double> > normal_std_one(engine, nd);
double coordX, coordY, coordZ, time;
double velX, velY, velZ;
const double factor = 0.01;
const double distance = 15.0;
vector<double> cont;
int i;
for(i=0; i<1000000000; i++)
{
coordX = factor*normal_std_one();
coordY = factor*normal_std_one();
coordZ = 0.0;
velX = normal_std_one();
velY = normal_std_one();
velZ = 20.0*normal_std_one()+300;
time = distance/velZ;
coordX += velX*time;
coordY += velY*time;
if(sqrt(coordX*coordX + coordY*coordY) < 0.02)
{
cont.push_back(velZ);
}
}
cout << cont.size() << endl;
return 0;
}
I thought a nice addition would be to parallelize the for-loop using OpenMP. This I do by adding the following line just before the loop is initiated:
#pragma omp parallel for
In addition, I have added -fopenmp to the compiler options and -fopenmp to the linker settings. My program compiles and links without errors, but when I execute the file I get the message:
Process terminated with status -1073741819 (0 minutes, 2 seconds)
It is not clear to me what I have done wrong here. I am using Windows and g++ (through Code::Blocks IDE).
I am posting this as an answer rather than a comment to collect the results in one place and avoid a long list of comments. It works with parallel_for from Microsoft's PPL if you handle the std::vector's size properly to avoid an out-of-range exception. The problem is that when i exceeds ~20000, boost::variate_generator cannot handle the concurrent requests and the program crashes with an APPLICATION_FAULT_INVALID_POINTER_READ error.
Update: Without boost::variate_generator (simply assigning a value to each vector index), it runs without errors on a dual-core notebook, but the result is the opposite of what was expected: the sequential code runs faster than the multithreaded parallel_for version.
You can't use cont.push_back unsynchronised across multiple threads. It's not thread safe. You will need to use a different container, or use some kind of mutex lock on access. You may also need to do something to preserve the order they go into the container if that matters.
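One way to follow this advice is to give every thread its own container and merge the results under a lock. The sketch below is an assumption-laden rewrite of the loop from the question, not the asker's code: it swaps Boost for the standard <random> header, the function name collect_axial_velocities and the seeding scheme (base seed plus thread number) are made up for illustration, and the order of the merged values is not preserved.

```cpp
#include <cmath>
#include <random>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical rewrite of the loop: each thread owns its generator and a
// private vector, and the shared vector is only touched inside a critical
// section. Without -fopenmp the pragmas are ignored and it runs serially.
std::vector<double> collect_axial_velocities(long n_samples, unsigned base_seed) {
    const double factor = 0.01;
    const double distance = 15.0;
    std::vector<double> cont;

    #pragma omp parallel
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        // One engine per thread, so no generator state is shared.
        std::mt19937 engine(base_seed + static_cast<unsigned>(tid));
        std::normal_distribution<double> nd(0.0, 1.0);
        std::vector<double> local;

        #pragma omp for nowait
        for (long i = 0; i < n_samples; ++i) {
            double coordX = factor * nd(engine);
            double coordY = factor * nd(engine);
            const double velX = nd(engine);
            const double velY = nd(engine);
            const double velZ = 20.0 * nd(engine) + 300.0;
            const double t = distance / velZ;
            coordX += velX * t;
            coordY += velY * t;
            if (std::sqrt(coordX * coordX + coordY * coordY) < 0.02) {
                local.push_back(velZ);  // private to this thread
            }
        }

        // Merge the per-thread results one thread at a time.
        #pragma omp critical
        cont.insert(cont.end(), local.begin(), local.end());
    }
    return cont;
}
```

Keeping the push_back calls on a thread-private vector means the lock is taken once per thread instead of once per hit, which matters for a loop with this many iterations.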