I am using Eigen for some simulations. I get a segmentation fault (more precisely, "Segmentation fault (core dumped)" with no other details) whenever I include even the smallest overloaded Eigen operation (even x=y, where x and y are Eigen::VectorXd of the same size). What makes this very strange is that it only happens if the matrix operations are in certain functions.
Let me show you:
//main.cu
#include <Eigen/Dense>
#include "def.h"
using namespace std;
int main(int argc, char *argv[])
{
params p;
int ns;
//some code here
MatrixXR A(ns,ns);
VectorXR u(ns);
VectorXR v(ns);
VectorXR unew(ns);
VectorXR aux(ns);
VectorXR vnew(ns);
VectorXR vcouple(ns);
VectorXR q(ns);
Real* output;
output=new Real[output_size];
//output_size is a number depending on the system I am simulating, usually about 1000000.
CPUsim(output,p,u,v,A,unew,vnew,q,aux,vcouple);
delete [] &(output[0]);
return 0;
}
//def.h
#ifndef DEF_H_
#include <Eigen/Dense>
#define DEF_H_
#ifdef DOUBLE
typedef double Real;
typedef Eigen::MatrixXd MatrixXR;
typedef Eigen::VectorXd VectorXR;
#else
typedef float Real;
typedef Eigen::MatrixXf MatrixXR;
typedef Eigen::VectorXf VectorXR;
#endif
struct params
{
//some parameters
};
#endif
//sim.h
#ifndef SIM_H_
#define SIM_H_
#include "def.h"
#include <Eigen/Dense>
void CPUsim(Real* output,params &p, VectorXR& u,VectorXR& v,MatrixXR& A,VectorXR& unew,VectorXR& vnew,VectorXR& q,VectorXR& aux,VectorXR& vcouple);
//other functions
#endif
//sim.cu
#include "sim.h"
#include "coupling.h"
//some functions
void CPUsim(Real* output,params &p, VectorXR& u,VectorXR& v,MatrixXR& A,VectorXR& unew,VectorXR& vnew,VectorXR& q,VectorXR& aux,VectorXR& vcouple)
{
//some code
coupling(u,unew,v,vnew,p,A,vcouple,aux,no);
}
//coupling.h
#ifndef COUPLING_H_
#define COUPLING_H_
#include <Eigen/Dense>
#include "def.h"
//some declarations
void coupling(VectorXR& u,VectorXR& unew,VectorXR& v,VectorXR& vnew,params& p,MatrixXR& A,VectorXR& vcouple,VectorXR& aux,noise& no);
//coupling.cpp
void coupling(VectorXR& u,VectorXR& unew,VectorXR& v,VectorXR& vnew,params& p,MatrixXR& A,VectorXR& vcouple,VectorXR& aux,noise& no)
{
vcouple=A*v;
//some other stuff
}
Now, some explanations:
If I have vcouple=vcouple in coupling, I get no error. If I have vcouple=v, I do get the error. I get no errors if I have vcouple=A*v in main or in CPUsim. Somebody recommended defining 'EIGEN_DONT_ALIGN', but that works only in some cases (i.e. for the same ns but different values for the elements of the matrices and vectors, the error may or may not show up). Do you happen to know what might be causing this error?
BTW, I use the nvcc compiler because I am using CUDA for some parts of the simulation. However, Eigen is used only for portions of the code that run entirely on the CPU. The host compiler is GCC 5.4.1 and the system is Ubuntu 16.04.
Edit:
The error disappears if I don't store the result (i.e. just A*v; instead of vcouple=A*v;)
I finally found the answer. Apparently nvcc and gcc align Eigen arrays differently. This explains why there was no problem when calculating (and assigning) the matrix product in the .cu files while getting an error in the .cpp ones. Simply changing the extension of the coupling.cpp file to .cu solved the problem.
More details can be found here:
https://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2016/06/msg00003.html
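To check whether the two compilers really build with different settings, a small diagnostic like the one below could be compiled into both a .cu and a .cpp file and the results compared. This is only a sketch: build_flags_summary is a made-up helper, and the idea that Eigen derives its alignment from macros such as __AVX__ or __CUDACC__ is an assumption here, not something confirmed in the post above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: summarizes compile-time settings that may differ
// between a translation unit built through nvcc and one built by g++ alone.
// Declared static so each source file gets its own private copy.
static std::string build_flags_summary() {
    std::string s;
#if defined(__CUDACC__)
    s += "nvcc=1";   // this file is being compiled by nvcc
#else
    s += "nvcc=0";
#endif
#if defined(__AVX__)
    s += " avx=1";   // AVX code generation is enabled
#else
    s += " avx=0";
#endif
#if defined(__SSE2__)
    s += " sse2=1";  // SSE2 code generation is enabled
#else
    s += " sse2=0";
#endif
    // Strictest fundamental alignment the host compiler guarantees.
    s += " max_align=" + std::to_string(alignof(std::max_align_t));
    return s;
}
```

If the string printed from the .cu file differs from the one printed from the .cpp file, the two translation units are not being built with the same settings, which would be consistent with the explanation above.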
I first created a CSR matrix using the MKL sparse BLAS module.
This part works, and the matrix is created without problems.
Then I called mkl_sparse_s_add to add matrices, and the program reported an error.
The error message is:
Exception thrown at 0x00007FFDA75F478C (KernelBase.dll) (in mkl.exe): 0xC06D007E: Module not found (parameter: 0x000000CEB30FF5B0).
Here is my code:
#include <stdio.h>
#include <assert.h>
#include <math.h>
#include "mkl_spblas.h"
#include <mkl.h>
int main() {
MKL_INT rowPtr[6] = { 0,3,5,8,11,13 };
MKL_INT columns[13] = { 0,1,3,0,1,2,3,4,0,2,3,1,4 };
float values[13] = { 1,-1,-3,-2,5,4,6,4,-4,2,7,8,-5 };
sparse_matrix_t elementMatrix2; sparse_matrix_t elementMatrix3;
mkl_sparse_s_create_csr(&elementMatrix2,SPARSE_INDEX_BASE_ZERO,5,5,rowPtr,rowPtr+1,columns,values);
mkl_sparse_s_add(SPARSE_OPERATION_NON_TRANSPOSE, elementMatrix2, 1, elementMatrix2, &elementMatrix3);
}
Please help me get the program to run normally.
Anyone else with a similar issue can refer to the Intel Communities for the solution, as this query has been addressed here.
In this case, the issue was resolved by reinstalling MKL.
I was preprocessing data in C++ using the Armadillo library. The program's end product is a ucube, which is a cube filled with unsigned integers. After it runs, I want to load the ucube into R to perform some final statistical tests. To do so, I wrote a C++ function that loads the ucube and returns an array.
But it does not work!
I got the following warning: "warning: Cube::load(): incorrect header in B.bin" and the program returns a 0x0x0 array.
Trying to find why, I made a toy C++ program, which works fine. It is able to load the cubes without any problem.
#include <iostream>
#include <armadillo>
using namespace arma;
void read_cubes(char const* A, char const* B){
cube C;
ucube D;
C.load(A, arma_binary);
D.load(B, arma_binary);
}
int main(int argc, char** argv){
cube A = randu<cube>(5,5,5);
ucube B = randi<ucube>(5,5,5, distr_param(1, 10));
A.save(argv[1], arma_binary);
B.save(argv[2], arma_binary);
read_cubes(argv[1], argv[2]);
}
But for some reason, doing the same steps in R does not work. To illustrate, please run the toy program as ./a.out A.bin B.bin. It will produce the Cube<double> file A.bin and the Cube<uword> file B.bin, which I refer to below.
The problem
If I source the following C++ code with Rcpp::sourceCpp and I try to read the Cube<double> A.bin with read_cube("A.bin") it works, but if I do the same for the Cube<uword> B.bin with read_ucube("B.bin") it does not (I get the warning).
#include <RcppArmadillo.h>
#include <iostream>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::cube read_cube(char const* x){
arma::cube A;
A.load(x, arma::arma_binary);
return A;
}
// [[Rcpp::export]]
arma::ucube read_ucube(char const* x){
arma::ucube B;
B.load(x, arma::arma_binary);
return B;
}
Of course I could cast the Cube<uword> to a Cube<double> before ending the C++ program, but I would like to know why this happens and whether it is possible to load a Cube<uword> in RcppArmadillo. It should be possible, right?
Unfortunately R still only supports 32 bit integers, so RcppArmadillo forces Armadillo to use 32 bit integers. This is done by defining ARMA_32BIT_WORD before including the armadillo header. See RcppArmadillo's configuration here.
You can apply the same "trick" with your Armadillo programs like so:
#define ARMA_32BIT_WORD
#include <armadillo>
One of the effects is that ucube (Cube<uword>) will use 32 bit unsigned integers.
After doing the above trick, recompile your Armadillo programs and save the ucubes again. They can then be loaded in RcppArmadillo.
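To see the mismatch for yourself, it may help to look at the first line of each saved file. As far as I know, files saved with arma_binary start with a short text header that encodes the element type and size, which is what the "incorrect header" warning is complaining about. The helper below is a minimal sketch using only the standard library; first_line is a made-up name, and what the header text looks like is left to whatever Armadillo actually wrote.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns the first text line of a file,
// or an empty string if the file cannot be opened.
std::string first_line(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    std::string line;
    if (in) {
        std::getline(in, line);  // reads up to (not including) the first '\n'
    }
    return line;
}
```

Calling first_line("B.bin") on a file written without ARMA_32BIT_WORD and on one written with it should show two different header strings if the element size is what differs.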
I have a very complicated program, and I have simplified it to make my problem easy to understand. I have two source files and one header: time_analysis.cu, DSMC_kernel_float.cu and DSMC_kernel_float.h.
Here is time_analysis.cu:
#include <cstdlib>
#include <cstdio>
#include <algorithm>
#include <math.h>
#include <cutil.h>
#include <stdio.h>
#include <assert.h>
#include <memory.h>
#include <string.h>
#include <time.h>
#include <cuda_gl_interop.h>
#include <cutil_math.h>
#include "math_constants.h"
#include "vector_types.h"
#include "vector_functions.h"
typedef struct {
int seme;
} iniran;
typedef struct{
int jp1;
int jp2;
float kx;
float ky;
float kz;
} stato_struct;
stato_struct* coll_CPU=0;
stato_struct* coll2dev=0;
stato_struct* coll_GPU=0;
#include "DSMC_kernel_float.h"
//==============================================================
int main(void){
int N_thread = 4;
int ind;
coll_CPU[0].jp1= 0;
coll_CPU[1].jp2= 1;
coll_CPU[2].kx= 2;
coll_CPU[3].ky= 3;
coll_CPU[4].kz= 4;
for(ind=0;ind<=5;ind++){
coll2dev[ind]=coll_CPU[ind];
}
coll2dev=(stato_struct*) malloc(N_thread*sizeof(stato_struct));
CUDA_SAFE_CALL(cudaMalloc((void**)&coll_GPU, N_thread*sizeof(stato_struct)));
CUDA_SAFE_CALL(cudaMemcpy(coll_GPU,coll2dev,N_thread*sizeof(stato_struct), cudaMemcpyHostToDevice));
CollisioniGPU<<<4,N_thread>>>(coll_GPU);
CUT_CHECK_ERROR("Esecuzione kernel fallita");
CUDA_SAFE_CALL(cudaMemcpy(coll2dev, coll_GPU, N_thread*sizeof(stato_struct),cudaMemcpyDeviceToHost));
free(coll2dev);
CUDA_SAFE_CALL(cudaFree(coll_GPU));
free(coll_CPU);
return 0;
}
Here is the DSMC_kernel_float.cu
// Kernel della DSMC
#include "DSMC_kernel_float.h"
__global__ void CollisioniGPU(stato_struct *coll_GPU){
coll_GPU[0].vAx=1;
coll_GPU[1].vAy=1;
coll_GPU[2].vAz=1;
coll_GPU[3].tetaAp=1;
coll_GPU[4].phiAp=1;
}
Here is the DSMC_kernel_float.h
__global__ void CollisioniGPU(stato_struct* coll_GPU);
However, when I type nvcc -I common/inc -rdc=true time_analysis.cu DSMC_kernel_float.cu in the terminal, I get a strange error message and I don't understand why:
DSMC_kernel_float.h(1): error: attribute "global" does not apply here
DSMC_kernel_float.h(1): error: incomplete type is not allowed
DSMC_kernel_float.h(1): error: identifier "stato_struct" is undefined
DSMC_kernel_float.h(1): error: identifier "coll_GPU" is undefined
DSMC_kernel_float.cu(4): error: variable "CollisioniGPU" has already been defined
DSMC_kernel_float.cu(4): error: attribute "global" does not apply here
DSMC_kernel_float.cu(4): error: incomplete type is not allowed
DSMC_kernel_float.cu(4): error: expected a ";"
At end of source: warning: parsing restarts here after previous syntax error
8 errors detected in the compilation of "/tmp/tmpxft_00003f1f_00000000-22_DSMC_kernel_float.cpp1.ii".
From what I have read on the internet, I believe the error is caused by the struct, but I don't understand how to fix it so that the program works properly. How is it possible that __global__ does not apply here when I have other examples where it seems to be just fine?
Note: common/inc is the folder provided by Nvidia in order to make Cuda compile correctly.
Regarding this statement:
Note: common/inc is the folder provided by Nvidia in order to make Cuda compile correctly.
That's a mischaracterization. The referenced files (cutil.h and cutil_math.h) and macros (e.g. CUT_CHECK_ERROR) were provided in fairly old CUDA releases (prior to CUDA 5.0) as part of the cuda sample codes that were delivered at that time. They are not required "in order to make Cuda compile correctly." Furthermore, their use should be considered deprecated (refer to the CUDA 5.0 toolkit release notes). And if you are actually using an old toolkit like that, I would suggest upgrading to a newer one.
Regarding the compile issues, as @talonmies has pointed out, the compiler has no way of knowing what the definition of stato_struct is when compiling any module that does not contain the definition (whether directly or via an include). This is the case for your DSMC_kernel_float.cu module, which is where all your compile errors are coming from.
At first glance, it would seem that a sensible fix would be to move the typedef containing the stato_struct definition from your time_analysis.cu file into your header file (DSMC_kernel_float.h) and move the #include statement for that to the top of the time_analysis.cu file, along with your other includes.
However, it appears that your DSMC_kernel_float.cu file believes that stato_struct has a variety of members:
__global__ void CollisioniGPU(stato_struct *coll_GPU){
coll_GPU[0].vAx=1;
coll_GPU[1].vAy=1;
coll_GPU[2].vAz=1;
coll_GPU[3].tetaAp=1;
coll_GPU[4].phiAp=1;
}
which are not part of your current definition of stato_struct:
typedef struct{
int jp1;
int jp2;
float kx;
float ky;
float kz;
} stato_struct;
So this is confusing code, and I don't think anyone else can sort that out for you. You will either need two separate struct definitions, with separate names, or else you will need to modify your stato_struct definition to include those members (.vAx, .vAy, .vAz, .tetaAp, .phiAp).
The (mis)handling of this struct definition and the resultant errors have nothing to do with CUDA. This is arising out of the C/C++ language expectations.
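As a rough illustration of the "move the definition into the header" suggestion, here is a minimal sketch in plain C++. The file name stato_struct.h and the function collisioni_host are made up for the example; collisioni_host is an ordinary host function standing in for the kernel, and it only touches members that the struct really has.

```cpp
// ---- stato_struct.h (hypothetical shared header) ----
#ifndef STATO_STRUCT_H_
#define STATO_STRUCT_H_

typedef struct {
    int   jp1;
    int   jp2;
    float kx;
    float ky;
    float kz;
} stato_struct;

// The declaration comes after the type, so stato_struct is already known.
void collisioni_host(stato_struct* coll, int n);

#endif  // STATO_STRUCT_H_

// ---- source file: would begin with #include "stato_struct.h" ----
void collisioni_host(stato_struct* coll, int n) {
    for (int i = 0; i < n; ++i) {
        coll[i].kx = 1.0f;  // kx exists in the struct; vAx and friends do not
    }
}
```

Every source file that mentions stato_struct would then include the same header, so the type is defined before any declaration that uses it.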
I am aware there are numerous similar queries on here; however, I haven't been able to resolve this, nor has a colleague, so:
I am using MinGW (4.8.x) with Eclipse CDT Kepler.
1) I have my own code, and to clean it up I changed it to use a vector of structs. All is fine, except that the function that receives it complains about 'Invalid Argument'.
2) I reduced my code down to a minimum working example, if I place it all in a single file it works, however if I move out my definitions to the header (which I need to do in my main code) it suddenly cannot resolve the fields in the struct...
The code below is for a three file configuration, header/function/main.
(In my main code I use namespace std - but that doesn't seem to be the problem. Also, there are extraneous headers for a minimum working example in this, however they are needed in my main code.)
myheaders.h
/*************************/
/****** myheaders.h ******/
/*************************/
/**-- Header Files --**/
// File Streams and IO
#include <stdio.h>
#include <sstream>
#include <iostream>
#include <fstream>
// For strtod -> string to double
#include <stdlib.h>
// Math Operations
//#include <math.h>
#include <cmath>
// To get the CPU time
#include <time.h>
// For Vectors
#include <vector>
// For strings, C strings right now...
#include <cstring>
// Needed globally for the function definitions
// using namespace std;
#ifndef MY_HEADERS
#define MY_HEADERS
struct SpeciesLoss {
int ReactionID;
int SpeciesID;
double coefficient;
};
std::vector< double > SpeciesLossRate(std::vector<SpeciesLoss> , int, const std::vector< double > & );
#endif
function.cpp
/*************************/
/****** function.cpp *****/
/*************************/
#include "myheaders.h"
std::vector< double > SpeciesLossRate(
std::vector< SpeciesLoss > SpeciesLossList,
int Number_Species,
const std::vector< double >& Combined_Rates
)
{
std::vector< double > temp_species_loss;
temp_species_loss.resize(1);
temp_species_loss[0]=SpeciesLossList[0].ReactionID;
return temp_species_loss;
}
main.cpp
/*************************/
/******** main.cpp *******/
/*************************/
#include "myheaders.h"
std::vector< SpeciesLoss > SpeciesLossAll; // New vector for recording species loss, uses a vector of structs
int main(int argc, char* argv[])
{
std::vector< double > Rates;
Rates.push_back(1);
SpeciesLossAll.push_back(SpeciesLoss());
SpeciesLossAll[0].ReactionID = 0;
SpeciesLossAll[0].SpeciesID = 0;
SpeciesLossAll[0].coefficient = 0;
std::vector< double > SpeciesConcentrationChange = SpeciesLossRate(SpeciesLossAll,1, Rates);
return 0;
}
Edit:
Screenshot
Edit 2:
An interesting update: it compiles fine on Linux with GCC. Better than nothing, but I still want to know what is going wrong, plus I'd like my code to be cross-platform...
Edit 3:
This is getting more and more bizarre. I just tested my code (the full project that compiles on Linux) on my home PC, which runs Windows 7, and it builds fine there. My laptop runs Windows 8, and that is where the problem occurs.
The Settings for the C++ build are absolutely identical.
Both run MinGW 4.8.1...
Both run the latest Eclipse Kepler...
And yes, I am aware that I need to test some suggestions still.
#ifndef MY_HEADERS
#define MY_HEADERS
should be at the beginning of your file, so that the guard wraps everything, including the #include lines. That is the conventional layout, and it is worth ruling out as a cause, especially since you include your personal header in multiple files. Also, keep in mind that you are not providing a default constructor but relying on the one the compiler generates for you: the members are zeroed when the object is value-initialized, as in SpeciesLoss(), but a plain local declaration such as SpeciesLoss s; leaves them uninitialized.
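On the initialization point, a minimal sketch of what the language guarantees for this struct may help. The function name make_zeroed is made up for illustration; the struct is copied from the question.

```cpp
struct SpeciesLoss {
    int ReactionID;
    int SpeciesID;
    double coefficient;
};

// Value-initialization: because SpeciesLoss has no user-provided
// constructor, SpeciesLoss() zero-initializes every member.
SpeciesLoss make_zeroed() {
    return SpeciesLoss();
}
```

So SpeciesLossAll.push_back(SpeciesLoss()) in the question stores an element whose members are all zero. A local declared as SpeciesLoss s; with no initializer does not get that guarantee.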
EDIT#1
Are you compiling everything, not just main? I just copied your code into VS and it works!
EDIT#2
Try defining the function directly in the header (marked static) instead of in a separate implementation file.
static std::vector< double > SpeciesLossRate(
std::vector< SpeciesLoss > SpeciesLossList,
int Number_Species,
const std::vector< double >& Combined_Rates
)
{
std::vector< double > temp_species_loss;
temp_species_loss.resize(1);
temp_species_loss[0]=SpeciesLossList[0].ReactionID;
return temp_species_loss;
}
EDIT#3
OK, from the screenshot this is definitely valid code. For the sake of trying everything, implement your own constructor and copy constructor for the struct. I know this might sound silly, but maybe Eclipse doesn't think so.
OK - I have found the answer - I think - and it boils down to Eclipse.
-> Project -> C/C++ Index -> Rebuild
This resolves the issue.
In fact, this problem is known on earlier Eclipse CDT versions: https://bugs.eclipse.org/bugs/show_bug.cgi?id=348170
I am trying to pass data between the numpy and boost::ublas layers. I have written an ultra-thin wrapper because SWIG cannot parse the ublas headers correctly. The code is shown below:
#include <boost/numeric/ublas/vector.hpp>
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/lexical_cast.hpp>
#include <algorithm>
#include <sstream>
#include <string>
using std::copy;
using namespace boost;
typedef boost::numeric::ublas::matrix<double> dm;
typedef boost::numeric::ublas::vector<double> dv;
class dvector : public dv{
public:
dvector(const int rhs):dv(rhs){;};
dvector();
dvector(const int size, double* ptr):dv(size){
copy(ptr, ptr+sizeof(double)*size, &(dv::data()[0]));
}
~dvector(){}
};
with the SWIG interface that looks something like
%apply(int DIM1, double* INPLACE_ARRAY1) {(const int size, double* ptr)}
class dvector{
public:
dvector(const int rhs);
dvector();
dvector(const int size, double* ptr);
%newobject toString;
char* toString();
~dvector();
};
I have compiled them successfully with gcc 4.3 and VC++ 9.0. However, when I simply run
a = dvector(array([1.,2.,3.]))
it gives me a segfault. This is the first time I have used SWIG with numpy, and I do not fully understand the data conversion and memory buffer passing between them. Does anyone see something obvious I have missed? I tried to trace through with a debugger, but it crashed within the assembly of python.exe. I have no clue whether this is a SWIG problem or a problem with my simple wrapper. Any help is appreciated.
You may be interested in looking at the pyublas module. It does the conversion between numpy arrays and ublas data types seamlessly and without copying.
You may want to replace
copy(ptr, ptr+sizeof(double)*size, &(dv::data()[0]));
by
copy(ptr, ptr+size, &(dv::data()[0]));
Remember that in C/C++ adding or subtracting from a pointer moves it by a multiple of the size of the datatype it points to.
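A minimal standalone sketch of the same idea, using a std::vector as the destination instead of the ublas storage (copy_doubles is a made-up helper, and size is assumed to be non-negative):

```cpp
#include <algorithm>
#include <vector>

// Copies `size` doubles starting at `ptr` into a new vector.
std::vector<double> copy_doubles(const double* ptr, int size) {
    std::vector<double> out(size);
    // ptr + size already points size * sizeof(double) bytes past ptr,
    // so multiplying by sizeof(double) again would read far out of bounds.
    std::copy(ptr, ptr + size, out.begin());
    return out;
}
```

With the original ptr+sizeof(double)*size, the copy would typically run eight times too far, reading past the numpy buffer and writing past the destination storage, which would explain the segfault.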