Why isn't my MKL sparse matrix module working properly?

I first created a CSR matrix using the MKL sparse matrix module.
This part works and the matrix is created correctly.
Then I called mkl_sparse_s_add to add two matrices, and the program threw an error.
The error report is:
Exception thrown at 0x00007FFDA75F478C (KernelBase.dll) (in mkl.exe): 0xC06D007E: Module not found (parameter: 0x000000CEB30FF5B0).
Here's my code
#include <stdio.h>
#include <assert.h>
#include <math.h>
#include "mkl_spblas.h"
#include <mkl.h>

int main() {
    MKL_INT rowPtr[6]   = { 0, 3, 5, 8, 11, 13 };
    MKL_INT columns[13] = { 0, 1, 3, 0, 1, 2, 3, 4, 0, 2, 3, 1, 4 };
    float   values[13]  = { 1, -1, -3, -2, 5, 4, 6, 4, -4, 2, 7, 8, -5 };

    sparse_matrix_t elementMatrix2;
    sparse_matrix_t elementMatrix3;

    // Build a 5x5 CSR matrix handle from the arrays above.
    mkl_sparse_s_create_csr(&elementMatrix2, SPARSE_INDEX_BASE_ZERO, 5, 5,
                            rowPtr, rowPtr + 1, columns, values);

    // Add the matrix to itself: elementMatrix3 = 1 * elementMatrix2 + elementMatrix2
    mkl_sparse_s_add(SPARSE_OPERATION_NON_TRANSPOSE, elementMatrix2, 1,
                     elementMatrix2, &elementMatrix3);
}
Please help me get the program to run normally.

Anyone running into a similar issue can refer to the Intel Communities, where this question has also been addressed.
In this case, the problem was resolved by reinstalling MKL.
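A general diagnostic aid, independent of the reinstall: every mkl_sparse_* routine returns a sparse_status_t, so checking those return codes usually narrows the failure down. Below is a minimal sketch of the same program with the status checks and handle cleanup added, assuming a working MKL installation:

#include <stdio.h>
#include "mkl_spblas.h"

int main() {
    MKL_INT rowPtr[6]   = { 0, 3, 5, 8, 11, 13 };
    MKL_INT columns[13] = { 0, 1, 3, 0, 1, 2, 3, 4, 0, 2, 3, 1, 4 };
    float   values[13]  = { 1, -1, -3, -2, 5, 4, 6, 4, -4, 2, 7, 8, -5 };

    sparse_matrix_t A = NULL, C = NULL;

    // Create the 5x5 CSR handle and check the status instead of discarding it.
    sparse_status_t status = mkl_sparse_s_create_csr(&A, SPARSE_INDEX_BASE_ZERO, 5, 5,
                                                     rowPtr, rowPtr + 1, columns, values);
    if (status != SPARSE_STATUS_SUCCESS) {
        printf("mkl_sparse_s_create_csr failed with status %d\n", (int)status);
        return 1;
    }

    // C = 1 * A + A; again, check the status.
    status = mkl_sparse_s_add(SPARSE_OPERATION_NON_TRANSPOSE, A, 1.0f, A, &C);
    if (status != SPARSE_STATUS_SUCCESS) {
        printf("mkl_sparse_s_add failed with status %d\n", (int)status);
        return 1;
    }

    // Release the handles when done.
    mkl_sparse_destroy(C);
    mkl_sparse_destroy(A);
    return 0;
}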


Unable to load armadillo Cube<uword> when using RcppArmadillo

I was preprocessing data in C++ using the Armadillo library. The program's end product is a ucube, i.e. a cube filled with unsigned integers. After the run, I want to load the ucube into R to perform some final statistical tests. To do so, I made a C++ function that loads the ucube and returns it as an array.
But it does not work!
I get the following warning: "warning: Cube::load(): incorrect header in B.bin" and the function returns a 0x0x0 array.
Trying to find out why, I made a toy C++ program, which works fine: it loads the cubes without any problem.
#include <iostream>
#include <armadillo>

using namespace arma;

void read_cubes(char const* A, char const* B) {
    cube  C;
    ucube D;
    C.load(A, arma_binary);
    D.load(B, arma_binary);
}

int main(int argc, char** argv) {
    cube  A = randu<cube>(5, 5, 5);
    ucube B = randi<ucube>(5, 5, 5, distr_param(1, 10));
    A.save(argv[1], arma_binary);
    B.save(argv[2], arma_binary);
    read_cubes(argv[1], argv[2]);
}
But I do not know why doing the same steps in R does not work. To illustrate, please run the toy program as ./a.out A.bin B.bin. It will produce the Cube<double> A.bin and the Cube<uword> B.bin, which I refer to below.
The problem
If I source the following C++ code with Rcpp::sourceCpp and I try to read the Cube<double> A.bin with read_cube("A.bin") it works, but if I do the same for the Cube<uword> B.bin with read_ucube("B.bin") it does not (I get the warning).
#include <RcppArmadillo.h>
#include <iostream>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::cube read_cube(char const* x) {
    arma::cube A;
    A.load(x, arma::arma_binary);
    return A;
}

// [[Rcpp::export]]
arma::ucube read_ucube(char const* x) {
    arma::ucube B;
    B.load(x, arma::arma_binary);
    return B;
}
Of course I could cast the Cube<uword> to a Cube<double> before ending the C++ program, but I would like to know why this happens and whether it is possible to load a Cube<uword> in RcppArmadillo. It should be possible, right?
Unfortunately R still only supports 32 bit integers, so RcppArmadillo forces Armadillo to use 32 bit integers. This is done by defining ARMA_32BIT_WORD before including the armadillo header. See RcppArmadillo's configuration here.
You can apply the same "trick" with your Armadillo programs like so:
#define ARMA_32BIT_WORD
#include <armadillo>
One of the effects is that ucube (Cube<uword>) will use 32 bit unsigned integers.
After doing the above trick, recompile your Armadillo programs and save the ucubes again. They can then be loaded in RcppArmadillo.
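For illustration, here is a minimal sketch of the toy generator rebuilt with that macro; it is identical to the program above except for the define, so the saved B.bin now uses 32-bit words that RcppArmadillo can read:

#define ARMA_32BIT_WORD   // must come before the armadillo header
#include <armadillo>

using namespace arma;

int main(int argc, char** argv) {
    // With ARMA_32BIT_WORD defined, uword is a 32-bit unsigned integer,
    // matching what RcppArmadillo uses.
    cube  A = randu<cube>(5, 5, 5);
    ucube B = randi<ucube>(5, 5, 5, distr_param(1, 10));
    A.save(argv[1], arma_binary);
    B.save(argv[2], arma_binary);
    return 0;
}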

Undefined Reference, DisparityBilateralFilter in OpenCV using CUDA

I'm a beginner with OpenCV and CUDA, so sorry if this question is trivial.
I've installed CUDA 9.0 with OpenCV 3.3.1, and I'm using Qt 5.
I'm trying to filter a disparity map with cuda::DisparityBilateralFilter. Unfortunately, it's not working.
Code Example:
#include <iostream>
#include "opencv2/opencv.hpp"
#include "opencv2/ximgproc.hpp"
#include "opencv2/cudastereo.hpp"

using namespace cv;

int main()
{
    int nDisp  = 64;
    int radius = 3;
    int iters  = 1;
    Ptr<cuda::DisparityBilateralFilter> pCudaBilFilter =
        cuda::createDisparityBilateralFilter(nDisp, radius, iters);
    // pCudaBilFilter->apply(DispMapInp, LeftImages, filteredDispMap);
    return 0;
}
I'm getting an error:
error: undefined reference to `cv::cuda::createDisparityBilateralFilter(int, int, int)'
Other OpenCV code (like stereo matching) works fine. What am I missing? I'm sure it's a stupid mistake.
Thanks in advance.
Solved it myself: I had forgotten to add the library -lopencv_cudastereo to my .pro file.
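For anyone hitting the same linker error, the fix in the .pro file looks roughly like the sketch below (the /usr/local paths are assumptions; point them at your own CUDA-enabled OpenCV build):

INCLUDEPATH += /usr/local/include
LIBS += -L/usr/local/lib -lopencv_core -lopencv_imgproc -lopencv_cudastereo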

Segmentation Fault with Eigen Library (3.3.4)

I am using Eigen for some simulations. I get a segmentation fault (more precisely, Segmentation fault (core dumped) with no other details) whenever I include even the smallest overloaded Eigen operation (even just x = y where x and y are Eigen::VectorXd of the same size). What makes this very strange is that it only happens when the matrix operations are in certain functions.
Let me show you:
//main.cu
#include <Eigen/Dense>
#include "def.h"

using namespace std;

int main(int argc, char *argv[])
{
    params p;
    int ns;
    //some code here
    MatrixXR A(ns, ns);
    VectorXR u(ns);
    VectorXR v(ns);
    VectorXR unew(ns);
    VectorXR aux(ns);
    VectorXR vnew(ns);
    VectorXR vcouple(ns);
    VectorXR q(ns);

    Real* output;
    output = new Real[output_size];
    //output_size is a number depending on the system I am simulating, usually about 1000000.

    CPUsim(output, p, u, v, A, unew, vnew, q, aux, vcouple);

    delete [] &(output[0]);
    return 0;
}
//def.h
#ifndef DEF_H_
#include <Eigen/Dense>
#define DEF_H_

#ifdef DOUBLE
typedef double Real;
typedef Eigen::MatrixXd MatrixXR;
typedef Eigen::VectorXd VectorXR;
#else
typedef float Real;
typedef Eigen::MatrixXf MatrixXR;
typedef Eigen::VectorXf VectorXR;
#endif

struct params
{
    //some parameters
};

#endif
//sim.h
#ifndef SIM_H_
#define SIM_H_
#include "def.h"
#include <Eigen/Dense>

void CPUsim(Real* output, params& p, VectorXR& u, VectorXR& v, MatrixXR& A,
            VectorXR& unew, VectorXR& vnew, VectorXR& q, VectorXR& aux,
            VectorXR& vcouple);
//other functions
#endif
//sim.cu
#include "sim.h"
#include "coupling.h"
//some functions

void CPUsim(Real* output, params& p, VectorXR& u, VectorXR& v, MatrixXR& A,
            VectorXR& unew, VectorXR& vnew, VectorXR& q, VectorXR& aux,
            VectorXR& vcouple)
{
    //some code
    coupling(u, unew, v, vnew, p, A, vcouple, aux, no);
}
//coupling.h
#ifndef COUPLING_H_
#define COUPLING_H_
#include <Eigen/Dense>
#include "def.h"
//some declarations

void coupling(VectorXR& u, VectorXR& unew, VectorXR& v, VectorXR& vnew,
              params& p, MatrixXR& A, VectorXR& vcouple, VectorXR& aux,
              noise& no);
#endif

//coupling.cpp
#include "coupling.h"

void coupling(VectorXR& u, VectorXR& unew, VectorXR& v, VectorXR& vnew,
              params& p, MatrixXR& A, VectorXR& vcouple, VectorXR& aux,
              noise& no)
{
    vcouple = A * v;
    //some other stuff
}
Now, some explanations:
If I have vcouple = vcouple in coupling, I get no error; if I have vcouple = v, I do get the error. I get no errors if I have vcouple = A*v in main or in CPUsim. Somebody recommended defining EIGEN_DONT_ALIGN, but that helps only in some cases (i.e. for the same ns but different values of the matrix and vector elements, it may or may not show the error). Do you happen to know what might be causing this?
BTW, I use the nvcc compiler because I am using CUDA for some parts of the simulation. However, Eigen is used only in portions of the code that run entirely on the CPU. For the host compiler I use GCC 5.4.1, and I am on Ubuntu 16.04.
Edit:
The error disappears if I don't store the result (i.e. just A*v; instead of vcouple=A*v;)
I finally found the answer. Apparently nvcc and gcc align Eigen arrays differently. This explains why there was no problem when calculating (and assigning) the matrix product in the .cu files while getting an error in the .cpp ones. Simply changing the extension of the coupling.cpp file to .cu solved the problem.
More details can be found here:
https://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2016/06/msg00003.html
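In practice that means every translation unit that creates or assigns Eigen objects gets compiled by nvcc with the same alignment rules. After the rename, the build looks roughly like this (a sketch, assuming these are the only source files):

nvcc -O2 main.cu sim.cu coupling.cu -o sim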

Global function not recognized by CUDA C

I have a very complicated program, and I have simplified it in order to make my problem easy to understand. I have two source files and one header: time_analysis.cu, DSMC_kernel_float.cu and DSMC_kernel_float.h.
Here is time_analysis.cu:
#include <cstdlib>
#include <cstdio>
#include <algorithm>
#include <math.h>
#include <cutil.h>
#include <stdio.h>
#include <assert.h>
#include <memory.h>
#include <string.h>
#include <time.h>
#include <cuda_gl_interop.h>
#include <cutil_math.h>
#include "math_constants.h"
#include "vector_types.h"
#include "vector_functions.h"

typedef struct {
    int seme;
} iniran;

typedef struct {
    int jp1;
    int jp2;
    float kx;
    float ky;
    float kz;
} stato_struct;

stato_struct* coll_CPU = 0;
stato_struct* coll2dev = 0;
stato_struct* coll_GPU = 0;

#include "DSMC_kernel_float.h"
//==============================================================
int main(void) {
    int N_thread = 4;
    int ind;

    coll_CPU[0].jp1 = 0;
    coll_CPU[1].jp2 = 1;
    coll_CPU[2].kx = 2;
    coll_CPU[3].ky = 3;
    coll_CPU[4].kz = 4;

    for (ind = 0; ind <= 5; ind++) {
        coll2dev[ind] = coll_CPU[ind];
    }

    coll2dev = (stato_struct*) malloc(N_thread * sizeof(stato_struct));

    CUDA_SAFE_CALL(cudaMalloc((void**)&coll_GPU, N_thread * sizeof(stato_struct)));
    CUDA_SAFE_CALL(cudaMemcpy(coll_GPU, coll2dev, N_thread * sizeof(stato_struct), cudaMemcpyHostToDevice));

    CollisioniGPU<<<4, N_thread>>>(coll_GPU);
    CUT_CHECK_ERROR("Esecuzione kernel fallita");

    CUDA_SAFE_CALL(cudaMemcpy(coll2dev, coll_GPU, N_thread * sizeof(stato_struct), cudaMemcpyDeviceToHost));

    free(coll2dev);
    CUDA_SAFE_CALL(cudaFree(coll_GPU));
    free(coll_CPU);
    return 0;
}
Here is DSMC_kernel_float.cu:
// DSMC kernel
#include "DSMC_kernel_float.h"

__global__ void CollisioniGPU(stato_struct *coll_GPU) {
    coll_GPU[0].vAx = 1;
    coll_GPU[1].vAy = 1;
    coll_GPU[2].vAz = 1;
    coll_GPU[3].tetaAp = 1;
    coll_GPU[4].phiAp = 1;
}
Here is DSMC_kernel_float.h:
__global__ void CollisioniGPU(stato_struct* coll_GPU);
However, when I type nvcc -I common/inc -rdc=true time_analysis.cu DSMC_kernel_float.cu in the terminal, I get a weird error message and I don't understand why:
DSMC_kernel_float.h(1): error: attribute "global" does not apply here
DSMC_kernel_float.h(1): error: incomplete type is not allowed
DSMC_kernel_float.h(1): error: identifier "stato_struct" is undefined
DSMC_kernel_float.h(1): error: identifier "coll_GPU" is undefined
DSMC_kernel_float.cu(4): error: variable "CollisioniGPU" has already been defined
DSMC_kernel_float.cu(4): error: attribute "global" does not apply here
DSMC_kernel_float.cu(4): error: incomplete type is not allowed
DSMC_kernel_float.cu(4): error: expected a ";"
At end of source: warning: parsing restarts here after previous syntax error
8 errors detected in the compilation of "/tmp/tmpxft_00003f1f_00000000-22_DSMC_kernel_float.cpp1.ii".
From what I have read on the internet, I believe the error is caused by the struct, but I don't understand how to fix it to make the program work properly. How is it possible that "global" does not apply here when I have other examples where it seems to be just fine?
Note: common/inc is the folder provided by Nvidia in order to make CUDA compile correctly.
Regarding this statement:
Note: common/inc is the folder provided by Nvidia in order to make CUDA compile correctly.
That's a mischaracterization. The referenced files (cutil.h and cutil_math.h) and macros (e.g. CUT_CHECK_ERROR) were provided in fairly old CUDA releases (prior to CUDA 5.0) as part of the CUDA sample codes that were delivered at that time. They are not required "in order to make CUDA compile correctly." Furthermore, their use should be considered deprecated (refer to the CUDA 5.0 toolkit release notes). And if you are actually using an old toolkit like that, I would suggest upgrading to a newer one.
Regarding the compile issues, as @talonmies has pointed out, the compiler has no way of knowing what the definition of stato_struct is when compiling any module that does not contain the definition (whether directly or via an include). This is the case for your DSMC_kernel_float.cu module, which is where all your compile errors are coming from.
At first glance, it would seem that a sensible fix would be to move the typedef containing the stato_struct definition from your time_analysis.cu file into your header file (DSMC_kernel_float.h) and move the #include statement for that to the top of the time_analysis.cu file, along with your other includes.
However, it appears that your DSMC_kernel_float.cu file expects stato_struct to have a different set of members:
__global__ void CollisioniGPU(stato_struct *coll_GPU) {
    coll_GPU[0].vAx = 1;
    coll_GPU[1].vAy = 1;
    coll_GPU[2].vAz = 1;
    coll_GPU[3].tetaAp = 1;
    coll_GPU[4].phiAp = 1;
}
which are not part of your current definition of stato_struct:
typedef struct {
    int jp1;
    int jp2;
    float kx;
    float ky;
    float kz;
} stato_struct;
So this is confusing code, and I don't think anyone else can sort that out for you. You will either need two separate struct definitions, with separate names, or else you will need to modify your stato_struct definition to include those members (.vAx, .vAy, .vAz, .tetaAp, .phiAp).
The (mis)handling of this struct definition and the resulting errors have nothing to do with CUDA; they arise from ordinary C/C++ language rules.
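To make the suggested fix concrete, here is a sketch of what DSMC_kernel_float.h could look like if the extra members really do belong in the same struct (that is an assumption; use two separately named structs if they represent different data). The definition lives in the header, so every .cu file that includes it sees the complete type:

//DSMC_kernel_float.h
#ifndef DSMC_KERNEL_FLOAT_H_
#define DSMC_KERNEL_FLOAT_H_

typedef struct {
    int   jp1;
    int   jp2;
    float kx;
    float ky;
    float kz;
    // Members written by CollisioniGPU (names taken from the kernel):
    float vAx;
    float vAy;
    float vAz;
    float tetaAp;
    float phiAp;
} stato_struct;

__global__ void CollisioniGPU(stato_struct* coll_GPU);

#endif

time_analysis.cu would then drop its local typedef and move the #include "DSMC_kernel_float.h" up with the other includes, as described above.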

eigen library selfadjointView problem

I persistently get error messages whenever I try to use the selfadjointView() method of any dense or sparse matrix with the Eigen library. Below is a simple program that reproduces the problem; in my actual program the matrix really is self-adjoint:
#define EIGEN_YES_I_KNOW_SPARSE_MODULE_IS_NOT_STABLE_YET
#include <Eigen/Sparse>
#include <Eigen/Dense>
#include <Eigen/Core>
#include <iostream>

using namespace Eigen;

int main()
{
    SparseMatrix<float> mat(3, 3);
    Matrix<float, 3, 1> vec;
    std::cout << mat.selfadjointView<>() * vec;
}
The error message I get is:
error: no matching function for call to 'Eigen::SparseMatrix::selfadjointView()'
You have to specify the template argument, so it should read mat.selfadjointView<Upper>() or mat.selfadjointView<Lower>(). The first one means that it should use the entries in the upper triangular part of mat and fill in the lower triangular part to make the matrix self-adjoint. The second one is the other way around.
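Applied to the example above, a minimal sketch using the upper triangle (substitute Lower to use the lower triangle instead):

#include <Eigen/Sparse>
#include <Eigen/Dense>
#include <iostream>

using namespace Eigen;

int main()
{
    SparseMatrix<float> mat(3, 3);
    Matrix<float, 3, 1> vec;
    vec.setOnes();
    // Treat the upper triangular part of mat as a self-adjoint matrix
    // and multiply it by vec.
    std::cout << mat.selfadjointView<Upper>() * vec << std::endl;
}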