Intel PARDISO factorization slower when linking dynamically

Intel PARDISO factorization slower when linking dynamically - c++

We have a code that runs PARDISO solver. When we link this code statically, the factorization step is 2x faster than when we link the same code dinamically.
Here are link lines obtained from CMake in both cases (I've used the MKL link line advisor to help me define the parameters):
Static linking:
/opt/intel/compilers_and_libraries_2016/linux/bin/intel64/icpc -rdynamic CMakeFiles/simplesolver.dir/pardiso_sym_c.cpp.o -o simplesolver -Wl,--start-group /opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64/libmkl_intel_thread.a /opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -liomp5 -lpthread -lm -ldl
Dynamic linking:
/opt/intel/compilers_and_libraries_2016/linux/bin/intel64/icpc
-rdynamic CMakeFiles/simplesolver.dir/pardiso_sym_c.cpp.o -o simplesolver
-L/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl
Is there any known issue with this? or we are missing some compilation/linking flags to improve factorization performance? The code is exactly the same (solverc example from the MKL distribution). The only thing we change is how to link, and then we obtain this big difference in running speed.
We are using Intel Compiler C++ 2016 and using the MKL from it, under Linux (Ubuntu 14.04). We measure the time by looking at the output of PARDISO (msglvl=1).
Only if it helps, this is the code (I've ommited the function readData that reads the matrix information from a file).
#include "mkl_pardiso.h"
#include "mkl_types.h"
#include <cmath>
#include <iomanip>
#include <iostream>
#include <fstream>
#include <string>
MKL_INT main (void)
{
int n; // dimension of matrix
int nnz; // number of non zeroes
int* ia; // coordinates in i of each value in a
int* ja; // coordinates in j of each value in a
double* a; // values of the matrix
double* b; // vector of forces
double* x_expected; // computed solution our software (Ax=rhs)
double* x; // computed solution in pardiso
bool result = false;
std::string fileIn = "problemData.bin";
//METHOD THAT FILLS ALL THE VALUES... NOT RELEVANT.
result = readData(fileIn, n, nnz, &ia, &ja, &a, &b, &x_expected, false);
x = new double[n];
if (!result)
return 1;
MKL_INT mtype = -2; /* Real symmetric matrix */
/* RHS and solution vectors. */
MKL_INT nrhs = 1; /* Number of right hand sides. */
/* Internal solver memory pointer pt, */
/* 32-bit: int pt[64]; 64-bit: long int pt[64] */
/* or void *pt[64] should be OK on both architectures */
void *pt[64];
/* Pardiso control parameters. */
MKL_INT iparm[64];
MKL_INT maxfct, mnum, phase, error, msglvl;
/* Auxiliary variables. */
MKL_INT i;
double ddum; /* Double dummy */
MKL_INT idum; /* Integer dummy. */
/* -------------------------------------------------------------------- */
/* .. Setup Pardiso control parameters. */
/* -------------------------------------------------------------------- */
for ( i = 0; i < 64; i++ )
{
iparm[i] = 0;
}
iparm[0] = 1; /* No solver default */
iparm[1] = 2; /* Fill-in reordering from METIS */
iparm[3] = 0; /* No iterative-direct algorithm */
iparm[4] = 0; /* No user fill-in reducing permutation */
iparm[5] = 0; /* Write solution into x */
iparm[6] = 0; /* Not in use */
iparm[7] = 2; /* Max numbers of iterative refinement steps */
iparm[8] = 0; /* Not in use */
iparm[9] = 13; /* Perturb the pivot elements with 1E-13 */
iparm[10] = 1; /* Use nonsymmetric permutation and scaling MPS */
iparm[11] = 0; /* Not in use */
iparm[12] = 0; /* Maximum weighted matching algorithm is switched-off (default for symmetric). Try iparm[12] = 1 in case of inappropriate accuracy */
iparm[13] = 0; /* Output: Number of perturbed pivots */
iparm[14] = 0; /* Not in use */
iparm[15] = 0; /* Not in use */
iparm[16] = 0; /* Not in use */
iparm[17] = -1; /* Output: Number of nonzeros in the factor LU */
iparm[18] = -1; /* Output: Mflops for LU factorization */
iparm[19] = 0; /* Output: Numbers of CG Iterations */
maxfct = 1; /* Maximum number of numerical factorizations. */
mnum = 1; /* Which factorization to use. */
msglvl = 1; /* Print statistical information in file */
error = 0; /* Initialize error flag */
/* -------------------------------------------------------------------- */
/* .. Initialize the internal solver memory pointer. This is only */
/* necessary for the FIRST call of the PARDISO solver. */
/* -------------------------------------------------------------------- */
for ( i = 0; i < 64; i++ )
{
pt[i] = 0;
}
/* -------------------------------------------------------------------- */
/* .. Reordering and Symbolic Factorization. This step also allocates */
/* all memory that is necessary for the factorization. */
/* -------------------------------------------------------------------- */
phase = 11;
PARDISO (pt, &maxfct, &mnum, &mtype, &phase,
&n, a, ia, ja, &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);
if ( error != 0 )
{
printf ("\nERROR during symbolic factorization: %d", error);
return -1;
}
printf ("\nReordering completed ... ");
printf ("\nNumber of nonzeros in factors = %d", iparm[17]);
printf ("\nNumber of factorization MFLOPS = %d", iparm[18]);
/* -------------------------------------------------------------------- */
/* .. Numerical factorization. */
/* -------------------------------------------------------------------- */
phase = 22;
PARDISO (pt, &maxfct, &mnum, &mtype, &phase,
&n, a, ia, ja, &idum, &nrhs, iparm, &msglvl, &ddum, &ddum, &error);
if ( error != 0 )
{
printf ("\nERROR during numerical factorization: %d", error);
return -2;
}
printf ("\nFactorization completed ... ");
/* -------------------------------------------------------------------- */
/* .. Back substitution and iterative refinement. */
/* -------------------------------------------------------------------- */
phase = 33;
iparm[7] = 2; /* Max numbers of iterative refinement steps. */
PARDISO (pt, &maxfct, &mnum, &mtype, &phase,
&n, a, ia, ja, &idum, &nrhs, iparm, &msglvl, b, x, &error);
if ( error != 0 )
{
printf ("\nERROR during solution: %d", error);
return -3;
}
printf ("\nSolve completed ... ");
/* -------------------------------------------------------------------- */
/* .. Termination and release of memory. */
/* -------------------------------------------------------------------- */
phase = -1; /* Release internal memory. */
PARDISO (pt, &maxfct, &mnum, &mtype, &phase,
&n, &ddum, ia, ja, &idum, &nrhs,
iparm, &msglvl, &ddum, &ddum, &error);
delete [] ia;
delete [] ja;
delete [] a;
delete [] b;
delete [] x;
delete [] x_expected;
return 0;
}

You have turned on 0 optimisations. A lot of factors would then contribute to runtime behaviour that are not at all related to MKL / dynamic vs static linking etc. What happens, when you compile above code with -O2 -g -march=native, -O3 -march=native? The differences should go away entirely.

Related

How can I create an executable to run a kernel in a given PTX file?

As far as I know, you need a host code (for CPU) and a device code (for GPU), without them you can't run something on GPU.
I am learning PTX ISA and I don't know how to execute it on Windows. Do I need a .cu file to run it or is there another way to run it?

TL;DR:
How can I assemble .ptx file and host code file and make a executable file?
You use the CUDA driver API. Relevant sample codes are vectorAddDrv (or perhaps any other driver API sample code) as well as ptxjit.
Do I need a .cu file to run it or is there another way to run it?
You do not need a .cu file (nor do you need nvcc) to use the driver API method, if you start with device code in PTX form.
Details:
The remainder of this answer is not intended to be a tutorial on driver API programming (use the references already given and the API reference manual here), nor is it intended to be a tutorial on PTX programming. For PTX programming I refer you to the PTX documentation.
To start with, we need an appropriate PTX kernel definition. (For that, rather than writing my own kernel PTX code, I will use the one from the vectorAddDrv sample code, from the CUDA 11.1 toolkit, converting that CUDA C++ kernel definition to an equivalent PTX kernel definition via nvcc -ptx vectorAdd_kernel.cu):
vectorAdd_kernel.ptx:
.version 7.1
.target sm_52
.address_size 64
// .globl VecAdd_kernel
.visible .entry VecAdd_kernel(
.param .u64 VecAdd_kernel_param_0,
.param .u64 VecAdd_kernel_param_1,
.param .u64 VecAdd_kernel_param_2,
.param .u32 VecAdd_kernel_param_3
)
{
.reg .pred %p<2>;
.reg .f32 %f<4>;
.reg .b32 %r<6>;
.reg .b64 %rd<11>;
ld.param.u64 %rd1, [VecAdd_kernel_param_0];
ld.param.u64 %rd2, [VecAdd_kernel_param_1];
ld.param.u64 %rd3, [VecAdd_kernel_param_2];
ld.param.u32 %r2, [VecAdd_kernel_param_3];
mov.u32 %r3, %ntid.x;
mov.u32 %r4, %ctaid.x;
mov.u32 %r5, %tid.x;
mad.lo.s32 %r1, %r3, %r4, %r5;
setp.ge.s32 %p1, %r1, %r2;
#%p1 bra $L__BB0_2;
cvta.to.global.u64 %rd4, %rd1;
mul.wide.s32 %rd5, %r1, 4;
add.s64 %rd6, %rd4, %rd5;
cvta.to.global.u64 %rd7, %rd2;
add.s64 %rd8, %rd7, %rd5;
ld.global.f32 %f1, [%rd8];
ld.global.f32 %f2, [%rd6];
add.f32 %f3, %f2, %f1;
cvta.to.global.u64 %rd9, %rd3;
add.s64 %rd10, %rd9, %rd5;
st.global.f32 [%rd10], %f3;
$L__BB0_2:
ret;
}
We'll also need a driver API C++ source code file that does all the host-side work to load this kernel and launch it. Again I will use the source code from the vectorAddDrv sample project (the .cpp file), with modifications to load PTX instead of fatbin:
vectorAddDrv.cpp:
// Vector addition: C = A + B.
// Includes
#include <stdio.h>
#include <string>
#include <iostream>
#include <cstring>
#include <fstream>
#include <streambuf>
#include <cuda.h>
#include <cmath>
#include <vector>
#define CHK(X) if ((err = X) != CUDA_SUCCESS) printf("CUDA error %d at %d\n", (int)err, __LINE__)
// Variables
CUdevice cuDevice;
CUcontext cuContext;
CUmodule cuModule;
CUfunction vecAdd_kernel;
CUresult err;
CUdeviceptr d_A;
CUdeviceptr d_B;
CUdeviceptr d_C;
// Host code
int main(int argc, char **argv)
{
printf("Vector Addition (Driver API)\n");
int N = 50000, devID = 0;
size_t size = N * sizeof(float);
// Initialize
CHK(cuInit(0));
CHK(cuDeviceGet(&cuDevice, devID));
// Create context
CHK(cuCtxCreate(&cuContext, 0, cuDevice));
// Load PTX file
std::ifstream my_file("vectorAdd_kernel.ptx");
std::string my_ptx((std::istreambuf_iterator<char>(my_file)), std::istreambuf_iterator<char>());
// Create module from PTX
CHK(cuModuleLoadData(&cuModule, my_ptx.c_str()));
// Get function handle from module
CHK(cuModuleGetFunction(&vecAdd_kernel, cuModule, "VecAdd_kernel"));
// Allocate/initialize vectors in host memory
std::vector<float> h_A(N, 1.0f);
std::vector<float> h_B(N, 2.0f);
std::vector<float> h_C(N);
// Allocate vectors in device memory
CHK(cuMemAlloc(&d_A, size));
CHK(cuMemAlloc(&d_B, size));
CHK(cuMemAlloc(&d_C, size));
// Copy vectors from host memory to device memory
CHK(cuMemcpyHtoD(d_A, h_A.data(), size));
CHK(cuMemcpyHtoD(d_B, h_B.data(), size));
// Grid/Block configuration
int threadsPerBlock = 256;
int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
void *args[] = { &d_A, &d_B, &d_C, &N };
// Launch the CUDA kernel
CHK(cuLaunchKernel(vecAdd_kernel, blocksPerGrid, 1, 1,
threadsPerBlock, 1, 1,
0,
NULL, args, NULL));
// Copy result from device memory to host memory
// h_C contains the result in host memory
CHK(cuMemcpyDtoH(h_C.data(), d_C, size));
// Verify result
for (int i = 0; i < N; ++i)
{
float sum = h_A[i] + h_B[i];
if (fabs(h_C[i] - sum) > 1e-7f)
{
printf("mismatch!");
break;
}
}
return 0;
}
(Note that I have stripped out various items such as deallocation calls. This is intended to demonstrate the overall method; the code above is merely a demonstrator.)
On Linux:
We can compile and run the code as follows:
$ g++ vectorAddDrv.cpp -o vectorAddDrv -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lcuda
$ ./vectorAddDrv
Vector Addition (Driver API)
$
On Windows/Visual Studio: Create a new C++ project in visual studio. Add the above .cpp file to the project. Make sure the vectorAdd_kernel.ptx file is in the same directory as the built executable. You will also need to modify the project definition to point the location of the CUDA include files and the CUDA library files. Here's what I did in VS2019:
File...New...Project...Console App...Create
Replace the contents of the given .cpp file with the .cpp file contents above
Change project target to x64
In project...properties
change platform to x64
in configuration properties...C/C++...General...Additional Include Directories, add the path to the CUDA toolkit include directory, on my machine it was C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\include
in configuration properties...Linker...General...Additional Library Directories, add the path to the CUDA toolkit library directory, on my machine it was C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\lib\x64
in configuration properties...Linker...Input...Additional Dependencies, add the cuda.lib file (for the driver API library)
Save the project properties, then do Build....Rebuild
From the console output, locate the location of the built executable. Make sure the vectorAdd_kernel.ptx file is in that directory, and run the executable from that directory. (i.e. open a command prompt. change to that directory. run the application from the command prompt)
NOTE: If you are not using the CUDA 11.1 toolkit or newer, or if you are running on a GPU of compute capability 5.0 or lower, the above PTX code will not work, and so this example will not work verbatim. However the overall method will work, and this question is not about how to write PTX code.
EDIT: Responding to a question in the comments:
What if you wanted the binary to not have to build anything at runtime? i.e. assemble the PTX and stick it in a binary with the compiled host-side code?
I'm not aware of a method provided by the NVIDIA toolchain to do this. It's pretty much the domain of the runtime API to create these unified binaries, from my perspective.
However the basic process seems to be evident from what can be seen of the driver API flow already in the above example: whether we start with a .cubin or a .ptx file, either way the file is loaded into a string, and the string is handed off to cuModuleLoad(). Therefore, it doesn't seem that difficult to build a string out of a .cubin binary with a utility, and then incorporate that in the build process.
I'm really just hacking around here, you should use this at your own risk, and there may be any number of factors that I haven't considered. I'm just going to demonstrate on linux for this part. Here is the source code and build example for the utility:
$ cat f2s.cpp
// Includes
#include <stdio.h>
#include <string>
#include <iostream>
#include <cstring>
#include <fstream>
#include <streambuf>
int main(int argc, char **argv)
{
std::ifstream my_file("vectorAdd_kernel.cubin");
std::string my_bin((std::istreambuf_iterator<char>(my_file)), std::istreambuf_iterator<char>());
std::cout << "unsigned char my_bin[] = {";
for (int i = 0; i < my_bin.length()-1; i++) std::cout << (int)(unsigned char)my_bin[i] << ",";
std::cout << (int)(unsigned char)my_bin[my_bin.length()-1] << "};";
return 0;
}
$ g++ f2s.cpp -o f2s
$
The next step here is to create a .cubin file for use. In the above example, I created the ptx file via nvcc -ptx vectorAdd_kernel.cu. We can just change that to nvcc -cubin vectorAdd_kernel.cu or you can use whatever method you like to generate the .cubin file.
With the cubin file created, we need to convert that into something that can be sucked into our C++ code build process. That is the purpose of the f2s utility. You would use it like this:
./f2s > my_bin.h
(probably it would be good to allow the f2s utility to accept an input filename as a command-line argument. Exercise left to reader. This is just for demonstration/amusement.)
After the creation of the above header file, we need to modify our .cpp file as follows:
$ cat vectorAddDrv_bin.cpp
// Vector addition: C = A + B.
// Includes
#include <stdio.h>
#include <string>
#include <iostream>
#include <cstring>
#include <fstream>
#include <streambuf>
#include <cuda.h>
#include <cmath>
#include <vector>
#include <my_bin.h>
#define CHK(X) if ((err = X) != CUDA_SUCCESS) printf("CUDA error %d at %d\n", (int)err, __LINE__)
// Variables
CUdevice cuDevice;
CUcontext cuContext;
CUmodule cuModule;
CUfunction vecAdd_kernel;
CUresult err;
CUdeviceptr d_A;
CUdeviceptr d_B;
CUdeviceptr d_C;
// Host code
int main(int argc, char **argv)
{
printf("Vector Addition (Driver API)\n");
int N = 50000, devID = 0;
size_t size = N * sizeof(float);
// Initialize
CHK(cuInit(0));
CHK(cuDeviceGet(&cuDevice, devID));
// Create context
CHK(cuCtxCreate(&cuContext, 0, cuDevice));
// Create module from "binary string"
CHK(cuModuleLoadData(&cuModule, my_bin));
// Get function handle from module
CHK(cuModuleGetFunction(&vecAdd_kernel, cuModule, "VecAdd_kernel"));
// Allocate/initialize vectors in host memory
std::vector<float> h_A(N, 1.0f);
std::vector<float> h_B(N, 2.0f);
std::vector<float> h_C(N);
// Allocate vectors in device memory
CHK(cuMemAlloc(&d_A, size));
CHK(cuMemAlloc(&d_B, size));
CHK(cuMemAlloc(&d_C, size));
// Copy vectors from host memory to device memory
CHK(cuMemcpyHtoD(d_A, h_A.data(), size));
CHK(cuMemcpyHtoD(d_B, h_B.data(), size));
// Grid/Block configuration
int threadsPerBlock = 256;
int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
void *args[] = { &d_A, &d_B, &d_C, &N };
// Launch the CUDA kernel
CHK(cuLaunchKernel(vecAdd_kernel, blocksPerGrid, 1, 1,
threadsPerBlock, 1, 1,
0,
NULL, args, NULL));
// Copy result from device memory to host memory
// h_C contains the result in host memory
CHK(cuMemcpyDtoH(h_C.data(), d_C, size));
// Verify result
for (int i = 0; i < N; ++i)
{
float sum = h_A[i] + h_B[i];
if (fabs(h_C[i] - sum) > 1e-7f)
{
printf("mismatch!");
break;
}
}
return 0;
}
$ g++ vectorAddDrv_bin.cpp -o vectorAddDrv_bin -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lcuda -I.
$ ./vectorAddDrv_bin
Vector Addition (Driver API)
$
It seems to work. YMMV. For further amusement, this approach seems to create a form of obfuscation:
$ cuobjdump -sass vectorAdd_kernel.cubin
code for sm_52
Function : VecAdd_kernel
.headerflags #"EF_CUDA_SM52 EF_CUDA_PTX_SM(EF_CUDA_SM52)"
/* 0x001cfc00e22007f6 */
/*0008*/ MOV R1, c[0x0][0x20] ; /* 0x4c98078000870001 */
/*0010*/ S2R R0, SR_CTAID.X ; /* 0xf0c8000002570000 */
/*0018*/ S2R R2, SR_TID.X ; /* 0xf0c8000002170002 */
/* 0x001fd842fec20ff1 */
/*0028*/ XMAD.MRG R3, R0.reuse, c[0x0] [0x8].H1, RZ ; /* 0x4f107f8000270003 */
/*0030*/ XMAD R2, R0.reuse, c[0x0] [0x8], R2 ; /* 0x4e00010000270002 */
/*0038*/ XMAD.PSL.CBCC R0, R0.H1, R3.H1, R2 ; /* 0x5b30011800370000 */
/* 0x001ff400fd4007ed */
/*0048*/ ISETP.GE.AND P0, PT, R0, c[0x0][0x158], PT ; /* 0x4b6d038005670007 */
/*0050*/ NOP ; /* 0x50b0000000070f00 */
/*0058*/ #P0 EXIT ; /* 0xe30000000000000f */
/* 0x081fd800fea207f1 */
/*0068*/ SHL R6, R0.reuse, 0x2 ; /* 0x3848000000270006 */
/*0070*/ SHR R0, R0, 0x1e ; /* 0x3829000001e70000 */
/*0078*/ IADD R4.CC, R6.reuse, c[0x0][0x140] ; /* 0x4c10800005070604 */
/* 0x001fd800fe0207f2 */
/*0088*/ IADD.X R5, R0.reuse, c[0x0][0x144] ; /* 0x4c10080005170005 */
/*0090*/ { IADD R2.CC, R6, c[0x0][0x148] ; /* 0x4c10800005270602 */
/*0098*/ LDG.E R4, [R4] }
/* 0xeed4200000070404 */
/* 0x001fd800f62007e2 */
/*00a8*/ IADD.X R3, R0, c[0x0][0x14c] ; /* 0x4c10080005370003 */
/*00b0*/ LDG.E R2, [R2] ; /* 0xeed4200000070202 */
/*00b8*/ IADD R6.CC, R6, c[0x0][0x150] ; /* 0x4c10800005470606 */
/* 0x001fc420fe4007f7 */
/*00c8*/ IADD.X R7, R0, c[0x0][0x154] ; /* 0x4c10080005570007 */
/*00d0*/ FADD R0, R2, R4 ; /* 0x5c58000000470200 */
/*00d8*/ STG.E [R6], R0 ; /* 0xeedc200000070600 */
/* 0x001ffc00ffe007ea */
/*00e8*/ NOP ; /* 0x50b0000000070f00 */
/*00f0*/ EXIT ; /* 0xe30000000007000f */
/*00f8*/ BRA 0xf8 ; /* 0xe2400fffff87000f */
..........
$ cuobjdump -sass vectorAddDrv_bin
cuobjdump info : File 'vectorAddDrv_bin' does not contain device code
$
LoL

Could sqlite3 generate non-deterministic errors?

I am getting random errors when saving and reading data in Sqlite3 database (VS 2015 Express test).
Made simple trace and can see there is something a bit modified read by select after update sometimes, but not sure where it comes from or what is reason.
Sometimes they are frequent, but now they are very rare - last time 2 times different uint8_t values during few manual runs of same testcase in a short time for example...
How can I check what is causing this - DB / FS / sqlite3.c problem or something else ?
I am testing on a 3 month old NTB having single SSD drive only.
My tracing code:
sqlite3_trace_v2(database, SQLITE_TRACE_STMT | SQLITE_TRACE_ROW, DBtrace::Trace, 0);
class DBtrace {
public:
static bool logNow;
static std::string log;
DBtrace(bool enable = false) { logNow = enable; }
DBtrace & DBtrace::Instance()
{
static DBtrace instance;
return instance;
}
static int Trace(unsigned int d, void *m, void *p, void *x);
};
#include "sqlite3.h"
#include "db_trace.hpp"
#include <stdio.h>
#include <algorithm>
using namespace std;
namespace databaseNS
{
bool DBtrace::logNow;
std::string DBtrace::log;
static int isUseless(int c) { return c < 32; }
bool BothAreSpaces(char lhs, char rhs) { return (lhs == rhs) && (lhs < 32); }
int DBtrace::Trace(unsigned int d, void *m, void *p, void *x) {
if (!logNow) return 0;
string sql;
std::string::iterator new_end;
sqlite3_value *v;
sqlite3_stmt *pStmt = reinterpret_cast<sqlite3_stmt *>(p);
int cc = sqlite3_column_count(pStmt);
switch (d)
{
case SQLITE_TRACE_STMT:
sql = sqlite3_expanded_sql(pStmt);
new_end = std::unique(sql.begin(), sql.end(), BothAreSpaces);
sql.erase(new_end, sql.end());
sql += "\r\n";
log += sql;
break;
case SQLITE_TRACE_ROW:
int n = sqlite3_bind_parameter_count(pStmt);
for (int i = 0; i < cc; i++)
{
sql = sqlite3_column_name(pStmt, i);
v = sqlite3_column_value(pStmt, i);
if (v->flags & MEM_Int) {
sql += ":" + std::to_string(v->u.i) + ", ";
log += sql;
}
if (v->flags & MEM_Str) {
sql += ":";
sql += v->z;
sql += ", ";
log += sql;
};
}
log += "\r\n";
break;
}
return 0;
}
}
And a lot of required declarations found in sqlite3.c (db_trace.hpp)
#include <string>
/**
* #brief Routine for preparation sql queries needed for data handling
*
* #return true if success, false otherwise
*
*/
/*
** Integers of known sizes. These typedefs might change for architectures
** where the sizes very. Preprocessor macros are available so that the
** types can be conveniently redefined at compile-type. Like this:
**
** cc '-DUINTPTR_TYPE=long long int' ...
*/
#ifndef UINT32_TYPE
# ifdef HAVE_UINT32_T
# define UINT32_TYPE uint32_t
# else
# define UINT32_TYPE unsigned int
# endif
#endif
#ifndef UINT16_TYPE
# ifdef HAVE_UINT16_T
# define UINT16_TYPE uint16_t
# else
# define UINT16_TYPE unsigned short int
# endif
#endif
#ifndef INT16_TYPE
# ifdef HAVE_INT16_T
# define INT16_TYPE int16_t
# else
# define INT16_TYPE short int
# endif
#endif
#ifndef UINT8_TYPE
# ifdef HAVE_UINT8_T
# define UINT8_TYPE uint8_t
# else
# define UINT8_TYPE unsigned char
# endif
#endif
#ifndef INT8_TYPE
# ifdef HAVE_INT8_T
# define INT8_TYPE int8_t
# else
# define INT8_TYPE signed char
# endif
#endif
#ifndef LONGDOUBLE_TYPE
# define LONGDOUBLE_TYPE long double
#endif
typedef sqlite_int64 i64; /* 8-byte signed integer */
typedef sqlite_uint64 u64; /* 8-byte unsigned integer */
typedef UINT32_TYPE u32; /* 4-byte unsigned integer */
typedef UINT16_TYPE u16; /* 2-byte unsigned integer */
typedef INT16_TYPE i16; /* 2-byte signed integer */
typedef UINT8_TYPE u8; /* 1-byte unsigned integer */
typedef INT8_TYPE i8; /* 1-byte signed integer */
typedef struct FuncDef FuncDef;
struct sqlite3_value {
union MemValue {
double r; /* Real value used when MEM_Real is set in flags */
i64 i; /* Integer value used when MEM_Int is set in flags */
int nZero; /* Extra zero bytes when MEM_Zero and MEM_Blob set */
const char *zPType; /* Pointer type when MEM_Term|MEM_Subtype|MEM_Null */
FuncDef *pDef; /* Used only when flags==MEM_Agg */
} u;
u16 flags; /* Some combination of MEM_Null, MEM_Str, MEM_Dyn, etc. */
u8 enc; /* SQLITE_UTF8, SQLITE_UTF16BE, SQLITE_UTF16LE */
u8 eSubtype; /* Subtype for this value */
int n; /* Number of characters in string value, excluding '\0' */
char *z; /* String or BLOB value */
/* ShallowCopy only needs to copy the information above */
char *zMalloc; /* Space to hold MEM_Str or MEM_Blob if szMalloc>0 */
int szMalloc; /* Size of the zMalloc allocation */
u32 uTemp; /* Transient storage for serial_type in OP_MakeRecord */
sqlite3 *db; /* The associated database connection */
void(*xDel)(void*);/* Destructor for Mem.z - only valid if MEM_Dyn */
#ifdef SQLITE_DEBUG
Mem *pScopyFrom; /* This Mem is a shallow copy of pScopyFrom */
u16 mScopyFlags; /* flags value immediately after the shallow copy */
#endif
};
/* One or more of the following flags are set to indicate the validOK
** representations of the value stored in the Mem struct.
**
** If the MEM_Null flag is set, then the value is an SQL NULL value.
** For a pointer type created using sqlite3_bind_pointer() or
** sqlite3_result_pointer() the MEM_Term and MEM_Subtype flags are also set.
**
** If the MEM_Str flag is set then Mem.z points at a string representation.
** Usually this is encoded in the same unicode encoding as the main
** database (see below for exceptions). If the MEM_Term flag is also
** set, then the string is nul terminated. The MEM_Int and MEM_Real
** flags may coexist with the MEM_Str flag.
*/
#define MEM_Null 0x0001 /* Value is NULL (or a pointer) */
#define MEM_Str 0x0002 /* Value is a string */
#define MEM_Int 0x0004 /* Value is an integer */
#define MEM_Real 0x0008 /* Value is a real number */
#define MEM_Blob 0x0010 /* Value is a BLOB */
#define MEM_AffMask 0x001f /* Mask of affinity bits */
/* Available 0x0020 */
/* Available 0x0040 */
#define MEM_Undefined 0x0080 /* Value is undefined */
#define MEM_Cleared 0x0100 /* NULL set by OP_Null, not from data */
#define MEM_TypeMask 0xc1ff /* Mask of type bits */

C++ Linker error undefined reference [duplicate]

This question already has answers here:
Combining C++ and C - how does #ifdef __cplusplus work?
(4 answers)
What is an undefined reference/unresolved external symbol error and how do I fix it?
(39 answers)
Closed 4 years ago.
I have been working with C on the embedded side and C# on PC side for quite some time. Now I decided to start working with C++ on an embedded target.
However the linker tells me that it could not find my function.
The complete error looks like this:
Building target: Cpp_Test.elf
Invoking: MCU G++ Linker
arm-none-eabi-g++ -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -specs=nosys.specs -specs=nano.specs -T"../STM32F429ZITx_FLASH.ld" -Wl,-Map=output.map -Wl,--gc-sections -fno-exceptions -fno-rtti -o "Cpp_Test.elf" #"objects.list" -lm
Src/freertos.o: In function `StartDefaultTask':
\Cpp_Test\Debug/../Src/freertos.c:127: undefined reference to `calc_values'
The test_cplus.h file looks like this:
#include "stdint.h"
int calc_values();
The test_cplus.cpp file looks like this:
#include "test_cplus.h"
#include <iostream>
using namespace std;
class Rectangle {
int width, height;
public:
void set_values (int,int);
int area() {return width*height;}
};
void Rectangle::set_values (int x, int y) {
width = x;
height = y;
}
int calc_values()
{
Rectangle rect;
rect.set_values(1,2);
int area = rect.area();
return area;
}
Could you point me in the right direction?
Edit freertos.c
/* Includes ------------------------------------------------------------------*/
#include "FreeRTOS.h"
#include "task.h"
#include "cmsis_os.h"
/* USER CODE BEGIN Includes */
#include "test_file.h"
#include "test_cplus.h"
/* USER CODE END Includes */
/* Variables -----------------------------------------------------------------*/
osThreadId defaultTaskHandle;
/* USER CODE BEGIN Variables */
/* USER CODE END Variables */
/* Function prototypes -------------------------------------------------------*/
#ifdef __cplusplus
extern "C" {
#endif
void StartDefaultTask(void const * argument);
extern void MX_USB_DEVICE_Init(void);
extern void MX_FATFS_Init(void);
void MX_FREERTOS_Init(void); /* (MISRA C 2004 rule 8.1) */
#ifdef __cplusplus
}
#endif
/* USER CODE BEGIN FunctionPrototypes */
/* USER CODE END FunctionPrototypes */
/* Hook prototypes */
/* Init FreeRTOS */
void MX_FREERTOS_Init(void) {
/* USER CODE BEGIN Init */
/* USER CODE END Init */
/* USER CODE BEGIN RTOS_MUTEX */
/* add mutexes, ... */
/* USER CODE END RTOS_MUTEX */
/* USER CODE BEGIN RTOS_SEMAPHORES */
/* add semaphores, ... */
/* USER CODE END RTOS_SEMAPHORES */
/* USER CODE BEGIN RTOS_TIMERS */
/* start timers, add new ones, ... */
/* USER CODE END RTOS_TIMERS */
/* Create the thread(s) */
/* definition and creation of defaultTask */
osThreadDef(defaultTask, StartDefaultTask, osPriorityNormal, 0, 128);
defaultTaskHandle = osThreadCreate(osThread(defaultTask), NULL);
/* USER CODE BEGIN RTOS_THREADS */
/* add threads, ... */
/* USER CODE END RTOS_THREADS */
/* USER CODE BEGIN RTOS_QUEUES */
/* add queues, ... */
/* USER CODE END RTOS_QUEUES */
}
/* StartDefaultTask function */
void StartDefaultTask(void const * argument)
{
/* init code for USB_DEVICE */
MX_USB_DEVICE_Init();
/* init code for FATFS */
MX_FATFS_Init();
/* USER CODE BEGIN StartDefaultTask */
/* Infinite loop */
for(;;)
{
run_test();
calc_values();
osDelay(100);
}
/* USER CODE END StartDefaultTask */
}
/* USER CODE BEGIN Application */
/* USER CODE END Application */

Adding libengine.lib library from Matlab to QT

I'm facing a tough problem adding the static library libengine.lib from Matlab to a neq QT project.
As I read from some blogs, I have to include both the engine.h header and the libengine.lib library to the .pro file. But when I build and run the project it seems that QT is not able to find the engOpen function of the library.
Here is my (basic) code. classe file is, for now, an empty class. The main.cpp file contains example code which is taken from Matlab tutorial, so I guess it is not a problem.
//.pro file
QT += core gui
greaterThan(QT_MAJOR_VERSION, 4): QT += widgets
TARGET = Progetto
TEMPLATE = app
CONFIG += staticlib
SOURCES += main.cpp\
classe.cpp
FORMS += classe.ui
INCLUDEPATH += "C:/Users/Zeno/Desktop/M_libraries"
LIBS += "C:/Users/Zeno/Desktop/M_libraries/libeng.lib"
HEADERS += classe.h \
../../M_libraries/engine.h
//main.cpp
#include "classe.h"
#include <QApplication>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "engine.h"
#define BUFSIZE 256
int main(int argc, char *argv[])
{
QApplication a(argc, argv);
Engine *ep;
mxArray *T = NULL, *result = NULL;
char buffer[BUFSIZE+1];
double time[10] = { 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0 };
/*
* Call engOpen with a NULL string. This starts a MATLAB process
* on the current host using the command "matlab".
*/
if (!(ep = engOpen(""))) {
fprintf(stderr, "\nCan't start MATLAB engine\n");
return EXIT_FAILURE;
}
/*
* PART I
*
* For the first half of this demonstration, we will send data
* to MATLAB, analyze the data, and plot the result.
*/
/*
* Create a variable for our data
*/
T = mxCreateDoubleMatrix(1, 10, mxREAL);
memcpy((void *)mxGetPr(T), (void *)time, sizeof(time));
/*
* Place the variable T into the MATLAB workspace
*/
engPutVariable(ep, "T", T);
/*
* Evaluate a function of time, distance = (1/2)g.*t.^2
* (g is the acceleration due to gravity)
*/
engEvalString(ep, "D = .5.*(-9.8).*T.^2;");
/*
* Plot the result
*/
engEvalString(ep, "plot(T,D);");
engEvalString(ep, "title('Position vs. Time for a falling object');");
engEvalString(ep, "xlabel('Time (seconds)');");
engEvalString(ep, "ylabel('Position (meters)');");
/*
* use fgetc() to make sure that we pause long enough to be
* able to see the plot
*/
printf("Hit return to continue\n\n");
fgetc(stdin);
/*
* We're done for Part I! Free memory, close MATLAB figure.
*/
printf("Done for Part I.\n");
mxDestroyArray(T);
engEvalString(ep, "close;");
/*
* PART II
*
* For the second half of this demonstration, we will request
* a MATLAB string, which should define a variable X. MATLAB
* will evaluate the string and create the variable. We
* will then recover the variable, and determine its type.
*/
/*
* Use engOutputBuffer to capture MATLAB output, so we can
* echo it back. Ensure first that the buffer is always NULL
* terminated.
*/
buffer[BUFSIZE] = '\0';
engOutputBuffer(ep, buffer, BUFSIZE);
while (result == NULL) {
char str[BUFSIZE+1];
/*
* Get a string input from the user
*/
printf("Enter a MATLAB command to evaluate. This command should\n");
printf("create a variable X. This program will then determine\n");
printf("what kind of variable you created.\n");
printf("For example: X = 1:5\n");
printf(">> ");
fgets(str, BUFSIZE, stdin);
/*
* Evaluate input with engEvalString
*/
engEvalString(ep, str);
/*
* Echo the output from the command.
*/
printf("%s", buffer);
/*
* Get result of computation
*/
printf("\nRetrieving X...\n");
if ((result = engGetVariable(ep,"X")) == NULL)
printf("Oops! You didn't create a variable X.\n\n");
else {
printf("X is class %s\t\n", mxGetClassName(result));
}
}
/*
* We're done! Free memory, close MATLAB engine and exit.
*/
printf("Done!\n");
mxDestroyArray(result);
engClose(ep);
return EXIT_SUCCESS;
Classe w;
w.show();
return a.exec();
}
This is the error I face:
undefined reference to 'engOpen' in main.cpp

Undefined reference when a function is listed in a header file, but fine if I copy and paste code directly

I am working on a large project in C++ that is proprietary, so I can't actually share the source code. I have most of the code compiled, but there is one function, in a particular file, that is giving me quite a bit of trouble.
I've created a simple example that shows what the problem is. We have:
WTPSHORT.H
#ifndef WTP_H
#define WTP_H 1
#define VERSION "Version 2.1"
#define DATE "Nov., 2001"
/* ANSI C header files */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <fcntl.h>
#include <ctype.h>
#define TRUE 1
#define FALSE 0
/************ Data structures for Water Treatment Plant ***************/
struct Effluent { /****** Data Packet for All Unit Processes *******/
/* Operating data: */
double DegK; /* Temperature (Deg K) */
double Flow; /* Average flow (MGD) */
double Peak; /* Max hourly flow (MDG) */
/* Unit process counters: */
short cl2cnt; /* Number of times chlorine added. */
/* Measurable Water Quality Parameters: */
double pH; /* [H+]=pow(10,-pH) (-) */
/* More variable definitions go here */
double beta_br; /* Constant for chlorine to bromine reactivity ratio */
double time_step; /* Time step (hrs) */
}; /************ End of struct Effluent ************/
/*****************************************/
struct ProcessTrain { /* Control Structure for Process Train */
struct UnitProcess *head; /* First UnitProcess in ProcessTrain */
struct UnitProcess *null; /* Always NULL */
struct UnitProcess *tail; /* Last UnitProcess in ProcessTrain */
char file_name[120]; /* Full path and extension */
}; /*****************************************/
struct UnitProcess { /********** Treatment Process ***************/
struct UnitProcess *next; /* Double Linked list */
struct UnitProcess *prev; /* " " " */
short type; /* Defined unit process types */
short pad; /* Maintain 32 bit alinment of pointers */
union { /* Design and operating parameters: */
void *ptr;
struct Influent *influent;
struct Mechdbp *mechdbp; //FOR MECH MODEL
struct Alum *alum;
struct Gac *gac;
struct Filter *filter;
struct Basin *basin;
// struct Membrane *membrane;
struct Mfuf *mfuf;
struct Nf *nf;
struct Iron *iron;
struct chemical *chemical;
struct clo2 *clo2;
struct lime *lime;
/*struct WTP_effluent *wtp_effluent; No longer needed - WJS, 11/98 */
struct Avg_tap *avg_tap;
struct End_of_system *end_of_system;
} data;
struct Effluent eff;
};
struct Influent { /* Raw Water Data */
double pH; /* (-) */
double temp; /* Average temperature (C) */
double low_temp; /* Low temperature for disinfection (C) */
double toc; /* (mg/L) */
double uv254; /* (1/cm) */
double bromide; /* (mg/L) */
double alkalinity; /* (mg/L as CaCO3) */
double calcium; /* Calcium Hardness (mg/L as CaCO3) */
double hardness; /* Total Hardness (mg/L as CaCO3) */
double nh3; /* Ammonia (mg/L as N) */
double ntu; /* Turbidity */
double crypto_req; /* Crypto Log removal + Log Inact. required*/
double clo2_crypto_ct_mult; /* Multiplier */
double peak_flow; /* Peak Hourly Flow for disinfection (MGD) */
double avg_flow; /* Average Flow (MGD) */
int swflag; /* TRUE=Surface Water; FALSE=Ground Water */
// char *run_name;
};
void s1_s2_est(struct UnitProcess *unit);
/* define(s) for UnitProcess.type */
#define VACANT 0
#define INFLUENT 1
#define RAPID_MIX 2
#define SLOW_MIX 3
#define SETTLING_BASIN 4
#define FILTER 5
#define BASIN 6
#define CONTACT_TANK 7
#define CLEARWELL 8
#define O3_CONTACTOR 9
#define GAC 10
#define MFUF_UP 11
#define NF_UP 12
#endif
And then there are two source files in the project:
s1s2_est.c
/* s1s2_est.c -- December, 2000*/
#include "WTPSHORT.H"
void s1_s2_est(struct UnitProcess *unit)
{
double toc, uva, s1_0, s2h_0, s2star_0, s2t_0, s1_f, s2h_f, s2star_f, s2t_f, H;
struct Effluent *eff;
eff = &unit->eff;
/* Get these inputs */
toc = eff->TOC;
uva = eff->UV;
s1_0 = eff->s1;
s2h_0 = eff->s2h;
s2star_0 = eff->s2_star;
H = pow(10.0, -eff->pH);
s2t_0 = s2h_0 + s2star_0;
s1_f = s1_0;
s2t_f = s2t_0;
s2star_f = s2star_0;
s2h_f = s2h_0;
if(eff->s1_s2_estflag == 'C')
{
/* Safety check */
if (toc < 0.0) toc = 0.0;
if (uva < 0.0) uva = 0.0;
s1_f = 5.05 * pow(toc, 0.57) * pow(uva, 0.54);
s2t_f = 13.1 * pow(toc, 0.38) * pow(uva, 0.40);
/* No increases in the S values allowed */
if(s1_f > s1_0 ) s1_f = s1_0;
if(s2t_f > s2t_0) s2t_f = s2t_0;
/* Speciate S2 */
s2h_f = s2t_f * eff->k21r * H / (eff->k21f + eff->k21r * H);
s2star_f = s2t_f * eff->k21f / (eff->k21f + eff->k21r * H);
}
if(eff->s1_s2_estflag != 'C' && unit->type == INFLUENT)
{/* Speciate S2 in raw water*/
s2h_f = s2t_f * eff->k21r * H / (eff->k21f + eff->k21r * H);
s2star_f = s2t_f * eff->k21f / (eff->k21f + eff->k21r * H);
}
/* Update Effluent data structure */
eff->s1 = s1_f;
eff->s2h = s2h_f;
eff->s2_star = s2star_f;
}/* End subroutine "s1_s2_est()"*/
and then
main.cpp
#include <stdio.h>
#include "WTPSHORT.H"
int main(int argc, char **argv)
{
UnitProcess *myunit;
s1_s2_est(myunit);
printf("done\n");
return 0;
}
When compiling and linking I see this error:
C:\WINDOWS\system32\cmd.exe /C "C:/MinGW/bin/mingw32-make.exe -j8 SHELL=cmd.exe -e -f Makefile"
"----------Building project:[ simple - Debug ]----------"
mingw32-make.exe[1]: Entering directory 'C:/Users/joka0958/Desktop/wtp/compiledwtp/simple'
C:/MinGW/bin/g++.exe -c "C:/Users/joka0958/Desktop/wtp/compiledwtp/simple/main.cpp" -g -O0 -Wall -o ./Debug/main.cpp.o -I. -I.
C:/MinGW/bin/gcc.exe -c "C:/Users/joka0958/Desktop/wtp/compiledwtp/simple/s1s2_est.c" -g -O0 -Wall -o ./Debug/s1s2_est.c.o -I. -I.
C:/Users/joka0958/Desktop/wtp/compiledwtp/simple/main.cpp: In function 'int main(int, char**)':
C:/Users/joka0958/Desktop/wtp/compiledwtp/simple/main.cpp:7:22: warning: 'myunit' is used uninitialized in this function [-Wuninitialized]
s1_s2_est(myunit);
^
C:/MinGW/bin/g++.exe -o ./Debug/simple #"simple.txt" -L.
./Debug/main.cpp.o: In function `main':
C:/Users/joka0958/Desktop/wtp/compiledwtp/simple/main.cpp:7: undefined reference to `s1_s2_est(UnitProcess*)'
collect2.exe: error: ld returned 1 exit status
mingw32-make.exe[1]: [Debug/simple] Error 1
mingw32-make.exe: [All] Error 2
simple.mk:78: recipe for target 'Debug/simple' failed
mingw32-make.exe[1]: Leaving directory 'C:/Users/joka0958/Desktop/wtp/compiledwtp/simple'
Makefile:4: recipe for target 'All' failed
2 errors, 1 warnings
So the question is: Why am I getting an undefined reference?
I realize this is one of those errors that probably masks another problem, but I have really exhausted all possibilities, in my mind, of what could be causing the problem. Note that this is part of a larger project where many other functions compile and link properly.
By the way I am using Codelite with the MinGW compiler on Windows 10.

I'm sorry for all the consternation this question caused. It turns out that there was C++-specific functions within this file, but because the file was named with a *.c, codelite defaulted to the C compiler to actually compile this particular source file. Once I changed the filename, the code compiled successfully. Thanks for all of your useful suggestions!

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Intel PARDISO factorization slower when linking dynamically - c++

You have turned on 0 optimisations. A lot of factors would then contribute to runtime behaviour that are not at all related to MKL / dynamic vs static linking etc. What happens, when you compile above code with -O2 -g -march=native, -O3 -march=native? The differences should go away entirely.

Related

How can I create an executable to run a kernel in a given PTX file?

Could sqlite3 generate non-deterministic errors?

C++ Linker error undefined reference [duplicate]

Adding libengine.lib library from Matlab to QT

Undefined reference when a function is listed in a header file, but fine if I copy and paste code directly

Categories

Resources