C++ program behaving differently when profiling

I would like to profile some C++ code using gprof. I compile the program exactly as I normally do, but with -pg added at the end, i.e. something like
g++ prog.cpp $(OBJECTS) -lgmp -lgmpxx -lmpfr -lmpc -msse2 -std=c++11 -O2 -o prog_P -pg
However when I run the resulting executable I get a bunch of errors that are not normally there. Specifically they are from the zkcm multiprecision library:
Warning: in zkcm_gauss::gauss, partial pivoting failed.
This is bad news for my LU decomposition. Any ideas?
EDIT: I use cygwin

Related

How do I retain debug symbols when profiling C++ code on Windows?

I'm trying to profile a C++ shared library on Windows 10, in order to find which lines the program is spending most time on. (The code happens to form part of an R package.)
I've previously used AMD µprof and Very Sleepy. However, I'm now having trouble compiling the code: all these profilers show is which DLL is being used, rather than which function / line.
I suspect that the problem relates to debugging symbol tables being missing. Per Enabling debug symbols in shared library using GCC, I've ensured that a -g flag is applied when compiling each file, and that there is no -s flag at the linker stage. What else do I need to do to allow µprof / Very Sleepy to tell me which lines of the code are proving a bottleneck?
Detailed compilation notes
I'm using RBuildTools MinGW-w64 v3 g++ 8.3.0 to compile the code on 64-bit Windows 10.
Here are some sample compile commands, which are being generated by R, using Makevars / Makeconf templates.
g++ -std=gnu++14 -I"<<include paths>>" -DNDEBUG -g -O2 -Wall
-mfpmath=sse -msse2 -mstackrealign
-c source_file.cpp -o source_file.o
g++ -shared -static-libgcc -g -Og
-o PackageName.dll tmp.def source_file.o <<Other files>>
-L<<Library paths>>
I've also tried replacing -g with -gdwarf-2 -g3, and adding -fno-omit-frame-pointer, per Very Sleepy doesn't see function names when capturing MinGW compiled file.
Running without shared library
ssbssa suggested running against a simple executable.
I tried:
#include <chrono>
#include <thread>
#include <iostream>
long sumto(long n) {
    if(n > 0) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        return n + sumto(n - 1);
    }
    return 1;
}
int main() {
    std::cout << sumto(1000) << std::endl;
    return 0;
}
>"C:/RBuildTools/4.0/mingw64/bin/"g++ -std=gnu++14 -gdwarf-2 -g3 -Og -c test.cpp
>"C:/RBuildTools/4.0/mingw64/bin/"g++ -std=gnu++14 -gdwarf-2 -g3 -Og -o test test.o
test.exe runs as expected. When I profile test.exe, AMD µprof states "The raw file has no data!", whereas VerySleepy does detect activity in sumto and displays the associated source code.
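One possible explanation, which is an assumption and not something the question confirms, is that sumto spends almost all of its time asleep, so a profiler that samples on-CPU time has very little to record. A minimal sketch of a CPU-bound stand-in follows; busy_sum is a hypothetical name, not part of the original test program, and whether it changes what AMD µprof reports is untested here.

```cpp
#include <cstdint>

// Hypothetical CPU-bound replacement for sumto(): it does real arithmetic
// instead of sleeping, so the process accumulates on-CPU time that a
// sampling profiler can attribute to this function.
std::uint64_t busy_sum(std::uint64_t n) {
    // volatile keeps the compiler from folding the loop into a closed form
    // at -Og / -O2, which would leave nothing to sample.
    volatile std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total = total + i;
    }
    return total;
}
```

Calling busy_sum with a large argument (for instance a few hundred million) from main in place of sumto(1000), and building with the same -g flags as above, gives the profiler a run of a second or so of actual computation to sample.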

Suddenly getting maxrregcount warnings and undefined reference errors when linking

I maintain the C++-flavored CUDA API wrappers library. The library's current commit is relatively well tested, with some example programs and quite a few users. However, sometime very recently (can't say exactly when), and without committing anything new, I now get NVCC warnings during the "dlink" phase of my example programs, e.g.:
/path/to/nvcc /path/to/cuda-api-wrappers/examples/modified_cuda_samples/vectorAdd/vectorAdd.cu -dc -o /path/to/cuda-api-wrappers/CMakeFiles/vectorAdd.dir/examples/modified_cuda_samples/vectorAdd/./vectorAdd_generated_vectorAdd.cu.o -ccbin /opt/gcc-5.4.0/bin/gcc -m64 -gencode arch=compute_52,code=compute_52 --std=c++11 -Xcompiler -Wall -O3 -DNDEBUG -DNVCC -I/path/to/cuda/include -I/path/to/cuda-api-wrappers/src
/path/to/nvcc -gencode arch=compute_52,code=compute_52 --std=c++11 -Xcompiler -Wall -O3 -DNDEBUG -m64 -ccbin /opt/gcc-5.4.0/bin/gcc -dlink /export/path/to/cuda-api-wrappers/CMakeFiles/vectorAdd.dir/examples/modified_cuda_samples/vectorAdd/./vectorAdd_generated_vectorAdd.cu.o /path/to/cuda/lib64/libcublas_device.a -o /export/path/to/cuda-api-wrappers/CMakeFiles/vectorAdd.dir/./vectorAdd_intermediate_link.o
#O#ptxas info : 'device-function-maxrregcount' is a BETA feature
#O#ptxas info : 'device-function-maxrregcount' is a BETA feature
#O#ptxas info : 'device-function-maxrregcount' is a BETA feature
... this repeats many times ...
but the dlink phase does conclude. This is already strange, since I haven't explicitly used any beta features. The final link step then fails:
/opt/gcc-5.4.0/bin/g++ -Wall -Wpedantic -O2 -DNDEBUG -L/path/to/cuda/lib64 -rdynamic CMakeFiles/vectorAdd.dir/examples/modified_cuda_samples/vectorAdd/vectorAdd_generated_vectorAdd.cu.o CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o -o examples/bin/vectorAdd lib/libcuda-api-wrappers.a -Wl,-Bstatic -lcudart_static -Wl,-Bdynamic -lpthread -ldl -lrt -lnvToolsExt -Wl,-Bstatic -lcudadevrt -Wl,-Bdynamic
CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o: In function `__cudaRegisterLinkedBinary_25_cublas_compute_70_cpp1_ii_f0559976':
link.stub:(.text+0xe0): undefined reference to `__fatbinwrap_25_cublas_compute_70_cpp1_ii_f0559976'
CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o: In function `__cudaRegisterLinkedBinary_25_xerbla_compute_70_cpp1_ii_cd7f3ad3':
link.stub:(.text+0x190): undefined reference to `__fatbinwrap_25_xerbla_compute_70_cpp1_ii_cd7f3ad3'
CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o: In function `__cudaRegisterLinkedBinary_23_nrm2_compute_70_cpp1_ii_8edbce95':
link.stub:(.text+0x240): undefined reference to `__fatbinwrap_23_nrm2_compute_70_cpp1_ii_8edbce95'
... more undefined reference errors here ...
My question: Why would this happen and how do I circumvent/avoid/resolve it?
Notes:
I'm using separable compilation
I'm getting these specific errors with CUDA 9.1 and a SM 5.2 device (no 7.0).
The CMakeLists.txt is here.
I'm obviously clearing CMakeCache.txt before building.
This has happened to me both on a GNU/Linux Mint 18.3 and Fedora 26. On the first machine there have been some apt-get dist-upgrade's done, and now GCC is up to version 5.5.0, in case that matters. On the second machine - there really has been no change that I'm aware of; same compiler and CUDA version.
A partial answer / workaround:
This issue only seems to occur when libcublas is involved. If I remove /path/to/cuda/lib64/libcublas_device.a from the -dlink phase command-line, all warnings and errors go away (including from later stages). And in fact, my wrapper library is oblivious of cublas, not sure why CMake is adding it; it's not in $CUDA_LIBRARIES. See also:
Why does CMake force the use of libcublas with separable compilation?

Getting Cplex example to run: Undefined references

I am trying to get the Cplex basic LP example to work. The code can be found here. I am completely new to C++, but hope to be able to get this running.
I am trying to compile it on Linux, using the following command:
g++ -D IL_STD -I /opt/ibm/ILOG/CPLEX_Studio1271/opl/include ilolpex1.cpp
The -D IL_STD was put there to solve an error as found here. The -I ... was put there to specify the location of the header files. I came up with this myself after a lot of trying and googling, so I am in no way sure this is correct.
Anyway, when I run it I get undefined reference errors:
/tmp/ccl9O1YF.o: In function `populatebyrow(IloModel, IloNumVarArray, IloRangeArray)':
ilolpex1.cpp:(.text+0x18f): undefined reference to `IloNumVar::IloNumVar(IloEnv, double, double, IloNumVar::Type, char const*)'
I did not make any changes to the file, so I assume the only thing that can be wrong is how the files are linked. I have the feeling it is probably just a simple setting, but after hours of looking I still have no idea how to fix it.
The easiest way to compile the ilolpex1.cpp example is to use the Makefile that is included with the installation. For example, you should do the following:
$ cd /opt/ibm/ILOG/CPLEX_Studio1271/cplex/examples/x86-64_linux/static_pic
$ make ilolpex1
This will produce output, like the following:
g++ -O0 -c -m64 -O -fPIC -fno-strict-aliasing -fexceptions -DNDEBUG -DIL_STD -I../../../include -I../../../../concert/include ../../../examples/src/cpp/ilolpex1.cpp -o ilolpex1.o
g++ -O0 -m64 -O -fPIC -fno-strict-aliasing -fexceptions -DNDEBUG -DIL_STD -I../../../include -I../../../../concert/include -L../../../lib/x86-64_linux/static_pic -L../../../../concert/lib/x86-64_linux/static_pic -o ilolpex1 ilolpex1.o -lconcert -lilocplex -lcplex -lm -lpthread
This will tell you everything you'll need to know if you choose to compile your own application by hand in the future. The details about this are described in the documentation (e.g., here).
Obviously, the ilolpex1.cpp file is just a demo of how to use IloCplex.
What you still need is IloCplex itself. This should come either as one or more further source files that you have to compile with the demo, or as a library you link against.
Have a look at your CPLEX directories; you might find a lib[...].a file somewhere there, possibly in /opt/ibm/ILOG/CPLEX_Studio1271/opl/lib.
You can link against it using GCC's (or clang's) -l and -L options. Be aware that when using -l, you leave out the lib prefix and the .a suffix (-l [...] for the placeholder name above).

Running Eigen's sparse CG solver multi-threaded

I have Matlab code that uses sparse and '\' as the solver for a linear system. I have hand-tailored a C++ function that uses the conjugate gradient sparse solver from Eigen in order to run the code outside Matlab, using the coder toolbox to export the rest of the Matlab code. I export a static library and I'm able to compile and execute it on my remote system without any problem. However, I'm not able to run the code using multi-threading. I have tried to export it as a Matlab executable (mex) and the whole code runs in parallel without problem inside Matlab.
So my conclusion is that something must be different in the compiler/linker flags on my remote system. I use -fopenmp in both compiler and linker and I run the executable with OMP_NUM_THREADS=n, and if I read out "n" inside my program I get the same number as I set at execution.
My question is, do I need to include anything else in my compiler/linker, apart from needed things related to my particular code, in order to get Eigen to run multi-threaded?
UPDATE:
On the remote system I do:
g++ -c -m64 -fopenmp -std=c++11 -I /usr/local/include/Eigen/src/misc/
~/src/myHandTailoredFile.cpp -o ~/src/myHandTailoredFile.o
and with linker options
-fopenmp -L /usr/local/lib64/ -llapack -L /usr/local/lib/ -lcurl
To compile my hand tailored file together with myBigProgram into a Mex-file I do
g++ -DHAVE_LAPACK_CONFIG_H -DLAPACK_COMPLEX_STRUCTURE -DMW_HAVE_LAPACK_DECLS -c -ansi -fexceptions -fPIC -fno-omit-frame-pointer -pthread -D_GNU_SOURCE -DMATLAB_MEX_FILE -std=c++0x -fopenmp -DOMPLIBNAME="\"/usr/local/MATLAB/R2016a/sys/os/glnxa64/libiomp5.so\"" -O -DNDEBUG -I "/usr/local/MATLAB/R2016a/simulink/include" -I "/usr/local/MATLAB/R2016a/toolbox/shared/simtargets" -I "./interface" -I "/usr/local/lib" -I "/usr/local/MATLAB/R2016a/extern/include" -I "." "~/src/myHandTailoredFile.cpp"
with linker options set to
-pthread -Wl,--no-undefined -Wl,-rpath-link,/usr/local/MATLAB/R2016a/bin/glnxa64 -shared -L/usr/local/MATLAB/R2016a/bin/glnxa64 -lmx -lmex -lmat -lm -lstdc++ -lcurl -fPIC -L/usr/local/MATLAB/R2016a/sys/os/glnxa64 -liomp5 -o myBigProgram_mex.mexa64 -L"/usr/local/MATLAB/R2016a/bin/glnxa64" -lmwblas -lmwlapack -lemlrt -lcovrt -lut -lmwmathutil
Note that the compiler and linker options for the latter are completely defined by Matlab.
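One way to narrow this down is to check whether the translation unit that includes Eigen is really being compiled with OpenMP enabled, because Eigen decides whether to parallelise based on the compiler-defined _OPENMP macro. The sketch below is an illustration, not part of the original build: openmp_macro_value and openmp_status are hypothetical helper names. Separately, as far as I recall from Eigen's multi-threading documentation, ConjugateGradient only runs in parallel when its UpLo template parameter is Eigen::Lower|Eigen::Upper, so that is worth checking too.

```cpp
#include <string>

// Returns the value of the _OPENMP macro for this translation unit,
// or 0 if the file was compiled without OpenMP support (no -fopenmp).
long openmp_macro_value() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// Hypothetical helper: a one-line summary suitable for printing at start-up.
std::string openmp_status() {
    const long v = openmp_macro_value();
    if (v > 0) {
        return "OpenMP enabled (_OPENMP=" + std::to_string(v) + ")";
    }
    return "OpenMP disabled: this file was compiled without -fopenmp";
}
```

Placing these functions in myHandTailoredFile.cpp and printing openmp_status() once at start-up shows whether the flag reached that particular file. If it reports "disabled" on the remote system, the -fopenmp option is not reaching the compile step for the file that includes Eigen.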

Armadillo issue in ubuntu

I have been writing a C++ program in Ubuntu and Windows 8 using Armadillo. Under Windows 8 the program compiles without problems.
The program is just using the linear systems solver.
Under Ubuntu the compiler says
"reference to `wrapper_dgels_' not defined"
The compiler line I use is:
mpic++ -O2 -std=c++11 -Wall -fexceptions -O2 -larmadillo -llapack -lblas program.o
However, right before the error I see:
g++ module_of_the_error.o
Which is something I haven't set.
I am using Code::Blocks in Ubuntu, and I compiled Armadillo with all the libraries that cmake asked for (BLAS, LAPACK, OpenBLAS, HDF5, ARPACK, etc.).
I have no clue what might be causing the problem, since the exact same code compiles in Visual Studio. I have tried the compiler line modifications suggested but they do not seem to work.
Any help is appreciated.
This is one trap I fell into myself one time. You will not like the likely cause of your error.
The order of the arguments to the linker matters.
Instead of
mpic++ -O2 -std=c++11 -Wall -fexceptions -O2 -larmadillo -llapack -lblas program.o
try:
mpic++ -O2 -std=c++11 -Wall -fexceptions -O2 program.o -larmadillo -llapack -lblas
I.e., put the object files to be linked into the executable before the libraries.
By the way, at this stage you are only linking files that have already been compiled. It is not necessary to repeat command line options that are only relevant for compiling. So this will be equivalent:
mpic++ program.o -larmadillo -llapack -lblas
Moreover, depending on how you installed Armadillo, you are adding either one or two superfluous libraries in that line. One of the following should be enough:
mpic++ program.o -larmadillo
or
mpic++ program.o -llapack -lblas
EDIT: as the answer by rerx states, the problem is probably just the ordering of the switches/arguments supplied to g++. All the -l switches need to come after the source or object files that use them; the simplest way to ensure this is to put them at the end of the command line, after the -o switch. For example:
g++ prog.cpp -o prog -O3 -larmadillo
original answer:
Looks like your compiler can't find the Armadillo run-time library. The proper solution is to specify the path for armadillo run-time library using the -L switch. For example, g++ -O2 blah.cpp -o blah -L /usr/local/lib/ -larmadillo
Another possible solution is to define ARMA_DONT_USE_WRAPPER before including the armadillo header, and then directly link with LAPACK and BLAS. For example:
#define ARMA_DONT_USE_WRAPPER
#include <armadillo>
More details are available at the Armadillo frequently asked questions page.