Profile OpenMP Code With Intel VTune XE - c++

I have a really weird bug in my code: when running under the Intel XE profiler, it crashes out before the OpenMP section.
If I run the command:
./sphere_benchmark -i ../sphere_team6_dealmesh.prm -m 2 -h 1 -p 3
top shows 100% CPU usage in a small serial section, spinning up to 1200% when it drops into the OpenMP block.
However if I run the command:
amplxe-cl -collect concurrency -- ./sphere_benchmark -i ../sphere_team6_dealmesh.prm -m 2 -h 1 -p 3
I see the same 100% serial section, but then it stops, and claims there were no profiled regions, as it's looking to profile concurrency. If I change the metric to 'advanced-hotspots', it does profile the code, but only the lines prior to the OpenMP section! (Note - I am using absolute paths in reality, omitted here for brevity).
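For reference, the advanced-hotspots run described above is the same invocation with only the collection type changed:
amplxe-cl -collect advanced-hotspots -- ./sphere_benchmark -i ../sphere_team6_dealmesh.prm -m 2 -h 1 -p 3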
However, if my application is crashing before the OpenMP section (which I assume it is), it must be a very clean crash, as I get no error, no warning, nothing untoward, even when specifying -vv to amplxe-cl. And it runs fine when not being profiled.
The program I am using relies on the dealii library, which has been built from source using icc 16.0.1, and has all of its many dependencies built from source with the same compiler.
My compile command looks a bit like:
mpiicpc -DDEBUG -DTBB_DO_ASSERT=1 -DTBB_IMPLEMENT_CPP0X=1 -DTBB_USE_DEBUG -qopenmp -I<incs> -fpic -ansi -w2 -wd68 -wd135 -wd175 -wd177 -wd191 -wd193 -wd279 -wd327 -wd383 -wd981 -wd1418 -wd1478 -wd1572 -wd2259 -wd21 -wd2536 -wd15531 -wd111 -wd128 -wd185 -wd280 -qopenmp-simd -std=c++11 -Wno-return-type -Wno-parentheses -O0 -g -gdwarf-2 -grecord-gcc-switches -o src/curlfunction.cc.o -c src/curlfunction.cc
and my link:
mpiicpc -qopenmp -shared-intel -rdynamic -qopenmp <objs> -o sphere_benchmark -rdynamic <libs> -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lmpifort -lmpi -Wl,-Bstatic -lmpigi -Wl,-Bdynamic -ldl -lrt -lpthread <libs> -Wl,-<libs>
I've tried to trim these down to make them vaguely readable whilst still including everything.
Does anyone with experience of Intel XE have any suggestions? I've changed my deprecated -openmp flag to -qopenmp, and according to the documentation it really shouldn't be this hard.
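For context, the OpenMP block itself is nothing exotic; a minimal stand-in (a sketch, not the actual sphere_benchmark source) for the kind of region where top should jump from 100% to ~1200% would be:
#include <cstdio>
#include <vector>
#include <omp.h>

int main() {
    std::vector<double> v(1 << 24, 1.0);
    double sum = 0.0;
    // With 12 threads available, this loop is where usage should reach ~1200%.
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < static_cast<long>(v.size()); ++i)
        sum += v[i] * v[i];
    std::printf("max threads: %d, sum: %f\n", omp_get_max_threads(), sum);
    return 0;
}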


Getting Arduino IDE to compile for C++14

I've been looking to modify the build flags under Arduino's IDE 1.x, or even the Arduino CLI (which I haven't used but am willing to adopt), such that I can undefine -std=gnu++11 and instead define -std=gnu++14.
I found a question related to this which gives me almost what I need:
Arduino 1.0.6: How to change compiler flag?
But it only shows how to add flags, not how to remove them. I found another related post about changing Arduino to GNU C++17, but the answer was that it's not possible.
In this case, I know it's possible, as I do it in PlatformIO in order to use the htcw_gfx library. It works great on most platforms that will run GFX reasonably, anyway.
But I just don't know how to fiddle with Arduino to get it to dance the way I need to.
Any help would be greatly appreciated.
You can modify the default compile flags in the hardware/arduino/avr/platform.txt file.
$ grep -n "std" hardware/arduino/avr/platform.txt
23:compiler.c.flags=-c -g -Os {compiler.warning_flags} -std=gnu11 -ffunction-sections -fdata-sections -MMD -flto -fno-fat-lto-objects
28:compiler.cpp.flags=-c -g -Os {compiler.warning_flags} -std=gnu++11 -fpermissive -fno-exceptions -ffunction-sections -fdata-sections -fno-threadsafe-statics -Wno-error=narrowing -MMD -flto
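After the edit, the compiler.cpp.flags line should read (only the -std value changes):
compiler.cpp.flags=-c -g -Os {compiler.warning_flags} -std=gnu++14 -fpermissive -fno-exceptions -ffunction-sections -fdata-sections -fno-threadsafe-statics -Wno-error=narrowing -MMD -flto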
On some Linux systems, the following would do this automatically:
dirname $(realpath $(which arduino)) | xargs -I{} sed -i "s/\(std=gnu++1\)1/\14/" {}/hardware/arduino/avr/platform.txt
But this is not very portable, and will not work if the user has installed Arduino with Snap (as Snap has these files mounted read-only).
Sources:
https://stackoverflow.com/a/28047811/6946577
https://stackoverflow.com/a/55254754/6946577

Unable to use gdb with hdf5 c++ application

I am trying to use gdb to debug an hdf5 C++ application that I have written. The h5 package that I am using was installed using conda. The command that I am using is:
h5c++ hdf5.cpp
This generates an executable which I then run with gdb as follows:
gdb a.out
gdb launches alright. But when I add a breakpoint using:
b 10
or any line number, it gives a message: No line 10 in file "init.c"
When I enter run, it runs the whole program at once (which I don't want) and exits. The h5c++ -show command gives the following output:
x86_64-conda_cos6-linux-gnu-c++ -I/i3c/hpcl/sms821/software/tensorflow/anaconda2/include -D_FORTIFY_SOURCE=2 -O2 -g -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -I/i3c/hpcl/sms821/software/tensorflow/anaconda2/include -fdebug-prefix-map==/usr/local/src/conda/- -fdebug-prefix-map==/usr/local/src/conda-prefix -L/i3c/hpcl/sms821/software/tensorflow/anaconda2/lib /i3c/hpcl/sms821/software/tensorflow/anaconda2/lib/libhdf5_hl_cpp.a /i3c/hpcl/sms821/software/tensorflow/anaconda2/lib/libhdf5_cpp.a /i3c/hpcl/sms821/software/tensorflow/anaconda2/lib/libhdf5_hl.a /i3c/hpcl/sms821/software/tensorflow/anaconda2/lib/libhdf5.a -L/i3c/hpcl/sms821/software/tensorflow/anaconda2/lib -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,-rpath,/i3c/hpcl/sms821/software/tensorflow/anaconda2/lib -L/i3c/hpcl/sms821/software/tensorflow/anaconda2/lib -g -lrt -lpthread -lz -ldl -lm -Wl,-rpath -Wl,/i3c/hpcl/sms821/software/tensorflow/anaconda2/lib
I think this has to do with the compiler that it is using. I tried replacing x86_64-conda_cos6-linux-gnu-c++ with my native g++ compiler in the h5c++ script, but that gives a linker error.
Please suggest how I should make my h5 application work with gdb. Should I install HDF5 from source, since I don't have sudo access? I am working on a Linux machine.
I simply installed HDF5 from the source files. While configuring the installation I turned on the --enable-build-mode=debug and --enable-symbols switches. HDF5 has a dependency on szip, which I also installed from source. My exact configuration was as follows:
./configure --prefix=<hdf5 install directory> --enable-cxx --enable-build-mode=debug --enable-symbols=yes --enable-profiling=yes --with-szlib=<szip install directory>
The above solution worked and I was able to compile my h5 application using h5c++ hdf5.cpp and also use gdb to debug it.
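For example, with the debug build in place, a typical session then behaves as expected (the line number is purely illustrative; qualifying the breakpoint with the source file name also avoids gdb defaulting to init.c):
$ h5c++ -g hdf5.cpp
$ gdb ./a.out
(gdb) break hdf5.cpp:10
(gdb) run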

Suddenly getting maxrregcount warnings and undefined reference errors when linking

I maintain the C++-flavored CUDA API wrappers library. The library's current commit is relatively well tested, with some example programs and quite a few users. However, sometime very recently (I can't say exactly when), and without my committing anything new, I now get NVCC warnings during the "dlink" phase of my example programs, e.g.:
/path/to/nvcc /path/to/cuda-api-wrappers/examples/modified_cuda_samples/vectorAdd/vectorAdd.cu -dc -o /path/to/cuda-api-wrappers/CMakeFiles/vectorAdd.dir/examples/modified_cuda_samples/vectorAdd/./vectorAdd_generated_vectorAdd.cu.o -ccbin /opt/gcc-5.4.0/bin/gcc -m64 -gencode arch=compute_52,code=compute_52 --std=c++11 -Xcompiler -Wall -O3 -DNDEBUG -DNVCC -I/path/to/cuda/include -I/path/to/cuda-api-wrappers/src
/path/to/nvcc -gencode arch=compute_52,code=compute_52 --std=c++11 -Xcompiler -Wall -O3 -DNDEBUG -m64 -ccbin /opt/gcc-5.4.0/bin/gcc -dlink /export/path/to/cuda-api-wrappers/CMakeFiles/vectorAdd.dir/examples/modified_cuda_samples/vectorAdd/./vectorAdd_generated_vectorAdd.cu.o /path/to/cuda/lib64/libcublas_device.a -o /export/path/to/cuda-api-wrappers/CMakeFiles/vectorAdd.dir/./vectorAdd_intermediate_link.o
ptxas info : 'device-function-maxrregcount' is a BETA feature
ptxas info : 'device-function-maxrregcount' is a BETA feature
ptxas info : 'device-function-maxrregcount' is a BETA feature
... this repeats many times ...
but the dlink phase does conclude. This is already strange, since I haven't explicitly used any beta features.
/opt/gcc-5.4.0/bin/g++ -Wall -Wpedantic -O2 -DNDEBUG -L/path/to/cuda/lib64 -rdynamic CMakeFiles/vectorAdd.dir/examples/modified_cuda_samples/vectorAdd/vectorAdd_generated_vectorAdd.cu.o CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o -o examples/bin/vectorAdd lib/libcuda-api-wrappers.a -Wl,-Bstatic -lcudart_static -Wl,-Bdynamic -lpthread -ldl -lrt -lnvToolsExt -Wl,-Bstatic -lcudadevrt -Wl,-Bdynamic
CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o: In function `__cudaRegisterLinkedBinary_25_cublas_compute_70_cpp1_ii_f0559976':
link.stub:(.text+0xe0): undefined reference to `__fatbinwrap_25_cublas_compute_70_cpp1_ii_f0559976'
CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o: In function `__cudaRegisterLinkedBinary_25_xerbla_compute_70_cpp1_ii_cd7f3ad3':
link.stub:(.text+0x190): undefined reference to `__fatbinwrap_25_xerbla_compute_70_cpp1_ii_cd7f3ad3'
CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o: In function `__cudaRegisterLinkedBinary_23_nrm2_compute_70_cpp1_ii_8edbce95':
link.stub:(.text+0x240): undefined reference to `__fatbinwrap_23_nrm2_compute_70_cpp1_ii_8edbce95'
... more undefined reference errors here ...
My question: Why would this happen and how do I circumvent/avoid/resolve it?
Notes:
I'm using separable compilation
I'm getting these specific errors with CUDA 9.1 and an SM 5.2 device (no 7.0).
The CMakeLists.txt is here.
I'm obviously clearing CMakeCache.txt before building.
This has happened to me on both GNU/Linux Mint 18.3 and Fedora 26. On the first machine some apt-get dist-upgrades have been done, and GCC is now up to version 5.5.0, in case that matters. On the second machine there really has been no change that I'm aware of: same compiler and CUDA version.
A partial answer / workaround:
This issue only seems to occur when libcublas is involved. If I remove /path/to/cuda/lib64/libcublas_device.a from the -dlink phase command line, all warnings and errors go away (including from later stages); the trimmed command is shown after the link below. In fact, my wrapper library is oblivious to cublas, and I'm not sure why CMake is adding it; it's not in $CUDA_LIBRARIES. See also:
Why does CMake force the use of libcublas with separable compilation?
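To be concrete, the dlink step that succeeds is the command from above with only the cublas archive removed:
/path/to/nvcc -gencode arch=compute_52,code=compute_52 --std=c++11 -Xcompiler -Wall -O3 -DNDEBUG -m64 -ccbin /opt/gcc-5.4.0/bin/gcc -dlink /export/path/to/cuda-api-wrappers/CMakeFiles/vectorAdd.dir/examples/modified_cuda_samples/vectorAdd/./vectorAdd_generated_vectorAdd.cu.o -o /export/path/to/cuda-api-wrappers/CMakeFiles/vectorAdd.dir/./vectorAdd_intermediate_link.o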

Running Eigen's sparse CG solver multi-threaded

I have Matlab code that uses sparse and '\' as the solver for a linear system. I have hand-tailored a C++ function that uses the conjugate gradient sparse solver from Eigen in order to run the code outside Matlab, using the Coder toolbox to export the rest of the Matlab code. I export a static library and I'm able to compile and execute it on my remote system without any problem. However, I'm not able to run the code using multi-threading. I have tried to export it as a Matlab executable (MEX) and the whole code runs in parallel without problem inside Matlab.
So my conclusion is that it must be something different in the compiler/linker flags on my remote system. I use -fopenmp for both compiling and linking, and I run the executable with OMP_NUM_THREADS=n; if I read out "n" inside my program I get the same number as I set for the execution.
My question is, do I need to include anything else in my compiler/linker, apart from needed things related to my particular code, in order to get Eigen to run multi-threaded?
UPDATE:
On the remote system I do:
g++ -c -m64 -fopenmp -std=c++11 -I /usr/local/include/Eigen/src/misc/
~/src/myHandTailoredFile.cpp -o ~/src/myHandTailoredFile.o
and with linker options
-fopenmp -L /usr/local/lib64/ -llapack -L /usr/local/lib/ -lcurl
To compile my hand-tailored file together with myBigProgram into a MEX-file I do:
g++ -DHAVE_LAPACK_CONFIG_H -DLAPACK_COMPLEX_STRUCTURE -DMW_HAVE_LAPACK_DECLS -c -ansi -fexceptions -fPIC -fno-omit-frame-pointer -pthread -D_GNU_SOURCE -DMATLAB_MEX_FILE -std=c++0x -fopenmp -DOMPLIBNAME="\"/usr/local/MATLAB/R2016a/sys/os/glnxa64/libiomp5.so\"" -O -DNDEBUG -I "/usr/local/MATLAB/R2016a/simulink/include" -I "/usr/local/MATLAB/R2016a/toolbox/shared/simtargets" -I "./interface" -I "/usr/local/lib" -I "/usr/local/MATLAB/R2016a/extern/include" -I "." "~/src/myHandTailoredFile.cpp"
with linker options set to
-pthread -Wl,--no-undefined -Wl,-rpath-link,/usr/local/MATLAB/R2016a/bin/glnxa64 -shared -L/usr/local/MATLAB/R2016a/bin/glnxa64 -lmx -lmex -lmat -lm -lstdc++ -lcurl -fPIC -L/usr/local/MATLAB/R2016a/sys/os/glnxa64 -liomp5 -o myBigProgram_mex.mexa64 -L"/usr/local/MATLAB/R2016a/bin/glnxa64" -lmwblas -lmwlapack -lemlrt -lcovrt -lut -lmwmathutil
Note that the compiler and linker options for the latter are completely defined by Matlab.
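For what it's worth, a minimal self-contained check of whether Eigen's CG multi-threads at all on the remote system (independent of the Coder-exported code) could look like the sketch below. This assumes Eigen 3.3+, where the CG matrix-vector products are only parallelised for a row-major sparse matrix used with the Lower|Upper option; compile it with -fopenmp and vary OMP_NUM_THREADS.
#include <Eigen/Sparse>
#include <iostream>
#include <vector>

int main() {
    const int n = 200000;
    // Assemble a simple SPD tridiagonal test matrix.
    std::vector<Eigen::Triplet<double>> triplets;
    triplets.reserve(3 * n);
    for (int i = 0; i < n; ++i) {
        triplets.emplace_back(i, i, 4.0);
        if (i + 1 < n) {
            triplets.emplace_back(i, i + 1, -1.0);
            triplets.emplace_back(i + 1, i, -1.0);
        }
    }
    Eigen::SparseMatrix<double, Eigen::RowMajor> A(n, n);
    A.setFromTriplets(triplets.begin(), triplets.end());
    Eigen::VectorXd b = Eigen::VectorXd::Ones(n);
    std::cout << "Eigen sees " << Eigen::nbThreads() << " threads\n";
    // Row-major storage plus Lower|Upper selects the parallel code path.
    Eigen::ConjugateGradient<Eigen::SparseMatrix<double, Eigen::RowMajor>,
                             Eigen::Lower | Eigen::Upper> cg;
    cg.compute(A);
    Eigen::VectorXd x = cg.solve(b);
    std::cout << "iterations: " << cg.iterations() << ", error: " << cg.error() << "\n";
    return 0;
}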

Armadillo issue in Ubuntu

I have been writing a C++ program in Ubuntu and Windows 8 using Armadillo. Under Windows 8 the program compiles without problems.
The program is just using the linear systems solver.
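(The code itself isn't shown here, but the call involved is presumably just arma::solve, along the lines of the sketch below; solve() is what ends up referencing Armadillo's run-time LAPACK wrappers such as wrapper_dgels_.)
#include <armadillo>

int main() {
    arma::mat A = arma::randu<arma::mat>(4, 4);
    arma::vec b = arma::randu<arma::vec>(4);
    arma::vec x = arma::solve(A, b);   // resolved through Armadillo's LAPACK wrappers
    x.print("x:");
    return 0;
}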
Under Ubuntu the compiler says
"reference to `wrapper_dgels_' not defined"
The compiler line I use is:
mpic++ -O2 -std=c++11 -Wall -fexceptions -O2 -larmadillo -llapack -lblas program.o
However, right before the error I see:
g++ module_of_the_error.o
Which is something I haven't set.
I am using Code::Blocks in Ubuntu, and I compiled Armadillo with all the libraries that CMake asked for (BLAS, LAPACK, OpenBLAS, HDF5, ARPACK, etc.).
I have no clue what might be causing the problem, since the exact same code compiles in Visual Studio. I have tried the suggested compiler line modifications but they do not seem to work.
Any help is appreciated.
This is one trap I fell into myself one time. You will not like the likely cause of your error.
The order of the arguments to the linker matters.
Instead of
mpic++ -O2 -std=c++11 -Wall -fexceptions -O2 -larmadillo -llapack -lblas program.o
try:
mpic++ -O2 -std=c++11 -Wall -fexceptions -O2 program.o -larmadillo -llapack -lblas
I.e., put the object files to be linked into the executable before the libraries.
By the way, at this stage you are only linking files that have already been compiled. It is not necessary to repeat command line options that are only relevant for compiling. So this will be equivalent:
mpic++ program.o -larmadillo -llapack -lblas
Moreover, depending on how you installed Armadillo, you are adding either one or two superfluous libraries in that line. One of the following should be enough:
mpic++ program.o -larmadillo
or
mpic++ program.o -llapack -lblas
EDIT: as the answer by rerx states, the problem is probably just a simple ordering of the switches/arguments supplied to g++. All the -l switches need to be after the -o switch. Or in other words, put the -o switch before any -l switches. For example:
g++ prog.cpp -o prog -O3 -larmadillo
original answer:
Looks like your compiler can't find the Armadillo run-time library. The proper solution is to specify the path to the Armadillo run-time library using the -L switch. For example, g++ -O2 blah.cpp -o blah -L /usr/local/lib/ -larmadillo
Another possible solution is to define ARMA_DONT_USE_WRAPPER before including the armadillo header, and then directly link with LAPACK and BLAS. For example:
#define ARMA_DONT_USE_WRAPPER
#include <armadillo>
More details are available at the Armadillo frequently asked questions page.