Suddenly getting maxrregcount warnings and undefined reference errors when linking - c++

I maintain the C+=-flavored CUDA API wrappers library. The library's current commit is relatively-well-tested, with some example programs and quite a few users. However, sometime very recently (can't say exactly when), and without committing anything new, I now get NVCC warnings during the "dlink" phase of my example programs, e.g.:
/path/to/nvcc /path/to/cuda-api-wrappers/examples/modified_cuda_samples/vectorAdd/vectorAdd.cu -dc -o /path/to/cuda-api-wrappers/CMakeFiles/vectorAdd.dir/examples/modified_cuda_samples/vectorAdd/./vectorAdd_generated_vectorAdd.cu.o -ccbin /opt/gcc-5.4.0/bin/gcc -m64 -gencode arch=compute_52,code=compute_52 --std=c++11 -Xcompiler -Wall -O3 -DNDEBUG -DNVCC -I/path/to/cuda/include -I/path/to/cuda-api-wrappers/src
/path/to/nvcc -gencode arch=compute_52,code=compute_52 --std=c++11 -Xcompiler -Wall -O3 -DNDEBUG -m64 -ccbin /opt/gcc-5.4.0/bin/gcc -dlink /export/path/to/cuda-api-wrappers/CMakeFiles/vectorAdd.dir/examples/modified_cuda_samples/vectorAdd/./vectorAdd_generated_vectorAdd.cu.o /path/to/cuda/lib64/libcublas_device.a -o /export/path/to/cuda-api-wrappers/CMakeFiles/vectorAdd.dir/./vectorAdd_intermediate_link.o
#O#ptxas info : 'device-function-maxrregcount' is a BETA feature
#O#ptxas info : 'device-function-maxrregcount' is a BETA feature
#O#ptxas info : 'device-function-maxrregcount' is a BETA feature
... this repeats many times ...
but the dlink face does conclude. This is already strange, since I haven't explicitly used any beta features.
/opt/gcc-5.4.0/bin/g++ -Wall -Wpedantic -O2 -DNDEBUG -L/path/to/cuda/lib64 -rdynamic CMakeFiles/vectorAdd.dir/examples/modified_cuda_samples/vectorAdd/vectorAdd_generated_vectorAdd.cu.o CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o -o examples/bin/vectorAdd lib/libcuda-api-wrappers.a -Wl,-Bstatic -lcudart_static -Wl,-Bdynamic -lpthread -ldl -lrt -lnvToolsExt -Wl,-Bstatic -lcudadevrt -Wl,-Bdynamic
CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o: In function `__cudaRegisterLinkedBinary_25_cublas_compute_70_cpp1_ii_f0559976':
link.stub:(.text+0xe0): undefined reference to `__fatbinwrap_25_cublas_compute_70_cpp1_ii_f0559976'
CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o: In function `__cudaRegisterLinkedBinary_25_xerbla_compute_70_cpp1_ii_cd7f3ad3':
link.stub:(.text+0x190): undefined reference to `__fatbinwrap_25_xerbla_compute_70_cpp1_ii_cd7f3ad3'
CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o: In function `__cudaRegisterLinkedBinary_23_nrm2_compute_70_cpp1_ii_8edbce95':
link.stub:(.text+0x240): undefined reference to `__fatbinwrap_23_nrm2_compute_70_cpp1_ii_8edbce95'
... more udnefined reference errors here ...
My question: Why would this happen and how do I circumvent/avoid/resolve it?
Notes:
I'm using separable compilation
I'm getting these specific errors with CUDA 9.1 and a SM 5.2 device (no 7.0).
The CMakeLists.txt is here.
I'm obviously clearing CMakeCache.txt before building.
This has happened to me both on a GNU/Linux Mint 18.3 and Fedora 26. On the first machine there have been some apt-get dist-upgrade's done, and now GCC is up to version 5.5.0, in case that matters. On the second machine - there really has been no change that I'm aware of; same compiler and CUDA version.

A partial answer / workaround:
This issue only seems to occur when libcublas is involved. If I remove /path/to/cuda/lib64/libcublas_device.a from the -dlink phase command-line, all warnings and errors go away (including from later stages). And in fact, my wrapper library is oblivious of cublas, not sure why CMake is adding it; it's not in $CUDA_LIBRARIES. See also:
Why does CMake force the use of libcublas with separable compilation?

Related

Compilation failing on EnableABIBreakingChecks

I recently installed LLVM v8.0.0 (on RHEL 7.4). I am going through the LLVM Kaleidoscope tutorial to learn how to use the system, but am running into an issue linking.
Per the tutorial (end of chapter 2), I run:
clang++ -g -O3 kld.cpp `llvm-config --cxxflags` -o kld
It compiles, but the linker fails on:
/tmp/kld-f7264f.o:(.data+0x0): undefined reference to `llvm::EnableABIBreakingChecks'
clang-8: error: linker command failed with exit code 1 (use -v to see invocation)
I suspected this may have been an issue with llvm-config, so I also tried using the --ldflags and --system-libs flags, but no luck.
llvm-config --cxxflags gives (reformatted for readability)
-I~/project/llvm-src/include -I~/project/llvm-build/include
-fPIC -fvisibility-inlines-hidden
-std=c++11
-Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual
-Wno-missing-field-initializers -pedantic -Wno-long-long
-Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment
-g
-fno-exceptions -fno-rtti
-D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS
-D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
Where ~/... is just the path to my home directory (edited for privacy; the actual output is a fullpath). I am working on a shared system that requires I install new software locally.
The tutorial code never references ABI explicitly, so I assume this must be some kind of compiler-flags issue. greping for the missing symbol in non-binary files gives an extern declaration in include/llvm/Config/abi-breaking.h and the real declaration in lib/Support/Error.cpp:
#if LLVM_ENABLE_ABI_BREAKING_CHECKS
int EnableABIBreakingChecks;
#else
int DisableABIBreakingChecks;
#endif
I thought I would try re-compiling, then, with -DLLVM_ENABLE_ABI_BREAKING_CHECKS. That also does that work.
I'm not really clear what the ABI breaking checks are doing in the first place, and this may be way over my C++ comfort level. But how can I silence this error, if I don't need the referenced functionality; or fix it, if i do?
Thanks.
Turns out the answer was hidden in abi-breaking.h:
/* Allow selectively disabling link-time mismatch checking so that header-only
ADT content from LLVM can be used without linking libSupport. */
#if !LLVM_DISABLE_ABI_BREAKING_CHECKS_ENFORCING
I'm not sure if I'll need libSupport down the line, but compiling with LLVM_DISABLE_ABI_BREAKING_CHECKS_ENFORCING=1 works for the time being.
Add linkage with LLVMSupport library will also solve this problem.
With this CMake snippet:
find_package(LLVM REQUIRED CONFIG)
message(STATUS "Found LLVM ${LLVM_PACKAGE_VERSION}")
message(STATUS "Using LLVMConfig.cmake in: ${LLVM_DIR}")
add_executable(main main.cpp)
target_link_libraries(main LLVMSupport)
Based on the discussion in llvm irc channel.
Try the following command to compile : clang++ -O3 -c $(llvm-config --cxxflags) source_file.cpp -o obj_code.
Then try linking with this command : clang++ obj_code $(llvm-config --ldflags --libs) -lpthread.
I think linking part doesn't mentioned in the kaleidoscope section. The above solution worked for me.
I don't know the influence and reason why it works. But when I add -DLLVM_DISABLE_ABI_BREAKING_CHECKS_ENFORCING, the errors disappear.
clang++ xxx -DLLVM_DISABLE_ABI_BREAKING_CHECKS_ENFORCING
ref:
https://www.coder.work/article/6278120
https://blog.csdn.net/qq_37887537/article/details/112790961

Undefined reference to boost::system::detail::system_category_instance that gets fixed when switching from C++14 to C++11

I am trying to build my C++ application that uses boost 1.68.0. On trying to build it using cmake followed by make, I get the following linking errors,
/usr/local/bin/g++ -Wall -Wextra -g3 -std=c++14 -Wl,-rpath=/usr/local/lib -L/usr/local/lib CMakeFiles/Supervisor.dir/HeartbeatManager.cpp.o CMakeFiles/Supervisor.dir/JobReceiver.cpp.o CMakeFiles/Supervisor.dir/ResultSender.cpp.o CMakeFiles/Supervisor.dir/Supervisor.cpp.o CMakeFiles/Supervisor.dir/Process.cpp.o -o Supervisor -rdynamic -lpthread -lboost_system-mt
CMakeFiles/Supervisor.dir/HeartbeatManager.cpp.o: In function `boost::system::system_category()':
/usr/local/include/boost/system/error_code.hpp:473: undefined reference to `boost::system::detail::system_category_instance'
On switching the -std=c++14 flag with -std=c++11, the error disappears. I got the idea from this answer. I do not know why that fixes it. Now in my project I cannot use -std=c++11 flag instead of the -std=c++14 flag.
You'll have to recompile boost specifying cxxstd=14.

GCC with -std=c++11 does not see C++ header files (via PyDSTool)

I am on a complex project built in python2.7 that uses the PyDSTool package for analysis of dynamical system. PyDSTool provides two C-based integrators - Radau and Dopri - which I want to use to integrate my system of equations whose source is coded in a bunch of C/C++ files.
I have little control on the package, and when I instantiate the integrator, I can only add headers *.H files, source files (*.C, *.CPP) and pass the directories to include in the search path of the compiler as well as libraries to link to.
Since a consistent part of the code is based on C++11 I am passing to the compiler also the argument -std=C++11.
Eventually, /PyDSTool/Generators/mixins.py launch a setup command (line 185) which in turn runs the command build_ext from distutils to which all the above flags are appended.
For the sake of clarity: the flags that I am appending are:
compile options: '-I/usr/lib64/python2.7/site-packages/numpy/core/include -I/home/maurizio/Dropbox/StabilityAnalysis_tmp -I/usr/local/pydstool/PyDSTool/integrator -I/usr/include/python2_7 -I/usr/include/numpy -I/home/maurizio/Dropbox/Ongoing_Projects/pycustommodules -I/home/maurizio/Dropbox/Ongoing_Projects/c_libraries -I/home/maurizio/Dropbox/Ongoing_Projects/c_libraries/models -I/home/maurizio/Dropbox/Ongoing_Projects/DePitta_PNAS/Software/Stability_Analysis/ -I/usr/lib64/python2.7/site-packages/numpy/core/include -I/usr/include/python2.7 -c'
extra options: '-std=c++11 -w -Wno-return-type -Wall -lpython2.7 -lm -lgsl -lgslcblas -D__DOPRI__'
The resulting compilation command as issued by PyDSTool reads:
error: Command "gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/lib64/python2.7/site-packages/numpy/core/include -I/home/maurizio/Dropbox/StabilityAnalysis_tmp -I/usr/local/pydstool/PyDSTool/integrator -I/usr/include/python2_7 -I/usr/include/numpy -I/home/maurizio/Dropbox/Ongoing_Projects/pycustommodules -I/home/maurizio/Dropbox/Ongoing_Projects/c_libraries -I/home/maurizio/Dropbox/Ongoing_Projects/c_libraries/models -I/home/maurizio/Dropbox/Ongoing_Projects/DePitta_PNAS/Software/Stability_Analysis/ -I/usr/lib64/python2.7/site-packages/numpy/core/include -I/usr/include/python2.7 -c /home/maurizio/Dropbox/StabilityAnalysis_tmp/dop853_temp/ei_network_vf.c -o /home/maurizio/Dropbox/StabilityAnalysis_tmp/dop853_temp/home/maurizio/Dropbox/StabilityAnalysis_tmp/dop853_temp/ei_network_vf.o -std=c++11 -w -Wno-return-type -Wall -lpython2.7 -lm -lgsl -lgslcblas -D__DOPRI__" failed with exit status 1
Once looking into the build.log file automatically generated by PyDSTool, it turns out that the exit status is due to the fact that the compiler does not see the C++ libraries that are in several routines/libs used by my code, e.g.
/usr/include/blitz/blitz.h:45:18: fatal error: string: No such file or directory
#include <string>
^
Compilation Terminated
Now, it is not a problem of my code, because if I compile my code as a standalone in python or through scipy.weave with the same compile and extra options pasted above, it works. It is a problem of making PyDSTool build the code within the integrator. As I am NOT practical with distutils and all gcc options I hope there is some expert here that could provide me with some insight. I suspect in fact that I am missing some options or whatever to pass to the compiler.
Just for the sake of completeness. The issue I pointed out above does not have an easy workaround. PyDSTool C-based integrators (i.e. Radau and Dopri) cannot be compiled with source code for the equations in C++ but only in C. So either you recast your code in plain C or try to edit PyDSTool integrators and recast them in C++. The first option is likely the only one currently possible (at least to some non-experts as who is writing).

Linking CUDA + plain C++ code: undefined reference to `__fatbinwrap_66_tmpxft_ etc

Somehow my CUDA binary build process has been messed up. All of the .cu files compile nicely to .o files, but when I try to link, I get:
CMakeFiles/tester.dir/tester_intermediate_link.o: In function `__cudaRegisterLinkedBinary_66_tmpxft_00007a5f_00000000_16_cuda_device_runtime_compute_52_cpp1_ii_8b1a5d37':
/tmp/tmpxft_00006b54_00000000-2_tester_intermediate_link.reg.c:7: undefined reference to `__fatbinwrap_66_tmpxft_00007a5f_00000000_16_cuda_device_runtime_compute_52_cpp1_ii_8b1a5d37'
Now, I have not used compute_52 anywhere. My nvcc command-line is:
/usr/local/cuda/bin/nvcc -M -D__CUDACC__ /home/joeuser/src/my_project/src/kernel_specific/elementwise/Add.cu -o /home/joeuser/src/my_project/CMakeFiles/tester.dir/src/kernel_specific/elementwise/tester_generated_Add.cu.o.NVCC-depend -ccbin /usr/bin/gcc-4.9.3 -m64 --std c++11 -D__STRICT_ANSI__ -Xcompiler ,\"-Wall\",\"-g\",\"-g\",\"-O0\" -gencode arch=compute_35,code=compute_35 -g -G --generate-line-info -DNVCC -I/usr/local/cuda/include -I/opt/cub -I/usr/local/cuda/include
and my link line is:
/usr/bin/g++-4.9.3 -Wall -std=c++11 -g some.o files.o here.o blah.o blahblah.o bar.cu.o baz.cu.o -o bin/myapp -rdynamic -Wl,-Bstatic -lcudart_static -Wl,-Bdynamic -lpthread -lrt -ldl /usr/lib/libboost_system.so /usr/lib/libboost_program_options.so -Wl,-Bstatic -lcudart_static -Wl,-Bdynamic -lpthread -lrt -ldl /usr/local/cuda/extras/CUPTI/lib64/libcupti.so -lnvToolsExt -lOpenCL /usr/lib/libboost_system.so /usr/lib/libboost_program_options.so /usr/local/cuda/extras/CUPTI/lib64/libcupti.so -lnvToolsExt -lOpenCL -Wl,-rpath,/usr/lib:/usr/local/cuda/extras/CUPTI/lib64
I'll note I have separate compilation enabled, and do not seem to have skipped my intermediate link phase.
Why is this happening?
CUDA has two compilation modes, relocatable and static.
The relocatable mode is required for some configurations-which we will not get into now.
If you want to compile in relocatable mode -rdc=true, you'll need the Cuda device runtime library.
Which is located in the file cudadevrt.lib.
On some instances, supplying -lcudadevrt as a command line switch to the CUDA linker does the job, but on e.g. MSVC, you'll also need to specify cudadebrt.lib as a link dependency.
Well, I'm not sure why I'm seeing missing references to Compute 5.2 calls, but adding -lcudadevrt to the end of the link command makes the error go away.

Profile OpenMP Code With Intel VTune XE

I have a really weird bug in my code where when running in the Intel XE profiler, it crashes out before the OpenMP section.
If I run the command:
./sphere_benchmark -i ../sphere_team6_dealmesh.prm -m 2 -h 1 -p 3
top shows 100% CPU usage in a small serial section, spinning up to 1200% when it drops into the OpenMP block.
However if I run the command:
amplxe-cl -collect concurrency -- ./sphere_benchmark -i ../sphere_team6_dealmesh.prm -m 2 -h 1 -p 3
I see the same 100% serial section, but then it stops, and claims there were no profiled regions, as it's looking to profile concurrency. If I change the metric to 'advanced-hotspots', it does profile the code, but only the lines prior to the OpenMP section! (Note - I am using absolute paths in reality, omitted here for brevity).
However if my application is crashing before the OpenMP section, which I assume it is, it must be a very clean crash as I get no error, no warning, nothing untoward going on even specifying -vv to amplxe-cl. And it runs fine when not being profiled.
The program I am using relies on the dealii library, which has been built from source using icc 16.0.1, and has all of it's many dependencies built from source with the same.
My compile command looks a bit like:
mpiicpc -DDEBUG -DTBB_DO_ASSERT=1 -DTBB_IMPLEMENT_CPP0X=1 -DTBB_USE_DEBUG -qopenmp -I<incs> -fpic -ansi -w2 -wd68 -wd135 -wd175 -wd177 -wd191 -wd193 -wd279 -wd327 -wd383 -wd981 -wd1418 -wd1478 -wd1572 -wd2259 -wd21 -wd2536 -wd15531 -wd111 -wd128 -wd185 -wd280 -qopenmp-simd -std=c++11 -Wno-return-type -Wno-parentheses -O0 -g -gdwarf-2 -grecord-gcc-switches -o src/curlfunction.cc.o -c src/curlfunction.cc
and my link:
mpiicpc -qopenmp -shared-intel -rdynamic -qopenmp <objs> -o sphere_benchmark -rdynamic <libs> -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lmpifort -lmpi -Wl,-Bstatic -lmpigi -Wl,-Bdynamic -ldl -lrt -lpthread <libs> -Wl,-<libs>
I've tried to trim these down to make them vaugley readable whilst still including everything.
Does anyone with experience of Intel XE have any suggestions? I've changed my depreacted -openmp flag to -qopenmp and according to the documentation it really shouldn't be this hard.