Linking CUDA + plain C++ code: undefined reference to `__fatbinwrap_66_tmpxft_ etc - c++

Somehow my CUDA binary build process has been messed up. All of the .cu files compile nicely to .o files, but when I try to link, I get:
CMakeFiles/tester.dir/tester_intermediate_link.o: In function `__cudaRegisterLinkedBinary_66_tmpxft_00007a5f_00000000_16_cuda_device_runtime_compute_52_cpp1_ii_8b1a5d37':
/tmp/tmpxft_00006b54_00000000-2_tester_intermediate_link.reg.c:7: undefined reference to `__fatbinwrap_66_tmpxft_00007a5f_00000000_16_cuda_device_runtime_compute_52_cpp1_ii_8b1a5d37'
Now, I have not used compute_52 anywhere. My nvcc command-line is:
/usr/local/cuda/bin/nvcc -M -D__CUDACC__ /home/joeuser/src/my_project/src/kernel_specific/elementwise/Add.cu -o /home/joeuser/src/my_project/CMakeFiles/tester.dir/src/kernel_specific/elementwise/tester_generated_Add.cu.o.NVCC-depend -ccbin /usr/bin/gcc-4.9.3 -m64 --std c++11 -D__STRICT_ANSI__ -Xcompiler ,\"-Wall\",\"-g\",\"-g\",\"-O0\" -gencode arch=compute_35,code=compute_35 -g -G --generate-line-info -DNVCC -I/usr/local/cuda/include -I/opt/cub -I/usr/local/cuda/include
and my link line is:
/usr/bin/g++-4.9.3 -Wall -std=c++11 -g some.o files.o here.o blah.o blahblah.o bar.cu.o baz.cu.o -o bin/myapp -rdynamic -Wl,-Bstatic -lcudart_static -Wl,-Bdynamic -lpthread -lrt -ldl /usr/lib/libboost_system.so /usr/lib/libboost_program_options.so -Wl,-Bstatic -lcudart_static -Wl,-Bdynamic -lpthread -lrt -ldl /usr/local/cuda/extras/CUPTI/lib64/libcupti.so -lnvToolsExt -lOpenCL /usr/lib/libboost_system.so /usr/lib/libboost_program_options.so /usr/local/cuda/extras/CUPTI/lib64/libcupti.so -lnvToolsExt -lOpenCL -Wl,-rpath,/usr/lib:/usr/local/cuda/extras/CUPTI/lib64
I'll note I have separate compilation enabled, and do not seem to have skipped my intermediate link phase.
Why is this happening?

CUDA has two compilation modes, relocatable and static (whole-program).
Relocatable mode is required for some configurations, which we will not get into here.
If you compile in relocatable mode (-rdc=true), you also need the CUDA device runtime library, which lives in the file cudadevrt.lib (libcudadevrt.a on Linux).
In some setups, passing -lcudadevrt as a command-line switch to the CUDA linker does the job, but with e.g. MSVC you also need to specify cudadevrt.lib as a link dependency.
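As a rough sketch of where -lcudadevrt fits in a separate-compilation build, the three steps could look like the script below. The file names (a.cu, b.cu, main.o, myapp), the sm_35 target and the CUDA install paths are assumptions for illustration, not taken from the question's build. The run helper only prints the commands when DRY_RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: a.cu, b.cu, main.o and myapp are hypothetical names,
# and sm_35 is an assumed target architecture.
NVCC=${NVCC:-/usr/local/cuda/bin/nvcc}
CUDA_LIB=${CUDA_LIB:-/usr/local/cuda/lib64}
ARCH=${ARCH:-sm_35}

# Print a command, then execute it unless DRY_RUN=1.
run() {
    echo "$*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

build_rdc() {
    # 1. Compile each .cu file to an object containing relocatable device code.
    run "$NVCC" -arch="$ARCH" -rdc=true -c a.cu -o a.cu.o || return 1
    run "$NVCC" -arch="$ARCH" -rdc=true -c b.cu -o b.cu.o || return 1
    # 2. Intermediate (device) link of the relocatable objects.
    run "$NVCC" -arch="$ARCH" -dlink a.cu.o b.cu.o -o device_link.o -lcudadevrt || return 1
    # 3. Host link with g++; the device runtime library is listed here as well.
    run g++ main.o a.cu.o b.cu.o device_link.o -o myapp \
        -L"$CUDA_LIB" -lcudadevrt -lcudart_static -lpthread -lrt -ldl
}
```

Calling `DRY_RUN=1 build_rdc` prints the three stages without needing nvcc. Whether -lcudadevrt is needed on the device link, the host link, or both may depend on the CUDA version and the build system.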

Well, I'm not sure why I'm seeing missing references to Compute 5.2 calls, but adding -lcudadevrt to the end of the link command makes the error go away.

Related

Why would I need to list -ldl before a library that calls dlopen/dlclose/dlerror when linking

I am building an executable (foo.exe let's call it) on RHEL with gcc 6.2. It links against a few third-party libraries, libzzdesign.so, libyydesign.so. Yydesign uses dlopen/dlclose/dlerror. I would expect this command-line to work:
g++ -Wall -fcheck-new -fno-strict-aliasing -msse2 -fno-omit-frame-pointer -pthread -O3 -Wl,--export-dynamic -o foo.exe foo.o -L/path/to/zzdesign -Wl,-rpath=/path/to/zzdesign -lzzdesign -L/path/to/yydesign -Wl,-rpath=/path/to/yydesign -lyydesign -ldl
(I'm listing all the options used in case it matters)
It produces the errors,
/path/to/yydesign/libyydesign.so: undefined reference to 'dlclose'
/path/to/yydesign/libyydesign.so: undefined reference to 'dlerror'
If I change the command line to put -ldl before -lyydesign:
g++ -Wall -fcheck-new -fno-strict-aliasing -msse2 -fno-omit-frame-pointer -pthread -O3 -Wl,--export-dynamic -o foo.exe foo.o -L/path/to/zzdesign -Wl,-rpath=/path/to/zzdesign -lzzdesign -L/path/to/yydesign -Wl,-rpath=/path/to/yydesign -ldl -lyydesign
... it works without error.
This is the opposite of everything I thought I knew about order of libraries on the command line when linking.
Why does -ldl have to come before -lyydesign?
Other than dumb luck to stumble across this solution, how could I troubleshoot the original error to understand what's going on?
And since changing the build system to move -ldl first in all the places it's needed is kind of a pain, is there a way I can avoid having to put -ldl first?
The order of libraries on the ld command line does matter. libyydesign uses dlclose and dlerror, which is why libdl needs to be passed before libyydesign.
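For the troubleshooting part of the question, one possible starting point is to look at what the third-party library itself declares. The sketch below uses nm and readelf from binutils. The function name check_dl_deps is made up for this example.

```shell
#!/bin/sh
# check_dl_deps LIB (hypothetical helper): list the dl* symbols LIB leaves
# undefined and the shared libraries it records as dependencies (DT_NEEDED).
check_dl_deps() {
    lib=$1
    [ -r "$lib" ] || { echo "cannot read: $lib" >&2; return 2; }
    echo "== undefined dl* symbols in $lib"
    nm -D --undefined-only "$lib" | grep -E ' dl(open|close|error|sym)'
    echo "== DT_NEEDED entries of $lib"
    readelf -d "$lib" | grep NEEDED
}
```

It would be called as `check_dl_deps /path/to/yydesign/libyydesign.so`. If libdl does not show up among the NEEDED entries, libyydesign.so does not declare its own dependency on libdl, so every program that links against it has to supply -ldl itself.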

gcc - linking and compiling in one command

I am new to C++ and learning RTI DDS at the moment by compiling their examples. I am currently using their makefiles, but I want to learn how to compile individual files using gcc directly. The makefiles first compile the objects and then link them together, as shown below.
g++ -DRTI_UNIX -DRTI_LINUX -DRTI_64BIT -m64 -O2 -o objs/x64Linux3gcc4.8.2/HelloPublisher.o -Isrc -Isrc/idl -I/opt/rti_connext_dds-5.2.3/include -I/opt/rti_connext_dds-5.2.3/include/ndds -c src/HelloPublisher.cpp
g++ -m64 -static-libgcc -Wl,--no-as-needed objs/x64Linux3gcc4.8.2/HelloPublisher.o -o objs/x64Linux3gcc4.8.2/HelloPublisher -L/opt/rti_connext_dds-5.2.3/lib/x64Linux3gcc4.8.2 -lnddscppz -lnddscz -lnddscorez -ldl -lnsl -lm -lpthread -lrt
How can I write a single command using g++/gcc to do both?
The usual way is
g++ -o $prog -DRTI_UNIX $moreflags $file1.cpp $file2.cpp $prog.cpp $libs
You'll have to experiment a bit with the many arguments you have, since order matters.
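Applied to the two makefile commands from the question, the pattern might come out as follows. This is an untested sketch: the flags and paths are copied from the question, while the grouping into variables and the single_command function name are only for illustration. It prints the combined command rather than running it.

```shell
#!/bin/sh
# Flags copied from the compile and link commands in the question.
DEFS="-DRTI_UNIX -DRTI_LINUX -DRTI_64BIT"
CXXFLAGS="-m64 -O2"
INCLUDES="-Isrc -Isrc/idl -I/opt/rti_connext_dds-5.2.3/include -I/opt/rti_connext_dds-5.2.3/include/ndds"
LDFLAGS="-static-libgcc -Wl,--no-as-needed -L/opt/rti_connext_dds-5.2.3/lib/x64Linux3gcc4.8.2"
LIBS="-lnddscppz -lnddscz -lnddscorez -ldl -lnsl -lm -lpthread -lrt"

# Without -c, g++ compiles the source and links the result in one invocation.
# The source file comes before the libraries that resolve its symbols.
single_command() {
    echo g++ $DEFS $CXXFLAGS $INCLUDES $LDFLAGS src/HelloPublisher.cpp \
        -o objs/x64Linux3gcc4.8.2/HelloPublisher $LIBS
}

single_command
```

The intermediate HelloPublisher.o is no longer written, so every change triggers a full recompile. That is fine for a single-file example but is the main reason makefiles keep the two steps separate.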

Autotools issues when linking with OpenMP, MPI and CUDA

I have a project that is compiled with autotools and up until this week needed to be only compiled with OpenMP and MPI support. I have now added a CUDA kernel that I wish to compile into the code under certain circumstances. The compiling of the code goes okay and all of the object files are created. When it comes to linking the objects into the executable the following command is used:
/bin/bash ../libtool --tag=CXX --mode=link nvcc -ccbin=mpicxx -I/usr/local/cuda/include -Xcompiler -std=c++0x -Xcompiler -fopenmp -L/usr/local/cuda/lib64 -lcuda -lcudart -lcufft -o utrplauncher utrplauncher-UTRP.o crossovers/libcrossovers.a initialisers/libinitialisers.a mutators/libmutators.a problem/libproblem.a common/libcommon.a variables/libvariables.a ../libraries/framework/libmoeaframework.a ../libraries/ticpp/libticpp.a
Which in turn generates the following link command:
libtool: link: nvcc -ccbin=mpicxx -I/usr/local/cuda/include -std=c++0x -fopenmp -o utrplauncher utrplauncher-UTRP.o -L/usr/local/cuda/lib64 -lcuda -lcudart -lcufft crossovers/libcrossovers.a initialisers/libinitialisers.a mutators/libmutators.a problem/libproblem.a common/libcommon.a variables/libvariables.a ../libraries/framework/libmoeaframework.a ../libraries/ticpp/libticpp.a
This then generates the following error, because -std=c++0x and -fopenmp are interpreted by the CUDA compiler and not by the mpicxx compiler.
nvcc fatal : Value 'c++0x' is not defined for option 'std'
I can post my configure.ac if that would help but wanted to keep the question concise at the moment.
My question is therefore is it possible to forward the -Xcompiler flags to the mpicxx compiler rather than having them stripped off by libtool?
One way is to pass both -Xcompiler=-std=c++0x and -Xcompiler=-fopenmp directly to the compiler using -Wc, so that -Xcompiler is not stripped by libtool. For instance, the following dry run:
libtool -n --tag=CXX --mode=link nvcc -ccbin=mpicxx -I/usr/local/cuda/include -Wc,-Xcompiler=-std=c++0x -Wc,-Xcompiler=-fopenmp -L/usr/local/cuda/lib64 -lcuda -lcudart -lcufft -o utrplauncher utrplauncher-UTRP.o crossovers/libcrossovers.a initialisers/libinitialisers.a mutators/libmutators.a problem/libproblem.a common/libcommon.a variables/libvariables.a ../libraries/framework/libmoeaframework.a ../libraries/ticpp/libticpp.a
generates:
libtool: link: nvcc -ccbin=mpicxx -I/usr/local/cuda/include -Xcompiler=-std=c++0x -Xcompiler=-fopenmp -o utrplauncher utrplauncher-UTRP.o -L/usr/local/cuda/lib64 -lcuda -lcudart -lcufft crossovers/libcrossovers.a initialisers/libinitialisers.a mutators/libmutators.a problem/libproblem.a common/libcommon.a variables/libvariables.a ../libraries/framework/libmoeaframework.a ../libraries/ticpp/libticpp.a

C++ Symbol lookup error in shared library when accessing boost bind

I am trying to add multithreading into my library, so I am working on creating a thread executor for my library. For this I am using boost threads.
This is the error I am getting when running a test case that links to the library:
symbol lookup error: libmylibexample.so.0: undefined symbol: _ZTVN5boost6detail16thread_data_baseE
This is the line of code in my shared library that is causing the error:
MyNameSpace::Producer producer = MyNameSpace::Producer();
threads.create_thread(boost::bind(&MyNameSpace::Producer::run, &producer));
I am compiling the library using autotools and libtool. The code compiles fine. I then create a test case that references the library. Here is the compile command for the test case:
g++ -I. -I../include -g -O2 -MT runTest-runTest.o -MD -MP -MF .deps/runTest-runTest.Tpo -c -o runTest-runTest.o `test -f 'runTest.cc' || echo './'`runTest.cc
and this is my linking stage:
mv -f .deps/runTest-runTest.Tpo .deps/runTest-runTest.Po
/bin/bash ../libtool --tag=CXX --mode=link g++ -g -O2 ../libmylibexample/libmylibexample.la -o runTest runTest-runTest.o -lboost_system -lboost_filesystem -lboost_regex -lboost_thread-mt -lfftw3 -ltiff
libtool: link: g++ -g -O2 -o .libs/runTest runTest-runTest.o ../libmylibexample/.libs/libmylibexample.so -lboost_system -lboost_filesystem -lboost_regex -lboost_thread-mt -lfftw3 /usr/lib/x86_64-linux-gnu/libtiff.so
One of my colleagues suggested instantiating some Boost templates relating to threading to help the shared library load the symbol from the boost_thread library. I am not entirely certain of the best method to do this, or whether this is the right way of getting things loaded.
So to wrap things up: The error appears to involve not being able to load a symbol defined in libboost_thread from my shared library.
As the error indicates, you need to link libmylibexample with libboost_thread.
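To see why the error points at libboost_thread, the mangled symbol can be demangled with c++filt from binutils. A minimal sketch (the demangle helper name is made up for this example):

```shell
#!/bin/sh
# demangle SYMBOL (hypothetical helper): turn a mangled C++ symbol
# into its readable name using c++filt.
demangle() {
    printf '%s\n' "$1" | c++filt
}

demangle _ZTVN5boost6detail16thread_data_baseE
# prints: vtable for boost::detail::thread_data_base
```

The symbol is the vtable of boost::detail::thread_data_base, which belongs to Boost.Thread. The libtool output in the question shows -lboost_thread-mt only on the runTest link line, so the link of libmylibexample itself is where it has to be added.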

Error when run LD_PRELOAD with boost

I compiled an LD_PRELOAD library that uses Boost (locks.hpp). The compile was successful. I copied the library to another Linux server, and when I run it I get this error:
/usr/bin/java: symbol lookup error: /test/test.so: undefined symbol: _ZN5boost11this_thread20disable_interruptionC1Ev
How can I fix this? Can I avoid this problem without installing Boost on this server?
How I compile the LD_PRELOAD library:
g++ -fPIC -m32 -shared -Wl,-soname,test.so -ldl -o test.so test.cpp
Thanks!
It seems you have to get libboost_thread into your test.so file. Something along the lines of:
g++ -fPIC -m32 -shared -Wl,-soname,test.so -ldl -o test.so test.cpp \
/usr/lib/libboost_thread.a -lpthread
I don't know the specifics of your system, so the Boost library may be in a different place than on mine.