I am on Ubuntu 17.10. I installed the CUDA 9.1 SDK from NVIDIA.
This is what I tried:
~/GrinGoldMiner/src/Cudacka$ clang++-5.0 -Wl,--cuda-path=/usr/local/cuda-9.1 kernel.cu
clang: error: cannot find libdevice for sm_20. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
clang: error: cannot find CUDA installation. Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
clang: error: cannot find CUDA installation. Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
Obviously it doesn't work. It seems like the linker flags are not getting passed. How can I pass them correctly?
It seems clang++-5.0 does not support CUDA 9.X ...
clang++ is able to compile CUDA kernels with CUDA 8.0:
$ clang++-5.0 -O0 -g --cuda-gpu-arch=sm_50 --cuda-path=/usr/local/cuda-8.0 -o t1 t1.cu -L/usr/local/cuda-8.0/lib64 -lcudart
But when using CUDA 9.X I get the same error as you:
$ clang++-5.0 --cuda-gpu-arch=sm_50 --cuda-path=/usr/local/cuda-9.0 -o t1 t1.cu -L/usr/local/cuda-9.0/lib64 -lcudart
clang: error: cannot find libdevice for sm_50. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
They added support for Volta (sm_70) and CUDA 9.0 in this commit: 6d4cb40.
In 2017, this was only available on the master branch, which you could confirm like this:
$ git clone https://github.com/llvm-mirror/clang.git
$ cd clang/
$ git branch --contains 6d4cb40
* master
$ git checkout release_50
Branch release_50 set up to track remote branch release_50 from origin.
Switched to a new branch 'release_50'
$ git log | grep 6d4cb40
$ (output was empty)
Note that clang (7.0.0, released September 2018) supports CUDA 7.0 through 9.2.
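As a side note on the command in the question: --cuda-path is an option of the clang driver itself, so it is passed directly, as in the commands above, whereas -Wl, forwards whatever follows it to the linker. A minimal sketch that assembles such a command line follows. The helper name clang_cuda_cmd is made up for illustration, and the paths and sm_50 are simply the ones used above:

```shell
# Hypothetical helper: print a clang CUDA command line.
#   $1 = CUDA install root, $2 = GPU architecture, $3 = .cu source file
clang_cuda_cmd() {
    printf '%s\n' "clang++-5.0 --cuda-gpu-arch=$2 --cuda-path=$1 -o ${3%.cu} $3 -L$1/lib64 -lcudart"
}

# Only prints the command; nothing is compiled here.
clang_cuda_cmd /usr/local/cuda-8.0 sm_50 t1.cu
```

With clang++-5.0 this only helps when the root points at a CUDA 8.0 installation, for the reasons given above.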
I tried to build GrinGoldMiner's Cudacka under Ubuntu 17.10, and all I had to do was:
This generated two commands on my machine (after some cleanup):
/usr/local/cuda-9.1/bin/nvcc -ccbin g++ -m64 -Xcompiler -fpermissive -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 -o cudacka.o -c kernel.cu
/usr/local/cuda-9.1/bin/nvcc -ccbin g++ -m64 -Xcompiler -fpermissive -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_70,code=compute_70 -o Cudacka.exe cudacka.o
Both finished successfully and produced the executable Cudacka.exe.
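The long run of -gencode options in these commands follows one pattern per architecture. Here is a small sketch of generating it in the shell, assuming you want one arch=compute_NN,code=sm_NN pair per target. gencode_flags is a hypothetical helper name, not part of nvcc or of the GrinGoldMiner build:

```shell
# Hypothetical helper: print one "-gencode arch=compute_NN,code=sm_NN"
# pair for every compute capability number given as an argument.
# The body runs in a subshell, so "out" does not leak to the caller.
gencode_flags() (
    out=
    for sm in "$@"; do
        out="${out:+$out }-gencode arch=compute_$sm,code=sm_$sm"
    done
    printf '%s\n' "$out"
)

gencode_flags 30 35 37 50 52 60 61 70
```

The last entry in the commands above (code=compute_70) additionally embeds PTX and is not produced by this helper.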
If you are interested specifically in clang:
When I tried to replace g++ with clang++-5.0, I got this error:
nvcc fatal : The version ('50000') of the host compiler ('clang') is not supported
If I use -std=c++11 -ccbin clang++ instead of -ccbin g++, I get this error:
kernel.cu(397): error: explicit instantiation definition directive for __global__ functions with clang host compiler is not yet supported
So, I doubt that you can use clang to compile that code for Ubuntu.
Software info:
Linux Kernel: 4.14.83-1-MANJARO
R: 3.5.1
Rcpp: 1.0.0
g++: g++ (GCC) 8.2.1 20180831
I'm trying to install the later package from CRAN, but it fails due to a compilation error and I can't figure out whether there is something wrong with my own configuration or the package. The installation stops after the call
g++ -I"/usr/include/R/" -DNDEBUG -pthread -DTHREADS_H_SUPPORT=1 -I"/home/karpfen/R-libs/Rcpp/include" -I"/usr/lib/R/library/BH/include" -D_FORTIFY_SOURCE=2 -fopenmp -fpic -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt -c timestamp_win32.cpp -o timestamp_win32.o
with the error message
make: *** No rule to make target '-fopenmp', needed by 'later.so'. Stop.
Rcpp and other packages depending on it work as expected. Is there anything that I could be doing wrong here? I tried reinstalling R + packages already, but no changes here.
The first few lines of the installation output are
* installing *source* package ‘later’ ...
** package ‘later’ successfully unpacked and MD5 sums checked
Running configure script
Using CC=gcc
Using CFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong -fno-plt
C11-style threads.h support detected.
System information (from /etc/lsb-release)
I maintain the C++-flavored CUDA API wrappers library. The library's current commit is relatively well tested, with some example programs and quite a few users. However, sometime very recently (I can't say exactly when), and without my committing anything new, I started getting NVCC warnings during the "dlink" phase of my example programs, e.g.:
/path/to/nvcc /path/to/cuda-api-wrappers/examples/modified_cuda_samples/vectorAdd/vectorAdd.cu -dc -o /path/to/cuda-api-wrappers/CMakeFiles/vectorAdd.dir/examples/modified_cuda_samples/vectorAdd/./vectorAdd_generated_vectorAdd.cu.o -ccbin /opt/gcc-5.4.0/bin/gcc -m64 -gencode arch=compute_52,code=compute_52 --std=c++11 -Xcompiler -Wall -O3 -DNDEBUG -DNVCC -I/path/to/cuda/include -I/path/to/cuda-api-wrappers/src
/path/to/nvcc -gencode arch=compute_52,code=compute_52 --std=c++11 -Xcompiler -Wall -O3 -DNDEBUG -m64 -ccbin /opt/gcc-5.4.0/bin/gcc -dlink /export/path/to/cuda-api-wrappers/CMakeFiles/vectorAdd.dir/examples/modified_cuda_samples/vectorAdd/./vectorAdd_generated_vectorAdd.cu.o /path/to/cuda/lib64/libcublas_device.a -o /export/path/to/cuda-api-wrappers/CMakeFiles/vectorAdd.dir/./vectorAdd_intermediate_link.o
#O#ptxas info : 'device-function-maxrregcount' is a BETA feature
#O#ptxas info : 'device-function-maxrregcount' is a BETA feature
#O#ptxas info : 'device-function-maxrregcount' is a BETA feature
... this repeats many times ...
but the dlink phase does conclude. This is already strange, since I haven't explicitly used any beta features. The final link then fails:
/opt/gcc-5.4.0/bin/g++ -Wall -Wpedantic -O2 -DNDEBUG -L/path/to/cuda/lib64 -rdynamic CMakeFiles/vectorAdd.dir/examples/modified_cuda_samples/vectorAdd/vectorAdd_generated_vectorAdd.cu.o CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o -o examples/bin/vectorAdd lib/libcuda-api-wrappers.a -Wl,-Bstatic -lcudart_static -Wl,-Bdynamic -lpthread -ldl -lrt -lnvToolsExt -Wl,-Bstatic -lcudadevrt -Wl,-Bdynamic
CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o: In function `__cudaRegisterLinkedBinary_25_cublas_compute_70_cpp1_ii_f0559976':
link.stub:(.text+0xe0): undefined reference to `__fatbinwrap_25_cublas_compute_70_cpp1_ii_f0559976'
CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o: In function `__cudaRegisterLinkedBinary_25_xerbla_compute_70_cpp1_ii_cd7f3ad3':
link.stub:(.text+0x190): undefined reference to `__fatbinwrap_25_xerbla_compute_70_cpp1_ii_cd7f3ad3'
CMakeFiles/vectorAdd.dir/vectorAdd_intermediate_link.o: In function `__cudaRegisterLinkedBinary_23_nrm2_compute_70_cpp1_ii_8edbce95':
link.stub:(.text+0x240): undefined reference to `__fatbinwrap_23_nrm2_compute_70_cpp1_ii_8edbce95'
... more undefined reference errors here ...
My question: Why would this happen and how do I circumvent/avoid/resolve it?
I'm using separable compilation.
I'm getting these specific errors with CUDA 9.1 and an SM 5.2 device (not 7.0).
The CMakeLists.txt is here.
I'm obviously clearing CMakeCache.txt before building.
This has happened to me on both GNU/Linux Mint 18.3 and Fedora 26. On the first machine some apt-get dist-upgrades have been run, and GCC is now at version 5.5.0, in case that matters. On the second machine there really has been no change that I'm aware of; same compiler and CUDA version.
A partial answer / workaround:
This issue only seems to occur when libcublas is involved. If I remove /path/to/cuda/lib64/libcublas_device.a from the -dlink phase command-line, all warnings and errors go away (including from later stages). And in fact, my wrapper library is oblivious of cublas, not sure why CMake is adding it; it's not in $CUDA_LIBRARIES. See also:
Why does CMake force the use of libcublas with separable compilation?
I'm trying to build Magma and I'm running into problems which I'm pretty sure I didn't run into when using earlier versions of CUDA. (I'm using 6.5 now). What happens is that the makefile generates the following command:
nvcc -fPIC -O3 -DADD_ -Xcompiler -fno-strict-aliasing -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35 -I/opt/cuda/include -I../include -I../control -I../sparse-iter/include -c zgemv_conjv.cu -o zgemv_conjv.o
nvcc fatal : Unknown option 'fPIC'
Googling shows that -fPIC should be used only with -Xcompiler because it's not an nvcc option. But as you can see I do have -Xcompiler in my nvcc command.
I tried putting -fPIC behind -Xcompiler like this:
nvcc -O3 -DADD_ -Xcompiler -fPIC -fno-strict-aliasing -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35 -I/opt/cuda/include -I../include -I../control -I../sparse-iter/include -c zgemv_conjv.cu -o zgemv_conjv.o
nvcc fatal : Unknown option 'fno-strict-aliasing'
It fails on the next non-nvcc option, even though it is behind -Xcompiler. What works is this:
nvcc -O3 -DADD_ -Xcompiler -fno-strict-aliasing -Xcompiler -fPIC -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35 -I/opt/cuda/include -I../include -I../control -I../sparse-iter/include -c zgemv_conjv.cu -o zgemv_conjv.o
where I have repeated the -Xcompiler switch.
Does anyone know if this is the intended behaviour? I couldn't find any reference or documentation regarding it, and I'm pretty sure it didn't use to work like that in previous versions of CUDA. Could it be a bug?
According to this, you have to either separate the -Xcompiler sub-options with commas or use a separate -Xcompiler for each option, as you did in your last attempt. So it looks like this is intended.
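For the comma-separated variant, the host-compiler flags can be joined in the shell before they are handed to nvcc. This is a minimal sketch; xcompiler_arg is a hypothetical helper name, and the two flags are the ones from the question:

```shell
# Hypothetical helper: join all arguments with commas and prefix them
# with a single -Xcompiler. The body runs in a subshell, so the IFS
# change stays local to the function.
xcompiler_arg() (
    IFS=,
    printf '%s\n' "-Xcompiler $*"
)

xcompiler_arg -fPIC -fno-strict-aliasing
# prints: -Xcompiler -fPIC,-fno-strict-aliasing
```

The printed text would take the place of the two separate -Xcompiler options in the working command above.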
I am using clang 3.5 as distributed by llvm.org. I'm using the following command lines to install it in my Travis VM:
sudo apt-add-repository 'deb http://llvm.org/apt/precise/ llvm-toolchain-precise-3.5 main'
When I run my test build with optimizations turned on, I get this error:
clang: error: optimization flag '-finline-functions' is not supported
"clang++" -c -x c++ -std=c++1y -Werror -O3 -finline-functions -Wno-inline -Wall -Werror -pthread -fPIC -std=c++1y -DBOOST_ALL_DYN_LINK -DNDEBUG -I"." -I"gamgee" -I"lib/htslib" -o "test/bin/run.test/clang-linux-3.5.0/release/threading-multi/sam_builder_test.o" "test/sam_builder_test.cpp"
I don't get the same error on my mac which runs the older 3.4 version of clang.
Has clang dropped support for -finline-functions in 3.5? Is this something specific to this package build? What should one use in place of -finline-functions for optimized builds with clang 3.5+?
See this commit: http://llvm.org/klaus/clang/commit/6590426aeb5275ec33dac2877f9349bbbb2d4b2e/#0-L-571
Previously, that flag was ignored and the user was not notified. Now the user is notified that it is ignored. You shouldn't have seen any difference in the code generation with or without that flag.
It should only be a warning, but you've upgraded it to an error with -Werror.
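If you want to keep -Werror, one option is to drop the flag from the clang command line, since according to the above it had no effect anyway. Here is a rough sketch of filtering it out of a flag list in the shell. strip_flag is a hypothetical helper, and how the flags are actually set depends on your build system, which this does not cover:

```shell
# Hypothetical helper: print the argument list with every occurrence
# of the first argument removed. Runs in a subshell to keep its
# variables local.
strip_flag() (
    unwanted=$1
    shift
    out=
    for flag in "$@"; do
        [ "$flag" = "$unwanted" ] || out="${out:+$out }$flag"
    done
    printf '%s\n' "$out"
)

strip_flag -finline-functions -O3 -finline-functions -Wno-inline -Wall -Werror
# prints: -O3 -Wno-inline -Wall -Werror
```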
I'm trying to compile CUDA code using nvcc on Ubuntu. However, when I do, I get this output:
> make
/usr/local/cuda/bin/nvcc -m64 --ptxas-options="-v" -gencode arch=compute_11,code=sm_11 -gencode arch=compute_13,code=sm_13 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -o main main.cu
gcc: No such file or directory
make: *** [main] Error 1
Even when I'm trying to compile a file with only a main function in it, it still doesn't work:
> /usr/local/cuda/bin/nvcc main.cu
gcc: No such file or directory
nvcc seems to respond to --version, so it's definitely there. I'm not sure why it's invoking gcc though.
nvcc is not a compiler in itself. It's a "compiler driver", orchestrating the entire process of compiling device code, host code and linking it together. On Linux, it uses gcc for compiling the host code.
To install gcc on Ubuntu:
$ sudo apt-get --yes install build-essential
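To check beforehand whether a host compiler is available, you can ask the shell whether it finds gcc and g++ on PATH. This is a small sketch using the standard command -v builtin; have_tool is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the named program is found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

for tool in gcc g++; do
    if have_tool "$tool"; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found on PATH"
    fi
done
```

If the compiler is installed somewhere that is not on PATH, nvcc can be pointed at it with its -ccbin option.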