Linking error while separating CUDA function into declaration and definition - c++

I'm following the instructions in an NVIDIA blog post on how to separate functions called from a kernel into a declaration and a definition. With CUDA 10 and the Visual Studio compiler, the build produces linking errors. I've added -dc to the nvcc compiler options, as instructed in the referenced post. All the files are located in the same folder under the same project.
test.cuh
__host__ __device__ float test(float, float);
test.cu
#include "test.cuh"
__host__ __device__ float test(float a, float b)
{
return a + b;
}
kernel.cu
#include <stdio.h>
#include "test.cuh"
__global__ void addKernel(int *c, const int *a, const int *b)
{
int i = threadIdx.x;
c[i] = test(a[i], b[i]);
}
Linking error
1>kernel.cu.obj : error LNK2019: unresolved external symbol __cudaRegisterLinkedBinary_41_tmpxft_0000796c_00000000_7_kernel_cpp1_ii_f853efa9 referenced in function "void __cdecl __nv_cudaEntityRegisterCallback(void * *)" (?__nv_cudaEntityRegisterCallback@@YAXPEAPEAX@Z)
1>test.cuh.obj : error LNK2019: unresolved external symbol __cudaRegisterLinkedBinary_39_tmpxft_00006d84_00000000_7_test_cpp1_ii_f2c23be0 referenced in function "void __cdecl __nv_cudaEntityRegisterCallback(void * *)" (?__nv_cudaEntityRegisterCallback@@YAXPEAPEAX@Z)
1>test.cu.obj : error LNK2019: unresolved external symbol __cudaRegisterLinkedBinary_39_tmpxft_00008044_00000000_7_test_cpp1_ii_f2c23be0 referenced in function "void __cdecl __nv_cudaEntityRegisterCallback(void * *)" (?__nv_cudaEntityRegisterCallback@@YAXPEAPEAX@Z)
1>D:\Workspaces\src\sandbox\cuda_dc\x64\Debug\cuda_dc.exe : fatal error LNK1120: 3 unresolved externals
It makes no difference if I change the file extensions to ".c", ".cpp", or ".cuh".

These are the steps I followed, using the code you have shown, plus adding a simple main() function so we can have a complete project.
(In Visual Studio)
File..New..Project
On the left-hand side, scroll down to NVIDIA and select it
Select the CUDA X.Y Runtime project, give the project a name, and click OK
In the top menu bar, next to Debug, change x86 to x64
The project should have a default file in it, kernel.cu. Replace the contents of this with (modifying your kernel.cu to add a main function):
#include <stdio.h>
#include "test.cuh"
__global__ void addKernel(int *c, const int *a, const int *b)
{
int i = threadIdx.x;
c[i] = test(a[i], b[i]);
}
int main() {
int *c = NULL;
int *a = NULL;
int *b = NULL;
addKernel<<<1, 1>>>(c, a, b);
}
(In Windows, e.g. using the file manager)
In the project folder where kernel.cu is located, place your files test.cuh and test.cu (the updated versions you posted, without C linkage)
(In Visual Studio)
In the Solution Explorer window, right-click the project name and select Properties
On the left-hand side of the dialog, select "CUDA C/C++"
On the right-hand side, change the drop-down next to "Generate Relocatable Device Code" from No to Yes
On the left-hand side, select "CUDA Linker" and confirm that "Perform Device Link" is already set to Yes
Select OK to close the dialog
Again in the Solution Explorer window, right-click the project name and select Add...Existing Item
A file selection dialog should open. You should see the kernel.cu file plus the test.cu and test.cuh files you added to the folder
Select and add the test.cu file
Now select Build...Rebuild Solution
When I do those steps, I get a clean compilation with no errors:
1>------ Rebuild All started: Project: test37, Configuration: Debug x64 ------
1>
1> c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\test37>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.exe" -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" -G --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -g -DWIN32 -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd " -o x64\Debug\kernel.cu.obj "c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\test37\kernel.cu" -clean
1>CUDACOMPILE : nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
1> kernel.cu
1>
1> c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\test37>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.exe" -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" -G --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -g -DWIN32 -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd " -o x64\Debug\test.cu.obj "c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\test37\test.cu" -clean
1>CUDACOMPILE : nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
1> test.cu
1> Compiling CUDA source file kernel.cu...
1> Compiling CUDA source file test.cu...
1>
1> c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\test37>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --use-local-env --cl-version 2015 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64" -rdc=true -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" -G --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static -g -DWIN32 -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd " -o x64\Debug\kernel.cu.obj "c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\test37\kernel.cu"
1>
1> c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\test37>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_20,compute_20\" --use-local-env --cl-version 2015 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64" -rdc=true -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" -G --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static -g -DWIN32 -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd " -o x64\Debug\test.cu.obj "c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\test37\test.cu"
1>CUDACOMPILE : nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
1> kernel.cu
1>CUDACOMPILE : nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
1> test.cu
1>
1> c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\test37>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.exe" -dlink -o x64\Debug\test37.device-link.obj -Xcompiler "/EHsc /W3 /nologo /Od /Zi /RTC1 /MDd " -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64" cudart.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib -gencode=arch=compute_20,code=sm_20 -G --machine 64 x64\Debug\kernel.cu.obj x64\Debug\test.cu.obj
1>CUDALINK : nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
1> cudart.lib
1> kernel32.lib
1> user32.lib
1> gdi32.lib
1> winspool.lib
1> comdlg32.lib
1> advapi32.lib
1> shell32.lib
1> ole32.lib
1> oleaut32.lib
1> uuid.lib
1> odbc32.lib
1> odbccp32.lib
1> kernel.cu.obj
1> test.cu.obj
1> test37.vcxproj -> c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\x64\Debug\test37.exe
1> test37.vcxproj -> c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\x64\Debug\test37.pdb (Full PDB)
1> copy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\cudart*.dll" "c:\Users\bob-tosh\documents\visual studio 2015\Projects\test37\x64\Debug\"
1> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\cudart32_80.dll
1> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\cudart64_80.dll
1> 2 file(s) copied.
========== Rebuild All: 1 succeeded, 0 failed, 0 skipped ==========
My opinion is that if you follow the above steps exactly, starting with a new project and using the files I indicate, and you get the same results I do, then the problem described in your question relates to something you haven't shown or haven't described. You should then provide an MCVE, with the same level of detail I have given in my answer: every step used to create, build, and compile the project, along with the console build output and all files used.
I've used CUDA 8 and Visual Studio 2015, but I don't think there should be substantial differences for what I am describing here with a newer VS and newer CUDA versions.
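As a side note to the steps above, the declaration/definition split can also be kept buildable by a plain host compiler. The following is a minimal single-file sketch, assuming a hypothetical macro name CUDA_HD (it does not come from the question or the blog post) that expands to __host__ __device__ only when nvcc is compiling; nvcc defines __CUDACC__ for that purpose. In a real project the declaration would sit in test.cuh and the definition in test.cu, and calling it from a kernel in another .cu file would still need relocatable device code and the device link step described above.

```cpp
// Hypothetical portability macro (an assumption for this sketch):
// nvcc defines __CUDACC__, a plain C++ compiler does not, so the CUDA
// qualifiers disappear in host-only builds.
#ifdef __CUDACC__
#define CUDA_HD __host__ __device__
#else
#define CUDA_HD
#endif

// Declaration (would live in test.cuh).
CUDA_HD float test(float a, float b);

// Definition (would live in test.cu).
CUDA_HD float test(float a, float b)
{
    return a + b;
}
```

With this arrangement the same function can be unit-tested on the host without a GPU, while the CUDA build is unchanged.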

Related

CUDA 11.5 bandwidth test sample build fails with Visual Studio 2019

I am trying to build bandwidth_test sample from CUDA 11.5 and it fails with:
C:\Program Files (x86)\Microsoft Visual
Studio\2019\Community\MSBuild\Microsoft\VC\v160\BuildCustomizations\CUDA
11.5.targets(785,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5\bin\nvcc.exe"
-gencode=arch=compute_35,code="sm_35,compute_35" -gencode=arch=compute_37,code="sm_37,compute_37" -gencode=arch=compute_50,code="sm_50,compute_50" -gencode=arch=compute_52,code="sm_52,compute_52" -gencode=arch=compute_60,code="sm_60,compute_60" -gencode=arch=compute_61,code="sm_61,compute_61" -gencode=arch=compute_70,code="sm_70,compute_70" -gencode=arch=compute_75,code="sm_75,compute_75" -gencode=arch=compute_80,code="sm_80,compute_80" -gencode=arch=compute_86,code="sm_86,compute_86" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64" -x cu
-I./ -I../../common/inc -I./ -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5/include" -I../../common/inc -I"C:\Program
Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5\include" -G
--keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static --threads 0 -g -DWIN32 -DWIN32 -D_MBCS -D_MBCS -Xcompiler
"/EHsc /W3 /nologo /Od /Fdx64/Debug/vc142.pdb /FS /Zi /RTC1 /MTd " -o
x64/Debug/bandwidthTest.cu.obj "C:\ProgramData\NVIDIA Corporation\CUDA
Samples\v11.5\1_Utilities\bandwidthTest\bandwidthTest.cu"" exited with
code 1.
I have looked everywhere but can't find the problem. There are also no additional errors in the logs, or at least I can't find any.
Does anybody know why this is happening?
By the way, the first sample, deviceQuery, builds and runs, but I guess that one does not use nvcc.
You've got error MSB3721, which does not say anything on its own. MSB3721 is the VS way of saying “I ran nvcc, and it returned an error code.” Other than knowing that your compilation failed, it is completely useless for understanding why it failed.
To understand why, it’s necessary to increase the verbosity of VS output so that it shows the actual invocation of nvcc and the actual error reported by nvcc (prior to VS reporting the MSB3721 error). If you google how to increase verbosity of VS output, you’ll be able to find articles explaining how.

Why does VS2019 Pro have compile errors with xutility, xmemory, and atomic when creating a CUDA project via CMake?

I'm trying to create a simple CUDA project via CMake and getting strange compilation errors. I'm following this tutorial.
Originally, I was using Visual Studio 2019 Community, CMake 3.18.3, and CUDA 11.3, and everything worked fine. Then I updated to Visual Studio 2019 Professional and CMake 3.20.3, and the exact same source code failed to compile.
Here's my entire CMakeLists file:
cmake_minimum_required(VERSION 3.18.3)
project(hello_world LANGUAGES CXX CUDA)
add_executable(hello_world_target main.cu)
target_compile_features(hello_world_target PUBLIC cxx_std_11)
set_target_properties(hello_world_target PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
set_target_properties(hello_world_target PROPERTIES CUDA_ARCHITECTURES "52")
Here's my only source file, main.cu:
#include <iostream>
int main(){
std::cout << "Hello, world!" << std::endl;
return 0;
}
When I try to compile, I get the following errors:
1>Compiling CUDA source file ..\main.cu...
1>
1>C:\Users\[username]\Documents\hello_cmake\build>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin\nvcc.exe" -gencode=arch=compute_52,code=\"compute_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30037\bin\HostX64\x64" -x cu -rdc=true -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\include" --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static -std=c++14 -Xcompiler="/EHsc -Zi -Ob0" -g -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -Xcompiler "/EHsc /W1 /nologo /Od /Fdhello_world_target.dir\Debug\vc142.pdb /FS /Zi /RTC1 /MDd /GR" -o hello_world_target.dir\Debug\main.obj "C:\Users\[username]\Documents\hello_cmake\main.cu"
1>C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30037\include\xutility(1309): error : expected a "("
1> detected during instantiation of "void std::_Adl_verify_range(const _Iter &, const _Sentinel &) [with _Iter=const char *, _Sentinel=const char *]"
1>C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30037\include\xlocale(1990): here
1>
1>C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30037\include\xutility(1309): error : expected a "("
1> detected during instantiation of "void std::_Adl_verify_range(const _Iter &, const _Sentinel &) [with _Iter=__wchar_t *, _Sentinel=__wchar_t *]"
1>C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30037\include\xlocale(1991): here
1>
1>C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30037\include\xutility(1309): error : expected a "("
1> detected during instantiation of "void std::_Adl_verify_range(const _Iter &, const _Sentinel &) [with _Iter=const __wchar_t *, _Sentinel=const __wchar_t *]"
1>C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30037\include\xlocale(2026): here
.....etc., etc., etc.....
31 errors detected in the compilation of "C:/Users/[username]/Documents/hello_cmake/main.cu".
1>main.cu
1>C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\MSBuild\Microsoft\VC\v160\BuildCustomizations\CUDA 11.3.targets(785,9): error MSB3721: The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin\nvcc.exe" -gencode=arch=compute_52,code=\"compute_52,compute_52\" -gencode=arch=compute_52,code=\"sm_52,compute_52\" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.29.30037\bin\HostX64\x64" -x cu -rdc=true -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\include" --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static -std=c++14 -Xcompiler="/EHsc -Zi -Ob0" -g -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -Xcompiler "/EHsc /W1 /nologo /Od /Fdhello_world_target.dir\Debug\vc142.pdb /FS /Zi /RTC1 /MDd /GR" -o hello_world_target.dir\Debug\main.obj "C:\Users\[username]\Documents\hello_cmake\main.cu"" exited with code 1.
1>Done building project "hello_world_target.vcxproj" -- FAILED.
What's perplexing is that this worked fine with the different versions of Visual Studio and CMake. Also, if I rewrite the CMakeLists.txt file to remove CUDA from the LANGUAGES list, and change main.cu to main.cpp, everything works fine.
It's also confusing that the compiler would complain about the xutility, xmemory, and atomic files. That sounds like a red herring, though.
What could be causing this issue?
UPDATE 10/20/2021: For me, VS2019 16.11.5 works fine with CUDA 11.4.120 and CMake 3.21.3, no modifications needed.
I ran into basically the same problem after upgrading from Visual Studio 2019 16.9.6 to 16.10.
The problem seems to be caused by changes in xutility, xmemory etc. in the version of the MSVC v142 build tools 14.29.30037 delivered with Visual Studio 2019 16.10.
I could not solve the problem for the new version of the build tools, but I found a workaround. It is possible to install the v142 build tools from VS2019 16.9 with VS2019 16.10:
In the VS installer, under "Visual Studio 2019->Modify->Individual components", add:
MSVC v142 - VS2019 C++ x64/x86 build tools (14.28-16.9)
C++ v14.28 (16.9) ATL for v142 build tools (x86 & x64)
optionally: MFC, commandline tools, etc.
To compile the CUDA CMake project, the MSVC toolset version needs to be set explicitly. This can be done by entering
version=14.28.29910
in "Optional toolset to use (argument to -T)" in the CMake GUI (the CMake cache needs to be deleted first).
There is nothing wrong with the program or with the MSVC compiler and libraries. You can use the latest release (currently 16.10.1).
CMake generated this (chaotic) compiler command in the VS 2019 IDE:
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin\nvcc.exe" -gencode=arch=compute_75,code="compute_75,compute_75" -gencode=arch=compute_75,code="sm_75,compute_75" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30037\bin\HostX64\x64" -x cu -rdc=true -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\include" --keep-dir x64\Release -maxrregcount=0 --machine 64 --compile -cudart static -std=c++14 -Xcompiler="/EHsc -Ob2" -D_WINDOWS -DNDEBUG -D"CMAKE_INTDIR="Release"" -D_MBCS -D"CMAKE_INTDIR="Release"" -Xcompiler "/EHsc /W1 /nologo /O2 /Fdhello_world_target.dir\Release\vc142.pdb /FS /MD /GR" -o hello_world_target.dir\Release\main.obj "D:\projects\test\main.cu"
Remove the additional option -std=c++14 from this command and it works.
Why:
With CUDA 11.3 and MSVC 19.29 or later, MSVC arranges the setting for the host compiler itself when used in combination with NVCC, so passing this option causes problems.
I think this has to be fixed in the CMake release, in the compiler module (NVIDIA-CUDA.cmake). I'm not an expert on that, but you can see a lot of hocus-pocus around this option there.
Also, on Windows, Visual Studio does not support specifying the CUDAHOSTCXX or CMAKE_CUDA_HOST_COMPILER environment settings, which leads to problems like this.
I used cmake version 3.20.3 and win-build 3.20.20210609-g5e26887 to test.
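Since both answers hinge on which MSVC toolset nvcc ends up using as its host compiler, it can help to have the build report that itself. The following is a small sketch, not part of either answer: the function name toolchain_summary is hypothetical, while __CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__, and _MSC_FULL_VER are predefined macros of nvcc and MSVC respectively. Compiled as part of main.cu and printed from main(), it should show whether the toolset selected with -T is the one actually in use.

```cpp
#include <sstream>
#include <string>

// Returns a one-line description of the compilers that built this file.
// The function name is an assumption made for this sketch.
std::string toolchain_summary()
{
    std::ostringstream out;
#if defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__)
    // Defined by nvcc when it compiles a CUDA source file.
    out << "nvcc " << __CUDACC_VER_MAJOR__ << "." << __CUDACC_VER_MINOR__ << "; ";
#else
    out << "nvcc not in use; ";
#endif
#if defined(_MSC_FULL_VER)
    // Encodes the MSVC compiler version of the active toolset.
    out << "MSVC _MSC_FULL_VER=" << _MSC_FULL_VER << "; ";
#else
    out << "host compiler is not MSVC; ";
#endif
    // Note: MSVC reports 199711 here unless /Zc:__cplusplus is passed.
    out << "__cplusplus=" << __cplusplus;
    return out.str();
}
```

A typical use would be `std::cout << toolchain_summary() << std::endl;` at the top of main().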

call kernel inside CUDA kernel

I am trying to do something like this:
__global__ void foo()
{
// do stuff
}
__global__ void boo()
{
foo<<<m, n>>>();
}
but I am getting the error "kernel launch from __device__ or __global__ functions requires separate compilation mode"
I tried googling for an answer and saw some results talking about "dynamic parallelism", which requires compute capability 3.5 or above. I have that (GTX 750 Ti, compute capability 5.0).
I also saw that I need to turn the "rdc" flag on. While that does make the error go away, it makes the compilation fail no matter what (even if I comment everything out).
So how can I achieve my goal or what might be the problem?
(using cuda 11.0)
I also added "cudadevrt.lib;cudart.lib;" to the linker input in the project properties.
EDIT:
The error it gives when rdc is set to true:
Error MSB3721 The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\nvcc.exe" -dlink -o "x64\Debug\crimson cuda.device-link.obj" -Xcompiler "/EHsc /W3 /nologo /Od /Zi /Fdx64\Debug\vc142.pdb /RTC1 /MDd " -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin/crt" -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\lib\x64" cudadevrt.lib cudart.lib cudart_static.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib -gencode=arch=compute_50,code=sm_50 -G --machine 64 x64\Debug\CrimsonNet.cu.obj x64\Debug\kernel.cu.obj" exited with code 1.
EDIT 2:
I continued to investigate, and it seems that the problem occurs while linking the files, a step I don't fully understand when rdc is in use.
Using MS VS 2019 and CUDA 11.0, the following steps allowed me to create a dynamic parallelism (CDP) example:
Create a new CUDA Runtime project
In the kernel.cu file that is generated, modify the kernel like so:
__global__ void child_kernel() {printf("hello\n");}
__global__ void addKernel(int *c, const int *a, const int *b)
{
child_kernel<<<1, 1>>>();
int i = threadIdx.x;
c[i] = a[i] + b[i];
}
In Project...Properties...CUDA C++...Common set Generate Relocatable Device Code to "Yes"
In Project...Properties...CUDA Linker...General add cudadevrt.lib to Additional Dependencies
Build or rebuild the project, you should then see output like this:
1>------ Rebuild All started: Project: test23, Configuration: Debug x64 ------
1>Compiling CUDA source file kernel.cu...
1>
1>C:\Users\Robert Crovella\source\repos\test23>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\nvcc.exe" -gencode=arch=compute_52,code=\"sm_52,compute_52\" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.20.27508\bin\HostX86\x64" -x cu -rdc=true -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include" -G --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static -g -DWIN32 -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Fdx64\Debug\vc142.pdb /FS /Zi /RTC1 /MDd " -o x64\Debug\kernel.cu.obj "C:\Users\Robert Crovella\source\repos\test23\kernel.cu"
1>kernel.cu
1>
1>C:\Users\Robert Crovella\source\repos\test23>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\nvcc.exe" -dlink -o x64\Debug\test23.device-link.obj -Xcompiler "/EHsc /W3 /nologo /Od /Zi /Fdx64\Debug\vc142.pdb /RTC1 /MDd " -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin/crt" -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\lib\x64" cudadevrt.lib cudart_static.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib cudart.lib -gencode=arch=compute_52,code=sm_52 -G --machine 64 x64\Debug\kernel.cu.obj
1>cudadevrt.lib
1>cudart_static.lib
1>kernel32.lib
1>user32.lib
1>gdi32.lib
1>winspool.lib
1>comdlg32.lib
1>advapi32.lib
1>shell32.lib
1>ole32.lib
1>oleaut32.lib
1>uuid.lib
1>odbc32.lib
1>odbccp32.lib
1>cudart.lib
1>kernel.cu.obj
1> Creating library C:\Users\Robert Crovella\source\repos\test23\x64\Debug\test23.lib and object C:\Users\Robert Crovella\source\repos\test23\x64\Debug\test23.exp
1>test23.vcxproj -> C:\Users\Robert Crovella\source\repos\test23\x64\Debug\test23.exe
========== Rebuild All: 1 succeeded, 0 failed, 0 skipped ==========
Notes:
CUDA 11.0 (and higher) targets only devices that support CDP. For earlier versions, you may need to set the device code generation target to match a GPU that supports CDP (e.g. compute_35,sm_35).
In MS VS, the MSB3721 error is not that useful by itself. It simply indicates "something went wrong". To get more useful info from Visual Studio, you should increase the verbosity of the console output. The exact method to do this will vary by VS version, but you can find instructions via a search such as this. The objective is to increase the verbosity so VS will show you the actual output generated by nvcc when there is an error.
For CUDA 11.0/VS2019, the addition of cudadevrt.lib isn't necessary because it is already included in the project. For other/older versions it may be necessary.
If you're still having trouble, I suggest you increase the verbosity to get a better idea of the exact issue. You should also try the steps listed above exactly to make sure you understand them (i.e. starting with a new project). If you're still having trouble, post a new question with your actual code, as well as the console compile output after you increase the verbosity.
I still don't know what caused the problem, but after uninstalling everything related to NVIDIA except the driver and then reinstalling everything through the CUDA installer, the error went away and it now works fine.

Addressing Errors Associated with Building "Debug x64" Version of darknet

I have successfully built the "Release x64" version of AlexeyAB's C/C++ solution called darknet. I am using a PC with Windows 10 Professional, Visual Studio Community 2019, an NVIDIA GeForce RTX 2080 Ti GPU, CUDA 10.2, cuDNN 7.6.5, and OpenCV 4.1.2.
While I have successfully built the "Release x64" version of darknet, the Debug version fails to build due to three errors:
MSB3721 associated with line 764 of CUDA 10.2.targets:
The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\nvcc.exe" -gencode=arch=compute_30,code=\"sm_30,compute_30\" -gencode=arch=compute_75,code=\"sm_75,compute_75\" --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.24.28314\bin\HostX86\x64" -x cu -IC:\opencv_4.1.2\opencv\build\include -I..\..\include -I..\..\3rdparty\stb\include -I..\..\3rdparty\pthreads\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -I\include -I\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include" -G --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static -g -DCUDNN_HALF -DCUDNN -D_CRTDBG_MAP_ALLOC -D_MBCS -D_TIMESPEC_DEFINED -D_SCL_SECURE_NO_WARNINGS -D_CRT_SECURE_NO_WARNINGS -D_CRT_RAND_S -DGPU -DWIN32 -DDEBUG -D_CONSOLE -D_LIB -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /Fdx64\Debug\vc142.pdb /FS /Zi /RTC1 /MDd " -o x64\Debug\network_kernels.cu.obj "C:\Users\Tom\Documents\AI\darknet\darknet\src\network_kernels.cu"" exited with code 1.
An error associated with lines 149 and 150 of common_functions.h:
more than one instance of overloaded function "_malloc_dbg" has "C" linkage.
An error associated with lines 149 and 150 of common_functions.h:
expected a type specifier.
Does anyone have any suggestions for resolving these three errors? It could be that these three errors are related. Someone suggested modifying the -ccbin option to reference a different compiler. If this seems likely, would you please offer concrete steps for changing this option? Are there other things I can do?
I had this problem. I tracked it down to my debug configuration having a macro defined: _CRTDBG_MAP_ALLOC. The Microsoft documentation states:
When the _CRTDBG_MAP_ALLOC flag is defined in the debug version of an application, the base version of the heap functions are directly mapped to their debug versions. The flag is used in Crtdbg.h to do the mapping. This flag is only available when the _DEBUG flag has been defined in the application.
So it appears that this causes a conflict due to different versions of malloc and free being declared. Remove the definition and you should be good. I don't need debug versions of these functions in my debug build and you probably don't either. By the way, the setting/flag/definition is at:
Properties->C/C++/Preprocessor->Preprocessor Definitions

Visual Studio 2015: MSB3721 error exited with code 1 on CUDA 8.0

I have Visual Studio 2015 and latest CUDA version 8.0.60.
When I create a project from the CUDA template in VS, the default example reports an error at "<<< >>>" when calling a device function, saying "expected an expression".
Another error is MSB3721, which says:
Severity Code Description Project File Line Suppression State
Error MSB3721 The command ""C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.exe" -gencode=arch=compute_20,code=\"sm_21,compute_20\" --use-local-env --cl-version 2015 -ccbin "D:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\x86_amd64" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include" --keep-dir x64\Debug -maxrregcount=0 --machine 64 --compile -cudart static -g -DWIN32 -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd " -o x64\Debug\kernel.cu.obj "D:\c++ project\xhfy\xhfy\kernel.cu"" exited with code 1. xhfy C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\BuildCustomizations\CUDA 8.0.targets 689
What can I do to fix this error?
Thanks.
I changed my platform toolset to Visual Studio 2013 and it worked perfectly.
I modified CMakeLists.txt, and it worked.
I have a GTX1016, so I changed arch=compute_75,code=sm_75 to arch=compute_61,code=sm_61.
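The change described above amounts to matching the -gencode target to the compute capability of the installed GPU. As a rough illustration only, the helper below (the name gencode_flag is hypothetical) builds the flag in the same form that appears in the nvcc command lines earlier on this page; the compute capability itself has to be looked up for the specific card.

```cpp
#include <string>

// Builds an nvcc -gencode argument for a single GPU architecture, e.g.
// compute capability 6.1 -> "-gencode=arch=compute_61,code=sm_61".
// Hypothetical helper written for this sketch, not part of any toolkit.
std::string gencode_flag(int major, int minor)
{
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode=arch=compute_" + cc + ",code=sm_" + cc;
}
```

For example, gencode_flag(6, 1) gives the replacement value used in the answer above, and gencode_flag(7, 5) gives the value it replaced.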