NVCC warning level - C++

I would like NVCC to treat the warning below as an error:
warning : calling a __host__ function("foo") from a __host__ __device__ function("bar")
The NVCC documentation, "NVIDIA CUDA Compiler Driver NVCC", doesn't even contain the word "warning".

Quoting the CUDA COMPILER DRIVER NVCC reference guide, Section 3.2.8. "Generic Tool Options":
--Werror kind Make warnings of the specified kinds into errors. The following is the list of warning kinds accepted by this option:
cross-execution-space-call Be more strict about unsupported cross execution space calls. The compiler will generate an error instead of a warning for a call from a __host__ __device__ to a __host__ function.
Therefore, do the following:
Project -> Properties -> Configuration Properties -> CUDA C/C++ -> Command Line -> Additional Options -> add --Werror cross-execution-space-call
This test program
#include <cuda.h>
#include <cuda_runtime.h>

void foo() { int a = 2; }

__host__ __device__ void test() {
    int tId = 1;
    foo();
}

int main(int argc, char **argv) { }
compiles with only the following warning
warning : calling a __host__ function("foo") from a __host__ __device__ function("test") is not allowed
without the above-mentioned additional compilation option, and fails with the following error
Error 3 error : calling a __host__ function("foo") from a __host__ __device__ function("test") is not allowed
with the above-mentioned additional compilation option.
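For builds driven from the command line rather than Visual Studio, the same flag can be passed to nvcc directly; a minimal sketch, assuming the test program above is saved as test.cu:
nvcc --Werror cross-execution-space-call -c test.cu -o test.o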

Related

gcc warning about uninitialized values caused by turning optimization on

Consider the following code
#include <iostream>

template<typename Value>
struct Wrapper {
    Value m_value;
    Wrapper() {}
    Wrapper(const Wrapper<Value> &copy_from) : m_value(copy_from.m_value) {}
};

template<typename Value>
struct DefaultContainer {
    Value m_default;
    DefaultContainer(const Value& def) : m_default(def) {}
};

template<typename Value>
struct DefaultContainerUser : public DefaultContainer<Value> {
    DefaultContainerUser() : DefaultContainer<Value>(Value()) {}
};

int main() {
    DefaultContainerUser<Wrapper<double>> user;
    std::cout << user.m_default.m_value << std::endl;
}
When I compile this with c++ -O1 -Werror -Wall test.cpp, I get the following error:
test.cpp: In function ‘int main()’:
test.cpp:8:63: error: ‘<anonymous>.Wrapper<double>::m_value’ is used uninitialized in this function [-Werror=uninitialized]
8 | Wrapper(const Wrapper<Value> &copy_from) : m_value(copy_from.m_value) {}
| ~~~~~~~~~~^~~~~~~
cc1plus: all warnings being treated as errors
If I disable optimizations using -O0, everything works fine. Adding -Wno-error=maybe-uninitialized with optimizations still turned on doesn't help. What am I doing wrong here?
The compiler that I'm using is c++ (GCC) 10.2.1 20201016 (Red Hat 10.2.1-6).
It is normal for the warnings reported by a compiler to vary with the optimization level. Warnings are usually a by-product of optimization, in the sense that the analysis needed for a particular optimization may uncover possible problems in the code, or that transformations applied during optimization may expose possible errors. This implies that when optimization is off and those analyses and transformations are not performed, the problems are not detected.
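If you also want the warning to go away at its source, one option (a sketch, not the only possible fix) is to value-initialize m_value in Wrapper's default constructor, so the copy constructor never reads an indeterminate value:
template<typename Value>
struct Wrapper {
    Value m_value;
    Wrapper() : m_value() {}   // value-initialize, so copies never read an indeterminate value
    Wrapper(const Wrapper<Value> &copy_from) : m_value(copy_from.m_value) {}
};
The trade-off is that default construction now always zero-initializes the wrapped value.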

Compiling CUDA with clang on math functions

Compiling the following CUDA code helloWorld.cu with clang-11,
int main() {
    return max(1.0f, 2.0f);
}
using the command clang++-11 -o helloWorld helloWorld.cu --cuda-gpu-arch=sm_75 -ldl -lrt -lcudart_static -pthread -L/usr/local/cuda/lib64, I encountered the following error:
helloWorld.cu:2:12: error: no matching function for call to 'max'
return max(1.0f, 2.0f);
^~~
/usr/lib/llvm-11/lib/clang/11.0.0/include/__clang_cuda_math.h:194:16: note: candidate function not viable: call to __device__ function from __host__ function
__DEVICE__ int max(int __a, int __b) { return __nv_max(__a, __b); }
...
/usr/local/cuda-10.2/include/crt/math_functions.hpp:1079:31: note: candidate function not viable: call to __device__ function from __host__ function
__MATH_FUNCTIONS_DECL__ float max(float a, float b)
...
Note that the matching function was actually located correctly by the compiler (i.e., math_functions.hpp:1079:31), but was mistakenly inferred as a __device__ function.
Thanks for any help in advance.
The code you have written is host code, and it is not valid C++ as written. That code should not compile, and the compiler behaviour is correct. The code should look like this in order to compile:
#include <algorithm>

int main() {
    return std::max(1.0f, 2.0f);
}
i.e. you have to actually include the standard library header which defines the max function, and you have to use the correct namespace. C++ has no built-in max function. CUDA does. All you are seeing is an artifact of the clang CUDA compilation trajectory.
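For contrast, the un-namespaced max call is fine inside device code, where CUDA's device overloads are in scope; a minimal sketch (the kernel name here is made up):
__global__ void useMax(float *out) {
    *out = max(1.0f, 2.0f); // resolves to CUDA's device overload of max
}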

nvcc problems compiling object oriented code due to missing class template

I have an issue with compiling CUDA code using nvcc. To demonstrate it I created a dummy class to represent a surface in 3D space.
Here goes the file surface.h:
#ifndef SURFACE_H
#define SURFACE_H

class surface
{
private:
    float dX;     // grid resolution in x [m]
    float dY;     // grid resolution in y [m]
    int nX;       // number of elements in x
    int nY;       // number of elements in y
    float* depth; // pointer to depth array [m]
public:
    __host__ __device__ void set_dim(const int _nX, const int _nY);
    __host__ __device__ void set_res(const float _dX, const float _dY);
    __host__ __device__ float get_surface_mm(const int iX, const int iY);
};

#endif
And here is the corresponding surface.cpp file:
#include "surface.h"
__host__ __device__ void surface::set_dim(const int _nX, const int _nY){
nX = _nX;
nY = _nY;
return;
}
__host__ __device__ void surface::set_res(const float _dX, const float _dY){
dX = _dX;
dY = _dY;
return;
}
__host__ __device__ float surface::get_surface_mm(const int iX, const int iY){
float surfLvl = (float) iX * iY;
return surfLvl;
}
I am trying to compile it with nvcc -x cu -arch=sm_50 -I. -dc surface.cpp -o surface.o but get the following errors:
surface.h(4): error: argument list for class template "surface" is missing
surface.cpp(7): error: argument list for class template "surface" is missing
surface.cpp(8): error: identifier "nX" is undefined
surface.cpp(9): error: identifier "nY" is undefined
surface.cpp(13): error: argument list for class template "surface" is missing
surface.cpp(14): error: identifier "dX" is undefined
surface.cpp(15): error: identifier "dY" is undefined
surface.cpp(19): error: argument list for class template "surface" is missing
8 errors detected in the compilation of "/tmp/tmpxft_000bedf2_00000000-6_surface.cpp1.ii".
I really don't get the reason for this error, because in my opinion the class is fully defined and its argument list should be known to the compiler. Has anyone experienced a similar issue? If I remove the __device__ and __host__ qualifiers and compile it with gcc, everything works fine.
nvcc --version output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
I am aware that CUDA does not necessarily support every single feature of object-oriented programming, but I double-checked that what I am trying to compile here should be supported.
I appreciate every hint :). Thanks a lot in advance.
The only problem with this code is that surface is already the name of a built-in template type declared in the CUDA headers, which is what produces the "argument list for class template" errors. After renaming the class, everything compiles without error.
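As a sketch of the rename (the replacement name surface3d is arbitrary), only the class name and the qualified names of its member definitions change:
class surface3d   // renamed to avoid clashing with CUDA's built-in surface template
{
private:
    float dX;     // grid resolution in x [m]
    float dY;     // grid resolution in y [m]
    int nX;       // number of elements in x
    int nY;       // number of elements in y
    float* depth; // pointer to depth array [m]
public:
    __host__ __device__ void set_dim(const int _nX, const int _nY);
    __host__ __device__ void set_res(const float _dX, const float _dY);
    __host__ __device__ float get_surface_mm(const int iX, const int iY);
};
The definitions in surface.cpp then become surface3d::set_dim, surface3d::set_res, and surface3d::get_surface_mm.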

CUDA __device__ Unresolved extern function [duplicate]

This question already has an answer here:
External calls are not supported - CUDA
(1 answer)
Closed 7 years ago.
I am trying to understand how to decouple CUDA __device__ codes in separate header files.
I have three files.
File 1: int2.cuh
#ifndef INT2_H_
#define INT2_H_

#include "cuda.h"
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

__global__ void kernel();
__device__ int k2(int k);
int launchKernel(int dim);

#endif /* INT2_H_ */
File 2: int2.cu
#include "int2.cuh"
#include "cstdio"
__global__ void kernel() {
int tid = threadIdx.x;
printf("%d\n", k2(tid));
}
__device__ int k2(int i) {
return i * i;
}
int launchKernel(int dim) {
kernel<<<1, dim>>>();
cudaDeviceReset();
return 0;
}
File 3: CUDASample.cu
#include <stdio.h>
#include <stdlib.h>
#include "int2.cuh"
#include <iostream>

using namespace std;

static const int WORK_SIZE = 256;

__global__ void sampleCuda() {
    int tid = threadIdx.x;
    // printf("%d\n", k2(tid)); // cannot call k2 from here
    printf("%d\n", tid * tid);
}

int main(void) {
    int var;
    var = launchKernel(16);
    kernel<<<1, 16>>>();
    cudaDeviceReset();
    sampleCuda<<<1, 16>>>();
    cudaDeviceReset();
    return 0;
}
The code works fine. I can call the sampleCuda() kernel (in the same file), call the C function launchKernel() (in the other file), and call kernel() directly (in the other file).
However, I get the following error when calling the __device__ function from the sampleCuda() kernel, even though the same function is callable from kernel().
10:58:11 **** Incremental Build of configuration Debug for project CUDASample ****
make all
Building file: ../src/CUDASample.cu
Invoking: NVCC Compiler
/Developer/NVIDIA/CUDA-6.5/bin/nvcc -G -g -O0 -gencode arch=compute_20,code=sm_20 -odir "src" -M -o "src/CUDASample.d" "../src/CUDASample.cu"
/Developer/NVIDIA/CUDA-6.5/bin/nvcc -G -g -O0 --compile --relocatable-device-code=false -gencode arch=compute_20,code=compute_20 -gencode arch=compute_20,code=sm_20 -x cu -o "src/CUDASample.o" "../src/CUDASample.cu"
../src/CUDASample.cu(18): warning: variable "var" was set but never used
../src/CUDASample.cu(8): warning: variable "WORK_SIZE" was declared but never referenced
../src/CUDASample.cu(18): warning: variable "var" was set but never used
../src/CUDASample.cu(8): warning: variable "WORK_SIZE" was declared but never referenced
ptxas fatal : Unresolved extern function '_Z2k2i'
make: *** [src/CUDASample.o] Error 255
10:58:14 Build Finished (took 2s.388ms)
How do I call the __device__ function from the sampleCuda() kernel?
The issue is that you defined the __device__ function in a separate compilation unit from the __global__ function that calls it. You need to either explicitly enable relocatable device code by adding the -dc flag, or move the definition into the same unit.
From nvcc documentation:
--device-c|-dc Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file that contains relocatable device code. It is equivalent to --relocatable-device-code=true --compile.
See Separate Compilation and Linking of CUDA C++ Device Code for more information.
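As a sketch, a separate-compilation build of the files above could look like the following (the architecture matches the compute_20 target in the build log; adjust paths and flags to your setup):
nvcc -arch=sm_20 -dc int2.cu -o int2.o
nvcc -arch=sm_20 -dc CUDASample.cu -o CUDASample.o
nvcc -arch=sm_20 int2.o CUDASample.o -o CUDASample   # nvcc performs the device link when given the objects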

Suppress unused variable warning in C++ => Compiler bug or code bug?

Presently, I am using the following function template to suppress unused variable warnings:
template<typename T>
void unused(T const &) {
    /* Do nothing. */
}
However, when porting from Linux to Cygwin, I am now getting compiler errors with g++ 3.4.4 (on Linux I am on 3.4.6, so maybe this is a bug fix?):
Write.cpp: In member function `void* Write::initReadWrite()':
Write.cpp:516: error: invalid initialization of reference of type 'const volatile bool&' from expression of type 'volatile bool'
../../src/common/Assert.h:27: error: in passing argument 1 of `void unused(const T&) [with T = volatile bool]'
make[1]: *** [ARCH.cygwin/release/Write.o] Error 1
The argument to unused is a member variable declared as:
volatile bool readWriteActivated;
Is this a compiler bug or a bug in my code?
Here is the minimal test case:
template<typename T>
void unused(T const &) { }

int main() {
    volatile bool x = false;
    unused(!x); // type of "!x" is bool
}
The usual way of indicating that you don't use a parameter is to not give it a name:
int f(int a, float) {
    return a*2;
}
will compile everywhere with all warnings turned on, without warning about the unused float. Even if the argument does have a name in the prototype (e.g. int f(int a, float f);), it still won't complain.
I'm not 100% sure that this is portable, but this is the idiom I've usually used for suppressing warnings about unused variables. The context here is a signal handler that is only used to catch SIGINT and SIGTERM, so if the function is ever called I know it's time for the program to exit.
volatile bool app_killed = false;

void signal_handler(int signum)
{
    (void)signum; // this suppresses the warnings
    app_killed = true;
}
I tend to dislike cluttering up the parameter list with __attribute__((unused)), since the cast-to-void trick works in Visual C++ as well without resorting to macros.
It is a compiler bug and there are no known workarounds:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42655
It is fixed in v4.4.
In GCC, you can define a macro as follows:
#ifdef UNUSED
#elif defined(__GNUC__)
# define UNUSED(x) UNUSED_ ## x __attribute__((unused))
#elif defined(__LCLINT__)
# define UNUSED(x) /*#unused#*/ x
#else
# define UNUSED(x) x
#endif
Marking a parameter with this macro suppresses the unused-parameter warning GCC emits (and renames the parameter with a UNUSED_ prefix). For Visual Studio, you can suppress warnings with a #pragma directive.
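A usage sketch (the function and parameter names are made up):
void on_timeout(int UNUSED(timer_id)) {
    // timer_id is deliberately ignored; under GCC it is renamed to UNUSED_timer_id
    // and marked __attribute__((unused)), so no warning is emitted
}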
The answer proposed by haavee (amended by ur) is the one I would normally use:
int f(int a, float /*epsilon*/) {
    return a*2;
}
The real problem happens when the argument is sometimes but not always used in the method, e.g.:
int f(int a, float epsilon) {
#ifdef LOGGING_ENABLED
    LOG("f: a = %d, epsilon = %f\n", a, epsilon);
#endif
    return a*2;
}
Now, I can't comment out the parameter name epsilon because that will break my logging build (I don't want to insert another #ifdef in the argument list because that makes the code much harder to read).
So I think the best solution would be to use Tom's suggestion:
int f(int a, float epsilon) {
    (void) epsilon; // suppress compiler warning for possibly unused arg
#ifdef LOGGING_ENABLED
    LOG("f: a = %d, epsilon = %f\n", a, epsilon);
#endif
    return a*2;
}
My only worry would be that some compilers might warn about the "(void) epsilon;" statement, e.g. "statement has no effect" warning or some such - I guess I'll just have to test on all the compilers I'm likely to use...
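For what it's worth, on compilers that support C++17 the standard [[maybe_unused]] attribute sidesteps both concerns; a sketch using the same hypothetical f and LOG:
int f(int a, [[maybe_unused]] float epsilon) {
#ifdef LOGGING_ENABLED
    LOG("f: a = %d, epsilon = %f\n", a, epsilon);
#endif
    return a*2;
}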