nvcc problems compiling object oriented code due to missing class template - c++

I have an issue with compiling CUDA code using nvcc. To demonstrate it I created a dummy class to represent a surface in 3D space.
Here goes the file surface.h:
#ifndef SURFACE_H
#define SURFACE_H
class surface
{
private:
float dX; // grid resolution in x [m]
float dY; // grid resolution in y [m]
int nX; // number of elements in x
int nY; // number of elements in y
float* depth; // pointer to depth array [m]
public:
__host__ __device__ void set_dim(const int _nX, const int _nY);
__host__ __device__ void set_res(const float _dX, const float _dY);
__host__ __device__ float get_surface_mm(const int iX, const int iY);
};
#endif
And here is the corresponding surface.cpp file:
#include "surface.h"
__host__ __device__ void surface::set_dim(const int _nX, const int _nY){
nX = _nX;
nY = _nY;
return;
}
__host__ __device__ void surface::set_res(const float _dX, const float _dY){
dX = _dX;
dY = _dY;
return;
}
__host__ __device__ float surface::get_surface_mm(const int iX, const int iY){
float surfLvl = (float) iX * iY;
return surfLvl;
}
I am trying to compile it with nvcc -x cu -arch=sm_50 -I. -dc surface.cpp -o surface.o but get the following errors:
surface.h(4): error: argument list for class template "surface" is missing
surface.cpp(7): error: argument list for class template "surface" is missing
surface.cpp(8): error: identifier "nX" is undefined
surface.cpp(9): error: identifier "nY" is undefined
surface.cpp(13): error: argument list for class template "surface" is missing
surface.cpp(14): error: identifier "dX" is undefined
surface.cpp(15): error: identifier "dY" is undefined
surface.cpp(19): error: argument list for class template "surface" is missing
8 errors detected in the compilation of "/tmp/tmpxft_000bedf2_00000000-6_surface.cpp1.ii".
I really don't get the reason for this error because in my opinion the class is fully defined and the argument list should be known to the compiler. Did any of you already experience a similar issue? If I remove the __device__ and __host__ flags and compile it with gcc everything works fine.
nvcc --version output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
I am aware that CUDA does not necessarily support every single feature of object oriented programming but double-checked that the stuff I am trying to compile here should be compatible.
I appreciate every hint :). Thanks a lot in advance.

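The only problem with this code was the class name: surface is already the name of a built-in class template in the CUDA headers that nvcc pulls in, which is why the compiler asks for a template argument list. After renaming the class everything compiles without error.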

Related

Cannot compile `enqueue_kernel` on opencl 2.1 NEO device

I have the following code on the device Intel(R) Gen9 HD Graphics NEO -- OpenCL 2.1 NEO :
__kernel void update(
const __global uint* positions,
const __global float3* offsets,
const int size,
__global int* cost
) {
int global_id = get_global_id(0);
if (global_id >= size) {
return;
}
int3 update_index = position_from_index(grid_centroid_positions[global_id], SIZE) -
offset_grid;
ndrange_t ndrange = ndrange_3d(size, size, size);
enqueue_kernel(get_default_queue(), ndrange,
^{update_surrouding_cells(offsets, global_id, update_index, update_edge_size, size, cost)});
}
But I get the following compiler error:
6:158:5: error: use of undeclared identifier 'ndrange_t'
ndrange_t ndrange = ndrange_3d(size, size, size);
^
6:161:5: error: implicit declaration of function 'enqueue_kernel' is invalid in OpenCL
enqueue_kernel(get_default_queue(), ndrange,
^
6:161:20: error: implicit declaration of function 'get_default_queue' is invalid in OpenCL
enqueue_kernel(get_default_queue(), ndrange,
^
6:162:163: error: expected ';' after expression
^{update_surrouding_cells(offsets, global_id, update_index, update_edge_size, size, cost)});
^
;
6:161:41: error: use of undeclared identifier 'ndrange'
enqueue_kernel(get_default_queue(), ndrange,
Compilation options are as follows:
-I "/home/development/cl" -g
-D SIZE=256
The device supports OpenCL 2.1, yet when compiling it seems that none of the declarations needed for enqueue_kernel exist. Do I need a special extension or something? I am reading the spec here, but it doesn't seem to say anything about actually compiling the examples with dynamic parallelism.
When compiling, the OpenCL version supported by the device is not the only thing that matters. The OpenCL C language version that the kernel source is compiled against is selected through the compilation options. In other words, the options passed when building the OpenCL program (the kernel code) should include:
-cl-std=CL2.0
Or the specific standard that you are looking for.

Compiling CUDA with clang on math functions

Compiling the following CUDA code helloWorld.cu with clang-11,
int main() {
return max(1.0f, 2.0f);
}
using the command clang++-11 -o helloWorld helloWorld.cu --cuda-gpu-arch=sm_75 -ldl -lrt -lcudart_static -pthread -L/usr/local/cuda/lib64, produces the error:
helloWorld.cu:2:12: error: no matching function for call to 'max'
return max(1.0f, 2.0f);
^~~
/usr/lib/llvm-11/lib/clang/11.0.0/include/__clang_cuda_math.h:194:16: note: candidate function not viable: call to __device__ function from __host__ function
__DEVICE__ int max(int __a, int __b) { return __nv_max(__a, __b); }
...
/usr/local/cuda-10.2/include/crt/math_functions.hpp:1079:31: note: candidate function not viable: call to __device__ function from __host__ function
__MATH_FUNCTIONS_DECL__ float max(float a, float b)
...
Note that the matching function was actually located correctly by the compiler (i.e., "math_functions.hpp:1079:31"), but was mistakenly inferred to be a "__device__" function.
Thanks for any help in advance.
The code you have written is host code, and it is not valid C++: no declaration of max is visible to host code. That code should not compile, and the compiler behaviour is correct. The code should look like this in order to compile:
#include <algorithm>
int main() {
return std::max(1.0f, 2.0f);
}
i.e. you have to actually include the standard library header which defines the max function, and you have to use the correct namespace. C++ has no built-in max function. CUDA does. All you are seeing is an artifact of the clang CUDA compilation trajectory.

How to compile C++ with CUB library?

I am using the CUB device function just like the example here (https://forums.developer.nvidia.com/t/cub-library/37675/2). I was able to compile the .cu source file in the above example using nvcc.
However, I wonder if it is possible to call a CUB device function in a .cpp source file and compile that .cpp file (using nvcc or g++). I know it's possible for Thrust, since the example here works for me.
Currently I simply move the main function into a new main.cpp file and include the cub header file in main.cpp, but I failed to compile it using nvcc or g++ because of the same errors, part of the error message:
/home/xx/cub/cub/block/specializations/../../block/../util_type.cuh:261:5: error: ‘__host__’ does not name a type; did you mean ‘__loff_t’?
__host__ __device__ __forceinline__ NullType& operator =(const T&) { return *this; }
^~~~~~~~
__loff_t
/home/xx/cub/cub/block/specializations/../../block/../util_type.cuh:316:19: error: ‘short4’ was not declared in this scope
__CUB_ALIGN_BYTES(short4, 8)
^
/home/xx/cub/cub/block/specializations/../../block/../util_type.cuh:314:52: error: ISO C++ forbids declaration of ‘__align__’ with no type [-fpermissive]
{ enum { ALIGN_BYTES = b }; typedef __align__(b) t Type; };
^
/home/xx/cub/cub/block/specializations/../../block/../util_type.cuh:545:9: error: ‘__host__’ does not name a type; did you mean ‘__loff_t’?
__host__ __device__ __forceinline__ CubVector operator+(const CubVector &other) const { \
^
/home/xx/cub/cub/block/specializations/../../block/../util_arch.cuh:64:38: error: ‘__host__’ does not name a type; did you mean ‘CUhostFn’?
#define CUB_RUNTIME_FUNCTION __host__ __device__
^
/home/xx/cub/cub/device/../iterator/arg_index_input_iterator.cuh:144:25: error: ‘__forceinline__’ does not name a type; did you mean ‘__thrust_forceinline__’?
__host__ __device__ __forceinline__ ArgIndexInputIterator(
^~~~~~~~~~~~~~~
__thrust_forceinline__
/home/xx/cub/cub/device/device_reduce.cuh:148:12: error: ‘cudaError_t’ does not name a type; did you mean ‘cudaError_enum’?
static cudaError_t Reduce(
^~~~~~~~~~~
cudaError_enum
Here are my source files:
device.h
#pragma once
#include <cub/cub.cuh>
void scan_on_device();
device.cu
#include "device.h"
void scan_on_device()
{
// Declare, allocate, and initialize device pointers for input and output
int num_items = 7;
int *d_in;
int h_in[] = {8, 6, 7, 5, 3, 0, 9};
int sz = sizeof(h_in)/sizeof(h_in[0]);
int *d_out; // e.g., [ , , , , , , ]
cudaMalloc(&d_in, sz*sizeof(h_in[0]));
cudaMalloc(&d_out, sz*sizeof(h_in[0]));
cudaMemcpy(d_in, h_in, sz*sizeof(h_in[0]), cudaMemcpyHostToDevice);
printf("\nInput:\n");
for (int i = 0; i < sz; i++) printf("%d ", h_in[i]);
// Determine temporary device storage requirements
void *d_temp_storage = NULL;
size_t temp_storage_bytes = 0;
cub::DeviceScan::InclusiveSum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
// Allocate temporary storage
cudaMalloc(&d_temp_storage, temp_storage_bytes);
// Run inclusive prefix sum
cub::DeviceScan::InclusiveSum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);
// d_out <-- [8, 14, 21, 26, 29, 29, 38]
cudaMemcpy(h_in, d_out, sz*sizeof(h_in[0]), cudaMemcpyDeviceToHost);
printf("\nOutput:\n");
for (int i = 0; i < sz; i++) printf("%d ", h_in[i]);
printf("\n");
}
host.cpp
#include "device.h"
#include <cub/cub.cuh>
int main(void)
{
scan_on_device();
return 0;
}
I tried to compile them in three steps:
nvcc -O2 -c device.cu -I/home/xx/cub
g++ -O2 -c host.cpp -I/usr/local/cuda/include/ -I/home/xx/cub
g++ -o tester device.o host.o -L/usr/local/cuda/lib64 -lcudart
The first step went well, but the second step gives the above errors. Any ideas are appreciated. Maybe I messed up some include or link paths (to CUDA or CUB)?
The cub header library (e.g. cub.cuh) contains CUDA C++ code. Such code cannot be compiled by an ordinary host compiler like g++. You will get compile errors if you try to do so.
However your project doesn't require cub.cuh to be in your device.h header file, nor does it require cub.cuh to be compiled by g++. The only thing needed in your device.h header file is the function prototype for scan_on_device().
Therefore if you include the cub header file in the function implementation file device.cu, and remove it elsewhere in your project, your code will compile.

forward declaration problems

I think I am having a problem with forward declarations. I think one is necessary, but I'm not sure.
Basically I have a main.cpp:
//main.cpp
#include <iostream>
#include "CalculateForces.h"
#include "ParticleBox.h"
int main(void)
{
//g++ main.cpp ParticleBox.cpp -lgsl -lgslcblas -lm -std=c++0x
CalculateForces* calculate_forces= new CalculateForces();
ParticleBox* particles_box = new ParticleBox(2000,100,100,100);
delete calculate_forces;
delete particles_box;
return 0;
}
CalculateForces.h looks like this:
//CalculateForces.h
//We update the forces on each particle
class ParticleBox;
class CalculateForces
{
public:
CalculateForces(void);
~CalculateForces(void);
int UpdateForces(ParticleBox* particlebox);
int DiscretizeSpace(float cutoff_distance);
int LJForce(int local_index, int remote_index, ParticleBox* particlebox);
};
And finally the ParticleBox.h File looks like this:
//ParticleBox.h
//This is the definition of the particlebox. We manage all the particles in this
//file
//This should be changed to a template so that we can run float and double calcs properly :D
struct Particle;
class ParticleBox
{
public:
ParticleBox(int Num_Particles, float Box_length_x_, float Box_length_y_, float Box_length_z_);
~ParticleBox(void);
int set_num_particles(int Num_Particles);
int InitialiseUniverse(int temp,float mass);
float Boltzmann(float temperature);
int GenerateRandomUniquePositions(int number, float max, float min, float* rand_dim_positions);
private:
//Array to hold particles. Each particle has its own struct
Particle** particle_list_;
int num_particles_;
float box_length_x_;
float box_length_y_;
float box_length_z_;
float* rand_x_positions_;
float* rand_y_positions_;
float* rand_z_positions_;
float cutoff_distance_;
float sigma_;
float epsilon_;
};
int CalculateForces::DiscretizeSpace(float cutoff_distance, ParticleBox* particlebox)
{
......
return 0;
}
I use a forward declaration in ParticleBox.h of the Particle Struct and I can add a pointer of type Particle* to the class. This works fine.
The forward declaration of class ParticleBox in CalculateForces.h causes loads of compiler errors (too many to post, but they start in an identical way to the ones below). Omitting it produces only a few errors:
In file included from main.cpp:3:0:
CalculateForces.h:9:20: error: ‘ParticleBox’ has not been declared
CalculateForces.h:11:50: error: ‘ParticleBox’ has not been declared
In file included from CalculateForces.cpp:3:0:
CalculateForces.h:9:20: error: ‘ParticleBox’ has not been declared
CalculateForces.h:11:50: error: ‘ParticleBox’ has not been declared
CalculateForces.cpp:12:35: error: ‘int CalculateForces::UpdateForces’ is not a static member of ‘class CalculateForces’
CalculateForces.cpp:12:35: error: ‘ParticleBox’ was not declared in this scope
CalculateForces.cpp:12:48: error: ‘particlebox’ was not declared in this scope
CalculateForces.cpp:13:1: error: expected ‘,’ or ‘;’ before ‘{’ token
I thought I would need the forward declaration because I use that type as an argument. What am I doing wrong?
Thanks and sorry for the long post
Your post is quite confusing because you posted the errors that show up when you omit the forward declaration, and you obviously have some additional errors in your code that mix with the error you asked about.
I assume that with the forward declaration, the errors change as they appear mostly in the implementation files, right? In this case the problem might be that the forward declaration is enough as long as you declare a pointer to the type, but it is not enough when you start using the pointer (dereferencing it).
If that is the case, the problem is most likely that you forgot to #include "ParticleBox.h" in CalculateForces.cpp (or some other implementation files).
As Rob determined, the source of the errors was in a bit of code that I did not post. There were a few errors, but the biggest was that in CalculateForces.cpp I tried to access int num_particles_; which is of course a private member of ParticleBox.

NVCC warning level

I would like NVCC to treat the warning below as an error:
warning : calling a __host__ function("foo") from a __host__ __device__ function("bar")
NVCC documentation "NVIDIA CUDA Compiler Driver NVCC" doesn't even contain the word "warning".
Quoting the CUDA COMPILER DRIVER NVCC reference guide, Section 3.2.8. "Generic Tool Options":
--Werror kind Make warnings of the specified kinds into errors. The following is the list of warning kinds accepted by this option:
cross-execution-space-call Be more strict about unsupported cross execution space calls. The compiler will generate an error instead of a warning for a call from a __host__ __device__ to a __host__ function.
Therefore, do the following:
Project -> Properties -> Configuration Properties -> CUDA C/C++ -> Command Line -> Additional Options -> add --Werror cross-execution-space-call
This test program
#include <cuda.h>
#include <cuda_runtime.h>
void foo() { int a = 2;}
__host__ __device__ void test() {
int tId = 1;
foo();
}
int main(int argc, char **argv) { }
returns the following warning
warning : calling a __host__ function("foo") from a __host__ __device__ function("test") is not allowed
without the above mentioned additional compilation option and returns the following error
Error 3 error : calling a __host__ function("foo") from a __host__ __device__ function("test") is not allowed
with the above mentioned additional compilation option.