thrust::device_vector in CUDA [duplicate]

This question already has answers here:
Thrust inside user written kernels
(4 answers)
Closed 5 years ago.
I am new to CUDA and am trying to learn how to use it. Can someone please help? I have the following in the main function (I am in Visual Studio, and my source and header files are .cu and .cuh respectively):
thrust::device_vector<float> d_vec(100);
kernel<<<100,1>>>(d_vec);
and then in the kernel I have:
template <typename T> __global__ void kernel(thrust::device_vector<T> d_vec)
{
    int tid = threadIdx.x + blockIdx.x*blockDim.x;
    T xxx = 3.0;
    d_vec[tid] = xxx;
}
My objective is to call the kernel once with float and once with double. Also note that in this simple example I have the variable xxx (which in my real case is some computation producing double or float numbers).
And I get two errors:
1> calling a __host__ function (operator =) from a __global__ function is not allowed
2> calling a __host__ function (operator []) from a __global__ function is not allowed
So I guess the "[]" and "=" in "d_vec[tid] = .." are the problem. But my question is: how do I access the device vector inside my kernel? Can someone please clarify what the correct procedure is and what I am doing wrong? Thanks in advance.

thrust::device_vector objects/references cannot be used as kernel parameters.
You can use raw pointers to pass the device vector's data instead.
thrust::device_vector<float> d_vec(100);
float* pd_vec = thrust::raw_pointer_cast(d_vec.data());
kernel<<<100,1>>>(pd_vec);
and here's the prototype of the kernel:
template <typename T> __global__ void kernel(T* pd_vec)
Your question is similar to this one: how to cast thrust::device_vector<int> to raw pointer
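For completeness, here is a minimal, self-contained version of that approach (my own sketch, not the asker's code), instantiating the kernel for both float and double as the question intends:

#include <cstdio>
#include <thrust/device_vector.h>

template <typename T>
__global__ void kernel(T* pd_vec)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    T xxx = 3.0;       // stands in for the real float/double computation
    pd_vec[tid] = xxx;
}

int main()
{
    thrust::device_vector<float>  d_vec_f(100);
    thrust::device_vector<double> d_vec_d(100);

    // Raw device pointers are safe to dereference inside kernels.
    float*  pf = thrust::raw_pointer_cast(d_vec_f.data());
    double* pd = thrust::raw_pointer_cast(d_vec_d.data());

    kernel<<<100, 1>>>(pf);   // instantiates kernel<float>
    kernel<<<100, 1>>>(pd);   // instantiates kernel<double>
    cudaDeviceSynchronize();

    // operator[] on a device_vector works from host code (it copies one element).
    printf("%f %f\n", (float)d_vec_f[0], (double)d_vec_d[0]);
    return 0;
}

Note that operator[] and operator= on a device_vector are perfectly fine in host code, which is exactly why the compiler calls them __host__ functions; only inside a kernel must you go through the raw pointer.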

How to Pass Vector of int into CUDA global function [duplicate]

This question already has answers here:
Using std::vector in CUDA device code
(5 answers)
Closed last year.
I'm writing my first CUDA program and encountering a lot of issues, as my main programming language is not C++.
In my console app I have a vector of int that holds a constant list of numbers. My code should create new vectors and check for matches against the original constant vector.
I don't know how to pass / copy pointers of a vector into the GPU device. I get this error message after trying to convert my code from C# into C++ and work with the kernel:
"Error calling a host function("std::vector<int, ::std::allocator > ::vector()") from a global function("MagicSeedCUDA::bigCUDAJob") is not allowed"
This is part of my code:
std::vector<int> selectedList;
FillA1(A1, "0152793281263155465283127699107744880041");
selectedList = A1;
bigCUDAJob<<<640, 640, 640>>>(i, j, selectedList);
__global__ void bigCUDAJob(int i, int j, std::vector<int> selectedList)
{
    std::vector<int> tempList;
    // here comes code that adds numbers to tempList
    // code to find matches between tempList and the
    // parameter selectedList
}
How can I modify my code so that I won't get compiler errors? I can work with an array of int as well.
I don't know how to pass / copy pointers of a vector into the GPU device
First, remind yourself of how to pass memory that's not in an std::vector to a CUDA kernel. (Re)read the vectorAdd example program, part of NVIDIA's CUDA samples.
cudaError_t status;
std::vector<int> selectedList;
// ... etc. ...
int *selectedListOnDevice = NULL;
std::size_t selectedListSizeInBytes = sizeof(int) * selectedList.size();
status = cudaMalloc((void **)&selectedListOnDevice, selectedListSizeInBytes);
if (status != cudaSuccess) { /* handle error */ }
status = cudaMemcpy(selectedListOnDevice, selectedList.data(), selectedListSizeInBytes, cudaMemcpyHostToDevice);
if (status != cudaSuccess) { /* handle error */ }
// ... etc. ...
// eventually:
cudaFree(selectedListOnDevice);
That's using the official CUDA runtime API. If, however, you use my CUDA API wrappers (which you absolutely don't have to), the above becomes:
auto selectedListOnDevice = cuda::memory::make_unique<int[]>(selectedList.size());
cuda::memory::copy(selectedListOnDevice.get(), selectedList.data());
and you don't need to handle the errors yourself - on error, an exception will be thrown.
Another alternative is to use NVIDIA's thrust library, which offers an std::vector-like class called a "device vector". This allows you to write:
thrust::device_vector<int> selectedListOnDevice = selectedList;
and it should "just work".
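For instance, a complete round trip through thrust might look like this sketch (the list contents here are placeholders of my own):

#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <vector>

int main()
{
    std::vector<int> selectedList = {4, 8, 15, 16, 23, 42}; // placeholder data

    // Host-to-device copy happens inside this assignment.
    thrust::device_vector<int> selectedListOnDevice = selectedList;

    // Kernels can receive thrust::raw_pointer_cast(selectedListOnDevice.data()).

    // Device-to-host copy back into an ordinary std::vector.
    std::vector<int> result(selectedListOnDevice.size());
    thrust::copy(selectedListOnDevice.begin(), selectedListOnDevice.end(),
                 result.begin());
    return 0;
}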
I get this error message:
Error: calling a __host__ function("std::vector<int, ::std::allocator<int>>::vector()") from a __global__ function("MagicSeedCUDA::bigCUDAJob") is not allowed
That issue is covered in Using std::vector in CUDA device code, as #paleonix mentioned. In a nutshell: you just cannot have std::vector appear in your __device__ or __global__ functions at all, no matter how you try to write it.
I'm writing my first CUDA program and encounter a lot of issues, as my main programming language is not C++.
Then, regardless of your specific issue with an std::vector, you should take some time to study C++ programming. Alternatively, you could brush up on C programming, as you can write CUDA kernels that are C'ish rather than C++'ish; but C++'ish features are actually quite useful when writing kernels, not just on the host side.
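Putting the above together, here is a sketch of how bigCUDAJob might be rewritten without std::vector. The raw-pointer-plus-count signature and the fixed-size scratch array are my assumptions; device code cannot construct a std::vector, so scratch space must be fixed-size or preallocated:

// The kernel receives a raw device pointer plus an element count.
__global__ void bigCUDAJob(int i, int j, const int* selectedList, int n)
{
    int tempList[32];  // capacity chosen only for illustration
    int tempCount = 0;
    // ... code that adds numbers to tempList (keeping tempCount <= 32) ...

    for (int t = 0; t < tempCount; ++t)
        for (int k = 0; k < n; ++k)
            if (tempList[t] == selectedList[k]) {
                // ... record the match ...
            }
}

// Host side, after copying the data as shown earlier:
// bigCUDAJob<<<640, 640>>>(i, j, selectedListOnDevice,
//                          (int)selectedList.size());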

Why is it not possible to overload a host/device member function of a CUDA C++ class [duplicate]

This question already has answers here:
How to make a kernel function which callable from both the host and device?
(2 answers)
Closed 7 years ago.
I have a 3D vector class with member functions marked as host and device functions. Below is a snippet of one of the member functions:
__host__ __device__
double Vector::GetMagReciprocal()
{
    double result = 1/sqrt(x*x + y*y + z*z);
    return result;
}
What I want to achieve is to have separate definitions for the host and device versions, so that I can get better performance by using the CUDA math intrinsic rsqrt when executing on the device. The way I would do it is to overload this member function for host and device:
__host__
double Vector::GetMagReciprocal()
{
    double result = 1/sqrt(x*x + y*y + z*z);
    return result;
}

__device__
double Vector::GetMagReciprocal()
{
    double result = rsqrt(x*x + y*y + z*z);
    return result;
}
Now when I compile the Vector.cpp file using nvcc (with the -x cu flag), I get the following error:
function "Vector::GetMagReciprocal" has already been defined
Now I wonder why NVIDIA doesn't support this sort of overloading.
I can think of alternative ways of achieving the separation, but they have their own issues:
Create separate member functions for host and device in the Vector class, say GetMagReciprocalHost and GetMagReciprocalDevice, and call the appropriate one from host/device code.
Have a single member function GetMagReciprocal, but pass it a flag to choose between the host and device code paths.
Maybe there is another, easier way to achieve this. If someone has any suggestions, that would be nice.
RE-EDITED: I had not mentioned the possibility of conditional compilation using the __CUDA_ARCH__ macro to generate separate host and device code. This was actually the first thing I tried when modifying the member function, but something made me think it wouldn't work. Perhaps I was wrong in my understanding of how this macro is used. So the answer suggested by sgarizvi is the right one.
You can use the conditional-compilation macro __CUDA_ARCH__ to generate different code for host and device inside a __host__ __device__ function.
__CUDA_ARCH__ is defined only for device code, so to create different implementations for host and device, you can do the following:
__host__ __device__
double Vector::GetMagReciprocal()
{
    double result;
#ifdef __CUDA_ARCH__
    result = rsqrt(x*x + y*y + z*z);
#else
    result = 1/sqrt(x*x + y*y + z*z);
#endif
    return result;
}
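To illustrate the pattern in context, here is a small self-contained sketch; the Vector struct and the kernel are my own minimal illustration, not code from the question. The same member function compiles to the rsqrt intrinsic on the device and to 1/sqrt on the host:

#include <cmath>
#include <cstdio>

struct Vector {
    double x, y, z;

    __host__ __device__
    double GetMagReciprocal() const
    {
#ifdef __CUDA_ARCH__
        return rsqrt(x*x + y*y + z*z);           // device path: fast intrinsic
#else
        return 1.0 / std::sqrt(x*x + y*y + z*z); // host path: standard library
#endif
    }
};

__global__ void kernel(Vector v, double* out)
{
    *out = v.GetMagReciprocal();  // compiled with __CUDA_ARCH__ defined
}

int main()
{
    Vector v{1.0, 2.0, 3.0};
    printf("host:   %f\n", v.GetMagReciprocal()); // __CUDA_ARCH__ undefined here

    double* d_out = nullptr;
    cudaMalloc(&d_out, sizeof(double));
    kernel<<<1, 1>>>(v, d_out);
    double h_out = 0.0;
    cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    printf("device: %f\n", h_out);
    cudaFree(d_out);
    return 0;
}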

Passing variables that define the size of a 2D array [duplicate]

This question already has answers here:
How to pass a VLA to a function template?
(6 answers)
Closed 8 years ago.
I'm working on passing arrays around in C++. The following works, provided I define the array with literal sizes, such as gen0[6][7]. But I cannot call the function when I use variables as my size parameters. I realize that I probably need to do something with passing them as pointers or by reference. I read elsewhere to use unsigned int, but that didn't work. I tried a few variations, but I'm struggling with the whole concept. Any tips/advice would be greatly appreciated!
//in main
int col1, col2;
col1 = rand() % 40 + 1;
col2 = rand() % 50 + 1;
int gen0[col1][col2];
print(gen0);
//not in main
template<int R, int C>
void print(int (&array)[R][C])
VLAs (variable-length arrays) are an extension supported by some compilers, and their sizes are determined at runtime.
Whereas:
template<int R, int C> void print(const int (&array)[R][C])
is the correct way to pass a multi-dimensional array by reference: the dimensions are deduced at compile time, which is incompatible with VLAs.
A possible alternative would be to use std::vector:
std::vector<std::vector<int>> gen0(col1, std::vector<int>(col2));
And
void print(const std::vector<std::vector<int>>& array)
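For reference, here is a small self-contained sketch (my own example) showing both working approaches side by side: the template for arrays whose dimensions are known at compile time, and std::vector for dimensions chosen at runtime:

#include <cstdlib>
#include <iostream>
#include <vector>

// Works only when R and C are known at compile time.
template<int R, int C>
void print(const int (&array)[R][C])
{
    for (int r = 0; r < R; ++r) {
        for (int c = 0; c < C; ++c)
            std::cout << array[r][c] << ' ';
        std::cout << '\n';
    }
}

// Works for dimensions decided at runtime.
void print(const std::vector<std::vector<int>>& array)
{
    for (const auto& row : array) {
        for (int v : row)
            std::cout << v << ' ';
        std::cout << '\n';
    }
}

int main()
{
    int fixed[2][3] = {{1, 2, 3}, {4, 5, 6}};
    print(fixed); // template deduces R = 2, C = 3

    int col1 = std::rand() % 40 + 1;
    int col2 = std::rand() % 50 + 1;
    std::vector<std::vector<int>> gen0(col1, std::vector<int>(col2));
    print(gen0);  // sizes known only at runtime
    return 0;
}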

Mysterious one-liner template code, anyone? [duplicate]

This question already has answers here:
Can someone explain this template code that gives me the size of an array? [duplicate]
(4 answers)
Closed 7 years ago.
I was reading this page: C++ Tip: How To Get Array Length. The writer presented a piece of code for finding the size of static arrays.
template<typename T, int size>
int GetArrLength(T(&)[size]) { return size; } // what does '(&)' mean?
// ...
int arr[17];
int arrSize = GetArrLength(arr); // arrSize = 17
Could anyone please shed some light on this code? I couldn't understand how it really works.
The function is passed a reference (&) to an array of element type T and size size.
sizeof(x)/sizeof(x[0])
won't catch errors if the array decays to a pointer type, but it will still compile! The templated version is bulletproof.
T(&)[size] is a reference to T[size]. If you don't use a reference, C++ will treat T[size] as T*, and template parameter deduction will not work.
Wow, that's tricky. I don't know either, but if you keep reading down into the comments on that page:
int arr[17]; int arrSize = GetArrLength(arr);
essentially creates this function:
int GetArrLength(int(&)[17]) { return 17; }
So & must mean reference like it always does: it's taking a reference to the array type, and the size (the second template parameter) is then the size of the incoming array.
Think I'll stick with the old
sizeof(x)/sizeof(x[0])
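To see the difference concretely, here is a small sketch (my own illustration) of how the sizeof trick silently gives the wrong answer once the array decays to a pointer, while the template version refuses to compile:

#include <iostream>

template<typename T, int size>
int GetArrLength(T (&)[size]) { return size; }

#define ARR_LEN(x) (sizeof(x) / sizeof((x)[0]))

void takesPointer(int* p)
{
    // The macro still compiles, but now computes sizeof(int*)/sizeof(int):
    std::cout << "sizeof trick on a pointer: " << ARR_LEN(p) << '\n'; // e.g. 2
    // GetArrLength(p); // compile error: size cannot be deduced from a pointer
}

int main()
{
    int arr[17];
    std::cout << GetArrLength(arr) << '\n'; // 17, deduced at compile time
    std::cout << ARR_LEN(arr) << '\n';      // 17, while arr is still an array
    takesPointer(arr);                      // arr decays to int* here
    return 0;
}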