Say I have a kernel:

__global__ void foo(int a, int b)
{
    __shared__ int array[a];
}

It seems a has to be a constant value. I added const in front of int, but it still didn't work out. Any idea?

__global__ void foo(const int a, const int b)
{
    __shared__ int array[a];
}
While you can't have a dynamically-sized array because of the constraints of the C language (as mentioned in other answers), what you can do in CUDA is something like this:
extern __shared__ float fshared[];

__global__ void testShmem( float * result, unsigned int shmemSize ) {
    // use fshared - shmemSize tells you how many bytes
    // Note that the following is not a sensible use of shared memory!
    for( int i = 0; i < shmemSize/sizeof(float); ++i ) {
        fshared[i] = 0;
    }
}
provided you tell CUDA how much shared memory you want during kernel invocation, like so:
testShmem<<<grid, block, 1024>>>( pdata, 1024 );
In ISO C++ the size of an array needs to be a so-called constant expression. This is stronger than a const-qualified variable. It basically means compile-time constant. So, the value has to be known at compile-time.
In ISO C90 this was also the case. C99 added VLAs, variable-length-arrays, that allow the size to be determined at runtime. The sizeof operator for these VLAs becomes a runtime operator.
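To illustrate the distinction (a sketch of my own, not from the original answer; runtime_value() is a hypothetical function whose result is only known at runtime):

int runtime_value(); // hypothetical: not known at compile time

void demo()
{
    const int a = 10;              // initialized with a constant expression
    int arr1[a];                   // OK in ISO C++: a is a compile-time constant
    const int b = runtime_value(); // const-qualified, but not a constant expression
    // int arr2[b];                // ill-formed in ISO C++; C99 would accept it as a VLA
    (void)arr1; (void)b;
}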
I'm not familiar with CUDA or the __shared__ syntax. It's not clear to me why/how you use the term kernel. But I guess the rules are similar w.r.t. constant expressions and arrays.
I don't think CUDA or OpenCL lets you size a __shared__ array from a runtime variable in its declaration. Use a #define macro instead.
If you need the array size to vary on a per-program basis, you can supply it using -D MYMACRO (with OpenCL; I don't know for CUDA). See Bahbar's answer.
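For instance, a sketch of the macro approach (SMEM_SIZE is an illustrative name; passing -D definitions on the command line is supported by both nvcc and OpenCL's runtime compiler):

#ifndef SMEM_SIZE
#define SMEM_SIZE 256 // default; override at build time with -D SMEM_SIZE=512
#endif

__global__ void foo(int a, int b)
{
    __shared__ int array[SMEM_SIZE]; // size fixed per compilation, not per launch
    array[threadIdx.x] = a + b;
}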
Here's how you can statically allocate a __shared__ array of n values in CUDA using C++ templates:

template <int n>
__global__ void kernel(...)
{
    __shared__ int array[n];
}

const int n = 128;
kernel<n><<<grid_size,block_size>>>(...);
Note that n must be a known constant at compile time for this to work. If n is not known at compile time then you must use the approach Edric suggests.
I suspect this is a C language question.
If it were C++, you could simply use std::vector.
void foo( int a, int b )
{
    std::vector<int> array( a );
    // ...
}

If it really is C++, then what C++ features you can use safely may depend on the environment. It's not clear what you mean by "kernel".
I am building an R package that contains a C++ program. The checking runs fine, but I am getting this message:

warning: ISO C++ forbids variable length array ‘s1’ [-Wvla]

The CRAN maintainer says the error is in the part of the code shown below. I am thinking that the argument "nrows" is redundant, but I wonder if there is another way to solve the problem.
double entCI(double input[], int cMatrix[], double partition,
             int nrows, int begin, int end)
{
    double s1[nrows], s2[nrows], entropy;
    int cs1[nrows], cs2[nrows];
    int s1Count=0, s2Count=0, sCount=0;

    while(input[begin]<partition)
    {
        cs1[s1Count]=cMatrix[begin];
        s1[s1Count++]=input[begin++];
    }
    while(begin<end)
    {
        cs2[s2Count]=cMatrix[begin];
        s2[s2Count++]=input[begin++];
    }
    sCount=s1Count+s2Count;
    entropy=(s1Count/double(sCount))*ent(s1,cs1,s1Count)
           +(s2Count/double(sCount))*ent(s2,cs2,s2Count);
    return entropy;
}
Indeed, the error is on these lines:
double s1[nrows], s2[nrows], entropy;
int cs1[nrows], cs2[nrows];
They declare arrays whose sizes depend on the nrows argument. The value of nrows is determined at runtime, so the arrays must be variable-length. Such arrays are not allowed by the C++ standard, as the warning tells you.
I am thinking that the argument "nrows" is redundant
I don't see how that is. It's used in the function.
but I wonder if there is another way to solve the problem
There are ways to solve the problem. If the size of the array needs to be determined at runtime, it must be allocated dynamically. The simplest and safest way to do that is to use std::vector.
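For example, here is a sketch of the same function with the VLAs replaced by std::vector (assuming ent() takes plain pointers, which .data() provides):

#include <vector>

double ent(double*, int*, int); // defined elsewhere in the package

double entCI(double input[], int cMatrix[], double partition,
             int nrows, int begin, int end)
{
    // heap-allocated, standard-conforming replacements for the VLAs
    std::vector<double> s1(nrows), s2(nrows);
    std::vector<int> cs1(nrows), cs2(nrows);
    int s1Count=0, s2Count=0, sCount=0;

    while(input[begin]<partition)
    {
        cs1[s1Count]=cMatrix[begin];
        s1[s1Count++]=input[begin++];
    }
    while(begin<end)
    {
        cs2[s2Count]=cMatrix[begin];
        s2[s2Count++]=input[begin++];
    }
    sCount=s1Count+s2Count;
    // .data() yields the raw pointers the original array version passed
    return (s1Count/double(sCount))*ent(s1.data(),cs1.data(),s1Count)
         + (s2Count/double(sCount))*ent(s2.data(),cs2.data(),s2Count);
}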
Generally, you should use dynamic memory allocation to create an array whose size comes from a variable:

double* s1 = new double[nrows];

Then remember to release that array with delete[] s1 when you are done. Another solution is to use std::vector instead of a plain array.
Variable-length arrays have long been a feature of gcc. They were accepted into C99, but not into C++11 (nor into any later C++ version that I know of).
An easy and clean solution would be to compile that function as C, because it does not use any C++-specific feature, just array manipulation. In fact, this function is plain C that happens to be accepted by g++ but is not correct C++, hence the warning.
My advice is:
put the function in a .c file and compile it in C99 mode
declare it as extern "C" double entCI(double input[], int cMatrix[], double partition, int nrows, int begin, int end); in the other C++ modules, or better, write an include file declaring it as:
#ifdef __cplusplus
extern "C" {
#endif
double entCI(double input[], int cMatrix[], double partition,
             int nrows, int begin, int end);
#ifdef __cplusplus
}
#endif
This code will not compile:
#ifndef RemoteControl_h
#define RemoteControl_h
#include "Arduino.h"
class RemoteControl
{
public:
    RemoteControl();
    ~RemoteControl();
    static void prev_track();
    static void next_track();
    static void play_pause_track();
    static void mute();
    static void vol_up();
    static void vol_down();
    void respond(int code);
    void add_code(int code, void (*func)());

private:
    boolean active = true;
    struct pair {
        int _code;
        void (*_func)();
    };
    const int max = 1000;
    int database_length = 0;
    pair database[max]; // This line doesn't compile unless I use a literal constant instead of "max"
};
#endif
But if I put the section below in the constructor for the class instead it works fine.
const int max = 1000;
int database_length = 0;
pair database[max];
Am I not allowed to declare an array within a class in C++ and use a constant member as the length? I am working in Arduino, if that makes a difference, but I expect I am misunderstanding something about the C++ language, since this is a standard .h file. Oh, and the problem isn't the .cpp file, because I completely removed it with the same results: it compiles with a literal constant length but not with the constant member.
In C or C++, try using malloc() from stdlib.h (<cstdlib> in C++). Don't forget free():

const int max = 1000;
struct pair *ptr = (struct pair *)malloc(sizeof(struct pair) * max); // allocate 1000 pairs
free(ptr); // when the memory is no longer needed
Let me first clear a few things up for you.
In C, a const variable is merely const-qualified; it is not a compile-time constant value (unlike an integer literal, which is). So, per the rules for normal array size specification, you cannot even use a const variable in this case.
In C, we may have the provision to use a VLA, which enables syntax like pair database[max] even if max is not a const variable, but that is an optional compiler feature (as per C11).
In C++, we can use a const variable as the size of an array, because in C++ a const integer variable initialized with a constant expression is itself a compile-time constant.
So, to answer your question:
In C, your code will be OK if your compiler supports VLAs, even if max is not const.
In C++, there is no VLA, though it may be supported as a GNU extension. Note that a const max is not enough here: a non-static data member is not a compile-time constant, so it cannot serve as the bound of a member array. Making it static const (or moving it out of the class) works.
The easiest fix is to just take the
const int max = 1000;
out of the class and put it above the class.
Even better would be to ensure that it is a compile-time constant like so:
constexpr int max = 1000;
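Alternatively (a sketch of my own, not from the original answers), the constant can stay inside the class if you make it a static member, since it is the non-static member that cannot serve as an array bound:

class RemoteControl
{
    // ... public interface as before ...
private:
    struct pair {
        int _code;
        void (*_func)();
    };
    static constexpr int max = 1000; // compile-time constant shared by all instances
    int database_length = 0;
    pair database[max];              // now a valid constant expression
};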
I want to call different instantiations of a templated CUDA kernel with dynamically allocated shared memory in one program. My first naive approach was to write:
template<typename T>
__global__ void kernel(T* ptr)
{
    extern __shared__ T smem[];
    // calculations here ...
}
template<typename T>
void call_kernel( T* ptr, const int n )
{
    dim3 dimBlock(n), dimGrid;
    kernel<<<dimGrid, dimBlock, n*sizeof(T)>>>(ptr);
}
int main(int argc, char *argv[])
{
    const int n = 32;
    float *float_ptr;
    double *double_ptr;
    cudaMalloc( (void**)&float_ptr, n*sizeof(float) );
    cudaMalloc( (void**)&double_ptr, n*sizeof(double) );
    call_kernel( float_ptr, n );
    call_kernel( double_ptr, n ); // problem, 2nd instantiation
    cudaFree( (void*)float_ptr );
    cudaFree( (void*)double_ptr );
    return 0;
}
However, this code cannot be compiled. nvcc gives me the following error message:
main.cu(4): error: declaration is incompatible with previous "smem"
(4): here
detected during:
instantiation of "void kernel(T *) [with T=double]"
(12): here
instantiation of "void call_kernel(T *, int) [with T=double]"
(24): here
I understand that I am running into a name conflict because the shared memory is declared as extern. Nevertheless, there is no way around that declaration if I want to set its size at runtime, as far as I know.
So, my question is: is there any elegant way to obtain the desired behavior? By elegant I mean without code duplication, etc.
Dynamically allocated shared memory is really just a size (in bytes) and a pointer being set up for the kernel. So something like this should work:
replace this:
extern __shared__ T smem[];
with this:
extern __shared__ __align__(sizeof(T)) unsigned char my_smem[];
T *smem = reinterpret_cast<T *>(my_smem);
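Applied to the kernel from the question, a sketch might look like this (the body line is illustrative):

template<typename T>
__global__ void kernel(T* ptr)
{
    // one untyped extern array; each instantiation reinterprets it as T*
    extern __shared__ __align__(sizeof(T)) unsigned char my_smem[];
    T *smem = reinterpret_cast<T *>(my_smem);
    smem[threadIdx.x] = ptr[threadIdx.x]; // calculations here ...
}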
You can see other examples of re-casting of dynamically allocated shared memory pointers in the programming guide which can serve other needs.
EDIT: updated my answer to reflect the comment by @njuffa.
(A variation on @RobertCrovella's answer)
NVCC is not willing to accept two extern __shared__ arrays of the same name but different types - even if they're never in each other's scope. We'll need to satisfy NVCC by having our template instances all use the same type for the shared memory under the hood, while letting the kernel code using them see the type it likes.
So we replace this instruction:
extern __shared__ T smem[];
with this one:
auto smem = shared_memory_proxy<T>();
where:
template <typename T>
__device__ T* shared_memory_proxy()
{
    // do we need an __align__() here? I don't think so...
    extern __shared__ unsigned char memory[];
    return reinterpret_cast<T*>(memory);
}
is in some device-side code include file.
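For context, a hypothetical kernel using the proxy (illustrative, not taken from the linked library; it still needs the dynamic shared memory size passed as the third <<<>>> launch argument):

template <typename T>
__global__ void kernel(T* ptr)
{
    auto smem = shared_memory_proxy<T>(); // typed view of the dynamic shared memory
    smem[threadIdx.x] = ptr[threadIdx.x];
}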
Advantages:
One-liner at the site of use.
Simpler syntax to remember.
Separation of concerns - whoever reads the kernel doesn't have to think about why they're seeing extern, or alignment specifiers, or a reinterpret_cast, etc.
Notes:
This is implemented as part of my CUDA kernel author's tools header-only library: shared_memory.cuh (where it's named shared_memory::dynamic::proxy() ).
I have not explored the question of alignment, when you use both dynamic and static shared memory.
There are tons of similar questions, but still I could not find any answer relevant for the feature of variable length arrays in C99/C11.
How to pass multidimensional variable length array to a function in C99/C11?
For example:
void foo(int n, int arr[][]) // <-- error here, how to fix?
{
}

void bar(int n)
{
    int arr[n][n];
    foo(n, arr);
}
Compiler (g++-4.7 -std=gnu++11) says:
error: declaration of ‘arr’ as multidimensional array must have bounds for all dimensions except the first
If I change it to int *arr[], the compiler still complains:
error: cannot convert ‘int (*)[(((sizetype)(((ssizetype)n) + -1)) + 1)]’ to ‘int**’ for argument ‘2’ to ‘void foo(int, int**)’
Next question, how to pass it by value and how to pass it by reference? Apparently, usually you don't want the entire array to be copied when you pass it to a function.
With constant length arrays it's simple, since, as the "constant" implies, you should know the length when you declare the function:
void foo2(int n, int arr[][10]) // <-- ok
{
}

void bar2()
{
    int arr[10][10];
    foo2(10, arr);
}
I know, passing arrays to functions like this is not a best practice, and I don't like it at all. It is probably better to do it with flat pointers, or objects (like std::vector), or some other way. But still, I'm a bit curious what the answer is here from a theoretical standpoint.
Passing arrays to functions is a bit funny in C and C++. There are no rvalues of array types, so you're actually passing a pointer.
To address a 2D array (a real one, not array of arrays), you'll need to pass 2 chunks of data:
the pointer to where it starts
how wide one row is
And these are two separate values, be it C or C++ or with VLA or without or whatnot.
Some ways to write that:
Simplest, works everywhere but needs more manual work
void foo(int width, int* arr) {
    arr[x + y*width] = 5;
}

VLA, standard C99

void foo(int width, int arr[][width]) {
    arr[x][y] = 5;
}

VLA w/ reversed arguments, forward parameter declaration (GNU C extension)

void foo(int width; int arr[][width], int width) {
    arr[x][y] = 5;
}

C++ w/ VLA (GNU C++ extension, terribly ugly)

void foo(int width, int* ptr) {
    typedef int arrtype[][width];
    arrtype& arr = *reinterpret_cast<arrtype*>(ptr);
    arr[x][y] = 5;
}
Big remark:
The [x][y] notation with a 2D array works because the array's type contains the width. No VLA = array types must be fixed at compile-time.
Hence: If you can't use VLA, then...
there's no way to handle it in C,
there's no way to handle it in C++ without a proxy class with operator overloading.
If you can use VLA (C99 or GNU C++ extensions), then...
you're in the green in C,
you still need a mess in C++, use classes instead.
For C++, boost::multi_array is a solid choice.
A workaround
For 2D arrays, you can make two separate allocations:
a 1D array of pointers to T (A)
a 2D array of T (B)
Then set the pointers in (A) to point into respective rows of (B).
With this setup, you can just pass (A) around as a simple T** and it will behave well with [x][y] indexing.
This solution is nice for 2D, but needs more and more boilerplate for higher dimensions. It's also slower than the VLA solution because of the extra layer of indirection.
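A sketch of that setup (alloc2d/free2d are illustrative names, not from the original answer):

int** alloc2d(int rows, int cols)
{
    int** A = new int*[rows];       // (A): 1D array of pointers to T
    int*  B = new int[rows * cols]; // (B): one contiguous block holding the 2D data
    for (int i = 0; i < rows; ++i)
        A[i] = B + i * cols;        // each pointer in (A) aims at a row of (B)
    return A;                       // pass this around as a plain int**
}

void free2d(int** A)
{
    delete[] A[0]; // A[0] points at the start of (B)
    delete[] A;
}

After int** arr = alloc2d(n, n);, plain arr[x][y] indexing works through the pointer table.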
You may also run into a similar solution with a separate allocation for every row of (B). In C this looks like malloc-in-a-loop, and is analogous to C++'s vector-of-vectors. However, this takes away the benefit of having the whole array in one block.
There is no clear-cut way of doing this, but you can use a workaround: treat the two-dimensional array as a one-dimensional array, and then convert it back to two dimensions inside the function.

#include <stdio.h>

void foo2(int n, int *arr)
{
    int *ptr; // use this as a marker for the start of each row
    int i;
    int j;
    for(i = 0; i < n; i++)
    {
        ptr = arr + i*n; // this is the start of row arr[i]
        for (j = 0; j < n; j++)
        {
            printf(" %d ", ptr[j]); // this is the same as arr[i][j]
        }
    }
}

void bar2()
{
    int arr[10][10];
    foo2(10, (int *)arr);
}
As stated in the problem, this is doable:

#include <iostream>

int main(int argc, char *argv[])
{
    unsigned short int i;
    std::cin >> i;
    unsigned long long int k[i][i];
}

Here I declared an array sized i by i; both dimensions are variables.
But not this:

#include <iostream>

int main(int argc, char *argv[])
{
    unsigned short int i;
    std::cin >> i;
    unsigned long long int** k = new int[i][i];
    delete[] k;
}
I got a compiler message telling me that:

error: only the first dimension of an allocated array may have dynamic size
I am forced to do this:

#include <iostream>

int main(int argc, char *argv[])
{
    unsigned short int i;
    std::cin >> i;
    unsigned long long int** k = new unsigned long long int*[i];
    for ( unsigned short int idx = 0 ; idx < i ; ++idx )
        k[idx] = new unsigned long long int[i];
    for ( unsigned short int idx = 0 ; idx < i ; ++idx )
        delete[] k[idx];
    delete[] k;
}
To my understanding, new and delete are used to allocate something on the heap rather than the stack, which won't be deleted when it goes out of scope, and which is useful for passing data across functions and objects, etc.
What I don't understand is what happens when I declare that k in the first example. I am told that declared arrays should (and could) only have constant dimensions, and that when in need of an array of unknown size, one should always consider new & delete or vectors.
Are there any pros and cons to those two solutions that I'm not getting, or is it just what it is?
I'm using Apple's LLVM compiler by the way.
Neither form is C++ standard compliant, because the standard does not support variable-length arrays (VLAs) (interestingly, C99 does - but C is not C++). However, several compilers have an extension to support this, including your compiler:
From Clang's Manual:
Clang supports such variable length arrays in very limited circumstances for compatibility with GNU C and C99 programs:
The element type of a variable length array must be a POD ("plain old data") type, which means that it cannot have any user-declared constructors or destructors, any base classes, or any members of non-POD type. All C types are POD types.
Variable length arrays cannot be used as the type of a non-type template parameter.
But given that the extension is in place, why doesn't your second snippet work? That's because VLA only applies to automatic variables - that is, arguments or local variables. k is automatic but it's just a pointer - the array itself is defined by new int[i][i], which allocates on the heap and is decidedly not an automatic variable.
You can read more about this on the relevant GCC manual section.
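A standard-compliant alternative (a sketch of my own, not from the original answer) is a single flat heap allocation with manual 2D indexing:

#include <iostream>

int main()
{
    unsigned short int i;
    std::cin >> i;
    // one contiguous block; element [x][y] lives at index x*i + y
    unsigned long long int* k = new unsigned long long int[1ull * i * i];
    if (i > 4)
        k[2 * i + 4] = 42; // the element that k[2][4] would denote
    delete[] k;
}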
I'm sure you can easily find an implementation of 2D-array functionality, but you can also make your own class. The simplest way is to use std::vector to hold the data and have an index-mapping function that takes your two coordinates and returns a single index into the vector.
The client code will look a little different: instead of arr[x][y] you have arr.at(x,y), but otherwise it does the same thing. You do not have to fiddle with memory management, as that is done by std::vector; just call v.resize(N*N) in the constructor or a dimension-setting function.
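A minimal sketch of such a wrapper (class and member names are illustrative):

#include <cstddef>
#include <vector>

class Array2D {
public:
    void resize(int n) { n_ = n; v_.resize(static_cast<std::size_t>(n) * n); }
    // index-mapping function: two coordinates -> one index into the vector
    int& at(int x, int y) { return v_[static_cast<std::size_t>(x) * n_ + y]; }
private:
    int n_ = 0;
    std::vector<int> v_;
};

Client code then writes arr.at(x, y) = 5; where it would have written arr[x][y] = 5;.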
Essentially, what compilers generally do with two-dimensional arrays (fixed or variable) is this:

int arr[x][y];         ---> int arr[x*y];
arr[2][4] = something; ---> arr[2*y + 4] = something; // row-major: index = row*width + column

Basically, they are just a nicer notation for a one-dimensional array (on the stack). Most compilers require fixed sizes, so the compiler has an easy way of telling what the dimensions are (and thus what to multiply by). It appears you have a compiler that can keep track of the dimensions (and multipliers) even when you use variables.
Of course you can mimic that with new[] yourself too, but it's not supported by the compiler per se.
Probably for the same reason, i.e. because it would be even harder to keep track of the dimensions, especially when moving the pointers around.
E.g. with a new-ed pointer you could later write:

newarr = someotherarray;

and someotherarray could have entirely different dimensions. If the compiler did a 2-dim -> 1-dim translation, it would have to track all possible size transitions.
With the stack-allocated arr above, this isn't necessary, because once the compiler has created it, it stays that size.