I'm trying to get unified memory to work with classes, and to pass and manipulate arrays in unified memory with kernel calls. I want to pass everything by reference.
So I'm overloading operator new for classes and arrays so they are accessible by the GPU. I think I need to add more code to have arrays in unified memory, but I'm not quite sure how to do this. I get a memory access error when the fillArray() method is called.
If I have to do these sorts of operations (arithmetic on arrays and copying between different sized arrays) hundreds of times, is unified memory a good approach or should I stick with manually copying between cpu and gpu memory? Thank you very much!
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <iostream>
#include <stdio.h>
#define TILE_WIDTH 4
#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif
__global__ void add1(int height, int width, int *a, int *resultArray)
{
int w = blockIdx.x * blockDim.x + threadIdx.x; // Col // width
int h = blockIdx.y * blockDim.y + threadIdx.y;
int index = h * width + w;
if ((w < width) && (h < height))
resultArray[index] = a[index] + 1;
}
class Managed
{
public:
void *operator new(size_t len)
{
void *ptr;
cudaMallocManaged(&ptr, len);
return ptr;
}
void operator delete(void *ptr)
{
cudaFree(ptr);
}
void* operator new[] (size_t len) {
void *ptr;
cudaMallocManaged(&ptr, len);
return ptr;
}
void operator delete[] (void* ptr) {
cudaFree(ptr);
}
};
class testArray : public Managed
{
public:
testArray()
{
height = 16;
width = 8;
myArray = new int[height*width];
}
~testArray()
{
delete[] myArray;
}
CUDA_CALLABLE_MEMBER void runTest()
{
fillArray(myArray);
printArray(myArray);
dim3 dimGridWidth((width - 1) / TILE_WIDTH + 1, (height - 1)/TILE_WIDTH + 1, 1);
dim3 dimBlock(TILE_WIDTH, TILE_WIDTH, 1);
add1<<<dimGridWidth,dimBlock>>>(height, width, myArray, myArray);
cudaDeviceSynchronize();
printArray(myArray);
}
private:
int *myArray;
int height;
int width;
void fillArray(int *myArray)
{
for (int i = 0; i < height; i++){
for (int j = 0; j < width; j++)
myArray[i*width+j] = i*width+j;
}
}
void printArray(int *myArray)
{
for (int i = 0; i < height; i++){
for (int j = 0; j < width; j++)
printf("%i ",myArray[i*width+j]);
printf("\n");
}
}
};
int main()
{
testArray *test = new testArray;
test->runTest();
//testArray test;
//test.runTest();
system("pause");
return 0;
}
I want to pass everything by reference so there's no copying.
__global__ void add1(int height, int width, int *&a, int *&resultArray)
Passing a pointer by reference has one use: to modify (reseat) the pointer in the caller's scope, which you do not do. So the references are superfluous here. In fact, they are a pessimization, because they introduce another level of indirection. Use the following signature instead:
__global__ void add1(int height, int width, int* a, int* resultArray)
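The difference between the two parameter styles can be seen in plain host code. This is a minimal sketch, not the kernel itself; add_one and reseat are hypothetical helper names used only for illustration.

```cpp
// Pointer passed by value: the function can change the ints it points to,
// but it cannot change which array the caller's pointer refers to.
void add_one(int* a, int n) {
    for (int i = 0; i < n; ++i) {
        a[i] += 1;
    }
}

// Pointer passed by reference: only needed when the caller's pointer
// itself must be changed (reseated).
void reseat(int*& p, int* target) {
    p = target;
}
```

Since add1 only reads and writes the elements, the by-value form is sufficient; the pointed-to data is not copied in either case.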
This compiles and runs, but it seems that the +1 operation never occurs. Why is this?
I know I should have catch error statements, this code is just a simple example.
Well, it's really unfortunate, because adding proper error checking would probably have helped you find the error. In the future, consider adding error checking before asking on SO.
Your kernel expects its pointer arguments to refer to an address space it can access. That means each one must be a pointer obtained through a call to one of the cudaMalloc variants.
But what are you passing?
myArray = new int[height*width]; // Not a cudaMalloc* variant
[...]
add1<<<dimGridWidth,dimBlock>>>(height, width, myArray, myArray);
Therefore the pointer you pass to your kernel has no meaning, because it is not in a "CUDA address space". Your kernel probably segfaults immediately.
I think your confusion may arise from the fact that the enclosing class of myArray (testArray) inherits from Managed. This means that new testArray will allocate a testArray in GPU-accessible address space, but it doesn't mean that using operator new on that class's members will allocate them in that address space, too. They too need to be allocated through cudaMalloc* (for example, although not required, through an overloaded operator new that forwards the allocation to cudaMallocManaged). A simple solution is to allocate your array not with new but like this:
cudaMallocManaged(&myArray, width * height * sizeof(*myArray));
Replace the corresponding call to delete with cudaFree.
Additionally:
testArray test;
This does not allocate test on GPU-accessible space, because it is not allocated through operator new.
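Both points can be reproduced without a GPU. The following is a minimal plain C++ sketch under stated assumptions: Tracked, Holder and the counter are hypothetical stand-ins for Managed and testArray, and std::malloc stands in for cudaMallocManaged.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Counts allocations that go through Tracked's class-scope operator new.
static int tracked_allocations = 0;

struct Tracked {
    static void* operator new(std::size_t len) {
        ++tracked_allocations;
        void* p = std::malloc(len);   // stand-in for cudaMallocManaged
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void* ptr) { std::free(ptr); }
};

struct Holder : Tracked {
    int* values;
    // int is not a class type, so this uses the global operator new[],
    // not the operator new inherited from Tracked.
    Holder() : values(new int[8]) {}
    ~Holder() { delete[] values; }
    Holder(const Holder&) = delete;
    Holder& operator=(const Holder&) = delete;
};

// Tracked allocations caused by creating one Holder with new.
int count_for_heap_object() {
    tracked_allocations = 0;
    Holder* h = new Holder;   // only the Holder object itself is tracked
    delete h;
    return tracked_allocations;
}

// Tracked allocations caused by creating one Holder on the stack.
int count_for_stack_object() {
    tracked_allocations = 0;
    { Holder h; }             // automatic storage never calls operator new
    return tracked_allocations;
}
```

count_for_heap_object() returns 1 and count_for_stack_object() returns 0: the member array never passes through the class's allocator, and a stack object bypasses it entirely.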
Is there a way to tell the compiler that I've allocated a memory of size N * M and I wanna treat this pointer as N * M array? In other words, is there a way to write something like this?:
int arr[N][M] = (int[N][M])malloc(N * M * sizeof(int));
arr[x][y] = 123;
I know that the compiler doesn't know the dimensions of the array; all it knows is that it's a pointer. So my question is: can I somehow tell the compiler that this pointer returned by malloc is an array pointer and its dimensions are N * M? I can use an array of pointers, a pointer to arrays or a pointer to pointers, but in all cases I'll have to look up 2 addresses. I want to have contiguous memory on the heap and treat it as a multidimensional array, just like how I would write:
int arr[N][M];
Is there any way to achieve that?
In a C++ program you should use the operator new.
As for malloc, in C++ M must be a constant expression if you want to allocate a two-dimensional array this way.
You can write for example
int ( *arr )[M] = ( int ( * )[M] )malloc( N * M * sizeof(int) );
or
int ( *arr )[M] = ( int ( * )[M] )malloc( sizeof( int[N][M] ) );
If you use operator new, the allocation can look like
int ( *arr )[M] = new int[N][M];
If M is not a compile-time constant then you can use the standard container std::vector as it is shown in the demonstrative program below
#include <iostream>
#include <vector>
int main()
{
size_t n = 10, m = 10;
std::vector<std::vector<int>> v( n, std::vector<int>( m ) ); // n rows of m zero-initialized ints
return 0;
}
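For the compile-time-constant case, the pointer-to-array form really is one contiguous block. Here is a small sketch that checks this; Row, make_rows, byte_offset and free_rows are hypothetical names, and M is fixed at 4 purely for illustration.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t M = 4;   // row length, must be a compile-time constant
using Row = int[M];            // one row of the 2D array

// Allocates n rows of M ints as a single contiguous, zero-initialized block.
Row* make_rows(std::size_t n) {
    return new int[n][M]();
}

// Distance in bytes from the start of the block to element [r][c].
std::size_t byte_offset(Row* arr, std::size_t r, std::size_t c) {
    auto base = reinterpret_cast<std::uintptr_t>(arr);
    auto elem = reinterpret_cast<std::uintptr_t>(&arr[r][c]);
    return static_cast<std::size_t>(elem - base);
}

// Releases the block allocated by make_rows.
void free_rows(Row* arr) {
    delete[] arr;
}
```

Element [r][c] sits (r * M + c) * sizeof(int) bytes into the block, so arr[x][y] needs only one address computation and no second pointer lookup.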
What you want is a "matrix" class like
#include <cstddef>
#include <iostream>
#include <vector>
template <typename T>
class matrix
{
size_t len;
size_t width;
std::vector<T> data;
public:
matrix(size_t len, size_t width) : len(len), width(width), data(len*width) {}
T& operator()(size_t row, size_t col) { return data[width * row + col]; }
const T& operator()(size_t row, size_t col) const { return data[width * row + col]; }
size_t size() const { return len * width; }
};
int main(int argc, char const *argv[])
{
matrix<int> m(5, 7);
m(3, 3) = 42;
std::cout << m(3, 3);
}
This keeps all of the data in a single contiguous buffer, and doesn't have any undefined behavior, unlike all the other examples that use malloc. It's also RAII, and you don't have to write any copy or move constructors since all of the members "do the right thing" with the compiler-provided defaults. You can make this class more complicated, provide a proxy object so you can overload operator[] and be able to do m[][], but at its base this is what you want.
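One way the m[][] syntax mentioned here could be sketched is shown next. It is an assumption-laden simplification: indexed_matrix is a hypothetical name, and a raw pointer to the start of the row serves as the simplest possible "proxy", so the second [] is unchecked pointer indexing.

```cpp
#include <cstddef>
#include <vector>

template <typename T>
class indexed_matrix {
    std::size_t len;
    std::size_t width;
    std::vector<T> data;   // single contiguous row-major buffer
public:
    indexed_matrix(std::size_t len, std::size_t width)
        : len(len), width(width), data(len * width) {}

    // Returns a pointer to the first element of the row; applying [] to that
    // pointer selects the column, which gives the m[row][col] syntax.
    T* operator[](std::size_t row) { return data.data() + width * row; }
    const T* operator[](std::size_t row) const { return data.data() + width * row; }

    // The original call-style accessor still works alongside operator[].
    T& operator()(std::size_t row, std::size_t col) { return data[width * row + col]; }
    const T& operator()(std::size_t row, std::size_t col) const { return data[width * row + col]; }

    std::size_t size() const { return len * width; }
};
```

With this, m[3][3] = 42 and m(3, 3) refer to the same element. A dedicated proxy class would be needed if the column index should be bounds-checked.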
If you want to avoid using the stack and you need a single large block of data to hold a two-dimensional array of constant size (known at compile time), then this is the cleanest way to do it:
std::vector<std::array<int, M>> arr(N);
arr[x][y] = 3;
Now if M is a value known only at run-time, it would be best to use boost::multi_array.
I do not see a reason to use malloc.
You can do exactly what you want with a helper function. This lets you specify the array size at runtime, and it uses malloc as requested (although typically you should be using new):
#include <iostream>
#include <cstdlib> // malloc, free, system
template <class T>
T** Get2DMalloc(size_t m, size_t n) {
T** ret = (T**)malloc(sizeof(T*) * m);
for (size_t i = 0; i < m; ++i) {
ret[i] = (T*)malloc(sizeof(T) * n);
}
return ret;
}
template <class T>
void Free2D(T** arr, size_t m, size_t n) {
for (size_t i = 0; i < m; ++i) {
free(arr[i]);
}
free(arr);
}
int main() {
size_t m = 3;
size_t n = 3;
int** a = Get2DMalloc<int>(m, n);
for (size_t x = 0; x < m; ++x) {
for (size_t y = 0; y < n; ++y) {
a[x][y] = static_cast<int>(x * n + y); // row-major index uses the row length n
}
}
for (size_t i = 0; i < m * n; ++i) {
std::cout << a[i / n][i % n] << std::endl;
}
Free2D<int>(a, m, n);
system("pause");
return 0;
}
In my code I create a function outside of the main, which creates a 1D array and initializes to 0.
void create_grid(double *&y, int Npoints)
{
y = new double[Npoints];
for (int i = 0; i < Npoints; i++)
{
y[i] = 0;
}
}
If I didn't have the syntax of declaring in the function as double *&y I couldn't access the values of y.
I tried doing the same for a 2D array but I don't know the syntax. I tried &&**y and &*&*y but it didn't work. Does anyone know how to create a function outside of the main, which initializes a 2D dynamic array so I can use it in the main?
E.g.:
void create_grid_2D(double **&&y, int Npoints1, int Npoints2)
{
y = new double*[Npoints1];
for (int i = 0; i < Npoints1; i++)
{
y[i] = new double[Npoints2];
}
for (int i = 0; i < Npoints1; i++)
{
for (int j = 0; j < Npoints2; j++)
{
y[i][j] = 0;
}
}
}
int main()
{
int N = 10;
double **z;//correcting this line, i wrote z**
create_grid_2D(z, N, N);
delete[]z;
return 0;
}
C++ does not allow forming a pointer to reference or reference to reference. (And without a space between the characters, && is a single token meaning something entirely different.)
And your declaration double z**; is incorrect - you probably mean double **z;.
To write a function that takes the argument double **z by reference, you just want a reference to pointer to pointer:
void create_grid_2D(double **&y,int Npoints1,int Npoints2)
{
//...
}
Except don't use new and delete. Using them slightly wrong leads to memory leaks and bugs with dangling pointers and double deletes. For example, you tried to clean up your memory in main with delete []z;, but new-expressions were evaluated 11 times to your one delete-expression, so this misses out on deleting the row arrays z[0], z[1], ... z[9]. There's pretty much always a better and simpler way using std::unique_ptr, std::shared_ptr, std::vector, or other RAII (Resource Acquisition Is Initialization) tools.
So I would change the function to:
void create_grid_2D(std::vector<std::vector<double>>& y,
unsigned int Npoints1,
unsigned int Npoints2)
{
y.assign(Npoints1, std::vector<double>(Npoints2, 0.0));
}
int main()
{
unsigned int N=10;
std::vector<std::vector<double>> z;
create_grid_2D(z, N, N);
// No manual cleanup necessary.
}
Or even use a return value rather than assigning an argument:
std::vector<std::vector<double>> create_grid_2D(
unsigned int Npoints1,
unsigned int Npoints2)
{
return std::vector<std::vector<double>>(
Npoints1, std::vector<double>(Npoints2, 0.0));
}
int main()
{
unsigned int N=10;
std::vector<std::vector<double>> z = create_grid_2D(N, N);
}
An easy trick to resolve/write such complicated references is (simplified version for the sake of this problem - it's a bit more complicated with braces present): start from the variable name and go to the left, step by step. In your case:
... y
y is ...
... & y
y is a reference ...
... *& y
y is a reference to a pointer ...
... **& y
y is a reference to a pointer to a pointer ...
double**& y
y is a reference to a pointer to a pointer to a double
So, the correct definition is:
void create_grid_2D(double**& y,int Npoints1,int Npoints2)
But as mentioned in the comments, please do really consider avoiding raw pointers in favor of std::vector and other standard containers.
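If raw pointers have to stay, the allocation needs a matching cleanup that frees every row before the row table. This is a minimal sketch; destroy_grid_2D is a hypothetical helper name that does not appear in the question.

```cpp
// Allocates Npoints1 rows of Npoints2 doubles and zero-initializes them.
// y is a reference to the caller's double**, so the caller sees the result.
void create_grid_2D(double**& y, int Npoints1, int Npoints2) {
    y = new double*[Npoints1];
    for (int i = 0; i < Npoints1; i++) {
        y[i] = new double[Npoints2]();   // () value-initializes the row to 0.0
    }
}

// Frees what create_grid_2D allocated: each row first, then the row table.
void destroy_grid_2D(double**& y, int Npoints1) {
    for (int i = 0; i < Npoints1; i++) {
        delete[] y[i];
    }
    delete[] y;
    y = nullptr;   // prevent accidental reuse of the dangling pointer
}
```

In main this would replace the lone delete[] z with destroy_grid_2D(z, N), so every new-expression is paired with a delete-expression.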
So you want a reference to a pointer to a pointer.
A 2D pointer is double**, and a reference to it is double**&. That's what you want to use.
Better still, you should use a container or at least a smart pointer instead.
This approach would be a little different than what you currently have but basically you want a 2D grid and another name for this is simply a MxN Matrix! We can do this very easily with a simple template structure. This template class will hold all of the contents without having to put the data into dynamic memory directly. Then once you have your class object that you want to use we can then put that into dynamic memory with the use of smart pointers!
#include <iostream>
#include <memory>
template<class T, unsigned M, unsigned N>
class Matrix {
static const unsigned Row = M;
static const unsigned Col = N;
static const unsigned Size = Row * Col;
T data[Size] = {};
public:
Matrix() {};
Matrix( const T* dataIn ) {
fillMatrix( dataIn );
}
void fillMatrix( const T* dataIn );
void printMatrix() const;
};
template<class T, unsigned M, unsigned N>
void Matrix<T, M, N>::fillMatrix( const T* dataIn ) {
for ( unsigned i = 0; i < Size; i++ ) {
this->data[i] = dataIn[i];
}
}
template<class T, unsigned M, unsigned N>
void Matrix<T,M,N>::printMatrix() const {
for ( unsigned i = 0; i < Row; i++ ) {
for ( unsigned j = 0; j < Col; j++ ) {
std::cout << this->data[i*Col + j] << " ";
}
std::cout << '\n';
}
}
int main() {
// our 1D array of data
double data[6] = { 1,2,3,4,5,6 };
// create and print a MxN matrix - in memory still a 1D array but represented as 2D array
Matrix<double,2,3> A;
A.fillMatrix( data );
A.printMatrix();
std::cout << '\n';
Matrix<double, 3,2> B( data );
B.printMatrix();
std::cout << '\n';
// Want this in dynamic memory? With shared_ptr the memory is handled for you
// and is cleaned up upon its destruction. This helps to eliminate memory leaks
// and dangling pointers. The element type must match data (double).
std::shared_ptr<Matrix<double,2,3>> pMatrix = std::make_shared<Matrix<double,2,3>>( data );
pMatrix->printMatrix();
return 0;
}
Output:
1 2 3
4 5 6

1 2
3 4
5 6

1 2 3
4 5 6
Please take a look at the code below:
#include <iostream>
using namespace std;
int main(){
char matrix[2][2][2];
return 0;
}
int getMatrixData(char matrix[][2][2], int x, int y, int z) {
return matrix[x][y][z];
}
When a 3D array such as matrix is passed to a function as a parameter, why is it OK not to specify the first [] size? How can this missing dimension be explained?
Your code is syntactically incorrect. Assuming you meant int getMatrixData(char matrix[][2][2], int x, int y, int z).
When you pass an array argument to a function, the array decays to a pointer to its first element (of type char [2][2] in this case).
Array and pointer syntax overlap, so the code looks much the same either way.
When a multidimensional array is passed, for example the 3D array in your case, it can be seen as an array of 2D arrays. So you need to give the type of each element, char [2][2] in your case, and you can skip the outermost (first) dimension, as the array will decay to a pointer anyway. char [2][2] is the information the compiler needs to compute the offset of each element.
offset of matrix[x][y][z] = base address of matrix +
x * sizeof(char [2][2]) +
y * sizeof(char [2]) +
z
If you don't pass the inner dimensions (the element type), the compiler can't resolve the sizeof terms in the equation above. Only the outermost dimension is optional.
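The offset formula can be checked against what the compiler actually does. In this small sketch, predicted_offset and actual_offset are hypothetical helper names; the addresses are compared as integers to measure the byte distance.

```cpp
#include <cstddef>
#include <cstdint>

// Byte offset predicted by the formula: only the inner dimensions appear.
std::size_t predicted_offset(std::size_t x, std::size_t y, std::size_t z) {
    return x * sizeof(char[2][2]) + y * sizeof(char[2]) + z;
}

// Byte offset measured from the parameter, whose real type is char (*)[2][2].
std::size_t actual_offset(char matrix[][2][2], std::size_t x, std::size_t y, std::size_t z) {
    auto base = reinterpret_cast<std::uintptr_t>(matrix);
    auto elem = reinterpret_cast<std::uintptr_t>(&matrix[x][y][z]);
    return static_cast<std::size_t>(elem - base);
}
```

For a char matrix[2][2][2], both functions give 7 for the element [1][1][1], that is 1*4 + 1*2 + 1. The size of the first dimension never enters the calculation, which is why it can be left out of the parameter declaration.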
In C++ I would use multidimensional arrays in a different way. There are many topics on the internet about it.
This topic explains how you could do it using a char***. E.g.:
char getMatrixData(char*** matrix, int x, int y, int z)
{
return matrix[x][y][z];
}
int main()
{
char ***matrix = new char**[2];
for (auto i = 0; i < 2; i++)
{
matrix[i] = new char*[2];
for (auto j = 0; j < 2; j++)
{
matrix[i][j] = new char[2];
}
}
getMatrixData(matrix, 1, 1, 1);
// N.B.! you should normally free the memory using delete []!!
// But in this case the program ends, so the memory is freed anyhow.
return 0;
}
But you could also use the std::vector type
#include <vector>
using std::vector;
using CharVector1D = vector<char>;
using CharVector2D = vector<CharVector1D>;
using CharVector3D = vector<CharVector2D>;
char getMatrixData(CharVector3D const& matrix, int x, int y, int z)
{
return matrix[x][y][z];
}
int main()
{
CharVector3D matrix(2, CharVector2D(2, CharVector1D(2)));
getMatrixData(matrix, 1, 1, 1);
return 0;
}
However, C++ is supposed to be an object-oriented programming language. So it is probably better to define a matrix object.
#include <vector>
using std::vector;
template <class T>
class Matrix3D
{
private:
size_t _sizeX;
size_t _sizeY;
size_t _sizeZ;
vector<T> _data;
public:
Matrix3D(size_t const x_size, size_t const y_size, size_t const z_size)
: _sizeX(x_size)
, _sizeY(y_size)
, _sizeZ(z_size)
, _data(vector<T> (x_size*y_size*z_size))
{}
T GetData(size_t const x, size_t const y, size_t const z) const
{
return _data.at(x + (_sizeX * (y + (_sizeY * z))));
}
};
int main()
{
Matrix3D<char> matrix(2, 2, 2);
matrix.GetData(1, 1, 1);
return 0;
}
I am trying to do the following:
in main.cpp:
// Create an array of pointers to Block objects
Block *blk[64];
for (i=0; i<8; i++) {
for (j=0; j<8; j++) {
int x_low = i*80;
int y_low = j*45;
blk[j*8+i] = new Block(30, x_low+40.0f, y_low+7.5f, &b);
}
}
And then I am trying to pass it to the graphics object I have created:
Graphics g(640, 480, &b, &p, blk[0], number_of_blocks);
the graphics constructor looks like:
Graphics::Graphics(int width, int height, Ball *b, Paddle *p, Block *blk, int number_of_blocks) {
if I look at what is contained in the array from the graphics object, only the first item exists and then all the other items are in hyperspace:
for (int i=0; i<64; i++) {
printf("for block %d, %f, %f ", i, (_blk+(sizeof(_blk)*i))->_x_low, (_blk+(sizeof(_blk)*i))->_y_low);
printf("blah %d\n", (_blk+(sizeof(_blk)*i)));
}
and if I look at the addresses, they are different (6956552 rather than 2280520 when I examine the addresses in the main class using:
printf(" blah %d\n", &blk[j*8*i]);
I am sure there must be something subtle I am doing wrong, as it's like I have copied the first item from the blk array to a new address when passed to the graphics object.
Does this make sense? Any ideas?
Cheers,
Scott
If you want to pass the whole array, the constructor should look like this:
Graphics::Graphics(int width, int height, Ball *b, Paddle *p,
Block **blk, int number_of_blocks)
and you should pass the array like this:
Graphics g(640, 480, &b, &p, blk, number_of_blocks);
It looks like:
Graphics::Graphics(int width, int height, Ball *b, Paddle *p, Block *blk, int number_of_blocks) {
is expecting an array of Blocks, not an array of pointers to Blocks. Passing the first element would probably work if you made number_of_blocks 1, but it can't work for anything else using your current data structure. If I were you, I would give up on using arrays and use std::vector instead - it will greatly simplify your code.
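A rough sketch of the std::vector approach follows. The real Block class is not shown in the question, so the Block struct here is a hypothetical stand-in with just the two fields that get printed, and make_blocks and block_at are illustrative names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's Block class.
struct Block {
    float _x_low;
    float _y_low;
};

// Builds the 8x8 grid of blocks in one contiguous vector.
std::vector<Block> make_blocks() {
    std::vector<Block> blocks(64);
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++) {
            blocks[j * 8 + i] = Block{i * 80.0f, j * 45.0f};
        }
    }
    return blocks;
}

// The receiver takes the whole vector by const reference: no elements are
// copied, and the size travels with the container.
const Block& block_at(const std::vector<Block>& blocks, std::size_t index) {
    return blocks.at(index);   // at() throws std::out_of_range on a bad index
}
```

The Graphics constructor would then take a const std::vector<Block>& in place of the Block* and number_of_blocks pair, and plain blocks[i] indexing replaces the manual pointer arithmetic.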
The Graphics function is expecting a contiguous array of Block objects in memory and you are creating each new Block independently. Try
Block* blk = new Block[64];
then loop through and initialize each Block's values. This will only work if you can initialize the block objects in another way from using the constructor with arguments since new in this case can only call the default constructor. If the only way to initialize a Block is using the constructor with arguments, you'll have to do something else like passing Block** to the function.
From what I can see, you are passing the first element of the array to the constructor, not the whole array. This is what you are doing:
#include <iostream>
#include <cstdlib>
void foo(int* item, const int length);
int main() {
const int length = 10;
int* array[length];
for (int i = 0; i < length; ++i) {
array[i] = new int(i + 100);
}
foo(array[0], length);
return (EXIT_SUCCESS);
}
void foo(int* item, const int length) {
for (int i = 0; i < length; ++i) {
std::cout << item[i] << std::endl;
}
}
I believe this is what you wanted to do:
#include <iostream>
#include <cstdlib>
void foo(int** array, const int length);
int main() {
const int length = 10;
int* array[length];
for (int i = 0; i < length; ++i) {
int* item = new int(i + 100);
array[i] = item;
}
foo(array, length);
return (EXIT_SUCCESS);
}
void foo(int** array, const int length) {
for (int i = 0; i < length; ++i) {
int* item = array[i];
std::cout << *item << std::endl;
}
}
Regards,
Rafael.
This question builds off of a previously asked question:
Pass by reference multidimensional array with known size
I have been trying to figure out how to get my functions to play nicely with 2d array references. A simplified version of my code is:
unsigned int ** initialize_BMP_array(int height, int width)
{
unsigned int ** bmparray;
bmparray = (unsigned int **)malloc(height * sizeof(unsigned int *));
for (int i = 0; i < height; i++)
{
bmparray[i] = (unsigned int *)malloc(width * sizeof(unsigned int));
}
for(int i = 0; i < height; i++)
for(int j = 0; j < width; j++)
{
bmparray[i][j] = 0;
}
return bmparray;
}
I don't know how I can re-write this function so that it will work where I pass bmparray in as an empty unsigned int ** by reference so that I could allocate the space for the array in one function, and set the values in another.
Use a class to wrap it, then pass objects by reference
class BMP_array
{
public:
BMP_array(int height, int width)
: buffer(NULL)
{
buffer = (unsigned int **)malloc(height * sizeof(unsigned int *));
for (int i = 0; i < height; i++)
{
buffer[i] = (unsigned int *)malloc(width * sizeof(unsigned int));
}
}
~BMP_array()
{
// TODO: free() each buffer
}
unsigned int ** data()
{
return buffer;
}
private:
// TODO: Hide or implement copy constructor and operator=
unsigned int ** buffer;
};
typedef unsigned int ** array_type;
void initialize_BMP_array(array_type& bmparray, int height, int width);
Mmm... maybe I don't understand your question well, but in C you can pass "by reference" by adding another level of pointer indirection. That is, pass a pointer to the double pointer bmparray itself:
unsigned int ** initialize_BMP_array(int height, int width, unsigned int *** bmparray)
{
/* Note the first asterisk */
*bmparray = (unsigned int **)malloc(height * sizeof(unsigned int *));
...
the rest is the same but with a level of indirection
return *bmparray;
}
So the memory for the bmparray is reserved inside the function (and then, passed by reference).
Hope this helps.
To use the safer and more modern C++ idiom, you should be using vectors rather than dynamically allocated arrays.
void initialize_BMP_array(vector<vector<unsigned int> > &bmparray, int height, int width);
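A possible definition for that declaration is sketched below, split into the two steps the question asks for. fill_BMP_array is a hypothetical helper name added for illustration.

```cpp
#include <vector>
using std::vector;

// Allocation step: sizes bmparray to height x width and zero-fills it.
// The vector is passed by reference, so the caller's object is modified.
void initialize_BMP_array(vector<vector<unsigned int> >& bmparray, int height, int width) {
    bmparray.assign(height, vector<unsigned int>(width, 0u));
}

// Separate step that sets values in the already-allocated array.
void fill_BMP_array(vector<vector<unsigned int> >& bmparray, unsigned int value) {
    for (auto& row : bmparray) {
        for (auto& pixel : row) {
            pixel = value;
        }
    }
}
```

The caller declares an empty vector<vector<unsigned int> >, passes it to initialize_BMP_array, and later hands the same object to any function that sets values. No free() calls are needed because the vectors release their memory when they go out of scope.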