This question builds off of a previously asked question:
Pass by reference multidimensional array with known size
I have been trying to figure out how to get my functions to play nicely with 2d array references. A simplified version of my code is:
unsigned int ** initialize_BMP_array(int height, int width)
{
unsigned int ** bmparray;
bmparray = (unsigned int **)malloc(height * sizeof(unsigned int *));
for (int i = 0; i < height; i++)
{
bmparray[i] = (unsigned int *)malloc(width * sizeof(unsigned int));
}
for(int i = 0; i < height; i++)
for(int j = 0; j < width; j++)
{
bmparray[i][j] = 0;
}
return bmparray;
}
I don't know how to rewrite this function so that bmparray is passed in by reference as an empty unsigned int **. That way I could allocate the space for the array in one function and set the values in another.
Use a class to wrap it, then pass objects by reference
class BMP_array
{
public:
BMP_array(int height, int width)
: buffer(NULL), rows(height)
{
buffer = (unsigned int **)malloc(height * sizeof(unsigned int *));
for (int i = 0; i < height; i++)
{
buffer[i] = (unsigned int *)malloc(width * sizeof(unsigned int));
}
}
~BMP_array()
{
// free each row, then the table of row pointers
for (int i = 0; i < rows; i++)
{
free(buffer[i]);
}
free(buffer);
}
unsigned int ** data()
{
return buffer;
}
private:
// copying is hidden (declared, not defined) so two objects never free the same buffer
BMP_array(const BMP_array &);
BMP_array & operator=(const BMP_array &);
unsigned int ** buffer;
int rows;
};
typedef unsigned int ** array_type;
void initialize_BMP_array(array_type& bmparray, int height, int width);
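A minimal sketch of how the by-reference signature above could be filled in, assuming the typedef is spelled `typedef unsigned int **array_type;`. The helpers `fill_BMP_array` and `free_BMP_array` are hypothetical names added for illustration, and allocation failures are not checked.

```cpp
#include <cassert>
#include <cstdlib>

typedef unsigned int **array_type;

// Allocates height rows of width elements. The caller's pointer is updated
// because it is passed by reference.
void initialize_BMP_array(array_type &bmparray, int height, int width)
{
    assert(height > 0 && width > 0);
    bmparray = static_cast<unsigned int **>(std::malloc(height * sizeof(unsigned int *)));
    for (int i = 0; i < height; i++)
        bmparray[i] = static_cast<unsigned int *>(std::malloc(width * sizeof(unsigned int)));
}

// Hypothetical helper: sets every element to value.
void fill_BMP_array(array_type bmparray, int height, int width, unsigned int value)
{
    for (int i = 0; i < height; i++)
        for (int j = 0; j < width; j++)
            bmparray[i][j] = value;
}

// Hypothetical helper: frees the rows, then the row table, and nulls the caller's pointer.
void free_BMP_array(array_type &bmparray, int height)
{
    for (int i = 0; i < height; i++)
        std::free(bmparray[i]);
    std::free(bmparray);
    bmparray = nullptr;
}
```

This splits allocation and initialization into two functions, which is what the question asked for. Only the functions that change the pointer itself need the reference.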
Maybe I don't fully understand your question, but in C you can pass "by reference" by adding another level of pointer indirection. That is, you pass a pointer to the double pointer bmparray itself:
unsigned int ** initialize_BMP_array(int height, int width, unsigned int *** bmparray)
{
/* Note the first asterisk */
*bmparray = (unsigned int **)malloc(height * sizeof(unsigned int *));
...
/* the rest is the same, but with one more level of indirection */
return *bmparray;
}
So the memory for bmparray is allocated inside the function and handed back to the caller through the pointer argument.
Hope this helps.
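For completeness, here is a sketch of how the elided part of that function might look. It is only one possible completion: it uses calloc for the rows in place of malloc plus a zeroing loop, and it does not check for allocation failure.

```cpp
#include <cassert>
#include <cstdlib>

// Out-parameter version: bmparray points at the caller's unsigned int **.
unsigned int **initialize_BMP_array(int height, int width, unsigned int ***bmparray)
{
    assert(bmparray != nullptr);
    *bmparray = static_cast<unsigned int **>(std::malloc(height * sizeof(unsigned int *)));
    for (int i = 0; i < height; i++) {
        // calloc zero-fills each row, so no separate clearing loop is needed
        (*bmparray)[i] = static_cast<unsigned int *>(std::calloc(width, sizeof(unsigned int)));
    }
    return *bmparray;
}
```

The caller passes the address of its own pointer, for example `initialize_BMP_array(h, w, &myarray);`, and later frees each row followed by the row table.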
To use the safer and more modern C++ idiom, you should be using vectors rather than dynamically allocated arrays.
void initialize_BMP_array(vector<vector<unsigned int> > &bmparray, int height, int width);
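A possible body for that declaration, as a sketch. It assumes the intent is the same as in the question (height rows, width columns, all zero) and spells out the `std::` qualification.

```cpp
#include <cassert>
#include <vector>

// Resizes the caller's vector to height x width and zeroes every element.
void initialize_BMP_array(std::vector<std::vector<unsigned int> > &bmparray, int height, int width)
{
    assert(height >= 0 && width >= 0);
    bmparray.assign(height, std::vector<unsigned int>(width, 0u));
}
```

The vectors release their memory on their own, so no matching free function is needed.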
I'm a little stuck with my school project.
So I need to make a dynamic 2-dimensional array.
It has to be created in a function with 3 parameters: 2-dimensional char array, length and width.
This is what I have so far. Length and width come from another function and there are no problems with that. I feel like I am very close, but I don't know how to save the array to theArray[][], or whether I need to create a new variable where I put /*what here?*/ .
I didn't find anything this specific on the web (maybe I just don't know exactly what to search for).
Thanks in advance!
void doArray(char theArray[][], unsigned int length, unsigned int width) {
char** /*what here?*/ = new char*[lenght];
for(unsigned int i = 0; i < length; i++){
/*what here?*/[i] = new char[width];
}}
int main() {
unsigned int lenght = 0;
unsigned int width = 0;
char theArray[][];
size(lenght, width);
doArray(theArray, lenght, width);}
A 2-dimensional dynamic array in C++ is: std::vector<std::vector<TYPE>> a(SIZE_M, std::vector<TYPE>(SIZE_N));
#include <vector>
void doArray(auto &theArray, std::size_t length, std::size_t width) {
theArray = std::vector<std::vector<char>>(length, std::vector<char>(width));
}
int main() {
std::size_t lenght = 0;
std::size_t width = 0;
std::vector<std::vector<char>> theArray;
size(lenght, width);
doArray(theArray, lenght, width);
}
Another way would be to use malloc.
char **arr;
arr = (char**)malloc(length * sizeof(char*));
for(size_t i=0;i<length;i++){
arr[i]=(char*)malloc(width * sizeof(char));
}
This is a duplicate. See here for a better explanation.
Anyway, this is what you want:
// theArray is taken by reference; with a plain char ** the caller would never see the allocation
void doArray(char **&theArray, unsigned int length, unsigned int width) {
theArray = new char*[length];
for(unsigned int i = 0; i < length; ++i)
theArray[i] = new char[width];
}
Please don't comment that I should use size_t instead of unsigned int.
When you want a function to create something, the normal way is to return the created object, not to pass it in as a parameter. I think maybe that is what is confusing you. For some reason beginners often have trouble with the idea of functions returning things.
So rewriting your code to use a return value instead of a parameter, you get this:
char** doArray(unsigned int length, unsigned int width) {
char** arr = new char*[length];
for(unsigned int i = 0; i < length; i++) {
arr[i] = new char[width];
}
return arr;
}
Then in main you call the function like this
int main() {
char** theArray;
...
theArray = doArray(length, width);
As others have pointed out, the correct way in C++ to do this is to use a vector. But either way the lesson about writing functions to return values applies.
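Since the returning version hands ownership of the memory to the caller, it helps to pair it with a cleanup function. A sketch follows; `freeArray` is a hypothetical name, and the `()` after `new char[width]` is an addition that zero-initializes each row.

```cpp
#include <cassert>

// Allocates length rows of width chars each and returns the row table.
char **doArray(unsigned int length, unsigned int width)
{
    char **arr = new char *[length];
    for (unsigned int i = 0; i < length; i++)
        arr[i] = new char[width]();   // () value-initializes the row to '\0'
    return arr;
}

// Hypothetical counterpart: every new[] above needs a matching delete[].
void freeArray(char **arr, unsigned int length)
{
    assert(arr != nullptr);
    for (unsigned int i = 0; i < length; i++)
        delete[] arr[i];
    delete[] arr;
}
```

In main this would be used as `char **theArray = doArray(length, width);` followed later by `freeArray(theArray, length);`.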
Is there a way to tell the compiler that I've allocated a memory of size N * M and I wanna treat this pointer as N * M array? In other words, is there a way to write something like this?:
int arr[N][M] = (int[N][M])malloc(N * M * sizeof(int));
arr[x][y] = 123;
I know that the compiler doesn't know the dimensions of the array; all it knows is that it's a pointer. So my question is: can I somehow tell the compiler that the pointer returned by malloc is an array pointer and its dimensions are N * M? I can use an array of pointers, a pointer to arrays or a pointer to pointers, but in all cases I'll have to look up 2 addresses. I want to have contiguous memory on the heap and treat it as a multidimensional array, just like how I would write:
int arr[N][M];
Is there any way to achieve that?
In a C++ program you should use the operator new.
As for malloc, in C++ M must be a constant expression if you want to allocate a two-dimensional array this way.
You can write for example
int ( *arr )[M] = ( int ( * )[M] )malloc( N * M * sizeof(int) );
or
int ( *arr )[M] = ( int ( * )[M] )malloc( sizeof( int[N][M] ) );
If you use the operator new, the allocation can look like
int ( *arr )[M] = new int[N][M];
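As a small sketch of the pointer-to-array form, with M fixed at compile time (the value 4 is an assumption) and the row count chosen at run time. `Row` and `make_rows` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t M = 4;   // assumed column count; must be a constant expression

using Row = int[M];            // one row of M ints

// Allocates n rows of M ints as one contiguous block.
Row *make_rows(std::size_t n)
{
    assert(n > 0);
    return new int[n][M]();    // () zero-initializes all n * M ints
}
```

The result is indexed as `arr[x][y]` with a single pointer lookup, and it is released with one `delete[] arr;`.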
If M is not a compile-time constant then you can use the standard container std::vector as it is shown in the demonstrative program below
#include <iostream>
#include <vector>
int main()
{
size_t n = 10, m = 10;
std::vector<std::vector<int>> v( n, std::vector<int>( m ) );
return 0;
}
What you want is a "matrix" class like
#include <cstddef>
#include <iostream>
#include <vector>
template <typename T>
class matrix
{
size_t len;
size_t width;
std::vector<T> data;
public:
matrix(size_t len, size_t width) : len(len), width(width), data(len*width) {}
T& operator()(size_t row, size_t col) { return data[width * row + col]; }
const T& operator()(size_t row, size_t col) const { return data[width * row + col]; }
size_t size() const { return len * width; }
};
int main(int argc, char const *argv[])
{
matrix<int> m(5, 7);
m(3, 3) = 42;
std::cout << m(3, 3);
}
This keeps all of the data in a single contiguous buffer, and doesn't have any undefined behavior, unlike all the other examples that use malloc. It's also RAII, and you don't have to write any copy or move constructors since all of the members "do the right thing" with the compiler-provided defaults. You can make this class more complicated, for example by providing a proxy object so you can overload operator[] and write m[][], but at its base this is what you want.
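One way the m[][] syntax could be added, as a sketch. Instead of a full proxy class it returns a raw pointer to the start of the row, which is the simplest thing that makes the second [] work. The name `matrix2` is hypothetical, the column index is not bounds-checked, and this does not work for T = bool because std::vector<bool> has no data().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical variant of the matrix class above with operator[] row access.
template <typename T>
class matrix2
{
    std::size_t len;
    std::size_t width;
    std::vector<T> data;
public:
    matrix2(std::size_t len, std::size_t width) : len(len), width(width), data(len * width) {}
    // Returns a pointer to the first element of the row, so m[row][col] works.
    T* operator[](std::size_t row) { assert(row < len); return data.data() + width * row; }
    const T* operator[](std::size_t row) const { assert(row < len); return data.data() + width * row; }
};
```

A real proxy object would also be able to check the column index, at the cost of a little more code.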
If you want to avoid the stack and you need one large block of data to hold a two-dimensional array whose size is known at compile time, this is the cleanest way to do it:
std::vector<std::array<int, M>> arr(N);
arr[x][y] = 3;
If M is only known at run time, it would be best to use boost::multi_array.
I do not see a reason to use malloc.
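A slightly fuller sketch of the vector-of-arrays idea, assuming a column count of 8 known at compile time. `Grid` and `make_grid` are hypothetical names; the row count can still be chosen at run time.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t M = 8;   // assumed compile-time column count

using Grid = std::vector<std::array<int, M>>;

// Builds an n x M grid with every element zero.
Grid make_grid(std::size_t n)
{
    assert(n > 0);
    return Grid(n);   // value-initializes each std::array, so all ints are 0
}
```

The rows live in the vector's single heap buffer, so nothing large ends up on the stack and no manual cleanup is needed.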
You can do exactly what you want with a helper function. This lets you specify the array size at runtime, and it uses malloc as requested (although typically you should be using new):
#include <cstdlib>
#include <iostream>
template <class T>
T** Get2DMalloc(size_t m, size_t n) {
T** ret = (T**)malloc(sizeof(T*) * m);
for (size_t i = 0; i < m; ++i) {
ret[i] = (T*)malloc(sizeof(T) * n);
}
return ret;
}
template <class T>
void Free2D(T** arr, size_t m, size_t n) {
for (size_t i = 0; i < m; ++i) {
free(arr[i]);
}
free(arr);
}
int main() {
int m = 3;
int n = 3;
int** a = Get2DMalloc<int>(m, n);
for (int x = 0; x < m; ++x) {
for (int y = 0; y < n; ++y) {
a[x][y] = x * n + y; // row-major index: row * columns + column
}
}
for (int i = 0; i < m * n; ++i) {
std::cout << a[i / n][i % n] << std::endl;
}
Free2D<int>(a, m, n);
system("pause");
return 0;
}
I am trying to pass my dynamic array of structs to a kernel, but it doesn't work. I get "Segmentation fault (core dumped)".
My code - EDITED
#include <stdio.h>
#include <stdlib.h>
struct Test {
unsigned char *array;
};
__global__ void kernel(Test *dev_test) {
}
int main(void) {
int n = 4;
int size = 5;
unsigned char *array[size];
Test *dev_test;
// allocate for host
Test *test = (Test*)malloc(sizeof(Test)*n);
for(int i = 0; i < n; i++)
test[i].array = (unsigned char*)malloc(size);
// fill data
for(int i=0; i<n; i++) {
unsigned char temp[] = { 'a', 'b', 'c', 'd' , 'e' };
memcpy(test[i].array, temp, size);
}
// allocate for gpu
cudaMalloc((void**)&dev_test, n * sizeof(Test));
for(int i=0; i < n; i++) {
cudaMalloc((void**)&(array[i]), size * sizeof(unsigned char));
cudaMemcpy(&(dev_test[i].array), &(array[i]), sizeof(unsigned char *), cudaMemcpyHostToDevice);
}
kernel<<<1, 1>>>(dev_test);
return 0;
}
How should I correctly allocate GPU memory and copy data to it?
You need to allocate memory for struct member array.
Test *test = (Test*)malloc(sizeof(Test)*n);
for(int i = 0; i < n; i++)
test[i].array = (unsigned char*)malloc(size);
I would suggest reading this answer to deal with the other issues after this fix.
What is your card? If it supports compute capability >= 3.0, try the unified memory system, so that the host and the device see the same data.
You can have a look here:
It might look something like this:
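int main(void) {
int n = 4;
int size = 5;
Test *test;
// one managed allocation for the array of structs
cudaMallocManaged(&test, n * sizeof(Test));
unsigned char values[] = { 'a', 'b', 'c', 'd' , 'e' };
for(int i=0; i<n; i++)
{
unsigned char* temp;
cudaMallocManaged(&temp, size*sizeof(unsigned char) );
memcpy(temp, values, sizeof(values) );
// attach the managed buffer to the struct so the kernel can reach it
test[i].array = temp;
}
// no explicit copy code needed: host and device use the same pointers
kernel<<<1, 1>>>(test);
// wait for the kernel before the host touches or frees the managed memory
cudaDeviceSynchronize();
return 0;
}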
I hope you already know this, but don't forget to call cudaFree and delete/free on the memory you allocated. (For host memory it is better to use std::vector and call data() to get the raw pointer.)
I'm trying to get unified memory to work with classes, and to pass and manipulate arrays in unified memory with kernel calls. I want to pass everything by reference.
So I'm overloading operator new for classes and arrays so they are accessible by the GPU. I think I need to add more code to have the arrays in unified memory, but I'm not quite sure how to do this. I get a memory access error when the fillArray() method is called.
If I have to do these sorts of operations (arithmetic on arrays and copying between different sized arrays) hundreds of times, is unified memory a good approach or should I stick with manually copying between cpu and gpu memory? Thank you very much!
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <iostream>
#include <stdio.h>
#define TILE_WIDTH 4
#ifdef __CUDACC__
#define CUDA_CALLABLE_MEMBER __host__ __device__
#else
#define CUDA_CALLABLE_MEMBER
#endif
__global__ void add1(int height, int width, int *a, int *resultArray)
{
int w = blockIdx.x * blockDim.x + threadIdx.x; // Col // width
int h = blockIdx.y * blockDim.y + threadIdx.y;
int index = h * width + w;
if ((w < width) && (h < height))
resultArray[index] = a[index] + 1;
}
class Managed
{
public:
void *operator new(size_t len)
{
void *ptr;
cudaMallocManaged(&ptr, len);
return ptr;
}
void Managed::operator delete(void *ptr)
{
cudaFree(ptr);
}
void* operator new[] (size_t len) {
void *ptr;
cudaMallocManaged(&ptr, len);
return ptr;
}
void Managed::operator delete[] (void* ptr) {
cudaFree(ptr);
}
};
class testArray : public Managed
{
public:
testArray()
{
height = 16;
width = 8;
myArray = new int[height*width];
}
~testArray()
{
delete[] myArray;
}
CUDA_CALLABLE_MEMBER void runTest()
{
fillArray(myArray);
printArray(myArray);
dim3 dimGridWidth((width - 1) / TILE_WIDTH + 1, (height - 1)/TILE_WIDTH + 1, 1);
dim3 dimBlock(TILE_WIDTH, TILE_WIDTH, 1);
add1<<<dimGridWidth,dimBlock>>>(height, width, myArray, myArray);
cudaDeviceSynchronize();
printArray(myArray);
}
private:
int *myArray;
int height;
int width;
void fillArray(int *myArray)
{
for (int i = 0; i < height; i++){
for (int j = 0; j < width; j++)
myArray[i*width+j] = i*width+j;
}
}
void printArray(int *myArray)
{
for (int i = 0; i < height; i++){
for (int j = 0; j < width; j++)
printf("%i ",myArray[i*width+j]);
printf("\n");
}
}
};
int main()
{
testArray *test = new testArray;
test->runTest();
//testArray test;
//test.runTest();
system("pause");
return 0;
}
I want to pass everything by reference so there's no copying.
__global__ void add1(int height, int width, int *&a, int *&resultArray)
Passing a pointer by reference has one use: to modify (reseat) the pointer in the caller's scope. Which you do not do. So the references are, in this case, superfluous. In fact, it's a pessimization, because you're introducing another level of indirection. Use the following signature instead:
__global__ void add1(int height, int width, int* a, int* resultArray)
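The difference between the two signatures can be seen in plain host-side C++. This is only a sketch with hypothetical function names, not CUDA code:

```cpp
#include <cassert>

// Takes the pointer by value: only the local copy is changed.
void reseat_copy(int *p, int *target) { p = target; (void)p; }

// Takes the pointer by reference: the caller's pointer is changed.
void reseat_ref(int *&p, int *target) { p = target; }

// Reading or writing through the pointer needs no reference at all.
void add_one(int *p) { assert(p != nullptr); *p += 1; }
```

The kernel in the question only reads and writes through its pointers, like `add_one` here, so plain `int*` parameters are enough.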
This compiles and runs, but it seems that the +1 operation never occurs. Why is this?
I know I should have catch error statements, this code is just a simple example.
Well, it's really unfortunate, because adding proper error checking would probably have helped you find the error. In the future, consider adding error checking before asking on SO.
Your kernel expects its arguments to be in an address space it can access. That means it must be a pointer that was obtained through a call to any of the cudaMalloc variants.
But what are you passing?
myArray = new int[height*width]; // Not a cudaMalloc* variant
[...]
add1<<<dimGridWidth,dimBlock>>>(height, width, myArray, myArray);
Therefore the pointer you pass to your kernel has no meaning, because it is not in a "CUDA address space". Your kernel probably segfaults immediately.
I think your confusion may arise from the fact that the enclosing class of myArray (testArray) inherits from Managed. This means that new testArray will allocate a testArray in GPU-accessible address space, but it doesn't mean that using operator new on that class's members will allocate them in that address space too. They also need to be allocated through cudaMalloc* (for example, although this is not required, through an overloaded operator new that forwards the allocation to cudaMallocManaged). A simple solution is to allocate your array not with new but like this:
cudaMallocManaged(&myArray, width * height* sizeof(*myArray));
Replace the corresponding call to delete with cudaFree.
Additionally:
testArray test;
This does not allocate test on GPU-accessible space, because it is not allocated through operator new.
Is there some way to delay defining the size of an array until a class method or constructor?
What I'm thinking of might look something like this, which (of course) doesn't work:
class Test
{
private:
int _array[][];
public:
Test::Test(int width, int height);
};
Test::Test(int width, int height)
{
_array[width][height];
}
What Daniel is talking about is that you will need to allocate memory for your array dynamically when your Test (width, height) method is called.
You would declare your two-dimensional array like this (assuming an array of integers):
int ** _array;
And then in your Test method you would need to first allocate the array of pointers, and then for each pointer allocate an array of integers:
_array = new int* [height];
for (int i = 0; i < height; i++)
{
_array [i] = new int[width];
}
And then when the object is released you will need to explicitly delete the memory you allocated.
for (int i = 0; i < height; i++)
{
delete [] _array[i];
_array [i] = NULL;
}
delete [] _array;
_array = NULL;
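Putting the allocation and the cleanup together, the class could look roughly like this. It is a sketch: the `at` accessor and the stored `_width`/`_height` members are additions for illustration, and copying is disabled so that two objects never delete the same rows.

```cpp
#include <cassert>

class Test
{
private:
    int _width;
    int _height;
    int **_array;
public:
    Test(int width, int height) : _width(width), _height(height), _array(nullptr)
    {
        assert(width > 0 && height > 0);
        _array = new int *[_height];            // table of row pointers
        for (int i = 0; i < _height; i++)
            _array[i] = new int[_width]();      // () zero-initializes the row
    }
    ~Test()
    {
        for (int i = 0; i < _height; i++)
            delete[] _array[i];
        delete[] _array;
    }
    // Copying would double-delete the rows, so it is disabled in this sketch.
    Test(const Test &) = delete;
    Test &operator=(const Test &) = delete;
    // Hypothetical accessor, added for illustration.
    int &at(int row, int col)
    {
        assert(row >= 0 && row < _height && col >= 0 && col < _width);
        return _array[row][col];
    }
};
```

The destructor runs automatically when the object goes out of scope, so the caller never has to remember the delete loop.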
vector is your best friend
class Test
{
private:
vector<vector<int> > _array;
public:
Test(int width, int height) :
_array(width,vector<int>(height,0))
{
}
};
I think it is time for you to look up the new/delete operators.
Seeing as this is a multidimensional array, you're going to have to loop through calling 'new' as you go (and again not to forget: delete).
Although I am sure many will suggest to use a one-dimensional array with width*height elements.
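For reference, the one-dimensional variant could be sketched as follows. The index formula `y * width + x` is the usual row-major layout; the `at` accessor is a hypothetical name, and std::vector is used for storage so no destructor is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Test
{
    int _width;
    int _height;
    std::vector<int> _array;   // width * height elements, stored row after row
public:
    Test(int width, int height)
        : _width(width), _height(height),
          _array(static_cast<std::size_t>(width) * height, 0) {}
    // Maps the 2D coordinate (x, y) onto the flat buffer.
    int &at(int x, int y)
    {
        assert(x >= 0 && x < _width && y >= 0 && y < _height);
        return _array[static_cast<std::size_t>(y) * _width + x];
    }
};
```

This needs one allocation instead of height + 1 and keeps all elements contiguous.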
(Months later) one can use templates, like this:
// array2.c
// http://www.boost.org/doc/libs/1_39_0/libs/multi_array/doc/user.html
// is professional, this just shows the principle
#include <assert.h>
template<int M, int N>
class Array2 {
public:
int a[M][N]; // fixed-size array: M and N are compile-time template parameters, so this is not a VLA
int* operator[] ( int j )
{
assert( 0 <= j && j < M );
return a[j];
}
};
int main( int argc, char* argv[] )
{
Array2<10, 20> a;
for( int j = 0; j < 10; j ++ )
for( int k = 0; k < 20; k ++ )
a[j][k] = 0;
int* failassert = a[10];
}