How to determine whether a CUDA pointer is nullptr? - c++

I want to determine at runtime whether a block of CUDA memory has been allocated. In other words, is there a way to tell whether a CUDA pointer is nullptr?
I want kernel code to take different paths depending on whether the pointer is nullptr. I have a function like the one below.
__global__ void func(unsigned int *a, unsigned char *mask, const int len)
{
if (mask!= nullptr){// do something}
else {// do something else}
}
If mask was allocated with cudaMalloc, the kernel should take the if branch. Otherwise, it should take the else branch.
This snippet should work:
int* a;
char* mask;
int len = 1024;
cudaMalloc(&a, sizeof(int) * len);
cudaMalloc(&mask, sizeof(char) * len);
func(a, mask, len);
And this snippet should also work:
int* a;
char* mask;
int len = 1024;
cudaMalloc(&a, sizeof(int) * len);
func(a, mask, len);
Is there a way to achieve this?

In the general case, pointer introspection in device code is not possible.
In your host code, if you do:
char* mask = nullptr;
and you guarantee both of these conditions:
1. If any cudaMalloc operation is run on mask, you test the return value and, if it is not cudaSuccess, do not allow the code to proceed (or at least do not allow any of the snippets that use mask to run).
2. cudaFree is not called on the mask pointer until your code snippets that use it will never run again.
Then it should be possible to do what you are suggesting in device code:
if (mask!= nullptr){// do something}
else {// do something else}
After a successful cudaMalloc call, the allocated pointer will never be nullptr.

Related

Using linear storage to implement a 2D array

I need to create a function that helps to implement a 2D array using linear storage in C++. The first of these functions is two_d_store(), which takes as arguments the following: base address in memory of region to be used as 2D array, size in bytes of an array entry, 2 array dimensions, and 2 index values. So with this two_d_store() function:
int d[10][20];
d[4][0] = 576;
can be replaced with
char d[200*sizeof(int)];
two_d_store(d, sizeof(int), 10, 20, 4, 0, 576);
So is there a simple way to implement this function without using arrays?
If, for whatever reason, you don't want to use the standard way of accessing or setting your data structures, you could use
memcpy(void *dst, const void *src, size_t len);
It's a C function, but it works in C++ too.
Here are some general memory-handling functions I found. Good luck!
void *memchr(const void *ptr, int ch, size_t len)
memchr finds the first occurrence of ch in the first len bytes of ptr and returns a pointer to it (or a null pointer if ch was not found in those bytes)
int memcmp(const void *ptr1, const void *ptr2, size_t len)
memcmp is similar to strcmp, except that bytes equal to 0 are not treated as comparison terminators.
void *memcpy(void *dst, const void *src, size_t len)
memcpy copies len characters from src to dst and returns the original value of dst
The result of memcpy is undefined if src and dst point to overlapping areas of memory
void *memmove(void *dst, const void *src, size_t len)
memmove is just like memcpy except that memmove is guaranteed to work even if the memory areas overlap
void *memset(void *ptr, int byteval, size_t len)
memset sets the first len bytes of the memory area pointed to by ptr to the value specified by byteval
I changed the last argument to a pointer rather than a value, because we don't know what type it is. The max_y parameter is not needed. First I compute in n_array_ptr the address of the cell we want to change, and then memcpy size_of_type bytes from the new_value pointer to n_array_ptr. This function can be used for arrays of any type, because we treat the array as void*, calculate the address of the block where our cell starts, and copy exactly size_of_type bytes.
#include <iostream>
#include <cstring>
using namespace std;
void two_d_store(void * array_ptr, size_t size_of_type, int max_y, int max_x, int y, int x, void* new_value) {
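// Do the offset arithmetic on unsigned char*: arithmetic on void* is a compiler extension, not standard C++.
void * n_array_ptr = static_cast<unsigned char*>(array_ptr) + size_of_type * (y * max_x + x);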
memcpy(n_array_ptr, new_value, size_of_type);
}
int main() {
int d[10*20];
int new_val = 576;
two_d_store(d, sizeof(int), 10, 20, 4, 0, &new_val);
cout<<d[4*20]; // element [4][0]
return 0;
}
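The question's call passes the value itself (576) rather than a pointer to it. One way to keep that call shape is a small template wrapper. The following is only a sketch under stated assumptions: the names cell_offset, store_cell and load_cell are hypothetical (they are not from the question or the answer), the layout is row-major, and the element type is trivially copyable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: byte offset of cell (y, x) in a row-major array.
inline std::size_t cell_offset(std::size_t elem_size, int max_x, int y, int x) {
    return elem_size * (static_cast<std::size_t>(y) * max_x + x);
}

// Store `value` at (y, x). The element size comes from T, so no size argument is needed.
template <typename T>
void store_cell(void* base, int max_y, int max_x, int y, int x, const T& value) {
    assert(y >= 0 && y < max_y && x >= 0 && x < max_x);  // bounds check in debug builds
    unsigned char* p = static_cast<unsigned char*>(base) + cell_offset(sizeof(T), max_x, y, x);
    std::memcpy(p, &value, sizeof(T));  // byte copy: works even if `base` is a char buffer
}

// Read the element at (y, x) back out of the linear storage.
template <typename T>
T load_cell(const void* base, int max_y, int max_x, int y, int x) {
    assert(y >= 0 && y < max_y && x >= 0 && x < max_x);
    const unsigned char* p =
        static_cast<const unsigned char*>(base) + cell_offset(sizeof(T), max_x, y, x);
    T value;
    std::memcpy(&value, p, sizeof(T));
    return value;
}
```

With char d[200*sizeof(int)]; the call store_cell(d, 10, 20, 4, 0, 576); plays the role of d[4][0] = 576; and load_cell<int>(d, 10, 20, 4, 0) reads the value back.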

Display contents (in hex) of 16 bytes at the specified address

I'm attempting to display the contents of a specific address, given a char* to that address. So far I have tried the following implementation:
int mem_display(char *arguments) {
int address = *arguments;
int* contents_pointer = (int*)address;
int contents = *contents_pointer;
printf("Address %p: contents %16x\n", contents_pointer, contents);
}
But I keep getting a "Segmentation Fault (Core Dumped)" error. I attempted to make a dummy pointer to test on
char foo = 6;
char *bar = &foo;
But the error still persists.
I'm finding it hard to explain what the problem is because almost every single line in your code is wrong.
Here's what I would do:
void mem_display(const void *address) {
const unsigned char *p = address;
for (size_t i = 0; i < 16; i++) {
printf("%02hhx", p[i]);
}
putchar('\n');
}
You need to iterate over the contents of the address, and print each one separately, until you reach 16. Example code:
#include <stdio.h>
void mem_display(unsigned char *arguments) {
printf("Address %p: ", (void *)arguments);
int i =0;
unsigned char* byte_array = arguments;
while (i < 16)
{
printf("%02hhx", byte_array[i]);
i++;
}
}
int main(void) {
unsigned char foo = 6;
unsigned char *bar = &foo;
mem_display(bar);
return 0;
}
Output:
Address 0x7ffe5b86a777: 0677A7865BFE7F000060054000000000
If you already have a pointer to the address you want to print the contents of, you can feed that straight to printf, like this:
void print_16_bytes_at(const char *arguments)
{
printf("Address %p: contents", (const void *)arguments);
for (int i = 0; i < 16; i++)
printf(" %02x", (unsigned int)(unsigned char)arguments[i]);
putchar('\n');
}
If arguments isn't a pointer to the memory you want to print the contents of, then I don't understand what it actually is and I need you to explain better.
Notes on the above sample code:
To use %p without provoking undefined behavior, you must explicitly cast the pointer to [const] void * unless it is already [const] void *. Because printf takes a variable number of arguments, the compiler does not know the expected types of its arguments, so it doesn't insert this conversion for you, as it would with a normal function that takes a [const] void * argument.
The double cast (unsigned int)(unsigned char) forces the char read from arguments[i] to be zero-extended rather than sign-extended to the width of unsigned int. If you don't do that, values from 0x80 on up are liable to be printed with a bunch of leading Fs (e.g. ffffff80) which is probably not what you want.
This will still segfault if there aren't 16 bytes of readable memory at the supplied address. In your example
char foo = 6;
print_16_bytes_at(&foo);
there is only guaranteed to be one byte of readable memory at the address of foo. (It will probably work on any computer you can readily get at, but it's still allowed to crash per the letter of the C standard.)
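Since reading a fixed 16 bytes can run past the end of a small object, one option is to let the caller say how many bytes are actually readable. The following C++ sketch illustrates that idea; hex_bytes is a hypothetical helper name, and returning a std::string instead of printing is a choice made here only to keep the example easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format `len` bytes starting at `address` as
// space-separated two-digit lowercase hex. The caller supplies `len`,
// so the function never reads more than the caller says is readable.
std::string hex_bytes(const void* address, std::size_t len) {
    assert(address != nullptr || len == 0);
    const unsigned char* p = static_cast<const unsigned char*>(address);
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < len; ++i) {
        // unsigned char promotes without sign extension, so 0xff prints as "ff"
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned int>(p[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

For the one-byte object from the question, hex_bytes(&foo, sizeof foo) reads exactly one byte, while a 16-byte buffer can be dumped with hex_bytes(buffer, 16).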
There are a few issues with the original code. First, it dereferences the passed-in pointer twice. It treats arguments as a pointer to a pointer to the contents to be printed. Since arguments is a pointer to char, this is definitely wrong. (It will end up reading from a very small address, which is a definite segmentation violation or other crash on most architectures.)
Second, unless you know the arguments pointer is aligned appropriately, loading an int via that pointer may crash due to an unaligned access. (Which may well show up as a segmentation violation.) You likely cannot assume proper alignment for this routine.
Third, if you need to print 16 bytes, then an int will (typically) only get 4 or 8 of them. It will be trickier to use standard printing routines to concatenate all the pieces than to write a byte by byte loop. (See above answer for an example of such a loop.)
I think you are either overcomplicating it, or you haven't described the actual input well enough. Is it a pointer to a pointer?
Anyway, a simple way to do it, given a pointer to the memory, is something like this (C-like C++, sorry; written in a hurry online at cpp.sh):
#include <cstdio>
const unsigned char fakeData[] = { 1, 13, 0, 255 };
void mem_display(
std::FILE* file,
const unsigned char* memoryPtr,
const size_t size)
{
fprintf(file, "Address %p:", (const void*)memoryPtr);
for (size_t i = 0; i < size; ++i) fprintf(file, " %02hhx", memoryPtr[i]);
fprintf(file, "\n");
}
int main()
{
mem_display(stdout, fakeData, 4);
}
Output:
Address 0x4008e6: 01 0d 00 ff
To print 16 bytes just change the size argument.
I can't think of a common type that is 16 bytes wide, so I'm not sure why you are trying to print it as a single number. Debuggers usually show byte-by-byte output like the above (until the user requests a different unit size).
I have used the following function for quite a long time to print the contents of a memory area:
/*************************************************************/
/**
* Function to dump memory area
* \param address start address
* \param len length of memory area
*/
/*************************************************************/
void dump( void* address, int len ) noexcept{
int i;
printf("dump from %p (%d bytes)\n", address, len);
printf("=============================================");
for(i=0; i<len; i++){
if(i%16==0){printf("\n");}
printf("%2X ",(*((char*)address+i))&0xFF);
}
printf("\n=============================================\n");
}
For C programs delete the noexcept keyword.

Pointer arithmetic on void* pointers

I am using the CUDA API / cuFFT API. To move data from the host to the GPU I am using the cudaMemcpy functions, as shown below. len is the number of elements in dataReal and dataImag.
void foo(const double* dataReal, const double* dataImag, size_t len)
{
cufftDoubleComplex* inputData;
size_t allocSizeInput = sizeof(cufftDoubleComplex)*len;
cudaError_t allocResult = cudaMalloc((void**)&inputData, allocSizeInput);
if (allocResult != cudaSuccess) return;
cudaError_t copyResult;
copyResult = cudaMemcpy2D(static_cast<void*>(inputData),
2 * sizeof (double),
static_cast<const void*>(dataReal),
sizeof(double),
sizeof(double),
len,
cudaMemcpyHostToDevice);
copyResult &= cudaMemcpy2D(static_cast<void*>(inputData) + sizeof(double),
2 * sizeof (double),
static_cast<const void*>(dataImag),
sizeof(double),
sizeof(double),
len,
cudaMemcpyHostToDevice);
//and so on.
}
I am aware that pointer arithmetic on void pointers is not actually possible. The second cudaMemcpy2D still works, though. I get a warning from the compiler, but it works correctly.
I tried using static_cast< char* >, but that doesn't work because cufftDoubleComplex* cannot be static_cast to char*.
I am a bit confused about why the second cudaMemcpy2D with pointer arithmetic on void* works, as I understand it shouldn't. Is the compiler implicitly assuming that the datatype behind void* is one byte long?
Should I change something there? Use a reinterpret_cast< char* >(inputData), for example?
Also, during the allocation I am using the old C-style (void**) cast. I do this because I get an "invalid static_cast from cufftDoubleComplex** to void**" error. Is there another way to do this correctly?
FYI: Link to cudaMemcpy2D Doc
Link to cudaMalloc Doc
You cannot do arithmetic on void*, since pointer arithmetic is based on the size of the pointed-to object (and sizeof(void) does not really mean anything).
Your code probably compiles thanks to a compiler extension (GCC, for example, has one) that treats arithmetic on void* like arithmetic on char*, which would also explain the warning.
In your case, you probably do not need pointer arithmetic at all. The following should work (and be more robust):
copyResult &= cudaMemcpy2D(static_cast<void*>(&inputData->y),
sizeof (cufftDoubleComplex),
Since cufftDoubleComplex is simply:
struct __device_builtin__ __builtin_align__(16) double2
{
double x, y;
};
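To see why taking the member address and offsetting a char* by sizeof(double) land on the same byte, here is a host-only C++ sketch. It makes several assumptions: Complex2 is a hypothetical stand-in with the same two-double layout as the struct above (the real CUDA headers are not used), and byte_offset and interleave are illustrative names. The loop mimics on the host what the two strided copies do, without calling any CUDA function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for cufftDoubleComplex (assumption: two doubles, x then y).
struct Complex2 {
    double x, y;
};

// Byte-wise pointer offset. Standard C++ allows this because the
// arithmetic is done on char*, not on void*.
inline void* byte_offset(void* p, std::size_t n) {
    return static_cast<char*>(p) + n;
}

// Host-only analogue of the two strided copies: consecutive source doubles
// land sizeof(Complex2) bytes apart in the destination.
void interleave(Complex2* out, const double* re, const double* im, std::size_t len) {
    assert(out != nullptr || len == 0);
    for (std::size_t i = 0; i < len; ++i) {
        std::memcpy(byte_offset(&out[i], offsetof(Complex2, x)), &re[i], sizeof(double));
        std::memcpy(byte_offset(&out[i], offsetof(Complex2, y)), &im[i], sizeof(double));
    }
}
```

For a Complex2 array buf, byte_offset(buf, sizeof(double)) and &buf->y refer to the same address, which is why passing &inputData->y avoids the cast and the arithmetic altogether.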

Extracting an int out of void*

I have a method that returns void* and it is basically some block of data from allocated shared memory on linux.
The first 4 bytes in that data are an int and I haven't managed to extract that int.
These are the methods that I have tried:
int getNum(void* data)
{
return *(int*)data; // 1
return *(int*)(data & 0xFFFFFFFF); // 2
}
Thanks in advance!
int getNum(void* data)
{
return *(int32_t*)data; // 1
// ^^^^^^^
}
Should work.
Also, if you need to worry about unaligned addresses obtained from the void*, you may use @Martin Bonner's suggestion:
int getNum(void* data)
{
int32_t result;
memcpy(&result, data, sizeof(int32_t));
return result;
}
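The memcpy approach can be exercised with a deliberately misaligned address. This is a small C++ sketch, not part of the original answers: read_i32 and write_i32 are hypothetical helper names, and the odd buffer offsets exist only to force misalignment.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 32-bit integer from any byte address.
std::int32_t read_i32(const void* data) {
    assert(data != nullptr);
    std::int32_t result;
    std::memcpy(&result, data, sizeof result);  // byte copy: no alignment requirement on `data`
    return result;
}

// Counterpart that places a value at an arbitrary byte address.
void write_i32(void* data, std::int32_t value) {
    assert(data != nullptr);
    std::memcpy(data, &value, sizeof value);
}
```

For example, after unsigned char buf[16] = {}; and write_i32(buf + 1, 0x12345678);, the call read_i32(buf + 1) returns 0x12345678 even though buf + 1 is not suitably aligned for a direct *(int32_t*) access.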
int getNum(void* data)
{
return *(int*)data; // 1
return *(int*)(data & 0xFFFFFFFF); // 2
}
The second method won't even compile; data is a pointer, and you cannot apply the & bitwise and operator to a pointer value. (I don't know why you'd even want to mask a pointer value like that. If it were legal, the mask would do nothing on a 32-bit system, and would probably destroy the pointer value on a 64-bit system.)
As for the first, it should work -- if data is a valid pointer pointing to a properly aligned int object.
If that's failing at run time with a segmentation fault, it probably means either that data is null or otherwise not a valid pointer (in that case you can't do what you're trying to do), or it points to a misaligned address (which won't cause a seg fault if you're on an x86 or x86_64 system; are you?).
To find out what the pointer looks like, try adding this:
printf("data = %p\n", data);
to your getNum function -- or examine the value of data in a debugger.
If alignment is the problem, then you can do this:
int result;
memcpy(&result, data, sizeof result);
return result;
But in that case storing an int value at a misaligned address is an odd thing to do in the first place. It's not necessarily wrong, just a very odd thing to do.
How is the memory that data points to allocated?
Assuming the int is 4 bytes long, and you're operating on Linux, your method will work. If you want a portable method to move ints in pointers, try the open source ptrint module from CCAN. It converts a pointer to int by adding NULL to the int. It converts in the other direction by subtracting NULL, returning a ptrdiff_t.
#include <assert.h>
#include <stdio.h>
int getNum(void* data)
{
return *(int *) data;
}
char buf[16];
int main(void)
{
assert(sizeof(int) == 4);
*(int *)buf = 9;
printf("%c%c%c%c\n", buf[0], buf[1], buf[2], buf[3]);
printf("%d\n", getNum((void *) buf));
return 0;
}
// $ ./foo | cat -tv
// ^I^#^#^#
// 9

Cuda allocation and return array from gpu to cpu

I have the following CUDA code (it's not the full code).
I'm trying to check whether it copies the arrays properly from host to device and from device to host.
flVector is initialized with a few numbers, as is indeces.
The pass function needs to copy flVector and indeces to device memory.
In main, after calling pass, I copy the arrays back from device to host and print the values to check that they are correct.
flat_h comes back properly and its values are correct, but the indeces data comes back with garbage values, and I don't know what is wrong with the code.
To return two variables from the pass function, I return flOnDevice with the return statement, and I also pass in a pointer, inOnDevice, to receive the second array.
These two variables are on the device side, and I then try to copy them back to the host.
This is just a check to see that everything is working properly, but when I print the contents of inOnDevice I get garbage values. Why?
int* pass(vector<int>& flVector, int* indeces, int inSize, int* inOnDevice)
{
int* flOnDevice;
cudaMalloc((void**) &(flOnDevice), sizeof(int) * flVector.size());
cudaMemcpy(flOnDevice, &flVector[0], flVector.size()*sizeof(int),cudaMemcpyHostToDevice);
cudaMalloc((void**) &(inOnDevice), sizeof(int) * inSize);
cudaMemcpy(inOnDevice, indeces, inSize*sizeof(int), cudaMemcpyHostToDevice);
return flOnDevice;
}
void main()
{
int* inOnDevice = NULL;
int* flOnDevice;
flOnDevice = pass(flVector, indeces, indSize, inOnDevice);
int* flat_h = (int*)malloc(flVector.size()*sizeof(int));
int* inde_h = (int*)malloc(inSize*sizeof(int));
cudaMemcpy(flat_h,flOnDevice,flVector.size()*sizeof(int),cudaMemcpyDeviceToHost);
cudaMemcpy(inde_h,inOnDevice,inSize*sizeof(int),cudaMemcpyDeviceToHost);
printf("flat_h: \n\n");
for (int i =0; i < flVector.size(); i++)
printf("%d, " , flat_h[i]);
printf("\n\ninde_h: \n\n");
for (int i =0; i < inSize; i++)
printf("%d, " , inde_h[i]);
printf("\n\n");
}
This is not doing what you think it is:
int* pass(vector<int>& flVector, int* indeces, int inSize, int* inOnDevice)
{
...
cudaMalloc((void**) &(inOnDevice), sizeof(int) * inSize);
When you pass a pointer to a function this way, you are passing the pointer by value.
If you then take the address of that pointer-passed-by-value inside the function, that address has no connection to anything in the calling context. Inside the function pass, inOnDevice is a local copy of the caller's pointer, and the subsequent cudaMalloc operation modifies only that local copy.
Instead, you need to pass a pointer-to-a-pointer in this situation (simulated pass-by-reference) or else pass by reference. For the pointer-to-a-pointer example, it would look something like this:
int* pass(vector<int>& flVector, int* indeces, int inSize, int** inOnDevice)
{
...
cudaMalloc((void**) inOnDevice, sizeof(int) * inSize);
cudaMemcpy(*inOnDevice, indeces, inSize*sizeof(int), cudaMemcpyHostToDevice);
And in main:
flOnDevice = pass(flVector, indeces, indSize, &inOnDevice);
And I think if you had used proper CUDA error checking as I suggested to you before, you would have seen an error returned from this line of code:
cudaMemcpy(inde_h,inOnDevice,inSize*sizeof(int),cudaMemcpyDeviceToHost);
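The difference between passing the pointer by value and passing its address can be reproduced on the host without a GPU. In this C++ sketch, fake_malloc is a hypothetical stand-in that only mimics the out-parameter shape of cudaMalloc (it is not a CUDA call, and returning 0 for success is an assumption of the sketch), and the three alloc_* functions are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Host-only stand-in for cudaMalloc (assumption: writes the new address
// through `out` and returns 0 on success). Not a CUDA API.
int fake_malloc(void** out, std::size_t bytes) {
    *out = std::malloc(bytes);
    return *out != nullptr ? 0 : 1;
}

// Broken: `p` is a copy of the caller's pointer, so only the copy is assigned.
void alloc_by_value(int* p, std::size_t n) {
    void* raw = nullptr;
    fake_malloc(&raw, n * sizeof(int));
    p = static_cast<int*>(raw);  // changes the local copy only
    std::free(p);                // freed here because the caller never sees this block
}

// Works: the caller passes the address of its own pointer.
int alloc_by_pointer(int** p, std::size_t n) {
    assert(p != nullptr);
    void* raw = nullptr;
    int status = fake_malloc(&raw, n * sizeof(int));
    *p = static_cast<int*>(raw);
    return status;
}

// Works: a C++ reference to the caller's pointer.
int alloc_by_reference(int*& p, std::size_t n) {
    void* raw = nullptr;
    int status = fake_malloc(&raw, n * sizeof(int));
    p = static_cast<int*>(raw);
    return status;
}
```

Starting from int* a = nullptr;, the pointer a is still nullptr after alloc_by_value(a, 4), whereas alloc_by_pointer(&a, 4) and alloc_by_reference(a, 4) both leave a pointing at the new block, which the caller then owns and must free.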