I use new to allocate a buffer, as follows:
BYTE *p;
p = new BYTE[20];
If I do NOT store the size of the allocated buffer, how can I determine the buffer size from p alone?
You can't, as p is just a pointer to the block of memory allocated. You have to keep track of how much memory you allocated.
You have to store the size of the allocated buffer in a variable if you want access to it later. After those statements, you'll only have access to the pointer, which can't tell you how many elements are in the buffer.
This question already has answers here:
How to find the size of an array (from a pointer pointing to the first element array)?
(17 answers)
Closed 8 years ago.
I ran the following code, but it keeps printing "4".
Why is it printing "4" and not "12"? And can I use malloc and then sizeof? (If I can, then how?)
#include <stdio.h>

int main()
{
    int arr1[3] = {1, 2, 3};
    int *arr2 = arr1, i;
    printf("%d", sizeof(arr2));
    return 0;
}
Pointers are not arrays. arr2 is a pointer to int, so sizeof(arr2) will return the size of the pointer. To print the size of an array, the operand of sizeof must have an array type (and %zu is the correct format specifier for a size_t value):
printf("%zu", sizeof(arr1));
can I use malloc and then sizeof?
No. There is no portable way to find out the size of a malloc'ed block. malloc returns a pointer to the allocated memory; sizeof applied to that pointer will return the size of the pointer itself. Note, though, that there is usually no need for sizeof when you allocate memory dynamically: in that case you already know the size of the array. (For a string stored in a char array, use strlen.)
Further reading: c-faq: Why doesn't sizeof tell me the size of the block of memory pointed to by a pointer?
sizeof(arr2)
would print the size of a pointer, since arr2 is an int*. However, if you try sizeof(arr1), it would print
sizeof(element_type) * array_size
i.e. the size of the array. Remember that this does not depend on how many elements are currently stored in the array; it only reflects how many elements the array can hold.
arr2 is a pointer, and you are printing sizeof(pointer).
sizeof(arr1) will give you the size of the array, which would be 12 given that your ints are 4 bytes.
It's printing 4 because arr2 is a pointer, and the size of a pointer is 4 bytes on 32-bit architectures. You can't know the size of a dynamically allocated array (an array allocated with malloc) given just a pointer to it.
I have a C++ function that accepts a pointer to an array of known size as input and returns an array whose size cannot be determined until the function finishes processing the data. How do I call the C++ function, passing the first array, and receive the results as a second array?
I currently do something similar to this:
PYTHON:
def callout(largeListUlonglongs):
    cFunc = PyDLL("./cFunction.dll").processNumbers
    arrayOfNums = c_ulonglong * len(largeListUlonglongs)
    numsArray = arrayOfNums()
    for x in xrange(len(largeListUlonglongs)):
        numsArray[x] = long(largeListUlonglongs[x])
    cFunc.restype = POINTER(c_ulonglong * len(largeListUlonglongs))
    returnArray = cFunc(byref(numsArray)).contents
    return returnArray
This works as long as returnArray is the same size as numsArray. If it is smaller, the empty elements of the array are filled with 0s. If the returned array is larger, the results are cut off once the declared array's elements are filled.
If it helps, the returned array contains its own size as its first element.
Thanks for the help in advance...
Normally it is preferable to get the caller to allocate the buffer. That way the caller is in a position to deallocate it also. However, in this case, only the callee knows how long the buffer needs to be. And so the onus passes to the callee to allocate the buffer.
But that places an extra constraint on the system. Since the callee is allocating the buffer, the caller cannot deallocate it unless they share the same allocator. That can actually be arranged without too much trouble. You can use a shared allocator. There are a few. Your platform appears to be Windows, so for example you can use CoTaskMemAlloc and CoTaskMemFree. Both sides of the interface can call those functions.
The alternative is to keep allocation and deallocation together. The caller must hold on to the pointer that the callee returns. When it has finished with the buffer, usually after copying it into a Python structure, it asks the library to deallocate the memory.
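As a sketch of that second strategy, the library can export a matching deallocation routine so allocation and deallocation stay in the same module. The names processNumbers and freeNumbers below mirror the hypothetical DLL above, and the "processing" is a placeholder:

```cpp
#include <cstdint>
#include <cstring>

// Allocates the result buffer; element 0 holds the total returned length,
// so the caller can size its copy before asking the library to free the block.
extern "C" uint64_t *processNumbers(const uint64_t *nums, uint64_t count) {
    uint64_t outLen = count + 1;     // placeholder result size
    uint64_t *out = new uint64_t[outLen];
    out[0] = outLen;                 // first element = total length
    std::memcpy(out + 1, nums, count * sizeof(uint64_t));
    return out;
}

// The caller passes the pointer back here instead of freeing it itself.
extern "C" void freeNumbers(uint64_t *buf) {
    delete[] buf;
}
```

From Python the extra step is one line: after copying the result into a Python structure, call the library's freeNumbers on the returned pointer.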
David gave you useful advice on the memory management concerns. I would generally use the simpler strategy of having a function in the library to free the allocated buffer. The onus is on the caller to prevent memory leaks.
To me your question seems to be simply about casting the result to the right pointer type. Since you have the length at index 0, you can set the result type to a plain c_ulonglong pointer, read the size from the first element, and then cast the result to the correct array pointer type.
def callout(largeListUlonglongs):
    cFunc = PyDLL("./cFunction.dll").processNumbers
    cFunc.restype = POINTER(c_ulonglong)
    arrayOfNums = c_ulonglong * len(largeListUlonglongs)
    numsArray = arrayOfNums(*largeListUlonglongs)
    result = cFunc(numsArray)
    size = result[0]
    returnArray = cast(result, POINTER(c_ulonglong * size))[0]
    return returnArray
I've been haunted by this error for quite a while so I decided to post it here.
This segmentation fault happened when a cudaMemcpy is called:
CurrentGrid->cdata[i] = new float[size];
cudaMemcpy(CurrentGrid->cdata[i], Grid_dev->cdata[i], size*sizeof(float),
           cudaMemcpyDeviceToHost);
CurrentGrid and Grid_dev are pointers to a grid class object on the host and device, respectively, and i=0 in this context. The class member cdata is an array of float pointers. For debugging, right before this cudaMemcpy call I printed out the value of each element of Grid_dev->cdata[i], the addresses of CurrentGrid->cdata[i] and Grid_dev->cdata[i], and the value of size, all of which look good. But it still ends with "Segmentation fault (core dumped)", which is the only error message. cuda-memcheck only gave "process didn't terminate successfully". I'm not able to use cuda-gdb at the moment. Any suggestions about where to go?
UPDATE: It seems I have now solved this problem by cudaMalloc'ing another float pointer A on the device, cudaMemcpy'ing the value of Grid_dev->cdata[i] to A, and then cudaMemcpy'ing A to the host.
So the segment of code written above becomes:
float * A;
cudaMalloc((void**)&A, sizeof(float));
...
...
cudaMemcpy(&A, &(Grid_dev->cdata[i]), sizeof(float *), cudaMemcpyDeviceToHost);
CurrentGrid->cdata[i] = new float[size];
cudaMemcpy(CurrentGrid->cdata[i], A, size*sizeof(float), cudaMemcpyDeviceToHost);
I did this because valgrind reported "invalid read of size 8", which I thought referred to Grid_dev->cdata[i]. I checked again with gdb, and printing the value of Grid_dev->cdata[i] gave NULL. So I guess I cannot directly dereference the device pointer even in this cudaMemcpy call. But why? According to the comment at the bottom of this thread, we should be able to dereference a device pointer in a cudaMemcpy call.
Also, I don't know the underlying mechanism of how cudaMalloc and cudaMemcpy work, but I think that by cudaMalloc'ing a pointer, say A here, we actually make this pointer point to a certain address on the device. And by cudaMemcpy'ing Grid_dev->cdata[i] to A as in the modified code above, we re-assign A to point to the array. Then don't we lose track of the previous address A pointed to when it was cudaMalloc'ed? Could this cause a memory leak or something? If yes, how should I work around this properly?
Thanks!
For reference I put the code of the complete function in which this error happened below.
Many thanks!
__global__ void Print(grid *, int);
__global__ void Printcell(grid *, int);

void CopyDataToHost(param_t p, grid *CurrentGrid, grid *Grid_dev)
{
    cudaMemcpy(CurrentGrid, Grid_dev, sizeof(grid), cudaMemcpyDeviceToHost);
#if DEBUG_DEV
    cudaCheckErrors("cudaMemcpy1 error");
#endif
    printf("\nBefore copy cell data\n");
    Print<<<1,1>>>(Grid_dev, 0);  // Print out some Grid_dev information for debug
    cudaDeviceSynchronize();
    int NumberOfBaryonFields = CurrentGrid->ReturnNumberOfBaryonFields();
    int size = CurrentGrid->ReturnSize();
    int vsize = CurrentGrid->ReturnVSize();
    CurrentGrid->FieldType = NULL;
    CurrentGrid->FieldType = new int[NumberOfBaryonFields];
    printf("CurrentGrid size is %d\n", size);
    for (int i = 0; i < p.NumberOfFields; i++) {
        CurrentGrid->cdata[i] = NULL;
        CurrentGrid->vdata[i] = NULL;
        CurrentGrid->cdata[i] = new float[size];
        CurrentGrid->vdata[i] = new float[vsize];
        Printcell<<<1,1>>>(Grid_dev, i);  // Print out element value of Grid_dev->cdata[i]
        cudaDeviceSynchronize();
        cudaMemcpy(CurrentGrid->cdata[i], Grid_dev->cdata[i], size*sizeof(float),
                   cudaMemcpyDeviceToHost);  // where error occurs
#if DEBUG_DEV
        cudaCheckErrors("cudaMemcpy2 error");
#endif
        printf("\nAfter copy cell data\n");
        Print<<<1,1>>>(Grid_dev, i);
        cudaDeviceSynchronize();
        cudaMemcpy(CurrentGrid->vdata[i], Grid_dev->vdata[i], vsize*sizeof(float),
                   cudaMemcpyDeviceToHost);
#if DEBUG_DEV
        cudaCheckErrors("cudaMemcpy3 error");
#endif
    }
    cudaMemcpy(CurrentGrid->FieldType, Grid_dev->FieldType,
               NumberOfBaryonFields*sizeof(int), cudaMemcpyDeviceToHost);
#if DEBUG_DEV
    cudaCheckErrors("cudaMemcpy4 error");
#endif
}
EDIT: here is the information from valgrind, from which I'm trying to track down where the memory leak happened.
==19340== Warning: set address range perms: large range [0x800000000, 0xd00000000) (noaccess)
==19340== Warning: set address range perms: large range [0x200000000, 0x400000000) (noaccess)
==19340== Invalid read of size 8
==19340== at 0x402C79: CopyDataToHost(param_t, grid*, grid*) (CheckDevice.cu:48)
==19340== by 0x403646: CheckDevice(param_t, grid*, grid*) (CheckDevice.cu:186)
==19340== by 0x40A6CD: main (Transport.cu:81)
==19340== Address 0x2003000c0 is not stack'd, malloc'd or (recently) free'd
==19340==
==19340==
==19340== Process terminating with default action of signal 11 (SIGSEGV)
==19340== Bad permissions for mapped region at address 0x2003000C0
==19340== at 0x402C79: CopyDataToHost(param_t, grid*, grid*) (CheckDevice.cu:48)
==19340== by 0x403646: CheckDevice(param_t, grid*, grid*) (CheckDevice.cu:186)
==19340== by 0x40A6CD: main (Transport.cu:81)
==19340==
==19340== HEAP SUMMARY:
==19340== in use at exit: 2,611,365 bytes in 5,017 blocks
==19340== total heap usage: 5,879 allocs, 862 frees, 4,332,278 bytes allocated
==19340==
==19340== LEAK SUMMARY:
==19340== definitely lost: 0 bytes in 0 blocks
==19340== indirectly lost: 0 bytes in 0 blocks
==19340== possibly lost: 37,416 bytes in 274 blocks
==19340== still reachable: 2,573,949 bytes in 4,743 blocks
==19340== suppressed: 0 bytes in 0 blocks
==19340== Rerun with --leak-check=full to see details of leaked memory
==19340==
==19340== For counts of detected and suppressed errors, rerun with: -v
==19340== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)
I believe I know what the problem is, but to confirm it, it would be useful to see the code that you are using to set up the Grid_dev classes on the device.
When a class or other data structure is to be used on the device, and that class has pointers in it which refer to other objects or buffers in memory (presumably in device memory, for a class that will be used on the device), then the process of making this top-level class usable on the device becomes more complicated.
Suppose I have a class like this:
class myclass{
    int myval;
    int *myptr;
};
I could instantiate the above class on the host, and then malloc an array of int and assign that pointer to myptr, and everything would be fine. To make this class usable on the device and the device only, the process could be similar. I could:
cudaMalloc a pointer to device memory that will hold myclass
(optionally) copy an instantiated object of myclass on the host to the device pointer from step 1 using cudaMemcpy
on the device, use malloc or new to allocate device storage for myptr
The above sequence is fine if I never want to access the storage allocated for myptr on the host. But if I do want that storage to be visible from the host, I need a different sequence:
cudaMalloc a pointer to device memory that will hold myclass, let's call this mydevobj
(optionally) copy an instantiated object of myclass on the host to the device pointer mydevobj from step 1 using cudaMemcpy
Create a separate int pointer on the host, let's call it myhostptr
cudaMalloc int storage on the device for myhostptr
cudaMemcpy the pointer value of myhostptr from the host to the device pointer &(mydevobj->myptr)
After that, you can cudaMemcpy the data pointed to by the embedded pointer myptr to the region allocated (via cudaMalloc) on myhostptr
Note that in step 5, because I am taking the address of this pointer location, this cudaMemcpy operation only requires the mydevobj pointer on the host, which is valid in a cudaMemcpy operation (only).
The embedded device pointer myptr will then be properly set up to do the operations you are trying to do. If you then want to cudaMemcpy data between that device storage and the host, you use the pointer myhostptr in any cudaMemcpy calls, not mydevobj->myptr. If we tried to use mydevobj->myptr, it would require dereferencing mydevobj and then using the result to retrieve the pointer stored in myptr, and then using that pointer as the copy to/from location. This is not acceptable in host code. If you try to do it, you will get a seg fault. (Note that, by way of analogy, my mydevobj is like your Grid_dev and my myptr is like your cdata.)
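The five steps above can be sketched as follows. Error checking is omitted, and N and hostData are illustrative stand-ins, not names from the question:

```cuda
#include <cuda_runtime.h>

class myclass {
public:
    int myval;
    int *myptr;
};

void setup(int *hostData, int N)
{
    myclass hostobj;                                      // staging copy on the host
    myclass *mydevobj;
    cudaMalloc((void **)&mydevobj, sizeof(myclass));      // step 1
    cudaMemcpy(mydevobj, &hostobj, sizeof(myclass),
               cudaMemcpyHostToDevice);                   // step 2 (optional)

    int *myhostptr;                                       // step 3: host-held handle
    cudaMalloc((void **)&myhostptr, N * sizeof(int));     // step 4
    cudaMemcpy(&(mydevobj->myptr), &myhostptr, sizeof(int *),
               cudaMemcpyHostToDevice);                   // step 5: patch the embedded
                                                          // pointer (address-of only,
                                                          // no dereference of mydevobj)

    // Data transfers now go through myhostptr, never through mydevobj->myptr:
    cudaMemcpy(myhostptr, hostData, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(hostData, myhostptr, N * sizeof(int), cudaMemcpyDeviceToHost);
}
```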
Overall it is a concept that requires some careful thought the first time you run into it, and so questions like this come up with some frequency on SO. You may want to study some of these questions to see code examples (since you haven't provided your code that sets up Grid_dev):
example 1
example 2
example 3
When I run my program with 1 array, like this:
int a[430][430];
int i, j, i_r0, j_r0;
double c, param1, param2;
int w_far = 0,h_far = 0;
char* magic_num1 = "";
it works fine!
But, when I write:
int a[430][430];
int i, j, i_r0, j_r0;
int nicky[430][430]; // Added line
double c, param1, param2;
int w_far = 0,h_far = 0;
char* magic_num1 = "";
the program does not run, failing with the error "stack overflow"!
I don't know how to solve it!
You need to either increase the stack space (how that is done depends on your platform), or you need to allocate the array from the heap, or even better, use std::vector instead of an array.
You're trying to allocate ~1.48 MB of data on the stack[1]; on your system (and not only on it) that's too much.
In general, the stack is not made for holding big objects; you should put them on the heap instead: use dynamic allocation with new or std::vector or, even better suited to your case, boost::multi_array.
1. Assuming 32-bit ints.
A proper solution is to use heap, but also note that you'll likely find that changing to:
short a[430][430];
short nicky[430][430]; // Added line
fixes the overflow, depending on your platform. So if short or unsigned short is big enough for your values, this might be an option.
In fact, even when using the heap, consider carefully the array type to reduce memory footprint for a large array.
Local variables are allocated on the "stack", a storage area that serves several purposes and is limited to a certain size.
Usually you can declare variables totalling up to several kilobytes; when you need more memory than that, the usual suggestion is to use the "heap", which can be allocated with the new operator or via std::vector.
std::vector is an alternative to traditional arrays, and its data is safely stored on the heap.
To avoid stack overflow, allocate the arrays in the heap.
If one uses C, then allocating an array of size n in the heap can be done by e.g.
int* A = (int*) malloc(n*sizeof(int));
But you must remember to free that memory when it is no longer needed with
free(A);
to avoid memory leak.
Equivalently in C++:
int* A = new int[n];
and free with
delete [] A;
This site was helpful.