Creating an object in device code - c++

I want to create an object on the device and allocate it to a pointer available on the host. Is there something I'm doing wrong in here?
__global__ void createAProduction(DeviceProduction* production) {
production = new AProduction();
}
DeviceProduction * devAProduction = NULL;
cudaMalloc(&devAProduction, sizeof(AProduction));
createAProduction<<<1, 1>>>(devAProduction);
deviceProductions["A"] = devAProduction;
Somewhere further in the code I'd like to do something like:
BatchOperation ** devBatchOperations;
cudaMalloc((void **) &devBatchOperations, sizeof(BatchOperation *) * operationCount);
Then I populate that pointer array like this:
void DeviceBatchExecutor::execute(vector<BatchOperation> operationsToPerform) {
BatchOperation ** devBatchOperations;
cudaMalloc((void **) &devBatchOperations, sizeof(BatchOperation *) * operationsToPerform.size());
int i = 0;
for(batchOperationIt it = operationsToPerform.begin(); it != operationsToPerform.end(); ++it) {
BatchOperation * devBatchOperation;
cudaMalloc(&devBatchOperation, sizeof(BatchOperation));
cudaMemcpy(&devBatchOperation, &it, sizeof(BatchOperation), cudaMemcpyHostToDevice);
Vertex * devInputNode = it->inputNode->allocateToDevice();
cudaMemcpy(&(devBatchOperation->inputNode), &devInputNode, sizeof(Vertex *), cudaMemcpyDeviceToDevice);
cudaMemcpy(&(devBatchOperation->production), &(it->production), sizeof(Production *), cudaMemcpyDeviceToDevice);
cudaMemcpy(&devBatchOperations[i], &devBatchOperation, sizeof(BatchOperation *), cudaMemcpyDeviceToDevice);
i++;
}
int operationCount = operationsToPerform.size();
executeOperations<<<operationCount, 1>>>(devBatchOperations);
}
where production is a pointer to the device memory holding that created object AProduction. Then I finally invoke processing via
executeOperations<<<operationCount, 1>>>(devBatchOperations);
So I'm relying on virtual method calls. Since those DeviceProduction objects were created on the device, their virtual pointer table lives there too, so it should work. See example here. But it doesn't: the batch operations the kernel receives look random, and it crashes on invocation.
__global__ void executeOperations(BatchOperation ** operation) {
operation[blockIdx.x]->production->apply(operation[blockIdx.x]->inputNode);
}
Batch operation is a struct holding the production to be executed.
struct BatchOperation {
Production * production;
Vertex * inputNode;
Vertex * outputNode;
};

Is there something I'm doing wrong in here?
Yes, probably. The pointer production is passed to the kernel by value:
createAProduction<<<1, 1>>>(devAProduction);
It points to a location in device memory somewhere, since you've already run cudaMalloc on it. This line of kernel code:
production = new AProduction();
overwrites the pass-by-value copy of the production pointer with a new one, returned by in-kernel new. That is almost certainly not what you intended. (And you haven't defined what AProduction is.) At the completion of that kernel call, the pass-by-value "copy" of the pointer will be lost anyway. You might be able to fix it like this:
*production = *(new DeviceProduction());
Now your production pointer points to a region in device memory that holds an instantiated (on the device) object, which appears to be your intent there. Creating a new object just to copy it may not be necessary, but that is not the crux of the issue I'm trying to point out here. You can probably also "fix" this issue by passing a pointer-to-pointer to the kernel instead. You would then need to allocate for an array of pointers, and assign one of the individual pointers using the in-kernel new directly, as you have shown.
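The by-value versus pointer-to-pointer difference can be seen in isolation with a host-only C++ sketch. This is only an analogy: no CUDA is involved, and the class bodies and function names below are made-up stand-ins for the question's types, not the asker's real code.

```cpp
#include <cassert>

// Hypothetical stand-ins for the types in the question.
struct DeviceProduction {
    virtual ~DeviceProduction() = default;
    virtual int id() const { return 0; }
};
struct AProduction : DeviceProduction {
    int id() const override { return 1; }
};

// By value: only the local copy of the pointer is reassigned.
// The new object is deleted here so that this sketch does not leak.
void createByValue(DeviceProduction* production) {
    production = new AProduction();
    delete production;
}

// Pointer-to-pointer: the caller's pointer slot is written through.
void createBySlot(DeviceProduction** slot) {
    *slot = new AProduction();
}

// Returns true if the caller's pointer is unchanged by createByValue.
bool byValueLeavesCallerUnchanged() {
    DeviceProduction* p = nullptr;
    createByValue(p);
    return p == nullptr;
}

// Returns id() of the object created through the slot (1 for AProduction).
int idCreatedThroughSlot() {
    DeviceProduction* p = nullptr;
    createBySlot(&p);
    assert(p != nullptr);
    int result = p->id();
    delete p;
    return result;
}
```

The same reasoning applies to kernel arguments: a kernel that assigns to its pointer parameter changes only its own copy, while one that writes through a pointer-to-pointer changes a slot the caller can see.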
The remainder of your code has a great many items undefined. For example in the above code it's not clear why you would declare that production is a pointer to a DeviceProduction type, but then try to allocate an AProduction type to it. Presumably that is some form of object inheritance which is unclear.
Since you haven't really provided anything approaching a complete code, I've borrowed some pieces from here to put together a complete worked example, showing object creation/setup in one kernel, followed by another kernel that invokes virtual methods on those objects:
$ cat t1086.cu
#include <stdio.h>
#define N 4
class Polygon {
protected:
int width, height;
public:
__host__ __device__ void set_values (int a, int b)
{ width=a; height=b; }
__host__ __device__ virtual int area ()
{ return 0; }
};
class Rectangle: public Polygon {
public:
__host__ __device__ int area ()
{ return width * height; }
};
class Triangle: public Polygon {
public:
__host__ __device__ int area ()
{ return (width * height / 2); }
};
__global__ void setup_f(Polygon ** d_polys) {
int idx = threadIdx.x+blockDim.x*blockIdx.x;
if (idx < N) {
if (idx%2)
d_polys[idx] = new Rectangle();
else
d_polys[idx] = new Triangle();
d_polys[idx]->set_values(5,12);
}};
__global__ void area_f(Polygon ** d_polys) {
int idx = threadIdx.x+blockDim.x*blockIdx.x;
if (idx < N){
printf("area of object %d = %d\n", idx, d_polys[idx]->area());
}};
int main () {
Polygon **devPolys;
cudaMalloc(&devPolys,N*sizeof(Polygon *));
setup_f<<<1,N>>>(devPolys);
area_f<<<1,N>>>(devPolys);
cudaDeviceSynchronize();
}
$ nvcc -o t1086 t1086.cu
$ cuda-memcheck ./t1086
========= CUDA-MEMCHECK
area of object 0 = 30
area of object 1 = 60
area of object 2 = 30
area of object 3 = 60
========= ERROR SUMMARY: 0 errors
$

Robert's suggestion seems to have made it work:
__global__ void createAProduction(DeviceProduction** production) {
int idx = threadIdx.x+blockDim.x*blockIdx.x;
if(idx == 0) {
production[0] = new AProduction();
}
}
Called like this:
DeviceProduction ** devAProduction = NULL;
cudaMalloc(&devAProduction, sizeof(AProduction *));
createAProduction<<<1, 1>>>(devAProduction);
gpuErrchk( cudaPeekAtLastError() );
gpuErrchk( cudaDeviceSynchronize() );
But if I want to keep a single-pointer structure for the deviceProductions array, would it be OK to do something like this?
deviceProductions["A"] = (DeviceProduction *) malloc(sizeof(AProduction *));
gpuErrchk(cudaMemcpy(deviceProductions["A"], devAProduction, sizeof(AProduction *), cudaMemcpyDeviceToHost));
My intention was to copy the pointer (address) to the host memory from the device memory. Am I doing it right?

Related

Passing array of class objects

I've been trying for a long time to pass an array of objects to another class object.
In settingUp.cpp:
//** Status classes and their functions **//
void settingUp(){
dataClass prueba0;
dataClass prueba1;
dataClass prueba2;
const dataClass * arrayPrueba[3];
prueba0.setValues(1);
prueba1.setValues(2);
prueba2.setValues(3);
arrayPrueba[0] = &prueba0;
arrayPrueba[1] = &prueba1;
arrayPrueba[2] = &prueba2;
statusClass status;
status.setValues(1, arrayPrueba);
status.printValues();
}
In classData.cpp:
//** dataClass and their functions **//
void dataClass::setValues(int _length){
length = _length;
}
void dataClass::printValues() const{
printf("TP: dataClass: length = %d\n", &length);
};
In statusClass.cpp:
//** Status classes and their functions **//
void statusClass::setValues (uint8_t _statusSelectorByte, const dataClass **_array){
newStatusSelectorByte = _statusSelectorByte;
array = *_array;
};
void statusClass::printValues(){
printf("TP: statusClass -> printValues: Prueba = %d\n", newStatusSelectorByte);
printf("TP: statusClass -> printValues: arrayPrueba = %d\n", array[1].length);
}
When I call:
status.printValues();
I can read only the first element of the arrayPrueba.
In statusClass::setValues(), *_array is the same as _array[0]. You are storing only the first dataClass* pointer from the input array.
Later, when using array[1], you are treating array as if it were a pointer to an array of objects, when it is really a pointer to a single object. You are thus reaching past that object into surrounding memory, which is undefined behavior. It may appear to "work" in this case because an object may happen to exist at that location, but that is not something to rely on.
You need to store the original array pointer, not a single element taken from the array.
private:
const dataClass **array; // <-- add an *
void statusClass::setValues (uint8_t _statusSelectorByte, const dataClass **_array){
newStatusSelectorByte = _statusSelectorByte;
array = _array; // <-- get rid of the *
};
void statusClass::printValues(){
printf("TP: statusClass -> printValues: Prueba = %d\n", newStatusSelectorByte);
printf("TP: statusClass -> printValues: arrayPrueba = %d\n", array[1]->length); // use -> instead of .
}
On a side note: in dataClass::printValues(), you need to drop the & when printing the value of length:
printf("TP: dataClass: length = %d\n", length);
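Putting the pieces together, here is a minimal self-contained sketch of the corrected design. The class names are capitalized and trimmed down on purpose; the element count and the lengthAt() and readBack() helpers are my additions for illustration and are not in the original code.

```cpp
#include <cassert>

// Hypothetical minimal versions of the classes in the question.
class DataClass {
public:
    void setValues(int len) { length = len; }
    int length = 0;
};

class StatusClass {
public:
    // Keep the pointer to the whole array of pointers, not just element 0.
    void setValues(const DataClass** arr, int n) {
        assert(arr != nullptr);
        array = arr;
        count = n;
    }

    // Returns the length of element i, or -1 if i is out of range.
    int lengthAt(int i) const {
        if (i < 0 || i >= count) return -1;
        return array[i]->length;
    }

private:
    const DataClass** array = nullptr;
    int count = 0;
};

// Builds three objects as in settingUp() and reads one back via StatusClass.
int readBack(int index) {
    DataClass a, b, c;
    a.setValues(1);
    b.setValues(2);
    c.setValues(3);
    const DataClass* items[3] = { &a, &b, &c };
    StatusClass status;
    status.setValues(items, 3);
    return status.lengthAt(index);
}
```

Note that StatusClass only stores the address of the caller's array, so it must not be used after that array (or the objects it points to) goes out of scope.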

Memory Pools and <Unable to read memory>

I recently switched my project to using a linear memory allocator that I wrote myself (for learning). When I initialize the allocator, I pass it a pointer to a block of memory that was VirtualAlloc-ed beforehand. Before writing the allocator, I was using this block directly just fine.
In my test case, I am using the allocator to allocate memory for a Player* in that initial big block of memory. To make sure everything was working, I tried accessing the block of memory directly, as I had before, to check that the values were changing according to my expectations. That's when I hit a memory access error. Using the VS debugger/watch window, I have a reasonable idea of what is happening and when, but I am hoping to get some help with the question of why. I'll lay out the relevant pieces of code below.
Virtual Alloc call, later referred to by memory->transientStorage
win32_State.gameMemoryBlock = VirtualAlloc(baseAddress, (size_t)win32_State.totalSize,
MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
Allocator definition
struct LinearAllocator {
void* currentPos;
size_t totalSize;
void* startPos;
size_t usedMemory;
size_t numAllocations;
LinearAllocator();
LinearAllocator(size_t size, void* start);
LinearAllocator(LinearAllocator&) = delete;
~LinearAllocator();
void* allocate(size_t size, uint8 alignment);
void clear();
};
Player and Vec2f definitions
struct Player {
Vec2f pos;
bool32 isFiring;
real32 timeLastFiredMS;
};
union Vec2f {
struct {
real32 x, y;
};
real32 v[2];
};
Relevant Allocator Implementation Details
void* LinearAllocator::allocate(size_t size, uint8_t alignment) {
if (size == 0 || !isPowerOfTwo(alignment)) {
return nullptr;
}
uint8_t adjustment = alignForwardAdjustment(currentPos, alignment);
if (usedMemory + adjustment + size > totalSize) {
return nullptr;
}
uint8_t* alignedAddress = (uint8*)currentPos + adjustment;
currentPos = (void*)(alignedAddress + size);
usedMemory += size + adjustment;
numAllocations++;
return (void*)alignedAddress;
}
inline uint8_t alignForwardAdjustment(void* address, uint8_t alignment) {
uint8_t adjustment = alignment - ( (size_t)address & (size_t)(alignment - 1));
if (adjustment == alignment) {
return 0; // already aligned
}
return adjustment;
}
inline int32_t isPowerOfTwo(size_t value) {
return value != 0 && (value & (value - 1)) == 0;
}
Initialization code where I attempt to use allocator
// **Can write to memory fine here**
((float*)memory->transientStorage)[0] = 4.f;
size_t simulationAllocationSize = memory->transientStorageSize / 2 / sizeof(real32);
simulationMemory = LinearAllocator(simulationAllocationSize, &memory->transientStorage + (uint8_t)0);
for (int i = 0; i < MAX_PLAYERS; i++) {
Player* p = (Player*)simulationMemory.allocate(sizeof(Player), 4);
// **also works here**
((real32*)memory->transientStorage)[0] = 3.f;
p->pos.x = 0.f; // **after this line, I got the unable to read memory error**
p->pos.y = 0.f;
p->isFiring = false;
p->timeLastFiredMS = 0.f;
// **can't write **
((real32*)memory->transientStorage)[0] = 1.f;
}
// **also can't write**
((real32*)memory->transientStorage)[0] = 2.f;
real32 test = ((real32*)memory->transientStorage)[0];
My running assumption is that I'm missing something obvious. But the only clue I have to go off of is that it changed after setting a value in the Player struct. Any help here would be greatly appreciated!
Looks like this is your problem:
simulationMemory = LinearAllocator(simulationAllocationSize,
&memory->transientStorage + (uint8_t)0);
There's a stray & operator, causing you to allocate memory not from the allocated memory block that memory->transientStorage points to but from wherever memory itself lives.
This in turn causes the write to p->pos.x to overwrite the value of transientStorage.
The call to LinearAllocator should be just
simulationMemory = LinearAllocator(simulationAllocationSize,
memory->transientStorage + (uint8_t)0);
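The compiler stays quiet about the stray & because a void** converts implicitly to void*, so both spellings are accepted where a void* start address is expected. A small sketch of the difference follows; GameMemory and the helper names are hypothetical stand-ins, and a std::vector plays the role of the VirtualAlloc-ed block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the game memory struct in the question.
struct GameMemory {
    std::size_t transientStorageSize;
    void* transientStorage;  // points at the big block
};

// True if p lies inside [block, block + size).
bool pointsIntoBlock(const void* p, const void* block, std::size_t size) {
    assert(block != nullptr);
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    auto begin = reinterpret_cast<std::uintptr_t>(block);
    return addr >= begin && addr - begin < size;
}

// With the stray &: the address of the pointer member itself.
bool strayAmpersandPointsIntoBlock() {
    std::vector<std::uint8_t> block(1024);
    GameMemory memory{ block.size(), block.data() };
    void* start = &memory.transientStorage;  // void** silently becomes void*
    return pointsIntoBlock(start, block.data(), block.size());
}

// Without the &: the block that the member points at.
bool plainMemberPointsIntoBlock() {
    std::vector<std::uint8_t> block(1024);
    GameMemory memory{ block.size(), block.data() };
    void* start = memory.transientStorage;
    return pointsIntoBlock(start, block.data(), block.size());
}
```

With the stray &, the allocator's first allocation starts at the transientStorage member itself, which is why the first write through p clobbers that pointer.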

Why can't one clone a `Space` in Gecode before solving the original one?

I'm looking for a way to copy Space instances in Gecode and then analyze the difference between the spaces later.
However, it already goes wrong after the first copy. When one copies the code from the book Modelling and Programming in Gecode, as shown below, and simply modifies it so that a copy is made first (SendMoreMoney* mc = m->copy(true);), one gets a segmentation fault, regardless of whether the share option is true or false.
#include <iostream>
#include <gecode/int.hh>
#include <gecode/search.hh>
using namespace Gecode;
class SendMoreMoney : public Space {
protected:
IntVarArray l;
public:
SendMoreMoney(void) : l(*this, 8, 0, 9) {
IntVar s(l[0]), e(l[1]), n(l[2]), d(l[3]),
m(l[4]), o(l[5]), r(l[6]), y(l[7]);
// no leading zeros
rel(*this, s, IRT_NQ, 0);
rel(*this, m, IRT_NQ, 0);
// all letters distinct
distinct(*this, l);
// linear equation
IntArgs c(4+4+5); IntVarArgs x(4+4+5);
c[0]=1000; c[1]=100; c[2]=10; c[3]=1;
x[0]=s; x[1]=e; x[2]=n; x[3]=d;
c[4]=1000; c[5]=100; c[6]=10; c[7]=1;
x[4]=m; x[5]=o; x[6]=r; x[7]=e;
c[8]=-10000; c[9]=-1000; c[10]=-100; c[11]=-10; c[12]=-1;
x[8]=m; x[9]=o; x[10]=n; x[11]=e; x[12]=y;
linear(*this, c, x, IRT_EQ, 0);
// post branching
branch(*this, l, INT_VAR_SIZE_MIN(), INT_VAL_MIN());
}
// search support
SendMoreMoney(bool share, SendMoreMoney& s) : Space(share, s) {
l.update(*this, share, s.l);
}
virtual SendMoreMoney* copy(bool share) {
return new SendMoreMoney(share,*this);
}
// print solution
void print(void) const {
std::cout << l << std::endl;
}
};
// main function
int main(int argc, char* argv[]) {
// create model and search engine
SendMoreMoney* m = new SendMoreMoney;
SendMoreMoney* mc = m->copy(true);
DFS<SendMoreMoney> e(m);
delete m;
// search and print all solutions
while (SendMoreMoney* s = e.next()) {
s->print(); delete s;
}
return 0;
}
How can one make a real copy?
You have to call status() on the Space first.
I found this exchange in the Gecode mailing list archives: https://www.gecode.org/users-archive/2006-March/000439.html
It would seem that Gecode uses the copy function and constructor for its own internal purposes, so to make a "copy-by-value" copy of a space, you need to use the clone() function defined in the Space interface. However, as noted in @Anonymous's answer, you need to call status() before calling clone(), or it will throw an exception of type SpaceNotStable.
I augmented my space with the function below to automatically call status, make the clone, and return a pointer of my derived type:
struct Example : public Space {
...
Example * cast_clone() {
status();
return static_cast<Example *>(this->clone());
}
...
};
As a workaround, one can create a totally independent space and then use equality constraints
on the variable level to reduce the domains of these variables.
Example:
void cloneHalfValues(SendMoreMoney* origin) {
int n = l.size();
for(int i = 0x00; i < n/2; i++) {
if(origin->l[i].assigned()) {
rel(*this, l[i], IRT_EQ, origin->l[i].val());
}
}
}
The reason why one can't clone a Space is however still a mystery.

NSInvocation not passing pointer to c++ array

I think I'm making just a fundamental mistake, but I cannot for the life of me see it.
I'm calling a method on an Objective-C object from within a C++ class (which is locked). I'm using NSInvocation to avoid having to write hundreds of methods just to access the data in this other object.
These are the steps I'm going through. This is my first call, and I want to pass s2. I can't really provide a compilable example, but hopefully it's just a DUHRRRRR problem on my part.
float s2[3];
id args2s[] = {(id)&_start.x(),(id)&_start.y(),(id)&s2};
_view->_callPixMethod(@selector(convertPixX:pixY:toDICOMCoords:),3,args2s);
This is the View method being called
invokeUnion View::_callPixMethod(SEL method, int nArgs, id args[])
{
DataModule* data;
DataVisitor getdata(&data);
getConfig()->accept(getdata);
invokeUnion retVal;
retVal.OBJC_ID = data->callPixMethod(_index, _datasetKey, method, nArgs, args);
return retVal;
}
Invoke Union is a union so I can get the float value returned by NSInvocation.
union invokeUnion {
id OBJC_ID;
int intValue;
float floatValue;
bool boolValue;
};
This is the method in the data Object (pthread locked with lock() and unlock());
id DataModule::callPixMethod(int index, std::string predicate, SEL method, int nArgs, id args[] )
{
// May Block
DCMPix *pix =[[getSeriesData(predicate) pix] objectAtIndex:index];
lock();
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
NSMethodSignature *signature;
NSInvocation *invocation;
signature = [DCMPix instanceMethodSignatureForSelector:method];
invocation = [NSInvocation invocationWithMethodSignature:signature];
[invocation setSelector:method];
[invocation setTarget:pix];
if (nArgs > 0) for (int n = 0; n < nArgs; n++) {
SFLog(@"invocation: i=%d, *ptr=0x%x, valf=%f, vald=%d",n,args[n],*args[n],*args[n]);
[invocation setArgument:args[n] atIndex:2+n];
}
id retVal;
[invocation invoke];
[invocation getReturnValue:&retVal];
[pool release];
unlock();
return retVal;
}
The method in the DCMPix object (which I can't modify, it's part of a library) is the following:
-(void) convertPixX: (float) x pixY: (float) y toDICOMCoords: (float*) d pixelCenter: (BOOL) pixelCenter
{
if( pixelCenter)
{
x -= 0.5;
y -= 0.5;
}
d[0] = originX + y*orientation[3]*pixelSpacingY + x*orientation[0]*pixelSpacingX;
d[1] = originY + y*orientation[4]*pixelSpacingY + x*orientation[1]*pixelSpacingX;
d[2] = originZ + y*orientation[5]*pixelSpacingY + x*orientation[2]*pixelSpacingX;
}
-(void) convertPixX: (float) x pixY: (float) y toDICOMCoords: (float*) d
{
[self convertPixX: x pixY: y toDICOMCoords: d pixelCenter: YES];
}
It's crashing when it tries to access d[0], with EXC_BAD_ACCESS, which I know means it's accessing released memory or memory outside of its scope.
I'm getting lost keeping track of my pointers to pointers. The two float values come across fine (as does other info in other methods), but this is the only one asking for a float* as a parameter. From what I understand, the convertPixX: method was converted over from a C program written for Mac OS 9, which is why it asks for the C array as an out value... I think.
Anyway, any insight would be greatly appreciated.
I've tried sending the value like this:
float *s2 = new float[3];
void* ps2 = &s2;
id args2s[] = {(id)&_start.x(),(id)&_start.y(),(id)&ps2};
_view->_callPixMethod(@selector(convertPixX:pixY:toDICOMCoords:),3,args2s);
But that gives a SIGKILL - plus I'm sure it's bogus and wrong. ... but I tried.
anyway... pointers! cross-language! argh!
Thanks,
An array is not a pointer. Try adding the following line
NSLog(@"%p, %p", s2, &s2);
just above this one:
id args2s[] = {(id)&_start.x(),(id)&_start.y(),(id)&s2};
s2 and &s2 are both the address of the first float in your array, so when you do:
[invocation setArgument:args[n] atIndex:2+n];
for n = 2, you are not copying in a pointer to the first float, but the first float, possibly the first two floats if an id is 64 bits wide.
Edit:
To fix the issue, this might work (not tested).
float s2[3];
float* s2Pointer = s2;
id args2s[] = {(id)&_start.x(),(id)&_start.y(),(id)&s2Pointer};
_view->_callPixMethod(@selector(convertPixX:pixY:toDICOMCoords:),3,args2s);
s2Pointer is a real pointer that will give you the double indirection you need.
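The same point can be checked in plain C++ without any Objective-C. In the sketch below, readPointerArgument is a hypothetical helper that only models "copy pointer-sized bytes from the location I was given"; it is not NSInvocation and makes no claim about how NSInvocation is implemented beyond what is described above.

```cpp
#include <cassert>
#include <cstring>

// Models a by-address argument copy: reads sizeof(float*) bytes starting at
// argLocation and treats them as the float* argument.
// (Hypothetical helper for illustration only.)
float* readPointerArgument(const void* argLocation) {
    assert(argLocation != nullptr);
    float* result = nullptr;
    std::memcpy(&result, argLocation, sizeof(result));
    return result;
}

// s2 and &s2 have different types but refer to the same address.
bool arrayAndItsAddressCoincide() {
    float s2[3] = {0.f, 0.f, 0.f};
    return static_cast<void*>(s2) == static_cast<void*>(&s2);
}

// Passing the address of a real pointer variable yields the array's address.
bool pointerVariableRoundTrips() {
    float s2[3] = {0.f, 0.f, 0.f};
    float* s2Pointer = s2;
    return readPointerArgument(&s2Pointer) == s2;
}
```

Because &s2 is just the address of the array's first element, reading pointer-sized bytes from there picks up float data rather than an address, which matches the crash on d[0].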

Allocating memory for a display

I am trying to create a graphics library. I need to:
int NewDisplay(Display **display, DisplayClass dispClass, int xRes, int yRes)
{
/* create a display:
-- allocate memory for indicated class and resolution
-- pass back pointer to Display object in display
*/
return SUCCESS;
}
How can I allocate memory for the display class and the requested resolution?
Let me guess:
int NewDisplay(Display **display, DisplayClass dispClass, int xRes, int yRes)
{
(*display) = new Display( dispClass, xRes, yRes );
return SUCCESS;
}
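Since the question does not show Display or DisplayClass, a fuller version has to guess at them. The sketch below assumes a display is just a class tag, a resolution, and one 4-byte pixel per position; every type, enumerator, and the FreeDisplay and allocatedPixelCount helpers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Everything here is hypothetical: the question does not define these types.
enum DisplayClass { FRAMEBUFFER_CLASS, ZBUFFER_CLASS };
enum { SUCCESS = 0, FAILURE = 1 };

struct Pixel { unsigned char r, g, b, a; };

struct Display {
    DisplayClass dispClass;
    int xRes, yRes;
    std::vector<Pixel> pixels;  // xRes * yRes entries, zero-initialized

    Display(DisplayClass c, int x, int y)
        : dispClass(c), xRes(x), yRes(y),
          pixels(static_cast<std::size_t>(x) * static_cast<std::size_t>(y)) {}
};

// Allocates the Display and its pixel storage, then passes the pointer back.
int NewDisplay(Display** display, DisplayClass dispClass, int xRes, int yRes) {
    if (display == nullptr || xRes <= 0 || yRes <= 0) return FAILURE;
    try {
        *display = new Display(dispClass, xRes, yRes);
    } catch (const std::bad_alloc&) {
        *display = nullptr;
        return FAILURE;
    }
    return SUCCESS;
}

// Releases a display created by NewDisplay.
int FreeDisplay(Display* display) {
    delete display;
    return SUCCESS;
}

// Creates a display, reports how many pixels were allocated, then frees it.
// Returns 0 if creation failed.
std::size_t allocatedPixelCount(int xRes, int yRes) {
    Display* d = nullptr;
    if (NewDisplay(&d, FRAMEBUFFER_CLASS, xRes, yRes) != SUCCESS) return 0;
    assert(d != nullptr);
    std::size_t n = d->pixels.size();
    FreeDisplay(d);
    return n;
}
```

Letting the Display constructor own the pixel buffer keeps NewDisplay down to one allocation call, and the matching FreeDisplay releases both the object and its pixels.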
If you need to get all the server side memory details,
http://brigitzblog.blogspot.com/2011/11/jspservlet-display-server-memory.html