OpenCL enqueueTask vs enqueueNDRangeKernel - C++

I'm writing OpenCL code using the C++ bindings, trying to make a small library.
NDRange offset(0);
NDRange global_size(numWorkItems);
NDRange local_size(1);
//this call fails with error code -56
err = queue.enqueueNDRangeKernel(kernelReduction, offset, global_size, local_size);
//this call works:
err = queue.enqueueTask(kernelReduction);
Now, error code -56 is CL_INVALID_GLOBAL_OFFSET, and I have no clue why the first call would fail. Any suggestions?

If you are using OpenCL 1.0, you cannot use global offsets as far as I know (you need to work around that by using a constant memory counter or something similar). Try updating the bindings to OpenCL 1.1 if they don't adapt automatically, and make sure you update your drivers as well.

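In OpenCL 1.0, global_work_offset must be NULL; any other value should produce CL_INVALID_GLOBAL_OFFSET. With the C++ bindings this means passing cl::NullRange as the offset, because NDRange offset(0) is a one-dimensional range and is handed to the driver as a non-NULL pointer.
See the clEnqueueNDRangeKernel reference page for details.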
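When a call returns a bare number such as -56, it can help to translate it back to its symbolic name before searching the specification. The helper below is a minimal sketch and not part of the OpenCL C++ bindings: `clErrorName` is a hypothetical name, and the table covers only a few enqueue-related codes, with values as defined in the `CL/cl.h` header.

```cpp
#include <string>

// Maps a few OpenCL status codes to their symbolic names.
// Only a handful of enqueue-related codes are listed; extend as needed.
std::string clErrorName(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -52: return "CL_INVALID_KERNEL_ARGS";
        case -53: return "CL_INVALID_WORK_DIMENSION";
        case -54: return "CL_INVALID_WORK_GROUP_SIZE";
        case -55: return "CL_INVALID_WORK_ITEM_SIZE";
        case -56: return "CL_INVALID_GLOBAL_OFFSET";
        default:  return "unknown OpenCL error (" + std::to_string(code) + ")";
    }
}
```

With it, the return value of the failing call can be logged as `clErrorName(err)` rather than as a raw integer.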

ArrayFire convolution issue with CUDA backend

I've been having an issue with a certain function call:
dphaseWeighted = af::convolve(dphaseWeighted, m_slowTimeFilter);
which seems to produce nothing but NaNs.
The background is that we have recently switched from AF OpenCL to AF CUDA, and the problem we are seeing happens in this call:
dphaseWeighted = af::convolve(dphaseWeighted, m_slowTimeFilter);
This seems to work well when using OpenCL.
Unfortunately, I can't give you the whole function because of IP, only a couple of snippets.
This convolve lies deep within a phase-extract piece of code, and is actually the second part of that code which uses the af::convolve function.
The first function seems to behave as expected, with sensible floating-point data coming out,
but when it comes to the second function all I'm seeing is NaNs coming out (I view that with af_print and by dumping the data to a file).
In the CMakeLists.txt I include
include_directories(${ArrayFire_INCLUDE_DIRS})
and
target_link_libraries(DASPhaseInternalLib ${ArrayFire_CUDA_LIBRARIES})
and it builds as expected.
Has anyone experienced anything like this before?
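One way to narrow a problem like this down is to check whether NaNs are already present in either input of the second convolve, since a NaN in the signal or in the filter propagates into the output. The sketch below is plain C++ and assumes the data has already been copied back to host memory as `float` (the element type is an assumption); `countNonFinite` is a hypothetical helper, not an ArrayFire function.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Counts values that are NaN or +/-infinity in a host-side buffer.
std::size_t countNonFinite(const std::vector<float>& values) {
    std::size_t bad = 0;
    for (float v : values) {
        if (!std::isfinite(v)) {
            ++bad;
        }
    }
    return bad;
}
```

Running it on the signal and on the filter before the call, and on the result afterwards, would show whether the NaNs originate in the inputs or in the convolution itself.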

Running the executable of hdl_viewer_simple.cpp from Point Cloud Library

The Point Cloud Library comes with an executable, pcl_hdl_viewer_simple, that I can run (./pcl_hdl_viewer_simple) without any extra arguments to get live data from a Velodyne LIDAR HDL32.
The source code for this program is supposed to be hdl_viewer_simple.cpp. A simplified version of the code is given on this page; it cannot be compiled as-is and needs a tiny bit of tweaking to make it compile.
My problem is that the executables I build myself from both versions do not run: I always get the smart pointer error "Assertion px!=0". I am not sure whether I am executing the program incorrectly or something else is wrong. The executable is supposed to be executed like
./hdl_viewer_simple -calibrationFile hdl32calib.xml -pcapFile file.pcap
when playing back previously recorded PCAP files, or just ./hdl_viewer_simple to get live data from the real sensor. However, I always get the assertion failure.
Has anyone been able to run the executables? I do not want to use the ROS drivers.

"Assertion px!=0" occurs because a smart pointer is dereferenced before it has been initialized.
That being said, you could initialize it inside your routines in case the pointer is NULL, especially for data input.
Here, you can try updating line 83 like this:
CloudConstPtr cloud(new Cloud); //initializing your pointer
and hopefully, it will work.
Cheers,
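The assertion fires when an empty smart pointer is dereferenced, so the general fix is to make sure the pointer owns an object before it is used. The sketch below illustrates that guard with `std::shared_ptr` standing in for the Boost pointer type; `Cloud` is a placeholder struct and `ensureCloud` a hypothetical helper, neither of them taken from the PCL sources.

```cpp
#include <memory>
#include <vector>

// Placeholder for a point cloud type (assumption; not pcl::PointCloud).
struct Cloud {
    std::vector<float> points;
};
using CloudPtr = std::shared_ptr<Cloud>;

// Returns a pointer that is safe to dereference: the original one if it
// already owns a cloud, otherwise a freshly allocated empty cloud.
CloudPtr ensureCloud(const CloudPtr& cloud) {
    if (cloud) {
        return cloud;
    }
    return std::make_shared<Cloud>();
}
```

Calling `ensureCloud(ptr)->points` is then safe even when `ptr` was never assigned, at the cost of silently working on an empty cloud.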

glewInit() crashing (segfault) after creating osmesa (off-screen mesa) context

I'm trying to run an OpenGL application on a remote computing cluster. I'm using OSMesa as I intend to do off-screen software rendering (no X11 forwarding etc.). I want to use GLEW (to make dealing with shaders and other extension-related calls easier), and I seem to have built and linked both Mesa and GLEW fine.
When I call only OSMesaCreateContext, glewInit gives an "OPENGL Version not available" output, which probably means the context has not been created. When I call glGetString(GL_EXTENSIONS) I don't get any output, which confirms this. This also shows that GLEW is working fine on its own (other GLEW calls, such as querying the GLEW version, also work).
Now when I add the OSMesaMakeCurrent call (as shown below), glewInit crashes with a segfault. However, glGetString(GL_EXTENSIONS) now gives me a list of extensions (which means context creation is successful!).
I've spent hours trying to figure this out and tried tinkering, but nothing works. I would greatly appreciate any help on this. Maybe some of you have experienced something similar before? Thanks again!
int Height = 1; int Width = 1;
OSMesaContext ctx; void *buffer;
ctx = OSMesaCreateContext( OSMESA_RGBA, NULL );
buffer = malloc( Width * Height * 4 * sizeof(GLfloat) );
if (!OSMesaMakeCurrent( ctx, buffer, GL_UNSIGNED_BYTE, Width, Height )) {
    printf("OSMesaMakeCurrent failed!\n");
    return 0;
}
// glewInit() crashes after this.
Just to add, OSMesa and GLEW did not actually compile together initially: GLEW undefines GLAPI in its last line, and since osmesa.h will not include gl.h again, GLAPI remains undefined and causes an error in osmesa.h (line 119). I got around this by adding an extern to GLAPI; I'm not sure whether this is relevant, though.

Looking at the source of glewInit in glew.c: if glewContextInit succeeds it returns GLEW_OK, which is defined as 0, so on Linux systems glewInit will always go on to call glxewContextInit. That function calls glX functions which, in the case of OSMesa, will likely not be ready for use. This causes the segfault (I see it as well), and unfortunately glewInit has no way of handling this case without patching the C source and recompiling the library.
If others have already solved this I would be interested. I have seen some patched versions of the glew.c sources that work around this, but it isn't clear whether there is any energy in the GLEW community to merge changes that address this use case.

Anyone tried using glMultiDrawArraysIndirect? Compiler can't find the function

Has anyone successfully used glMultiDrawArraysIndirect? I'm including the latest glext.h but the compiler can't seem to find the function. Do I need to define something (#define ...) before including glext.h?
error: ‘GL_DRAW_INDIRECT_BUFFER’ was not declared in this scope
error: ‘glMultiDrawArraysIndirect’ was not declared in this scope
I'm trying to implement an OpenGL SuperBible example. Here are snippets from the source code:
GLuint indirect_draw_buffer;
glGenBuffers(1, &indirect_draw_buffer);
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirect_draw_buffer);
glBufferData(GL_DRAW_INDIRECT_BUFFER,
             NUM_DRAWS * sizeof(DrawArraysIndirectCommand),
             draws,
             GL_STATIC_DRAW);
....
// fill the buffers
.....
glMultiDrawArraysIndirect (GL_TRIANGLES, NULL, 3, 0);
I'm on Linux with a Quadro 2000 and the latest drivers installed (NVIDIA 319.60).

You cannot simply #include <glext.h> and expect this problem to fix itself. This header is only half of the equation: it defines the basic constants, function signatures, typedefs, etc. used by OpenGL extensions, but it does not actually solve the problem of extending OpenGL.
On most platforms you are guaranteed a certain version of OpenGL (1.1 on Windows), and to use any part of OpenGL that is newer than this version you must extend the API at runtime. Linux is no different: in order to use glMultiDrawArraysIndirect (...) you have to load this extension from the driver at runtime. This usually means setting up function pointers that are NULL until runtime in order to keep the compiler/linker happy.
By far the simplest solution is to use something like GLEW, which will load all of the extensions your driver supports for versions up to OpenGL 4.4 at runtime. It takes the place of glext.h; all you have to do is initialize the library after you set up your render context.
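The function-pointer approach described above can be sketched without any OpenGL headers. In the example below the resolver is passed in as a parameter and stands in for the platform's lookup call (`glXGetProcAddress` on Linux, `wglGetProcAddress` on Windows). `Resolver`, `loadMultiDrawArraysIndirect`, `stubResolver` and `emptyResolver` are hypothetical names, and the stubs exist only so that the sketch runs without a driver.

```cpp
#include <cstring>

// Plain-typed stand-in for the glMultiDrawArraysIndirect signature
// (GLenum mode, const void* indirect, GLsizei drawcount, GLsizei stride).
using MultiDrawArraysIndirectFn =
    void (*)(unsigned int mode, const void* indirect, int drawcount, int stride);

// Generic function pointer, as returned by a platform lookup call.
using GenericFn = void (*)();
using Resolver = GenericFn (*)(const char* name);

// NULL until the entry point has been loaded at runtime.
MultiDrawArraysIndirectFn pMultiDrawArraysIndirect = nullptr;

// Asks the resolver for the entry point; returns false if it is not provided.
bool loadMultiDrawArraysIndirect(Resolver resolve) {
    pMultiDrawArraysIndirect = reinterpret_cast<MultiDrawArraysIndirectFn>(
        resolve("glMultiDrawArraysIndirect"));
    return pMultiDrawArraysIndirect != nullptr;
}

// --- Stand-ins so the sketch runs without a GL driver (assumptions) ---
int g_drawsIssued = 0;

void stubMultiDraw(unsigned int, const void*, int drawcount, int) {
    g_drawsIssued += drawcount;  // pretend to issue the draws
}

// Knows exactly one entry point, like a driver that supports the call.
GenericFn stubResolver(const char* name) {
    if (std::strcmp(name, "glMultiDrawArraysIndirect") == 0) {
        return reinterpret_cast<GenericFn>(&stubMultiDraw);
    }
    return nullptr;
}

// Knows no entry points, like a driver that lacks the call.
GenericFn emptyResolver(const char*) { return nullptr; }
```

After a successful load the call goes through the pointer, for example `pMultiDrawArraysIndirect(mode, nullptr, 3, 0)`. Libraries such as GLEW do essentially this bookkeeping for every entry point.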

CUDA cudaMalloc

I've started writing a new CUDA application. However, I hit a funny detour along the way.
The first cudaMalloc call, on a variable x, fails the first time. However, when I call it a second time it returns cudaSuccess. I recently upgraded to the CUDA 4.0 SDK; it's a really weird bug.
I even did some testing and it seems the first call of cudaMalloc fails.

The very first call to any of the CUDA library functions launches an initialisation subroutine. It can happen that the initialisation fails, and not the cudaMalloc itself (CUDA Programming Guide, section 3.2.1).
Later on, however, it seems to work despite the initial failure. I don't know your setup or your code, so I can't really help you further. Check the Programming Guide!

I would strongly recommend using the CUDA_SAFE_CALL macro, if you aren't already, to force the thread synchronisation, at least while you're debugging the code:
CUDA_SAFE_CALL(cudaMalloc((void**) &(myVar), mem_size_N ));
Update: As per @talonmies, you don't need the cutil library. So let's rewrite the solution:
/* Allocate data and keep the return code instead of discarding it */
cudaError_t err = cudaMalloc((void**) &(myVar), mem_size_N );
/* Force thread synchronization (newer toolkits name this cudaDeviceSynchronize) */
if ( cudaSuccess == err )
    err = cudaThreadSynchronize();
/* Check for and display error */
if ( cudaSuccess != err )
{
    fprintf( stderr, "Cuda error in file '%s' in line %i : %s.\n",
             __FILE__, __LINE__, cudaGetErrorString( err ) );
}
And as noted in the other answer, you may want to include the sync and check before you allocate memory, just to make sure the API initialized correctly.