Fastest way to modify an OpenGL texture with OpenCL per pixel

Using OpenGL 4.4 and OpenCL 2.0, let's say I just want to modify specific pixels of a texture each frame.
What is the optimal way to achieve this?
Which object should I share?
Will I be able to modify only a limited number of pixels?
I want GPU-only operations.

First off, there are no OpenCL 2.0 drivers yet; the specification only recently got finalized and implementations probably won't happen until 2014.
Likewise, many OpenGL implementations aren't at 4.4 yet.
However, you can still do what you want with OpenCL 1.2 (or 1.1 since NVIDIA is behind the industry in OpenCL support) and current OpenGL implementations.
Look for OpenCL / OpenGL interop examples, but basically:
Create OpenCL context from OpenGL context
Create OpenCL image from OpenGL texture
After rendering your OpenGL into the texture, acquire the image for OpenCL, run an OpenCL kernel that updates only the specific pixels you want to change, and release it back to OpenGL
Draw the texture to the screen
Often OpenCL kernels are 2D and address each pixel, but you can run a 1D kernel where each work item updates a single pixel based on some algorithm. Just make sure not to write the same pixel from more than one work item or you'll have a race condition.
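For illustration, here is a minimal host-side sketch of that sequence (a Windows/WGL context is assumed; platform, device, queue, kernel, glTex and numPixelsToModify are placeholders, not from the question):

// Create the OpenCL context from the current OpenGL context (WGL shown; GLX/CGL differ)
cl_context_properties props[] = {
    CL_GL_CONTEXT_KHR,   (cl_context_properties)wglGetCurrentContext(),
    CL_WGL_HDC_KHR,      (cl_context_properties)wglGetCurrentDC(),
    CL_CONTEXT_PLATFORM, (cl_context_properties)platform,
    0
};
cl_int err;
cl_context ctx = clCreateContext(props, 1, &device, NULL, NULL, &err);

// Create an OpenCL image from the OpenGL texture
// (OpenCL 1.2: clCreateFromGLTexture; on 1.1 use clCreateFromGLTexture2D instead)
cl_mem img = clCreateFromGLTexture(ctx, CL_MEM_READ_WRITE, GL_TEXTURE_2D, 0, glTex, &err);

// Per frame, after OpenGL has rendered into the texture:
glFinish();                                        // GL must finish before CL touches it
clEnqueueAcquireGLObjects(queue, 1, &img, 0, NULL, NULL);
clSetKernelArg(kernel, 0, sizeof(cl_mem), &img);
size_t global = numPixelsToModify;                 // 1D launch: one work item per pixel
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);
clEnqueueReleaseGLObjects(queue, 1, &img, 0, NULL, NULL);
clFinish(queue);                                   // CL must finish before GL draws it

Inside the kernel, each work item computes its target coordinate and calls write_imagef() on the image; as noted above, no two work items should write the same pixel.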

Related

Dynamic shader in OpenGL

CUDA 5 and OpenCL 2 introduce dynamic parallelism (kernels launched by another kernel via a device API, not by the host API).
Is there an equivalent to this in OpenGL? Is it possible to simulate it with feedback loops? (I think not.) Is it missing from all OpenGL shader stages (shadow, texture, etc.), or perhaps available in the GL 4.3 compute shader?
According to this page, it seems that compute shaders in OpenGL don't support dynamic parallelism. You can only launch them with glDispatchCompute() or glDispatchComputeIndirect().
Other shader stages are even less likely to gain such support, because they are embedded within the fixed OpenGL processing stages.
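To make the host-only launch point concrete, a minimal dispatch looks like this (computeProgram and the group counts are placeholders):

glUseProgram(computeProgram);            // program containing a GL 4.3 compute shader
glDispatchCompute(groupsX, groupsY, 1);  // the only way to launch it: from the host
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT); // make its writes visible downstream

There is no device-side call a running shader could use to spawn further work.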

CUDA and/or OpenGL for geometric image transformation

My question concerns the most efficient way of performing geometric image transformations on the GPU. The goal is essentially to remove lens distortion from acquired images in real time.

I can think of several ways to do it, e.g. as a CUDA kernel (which would be preferable) doing an inverse transform lookup + interpolation, or the same in an OpenGL shader, or rendering a forward-transformed mesh with the image texture mapped to it. It seems to me the last option could be the fastest, because the mesh can be subsampled, i.e. not every pixel offset needs to be stored but can be interpolated in the vertex shader. Also, the graphics pipeline really should be optimized for this. However, the rest of the image processing is probably going to be done with CUDA.

If I want to use the OpenGL pipeline, do I need to start an OpenGL context and bring up a window to do the rendering, or can this be achieved anyway through the CUDA/OpenGL interop somehow? The aim is not to display the image; the processing will take place on a server, potentially with no display attached. I've heard this could crash OpenGL if bringing up a window.
I'm quite new to GPU programming, any insights would be much appreciated.
Using the forward-transformed mesh method is the more flexible and easier one to implement. Performance-wise, however, there's no big difference: the effective limit you're running into is memory bandwidth, and the amount of memory bandwidth consumed depends only on the size of your input image. Whether the transfer is caused by a fragment shader fed by vertices or by a CUDA texture access doesn't matter.
If I want to use the OpenGL pipeline, do I need to start an OpenGL context and bring up a window to do the rendering,
On Windows: Yes, but the window can be an invisible one.
On GLX/X11 you need an X server running, but you can use a PBuffer instead of a window to get an OpenGL context.
In either case use a Framebuffer Object as the actual drawing destination. PBuffers may corrupt their primary framebuffer contents at any time; a Framebuffer Object is safe.
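A minimal FBO setup for this might look like the following (width, height and the format are placeholders):

GLuint fbo, colorTex;
glGenTextures(1, &colorTex);
glBindTexture(GL_TEXTURE_2D, colorTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, colorTex, 0);
// render the undistortion pass into the FBO; colorTex can afterwards be
// registered with CUDA for the rest of the processing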
or can this be achieved anyway through the CUDA/OpenGL interop somehow?
No, because CUDA/OpenGL interop is for making OpenGL and CUDA interoperate, not for driving OpenGL from CUDA. CUDA/OpenGL interop helps you with the part you mentioned here:
However, the rest of the image processing is probably going to be done with CUDA.
BTW, maybe OpenGL compute shaders (available since OpenGL 4.3) would work for you as well.
I've heard this could crash OpenGL if bringing up a window.
OpenGL actually has no say in those things. It's just an API for drawing stuff on a canvas (canvas = window, PBuffer, or Framebuffer Object), but it doesn't deal with actually getting a canvas onto the scaffolding, so to speak.
Technically OpenGL doesn't care if there's a window or not; what matters is the graphics system on which the OpenGL context is created. And unfortunately none of the currently existing GPU graphics systems supports true headless operation. NVIDIA's latest Linux drivers may allow for some crude hacks to set up a truly headless system, but I have never tried that so far.

Is there any way to bind a 3D surface created in CUDA to an OpenGL texture?

Here is the scenario:
I pass a 3D OpenGL texture to CUDA via cudaBindTextureToArray, transform it with a non-rigid transformation, and write the result to a 3D surface. Then I want to pass it as a texture to a GLSL shader for volume rendering. But GLSL only knows texture IDs; how can I use this 3D surface as an ordinary OpenGL texture?
Pseudocode:
Create a texture with OpenGL like this:
glTexImage3D(GL_TEXTURE_3D, 0, ...);
Pass it to CUDA, then create and fill a surface with:
cutilSafeCall(cudaBindSurfaceToArray(volumeTexOut, outTexture->content));
...
cutilSafeCall(cudaMalloc3DArray(&vol->content, &vol->channelDesc, dataSize, cudaArraySurfaceLoadStore));
After the transformation:
surf3Dwrite(short(voxel), volumeTexOut, sizeof(short) * x1, y1, z1);
And now I want to use this surface as an ordinary OpenGL texture and pass it to GLSL.
Update: The APIs suggested below are quite old and have been deprecated. Please see the current Graphics Interop APIs for CUDA.
CUDA OpenGL interop is (unfortunately) a one-way API: to interoperate between CUDA and OpenGL you must allocate all memory needed in your GL code using OpenGL, and then bind it to CUDA arrays or device pointers in order to access it in CUDA. You cannot do the opposite (allocate memory with CUDA and access it from OpenGL). This goes for data that is either read or written by CUDA.
So for your output, you want to allocate the 3D texture in OpenGL, not with cudaMalloc3DArray(). Then call cudaGraphicsGLRegisterImage with the cudaGraphicsRegisterFlagsSurfaceLoadStore flag, and then bind a surface to the resulting array using cudaBindSurfaceToArray. This is discussed in section 3.2.11.1 of the CUDA 4.2 CUDA C Programming Guide. The CUDA reference guide provides full documentation on the functions I mentioned.
Note that surface writes require a compute capability 2.0 or higher GPU.
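Putting the pieces together, here is a minimal sketch of that recipe using the legacy (CUDA 4.x-era) surface-reference API the answer refers to; volumeTexOut matches the question's naming, everything else is a placeholder:

surface<void, 3> volumeTexOut;   // legacy file-scope surface reference

__global__ void writeVolume(int w, int h, int d) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < w && y < h && z < d)
        surf3Dwrite((short)0, volumeTexOut, x * sizeof(short), y, z); // x is in bytes
}

// Host side: tex was allocated with glTexImage3D(GL_TEXTURE_3D, ...) as in the question
cudaGraphicsResource* res;
cudaGraphicsGLRegisterImage(&res, tex, GL_TEXTURE_3D,
                            cudaGraphicsRegisterFlagsSurfaceLoadStore);
cudaGraphicsMapResources(1, &res, 0);
cudaArray* arr;
cudaGraphicsSubResourceGetMappedArray(&arr, res, 0, 0);
cudaBindSurfaceToArray(volumeTexOut, arr);
// writeVolume<<<grid, block>>>(w, h, d);  // writes land directly in the GL texture
cudaGraphicsUnmapResources(1, &res, 0);
// tex can now be bound and sampled from GLSL as an ordinary 3D texture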

"Drawing of data generated by OpenGL or external APIs such as OpenCL, without CPU intervention."

I noticed that in the new features listed for OpenGL 4.0 the following is included:
Drawing of data generated by OpenGL or external APIs such as OpenCL,
without CPU intervention.
What functionality exactly is this referring to?
It's talking about ARB_draw_indirect. That functionality, core in 4.0, allows the GL implementation to read the drawing parameters directly from the buffer object. So the parameters you would pass to glDrawArrays or glDrawElements come from the buffer, not from your Draw call.
This way, OpenCL or other GPGPU code can just write that struct into the buffer. And therefore, they can determine how many vertices to draw.
AMD has a pretty nifty variation of this that allows for multi-draw functionality.
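For reference, a sketch of the mechanism (the command layout comes from the extension; the buffer name and the fill step are placeholders):

typedef struct {
    GLuint count;
    GLuint primCount;
    GLuint first;
    GLuint reservedMustBeZero;   // becomes baseInstance in GL 4.2
} DrawArraysIndirectCommand;

GLuint dib;
glGenBuffers(1, &dib);
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, dib);
glBufferData(GL_DRAW_INDIRECT_BUFFER, sizeof(DrawArraysIndirectCommand),
             NULL, GL_DYNAMIC_DRAW);
// ... an OpenCL kernel (or other GPU pass) writes the command into the buffer ...
glDrawArraysIndirect(GL_TRIANGLES, 0);   // count etc. are read from the bound buffer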

Drawing on screen directly from the GPU

I am doing some processing/pixel classification on a picture and I'm using a GPU for it. My question is: is there a library I can use to display my final picture (2D matrix) on screen directly from GPU memory, instead of bringing it back to the CPU and displaying it from there?
I don't want anything extravagant; I will only color the pixels and I want to show the new colors.
Use OpenGL: CUDA draws to a texture and OpenGL tells the card to display it.
See the Mandelbrot example.
Update: OpenCL 1.1 now allows sharing OpenGL contexts with OpenCL code.
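A sketch of that texture path, modeled loosely on the SDK sample (width, height and colorKernel are placeholders): CUDA writes pixels into a pixel buffer object, and OpenGL uploads the PBO to a texture and draws it, so the image never leaves the GPU.

GLuint pbo, tex;                         // created with glGenBuffers / glGenTextures
cudaGraphicsResource* pboRes;
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, width * height * 4, NULL, GL_DYNAMIC_DRAW);
cudaGraphicsGLRegisterBuffer(&pboRes, pbo, cudaGraphicsMapFlagsWriteDiscard);

// Each frame:
uchar4* devPtr; size_t size;
cudaGraphicsMapResources(1, &pboRes, 0);
cudaGraphicsResourceGetMappedPointer((void**)&devPtr, &size, pboRes);
// colorKernel<<<grid, block>>>(devPtr, width, height);  // classify / color the pixels
cudaGraphicsUnmapResources(1, &pboRes, 0);

glBindTexture(GL_TEXTURE_2D, tex);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_RGBA, GL_UNSIGNED_BYTE, 0);  // sourced from the bound PBO
// ... draw a full-screen textured quad with tex ...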