I need to write to an OpenGL GL_TEXTURE_2D_ARRAY using CUDA graphics interop. I use the CUDA Driver API, CUDA version 7.5. My GPU is an NVIDIA Quadro K4000 with compute capability 3.0. I create the OpenGL array texture like this (5 layers):
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
glGenTextures(1, &_glArrayTexHandle);
glBindTexture(GL_TEXTURE_2D_ARRAY, _glArrayTexHandle);
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_WRAP_S, GL_CLAMP);
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_WRAP_T, GL_CLAMP);
glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_RGBA32F, m_viewportWidth, m_viewportHeight, 5, 0, GL_RGBA, GL_FLOAT, 0);
glBindTexture(GL_TEXTURE_2D_ARRAY, 0);
Then, on the CUDA side, I create a graphics image resource for that texture, to be used with a CUDA surface:
checkCudaErrors(cuGraphicsGLRegisterImage(&m_cudaToGL_TEX_ARRAY_Resource, _glArrayTexHandle, GL_TEXTURE_2D_ARRAY, CU_GRAPHICS_REGISTER_FLAGS_SURFACE_LDST));
assert(m_cudaToGL_TEX_ARRAY_Resource);
checkCudaErrors(cuModuleGetSurfRef(&m_surfGLtexRef, m_module, "surfaceWrite"));
assert(m_surfGLtexRef);
During the render loop I map the resource and bind the mapped array to the surface reference that the kernel writes to:
checkCudaErrors(cuGraphicsMapResources(1, &m_cudaToGL_TEX_ARRAY_Resource, 0));
//write to layer number 3
checkCudaErrors(cuGraphicsSubResourceGetMappedArray(
&m_cudaOffscreenFBOTextureArrayPtr, m_cudaToGL_TEX_ARRAY_Resource, 3, 0));
assert(m_cudaOffscreenFBOTextureArrayPtr);
checkCudaErrors(cuSurfRefSetArray(m_surfGLtexRef, m_cudaOffscreenFBOTextureArrayPtr, 0));
///launch the kernel:
checkCudaErrors(cuLaunchKernel(function,
blockDimX,
blockDimY,
1,
block_size,
block_size,
1,
0,
NULL,
args,
NULL
));
checkCudaErrors(cuGraphicsUnmapResources(1, &m_cudaToGL_TEX_ARRAY_Resource, 0));
checkCudaErrors(cuCtxSynchronize());
The kernel looks like this:
surface<void, cudaSurfaceType2DLayered> surfaceWrite;
extern "C" __global__ void surfWriteFunc(int xOffset, int yOffset, int Width, int Height)
{
unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
if (x >= Width || y >= Height)
{
return;
}
float4 dataOut = make_float4(1.0f, 0.0f, 1.0f, 1.0f);
surf2DLayeredwrite(dataOut, surfaceWrite, x * sizeof(float4), y, 3);
}
I am trying to use a CUDA layered surface write. At least, that is what I assume should be used with GL_TEXTURE_2D_ARRAY; the NVIDIA docs have no information on how to do it. The error I am getting is CUDA_ERROR_LAUNCH_FAILED for every CUDA call made during rendering, as seen above. I tried, for example, using cudaSurfaceType3D instead of layered, but it didn't help. It would be nice if anyone could shed some light on GL array texture interop with CUDA.
After much trial and error I found out how it works. First, it seems that CUDA doesn't allow a layered surface write when the CUDA array pointer maps to a GL resource. So this code in the kernel
surf2DLayeredwrite(dataOut, surfaceWrite, x * sizeof(float4), y, 3);
is invalid, because only layer zero of the mapped array is accessible.
The valid code is:
surf2DLayeredwrite(dataOut, surfaceWrite, x * sizeof(float4), y, 0);
There is therefore no reason to use surf2DLayeredwrite at all; a plain 2D surface write does the job:
surf2Dwrite(dataOut, surfaceWrite, x * sizeof(float4), y);
Now to the actual answer to the question "How to write to different layers of GL_TEXTURE_2D_ARRAY?":
checkCudaErrors(cuGraphicsSubResourceGetMappedArray(
&m_cudaOffscreenFBOTextureArrayPtr, m_cudaToGL_TEX_ARRAY_Resource, 3, 0));
Where "3" is the index of the layer in the mapped array texture to write to.
I haven't found a way to select the layer index from within the kernel. Currently it looks like this is only possible on the host.
This answer comes late, but maybe it helps someone...
I haven't tried GL_TEXTURE_2D_ARRAY, but I can say that at least GL_TEXTURE_3D texel indexing works dynamically inside a CUDA kernel (starting from compute_20, sm_21 and CUDA 5.0, probably also CUDA 4.1 and CUDA 4.2).
However, there is one gotcha you need to know with three dimensional textures: you have to compile your CUDA program in release mode, not in debug mode.
I am attempting to use a CUDA kernel to modify an OpenGL texture, but am having a strange issue where my calls to surf2Dwrite() seem to blend with the previous contents of the texture, as you can see in the image below. The wooden texture in the back is what's in the texture before modifying it with my CUDA kernel. The expected output would include ONLY the color gradients, not the wood texture behind it. I don't understand why this blending is happening.
Possible Problems / Misunderstandings
I'm new to both CUDA and OpenGL. Here I'll try to explain the thought process that led me to this code:
I'm using a cudaArray to access the texture (rather than e.g. an array of floats) because I read that it's better for cache locality when reading/writing a texture.
I'm using surfaces because I read somewhere that it's the only way to modify a cudaArray
I wanted to use surface objects, which I understand to be the newer way of doing things. The old way is to use surface references.
Some possible problems with my code that I don't know how to check/test:
Am I being inconsistent with image formats? Maybe I didn't specify the correct number of bits/channel somewhere? Maybe I should use floats instead of unsigned chars?
Code Summary
You can find a full minimum working example in this GitHub Gist. It's quite long because of all the moving parts, but I'll try to summarize. I welcome suggestions on how to shorten the MWE. The overall structure is as follows:
create an OpenGL texture from a file stored locally
register the texture with CUDA using cudaGraphicsGLRegisterImage()
call cudaGraphicsSubResourceGetMappedArray() to get a cudaArray that represents the texture
create a cudaSurfaceObject_t that I can use to write to the cudaArray
pass the surface object to a kernel that writes to the texture with surf2Dwrite()
use the texture to draw a rectangle on-screen
OpenGL Texture Creation
I am new to OpenGL, so I'm using the "Textures" section of the LearnOpenGL tutorials as a starting point. Here's how I set up the texture (using the image library stb_image.h)
GLuint initTexturesGL(){
// load texture from file
int numChannels;
unsigned char *data = stbi_load("img/container.jpg", &g_imageWidth, &g_imageHeight, &numChannels, 4);
if(!data){
std::cerr << "Error: Failed to load texture image!" << std::endl;
exit(1);
}
// opengl texture
GLuint textureId;
glGenTextures(1, &textureId);
glBindTexture(GL_TEXTURE_2D, textureId);
// wrapping
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_MIRRORED_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_MIRRORED_REPEAT);
// filtering
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
// set texture image
glTexImage2D(
GL_TEXTURE_2D, // target
0, // mipmap level
GL_RGBA8, // internal format (#channels, #bits/channel, ...)
g_imageWidth, // width
g_imageHeight, // height
0, // border (must be zero)
GL_RGBA, // format of input image
GL_UNSIGNED_BYTE, // type
data // data
);
glGenerateMipmap(GL_TEXTURE_2D);
// unbind and free image
glBindTexture(GL_TEXTURE_2D, 0);
stbi_image_free(data);
return textureId;
}
CUDA Graphics Interop
After calling the function above, I register the texture with CUDA:
void initTexturesCuda(GLuint textureId){
// register texture
HANDLE(cudaGraphicsGLRegisterImage(
&g_textureResource, // resource
textureId, // image
GL_TEXTURE_2D, // target
cudaGraphicsRegisterFlagsSurfaceLoadStore // flags
));
// resource description for surface
memset(&g_resourceDesc, 0, sizeof(g_resourceDesc));
g_resourceDesc.resType = cudaResourceTypeArray;
}
Render Loop
Every frame, I run the following to modify the texture and render the image:
while(!glfwWindowShouldClose(window)){
// -- CUDA --
// map
HANDLE(cudaGraphicsMapResources(1, &g_textureResource));
HANDLE(cudaGraphicsSubResourceGetMappedArray(
&g_textureArray, // array through which to access subresource
g_textureResource, // mapped resource to access
0, // array index
0 // mipLevel
));
// create surface object (compute >= 3.0)
g_resourceDesc.res.array.array = g_textureArray;
HANDLE(cudaCreateSurfaceObject(&g_surfaceObj, &g_resourceDesc));
// run kernel
kernel<<<gridDim, blockDim>>>(g_surfaceObj, g_imageWidth, g_imageHeight);
// unmap
HANDLE(cudaGraphicsUnmapResources(1, &g_textureResource));
// --- OpenGL ---
// clear
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
// use program
shader.use();
// triangle
glBindVertexArray(vao);
glBindTexture(GL_TEXTURE_2D, textureId);
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);
glBindVertexArray(0);
// glfw: swap buffers and poll i/o events
glfwSwapBuffers(window);
glfwPollEvents();
}
CUDA Kernel
The actual CUDA kernel is as follows:
__global__ void kernel(cudaSurfaceObject_t surface, int nx, int ny){
int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;
if(x < nx && y < ny){
uchar4 data = make_uchar4(x % 255,
y % 255,
0, 255);
surf2Dwrite(data, surface, x * sizeof(uchar4), y);
}
}
If I understand correctly, you initially register the texture, map it once, create a surface object for the array representing the mapped texture, and then unmap the texture. Every frame, you then map the resource again, ask for the array representing the mapped texture, and then completely ignore that one and use the surface object created for the array you got back when you first mapped the resource. From the documentation:
[…] The value set in array may change every time that resource is mapped.
You have to create a new surface object every time you map the resource because you might get a different array every time. And, in my experience, you will actually get a different one every so often. It may be a valid thing to do to only create a new surface object whenever the array actually changes. The documentation seems to allow for that, but I never tried, so I can't tell whether that works for sure…
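The "only recreate when the array actually changes" idea mentioned above could be sketched roughly as follows. This is a plain C++ illustration with stand-in types, not CUDA code: `ArrayHandle`, `SurfaceHandle` and `SurfaceCache` are hypothetical names, and whether reusing a surface object across map/unmap cycles is actually safe is exactly the part that was not verified above.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-ins: ArrayHandle plays the role of the mapped array,
// SurfaceHandle the role of the surface object created for it.
using ArrayHandle = const void*;
using SurfaceHandle = unsigned long long;

// Recreates the surface only when the mapped array differs from the last one seen.
class SurfaceCache {
public:
    SurfaceCache(std::function<SurfaceHandle(ArrayHandle)> create,
                 std::function<void(SurfaceHandle)> destroy)
        : create_(std::move(create)), destroy_(std::move(destroy)) {
        assert(create_ && destroy_);
    }
    SurfaceCache(const SurfaceCache&) = delete;
    SurfaceCache& operator=(const SurfaceCache&) = delete;
    ~SurfaceCache() { reset(); }

    // Call once per frame, right after mapping the resource.
    SurfaceHandle get(ArrayHandle mappedArray) {
        if (!valid_ || mappedArray != lastArray_) {
            reset();                          // destroy the stale surface, if any
            surface_ = create_(mappedArray);  // create one for the new array
            lastArray_ = mappedArray;
            valid_ = true;
        }
        return surface_;
    }

    void reset() {
        if (valid_) {
            destroy_(surface_);
            valid_ = false;
        }
    }

private:
    std::function<SurfaceHandle(ArrayHandle)> create_;
    std::function<void(SurfaceHandle)> destroy_;
    ArrayHandle lastArray_ = nullptr;
    SurfaceHandle surface_ = 0;
    bool valid_ = false;
};
```

With the runtime API, `create` would wrap cudaCreateSurfaceObject() and `destroy` would wrap cudaDestroySurfaceObject(). The simpler and safer option remains creating and destroying the surface object on every map.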
Apart from that: You generate mipmaps for your texture. You only overwrite mip level 0. You then render the texture using mipmapping with trilinear interpolation. So my guess would be that you just happen to render the texture at a resolution that does not match the resolution of mip level 0 exactly and, thus, you will end up interpolating between level 0 (in which you wrote) and level 1 (which was generated from the original texture)…
It turns out the problem is that I had mistakenly generated mipmaps for the original wood texture, and my CUDA kernel was only modifying the level-0 mipmap. The blending I noticed was the result of OpenGL interpolating between my modified level-0 mipmap and a lower-resolution version of the wood texture.
Here's the correct output, obtained by disabling mipmap interpolation. Lesson learned!
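To make the mipmap explanation concrete, here is a small CPU-only sketch that lists the levels of a complete mip chain. `MipSize` and `mipChain` are illustrative names, not part of OpenGL; the halving rule (round down, clamp to 1) is the standard one.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct MipSize {
    int width;
    int height;
};

// Sizes of a complete mip chain: each level halves the previous one
// (rounding down), never going below 1, until 1x1 is reached.
std::vector<MipSize> mipChain(int width, int height) {
    assert(width >= 1 && height >= 1);
    std::vector<MipSize> levels;
    while (true) {
        levels.push_back({width, height});
        if (width == 1 && height == 1) {
            break;
        }
        width = std::max(1, width / 2);
        height = std::max(1, height / 2);
    }
    return levels;
}
```

A kernel that writes only to the array for mip level 0 touches `mipChain(w, h)[0]` and leaves every other level holding the downsampled original image. Two ways out are to use a non-mipmapped minification filter such as GL_LINEAR, or to call glGenerateMipmap() again after the CUDA write so the lower levels are rebuilt from level 0.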
When I rasterize a font, my code gives me a single channel of visibility for a texture. Currently, I just duplicate this out to 4 different channels and send that as a texture. This works, but I want to avoid unnecessary memory allocations and deallocations on the CPU.
unsigned char *bitmap = new unsigned char[width*height]; // How this is populated is not the point.
bitmap now contains a 2D graphic.
It seems this guy has the same problem: Opengl: Use single channel texture as alpha channel to display text
I do the same thing as a workaround for now: I multiply the array size by 4 and copy each value into it 4 times.
unsigned char* colormap = new unsigned char[width * height * 4];
int offset = 0;
for (int d = 0; d < width * height;d++)
{
for (int i = 0;i < 4;i++)
{
colormap[offset++] = bitmap[d];
}
}
When I multiply it out, I use:
glTexParameteri(gltype, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(gltype, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexImage2D(gltype, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, colormap);
And get:
Which is what I want.
When I use only the single channel:
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
glTexParameteri(gltype, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(gltype, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_R8, width, height, 0, GL_RED, GL_UNSIGNED_BYTE, bitmap);
And get:
It has no transparency and only shows red, which makes it hard to colorize later.
Instead of doing what I feel is an unnecessary allocation on the CPU side, I'd like to tell OpenGL: "Hey, you're getting just one channel. Multiply it out for all 4 color channels."
Is there a command for that?
In your shader, it's trivial enough to just broadcast the r component to all four channels:
vec4 vals = texture(tex, coords).rrrr;
If you don't want to modify your shader (perhaps because you need to use the same shader for 4-channel textures too), then you can apply a texture swizzle mask to the texture:
GLint swizzleMask[] = {GL_RED, GL_RED, GL_RED, GL_RED};
glTexParameteriv(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_RGBA, swizzleMask);
When mechanisms read from the fourth component of the texture, they'll get the value defined by the red component of that texture.
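As a rough CPU-side model of what that swizzle mask does to a sampled value, consider the sketch below. The names `Swizzle`, `Texel`, `fetchR8` and `applySwizzle` are made up for illustration; the only OpenGL facts relied on are that a one-channel texture samples as (r, 0, 0, 1) and that each output component is picked according to the mask.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Possible sources for each output component, mirroring the GL swizzle choices.
enum class Swizzle { Red, Green, Blue, Alpha, Zero, One };

// R, G, B, A as 8-bit normalized values.
using Texel = std::array<std::uint8_t, 4>;

// What sampling a one-channel (GL_R8) texel yields before swizzling: (r, 0, 0, 1).
Texel fetchR8(std::uint8_t red) {
    return {red, 0, 0, 255};
}

// Builds the value the shader sees once the swizzle mask is applied.
Texel applySwizzle(const Texel& t, const std::array<Swizzle, 4>& mask) {
    Texel out{};
    for (std::size_t i = 0; i < 4; ++i) {
        switch (mask[i]) {
            case Swizzle::Red:   out[i] = t[0]; break;
            case Swizzle::Green: out[i] = t[1]; break;
            case Swizzle::Blue:  out[i] = t[2]; break;
            case Swizzle::Alpha: out[i] = t[3]; break;
            case Swizzle::Zero:  out[i] = 0;    break;
            case Swizzle::One:   out[i] = 255;  break;
        }
    }
    return out;
}
```

With a mask of four `Swizzle::Red` entries, a stored value of 200 comes back as (200, 200, 200, 200), which matches the hand-expanded RGBA copy in the question without the extra CPU allocation.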
I'm working on a project in OpenGL.
I have a polygon that is filled with a BMP image file.
I can rotate the camera to look at the image from different places, and I want to copy the visible part of the image and put it into a new BMP file.
I have a lot of unnecessary code, so I will copy only the important parts.
_textureId = LoadBMP("file.bmp");
glEnable(GL_TEXTURE_2D);
glBindTexture(GL_TEXTURE_2D, _textureId);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glColor3f(1, 1, 0.7);
float BOX_SIZE = -12.0f;
glBegin(GL_QUADS);
glVertex3f(-BOX_SIZE / 2, -BOX_SIZE / 2, -5);
glVertex3f(BOX_SIZE / 2, -BOX_SIZE / 2, -5);
glVertex3f(BOX_SIZE / 2, -BOX_SIZE / 2, 5);
glVertex3f(-BOX_SIZE / 2, -BOX_SIZE / 2, 5);
glEnd();
The rotation is pretty basic. Does anyone have any suggestions?
Thanks a lot.
If you want to save the output of OpenGL to a file, you will have to read back the contents of the color buffer from the GL to client memory. Then you can do whatever you want with it. The command
glReadPixels(GLint x, GLint y, GLsizei width, GLsizei height, GLenum format, GLenum type, GLvoid *data)
will read back the pixel data in a rectangle of width * height pixels beginning at x,y into the memory buffer located at data. Since you said you want to save it as a BMP file, you probably want GL_UNSIGNED_BYTE as the type, because BMP only supports up to 8 bits per channel. You probably also want GL_BGRA or GL_BGR as the format, as this is the native channel layout for BMP.
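Sizing the readback buffer is the step that tends to go wrong, so here is a small sketch of the arithmetic. It assumes the default GL_PACK_ALIGNMENT of 4, under which each row written by glReadPixels() starts on a 4-byte boundary; BMP pads its rows to 4 bytes as well, so the two layouts line up. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Bytes occupied by one row once padded up to a multiple of `alignment`.
std::size_t paddedRowBytes(std::size_t width, std::size_t bytesPerPixel,
                           std::size_t alignment = 4) {
    assert(alignment > 0);
    const std::size_t raw = width * bytesPerPixel;
    return (raw + alignment - 1) / alignment * alignment;
}

// Buffer size that is large enough for a width x height readback of 8-bit channels.
std::size_t readbackBufferBytes(std::size_t width, std::size_t height,
                                std::size_t bytesPerPixel,
                                std::size_t alignment = 4) {
    return paddedRowBytes(width, bytesPerPixel, alignment) * height;
}
```

For a GL_BGR / GL_UNSIGNED_BYTE readback, `bytesPerPixel` is 3, so a 5-pixel-wide image needs 16 bytes per row rather than 15. Both glReadPixels() and the default BMP layout store the bottom row first, so the rows can be written to the file in the order they arrive.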
After I have initialized the library and loaded the texture I get http://postimg.org/image/4tzkq4uhl.
But when I added this line to the texture code:
std::vector<unsigned char> buffer(w * h, 0);
I get http://postimg.org/image/kqycmumvt.
Why does this happen when I add that specific line, and why does it look as if the letter is duplicated? I have searched examples and tutorials about FreeType and saw that some of them modify the buffer array, but I didn't really understand why, so if you can explain that to me, I may be able to handle this better.
Texture Load:
Texture::Texture(FT_GlyphSlot slot) {
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
glGenTextures(1, &textureID);
glBindTexture(GL_TEXTURE_2D, textureID);
int w = slot->bitmap.width;
int h = slot->bitmap.rows;
// When I remove this line, the black rectangle below the letter reappears.
std::vector<unsigned char> buffer(w * h, 0);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, slot->bitmap.width, slot->bitmap.rows, 0, GL_LUMINANCE_ALPHA, GL_UNSIGNED_BYTE, slot->bitmap.buffer);
glGenerateMipmap(GL_TEXTURE_2D);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
}
Fragment Shader:
#version 330
in vec2 uv;
in vec4 tColor;
uniform sampler2D tex;
out vec4 color;
void main () {
color = vec4(tColor.rgb, texture(tex, uv).a);
}
You're specifying GL_LUMINANCE_ALPHA for the format of the data you pass to glTexImage2D(). Based on the corresponding FreeType documentation I found here:
http://www.freetype.org/freetype2/docs/reference/ft2-basic_types.html#FT_Pixel_Mode
There is no FT_Pixel_Mode value specifying that the data in slot->bitmap.buffer is in fact luminance-alpha. GL_LUMINANCE_ALPHA is a format with 2 bytes per pixel, where the first byte is used for R, G, and B when the data is used to specify a RGBA image, and the second byte is used for A.
Based on the data you're showing, slot->bitmap.pixel_mode is most likely FT_PIXEL_MODE_GRAY, which means that the bitmap data is 1 byte per pixel. In this case, you need to use GL_ALPHA for the format:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, slot->bitmap.width, slot->bitmap.rows, 0,
GL_ALPHA, GL_UNSIGNED_BYTE, slot->bitmap.buffer);
If the pixel_mode is something other than FT_PIXEL_MODE_GRAY, you'll have to adjust the format accordingly, or potentially create a copy of the data if it's a format that is not supported by glTexImage2D().
The reason you get garbage if you specify GL_LUMINANCE_ALPHA instead of GL_ALPHA is that it reads twice as much data as is contained in the data you pass in. The content of the data that is read beyond the allocated bitmap data is undefined, and may well change depending on what other variables you declare/allocate.
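The size mismatch can be checked with a little arithmetic. The sketch below assumes GL_UNSIGNED_BYTE data and GL_UNPACK_ALIGNMENT set to 1, as in the question; `ClientFormat`, `bytesConsumed` and `overreadBytes` are illustrative names rather than OpenGL API.

```cpp
#include <cassert>
#include <cstddef>

// Number of bytes per pixel that the GL reads from client memory
// for a few GL_UNSIGNED_BYTE formats.
enum class ClientFormat : std::size_t {
    Alpha = 1,           // GL_ALPHA (GL_RED is also one byte)
    LuminanceAlpha = 2,  // GL_LUMINANCE_ALPHA
    Rgba = 4             // GL_RGBA
};

// Bytes read from the client pointer when rows are tightly packed.
std::size_t bytesConsumed(std::size_t width, std::size_t height, ClientFormat format) {
    return width * height * static_cast<std::size_t>(format);
}

// How far past the end of the supplied buffer the upload reads (0 if it fits).
std::size_t overreadBytes(std::size_t providedBytes, std::size_t consumedBytes) {
    return consumedBytes > providedBytes ? consumedBytes - providedBytes : 0;
}
```

For a 10 x 12 glyph, FT_PIXEL_MODE_GRAY supplies 120 bytes when the rows are tightly packed, while a GL_LUMINANCE_ALPHA upload consumes 240, so the second half of the texture comes from whatever happens to sit in memory after the bitmap.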
If you want to use texture formats that are still supported in the core profile instead of the deprecated GL_LUMINANCE_ALPHA or GL_ALPHA, you can use GL_R8 instead. Since this format has only one component, instead of the four in GL_RGBA, this will also use 75% less texture memory:
glTexImage2D(GL_TEXTURE_2D, 0, GL_R8, slot->bitmap.width, slot->bitmap.rows, 0,
GL_RED, GL_UNSIGNED_BYTE, slot->bitmap.buffer);
This will also require a slight change in the shader to read the r component instead of the a component:
color = vec4(tColor.rgb, texture(tex, uv).r);
Solved it. I added the following to my code and it works well.
GLubyte * data = new GLubyte[2 * w * h];
for( int y = 0; y < slot->bitmap.rows; y++ )
{
for( int x = 0; x < slot->bitmap.width; x++ )
{
data[2 * ( x + y * w )] = 255;
data[2 * ( x + y * w ) + 1] = slot->bitmap.buffer[x + slot->bitmap.width * y];
}
}
I don't know what happened with that particular line I added but now it works.
I wrote some code, too long to paste here, that renders into a 3D 1 component float texture via a fragment shader that uses bindless imageLoad and imageStore.
That code is definitely working.
I then needed to work around some GLSL compiler bugs, so wanted to read the 3D texture above back to the host via glGetTexImage. Yes, I did do a glMemoryBarrierEXT(GL_ALL_BARRIER_BITS).
I did check the texture info via glGetTexLevelParameteriv() and everything I see matches. I did check for OpenGL errors, and there are none.
Sadly, though, glGetTexImage never seems to read what was written by the fragment shader. Instead, it only returns the fake values I put in when I called glTexImage3D() to create the texture.
Is that expected behavior? The documentation implies otherwise.
If glGetTexImage actually works that way, how can I read back the data in that 3D texture (resident on the device)? Clearly the driver can do that, as it does when the texture is made non-resident. Surely there's a simple way to do this simple thing...
I was asking if glGetTexImage was supposed to work that way or not. Here's the code:
void Bindless3DArray::dump_array(Array3D<float> &out)
{
bool was_mapped = m_image_mapped;
if (was_mapped)
unmap_array(); // unmap array so it's accessible to opengl
out.resize(m_depth, m_height, m_width);
glBindTexture(GL_TEXTURE_3D, m_textureid); // from glGenTextures()
#if 0
int w,h,d;
glGetTexLevelParameteriv(GL_TEXTURE_3D, 0, GL_TEXTURE_WIDTH, &w);
glGetTexLevelParameteriv(GL_TEXTURE_3D, 0, GL_TEXTURE_HEIGHT, &h);
glGetTexLevelParameteriv(GL_TEXTURE_3D, 0, GL_TEXTURE_DEPTH, &d);
int internal_format;
glGetTexLevelParameteriv(GL_TEXTURE_3D, 0, GL_TEXTURE_INTERNAL_FORMAT, &internal_format);
int data_type_r, data_type_g;
glGetTexLevelParameteriv(GL_TEXTURE_3D, 0, GL_TEXTURE_RED_TYPE, &data_type_r);
glGetTexLevelParameteriv(GL_TEXTURE_3D, 0, GL_TEXTURE_GREEN_TYPE, &data_type_g);
int size_r, size_g;
glGetTexLevelParameteriv(GL_TEXTURE_3D, 0, GL_TEXTURE_RED_SIZE, &size_r);
glGetTexLevelParameteriv(GL_TEXTURE_3D, 0, GL_TEXTURE_GREEN_SIZE, &size_g);
#endif
glGetTexImage(GL_TEXTURE_3D, 0, GL_RED, GL_FLOAT, &out(0,0,0));
glBindTexture(GL_TEXTURE_3D, 0);
CHECK_GLERROR();
if (was_mapped)
map_array_to_cuda(); // restore state
}
Here's the code that creates the bindless array:
void Bindless3DArray::allocate(int w, int h, int d, ElementType t)
{
if (!m_textureid)
glGenTextures(1, &m_textureid);
m_type = t;
m_width = w;
m_height = h;
m_depth = d;
glBindTexture(GL_TEXTURE_3D, m_textureid);
CHECK_GLERROR();
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAX_LEVEL, 0); // ensure only 1 miplevel is allocated
CHECK_GLERROR();
Array3D<float> foo(d, h, w);
// DEBUG -- glGetTexImage returns THIS data, not what's on device
for (int z=0; z<m_depth; ++z)
for (int y=0; y<m_height; ++y)
for (int x=0; x<m_width; ++x)
foo(z,y,x) = 3.14159;
//-- Texture creation
if (t == ElementInteger)
glTexImage3D(GL_TEXTURE_3D, 0, GL_R32UI, w, h, d, 0, GL_RED_INTEGER, GL_INT, 0);
else if (t == ElementFloat)
glTexImage3D(GL_TEXTURE_3D, 0, GL_R32F, w, h, d, 0, GL_RED, GL_FLOAT, &foo(0,0,0));
else
throw "Invalid type for Bindless3DArray";
CHECK_GLERROR();
m_handle = glGetImageHandleNV(m_textureid, 0, true, 0, (t == ElementInteger) ? GL_R32UI : GL_R32F);
glMakeImageHandleResidentNV(m_handle, GL_READ_WRITE);
CHECK_GLERROR();
#ifdef USE_CUDA
checkCuda(cudaGraphicsGLRegisterImage(&m_image_resource, m_textureid, GL_TEXTURE_3D, cudaGraphicsRegisterFlagsSurfaceLoadStore));
#endif
}
I allocate the array, render to it via an OpenGL fragment program, and then I call dump_array() to read the data back. Sadly, I only get what I loaded in the allocate call.
The render program looks like
void App::clear_deepz()
{
deepz_clear_program.bind();
deepz_clear_program.setUniformValue("sentinel", SENTINEL);
deepz_clear_program.setUniformValue("deepz", deepz_array.handle());
deepz_clear_program.setUniformValue("sem", semaphore_array.handle());
run_program();
glMemoryBarrierEXT(GL_ALL_BARRIER_BITS);
// glMemoryBarrierEXT(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
// glMemoryBarrierEXT(GL_SHADER_GLOBAL_ACCESS_BARRIER_BIT_NV);
deepz_clear_program.release();
}
and the fragment program is:
#version 420
in vec4 gl_FragCoord;
uniform float sentinel;
coherent uniform layout(size1x32) image3D deepz;
coherent uniform layout(size1x32) uimage3D sem;
void main(void)
{
ivec3 coords = ivec3(gl_FragCoord.x, gl_FragCoord.y, 0);
imageStore(deepz, coords, vec4(sentinel));
imageStore(sem, coords, ivec4(0));
discard; // don't write to FBO at all
}
discard; // don't write to FBO at all
That's not what discard means. Oh, it does mean that. But it also means that all Image Load/Store writes will be discarded too. Indeed, odds are, the compiler will see that statement and just do nothing for the entire fragment shader.
If you want to just execute the fragment shader, you can employ the GL 4.3 feature (available on your NVIDIA hardware) of having an empty framebuffer object. Or you could use a compute shader. If you can't use GL 4.3 yet, then use a write mask to turn off all color writes.
As Nicol mentions above, if you want side effects only of image load and store, the proper way is to use an empty frame buffer object.
The bug of mixing glGetTexImage() and bindless textures was in fact a driver bug, and has been fixed as of driver version 335.23. I filed the bug and have confirmed my code is now working properly.
Note I am using empty frame buffer objects in the code, and don't use "discard" any more.