How to get texture data using texture IDs in OpenGL

I'm writing some code where all I have access to is a textureID to get access to the required texture. Is there any way that I can get access to the RGB values of this texture so I can perform some computations on it?
EDIT: I am looking for the inverse of glTexSubImage2D. I want to get the texture data rather than replace it.

You are probably looking for glGetTexImage
Before using glGetTexImage, don't forget to use glBindTexture with your texture ID.
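For example (a minimal sketch; it assumes a 2D texture and reads it back as RGBA bytes, querying the level-0 size rather than assuming it is known):
GLint width = 0, height = 0;
glBindTexture(GL_TEXTURE_2D, textureID);
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_WIDTH, &width);
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_HEIGHT, &height);
std::vector<GLubyte> pixels(width * height * 4); // 4 bytes per RGBA texel
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());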

In OpenGL, a texture can be read by glGetTexImage/glGetnTexImage, or by the DSA version of those functions, glGetTextureImage.
Another possibility is to attach the texture to a framebuffer and to read the pixels by glReadPixels. OpenGL ES does not offer glGetTexImage, so this is the way to go in OpenGL ES.
See opengl es 2.0 android c++ glGetTexImage alternative
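A minimal sketch of that framebuffer route (assuming a color-renderable RGBA texture, and that pixels points to a sufficiently large buffer):
GLuint fbo;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, textureID, 0);
if (glCheckFramebufferStatus(GL_FRAMEBUFFER) == GL_FRAMEBUFFER_COMPLETE)
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
glBindFramebuffer(GL_FRAMEBUFFER, 0);
glDeleteFramebuffers(1, &fbo);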
If you transfer the texture image to a Pixel Buffer Object, then you can even access the data via Buffer object mapping. See also OpenGL Pixel Buffer Object (PBO).
You have to bind a buffer with the proper size to the target GL_PIXEL_PACK_BUFFER:
// create buffer
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, size_in_bytes, 0, GL_STATIC_READ);
// get texture image
glBindTexture(GL_TEXTURE_2D, texture_obj);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, (void*)(0));
// map pixel buffer
void * data_ptr = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
// access the data
// [...]
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);

I'm writing this here just in case anyone needs this.
In OpenGL 4.5+, one can access a texture's data given only its texture ID, using the
glGetTextureImage() function.
For example, to read back the data of a GL_RGB texture as floats:
each texel has 3 components (R, G, B), each 4 bytes, so:
float* data = new float[texture_height * texture_width * 3];
glGetTextureImage(textureID, 0, GL_RGB, GL_FLOAT, texture_height * texture_width * 3 * sizeof(float), data);

Related

CUDA/OpenGL Interop: Writing to surface object does not erase previous contents

I am attempting to use a CUDA kernel to modify an OpenGL texture, but am having a strange issue where my calls to surf2Dwrite() seem to blend with the previous contents of the texture, as you can see in the image below. The wooden texture in the back is what's in the texture before modifying it with my CUDA kernel. The expected output would include ONLY the color gradients, not the wood texture behind it. I don't understand why this blending is happening.
Possible Problems / Misunderstandings
I'm new to both CUDA and OpenGL. Here I'll try to explain the thought process that led me to this code:
I'm using a cudaArray to access the texture (rather than e.g. an array of floats) because I read that it's better for cache locality when reading/writing a texture.
I'm using surfaces because I read somewhere that it's the only way to modify a cudaArray.
I wanted to use surface objects, which I understand to be the newer way of doing things. The old way is to use surface references.
Some possible problems with my code that I don't know how to check/test:
Am I being inconsistent with image formats? Maybe I didn't specify the correct number of bits/channel somewhere? Maybe I should use floats instead of unsigned chars?
Code Summary
You can find a full minimum working example in this GitHub Gist. It's quite long because of all the moving parts, but I'll try to summarize. I welcome suggestions on how to shorten the MWE. The overall structure is as follows:
create an OpenGL texture from a file stored locally
register the texture with CUDA using cudaGraphicsGLRegisterImage()
call cudaGraphicsSubResourceGetMappedArray() to get a cudaArray that represents the texture
create a cudaSurfaceObject_t that I can use to write to the cudaArray
pass the surface object to a kernel that writes to the texture with surf2Dwrite()
use the texture to draw a rectangle on-screen
OpenGL Texture Creation
I am new to OpenGL, so I'm using the "Textures" section of the LearnOpenGL tutorials as a starting point. Here's how I set up the texture (using the image library stb_image.h):
GLuint initTexturesGL(){
    // load texture from file
    int numChannels;
    unsigned char *data = stbi_load("img/container.jpg", &g_imageWidth, &g_imageHeight, &numChannels, 4);
    if(!data){
        std::cerr << "Error: Failed to load texture image!" << std::endl;
        exit(1);
    }

    // opengl texture
    GLuint textureId;
    glGenTextures(1, &textureId);
    glBindTexture(GL_TEXTURE_2D, textureId);

    // wrapping
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_MIRRORED_REPEAT);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_MIRRORED_REPEAT);

    // filtering
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

    // set texture image
    glTexImage2D(
        GL_TEXTURE_2D,    // target
        0,                // mipmap level
        GL_RGBA8,         // internal format (#channels, #bits/channel, ...)
        g_imageWidth,     // width
        g_imageHeight,    // height
        0,                // border (must be zero)
        GL_RGBA,          // format of input image
        GL_UNSIGNED_BYTE, // type
        data              // data
    );
    glGenerateMipmap(GL_TEXTURE_2D);

    // unbind and free image
    glBindTexture(GL_TEXTURE_2D, 0);
    stbi_image_free(data);
    return textureId;
}
CUDA Graphics Interop
After calling the function above, I register the texture with CUDA:
void initTexturesCuda(GLuint textureId){
    // register texture
    HANDLE(cudaGraphicsGLRegisterImage(
        &g_textureResource, // resource
        textureId,          // image
        GL_TEXTURE_2D,      // target
        cudaGraphicsRegisterFlagsSurfaceLoadStore // flags
    ));

    // resource description for surface
    memset(&g_resourceDesc, 0, sizeof(g_resourceDesc));
    g_resourceDesc.resType = cudaResourceTypeArray;
}
Render Loop
Every frame, I run the following to modify the texture and render the image:
while(!glfwWindowShouldClose(window)){
    // -- CUDA --

    // map
    HANDLE(cudaGraphicsMapResources(1, &g_textureResource));
    HANDLE(cudaGraphicsSubResourceGetMappedArray(
        &g_textureArray,   // array through which to access subresource
        g_textureResource, // mapped resource to access
        0,                 // array index
        0                  // mipLevel
    ));

    // create surface object (compute >= 3.0)
    g_resourceDesc.res.array.array = g_textureArray;
    HANDLE(cudaCreateSurfaceObject(&g_surfaceObj, &g_resourceDesc));

    // run kernel
    kernel<<<gridDim, blockDim>>>(g_surfaceObj, g_imageWidth, g_imageHeight);

    // unmap
    HANDLE(cudaGraphicsUnmapResources(1, &g_textureResource));

    // --- OpenGL ---

    // clear
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    // use program
    shader.use();

    // triangle
    glBindVertexArray(vao);
    glBindTexture(GL_TEXTURE_2D, textureId);
    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);
    glBindVertexArray(0);

    // glfw: swap buffers and poll i/o events
    glfwSwapBuffers(window);
    glfwPollEvents();
}
CUDA Kernel
The actual CUDA kernel is as follows:
__global__ void kernel(cudaSurfaceObject_t surface, int nx, int ny){
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if(x < nx && y < ny){
        uchar4 data = make_uchar4(x % 255,
                                  y % 255,
                                  0, 255);
        surf2Dwrite(data, surface, x * sizeof(uchar4), y);
    }
}
If I understand correctly, you initially register the texture, map it once, create a surface object for the array representing the mapped texture, and then unmap the texture. Every frame, you then map the resource again, ask for the array representing the mapped texture, and then completely ignore that one and use the surface object created for the array you got back when you first mapped the resource. From the documentation:
[…] The value set in array may change every time that resource is mapped.
You have to create a new surface object every time you map the resource because you might get a different array every time. And, in my experience, you will actually get a different one every so often. It may be a valid thing to do to only create a new surface object whenever the array actually changes. The documentation seems to allow for that, but I never tried, so I can't tell whether that works for sure…
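In code, the per-frame pattern would look roughly like this (a sketch reusing the names from the question; the surface object is also destroyed before unmapping so one isn't leaked per frame):
HANDLE(cudaGraphicsMapResources(1, &g_textureResource));
HANDLE(cudaGraphicsSubResourceGetMappedArray(&g_textureArray, g_textureResource, 0, 0));
// the array may differ from last frame, so build a fresh surface object for it
g_resourceDesc.res.array.array = g_textureArray;
cudaSurfaceObject_t surfaceObj;
HANDLE(cudaCreateSurfaceObject(&surfaceObj, &g_resourceDesc));
kernel<<<gridDim, blockDim>>>(surfaceObj, g_imageWidth, g_imageHeight);
HANDLE(cudaDestroySurfaceObject(surfaceObj));
HANDLE(cudaGraphicsUnmapResources(1, &g_textureResource));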
Apart from that: You generate mipmaps for your texture. You only overwrite mip level 0. You then render the texture using mipmapping with trilinear interpolation. So my guess would be that you just happen to render the texture at a resolution that does not match the resolution of mip level 0 exactly and, thus, you will end up interpolating between level 0 (in which you wrote) and level 1 (which was generated from the original texture)…
It turns out the problem is that I had mistakenly generated mipmaps for the original wood texture, and my CUDA kernel was only modifying the level-0 mipmap. The blending I noticed was the result of OpenGL interpolating between my modified level-0 mipmap and a lower-resolution version of the wood texture.
Here's the correct output, obtained by disabling mipmap interpolation. Lesson learned!
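In terms of the texture setup above, the fix amounts to sampling only mip level 0 (a sketch; the glGenerateMipmap call is simply dropped):
// filtering: no mipmapped minification filter, so only level 0 is ever sampled
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
// ... and the glGenerateMipmap(GL_TEXTURE_2D) call is removed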

Streaming several (YUV) videos using OpenGL

I'm trying to do high-throughput video streaming using OpenGL. I thought I'd figured it all out with my genius programming architecture, but - surprise - when doing more serious tests, I've been stonewalled with a performance problem.
The story goes like this:
It all starts by reserving a stack of PBOs (say, a hundred or so):
glGenBuffers(1, &index);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, index);
glBufferData(GL_PIXEL_UNPACK_BUFFER, size, 0, GL_STREAM_DRAW); // reserve n_payload bytes to index/handle pbo_id
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0); // unbind (not mandatory)
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, index); // rebind (not mandatory)
payload = (GLubyte*)glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER); // release pointer to mapping buffer ** MANDATORY **
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0); // unbind ** MANDATORY **
YUV pixel data is copied into PBOs by separate decoder/uploader threads that use a common stack of available PBOs. The "payload" pointers you see above are accessed from these threads, and data is copied (with memcpy) "directly" to the GPU. Once a PBO is used, it is returned to the stack.
I also pre-reserve textures for each separate video stream. I reserve three textures (y, u and v), like this:
glEnable(GL_TEXTURE_2D);
glGenTextures(1, &index);
glBindTexture(GL_TEXTURE_2D, index);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, format, w, h, 0, format, GL_UNSIGNED_BYTE, 0); // no upload, just reserve
glBindTexture(GL_TEXTURE_2D, 0); // unbind
Rendering is done in a "master thread" (remember, the decoder / uploader threads are separate beasts) that reads frames from a fifo queue.
A critical step in rendering is to copy data from PBOs to textures (tex->format is GL_RED):
// y
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo->y_index);
glBindTexture(GL_TEXTURE_2D, tex->y_index); // this is the texture we will manipulate
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, tex->w, tex->h, tex->format, GL_UNSIGNED_BYTE, 0); // copy from pbo to texture
// u
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo->u_index);
glBindTexture(GL_TEXTURE_2D, tex->u_index); // this is the texture we will manipulate
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, tex->w/2, tex->h/2, tex->format, GL_UNSIGNED_BYTE, 0); // copy from pbo to texture
// v
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo->v_index);
glBindTexture(GL_TEXTURE_2D, tex->v_index); // this is the texture we will manipulate
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, tex->w/2, tex->h/2, tex->format, GL_UNSIGNED_BYTE, 0); // copy from pbo to texture
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0); // unbind // important!
glBindTexture(GL_TEXTURE_2D, 0); // unbind
And finally, the image is drawn using the OpenGL shading language (which is another story).
The question: Do you see any OpenGL performance bottlenecks here?
Step (3), the PBO-to-texture copy above, seems like a bottleneck: it starts to consume too much time (10+ milliseconds!) when I'm trying to do this with several cameras.
Of course, this could be due to something else clogging the OpenGL pipeline - but everything else (glDrawElements, etc.) seems to take max. 1 millisecond.
I've been reading about problems people are having with glTexSubImage2D, but in my case, I'm simply filling the textures from PBOs. This should be lightning fast - right? Could the GL_RED format pose a problem by being non-optimal for the driver?
Another thing: I'm not doing any de/reallocation here (I am using the same stack of pre-reserved PBOs), but re-allocating seems to be fast as well, if I understood this one correctly:
https://www.khronos.org/opengl/wiki/Buffer_Object_Streaming
Any insight highly appreciated..!
P. S. The complete project is here: https://github.com/elsampsa/valkka-core
EDIT 1:
I did some profiling: every now and then during streaming, both the PBO=>texture load (as shown in the code snippet) and glXMakeCurrent go completely crazy, each consuming 10+ milliseconds(!). This happens quite sporadically. I tried adding some glFinish calls after each PBO=>texture load, with little success (it seemed to stabilize things a bit, but actually I'm not sure).
EDIT 2:
I am slowly getting there. I ran some tests where I (a) upload with a PBO to the GPU and then (b) copy from the PBO to a texture (like in the sample code above). The speed seems to depend on the texture format given to glTexImage2D. I tried to match the texture's format and OpenGL internal format by setting them to GL_RED and GL_RED (or GL_R8), respectively, but that is slow. Instead, if I use GL_RGBA for both, PBO=>TEX is lightning fast: 100x faster!
Here:
https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glTexImage2D.xhtml
it says that
GL_RED : Each element is a single red component. The GL converts it to floating point and assembles it into an RGBA element by attaching 0 for green and blue, and 1 for alpha. Each component is clamped to the range [0,1].
.. but I don't want OpenGL to do that! How can I tell it that this is just plain LUMA, i.e. one byte per pixel, with no need to convert or fill it, because I will just use it in the shader program?
Maybe this is impossible and I should use buffer textures instead (as suggested in the comments)? Buffer textures don't try to convert anything; they just handle the data as raw payload, right?
EDIT 3:
I'm trying to get dma to the texture buffer object:
// let's reserve a TBO
glGenBuffers(1, &tbo_index); // a buffer
glBindBuffer(GL_TEXTURE_BUFFER, tbo_index); // .. what is it
glBufferData(GL_TEXTURE_BUFFER, size, 0, GL_STREAM_DRAW); // .. how much
std::cout << "tbo " << tbo_index << std::endl;
glBindBuffer(GL_TEXTURE_BUFFER, 0); // unbind
// generate a texture
glGenTextures(1, &tex_index);
std::cout << "texture " << tex_index << std::endl;
// let's try to get dma to the texture buffer
glBindBuffer(GL_TEXTURE_BUFFER, tbo_index); // bind
payload = (GLubyte*)glMapBuffer(GL_TEXTURE_BUFFER, GL_WRITE_ONLY); // ** TODO: doesn't work
glUnmapBuffer(GL_TEXTURE_BUFFER); // release pointer to mapping buffer
glBindBuffer(GL_TEXTURE_BUFFER, 0); // unbind
std::cout << "tbo " << tbo_index << " at " << (long unsigned int)payload << std::endl;
That doesn't work: payload is always a null pointer. glMapBuffer works fine with PBOs, though, and it should work with TBOs as well.
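For comparison, a complete texture-buffer setup would also attach the buffer to the texture with glTexBuffer; a sketch (assuming a one-byte-per-texel GL_R8 internal format, which requires GL 3.1+):
// reserve the buffer's data store
glGenBuffers(1, &tbo_index);
glBindBuffer(GL_TEXTURE_BUFFER, tbo_index);
glBufferData(GL_TEXTURE_BUFFER, size, 0, GL_STREAM_DRAW);
// create the texture and attach the buffer as its storage
glGenTextures(1, &tex_index);
glBindTexture(GL_TEXTURE_BUFFER, tex_index);
glTexBuffer(GL_TEXTURE_BUFFER, GL_R8, tbo_index); // raw one-byte texels, no format conversion
glBindTexture(GL_TEXTURE_BUFFER, 0);
glBindBuffer(GL_TEXTURE_BUFFER, 0);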

OPENGL Texture2D manipulation directly/indirectly

Following this tutorial, I am performing shadow mapping on a 3D scene. Now I want
to manipulate the raw texel data of shadowMapTexture (see the excerpt below) before
applying it, using ARB extensions:
//Textures
GLuint shadowMapTexture;
...
...
CopyTexSubImage2D is used to copy the contents of the frame buffer into a
texture. First we bind the shadow map texture, then copy the viewport into the
texture. Since we have bound a DEPTH_COMPONENT texture, the data read will
automatically come from the depth buffer.
//Read the depth buffer into the shadow map texture
glBindTexture(GL_TEXTURE_2D, shadowMapTexture);
glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, shadowMapSize, shadowMapSize);
N.B. I am using OpenGL 2.1 only.
You can do it in two ways:
float* texels = ...;
glBindTexture(GL_TEXTURE_2D, shadowMapTexture);
glTexSubImage2D(GL_TEXTURE_2D, 0, x,y,w,h, GL_DEPTH_COMPONENT, GL_FLOAT, texels);
or
Attach your shadowMapTexture to (write) framebuffer and call:
float* pixels = ...;
glRasterPos2i(x,y);
glDrawPixels(w,h, GL_DEPTH_COMPONENT, GL_FLOAT, pixels);
Don't forget to disable the depth test (glDisable(GL_DEPTH_TEST)) first in the above method.
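The attachment step for the second method, sketched with the EXT_framebuffer_object entry points available on OpenGL 2.1 (fbo is a framebuffer object assumed to already exist):
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT, GL_TEXTURE_2D, shadowMapTexture, 0);
glDisable(GL_DEPTH_TEST); // otherwise the glDrawPixels fragments are depth-tested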

OpenGL - Using glReadPixels to read depth component returns incorrect values

I'm trying to read out the pixels of a texture that only has a depth component, however glReadPixels gives me an array where every value = 1.
Texture / Framebuffer creation:
GLuint frameBuffer;
glGenFramebuffers(1,&frameBuffer);
glBindFramebuffer(GL_FRAMEBUFFER,frameBuffer);
GLuint texture;
glGenTextures(1,&texture);
glBindTexture(GL_TEXTURE_2D,texture);
glTexImage2D(GL_TEXTURE_2D,0,GL_DEPTH_COMPONENT,width,height,0,GL_DEPTH_COMPONENT,GL_FLOAT,0);
glFramebufferTexture(GL_FRAMEBUFFER,GL_DEPTH_ATTACHMENT,texture,0);
Reading from the texture:
glBindFramebuffer(GL_FRAMEBUFFER,frameBuffer);
float *depths = new float[width *height];
glReadPixels(0,0,width,height,GL_DEPTH_COMPONENT,GL_FLOAT,&depths[0]);
// glGetError reports no errors, but every value inside 'depths' is 1.
delete[] depths;
I didn't include the actual rendering to the texture, since I know that that works as it should.
When I draw the depth texture on my main screen framebuffer, it's definitely not empty, so why is it telling me the depth is 1 for all pixels?
Since you're using a desktop OpenGL version I suggest you use glGetTexImage instead of glReadPixels.
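A sketch of that, reusing the names from the question (bind the texture itself instead of going through the framebuffer):
glBindTexture(GL_TEXTURE_2D, texture);
float *depths = new float[width * height];
glGetTexImage(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT, GL_FLOAT, depths);
// ... use depths ...
delete[] depths;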

Reading the pixels values from the Frame Buffer Object (FBO) using Pixel Buffer Object (PBO)

Can I use Pixel Buffer Object (PBO) to directly read the pixels values (i.e. using glReadPixels) from the FBO (i.e. while FBO is still attached)?
If yes,
What are the advantages and disadvantages of using PBO with FBO?
What is the problem with the following code?
{
    // DATA_SIZE = WIDTH * HEIGHT * 3 (because I am using 3 channels only)
    // FBO and PBO status is good
    ...
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fboId);
    // Draw the objects

    // The following glReadPixels works fine:
    glReadPixels(0, 0, screenWidth, screenHeight, GL_BGR_EXT, GL_UNSIGNED_BYTE, (uchar*)cvimg->imageData);

    // The following glReadPixels DOES NOT WORK :(
    glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboId);
    // yes, glWriteBuffer also has the same target, and I checked with every possible value
    glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);
    glReadPixels(0, 0, screenWidth, screenHeight, GL_BGR_EXT, GL_UNSIGNED_BYTE, (uchar*)cvimg->imageData);
    ...
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0); // back to window framebuffer
}
When using a PBO as target for glReadPixels you have to specify a byte offset into the buffer (0, I suppose) instead of (uchar*)cvimg->imageData as target address. It is similar to using a buffer offset in glVertexPointer when using VBOs.
EDIT: When a PBO is bound to GL_PIXEL_PACK_BUFFER, the last argument to glReadPixels is not treated as a pointer into system memory but as a byte offset into the bound buffer's memory. So to write the pixels into the buffer, just pass 0 (write them to the start of the buffer memory). You can then later access the buffer memory (to get the pixels) by means of glMapBuffer. The example link you provided in your comment does that, too; just read it thoroughly. I also suggest reading the part about vertex buffer objects they mention at the start, as these lay the ground for understanding buffer objects.
Yes, we can use FBO and PBO together.
Answer 1:
For synchronous reading: glReadPixels without a PBO is fast.
For asynchronous reading: glReadPixels with 2 (or n) PBOs is better: the GPU reads pixels from the framebuffer into one PBO (n) while the CPU processes the pixels in another PBO (n+1), as sketched below. However, being faster is not guaranteed; it is problem- and design-specific.
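A sketch of that alternating pattern (hypothetical pboIds[2], assumed already created with GL_STREAM_READ; the other names are as in the question):
static int frame = 0;
int readIndex = frame % 2;          // PBO receiving this frame's pixels (GPU side)
int processIndex = (frame + 1) % 2; // PBO holding last frame's pixels (CPU side)
// start the asynchronous transfer into one PBO ...
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboIds[readIndex]);
glReadPixels(0, 0, screenWidth, screenHeight, GL_BGR, GL_UNSIGNED_BYTE, 0);
// ... while the CPU processes the previously transferred pixels from the other PBO
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboIds[processIndex]);
GLubyte* src = (GLubyte*)glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB);
if(src)
{
    // process src here
    glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB);
}
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);
frame++;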
Answer 2:
Christian Rau's explanation is correct; the revised code is below:
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboId);
glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);
//glReadBuffer(GL_DEPTH_ATTACHMENT_EXT);
glReadPixels(0, 0, screenWidth, screenHeight, GL_BGR, GL_UNSIGNED_BYTE, 0);

//GLubyte* src = (GLubyte*)glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB);
//OR
cvimg->imageData = (char*) glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB);
if(cvimg->imageData)
{
    // process src OR cvimg->imageData
    glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB); // release pointer to the mapped buffer
}
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);