OpenGL reading back buffer quickly - c++

I'm trying to read the contents of the back-buffer into a buffer of my own. glReadPixels by itself is way too slow and drops my FPS from 50 to 30.
So I decided to try the "asynchronous" read with a PBuffer but it crashes.
My code is as follows:
If the buffers don't exist, create them; then read the back buffer into the specified memory location:
static int readIndex = 0;
static int writeIndex = 1;
static GLuint pbo[2] = {0};

void FastCaptureBackBuffer()
{
    // Create PBOs:
    if (!initBuffers)
    {
        initBuffers = true;
        glGenBuffers(2, pbo);
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[0]);
        glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 1.0f, 0, GL_STREAM_READ);
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[1]);
        glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 1.0f, 0, GL_STREAM_READ);
        glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    }
    // Swap read and write.
    writeIndex = (writeIndex + 1) % 2;
    readIndex = (writeIndex + 1) % 2;
    // Read back-buffer.
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[writeIndex]);
    glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, nullptr);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[readIndex]);
    void* data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
    if (data)
    {
        memcpy(myBuffer, data, width * height * 4);
        data = nullptr;
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}
Then I do:
BOOL __stdcall HookSwapBuffers(HDC DC)
{
    FastCaptureBackBuffer();
    return CallFunction<BOOL>(GetOriginalAddress(353), DC);
}
So every time the application calls wglSwapBuffers, I read the back buffer right before it gets swapped.
How can I read the back buffer fast? What am I missing in the above?
Ideally, I wanted to specify a pointer that the game could render to directly, instead of the screen, and then render the contents of that memory myself.
Any other way, I end up copying the back buffer into my own memory block, and that's slow.
Any ideas?

You're not reserving enough memory in the buffer:
glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 1.0f, 0, GL_STREAM_READ);
Since you're using GL_BGRA as the format, you will need 4 bytes per pixel, which also matches what you're using in your memcpy() call:
memcpy(myBuffer, data, width * height * 4);
So the glBufferData() call should be:
glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, 0, GL_STREAM_READ);
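Putting that fix together, a corrected sketch of the capture routine might look like this (width, height, and myBuffer are assumed to be defined elsewhere, as in the question; the previously undeclared initBuffers flag is added here):

static bool initBuffers = false;
static int readIndex = 0;
static int writeIndex = 1;
static GLuint pbo[2] = {0};

void FastCaptureBackBuffer()
{
    if (!initBuffers)
    {
        initBuffers = true;
        glGenBuffers(2, pbo);
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[0]);
        glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, 0, GL_STREAM_READ);
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[1]);
        glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, 0, GL_STREAM_READ);
        glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    }
    // Alternate the two PBOs: write into one while mapping the other,
    // which was filled on the previous frame.
    writeIndex = (writeIndex + 1) % 2;
    readIndex = (writeIndex + 1) % 2;

    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[writeIndex]);
    glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, nullptr);

    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[readIndex]);
    void* data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
    if (data)
    {
        memcpy(myBuffer, data, width * height * 4);
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}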
Also, it's not entirely clear from your question why you're using HookSwapBuffers(). I believe people use that to intercept the SwapBuffers() call if they do not have source code. If you want to capture rendering you do yourself in your own code, you can simply call glReadPixels() immediately after you finished rendering the frame. It will be executed in sequence with all the other OpenGL calls, so it will contain the result of all the draw calls you issued.
Minor terminology point: What you're asking about here is not called "PBuffer". The full name is "Pixel Buffer Object", often used in its short form "PBO". A PBuffer is something quite different. It was an old mechanism for off-screen rendering that is thankfully mostly obsolete these days.

Any ideas?
How about you don't abuse the main framebuffer for something you shouldn't do with it (rendering to a window framebuffer and reading from that), and instead use a Framebuffer Object with a renderbuffer to render to? You'd still have to use glReadPixels, but since you're using an off-screen surface you avoid all that synchronization with the windowing system. Using a PBO for the data transfer is still advisable, since it gives the OpenGL implementation more freedom in scheduling operations. I suggest the following:
1. Render to the FBO's renderbuffer
2. glReadPixels from the renderbuffer into a GL_PIXEL_PACK_BUFFER PBO
3. Blit the renderbuffer to the main framebuffer
4. SwapBuffers
5. Retrieve the data from the PBO
This arrangement and order of operations gives the OpenGL implementation enough leeway to asynchronously overlap some of the operations happening there, without imposing stalling synchronization points. For example, the glReadPixels and the blit of the renderbuffer to the main framebuffer do not interfere with each other (both only read from the renderbuffer). The OpenGL driver may rearrange things so that the glReadPixels is actually executed after the blit, or at the same time. You may actually swap steps 2 and 3, and on some implementations this might yield better performance. You could even move step 2 after step 4, but then you'd lose some of that reordering freedom.
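A minimal sketch of that order of operations, assuming an FBO (fbo) with a color renderbuffer and a pack PBO (pbo) of width * height * 4 bytes have already been created; the names are illustrative:

// 1. Render to the FBO's renderbuffer
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
// ... issue draw calls ...

// 2. Read the renderbuffer into the PBO (returns immediately; the
//    actual copy is scheduled by the driver)
glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, nullptr);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

// 3. Blit the renderbuffer to the default framebuffer
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
glBlitFramebuffer(0, 0, width, height, 0, 0, width, height,
                  GL_COLOR_BUFFER_BIT, GL_NEAREST);

// 4. Present (dc is the window's device context)
SwapBuffers(dc);

// 5. Retrieve the data; by now the read has likely completed, so the
//    map should not stall
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
if (void* data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY))
{
    memcpy(myBuffer, data, width * height * 4);
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
}
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

The key point is that the map in step 5 happens after SwapBuffers, so the driver has had a whole frame's worth of work in which to finish the transfer.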

Related

Streaming several (YUV) videos using OpenGL

I'm trying to do high-throughput video streaming using OpenGL. I thought I'd figured it all out with my genius programming architecture, but - surprise - when doing more serious tests, I've been stonewalled with a performance problem.
The story goes like this:
It all starts by reserving a stack of PBO's (say, a hundred+ or so):
glGenBuffers(1, &index);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, index);
glBufferData(GL_PIXEL_UNPACK_BUFFER, size, 0, GL_STREAM_DRAW); // reserve size bytes for this PBO
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0); // unbind (not mandatory)
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, index); // rebind (not mandatory)
payload = (GLubyte*)glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER); // release pointer to mapping buffer ** MANDATORY **
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0); // unbind ** MANDATORY **
YUV pixel data is copied into the PBOs by separate decoder/uploader threads that use a common stack of available PBOs. The "payload" pointers you see above are accessed from these threads, and data is copied (with memcpy) "directly" to the GPU. Once a PBO has been used, it is returned to the stack.
I also pre-reserve textures for each separate video stream. I reserve three textures (y, u and v), like this:
glEnable(GL_TEXTURE_2D);
glGenTextures(1, &index);
glBindTexture(GL_TEXTURE_2D, index);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, format, w, h, 0, format, GL_UNSIGNED_BYTE, 0); // no upload, just reserve
glBindTexture(GL_TEXTURE_2D, 0); // unbind
Rendering is done in a "master thread" (remember, the decoder / uploader threads are separate beasts) that reads frames from a fifo queue.
A critical step in rendering is to copy data from PBOs to textures (tex->format is GL_RED):
// y
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo->y_index);
glBindTexture(GL_TEXTURE_2D, tex->y_index); // this is the texture we will manipulate
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, tex->w, tex->h, tex->format, GL_UNSIGNED_BYTE, 0); // copy from pbo to texture
// u
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo->u_index);
glBindTexture(GL_TEXTURE_2D, tex->u_index); // this is the texture we will manipulate
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, tex->w/2, tex->h/2, tex->format, GL_UNSIGNED_BYTE, 0); // copy from pbo to texture
// v
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo->v_index);
glBindTexture(GL_TEXTURE_2D, tex->v_index); // this is the texture we will manipulate
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, tex->w/2, tex->h/2, tex->format, GL_UNSIGNED_BYTE, 0); // copy from pbo to texture
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0); // unbind // important!
glBindTexture(GL_TEXTURE_2D, 0); // unbind
And finally, the image is drawn using the OpenGL shading language (which is another story).
The Question : Do you see any OpenGL performance bottlenecks here?
Step (3), the PBO-to-texture copy above, seems like a bottleneck: it starts to consume too much time (up to 10+ milliseconds!) when I try to do this with several cameras.
Of course, this could be due to something else clogging the OpenGL pipeline, but everything else (glDrawElements, etc.) seems to take at most 1 millisecond.
I've been reading about problems people are having with glTexSubImage2D, but in my case, I'm simply filling the textures from PBOs. This should be lightning fast - right? Could the GL_RED format pose a problem by being non-optimal for the driver?
Another thing: I'm not doing any de/reallocation here (I am using the same stack of pre-reserved PBOs), but re-allocating seems to be fast as well, if I understood this page correctly:
https://www.khronos.org/opengl/wiki/Buffer_Object_Streaming
Any insight highly appreciated..!
P. S. The complete project is here: https://github.com/elsampsa/valkka-core
EDIT 1:
I did some profiling: every now and then during streaming, both the PBO=>texture loading (as shown in the code snippet) and glXMakeCurrent go completely crazy, and both consume 10+ milliseconds(!). This happens quite sporadically. I tried adding some glFinish calls after each PBO=>texture load, with little success (it seemed to stabilize things a bit, but I'm actually not sure).
EDIT 2:
I am slowly getting there. I ran some tests where I (a) upload with a PBO to the GPU and then (b) copy from the PBO to a texture (as in the sample code above). The speed seems to depend on the texture format in glTexImage2D. I tried to match the texture's format and OpenGL internal format by setting them to GL_RED and GL_RED (or GL_R8), respectively. But that is slow. Instead, if I use GL_RGBA for both, the PBO=>TEX copy is lightning fast: 100x faster!
Here:
https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glTexImage2D.xhtml
it says that
GL_RED : Each element is a single red component. The GL converts it to floating point and assembles it into an RGBA element by attaching 0 for green and blue, and 1 for alpha. Each component is clamped to the range [0,1].
.. but I don't want OpenGL to do that! How can I tell it that this is just plain luma, i.e. one byte per pixel, with no need to convert or fill it, since I will just use it in the shader program?
Maybe this is impossible and I should use buffer textures instead (as suggested in the comments)? Buffer textures don't try to convert anything; they just handle the data as raw payload, right?
EDIT 3:
I'm trying to get dma to the texture buffer object:
// let's reserve a TBO
glGenBuffers(1, &tbo_index); // a buffer
glBindBuffer(GL_TEXTURE_BUFFER, tbo_index); // .. what is it
glBufferData(GL_TEXTURE_BUFFER, size, 0, GL_STREAM_DRAW); // .. how much
std::cout << "tbo " << tbo_index << std::endl;
glBindBuffer(GL_TEXTURE_BUFFER, 0); // unbind
// generate a texture
glGenTextures(1, &tex_index);
std::cout << "texture " << tex_index << std::endl;
// let's try to get dma to the texture buffer
glBindBuffer(GL_TEXTURE_BUFFER, tbo_index); // bind
payload = (GLubyte*)glMapBuffer(GL_TEXTURE_BUFFER, GL_WRITE_ONLY); // ** TODO: doesn't work
glUnmapBuffer(GL_TEXTURE_BUFFER); // release pointer to mapping buffer
glBindBuffer(GL_TEXTURE_BUFFER, 0); // unbind
std::cout << "tbo " << tbo_index << " at " << (long unsigned int)payload << std::endl;
This doesn't work: payload is always a null pointer. glMapBuffer works fine with PBOs, though, and it should work with TBOs as well.
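(For reference: the EDIT 3 snippet generates a texture but never associates it with the buffer. A typical texture buffer setup also binds the texture to GL_TEXTURE_BUFFER and attaches the buffer's data store with glTexBuffer, roughly as below; GL_R8 is an assumed internal format for one-byte-per-pixel luma:)

GLuint tbo_index, tex_index;

// reserve the buffer that will back the texture
glGenBuffers(1, &tbo_index);
glBindBuffer(GL_TEXTURE_BUFFER, tbo_index);
glBufferData(GL_TEXTURE_BUFFER, size, 0, GL_STREAM_DRAW);

// create the buffer texture and attach the buffer's data store to it
glGenTextures(1, &tex_index);
glBindTexture(GL_TEXTURE_BUFFER, tex_index);
glTexBuffer(GL_TEXTURE_BUFFER, GL_R8, tbo_index); // one byte per texel, no pixel-transfer conversion

glBindTexture(GL_TEXTURE_BUFFER, 0);
glBindBuffer(GL_TEXTURE_BUFFER, 0);

In a shader, such a texture is then read with texelFetch through a samplerBuffer.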

How can I resize existing texture attachments at my framebuffer?

When I resize my window, I need to resize the textures that are attached to my framebuffer. I tried calling glTexStorage2D again with different size parameters, but that does not work.
How can I resize the textures attached to my framebuffer? (Including the depth attachment)
EDIT
Code I tried:
glBindTexture(m_target, m_name);
glTexStorage2D(m_target, 1, m_format, m_width, m_height);
glBindTexture(m_target, 0);
where m_name, m_target and m_format are saved from the original texture and m_width and m_height are the new dimensions.
EDIT2
Please tell me why this has been downvoted so I can fix the question.
EDIT3
Here, someone else had the same problem.
I found that the texture was being rendered correctly to the FBO, but that it was being displayed at the wrong size. It was as if the first time the texture was sent to the default framebuffer the texture size was set permanently, and then when a resized texture was sent it was being treated as if it was the original size. For example, if the first texture was 100x100 and the second texture was 50x50 then the entire texture would be displayed in the bottom left quarter of the screen. Conversely, if the original texture was 50x50 and the new texture 100x100 then the result would be the bottom left quarter of the texture being displayed over the whole screen.
However, he uses a shader to fix this. That's not how I want to do this. There has to be another solution, right?
If you were using glTexImage2D (...) to allocate storage for your texture, it would be possible to re-allocate the storage for any image in the texture at any time without first deleting the texture.
However, you are not using glTexImage2D (...), you are using glTexStorage2D (...). This creates an immutable texture object, whose storage requirements are set once and can never be changed again. Any calls to glTexImage2D (...) or glTexStorage2D (...) after you allocate storage initially will generate GL_INVALID_OPERATION and do nothing else.
If you want to create a texture whose size can be changed at any time, do not use glTexStorage2D (...). Instead, pass some dummy (but compatible) values for the data type and format to glTexImage2D (...).
For instance, if you want to allocate a texture with 1 LOD that is m_width x m_height:
glTexImage2D (m_target, 0, m_format, m_width, m_height, 0, GL_RED, GL_FLOAT, NULL);
If m_width or m_height change later on, you can re-allocate storage the same way:
glTexImage2D (m_target, 0, m_format, m_width, m_height, 0, GL_RED, GL_FLOAT, NULL);
This is a very different situation than if you use glTexStorage2D (...). That will prevent you from re-allocating storage, and will simply create a GL_INVALID_OPERATION error.
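A minimal sketch along those lines, reusing the question's member names and assuming m_format is a sized color format such as GL_RGBA8; the existing glFramebufferTexture2D attachments can stay in place, since the texture objects themselves are unchanged:

// Re-allocate the color attachment's storage at the new size. The dummy
// format/type pair (GL_RGBA, GL_UNSIGNED_BYTE) only satisfies the
// signature; no data is uploaded.
glBindTexture(m_target, m_name);
glTexImage2D(m_target, 0, m_format, m_width, m_height, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glBindTexture(m_target, 0);

// The depth attachment is re-allocated the same way, with a depth
// format/type pair (m_depthTex is a hypothetical name for it):
glBindTexture(GL_TEXTURE_2D, m_depthTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, m_width, m_height, 0,
             GL_DEPTH_COMPONENT, GL_FLOAT, NULL);
glBindTexture(GL_TEXTURE_2D, 0);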
You should review the manual page for glTexStorage2D (...); it states the following:
Description
glTexStorage2D specifies the storage requirements for all levels of a two-dimensional texture or one-dimensional texture array simultaneously. Once a texture is specified with this command, the format and dimensions of all levels become immutable unless it is a proxy texture. The contents of the image may still be modified, however, its storage requirements may not change. Such a texture is referred to as an immutable-format texture.
The behavior of glTexStorage2D depends on the target parameter.
When target is GL_TEXTURE_2D, GL_PROXY_TEXTURE_2D, GL_TEXTURE_RECTANGLE, GL_PROXY_TEXTURE_RECTANGLE or GL_PROXY_TEXTURE_CUBE_MAP, calling glTexStorage2D is equivalent, assuming no errors are generated, to executing the following pseudo-code:
for (i = 0; i < levels; i++) {
    glTexImage2D(target, i, internalformat, width, height, 0, format, type, NULL);
    width = max(1, (width / 2));
    height = max(1, (height / 2));
}
When target is GL_TEXTURE_CUBE_MAP, glTexStorage2D is equivalent to:
for (i = 0; i < levels; i++) {
    for (face in (+X, -X, +Y, -Y, +Z, -Z)) {
        glTexImage2D(face, i, internalformat, width, height, 0, format, type, NULL);
    }
    width = max(1, (width / 2));
    height = max(1, (height / 2));
}
When target is GL_TEXTURE_1D or GL_TEXTURE_1D_ARRAY, glTexStorage2D is equivalent to:
for (i = 0; i < levels; i++) {
    glTexImage2D(target, i, internalformat, width, height, 0, format, type, NULL);
    width = max(1, (width / 2));
}
Since no texture data is actually provided, the values used in the pseudo-code for format and type are irrelevant and may be considered to be any values that are legal for the chosen internalformat enumerant. [...] Upon success, the value of GL_TEXTURE_IMMUTABLE_FORMAT becomes GL_TRUE. The value of GL_TEXTURE_IMMUTABLE_FORMAT may be discovered by calling glGetTexParameter with pname set to GL_TEXTURE_IMMUTABLE_FORMAT. No further changes to the dimensions or format of the texture object may be made. Using any command that might alter the dimensions or format of the texture object (such as glTexImage2D or another call to glTexStorage2D) will result in the generation of a GL_INVALID_OPERATION error, even if it would not, in fact, alter the dimensions or format of the object.

Clearing color of GL_TEXTURE_2D_ARRAY with PBO

I have a texture array of GL_TEXTURE_2D layers. I need to clear the contents of the textures before each draw pass. I am trying to do it with a PBO, but I am getting an INVALID_OPERATION error.
Here is how I create the array of images:
glGenTextures(1, &_texID);
glBindTexture(GL_TEXTURE_2D_ARRAY, _texID);
glTexStorage3D(GL_TEXTURE_2D_ARRAY, 1, GL_RGBA32F, width, height, numTextures);
glBindTexture(GL_TEXTURE_2D_ARRAY, 0);
glBindImageTexture(0, _texID, 0, GL_FALSE, 0, GL_READ_WRITE, GL_RGBA32F);
Here is how I clear it:
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, clearBuffer);
glBindTexture(GL_TEXTURE_2D_ARRAY, itexArray->GetTexID());
for (int i = 0; i < numTextures; ++i) {
    glTexSubImage3D(GL_TEXTURE_2D_ARRAY, 1, 0, 0, 0, _viewportWidth, _viewportHeight, i, GL_RGBA, GL_FLOAT, NULL);
}
glBindTexture(GL_TEXTURE_2D_ARRAY, 0);
I have numTextures = 8, so 8 texture layers in the array. When I start clearing them in the loop, the first 4 are cleared without errors, but from the fourth on I am getting INVALID_OPERATION.
UPDATE:
I solved the PBO INVALID_OPERATION issue by enlarging the PBO from 2048x2048 to 4096x4096, but the textures of the texture array are still not cleared properly. For example, at startup of the program leftovers can still be seen, which disappear only after the rendered objects start moving around the viewport.
Here is the setup for clearing PBO:
GLint frameSize = MAX_FRAMEBUFFER_WIDTH * MAX_FRAMEBUFFER_HEIGHT * sizeof(float);
glGenBuffers(1, &clearBuffer);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, clearBuffer);
glBufferData(GL_PIXEL_UNPACK_BUFFER, frameSize, NULL, GL_STATIC_DRAW);
// fill the buffer with color:
vec4* data = (vec4*)glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
memset(data, 0x00, frameSize);
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
where MAX_FRAMEBUFFER_WIDTH and MAX_FRAMEBUFFER_HEIGHT are both 4096.
Level is the level of detail, i.e. the mipmap level; in most cases it is 0. Depth would be the array index in your case.
Your glTexSubImage3D call is broken.
glTexSubImage3D(GL_TEXTURE_2D_ARRAY, 1,
                0, 0, 0,                            // offset (first image)
                _viewportWidth, _viewportHeight, i, // size (getting larger)
                GL_RGBA, GL_FLOAT, NULL);
First of all, of course Vasaka is right in that you shouldn't write to mipmap level 1 (which doesn't even exist), but 0. But even then this call will try to put a 3D image of size _viewportWidth * _viewportHeight * i at the first array index, which is surely not what you want. Instead you want to clear a 2D image of size _viewportWidth * _viewportHeight at position i. So your call should actually look this way:
glTexSubImage3D(GL_TEXTURE_2D_ARRAY, 0,
                0, 0, i,                            // offset (ith image)
                _viewportWidth, _viewportHeight, 1, // size (proper 2D image)
                GL_RGBA, GL_FLOAT, NULL);
And your problem of needing a larger PBO than necessary is easily solved by including a 4 in the computation of frameSize. Your PBO is treated (and explained by you) as containing 4-vectors of floats, yet you compute its size in bytes as if it contained single floats. That's why it magically works with a doubled dimension: that properly increases the size of the PBO 4 times, as necessary, but it only hides the actual problem of forgetting the component count in the size computation.
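In other words, the size computation should include the component count:

GLint frameSize = MAX_FRAMEBUFFER_WIDTH * MAX_FRAMEBUFFER_HEIGHT * 4 * sizeof(float);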
EDIT: By the way, instead of maintaining a huge PBO which contains nothing but 0s, you could also try attaching the respective image layer to an FBO and doing a simple glClear in each loop iteration. I don't know which one is more efficient (though I'd guess glClear is more optimized than a whole image copy), but it at least makes this large PBO obsolete.
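A minimal sketch of that alternative, assuming a scratch framebuffer clearFbo created once at startup; glFramebufferTextureLayer attaches a single layer of the array texture, so glClear touches only that layer:

glBindFramebuffer(GL_FRAMEBUFFER, clearFbo);
glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
for (int i = 0; i < numTextures; ++i)
{
    // attach layer i of the array texture as the color attachment...
    glFramebufferTextureLayer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, _texID, 0, i);
    // ...and clear just that layer
    glClear(GL_COLOR_BUFFER_BIT);
}
glBindFramebuffer(GL_FRAMEBUFFER, 0);

On OpenGL 4.4+, there is also glClearTexImage, which clears a whole texture (all layers at once) without any FBO at all.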

Reading the pixels values from the Frame Buffer Object (FBO) using Pixel Buffer Object (PBO)

Can I use Pixel Buffer Object (PBO) to directly read the pixels values (i.e. using glReadPixels) from the FBO (i.e. while FBO is still attached)?
If yes,
What are the advantages and disadvantages of using PBO with FBO?
What is the problem with the following code?
{
    // DATA_SIZE = WIDTH * HEIGHT * 3 (because I am using 3 channels only)
    // FBO and PBO status is good
    ...
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fboId);
    // Draw the objects

    // The following glReadPixels works fine:
    glReadPixels(0, 0, screenWidth, screenHeight, GL_BGR_EXT, GL_UNSIGNED_BYTE, (uchar*)cvimg->imageData);

    // The following glReadPixels DOES NOT WORK :(
    glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboId);
    // yes, the write buffer also has the same target, and I also checked every possible value
    glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);
    glReadPixels(0, 0, screenWidth, screenHeight, GL_BGR_EXT, GL_UNSIGNED_BYTE, (uchar*)cvimg->imageData);
    ...
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0); // back to window framebuffer
}
When using a PBO as the target for glReadPixels, you have to specify a byte offset into the buffer (0, I suppose) instead of (uchar*)cvimg->imageData as the target address. It is similar to using a buffer offset in glVertexPointer when using VBOs.
EDIT: When a PBO is bound to GL_PIXEL_PACK_BUFFER, the last argument to glReadPixels is not treated as a pointer into system memory but as a byte offset into the bound buffer's memory. So to write the pixels into the buffer, just pass 0 (writing them to the start of the buffer memory). You can then later access the buffer memory (to get at the pixels) by means of glMapBuffer. The example link you provided in your comment does that, too; just read it thoroughly. I also suggest reading the part about vertex buffer objects they mention at the start, as these lay the groundwork for understanding buffer objects.
Yes, we can use FBO and PBO together.
Answer 1:
For synchronous reading: glReadPixels without a PBO is fast.
For asynchronous reading: glReadPixels with 2 (or n) PBOs is better: one PBO (n) for the GPU to read pixels from the framebuffer into, and another PBO (n+1) for the CPU to process pixels from. However, a speedup is not guaranteed; it is problem- and design-specific.
Answer 2:
Christian Rau's explanation is correct, and the revised code is below:
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboId);
glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);
//glReadBuffer(GL_DEPTH_ATTACHMENT_EXT);
glReadPixels(0, 0, screenWidth, screenHeight, GL_BGR, GL_UNSIGNED_BYTE, 0);

//GLubyte* src = (GLubyte*)glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB);
//OR
cvimg->imageData = (char*)glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB);
if (cvimg->imageData)
{
    // Process src OR cvimg->imageData
    glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB); // release pointer to the mapped buffer
}
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);

How to get texture data using textureID's in openGL

I'm writing some code where all I have access to is a textureID to get access to the required texture. Is there any way that I can get access to the RGB values of this texture so I can perform some computations on it?
EDIT: I am looking for the inverse of glTexSubImage2D. I want to get the texture data rather than replace it.
You are probably looking for glGetTexImage
Before using glGetTexImage, don't forget to use glBindTexture with your texture ID.
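A minimal sketch, assuming a GL_TEXTURE_2D with RGBA data whose width and height are known (they can be queried with glGetTexLevelParameteriv if not):

std::vector<unsigned char> pixels(width * height * 4);
glBindTexture(GL_TEXTURE_2D, textureID);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels.data()); // fills pixels with the full image
glBindTexture(GL_TEXTURE_2D, 0);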
In OpenGL a texture can be read by glGetTexImage/glGetnTexImage, or by the DSA version of the function, glGetTextureImage.
Another possibility is to attach the texture to a framebuffer and read the pixels with glReadPixels. OpenGL ES does not offer glGetTexImage, so this is the way to go in OpenGL ES.
See opengl es 2.0 android c++ glGetTexImage alternative
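A minimal sketch of the framebuffer route (names are illustrative, pixels is a sufficiently large CPU buffer, and completeness checking is omitted):

GLuint fbo;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, textureID, 0);
// the texture's contents can now be read back like any framebuffer
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
glBindFramebuffer(GL_FRAMEBUFFER, 0);
glDeleteFramebuffers(1, &fbo);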
If you transfer the texture image to a Pixel Buffer Object, you can even access the data via buffer object mapping. See also OpenGL Pixel Buffer Object (PBO).
You have to bind a buffer with the proper size to the target GL_PIXEL_PACK_BUFFER:
// create buffer
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, size_in_bytes, 0, GL_STATIC_READ);
// get texture image
glBindTexture(GL_TEXTURE_2D, texture_obj);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, (void*)(0));
// map pixel buffer
void * data_ptr = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
// access the data
// [...]
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
I'm writing this here just in case anyone needs this.
In OpenGL 4.5+, one can access a texture's data given just a texture ID, using the
glGetTextureImage() function.
For example, in order to get GL_RGB texture data:
there are 3 floats (R, G, B) per pixel, each of which is 4 bytes, so:
float* data = new float[texture_height * texture_width * 3];
glGetTextureImage(textureID, 0, GL_RGB, GL_FLOAT,
                  texture_height * texture_width * 3 * 4, // bufSize in bytes: pixels * 3 channels * 4 bytes per float
                  data);