How can I resize existing texture attachments at my framebuffer? - opengl

When I resize my window, I need to resize the textures that are attached to my framebuffer. I tried calling glTexStorage2D again with different size parameters; however, that does not work.
How can I resize the textures attached to my framebuffer? (Including the depth attachment)
EDIT
Code I tried:
glBindTexture(m_target, m_name);
glTexStorage2D(m_target, 1, m_format, m_width, m_height);
glBindTexture(m_target, 0);
where m_name, m_target and m_format are saved from the original texture and m_width and m_height are the new dimensions.
EDIT2
Please tell me why this has been downvoted so I can fix the question.
EDIT3
Here, someone else had the same problem.
I found that the texture was being rendered correctly to the FBO, but that it was being displayed at the wrong size. It was as if the first time the texture was sent to the default framebuffer the texture size was set permanently, and then when a resized texture was sent it was being treated as if it was the original size. For example, if the first texture was 100x100 and the second texture was 50x50 then the entire texture would be displayed in the bottom left quarter of the screen. Conversely, if the original texture was 50x50 and the new texture 100x100 then the result would be the bottom left quarter of the texture being displayed over the whole screen.
However, he uses a shader to fix this. That's not how I want to do this. There has to be another solution, right?

If you were using glTexImage2D (...) to allocate storage for your texture, it would be possible to re-allocate the storage for any image in the texture at any time without first deleting the texture.
However, you are not using glTexImage2D (...), you are using glTexStorage2D (...). This creates an immutable texture object, whose storage requirements are set once and can never be changed again. Any calls to glTexImage2D (...) or glTexStorage2D (...) after you allocate storage initially will generate GL_INVALID_OPERATION and do nothing else.
If you want to create a texture whose size can be changed at any time, do not use glTexStorage2D (...). Instead, pass some dummy (but compatible) values for the data type and format to glTexImage2D (...).
For instance, if you want to allocate a texture with 1 LOD that is m_width x m_height:
glTexImage2D (m_target, 0, m_format, m_width, m_height, 0, GL_RED, GL_FLOAT, NULL);
If m_width or m_height change later on, you can re-allocate storage the same way:
glTexImage2D (m_target, 0, m_format, m_width, m_height, 0, GL_RED, GL_FLOAT, NULL);
This is a very different situation than if you use glTexStorage2D (...). That will prevent you from re-allocating storage, and will simply create a GL_INVALID_OPERATION error.
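Putting that together, a resize path for both the color and the depth attachment could look like the following sketch. It assumes mutable storage was used from the start, with GL_RGBA8 and GL_DEPTH_COMPONENT24 as stand-in formats; the FBO attachments keep pointing at the same texture objects, so nothing needs to be re-attached:
void resizeAttachments(GLuint colorTex, GLuint depthTex, GLsizei w, GLsizei h)
{
    // Re-allocate mutable storage at the new size; the old contents are discarded.
    glBindTexture(GL_TEXTURE_2D, colorTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);

    // Same idea for the depth attachment, with a depth-renderable format.
    glBindTexture(GL_TEXTURE_2D, depthTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, w, h, 0,
                 GL_DEPTH_COMPONENT, GL_FLOAT, NULL);

    glBindTexture(GL_TEXTURE_2D, 0);
}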
You should review the manual page for glTexStorage2D (...); it states the following:
Description
glTexStorage2D specifies the storage requirements for all levels of a two-dimensional texture or one-dimensional texture array simultaneously. Once a texture is specified with this command, the format and dimensions of all levels become immutable unless it is a proxy texture. The contents of the image may still be modified, however, its storage requirements may not change. Such a texture is referred to as an immutable-format texture.
The behavior of glTexStorage2D depends on the target parameter.
When target is GL_TEXTURE_2D, GL_PROXY_TEXTURE_2D, GL_TEXTURE_RECTANGLE, GL_PROXY_TEXTURE_RECTANGLE or GL_PROXY_TEXTURE_CUBE_MAP, calling glTexStorage2D is equivalent, assuming no errors are generated, to executing the following pseudo-code:
for (i = 0; i < levels; i++) {
    glTexImage2D(target, i, internalformat, width, height, 0, format, type, NULL);
    width = max(1, (width / 2));
    height = max(1, (height / 2));
}
When target is GL_TEXTURE_CUBE_MAP, glTexStorage2D is equivalent to:
for (i = 0; i < levels; i++) {
    for (face in (+X, -X, +Y, -Y, +Z, -Z)) {
        glTexImage2D(face, i, internalformat, width, height, 0, format, type, NULL);
    }
    width = max(1, (width / 2));
    height = max(1, (height / 2));
}
When target is GL_TEXTURE_1D or GL_TEXTURE_1D_ARRAY, glTexStorage2D is equivalent to:
for (i = 0; i < levels; i++) {
    glTexImage2D(target, i, internalformat, width, height, 0, format, type, NULL);
    width = max(1, (width / 2));
}
Since no texture data is actually provided, the values used in the pseudo-code for format and type are irrelevant and may be considered to be any values that are legal for the chosen internalformat enumerant. [...] Upon success, the value of GL_TEXTURE_IMMUTABLE_FORMAT becomes GL_TRUE. The value of GL_TEXTURE_IMMUTABLE_FORMAT may be discovered by calling glGetTexParameter with pname set to GL_TEXTURE_IMMUTABLE_FORMAT. No further changes to the dimensions or format of the texture object may be made. Using any command that might alter the dimensions or format of the texture object (such as glTexImage2D or another call to glTexStorage2D) will result in the generation of a GL_INVALID_OPERATION error, even if it would not, in fact, alter the dimensions or format of the object.
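If you are unsure whether a texture you are dealing with was allocated immutably, you can query the flag the manual mentions; a minimal check (assuming the texture target is GL_TEXTURE_2D) might be:
GLint immutable = GL_FALSE;
glBindTexture(GL_TEXTURE_2D, m_name);
glGetTexParameteriv(GL_TEXTURE_2D, GL_TEXTURE_IMMUTABLE_FORMAT, &immutable);
if (immutable == GL_TRUE) {
    // glTexImage2D / glTexStorage2D will fail with GL_INVALID_OPERATION;
    // the only way to "resize" this texture is to delete and recreate it.
}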

Related

CUDA/OpenGL Interop: Writing to surface object does not erase previous contents

I am attempting to use a CUDA kernel to modify an OpenGL texture, but am having a strange issue where my calls to surf2Dwrite() seem to blend with the previous contents of the texture, as you can see in the image below. The wooden texture in the back is what's in the texture before modifying it with my CUDA kernel. The expected output would include ONLY the color gradients, not the wood texture behind it. I don't understand why this blending is happening.
Possible Problems / Misunderstandings
I'm new to both CUDA and OpenGL. Here I'll try to explain the thought process that led me to this code:
I'm using a cudaArray to access the texture (rather than e.g. an array of floats) because I read that it's better for cache locality when reading/writing a texture.
I'm using surfaces because I read somewhere that it's the only way to modify a cudaArray.
I wanted to use surface objects, which I understand to be the newer way of doing things. The old way is to use surface references.
Some possible problems with my code that I don't know how to check/test:
Am I being inconsistent with image formats? Maybe I didn't specify the correct number of bits/channel somewhere? Maybe I should use floats instead of unsigned chars?
Code Summary
You can find a full minimum working example in this GitHub Gist. It's quite long because of all the moving parts, but I'll try to summarize. I welcome suggestions on how to shorten the MWE. The overall structure is as follows:
create an OpenGL texture from a file stored locally
register the texture with CUDA using cudaGraphicsGLRegisterImage()
call cudaGraphicsSubResourceGetMappedArray() to get a cudaArray that represents the texture
create a cudaSurfaceObject_t that I can use to write to the cudaArray
pass the surface object to a kernel that writes to the texture with surf2Dwrite()
use the texture to draw a rectangle on-screen
OpenGL Texture Creation
I am new to OpenGL, so I'm using the "Textures" section of the LearnOpenGL tutorials as a starting point. Here's how I set up the texture (using the image library stb_image.h):
GLuint initTexturesGL(){
    // load texture from file
    int numChannels;
    unsigned char *data = stbi_load("img/container.jpg", &g_imageWidth, &g_imageHeight, &numChannels, 4);
    if(!data){
        std::cerr << "Error: Failed to load texture image!" << std::endl;
        exit(1);
    }

    // opengl texture
    GLuint textureId;
    glGenTextures(1, &textureId);
    glBindTexture(GL_TEXTURE_2D, textureId);

    // wrapping
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_MIRRORED_REPEAT);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_MIRRORED_REPEAT);

    // filtering
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

    // set texture image
    glTexImage2D(
        GL_TEXTURE_2D,    // target
        0,                // mipmap level
        GL_RGBA8,         // internal format (#channels, #bits/channel, ...)
        g_imageWidth,     // width
        g_imageHeight,    // height
        0,                // border (must be zero)
        GL_RGBA,          // format of input image
        GL_UNSIGNED_BYTE, // type
        data              // data
    );
    glGenerateMipmap(GL_TEXTURE_2D);

    // unbind and free image
    glBindTexture(GL_TEXTURE_2D, 0);
    stbi_image_free(data);
    return textureId;
}
CUDA Graphics Interop
After calling the function above, I register the texture with CUDA:
void initTexturesCuda(GLuint textureId){
    // register texture
    HANDLE(cudaGraphicsGLRegisterImage(
        &g_textureResource,                       // resource
        textureId,                                // image
        GL_TEXTURE_2D,                            // target
        cudaGraphicsRegisterFlagsSurfaceLoadStore // flags
    ));

    // resource description for surface
    memset(&g_resourceDesc, 0, sizeof(g_resourceDesc));
    g_resourceDesc.resType = cudaResourceTypeArray;
}
Render Loop
Every frame, I run the following to modify the texture and render the image:
while(!glfwWindowShouldClose(window)){
    // --- CUDA ---
    // map
    HANDLE(cudaGraphicsMapResources(1, &g_textureResource));
    HANDLE(cudaGraphicsSubResourceGetMappedArray(
        &g_textureArray,   // array through which to access subresource
        g_textureResource, // mapped resource to access
        0,                 // array index
        0                  // mipLevel
    ));

    // create surface object (compute >= 3.0)
    g_resourceDesc.res.array.array = g_textureArray;
    HANDLE(cudaCreateSurfaceObject(&g_surfaceObj, &g_resourceDesc));

    // run kernel
    kernel<<<gridDim, blockDim>>>(g_surfaceObj, g_imageWidth, g_imageHeight);

    // unmap
    HANDLE(cudaGraphicsUnmapResources(1, &g_textureResource));

    // --- OpenGL ---
    // clear
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    // use program
    shader.use();

    // triangle
    glBindVertexArray(vao);
    glBindTexture(GL_TEXTURE_2D, textureId);
    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);
    glBindVertexArray(0);

    // glfw: swap buffers and poll i/o events
    glfwSwapBuffers(window);
    glfwPollEvents();
}
CUDA Kernel
The actual CUDA kernel is as follows:
__global__ void kernel(cudaSurfaceObject_t surface, int nx, int ny){
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if(x < nx && y < ny){
        uchar4 data = make_uchar4(x % 255,
                                  y % 255,
                                  0, 255);
        surf2Dwrite(data, surface, x * sizeof(uchar4), y);
    }
}
If I understand correctly, you initially register the texture, map it once, create a surface object for the array representing the mapped texture, and then unmap the texture. Every frame, you then map the resource again, ask for the array representing the mapped texture, and then completely ignore that one and use the surface object created for the array you got back when you first mapped the resource. From the documentation:
[…] The value set in array may change every time that resource is mapped.
You have to create a new surface object every time you map the resource because you might get a different array every time. And, in my experience, you will actually get a different one every so often. It may be a valid thing to do to only create a new surface object whenever the array actually changes. The documentation seems to allow for that, but I never tried, so I can't tell whether that works for sure…
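As a sketch of that per-frame pattern (reusing the global names from the question), the mapped section of the render loop might look like this; note the matching cudaDestroySurfaceObject() call so a surface object is not leaked every frame:
// map the resource and fetch the array for THIS mapping
HANDLE(cudaGraphicsMapResources(1, &g_textureResource));
HANDLE(cudaGraphicsSubResourceGetMappedArray(&g_textureArray, g_textureResource, 0, 0));

// wrap the freshly mapped array in a new surface object
g_resourceDesc.res.array.array = g_textureArray;
HANDLE(cudaCreateSurfaceObject(&g_surfaceObj, &g_resourceDesc));

kernel<<<gridDim, blockDim>>>(g_surfaceObj, g_imageWidth, g_imageHeight);

// destroy the surface object before unmapping
HANDLE(cudaDestroySurfaceObject(g_surfaceObj));
HANDLE(cudaGraphicsUnmapResources(1, &g_textureResource));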
Apart from that: You generate mipmaps for your texture. You only overwrite mip level 0. You then render the texture using mipmapping with trilinear interpolation. So my guess would be that you just happen to render the texture at a resolution that does not match the resolution of mip level 0 exactly and, thus, you will end up interpolating between level 0 (in which you wrote) and level 1 (which was generated from the original texture)…
It turns out the problem is that I had mistakenly generated mipmaps for the original wood texture, and my CUDA kernel was only modifying the level-0 mipmap. The blending I noticed was the result of OpenGL interpolating between my modified level-0 mipmap and a lower-resolution version of the wood texture.
Here's the correct output, obtained by disabling mipmap interpolation. Lesson learned!
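For reference, the simplest way to avoid that interpolation (a sketch; it just restricts sampling to mip level 0 rather than keeping the other levels up to date) is to use a non-mipmapped minification filter and skip mipmap generation:
// sample only mip level 0, which is the level the CUDA kernel writes
glBindTexture(GL_TEXTURE_2D, textureId);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
// ...and do not call glGenerateMipmap(GL_TEXTURE_2D) at creation time.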

OpenGL reading back buffer quickly

I'm trying to read the contents of the back-buffer into a buffer of my own. glReadPixels by itself is way too slow and drops my FPS from 50 to 30.
So I decided to try the "asynchronous" read with a PBuffer, but it crashes.
My code is as follows:
If buffers don't exist, create them. Otherwise, read the back buffer into a specified memory location:
static int readIndex = 0;
static int writeIndex = 1;
static GLuint pbo[2] = {0};
static bool initBuffers = false;

void FastCaptureBackBuffer()
{
    //Create PBOs:
    if (!initBuffers)
    {
        initBuffers = true;
        glGenBuffers(2, pbo);
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[0]);
        glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 1.0f, 0, GL_STREAM_READ);
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[1]);
        glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 1.0f, 0, GL_STREAM_READ);
        glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    }

    //swap read and write.
    writeIndex = (writeIndex + 1) % 2;
    readIndex = (writeIndex + 1) % 2;

    //read back-buffer into the write PBO (asynchronous).
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[writeIndex]);
    glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, nullptr);

    //map the read PBO (filled last frame) and copy it out.
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[readIndex]);
    void* data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
    if (data)
    {
        memcpy(myBuffer, data, width * height * 4);
        data = nullptr;
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}
Then I do:
BOOL __stdcall HookSwapBuffers(HDC DC)
{
    FastCaptureBackBuffer();
    return CallFunction<BOOL>(GetOriginalAddress(353), DC);
}
So every time the application calls wglSwapBuffers, I read the back buffer right before it gets swapped.
How can I read the back buffer fast? What am I missing in the above?
Ideally, I wanted to specify a pointer that the game could render directly to, instead of the screen, so that I could manually render the contents of that memory.
Any other way and I end up copying the back buffer into my memory block and it's slow.
Any ideas?
You're not reserving enough memory in the buffer:
glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 1.0f, 0, GL_STREAM_READ);
Since you're using GL_BGRA as the format, you will need 4 bytes per pixel, which also matches what you're using in your memcpy() call:
memcpy(myBuffer, data, width * height * 4);
So the glBufferData() call should be:
glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, 0, GL_STREAM_READ);
Also, it's not entirely clear from your question why you're using HookSwapBuffers(). I believe people use that to intercept the SwapBuffers() call if they do not have source code. If you want to capture rendering you do yourself in your own code, you can simply call glReadPixels() immediately after you finished rendering the frame. It will be executed in sequence with all the other OpenGL calls, so it will contain the result of all the draw calls you issued.
Minor terminology point: What you're asking about here is not called "PBuffer". The full name is "Pixel Buffer Object", often used in its short form "PBO". A PBuffer is something quite different. It was an old mechanism for off-screen rendering that is thankfully mostly obsolete these days.
Any ideas?
How about you don't abuse the main framebuffer for something you should not do (rendering to a window framebuffer and reading back from it), and instead use a Framebuffer Object with a renderbuffer as the render target? You'd still have to use glReadPixels, but since you're using an off-screen surface you avoid all that synchronization with the windowing system. Using a PBO for the data transfer is still recommended, since it gives the OpenGL implementation more freedom in scheduling operations. I suggest the following:
1. Render to the FBO's renderbuffer
2. glReadPixels from the renderbuffer into a GL_PIXEL_PACK_BUFFER PBO
3. Blit the renderbuffer to the main framebuffer
4. SwapBuffers
5. Retrieve the data from the PBO
This arrangement and order of operations gives the OpenGL implementation enough leeway to asynchronously overlap some of these operations without imposing stalling synchronization points. For example, the glReadPixels and the blit of the renderbuffer to the main framebuffer do not interfere with each other (both only read from the renderbuffer). The OpenGL driver may rearrange things so that the glReadPixels is actually executed after the blit, or at the same time. You may actually swap steps 2 and 3, and on some implementations this might yield better performance. Heck, you could even move step 2 after step 4, but then you'd lose some operation reordering freedom.
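A sketch of that ordering, assuming the FBO, its renderbuffer, and the PBO were created at init time (fbo, pbo, drawScene, dc, winWidth, winHeight, and myBuffer are placeholders):
// 1. Render to the FBO's renderbuffer.
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
drawScene();

// 2. Start the asynchronous readback into the PBO.
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glReadPixels(0, 0, winWidth, winHeight, GL_BGRA, GL_UNSIGNED_BYTE, nullptr);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

// 3. Blit the renderbuffer to the main framebuffer.
glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
glBlitFramebuffer(0, 0, winWidth, winHeight, 0, 0, winWidth, winHeight,
                  GL_COLOR_BUFFER_BIT, GL_NEAREST);

// 4. Present the frame (dc is the window's HDC).
SwapBuffers(dc);

// 5. Retrieve the data; by now the transfer has had time to complete.
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
void* pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
if (pixels) {
    memcpy(myBuffer, pixels, winWidth * winHeight * 4);
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
}
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);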

TexSubImage2D produces a garbled texture for specific images only

I'm trying to get a texture to be rendered on top of another one, like in the image below:
However, only that image gets rendered properly. My other images get garbled and "twisted". If you look carefully, it's as if the rows were shifted:
In the above example, I used the very same cat picture in the background. Both this cat picture, and all other images I generate end up garbled, except that one special picture, for some reason. I have looked at EXIF data, and other than the fact that it doesn't use sRGB, it is in the exact same format as the others. It has an alpha channel and everything.
I believe it has something to do with pixel alignment, given how the rows are shifted, but I have tried literally every possible combination of alignment and nothing has worked so far. Here is my code:
int height, width = 512;
m_pSubImage = SOIL_load_image("sample.png", &width, &height, 0, SOIL_LOAD_RGBA);
glGenTextures(1, &m_textureObj);
glBindTexture(m_textureTarget, m_textureObj);
...
glActiveTexture(TextureUnit);
glBindTexture(m_textureTarget, m_textureObj);
glTexSubImage2D(GL_TEXTURE_2D, 0, 20, 10, 100, 100, GL_RGBA, GL_UNSIGNED_BYTE, m_pSubImage);
The code for loading the background image is similar, except that it uses this call instead of glTexSubImage2D:
glTexImage2D(m_textureTarget, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, m_pImage);
It appears that you aren't passing the width and height correctly to glTexSubImage2D. Note that you need the number of pixels stored per scanline, which is often not exactly the "logical" width of the image, but rounded up to a multiple of 4.
The difference between the "logical" and "storage" width will leave a few padding pixels left over on each scan line, which will be interpreted as the leftmost pixels of the next scanline, and accumulate as you move down the image. That creates the slant effect you observe.
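A sketch of the two usual fixes (assuming width and height hold the dimensions SOIL actually loaded): either pass the real dimensions, or tell GL the true row stride of the source image:
// Option 1: update the full image with its actual dimensions.
glTexSubImage2D(GL_TEXTURE_2D, 0, 20, 10, width, height,
                GL_RGBA, GL_UNSIGNED_BYTE, m_pSubImage);

// Option 2: keep the 100x100 sub-rectangle, but declare how many
// pixels each source row really contains.
glPixelStorei(GL_UNPACK_ROW_LENGTH, width);
glTexSubImage2D(GL_TEXTURE_2D, 0, 20, 10, 100, 100,
                GL_RGBA, GL_UNSIGNED_BYTE, m_pSubImage);
glPixelStorei(GL_UNPACK_ROW_LENGTH, 0); // restore the default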
You don't appear to be checking for failures. The following failure modes of glTexSubImage2D are especially relevant here:
GL_INVALID_VALUE is generated if xoffset < 0, xoffset + width > w, yoffset < 0, yoffset + height > h, where w is the width and h is the height of the texture image being modified.
GL_INVALID_VALUE is generated if width or height is less than 0.
GL_INVALID_OPERATION is generated if the texture array has not been defined by a previous glTexImage2D or glCopyTexImage2D operation whose internalformat matches the format of glTexSubImage2D.

Clearing color of GL_TEXTURE_2D_ARRAY with PBO

I have a 2D texture array (GL_TEXTURE_2D_ARRAY). I need to clear the contents of the textures before each draw pass. I am trying to do it with a PBO, but I am getting an INVALID_OPERATION error.
Here is how I create the array of images:
glGenTextures(1, &_texID);
glBindTexture(GL_TEXTURE_2D_ARRAY, _texID);
glTexStorage3D(GL_TEXTURE_2D_ARRAY, 1, GL_RGBA32F, width, height, numTextures);
glBindTexture(GL_TEXTURE_2D_ARRAY, 0);
glBindImageTexture(0, _texID, 0, GL_FALSE, 0, GL_READ_WRITE, GL_RGBA32F);
Here is how I clear it:
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, clearBuffer);
glBindTexture(GL_TEXTURE_2D_ARRAY, itexArray->GetTexID());
for (int i = 0; i < numTextures; ++i) {
    glTexSubImage3D(GL_TEXTURE_2D_ARRAY, 1, 0, 0, 0, _viewportWidth, _viewportHeight, i, GL_RGBA, GL_FLOAT, NULL);
}
glBindTexture(GL_TEXTURE_2D_ARRAY, 0);
glBindTexture(GL_TEXTURE_2D_ARRAY, 0);
I have numTextures = 8, so 8 texture layers in the array. When I start clearing them in the loop, the first 4 are cleared without errors, but from the fourth on I am getting INVALID_OPERATION.
UPDATE:
I solved the PBO INVALID_OPERATION issue by enlarging the PBO size from 2048x2048 to 4096x4096, but the textures of the texture array are still not cleared properly. For example, at startup of the program, leftovers can still be seen, which disappear only after the rendered objects start moving around the viewport.
Here is the setup for clearing PBO:
GLint frameSize = MAX_FRAMEBUFFER_WIDTH * MAX_FRAMEBUFFER_HEIGHT * sizeof(float);
glGenBuffers(1, &clearBuffer);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, clearBuffer);
glBufferData(GL_PIXEL_UNPACK_BUFFER, frameSize, NULL, GL_STATIC_DRAW);
//fill the buffer with color:
vec4* data = (vec4*)glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
memset(data, 0x00, frameSize);
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
Where MAX_FRAMEBUFFER_WIDTH and MAX_FRAMEBUFFER_HEIGHT are both 4096
Level is the level of detail, i.e. the mipmap level; in most cases it is 0. Depth would be the array index in your case.
Your glTexSubImage3D call is broken.
glTexSubImage3D(GL_TEXTURE_2D_ARRAY, 1,
                0, 0, 0,                            //offset (first image)
                _viewportWidth, _viewportHeight, i, //size (getting larger)
                GL_RGBA, GL_FLOAT, NULL);
First of all, of course Vasaka is right in that you shouldn't write to mipmap level 1 (which doesn't even exist), but 0. But even then this call will try to put a 3D image of size _viewportWidth * _viewportHeight * i at the first array index, which is surely not what you want. Instead you want to clear a 2D image of size _viewportWidth * _viewportHeight at position i. So your call should actually look this way:
glTexSubImage3D(GL_TEXTURE_2D_ARRAY, 0,
                0, 0, i,                            //offset (ith image)
                _viewportWidth, _viewportHeight, 1, //size (proper 2D image)
                GL_RGBA, GL_FLOAT, NULL);
And your problem with needing a larger PBO than necessary is easily solved by including a 4 in the computation of frameSize. Your PBO is treated (and explained by you) as containing 4-vectors of floats, yet you compute its size in bytes as if it contained only single floats. That's why it magically works for a doubled dimension, since this properly increases the size of the PBO 4 times, as necessary, but it only hides the actual problem of forgetting the component count in the size computation.
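In other words, the allocation would become:
GLint frameSize = MAX_FRAMEBUFFER_WIDTH * MAX_FRAMEBUFFER_HEIGHT * 4 * sizeof(float);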
EDIT: By the way, instead of maintaining a huge PBO which contains nothing but 0s, you could also try to attach the respective image layer to an FBO and do a simple glClear in each loop iteration. Don't know which one is more efficient (but I'd guess glClear being more optimized than a whole image copy), but it at least makes this large PBO obsolete.
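A sketch of that FBO-based clearing (glFramebufferTextureLayer attaches a single layer of the array; clearFbo is a scratch FBO created once, while _texID and numTextures are the names from the question):
// one-time setup
GLuint clearFbo;
glGenFramebuffers(1, &clearFbo);

// per frame: attach each layer in turn and clear it
glBindFramebuffer(GL_FRAMEBUFFER, clearFbo);
glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
for (int i = 0; i < numTextures; ++i) {
    glFramebufferTextureLayer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, _texID, 0, i);
    glClear(GL_COLOR_BUFFER_BIT);
}
glBindFramebuffer(GL_FRAMEBUFFER, 0);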

Reading the pixels values from the Frame Buffer Object (FBO) using Pixel Buffer Object (PBO)

Can I use Pixel Buffer Object (PBO) to directly read the pixels values (i.e. using glReadPixels) from the FBO (i.e. while FBO is still attached)?
If yes,
What are the advantages and disadvantages of using PBO with FBO?
What is the problem with the following code?
{
    //DATA_SIZE = WIDTH * HEIGHT * 3 (BECAUSE I AM USING 3 CHANNELS ONLY)
    //FBO and PBO status is good
    .
    .
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fboId);
    //Draw the objects

    //The following glReadPixels works fine:
    glReadPixels(0, 0, screenWidth, screenHeight, GL_BGR_EXT, GL_UNSIGNED_BYTE, (uchar*)cvimg->imageData);

    //The following glReadPixels DOES NOT WORK :(
    glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboId);
    //yes glWriteBuffer has also same target and I also checked with every possible values
    glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);
    glReadPixels(0, 0, screenWidth, screenHeight, GL_BGR_EXT, GL_UNSIGNED_BYTE, (uchar*)cvimg->imageData);
    .
    .
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0); //back to window framebuffer
}
When using a PBO as target for glReadPixels you have to specify a byte offset into the buffer (0, I suppose) instead of (uchar*)cvimg->imageData as target address. It is similar to using a buffer offset in glVertexPointer when using VBOs.
EDIT: When a PBO is bound to the GL_PIXEL_PACK_BUFFER, the last argument to glReadPixels is not treated as a pointer into system memory but as a byte offset into the bound buffer's memory. So to write the pixels into the buffer, just pass a 0 (write them to the start of the buffer memory). You can then later access the buffer memory (to get the pixels) by means of glMapBuffer. The example link you provided in your comment does that, too; just read it thoroughly. I also suggest reading the part about vertex buffer objects they mention at the start, as these lay the groundwork for understanding buffer objects.
Yes, we can use FBO and PBO together.
Answer 1:
For synchronous reading: 'glReadPixels' without PBO is fast.
For asynchronous reading: 'glReadPixels' with 2 (or n) PBOs is better: one PBO (n) receives pixels from the framebuffer on the GPU while the CPU processes the pixels in another PBO (n+1). However, a speedup is not guaranteed; it is problem- and design-specific.
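A sketch of that alternating scheme (the read/write indices and processPixels() are placeholders, and both PBOs are assumed to be allocated beforehand with glBufferData):
// frame N: start an async read into pbo[write], consume pbo[read],
// which was filled during frame N-1
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pbo[write]);
glReadPixels(0, 0, screenWidth, screenHeight, GL_BGR, GL_UNSIGNED_BYTE, 0);

glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pbo[read]);
GLubyte* src = (GLubyte*)glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB);
if (src) {
    processPixels(src); // hypothetical CPU-side consumer
    glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB);
}
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);

// swap the roles for the next frame
int tmp = read; read = write; write = tmp;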
Answer 2:
Christian Rau's explanation is correct, and the revised code is below:
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, pboId);
glReadBuffer(GL_COLOR_ATTACHMENT0_EXT);
//glReadBuffer(GL_DEPTH_ATTACHMENT_EXT);
glReadPixels(0, 0, screenWidth, screenHeight, GL_BGR, GL_UNSIGNED_BYTE, 0);
//GLubyte* src = (GLubyte*)glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB);
//OR
cvimg->imageData = (char*)glMapBufferARB(GL_PIXEL_PACK_BUFFER_ARB, GL_READ_ONLY_ARB);
if (cvimg->imageData)
{
    //Process src OR cvimg->imageData
    glUnmapBufferARB(GL_PIXEL_PACK_BUFFER_ARB); //release pointer to the mapped buffer
}
glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);