Do we need to clear the buffer if we use double buffering? - opengl

Let's say we use double buffering: we first render the frame into the back buffer, which is then swapped with the front buffer to be displayed.
There are two scenarios here, which I assume have the same outcome.
Assume we clear the back buffer, then write a new frame into it and swap it with the front buffer.
Now assume we don't clear the back buffer; it will be overwritten with a new frame anyway, and then the two buffers are swapped.
Thus, assuming I am right, and provided we use double buffering, clearing or not clearing the buffer should produce the same display. Is that true?
Are there any possible rendering artifacts if we don't clear the buffer?

The key to the second approach is in this assumption:
the back buffer will be overwritten with a new frame
I assume we are talking about an OpenGL framebuffer, which contains color values, depth, stencil, etc. How exactly will they be overwritten in the next frame?
Rendering code does constant depth comparisons to see which objects need to be drawn; with the previous frame's depth data still in place, the results will be all messed up. The same happens if you render any semi-transparent items with blending enabled.
Clearing the buffer is the fastest way to reset everything to ground zero (or to any other specific value you need).
There are techniques that rely on the buffer not being cleared (when clearing is verified to be a costly operation on a given platform): for example, having no transparent geometry without opaque geometry behind it, and toggling the depth test between less/greater over the 0.0-0.5 and 0.5-1.0 depth ranges so that each frame always overwrites the previous frame's values, as sketched below.
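A hedged illustration of that last trick; the per-frame counter is an assumption for the sketch, not something from the original answer:

// Sketch: skip glClear(GL_DEPTH_BUFFER_BIT) by alternating depth ranges.
// Each frame writes into the half of the range that always "wins" against
// the previous frame's values. 'frame_counter' is an assumed variable.
if (frame_counter % 2 == 0) {
    glDepthRange(0.0, 0.5);   // this frame's depths land in [0.0, 0.5]
    glDepthFunc(GL_LESS);     // so they pass against last frame's [0.5, 1.0] values
} else {
    glDepthRange(1.0, 0.5);   // reversed mapping: depths land in [0.5, 1.0]
    glDepthFunc(GL_GREATER);  // so they pass against last frame's [0.0, 0.5] values
}
// ... draw the scene with the depth test enabled; no depth clear needed ...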

During rendering you depend on the fact that at least the depth buffer is cleared.
When double buffering, the contents of the back buffer will (possibly) be what you rendered two frames ago.
If the depth buffer is not cleared, then that wall you planted your face on will never go away.
The depth buffer can also be reset without a clear, for example by rendering a full-screen quad textured with your skybox with the depth test set to always pass (note that disabling GL_DEPTH_TEST entirely would also disable depth writes).
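A minimal sketch of that skybox trick, assuming a hypothetical helper that draws a full-screen quad at the far plane:

glEnable(GL_DEPTH_TEST);
glDepthMask(GL_TRUE);            // depth writes must stay enabled
glDepthFunc(GL_ALWAYS);          // every fragment passes and writes its depth
drawFullScreenSkyboxQuad();      // hypothetical helper: quad at depth 1.0 overwrites the whole buffer
glDepthFunc(GL_LESS);            // restore the normal test for the scene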

Clearing of the buffers is absolutely essential if you like performance on modern hardware. Clearing buffers doesn't necessarily write to memory. It instead does some cache magic such that, whenever the system tries to read from the memory (if it hasn't been written to since it was cleared), it will read the clear color. So it won't even really access that memory.
This is very important for things like the depth buffer. Depth tests/writes are a read/modify/write operation. The first read will essentially be free.
So while you do not technically need to clear the back buffers if you're going to overwrite every pixel, you really should.

After a buffer swap, the contents of the back buffer are undefined, i.e. they could be anything. Since many OpenGL rendering operations depend on a well-known state of the destination framebuffer to work properly (depth testing, stencil testing, blending), the back buffer has to be brought into a well-known state before doing anything else.
Hence, unless you take careful measures to make sure your rendering operations do not depend on the destination buffer's contents, you'll have to clear the back buffer after a swap before doing anything else.
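For reference, a minimal start-of-frame reset that brings all three buffers into a known state looks like this:

glClearColor(0.0f, 0.0f, 0.0f, 1.0f);  // the "known state" each buffer is reset to
glClearDepth(1.0);
glClearStencil(0);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);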

Related

OpenGL: efficient way to read sparse pixel data from many framebuffer textures?

I'm writing a program that uses the GPU to calculate stuff, and I want to read data from the framebuffers into my client code. The framebuffers I'm using are about 40 textures, all 1024x1024 in size, all of which contain data that needs to be read, but only very sparsely: around 50 pixels at arbitrary x/y coordinates from each texture. Using glReadPixels for each texture, for each frame, is proving too costly.
I only need to read a few select pixels from each texture; is there a way to quickly gather their data without needing to download each entire texture from the GPU?
This sounds fairly expensive no matter how you slice it. A couple of approaches come to mind:
What I would try first is glReadPixels(), but with using a PBO. Bind a buffer large enough to hold all the pixels to the GL_PIXEL_PACK_BUFFER target, and then submit the glReadPixels() calls, with offsets to place the results in distinct sections of the buffer. Then call glMapBufferRange() to read back the values.
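A hedged sketch of this first approach; 'fbos', 'tex', 'x', 'y' and 'numReads' are illustrative stand-ins for the asker's framebuffers and sparse read locations:

GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, numReads * 4, NULL, GL_STREAM_READ); // 4 bytes per RGBA8 pixel

for (int i = 0; i < numReads; ++i) {
    glBindFramebuffer(GL_READ_FRAMEBUFFER, fbos[tex[i]]);   // FBO holding texture tex[i]
    glReadPixels(x[i], y[i], 1, 1, GL_RGBA, GL_UNSIGNED_BYTE,
                 reinterpret_cast<void*>(i * 4));            // offset into the PBO, not a pointer
}

// Map (ideally after submitting other GPU work) and read the packed results:
const void* pixels = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, numReads * 4, GL_MAP_READ_BIT);
// ... copy out the numReads RGBA8 values ...
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);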
An alternate approach is that you copy all the pixels you want to read into a single texture. You could use glBlitFramebuffer() or glCopyTexSubImage2D(). Then use a single glReadPixels() or glGetTexImage() call to get all the data from this texture.
Both of these approaches should result in about the same amount of work and synchronization overhead. But one or the other could be more efficient, depending on which paths in the driver are better optimized.
As the earlier answer already suggested, I would make very sure that you really need this, and there isn't any way to keep and process the data on the GPU. Any time you read back data, you introduce synchronization between GPU and CPU, which is mostly harmful to performance.
Do you have any restrictions on what OpenGL version you can use? If not, it sounds like you should look into compute shaders. You say that you are calculating data, so I assume that you are "abusing" the rendering pipeline for your application, especially the fragment shader, and store fragment data in the framebuffer that is interpreted as something else than color.
If this is the case, then all you need is a shader storage buffer and an atomic counter. At some point right now you are deciding that fragment (x, y, z [z being the texture index]) should have value v. So in your compute shader, you do your calculation as you would in the fragment shader, but as output, you store a tuple (x, y, z, v). You store this tuple in the shader storage buffer at the index of the atomic counter which you increment after each written element. In the end, you have your data stored compactly in the buffer and only need to read back these elements. The exact number is the value the atomic counter holds after termination. Download the buffer with glGetBufferSubData into an array of location-value pairs, iterate over it and do your CPU magic.
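A hedged host-side sketch of this compute-shader approach (GL 4.3+); 'computeProgram', 'numLayers' and 'maxResults' are illustrative assumptions, not from the answer:

GLuint counterBuf, ssbo;
GLuint zero = 0;
glGenBuffers(1, &counterBuf);
glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, counterBuf);
glBufferData(GL_ATOMIC_COUNTER_BUFFER, sizeof(GLuint), &zero, GL_DYNAMIC_DRAW); // counter starts at 0
glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 0, counterBuf);

glGenBuffers(1, &ssbo);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
glBufferData(GL_SHADER_STORAGE_BUFFER, maxResults * 4 * sizeof(GLfloat), NULL, GL_DYNAMIC_READ);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, ssbo);

glUseProgram(computeProgram);                  // assumed shader appending (x, y, z, v) tuples
glDispatchCompute(1024 / 16, 1024 / 16, numLayers);
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT); // make the writes visible to glGetBufferSubData

GLuint count = 0;                              // number of tuples the shader wrote
glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, counterBuf);
glGetBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0, sizeof(GLuint), &count);

std::vector<GLfloat> tuples(count * 4);        // (x, y, z, v) per element
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, tuples.size() * sizeof(GLfloat), tuples.data());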
If you need to copy the data from the GPU to the CPU memory, there is no way (AFAIK) around using glReadPixels.
Depending on what platform you're using and the specifics of your program, you can try several optimizations using FBOs:
Copy only part of the texture, assuming you know the locations of the pixels. Note that in most cases it is still faster to copy the entire texture than to issue several small reads.
If you don't need 32-bit textures, you can render at a lower color resolution. The specifics depend on your platform's extensions.
Maybe you don't really need to copy the pixels at all, since you plan to use them as a texture input to the next stage? In that case you can copy the pixels directly on the GPU using glCopyTexImage2D, as sketched below.
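A minimal sketch of that last suggestion; 'dstTex' and the region coordinates are illustrative:

glBindTexture(GL_TEXTURE_2D, dstTex);       // destination texture for the next stage
glCopyTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8,
                 x, y, w, h,                // source region in the current read framebuffer
                 0);                        // border must be 0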

Explain different types of draw buffers

Why do we need all these buffers:
GL_FRONT_LEFT, GL_FRONT_RIGHT, GL_BACK_LEFT, GL_BACK_RIGHT, GL_FRONT, GL_BACK, GL_LEFT, GL_RIGHT, GL_FRONT_AND_BACK, and GL_AUXi, where i is between 0 and GL_AUX_BUFFERS.
GL_FRONT and GL_BACK make sense to me, but not the others. Can someone give an example of when we would render to, say, GL_BACK_LEFT?
When I say GL_FRONT_AND_BACK, does it render to the left and right of both the front and back buffers (i.e. 4 buffers in total)? What are the use cases of rendering the same image to all 4 buffers? Any examples?
The LEFT/RIGHT distinction for the buffers is used for stereo display. For double buffered stereo, you would render the image for the left eye into the GL_BACK_LEFT buffer, the image for the right eye into the GL_BACK_RIGHT buffer, and then swap buffers.
In stereo mode, GL_BACK is used if you want to render the same thing to both GL_BACK_LEFT and GL_BACK_RIGHT, which is quite rare.
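A hedged sketch of the double-buffered stereo flow just described; the scene helper, eye views, and swap call are illustrative placeholders:

glDrawBuffer(GL_BACK_LEFT);                          // requires a stereo-capable pixel format
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
drawSceneFromEye(leftEyeView);                       // hypothetical helper: left-eye viewpoint

glDrawBuffer(GL_BACK_RIGHT);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
drawSceneFromEye(rightEyeView);                      // hypothetical helper: right-eye viewpoint

swapBuffers();  // platform swap (SwapBuffers / glXSwapBuffers) presents both eyes at once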
You're correct about GL_FRONT_AND_BACK. Rendering to the front buffer is long obsolete, and was only useful in special cases anyway. One typical case was for drawing special cursors, like cross-hair cursors, using OpenGL, on top of an already displayed frame. Rendering to both front and back at the same time was even less useful. I can't think of a good use case right now.
AUX buffers are long deprecated as well. I believe they were intended for rendering effects where you needed additional color buffers. These days, you would use FBOs for the same purpose.
The FRONT and BACK distinction refers to double buffering: usually a program draws to the back buffer and then swaps it to the front, so that a complete frame is always presented to the user.
The LEFT and RIGHT distinction is for stereo rendering (where a slightly different viewpoint is used to render images for the left and right eyes).
Of course, most stereo rendering hardware would also support double buffering, leading to the combinations GL_FRONT/BACK_LEFT/RIGHT.
What buffers are actually available depends on the context you have requested and (of course) the capabilities of the underlying system. GL_FRONT is likely to have the same defined value as one or the other of GL_FRONT_LEFT and GL_FRONT_RIGHT, and likewise for GL_BACK, though there's no guarantee.
The GL_AUXi buffers are implementation specific. I suspect they're usually used for overlay planes when available.

How to render offscreen on OpenGL? [duplicate]

My aim is to render an OpenGL scene without a window, directly into a file. The scene may be larger than my screen resolution.
How can I do this?
I want to be able to set the render area to any size, for example 10000x10000, if possible.
It all starts with glReadPixels, which you will use to transfer the pixels stored in a specific buffer on the GPU to the main memory (RAM). As you will notice in the documentation, there is no argument to choose which buffer. As is usual with OpenGL, the current buffer to read from is a state, which you can set with glReadBuffer.
So a very basic offscreen rendering method would be something like the following. I use C++ pseudo-code, so it may contain errors, but it should make the general flow clear:
// Before swapping: read the current back buffer into client memory
// (needs <vector> and <cstdint>)
std::vector<std::uint8_t> data(width * height * 4);
glReadBuffer(GL_BACK);
glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, data.data());
This will read the current back buffer (usually the buffer you're drawing to). You should call this before swapping the buffers. Note that you can also perfectly well read the back buffer with the above method, clear it, and draw something totally different before swapping. Technically you can also read the front buffer, but this is often discouraged, as implementations are theoretically allowed to make optimizations that can leave your front buffer containing rubbish.
There are a few drawbacks to this. First of all, we don't really do offscreen rendering, do we? We render to the screen buffers and read from those. We can emulate offscreen rendering by never swapping the back buffer to the front, but it doesn't feel right. Besides, the front and back buffers are optimized for displaying pixels, not for reading them back. That's where Framebuffer Objects come into play.
Essentially, an FBO lets you create a framebuffer other than the default one (the FRONT and BACK buffers), which allows you to draw to a memory buffer instead of the screen buffers. In practice, you can either draw to a texture or to a renderbuffer. The former is optimal when you want to re-use the pixels in OpenGL itself as a texture (e.g. a naive "security camera" in a game), the latter if you just want to render and read back. With this, the code above would become something like the following. Again pseudo-code, so don't kill me if I mistyped or forgot some statements.
// Somewhere at initialization
GLuint fbo, render_buf;
glGenFramebuffers(1, &fbo);
glGenRenderbuffers(1, &render_buf);
glBindRenderbuffer(GL_RENDERBUFFER, render_buf); // binding a renderbuffer needs a target
glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA8, width, height); // GL_RGBA8: GL_BGRA8 is not a valid internal format
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
glFramebufferRenderbuffer(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, render_buf);
// At deinit:
glDeleteFramebuffers(1, &fbo);
glDeleteRenderbuffers(1, &render_buf);
// Before drawing
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
// After drawing
std::vector<std::uint8_t> data(width * height * 4);
glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo); // glReadPixels reads from the READ framebuffer binding
glReadBuffer(GL_COLOR_ATTACHMENT0);
glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, data.data());
// Return to onscreen rendering:
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
This is a simple example; in reality you likely also want storage for the depth (and stencil) buffer. You also might want to render to a texture, but I'll leave that as an exercise. In any case, you will now perform real offscreen rendering, and it might work faster than reading the back buffer.
Finally, you can use Pixel Buffer Objects to make the read-back asynchronous. The problem is that glReadPixels blocks until the pixel data is completely transferred, which may stall your CPU. With PBOs the implementation may return immediately, as it controls the buffer anyway. It is only when you map the buffer that the pipeline will block. However, PBOs may be optimized to buffer the data solely in RAM, so this block could take a lot less time. The read-back code would become something like this:
// Init:
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, NULL, GL_DYNAMIC_READ);
// Deinit:
glDeleteBuffers(1, &pbo);
// Reading:
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, 0); // 0 is now an offset into the buffer, not a pointer
// DO SOME OTHER STUFF (otherwise this is a waste of your time)
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo); // might not be necessary...
void* pixel_data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
// ... use pixel_data, then release the mapping:
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
The part in caps is essential. If you just issue a glReadPixels into a PBO, followed by a glMapBuffer of that PBO, you've gained nothing but a lot of code. Sure, the glReadPixels might return immediately, but now the glMapBuffer will stall because it has to safely map the data from the read buffer to the PBO and into a block of memory in main RAM.
Please also note that I use GL_BGRA everywhere; this is because many graphics cards internally use it as the optimal rendering format (or the GL_BGR version without alpha). It should be the fastest format for pixel transfers like this. I'll try to find the NVIDIA article I read about this a few months back.
When using OpenGL ES 2.0, GL_DRAW_FRAMEBUFFER might not be available; in that case you should just use GL_FRAMEBUFFER.
I'll assume that creating a dummy window (you don't render to it; it's just there because the API requires one) into which you create your main context is an acceptable implementation strategy.
Here are your options:
Pixel buffers
A pixel buffer, or pbuffer (which isn't a pixel buffer object), is first and foremost an OpenGL context. Basically, you create a window as normal, then pick a pixel format from wglChoosePixelFormatARB (pbuffer formats must be obtained from here). Then you call wglCreatePbufferARB, giving it your window's HDC and the pixel buffer format you want to use. Oh, and a width/height; you can query the implementation's maximum width/height.
The default framebuffer of a pbuffer is not visible on the screen, and the max width/height is whatever the hardware wants to let you use. So you can render to it and use glReadPixels to read back from it, as sketched below.
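A hedged WGL sketch of that flow (Windows-only; assumes the WGL_ARB_pixel_format and WGL_ARB_pbuffer entry points have been loaded, and 'windowDC' is the dummy window's HDC):

const int attribs[] = {
    WGL_DRAW_TO_PBUFFER_ARB, GL_TRUE,
    WGL_SUPPORT_OPENGL_ARB,  GL_TRUE,
    WGL_COLOR_BITS_ARB,      32,
    WGL_DEPTH_BITS_ARB,      24,
    0
};
int fmt = 0;
UINT numFormats = 0;
wglChoosePixelFormatARB(windowDC, attribs, NULL, 1, &fmt, &numFormats);

HPBUFFERARB pbuf = wglCreatePbufferARB(windowDC, fmt, 1024, 1024, NULL);
HDC   pbufDC = wglGetPbufferDCARB(pbuf);
HGLRC pbufRC = wglCreateContext(pbufDC);
wglMakeCurrent(pbufDC, pbufRC);
// ... render and glReadPixels here, then clean up:
wglReleasePbufferDCARB(pbuf, pbufDC);
wglDestroyPbufferARB(pbuf);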
You'll need to share your window's context with the pbuffer's context if you have created objects in the window context. Otherwise, you can use the pbuffer context entirely separately. Just don't destroy the window context.
The advantage here is greater implementation support (though most drivers that don't support the alternatives are also old drivers for hardware that's no longer being supported, or are for Intel hardware).
The downsides are these: pbuffers don't work with core OpenGL contexts. They may work for compatibility contexts, but there is no way to give wglCreatePbufferARB information about OpenGL versions and profiles.
Framebuffer Objects
Framebuffer Objects are more "proper" offscreen rendertargets than pbuffers. FBOs are within a context, while pbuffers are about creating new contexts.
FBOs are just a container for images that you render to. The maximum dimensions that the implementation allows can be queried; you can assume it to be GL_MAX_VIEWPORT_DIMS (make sure an FBO is bound before checking this, as it changes based on whether an FBO is bound).
Since you're not sampling textures from these (you're just reading values back), you should use renderbuffers instead of textures. Their maximum size may be larger than that of textures.
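Querying the limits mentioned above is a couple of glGetIntegerv calls (a minimal sketch):

GLint maxRenderbuffer = 0;
GLint maxViewport[2] = {0, 0};
glGetIntegerv(GL_MAX_RENDERBUFFER_SIZE, &maxRenderbuffer); // renderbuffer width/height limit
glGetIntegerv(GL_MAX_VIEWPORT_DIMS, maxViewport);          // returns two values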
The upside is the ease of use. Rather than have to deal with pixel formats and such, you just pick an appropriate image format for your glRenderbufferStorage call.
The only real downside is the narrower band of hardware that supports them. In general, anything that AMD or NVIDIA makes that they still support (right now, GeForce 6xxx or better [note the number of x's], and any Radeon HD card) will have access to ARB_framebuffer_object or OpenGL 3.0+ (where it's a core feature). Older drivers may only have EXT_framebuffer_object support (which has a few differences). Intel hardware is potluck; even if they claim 3.x or 4.x support, it may still fail due to driver bugs.
If you need to render something that exceeds the maximum FBO size of your GL implementation, libtr works pretty well:
The TR (Tile Rendering) library is an OpenGL utility library for doing
tiled rendering. Tiled rendering is a technique for generating large
images in pieces (tiles).
TR is memory efficient; arbitrarily large image files may be generated
without allocating a full-sized image buffer in main memory.
The easiest way is to use something called a Framebuffer Object (FBO). You will still have to create a window to get an OpenGL context, though (but this window can be hidden).
The easiest way to fulfill your goal is to use an FBO to do off-screen rendering. You don't need to render to a texture and then grab the image with glGetTexImage; just render to a renderbuffer and use glReadPixels. This link will be useful: see Framebuffer Object Examples.

How to read a 3D texture from GPU memory with Pixel Buffer Objects

I'm writing data into a 3D texture from within a fragment shader, and I need to asynchronously read said data back into system memory. The only means of asynchronously initiating the packing operation into the buffer object seems to be calling glReadPixels() with a NULL pointer. But this function insists on being passed a rectangle defining the region to read back. Now I don't know if these parameters are ignored when using PBOs, but I assume not. In that case, I have no idea what to pass to this function in order to obtain the whole 3D texture.
Even if I have to read back individual slices (which would be kind of stupid IMO), I still have no idea how to communicate to OpenGL which slice to read from. Am I missing something?
BTW, I could use individual 2D textures for every slice, but that would screw up (3D) mipmapping if I'm not mistaken. I wanted to use the 3D mipmaps to efficiently find regions of interest in the resulting 3D texture.
P.S. Sorry for the sub-optimal tags, apparently no one ever asked about 3d textures before and since I'm not allowed to create new tags...
Who says that glReadPixels is the only way to read image data? Maybe in OpenGL ES it is, but if you're using ES, you should say so. The rest of this answer will be assuming you're talking about desktop GL.
If you have a texture, and you want to read its contents, you should use glGetTexImage. The switch that controls whether it reads into a buffer object or not is the same switch that controls it for glReadPixels: whether a buffer is bound to GL_PIXEL_PACK_BUFFER.
Note that glGetTexImage will retrieve the entire texture (for a given mipmap level).
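A hedged sketch of an asynchronous whole-level readback; 'tex', 'width', 'height' and 'depth' are illustrative names for the 3D texture and its dimensions:

GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, width * height * depth * 4, NULL, GL_STREAM_READ);

glBindTexture(GL_TEXTURE_3D, tex);
glGetTexImage(GL_TEXTURE_3D, 0, GL_RGBA, GL_UNSIGNED_BYTE, 0); // 0 = offset into the bound PBO

// ... do other work while the transfer runs ...
const void* voxels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
// ... consume the width * height * depth texels ...
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);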

OpenGL: Buffer object performance issue

I have a question about buffer object performance. I rendered a mesh using standard vertex arrays (not interleaved), and I wanted to switch to buffer objects to get a performance boost. I was shocked to find that using buffer objects lowered performance four-fold. I thought buffers were supposed to increase performance. Is that true? So I think I am doing something wrong...
I render a 3D tiled map, and to reduce the amount of memory needed I use only a single tile (one set of vertices) to render the whole map. I change only the texture coordinates and the y value of the vertex positions for each tile of the map. The buffers for positions and texture coords are created with the GL_DYNAMIC_DRAW parameter. The buffer for indices is created with GL_STATIC_DRAW because it doesn't change during map rendering. So, for each tile of the map, the buffers are mapped and unmapped at least once. Should I use only one buffer for texture coords and positions?
Thanks,
Try moving your vertices/texture coordinates with the GL_MODELVIEW/GL_TEXTURE matrices, and leave the buffer data alone (GL_STATIC_DRAW alone); e.g. if a tile is of size 1x1, create a rect (0, 0)-(1, 1) and set its position in the world with glTranslate. Same with the texture coordinates. A sketch follows below.
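A hedged fixed-function sketch of that suggestion; 'Tile', 'tiles' and the per-tile fields are illustrative, and the static 1x1 quad's VBO/IBO are assumed to be bound once up front:

for (const Tile& t : tiles) {
    glMatrixMode(GL_MODELVIEW);
    glPushMatrix();
    glTranslatef(t.x, t.y, t.z);              // place the shared tile quad in the world

    glMatrixMode(GL_TEXTURE);
    glPushMatrix();
    glTranslatef(t.u, t.v, 0.0f);             // select the tile's region of the atlas

    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, 0); // the static tile geometry

    glPopMatrix();                            // restore the texture matrix
    glMatrixMode(GL_MODELVIEW);
    glPopMatrix();
}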
VBOs are not there to increase the performance of drawing a few quads. Their true power shows when drawing meshes with thousands of polygons using shaders. If you don't need forward compatibility with newer OpenGL versions, I see little use in using them to draw dynamically changing data.
If you need to update the buffer(s) each frame, you should use GL_STREAM_DRAW (which hints that the buffer contents will be modified once and used at most a few times) rather than GL_DYNAMIC_DRAW (which hints that they will be modified repeatedly and used many times).
As far as my experience goes, buffers created with GL_STREAM_DRAW will be treated similarly to plain ol' arrays, so you should expect about the same performance as for arrays when using it.
Also make sure that you call glMapBuffer with the access parameter set to GL_WRITE_ONLY, assuming you don't need to read the contents of the buffer. Otherwise, if the buffer is in video memory, it has to be transferred from video memory to main memory and then back again (well, that's up to the driver, really...) for each map call. Transferring too much data over the bus is a very real bottleneck that's quite easy to run into.
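A minimal sketch of such a write-only update; 'vbo', 'vertices' and 'vertexBytes' are illustrative, and the copy needs <cstring>:

glBindBuffer(GL_ARRAY_BUFFER, vbo);
void* dst = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
std::memcpy(dst, vertices, vertexBytes);  // overwrite everything; never read through dst
glUnmapBuffer(GL_ARRAY_BUFFER);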