OpenGL: Acquiring only a stencil buffer and no depth buffer?

OpenGL: Acquiring only a stencil buffer and no depth buffer? - c++

I would like to acquire a stencil buffer, but not suffer the overhead of an attached depth buffer if it's possible, since I wouldn't be using it. Most of the resources I've found suggest that while the stencil buffer is optional (excluding it in favour of gaining more depth buffer precision, for example) I have not seen any code that requests and successfully gets only the 8-bit stencil buffer. The most common configuration I've seen being 24 bit depth buffers with an 8 bit stencil buffer.
Is it possible to request only a stencil buffer with a color buffer?
If it is possible, Is it likely the request would be granted by most OpenGL implementations?
The OpenGL version I'm using is 2.0
edit:
The API I'm using to call OpenGL is SFML, which normally doesn't support stencil allocation for it's FBO wrapper objects, though it allows it for the display surface's framebuffer. I edited the functionality in myself, though that's where I'm stuck.
glRenderbufferStorageEXT(GL_RENDERBUFFER_EXT, GL_DEPTH24_STENCIL8_EXT, width, height));
This line decides the storage type I assume. However, GL_DEPTH24_STENCIL8_EXT is the only define I've found that specifies a stencil buffer's creation. (there's no GL_STENCIL8 or anything similar at least)

Researching GL_STENCIL_INDEX8 that was mentioned in the comments, I came across the following line in the the official OpenGL wiki, http://www.opengl.org/wiki/Framebuffer_Object_Examples#Stencil
NEVER EVER MAKE A STENCIL buffer. All GPUs and all drivers do not support an independent stencil buffer. If you need a stencil buffer, then you need to make a Depth=24, Stencil=8 buffer, also called D24S8.
Stress testing the two different allocation schemes, GL_STENCIL_INDEX8_EXT vs GL_DEPTH24_STENCIL8_EXT, the results were roughly equal, both in terms of memory usage and performance. I suspect that it padded the stencil buffer with 24bits anyway. So for sake of portability, going to just use the depth and stencil packed scheme.

Related

OpenGL: Depth Stencil buffer won't write underlying texture attached

I'm currently developing an emulator and I got some games using GL_DEPTH24_STENCIL8 (D24S8) as depth buffer. The depth buffer works fine and all but the underlying texture does not seem to be written. I've checked everything: Write mask, attachment being Depth_Stencil, etc. Still it won't write the underlying texture. It's actually interesting because while rendering the depth buffer, the depth test works, what doesn't work is writting the underlying texture. NSight assumes my texture as it were 2 textures (you can see in the pics), one is allocated to the depth buffer which is correct and the other is the one used later on pixel shader sampling. For some reason it won't write it. Things I checked:
Framebuffer has the texture attached as a Depth Stencil Attachment.
Depth Write is on when it's supposed to be.
There are no copies, reuploads of the texture.
Stencil mask is well set, everything on keep but stencil test is disabled.
Some extra info:
I used immutable storage to allocate the depth/stencil buffer.
Components set to GL_DEPTH24_STENCIL8, GL_DEPTH_STENCIL, GL_UNSIGNED_INT_24_8
Things I've tried:
Flushing after draw and reupload, didn't work, texture had full zeroes.
glCopyImageSubData to another and use another, didn't work
Changing it to D32F does not work either, even though NSight shows it does, Renderdocs doesn't.

Why is depth buffers faster than depth textures?

This tutorial on shadow-mapping in OpenGL briefly mentions the difference between using a depth buffer and a depth texture (edit: to store per pixel depth information for depth testing or other purposes, such as shadow-mapping) by stating:
Depth texture. Slower than a depth buffer, but you can sample it later in your shader
However, this got me wondering why this is so. After all, both seem to be nothing more than a two-dimensional array containing some data, and the definition on Microsofts notes on graphics define them in very similar terms as such (these notes are as pointed out in a comment, not on OpenGL but another graphical engine, but the purpose of the depth-buffers/-textures seem to be quite similar -- I have have not found an equal description of the two for OpenGL depth-buffers/-textures -- for which reason I have decided to keep these articles. If someone has a link to an article describing depth buffers and depth textures in OpenGL you will be welcome to post it in the comments)
A depth buffer contains per-pixel floating-point data for the z depth of each pixel rendered.
and
A depth texture, also known as a shadow map, is a texture that contains the data from the depth buffer for a particular scene
Of course, there are a few differences between the two methods -- notably, the depth texture can be sampled later, unlike the depth buffer.
Despite these differences, I can however not see why the depth buffer should be faster to use than a depth texture, and my question is, therefore: why can't these two methods of storing the same data be equally fast (edit: when used for storing depth data for depth testing).

By "depth buffer", I will assume you mean "renderbuffer with a depth format".
Possible reasons why a depth renderbuffer might be faster to render to than a depth texture include:
A depth renderbuffer can live within specialized memory that is not shader-accessible, since the implementation knows that you can't access it from the shader.
A depth renderbuffer might be able to have a special format or layout that a depth texture cannot have, since the texture has to be shader-accessible. This could include things like Hi-Z/Hierarchical-Z and so forth.
#1 tends to crop up on tile-based architectures. If you do things right, you can keep your depth renderbuffer entirely within tile memory. That means that, after a rendering operation, there is no need to copy it out to main memory. By contrast, with a depth texture, the implementation can't be sure you don't need to copy it out, so it has to do so just to be safe.
Note that this list is purely speculative. Unless you've actually profiled it, or have some specific knowledge of hardware (as in the TBR case), there's no reason to assume that there is any substantial difference in performance.

Is the stencil buffer still relevant in modern OpenGL?

Me and a friend have been having an ongoing argument about the stencil buffer. In short I haven't been able to find a situation where the stencil buffer would provide any advantage over the programmable pipeline tools in OpenGL 3.2+. Are there any uses to the stencil buffer in modern OpenGL?
[EDIT]
Thanks everyone for all the inputs on the subject.

It is more useful than ever since you can sample stencil index textures from fragment shaders. It should not even be argued that the stencil buffer is not part of the programmable pipeline.
The depth buffer is used for simple pass/fail fragment rejection, which the stencil buffer can also do as suggested in comments. However, the stencil buffer can also accumulate information about test results over multiple passes. All sorts of logic and counting applications exist such as measuring a scene's depth complexity, constructive solid geometry, etc.

To add a recent example to Andon's answer, GTA V uses the stencil buffer kinda like an ID buffer to mark the player character, cars, vegetation etc.
It subsequently uses the stencil buffer to e.g. apply subsurface scattering only to the character or exclude him from motion blur.
See the GTA V Graphics Study (highly recommended, it's a great read!)
Edit: sure you can do this in software. But you can do rasterization or tessellation in software just as well... In the end it's about performance I guess. With depth24stencil8 you have a nice hardware-supported format, and the stencil test is most likely faster then doing discards in the fragment shader.

Just to provide one other use case, shadow volumes (aka "stencil shadows") are still very relevant: https://en.wikipedia.org/wiki/Shadow_volume
They're useful for indoor scenes where shadows are supposed to be pixel perfect, and you're less likely to have alpha-tested foliage messing up the extruded shadow volumes.
It's true that shadow maps are more common, but I suspect that stencil shadows will have a comeback once the brain dead Createive/3DLabs patent expires on the zfail method.

How to draw Renderbuffer as Texturebuffer in FBO?

I succeeded in render to texture with Texturebuffer, using VAO and shaders.
But FBO has another options for color buffer, it's Renderbuffer. I searched a lot on the internet, but cannot found any example related to draw Renderbuffer as Texturebuffer with shaders
If I ain't wrong, Renderbuffer is released in OpenGL 3.30, and it's faster than Texturebuffer.
Can I use Renderbuffer as Texturebuffer? (stupid question huh? I think it should be absolutely, isn't it?)
If yes, please lead me or give any example to draw render buffer as texture buffer.
My target is just for study, but I'd like to know is that a better way to draw textures? Should we use it frequently?

First of all, don't use the term "texture buffer" when you really just mean texture. A "buffer texture"/"texture buffer object" is a different conecpt, completely unrelated here.
If I ain't wrong, Renderbuffer is released in OpenGL 3.30, and it's faster than Texturebuffer.
No. Renderbuffers were there when FBOs were first invented. One being faster than the other is not generally true either, but these are implementation details. But it is also irrelevant.
Can I use Renderbuffer as Texturebuffer? (stupid question huh? I think it should be absolutely, isn't it?)
Nope. You cant use the contents of a renderbuffer directly as a source for texture mapping. Renderbuffesr are just abstract memory regions the GPU renders to, and they are not in the format required for texturing. You can read back the results to the CPU using glReadPixels, our you could copy the data into a texture object, e.g. via glCopyTexSubImage - but that would be much slower than directly rendering into textures.
So renderbuffers are good for a different set of use cases:
offscreen rendering (e.g. where the image results will be written to a file, or encoded to a video)
as helper buffers during rendering, like the depth buffer or stencil buffer, where you do not care anbout the final contents of these buffers anyway
as intermediate buffer when the image data can't be directly used by the follwoing steps, e.g. when using multisampling, and copying the result to a non-multisampled framebuffer or texture

It appears that you have your terminology mixed up.
You attach images to Framebuffer Objects. Those images can either be a Renderbuffer Object (this is an offscreen surface that has very few uses besides attaching and blitting) or they can be part of a Texture Object.
Use whichever makes sense. If you need to read the results of your drawing in a shader then obviously you should attach a texture. If you just need a depth buffer, but never need to read it back, a renderbuffer might be fine. Some older hardware does not support multisampled textures, so that is another situation where you might favor renderbuffers over textures.
Performance wise, do not make any assumptions. You might think that since renderbuffers have a lot fewer uses they would somehow be quicker, but that's not always the case. glBlitFramebuffer (...) can be slower than drawing a textured quad.

How to render offscreen on OpenGL? [duplicate]

This question already has answers here:
How to use GLUT/OpenGL to render to a file?
(6 answers)
Closed 9 years ago.
My aim is to render OpenGL scene without a window, directly into a file. The scene may be larger than my screen resolution is.
How can I do this?
I want to be able to choose the render area size to any size, for example 10000x10000, if possible?

It all starts with glReadPixels, which you will use to transfer the pixels stored in a specific buffer on the GPU to the main memory (RAM). As you will notice in the documentation, there is no argument to choose which buffer. As is usual with OpenGL, the current buffer to read from is a state, which you can set with glReadBuffer.
So a very basic offscreen rendering method would be something like the following. I use c++ pseudo code so it will likely contain errors, but should make the general flow clear:
//Before swapping
std::vector<std::uint8_t> data(width*height*4);
glReadBuffer(GL_BACK);
glReadPixels(0,0,width,height,GL_BGRA,GL_UNSIGNED_BYTE,&data[0]);
This will read the current back buffer (usually the buffer you're drawing to). You should call this before swapping the buffers. Note that you can also perfectly read the back buffer with the above method, clear it and draw something totally different before swapping it. Technically you can also read the front buffer, but this is often discouraged as theoretically implementations were allowed to make some optimizations that might make your front buffer contain rubbish.
There are a few drawbacks with this. First of all, we don't really do offscreen rendering do we. We render to the screen buffers and read from those. We can emulate offscreen rendering by never swapping in the back buffer, but it doesn't feel right. Next to that, the front and back buffers are optimized to display pixels, not to read them back. That's where Framebuffer Objects come into play.
Essentially, an FBO lets you create a non-default framebuffer (like the FRONT and BACK buffers) that allow you to draw to a memory buffer instead of the screen buffers. In practice, you can either draw to a texture or to a renderbuffer. The first is optimal when you want to re-use the pixels in OpenGL itself as a texture (e.g. a naive "security camera" in a game), the latter if you just want to render/read-back. With this the code above would become something like this, again pseudo-code, so don't kill me if mistyped or forgot some statements.
//Somewhere at initialization
GLuint fbo, render_buf;
glGenFramebuffers(1,&fbo);
glGenRenderbuffers(1,&render_buf);
glBindRenderbuffer(render_buf);
glRenderbufferStorage(GL_RENDERBUFFER, GL_BGRA8, width, height);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER,fbo);
glFramebufferRenderbuffer(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, render_buf);
//At deinit:
glDeleteFramebuffers(1,&fbo);
glDeleteRenderbuffers(1,&render_buf);
//Before drawing
glBindFramebuffer(GL_DRAW_FRAMEBUFFER,fbo);
//after drawing
std::vector<std::uint8_t> data(width*height*4);
glReadBuffer(GL_COLOR_ATTACHMENT0);
glReadPixels(0,0,width,height,GL_BGRA,GL_UNSIGNED_BYTE,&data[0]);
// Return to onscreen rendering:
glBindFramebuffer(GL_DRAW_FRAMEBUFFER,0);
This is a simple example, in reality you likely also want storage for the depth (and stencil) buffer. You also might want to render to texture, but I'll leave that as an exercise. In any case, you will now perform real offscreen rendering and it might work faster then reading the back buffer.
Finally, you can use pixel buffer objects to make read pixels asynchronous. The problem is that glReadPixels blocks until the pixel data is completely transfered, which may stall your CPU. With PBO's the implementation may return immediately as it controls the buffer anyway. It is only when you map the buffer that the pipeline will block. However, PBO's may be optimized to buffer the data solely on RAM, so this block could take a lot less time. The read pixels code would become something like this:
//Init:
GLuint pbo;
glGenBuffers(1,&pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, width*height*4, NULL, GL_DYNAMIC_READ);
//Deinit:
glDeleteBuffers(1,&pbo);
//Reading:
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glReadPixels(0,0,width,height,GL_BGRA,GL_UNSIGNED_BYTE,0); // 0 instead of a pointer, it is now an offset in the buffer.
//DO SOME OTHER STUFF (otherwise this is a waste of your time)
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo); //Might not be necessary...
pixel_data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
The part in caps is essential. If you just issue a glReadPixels to a PBO, followed by a glMapBuffer of that PBO, you gained nothing but a lot of code. Sure the glReadPixels might return immediately, but now the glMapBuffer will stall because it has to safely map the data from the read buffer to the PBO and to a block of memory in main RAM.
Please also note that I use GL_BGRA everywhere, this is because many graphics cards internally use this as the optimal rendering format (or the GL_BGR version without alpha). It should be the fastest format for pixel transfers like this. I'll try to find the nvidia article I read about this a few monts back.
When using OpenGL ES 2.0, GL_DRAW_FRAMEBUFFER might not be available, you should just use GL_FRAMEBUFFER in that case.

I'll assume that creating a dummy window (you don't render to it; it's just there because the API requires you to make one) that you create your main context into is an acceptable implementation strategy.
Here are your options:
Pixel buffers
A pixel buffer, or pbuffer (which isn't a pixel buffer object), is first and foremost an OpenGL context. Basically, you create a window as normal, then pick a pixel format from wglChoosePixelFormatARB (pbuffer formats must be gotten from here). Then, you call wglCreatePbufferARB, giving it your window's HDC and the pixel buffer format you want to use. Oh, and a width/height; you can query the implementation's maximum width/heights.
The default framebuffer for pbuffer is not visible on the screen, and the max width/height is whatever the hardware wants to let you use. So you can render to it and use glReadPixels to read back from it.
You'll need to share you context with the given context if you have created objects in the window context. Otherwise, you can use the pbuffer context entirely separately. Just don't destroy the window context.
The advantage here is greater implementation support (though most drivers that don't support the alternatives are also old drivers for hardware that's no longer being supported. Or is Intel hardware).
The downsides are these. Pbuffers don't work with core OpenGL contexts. They may work for compatibility, but there is no way to give wglCreatePbufferARB information about OpenGL versions and profiles.
Framebuffer Objects
Framebuffer Objects are more "proper" offscreen rendertargets than pbuffers. FBOs are within a context, while pbuffers are about creating new contexts.
FBOs are just a container for images that you render to. The maximum dimensions that the implementation allows can be queried; you can assume it to be GL_MAX_VIEWPORT_DIMS (make sure an FBO is bound before checking this, as it changes based on whether an FBO is bound).
Since you're not sampling textures from these (you're just reading values back), you should use renderbuffers instead of textures. Their maximum size may be larger than those of textures.
The upside is the ease of use. Rather than have to deal with pixel formats and such, you just pick an appropriate image format for your glRenderbufferStorage call.
The only real downside is the narrower band of hardware that supports them. In general, anything that AMD or NVIDIA makes that they still support (right now, GeForce 6xxx or better [note the number of x's], and any Radeon HD card) will have access to ARB_framebuffer_object or OpenGL 3.0+ (where it's a core feature). Older drivers may only have EXT_framebuffer_object support (which has a few differences). Intel hardware is potluck; even if they claim 3.x or 4.x support, it may still fail due to driver bugs.

If you need to render something that exceeds the maximum FBO size of your GL implementation libtr works pretty well:
The TR (Tile Rendering) library is an OpenGL utility library for doing
tiled rendering. Tiled rendering is a technique for generating large
images in pieces (tiles).
TR is memory efficient; arbitrarily large image files may be generated
without allocating a full-sized image buffer in main memory.

The easiest way is to use something called Frame Buffer Objects (FBO). You will still have to create a window to create an opengl context though (but this window can be hidden).

The easiest way to fulfill your goal is using FBO to do off-screen render. And you don't need to render to texture, then get the teximage. Just render to buffer and use function glReadPixels. This link will be useful. See Framebuffer Object Examples

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js