OpenGL - Occlusion query depth buffer? - c++

I've just started getting into the topic of occlusion queries in OpenGL, but I'm a bit confused about how they actually work.
In most examples I've found, the depth and color masks are disabled before drawing with the occlusion query (because we don't need to actually 'draw' anything), roughly like this:
glDepthMask(GL_FALSE);
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glBeginQuery(GL_ANY_SAMPLES_PASSED,query1);
// Draw Object 1
glEndQuery(GL_ANY_SAMPLES_PASSED);
glBeginQuery(GL_ANY_SAMPLES_PASSED,query2);
// Draw Object 2
glEndQuery(GL_ANY_SAMPLES_PASSED);
// etc
glDepthMask(GL_TRUE);
glColorMask(GL_TRUE,GL_TRUE,GL_TRUE,GL_TRUE);
(It's assumed that objects are drawn front to back, so object 1 is in front of object 2. The above code is just pseudo code for the sake of this question. The result of the queries would be retrieved at a later time.)
Now, to know if object 2 is actually occluded by object 1, the GL would need to keep the fragment information from query 1 somehow (I'm assuming in some sort of depth buffer). But we've disabled writes to the depth and color buffers, so nothing is drawn, which would mean nothing is stored anywhere?
Is there a special 'query' buffer? If so, is there a way to access it? Is it in any way connected to the currently bound texture or frame buffer? Do I need to clear it? Am I misunderstanding how occlusion queries actually work?

Now, to know if object 2 is actually occluded by object 1, it would need to keep the fragment information from query 1 somehow
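Why would it? An occlusion query only stores a counter of the number of samples that pass the depth test (or, for GL_ANY_SAMPLES_PASSED, whether any did), which is a single integer. The depth test itself is made against whatever is already in the depth buffer.
Since you've disabled writing to the color and depth buffers, the only thing drawing the objects will do is increment the occlusion query counter*. Object 1 can't possibly occlude object 2, because drawing object 1 doesn't change the depth buffer. To get occlusion, the occluders have to be drawn with depth writes enabled before the queried objects.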
* Unless you have a stencil buffer or are doing something like image load/store in shaders
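To make this concrete, here is a toy CPU model (not real OpenGL; DepthBuffer, Object and drawWithQuery are hypothetical names) of a depth test feeding a GL_SAMPLES_PASSED-style counter. It shows that with depth writes masked off, object 1 leaves no trace, so object 2's query still counts its samples; only when the occluder is drawn with depth writes enabled does object 2's count drop to zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy 1D "framebuffer": one depth value per pixel, cleared to 1.0 (the far plane).
struct DepthBuffer {
    std::vector<float> depth;
    explicit DepthBuffer(std::size_t pixels) : depth(pixels, 1.0f) {}
};

// A hypothetical "object" covering pixels [first, last) at a constant depth.
struct Object {
    std::size_t first, last;
    float z;
};

// Rasterise obj with glDepthFunc(GL_LESS) semantics and return the number of
// samples that passed, which is all a GL_SAMPLES_PASSED query records.
// GL_ANY_SAMPLES_PASSED would only report whether this is non-zero.
unsigned drawWithQuery(DepthBuffer& db, const Object& obj, bool depthWrites) {
    unsigned samplesPassed = 0;
    for (std::size_t i = obj.first; i < obj.last; ++i) {
        if (obj.z < db.depth[i]) {                  // depth test
            ++samplesPassed;                        // the query's counter
            if (depthWrites) db.depth[i] = obj.z;   // only with glDepthMask(GL_TRUE)
        }
    }
    return samplesPassed;
}
```

In a real renderer the usual order is therefore: draw the occluders normally (depth writes on), then issue the queries for the potentially hidden objects with both masks off, and read the results later.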

Related

How to write integers alongside pixels in the framebuffer, and then use the written integer to ignore the depth buffer

What I want to do
I want a set of triangles to bleed through, or rather ignore the depth buffer, for another set of triangles, but only if they have the same number.
Problem (optional reading)
I do not know how to do this without introducing a ton of bubbles into the pipeline. Right now I have very high throughput because I can throw my geometry onto the GPU, tell it to render, and forget about it. However, if I have to keep toggling state while drawing, I'm worried I'm going to tank my performance. Other people who have done this the way I just described (a ton of draw calls and state changes) get much worse performance than me. The hit is also significantly worse on older hardware, where we are talking about a performance loss on the order of 50 to 100+ times with the state-change approach.
Unfortunately this triangle bleeding scenario happens many thousands of times, so the state machine will get flooded with "draw triangles, depth off, draw triangles that bleed through, depth on, ...", repeated N times, where N can get large (N >= 1000).
A good way of imagining this is having a set of triangles T_i, and a set of triangles that bleed through B_i where B_i only bleeds through T_i, and i ranges from 0...1000+. Note that if we are drawing B_100, then it should only bleed through T_100, not T_99 or T_101.
My next thought is to draw all the T triangles into one framebuffer along with their integer, then draw the bleed-through triangles into another framebuffer (also with their integer), and then merge the two framebuffers. Each would hold the color, depth, and integer, so I can hopefully merge them in a fragment shader.
Problem is, I have no idea how to write an integer alongside the out vec4 fragColor in the fragment shader.
Questions (and in short)
This leaves me with two questions:
How do I write an integer into a framebuffer? Do I need to write to 4 separate framebuffer textures (say, one color/depth texture and one integer texture, then doubled so I can merge the pairs together at some point)?
To make this more clear, the algorithm would look like:
1. Render all the 'could be bled through' triangles (the set T_i above), writing color and depth into FB1 and the integers into FB2.
2. Render all the 'bleeding' triangles (the set B_i above), writing color and depth into FB3 and the integers into FB4.
3. Bind the textures for FB1, FB2, FB3 and FB4.
4. Render each pixel by sampling the RGBA, depth and integer values from the appropriate textures, and write the result into the final framebuffer.
I would need to access the color and depth from the textures in the shader. I would also need to access the integer from the other texture. Then I can do the comparison and choose which pixel to write to the default framebuffer.
Is this idea possible? I assume that if the first question has an answer, then this one is yes. Another question is whether there's a better way. I tried doing this with the stencil buffer but had no luck.
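Here is a CPU-side sketch of step 4 for a single pixel. It assumes each pass keeps only its front-most fragment (color, depth, id) per pixel and that id 0 means "nothing drawn"; PassSample and resolvePixel are hypothetical names. In practice this logic would live in the merge fragment shader, reading the four textures with texelFetch.

```cpp
#include <cassert>
#include <cstdint>

// One pixel's worth of data from a pass: color, depth and the triangle-set id
// (the integer the question wants to write alongside fragColor).
struct PassSample {
    std::uint32_t rgba;  // packed color (illustrative packing)
    float depth;         // window-space depth, 1.0 = cleared
    std::uint32_t id;    // set index i; 0 reserved for "nothing drawn" (assumption)
};

// Resolve one pixel from the T pass (FB1/FB2) and the B pass (FB3/FB4).
// Assumed rule: B wins if it is in front as usual, or if it is behind but
// belongs to the same set as the T triangle covering this pixel.
std::uint32_t resolvePixel(const PassSample& t, const PassSample& b) {
    const bool bInFront = b.depth < t.depth;
    const bool sameSet = b.id != 0 && b.id == t.id;
    return (bInFront || sameSet) ? b.rgba : t.rgba;
}
```

Because only the front-most fragment of each pass survives, layered cases (for example T_99 sitting between T_100 and B_100) may resolve differently than a true per-triangle test would.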
What you want is theoretically possible, but I can't speak to its performance. You'll be reading and writing a whole lot of texels in a lot of textures every frame.
Anyway to answer your questions:
A framebuffer can have multiple color attachments, attached with glFramebufferTexture2D to GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1, etc., and enabled for output with glDrawBuffers. Each texture can have its own internal format. In your example you probably want a regular RGB or RGBA texture for your color output and a second, single-channel integer texture (for example GL_R32UI).
The depth buffer is the complicated part, because you don't want OpenGL to handle it as usual. If you want to take it over, you probably want to store depth in yet another (float) texture, which your shader can then choose to compare each screen-space fragment against, or not.
As for the shader: you can bind as many textures as input samplers as your program needs (up to the implementation's texture-unit limit), and each color attachment gets its own output variable (the fragment shader runs once per fragment and writes one value to each output). Make sure each output's type matches its attachment: vec3/vec4 for the color buffer, int (or uint) for your integer buffer and float for the float buffer.
And stencil buffers won't help you turn depth checking on or off within a single (possibly indirect) draw call. I can't visualize what your bleeding effect means, so the stencil buffer might help with that part, but it definitely won't give you conditional depth checking.

OpenGL: Repeated use of transform feedback buffers overwrites already established textures

I have a working implementation of this technique for view frustum culling of instanced geometry. The gist of the technique is that we use a vertex shader to check whether the bounds of an object lie within the view frustum, and if they do, we output the position of that object to a texture, using a transform feedback buffer and a geometry shader. During the actual rendering pass we can then use that texture, along with a query of how many positions we emitted, to fetch the position data for the object we're rendering and the number of instances to pass to glDrawElementsInstanced. One difference between what I do and what the article does is that I emit a full transformation matrix, rather than a simple position vector, to the texture, but I doubt that has any bearing on my problem.
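For reference, the test the cull shader performs can be sketched on the CPU. This assumes a bounding sphere per object (not the box extent the question also uploads) and six frustum planes with inward-pointing normals; instead of emitting matrices it collects instance indices, and the returned count stands in for the GL_PRIMITIVES_GENERATED result. Vec3, Plane and cullInstances are hypothetical names, not taken from the article.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Plane in the form dot(n, p) + d = 0, with n pointing into the frustum.
struct Plane { Vec3 n; float d; };

static float signedDistance(const Plane& pl, const Vec3& p) {
    return pl.n.x * p.x + pl.n.y * p.y + pl.n.z * p.z + pl.d;
}

// A sphere is rejected only if it lies entirely behind at least one plane.
// This is conservative: some spheres near frustum corners are kept.
bool sphereInFrustum(const std::array<Plane, 6>& frustum, const Vec3& c, float radius) {
    for (const Plane& pl : frustum)
        if (signedDistance(pl, c) < -radius) return false;
    return true;
}

// CPU analogue of the cull pass: each instance is one "point"; visible ones
// are emitted, compacted, into `visible`, and the count is returned.
std::size_t cullInstances(const std::array<Plane, 6>& frustum,
                          const std::vector<Vec3>& centers, float radius,
                          std::vector<std::size_t>& visible) {
    visible.clear();
    for (std::size_t i = 0; i < centers.size(); ++i)
        if (sphereInFrustum(frustum, centers[i], radius)) visible.push_back(i);
    return visible.size();
}
```

The compaction is what lets the later draw use the query result directly as its instance count.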
The actual problem: Currently I have this setup so that, for each object type being rendered (i.e. tree, box, rock, whatever), the actual rendering pass follows immediately upon the frustum cull rendering pass. This works, and gives the intended results. What I want to do instead, however, is to go over all my drawcommands and do all the frustum culling for the various objects first, and only thereafter do all the actual rendering, to avoid a bunch of unnecessary state changes (i.e. switching back and forth between shader programs). When I do this, however, I encounter the problem that previously established textures -- the ones I use for reading positions from during the actual rendering passes -- all seem to be overwritten by the latest call to the frustum culling function, meaning that all textures established seemingly contain only the position information from the last frustum cull call.
For example: I render, in order, 4 trees, 10 boxes and 3 rocks, and what I will see instead is a tree, a box, and a rock, at all the (three) positions where I would expect only the 3 rocks to be. I cannot for the life of me figure out why this is, because I quite clearly bind new buffers and textures to the TRANSFORM_FEEDBACK_BUFFER every time I call the function. Why are the previously used textures still receiving the new data from the latest call?
Code, in C, for the frustum culling function:
void fcullidraw(drawcommand *tar) {
    /* printf("Fculling %s\n", tar->res->name); */
    mesh *rmesh = &tar->res->amod->meshes[0];
    /* glDeleteTextures(1, &rmesh->ctex); */
    if(rmesh->ctbuf == 0)
        glGenBuffers(1, &rmesh->ctbuf);
    glBindBuffer(GL_TEXTURE_BUFFER, rmesh->ctbuf);
    glBufferData(GL_TEXTURE_BUFFER, sizeof(instancedata) * tar->nodraws, NULL, GL_DYNAMIC_COPY);
    if(rmesh->ctex == 0)
        glGenTextures(1, &rmesh->ctex);
    glBindTexture(GL_TEXTURE_BUFFER, rmesh->ctex);
    glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA32F, rmesh->ctbuf);
    if(rmesh->cquery == 0)
        glGenQueries(1, &rmesh->cquery);
    checkactiveshader(tar->tar, findshader("icull"));
    glEnable(GL_RASTERIZER_DISCARD);
    glUniform1f(activeshader->radius, tar->res->amesh->bbox.radius);
    glUniform3fv(activeshader->extent, 1, (const GLfloat*)&tar->res->amesh->bbox.ext);
    glUniform3fv(activeshader->cp, 1, (const GLfloat*)&tar->res->amesh->bbox.cp);
    glBindVertexArray(tar->res->amod->meshes[0].vao);
    glBindBuffer(GL_ARRAY_BUFFER, tar->res->amod->meshes[0].posarray);
    glBufferData(GL_ARRAY_BUFFER, sizeof(mat4_t) * tar->nodraws, tar->posarray, GL_DYNAMIC_DRAW);
    glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, rmesh->ctbuf);
    glBeginTransformFeedback(GL_POINTS);
    glBeginQuery(GL_PRIMITIVES_GENERATED, rmesh->cquery);
    glDrawArrays(GL_POINTS, 0, tar->nodraws);
    glEndQuery(GL_PRIMITIVES_GENERATED);
    glEndTransformFeedback();
    glDisable(GL_RASTERIZER_DISCARD);
    glGetQueryObjectuiv(rmesh->cquery, GL_QUERY_RESULT, &rmesh->visibleinstances);
}
tar and rmesh obviously vary between each call to this function. Do note that I have left in a few lines of comments here containing code to delete the buffers and textures between each rendering cycle, rather than simply overwriting them, but using that code instead has no effect on the error mode.
I'm stumped. I feel that the textures and buffers are well defined and clearly kept separate, so I do not understand how the textures from previous calls to fcullidraw are somehow still bound to and being overwritten by the TransformFeedback, if that is indeed what is happening, and it certainly seems to be, because the earlier objects will read in the entire transformation matrix of the rock quite neatly, with the "right" rotation, translation, and everything.
The article linked does do the operations in the order I want to do them -- i.e. first repeated frustum culls, and then repeated rendering -- and I'm not sure I see what I do differently. Might be some small and obvious thing, and I might be an idiot, but in that case I'd love to know why and how I am that.
EDIT: I pushed on and updated my implementation with a refinement of the original technique, suggested here, which gets rid of the writing-to-texture method altogether in favor of simply writing to a buffer bound to the VAO and set to advance once per rendered instance with a VertexAttribDivisor. This method looks a lot cleaner on the whole, and incidentally does not have my original problem at all, as I'm no longer writing to and uploading textures. So this is no longer a practical problem for me, but the answer to the theoretical question still eludes me, so if anyone has ideas I'm still all ears.
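The divisor-based approach from the edit relies on a simple fetch rule: an attribute with divisor 0 advances per vertex, while one with divisor N > 0 advances once every N instances, offset by the base instance. The helper below (a hypothetical name, for illustration only) just encodes that rule. Note that a mat4 attribute occupies four consecutive attribute locations, so each one needs its own glVertexAttribPointer and glVertexAttribDivisor(location, 1) call.

```cpp
#include <cassert>
#include <cstdint>

// Which element of a vertex-attribute array is read for a given vertex,
// following the glVertexAttribDivisor rules. baseInstance stays 0 unless a
// *BaseInstance draw call is used.
std::uint32_t attributeElement(std::uint32_t vertexIndex, std::uint32_t instanceId,
                               std::uint32_t divisor, std::uint32_t baseInstance = 0) {
    if (divisor == 0) return vertexIndex;        // ordinary per-vertex attribute
    return instanceId / divisor + baseInstance;  // per-instance attribute
}
```

With divisor 1 on the matrix columns, instance k of glDrawElementsInstanced reads the k-th matrix written by the cull pass, which is why no texture lookup is needed.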

How to draw many textured quads faster, and retain glScissor (or something like it)?

I'm using OpenGL 4 and C++11.
Currently I make a whole bunch of individual calls to glDrawElements using separate VAOs with a separate VBO and an IBO.
I do this because the texture coords change for each quad, and my vertex data contains the texture coords. I understand that there's some redundant position information in this vertex data; however, it's always -1,-1,1,1 because I use a translation and a scale matrix in my vertex shader to position and scale the vertex data.
The VAO, VBO, IBO, position and scale matrix and texture ID are stored in an object. It's one object per quad.
Currently, some of the drawing would occur like this:
Draw a quad object via glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0). The bound VBO holds the positions -1,-1,1,1 (the IBO turns them into a quad) plus the texture coords into a common texture (the same texture is used for all drawn quads). Matrix transformations in the shader position it.
Repeat with another quad object
glEnable(GL_SCISSOR_TEST) is called and the position information of the previous quad is used in a call to glScissor
The next quad object is drawn; only the parts of it that lie within the previous quad are actually shown.
Draw another quad object
The performance I'm getting now is acceptable but I want it faster because I've only scratched the surface of what I have in mind. So I'm looking at optimizing. So far I've read that I should:
Remove the position information from my vertex data and just keep texture coords. Instead bind a single position VBO at the start of drawing quads so it's used by all of them.
But I'm unsure how this would work, since I can only have one VBO bound at a time?
Would I then have to call glBufferSubData and update the texture coordinates prior to drawing each quad? Would this be better performance or worse (a call to glBindVertexArray for every object or a call to glBufferSubData?)
Would I still pass the position and scale as matrices to the shader, or would I take that opportunity to update the vertex positions as well as the texture coords? Which would be faster?
Create one big VBO with or without an IBO and update the vertex data for the position (rather than use a transformation and scale matrix) of each quad within this. It seems like this would be difficult to manage.
Even if I did manage to do this, I would only have a single glDraw call, which sounds fast. Is this true? What sort of performance impact do many glBindVertexArray calls have compared with a single one?
I don't think there's any way to use this method to implement something like the glScissor call that I'm making now?
Another option I've read about is instancing: I draw the quad however many times I need it, which means I would pass the shader an array of translation matrices and an array of texture coords?
Would this be a lot faster?
I think I could do something like the glScissor test by passing an additional array of booleans which defines whether the current quad should be only drawn within the bounds of the previous one. However, I think this means that for each gl_InstanceID I would have to traverse all previous instances looking for true and false values, and it seems like it would be slow.
I'm trying to save time by not implementing all of these individually. Hopefully an expert can point me towards which is probably better. If anyone has an even better idea, please let me know.
You can have multiple VBOs attached to different attributes!
The following sequence binds 2 VBOs to attribs 0 and 1. Note that glBindBuffer() binds the buffer only temporarily; the actual assignment of the VBO to the attribute is made by glVertexAttribPointer(), which captures the buffer currently bound to GL_ARRAY_BUFFER.
glBindBuffer(GL_ARRAY_BUFFER,buf1);
glVertexAttribPointer(0, ...);
glEnableVertexAttribArray(0);
glBindBuffer(GL_ARRAY_BUFFER,buf2);
glVertexAttribPointer(1, ...);
glEnableVertexAttribArray(1);
The fastest way to provide quad positions and sizes is to store them in a texture and sample it inside the vertex shader. You'd need at least an RGBA texture (x, y, width, height) with 16 bits per channel. Then you can update quad positions with glTexSubImage2D(), or even render them via an FBO.
Everything other than that will perform slower; if you want, we can elaborate on using uniforms, attribs in VBOs, or attribs without enabled arrays.
Putting it all together:
1. use a single VBO and store the quad id (int) in it, plus your texturing data
2. prepare an x,y,w,h texture and define a mapping from quad id to texel, i.e. u=quad_id&0xFF, v=(quad_id>>8) (a 256x256 texture holds at most 65536 quads)
3. use the vertex shader to sample the displacement and size from that texture for the given quad_id stored in an attribute (or use gl_VertexID/4 or gl_VertexID/6)
4. fill the VBO and the texture
5. draw everything with a single glDrawArrays or glDrawElements call
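The index arithmetic in that list can be checked on the CPU. This sketch assumes a 256x256 lookup texture and, for the gl_VertexID variant, 6 non-indexed vertices per quad; Texel, quadTexel, quadTexCoord and the other names are illustrative only. The same expressions translate directly into GLSL.

```cpp
#include <cassert>
#include <cstdint>

// Texel holding quad_id's (x, y, w, h) in a 256x256 lookup texture,
// using the mapping u = quad_id & 0xFF, v = quad_id >> 8.
struct Texel { std::uint32_t u, v; };

Texel quadTexel(std::uint32_t quadId) {
    return { quadId & 0xFFu, quadId >> 8 };
}

// Normalized coordinate of that texel's center, as would be passed to
// texture(); texelFetch() could use (u, v) directly instead.
void quadTexCoord(std::uint32_t quadId, float& s, float& t) {
    const Texel tx = quadTexel(quadId);
    s = (tx.u + 0.5f) / 256.0f;
    t = (tx.v + 0.5f) / 256.0f;
}

// Recovering quad_id and the corner from gl_VertexID when each quad is
// drawn as 6 non-indexed vertices (two triangles).
std::uint32_t quadFromVertexId(std::uint32_t vertexId) { return vertexId / 6; }
std::uint32_t cornerFromVertexId(std::uint32_t vertexId) { return vertexId % 6; }
```

Sampling texel centers avoids filtering between neighbouring quads if the lookup texture uses GL_LINEAR; with GL_NEAREST or texelFetch it does not matter.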

how to handle depth in glsl

I have a problem with FBOs and depth in OpenGL. I am passing projection, view and model matrices to a shader that writes to the G-buffer. When I unbind the FBO and write to gl_FragColor, the scene displays as it should. But when I write to gl_FragData[0] and then draw the accompanying texture on a screen-aligned quad, objects are layered by the order they were processed rather than by depth: I can see through objects processed first to objects processed after. Has anyone had the same problem, and do they know a fix? Or could someone provide syntax for handling the depth test manually in the fragment shader: reading depth values from the vertex shader, querying the current depth, and then writing to the depth buffer depending on a comparison?
Your main framebuffer most likely has a depth buffer, while your manually created FBO might not. So when drawing to the screen you get depth-sorted geometry, while the FBO has no depth storage attached, and the depth test behaves as if it were disabled. Attaching a depth renderbuffer (allocated with glRenderbufferStorage using GL_DEPTH_COMPONENT24 and attached with glFramebufferRenderbuffer to GL_DEPTH_ATTACHMENT) or a depth texture to the FBO should fix it.

C++, OpenGL Z-buffer prepass

I'm making a simple voxel engine (think Minecraft) and am currently at the stage of getting rid of occluded faces to gain some precious fps. I'm not very experienced in OpenGL and do not quite understand how the glColorMask magic works.
This is what I have:
// new and shiny
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
// this one goes without saying
glEnable(GL_DEPTH_TEST);
// I want to see my code working, so fill the mask
glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);
// fill the z-buffer, or whatever
glDepthFunc(GL_LESS);
glColorMask(0,0,0,0);
glDepthMask(GL_TRUE);
// do a first draw pass
world_display();
// now only show lines, so I can see the occluded lines do not display
glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
// I guess the error is somewhere here
glDepthFunc(GL_LEQUAL);
glColorMask(1,1,1,1);
glDepthMask(GL_FALSE);
// do a second draw pass for the real rendering
world_display();
This somewhat works, but once I change the camera position the world starts to fade away; I see fewer and fewer lines until nothing is left.
It sounds like you are not clearing your depth buffer.
You need depth writes enabled (via glDepthMask(GL_TRUE)) when you clear the depth buffer with glClear. You probably still have them disabled from the previous frame, which turns all your depth clears in subsequent frames into no-ops. Just move your glDepthMask call before the glClear.
glColorMask and glDepthMask determine which parts of the framebuffer are actually written to.
The idea of early Z culling is to render only the depth buffer first. The savings come from the GPU being able to quickly discard occluded fragments, which works best when the geometry is sorted near to far. While drawing the Z buffer you don't want to write the color component: this lets you switch off shaders, texturing, in short everything that's computationally intensive.
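The fading can be reproduced with a toy model of the relevant state, assuming (as in the question) that each frame ends with glDepthMask(GL_FALSE) and that moving the camera away makes the geometry's depth grow. ToyGL and renderFrame are hypothetical names; the one real behaviour they model is that glClear respects the depth write mask.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the depth-related GL state for one row of pixels.
struct ToyGL {
    std::vector<float> depth;
    bool depthMask = true;  // glDepthMask
    explicit ToyGL(std::size_t n) : depth(n, 1.0f) {}

    // Like glClear(GL_DEPTH_BUFFER_BIT): does nothing while depth writes are off.
    void clearDepth() {
        if (!depthMask) return;
        for (float& d : depth) d = 1.0f;
    }

    // Depth-only draw at constant depth z with GL_LESS; returns pixels passed.
    int drawDepth(float z) {
        int passed = 0;
        for (float& d : depth)
            if (z < d) { ++passed; if (depthMask) d = z; }
        return passed;
    }
};

// One frame of the question's loop: clear, prepass with depth writes, then
// leave the mask off (as the second pass does) until the next frame.
// maskBeforeClear = true applies the fix: glDepthMask(GL_TRUE) before glClear.
int renderFrame(ToyGL& gl, float sceneDepth, bool maskBeforeClear) {
    if (maskBeforeClear) gl.depthMask = true;
    gl.clearDepth();
    gl.depthMask = true;   // prepass: glDepthMask(GL_TRUE)
    int visible = gl.drawDepth(sceneDepth);
    gl.depthMask = false;  // second pass: glDepthMask(GL_FALSE)
    return visible;
}
```

Without the fix, the second frame compares against the stale depths of the first frame, so anything that moved farther away fails the test.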
A word of warning: early Z only works with opaque geometry. In fact, the whole depth buffer algorithm only works for opaque geometry. As soon as you're blending, you'll have to sort far to near and draw without depth buffering (search for "order independent transparency" for algorithms that overcome the associated problems).
So if you've got anything that's blended, remove it from the 'early Z' stage.
In the first pass you set
glDepthMask(GL_TRUE); // enable depth buffer writes
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE); // disable color buffer writes
glDepthFunc(GL_LESS); // use normal depth order testing
glEnable(GL_DEPTH_TEST); // and we want to perform depth tests
After the Z pass is done you change the settings a bit
glDepthMask(GL_FALSE); // don't write to the depth buffer
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE); // now write the color component
glDepthFunc(GL_EQUAL); // only draw if the depth of the incoming fragment
                       // matches the depth already in the depth buffer
GL_LEQUAL does the job too, but it also lets through fragments that are closer than the value stored in the depth buffer. Since the depth buffer is no longer updated, anything between the eye and the stored depth will overwrite the pixel each time something is drawn there.
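To see why the prepass pays off, here is a per-pixel count of how often the expensive shading runs, with and without a depth-only pass followed by a GL_EQUAL pass. Each entry is one fragment's depth at a single pixel, in submission order; ShadeStats and countShading are hypothetical names.

```cpp
#include <cassert>
#include <vector>

struct ShadeStats { int shadedWithoutPrepass; int shadedWithPrepass; };

ShadeStats countShading(const std::vector<float>& fragmentDepths) {
    ShadeStats s{0, 0};

    // Single pass, GL_LESS with depth writes: every fragment that passes is
    // shaded, so back-to-front submission shades every layer.
    float depth = 1.0f;
    for (float z : fragmentDepths)
        if (z < depth) { ++s.shadedWithoutPrepass; depth = z; }

    // Pass 1: depth only (color masked), GL_LESS with depth writes; no shading.
    depth = 1.0f;
    for (float z : fragmentDepths)
        if (z < depth) depth = z;

    // Pass 2: GL_EQUAL with depth writes off; only the fragment matching the
    // stored (nearest) depth runs the expensive shader.
    for (float z : fragmentDepths)
        if (z == depth) ++s.shadedWithPrepass;
    return s;
}
```

GL_EQUAL only works reliably if both passes compute bit-identical depths, so use the same vertex transform in both (the GLSL invariant qualifier on gl_Position helps with that).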
A slight change of the theme is using an 'early Z' populated depth buffer as a geometry buffer in multiple deferred shading passes afterwards.
To save further geometry, take a look at occlusion queries. With occlusion queries you ask the GPU how many fragments, if any, pass all tests. Since this is a voxel engine, you're probably using an octree or k-d tree. Drawing the spatial dividing faces of the tree's branches (with glDepthMask(GL_FALSE) and glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE)) before traversing a branch tells you whether any geometry in that branch is visible at all. Combined with a near-to-far sorted traversal and (coarse) frustum culling on the tree, this will give you HUGE performance benefits.
A Z prepass can still be used in scenes with translucent objects: leave the translucent objects out of the prepass, then depth-sort them and render them afterwards.