OpenGL degenerate GL_TRIANGLES sharing the same vertices - C++

I send a VertexBuffer+IndexBuffer of GL_TRIANGLES via glDrawElements() to the GPU.
In the vertex shader I wanted to snap some vertices to the same coordinates to simplify a large mesh on the fly.
As a result I expected a major performance boost, because a lot of triangles would collapse to the same point and become degenerate.
But I don't get any fps gain.
For testing I set my vertex shader to just gl_Position = vec4(0.0) to degenerate ALL triangles, but still no difference...
Is there some flag to "activate" the degeneration, or what am I missing?
A query on GL_PRIMITIVES_GENERATED also always reports the total number of mesh faces.

What you're missing is how the optimization you're trying to use actually works.
The particular optimization you're talking about is the post-transform (post-T&L) cache. That is, if the same vertex is going to be processed twice, it only gets processed once and the results are used twice.
What you don't understand is how "the same vertex" is actually determined. It isn't determined by anything your vertex shader could compute. Why? Well, the whole point of caching is to avoid running the vertex shader. If the vertex shader were used to determine whether the value was already cached... you'd have saved nothing, since you had to run it to find out.
"The same vertex" is actually determined by matching the vertex index and vertex instance. Each vertex in the vertex array has a unique index associated with it. If you use the same index twice (only possible with indexed rendering of course), then the vertex shader would receive the same input data. And therefore, it would produce the same output data. So you can use the cached output data.
Instance ids also play into this, since when doing instanced rendering, the same vertex index does not necessarily mean the same inputs to the VS. But even then, if you get the same vertex index and the same instance id, then you would get the same VS inputs, and therefore the same VS outputs. So within an instance, the same vertex index represents the same value.
Both the instance ID and the vertex index are part of the rendering process; they don't come from anything the vertex shader can compute. The vertex shader could generate identical positions, normals, or anything else, but the actual post-transform cache is keyed only on the vertex index and instance ID.
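As a small illustration (hypothetical buffers, not code from the question): two triangles forming a quad share the indices 1 and 2, so the second triangle can reuse the cached vertex shader results for those two vertices.

// Four unique vertices, six indices: indices 1 and 2 appear in both
// triangles, so a post-T&L cache can reuse their shader outputs.
const GLuint quadIndices[] = { 0, 1, 2,   2, 1, 3 };
glBindVertexArray(quadVao);  // assumed VAO holding position data for 4 vertices
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, nullptr);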
So if you want to "snap some vertices to the same coordinates to simplify a large mesh", you have to do that before your rendering command. If you want to do it "on the fly" in a shader, then you're going to need some kind of compute shader or geometry shader/transform feedback process that will compute the new mesh. Then you need to render this new mesh.
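If it helps, here is a minimal sketch of the transform feedback route on desktop GL (snapProgram, snappedVbo, and the varying name snappedPos are assumptions for illustration, not code from the question):

// Declare which vertex shader output to capture; must happen before linking.
const char* varyings[] = { "snappedPos" };
glTransformFeedbackVaryings(snapProgram, 1, varyings, GL_INTERLEAVED_ATTRIBS);
glLinkProgram(snapProgram);

// Pass 1: run the snapping vertex shader once per vertex, rasterizing nothing.
glUseProgram(snapProgram);
glEnable(GL_RASTERIZER_DISCARD);
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, snappedVbo);
glBeginTransformFeedback(GL_POINTS);
glDrawArrays(GL_POINTS, 0, vertexCount);
glEndTransformFeedback();
glDisable(GL_RASTERIZER_DISCARD);

// Pass 2: draw the captured mesh, ideally with collapsed triangles already
// removed from the index buffer (on the CPU or in a compute pass).

A compute shader could do the same snapping and additionally rewrite the index buffer to drop collapsed triangles entirely.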
You can discard a primitive in a geometry shader. But you still had to do T&L on it. Plus, using a GS at all slows things down, so I highly doubt you'll gain much performance by doing this.
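For completeness, a hypothetical geometry shader that drops collapsed triangles might look like this (the clip-space area test and its threshold are assumptions; shown as a C++ string literal):

const char* cullDegenerateGS = R"GLSL(
#version 330 core
layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;
void main() {
    vec3 a = gl_in[0].gl_Position.xyz;
    vec3 b = gl_in[1].gl_Position.xyz;
    vec3 c = gl_in[2].gl_Position.xyz;
    // Approximate zero-area test in clip space; emitting nothing discards
    // the triangle, but its three vertices were still T&L-processed.
    if (length(cross(b - a, c - a)) < 1e-6) return;
    for (int i = 0; i < 3; ++i) {
        gl_Position = gl_in[i].gl_Position;
        EmitVertex();
    }
    EndPrimitive();
}
)GLSL";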

Related

OpenGL Pipeline

Does the geometry shader affect the buffers of vertices and indices (the VBO and EBO data in GPU memory) initially specified on the CPU side?
For example, suppose I have a vertex buffer containing three vertices, each with some attributes attached to it. Suppose these three vertices are given to the geometry shader as input, and the geometry shader outputs a large set of vertices based off of the first three, thus generating a new polygon made up of more than 3 vertices. Does this process alter the contents of the element array and vertex buffers?
The only thing I know is that of course they don't get altered, because otherwise the next rendering call would generate even more vertices, and that would be a mess. So where does OpenGL store the newly generated vertices?
Suppose these three vertices are given to the geometry shader as input
That doesn't happen. The GS is fed by the vertex shader's outputs. GSs never have any direct contact with the initial vertex data.
So where does OpenGL store the newly generated vertices?
Wherever the implementation needs to. The rasterizer hardware will generally have a small buffer for primitive data to be rasterized. That's where the GS outputs will go.
But that's an implementation detail which is not exposed to OpenGL.
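To make that concrete, a hypothetical geometry shader that amplifies each triangle only ever touches the vertex shader's outputs (gl_in and, here, an assumed vsColor); it has no handle to the VBO or EBO at all:

const char* amplifyGS = R"GLSL(
#version 330 core
layout(triangles) in;                          // input: 3 post-VS vertices
layout(triangle_strip, max_vertices = 6) out;  // output: new primitives only
in vec3 vsColor[];    // a per-vertex output of the vertex shader
out vec3 gsColor;
void main() {
    // Emit the original triangle...
    for (int i = 0; i < 3; ++i) {
        gl_Position = gl_in[i].gl_Position;
        gsColor = vsColor[i];
        EmitVertex();
    }
    EndPrimitive();
    // ...and a second, shrunken copy. Neither write goes anywhere near
    // the source buffers; the outputs feed the rasterizer directly.
    vec4 center = (gl_in[0].gl_Position + gl_in[1].gl_Position + gl_in[2].gl_Position) / 3.0;
    for (int i = 0; i < 3; ++i) {
        gl_Position = mix(gl_in[i].gl_Position, center, 0.5);
        gsColor = vsColor[i];
        EmitVertex();
    }
    EndPrimitive();
}
)GLSL";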

OpenGL best practice for putting two different meshes in the same vertex VBO

After some searching, I read that when separate VAOs share the exact same shader attribute layout, you can merge them into one VAO and put all the data into one VBO, so that the objects can be drawn with only one draw call.
This makes perfect sense, but what about uniform variables? Say I want to draw a tree and a ball. They have different vertex counts and different transforms, but share exactly the same shader program.
Until now, my program was like this:
// generate VAOs, called only once
glGenVertexArrays(1, &treeVaoId);
// generate VBO and bind tree vertices
// enable vertex attributes and set them up for the shader program
glGenVertexArrays(1, &ballVaoId);
// repeat for ball

// draw, called each frame
// pass the tree's transform to the shader as a uniform
glDrawArrays(...); // first draw call
// repeat for ball
glDrawArrays(...); // second draw call
And with these vertices merged into one VAO and VBO, it would be like this:
glGenVertexArrays(1, &treeBallId);
// allocate a VBO large enough for both, upload the tree vertices and then the ball vertices after them
// enable vertex attributes and set them up for the shader program
// to draw, I have to pass a separate transform uniform for the tree and for the ball - but how?
glDrawArrays(...);
Sorry for the poor example, but the point is: is there a way to give different uniform values while drawing from one VAO? Or is my approach totally wrong?
The purpose of batching is to improve performance by minimizing state changes between draw calls (a single batch reduces them to zero, since there is nothing between draws). However, there are degrees of performance improvement, and not all state changes are equal.
On the scale of the costs of state changes, changing program uniforms is the least expensive state change. That's not to say that it's meaningless, but you should consider how much effort you really want to spend compared to the results you get out of it. Especially if you're not pushing hardware as fast as possible.
VAO changes (non-buffer-only changes, that is) are among the more expensive state changes, so you gained a lot by eliminating them.
As the name suggests, uniform variables cannot be changed from one shader instance to another within a draw call. Even a multi-draw call. But that doesn't mean that there's nothing that can be done.
Multi-draw functionality allows you to issue multiple draw calls in a single function call. These individual draws can get their vertex data from different parts of the vertex and index buffers. What you need is a way to communicate to your vertex shader which draw call it is taking part in, so that it can index an array of some sort to extract that draw call's per-object data.
Semi-recent hardware has access to gl_DrawID, which is the index into a multi-draw command of the particular draw call being executed. The ARB_shader_draw_parameters extension is fairly widely implemented. You can use that index to fetch per-object data from a UBO or SSBO.
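As a sketch, the shader side could look like this (the SSBO layout and names are assumptions; with only the extension, the input is spelled gl_DrawIDARB instead):

const char* multiDrawVS = R"GLSL(
#version 460 core
layout(location = 0) in vec3 position;
// One matrix per sub-draw of the multi-draw command.
layout(std430, binding = 0) buffer PerObject {
    mat4 modelViewProj[];
};
void main() {
    gl_Position = modelViewProj[gl_DrawID] * vec4(position, 1.0);
}
)GLSL";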

How to get the indices of vertices that were finally rendered?

What is the last stage at which it is still possible to retrieve the indices of vertices that were not clipped, culled, or occluded, and that are actually going to be rendered?
To answer the question asked, there isn't one. All vertex processing rendering stages happen before triangle clipping. As does transform feedback. And fragment shaders don't get vertex indices; they only get the per-vertex values from the vertex processing stage, after interpolation.
In theory, you could do something like this. Your VS outputs an integer index for the vertex, taken from gl_VertexID. You would need a GS that takes the three indices and packages them together into a flat uvec3. Each output vertex would be given the same values. And then, the fragment shader could get the uvec3 and write each of those indices out to a buffer via SSBO and an atomic counter.
Of course, you'll get the same index multiple times (assuming that triangles share indices). But you can do it.
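A rough sketch of the geometry and fragment shaders involved (all names are hypothetical; the vertex shader is assumed to copy gl_VertexID into an output called vsIndex):

const char* indexPassGS = R"GLSL(
#version 430 core
layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;
in uint vsIndex[];           // = gl_VertexID, forwarded by the VS
flat out uvec3 triIndices;   // same value on all three vertices
void main() {
    uvec3 idx = uvec3(vsIndex[0], vsIndex[1], vsIndex[2]);
    for (int i = 0; i < 3; ++i) {
        gl_Position = gl_in[i].gl_Position;
        triIndices = idx;
        EmitVertex();
    }
    EndPrimitive();
}
)GLSL";

const char* indexPassFS = R"GLSL(
#version 430 core
flat in uvec3 triIndices;
layout(binding = 0) uniform atomic_uint writePos;
layout(std430, binding = 1) buffer RenderedIndices { uint indices[]; };
out vec4 color;
void main() {
    // Every covered fragment appends its triangle's indices, so the
    // buffer will contain plenty of duplicates.
    uint base = atomicCounterIncrement(writePos) * 3u;
    indices[base + 0u] = triIndices.x;
    indices[base + 1u] = triIndices.y;
    indices[base + 2u] = triIndices.z;
    color = vec4(1.0);
}
)GLSL";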
It just doesn't serve much purpose, though. Rendering part of a mesh is a lot more trouble than it's worth. For performance, it's better to render either all of it or none of it, based on its visibility. And detecting that is best done via occlusion tests on a different, less complex shape.

Render multiple models in OpenGL with a single draw call

I built a 2D graphics engine and created a batching system for it, so if I have 1000 sprites with the same texture, I can draw them with one single call to OpenGL.
This is achieved by putting the vertices of all the sprites with the same texture into a single VBO vertex array.
Instead of "draw these vertices, draw these vertices, draw these vertices", I do "put all the vertices together, draw once", just to be very clear.
Easy enough, but now I'm trying to achieve the same thing in 3D, and I'm having a big problem.
The problem is that I'm using a Model View Projection matrix to place and render my models, which is the common approach to render a model in 3D space.
For each model on screen, I need to pass the MVP matrix to the shader, so that I can use it to transform each vertex to the correct position.
If I did the transformation outside the shader, it would be executed by the CPU, which is not a good idea, for obvious reasons.
But therein lies the problem: I need to pass the matrix to the shader, yet for each model the matrix is different.
So I cannot do the same thing I did with 2D sprites, because changing a shader uniform requires a separate draw call every time.
I hope I've been clear. Maybe you have a good idea I didn't have, or you've already had the same problem. I know for a fact that there is a solution somewhere, because engines like Unity can use the same shader for multiple models and get away with one draw call.
There exists a feature exactly like what you're looking for, and it's called instancing. With instancing, you store n matrices (or whatever else you need) in a Uniform Buffer and call glDrawElementsInstanced to draw n copies. In the shader, you get an extra input gl_InstanceID, with which you index into the Uniform Buffer to fetch the matrix you need for that particular instance.
You can read more about instancing here: https://www.opengl.org/wiki/Vertex_Rendering#Instancing
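A minimal sketch, assuming the program is already linked and that at most 256 instances are drawn per call (the array size, binding point, and names are assumptions):

// C++ side: upload one matrix per instance to a UBO, then draw n copies.
GLuint ubo;
glGenBuffers(1, &ubo);
glBindBuffer(GL_UNIFORM_BUFFER, ubo);
glBufferData(GL_UNIFORM_BUFFER, n * 16 * sizeof(float),  // n column-major mat4s
             matrices, GL_DYNAMIC_DRAW);
glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);

glBindVertexArray(meshVao);
glDrawElementsInstanced(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, nullptr, n);

// Vertex shader: pick this instance's matrix with gl_InstanceID.
const char* instancedVS = R"GLSL(
#version 420 core
layout(location = 0) in vec3 position;
layout(std140, binding = 0) uniform Instances { mat4 mvp[256]; };
void main() {
    gl_Position = mvp[gl_InstanceID] * vec4(position, 1.0);
}
)GLSL";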
The answer depends on whether the vertex data for each item is identical or not. If it is, you can use instancing as in @orost's answer, using glDrawElementsInstanced and gl_InstanceID within the vertex shader, and that method should be preferred.
However, if each 3D model requires different vertex data (which is frequently the case), you can still render them using a single draw call. To do this, you would add another stream into your vertex data with glVertexAttribPointer (and glEnableVertexAttribArray). This extra stream would contain the index of the matrix within the uniform buffer that vertex should use when rendering - so each mesh within the VBO would have an identical index in the extra stream. The uniform buffer contains the same data as in the instancing setup.
Note this method may require some extra CPU processing, if you need to redo the batching - for example, an object within a batch should not be rendered anymore. If this process is required frequently, it should be determined whether batching items is actually beneficial or not.
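Sketched out, the extra stream might be set up like this (attribute location 3 and the names are assumptions); every vertex of mesh k simply stores k:

// One GLuint per vertex, identifying which object the vertex belongs to.
glBindBuffer(GL_ARRAY_BUFFER, objectIndexVbo);
glVertexAttribIPointer(3, 1, GL_UNSIGNED_INT, 0, nullptr);  // note the I: integer attribute
glEnableVertexAttribArray(3);

// In the vertex shader:
//   layout(location = 3) in uint objectIndex;
//   gl_Position = mvp[objectIndex] * vec4(position, 1.0);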
Besides instancing and adding another vertex attribute as some object ID, I'd like to also mention another strategy (which requires modern OpenGL, though):
The extension ARB_multi_draw_indirect (in core since GL 4.3) adds indirect drawing commands. These commands do source their parameters (number of vertices, starting index and so on) directly from another buffer object. With these functions, many different objects can be drawn with a single draw call.
However, since you still want some per-object state like transformation matrices, that feature alone is not enough. But in combination with ARB_shader_draw_parameters (core since GL 4.6), you get the gl_DrawID parameter, which is incremented by one for each individual object in one multi-draw indirect call. That way, you can index into some UBO, or TBO, or SSBO (or whatever) where you store the per-object data.
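A sketch of the indirect side (GL 4.3+; buildCommands is a hypothetical helper that fills one command per mesh packed into the shared buffers); the shader side is the same gl_DrawID lookup shown earlier:

#include <vector>

// Field order is fixed by the spec for GL_DRAW_INDIRECT_BUFFER contents.
struct DrawElementsIndirectCommand {
    GLuint count;          // number of indices in this sub-draw
    GLuint instanceCount;  // usually 1
    GLuint firstIndex;     // offset into the shared index buffer
    GLint  baseVertex;     // offset added to every index
    GLuint baseInstance;
};

std::vector<DrawElementsIndirectCommand> cmds = buildCommands();
GLuint dib;
glGenBuffers(1, &dib);
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, dib);
glBufferData(GL_DRAW_INDIRECT_BUFFER, cmds.size() * sizeof(cmds[0]),
             cmds.data(), GL_STATIC_DRAW);

glBindVertexArray(sharedVao);  // all meshes packed into one VBO/EBO
glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr,
                            (GLsizei)cmds.size(), 0);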

Updating information from the vertex shader

In the vertex shader program of a WebGL application, I am doing the following:
Calculate gl_Position P using a function f(t) that varies in time.
My question is:
Is it possible to store the updated P(t) computed in the vertex shader so I can use it in the next time step? This will be useful for performing some boundary tests.
I have read some information on how textures can be used to store and update vertex positions, but is this feasible in WebGL, given that vertex texture fetch is only optionally supported in OpenGL ES 2.0 (on which WebGL is based)?
For a more concrete example, let's say we are trying to move a point according to the equation R(t) = (k*t, 0, 0). These positions are updated in the vertex shader, which makes the point move. Now suppose I want to make the point bounce off a wall located at R = (C, 0, 0). To do that, we need the position of the point at t - dt (the previous time step).
Any ideas appreciated.
Regards
In addition to the previous answers: you can circumvent vertex texture fetch with PBOs, though I don't know whether they are supported in WebGL or GLES, as I only have desktop GL experience. You write the vertex positions into the framebuffer. Then, instead of using these as a vertex texture, you copy them into a vertex buffer (which works best via PBOs) and use them as a normal vertex attribute. That's the old way of doing transform feedback, which I suppose is not supported.
There's no way to store anything in the vertex shader. You can only pass values from it to the fragment shader and write those to the framebuffer pixels. And as you said, vertex texture fetch isn't universally supported (for instance, ANGLE started supporting it only a few days ago), so even that is a bit unworkable.
You can do two things: either do all the position math in JS and pass p1 and p0 in as uniforms, or keep track of the previous time value and do the position math twice in the shader, for both t1 and t0 (this shouldn't have much of a performance impact unless you're vertex-shader-bound).
Is your dt a constant? If so, you could retrieve the previous position of your point by evaluating R(t - dt). If it is not constant, you could use a uniform to pass it along on every rendering cycle.
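Following that suggestion, a hypothetical vertex shader could evaluate the motion at both t and t - dt, so no state ever needs to be stored (GLSL ES 1.00; the uniform names are made up, and in WebGL the same source string would go to gl.shaderSource):

const char* bounceVS = R"GLSL(
attribute vec3 position;
uniform float t;    // current time
uniform float dt;   // time step, passed as a uniform if it isn't constant
uniform float k;    // speed
uniform float C;    // wall position

vec3 R(float time) { return vec3(k * time, 0.0, 0.0); }

void main() {
    vec3 prev = R(t - dt);  // previous position, recomputed on the fly
    vec3 curr = R(t);
    // ... use prev and curr for the bounce test against the wall at x == C ...
    gl_Position = vec4(position + curr, 1.0);
}
)GLSL";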