What is the last stage at which it is still possible to retrieve the indices of the vertices that were not clipped, culled, or occluded, and that are going to be rendered?
To answer the question asked, there isn't one. All vertex processing rendering stages happen before triangle clipping. As does transform feedback. And fragment shaders don't get vertex indices; they only get the per-vertex values from the vertex processing stage, after interpolation.
In theory, you could do something like this. Your VS outputs an integer index for the vertex, taken from gl_VertexID. You would need a GS that takes the three indices and packages them together into a flat uvec3. Each output vertex would be given the same values. And then, the fragment shader could get the uvec3 and write each of those indices out to a buffer via SSBO and an atomic counter.
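To give a feel for what that would look like, here is a rough GLSL sketch of the three stages. The buffer layout, binding points, and names like VisibleIndices are my own assumptions; sizing the output buffer and filtering out the duplicates are left out.

```glsl
// --- vertex shader (sketch) ---
#version 430
layout(location = 0) in vec4 position;
uniform mat4 mvp;                 // assumed transform
flat out uint vsIndex;            // the raw index of this vertex

void main()
{
    vsIndex = uint(gl_VertexID);  // with glDrawElements this is the element index
    gl_Position = mvp * position;
}

// --- geometry shader (sketch) ---
#version 430
layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;
flat in uint vsIndex[];
flat out uvec3 triIndices;        // all three indices, same value on every vertex

void main()
{
    uvec3 idx = uvec3(vsIndex[0], vsIndex[1], vsIndex[2]);
    for (int i = 0; i < 3; ++i)
    {
        triIndices = idx;
        gl_Position = gl_in[i].gl_Position;
        EmitVertex();
    }
    EndPrimitive();
}

// --- fragment shader (sketch) ---
#version 430
flat in uvec3 triIndices;
layout(std430, binding = 0) buffer VisibleIndices
{
    uint writeHead;               // atomic write cursor
    uint visible[];               // vertex indices, written once per covered fragment
};
out vec4 color;

void main()
{
    uint slot = atomicAdd(writeHead, 3u);
    visible[slot + 0u] = triIndices.x;
    visible[slot + 1u] = triIndices.y;
    visible[slot + 2u] = triIndices.z;
    color = vec4(1.0);
}
```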
Of course, you'll get the same index multiple times (assuming that triangles share indices). But you can do it.
It just doesn't serve much point. Rendering part of a mesh is a lot more trouble than it's worth. For performance, it's better to render either all of it or none of it, based on its visibility. And detecting that is best done via occlusion tests on a different, less complex shape.
I'm starting to learn openGL (working with version 3.3) with intent to get a small 3d falling sand simulation up, akin to this:
https://www.youtube.com/watch?v=R3Ji8J2Kprw&t=41s
I have a little experience with setting up a voxel environment like Minecraft from some Udemy tutorials for Unity, but I want to build something simple from the ground up and not deal with all the systems already laid on top of things with Unity.
The first issue I've run into comes early. I want to build a system for rendering quads, because instancing a ton of cubes is ridiculously inefficient. I also want to be efficient with the storage of vertices, colors, etc. Thus far, in the OpenGL tutorials I've worked with, the way to do this is to store each vertex in a float array with both position and color data, and set up the buffer object to read every set of six entries as three floats for position and three for color, using glVertexAttribPointer. The problem is that for each neighboring quad, the same vertices will be repeated, because if they are made of different "blocks" they will be different colors, and I want to avoid this.
What I want to do instead to make things more efficient is store the vertices of a cube in one int array (positions will all be ints), then add each quad out of the terrain to an indices array (which will probably turn into each chunk's mesh later on). The indices array will store each quad's position, and a separate array will store each quad's color. I'm a little confused on how to set this up since I am rather new to opengl, but I know this should be doable based on what other people have done with minecraft clones, if not even easier since I don't need textures.
I just really want to get the framework for the chunks, blocks, world, etc. up and running so that I can get to the fun stuff like adding new elements. If anyone can verify that this is a sensible way to do this (lol) and offer guidance on how to set it up in the rendering, I would very much appreciate it.
Thus far, in the OpenGL tutorials I've worked with, the way to do this is to store each vertex in a float array with both position and color data, and set up the buffer object to read every set of six entries as three floats for position and three for color, using glVertexAttribPointer. The problem is that for each neighboring quad, the same vertices will be repeated, because if they are made of different "blocks" they will be different colors, and I want to avoid this.
Yes, and perhaps there's a reason for that. You seem to be trying to save... what, a few bytes of RAM? Your graphics card has 8GB of RAM on it; what it doesn't have is a general processing unit or an unlimited bus for doing random lookups in other buffers for every single rendered pixel.
The indices array will store each quad's position, and a separate array will store each quad's color.
If you insist on doing it this way, nothing's stopping you. You don't even need the quad vertices, you can synthesize them in a geometry shader.
Just fill a buffer with X|Y|Width|Height|Color(RGB) using glVertexAttribPointer like you already know, have your vertex shader pass those values through, then run a geometry shader that synthesizes two triangles (a quad) for each entry in your input buffer and projects the corners into clip space (you mentioned integers, so you're not in world units initially), and then your fragment shader can color each rasterized pixel according to its color entry.
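A minimal sketch of such a geometry shader, assuming the vertex shader simply forwards a per-point rectangle and colour. The names vRect, vColor and viewProj are made up for illustration, and the projection is reduced to a single assumed matrix.

```glsl
// Hypothetical geometry shader: expands one point (x, y, w, h, rgb) into a quad.
// Assumes the vertex shader forwards the raw attributes unchanged.
#version 330 core
layout(points) in;
layout(triangle_strip, max_vertices = 4) out;

in vec4 vRect[];      // x, y, width, height (from the vertex shader)
in vec3 vColor[];
flat out vec3 fColor;

uniform mat4 viewProj;   // assumed transform from block units to clip space

void main()
{
    vec2 p = vRect[0].xy;
    vec2 s = vRect[0].zw;
    vec2 corners[4] = vec2[4](p,
                              p + vec2(s.x, 0.0),
                              p + vec2(0.0, s.y),
                              p + s);
    for (int i = 0; i < 4; ++i)
    {
        fColor = vColor[0];
        gl_Position = viewProj * vec4(corners[i], 0.0, 1.0);
        EmitVertex();
    }
    EndPrimitive();
}
```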
ridiculously inefficient
Indeed, if that sounds ridiculously inefficient to you, it's because it is. You're essentially packing your data on the CPU, transferring it to the GPU, unpacking it and then processing it as normal. You can skip at least two of the steps, and even more if you consider that vertex shader outputs get cached within rasterized primitives.
There are many more variations of this insanity, like:
store vertex positions unpacked as normal, and store an index for the colors. Then store the colors in a linear buffer of some kind (texture, SSBO, generic buffer, etc) and look up each color index. That's even more inefficient, but it's closer to the algorithm you were suggesting.
store vertex positions for one quad and set up instanced rendering with a multi-draw command and a buffer to feed individual instance data (positions and colors); see the sketch after this list. If you also have textures, you can use bindless textures for each quad instance. It's still rendering multiple objects, but it's slightly more optimized by your graphics driver.
or just store per-vertex data in a buffer and render it. Done. No pre-computations, no unlimited expansions, no crazy code, you have your vertex data and you render it.
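For the second variation above (instanced quads), the GPU-side part is just a vertex shader reading per-instance attributes. A rough sketch with made-up names; the per-instance buffer would be set up host-side with glVertexAttribDivisor:

```glsl
// Hypothetical vertex shader for the instanced-quad variation (GL 3.3).
// aCorner comes from a tiny shared quad mesh; iOffset and iColor come from a
// per-instance buffer whose attributes use glVertexAttribDivisor(attr, 1).
#version 330 core
layout(location = 0) in vec3 aCorner;    // one corner of the shared unit quad
layout(location = 1) in vec3 iOffset;    // per-instance block position
layout(location = 2) in vec3 iColor;     // per-instance colour

uniform mat4 viewProj;   // assumed camera transform

flat out vec3 vColor;

void main()
{
    vColor = iColor;
    gl_Position = viewProj * vec4(aCorner + iOffset, 1.0);
}
```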
I send a VertexBuffer+IndexBuffer of GL_TRIANGLES via glDrawElements() to the GPU.
In the vertex shader I wanted to snap some vertices to the same coordinates to simplify a large mesh on the fly.
As a result I expected a major performance boost, because a lot of triangles collapse to the same point and become degenerate.
But I don't get any fps gain.
While testing I set my vertex shader to just output gl_Position = vec4(0) to degenerate ALL triangles, but still no difference...
Is there any flag to "activate" the degeneration, or what am I missing?
A query of GL_PRIMITIVES_GENERATED also always reports the number of all mesh faces.
What you're missing is how the optimization you're trying to use actually works.
The particular optimization you're talking about is the post-transform (post-T&L) cache. That is, if the same vertex is going to be processed twice, you only process it once and use the results twice.
What you don't understand is how "the same vertex" is actually determined. It isn't determined by anything your vertex shader could compute. Why? Well, the whole point of caching is to avoid running the vertex shader. If the vertex shader was used to determine if the value was already cached... you've saved nothing since you had to recompute it to determine that.
"The same vertex" is actually determined by matching the vertex index and vertex instance. Each vertex in the vertex array has a unique index associated with it. If you use the same index twice (only possible with indexed rendering of course), then the vertex shader would receive the same input data. And therefore, it would produce the same output data. So you can use the cached output data.
Instance ids also play into this, since when doing instanced rendering, the same vertex index does not necessarily mean the same inputs to the VS. But even then, if you get the same vertex index and the same instance id, then you would get the same VS inputs, and therefore the same VS outputs. So within an instance, the same vertex index represents the same value.
Both the instance count and the vertex indices are part of the rendering process. They don't come from anything the vertex shader can compute. The vertex shader could generate the same positions, normals, or anything else, but the actual post-transform cache is based on the vertex index and instance.
So if you want to "snap some vertices to the same coordinates to simplify a large mesh", you have to do that before your rendering command. If you want to do it "on the fly" in a shader, then you're going to need some kind of compute shader or geometry shader/transform feedback process that will compute the new mesh. Then you need to render this new mesh.
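For instance, here is a rough compute-shader sketch of the "compute the new mesh first" approach. The buffer names and cellSize are assumptions; this only snaps the positions, so you would still have to rebuild the index list (or otherwise drop the collapsed triangles) and issue a memory barrier before drawing the result.

```glsl
// Hypothetical compute shader: snap positions to a grid and write them to a
// second buffer that is then used as the vertex buffer for the real draw call.
#version 430
layout(local_size_x = 64) in;

layout(std430, binding = 0) readonly  buffer SrcPositions { vec4 srcPos[]; };
layout(std430, binding = 1) writeonly buffer DstPositions { vec4 dstPos[]; };

uniform float cellSize;   // assumed size of the snapping grid

void main()
{
    uint i = gl_GlobalInvocationID.x;
    if (i >= uint(srcPos.length()))
        return;
    vec3 snapped = round(srcPos[i].xyz / cellSize) * cellSize;
    dstPos[i] = vec4(snapped, 1.0);
}
```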
You can discard a primitive in a geometry shader. But you still had to do T&L on it. Plus, using a GS at all slows things down, so I highly doubt you'll gain much performance by doing this.
I am using OpenGL ES2 to render a fairly large number of mostly 2d items, and so far I have gotten away with sending a premultiplied model/view/projection matrix to the vertex shader as a uniform and then multiplying my vertices with the resulting MVP in there.
All items are batched using texture atlases and I use one MVP per batch. So all my vertices are relative to the translation of that MVP.
Now I want to have rotation and scaling for each of the separate items, which means I need a different model matrix for each of them. So I modified my vertex data to include the model matrix (16 floats!) and added a mat4 attribute to my shader, and it all works well. But I'm kind of disappointed with this solution, since it dramatically increased the vertex size.
So as I was staring at my screen trying to think of a different solution, I thought about transforming my vertices to world space before I send them over to the shader. Or even to screen space, if that's possible. The vertices I use are unnormalized coordinates in pixels.
So the question is: is such a thing possible? And if yes, how do you do it? I can't think why it shouldn't be, since it's just maths, but after a fairly long search on Google it doesn't look like a lot of people are actually doing this...
Strange, because if it is indeed possible, it would be quite a major optimization in cases like this one.
If the number of matrices per batch is limited, then you can pass all those matrices as uniforms (preferably in a UBO) and expand the vertex data with an index which specifies which matrix to use.
This is similar to GPU skinning used for skeletal animation.
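A rough ES 2.0-style sketch of that idea; uModel, aMatrixIndex and the array size are assumptions, and on ES 3.0+ the matrix array could live in a UBO instead:

```glsl
// Hypothetical vertex shader: each vertex stores one float index into a
// per-batch matrix array instead of a full mat4 (16 floats).
attribute vec3 aPosition;
attribute float aMatrixIndex;   // which model matrix this vertex uses

uniform mat4 uModel[16];        // assumed per-batch limit; stay within
                                // GL_MAX_VERTEX_UNIFORM_VECTORS
uniform mat4 uViewProj;

void main()
{
    mat4 m = uModel[int(aMatrixIndex)];
    gl_Position = uViewProj * m * vec4(aPosition, 1.0);
}
```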
I have a list of vertices and their arrangement into triangles as well as the per-triangle normalized normal vectors.
Ideally, I'd like to do as little work as possible in somehow converting the (triangle,normal) pairs into (vertex,vertex_normal) pairs that I can stick into my VAO. Is there a way for OpenGL to deal with the face normals directly? Or do I have to keep track of each face a given vertex is involved in (which more or less happens already when I calculate the index buffers) and then manually calculate the averaged normal at the vertex?
Also, is there a way to skip per-vertex normal calculation altogether and just find a way to inform the fragment shader of the face-normal directly?
Edit: I'm using something that should be portable to ES devices so the fixed-function stuff is unusable
I can't necessarily speak as to the latest full-fat OpenGL specifications but certainly in ES you're going to have to do the work yourself.
Although the normal was modal under the old fixed pipeline, like just about everything else it was ultimately attached to each vertex. If you opted for the flat shading model then GL would use the colour from one designated vertex (the provoking vertex) across the entire face rather than interpolating it. There's no way to recreate that behaviour under ES.
Attributes are per vertex and uniforms are, at best, per batch. In ES there's no way to specify per-triangle properties, and there's no stage of the rendering pipeline where you have an overview of the whole geometry and could distribute them to each vertex individually. Each vertex is processed separately, varyings are interpolated, and then each fragment is processed separately.
I have some vertex data. Positions, normals, texture coordinates. I probably loaded it from a .obj file or some other format. Maybe I'm drawing a cube. But each piece of vertex data has its own index. Can I render this mesh data using OpenGL/Direct3D?
In the most general sense, no. OpenGL and Direct3D only allow one index per vertex; the index fetches from each stream of vertex data. Therefore, every unique combination of components must have its own separate index.
So if you have a cube, where each face has its own normal, you will need to replicate the position and normal data a lot. You will need 24 positions and 24 normals, even though the cube will only have 8 unique positions and 6 unique normals.
Your best bet is to simply accept that your data will be larger. A great many model formats use multiple indices; you will need to fix up this vertex data before you can render with it. Many mesh loading tools, such as Open Asset Importer, will perform this fixup for you.
It should also be noted that most meshes are not cubes. Most meshes are smooth across the vast majority of vertices, only occasionally having different normals/texture coordinates/etc. So while this often comes up for simple geometric shapes, real models rarely have substantial amounts of vertex duplication.
GL 3.x and D3D10
For D3D10/OpenGL 3.x-class hardware, it is possible to avoid performing fixup and use multiple indexed attributes directly. However, be advised that this will likely decrease rendering performance.
The following discussion will use the OpenGL terminology, but Direct3D v10 and above has equivalent functionality.
The idea is to manually access the different vertex attributes from the vertex shader. Instead of sending the vertex attributes directly, the attributes that are passed are actually the indices for that particular vertex. The vertex shader then uses the indices to access the actual attribute through one or more buffer textures.
Attributes can be stored in multiple buffer textures or all within one. If the latter is used, then the shader will need an offset to add to each index in order to find the corresponding attribute's start index in the buffer.
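As a rough GLSL sketch of what the vertex-shader side looks like (the names are invented; the integer index attributes would be fed with glVertexAttribIPointer, and each samplerBuffer is a buffer texture bound over the corresponding attribute data):

```glsl
// Hypothetical vertex shader for the multi-index scheme: the "attributes" are
// really indices, and the real data is fetched from buffer textures.
#version 330 core
layout(location = 0) in int positionIndex;   // per-vertex indices instead of data
layout(location = 1) in int normalIndex;

uniform samplerBuffer positions;   // buffer texture holding all unique positions
uniform samplerBuffer normals;     // buffer texture holding all unique normals
uniform mat4 mvp;

out vec3 vNormal;

void main()
{
    vec3 pos = texelFetch(positions, positionIndex).xyz;
    vNormal  = texelFetch(normals,   normalIndex).xyz;
    gl_Position = mvp * vec4(pos, 1.0);
}
```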
Regular vertex attributes can be compressed in many ways. Buffer textures have fewer means of compression, allowing only a relatively limited number of vertex formats (via the image formats they support).
Please note again that any of these techniques may decrease overall vertex processing performance. Therefore, it should only be used in the most memory-limited of circumstances, after all other options for compression or optimization have been exhausted.
OpenGL ES 3.2 (and ES 3.1 with the OES_texture_buffer extension) provides buffer textures as well. Higher OpenGL versions allow you to read buffer objects more directly via SSBOs rather than buffer textures, which might have better performance characteristics.
I found a way to reduce this sort of repetition that runs a bit contrary to some of the statements made in the other answer (but doesn't specifically fit the question asked here). It does, however, address my question, which was thought to be a duplicate of this question.
I just learned about interpolation qualifiers, specifically "flat". It's my understanding that putting the flat qualifier on your vertex shader output causes only the provoking vertex to pass its values to the fragment shader.
This means for the situation described in this quote:
So if you have a cube, where each face has its own normal, you will need to replicate the position and normal data a lot. You will need 24 positions and 24 normals, even though the cube will only have 8 unique positions and 6 unique normals.
You can have 8 vertices, 6 of which carry the unique normals (the normal values on the other 2 are disregarded), so long as you carefully order your primitives' indices such that the "provoking vertex" of each face carries the normal data you want to apply to the entire face.
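A minimal sketch of what that looks like in GLSL (the names are made up; the key part is the matching flat qualifier on both sides of the interface):

```glsl
// --- vertex shader ---
#version 330 core
layout(location = 0) in vec3 aPosition;
layout(location = 1) in vec3 aNormal;   // only meaningful on provoking vertices
uniform mat4 mvp;
flat out vec3 vNormal;                  // "flat": no interpolation across the face

void main()
{
    vNormal = aNormal;
    gl_Position = mvp * vec4(aPosition, 1.0);
}

// --- fragment shader ---
#version 330 core
flat in vec3 vNormal;   // value from the provoking vertex, constant over the triangle
out vec4 color;

void main()
{
    color = vec4(vNormal * 0.5 + 0.5, 1.0);   // visualize the face normal
}
```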
EDIT: My understanding of how it works: