From inside the shader I can't modify uniforms or attributes. Is there a way I could write a variable that I can use outside the shader?
My goal is to determine the lowest and the highest vertex on the z-axis. I could of course run through all the vertices in a for loop, but the shader runs through them anyway and is faster.
Not really. A shader's output, ultimately, is pixels. Anything other than that would violate the stream-processing nature of the GPU. You could read the resulting pixel values back...
After some searching, I found that when separate VAOs share the exact same shader attribute layout, you can merge them into one VAO and put all the data into one VBO, so the objects can be drawn with a single draw call.
This makes perfect sense, but what about uniform variables? Say I want to draw a tree and a ball. They have different vertex counts and different transforms, but share exactly the same shader program.
Until now, my program looked like this:
// generate VAOs, called only once
glGenVertexArrays(1, &treeVaoId);
// generate VBO and upload the tree vertices
// enable vertex attributes and set them up for the shader program
glGenVertexArrays(1, &ballVaoId);
// repeat for ball

// draw, called each frame
// pass the tree's transform to the shader as a uniform
glDrawArrays(...); // first draw call
// repeat for ball
glDrawArrays(...); // second draw call
And with the vertices merged into one VAO and VBO, it becomes:
glGenVertexArrays(1, &treeBallId);
// generate a VBO large enough for both, upload tree vertices and ball vertices after them
// enable vertex attributes and set them up for the shader program
// to draw, I have to give a separate transform uniform for the tree and the ball, but how?
glDrawArrays(...)
Sorry for the poor example, but the point is: is there a way to supply different uniform values while drawing one VAO? Or is my approach totally wrong?
The purpose of batching is to improve performance by minimizing state changes between draw calls (batching reduces them to 0, since there is nothing between draw calls). However, there are degrees of performance improvement, and not all state changes are equal.
On the scale of the costs of state changes, changing program uniforms is the least expensive state change. That's not to say that it's meaningless, but you should consider how much effort you really want to spend compared to the results you get out of it. Especially if you're not pushing hardware as fast as possible.
VAO changes (non-buffer-only changes, that is) are among the more expensive state changes, so you gained a lot by eliminating them.
As the name suggests, uniform variables cannot be changed from one shader instance to another within a draw call. Even a multi-draw call. But that doesn't mean that there's nothing that can be done.
Multi-draw functionality allows you to issue multiple draw calls in a single function call. These individual draws can get their vertex data from different parts of the vertex and index buffers. What you need is a way to communicate to your vertex shader which draw call it is taking part in, so that it can index an array of some sort to extract that draw call's per-object data.
Semi-recent hardware has access to gl_DrawID, which is the index into a multi-draw command of the particular draw call being executed. The ARB_shader_draw_parameters extension is fairly widely implemented. You can use that index to fetch per-object data from a UBO or SSBO.
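A sketch of that approach (the UBO array size and binding point are illustrative; on GLSL versions before 4.60 you would enable `ARB_shader_draw_parameters` and use `gl_DrawIDARB` instead):

```glsl
#version 460
layout(location = 0) in vec3 aPos;

// One model matrix per sub-draw of the multi-draw command.
// 256 is an arbitrary upper bound for this sketch.
layout(std140, binding = 0) uniform PerObject {
    mat4 model[256];
};

uniform mat4 viewProj;

void main() {
    // gl_DrawID is the index of the current sub-draw within the
    // multi-draw command, so each object picks up its own transform.
    gl_Position = viewProj * model[gl_DrawID] * vec4(aPos, 1.0);
}
```

On the host side you upload all per-object matrices into the UBO once, then issue a single glMultiDrawArrays or glMultiDrawElementsIndirect call covering the tree and the ball.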
I need to have some variable/object in graphics memory that can be accessed both from my fragment shader and from my normal C# code, preferably a 16-byte vec4.
What I want to do:
[In C#] Read variable from graphics memory to CPU memory
[In C#] Set variable to zero
[In C#] Execute normal drawing of my scene
[In Shader] One of the fragment passes writes something to the variable (UPDATE)
Restart the loop
(UPDATE)
I pass the current mouse coordinates to the fragment shader as uniform variables. The fragment shader then checks whether it is processing the corresponding pixel. If so, it writes a certain color for color picking into the variable. The reason I don't write to a fragment shader output is that I simply couldn't find any solution on how to get that output back into normal memory. Additionally, I would have an output for each pixel instead of just one.
What I want is basically a uniform variable that a shader can write to.
Is there any kind of variable/object that fits my needs and if so how performant will it be?
A "uniform" that your shader can write to is the wrong term. Uniform means uniform (as in the same value everywhere). If a specific shader invocation is changing the value, it is not uniform anymore.
You can use atomic counters for this sort of thing; increment the counter for every test that passes and later check for non-zero state. This is vastly simpler than setting up a general-purpose Shader Storage Buffer Object and then worrying about making memory access to it coherent.
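A minimal fragment-shader sketch of the atomic-counter idea (the binding point and uniform names are made up for illustration; atomic counters require GL 4.2 or ARB_shader_atomic_counters):

```glsl
#version 420
// Atomic counter backed by a GL_ATOMIC_COUNTER_BUFFER at binding 0.
layout(binding = 0, offset = 0) uniform atomic_uint hits;

uniform ivec2 pickCoord;   // mouse position supplied by the application
out vec4 fragColor;

void main() {
    // Only the fragment under the cursor bumps the counter.
    if (ivec2(gl_FragCoord.xy) == pickCoord) {
        atomicCounterIncrement(hits);
    }
    fragColor = vec4(1.0);  // normal shading would go here
}
```

On the host you create a small GL_ATOMIC_COUNTER_BUFFER, zero it before the draw, and read it back afterwards (ideally a frame or two later) with glGetBufferSubData.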
Occlusion queries are also available on older hardware. They work surprisingly similarly to atomic counters, in that you can (very roughly) count the number of fragments that pass the depth test. Do not count on their accuracy: use discard in your fragment shader for any pixel that does not pass your test condition, and then check for a non-zero fragment count when you read the query back.
As for performance, as long as you can deal with a couple frames worth of latency between issuing a command and later using the result, you should be fine.
If you use either an atomic counter or an occlusion query and read the result back during the same frame, you will stall the pipeline and eliminate CPU/GPU parallelism.
I would suggest inserting a fence sync object in the command stream and then checking the status of the fence once per-frame before attempting to read results back. This will prevent stalling.
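The fence pattern looks roughly like this; it is a sketch that assumes an active GL context and GL 3.2+ / ARB_sync:

```c
/* After issuing the draw that writes the counter or query: */
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

/* Then, once per frame, poll instead of blocking: */
GLenum status = glClientWaitSync(fence, 0, 0);  /* timeout 0 = just check */
if (status == GL_ALREADY_SIGNALED || status == GL_CONDITION_SATISFIED) {
    /* The GPU has passed the fence: reading the result back now
     * will not stall the pipeline. */
    glDeleteSync(fence);
}
```

With a zero timeout, glClientWaitSync returns immediately, so the check costs almost nothing on frames where the result is not yet ready.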
I send a VertexBuffer+IndexBuffer of GL_TRIANGLES via glDrawElements() to the GPU.
In the vertex shader I wanted to snap some vertices to the same coordinates, to simplify a large mesh on-the-fly.
As a result I expected a major performance boost, because a lot of triangles would collapse to the same point and become degenerate.
But I don't get any fps gain.
For testing I set my vertex shader to output gl_Position = vec4(0.0) so that ALL triangles degenerate, but there is still no difference...
Is there a flag to "activate" the degeneration, or what am I missing?
A query of GL_PRIMITIVES_GENERATED also always reports the total number of mesh faces.
What you're missing is how the optimization you're trying to use actually works.
The particular optimization you're talking about is the post-transform (post-T&L) vertex cache: if the same vertex is going to get processed twice, you only process it once and use the result twice.
What you don't understand is how "the same vertex" is actually determined. It isn't determined by anything your vertex shader could compute. Why? Well, the whole point of caching is to avoid running the vertex shader. If the vertex shader was used to determine if the value was already cached... you've saved nothing since you had to recompute it to determine that.
"The same vertex" is actually determined by matching the vertex index and vertex instance. Each vertex in the vertex array has a unique index associated with it. If you use the same index twice (only possible with indexed rendering of course), then the vertex shader would receive the same input data. And therefore, it would produce the same output data. So you can use the cached output data.
Instance ids also play into this, since when doing instanced rendering, the same vertex index does not necessarily mean the same inputs to the VS. But even then, if you get the same vertex index and the same instance id, then you would get the same VS inputs, and therefore the same VS outputs. So within an instance, the same vertex index represents the same value.
Both the instance count and the vertex indices are part of the rendering process. They don't come from anything the vertex shader can compute. The vertex shader could generate the same positions, normals, or anything else, but the actual post-transform cache is based on the vertex index and instance.
So if you want to "snap some vertices to the same coordinates to simplify a large mesh", you have to do that before your rendering command. If you want to do it "on the fly" in a shader, then you're going to need some kind of compute shader or geometry shader/transform feedback process that will compute the new mesh. Then you need to render this new mesh.
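If you do want the snapping pass on the GPU, a compute shader over the position buffer is one option. A sketch (the buffer binding, cellSize uniform, and workgroup size are all illustrative); note that snapping alone only produces degenerate triangles, so you would still need a separate pass to rebuild the index buffer if you want fewer primitives:

```glsl
#version 430
// One thread per vertex; snaps positions to a grid before rendering.
layout(local_size_x = 64) in;

layout(std430, binding = 0) buffer Positions {
    vec4 pos[];   // the mesh's vertex positions, reused by the draw call
};

uniform float cellSize;  // grid spacing

void main() {
    uint i = gl_GlobalInvocationID.x;
    if (i >= uint(pos.length())) return;
    pos[i].xyz = round(pos[i].xyz / cellSize) * cellSize;
}
```

Dispatch this once per mesh update, insert a GL_SHADER_STORAGE_BARRIER_BIT memory barrier, and then render from the same buffer.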
You can discard a primitive in a geometry shader. But you still had to do T&L on it. Plus, using a GS at all slows things down, so I highly doubt you'll gain much performance by doing this.
I am trying to learn tessellation shaders in OpenGL 4.1. I understood most of it, but I have one question.
What is gl_InvocationID?
Can anybody explain it in a simple way?
gl_InvocationID has two current uses, but it represents the same concept in both.
In Geometry Shaders, you can have GL run your geometry shader multiple times per-primitive. This is useful in scenarios where you want to draw the same thing from several perspectives. Each time the shader runs on the same set of data, gl_InvocationID is incremented.
The common theme between Geometry and Tessellation Shaders is that each invocation shares the same input data. A Tessellation Control Shader can read every single vertex in the input patch primitive, and you actually need gl_InvocationID to make sense of which data point you are supposed to be processing.
This is why you generally see Tessellation Control Shaders written something like this:
gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;
gl_in and gl_out are potentially very large arrays in Tessellation Control Shaders (equal in size to GL_PATCH_VERTICES), and you have to know which vertex you are interested in.
Also, keep in mind that you are not allowed to write to any index other than gl_out[gl_InvocationID] from a Tessellation Control Shader. That property keeps invoking Tessellation Control Shaders in parallel sane (it avoids order dependencies and prevents overwriting data that a different invocation already wrote).
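Putting that together, a minimal pass-through tessellation control shader looks something like this (triangle patches and a fixed tessellation level of 4 are arbitrary choices for the sketch):

```glsl
#version 410
layout(vertices = 3) out;   // one invocation per output control point

void main() {
    // Each invocation copies exactly one control point: its own.
    gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;

    // Tessellation levels are per-patch, so they only need to be
    // written once; doing it from invocation 0 is the usual convention.
    if (gl_InvocationID == 0) {
        gl_TessLevelOuter[0] = 4.0;
        gl_TessLevelOuter[1] = 4.0;
        gl_TessLevelOuter[2] = 4.0;
        gl_TessLevelInner[0] = 4.0;
    }
}
```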
Say I wanted to build matrices inside the GPU pipeline for vertex transforms. I realized that my current implementation is quite inefficient because it rebuilds the matrices from the source data for every single vertex, while it really only needs to build them once per set of affected vertices. Is there any way to modify the whole array of vertices drawn in a single draw call? Calculating the matrices and storing them in VRAM doesn't seem like a good option, since multiple vertices are processed at the same time and I don't think I can sync them efficiently. The only other option I can think of is a compute shader; I haven't looked into its uses yet, but would it be possible to have it calculate the matrices and store them on the GPU so I can access them later when drawing?
Do you have any source code? I never calculate matrices in the shaders; I normally do it on the CPU and pass them over in a constant buffer.
One way of achieving this is to precompute the matrix on the CPU and send it to the shader as a uniform variable. For example, if your shaders only ever need to multiply the MVP matrix with the vertex positions, then you can pre-compute the MVP matrix outside the shader and send it as a single float4x4 uniform; all the vertex shader does then is multiply that one matrix with each vertex. It doesn't get much more optimal than that, since vertices are processed in parallel on the GPU and the GPU has instruction sets optimized for vector math.