Execute fragment shader for visible pixels only - opengl

I have a computation to be done for each pixel of output, and it uses some data passed from earlier steps in shader pipeline - so it makes most sense to execute this computation in the fragment shader. To see whether it's possible at all I started with the simplest example - just count pixels for each primitive. This requires only two shaders - vertex shader:
#version 430
in vec3 position;
void main() {
gl_Position = vec4(position, 1);
}
and fragment shader:
#version 430
layout(early_fragment_tests) in;
out vec4 out_color;
layout(std430, binding = 3) buffer out_data {
int data[];
};
void main() {
atomicAdd(data[gl_PrimitiveID], 1);
out_color = vec4(1, gl_PrimitiveID, 0, 1);
}
As you can see, it just increments an element of a shader storage buffer object.
Then I feed it two triangles (6 points): [-1, -1, 0], [-1, 1, 0], [1, 1, 0], [-1, -1, 0], [1, -1, 0], [-1, 1, -1]. It correctly displays a red triangle and a green triangle, each taking exactly half of the window, but the green triangle is on top - so only half of the red triangle is visible (1/4 of the window).
Of course I expected the counts to be about 1/4 of the window size for the red triangle and about 1/2 for the green one - but they are both equal to 1/2! By the way, if I set all input Z coordinates to zero, then the red triangle is on top and the green one is half-hidden - and in that case the counts are correct.
While reading the OpenGL docs (where I found the early_fragment_tests option) I understood that fragments discarded for any reason (e.g. by the depth test, as in my case) don't affect atomic counters and SSBOs - see here. But as my example shows, they clearly do affect them! Is there anything else that can fix the issue?
If that's important, I ran this under Linux on an Intel Skylake iGPU, OpenGL 4.3.

The order of processing fragments from separate triangles is largely undefined. Early fragment tests do not force the fragments of each triangle to be processed in rasterizer order. While OpenGL does mostly work under the "as-if" rule (i.e., everything works "as if" it were processed in order), the incoherent nature of Image Load/Store and SSBO memory accesses means that you cannot rely on in-order processing.
If you want to issue a batch of work and have the FS execute only once for each visible fragment, then you're going to have to do a depth pre-pass. First, you render the scene, but without an FS at all; this means that the only things that get written are depth values. Next, you render the scene as normal.
Between the two of those should be some form of synchronization that ensures that all triangles in the first pass have finished before the second pass starts. Unfortunately, OpenGL doesn't have a decent way to ask for that. You could use a fence sync object with glWaitSync, but that requires an explicit glFlush call, which isn't exactly cheap.
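A minimal sketch of that two-pass setup, assuming hypothetical names (depthOnlyProgram, countingProgram, drawScene) standing in for your own programs and draw call:
// Pass 1: depth pre-pass - lay down the nearest depth per pixel, no color or SSBO writes.
glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_LESS);
glDepthMask(GL_TRUE);
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glUseProgram(depthOnlyProgram);   // program with no fragment shader attached
drawScene();

// Pass 2: real pass - only fragments matching the pre-pass depth survive,
// so the atomicAdd runs once per visible pixel (subject to the synchronization caveat above).
glDepthFunc(GL_LEQUAL);           // accept fragments equal to the pre-pass depth
glDepthMask(GL_FALSE);            // the depth buffer is already final
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glUseProgram(countingProgram);
drawScene();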

Related

Read/write an image with the GL_ARB_fragment_shader_interlock extension

I am testing whether the GL_ARB_fragment_shader_interlock extension can execute the critical-section code for the same pixel position in the order of instanced rendering. I used instanced rendering to draw five translucent planes (instances ordered from farthest to nearest), and the result matches the fixed-pipeline blending result (without this extension the blending result is random). But there is one problem: an extra line appears in the middle of each plane (it does not appear with fixed-pipeline blending). I found that the adjacent edges of the triangles generate their fragments twice. But rasterization should ensure that adjacent triangles do not have overlapping pixels. How can I solve this? I don't know where I went wrong; here are the code and result, please enlighten me!
The GLSL code:
#version 450 core
#extension GL_ARB_fragment_shader_interlock : require
//out vec4 Color_;
in vec4 v2f_Color;
layout(binding = 0, rgba16f) uniform image2D uColorTex;
void main()
{
beginInvocationInterlockARB();
vec4 color = imageLoad(uColorTex, ivec2(gl_FragCoord.xy));
color = (1 - v2f_Color.a) * color + v2f_Color * v2f_Color.a;
imageStore(uColorTex, ivec2(gl_FragCoord.xy), color);
endInvocationInterlockARB();
//Color_ = v2f_Color;
}
The result using the extension to read/write the image manually:
The result using fixed pipeline blending:
I am a bit suspicious of the ivec2(gl_FragCoord.xy). This is doing a conversion from floating-point to an integer, and it could be the case that it gets rounded in different directions between the different triangles. This might explain not only the overlap, but why there's a missing top-left pixel in one of the squares.
Judging by the spec, ivec2(gl_FragCoord.xy) should be equivalent to ivec2(trunc(gl_FragCoord.xy)), which really ought to be consistent, but maybe the implementation is bad...
You might want to try:
ivec2(round(gl_FragCoord.xy))
ivec2(round(gl_FragCoord.xy + 0.5))

Is gl_FragDepth equal to gl_FragCoord.z when MSAA is enabled?

I know from the OpenGL wiki that gl_FragDepth will take the value of gl_FragCoord.z:
https://www.khronos.org/opengl/wiki/Fragment_Shader/Defined_Outputs
But I have a problem. If I enable MSAA and write gl_FragDepth = gl_FragCoord.z in the fragment shader, the display does not work correctly. You can see a black line on the white triangle as below:
If I don't write gl_FragDepth in the fragment shader, it works fine.
If I disable MSAA, it also works fine regardless of whether I write gl_FragDepth.
The correct display image has no black line:
The scene is simple: I just draw 2 white triangles that intersect along an edge.
I add a simple light in the vertex shader. The code is shown below:
const char *vertexShaderSource[] = {
"#version 120\n",
"varying vec4 lightColor;\n",
"void main()\n",
"{\n",
" vec3 n = normalize(gl_NormalMatrix * gl_Normal);\n",
" vec3 l = normalize(vec3(0.0, 1.0, 1.0));\n",
" float NdotL = clamp(dot(n, l), 0.001, 1.0);\n",
" lightColor = vec4(1.0)*(NdotL + 0.2);\n",
" gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;\n",
"}\n"
};
const char *fragmentShaderSource[] = {
"#version 120\n",
"varying vec4 lightColor;\n",
"void main(void)\n",
"{\n",
" gl_FragColor = vec4(lightColor.rgb, 1.0);\n",
" gl_FragDepth = gl_FragCoord.z;\n"
"}\n"
};
The positions of 6 vertices of 2 triangles are (-5,-5,0),(5,-5,0),(-5,5,0),(-5,0,0),(5,0,0),(-5,0,-10).
The normals are perpendicular to triangles.
I want to know why the displayed images are different when I write gl_FragDepth in the fragment shader.
Your two triangles intersect. Specifically, the grey triangle has an edge which can generate depth values equal to the depth values of the white triangle. As such, it is entirely possible for a particular sample from the grey triangle at that intersection to generate a depth value that is equal to the depth value of the white triangle.
So you were never guaranteed not to see a line there; you just happened not to in many cases.
However, that all assumes that:
The grey triangle is being rendered after the white one.
Your depth test will pass on equal values.
The result you are getting here may happen even outside of these two conditions. The reason for that is complex.
See, the whole point of multisampling is that the number of depth values generated by the rasterizer and the number of fragment shader executions are not the same. So a single FS invocation is mapped to multiple depth values.
However, a single FS invocation can still write to gl_FragDepth. If it does this, then all samples that map to that FS invocation will receive the same depth. This depth overrides the multisample-generated depth values.
Also, interpolation at the edges of a primitive is weird under multisampling. Each sample that is within the bounds of the triangle at that pixel will result in a sample value being written (unless something else culls it out). But the center point of the pixel need not be one of these sample locations. So a triangle that doesn't pass through the center of a pixel can still contribute some samples to the pixel, so long as the triangle passes through at least one sample in that pixel.
The fragment shader gets interpolated values based on some location inside the pixel. With multisampling, this location may not be inside of the triangle. For example, if the location the implementation selects for the FS's interpolation within the pixel is in the center of the triangle, and the triangle doesn't pass through the center of that pixel, you will still get an FS invocation so long as it passes through some sample.
But this means that the interpolated values can represent locations outside of the area of the triangle. The interpolation math can produce values for areas not within the triangle; they just don't make sense.
gl_FragCoord, being an interpolated value, could therefore generate values outside of the triangle. Since the grey triangle is aimed towards the viewer, the values from locations "above" the oncoming edge of the grey triangle will be closer than they should be. And since the edge of the grey triangle intersects the white triangle, values closer than its actual edge values will be considered closer than the white triangle.
The normal way to counter this would be to use the centroid interpolation qualifier. However, the standard doesn't really allow this; even if you redeclared gl_FragCoord with the centroid qualifier, it won't have any effect:
The use of centroid does not further restrict this value to be inside the current primitive.
Also, as previously stated, depth-replacement in regular multisampled rendering destroys all of the per-sample depth information anyway. Every sample in a pixel would get the same depth value if your FS writes to the depth. That's not really what you wanted, even if you could do centroid interpolation of gl_FragCoord (which is probably why they don't allow it).
So if it is absolutely essential to do depth-replacement in a shader used for multisampling (and you should avoid this whenever possible), you will need to use per-sample shading. You can redeclare gl_FragCoord with sample to achieve this.
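As an illustration, here is a minimal sketch of requesting per-sample shading from the API side instead (this assumes an OpenGL 4.0+ / ARB_sample_shading context, unlike the #version 120 shaders above):
// Run the fragment shader once per covered sample rather than once per pixel,
// so gl_FragCoord.z (and any gl_FragDepth write) is evaluated per sample.
glEnable(GL_SAMPLE_SHADING);
glMinSampleShading(1.0f);   // request that 100% of the samples be shaded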

MSAA and vertex interpolation cause out-of-range values

I am using GLSL with a vertex shader and a fragment shader.
The vertex shader outputs a highp float in the range [0, 1].
When it arrives in the fragment shader, I see values (at triangle edges) that exceed 1.1, no less!
This issue goes away if I...
Either Disable MSAA
Or disable the interpolation by using the GLSL interpolation qualifier flat.
How can a clamped 0-to-1 high precision float arrive in fragment shader as a value that is substantially larger than 1, if MSAA is enabled?
vertex shader code:
out highp float lightcontrib2;
...
lightcontrib2 = clamp( irrad, 0.0, 1.0 );
fragment shader code:
in highp float lightcontrib2;
...
if (lightcontrib2>1.1) { fragColor = vec4(1,0,1,1); return; }
And sure enough, with MSAA 4x, this is the image generated by OpenGL. (Observe the magenta coloured pixels in the centre of the window.)
I've ruled out Not-A-Number values.
GL_VERSION: 3.2.0 NVIDIA 450.51.06
How can a clamped 0-to-1 high precision float arrive in fragment shader as a value that is substantially larger than 1, if MSAA is enabled?
Multisampling at its core is a variation of supersampling: taking multiple samples from a pixel-sized area of a primitive. Different locations within the space of that pixel-sized area are sampled to produce the resulting value.
When you're at the edge of a primitive however, some of the locations in that pixel-sized area are outside of the area that the primitive actually covers. In supersampling that's fine; you just don't use those samples.
However, multisampling is different. In multisampling, the depth samples are distinct from the fragment shader generated samples. That is, the system might execute the FS only once, but take 4 depth samples and test them against 4 samples in the depth buffer. Any samples that pass the depth test get their color values from the single FS invocation that was executed. If some of those 4 depth samples are outside of the primitive's area, that's fine; they don't count.
However, by divorcing the FS invocation values from the depth sampling, we now encounter an issue: exactly where did that single FS invocation execute within the pixel area?
And that's where we encounter the problem. If the FS invocation executes on a location that's outside of the area of the primitive, that normally gets tossed away. But if any depth samples are within the area of the primitive, then those depth samples still need to get color data. And the whole point of MSAA is to not execute the FS for each sample, so they may get their color data from an FS invocation executed on a different location.
Ideally, it would be from an FS invocation executed on a location within the primitive's area. But hardware can't guarantee that. Well, it can't guarantee it by default, at any rate. Not every algorithm has issues if an FS location happens to fall slightly outside of the primitive's area.
But some algorithms do have issues. This is why we have the centroid qualifier for fragment shader inputs. It ensures that a particular interpolated value will be generated within the area of the primitive.
As you might have guessed, this isn't the default because it's slower than non-centroid interpolation. So use it only when you need it.
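A minimal sketch of that fix applied to the question's variable (only the qualifier changes; the rest of the shaders is assumed unchanged):
// vertex shader output: centroid keeps the interpolation point inside the
// covered part of the primitive, so the clamped value cannot be extrapolated past 1.0
centroid out highp float lightcontrib2;

// fragment shader input: the qualifier must match the vertex-shader declaration
centroid in highp float lightcontrib2;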

How exactly does indexing work?

From my understanding, indexing and IBOs in OpenGL are mainly used to reduce the number of vertices needed to draw a given geometry. I understand that with an index buffer, OpenGL only draws the vertices with the given indexes and skips any other vertices. But doesn't that eliminate the possibility of texturing? As far as I am aware, if you skip vertices with index buffers, it also skips their vertex attributes. If I have my vertex attributes set like this:
attribute vec4 v_Position;
attribute vec2 v_TexCoord;
and then use an index buffer and glDrawElements(...), won't that eliminate the use of texturing, or does v_Position get "reused"? If not, how can I texture when using an index buffer?
I think you are misunderstanding several key terms.
"Vertex attributes" are the data that defines each individual vertex. While these include texture coordinates, they also include position. In fact, at least if you are not using fixed-function, the meaning of vertex attributes is entirely arbitrary; their meaning is defined by how the vertex shader uses and/or forwards them to following shader stages.
As such, there is no difference between how position, texture coordinates, and any other vertex attribute are forwarded to the vertex shader. They are all parsed exactly the same no matter how indexes are used (or not used).
An example vertex shader:
layout(location = 0) in vec4 position;
layout(location = 1) in vec2 uvAttr;
out vec2 uv;
void main( )
{
uv = uvAttr;
gl_Position = position;
}
And the beginning of the fragment shader to which the above is paired:
in vec2 uv;
The output of vertex shaders is, as you can see, based on the vertex attributes. That output is then interpolated across the faces generated by primitive assembly, before sending it to fragment shaders. Primitive assembly is the main place where indexes come into play: indexes determine how the vertex shader output is used to create actual geometry. That geometry is then broken up into fragments, which are what actually affect the rendering output. Outputs from the vertex shader become inputs to the fragment shader.
After the vertex shader, the vertex attributes cease being defined. Only if you forward them, as above, can they be accessed for use in something like texturing. So, you are not even using the vertex attribute itself as a texture coordinate in the first place: you're using a variable output by the vertex shader and interpolated in primitive assembly/rasterization.
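For completeness, a hedged sketch of how that forwarded uv would typically be consumed for texturing (the sampler and output names are made up; the original answer only showed the fragment shader's input declaration):
uniform sampler2D tex; // hypothetical sampler bound by the application
out vec4 fragColor;
void main( )
{
fragColor = texture(tex, uv); // sample using the interpolated coordinate
}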
"if you skip vertices with index buffers, it also skips their vertex attributes"
Yes - it totally ignores the vertex: texture coordinates, position, and whatever else you have defined for that vertex. But only the skipped vertex. The rest continue to be processed normally as if the skipped vertex never existed.
For example, let us say for the sake of argument that I have 5 vertexes. I have these ordered into a bow-tie shape as you can see below. Each vertex has a position (a 2-component vector of just x and y) and a single-component "brightness" to be used as a color. The center vertex of the bow tie is only defined once, but referenced via indexes twice.
The vertex attributes are:
[(1, 1), 0.5], aka [(x, y), brightness]
[(1, 5), 0.5]
[(3, 3), 0.0]
[(5, 5), 0.5]
[(5, 1), 0.5]
The indexes are: 1, 2, 3, 4, 5, 3.
Note that in this example, the "brightness" might as well stand in for your UV(W) coordinates. It would be interpolated similarly, just as a vector. As I said before, the meaning of vertex attributes is arbitrary.
Now, since you're asking about skipping vertexes, here is what the output would be if I changed the indexes to 1, 2, 4:
And this would be 1, 2, 3:
See the pattern here? OpenGL is concerned with the vertexes that make up the faces it generates, nothing else. Indexes merely change how those faces are assembled (and can let it skip computing unneeded vertexes entirely). They have no impact on the meaning of the vertexes that are used and do go into the faces. If the black vertex #3 is skipped, it does not contribute to any face, because it is not part of any face.
As an aside, the standard allows implementations to re-use vertex shader output within single draw calls. So, you should expect that using the same index repeatedly will probably not result in additional vertex shader calls. I say "probably not" because what your driver actually does is always going to be voodoo.
Note that in this I have intentionally ignored tessellation and geometry shaders. Those are a topic beyond the scope of this question, but they can have some interesting implications for how vertex attributes are handled. I also ignored the fact that the ordering of vertexes can be accessed to a degree in shaders, and thus might impact output.
An index buffer is used for speed.
With an index buffer, a vertex cache is used to store recently transformed vertices. During transformation, if the vertex referenced by an index has already been transformed and is available in the vertex cache, it is reused; otherwise the vertex is transformed. Without an index buffer the vertex cache cannot be utilized, so vertices always get transformed. That is why it is important to order your indices to maximize vertex cache hits.
An index buffer also reduces the memory footprint.
Per-vertex data is usually quite large. For example, storing a single-precision floating-point position (x, y, z) requires 12 bytes (assuming each float takes 4 bytes). This requirement gets bigger if you include vertex color, texture coordinates, or vertex normals.
Consider a quad composed of two triangles, with each vertex consisting of position data only (x, y, z). Without an index buffer, you require 6 vertices (72 bytes) to store the quad. With a 16-bit index buffer, you only need 4 vertices (48 bytes) + 6 indices (6 * 2 bytes = 12 bytes) = 60 bytes. The memory saving grows the more shared vertices you have.
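To tie this back to the texturing question, a minimal sketch of that indexed quad with position plus texture coordinate per vertex (the buffer objects vbo and ibo are assumed to have been created with glGenBuffers; attribute locations match the answer's example shader):
// 4 unique vertices: x, y, z, u, v - shared corners are stored only once
static const GLfloat quadVertices[] = {
-1.f, -1.f, 0.f, 0.f, 0.f,
 1.f, -1.f, 0.f, 1.f, 0.f,
 1.f,  1.f, 0.f, 1.f, 1.f,
-1.f,  1.f, 0.f, 0.f, 1.f,
};
// 6 indices forming two triangles; vertices 0 and 2 are reused,
// and they are reused with ALL of their attributes, texture coordinates included
static const GLushort quadIndices[] = { 0, 1, 2, 0, 2, 3 };

glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(quadVertices), quadVertices, GL_STATIC_DRAW);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(quadIndices), quadIndices, GL_STATIC_DRAW);

// position -> location 0, texture coordinate -> location 1
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 5 * sizeof(GLfloat), (void *)0);
glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 5 * sizeof(GLfloat), (void *)(3 * sizeof(GLfloat)));
glEnableVertexAttribArray(0);
glEnableVertexAttribArray(1);

glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, (void *)0);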

OpenGL render-to-texture-via-FBO -- incorrect display vs. normal Texture

Off-screen rendering to a texture-bound framebuffer object should be trivial, but I'm having a problem I cannot wrap my head around.
My full sample program (2D only for now!) is here:
http://pastebin.com/hSvXzhJT
See below for some descriptions.
I'm creating a 512x512 RGBA texture object and binding it to an FBO. No depth or other render buffers are needed at this point; it's strictly 2D.
The following extremely simple shaders render to this texture:
Vertex shader:
varying vec2 vPos; attribute vec2 aPos;
void main (void) {
vPos = (aPos + 1) / 2;
gl_Position = vec4(aPos, 0.0, 1.0);
}
In aPos this just gets a VBO containing 4 xy coords for a quad (-1, -1 :: 1, -1 :: 1, 1 :: -1, 1)
So although the framebuffer resolution should theoretically be 512x512, obviously the shader renders its "texture" onto a "full-(off)screen quad", following GL's -1..1 coords paradigm.
Fragment shader:
varying vec2 vPos;
void main (void) {
gl_FragColor = vec4(0.25, vPos, 1);
}
So it sets a fully opaque color with red fixed at 0.25 and green/blue depending on x/y anywhere between 0 and 1.
At this point my assumption is that a 512x512 texture is rendered showing only the -1..1 full-(off)screen quad, fragment-shaded for green/blue from 0..1.
So this is my off-screen setup. On-screen, I have another real visible full-screen quad with 4 xyz coords { -1, -1, 1 ::: 1, -1, 1 ::: 1, 1, 1 ::: -1, 1, 1 }. Again, for now this is 2D so no matrices and so z is always 1.
This quad is drawn by a different shader, simply rendering a given texture, text-book GL-101 style. In my sample program linked above I have a simple boolean toggle doRtt; when this is false (the default), render-to-texture is not performed at all and this shader simply uses texture.jpg from the current directory.
This doRtt=false mode shows that the second, on-screen quad renderer is "correct" for my current requirements and performs the texturing as I want it to: repeated twice vertically and twice horizontally (it will be clamped later; repeat is just for testing here), otherwise scaled with NO texture filtering or mipmapping.
So no matter how the window (and thus the viewport) is resized, we always see a full-screen quad with a single texture repeated twice horizontally and twice vertically.
Now, with doRtt=true, the second shader still does its job but the texture is never scaled fully correctly -- or perhaps never drawn correctly; I'm not sure, since unfortunately we can't just say "hey GL, save this FBO to disk for debugging purposes".
The RTT shader DOES perform some partial rendering (or maybe a full one; again, I can't be sure what's happening off-screen...). Especially when you resize the viewport to be a lot smaller than the default size, you can see the breaks between the texture repeats, and not all of the colors to be expected from our very simple RTT fragment shader are shown.
(A) either: the 512x512 texture is created correctly but not mapped correctly by my code (but then why does any given texture.jpg show just fine with doRtt=false, using the exact same simple textured-quad shader?)
(B) or: the 512x512 texture is not rendered correctly and somehow the RTT frag shader changes its output depending on the window resolution -- but why? The off-screen quad is always at -1..1 for x and y, the vertex shader always maps this to fragment coords 0..1, and the RTT texture always stays at 512x512 for this simple test!
Note, BOTH the off-screen quad AND the on-screen quad never change their coords and are always "full-screen" (-1..1 in both dimensions).
Again, this should be so simple. What on earth am I missing?
Specs: OpenGL 4.2 (but the code doesn't need any 4.2 features obviously!), Nvidia Quadro 5010M, openSuse 12.1 64bit, Golang Weekly 22-Feb-2012.
First of all, try checking for OpenGL errors: call glGetError() after each OpenGL function. Also, you must set the correct viewport for drawing. Before drawing to the FBO, call glViewport(0, 0, 512, 512). Before drawing to the screen, call glViewport(0, 0, display_width, display_height).
Also, there is no need to bind rttFrameTex when you are rendering to it through the FBO. Binding the texture is needed only when you are reading from it in a shader.
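A minimal sketch of the viewport handling this answer describes (fbo, rttFrameTex, display_width/display_height and the draw helpers are placeholders for the sample program's own objects):
// Pass 1: render into the 512x512 texture attached to the FBO
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glViewport(0, 0, 512, 512);                 // match the texture size, not the window size
drawOffscreenQuad();

// Pass 2: render to the default framebuffer at the window's size
glBindFramebuffer(GL_FRAMEBUFFER, 0);
glViewport(0, 0, display_width, display_height);
glBindTexture(GL_TEXTURE_2D, rttFrameTex);  // bind only when sampling it, never while rendering to it
drawOnscreenQuad();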