As an experiment I decided to try rendering to a texture using the image API exclusively. At first the results were obviously wrong, as the texture write occurred before the depth test. So I enabled early_fragment_tests, which I thought was introduced for pretty much this type of use case, but now I get a weird sort of flickering that looks like Z-fighting, which is strange since it should be performing the same depth test that works for regular rendering.
Anyway, I've included an image of the problem, and I'm curious if anyone has an explanation as to what is going on and why this doesn't work. Can it be made to work?
Here's a minimal reproducer
#version 420

// Early fragment tests enabled, as described above.
layout(early_fragment_tests) in;

in vec3 normal;
layout(binding = 0) writeonly uniform image2D outputTex;

void main()
{
    vec4 fragColor = vec4(normal, 1.0);
    imageStore(outputTex, ivec2(gl_FragCoord.xy), fragColor);
}
I'm going to make some assumptions about the code you didn't show. Because you didn't show it. I'm going to assume that:
You used proper memory coherence operations when you went to display this image (whether to the actual screen or via a glReadPixels/glGetTexImage operation).
You rendered this scene using a regular rendering command, with no special ordering of triangles or anything. You did not render each triangle with a separate rendering command with memory coherence operations between each.
In short, I'm going to assume that your problem is actually due to your shader. It may well be due to many other things. But since you didn't deign to show the rest of your code, I can't tell. Therefore, this answer may not in fact answer your problem. But garbage in, garbage out.
The problem I can see from your shader and the above assumptions is really quite simple: incoherent memory accesses (like image load/store) are completely unordered. You performed an image write operation. Therefore, you have no guarantees about this write operation unless you take steps to make those guarantees.
Yes, you used early fragment tests. But that doesn't mean that the order of incoherent memory accesses from your fragment shader will be in any particular order.
Consider what happens if you render a triangle, then render a triangle in front of it that completely covers it. Early fragment tests won't change anything, since the top fragment happens after the bottom one. And image load/store does not guarantee anything about the ordering of writes to the same pixel. Therefore, it is very possible for writes to the bottom triangle to complete after writes to the top triangle.
As far as I know, ordering writes to the same pixel from different fragment shaders like this is not possible. Even if you issued a memoryBarrier after your write, my reading of the spec doesn't suggest that this will guarantee the write ordering here.
The correct answer is to not do this at all. Write to fragment shader outputs; that's what they're there for.
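For comparison, a conventional version of your shader would just use a regular fragment output, something like this (a sketch; the output name is arbitrary):
#version 420

in vec3 normal;

// A regular fragment output: the depth test and write ordering are
// handled by the fixed-function pipeline.
layout(location = 0) out vec4 fragColor;

void main()
{
    fragColor = vec4(normal, 1.0);
}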
I'm trying to basically "add" framebuffers, or rather the color texture attachments of framebuffers. One way I found to do this is to have a shader which takes all the textures and renders their combination.
But to improve performance, wouldn't it be better to have just one shader and framebuffer, and then, through instanced drawing, have the shader draw onto the same framebuffer color texture attachment it is using as input for drawing?
A bit better explained:
I have 2 framebuffers: Default and Framebuffer1.
I bind Framebuffer1 and give the color texture attachment of Framebuffer1 as the uniform "Fb1_cta" to the following fragment shader:
out vec4 FragColor;
in vec2 TexCoords;

uniform sampler2D Fb1_cta;

void main()
{
    // Sample the texture that is also this framebuffer's color attachment.
    vec3 text = texture(Fb1_cta, TexCoords).rgb;
    FragColor = vec4(vec3(0.5) + text, 1.0);
}
So I draw into Framebuffer1, but also use its current color texture attachment for the drawing.
Now I call glDrawArraysInstanced with instancecount 2.
The first render pass should draw the whole texture in grey (rgb = (0.5, 0.5, 0.5)) and the second should add another vec3(0.5) to that, so the result will be white. That, however, didn't really work, so I split the glDrawArraysInstanced into 2 glDrawArrays calls and checked the 2 results.
Now, while the first pass works as intended (image: result of the first rendering), the second didn't (image: result of the second rendering; this is, by the way, the same result as with glDrawArraysInstanced).
To me this pretty much looks like the two render passes aren't done sequentially, but in parallel. So I reran my code, this time with a bit of time passing between the calls, and that seemed to solve the issue.
Now I wonder is there any way to tell OpenGL that those calls should truly be sequential and might there even be a way to do it with glDrawArraysInstanced to improve the performance?
Is there in general a more elegant solution to this kind of problem?
In general, you cannot read from a texture image that is also being rendered to. To achieve the level of performance necessary for real-time rendering, it is essential to take advantage of parallelism wherever possible. Fragment shader invocations are generally not processed sequentially. On a modern GPU, there will be thousands and thousands of fragment shader invocations running concurrently during rendering. Even fragment shader invocations from separate draw calls. OpenGL and GLSL are designed specifically to enable this sort of parallelization.
From the OpenGL 4.6 specification, section 9.3.1:
Specifically, the values of rendered fragments are undefined if any shader stage fetches texels and the same texels are written via fragment shader outputs, even if the reads and writes are not in the same draw call, unless any of the following exceptions apply:
- The reads and writes are from/to disjoint sets of texels (after accounting for texture filtering rules).
- There is only a single read and write of each texel, and the read is in the fragment shader invocation that writes the same texel (e.g. using texelFetch2D(sampler, ivec2(gl_FragCoord.xy), 0);).
- If a texel has been written, then in order to safely read the result a texel fetch must be in a subsequent draw call separated by the command void TextureBarrier( void ); TextureBarrier will guarantee that writes have completed and caches have been invalidated before subsequent draw calls are executed.
The OpenGL implementation is allowed to (and, as you have noticed, will actually) run multiple drawcalls concurrently if possible. Across all the fragment shader invocations that your two drawcalls produce, you do have some that read and write from/to the same sets of texels. There is more than a single read and write of each texel from different fragment shader invocations. The drawcalls are not separated by a call to glTextureBarrier(). Thus, your code produces undefined results.
A drawcall alone does not constitute a rendering pass. A rendering pass is usually understood as the whole set of operations that produce a certain piece of output (like a particular image in a framebuffer) that is then usually again consumed as an input into another pass. To make your two draw calls "truly sequential", you could call glTextureBarrier() between issuing the draw calls.
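As a rough sketch (assuming two plain glDrawArrays calls; the vertex count is whatever your quad uses):
// First pass: adds 0.5 on top of whatever is currently in the attachment.
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);

// Make the first pass's writes visible to texture fetches in later draw calls.
glTextureBarrier();

// Second pass: now safely reads what the first pass wrote.
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);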
But if all you want to do is draw two triangles, one after the other, on top of each other into the same framebuffer, all you have to do is draw two triangles and use additive blending. You don't need instancing. You don't need separate drawcalls. Just draw two triangles. OpenGL requires blending to take place in the order in which the triangles from which the fragments originated were specified. Be aware that if you happen to have depth testing enabled, chances are your depth test is going to prevent the second triangle from ever being drawn unless you did change the depth testing function to something other than the default.
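A minimal sketch of that additive-blend state might be:
glEnable(GL_BLEND);
glBlendEquation(GL_FUNC_ADD);      // the default, shown for clarity
glBlendFunc(GL_ONE, GL_ONE);       // result = source + destination
glDisable(GL_DEPTH_TEST);          // or use a depth func that lets the second triangle pass

glDrawArrays(GL_TRIANGLES, 0, 6);  // both triangles in a single draw call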
The downside of blending is that you're limited to a set of a few fixed functions that you can select as your blend function. But add is one of them. If you need more complex blending functions, there are vendor-specific extensions that enable what is typically called "programmable blending" on some GPUs…
Note that all of the above only concerns drawcalls that read from and write to the same target. Drawcalls that read from a target that earlier drawcalls rendered to are guaranteed to be sequenced after the drawcalls that produced their input.
Can I access and change the output values of another fragment at a certain location in the fragment shader?
For example, in main() I process everything just like usual and output the color with some value. But in addition to that, I also want the fragment at position vec3(5,3,6) (in world coordinates) to have the same color.
Now, I already did some research on the web about this. The OpenGL site says the fragment shader has one fragment as input and one fragment as output, which doesn't sound very promising.
I also know that all fragments are processed in parallel. But maybe it is possible to say: if the fragment at this position has not been processed yet, write this color to it and mark it as already processed.
Maybe someone can explain if this is possible somehow, and if not, why it is not a good idea. My best guess is that building this logic into the shader would have a very bad effect on general performance.
Maybe someone can explain if this is possible somehow, and if not, why it is not a good idea.
It's not a question of bad idea vs. good idea. It's simply not possible.
The closest you can get to this functionality is ARB_fragment_shader_interlock. Through its interlock and ordering guarantees, it allows limited interoperation. And that limitation is... it only allows interoperation for fragments that cover the same pixel/sample.
So even this functionality does not allow you to write to some other pixel.
The absolute best you can do is use SSBOs and atomic counters to have fragment shaders write what color values and "world coordinates" they would like to write to, then have a second process execute that buffer as either a rendering command or a compute shader to actually write that data.
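A rough sketch of the fragment-shader side of that approach (GL 4.3+; the buffer layout and binding points are made up for illustration):
layout(binding = 0) uniform atomic_uint requestCount;

struct WriteRequest {
    vec4 color;
    vec4 worldPos;   // xyz = desired position, w unused
};

layout(std430, binding = 1) buffer WriteRequests {
    WriteRequest requests[];
};

out vec4 fragColor;

void main()
{
    fragColor = vec4(1.0);   // normal output for this fragment

    // Record a write that a later pass (compute shader or draw) should perform.
    uint i = atomicCounterIncrement(requestCount);
    requests[i].color = fragColor;
    requests[i].worldPos = vec4(5.0, 3.0, 6.0, 0.0);
}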
As already pointed out in Nicol's answer, you can't write to additional fragments of a framebuffer surface in the fragment shader.
The description of your use case is not clear enough to tell what might work best. In the interest of brainstorming, the most direct approach that comes to mind is that you don't use a framebuffer draw surface at all, but output to an image instead.
If you bind a texture as an image, you can write to it in the fragment shader using the imageStore() built-in function. This function takes coordinates as one of its arguments, so you can write to any pixel you want, as well as write multiple pixels from the same shader invocation.
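For example, something along these lines (a sketch; the binding, format and the extra coordinate are arbitrary):
layout(binding = 0, rgba8) writeonly uniform image2D outImage;

void main()
{
    vec4 color = vec4(1.0, 0.0, 0.0, 1.0);
    // Write to this fragment's own pixel...
    imageStore(outImage, ivec2(gl_FragCoord.xy), color);
    // ...and to an arbitrary other pixel as well.
    imageStore(outImage, ivec2(5, 3), color);
}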
Depending on what exactly you want to achieve, I could also imagine a hybrid approach, where your primary rendering still goes to a framebuffer, but you write additional pixel values to an image at the desired positions. Then, in a second rendering pass, you can combine the content of the image with the primary rendering. The combination could be done with blending if the math/logic is simple enough. If you need a more complex combination, you can use a texture as the framebuffer attachment of the initial pass, and then use the result of the rendering and the extra image as two inputs for the fragment shader of the combination pass.
I’ve been wondering what happens when binding a depth-only FBO (only the GL_DEPTH_ATTACHMENT gets attached and glDrawBuffer(GL_NONE) is called) for the fragment shader part. Because any color is discarded:
does OpenGL simply process vertices the regular way, call the rasterizer, apply the fragment shader for rasterized fragments, but discard any result
or does it do smarter things, like process vertices until the optional geometry shader, then cut the fragment shader part and use a dummy fragment shader in order to discard useless color computations?
Because of vendor-implementation details, I guess it might vary, but I’d like to have a better idea about that topic.
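To be concrete, the setup I mean is roughly the following (a sketch; fbo and depthTex are assumed to have been created and allocated elsewhere):
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                       GL_TEXTURE_2D, depthTex, 0);
glDrawBuffer(GL_NONE);   // no color buffer is written
glReadBuffer(GL_NONE);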
In my experience, the fragment shader will still run even if it has no outputs. This can be used for example to draw shadow maps with punch-through alpha textures, using discard.
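A sketch of such a depth-only fragment shader (the texture name and alpha threshold are arbitrary):
in vec2 TexCoords;
uniform sampler2D alphaTex;

// No color outputs: only depth is written, but the shader still runs
// so it can punch holes with discard.
void main()
{
    if (texture(alphaTex, TexCoords).a < 0.5)
        discard;
}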
If it does have outputs (or more outputs than are bound), then they should just be ignored. I'd imagine that a smart driver could easily skip the fragment shader entirely if it doesn't contain any discard statements.
Also perhaps look into Separate Shader Objects (https://www.opengl.org/registry/specs/ARB/separate_shader_objects.txt). It allows you to disable the stages manually.
I've read (though never personally tested) that a complete lack of a color buffer causes strange undefined behavior, as OpenGL implementations each had to ask this question in reverse: "What /should/ we make it do when there's no color buffer?" and have no official, commonly-used-across-all-implementations answer.
The official documentation carefully avoids mentioning this situation generally.
As such, it is just recommended that you simply... not do that, and instead always have a color buffer, even if you don't use it.
Let's say I have a shader set up to use 3 textures, and I need to render some polygon that needs all the same shader attributes except that it requires only 1 texture. I have noticed on my own graphics card that I can simply call glDisableVertexAttribArray() to disable the other two textures, and that doing so apparently causes the disabled texture data received by the fragment shader to be all white (1.0f). In other words, if I have a fragment shader instruction (pseudo-code)...
final_red = tex0.red * tex1.red * tex2.red
...the operation produces the desired final value regardless whether I have 1, 2, or 3 textures enabled. From this comes a number of questions:
1. Is it legit to disable expected textures like this, or is it a coincidence that my particular graphics card has this apparent mathematical safeguard?
2. Is the "best practice" to create a separate shader program that only expects a single texture for single-texture rendering?
3. If either approach is valid, is there a benefit to creating a second shader program? I'm thinking it would cost less time to make 2 glDisableVertexAttribArray() calls than to make a glUseProgram() + 5-6 glGetUniform() calls, but maybe #4 addresses that issue.
4. When changing the active shader program with glUseProgram(), do I need to call glGetUniform... functions every time to re-establish the location of each uniform in the program, or is the location of each expected to stay consistent until the shader program is deallocated?
Disabling vertex attributes would not really disable your textures; it would just give you undefined texture coordinates. That might produce an effect similar to disabling a certain texture, but to do this properly you should use a uniform or possibly subroutines (if you have dozens of variations of the same shader).
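A rough sketch of the uniform-based approach (all names are made up):
in vec2 uv;
out vec4 finalColor;

uniform sampler2D tex0;
uniform sampler2D tex1;
uniform sampler2D tex2;
uniform int numTextures;   // set by the application per draw call

void main()
{
    vec4 c = texture(tex0, uv);
    if (numTextures > 1) c *= texture(tex1, uv);
    if (numTextures > 2) c *= texture(tex2, uv);
    finalColor = c;
}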
As far as the time taken to disable a vertex array state goes, that's probably going to be slower than changing a uniform value. Setting uniform values doesn't really affect the render pipeline state; they're just small changes to memory. Likewise, constantly swapping the current GLSL program does things like invalidate the shader cache, so that's also significantly more expensive than setting a uniform value.
If you're on a modern GL implementation (GL 4.1+ or one that implements GL_ARB_separate_shader_objects) you can even set uniform values without binding a GLSL program at all, simply by calling glProgramUniform* (...)
I am most concerned with the fact that you think you need to call glGetUniformLocation (...) each time you set a uniform's value. The only time the location of a uniform in a GLSL program changes is when you link it. Assuming you don't constantly re-link your GLSL program, you only need to query those locations once and store them persistently.
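A minimal sketch of that pattern (program and uniform names are illustrative):
// Once, right after a successful glLinkProgram(program):
GLint locTex0        = glGetUniformLocation(program, "tex0");
GLint locNumTextures = glGetUniformLocation(program, "numTextures");

// Per draw call: just set values, no re-querying needed.
glUseProgram(program);
glUniform1i(locTex0, 0);          // sample from texture unit 0
glUniform1i(locNumTextures, 1);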
Is anyone familiar with some sort of OpenGL magic for computing a bunch of pixels in the fragment shader instead of only one? This is especially relevant for OpenGL ES, given the limitations of mobile platforms and the need to do things in a more performance-conscious way there.
Are there any conclusions or ideas out there?
P.S. I know that, because of how the GPU architecture is organised, the shader runs in parallel for each fragment. But maybe there are techniques to go from processing one pixel to processing a group of pixels, or to implement your own texture organisation. A lot of work could be done faster that way within the GPU.
OpenGL does not support writing to multiple fragments (meaning with distinct coordinates) in a shader, for good reason: it would obstruct the GPU's ability to compute each fragment in parallel, which is its greatest strength.
The structure of shaders may appear weird at first because an entire program is written for only one vertex or fragment. You might wonder why you can't "see" what is going on in neighboring parts.
The reason is an instance of the shader program runs for each output fragment, on each core/thread simultaneously, so they must all be independent of one another.
Parallel, independent, processing allows GPUs to render quickly, because the total time to process a batch of pixels is only as long as the single most intensive pixel.
Adding outputs with differing coordinates greatly complicates this.
Suppose a single fragment was written to by two or more instances of a shader.
To ensure correct results, the GPU could either assign one instance to be the authority and ignore the other (but how does it know which will write?),
or it could add a mutex and have one instance wait around for the other to finish.
The other option is to allow a race condition regarding whichever one finishes first.
Either way, this would immensely slow down the process, make the shaders ugly, and introduce incorrect and unpredictable behaviour.
Well firstly, you can calculate multiple outputs from a single fragment shader in OpenGL 3 and up. A framebuffer object can have more than one RGBA surface (Renderbuffer Object) attached, and the shader can generate an RGBA value for each of them by using gl_FragData[n] instead of gl_FragColor. See chapter 8 of the 5th edition OpenGL SuperBible.
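In GLSL 3.30+ style the same thing is written with declared outputs, roughly like this (a sketch; it assumes glDrawBuffers() routes the two locations to the attached color buffers):
layout(location = 0) out vec4 color0;   // e.g. routed to GL_COLOR_ATTACHMENT0
layout(location = 1) out vec4 color1;   // e.g. routed to GL_COLOR_ATTACHMENT1

void main()
{
    color0 = vec4(1.0, 0.0, 0.0, 1.0);
    color1 = vec4(0.0, 1.0, 0.0, 1.0);
}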
However, the multiple outputs can only be generated for the same X,Y pixel coordinates in each buffer. This is for the same reason that an older style fragment shader can only generate one output, and can't change gl_FragCoord. OpenGL guarantees that in rendering any primitive, one and only one fragment shader will write to any X,Y pixel in the destination framebuffer(s).
If a fragment shader could generate multiple pixel values at different X,Y coords, it might try to write to the same destination pixel as another execution of the same fragment shader. Same if the fragment shader could change the pixel X or Y. This is the classic multiple threads trying to update shared memory problem.
One way to solve it would be to say "if this happens, the results are unpredictable", which sucks from the programmer's point of view because it's completely out of your control. Or fragment shaders would have to lock the pixels they are updating, which would make GPUs far more complicated and expensive, and the performance would suck. Or fragment shaders would execute in some defined order (e.g. top left to bottom right) instead of in parallel, which wouldn't need locks but the performance would suck even more.