If using alpha-to-coverage without explicitly setting the samples from the shader (a hardware 4.x feature?), is the coverage mask for alpha value 'a' then guaranteed to be the bit-flip of the coverage mask for alpha value '1.f - a'?
Or in other words: if I render two objects in the same location, and the pixel alphas of the two objects sum to 1.0, is it then guaranteed that all samples of the pixel get written to (assuming both objects fully cover the pixel)?
The reason I ask is that I want to crossfade two objects, and during the crossfade each object should still depth-sort properly with respect to itself (without interacting with the depth values of the other object and without becoming "see-through").
If not, how can I realize such a "perfect" crossfade in a single render pass?
The logic for alpha-to-coverage computation is required to have the same invariance and proportionality guarantees as GL_SAMPLE_COVERAGE (which allows you to specify a floating-point coverage value applied to all fragments in a given rendering command).
However, said guarantees are not exactly specific:
It is intended that the number of 1’s in this value be proportional to the sample coverage value, with all 1’s corresponding to a value of 1.0 and all 0’s corresponding to 0.0.
Note the use of the word "intended" rather than "required". The spec is deliberately super-fuzzy on all of this.
Even the invariance is really fuzzy:
The algorithm can and probably should be different at different pixel locations. If it does differ, it should be defined relative to window, not screen, coordinates, so that rendering results are invariant with respect to window position.
Again, note the word "should". There are no actual requirements here.
So basically, the answer to all of your questions is "the OpenGL specification provides no guarantees for that".
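For reference, here is a minimal sketch of the two mechanisms under discussion, written against the plain C GL API (the 0.5f value is just an example coverage):

// Alpha-to-coverage: each fragment's alpha is converted into a coverage mask.
glEnable(GL_SAMPLE_ALPHA_TO_COVERAGE);

// GL_SAMPLE_COVERAGE: a single floating-point coverage value applied to all
// fragments of a rendering command; the second argument optionally inverts the mask.
glEnable(GL_SAMPLE_COVERAGE);
glSampleCoverage(0.5f, GL_FALSE);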
That being said, the general thrust of your question suggests that you're trying to (ab)use multisampling to do cross-fading between two overlapping things without having to do a render-to-texture operation. That's just not going to work well, even if the standard actually guaranteed something about the alpha-to-coverage behavior.
Basically, what you're trying to do is dither-based transparency implemented through multisampling. But as with standard dithering methods, the quality depends entirely on the number of samples. A 16x multisample buffer (which is a huge amount of multisampling) would only give you an effective 16 levels of cross-fade, which would make any animated fading effect visibly steppy rather than smooth.
And the cost of doing 16x multisampling is going to be substantially greater than the cost of doing render-to-texture cross-fading. Both in terms of rendering time and memory overhead (16x multisample buffers are gigantic).
If not, how can I realize such a "perfect" crossfade in a single render pass?
You can't; not in the general case. Rasterizers accumulate values, with new pixels doing math against the accumulated value of all of the prior values. You want to have an operation do math against a specific previous operation, then combine those results and blend against the rest of the previous operations.
That's simply not the kind of math a rasterizer does.
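If you do go the render-to-texture route, the final composite is cheap. A minimal sketch, assuming texA and texB already contain the two objects rendered separately (each depth-tested against itself), fade is the cross-fade factor in [0, 1], and drawFullscreenQuadWithTexture() is a hypothetical helper:

// Composite: result = (1 - fade) * A + fade * B, via constant-alpha blending.
drawFullscreenQuadWithTexture(texA);   // hypothetical helper, draws layer A as-is

glEnable(GL_BLEND);
glBlendColor(0.0f, 0.0f, 0.0f, fade);
glBlendFunc(GL_CONSTANT_ALPHA, GL_ONE_MINUS_CONSTANT_ALPHA);
drawFullscreenQuadWithTexture(texB);   // layer B, weighted by 'fade'
glDisable(GL_BLEND);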
The whole question is in the title.
I really wonder why this extension uses a float type instead of an int. I know that an arbitrary value works as long as you don't exceed the maximum supported value. But I always thought this value should be an integer multiple of 2.
Because there is no reason not to use a float. The anisotropy more or less represents the aspect ratio of the pixel (or filter) footprint in texture space. (In the most general case of a perspective projection, the pixel footprint would be a trapezoid in texture space, and a single aspect ratio will only be an approximation, as if it were a parallelogram, but it is good enough in practice.) So a non-integral value for the anisotropy is mathematically completely sane.
The GL_EXT_texture_filter_anisotropic extension completely leaves the actual implementation details open to the implementor:
The particular scheme for anisotropic texture filtering is
implementation dependent. Additionally, implementations are free
to consider the current texture minification and magnification modes
to control the specifics of the anisotropic filtering scheme used.
So one could actually come up with an implementation where a fractional aniso setting makes a difference.
But I always thought this value should be an integer multiple of 2.
In practice, most GPUs do use some scheme where the limit actually is a power of two. If you have an aniso factor x=2^i and would access mipmap level m, the filter can basically take x samples from (higher resolution) mipmap level m-i instead.
Also note that it is explicitly allowed to set or query the aniso settings via gl[Get]TexParameteri*(), so you actually can use completely integer-based settings in your code and pretend that the float parameter simply doesn't exist.
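For example (a minimal sketch; the enums come from GL_EXT_texture_filter_anisotropic and 8 is just an example setting):

GLint maxAniso = 1;
glGetIntegerv(GL_MAX_TEXTURE_MAX_ANISOTROPY_EXT, &maxAniso);  // typically 16

// Set the per-texture anisotropy with the integer entry point.
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_ANISOTROPY_EXT,
                maxAniso < 8 ? maxAniso : 8);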
I'd like to enumerate those general, fundamental circumstances under which multi-pass rendering becomes an unavoidable necessity, as opposed to keeping everything within the same shader program. Here's what I've come up with so far.
When a result requires non-local fragment information (i.e. context) around the current fragment, e.g. for box filters, then a previous pass must have supplied this;
When a result needs hardware interpolation done by a prior pass;
When a result acts as a pre-cache of some set of calculations, enabling substantially better performance than simply (re-)working through the entire set of calculations in every pass that uses them, e.g. transforming each fragment of the depth buffer in a particular and costly way, which multiple later-pass shaders can then share rather than each repeating those calculations. So: calculate once, use more than once (a minimal sketch of this pattern follows the list).
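A minimal sketch of that last pattern, assuming a pre-pass renders its expensive intermediate result into a texture through an FBO that later passes then sample (all names here are illustrative only):

// Pass 1: compute the expensive per-pixel result once, into prepassTex.
glBindFramebuffer(GL_FRAMEBUFFER, prepassFBO);   // FBO with prepassTex attached
glUseProgram(expensivePrepassProgram);           // hypothetical shader program
drawScene();                                     // hypothetical draw routine

// Later passes: sample prepassTex instead of recomputing per fragment.
glBindFramebuffer(GL_FRAMEBUFFER, 0);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, prepassTex);
glUseProgram(lightingProgram);                   // its fragment shader samples prepassTex
drawFullscreenQuad();                            // hypothetical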
I note from my own (naive) deductions above that vertex and geometry shaders don't really seem to come into the picture of deferred rendering, and so are probably usually done in the first pass; to me this seems sensible, but either affirmation or negation of this, with detail, would be of interest.
P.S. I am going to leave this question open to gather good answers, so don't expect quick wins!
Nice topic. Since I'm a beginner, the main reason I can give is avoiding the unnecessary calculations in the pixel/fragment shader that you get when you use forward rendering.
With forward rendering you have to do a pass for every light you have in your scene, even if the pixel colors aren't affected.
But that's just a comparison between forward rendering and deferred rendering.
As for keeping everything in the same shader program, the simplest point I can think of is that you aren't restricted to some fixed number N of lights in your scene, since in GLSL, for instance, you can use either separate lights or store them in a uniform array. Then again, you can also use forward rendering, but if you have a lot of lights in your scene, the forward-rendering pixel/fragment shader becomes too expensive.
That's all I really know so I would like to hear other theories as well.
Deferred / multi-pass approaches are used when the contents of the depth buffer (produced by first rendering the basic geometry) are needed in order to produce complex pixel / fragment shading effects based on depth, such as:
Edge / silhouette detection
Lighting
And also application logic:
GPU picking, which requires the depth buffer for ray calculation, and uniquely-coloured / ID'ed geometries in another buffer to identify "who" was hit (a minimal read-back sketch follows).
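A minimal read-back sketch for the picking case, assuming the IDs were rendered into a colour attachment of a hypothetical pickFBO and that mouseX/mouseY are already in window coordinates with a bottom-left origin:

glBindFramebuffer(GL_READ_FRAMEBUFFER, pickFBO);
glReadBuffer(GL_COLOR_ATTACHMENT0);    // the attachment holding per-object IDs

GLubyte id[4] = {0};
glReadPixels(mouseX, mouseY, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE, id);
// Decode the colour back into an object handle, e.g. id[0] | (id[1] << 8) | (id[2] << 16).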
I am interested in getting the maximum hardware-supported resolution for textures.
There are, as far as I have found, two mechanisms for doing something related to this:
glGetIntegerv(GL_MAX_TEXTURE_SIZE,&dim) for 2D (and cube?) textures has served me well. For 3D textures, I discovered (the hard way) that you need to use GL_MAX_3D_TEXTURE_SIZE instead. As far as I can tell, these return the maximum resolution along one side, with the other sides assumed to be the same.
It is unclear what these values actually represent. The values returned by glGetIntegerv(...) are to be considered "rough estimate"s, according to the documentation, but it's unclear whether they are conservative underestimates, best guesses, or best-cases. Furthermore, it's unclear whether these are hardware limitations or current limitations based on the amount of available graphics memory.
The documentation instead suggests using . . .
GL_PROXY_TEXTURE_(1|2|3)D/GL_PROXY_TEXTURE_CUBE_MAP. The idea here is you make a proxy texture before you make your real one. Then, you check to see whether the proxy texture was okay by checking the actual dimensions it got. For 3D textures, that would look like:
glGetTexLevelParameteriv(GL_PROXY_TEXTURE_3D, 0, GL_TEXTURE_WIDTH, &width);
glGetTexLevelParameteriv(GL_PROXY_TEXTURE_3D, 0, GL_TEXTURE_HEIGHT, &height);
glGetTexLevelParameteriv(GL_PROXY_TEXTURE_3D, 0, GL_TEXTURE_DEPTH, &depth);
If all goes well, the dimensions returned will be nonzero (and presumably the same as what you requested). Then you make the texture for real (the proxy itself is only state maintained by the implementation, so there is nothing to delete).
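Put together, a minimal version of the whole 3D proxy check might look like this (GL_RGBA8 and the requested dimensions are placeholders, not a recommendation):

GLsizei w = 512, h = 512, d = 512;               // desired size, example values
glTexImage3D(GL_PROXY_TEXTURE_3D, 0, GL_RGBA8, w, h, d, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, NULL);   // proxies never receive data

GLint gotWidth = 0;
glGetTexLevelParameteriv(GL_PROXY_TEXTURE_3D, 0, GL_TEXTURE_WIDTH, &gotWidth);
if (gotWidth != 0) {
    // The implementation claims this size works; now allocate the real texture.
    glTexImage3D(GL_TEXTURE_3D, 0, GL_RGBA8, w, h, d, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);
}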
Some older sources state that proxy textures give outright wrong answers, but that may not be true today.
So, for modern OpenGL (GL 4.* is fine), what is the best way to get the maximum hardware-supported resolution for 1D-, 2D-, 3D-, and cube-textures?
There is a separate value for cube maps, which is queried with GL_MAX_CUBE_MAP_TEXTURE_SIZE. So the limits are:
GL_MAX_TEXTURE_SIZE: Maximum size for GL_TEXTURE_1D and GL_TEXTURE_2D.
GL_MAX_RECTANGLE_TEXTURE_SIZE: Maximum size for GL_TEXTURE_RECTANGLE.
GL_MAX_CUBE_MAP_TEXTURE_SIZE: Maximum size for GL_TEXTURE_CUBE_MAP.
GL_MAX_3D_TEXTURE_SIZE: Maximum size for GL_TEXTURE_3D.
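Querying them is one glGetIntegerv call each, for example:

GLint max2D = 0, maxRect = 0, maxCube = 0, max3D = 0;
glGetIntegerv(GL_MAX_TEXTURE_SIZE,           &max2D);
glGetIntegerv(GL_MAX_RECTANGLE_TEXTURE_SIZE, &maxRect);
glGetIntegerv(GL_MAX_CUBE_MAP_TEXTURE_SIZE,  &maxCube);
glGetIntegerv(GL_MAX_3D_TEXTURE_SIZE,        &max3D);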
The "rough estimate" language you found on the man pages seems unfortunate. If you look at the much more relevant spec document instead, it talks about the "maximum allowable width and height", or simply says that it's an error to use a size larger than these limits.
These limits represent the maximum sizes supported by the hardware, or more precisely, the advertised hardware limit. It is of course legal for an implementation to restrict the limit below what the hardware could actually support, as long as the advertised limit is applied consistently. Picture it this way: the hardware can only manage/sample textures up to a given size, and that is the size reported by these limits.
These limits have nothing to do with the amount of memory available, so staying within these limits is absolutely no guarantee that a texture of the size can successfully be allocated.
I believe the intention of proxy textures is to let you check what size can actually be allocated. I don't know whether that works reliably on any platform; the mechanism really is not a good fit for how modern GPUs manage memory, and I have never used proxy textures or dealt with implementing them. I would definitely expect significant platform/vendor dependencies in how exactly they operate, so you should probably test whether they give you the desired results on the platforms you care about.
The values returned by glGetIntegerv() for GL_MAX_TEXTURE_SIZE and GL_MAX_3D_TEXTURE_SIZE are the correct limits for the particular implementation.
It is unclear what these values actually represent. The values
returned by glGetIntegerv(...) are to be considered "rough estimate"s,
according to the documentation, but it's unclear whether they are
conservative underestimates, best guesses, or best-cases.
What kind of documentation are you referring to? The GL spec is very clear on the meaning of those values, and they are not estimates of any kind.
The proxy method should work too, but it does not directly allow you to query the limits. You could of course use a binary search to narrow down the exact limit via the proxy texture path, but that is a rather clumsy approach.
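For what it's worth, a sketch of that binary search over the proxy path could look like this (GL_RGBA8 is a placeholder; probe with the internal format you actually intend to use):

GLint lo = 1, hi = 0;
glGetIntegerv(GL_MAX_3D_TEXTURE_SIZE, &hi);      // start from the advertised limit

while (lo < hi) {
    GLint mid = lo + (hi - lo + 1) / 2;
    glTexImage3D(GL_PROXY_TEXTURE_3D, 0, GL_RGBA8, mid, mid, mid, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    GLint w = 0;
    glGetTexLevelParameteriv(GL_PROXY_TEXTURE_3D, 0, GL_TEXTURE_WIDTH, &w);
    if (w != 0) lo = mid;                        // mid x mid x mid is claimed to be supported
    else        hi = mid - 1;
}
// lo now holds the largest cube edge length the proxy path accepts.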
I'm drawing several alpha-blended triangles that overlap with a single glDrawElements call.
The indices list the triangles back to front and this order is important for the correct visualization.
Can I rely on the result of this operation being exactly the same as when drawing the triangles in the same order with distinct draw calls?
I'm asking this because I'm not sure whether some hardware would make some kind of an optimization and use the indices only for the information about the primitives that are drawn and disregard the actual primitive order.
To second GuyRT's answer, I looked through the GL4.4 core spec:
glDrawElements is described as follows (emphasis mine):
This command constructs a sequence of geometric primitives by
successively transferring elements for count vertices to the GL.
In section 2.1, one can find the following statement (emphasis mine):
Commands are always processed in the order in which they are received,
[...] This means, for example, that one primitive must be drawn
completely before any subsequent one can affect the framebuffer.
One might read this as being valid only for primitives rendered through different draw calls (commands); however, in section 7.12.1, there is some further confirmation for the more general reading of that statement (again, my emphasis):
The relative order of invocations of the same shader type are
undefined. A store issued by a shader when working on primitive B
might complete prior to a store for primitive A, even if primitive A
is specified prior to primitive B. This applies even to fragment
shaders; while fragment shader outputs are written to the framebuffer
in primitive order, stores executed by fragment shader invocations are
not.
Yes, you can rely on the order being the same as specified in the index array, and that fragments will be correctly blended with the results of triangles specified earlier in the array.
I cannot find a reference for this, but my UI rendering code relies on this behaviour (and I think it is a common technique).
To my knowledge OpenGL makes no statement about the order of triangles rendered within a single draw call of any kind. It would be counterproductive of it to do so, because it would place undesirable constraints on implementations.
Consider that modern rendering hardware is almost always multi-processor, so the individual triangles from a draw call are almost certainly being rendered in parallel. If you need to render in a particular order for alpha blending purposes, you need to break up your geometry. Alternatively you could investigate the variety of order independent transparency algorithms out there.
I've been reading through the OpenGL specification trying to find an answer to this question, without luck. I'm trying to figure out whether OpenGL guarantees that draw calls such as glDrawElements or glDrawArrays will draw elements in precisely the order they appear in the VBO, or if it is free to process the fragments of those primitives in any order.
For example, say I have a vertex buffer with 30 vertices representing 10 triangles, each with the same coordinates. Will it always be the case that the triangle corresponding to vertices 0, 1 and 2 is rendered first (and therefore ends up on the bottom), and the triangle corresponding to vertices 27, 28 and 29 is rendered last (and therefore ends up on top)?
The specification is very careful to define an order for the rendering of everything. Arrays of vertex data are processed in order, which results in the generation of primitives in a specific order. Each primitive is said to be rasterized in order, and later primitives cannot be rasterized until prior ones have finished.
Of course, this is all how OpenGL says it should behave. Implementations can (and do) cheat by rasterizing and processing multiple primitives at once. However, they will still obey the "as if" rule. So they cheat internally, but will still write the results as if it had all executed sequentially.
So yes, there's a specific order you can rely upon. Unless you're using shaders that perform incoherent memory accesses; then all bets are off for shader writes.
Although they may actually be drawn in a different order and finish at different times, at the final raster-operation pipeline stage any blending (or depth/stencil/alpha test, for that matter) is done in the order in which the triangles were issued.
You can confirm this by rendering some object using a blending equation that doesn't commute, for example:
glBlendFunc(GL_ONE, GL_DST_COLOR);
If the final framebuffer contents were written in the same arbitrary order in which the primitives may actually be drawn, then in such an example you would see an effect that looks similar to Z-fighting.
This is also why it's called a fragment shader (as opposed to a pixel shader): its output is not a pixel yet, because after the fragment stage it is not written to the framebuffer immediately, only after the raster-operation stage.