OpenGL GLSL atomic counter in Vulkan

When I tried to migrate my OpenGL implementation to Vulkan, I found that 'uniform atomic_uint' is not supported in Vulkan. My use case is simple: incrementing an integer across all fragments. I searched for a solution but did not find a recent one.
Here is a list of old solutions:
https://software.intel.com/en-us/articles/opengl-performance-tips-atomic-counter-buffers-versus-shader-storage-buffer-objects. It says that an OpenGL atomic counter is similar to an SSBO atomic operation and may be implemented as SSBO atomic operations on some platforms. (Not sure whether this is still true today.)
https://community.khronos.org/t/vulkan-atomic-counters/7146. It also says to use image load/store or atomic operations on an SSBO as a replacement. (But the content is 2 years old.)
Since Vulkan is still evolving, can anyone suggest the current standard way to atomically increment an integer using GLSL in Vulkan?
Edit:
I've got my answer, but I will add more details. In my OpenGL code, I have a render pass with a vertex shader and a fragment shader (no compute shader involved). In the fragment shader, I have the following GLSL (simplified):
#version 450
layout (binding = 0) uniform atomic_uint fragmentCount;
void main()
{
    atomicCounterIncrement(fragmentCount);
}
This shader works fine in OpenGL because OpenGL has the enum 'GL_ATOMIC_COUNTER_BUFFER' for glBindBuffer and the keyword 'atomic_uint' in GLSL. However, Vulkan does not have a corresponding built-in keyword, so I was looking for a replacement. I did not ask how to query the number of fragments being rendered, though the shader here looks like I'm doing that. I was wondering whether this kind of 'atomic counter' for general graphics shaders exists in Vulkan. As Nicol Bolas pointed out, there is no such thing in Vulkan, and hardware-wise there is no dedicated implementation on NVIDIA GPUs, so I decided to use an SSBO and atomicAdd to do the same thing.
Hope this makes my problem clearer.

Atomic counters don't exist in Vulkan, so you'll have to go with one of those solutions.
BTW, atomic counters, as a distinct hardware concept, are only something that existed on AMD hardware. That is why Vulkan doesn't support them; non-AMD hardware basically emulates them as SSBO work.
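For illustration, the SSBO route could look roughly like the fragment shader below. This is only a sketch: the block name FragmentCounter, the set/binding numbers, and the output variable are assumptions, not taken from the question. It also assumes the application zeroes the buffer before the pass and that the device has the fragmentStoresAndAtomics feature enabled, which Vulkan requires for storage buffer writes and atomics in fragment shaders.

```glsl
#version 450

// Hypothetical layout: set 0, binding 0 is assumed to be a storage buffer
// holding a single uint that the application clears before the render pass.
layout(std430, set = 0, binding = 0) buffer FragmentCounter {
    uint fragmentCount;
};

layout(location = 0) out vec4 outColor;

void main()
{
    // atomicAdd returns the value stored before the addition, which matches
    // what atomicCounterIncrement returns in OpenGL GLSL.
    uint previous = atomicAdd(fragmentCount, 1u);

    // Placeholder output; the counter update above is the point of the sketch.
    outColor = vec4(1.0);
}
```

The descriptor bound at that slot would be a storage buffer rather than anything counter-specific, so the same buffer can be read back or consumed by a later pass like any other SSBO.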

Related

What is the reasoning behind OpenGL texture units as opposed to regular buffers and uniforms?

I am very new to the OpenGL API and just discovered textures and how to use them. The generated texture buffers are not bound in the same way as regular uniforms are bound and instead use glActiveTexture, followed by a bind rather than just supplying the texture to the shaders via glUniform as we do with other constants.
What is the reasoning behind this convention?
The only reason I can think of is to utilize the graphics card's full potential and texture processing capabilities instead of just binding buffers directly. Is this correct reasoning, or is it simply the way the API was implemented?
No reasoning is given on the official wiki; it just says that it's weird: "Binding textures for use in OpenGL is a little weird" https://www.khronos.org/opengl/wiki/Texture
Your question can be interpreted in two ways.
"Why do we bind textures to the context rather than to the shader?"
Because that would make it needlessly difficult to have multiple shaders use the same textures. Note that pretty much no graphics API directly attaches the texture to the program. Not D3D of any version, not Metal, nor even Vulkan.
Textures are resources used by shaders. But they are not part of the shader.
"Why do we treat textures differently from a general array of values?"
In modern OpenGL, shaders have access to several different kinds of resources: UBOs, SSBOs, textures, and images. Each of these kinds of resources ultimately represents a potentially distinct part of the graphics hardware.
A storage block is not just a uniform block that can be bigger. Storage buffers represent the shader doing global memory accesses, while uniform blocks are often copied directly into the shader execution units. In the latter case, accessing their data is much faster, but that also means that you're limited by how much storage those execution units can have.
Now, this is not true for all hardware (AMD's GCN hardware treats the two almost identically, which is why their UBO limits are so large). But it is true of much hardware.
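As a rough sketch of how the two kinds of block appear on the shader side, the fragment shader below declares one of each. All names (Material, Palette, tint, colors, colorIndex) and the binding numbers are made up for illustration.

```glsl
#version 430 core

// Uniform block: small, read-only, with an implementation-defined size limit.
layout(std140, binding = 0) uniform Material {
    vec4 tint;
};

// Storage block: backed by global memory, may end in an unsized array, and
// could also be written to if it were not marked readonly.
layout(std430, binding = 1) readonly buffer Palette {
    vec4 colors[];
};

flat in int colorIndex;   // hypothetical per-primitive index from the vertex stage
out vec4 fragColor;

void main()
{
    fragColor = tint * colors[colorIndex];
}
```

The declarations look almost the same, which is why the distinction is easy to miss; the difference is in where the hardware is allowed to keep the data.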
Textures are even more complicated, because implementations need to be able to store their data in an optimal way for performance reasons. As such, texture storage formats are opaque. They're even opaque in ostensibly low-level APIs like Vulkan. Oh sure, linear formats exist, but implementations aren't required to let you read from them at all.
So textures are not just constant arrays.
You are comparing two completely different things.
A texture object can (somewhat) be compared to a buffer object. The texture is bound to a texture unit by a combination of glActiveTexture and glBindTexture, whereas a buffer is bound by glBindBuffer, which is kind of similar.
A texture sampler uniform is a uniform in the shader and should thus be compared with other uniforms. This sampler is set by a glUniform1i call.
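On the shader side, that sampler uniform could look like the minimal sketch below. The names diffuseMap and uv are hypothetical; the value the application passes with glUniform1i is the index of the texture unit the texture was bound to, not the texture object itself.

```glsl
#version 330 core

// Hypothetical sampler. The application is assumed to call
// glUniform1i(location, 1) after binding the texture to unit 1 with
// glActiveTexture(GL_TEXTURE1) and glBindTexture.
uniform sampler2D diffuseMap;

in vec2 uv;
out vec4 fragColor;

void main()
{
    fragColor = texture(diffuseMap, uv);
}
```

From GLSL 4.20 onward the unit can instead be fixed in the shader with layout(binding = 1) uniform sampler2D diffuseMap;, which removes the need for the glUniform1i call.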

OpenGL Buffer Texture cache implementation

I've been playing a bit with OpenGL TBOs today, because they seem to be the only way to have an object shared with OpenCL that OpenCL can read/write inside one kernel (it is not an image) and that a fragment shader can read from (and they have fewer size limitations). Pretty nice!
However, after comparing the read performance on the GL side to actual 1D/2D/3D textures, I suspect that texelFetch on a gsamplerBuffer is simply an uncached global memory read, and for my application it is about 2x slower. At least on the OSX driver OpenGL 4.1 ATI-1.22.25, GLSL 4.10.
Can anybody confirm this suspicion or provide contrary findings (on other platforms?)?

What is the code limit size of a vertex or fragment shader in OpenGL 2+

I plan on writing a program that will take some parameters as input and generate its own fragment shader string, which will then be compiled, linked and used as a fragment shader (this will only be done once, at the start of the program).
I'm not an expert in computer graphics, so I don't know if this is standard practice, but I definitely think it has the potential for some interesting applications: not necessarily graphics applications, but possibly computational ones.
My question is: what is the code size limit of a shader in OpenGL, i.e. how much memory can OpenGL reasonably allocate to a program on the graphics processor?
There is no code size limit. OK, there is, but:
OpenGL doesn't give you a way to query it because:
Such a number would be meaningless, since it does not translate to anything you can directly control in GLSL.
A long GLSL shader might compile while a short shader can't. Why? Because the compiler may have been able to optimize the long shader down to size, while the short shader expanded to lots of opcodes. In short, GLSL is too high-level to be able to effectively quantify such limitations.
In any case, given the limitations of GL 2.x-class hardware, you probably won't hit any length limitations unless you're trying to do so or are doing GPGPU work.

OpenGL Rendering Modes

So far I know about immediate, display list, vertex buffer and vertex buffer object rendering. Which is the fastest? Which OpenGL version does each require? What should I use?
The best (and pretty much only) method of rendering now is to use general purpose buffers, AKA Vertex Buffer Objects. They have been in core since OpenGL 1.5, and were available before that as an extension (ARB_vertex_buffer_object). They have hardware support, which means they can be and probably will be stored directly in GPU memory.
When you load data into them, you specify the suggested usage. You can read more about it in the glBufferData manual. For example, GL_STATIC_DRAW is very similar to a static display list. This hint allows your graphics card to optimize access to the buffer.
Modern (read: non-ancient) hardware really dislikes immediate mode. I've seen a nearly 2-order-of-magnitude performance improvement by replacing immediate mode with vertex arrays.
OpenGL 3 and above support only buffer objects; all other rendering modes are deprecated.
Display lists are a serious pain to use correctly, and not worth it on non-ancient hardware.
To summarize: if you use OpenGL 3+, you have to use (V)BOs. If you target OpenGL 2, use vertex arrays or VBOs as appropriate.

Shader limitations

I've been tuning my game's renderer for my laptop, which has a Radeon HD 3850. This chip has a decent amount of processing power, but rather limited memory bandwidth, so I've been trying to move more shader work into fewer passes.
Previously, I was using a simple multipass model:
Bind and clear FP16 blend buffer (with depth buffer)
Depth-only pass
For each light, do an additive light pass
Bind backbuffer, use blend buffer as a texture
Tone mapping pass
In an attempt to improve the performance of this method, I wrote a new rendering path that counts the number and type of lights to dynamically build custom GLSL shaders. These shaders accept all light parameters as uniforms and do all lighting in a single pass. I was expecting to run into some kind of limit, so I tested it first with one light. Then three. Then twenty-one, with no errors or artifacts, and with great performance. This leads me to my actual questions:
Is the maximum number of uniforms retrievable?
Is this method viable on older hardware, or are uniforms much more limited?
If I push it too far, at what point will I get an error? Shader compilation? Program linking? Using the program?
Shader uniforms are typically implemented by the hardware as registers (or sometimes by patching the values into shader microcode directly, e.g. nVidia fragment shaders). The limit is therefore highly implementation dependent.
You can retrieve the maximums by querying GL_MAX_VERTEX_UNIFORM_COMPONENTS_ARB and GL_MAX_FRAGMENT_UNIFORM_COMPONENTS_ARB for vertex and fragment shaders respectively.
See 4.3.5 Uniform of The OpenGL® Shading Language specs:
There is an implementation dependent limit on the amount of storage for uniforms that can be used for
each type of shader and if this is exceeded it will cause a compile-time or link-time error. Uniform
variables that are declared but not used do not count against this limit.
It will fail at compile time or link time, but not when using the program.
For how to get the max number supported by your OpenGL implementation, see moonshadow's answer.
For an idea of where the limit actually is for arbitrary GPUs, I'd recommend looking at which DX version that GPU supports.
DX9 level hardware:
vs2_0 supports 256 vec4. ps2_0 supports 32 vec4.
vs3_0 is 256 vec4, ps3_0 is 224 vec4.
DX10 level hardware:
vs4_0/ps4_0 is a minimum of 4096 constants per constant buffer - and you can have 16 of them.
In short, it's unlikely you'll run out with anything that is DX10 based.
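To put those numbers in context, here is a hedged sketch of what a generated single-pass shader for 21 lights might look like. It is not the asker's actual shader: the uniform and varying names, the simple diffuse term, and the falloff formula are all assumptions. It declares 42 vec3 uniforms; on hardware that pads each vec3 to a full vec4 register, that would be 42 registers, which is over the ps2_0 figure of 32 but well under the ps3_0 figure of 224.

```glsl
#version 120

// Hypothetical constant that the shader generator would bake into the source.
#define MAX_LIGHTS 21

uniform vec3 lightPosition[MAX_LIGHTS];
uniform vec3 lightColor[MAX_LIGHTS];

varying vec3 worldPos;     // assumed to be written by the vertex shader
varying vec3 worldNormal;  // assumed to be written by the vertex shader

void main()
{
    vec3 n = normalize(worldNormal);
    vec3 total = vec3(0.0);

    // All lights are accumulated in one pass instead of one additive pass each.
    for (int i = 0; i < MAX_LIGHTS; ++i) {
        vec3 toLight = lightPosition[i] - worldPos;
        float ndotl = max(dot(n, normalize(toLight)), 0.0);
        // Simple inverse-square style falloff, chosen only for illustration.
        total += lightColor[i] * ndotl / (1.0 + dot(toLight, toLight));
    }

    gl_FragColor = vec4(total, 1.0);
}
```

If MAX_LIGHTS were raised past what the implementation can hold, the failure would be expected at compile or link time, as the spec excerpt above describes.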
I guess the maximum number of uniforms is determined by the amount of video memory,
as it's just a variable. Normal variables on the CPU are limited by your RAM too, right?