I have a very simple question:
It seems that accessing a sampler1D via texture1D() is slower than accessing a sampler2D via texture2D(). Is that correct?
As with most performance matters, the performance of texture fetching operations is platform-dependent.
Related
I am currently building an application in Vulkan where I will be sampling a lot of data from a buffer. I will be using as much storage as possible, but sampling speed is also important. My data is in the form of a 2D array of 32-bit integers. I can either upload it as a texture and use a texture sampler for it, or as a storage buffer. I read that storage buffers are generally slow, so I was considering using the image sampler to read my data in a fragment shader. I would have to disable mipmapping and filtering, and convert UV coordinates to array indices, but if it's faster I think it might be worth it.
My question is: would it generally be worth it to store my data in an image and read it through a sampler, or should I do the obvious thing and use a storage buffer? What are the pros/cons of each approach?
Guarantees about performance do not exist.
But the Vulkan API tries not to deceive you. The obvious way is likely the right way.
If you want to sample, then sample. If you want raw access, then do raw access. Generally, you should not force a square peg into a round hole.
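For concreteness, here is a minimal Vulkan-GLSL fragment-shader sketch of the two options; the set/binding numbers, block and variable names, and the push-constant width are all illustrative assumptions:

    #version 450

    // Option A: raw access through a storage buffer.
    layout(std430, set = 0, binding = 0) readonly buffer Data {
        uint values[];               // the 2D array, flattened row-major
    };
    layout(push_constant) uniform Push { uint width; } pc;

    // Option B: the same data as an unfiltered 32-bit integer texture.
    layout(set = 0, binding = 1) uniform usampler2D dataTex;

    layout(location = 0) out uvec4 outValue;

    uint loadFromBuffer(ivec2 p)  { return values[p.y * int(pc.width) + p.x]; }
    uint loadFromTexture(ivec2 p) { return texelFetch(dataTex, p, 0).r; }

    void main() {
        ivec2 p = ivec2(gl_FragCoord.xy);
        outValue = uvec4(loadFromBuffer(p), loadFromTexture(p), 0u, 1u);
    }

Note that texelFetch takes integer texel coordinates directly, so the UV-to-index conversion mentioned in the question is not needed for unfiltered reads.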
I am very new to the OpenGL API and just discovered textures and how to use them. Generated texture objects are not bound in the same way as regular uniforms are: they use glActiveTexture followed by a bind, rather than just being supplied to the shaders via glUniform as we do with other constants.
What is the reasoning behind this convention?
The only reason I can think of is to utilize the graphics card's full potential and texture processing capabilities instead of just binding buffers directly. Is this correct reasoning, or is it simply the way the API was implemented?
No reasoning is given on the official wiki; it just says that it's weird: "Binding textures for use in OpenGL is a little weird" https://www.khronos.org/opengl/wiki/Texture
Your question can be interpreted in two ways.
"Why do we bind textures to the context rather than to the shader?"
Because that would make it needlessly difficult to have multiple shaders use the same textures. Note that pretty much no graphics API directly attaches the texture to the program. Not D3D of any version, not Metal, nor even Vulkan.
Textures are resources used by shaders. But they are not part of the shader.
"Why do we treat textures differently from a general array of values?"
In modern OpenGL, shaders have access to several different kinds of resources: UBOs, SSBOs, textures, and images. Each of these kinds of resources ultimately represents a potentially distinct part of the graphics hardware.
A storage block is not just a uniform block that can be bigger. Storage buffers represent the shader doing global memory accesses, while uniform blocks are often copied directly into the shader execution units. In the latter case, accessing their data is much faster, but that also means that you're limited by how much storage those execution units can have.
Now, this is not true for all hardware (AMD's GCN hardware treats the two almost identically, which is why their UBO limits are so large). But it is true of much hardware.
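To make the distinction concrete, here is a minimal GLSL compute-shader sketch of the two kinds of blocks; the block names, bindings, and sizes are illustrative:

    #version 430

    // Uniform block: small and fixed-size; often preloaded into
    // registers or dedicated constant storage on the execution units.
    layout(std140, binding = 0) uniform LightParams {
        vec4 lightPositions[32];
    };

    // Storage block: backed by global memory, so it can be far larger
    // (and even written from the shader), but accesses go through the
    // ordinary memory hierarchy.
    layout(std430, binding = 1) buffer ParticleData {
        vec4 velocities[];  // unsized: the bound buffer's size decides
    };

    layout(local_size_x = 64) in;
    void main() {
        // Touch both blocks so the example is complete.
        velocities[gl_GlobalInvocationID.x] += lightPositions[0];
    }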
Textures are even more complicated, because implementations need to be able to store their data in an optimal way for performance reasons. As such, texture storage formats are opaque. They're even opaque in ostensibly low-level APIs like Vulkan. Oh sure, linear formats exist, but implementations aren't required to let you read from them at all.
So textures are not just constant arrays.
You are comparing two completely different things.
A texture object can (loosely) be compared to a buffer object. The texture is bound to a texture unit by a combination of glActiveTexture and glBindTexture, whereas a buffer is bound by glBindBuffer, which is somewhat similar.
A texture sampler uniform is a uniform in the shader and should thus be compared with other uniforms. This sampler is set by a glUniform1i call.
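Putting the two together, here is a minimal C sketch (the function, uniform name, and unit number are illustrative; a current GL context and a loader such as glad or GLEW are assumed). The key point is that glUniform1i receives the texture unit index, not the texture object's name:

    /* Make `program` sample `texture` through texture unit 3. */
    void bind_diffuse_map(GLuint program, GLuint texture)
    {
        glActiveTexture(GL_TEXTURE3);          /* select texture unit 3 */
        glBindTexture(GL_TEXTURE_2D, texture); /* attach the texture to it */

        glUseProgram(program);
        /* The sampler uniform gets the unit index (3), not `texture`. */
        glUniform1i(glGetUniformLocation(program, "uDiffuseMap"), 3);
    }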
Does utilizing textureOffset(...) increase performance compared to calculating offsets manually and using regular texture(...) function?
Since there is a GL_MAX_PROGRAM_TEXEL_OFFSET property, I would guess that it can fetch the offset texels in a single fetch, or at least as few as possible, which would make it superb for, for example, blurring effects. But I can't seem to find out anywhere how it works internally.
Update:
Reformulating the question: is it common for GL drivers to make any optimizations regarding texture fetches when the textureOffset(...) function is used?
You're asking the wrong question. The question should not be whether the more specific function will always have better performance. The question is whether the more specific function will ever be slower.
And there's no reason to expect it to be slower. If the hardware has no specialized functionality for offset texture accesses, then the compiler will just offset the texture coordinate manually, exactly like you could. If there is hardware to help, then it will use it.
So if you have need of textureOffset and can live within its limitations, there's no reason not to use it.
I would guess that it can fetch the offset texels in a single fetch, or at least as few as possible, which would make it superb for, for example, blurring effects
No, that's textureGather. textureOffset does exactly what its name says: it accesses a texture based on a texture coordinate, with a texel offset from that coordinate's location.
textureGather samples from multiple neighboring texels all at once. If you need to read a section of a texture to do blurring, textureGather (and textureGatherOffset) are going to be more useful than textureOffset.
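To illustrate the difference, a small GLSL 4.00 sketch (the sampler name, varying, and weights are illustrative):

    #version 400 core

    uniform sampler2D tex;
    in vec2 uv;
    out vec4 color;

    void main() {
        // textureOffset: one filtered sample, shifted from uv by a
        // compile-time-constant texel offset.
        vec4 right = textureOffset(tex, uv, ivec2(1, 0));

        // textureGather: the four texels that bilinear filtering at uv
        // would touch, returned unfiltered (their red channels here).
        vec4 reds = textureGather(tex, uv, 0);

        // Arbitrary weights, just to combine both results.
        color = 0.5 * right + vec4(0.125 * (reds.x + reds.y + reds.z + reds.w));
    }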
So far I know about immediate, display list, vertex buffer and vertex buffer object rendering. Which is the fastest? Which OpenGL version does each require? What should I use?
The best (and pretty much only) method of rendering now is to use general-purpose buffers, AKA Vertex Buffer Objects. They have been in core since OpenGL 1.5, and appeared before that as the ARB_vertex_buffer_object extension. They have hardware support, which means they can be, and probably will be, stored directly in GPU memory.
When you load data into them, you specify the suggested usage. You can read more about it in the glBufferData manual. For example, GL_STATIC_DRAW is something very similar to a static display list. This allows your graphics card to optimize access to them.
Modern (read: non-ancient) hardware really dislikes immediate mode. I've seen a nearly 2-order-of-magnitude performance improvement by replacing immediate mode with vertex arrays.
The OpenGL 3+ core profile supports only buffer objects; all the other rendering modes are deprecated.
Display lists are a serious pain to use correctly, and not worth it on non-ancient hardware.
To summarize: if you use OpenGL 3+, you have to use (V)BOs. If you target OpenGL 2, use vertex arrays or VBOs as appropriate.
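For reference, a minimal C sketch of the VBO path (assumes a current OpenGL 2.0+ context with a function loader, and a shader whose attribute 0 takes a vec2 position; the vertex data is illustrative):

    static const GLfloat verts[] = {
        -0.5f, -0.5f,
         0.5f, -0.5f,
         0.0f,  0.5f,
    };

    GLuint vbo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    /* GL_STATIC_DRAW: written once, drawn many times, so the driver
     * is free to keep the data in GPU memory. */
    glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);

    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (const void *)0);
    glDrawArrays(GL_TRIANGLES, 0, 3);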
I've been tuning my game's renderer for my laptop, which has a Radeon HD 3850. This chip has a decent amount of processing power, but rather limited memory bandwidth, so I've been trying to move more shader work into fewer passes.
Previously, I was using a simple multipass model:
Bind and clear FP16 blend buffer (with depth buffer)
Depth-only pass
For each light, do an additive light pass
Bind backbuffer, use blend buffer as a texture
Tone mapping pass
In an attempt to improve the performance of this method, I wrote a new rendering path that counts the number and type of lights to dynamically build custom GLSL shaders. These shaders accept all light parameters as uniforms and do all lighting in a single pass. I was expecting to run into some kind of limit, so I tested it first with one light. Then three. Then twenty-one, with no errors or artifacts, and with great performance. This leads me to my actual questions:
Is the maximum number of uniforms retrievable?
Is this method viable on older hardware, or are uniforms much more limited?
If I push it too far, at what point will I get an error? Shader compilation? Program linking? Using the program?
Shader uniforms are typically implemented by the hardware as registers (or sometimes by patching the values into shader microcode directly, e.g. nVidia fragment shaders). The limit is therefore highly implementation dependent.
You can retrieve the maximums by querying GL_MAX_VERTEX_UNIFORM_COMPONENTS_ARB and GL_MAX_FRAGMENT_UNIFORM_COMPONENTS_ARB for vertex and fragment shaders respectively.
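In C that looks like the following sketch (using the core-profile names for the same enums; a current GL 2.0+ context is assumed):

    /* The limits are measured in float-sized components, so one vec4
     * uniform costs four components. */
    GLint maxVertexComponents = 0, maxFragmentComponents = 0;
    glGetIntegerv(GL_MAX_VERTEX_UNIFORM_COMPONENTS, &maxVertexComponents);
    glGetIntegerv(GL_MAX_FRAGMENT_UNIFORM_COMPONENTS, &maxFragmentComponents);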
See section 4.3.5 Uniform of The OpenGL® Shading Language spec:
There is an implementation dependent limit on the amount of storage for uniforms that can be used for each type of shader and if this is exceeded it will cause a compile-time or link-time error. Uniform variables that are declared but not used do not count against this limit.
It will fail at compile or link time, not when using the program.
For how to get the max number supported by your OpenGL implementation, see moonshadow's answer.
For an idea of where the limit actually is on arbitrary GPUs, I'd recommend looking at which DX version the GPU supports.
DX9 level hardware:
vs2_0 supports 256 vec4. ps2_0 supports 32 vec4.
vs3_0 is 256 vec4, ps3_0 is 224 vec4.
DX10 level hardware:
vs4_0/ps4_0 supports a minimum of 4096 constants per constant buffer, and you can have 16 of them.
In short, it's unlikely you'll run out with anything that is DX10-based.
I guess the maximum number of uniforms is determined by the amount of video memory, as it's just a variable. Normal variables on the CPU are limited by your RAM too, right?