I have a compute shader with a uimage2D image and imageStore() operations on that image. I want to know what the behaviour would be if I write to locations beyond the size of the image. Would there be some wrap effect? Does the behaviour depend on the driver? Or is it undefined, so anything can happen?
According to the specification, an access to a texel which doesn't exist is treated as invalid: invalid stores have no effect, and invalid loads return zero.
See the OpenGL 4.6 API Core Profile Specification, section 8.26 "Texture Image Loads and Stores", page 193:
If the individual texel identified for an image load, store, or atomic operation doesn’t exist, the access is treated as invalid. Invalid image loads will return zero.
Invalid image stores will have no effect. Invalid image atomics will not update any texture bound to the image unit and will return zero. An access is considered invalid if:
[...]
the selected texel doesn’t exist
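As an illustration, here is a minimal compute shader sketch (written as a C string literal, as it would be passed to glShaderSource) that adds an explicit bounds guard on top of the silent-discard behaviour; the rgba8ui format, the binding point, and the workgroup size are assumptions for the example, not taken from the question:

const char *computeSrc =
    "#version 430\n"
    "layout(local_size_x = 16, local_size_y = 16) in;\n"
    "layout(rgba8ui, binding = 0) uniform uimage2D img;\n"
    "void main() {\n"
    "    ivec2 p = ivec2(gl_GlobalInvocationID.xy);\n"
    "    // Per the spec, a store to a texel that doesn't exist is simply\n"
    "    // dropped; the explicit guard below just documents the intent.\n"
    "    if (any(greaterThanEqual(p, imageSize(img)))) return;\n"
    "    imageStore(img, p, uvec4(255u));\n"
    "}\n";

In other words, the guard is optional for the correctness of stores; it mainly matters if you also do loads and don't want to rely on the all-zero result for invalid texels.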
Context
I have a fragment shader that processes a 2D image. Sometimes a pixel may be considered "invalid" (RGB value 0/0/0) for a few frames while being valid the rest of the time. This causes temporal noise, as these pixels flicker.
I'd like to implement a sort of temporal filter where each rendering loop, each pixel is "shown" (RGB value not 0/0/0) if and only if this pixel was "valid" in the last X loops, where X might be 5, 10, etc. I figured if I could have an array of the same size as the image, I could set the element corresponding to a pixel to 0 when that pixel is invalid and increment it otherwise. And if the value is >= X, then the pixel can be displayed.
Image latency caused by the temporal filter is not an issue, but I want to minimize performance costs.
The question
So that's the context. I'm looking for a mechanism that lets me read and write data across different rendering loops of the same fragment shader (uniforms are therefore out). Reading the data back from my OpenGL application is a plus but not necessary.
I came across Shader Storage Buffer Objects; would they fit my needs?
Are there other concerns I should be aware of? Performance? Coherency/memory barriers?
Yes, SSBOs are a suitable tool for persistent memory between rendering loops.
As I couldn't find a reason why it wouldn't work, I implemented it, and I was indeed able to use an SSBO as an array with each element mapped to a pixel in order to do temporal filtering on each pixel.
I had to do a few things to avoid artifacts in the image (a sketch follows this list):
Use GL_DYNAMIC_COPY when allocating the data store with glBufferData.
Declare the SSBO as volatile in the shader.
Use a barrier (memoryBarrierBuffer()) in the shader between writing and reading the SSBO.
As mentioned by @user253751 in a comment, convert the texture coordinates to array indices.
I checked the performance cost of using the SSBO and it was negligible in my case: <0.1 ms for an 848x480 frame.
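For reference, a minimal sketch of what this can look like, assuming GL 4.3+, one uint counter per pixel, and hypothetical names (uImage, uWidth, uThreshold, create_history_ssbo); it illustrates the steps above, not the exact code:

#include <stdlib.h>
#include <GL/glew.h>  /* any loader exposing GL 4.3 will do */

/* Fragment shader: the counter is reset when the pixel is invalid (0/0/0)
   and incremented otherwise; the pixel is shown once it reaches uThreshold. */
static const char *fragSrc =
    "#version 430\n"
    "layout(std430, binding = 0) volatile buffer History { uint valid[]; };\n"
    "uniform sampler2D uImage;\n"
    "uniform int uWidth;       // to flatten 2D pixel coords into an index\n"
    "uniform uint uThreshold;  // the X from the question\n"
    "out vec4 fragColor;\n"
    "void main() {\n"
    "    ivec2 p = ivec2(gl_FragCoord.xy);\n"
    "    int idx = p.y * uWidth + p.x;\n"
    "    vec3 c = texelFetch(uImage, p, 0).rgb;\n"
    "    uint n = (c == vec3(0.0)) ? 0u : valid[idx] + 1u;\n"
    "    valid[idx] = n;\n"
    "    memoryBarrierBuffer();  // separate the write from the read, as above\n"
    "    fragColor = (n >= uThreshold) ? vec4(c, 1.0) : vec4(0.0);\n"
    "}\n";

/* Host side: one zero-initialized uint per pixel, GL_DYNAMIC_COPY as above. */
static GLuint create_history_ssbo(int w, int h)
{
    GLuint ssbo;
    GLuint *zeros = calloc((size_t)w * h, sizeof(GLuint));
    glGenBuffers(1, &ssbo);
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
    glBufferData(GL_SHADER_STORAGE_BUFFER,
                 (GLsizeiptr)((size_t)w * h * sizeof(GLuint)),
                 zeros, GL_DYNAMIC_COPY);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssbo);
    free(zeros);
    return ssbo;
}

Between frames you will also want a glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT) on the host side between draw calls, so that one frame's counter writes are visible to the next.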
I'm using textures as grids: first a large texture (such as 1024x1024 or 2048x2048) is created without data, then the areas being used are filled with glTexSubImage2D calls. However, I want all pixels to have an initial value of 0xffff, not zero, and it feels wasteful to allocate megabytes of all-0xffff host memory just to initialize the texture. So is it possible to set all pixels of a texture to a specific value with just a few calls?
Specifically, is it possible in OpenGL 2.1?
There is glClearTexImage, but it was introduced in OpenGL 4.4; check whether it's available to you through the ARB_clear_texture extension.
If you're absolutely restricted to core OpenGL 2.1, allocating client memory and issuing a glTexImage2D call is the only way of doing that. In particular, you cannot even render to a texture with unextended OpenGL 2.1, so tricks like attaching the texture to a framebuffer (OpenGL 3.0+) and clearing it with glClear aren't applicable. However, a one-time allocation and initialization of a 1-16 MB texture isn't that big of a problem, even if it feels 'stupid'.
Also note that the contents of a newly created texture image are undefined; you cannot rely on it being all zeros, so you have to initialize it one way or another.
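For completeness, a sketch of both paths in C; the GLEW_ARB_clear_texture check assumes GLEW as the loader, and GL_LUMINANCE16 with GL_UNSIGNED_SHORT is just an illustrative 16-bit format matching the 0xffff value from the question:

#include <stdlib.h>
#include <string.h>
#include <GL/glew.h>

/* One-time initialization of a w*h 16-bit texture to 0xffff everywhere. */
void init_texture_to_ffff(GLuint tex, int w, int h)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    if (GLEW_ARB_clear_texture) {
        /* Fast path (GL 4.4 / ARB_clear_texture): allocate storage without
           data, then clear every texel in a single call. */
        const GLushort value = 0xffff;
        glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE16, w, h, 0,
                     GL_LUMINANCE, GL_UNSIGNED_SHORT, NULL);
        glClearTexImage(tex, 0, GL_LUMINANCE, GL_UNSIGNED_SHORT, &value);
    } else {
        /* Core GL 2.1 fallback: a one-time host allocation. memset with
           0xff sets every byte, hence every 16-bit texel, to 0xffff. */
        size_t bytes = (size_t)w * h * sizeof(GLushort);
        GLushort *pixels = malloc(bytes);
        memset(pixels, 0xff, bytes);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_LUMINANCE16, w, h, 0,
                     GL_LUMINANCE, GL_UNSIGNED_SHORT, pixels);
        free(pixels);
    }
}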
I'm doing some work with compute shaders, and I've noticed that if two invocations write to the same location in a texture using imageStore, you get a flickering effect when the texture is rendered, since the order of the writes is not guaranteed: sometimes one invocation's write lands last, and sometimes it's the other's. I would like my final colour value to be, say, the value with the highest red. Is there a way for me to determine that within the shader?
I think there was some confusion, so I'll just give some more info. I'm working with data that I've bound on the CPU as GL_UNSIGNED_BYTE, and I access it using
layout (r8, binding = 0) uniform image3D visualTexture;
At this stage, I simply want to stop the flickering, i.e. have some shader invocation take preference over the others. The highest value would be ideal, but I want this to be fast.
Image atomic operations are only permitted on single-channel 32-bit formats (r32i and r32ui; r32f is allowed only for imageAtomicExchange). So just change your data to use 32-bit integers rather than 8-bit integers, and use imageAtomicMax to write values into the image.
You could then use the 32-bit integer buffer as an intermediary, with a post-process pass that reads the 32-bit data and writes it out to an 8-bit buffer.
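A minimal sketch of what the compute side could look like with the format switched to r32ui; the workgroup size and the stand-in value computation are illustrative:

const char *computeSrc =
    "#version 430\n"
    "layout(local_size_x = 4, local_size_y = 4, local_size_z = 4) in;\n"
    "layout(r32ui, binding = 0) uniform uimage3D visualTexture;\n"
    "void main() {\n"
    "    ivec3 p = ivec3(gl_GlobalInvocationID);\n"
    "    uint value = gl_GlobalInvocationID.x & 0xFFu; // stand-in for your result\n"
    "    // Of all invocations writing to p, only the largest value survives,\n"
    "    // so the outcome is deterministic and the flickering stops.\n"
    "    imageAtomicMax(visualTexture, p, value);\n"
    "}\n";

Note that the r32ui image needs to be cleared (e.g. with glClearTexImage) before each pass so old maxima don't persist, and, as described above, a small post-process can then convert the 32-bit values back to your 8-bit texture.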
I have a working prototype that tests bindless textures. I have a camera that pans over 6 GB of textures, while I only have 2 GB of VRAM. I have an inner frustum that is used to get the list of objects in the viewport for rendering, and an outer frustum that is used to queue in (make resident) the textures that will soon be rendered; all other textures, if they are resident, are made non-resident using glMakeTextureHandleNonResidentARB.
The program runs, but the VRAM of the GPU behaves as if it has a GC step that clears VRAM at random intervals. When it does this, my rendering freezes completely, then skips to the proper frame and eventually gets back up to 60 FPS. It seems glMakeTextureHandleNonResidentARB doesn't actually pull the texture out of VRAM when it is called. Does anyone know exactly what the GPU is doing with that call?
GPU: NVIDIA GeForce GT 750M
Bindless textures essentially expose a translation table on the hardware, so that a shader can reference textures through an arbitrary integer handle rather than GL's traditional bind-to-image-unit mechanics; they don't give you direct control over GPU memory residency.
Sparse textures actually sound more like what you want. Note that both of these things can be used together.
Making a handle non-resident does not necessarily evict the texture memory from VRAM; it just removes the handle from said translation table. Eviction of the texture memory can be deferred until some future time, exactly as you have discovered.
You can read more about this in the extension specification for GL_ARB_bindless_texture.
void glMakeImageHandleResidentARB (GLuint64 handle, GLenum access):
"When an image handle is resident, the texture it references is not necessarily considered resident for the purposes of the AreTexturesResident command."
Issues:
(18) Texture and image handles may be made resident or non-resident. How
does handle residency interact with texture residency queries from
OpenGL 1.1 (glAreTexturesResident or GL_TEXTURE_RESIDENT)?
RESOLVED:
The residency state for texture and image handles in this
extension is completely independent from OpenGL 1.1's GL_TEXTURE_RESIDENT
query. Residency for texture handles is a function of whether the
glMakeTextureHandleResidentARB has been called for the handle. OpenGL 1.1
residency is typically a function of whether the texture data are
resident in GPU-accessible memory.
When a texture handle is not made resident, the texture that it refers
to may or may not be stored in GPU-accessible memory. The
GL_TEXTURE_RESIDENT query may return GL_TRUE in this case. However, it does
not guarantee that the texture handle may be used safely.
When a texture handle is made resident, the texture that it refers to is
also considered resident for the purposes of the old GL_TEXTURE_RESIDENT
query. When an image handle is resident, the texture that it refers to
may or may not be considered resident for the query -- the resident
image handle may refer only to a single layer of a single mipmap level
of the full texture.
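To make the pattern concrete, here is a C sketch of per-object residency management driven by the outer frustum; the Material struct and the in_outer_frustum flag are assumptions about your scene code, not part of the extension:

#include <stdbool.h>
#include <GL/glew.h>

typedef struct {
    GLuint64 handle;   /* obtained once via glGetTextureHandleARB */
    bool     resident;
} Material;

void update_residency(Material *m, bool in_outer_frustum)
{
    if (in_outer_frustum && !m->resident) {
        glMakeTextureHandleResidentARB(m->handle);   /* may trigger an upload */
        m->resident = true;
    } else if (!in_outer_frustum && m->resident) {
        /* Detaches the handle from the translation table only; the driver
           may keep the backing memory around and evict it later, which is
           the deferred "GC-like" behaviour observed in the question. */
        glMakeTextureHandleNonResidentARB(m->handle);
        m->resident = false;
    }
}

If you need explicit control over which parts of a texture are actually backed by memory, sparse textures (ARB_sparse_texture and glTexPageCommitmentARB) provide that, as mentioned above.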
I have a rendering setup in which I write to a framebuffer object with a texture attached, and the rendering itself also samples from the texture I'm rendering to.
Is this generally a good idea? Could there be some strange issues involved here I'm perhaps not aware of?
This results in undefined behavior, which means it may break with any future driver version and behave differently on different hardware. To be on the safe side, you should never render into a texture which is currently bound for sampling (i.e. one that could be read and written at the same time; that is the actual problem). Try making a copy of the texture and rendering into that instead.
Take a look at the spec, specifically section 4.4.3, "Rendering When an Image of a Bound Texture Object is Also Attached to the Framebuffer".
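A sketch of the suggested workaround, assuming GL 4.3+ for glCopyImageSubData; fbo, srcTex, copyTex and the sizes are placeholders:

#include <GL/glew.h>

/* Snapshot the texture, then sample only from the snapshot while
   rendering into the original, so no texel is read and written at once. */
void draw_without_feedback(GLuint fbo, GLuint srcTex, GLuint copyTex,
                           int w, int h)
{
    glCopyImageSubData(srcTex, GL_TEXTURE_2D, 0, 0, 0, 0,
                       copyTex, GL_TEXTURE_2D, 0, 0, 0, 0,
                       w, h, 1);

    glBindFramebuffer(GL_FRAMEBUFFER, fbo);   /* fbo has srcTex attached */
    glBindTexture(GL_TEXTURE_2D, copyTex);    /* shader samples the copy */
    /* ... issue draw calls here ... */
}

An equivalent and often cheaper pattern is ping-ponging between two textures, swapping the roles of source and render target each frame.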