I have a texture of vec4s that is being modified by a compute shader.
Different invocations of the compute shader modify different components of the same vector, and this seems to be causing concurrency problems, since my current method for doing so is:
vec4 texelData = imageLoad(Texture, texCoords);
//Do operations on individual components of texelData based on the invocation id
imageStore(Texture, texCoords, texelData);
I imagine what happens here is that different invocations each read the original state of the texel (all zeros), write their own component to it, then store it, which means only the component modified by the last invocation to finish is present at the end.
So I'm looking into using imageAtomicExchange, which should do this atomically and therefore eliminate the concurrency problems; however, I cannot get it to work.
The spec says that the arguments are:
The image2D to write to
The ivec2 coordinates in the image to write to
A float, which I don't understand.
Would it not be a vec4 as that is what is present at each texel? And if not shouldn't there be another argument or an extra dimension of the coordinate vector to specify which component of the texel to swap?
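For reference, a minimal sketch of how the float form of imageAtomicExchange is typically used; image atomics operate on a single 32-bit component per texel, so this assumes the image is declared with a single-channel r32f format rather than as a vec4 texture (all names here are illustrative):

layout(r32f, binding = 0) coherent uniform image2D SingleChannelTex;

void exchangeExample(ivec2 texCoords, float newValue)
{
    // Atomically swaps in newValue and returns the previous contents of that texel.
    float oldValue = imageAtomicExchange(SingleChannelTex, texCoords, newValue);
}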
I've recently replaced all usage of glUniform() in my code with Uniform Buffer Objects (UBOs) to better organize my object rendering code. However, my performance has plummeted as the number of objects drawn increases (an object in my case is anything with its own offset into a UBO; only one UBO is shared between everything that uses the same shader program).
I think this may be because my objects actually use very little uniform data, sometimes as little as one vec4, while a UBO requires each offset to be a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, which can be rather large.
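For illustration, the alignment can be queried once and each object's byte offset rounded up to a multiple of it; a minimal sketch, where alignedOffset is a hypothetical helper:

GLint uboAlignment = 0;   // queried once: glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, &uboAlignment);

// Hypothetical helper: round a byte offset up to the next multiple of the alignment.
GLintptr alignedOffset(GLintptr offset)
{
    return (offset + uboAlignment - 1) / uboAlignment * uboAlignment;
}

// Each object's slice is then bound with something like:
//   glBindBufferRange(GL_UNIFORM_BUFFER, blockBinding, ubo, alignedOffset(objectOffset), objectSize);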
I understood UBOs to be a strict upgrade over using glUniform(); am I mistaken? Should I have a glUniform() path for these simple objects, or is there a necessary step to using UBOs effectively with many different simple objects that I haven't taken?
My scene is mostly quads, with very few objects using more than 4 verts, 6 indices. Some of these quads have large uniform blocks while most have very simple ones, maybe one or two vec4.
(For now, I've gotten around this problem by embedding this object data into the vertices, though it definitely feels wrong to do it this way.)
I'm trying to make an array of vec3 available to a fragment shader. In the targeted application, there could be several hundred elements.
I tested transferring data in the form of a shader storage buffer object, declared as
layout(binding = 0) buffer voxels { vec3 xyz[]; };
and set using glBufferData, but I found that my fragment shader becomes very slow, even with only 33 elements.
Moreover, when I convert the same data into the GLSL code of a const vec3[] and include it in the shader code, the shader becomes noticeably faster.
Is there a better way – faster than an SSBO and more elegant than creating shader code?
As might already be apparent from the above, the array is only read from in the shader. It is constant within the shader as well as over shader invocations for different fragments, so effectively a uniform, and it is set only once or a few times over the runtime of the program.
I'd recommend using the std430 layout specifier on the SSBO given that you are using vec3 data types; otherwise you'll be forced to pad the data, which isn't going to be great. In general, if the buffer is a fixed size, prefer glBufferSubData over glBufferData (the latter may reallocate memory on the GPU).
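As a sketch, the block from the question with std430 applied (note that even under std430 a vec3 array element still gets a 16-byte stride, so a vec4 array or manually packed floats is often the safer choice):

layout(std430, binding = 0) buffer voxels { vec3 xyz[]; };

// Host side, updating the existing allocation rather than reallocating it:
//   glBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, dataSizeInBytes, dataPtr);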
As yet another alternative, if you are able to target GL 4.4+, consider using glBufferStorage instead (or even better, if GL 4.5 is available, use glCreateBuffers and glNamedBufferStorage). This lets you pass a few more hints to the GL driver about the way in which the buffer will be consumed. I'd try out a few options (e.g. mapping vs. sub-data vs. recreating each time).
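A minimal sketch of that GL 4.5 path, with illustrative names:

GLuint ssbo;
glCreateBuffers(1, &ssbo);
// Immutable storage, sized once; GL_DYNAMIC_STORAGE_BIT allows later glNamedBufferSubData updates.
glNamedBufferStorage(ssbo, dataSizeInBytes, dataPtr, GL_DYNAMIC_STORAGE_BIT);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssbo);   // binding point 0 matches the shader block

// Later updates without reallocating:
//   glNamedBufferSubData(ssbo, 0, dataSizeInBytes, newDataPtr);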
I'm working with OpenGL and am not totally happy with the standard method of passing values PER TRIANGLE (or, in my case, per quad) that need to make it to the fragment shader, i.e. assigning them to each vertex of the primitive and passing them through the vertex shader, where they are presumably interpolated unnecessarily (unless the "flat" qualifier is used), even though they do not vary per fragment.
Is there some way to store a value PER triangle (or quad) that needs to be accessed in the fragment shader, without needing redundant copies of it per vertex? If so, is this way better than the likely overhead of moving 3x (or 4x) as much data around on the CPU side?
I am aware of using geometry shaders to spread the values out to the new vertices, but I have heard that geometry shaders are terribly slow on older hardware. Is this the case?
OpenGL fragment language supports the gl_PrimitiveID input variable, which will be the index of the primitive for the currently processed fragment (starting at 0 for each draw call). This can be used as an index into some data store which holds per-primitive data.
Depending on the amount of data that you will need per primitive, and the number of primitives in total, different options are available. For a small number of primitives, you could just set up a uniform array and index into that.
For a reasonably high number of primitives, I would suggest using a texture buffer object (TBO). This is basically an ordinary buffer object, which can be accessed read-only at random locations via the texelFetch GLSL operation. Note that TBOs are not really textures; they only reuse the existing texture object interface. Internally, it is still a data fetch from a buffer object, and it is very efficient with none of the overhead of the texture pipeline.
The only issue with this approach is that you cannot easily mix different data types. You have to define a base data type for your TBO, and every fetch will get you the data in that format. If you just need some floats/vectors per primitive, this is not a problem at all. If you e.g. need some ints and some floats per primitive, you could either use different TBOs, one for each type, or with modern GLSL (>=3.30), you could use an integer type for the TBO and reinterpret the integer bits as floating point with intBitsToFloat(), so you can get around that limitation, too.
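A minimal fragment-shader sketch of the TBO approach, assuming the buffer has been attached via glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA32F, buf) and holds one vec4 per primitive (all names are illustrative):

#version 330 core
uniform samplerBuffer perPrimitiveData;
out vec4 fragColor;

void main()
{
    // gl_PrimitiveID counts primitives within the current draw call, starting at 0.
    vec4 primData = texelFetch(perPrimitiveData, gl_PrimitiveID);
    fragColor = primData;
}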
You can also have a single element of a vertex array apply to a whole group of vertices; these are called instanced vertex attributes (set up with glVertexAttribDivisor).
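A hedged host-side sketch of that idea, drawing each quad as one instance so a single attribute value covers all of its vertices (the attribute location, buffer name, and draw parameters are illustrative):

glBindBuffer(GL_ARRAY_BUFFER, perQuadBuffer);                  // one vec4 of data per quad
glEnableVertexAttribArray(3);
glVertexAttribPointer(3, 4, GL_FLOAT, GL_FALSE, 0, (void*)0);
glVertexAttribDivisor(3, 1);                                   // advance this attribute once per instance

// Draw each quad (6 indices) as its own instance:
glDrawElementsInstanced(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0, quadCount);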
I'm currently working on 2D graphics, and as far as I can tell every vertex is ultimately processed as a 4D point in homogeneous space. So I say to myself: what a waste of resources! I gather that the hardware is essentially designed to handle 3D scenes, and as such may be hardcoded to do 4D linear algebra. Yet, is there a way to write shaders (or enable a bunch of options) so that only genuine 2D coordinates are used in memory? I know one could embed two 2x2 matrices in a 4x4 matrix, but the gl_Position variable being a vec4 seems to end the trail here. I'm not looking for some kind of "workaround" hack like this, but rather for a canonical way to make OpenGL do it, like a specific mode/state.
I've not been able to find either sample code or even a simple mention of such a fact on the net, so I gather it should simply be impossible/not desirable for, say, performance reasons. Is that so?
Modern GPUs are actually scalar architectures. In GLSL you can also use shorter vectors: vec2 is a perfectly valid type, and you can create vertex arrays with just 2 scalar elements per vector, as set by the size parameter of glVertexAttribPointer.
As Andon M. Coleman commented, OpenGL will internally expand any vertex attribute with fewer than 4 components to a full vec4, filling the missing components with 0 and the W component with 1 (so a vec2 v effectively becomes vec4(v, 0, 1)).
In the vertex shader you must assign a vec4 to gl_Position. But you can trivially expand a vec2 to a vec4:
in vec2 v2;
gl_Position = vec4(v2, 0.0, 1.0);
Yes, the gl_Position output always must be a vec4, due to the fact OpenGL specifies operations in clip space. But this is not really a bottleneck at all.
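For completeness, a sketch of the host-side attribute setup this implies, passing only two floats per vertex (the attribute location and buffer name are illustrative):

glBindBuffer(GL_ARRAY_BUFFER, positionBuffer2D);               // tightly packed x,y pairs
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (void*)0);  // size = 2: only x and y are stored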
All credit goes to Andon M. Coleman, who perfectly answered the question as a comment. I just quote it here for the sake of completeness:
«Absolutely not. Hardware itself is/was designed around 4-component data and instructions for many years. Modern GPUs are scalar friendly, and they have to be considering the push for GPGPU (but older NV GPUs pre-GeForce 8xxx have a purely vector ALU). Now, as for vertex attributes, you get 16 slots of size (float * 4) for storage. This means whether you use a vec2 or vec4 vertex attribute, it actually behaves like a vec4. This can be seen if you ever write vec4 and only give enough data for 2 of the components - GL automatically assigns Z = 0.0 and W = 1.0.
Furthermore, you could not implement clipping in 2D space with 2D coordinates. You need homogeneous coordinates to produce NDC coordinates. You would need to use window space coordinates, which you cannot do from a vertex shader. After the vertex shader finishes, GL will perform clipping, perspective divide and viewport mapping to arrive at window space. But window space coordinates are still 4D (the Z component may not contribute to a location in window space, but it does affect fragment tests). »
So, let's say that I have two vertex buffers: one that describes the actual shape I want to draw, and another that is able to influence the first one.
So, what I actually want to be able to do is something like this:
uniform VBO second_one; // pseudocode: this is what I'd like, not valid GLSL

void main()
{
    for (int i = 0; i < size_of_array(second_one); ++i)
    {
        // do things with second_one[i] to alter the values
    }
    // create the output information
}
One thing I might want to do is gravity: each point in second_one drags the current point a little closer to it, and so on; then, after the point has been adjusted, the matrices are applied to get its actual location.
I would be really surprised if this, or something close to it, is possible. But the whole point is to be able to use a second VBO, or to make it available as, say, a uniform of type vec3 so I can access it.
For what you're wanting, you have three options:
1. An array of uniforms. GLSL lets you declare uniform vec3 stuff[50];, and arrays in GLSL have a .length() method, so you can find out how big they are (a sketch follows below). Of course, there are limits on the number of uniforms you can use, but you shouldn't need more than 20-30 of these. Anything more than that and you'll really feel the performance drain.
2. Uniform buffer objects. These can store a bit more data than non-block uniforms, but they still have limits, and the storage comes from a buffer object. Accesses to them are, depending on hardware, slightly slower than accesses to direct uniforms.
3. Buffer textures. This is a way to attach a buffer object to a texture. With this, you can access vast amounts of memory from within a shader. But be warned: they're not fast to access. If you can make do with one of the above methods, do so.
Note that #2 and #3 will only be found on hardware capable of supporting GL 3.x and above. So DX10-class hardware.
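A minimal vertex-shader sketch of option 1, with a toy attraction step toward each point (the array size, attraction factor, and all names are illustrative, not from the question):

#version 330 core
layout(location = 0) in vec3 position;
uniform mat4 mvp;
uniform vec3 second_one[50];          // filled from the host with glUniform3fv

void main()
{
    vec3 p = position;
    for (int i = 0; i < second_one.length(); ++i)
        p += 0.01 * (second_one[i] - p);   // pull the vertex slightly toward each point
    gl_Position = mvp * vec4(p, 1.0);
}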