What is the fastest way to swizzle channels in OpenGL?

I've heard that in OpenGL, changing the texture data format from GL_RGBA to GL_BGRA significantly improves pixel transfer performance. Now there are two ways to swizzle the texture data to suit this format:
One is to modify the fragment shader: instead of FragColor = texture(...), use
vec4 t = texture(...);
FragColor = vec4(t.b,t.g,t.r,t.a);
However, the Red Book introduces another method: setting the texture parameter GL_TEXTURE_SWIZZLE_RGBA.
In my small test program, both methods work. But which of them gives better performance? I notice that the swizzle parameter can be set after uploading the data, so is the second method equivalent to the first, just done implicitly by the driver?

Neither method of texture swizzling has anything to do with pixel transfer performance. Pixel transfers happen when you make calls like glTex(Sub)Image. How you read that data in your shader is irrelevant to the performance of those functions.
What you're being advised to do is provide the pixel transfer functions with data that is in BGRA order. That means you are the one who needs to swizzle the data on the CPU. But really, the advice is to make sure your on-disk texture data is pre-swizzled for optimal transfer performance. If you can't control the format of your on-disk data, then it's best to let the implementation swizzle the texture data itself rather than writing code to do so.
The implementation's swizzling is probably faster than yours.
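For reference, a rough sketch of the upload-side options being discussed; the texture object, dimensions, and pixel pointer are placeholders:
glBindTexture(GL_TEXTURE_2D, tex);
// Option A: hand the driver BGRA-ordered client data and let it reorder during upload.
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
             GL_BGRA, GL_UNSIGNED_BYTE, pixels);
// Option B: if the stored channels ended up in BGRA order anyway, set a per-texture
// swizzle (GL 3.3+); the driver applies it at sampling time, so no shader change is needed.
const GLint swizzle[4] = { GL_BLUE, GL_GREEN, GL_RED, GL_ALPHA };
glTexParameteriv(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_RGBA, swizzle);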

Related

Are Shader Storage Buffer Objects the right tool to have persistent memory between shader loops?

Context
I have a fragment shader that processes a 2D image. Sometimes a pixel may be considered "invalid" (RGB value 0/0/0) for a few frames while being valid the rest of the time. This causes temporal noise as these pixels flicker.
I'd like to implement a sort of temporal filter where each rendering loop, each pixel is "shown" (RGB value not 0/0/0) if and only if this pixel was "valid" in the last X loops, where X might be 5, 10, etc. I figured if I could have an array of the same size as the image, I could set the element corresponding to a pixel to 0 when that pixel is invalid and increment it otherwise. And if the value is >= X, then the pixel can be displayed.
Image latency caused by the temporal filter is not an issue, but I want to minimize performance costs.
The question
So that's the context. I'm looking for a mechanism that lets me read and write data (uniforms are therefore out) across different rendering loops of the same fragment shader. Reading the data back in my OpenGL application is a plus but not necessary.
I came across Shader Storage Buffer Objects (SSBOs); would they fit my needs?
Are there other concerns I should be aware of? Performance? Coherency/memory barriers?
Yes, SSBOs are a suitable tool to have persistent memory between shader loops.
As I couldn't find a reason why it wouldn't work, I implemented it, and I was indeed able to use an SSBO as an array with each element mapped to a pixel in order to do temporal filtering on each pixel.
I had to do a few things to not have artifacts in the image:
Use GL_DYNAMIC_COPY as the usage hint when allocating the data with glBufferData.
Declare the SSBO as volatile in the shader.
Use a barrier (memoryBarrierBuffer()) in the shader to separate the writing and reading of the SSBO.
As mentioned by @user253751 in a comment, convert texture coordinates to array indices.
I checked the performance costs of using the SSBO and they were negligible in my case: <0.1 ms for a 848x480 frame.
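For illustration, a rough sketch of that setup; the buffer name, the binding point, and the 848x480 size are placeholders rather than my exact code:
// One uint counter per pixel; GL_DYNAMIC_COPY because the GPU both writes and reads it.
GLuint ssbo;
glGenBuffers(1, &ssbo);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
glBufferData(GL_SHADER_STORAGE_BUFFER, 848 * 480 * sizeof(GLuint), nullptr, GL_DYNAMIC_COPY);
GLuint zero = 0;
glClearBufferData(GL_SHADER_STORAGE_BUFFER, GL_R32UI, GL_RED_INTEGER, GL_UNSIGNED_INT, &zero);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssbo);   // binding point 0, matching the shader
// Fragment-shader side (GLSL), shown here as a string purely for reference:
const char* ssboDecl =
    "layout(std430, binding = 0) volatile buffer ValidCount { uint count[]; };\n"
    "// index = int(gl_FragCoord.y) * 848 + int(gl_FragCoord.x);\n"
    "// write count[index], memoryBarrierBuffer(), then read it back for the validity test.\n";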

Editable Texture with OpenGL

I'm trying to take advantage of a GPU's parallelism to build an image processing application. I have a shader which takes two textures and, based on some uniform variables, computes an output texture. But instead of an alpha/transparency value, each texture pixel needs an extra metadata byte that is mandatory in the computation.
So I consider running the shader twice each frame: once to compute the Dynamic Metadata as a single-byte texture, and once to calculate the resulting Paint Texture, which I need to be 3 bytes (to limit memory usage, as there might be quite a few such textures loaded at once).
I find the above problem a bit complicated. I've used OpenGL to paint to the screen, but I need to paint to two different textures this time, which I do not know how to do. Besides, the gl_FragColor built-in variable's type is vec4, but I need different output values.
So, to sum it up a little: is it possible for the fragment shader to output anything other than a vec4?
Is it possible to save to two different textures with a single call?
Is it possible to make an editable texture to store changes, until the editing ends and the data have to be passed back to the CPU?
What OpenGL calls would be most useful for the above?
The paint texture should also be retrievable so it can be shown on the screen.
The above could very easily be done by blitting textures on the CPU. I could keep all the relevant data on the CPU, do all the work 60 times/sec, and update the relevant texture by passing the data from the CPU to the GPU. For changing relatively small regions of a texture each frame (roughly 20% of the area of textures around 512x512 in size), would you consider the above approach worth the trouble?
It depends on which version of OpenGL you use.
The latest OpenGL 4+ does not have a gl_FragColor variable; instead it lets you write any number (up to a supported maximum) of output colors from the fragment shader, each sent to the corresponding framebuffer color attachment:
layout(location = 0) out vec4 OUT0;
layout(location = 1) out float OUT1;
That will write OUT0 to GL_COLOR_ATTACHMENT0 and OUT1 to GL_COLOR_ATTACHMENT1 of the currently bound framebuffer.
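A hedged host-side sketch of how those two outputs get their own textures; the texture names, sizes, and formats are placeholders:
GLuint fbo, paintTex, metaTex;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glGenTextures(1, &paintTex);                                   // 3-byte paint texture
glBindTexture(GL_TEXTURE_2D, paintTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8, 512, 512, 0, GL_RGB, GL_UNSIGNED_BYTE, nullptr);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, paintTex, 0);
glGenTextures(1, &metaTex);                                    // 1-byte metadata texture
glBindTexture(GL_TEXTURE_2D, metaTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_R8, 512, 512, 0, GL_RED, GL_UNSIGNED_BYTE, nullptr);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, metaTex, 0);
// Map fragment output locations 0 and 1 to the two attachments.
const GLenum bufs[2] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
glDrawBuffers(2, bufs);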
However, considering that you use gl_FragColor, you are on some old version of OpenGL. I'm not proficient with the older legacy OpenGL versions, but you can check whether your implementation supports the GL_ARB_draw_buffers extension and/or the gl_FragData[] output variable.
Also, as stated, it's unclear why you can't use a single RGBA texture and use its alpha channel for that metadata.

OpenGL: efficient way to read sparse pixel data from many framebuffer textures?

I'm writing a program that uses the GPU to calculate stuff, and I want to read data from the framebuffers to be used in my client code. The framebuffers I'm using are about 40 textures, all 1024x1024 in size, all of which contain data that needs to be read, but only very sparsely: around 50 pixels at arbitrary x/y coordinates from each texture. Using glReadPixels for each texture, every frame, is proving too costly for me though...
I only need to read a few select pixels from each texture; is there a way to quickly gather their data without having to download every texture in its entirety from the GPU?
This sounds fairly expensive no matter how you slice it. A couple of approaches come to mind:
What I would try first is glReadPixels(), but using a PBO. Bind a buffer large enough to hold all the pixels to the GL_PIXEL_PACK_BUFFER target, and then submit the glReadPixels() calls with offsets that place the results in distinct sections of the buffer. Then call glMapBufferRange() to read back the values.
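A rough sketch of that approach, assuming the rectangle coordinates, sizes, and total buffer size are known; all names here are placeholders:
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, totalBytes, nullptr, GL_STREAM_READ);
// With a pack PBO bound, the last argument of glReadPixels is a byte offset into the buffer.
glReadPixels(x, y, w, h, GL_RGBA, GL_UNSIGNED_BYTE, reinterpret_cast<void*>(offsetBytes));
// ...issue the remaining reads at other offsets, then map the buffer once...
const GLubyte* data = static_cast<const GLubyte*>(
    glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, totalBytes, GL_MAP_READ_BIT));
// ...use the values...
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);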
An alternative approach is to copy all the pixels you want to read into a single texture. You could use glBlitFramebuffer() or glCopyTexSubImage2D(). Then use a single glReadPixels() or glGetTexImage() call to get all the data from that texture.
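A sketch of that gather-then-read variant, assuming roughly 50 pixels per source texture and a small destination texture wide enough to hold them; the names are placeholders:
glBindFramebuffer(GL_READ_FRAMEBUFFER, sourceFbo);    // FBO that has the source texture attached
glBindTexture(GL_TEXTURE_2D, gatherTex);               // small destination texture, e.g. 64x1
for (int i = 0; i < pixelCount; ++i)
{
    // Copy one texel from (srcX[i], srcY[i]) in the read framebuffer into slot i.
    glCopyTexSubImage2D(GL_TEXTURE_2D, 0, i, 0, srcX[i], srcY[i], 1, 1);
}
// One readback for all gathered pixels.
GLubyte out[64 * 4];
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, out);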
Both of these approaches should result in about the same amount of work and synchronization overhead. But one or the other could be more efficient, depending on which paths in the driver are better optimized.
As the earlier answer already suggested, I would make very sure that you really need this, and there isn't any way to keep and process the data on the GPU. Any time you read back data, you introduce synchronization between GPU and CPU, which is mostly harmful to performance.
Do you have any restrictions on which OpenGL version you can use? If not, it sounds like you should look into compute shaders. You say that you are calculating data, so I assume that you are "abusing" the rendering pipeline for your application, especially the fragment shader, and storing fragment data in the framebuffer that is interpreted as something other than color.
If this is the case, then all you need is a shader storage buffer and an atomic counter. At some point right now you are deciding that fragment (x, y, z) (z being the texture index) should have value v. So in your compute shader, you do your calculation as you would in the fragment shader, but as output you store a tuple (x, y, z, v). You store this tuple in the shader storage buffer at the index given by the atomic counter, which you increment after each written element. In the end, you have your data stored compactly in the buffer and only need to read back these elements; the exact number is the value the atomic counter holds after termination. Download the buffer with glGetBufferSubData into an array of location-value pairs, iterate over it, and do your CPU magic.
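A rough sketch of that compaction idea; the binding points, buffer names, and sizes are made up for illustration:
// GLSL side (shown as a string only for reference): append tuples through an atomic counter.
const char* appendSnippet =
    "layout(binding = 0) uniform atomic_uint writeCount;\n"
    "layout(std430, binding = 1) writeonly buffer Results { vec4 results[]; };\n"
    "// uint i = atomicCounterIncrement(writeCount);\n"
    "// results[i] = vec4(x, y, z, v);\n";
// Host side: make the shader writes visible to buffer reads, then fetch only what was written.
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);
GLuint written = 0;
glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, counterBuf);
glGetBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0, sizeof(GLuint), &written);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, resultsBuf);
glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, written * 4 * sizeof(GLfloat), tuples);   // tuples: client-side array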
If you need to copy the data from the GPU to the CPU memory, there is no way (AFAIK) around using glReadPixels.
Depending on what platform you're using and the specifics of your program, you can try several optimizations, using FBOs:
Copy only part of the texture, assuming you know the locations of the pixels. Note that in most cases it is still faster to copy the entire texture instead of issuing several small reads.
If you don't need 32-bit textures, you can render to a lower color resolution. The specifics depend on your platform's extensions.
Maybe you don't really need to copy the pixels, since you plan to use them as a texture input to the next stage? In that case you can copy the pixels directly on the GPU using glCopyTexImage2D.

Get results of GPU calculations back to the CPU program in OpenGL

Is there a way to get results from a shader running on a GPU back to the program running on the CPU?
I want to generate a polygon mesh from simple voxel data based on a computational costly algorithm on the GPU but I need the result on the CPU for physics calculations.
Define "the results"?
In general, if you're doing GPGPU-style computations with OpenGL, you are going to need to structure your shaders around the needs of a rendering system. Rendering systems are designed to be one-way: data goes into them and an image is produced. Going backwards, having the rendering system produce data, is not generally how rendering systems are structured.
That doesn't mean you can't do it, of course. But you need to architect everything around the limitations of OpenGL.
OpenGL offers a number of hooks where you can write data from certain shader stages. Most of these require specialized hardware:
Fragment shader outputs
Any hardware capable of fragment shaders will obviously allow you to write to the current framebuffer you're rendering. Through the use of framebuffer objects and textures with floating-point or integer image formats, you can write pretty much any data you want to a variety of images. Once in a texture, you can simply call glGetTexImage to get the rendered pixel data. Or you can just do glReadPixels to get it if the FBO is still bound. Either way works.
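A minimal sketch of that path; the FBO, texture, dimensions, and the dst pointer (client memory large enough for the result) are placeholders:
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
// ...render the data-producing pass into the attached floating-point texture...
glReadBuffer(GL_COLOR_ATTACHMENT0);
glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, dst);   // read from the still-bound FBO
// or, equivalently, from the attached texture after rendering:
glBindTexture(GL_TEXTURE_2D, resultTex);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_FLOAT, dst);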
The primary limitations of this method are:
The number of images you can attach to the framebuffer; this limits the amount of data you can write. On pre-GL 3.x hardware, FBOs were typically limited to only 4 images plus a depth/stencil buffer. In 3.x and better hardware, you can expect a minimum of 8 images.
The fact that you're rendering. This means that you need to set up your vertex data to position a triangle exactly where you want it to modify data. This is not a trivial undertaking. It's also difficult to get useful input data, since you typically want each texel to be fairly independent of the others. Structuring your fragment shader around these limitations is difficult. Not impossible, but non-trivial in many cases.
Transform Feedback
This OpenGL 3.0 feature allows the output from the Vertex Processing stage of OpenGL (vertex shader and optional geometry shader) to be captured in one or more buffer objects.
This is much more natural for capturing vertex data that you want to play with or render again. In your case, you'll need to read it back after rendering it, perhaps with a glGetBufferSubData call, or by using glMapBufferRange for reading.
The limitations here are that you can generally only capture 4 output values, where each value is a vec4. There are also some strict layout restrictions. Some OpenGL 3.x and 4.x hardware offers the ability to write data to multiple feedback streams, which can all be written into different buffers.
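A hedged sketch of the capture-and-read-back flow; the varying name, vertex count, and buffer sizes are placeholders:
const char* varyings[] = { "outValue" };
glTransformFeedbackVaryings(program, 1, varyings, GL_INTERLEAVED_ATTRIBS);
glLinkProgram(program);                          // must (re)link after declaring the varyings
const GLsizei vertexCount = 3;                   // placeholder
GLuint tfb;
glGenBuffers(1, &tfb);
glBindBuffer(GL_TRANSFORM_FEEDBACK_BUFFER, tfb);
glBufferData(GL_TRANSFORM_FEEDBACK_BUFFER, vertexCount * 4 * sizeof(GLfloat), nullptr, GL_STATIC_READ);
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, tfb);
glEnable(GL_RASTERIZER_DISCARD);                 // we only want the captured data, not pixels
glBeginTransformFeedback(GL_POINTS);
glDrawArrays(GL_POINTS, 0, vertexCount);
glEndTransformFeedback();
glDisable(GL_RASTERIZER_DISCARD);
GLfloat captured[3 * 4];                         // one vec4 per captured vertex
glGetBufferSubData(GL_TRANSFORM_FEEDBACK_BUFFER, 0, sizeof(captured), captured);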
Image Load/Store
This GL 4.2 feature represents the pinnacle of writing: you can bind an image (a buffer texture, if you want to write to a buffer), and just write to it. There are memory ordering constraints that you need to work within.
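A very small sketch of the binding side; the image unit index, format, and names are placeholders:
// Bind level 0 of a texture for writing from the shader (GL 4.2+).
glBindImageTexture(0, tex, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA32F);
// GLSL side: layout(binding = 0, rgba32f) writeonly uniform image2D dst;
//            imageStore(dst, ivec2(gl_FragCoord.xy), value);
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);   // order the writes before later reads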
It's very flexible, but very complex. Besides the difficulty in using it properly, there are a number of limitations. The number of images you can write to will be fairly limited, perhaps 8 or so. And implementations may have total write limits, so that 8 images to write to may have to be shared by the fragment shader's outputs.
What's more, image outputs are only guaranteed for the fragment shader (and 4.3's compute shaders). That is, hardware is allowed to forbid you from using image load/store on non-FS/CS shader stages.

How do I set the color of a single pixel in a Direct3D texture?

I'm attempting to draw a 2D image to the screen in Direct3D, which I'm assuming must be done by mapping a texture to a rectangular billboard polygon projected to fill the screen. (I'm not interested in, or cannot use, Direct2D.) All the texture information I've found in the SDK describes loading a bitmap from a file and assigning a texture to use that bitmap, but I haven't yet found a way to manipulate a texture as a bitmap, pixel by pixel.
What I'd really like is a function such as
void TextureBitmap::SetBitmapPixel(int x, int y, DWORD color);
If I can't set the pixels directly in the texture object, do I need to keep around a DWORD array that is the bitmap and then assign the texture to that every frame?
Finally, while I'm initially assuming that I'll be doing this on the CPU, the per-pixel color calculations could probably also be done on the GPU. Is there HLSL code that sets the color of a single pixel in a texture, or are pixel shaders only useful for modifying the display pixels?
Thanks.
First, your direct question:
You can, technically, set pixels in a texture. That requires use of the LockRect and UnlockRect APIs.
In D3D context, 'locking' usually refers to transferring a resource from GPU memory to system memory (thereby disabling its participation in rendering operations). Once locked, you can modify the populated buffer as you wish, and then unlock - i.e., transfer the modified data back to the GPU.
Generally, locking was considered a very expensive operation, but since PCIe 2.0 that is probably not a major concern anymore. You can also specify a small (even 1-pixel) RECT in the pRect argument of LockRect, thereby requiring the memory transfer of a negligible data volume, and hope the driver is indeed smart enough to transfer just that (I know for a fact that in older nVidia drivers this was not the case).
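As a rough sketch of that; the texture pointer, coordinates, and color are placeholders, and the texture is assumed to be lockable with a 32-bit ARGB format (e.g. D3DFMT_A8R8G8B8):
D3DLOCKED_RECT lr;
RECT r = { x, y, x + 1, y + 1 };                 // lock just the one texel
if (SUCCEEDED(texture->LockRect(0, &lr, &r, 0)))
{
    // pBits points at the start of the locked region; each texel is a DWORD here.
    *reinterpret_cast<DWORD*>(lr.pBits) = color; // e.g. 0xFFFF0000 for opaque red
    texture->UnlockRect(0);
}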
The more efficient (and code-intensive) way of achieving that is indeed to never leave the GPU. If you create your texture as a render target (that is, specify D3DUSAGE_RENDERTARGET as its usage argument), you can then set it as the destination of the pipeline before making any draw calls, and write a shader (perhaps passing parameters) to paint your pixels. Such usage of render targets is considered standard, and you should be able to find many code samples around - but unless you're already facing performance issues, I'd say that's overkill for a single 2D billboard.
HTH.