Most efficient way to perform a sum of textures - OpenGL

Which is the best way, from a performance point of view, to perform a (weighted) sum of the contents of two textures? I'm fine with performing this on either the CPU or the GPU, as long as it's a fast method. Note that this must be repeated multiple times, so it's not just a one-shot sum of two textures.
In particular I'm interested in a weighted sum of several textures, but I believe this can easily be generalized from the sum of two.
EDIT:
Let me make my goal clearer. I have to generate several textures (sequentially) with various amounts of blurring, so these textures will all be generated by rendering to texture. I don't think there will ever be more than 8 or 9 of them.
At the end, the result must be displayed on screen.

So if I understand the question correctly, you render into some textures, then need a weighted sum over all of those textures, and want to display just that image. If so, you can just do one extra render pass with all of your textures bound and calculate the weighted sum of all textures in the fragment shader. Since you do not need the result as a texture, you can render directly into the default framebuffer, so the result becomes immediately visible.
With at most 8 or 9 textures, you can actually follow that strategy, since there will be enough texture units. However, that approach might be a bit inflexible, especially if you have to deal with a varying number of textures to sum at different points in time.
It would be nice if you could just have a uniform variable with the count, an array of weight values, and a loop in the shader, which would boil down to
uniform int count;
uniform float weights[MAX_COUNT];
uniform sampler2D uTex[MAX_COUNT];
[...]
for (int i = 0; i < count; i++)
    sum += weights[i] * texture(uTex[i], texcoords);
And you can do that beginning with GL 4. It supports arrays of texture samplers, but requires that the access index be dynamically uniform, which means that all shader invocations must access the same texture sampler at the same time. As the loop only depends on a uniform variable, this is the case here.
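Fleshed out, such a fragment shader might look like the following sketch; the MAX_COUNT value, the texcoords input, and the fragColor output are assumed names, not something mandated by the question:

#version 400 core
#define MAX_COUNT 9   // assumed upper bound, matching the 8 or 9 from the question

in vec2 texcoords;    // assumed to be passed from the vertex shader
out vec4 fragColor;

uniform int count;                  // number of textures actually used
uniform float weights[MAX_COUNT];   // one weight per texture
uniform sampler2D uTex[MAX_COUNT];  // one texture unit per texture

void main()
{
    vec4 sum = vec4(0.0);
    // the loop counter depends only on a uniform, so the index is
    // dynamically uniform and legal for indexing a sampler array
    for (int i = 0; i < count; i++)
        sum += weights[i] * texture(uTex[i], texcoords);
    fragColor = sum;
}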
However, it might be a better strategy to not use multiple textures at all. Assuming all of your input textures have the same resolution, you might be better off using just one array texture. You can attach a layer of such an array texture to an FBO just as you can an ordinary 2D texture (a minimal API sketch appears at the end of this answer), so rendering to the layers independently (or rendering to multiple layers at a time using multiple render targets) will just work. You then only need to bind that single array texture and can do
uniform int count;
uniform float weights[MAX_COUNT];
uniform sampler2DArray uTex;
[...]
for (int i = 0; i < count; i++)
    sum += weights[i] * texture(uTex, vec3(texcoords, float(i)));
This only requires GL 3 level hardware, and the maximum count you can work with is not limited by the number of texture units available to the fragment shader, but by the texture array layer limit (typically > 256) and the available memory. However, performance will go down if count gets too high: you might reach a point where multiple passes, each summing only a sub-range of your images, become more efficient, because the texture accesses of all the different layers compete for the texture cache, hurting the cache hit rate between neighboring fragments. But this should be no issue with just 8 or 9 input images.
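For reference, a minimal sketch of the API side of that approach; texArray, width, height, and layerCount are assumed names, not code from the question:

GLuint texArray;
glGenTextures(1, &texArray);
glBindTexture(GL_TEXTURE_2D_ARRAY, texArray);
glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_RGBA8,
             width, height, layerCount,
             0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);

GLuint fbo;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
for (int i = 0; i < layerCount; i++) {
    /* attach layer i, then render the i-th blurred image into it */
    glFramebufferTextureLayer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                              texArray, 0 /* mip level */, i /* layer */);
    /* ... draw ... */
}

/* final pass: sum the layers directly into the default framebuffer */
glBindFramebuffer(GL_FRAMEBUFFER, 0);
glBindTexture(GL_TEXTURE_2D_ARRAY, texArray);
/* ... draw a fullscreen quad with the shader above ... */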

Related

How do mipmaps work with the fragment shader in OpenGL?

Mipmaps seem to be handled automatically by OpenGL. The job of the fragment shader seems to be to return the color of the sampling point corresponding to the pixel. So how does OpenGL handle mipmaps automatically?
When you use the texture(tex, uv) function, it uses the derivatives of uv with respect to the window coordinates to compute the footprint of the fragment in texture space.
For a 2D texture with an isotropic filter, the size of the footprint can be calculated as:
ρ = max{ √((du/dx)² + (dv/dx)²), √((du/dy)² + (dv/dy)²) }
This calculates the change of uv horizontally and vertically, then takes the bigger of the two.
The logarithm of ρ, in combination with other parameters (like the LOD bias, clamping, and filter type), determines where in the mipmap pyramid the texel will be sampled.
However, in practice the implementation isn't going to do calculus to determine the derivatives. Instead a numeric approximation is used, typically by shading fragments in groups of four (aka 'quads') and calculating the derivatives by subtracting the uvs in the neighboring fragments in the group. This in turn may require 'helper invocations' where the shader is executed for a fragment that's not covered by the primitive, but is still used for the derivatives. This is also why historically, automatic mipmap level selection didn't work outside of a fragment shader.
The implementation is not required to use the above formula for ρ either; it can approximate it within some reasonable constraints. Anisotropic filtering complicates the formulas further, but the idea remains the same: the implicit derivatives are used to determine where to sample the mipmap.
If the automatic derivatives mechanism isn't available (e.g. in a vertex or a compute shader), it's your responsibility to calculate them and use the textureGrad function instead.
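As an illustration, here is a fragment shader sketch that reproduces the implicit computation by hand (in a real fragment shader you would just call texture(); the vUV input and uTex uniform are assumed names):

#version 330 core
in vec2 vUV;              // assumed interpolated texture coordinates
uniform sampler2D uTex;
out vec4 fragColor;

void main()
{
    // screen-space derivatives of the texture coordinates (the 'quad' trick)
    vec2 dx = dFdx(vUV);  // (du/dx, dv/dx)
    vec2 dy = dFdy(vUV);  // (du/dy, dv/dy)

    // rho from the formula above, scaled into texel units of mip level 0
    vec2 texSize = vec2(textureSize(uTex, 0));
    float rho = max(length(dx * texSize), length(dy * texSize));
    float lod = log2(rho);

    // both of these should closely match a plain texture(uTex, vUV):
    fragColor = textureLod(uTex, vUV, lod);        // explicit level
    // fragColor = textureGrad(uTex, vUV, dx, dy); // explicit gradients
}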

Efficiently transforming many different models in modern OpenGL

Suppose I want to render many different models, each with a different transformation matrix I want to be applied to their vertices. As far as I understand, the naive approach is to specify a matrix uniform in the vertex shader, the value of which is updated for each mesh during rendering.
It's obvious to me that this is a bad idea, due to the expense of many uniform updates and draw calls. So, what is the most efficient way to achieve this in modern OpenGL?
I've genuinely tried to find a straight, clear answer to this question. Most answers I find vaguely mention UBOs, or instanced drawing (which afaik won't work unless you are drawing many instances of the same mesh, which is not my goal).
With OpenGL 4.6, or with ARB_shader_draw_parameters, each draw in a multi-draw rendering command (functions of the form glMultiDraw*) is assigned a draw index, starting at 0 and counting up to the number of draws specified by that function. This index is provided to the vertex shader via the gl_DrawID input value. You can then use this index to fetch a matrix from any number of constructs: UBOs, SSBOs, buffer textures, etc.
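A vertex shader sketch of that idea, assuming the matrices live in an SSBO; the buffer layout, the binding point, and uViewProj are illustrative assumptions:

#version 460 core
layout(location = 0) in vec3 aPos;

// one model matrix per sub-draw of the glMultiDraw* command
layout(std430, binding = 0) readonly buffer ModelMatrices {
    mat4 uModel[];
};

uniform mat4 uViewProj;  // assumed combined view-projection matrix

void main()
{
    // gl_DrawID is 0 for the first sub-draw, 1 for the second, and so on
    gl_Position = uViewProj * uModel[gl_DrawID] * vec4(aPos, 1.0);
}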
This works for multi-draw indirect rendering as well. So in theory, you can have a compute shader generate a bunch of rendering commands, then render your entire scene with a single draw call (assuming that all of your objects live in the same vertex buffers and can use the same shader and other state), or at the very least a large portion of the scene.
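On the CPU side, the indirect variant might be driven like this sketch; the command struct layout is fixed by OpenGL, while indirectBuffer and drawCount are assumed to be set up elsewhere:

/* layout of one indirect command, as specified by OpenGL */
typedef struct {
    GLuint count;          /* index count for this sub-draw        */
    GLuint instanceCount;  /* usually 1 for distinct models        */
    GLuint firstIndex;     /* offset into the shared index buffer  */
    GLint  baseVertex;     /* offset into the shared vertex buffer */
    GLuint baseInstance;
} DrawElementsIndirectCommand;

/* the commands may come from the CPU or be written by a compute shader */
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuffer);
glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT,
                            (const void *)0, /* offset into the buffer */
                            drawCount,
                            0 /* stride: tightly packed */);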
Furthermore, this index is considered dynamically uniform, so you can also use it (or values derived from it and other dynamically uniform values) to index into arrays of textures, fetch a texture from an array of bindless textures, or the like.

Is there a workaround for increasing GL_MAX_ARRAY_TEXTURE_LAYERS?

I'm using a texture array to render Minecraft-style voxel terrain. It's working fantastically, but I noticed recently that GL_MAX_ARRAY_TEXTURE_LAYERS is a lot smaller than GL_MAX_TEXTURE_SIZE.
My textures are very small, 8x8, but I need to be able to support rendering from an array of hundreds to thousands of them; I just need GL_MAX_ARRAY_TEXTURE_LAYERS to be larger.
OpenGL 4.5 requires GL_MAX_ARRAY_TEXTURE_LAYERS to be at least 2048, which might suffice, but my application targets OpenGL 3.3, which only guarantees at least 256.
I'm drawing blanks trying to figure out a prudent workaround for this limitation; dividing up the terrain rendering based on the maximum number of supported texture layers does not sound trivial at all to me.
I looked into whether ARB_sparse_texture could help, but GL_MAX_SPARSE_ARRAY_TEXTURE_LAYERS_ARB is the same as GL_MAX_ARRAY_TEXTURE_LAYERS; that extension is a workaround for VRAM usage rather than for the layer limit.
Can I just have my GLSL shader access an array of sampler2DArray? GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS has to be at least 80, so 80 * 256 = 20480 layers, and that would be enough for my purposes. So, in theory, could I do something like this?
const int MAXLAYERS = 256;
in vec3 texCoord;
uniform sampler2DArray tex[];
out vec4 FragColor;
void main()
{
    int arrayIdx = int(texCoord.z + 0.5) / MAXLAYERS;
    float arrayOffset = mod(texCoord.z, float(MAXLAYERS));
    FragColor = texture(tex[arrayIdx],
                        vec3(texCoord.x, texCoord.y, arrayOffset));
}
It would be better to ditch array textures and just use a texture atlas (or use an array texture with each layer containing lots of sub-textures; but as I will show, that's highly unnecessary). If you're using textures of such low resolution, you probably aren't using linear interpolation, so you can easily avoid bleed-over from neighboring texels. And even if you do have trouble with bleed-over, it can easily be fixed by adding some space between the sub-textures.
Even if your sub-textures need to be 10x10 to avoid bleed-over, a 1024x1024 texture (the minimum size GL 3.3 requires) gives you 102x102 sub-textures, which is 10,404 textures. That ought to be plenty. And if it's not, then make it an array texture with however many layers you need.
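A hypothetical lookup for such an atlas, with 10x10 tiles (an 8x8 payload plus a 1-texel border) packed into a 1024x1024 texture; all of the constants are illustrative:

const float ATLAS_SIZE    = 1024.0; // atlas resolution in texels
const float TILE_STRIDE   = 10.0;   // sub-texture stride: 8x8 payload + border
const int   TILES_PER_ROW = 102;

// localUV in [0,1] addresses the 8x8 payload of tile tileIndex
vec2 atlasUV(int tileIndex, vec2 localUV)
{
    vec2 tile = vec2(tileIndex % TILES_PER_ROW, tileIndex / TILES_PER_ROW);
    vec2 texel = tile * TILE_STRIDE + vec2(1.0) + localUV * 8.0;
    return texel / ATLAS_SIZE;
}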
Arrays of samplers will not work for your purpose. First, you cannot declare an unsized uniform array of any kind. Well, you can, but you have to redeclare it with a size at some point in your shader, so there's not much point to the unsized declaration. The only unsized arrays you can have are in SSBOs, as the last member of the SSBO.
Second, even with a size, the index you use for arrays of opaque types must be dynamically uniform. And since you're trying to draw all of the faces of your cubes in one draw call, with each face able to select a different layer, there is no way for this expression's value to be dynamically uniform.
Third, even if you did this with bindless texturing, you would run into the same problem: unless you're on NVIDIA hardware, the sampler you pick must be dynamically uniform, which requires the index into the array of samplers to be dynamically uniform. And yours is not.

Editable Texture with OpenGL

I'm trying to take advantage of a GPU's parallelism to make an image processing application. I have a shader which takes two textures and, based on some uniform variables, computes an output texture. But instead of a transparency alpha value, each texture pixel needs an extra metadata byte, which is mandatory in the computation:
So I'm considering running the shader twice each frame: once to compute the Dynamic Metadata as a single-byte texture, and once to calculate the resulting Paint Texture, which I need to be 3 bytes per pixel (to limit memory usage, as there might be quite a few such textures loaded at once).
I find the above problem a bit complicated. I've used OpenGL to paint to the screen, but this time I need to paint to two different textures, which I do not know how to do. Besides, the gl_FragColor built-in variable's type is vec4, but I need different output values.
So, to sum it up a little:
Is it possible for the fragment shader to output anything other than a vec4?
Is it possible to save to two different textures with a single call?
Is it possible to make an editable texture to store changes until the editing ends and the data has to be passed back to the CPU?
Which OpenGL calls would be most useful for the above?
The Paint Texture should also be retrievable to be shown on the screen.
The above could very easily be done by blitting textures on the CPU. I could keep all the relevant data on the CPU, do all the work 60 times per second, and update the relevant texture by passing the data from the CPU to the GPU. For changing relatively small regions of a texture each frame (about ~20% of textures around 512x512 in size), would you consider that approach worth the trouble?
It depends on which version of OpenGL you use.
The latest OpenGL 4+ does not have a gl_FragColor variable; instead it lets you write any number (up to the supported maximum) of output colors from the fragment shader, each sent to the corresponding framebuffer color attachment:
layout(location = 0) out vec4 OUT0;
layout(location = 1) out float OUT1;
That will write OUT0 to GL_COLOR_ATTACHMENT0 and OUT1 to GL_COLOR_ATTACHMENT1 of the currently bound framebuffer.
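On the API side, the matching setup could look like this sketch; paintTex and metaTex are assumed to be already-created GL_RGB8 and GL_R8 textures:

GLuint fbo;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, paintTex, 0); /* receives OUT0 */
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1,
                       GL_TEXTURE_2D, metaTex, 0);  /* receives OUT1 */

/* route fragment shader outputs 0 and 1 to the two attachments */
const GLenum bufs[2] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
glDrawBuffers(2, bufs);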
However, considering that you use gl_FragColor, you are on some old version of OpenGL. I'm not proficient in the legacy OpenGL versions, but you can check whether your implementation supports the GL_ARB_draw_buffers extension and/or the gl_FragData[] output variable.
Also, as stated, it's unclear why you can't use a single RGBA texture and use its alpha channel for that metadata.

Comparing two textures in openGL

I'm new to OpenGL and I want to compare two textures to understand how similar they are to each other. I know how to do this with two bitmap images, but I really need a method to compare two textures.
Question is: is there any way to compare two textures the way we compare two images, like comparing them pixel by pixel?
Actually, what you seem to be asking for is not possible, or at least not as easy as it would seem, to accomplish on the GPU. The problem is that the GPU is designed to accomplish as many small tasks as possible in the shortest amount of time. Iterating through an array of data such as pixels is not what it is built for, so getting back a single integer or floating-point value might be a bit hard.
There is one very interesting procedure you may try, though I cannot say whether the result will be appropriate for you:
First create a new texture that is the difference between the two input textures, then keep downsampling the result until you reach a 1x1 pixel texture, and read the value of that pixel to see how different the images are.
To achieve this, it is best to use a fixed-size target buffer with POT (power of two) dimensions, for instance 256x256. If you didn't use a fixed size, the result could vary a lot depending on the image sizes.
So in the first pass you would draw the two textures into a 3rd one (using an FBO - frame buffer object). The shader you would use is simply:
vec4 a = texture2D(iChannel0,uv);
vec4 b = texture2D(iChannel1,uv);
fragColor = abs(a-b);
So now you have a texture which represents the difference between the two images per pixel, per color component. If the two images are the same, the result will be a totally black picture.
Now you need to create a new FBO which is scaled by half in every dimension, which comes to 128x128 in this example. To draw into this buffer you need to use GL_NEAREST as the texture parameter so that no interpolation is done on texel fetches. Then, for each new pixel, sum the 4 nearest pixels of the source image:
vec2 originalTextCoord = varyingTextCoord;
vec2 textCoordRight = vec2(varyingTextCoord.x + 1.0/256.0, varyingTextCoord.y);
vec2 textCoordBottom = vec2(varyingTextCoord.x, varyingTextCoord.y + 1.0/256.0);
vec2 textCoordBottomRight = vec2(varyingTextCoord.x + 1.0/256.0, varyingTextCoord.y + 1.0/256.0);
fragColor = texture2D(iChannel0, originalTextCoord) +
            texture2D(iChannel0, textCoordRight) +
            texture2D(iChannel0, textCoordBottom) +
            texture2D(iChannel0, textCoordBottomRight);
The 256 value is the size of the source texture, so it should come in as a uniform, letting you reuse the same shader for every pass.
After this is drawn you need to drop down to 64, 32, 16... and finally read the pixel back to the CPU to see the result.
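The final readback is then a single glReadPixels call on the 1x1 target; finalFbo is an assumed name for the last FBO in the chain:

unsigned char px[4];
glBindFramebuffer(GL_READ_FRAMEBUFFER, finalFbo);
glReadPixels(0, 0, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE, px);
/* px[0..3] now holds the accumulated per-channel difference */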
Now, unfortunately, this procedure may produce very unwanted results. Since the colors are simply summed together, it will overflow for all image pairs that are not similar enough (resulting in a white pixel, or rather (1,1,1,0) for non-transparent output). This may be overcome by applying a scale in the first shader pass, dividing the output by a large enough value. That still might not be enough, and an average might need to be done in the second shader as well (multiply all the texture2D calls by .25).
In the end the result might still be a bit strange. You get 4 color components on the CPU which represent the sum or the average of the image differential. I guess you could sum them up and choose a threshold at which you consider the images to be alike or not. But if you want to make more sense of the result you are getting, you might want to treat the whole pixel as a single 32-bit floating-point value (these are a bit tricky, but you may find answers around SO). This way you can compute the values without overflow and get quite exact results from the algorithm. This means you would write the floating-point value as if it were a color, starting with the first shader output and continuing for every other draw call (get the texel, convert it to float, sum it, convert it back to vec4, and assign it as the output); GL_NEAREST is essential here.
If not, you may optimize the procedure by using GL_LINEAR instead of GL_NEAREST and simply keep redrawing the differential texture until it gets down to a single pixel (no need for the 4 coordinates then). This should produce a pixel which represents the average of all the pixels in the differential texture, i.e. the average difference between pixels in the two images. This procedure should also be quite fast.
Then, if you want a slightly smarter algorithm, you can do some wonders when creating the differential texture. Simply subtracting the colors may not be the best approach. It would make more sense to blur one of the images and then compare it to the other image. This loses precision for very similar images, but for everything else it gives a much better result. For instance, you could say you are only interested if a pixel differs by more than 30% from the corresponding pixel of the other (blurred) image, so you would discard and rescale that 30% for every component, such as result.r = clamp(abs(a.r-b.r) - 30.0/100.0, 0.0, 1.0) / ((100.0-30.0)/100.0);
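Generalized to all components, that first-pass shader might read as follows; keeping the 30% cutoff as a tunable uniform is my own assumption:

uniform float threshold; // e.g. 0.3 for the 30% cutoff

vec4 a = texture2D(iChannel0, uv); // original image
vec4 b = texture2D(iChannel1, uv); // blurred image

// differences below the threshold are discarded, the rest is
// rescaled back to the full [0,1] range
vec3 d = clamp(abs(a.rgb - b.rgb) - threshold, 0.0, 1.0) / (1.0 - threshold);
fragColor = vec4(d, 1.0);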
You can bind both textures to a shader and visit each pixel by drawing a fullscreen quad or something like it.
// Equal pixels are marked green. Different pixels are shown in red.
void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    vec2 uv = fragCoord.xy / iResolution.xy;
    vec4 a = texture2D(iChannel0, uv);
    vec4 b = texture2D(iChannel1, uv);
    if (a != b)
        fragColor = vec4(1, 0, 0, 1);
    else
        fragColor = vec4(0, 1, 0, 1);
}
You can test the shader on Shadertoy.
Or you can bind both textures to a compute shader and visit every pixel by iteration.
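A sketch of that compute approach, counting differing pixels into an SSBO (requires GL 4.3; the binding points and names are assumptions):

#version 430
layout(local_size_x = 16, local_size_y = 16) in;

layout(binding = 0) uniform sampler2D texA;
layout(binding = 1) uniform sampler2D texB;
layout(std430, binding = 0) buffer Result { uint diffCount; };

void main()
{
    ivec2 p = ivec2(gl_GlobalInvocationID.xy);
    if (any(greaterThanEqual(p, textureSize(texA, 0))))
        return; // outside the image

    // vector == and != compare all components and yield a single bool
    if (texelFetch(texA, p, 0) != texelFetch(texB, p, 0))
        atomicAdd(diffCount, 1u);
}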
Note that GLSL's == and != operators do work on whole vectors and yield a single bool; the component-wise comparison functions are equal() and notEqual(), which return a bvec that you reduce with any() or all():
if (any(notEqual(a, b)))
Check the GLSL language spec.