Are there advantages of mipmaps aside from the performance ones?

Is the only true advantage of the mipmaps that the filtering required in real time will be less demanding, as it will have been done partly in advance?
Could you not achieve identical results with linear or anisotropic filtering and a little bit more processing power?

Not with "a little bit" more processing power, but with several orders of magnitude more. As an extreme example, consider a quad with a texture mapped to it, but scaled down so the quad is rendered onto a single screen pixel. That screen pixel is then expected to have the average colour value over the entire texture.
When using mipmaps, there will be a 1x1 precomputed mipmap level that has the colour value you want. One simple lookup, fast and easy.
When not using mipmaps, to achieve the exact same effect, rendering this one pixel would mean doing a texture lookup for every single pixel in the texture and then averaging over them. We could take shortcuts by averaging over only, say, 16 equally spaced pixels, but that could make a marked difference in the output (consider what this would do to checkerboard patterns).
So whilst this theoretically could be done without mipmaps in real time, it would effectively mean calculating large portions of the entire mipmap pyramid for every pixel. There would be no visual difference, but you'd have to start measuring framerates in frames per hour.
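
To make the contrast concrete, here is a rough sketch (GLSL embedded in a C++ string; all identifiers are illustrative, not from the answer above): with a complete mipmap pyramid the whole-texture average is a single textureLod() fetch at the top level, while the brute-force equivalent has to visit every texel.

    // Sketch only: assumes uTex has a complete mipmap chain down to 1x1.
    const char* kWholeTextureAverageFrag = R"(
    #version 330 core
    uniform sampler2D uTex;   // texture with a full mipmap pyramid
    out vec4 fragColor;

    void main()
    {
        // The 1x1 top level already holds the average of the whole texture.
        ivec2 baseSize = textureSize(uTex, 0);
        float topLevel = floor(log2(float(max(baseSize.x, baseSize.y))));
        fragColor = textureLod(uTex, vec2(0.5), topLevel);

        // The brute-force equivalent without mipmaps would be something like:
        //   vec4 sum = vec4(0.0);
        //   for (int y = 0; y < baseSize.y; ++y)
        //       for (int x = 0; x < baseSize.x; ++x)
        //           sum += texelFetch(uTex, ivec2(x, y), 0);
        //   fragColor = sum / float(baseSize.x * baseSize.y);
    }
    )";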

Related

Calculating average scene brightness in OpenGL

I'm currently implementing automatically adapting exposure for use with HDR in OpenGL. For this I need to retrieve the average brightness of all pixels in the previous frame.
I've not managed to find any solid explanations of how to do this. As far as I can see there are two ways to go about it:
1. Use glReadPixels to copy the framebuffer to memory and average the pixels on the CPU. This is likely to be painfully slow and doesn't make good use of the GPU.
2. Take the frame and render it to successively smaller FBOs using linear filtering. This lets the GPU do most of the work, but it's going to require a lot of FBOs (roughly 10 for a 1080p screen).
There has got to be a better way of getting average scene brightness. Does anyone have any suggestions?
There are two options that come to mind:
Using glGenerateMipmap, which calculates the average of a 2x2 window at each level, leaving you with the average scene brightness at the smallest level. This can be retrieved using the textureLod function in a shader. Since each mipmap level has half the size of the previous one, the correct level will be floor(log2(max(width, height))) of your texture.
Using compute shaders to do basically the same thing glGenerateMipmap does, but with a bigger window size, which could potentially be faster (although I never tested this).
Your Option 2 is not much different from using glGenerateMipmap on the texture, except that you don't have to manage the chain of FBOs yourself. So basically, rendering to mipmap level 0 of the texture, letting the GL generate the rest of the mipmap pyramid, and reading back just the highest-level 1x1 image is probably the easiest way to get an approximation of the average colour value.
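For what it's worth, a minimal sketch of that route might look like the following, assuming an RGBA floating-point colour texture (function and variable names are mine, and the glGetTexImage readback is only for illustration; in practice you'd sample the top level with textureLod in the exposure shader and skip the readback):

    #include <algorithm>
    #include <cmath>
    #include <GL/glew.h>   // or whatever loader/header you already use

    // Rough sketch: average colour of an HDR texture via its mipmap pyramid.
    void averageSceneColour(GLuint hdrTexture, int width, int height, float outRGBA[4])
    {
        glBindTexture(GL_TEXTURE_2D, hdrTexture);

        // Build the pyramid down to 1x1; each level averages a 2x2 window
        // of the level below it.
        glGenerateMipmap(GL_TEXTURE_2D);

        // The top (1x1) level index is floor(log2(max(width, height))).
        int topLevel = (int)std::floor(std::log2((float)std::max(width, height)));

        // Read the single averaged texel back to the CPU.
        glGetTexImage(GL_TEXTURE_2D, topLevel, GL_RGBA, GL_FLOAT, outRGBA);
    }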

OpenGL Gaussian Kernel on 3D texture

I would like to perform a blur on a 3D texture in OpenGL. Since it is separable, I should be able to do it in 3 passes. My question is, what would be the best way to go about it?
I currently have the 3D texture and fill it using imageStore. Should I create two more copies of the texture for the blur, or is there a way to do it using a single texture?
I am already using a compute shader to build the mipmap of the 3D texture, but there I read from level 0 and write to the next level, so there is no conflict, whereas for the blur I would need some kind of copy.
In short, it can't be done in 3 passes, because it is not a 2D image, even if the kernel is separable.
You have to blur each image slice separately, which is 2 passes per slice (if you are using a 256x256x256 texture, that is 512 passes just for blurring along the U and V coordinates). Then you still have to blur along the T and U (or T and V, it makes no difference) coordinates, which is another 512 passes. You can gain some performance by using the bilinear filter and reading values between texels to save some constant processing cost. The 3D blur will be very costly.
Performance tip: maybe you don't need to blur the whole texture but only part of it (the visible part?).
The problem with such a high number of passes is the number of interactions between the GPU and the CPU: draw calls and FBO setup are both slow operations that stall the CPU (a different API with lower CPU overhead would probably be faster).
Try not to separate the kernel:
If you have a small kernel (I'd guess up to 5^3; only profiling will show the maximum workable kernel size), the fastest way is probably to NOT separate the kernel (that way you save a lot of draw calls and FBO bindings and lean entirely on GPU fill rate and bandwidth).
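As an illustration of the non-separated approach (not from the original answer; the bindings, image format and box weights are placeholders), a small 3x3x3 kernel could be applied in a single compute dispatch:

    // Sketch only: one dispatch blurs the whole volume with a 3x3x3 kernel.
    const char* kBlur3DCompute = R"(
    #version 430
    layout(local_size_x = 8, local_size_y = 8, local_size_z = 8) in;

    layout(binding = 0)          uniform sampler3D uSrc;   // source volume
    layout(binding = 1, rgba16f) uniform image3D   uDst;   // destination volume

    void main()
    {
        ivec3 p    = ivec3(gl_GlobalInvocationID);
        ivec3 size = textureSize(uSrc, 0);
        if (any(greaterThanEqual(p, size)))
            return;

        vec4 sum = vec4(0.0);
        for (int z = -1; z <= 1; ++z)
        for (int y = -1; y <= 1; ++y)
        for (int x = -1; x <= 1; ++x)
        {
            ivec3 q = clamp(p + ivec3(x, y, z), ivec3(0), size - 1);
            sum += texelFetch(uSrc, q, 0);
        }
        // Plain box weights; swap in Gaussian weights as needed.
        imageStore(uDst, p, sum / 27.0);
    }
    )";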
Spread work over time:
This works whether your kernel is separated or not. Instead of computing a Gaussian blur every frame, you could compute it only every second (maybe with a bigger kernel). Then you use, as the source of "continuous blurring data", the interpolation between the previous blur and the next blur (which costs 2 samples of a 3D texture each frame, which is much cheaper than blurring continuously).
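In shader terms, the temporal variant can be as small as blending the two most recent blur results (a sketch; the identifiers are mine):

    const char* kTemporalBlendSnippet = R"(
    // Blend the two most recent blur results instead of re-blurring every frame.
    uniform sampler3D uBlurOld;   // blur computed at time t0
    uniform sampler3D uBlurNew;   // blur computed at time t1
    uniform float     uLerp;      // (now - t0) / (t1 - t0), clamped to [0, 1]

    vec4 blurredSample(vec3 uvw)
    {
        return mix(texture(uBlurOld, uvw), texture(uBlurNew, uvw), uLerp);
    }
    )";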

Efficient downsampling for post-processing effects in OpenGL 3.3

I understand the idea behind the bloom/glow effect: we downsample the texture to keep our convolution kernels small. Now that I am trying to implement it, I am not quite sure which road I should take.
My first idea was to use glGenerateMipmap to do the downsampling. However, I cannot tell it to stop after, say, 4 steps. It's a bit of a black box for me, and for all I know, it may generate 10 images to sample my screen from 1024*768 down to 1*1. Maybe these last steps are cheap because everything is so small already, but maybe they are not.
I googled around and found that many people were relying on FBOs rather than glGenerateMipmap. I am familiar with FBOs since I use deferred lighting. My second idea is to simply render a 'quad' with a linear sampler into a smaller texture. I would do that four times in a row, halving the width and height each time. However, I found that some people preferred using their own fragment shader for downsampling rather than relying on GL_LINEAR, and I wonder why; maybe it is faster?
What would be a way to quickly downsample my full-screen texture 4 times in a row, keeping each version? I have no need for fancy edge-preserving sampling algorithms as I am going to blur everything anyway.
"we downsample the texture to keep our convolution kernels small."
Or you simply render the bloom/glow layer at a smaller resolution in the first place. This saves fill rate, and you don't have to minify afterwards.
"My second idea is to simply render a 'quad' with a linear sampler into a smaller texture."
This is not downsampling at all. It's linear interpolation between sampling points and may create artifacts.
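For comparison, the "own fragment shader" variant the question mentions averages the source footprint explicitly instead of relying on a single GL_LINEAR fetch. A sketch for a fixed 2x reduction (identifier names are mine):

    const char* kDownsampleFrag = R"(
    #version 330 core
    uniform sampler2D uSrc;     // full-resolution source
    out vec4 fragColor;

    void main()
    {
        // Average the 2x2 block of source texels that map onto this
        // destination pixel, i.e. an explicit box filter.
        ivec2 dstPos = ivec2(gl_FragCoord.xy);
        ivec2 srcPos = dstPos * 2;
        vec4 sum = texelFetch(uSrc, srcPos,               0)
                 + texelFetch(uSrc, srcPos + ivec2(1, 0), 0)
                 + texelFetch(uSrc, srcPos + ivec2(0, 1), 0)
                 + texelFetch(uSrc, srcPos + ivec2(1, 1), 0);
        fragColor = sum * 0.25;
    }
    )";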

Optimize OpenGL 2D rendering by using depth buffer to discard overlapping pixels?

Is it possible to take advantage of the depth buffer in such a way that it only draws to areas where no pixels have been drawn yet?
I am rendering simple single-coloured triangles: a lot of them may overlap, which reduces rendering speed significantly, because more pixels are rendered than are visible on the screen.
This is easily possible in 3D rendering: just enable depth testing and put the triangles at different z-positions. But that does not work in 2D mode: I can't place every triangle higher than the previous one, since that would result in bad rendering quality after a certain height, when the depth buffer limits get in the way.
How can I do this with shaders? Or, if shaders aren't needed, how can it be done without them?
Assign a polygon offset (by means of glPolygonOffset) to each triangle, and enable depth testing.
"I can't place every triangle higher than the previous one, since that would result in bad rendering quality after a certain height, when the depth buffer limits get in the way."
That would only happen if you do it wrong.
A 24-bit depth buffer offers 16 million different depth values for you to choose from. It's simply a matter of computing a value properly. Granted, the exact mechanics are hardware-specific, but not so specific that you would be unable to get at least 4 million separate layers.
It's a matter of simple math. You're building a function that maps from the integer range [0, N] to the floating-point range [0, 1], where N is the number of triangles. Say, 4 million just to give you room.
Thus, the Z-value for any particular triangle is k/N, where k is the integer index of that triangle. You should easily be able to do this in your shader.
If worst comes to worst, you can use a 32-bit floating-point depth buffer.
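A minimal sketch of that k/N mapping, assuming the triangle index is fed in as a per-vertex attribute (the attribute, uniform and layout choices are mine, not from the answer):

    const char* kLayeredVert = R"(
    #version 330 core
    layout(location = 0) in vec2  aPos;       // 2D position, assumed already in clip space
    layout(location = 1) in float aTriIndex;  // k: index of the triangle this vertex belongs to

    uniform float uTriCount;                  // N: total number of triangles

    void main()
    {
        // Map [0, N] to [0, 1], then to the default [-1, 1] NDC depth range.
        float layer = aTriIndex / uTriCount;  // k / N
        gl_Position = vec4(aPos, layer * 2.0 - 1.0, 1.0);
    }
    )";
    // With glEnable(GL_DEPTH_TEST) and glDepthFunc(GL_LESS), a later
    // (larger-k) triangle is rejected wherever an earlier one has already
    // written the pixel, which is exactly the behaviour the question asks for.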

Sum image intensities in GPU

I have an application where I need to take the average intensity of an image, for around 1 million images. It "feels" like a job for a GPU fragment shader, but fragment shaders are for per-pixel local computations, while image averaging is a global operation.
One approach I considered is loading the image into a texture, applying a 2x2 box blur, loading the result back into an N/2 x N/2 texture, and repeating until the output is 1x1. However, this would take log n applications of the shader.
Is there a way to do it in one pass? Or should I just break down and use CUDA/OpenCL?
The summation operation is a specific case of a "reduction", a standard operation in CUDA and OpenCL libraries. A nice write-up on it is available on the CUDA demos page. In CUDA, Thrust and CUDPP are just two examples of libraries that provide reduction. I'm less familiar with OpenCL, but CLPP seems to be a good library that provides reduction. Just copy your color buffer to an OpenGL pixel buffer object and use the appropriate OpenGL interoperability call to make that pixel buffer's memory accessible in CUDA/OpenCL.
If it must be done using the OpenGL API (as the original question required), the solution is to render to a texture, create a mipmap of the texture, and read back the 1x1 top level. You have to set the filtering right (bilinear is appropriate, I think), but it should get close to the right answer, modulo precision error.
My gut tells me to attempt your implementation in OpenCL. You can optimize for your image size and graphics hardware by breaking up the images into bespoke chunks of data that are then summed in parallel. Could be very fast indeed.
Fragment shaders are great for convolutions, but there the result is written to gl_FragColor, so it makes sense. Here you would ultimately have to loop over every pixel in the texture and sum the results, which are then read back in the main program. Generating image statistics is perhaps not what the fragment shader was designed for, and it's not clear that a major performance gain is to be had, since it's not guaranteed that a particular buffer is located in GPU memory.
It sounds like you may be applying this algorithm to a real-time motion detection scenario, or some other automated feature detection application. It may be faster to compute some statistics from a sample of pixels rather than the entire image and then build a machine learning classifier.
Best of luck to you in any case!
You don't need CUDA if you'd like to stick to GLSL. As in the CUDA solution mentioned here, it can be done in a fragment shader quite straightforwardly. However, you need about log(resolution) draw calls.
Just set up a shader that takes 2x2 pixel samples from the original image and outputs their average. The result is an image at half the resolution in both axes. Repeat that until the image is 1x1 px.
Some considerations: use GL_FLOAT luminance textures if available, to get a more precise sum. Use glViewport to quarter the rendering area at each stage. The result then ends up in the top-left pixel of your framebuffer.
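Putting that together, one reduction pass might look roughly like this (the host-side loop is only schematic; the helper functions are hypothetical):

    const char* kReduceFrag = R"(
    #version 330 core
    uniform sampler2D uPrev;   // result of the previous pass
    out vec4 fragColor;

    void main()
    {
        // Average a 2x2 block of the previous level.
        ivec2 p = ivec2(gl_FragCoord.xy) * 2;
        fragColor = 0.25 * (texelFetch(uPrev, p,               0) +
                            texelFetch(uPrev, p + ivec2(1, 0), 0) +
                            texelFetch(uPrev, p + ivec2(0, 1), 0) +
                            texelFetch(uPrev, p + ivec2(1, 1), 0));
    }
    )";

    // Host side: keep halving the viewport until a single pixel is left.
    // for (int w = width / 2, h = height / 2; w >= 1 || h >= 1; w /= 2, h /= 2) {
    //     glViewport(0, 0, std::max(w, 1), std::max(h, 1));
    //     swapSourceAndTarget();      // hypothetical helper: bind previous result as uPrev
    //     drawFullscreenQuad();       // hypothetical helper
    // }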