Efficient downsampling for post-processing effects in OpenGL 3.3

I understand the idea behind the bloom/glow effect: we downsample the texture to keep our convolution kernels small. Now that I am trying to implement it, I am not quite sure which road I should take.
My first idea was to use glGenerateMipmap to do the downsampling. However, I cannot tell it to stop after, say, 4 steps. It's a bit of a black box for me, and for all I know, it may generate 10 images to sample my screen from 1024*768 down to 1*1. Maybe these last steps are cheap because everything is so small already, but maybe they are not.
I googled around and found that many people were relying on FBOs rather than glGenerateMipmap. I am familiar with FBOs since I use deferred lighting. My second idea is to simply render a 'quad' with a linear sampler into a smaller texture. I would do that four times in a row, halving width and height each time. However, I found that some people preferred using their own fragment shader for downsampling rather than relying on GL_LINEAR, and I wonder why; maybe it is faster?
What would be a way to quickly downsample my full-screen texture 4 times in a row, keeping each version? I have no need for fancy edge-preserving sampling algorithms as I am going to blur everything anyway.
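To make the four-step plan concrete, here is a minimal sketch of the sizes involved, assuming each step halves width and height and never drops below 1. `Size` and `halvingChain` are hypothetical names for illustration, not part of OpenGL.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Size { int width; int height; };

// Sizes of `steps` successive half-resolution render targets.
// Hypothetical helper; each entry would be the size of one FBO color texture.
std::vector<Size> halvingChain(Size full, int steps) {
    assert(full.width > 0 && full.height > 0 && steps >= 0);
    std::vector<Size> chain;
    Size s = full;
    for (int i = 0; i < steps; ++i) {
        s.width  = std::max(1, s.width / 2);   // integer halving, clamped to 1
        s.height = std::max(1, s.height / 2);
        chain.push_back(s);
    }
    return chain;
}
```

For a 1024*768 screen and 4 steps this gives 512*384, 256*192, 128*96 and 64*48. If glGenerateMipmap is preferred after all, the GL_TEXTURE_MAX_LEVEL texture parameter should, as far as I know, limit how many levels it fills in, but I have not measured whether that is faster than the FBO route.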

we downsample the texture to keep our convolution kernels small.
Or you simply render the bloom/glow layer at a smaller resolution in the first place. This saves fillrate, and you don't have to minify afterwards.
My second idea is to simply render a 'quad' with a linear sampler into a smaller texture.
This is not downsampling at all. It is linear interpolation between sampling points and may create artifacts.
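How much the linear sampler loses depends on the ratio. A CPU sketch of bilinear sampling, assuming clamp-to-edge addressing, texel centers at half-integer positions, and a destination of exactly half the size, shows that each destination pixel then lands midway between four source texels and receives their plain average. At larger reduction ratios most source texels would be skipped, which is where artifacts can come from. `Image`, `bilinear` and `downsampleHalf` are made-up names for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Single-channel image stored row by row.
struct Image { int w; int h; std::vector<float> px; };

// Texel fetch with clamp-to-edge addressing.
static float texel(const Image& img, int x, int y) {
    x = std::min(std::max(x, 0), img.w - 1);
    y = std::min(std::max(y, 0), img.h - 1);
    return img.px[y * img.w + x];
}

// Bilinear lookup at normalized coordinates (u, v), texel centers at +0.5.
float bilinear(const Image& img, float u, float v) {
    float x = u * img.w - 0.5f;
    float y = v * img.h - 0.5f;
    int x0 = static_cast<int>(std::floor(x));
    int y0 = static_cast<int>(std::floor(y));
    float fx = x - x0;
    float fy = y - y0;
    float top = texel(img, x0, y0)     * (1.0f - fx) + texel(img, x0 + 1, y0)     * fx;
    float bot = texel(img, x0, y0 + 1) * (1.0f - fx) + texel(img, x0 + 1, y0 + 1) * fx;
    return top * (1.0f - fy) + bot * fy;
}

// One bilinear sample per destination pixel center, destination half the size.
Image downsampleHalf(const Image& src) {
    assert(src.w > 0 && src.h > 0);
    int w = std::max(1, src.w / 2);
    int h = std::max(1, src.h / 2);
    Image dst{w, h, std::vector<float>(w * h)};
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            dst.px[y * w + x] = bilinear(src, (x + 0.5f) / w, (y + 0.5f) / h);
    return dst;
}
```

With a 4x4 source holding the values 0 to 15, the first output pixel is the average of 0, 1, 4 and 5.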

Related

Calculating average scene brightness in OpenGL

I'm currently implementing automatically adapting exposure for use with HDR in OpenGL. For this I need to retrieve the average brightness of all pixels in the previous frame.
I've not managed to find any solid explanations of how to do this. As far as I can see there are two ways to go about it.
Use glReadPixels to copy the framebuffer to memory and average them on the CPU. This is likely to be painfully slow and doesn't make good use of the GPU.
Take the frame and render it to successively smaller FBOs using linear filtering. This lets the GPU do most of the work but it's going to require a lot of FBOs (roughly 10 for a 1080p screen).
There has got to be a better way of getting average scene brightness. Does anyone have any suggestions?
There are two options that come into my mind:
Using glGenerateMipmap, which typically calculates the average of a 2x2 window at each step, leaving you with the average scene brightness at the smallest level. This can be retrieved using the textureLod function in a shader. Since each mipmap level has half the size of the previous one, the correct level is floor(log2(max)), where max is the larger of the texture's width and height (not GL_MAX_TEXTURE_SIZE, which only bounds how large a texture may be).
Using compute shaders to do basically the same thing glGenerateMipmap does, but with a bigger window size, which could potentially be faster (although I never tested this).
Your Option 2 is not much different from using glGenerateMipmap on the texture, just that you don't need to hassle with any client side objects like FBOs. So basically, rendering to mipmap level 0 of the texture, letting the GL generate the mipmap pyramid, and reading back just the highest level 1x1 image is probably the easiest way to get some approximation of the average color value.
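As a small worked example of which level holds the 1x1 image, here is a sketch that counts the halvings of the larger texture dimension. `topMipLevel` is a hypothetical helper name; the result is what one would pass as the lod argument of textureLod, assuming a complete mipmap chain.

```cpp
#include <algorithm>
#include <cassert>

// Index of the 1x1 mipmap level for a texture of the given size,
// i.e. floor(log2(max(width, height))) computed with integer halving.
int topMipLevel(int width, int height) {
    assert(width > 0 && height > 0);
    int level = 0;
    for (int m = std::max(width, height); m > 1; m /= 2)
        ++level;
    return level;
}
```

For a 1920x1080 frame this gives level 10, which matches the "roughly 10" reductions mentioned in the question. Because odd sizes are rounded down along the way, the 1x1 value is only an approximation of the true average.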

Texture tiling with continuous random offset?

I have a texture and a mesh, if I apply the texture on the mesh, it tiles it continuously as one would expect. The offset for each tile is equal.
The problem:
Non-tilable textures, or textures with some outstanding elements, look repetitive and cheap.
Example:
Solution Attempt
My first attempt was to programmatically generate a texture the size of the mesh, with a randomised offset for each tile. Of course the size of the texture became a problem, let alone the GPU limit on the maximum size of a single texture.
What I would like to do
I would like to know if there's a way to make a Unity shader or a material that would load a single texture and tile it with random offsets for each tile and do it only once to keep the performance high?
I believe you might try one of the techniques invented by Inigo Quilez (http://www.iquilezles.org/www/articles/texturerepetition/texturerepetition.htm).
Basically, non-tilable textures and textures with some outstanding elements are different problems.
Non-tilable textures
There are 2 ways of solving it:
Fixing the texture itself;
Mirrored repeat can be used in some cases (see GL_MIRRORED_REPEAT)
Textures with some outstanding elements
This can be solved in the following ways (or a combination of them):
Modifying the texture (this includes enlargement as well);
Using multitexturing;
Well, maybe mirrored repeat can be used as well in some cases.
Shifting texture coordinates randomly
Unfortunately, I can't think of any case of these 2 problems (except, maybe, white noise textures) where shifting texture coordinates is a solution.
You are looking at this problem the wrong way. All games face this issue. They hide it simply by a) varying textures a lot instead of texturing large areas with the same texture and b) through level design. Imagine this plane filled with barns, grass, trees, fences and whatnot: suddenly the mono-textured surface blends in with its surroundings. Camera angle also plays a huge role in this. Try moving your camera close to the ground and the repeating texture is much less noticeable.
Your plane is just a very extreme example. You should not try to fix it at this point but rather continue to build your game. Or design your textures to repeat well without showing clear patterns. The extreme would be a flatcolored texture. But generally large outdoor terrain textures simply have very little structure, almost being like noise, plus they don't use colors with any contrast, just shades of the same color.
Your offset idea won't work. Perhaps it might work technically (it may be inefficient though). But random offsets can't cover up the patterns, instead it will create new ones because the textures won't smoothly interpolate at their edges anymore, so you could clearly see a grid of squares. That I guess would be even uglier and more noticeable.
Lastly, you can increase the texture size or scale (blurriness may need to be covered up as explained above). In relation to camera angle this would be the easiest, most effective fix. Or at least an improvement.
An old thread, but relevant to many, I think. You can do this in a shader by randomizing the vertex position on the XZ plane or (better) the UV coordinates, based on the world-space position.
The texture will still tile, but instead of repeating along a straight line it will repeat along a random wiggly line. This is great for things like terrain and grass, but obviously no good if you want to maintain straight lines in your textures.
A second option is a diffuse-detail shader. It tiles one texture close to the camera and another further away (which you can make softer or blurrier).
A third option: blend two textures together with different UV tiling scales on each (not divisible, e.g. not 2 and 4, but 1 and 2.334556), so the pattern is harder to see.
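For anyone who still wants to try the per-tile random offset from the question, the core of it can be sketched on the CPU as a hash of the integer tile index. This is only an illustration under assumptions: `hashTile`, `toUnit`, `tileOffset` and `shiftedUv` are made-up names, the hash constants are arbitrary, and as noted above the tile borders will show seams unless they are hidden some other way. A shader version would do the same arithmetic per fragment.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Vec2 { float x; float y; };

// Arbitrary integer hash of a tile index (illustrative, not from any library).
static std::uint32_t hashTile(std::int32_t tx, std::int32_t ty) {
    std::uint32_t h = static_cast<std::uint32_t>(tx) * 374761393u
                    + static_cast<std::uint32_t>(ty) * 668265263u;
    h = (h ^ (h >> 13)) * 1274126177u;
    return h ^ (h >> 16);
}

// Map the low 24 bits of a hash to a float in [0, 1).
static float toUnit(std::uint32_t h) {
    return static_cast<float>(h & 0xFFFFFFu) / 16777216.0f;
}

// Offset shared by every point inside the same tile (tile = integer part of uv).
Vec2 tileOffset(Vec2 uv) {
    std::int32_t tx = static_cast<std::int32_t>(std::floor(uv.x));
    std::int32_t ty = static_cast<std::int32_t>(std::floor(uv.y));
    std::uint32_t h = hashTile(tx, ty);
    return Vec2{ toUnit(h), toUnit(h * 747796405u + 2891336453u) };
}

// UV to sample with; relies on repeat wrapping to bring it back into range.
Vec2 shiftedUv(Vec2 uv) {
    Vec2 o = tileOffset(uv);
    return Vec2{ uv.x + o.x, uv.y + o.y };
}
```

The offset is computed once per lookup from the tile index alone, so no oversized texture has to be generated.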

My own z-buffer

How can I make my own z-buffer for correct blending of alpha channels? I'm using GLSL.
I have only one idea: use two "buffers", one storing the depth component and the other storing color (with an alpha channel). I don't need access to the buffers from my program. I can't use a uniform array because GLSL restricts the number of uniform variables. I can't use an FBO because the behaviour when reading from and writing to the same framebuffer at the same time is undefined (and it doesn't work on every card).
How can I resolve this problem?
Or how can I read the actual, real-time z-buffer from GLSL? (I mean the z-buffer must be up to date for each fragment shader call.)
How can I make my own z-buffer for correct blending of alpha channels?
That's not possible. For perfect order-independent transparency you must get rid of the z-buffer and replace it with another mechanism for hidden surface removal.
With a z-buffer there are two possible ways to tackle the problem.
Multi-layered z-buffer (impractical with hardware acceleration): basically it stores several layers of "depth" values and uses them for blending transparent surfaces. It will hog a lot of memory, and there is a maximum number of overlapping transparent surfaces; once you're over the limit, there will be artifacts.
Depth peeling (google it). Order-independent transparency, but there's a limit on the maximum number of overlapping transparent polygons per pixel. Can actually be implemented on hardware.
Both approaches have a limit (a maximum number of overlapping transparent polygons per pixel); once you go over the limit, the scene will no longer render properly. That makes the whole thing rather useless.
What you could actually do (to get a perfect solution) is remove the z-buffer completely and build a rendering pipeline that gathers all polygons to be rendered, clips them, splits them (when two polygons intersect), sorts them and then paints them on screen in the correct order to ensure that you get the correct result. However, this is hard, and doing it with hardware acceleration is harder. I think (I'm not completely certain it happened) that 5 or 6 years ago some ATI GPU-related document mentioned that some of their cards could render a correct scene with the z-buffer disabled by enabling some kind of extension. However, they didn't say a thing about alpha blending. I haven't heard about this feature since. Perhaps it didn't become popular and shared the fate of TruForm (forgotten). Also, such a rendering pipeline will not be able to do some things that are possible with a z-buffer.
If it's order-independent transparency you're after, then the fundamental problem is that a depth buffer stores one depth per pixel, but if you're composing a view of partially transparent geometry then more than one fragment contributes to each pixel.
If you were to solve the problem robustly you'd need an ordered list of depths per pixel, going back to the closest opaque fragment. You'd then walk the list in reverse order. In practice OpenGL doesn't do things like variably sized arrays so people achieve pretty much that by drawing their geometry in back-to-front order.
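The back-to-front idea can be sketched for a single pixel as follows, assuming a larger depth value means farther from the camera and ordinary "over" blending (source times alpha plus destination times one minus alpha). `Fragment` and `compositeBackToFront` are hypothetical names, and a single color channel is used for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One transparent fragment landing on a pixel (single channel for brevity).
struct Fragment { float depth; float color; float alpha; };

// Sort farthest-first, then blend each fragment over what is already there.
float compositeBackToFront(std::vector<Fragment> frags, float background) {
    std::sort(frags.begin(), frags.end(),
              [](const Fragment& a, const Fragment& b) { return a.depth > b.depth; });
    float dst = background;
    for (const Fragment& f : frags) {
        assert(f.alpha >= 0.0f && f.alpha <= 1.0f);
        dst = f.color * f.alpha + dst * (1.0f - f.alpha);
    }
    return dst;
}
```

Because of the sort, the result no longer depends on the order in which the fragments were submitted, which is exactly what a single depth value per pixel cannot give you.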
An alternative embodied by GL_SAMPLE_ALPHA_TO_COVERAGE is to switch to screen-door transparency, which is indistinguishable from real transparency either at a really high resolution or with multisampling. Ideally you'd do that stochastically, but that would void the OpenGL rule of repeatability. Nevertheless since you're in GLSL you can do it for yourself. Your sampler simply takes the input alpha and uses that as the probability that it'll output the final pixel. So grab a random value in the range 0.0 to 1.0 from somewhere and if it's greater than the alpha then discard the pixel. Always output with an alpha of 1.0 and just use the normal depth buffer. Answers like this say a bit more on what you can do to get randomish numbers in GLSL, and obviously you want to turn multisampling up as high as possible.
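The discard rule described above is small enough to sketch on the CPU. The sketch assumes evenly spread stand-in values instead of a real random source, so that the outcome is repeatable; `keepFragment` and `coverage` are illustrative names only.

```cpp
#include <cassert>

// Rule from the text: discard when the random value is greater than alpha.
bool keepFragment(float alpha, float rnd) {
    return !(rnd > alpha);
}

// Fraction of sub-samples that survive for a given alpha.
// The evenly spread values below stand in for per-sample random numbers.
float coverage(float alpha, int samples) {
    assert(samples > 0);
    int kept = 0;
    for (int i = 0; i < samples; ++i) {
        float rnd = (i + 0.5f) / samples;
        if (keepFragment(alpha, rnd)) ++kept;
    }
    return static_cast<float>(kept) / samples;
}
```

With 16 sub-samples and an alpha of 0.25, four of them survive, so the averaged pixel ends up a quarter covered. That is why more multisampling makes the screen-door pattern look closer to real blending.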
Eric Enderton has written a decent paper (which has a slide version) on stochastic order-independent transparency that goes alongside a DirectX implementation that's worth checking out.

Sum image intensities in GPU

I have an application where I need take the average intensity of an image for around 1 million images. It "feels" like a job for a GPU fragment shader, but fragment shaders are for per-pixel local computations, while image averaging is a global operation.
One approach I considered is loading the image into a texture, applying a 2x2 box blur, loading the result back into an N/2 x N/2 texture, and repeating until the output is 1x1. However, this would take log N applications of the shader.
Is there a way to do it in one pass? Or should I just break down and use CUDA/OpenCL?
The summation operation is a specific case of the "reduction," a standard operation in CUDA and OpenCL libraries. A nice writeup on it is available on the cuda demos page. In CUDA, Thrust and CUDPP are just two examples of libraries that provide reduction. I'm less familiar with OpenCL, but CLPP seems to be a good library that provides reduction. Just copy your color buffer to an OpenGL pixel buffer object and use the appropriate OpenGL interoperability call to make that pixel buffer's memory accessible in CUDA/OpenCL.
If it must be done using the opengl API (as the original question required), the solution is to render to a texture, create a mipmap of the texture, and read in the 1x1 texture. You have to set the filtering right (bilinear is appropriate, I think), but it should get close to the right answer, modulo precision error.
My gut tells me to attempt your implementation in OpenCL. You can optimize for your image size and graphics hardware by breaking up the images into bespoke chunks of data that are then summed in parallel. Could be very fast indeed.
Fragment shaders are great for convolutions, but that result is usually written to gl_FragColor, so it makes sense. Ultimately you will have to loop over every pixel in the texture and sum the result, which is then read back in the main program. Generating image statistics is perhaps not what the fragment shader was designed for, and it's not clear that a major performance gain is to be had, since it's not guaranteed that a particular buffer is located in GPU memory.
It sounds like you may be applying this algorithm to a real-time motion detection scenario, or some other automated feature detection application. It may be faster to compute some statistics from a sample of pixels rather than the entire image and then build a machine learning classifier.
Best of luck to you in any case!
It doesn't need CUDA if you'd like to stick to GLSL. As in the CUDA solution mentioned here, it can be done in a fragment shader in a straightforward way. However, you need about log(resolution) draw calls.
Just set up a shader that takes 2x2 pixel samples from the original image and outputs the average of those. The result is an image with half the resolution in both axes. Repeat that until the image is 1x1 px.
Some considerations: Use GL_FLOAT luminance textures if available, to get a more precise sum. Use glViewport to quarter the rendering area in each stage. The result then ends up in the top left pixel of your framebuffer.
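The repeated 2x2 averaging can be checked on the CPU with a minimal sketch like the one below. It assumes a square, power-of-two, single-channel image; each loop iteration corresponds to one draw call of the shader described above. `Reduction` and `reduceByHalving` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Reduction { float average; int passes; };

// Average an n x n image (n a power of two) by repeated 2x2 box reduction.
Reduction reduceByHalving(std::vector<float> img, int n) {
    assert(n > 0 && (n & (n - 1)) == 0);
    assert(img.size() == static_cast<std::size_t>(n) * static_cast<std::size_t>(n));
    int passes = 0;
    while (n > 1) {
        int half = n / 2;
        std::vector<float> next(static_cast<std::size_t>(half) * half);
        for (int y = 0; y < half; ++y) {
            for (int x = 0; x < half; ++x) {
                // Average the 2x2 block whose top-left texel is (2x, 2y).
                float sum = img[(2 * y) * n + 2 * x]     + img[(2 * y) * n + 2 * x + 1]
                          + img[(2 * y + 1) * n + 2 * x] + img[(2 * y + 1) * n + 2 * x + 1];
                next[y * half + x] = sum * 0.25f;
            }
        }
        img.swap(next);
        n = half;
        ++passes;
    }
    return Reduction{ img[0], passes };
}
```

A 4x4 image needs two passes, and in general an N x N image needs log2(N) of them. For sizes that are not powers of two, the edge texels need extra care, which this sketch leaves out.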

How to draw to a texture in OpenGL

Now that my OpenGL application is getting larger and more complex, I am noticing that it's also getting a little slow on very low-end systems such as netbooks. In Java, I am able to get around this by drawing to a BufferedImage, then drawing that to the screen and updating the cached render every once in a while. How would I go about doing this in OpenGL with C++?
I found a few guides, but they seem to only work on newer hardware or specific Nvidia cards. Since the cached rendering operations will only be updated every once in a while, I can sacrifice speed for compatibility.
glBegin(GL_QUADS);
setColor(DARK_BLUE);
glVertex2f(0, 0); //TL
glVertex2f(appWidth, 0); //TR
setColor(LIGHT_BLUE);
glVertex2f(appWidth, appHeight); //BR
glVertex2f(0, appHeight); //BL
glEnd();
This is something that I am especially concerned about. A gradient that takes up the entire screen is being re-drawn many times per second. How can I cache it to a texture then just draw that texture to increase performance?
Also, a trick I use in Java is to render it to a 1 X height texture then scale that to width x height to increase the performance and lower memory usage. Is there such a trick with openGL?
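The 1 x height trick carries over in principle: build a one-pixel-wide column of colors on the CPU, upload it once, and stretch it across a quad. Below is a minimal sketch of the CPU half only, assuming 8-bit RGBA data of the kind that could be handed to glTexImage2D with GL_RGBA and GL_UNSIGNED_BYTE. `Rgb` and `makeGradientColumn` are hypothetical names, and the actual values of DARK_BLUE and LIGHT_BLUE are not known from the question.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

// Build a 1 x height RGBA column that blends from rowFirst to rowLast.
// Which end appears at the top of the screen depends on the texture coordinates used.
std::vector<std::uint8_t> makeGradientColumn(int height, Rgb rowFirst, Rgb rowLast) {
    assert(height > 0);
    std::vector<std::uint8_t> pixels;
    pixels.reserve(static_cast<std::size_t>(height) * 4);
    for (int y = 0; y < height; ++y) {
        float t = height > 1 ? static_cast<float>(y) / (height - 1) : 0.0f;
        auto mix = [t](std::uint8_t a, std::uint8_t b) {
            return static_cast<std::uint8_t>(std::lround(a + (b - a) * t));
        };
        pixels.push_back(mix(rowFirst.r, rowLast.r));
        pixels.push_back(mix(rowFirst.g, rowLast.g));
        pixels.push_back(mix(rowFirst.b, rowLast.b));
        pixels.push_back(255);  // opaque alpha
    }
    return pixels;
}
```

Whether this beats the plain two-color quad is doubtful: as the answers below point out, drawing the gradient directly is already about as cheap as drawing gets.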
If you don't want to use Framebuffer Objects for compatibility reasons (but they are pretty widely available), you don't want to use the legacy (and non portable) Pbuffers either. That leaves you with the simple possibility of reading the contents of the framebuffer with glReadPixels and creating a new texture with that data using glTexImage2D.
Let me add that I don't really think you are going to gain much in your case. Drawing a texture onscreen requires at least one texel access per pixel; that's not really a huge saving if the alternative is just interpolating a color as you are doing now!
I sincerely doubt drawing from a texture is less work than drawing a gradient.
In drawing a gradient:
Color is interpolated at every pixel
In drawing a texture:
Texture coordinate is interpolated at every pixel
Color is still interpolated at every pixel
Texture lookup for every pixel
Multiply lookup color with current color
Not that either of these are slow, but drawing untextured polygons is pretty much as fast as it gets.
Hey there, thought I'd give you some insight in to this.
There are essentially two ways to do it.
Frame Buffer Objects (FBOs) for more modern hardware, and the back buffer as a fallback.
The article from one of the previous posters is a good one to follow on this, and there are plenty of tutorials on Google for FBOs.
In my 2d Engine (Phoenix), we decided we would go with just the back buffer method. Our class was fairly simple and you can view the header and source here:
http://code.google.com/p/phoenixgl/source/browse/branches/0.3/libPhoenixGL/PhRenderTexture.h
http://code.google.com/p/phoenixgl/source/browse/branches/0.3/libPhoenixGL/PhRenderTexture.cpp
Hope that helps!
Consider using a display list rather than a texture. Texture reads (especially for large ones) are a good deal slower than 8 or 9 function calls.
Before doing any optimization you should make sure you fully understand the bottlenecks. You'll probably be surprised at the result.
Look into FBOs - framebuffer objects. It's an extension that lets you render to arbitrary rendertargets, including textures. This extension should be available on most recent hardware. This is a fairly good primer on FBOs: OpenGL Frame Buffer Object 101