Texture Image processing on the GPU? - opengl

I'm rendering a scene into a texture and then need to process that image in some simple way. Currently I read the texture back with glReadPixels() and process it on the CPU. This is too slow, however, so I was thinking about moving the processing to the GPU.
The simplest setup I could think of is to draw a plain white quad that fills the entire viewport under an orthographic projection and to write the image processing as a fragment shader. This would let many instances of the processing run in parallel, and each instance could access any pixel of the texture it needs.
Is this a viable course of action? Is it common to do things this way?
Is there maybe a better way to do it?

Yes, this is the usual way of doing things:
1. Render something into a texture.
2. Draw a fullscreen quad with a shader that reads that texture and does some operations.
Simple effects (e.g. grayscale, color correction) can be done by reading one pixel and outputting one pixel in the fragment shader. More complex operations (e.g. swirling patterns) can be done by reading one pixel from an offset location and outputting one pixel. Even more complex operations can be done by reading multiple pixels.
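As a rough illustration of the one-pixel-in, one-pixel-out case, here is a minimal CPU-side sketch in Python of what such a fragment shader computes. The names to_gray and run_fullscreen_pass are made up for this example, and the Rec. 601 luma weights are just one common choice, not something the question specifies.

```python
def to_gray(pixel):
    """Per-pixel kernel: one input pixel in, one output pixel out."""
    r, g, b = pixel
    # Rec. 601 luma weights (an assumption; any grayscale formula would do).
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return (y, y, y)


def run_fullscreen_pass(image, kernel):
    """Apply `kernel` to every pixel independently, like a fullscreen quad."""
    return [[kernel(p) for p in row] for row in image]


# A tiny 2x2 RGB image with components in [0, 1].
image = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
         [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]]
gray = run_fullscreen_pass(image, to_gray)
```

On the GPU the nested loop disappears: the rasterizer invokes the kernel once per fragment of the quad, and the invocations run in parallel.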
In some cases multiple temporary textures are needed. For example, a blur with a large radius is often done this way:
1. Render into a texture.
2. Render into another (smaller) texture, with a shader that computes each output pixel as the average of multiple source pixels.
3. Use this smaller texture to render into another small texture, with a shader that does a proper Gaussian blur or something similar.
4. ... repeat
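The averaging step in that chain could look roughly like this on the CPU. This is only a sketch, assuming a single-channel image stored as a list of rows and a fixed 2x2 box average; downsample_2x is a hypothetical helper name, and a real implementation would do the same arithmetic in a fragment shader that renders into the smaller texture.

```python
def downsample_2x(image):
    """Halve the resolution: each output pixel averages a 2x2 source block."""
    h, w = len(image), len(image[0])
    out = []
    # Step over the source in 2x2 blocks; a trailing odd row/column is dropped.
    for y in range(0, h - 1, 2):
        row = []
        for x in range(0, w - 1, 2):
            total = (image[y][x] + image[y][x + 1] +
                     image[y + 1][x] + image[y + 1][x + 1])
            row.append(total / 4.0)
        out.append(row)
    return out


src = [[0, 4, 8, 8],
       [0, 4, 8, 8],
       [1, 1, 2, 2],
       [1, 1, 2, 2]]
small = downsample_2x(src)
```

Each output pixel depends only on source pixels, so every one of them can be computed in parallel.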
In all of the above cases, though, each output pixel should be independent of the other output pixels. It can use one or more input pixels just fine.
An example of a processing operation that does not map well is a summed area table, where each output pixel depends on an input pixel and on the value of an adjacent output pixel. Still, it is possible to do those kinds of operations on the GPU (example pdf).
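To see why a summed area table does not fit the one-output-per-fragment model directly, consider this plain sequential sketch (Python, with a hypothetical summed_area_table helper). Every entry reads output entries that were written earlier, which a single fragment shader pass cannot do.

```python
def summed_area_table(image):
    """sat[y][x] = sum of image over the rectangle from (0, 0) to (x, y)."""
    h, w = len(image), len(image[0])
    sat = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # These three reads are of *output* values computed earlier.
            left = sat[y][x - 1] if x > 0 else 0
            up = sat[y - 1][x] if y > 0 else 0
            diag = sat[y - 1][x - 1] if x > 0 and y > 0 else 0
            sat[y][x] = image[y][x] + left + up - diag
    return sat


table = summed_area_table([[1, 2],
                           [3, 4]])
```

GPU versions typically restructure the computation into several passes so that each pass again only reads from a texture written by the previous one.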

Yes, it's the normal way to do image processing. The color of the quad doesn't really matter if you'll be setting the color for every pixel. Depending on your application, you might need to be careful about pixel sampling issues (i.e. ensuring that you sample from exactly the correct pixel on the source texture, rather than halfway between two pixels).
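One way to avoid the halfway-between-pixels problem is to sample at texel centers. A minimal sketch, assuming normalized texture coordinates in [0, 1] and a texture of width x height texels (texel_center_uv is a name invented here):

```python
def texel_center_uv(x, y, width, height):
    """Normalized coordinates of the center of texel (x, y)."""
    # Texel x covers [x / width, (x + 1) / width]; its center is halfway.
    return ((x + 0.5) / width, (y + 0.5) / height)


first = texel_center_uv(0, 0, 4, 2)
last = texel_center_uv(3, 1, 4, 2)
```

With a fullscreen quad whose size matches the texture, the interpolated coordinates normally already fall on these centers. The formula matters mostly when you compute offsets to neighboring texels yourself, where one texel is 1/width or 1/height wide.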

Related

OpenGL trim/inline contour of stencil

I have created a shape in my stencil buffer (black in the picture below). Now I would like to render to the backbuffer. I would like one texture on the outer pixels (say 4 pixels) of my stencil (red), and another texture on the remaining pixels (red).
I have read several solutions that involve scaling, but that will not work when there is no obvious center of the shape.
How do I acquire the desired effect?
The stencil buffer works great for doing operations on the specific fragments being overlaid onto them. However, it's not so great for doing operations that require looking at pixels other than the one corresponding to the fragment being rendered. In order to do outlining, you have to ask about the values of neighboring pixels, which stencil operations don't allow.
So, if it is possible to put the stencil data you want to test against in a non-stencil format image (ie: a color image, maybe with an integer texture format), that would make things much simpler. You can do the effect of stencil discarding by using discard directly in the fragment shader. Since you can fetch arbitrarily from the texture (as long as you're not trying to modify it), you can fetch neighboring pixels and test their values. You can use that to identify when a fragment is near a border.
However, if you're relying on specialized stencil operations to build the stencil data itself (like bitwise operations), then that's more complicated. You will have to employ stencil texturing operations, so you're going to have to render to an FBO texture that has a depth/stencil format. And you'll have to set it up to allow you to read from the stencil aspect of the texture. This is an OpenGL 4.3 feature.
This effectively converts it into an 8-bit unsigned integer texture. That allows you to play whatever games you need to. But if you want to use stencil tests to discard fragments, you will also need texture barrier functionality to allow you to read from an image that's attached to the current FBO. But you don't need to actually use the barrier, since you should mask off stencil writing. You just need GL 4.5 or the NV/ARB_texture_barrier extension to be available, which they widely are.
Either way this happens, the biggest difficulty is going to be varying the size of the border. It is easy to test the 3x3 neighborhood of a pixel (9 fetches) to see if it is at a border. But the larger the border size, the larger the area of pixels each fragment has to test. At that point, I would suggest trying to look for a different solution, one that is based on some knowledge of what pattern is being written into the stencil buffer.
That is, if the rendering operation that lays down the stencil has some knowledge of the shape, then it could compute a distance to the edge of the shape in some way. This might require constructing the geometry in a way that it has distance information in it.
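The neighbor test described above can be modeled on the CPU as follows. This is a sketch only: it assumes the stencil data has already been copied into a plain 2D mask of 0/1 values, it uses a square neighborhood, and is_border is a hypothetical helper name. In a shader the inner loops would be texel fetches.

```python
def is_border(mask, x, y, radius=1):
    """True if (x, y) is inside the shape and within `radius` of its edge."""
    h, w = len(mask), len(mask[0])
    if not mask[y][x]:
        return False  # outside the shape: a shader would discard here
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            nx, ny = x + dx, y + dy
            # Pixels outside the image are treated as outside the shape.
            if not (0 <= nx < w and 0 <= ny < h) or not mask[ny][nx]:
                return True
    return False


# 7x7 mask with a filled 5x5 square in the middle.
mask = [[1 if 1 <= x <= 5 and 1 <= y <= 5 else 0 for x in range(7)]
        for y in range(7)]
```

The cost per fragment is (2 * radius + 1) squared fetches, which is why wide borders get expensive with this approach.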

how to retrieve z depth and color of a rendered pixel

I would like to retrieve the z height of each pixel of a rendered object in a scene.
I will need to retrieve the rendered color too.
Which OpenGL techniques should I use?
glReadPixels and CPU side code
Use glReadPixels to obtain both the RGB and depth buffers. Here are examples for both:
depth buffer got by glReadPixels is always 1
OpenGL Scale Single Pixel Line
That will read the buffers into CPU-accessible memory. This way is slow (due to synchronization) but should work on any platform.
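Note that the values read from the depth buffer are not linear distances. If the scene uses a standard perspective projection (as set up by glFrustum or gluPerspective) and the default glDepthRange of 0 to 1, a depth value can be converted back to an eye-space distance as sketched below. Those two conditions are assumptions on my part, as is the helper name depth_to_eye_z; an orthographic projection would need a different (linear) formula.

```python
def depth_to_eye_z(depth, near, far):
    """Convert a [0, 1] depth-buffer value to a positive eye-space distance.

    Assumes a standard perspective projection and glDepthRange(0, 1).
    """
    z_ndc = 2.0 * depth - 1.0  # window depth [0, 1] -> NDC z in [-1, 1]
    return 2.0 * near * far / (far + near - z_ndc * (far - near))


at_near = depth_to_eye_z(0.0, 1.0, 100.0)
at_far = depth_to_eye_z(1.0, 1.0, 100.0)
mid_buffer = depth_to_eye_z(0.5, 1.0, 100.0)
```

The result for depth 0.5 is close to 2, not 50: most of the depth buffer's precision is spent near the near plane.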
FBO render to texture and GPU shader
A faster method is to use an FBO, render to a texture, and use that output as the input texture of the next rendering pass, computing your stuff inside shaders. However, this might not run properly on Intel and might need additional tweaking of the code between nVidia and AMD.
If you have per-pixel output, use a single QUAD covering your screen as the second rendering pass.
If you have a single output for the whole screen, render a single POINT instead and compute everything in the fragment shader (scanning the whole texture inside), something like this:
How to implement 2D raycasting light effect in GLSL
The difference is that by using shaders and an FBO you are not transferring data between the GPU and the CPU, so it's way faster.
The content of the target textures can still be read by the CPU using texture-related GL functions.
compute GPU shaders
There are also compute shaders, but I have not used them yet, so I am just guessing. With them it might be possible to do your stuff in a single pass, and the form of the result and of the computation should not be as limiting.
My bet is that you are doing some post-processing similar to deferred shading, so googling such topics/tutorials might help.

How to do particle binning in OpenGL?

Currently I'm creating a particle system and I would like to transfer most of the work to the GPU using OpenGL, for gaining experience and performance reasons. At the moment, there are multiple particles scattered through the space (these are currently still created on the CPU). I would more or less like to create a histogram of them. If I understand correctly, for this I would first translate all the particles from world coordinates to screen coordinates in a vertex shader. However, now I want to do the following:
So, for each pixel a hit count of how many particles are inside. Each particle will also have several properties (e.g. a colour) and I would like to sum them for every pixel (as shown in the lower-right corner). Would this be possible using OpenGL? If so, how?
The best tool I recommend for keeping the whole data set (if it fits in GPU memory) is an SSBO.
Nevertheless, you need the data after transforming them (e.g. by a projection). An SSBO is still your best option:
In the fragment shader you read the properties of the already handled particles (let's say, of the rendered pixel) and write the modified properties (number of particles at this pixel, color, etc.) to the same index in the buffer.
Due to the parallel nature of the GPU, several shader invocations coming from different particles may be working on the same index concurrently. Thus you need to handle this on your own. Read Memory model and Atomic operations.
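As a reference for what the buffer should contain at the end, here is a sequential CPU sketch of the binning. It assumes the particles are already in screen coordinates and carry an RGB color; bin_particles and the tuple layout are invented for this example.

```python
import math


def bin_particles(particles, width, height):
    """Count particles per pixel and sum their colors per pixel."""
    counts = [[0] * width for _ in range(height)]
    sums = [[(0.0, 0.0, 0.0)] * width for _ in range(height)]
    for px, py, color in particles:
        x, y = math.floor(px), math.floor(py)
        if 0 <= x < width and 0 <= y < height:
            # On the GPU these two updates are the racy part: several
            # invocations may hit the same pixel at the same time.
            counts[y][x] += 1
            r, g, b = sums[y][x]
            sums[y][x] = (r + color[0], g + color[1], b + color[2])
    return counts, sums


particles = [(0.5, 0.5, (1, 0, 0)), (0.9, 0.1, (0, 1, 0)),
             (1.2, 0.3, (0, 0, 1)), (-0.5, 0.0, (1, 1, 1))]
counts, sums = bin_particles(particles, 2, 1)
```

The sequential loop sidesteps the concurrency problem; a shader version needs atomic operations (or some other synchronization) for the increments to give the same result.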
Another approach, though limited, is to use blending.
The idea is that each fragment increments the current color value in the framebuffer. This can be done by enabling blending with GL_FUNC_ADD for glBlendEquationSeparate and GL_ONE blend factors for both source and destination (so the two values are simply added), and by outputting a fragment color of 1/255 (normalized integer) for each RGBA component.
The limitations come from the [0-255] range: only up to 255 particles can be counted in the same pixel; anything beyond that is clamped to this range and so "lost".
You have four components (RGBA), so four properties can be handled. But you can attach several renderbuffers to an FBO.
You can read the FBO with glReadPixels. Call glReadBuffer first with a GL_COLOR_ATTACHMENTi if you use an FBO instead of the default framebuffer.
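The clamping limitation of the blending approach is easy to model. A small sketch, assuming an 8-bit normalized channel stored as an integer 0..255 and one fragment per particle (both function names are made up here):

```python
def blend_add_8bit(dst, src):
    """Additive blend on an 8-bit channel: the sum is clamped to 255."""
    return min(dst + src, 255)


def count_with_blending(num_particles):
    """Channel value after `num_particles` fragments each add 1/255."""
    value = 0
    for _ in range(num_particles):
        value = blend_add_8bit(value, 1)
    return value


few = count_with_blending(10)
many = count_with_blending(300)
```

Ten particles are counted exactly, while 300 particles read back as 255. A render target with more bits per channel would raise that limit.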

Using a buffer for selecting objects: accuracy problems

In each frame (as in frames per second) I render, I make a smaller version of it with just the objects that the user can select (and any selection-obstructing objects). In that buffer I render each object in a different color.
Given the user's mouseX and mouseY, I then look up which color is at that position in the buffer and find the corresponding object.
I can't work with FBOs, so I just render this buffer to a texture, rescale the texture orthogonally to the screen, and use glReadPixels to read a "hot area" around the mouse cursor. I know, not the most efficient, but performance is OK for now.
Now I have the problem that this buffer with "colored objects" has some accuracy problems. Of course I disable all lighting and frame shaders, but somehow I still get artifacts. Obviously I really need clean sheets of color without any variances.
Note that here I put all the color information in an unsigned byte in GL_RED (assuming for now that I have at most 255 selectable objects).
Are these caused by rescaling the texture? (I could replace this by looking up scaled coordinates in the small texture.) Or do I need to disable some other flag to really get the colors that I want?
Can this technique even be used reliably?
It looks like you're using GL_LINEAR for your GL_TEXTURE_MAG_FILTER. Use GL_NEAREST instead if you don't want interpolated colors.
"I could replace this by looking up scaled coordinates in the small texture."
You should. Rescaling is more expensive than converting the coordinates for sure.
That said, scaling a uniform texture should not introduce artifacts if you keep an integer ratio (like upscale 2x), with no fancy filtering. It looks blurry on the polygon edges, so I'm assuming that's not what you use.
Also, the rescaling should introduce variations only at the polygon boundaries. Did you check that there are no variations in the unscaled texture? That would confirm whether it's the scaling that introduces your "artifacts".
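Converting the mouse position into the small buffer's coordinates, as suggested above, might look like the sketch below. It assumes the ID buffer has been read back as a list of rows, that mouse and buffer coordinates share the same origin (with OpenGL you may have to flip y first), and that the mouse position is non-negative; screen_to_buffer and pick_id are hypothetical names.

```python
def screen_to_buffer(mouse_x, mouse_y, screen_w, screen_h, buf_w, buf_h):
    """Map a screen pixel to the texel of the smaller buffer covering it."""
    bx = min(int(mouse_x * buf_w / screen_w), buf_w - 1)
    by = min(int(mouse_y * buf_h / screen_h), buf_h - 1)
    return bx, by


def pick_id(id_buffer, mouse_x, mouse_y, screen_w, screen_h):
    """Return the object ID under the mouse, with no filtering involved."""
    buf_h, buf_w = len(id_buffer), len(id_buffer[0])
    bx, by = screen_to_buffer(mouse_x, mouse_y,
                              screen_w, screen_h, buf_w, buf_h)
    return id_buffer[by][bx]


# 2x2 ID buffer standing in for an 800x600 screen.
ids = [[0, 3],
       [7, 7]]
```

Because an exact texel is read, the result is always one of the IDs that was rendered. A linearly filtered upscale could instead return a blend such as 5 between the IDs 3 and 7, which matches no object or the wrong one.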
What exactly do you mean by "variance"? Please explain in more detail.
Now a suggestion: if your rendering doesn't depend on stencil buffer operations, you could put the object ID into the stencil buffer in the render pass to the window itself, without the detour over a separate texture. On current hardware you usually get 8 bits of stencil. Of course the best solution, if you want to use an index buffer approach, is to use multiple render targets and render the object ID into an index buffer together with the color and the other stuff in one pass. See http://www.opengl.org/registry/specs/ARB/draw_buffers.txt

Count image similarity on GPU [OpenGL/OcclusionQuery]

OpenGL. Let's say I've drawn one image and then the second one using XOR. Now I've got a black buffer with non-black pixels somewhere. I've read that I can use shaders to count the black [ rgb(0,0,0) ] pixels on the GPU.
I've also read that it has to do something with OcclusionQuery.
http://oss.sgi.com/projects/ogl-sample/registry/ARB/occlusion_query.txt
Is it possible and how? [any programming language]
If you've got another idea on how to find similarity via OpenGL/GPU, that would be great too.
I'm not sure how you do the XOR bit (at the very least it is probably slow; I don't think any current GPUs accelerate that), but here's my idea:
1. Have two input images.
2. Turn on the occlusion query.
3. Draw the two images to the screen (i.e. a fullscreen quad with both textures set up), with a GLSL fragment shader that reads both textures, computes abs(texel1-texel2), and kills the pixel (discard in GLSL) if the pixels are the same (the difference is zero or below some threshold). Very basic GLSL knowledge is enough here.
4. Get the number of pixels that passed the query. Pixels that are the same do not pass (they are discarded by the shader); pixels that are different do.
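The number the query returns in the last step can be cross-checked against a CPU reference. This sketch assumes both images have the same size and store pixels as tuples of channel values; count_different_pixels and the per-channel maximum as the difference measure are choices made for this example.

```python
def count_different_pixels(img_a, img_b, threshold=0.0):
    """Count pixels whose largest per-channel difference exceeds threshold."""
    passed = 0
    for row_a, row_b in zip(img_a, img_b):
        for pa, pb in zip(row_a, row_b):
            diff = max(abs(a - b) for a, b in zip(pa, pb))
            if diff <= threshold:
                continue  # same pixel: the shader would discard it
            passed += 1   # different pixel: counted by the query
    return passed


a = [[(0, 0, 0), (1, 1, 1)]]
b = [[(0, 0, 0), (1, 0.5, 1)]]
exact = count_different_pixels(a, b)
tolerant = count_different_pixels(a, b, threshold=0.6)
```

A count of zero means the images are identical within the threshold; larger counts mean less similarity.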
At first I thought of a more complex approach that involves the depth buffer, but then realized that just killing pixels should be enough. Here's my original thought (but the one above is simpler and more efficient):
1. Have two input images.
2. Clear the screen and the depth buffer.
3. Draw the two images to the screen (i.e. a fullscreen quad with both textures set up), with a fragment shader that computes abs(texel1-texel2) and kills the pixel (discard in GLSL) if the pixels are different. Draw the quad so that its depth buffer value is something close to the near plane.
4. After this step, the depth buffer will contain small depth values for pixels that are the same, and large (far plane) depth values for pixels that are different.
5. Turn on the occlusion query, and draw another fullscreen quad with a depth closer than the far plane, but larger than that of the previous quad.
6. Get the number of pixels that passed the query. For pixels that are the same, the query won't pass (the depth buffer is already closer), and for pixels that are different, the query will pass. You'd use SAMPLES_PASSED_ARB to get this. There's an occlusion query example at CodeSampler.com to get you started.
Of course all this requires a GPU with occlusion query support. Most GPUs since 2002 or so do support that, with the exception of some low-end ones (in particular, Intel 915 (aka GMA 900) and Intel 945 (aka GMA 950)).