How can I apply a depth test to vertices (not fragments)? - opengl

TL;DR: I'm computing a depth map in a fragment shader and then trying to use that map in a vertex shader to test whether vertices are 'in view', but the vertices don't line up with the fragment texel coordinates. The resulting imprecision causes rendering artifacts, and I'm looking for alternative ways to filter vertices based on depth.
Background: I am very loosely attempting to implement a scheme outlined in this paper (http://dash.harvard.edu/handle/1/4138746). The idea is to represent arbitrary virtual objects as lots of tangent discs. While the authors wanted to replace triangles on some graphics card of the future, I'm implementing this on conventional cards; my discs are just triangle fans ("Discs") around center points ("Points").
This is targeting WebGL.
The strategy I intend to use, similar to what's done in the paper, is:
Render the Discs in a Depth-Only pass.
In a second (or more) pass, compute what's visible based solely on which Points are "visible", i.e. their depth is <= the depth from the Depth-Only pass at that x and y.
I believe the authors of the paper used a Gaussian blur on top of the equivalent of a GL_POINTS render applied to the Points (i.e. re-using the depth buffer from the Depth-Only pass, not clearing it) to actually render their object. It's hard to say: the process is unfortunately described in a one-line comment, and I'm unsure of how to duplicate it in WebGL anyway (a naive Gaussian blur will just blur in the background pixels that weren't touched by the GL_POINTS call).
Instead, I'm hoping to do something slightly different: re-render the discs in a second pass as cones (the center of each disc becomes the apex of a cone; think "close the umbrella"), effectively computing a Voronoi diagram on the surface of the object (à la the Red Book, http://www.glprogramming.com/red/chapter14.html#name19). The idea is that an output pixel takes the color of the first disc to reach it as the radii grow from 0 to their natural size.
The crux of the problem is that only discs whose centers pass the depth test in the first pass should be allowed to carry on (as cones) to the 2nd pass. Because what's true at the disc center applies to the whole disc/cone, I believe this requires evaluating a depth test at a vertex or object level, and not at a fragment level.
Since WebGL support for accessing depth buffers is still poor, in my first pass I am packing depth info into an RGBA framebuffer in a fragment shader. I then intended to use this in the vertex shader of the second pass via a sampler2D; any disc center that was closer than the corresponding texture2D() lookup would be allowed on to the second pass; otherwise I would hack in a "discard" of the vertex (setting its alpha to 0, or setting a flag that causes the fragments associated with its disc/cone to be discarded, etc.).
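For reference, here is roughly what that packing and vertex-shader lookup might look like. This is only a sketch under my own assumptions (GLSL ES 1.0, an RGBA8 color target, vertex texture fetch support); the names uDepthMap, aCenter and the bias constant are illustrative, not taken from any real code.

    // First pass (fragment shader): pack gl_FragCoord.z into an RGBA8 target.
    precision highp float;   // where supported; mediump may lose depth precision

    vec4 packDepth(float depth) {
        const vec4 bitShift = vec4(1.0, 255.0, 255.0 * 255.0, 255.0 * 255.0 * 255.0);
        const vec4 bitMask  = vec4(1.0 / 255.0, 1.0 / 255.0, 1.0 / 255.0, 0.0);
        vec4 enc = fract(depth * bitShift);
        enc -= enc.gbaa * bitMask;
        return enc;
    }

    void main() { gl_FragColor = packDepth(gl_FragCoord.z); }

    // Second pass (vertex shader): test the disc centre against the stored depth.
    uniform sampler2D uDepthMap;   // RGBA-packed depth from the first pass
    uniform mat4 uMvp;
    attribute vec3 aCenter;        // disc centre position
    varying float vVisible;        // 1.0 = centre passed the test, 0.0 = cull later

    float unpackDepth(vec4 rgba) {
        return dot(rgba, vec4(1.0, 1.0 / 255.0, 1.0 / (255.0 * 255.0), 1.0 / (255.0 * 255.0 * 255.0)));
    }

    void main() {
        vec4 clip = uMvp * vec4(aCenter, 1.0);
        vec3 ndc  = clip.xyz / clip.w;                    // [-1, 1]
        float stored = unpackDepth(texture2D(uDepthMap, ndc.xy * 0.5 + 0.5));
        float own    = ndc.z * 0.5 + 0.5;                 // same range as gl_FragCoord.z
        vVisible = (own <= stored + 0.0001) ? 1.0 : 0.0;  // the bias is a guess/tuning knob
        gl_Position = clip;
    }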
That approach actually kind of worked, but it caused horrendous z-fighting between discs that were close together (very small perturbations wildly changed which discs were visible). I believe there is some floating-point error in the depth -> RGBA -> depth round trip. More importantly, though, the depth texture is written at fragment texel coordinates, but I'm looking up vertices, which almost certainly don't land exactly on the relevant texel centers; so I get depth +/- noise, essentially, and the noise is the issue. Adding or subtracting .000001 or so isn't sufficient: you just trade Type I errors for Type II errors. My render became more accurate when I switched from NEAREST to LINEAR for the depth texture interpolation, but it still wasn't good enough.
How else can I determine which disc's centers would be visible in a given render, so that I can do a second vertex/fragment (or more) pass focused on objects associated with those points? Or: is there a better way to go about this in general?

Related

Gaussian blur on shadow map (FBO/texture)

I have followed the tutorial presented here to create shadow maps.
What I would like to do is have some sort of post-processing step on the shadow, where I apply a Gaussian blur (or any blur) to the shadow map. Note that I have followed this tutorial closely, and that I am very inexperienced when it comes to OpenGL. I don't know if the blur should be applied to the depthMapFBO or to the depthMap itself, or if I need to create new FBOs/textures.
Can one even blur the depth value in this way? How would you go about blurring the shadow map?
Note that I'm not interested in realism, I just want a uniform blurring on all shadows.
Blurring a shadow depth texture makes no sense.
A depth texture contains depth values. A value of, for example, 0.5 means something. It specifies the depth of a texel. For a shadow map, it means something even more specific: it specifies the closest distance to the light of an occluding surface at a particular location in the scene.
If the closest distance at one location is 0.5, what would it mean to "blur" this with a distance of 0.6? It would effectively mean that you have changed the distance of objects to the light. But since that won't be reflected in the actual geometry, this means that the blurred shadow map no longer accurately represents the geometry. So now, there will be locations that should be shadowed which are not, and locations that are being shadowed but shouldn't be.
In short, it makes your depth texture meaningless.
What you seem to want is softer shadows. There are ways to accomplish that with shadow maps, but blurring the depth texture is not one of them.
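For what it's worth, one common way to get softer shadows is percentage-closer filtering (PCF): blur the result of the depth comparison, not the depth values themselves. A minimal sketch (GLSL 3.30-style; shadowMap, fragPosLightSpace and bias are illustrative names, not from the tutorial):

    float shadowPCF(sampler2D shadowMap, vec4 fragPosLightSpace, float bias)
    {
        vec3 proj = fragPosLightSpace.xyz / fragPosLightSpace.w;
        proj = proj * 0.5 + 0.5;                               // NDC -> [0, 1]
        vec2 texelSize = 1.0 / vec2(textureSize(shadowMap, 0));
        float shadow = 0.0;
        for (int x = -1; x <= 1; ++x)
            for (int y = -1; y <= 1; ++y)
            {
                // Compare against a neighbouring depth sample, then average the comparison results.
                float closest = texture(shadowMap, proj.xy + vec2(x, y) * texelSize).r;
                shadow += (proj.z - bias > closest) ? 1.0 : 0.0;
            }
        return shadow / 9.0;                                    // 0 = fully lit, 1 = fully shadowed
    }

The depth texture stays untouched; only the binary shadowed/lit decision gets averaged over a small neighbourhood, which gives a uniform softening of the shadow edges.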

(Modern) OpenGL Different Colored Faces on a Cube - Using Shaders

A cube with different colored faces in immediate mode is very simple. But doing the same thing with shaders seems to be quite a challenge.
I have read that in order to create a cube with different colored faces, I should create 24 vertices instead of 8 vertices for the cube - in other words, I visualize this as 6 squares that don't quite touch.
Is perhaps another (better?) solution to texture the faces of the cube using a really simple texture of a flat color - perhaps a 1x1 pixel texture?
My texturing idea seems simpler to me - from a coder's point of view... but which method would be the most efficient from a GPU/graphics card perspective?
I'm not sure what your overall goal is (e.g. what you're learning to do in the long term), but generally for high-performance applications (e.g. games) your goal is to reduce GPU load. Every time you switch certain states (e.g. change textures, render targets, shader uniform values, etc.) the GPU stalls reconfiguring itself to meet your demands.
So, you can pass in a 1x1 pixel texture for each face, but then you'd need six draw calls (usually not so bad, but there is some prep work and potential cache misses) and six texture binds (can be very bad, often as bad as changing shader uniform values).
Suppose you wanted to pass in one texture and use that as a texture map for the cube. This is a little less trivial than it sounds -- you need to lay out each cube face on the texture in a way that maps to the vertices. Often you need to pass in a texture coordinate for each vertex, and due to the spatial configuration of the texture this normally doesn't end up meaning one texture coordinate per spatial vertex.
However, if you use an environmental/reflection map, the complexities of mapping are handled for you. In this way, you could draw a single texture on all sides of your cube. (Or on your sphere, or whatever sphere-mapped shape you wanted.) I'm not sure I'd call this easier, since you have to form the environmental texture carefully, and you still have to set a different texture for each new set of colors you want to represent -- or change the texture either via the GPU or in step with the GPU, and that's tricky and usually not performant.
Which brings us back to the canonical approach you mentioned: use per-vertex color values -- they're fast, you can draw many, many cubes very quickly by only specifying different vertex data, and it's easy to understand. It really is the best way, and how GPUs are designed to run quickly.
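To make that concrete, here is a hedged sketch of the per-vertex-color approach (GLSL 3.30; the attribute locations and uMvp uniform are my assumptions). Each of the 24 vertices carries the color of its face, and the shaders simply pass it through:

    // Vertex shader
    #version 330 core
    layout(location = 0) in vec3 aPos;
    layout(location = 1) in vec3 aColor;   // duplicated per vertex, constant across a face
    uniform mat4 uMvp;
    out vec3 vColor;
    void main() {
        vColor = aColor;
        gl_Position = uMvp * vec4(aPos, 1.0);
    }

    // Fragment shader
    #version 330 core
    in vec3 vColor;
    out vec4 fragColor;
    void main() { fragColor = vec4(vColor, 1.0); }

Because the color is constant across each face, the interpolation is a no-op, and the whole cube renders in a single draw call with no state changes between faces.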
Additionally:
Yes, you can do this with just shaders... but it'd be ugly and slow, and the GPU would end up computing it per pixel. Pass the object-space coordinates to the fragment shader, and in the fragment shader test which side you're on and output the corresponding color. Not recommended: it's not particularly easier, and it's definitely not faster for the GPU -- to change colors you'd again end up changing uniform values for the shaders.
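For completeness, a sketch of that shader-only approach (my own example, assuming a cube centred on the origin; the six colors are arbitrary and hard-coded here, where in practice they would probably be uniforms):

    #version 330 core
    in vec3 vObjPos;          // object-space position passed from the vertex shader
    out vec4 fragColor;
    void main() {
        vec3 a = abs(vObjPos);
        vec3 color;
        if (a.x >= a.y && a.x >= a.z)          // +/- X faces
            color = (vObjPos.x > 0.0) ? vec3(1.0, 0.0, 0.0) : vec3(0.0, 1.0, 0.0);
        else if (a.y >= a.z)                   // +/- Y faces
            color = (vObjPos.y > 0.0) ? vec3(0.0, 0.0, 1.0) : vec3(1.0, 1.0, 0.0);
        else                                   // +/- Z faces
            color = (vObjPos.z > 0.0) ? vec3(1.0, 0.0, 1.0) : vec3(0.0, 1.0, 1.0);
        fragColor = vec4(color, 1.0);
    }

Every fragment pays for the branching that a color attribute would have provided for free, which is exactly why the per-vertex route above is preferred.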

How can I deterministically detect the shader fragment location in its 2x2 pixel quad?

I've been trying to utilize the techniques in Eric Penner's "Shader Amortization using Pixel Quad Message Passing" from GPU Pro 2, Chapter VI.2. The basic idea is that modern GPUs process fragment shaders in 2x2 fragment quads, and you can use ddx() and ddy() to get the value of some_var at all four fragments as long as the following hold:
Your GPU supports high-quality derivatives
You know which fragment you're processing (top-left, top-right, bottom-left, bottom-right)
This opens up a lot of opportunities for fragment shader optimization (like distributing texture fetches over a 2x2 pixel quad) that you'd need Compute Shaders to beat.
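A rough illustration of the mechanism (my own sketch, not Penner's code): within a 2x2 quad, dFdx(v) is the difference between the right and left fragments' values, so a fragment that knows which corner it occupies can reconstruct its neighbour's value.

    // quadPos is assumed to be 0.0 or 1.0 per axis, identifying the fragment's
    // corner within its 2x2 quad -- which is exactly the hard part of this question.
    float neighbourAcrossX(float v, vec2 quadPos) {
        return v + mix(1.0, -1.0, quadPos.x) * dFdx(v);   // left adds the delta, right subtracts it
    }
    float neighbourAcrossY(float v, vec2 quadPos) {
        return v + mix(1.0, -1.0, quadPos.y) * dFdy(v);
    }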
My problem is this:
I can't deterministically detect which fragment I'm processing. Ideally, each fragment block would start at even-numbered output pixel coords like (0, 0), (2, 0), ... (1024, 1024), ..., so you'd just need to check whether the output pixel x and y coords are even or odd to know which fragment you're currently processing. The method Penner uses in the book assumes this works...but it seems to be going wrong for me.
Unfortunately, my 2x2 fragment quads appear to be starting in nondeterministic places: I've seen them start at (even, even), (even, odd), and (odd, even). I can't remember if I've seen (odd, odd) or not, but anyway, the arrangement seems to depend on a myriad of factors I don't understand, including the output resolution and shader specifics. (I'm testing on an 8800 GTS, in case anyone's wondering.)
Does anyone know what might be causing this nondeterminism or have any documentation on it? I understand there's virtually no official standardization in this area, but I'm more interested in how things work in practice on modern desktop-level GPUs, and I'm hoping there's a way to get this technique to work. If no one knows how to reason about the even/odd start behavior, does anyone know any other way of determining the current fragment's relative location in its 2x2 quad?
Thanks :)
As it turns out, the premise of my question was mostly wrong:
The 2x2 fragment quads DO almost always start on even pixel numbers...as long as the output resolution is even-numbered.
If the output resolution is odd-numbered (a possibility with the underlying program I'm working with), things can get more complicated, for obvious reasons. I don't expect there's any uniformity here across drivers/GPU's/etc. either, but my current tests (which themselves may still be buggy) appear to demonstrate 2x2 pixel quads starting at an odd pixel along the dimension with odd resolution, at least when the odd dimension is horizontal.
All of this weirdness helped obscure my bigger issue: The code I used to detect the fragment's location in the pixel quad was buggy. I tested by setting the texture coordinates equal within a pixel quad (set to the pixel quad center)...or so I thought. However, I calculated the screen coordinates based on a full-screen quad where the uv mapping has the +v axis pointing downward. The screenspace origin starts at the bottom-left, because it's based on the top-right quadrant of Cartesian coordinates, and I accidentally forgot to invert the v-coordinate of the uv offset I used to find the pixel quad center. Many of my nondeterministic observations came from failing to check my assumptions while debugging and misinterpreting things as a result, particularly in combination with odd resolutions.
This was an embarrassing mistake I should have caught a lot sooner, but I figured I'd detail it as a warning to others to always double-check the direction of your vertical axis when you're dealing with opposite-facing coordinate frames. ;)
UPDATE:
I ran across a situation where 2x2 pixel quads started on even pixel numbers even when the resolution was odd. Thanks to the nondeterminism under odd resolutions, I had to work out another solution (a rough code sketch follows the steps below):
If you're deriving your screen pixel numbers from the uv coords of a fullscreen quad (for post-processing), the fragment location derived from this is only useful for arranging/placing shared samples between fragments, etc., not for the quad-pixel communication itself. You'll need to have screen pixel numbers with respect to the screenspace origin for that. You can derive these from vertex positions, or you can use ddx().x and ddy().y on the uv-based pixel numbers to find out their screen direction and mirror the fragment position in the appropriate direction from there.
Calculate the fragment location based on your screen pixel numbers (with respect to the true screenspace origin) and the assumption 2x2 pixel quads start on even pixels. (If you used uv-based pixel numbers, now is the time to mirror things.)
Do a ddx().x and ddy().y on the fragment location, and if they're negative in either direction, you know the pixel quad starts at an odd pixel number in that direction...so mirror in that direction.
If you calculate two fragment positions, one based on a uv origin and one based on a screen origin, use the uv-based one for reasoning about uv-based sample placement, and use the screen-based one for actually obtaining the values of a variable at neighboring fragments.
Profit.
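A rough GLSL sketch of those steps (my own reconstruction, hedged; it assumes standard derivatives and that screenPixel is the fragment's pixel coordinate relative to the true screenspace origin, e.g. floor(gl_FragCoord.xy)):

    vec2 quadCorner(vec2 screenPixel)
    {
        // First guess: quads start on even pixels, so parity gives the corner.
        vec2 corner = mod(screenPixel, 2.0);
        // Inside a quad that really starts on an even pixel, dFdx/dFdy of the guess
        // come out +1; a negative value means the quad starts on an odd pixel in
        // that direction, so mirror the guess.
        if (dFdx(corner.x) < 0.0) corner.x = 1.0 - corner.x;
        if (dFdy(corner.y) < 0.0) corner.y = 1.0 - corner.y;
        return corner;   // (0,0), (1,0), (0,1) or (1,1) within the 2x2 quad
    }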
I'll post a link to my working MIT-licensed code once I release it on Github, along with usage examples (the speedup is unfortunately not what I expected, but whatever ;)). I'm just waiting to get done with a larger shader I'll be uploading along with it.

How to do bilinear interpolation of normals over a quad?

I'm working on a Minecraft-like engine as a hobby project to see how far the concept of voxel terrains can be pushed on modern hardware and OpenGL >= 3. So, all my geometry consists of quads, or squares to be precise.
I've built a raycaster to estimate ambient occlusion, and use the technique of "bent normals" to do the lighting. So my normals aren't perpendicular to the quad, nor do they have unit length; rather, they point roughly towards the space where least occlusion is happening, and are shorter when the quad receives less light. The advantage of this technique is that it just requires a one-time calculation of the occlusion, and is essentially free at render time.
However, I run into trouble when I try to assign different normals to different vertices of the same quad in order to get smooth lighting. Because the quad is split up into triangles, and linear interpolation happens over each triangle, the result of the interpolation clearly shows the presence of the triangles as ugly diagonal artifacts:
The problem is that OpenGL uses barycentric interpolation over each triangle, which is a weighted sum over 3 out of the 4 corners. Ideally, I'd like to use bilinear interpolation, where all 4 corners are being used in computing the result.
I can think of some workarounds:
Stuff the normals into a 2x2 RGB texture, and let the texture processor do the bilinear interpolation. This happens at the cost of a texture lookup in the fragment shader. I'd also need to pack all these mini-textures into larger ones for efficiency.
Use vertex attributes to attach all 4 normals to each vertex. Also attach some [0..1] coefficients to each vertex, much like texture coordinates, and do the bilinear interpolation in the fragment shader. This happens at the cost of passing 4 normals to the shader instead of just 1.
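A sketch of that second workaround (hedged; the variable names are mine): every vertex of the quad carries all four corner normals plus its own (u, v) coefficients, and the fragment shader does the bilinear mix itself.

    // Fragment shader side (the vertex shader just passes these through):
    in vec2 vQuadUV;                   // (0,0), (1,0), (0,1), (1,1) at the four corners
    in vec3 vN00, vN10, vN01, vN11;    // the four corner normals, identical on every vertex

    vec3 bilinearNormal()
    {
        vec3 bottom = mix(vN00, vN10, vQuadUV.x);
        vec3 top    = mix(vN01, vN11, vQuadUV.x);
        return mix(bottom, top, vQuadUV.y);   // deliberately not renormalised: bent normals carry length
    }

Since the quads are squares, the per-triangle interpolation of vQuadUV itself is exact, so the fragment-side mix reproduces true bilinear interpolation regardless of how the quad is triangulated.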
I think both these techniques can be made to work, but they strike me as kludges for something that should be much simpler. Maybe I could transform the normals somehow, so that OpenGL's interpolation would give a result that does not depend on the particular triangulation used.
(Note that the problem is not specific to normals; it is equally applicable to colours or any other value that needs to be smoothly interpolated across a quad.)
Any ideas how else to approach this problem? If not, which of the two techniques above would be best?
As you clearly understand, the triangle interpolation that GL will do is not what you want, so the normal data can't come directly from the vertex data.
I'm afraid the solutions you're envisioning are about the best you can achieve. And no matter which you pick, you'll need to pass [0..1] coefficients from the vertex shader down to the fragment shader (including with the 2x2-texture approach: you need them as texture coordinates).
There are some tricks you can do to somewhat simplify the process, though.
Using the vertex ID can help you out with finding which vertex "corner" to pass from vertex to fragment shader (our [0..1] values). A simple bit test on the lowest 2 bits can let you know which corner to pass down, without actual vertex data input. If packing texture data, you still need to pass an identifier inside the texture, so this may be moot.
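For example (a sketch, assuming the quad's vertices are emitted in a fixed corner order so that the two low bits of gl_VertexID identify the corner):

    // Vertex shader (GLSL >= 1.30): derive the [0..1] corner coefficients
    // from the vertex ID instead of an extra attribute.
    int corner = gl_VertexID & 3;                 // 0..3 within the quad
    vec2 quadUV = vec2(float(corner & 1),         // 0 or 1 -> u
                       float((corner >> 1) & 1)); // 0 or 1 -> v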
If you use 2x2 textures to allow the interpolation, there are (were?) some gotchas. Some texture interpolators don't necessarily interpolate at high precision if the source data is low precision to begin with. This may require you to change the texture data type to something of higher precision to avoid banding artifacts.
Well... since you're using the bent-normals technique, the best way to improve the result is to pre-tessellate the mesh and recompute the occlusion on the more highly tessellated mesh.
Another way would be some tricks within the pixel shader... one possibility is to interpolate the values yourself in the pixel shader (rather than relying on the built-in interpolator), which could help you a lot. And you're not limited to bilinear interpolation; you could do better, e.g. bicubic interpolation ;)

Count image similarity on GPU [OpenGL/OcclusionQuery]

OpenGL. Let's say I've drawn one image and then a second one using XOR. Now I've got a black buffer with non-black pixels somewhere. I've read that I can use shaders to count the black [rgb(0,0,0)] pixels on the GPU?
I've also read that it has something to do with occlusion queries.
http://oss.sgi.com/projects/ogl-sample/registry/ARB/occlusion_query.txt
Is it possible and how? [any programming language]
If you've got another idea on how to measure image similarity via OpenGL/the GPU, that would be great too.
I'm not sure how you do the XOR bit (it should at least be slow; I don't think any current GPU accelerates that), but here's my idea:
have two input images
turn on occlusion query.
draw the two images to the screen (i.e. a full-screen quad with both textures set up), with a fragment shader that computes abs(texel1 - texel2) and kills the pixel (discard in GLSL) if the pixels are the same (the difference is zero or below some threshold). Very basic GLSL knowledge is enough here; a sketch follows this list.
get the number of pixels that passed the query. For pixels that are the same, the query won't pass (those pixels are discarded by the shader), and for pixels that are different, the query will pass.
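A minimal GLSL sketch of that fragment shader (the uniform names and the threshold are illustrative, not from any particular codebase):

    uniform sampler2D uImageA;
    uniform sampler2D uImageB;
    uniform float uThreshold;        // 0.0 for an exact match, a small value for tolerance
    varying vec2 vUV;                // from the full-screen quad

    void main() {
        vec3 diff = abs(texture2D(uImageA, vUV).rgb - texture2D(uImageB, vUV).rgb);
        if (max(diff.r, max(diff.g, diff.b)) <= uThreshold)
            discard;                 // identical pixels never reach the occlusion query
        gl_FragColor = vec4(1.0);    // the color doesn't matter; only the sample count does
    }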
At first I thought of a more complex approach that involves the depth buffer, but then realized that just killing pixels should be enough. Here's my original thought (the approach above is simpler and more efficient):
have two input images
clear screen and depth buffer
draw the two images to the screen (i.e. a full-screen quad with both textures set up), with a fragment shader that computes abs(texel1 - texel2) and kills the pixel (discard in GLSL) if the pixels are different. Draw the quad so that its depth buffer value is close to the near plane.
after this step, depth buffer will contain small depth values for pixels that are the same, and large (far plane) depth values for pixels that are different.
turn on the occlusion query, and draw another full-screen quad at a depth closer than the far plane but farther than the previous quad.
get the number of pixels that passed the query. For pixels that are the same, the query won't pass (the depth buffer is already closer), and for pixels that are different, the query will pass. You'd use SAMPLES_PASSED_ARB to get this. There's an occlusion query example at CodeSampler.com to get you started.
Of course all this requires a GPU with occlusion query support. Most GPUs since 2002 or so do support that, with the exception of some low-end ones (in particular, Intel 915 (aka GMA 900) and Intel 945 (aka GMA 950)).