I'm wondering whether drawing a triangle that is partially outside the frustum takes longer than if I calculated where the frustum bounds the triangle and drew one (probably two) new triangles instead of the bigger one, resulting in the same pixels being changed.
So the question is: are fragment shaders run for positions that don't even exist on your screen? Or is the rasterization phase optimized for this?
Modern GL hardware is really, really good at clipping triangles against the viewport.
The fragment shader will only run on pixels that survive the viewport clipping.
Don't try to do frustum culling at the triangle level, do it at the object/tile level. Your CPU and vertex shader units will thank you :)
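For illustration, here is a minimal sketch of that object-level test, assuming you have already extracted the six frustum planes from your view-projection matrix and keep a bounding sphere per object (the names are illustrative, not from any particular API):

typedef struct { float a, b, c, d; } Plane;   /* plane equation ax + by + cz + d = 0, normal normalized */

/* Returns 0 if the object's bounding sphere is entirely behind any frustum plane. */
int sphere_in_frustum(const Plane planes[6], float cx, float cy, float cz, float radius)
{
    for (int i = 0; i < 6; ++i) {
        float dist = planes[i].a * cx + planes[i].b * cy + planes[i].c * cz + planes[i].d;
        if (dist < -radius)
            return 0;   /* completely outside this plane: skip the draw call */
    }
    return 1;           /* inside or intersecting: submit it and let the GPU clip */
}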
This is entirely a driver thing. Any triangle that goes outside of the range is automatically handled by your driver, in theory by subdividing the triangle into two smaller ones; I doubt you will be able to do it faster than your GPU.
There are also techniques to handle triangles that extend outside the drawing range without subdivision, though: the hardware can define a zone that doesn't need subdivision (a sort of border around the viewport); if a point happens to lie outside that zone, special processing is issued, but in general the unnecessary fragment computations are simply skipped.
TL;DR: trust your GPU on that.
Short answer:
No, the fragment shader won't run for pixels outside the frustum.
But what you should really worry about is the vertex shader. It will run even if the whole triangle is outside the frustum because the GPU can't predict (not that I know of a way) if the triangle will end up on the screen.
I have four arbitrary points (lt,rt,rb,lb) in 3d space and I would like these points to define my near clipping plane (lt stands for left-top, rt for right-top and so on).
Unfortunately, these points are not necessarily a rectangle (in screen space). They are, however, a rectangle in world coordinates.
The context is that I want a mirror surface, rendering the mirrored world into a texture. The mirror is an arbitrarily translated and rotated rectangle in 3D space.
I do not want to change the texture coordinates on the vertices, because that would lead to ugly pixelation when you look at the mirror from the side, for example. If I did that, culling would also not work correctly, which would have a huge performance impact in my case (small mirror, huge world).
I also cannot work with the stencil buffer, because in some scenarios I have mirrors facing each other, which would also cause a huge performance drop. Furthermore, I would like to keep my rendering pipeline simple.
Can anyone tell me how to compute the corresponding projection matrix?
Edit: Of course I have already moved my camera accordingly; that is not the problem here.
Instead of tweaking the projection matrix (which I don't think can be done in the general case), you should define an additional clipping plane. You do that by enabling:
glEnable(GL_CLIP_DISTANCE0);
And then set the gl_ClipDistance vertex shader output to the distance of the vertex from the mirror plane:
gl_ClipDistance[0] = dot(vec4(vertex_position, 1.0), mirror_plane);
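Put together, a minimal vertex shader might look like the sketch below. The uniform names (view_proj, mirror_plane) are placeholders, and vertex_position is assumed to already be in the same space as mirror_plane (world space here):

#version 330 core
layout(location = 0) in vec3 vertex_position;   // assumed to already be in world space

uniform mat4 view_proj;      // mirrored camera's view * projection
uniform vec4 mirror_plane;   // plane equation (nx, ny, nz, d), same space as vertex_position

void main()
{
    gl_Position = view_proj * vec4(vertex_position, 1.0);

    // Negative distances (geometry behind the mirror) get clipped away
    // because GL_CLIP_DISTANCE0 is enabled on the application side.
    gl_ClipDistance[0] = dot(vec4(vertex_position, 1.0), mirror_plane);
}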
I'm trying to wrap my head around the GPU pipeline and the performance implications...
I create a coordinate system and put a million vertices in it; all of them are now in memory usable by the GPU. I assume the performance hit at this step is moving all the floating-point values into GPU memory, given that the points were already created.
Then I transform my million points' coordinates into clip coordinates. Here I'm applying a transformation to each point.
As a result of this transformation some points are now outside the clip volume; let's say only a thousand points are still inside. Does the vertex shader run on the thousand or on all million points? What about the fragment shader? And the assembly of triangles? Does the transformation into the final device coordinates only process the thousand points?
My guess is that the vertex shader runs on all of them, but the fragment shader only on the interpolation of the visible vertices.
Is the only possible optimization then to include as few vertices as possible in the first place? If I'm looking at a full 3D world with buildings, trees, roads... and then zoom in on just one rock, I'm running all the shaders on all the objects anyway... so the only solution would be to not submit those trees and buildings in the first place? Or can I keep this world in GPU memory but only compute the rock? Could I apply the coordinate transformation to just the rock somehow? Where in the pipeline do techniques like GPU culling, level of detail, or dynamic tessellation take place?
Vertex shaders are executed for each vertex submitted with the glDrawArrays and glDrawElements function families, perhaps even multiple times per vertex. The transformed vertices are then assembled into primitives and clipped; if a primitive is entirely outside the viewport, its processing stops there. To reduce the overhead of processing vertices of objects outside the viewport, multiple techniques are employed. The simplest one is "frustum culling": submit the object for rendering only if its bounding box intersects the camera frustum.
Fragment shaders are executed for each fragment ("pixel") in the framebuffer that passes the depth test. One way to reduce their count is to render front to back—so that only the front-visible fragments are ever calculated.
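As a sketch of that second point, opaque objects can simply be sorted by distance to the camera before the draw calls are issued; the Object struct and camera position here are illustrative assumptions:

#include <stdlib.h>

typedef struct { float x, y, z; /* plus mesh handle, material, ... */ } Object;

static float cam_x, cam_y, cam_z;   /* camera position, updated per frame */

static int by_distance(const void *pa, const void *pb)
{
    const Object *a = pa, *b = pb;
    float da = (a->x - cam_x)*(a->x - cam_x) + (a->y - cam_y)*(a->y - cam_y) + (a->z - cam_z)*(a->z - cam_z);
    float db = (b->x - cam_x)*(b->x - cam_x) + (b->y - cam_y)*(b->y - cam_y) + (b->z - cam_z)*(b->z - cam_z);
    return (da > db) - (da < db);   /* nearest first, so early depth testing rejects hidden fragments */
}

void draw_opaque(Object *objects, size_t count)
{
    qsort(objects, count, sizeof(Object), by_distance);
    /* then: frustum-cull each object and issue its draw call in this order */
}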
To draw a sphere, one does not need to know anything other than its position and radius. Thus, rendering a sphere by passing a triangle mesh sounds very inefficient unless you need per-vertex colors or other such features. Despite googling, searching the D3D11 documentation, and reading Introduction to 3D Programming with DirectX 11, I have failed to understand:
Is it possible to draw a sphere by passing only the position and radius of it to the GPU?
If not, what is the main principle I have misunderstood?
If yes, how to do it?
My ultimate goal is to pass more parameters later on which will be used by a shader effect.
You will need to implement a geometry shader. This shader should take the sphere center and radius as input and emit a bunch of vertices for rasterization. In general this approach is called point sprites.
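As a rough GLSL sketch of that idea (the HLSL geometry shader is structurally the same), each input point is expanded into a view-aligned quad sized by the sphere's radius; a fragment shader would then shade or ray-cast the sphere inside that quad. The uniform and varying names are illustrative:

#version 330 core
layout(points) in;
layout(triangle_strip, max_vertices = 4) out;

in float v_radius[];          // per-sphere radius, passed through by the vertex shader
uniform mat4 view;            // world -> view
uniform mat4 proj;            // view  -> clip
out vec2 uv;                  // quad-local coordinates for the fragment shader

void main()
{
    vec4 center = view * gl_in[0].gl_Position;   // sphere center in view space
    float r = v_radius[0];
    vec2 corners[4] = vec2[](vec2(-1,-1), vec2(1,-1), vec2(-1,1), vec2(1,1));
    for (int i = 0; i < 4; ++i) {
        uv = corners[i];
        gl_Position = proj * (center + vec4(corners[i] * r, 0.0, 0.0));
        EmitVertex();
    }
    EndPrimitive();
}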
One option would be to use tessellation.
https://en.wikipedia.org/wiki/Tessellation_(computer_graphics)
Most of the mesh will be generated on the GPU side.
Note:
In the end you still have more data going through the shaders, because the sphere will be split into triangles that are each rendered individually on screen.
But the split is done on the GPU side.
While you can create a sphere from a point and radius on the GPU, it's generally not very efficient. With higher-end GPUs you could use Hardware Tessellation, but even that would be better done a different way.
The better solution is to use instancing and render many instances of the same VB/IB of sphere geometry, scaled and translated to different positions and sizes.
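An OpenGL-flavored sketch of that approach (buffer handles and attribute locations are illustrative, and the unit-sphere mesh, VAO, and instance buffer are assumed to have been created already):

glBindVertexArray(sphere_vao);              /* VAO with the unit-sphere VB/IB already attached */

/* Per-instance data: xyz = sphere center, w = radius. */
glBindBuffer(GL_ARRAY_BUFFER, instance_vbo);
glBufferData(GL_ARRAY_BUFFER, instance_count * 4 * sizeof(float), instance_data, GL_STATIC_DRAW);
glEnableVertexAttribArray(3);
glVertexAttribPointer(3, 4, GL_FLOAT, GL_FALSE, 0, (void *)0);
glVertexAttribDivisor(3, 1);                /* advance this attribute once per instance, not per vertex */

/* One draw call for all spheres; the vertex shader scales and offsets
   the unit-sphere mesh by the per-instance attribute. */
glDrawElementsInstanced(GL_TRIANGLES, sphere_index_count, GL_UNSIGNED_INT, 0, instance_count);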
I've noticed that in my program when I look away from my tessellated mesh, the frame time goes way down, suggesting that no tessellation is happening on meshes that aren't on screen. (I have no custom culling code)
But for my purposes I need access in the geometry shader to the tessellated vertices whether they are on screen or not.
Are these tessellated triangles making it to the geometry shader stage? Or are they being culled before they even reach the tessellation evaluation stage?
"when I look away from my tessellated mesh, the frame time goes way down, suggesting that no tessellation is happening on meshes that aren't on screen."
Well, there's your problem: the belief that faster rendering means a lack of tessellation.
There are many reasons why off-screen polygons will be faster than on-screen ones. For example, you could be partially or entirely fragment-shader and/or pixel-bound in terms of speed. Thus, once those fragments are culled by being entirely off-screen, your rendering time goes down.
In short, everything's fine. Neither OpenGL nor D3D allows the tessellation stage to discard geometry arbitrarily. Remember: you don't have to tessellate in clip-space, so there's no way for the system to even know if a triangle is "off-screen". Your GS is allowed to potentially bring off-screen triangles on-screen. So there's not enough information at that point to decide what is and is not off-screen.
In all likelihood, what's happening is that your renderer's performance was bound to the rasterizer/fragment shader, rather than to vertex processing. So even though the vertices are still being processed, the expensive per-fragment operations aren't being done because the triangles are off-screen. Thus, a performance improvement.
http://www.opengl.org/wiki/Rendering_Pipeline_Overview says that "primitives that lie on the boundary between the inside of the viewing volume and the outside are split into several primitives" after the geometry shader is run and before fragments are rasterized. Everything else I've ever read about OpenGL has also described the clipping process the same way.
However, by setting gl_FragDepth in the fragment shader to values that are closer to the camera than the actual depth of the point on the triangle that generated it (so that the fragment passes the depth test when it would have failed if I were copying fixed-pipeline functionality), I'm finding that fragments are being generated for the entire original triangle even if it partially overlaps the far viewing plane. On the other hand, if all of the vertices are behind the plane, the whole triangle is clipped and no fragments are sent to the fragment shader (I suppose more technically you would say it is culled, not clipped).
What is going on here? Does my geometry shader replace some default functionality? Are there flags/hints I need to set or additional built-in variables that I need to write to in order for the next step of the rendering pipeline to know how to do partial clipping?
I'm using GLSL version 1.2 with the GL_EXT_geometry_shader4 extension on an NVIDIA GeForce 9400M.
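For reference, the kind of fragment shader being described is something like this sketch (the constant offset is just for illustration):

#version 120
void main()
{
    gl_FragColor = vec4(1.0);
    // Pull the written depth closer to the camera than the rasterized depth,
    // so fragments pass the depth test where the fixed pipeline would fail them.
    gl_FragDepth = gl_FragCoord.z - 0.1;
}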
That sounds like a driver bug. If you can see results for fragments that should have been outside the viewing region (ie: if turning off your depth writes causes the fragments to disappear entirely), then that's against the spec's behavior.
Granted, it's such a corner case that I doubt anyone's going to do anything about it.
Most graphics hardware tries as hard as possible to avoid actually clipping triangles. Clipping triangles means potentially generating 3+ triangles from a single triangle. That tends to choke the pipeline (pre-tessellation at any rate). Therefore, unless the triangle is trivially rejectable (ie: outside the clip box) or incredibly large, modern GPUs just ignore it. They let the fragment culling hardware take care of it.
In this case, because your fragment shader is a depth-writing shader, it believes that it can't reject those fragments until your fragment shader has finished.
Note: I realized that if you turn on depth clamping, that turns off near and far clipping entirely. Which may be what you want. Depth values written from the fragment shader are clamped to the current glDepthRange.
Depth clamping is an OpenGL 3.2 feature, but NVIDIA has supported it for near on a decade with NV_depth_clamp. And if your drivers are recent, you should be able to use ARB_depth_clamp even if you don't get a 3.2 compatibility context.
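Enabling it is a one-liner on the application side (assuming the driver exposes it, either as core 3.2 or via the extensions above):

glEnable(GL_DEPTH_CLAMP);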
If I understood you correctly, you are wondering why your triangles aren't clipped against the far plane.
AFAIK OpenGL just clips against the four border planes after primitive assembly. The far and near clipping gets done (by spec, AFAIK) after the fragment shader; i.e. when you zoom in extremely close and polygons collide with the near plane, they get rendered up to that point and don't pop away as a whole.
And I don't think the spec mentions splitting primitives at all (even though the hardware might do that in screen space, ignoring gl_FragDepth); it just mentions skipping primitives as a whole (in the case that no vertex lies in the view frustum).
Also relying on a wiki for word-exact rules is always a bad idea.
PS: http://fgiesen.wordpress.com/2011/07/05/a-trip-through-the-graphics-pipeline-2011-part-5/ explains the actual border and near & far clipping very well.