I've noticed that in my program when I look away from my tessellated mesh, the frame time goes way down, suggesting that no tessellation is happening on meshes that aren't on screen. (I have no custom culling code)
But for my purposes I need access in the geometry shader to the tessellated vertices whether they are on screen or not.
Are these tessellated triangles making it to the geometry shader stage? Or are they being culled before they even reach the tessellation evaluation stage?
when I look away from my tessellated mesh, the frame time goes way down, suggesting that no tessellation is happening on meshes that aren't on screen.
Well, there's your problem: the belief that a faster frame time means no tessellation is happening.
There are many reasons why off-screen polygons render faster than on-screen ones. For example, your performance could be partially or entirely bound by the fragment shader and/or raster operations. Once those triangles fall entirely off-screen, no fragments are generated for them, so your rendering time goes down.
In short, everything's fine. Neither OpenGL nor D3D allows the tessellation stage to discard geometry arbitrarily. Remember: you don't have to tessellate in clip-space, so there's no way for the system to even know if a triangle is "off-screen". Your GS is allowed to potentially bring off-screen triangles on-screen. So there's not enough information at that point to decide what is and is not off-screen.
In all likelihood, what's happening is that your renderer's performance was bound to the rasterizer/fragment shader, rather than to vertex processing. So even though the vertices are still being processed, the expensive per-fragment operations aren't being done because the triangles are off-screen. Thus, a performance improvement.
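A quick, rough way to check this is to render the same scene into a much smaller viewport and compare frame times. This is only a sketch, assuming an existing GL context, a 1920x1080 window, and a hypothetical drawTessellatedScene() helper standing in for your normal draw calls:

/* Rough fill-rate test: render the same geometry into a much smaller
 * viewport. If the frame time drops sharply, the bottleneck is per-fragment
 * work, not vertex processing or tessellation. */
glViewport(0, 0, 1920 / 4, 1080 / 4);   /* quarter-resolution pass */
drawTessellatedScene();                  /* hypothetical: your usual draw calls */
glViewport(0, 0, 1920, 1080);            /* restore the full-size viewport */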
A cube with different colored faces in immediate mode is very simple. But doing the same thing with shaders seems to be quite a challenge.
I have read that in order to create a cube with different colored faces, I should create 24 vertices instead of 8 - in other words, I visualize this as 6 squares that don't quite touch.
Is perhaps another (better?) solution to texture the faces of the cube using a really simple texture - a flat color - perhaps a 1x1 pixel texture?
My texturing idea seems simpler to me from a coder's point of view, but which method would be the most efficient from a GPU/graphics card perspective?
I'm not sure what your overall goal is (e.g. what you're learning to do in the long term), but generally for high-performance applications (e.g. games) your goal is to reduce GPU load. Every time you switch certain states (e.g. change textures, render targets, shader uniform values, etc.) the GPU stalls while it reconfigures itself to meet your demands.
So, you can pass in a 1x1 pixel texture for each face, but then you'd need six draw calls (usually not so bad, but there is some prep work and potential cache misses) and six texture sets (can be very bad, often as bad as changing shader uniform values).
Suppose you wanted to pass in one texture and use that as a texture map for the cube. This is a little less trivial than it sounds -- you need to lay out each cube face on the texture in a way that maps to the vertices. Often you need to pass in a texture coordinate for each vertex, and due to the spatial configuration of the texture this normally doesn't end up meaning one texture coordinate per spatial vertex.
However, if you use an environment/reflection map, the complexities of mapping are handled for you. In this way, you could draw a single texture on all sides of your cube. (Or on your sphere, or whatever sphere-mapped shape you wanted.) I'm not sure I'd call this easier, since you have to build the environment texture carefully, and you still have to set a different texture for each new color you want to represent -- or change the texture on the fly, which is tricky and usually not performant.
Which brings us back to the canonical way of doing it, as you mentioned: use per-vertex values -- they're fast, you can draw many, many cubes very quickly by only specifying different vertex data, and it's easy to understand. It really is the best way, and it's what GPUs are designed to run quickly.
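If it helps to see it concretely, here is a minimal sketch in C of building the 24 vertices (4 per face, each carrying its face's flat color), ready to upload to a VBO. The struct, names and color table are illustrative, not from the question:

typedef struct { float px, py, pz; float r, g, b; } Vertex;

/* The 8 corner positions of a unit cube. */
static const float corners[8][3] = {
    {-1,-1,-1}, { 1,-1,-1}, { 1, 1,-1}, {-1, 1,-1},
    {-1,-1, 1}, { 1,-1, 1}, { 1, 1, 1}, {-1, 1, 1},
};
/* Which 4 corners make up each face, counter-clockwise seen from outside. */
static const int faces[6][4] = {
    {4,5,6,7}, {1,0,3,2},   /* +Z, -Z */
    {5,1,2,6}, {0,4,7,3},   /* +X, -X */
    {7,6,2,3}, {0,1,5,4},   /* +Y, -Y */
};
/* One flat color per face (arbitrary choices). */
static const float faceColor[6][3] = {
    {1,0,0}, {0,1,0}, {0,0,1}, {1,1,0}, {1,0,1}, {0,1,1},
};

/* Fills out[24]: 4 vertices per face, each with its face's color. */
void build_colored_cube(Vertex out[24])
{
    for (int f = 0; f < 6; ++f)
        for (int v = 0; v < 4; ++v) {
            const float* p = corners[faces[f][v]];
            Vertex* dst = &out[f * 4 + v];
            dst->px = p[0]; dst->py = p[1]; dst->pz = p[2];
            dst->r = faceColor[f][0];
            dst->g = faceColor[f][1];
            dst->b = faceColor[f][2];
        }
}

Draw each face as two triangles (indices 0,1,2 and 0,2,3 within the face) and pass the color straight through the vertex shader -- no textures and no per-face uniform changes needed.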
Additionally: yes, you can do this with just shaders... but it'd be ugly and slow, and the GPU would end up computing it for every pixel. Pass the object-space coordinates to the fragment shader, and in the fragment shader test which side you're on and output the corresponding color. I highly recommend against it; it's not particularly easier, and it's definitely not faster for the GPU -- to change colors you'd again end up changing uniform values for the shaders.
I'm trying to develop a high-level understanding of the graphics pipeline. One thing that doesn't make much sense to me is why the geometry shader exists. Both the tessellation and geometry shaders seem to do the same thing to me. Can someone explain what the geometry shader does differently from the tessellation shader that justifies its existence?
The tessellation shader is for variable subdivision. An important part is adjacency information so you can do smoothing correctly and not wind up with gaps. You could do some limited subdivision with a geometry shader, but that's not really what it's for.
Geometry shaders operate per-primitive. For example, if you need to do stuff for each triangle (such as this), do it in a geometry shader. I've heard of shadow volume extrusion being done. There's also "conservative rasterization" where you might extend triangle borders so every intersected pixel gets a fragment. Examples are pretty application specific.
Yes, they can also generate more geometry than the input but they do not scale well. They work great if you want to draw particles and turn points into very simple geometry. I've implemented marching cubes a number of times using geometry shaders too. Works great with transform feedback to save the resulting mesh.
Transform feedback has also been used with the geometry shader to do more compute operations. One particularly useful mechanism is that it does stream compaction for you (packs its varying amount of output tightly so there are no gaps in the resulting array).
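For illustration, a rough sketch of that pattern in C (buffer, query object and program setup with transform feedback varyings are assumed to exist already; cellCount is a placeholder for however many input points you feed the geometry shader):

/* Capture geometry-shader output with transform feedback and read back how
 * many primitives were actually written. The output buffer is packed
 * tightly, so this count is the size of the compacted result. */
glEnable(GL_RASTERIZER_DISCARD);                       /* GPU-only pass, nothing drawn */
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, tfBuffer);
glBeginQuery(GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN, query);
glBeginTransformFeedback(GL_TRIANGLES);
glDrawArrays(GL_POINTS, 0, cellCount);                 /* e.g. one point per marching-cubes cell */
glEndTransformFeedback();
glEndQuery(GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN);
glDisable(GL_RASTERIZER_DISCARD);

GLuint written = 0;
glGetQueryObjectuiv(query, GL_QUERY_RESULT, &written); /* triangles actually emitted */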
The other very important thing a geometry shader provides is routing to layered render targets (texture arrays, faces of a cube map, multiple viewports), something which must be done per-primitive. For example, you can render cube shadow maps for point lights in a single pass by duplicating the geometry and projecting it once onto each of the cube's six faces.
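As a sketch of the C-side setup (names like shadowFbo and shadowCube are assumed to exist; error checking omitted), attaching the whole cube map as a layered depth target looks roughly like this. The geometry shader then picks a face per primitive by writing gl_Layer (0..5):

/* Allocate the six depth faces of the cube map. */
glBindTexture(GL_TEXTURE_CUBE_MAP, shadowCube);
for (int face = 0; face < 6; ++face)
    glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X + face, 0,
                 GL_DEPTH_COMPONENT24, 1024, 1024, 0,
                 GL_DEPTH_COMPONENT, GL_FLOAT, NULL);
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

/* glFramebufferTexture (no face suffix) attaches ALL six faces as layers. */
glBindFramebuffer(GL_FRAMEBUFFER, shadowFbo);
glFramebufferTexture(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, shadowCube, 0);
/* One draw call now feeds the geometry shader, which emits each triangle
 * six times with a different gl_Layer and face projection matrix. */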
Not exactly a complete answer but hopefully gives the gist of the differences.
See Also:
http://rastergrid.com/blog/2010/09/history-of-hardware-tessellation/
Is there any difference in performance between drawing a scene with full triangles (GL_TRIANGLES) instead of just drawing their vertices (GL_POINTS), on modern hardware?
Where GL_POINTS is initialized like this:
glPointSize(1.0);            /* draw each point as a single pixel */
glDisable(GL_POINT_SMOOTH);  /* no antialiasing, so points stay unblended 1 px dots */
I have a somewhat low-end graphics card (a 9600 GT), and drawing vertices only can bring a 2x fps increase in certain scenes. I'm not sure if this still applies on more recent GPUs.
drawing vertices only can bring a 2x fps increase in certain scenes
You lose 98% of the picture and get only a 2x fps increase. That's not impressive. If you take into account that you should be able to easily render 300-500 fps on any decent hardware (with vsync disabled and minor optimizations), it's probably not worth it.
Is there any difference in performance between drawing a scene with full triangles (GL_TRIANGLES) instead of just drawing their vertices (GL_POINTS), on modern hardware?
Well, if your scene has a LOT of alpha blending and very "heavy" pixel shaders, then, obviously, displaying the scene as a point cloud will speed things up, because there are fewer pixels to fill.
On the other hand, this kind of "optimization" will be completely useless for any practical task. I mean, if you're using blending and shaders, you probably don't want to display your scene as a point list in the first place, unless you're doing some kind of debug render (using glPolygonMode), and in the case of a debug render you'll probably turn shaders off (because a shaded/lit point will be hard to see) and disable lighting.
Even if you're using point sprites for particles or something, I'd stick with triangles - they give more control and do not have a maximum size limit (unlike point sprites).
I can display more objects?
If you want more objects, you should probably try to optimize things elsewhere first. If you stop trying to draw invisible objects (outside the field of view, etc.), that'll be a start that can improve performance.
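For example, here is a minimal object-level culling sketch in C using bounding spheres. Extracting the six frustum planes from the view-projection matrix (the usual Gribb/Hartmann method) is not shown, and all names are illustrative; the planes are assumed to have inward-pointing normals:

typedef struct { float a, b, c, d; } Plane;   /* plane equation ax+by+cz+d = 0 */

/* Returns 0 if the sphere is completely outside the frustum, 1 otherwise. */
int sphere_in_frustum(const Plane planes[6],
                      float cx, float cy, float cz, float radius)
{
    for (int i = 0; i < 6; ++i) {
        float dist = planes[i].a * cx + planes[i].b * cy +
                     planes[i].c * cz + planes[i].d;
        if (dist < -radius)
            return 0;   /* completely behind one plane: cull it */
    }
    return 1;           /* touching or inside the frustum: draw it */
}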
you have a mesh which is very far away from the camera. 1 million triangles and you know it is always in view. At this density ratio, triangles can't be bigger than a pixel,
When triangles are smaller than a pixel and there are many of them, your mesh starts looking like garbage and turns into a pixelated mess of points. It will be ugly - roughly the same effect as when you disable mipmapping and texture filtering and then render a checkerboard pattern. Using points instead of triangles might even aggravate the effect.
If you have a 1-million-triangle mesh that is always visible, you already need a different kind of optimization. Reduce the number of triangles (level of detail, dynamic tessellation, or some solution that can simplify geometry on the fly), use bump mapping (maybe parallax mapping) to simulate extra geometry detail that isn't really there, or even turn it into a static background or a sprite. That'll work much better. Trying to render it using points will simply make it look ugly.
No. If the number of triangles is similar to the number of shared vertices (considering that the glDrawElements rendering command is used), the geometry part of the rendering pipeline will run at roughly the same speed in both modes. The only benefit you can get from drawing GL_POINTS comes from the percentage of screen space you leave empty by not drawing faces, i.e. purely at the fragment shader level.
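To make the comparison concrete, only the primitive mode changes between the two cases. A small sketch, assuming an existing vao, vertexCount and indexCount, with drawAsPoints as a placeholder flag:

glBindVertexArray(vao);
if (drawAsPoints)
    glDrawArrays(GL_POINTS, 0, vertexCount);                   /* each vertex once, as a point */
else
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0); /* same vertices, full faces */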
I'm wondering whether drawing a triangle that lies partially outside the frustum takes longer than if I calculated where the frustum bounds it and drew the resulting (probably two) smaller triangles instead, so that the same pixels end up being changed.
So the question is: does the fragment shader run for positions that don't even exist on your screen? Or is the rasterization phase optimized to avoid this?
Modern GL hardware is really, really good at clipping triangles against the viewport.
The fragment shader will only run on pixels that survive the viewport clipping.
Don't try to do frustum culling at the triangle level, do it at the object/tile level. Your CPU and vertex shader units will thank you :)
This is entirely a driver/hardware thing. Any triangle that extends outside the view volume is handled automatically, in theory by subdividing it into smaller triangles; I doubt you will be able to do it faster than your GPU.
There are also techniques that handle triangles outside the drawing range without subdivision: the hardware can keep a zone around the viewport that doesn't need subdivision (a sort of guard band), rasterize triangles that only spill into that border as-is, and simply skip the unnecessary fragment computations.
TL;DR: trust your GPU on that.
Short answer:
No, the fragment shader won't run for pixels outside the frustum.
But what you should really worry about is the vertex shader. It will run even if the whole triangle is outside the frustum, because the GPU can't know in advance (not that I know of a way) whether the triangle will end up on screen.
http://www.opengl.org/wiki/Rendering_Pipeline_Overview says that "primitives that lie on the boundary between the inside of the viewing volume and the outside are split into several primitives" after the geometry shader is run and before fragments are rasterized. Everything else I've ever read about OpenGL has also described the clipping process the same way.

However, by setting gl_FragDepth in the fragment shader to values that are closer to the camera than the actual depth of the point on the triangle that generated it (so that the fragment passes the depth test when it would have failed if I were copying fixed-pipeline functionality), I'm finding that fragments are being generated for the entire original triangle even if it partially crosses the far viewing plane. On the other hand, if all of the vertices are behind the far plane, the whole triangle is clipped and no fragments are sent to the fragment shader (I suppose more technically you would say it is culled, not clipped).
What is going on here? Does my geometry shader replace some default functionality? Are there flags/hints I need to set or additional built-in variables that I need to write to in order for the next step of the rendering pipeline to know how to do partial clipping?
I'm using GLSL version 1.2 with the GL_EXT_geometry_shader4 extension on an NVIDIA GeForce 9400M.
That sounds like a driver bug. If you can see results for fragments that should have been outside the viewing region (ie: if turning off your depth writes causes the fragments to disappear entirely), then that's against the spec's behavior.
Granted, it's such a corner case that I doubt anyone's going to do anything about it.
Most graphics hardware tries as hard as possible to avoid actually clipping triangles. Clipping triangles means potentially generating 3+ triangles from a single triangle. That tends to choke the pipeline (pre-tessellation at any rate). Therefore, unless the triangle is trivially rejectable (ie: outside the clip box) or incredibly large, modern GPUs just ignore it. They let the fragment culling hardware take care of it.
In this case, because your fragment shader writes depth, the hardware believes it can't reject those fragments until your fragment shader has finished.
Note: I realized that if you turn on depth clamping, that turns off near and far clipping entirely. Which may be what you want. Depth values written from the fragment shader are clamped to the current glDepthRange.
Depth clamping is an OpenGL 3.2 feature, but NVIDIA has supported it for near on a decade with NV_depth_clamp. And if your drivers are recent, you should be able to use ARB_depth_clamp even if you don't get a 3.2 compatibility context.
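In practice, enabling it is just a couple of calls. A minimal sketch, assuming a context that exposes OpenGL 3.2 or ARB_depth_clamp:

glEnable(GL_DEPTH_CLAMP);   /* primitives are no longer clipped against near/far planes */
glDepthRange(0.0, 1.0);     /* fragment depth values get clamped to this range instead */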
If I understood you correctly, you are wondering why your triangles aren't clipped against the far plane.
Afaik OpenGL just clips against the four side planes after vertex assembly. The far and near clipping gets done (by spec, afaik) after the fragment shader, i.e. when you zoom in extremely close and polygons intersect the near plane, they get rendered up to that point and don't pop away as a whole.
And I don't think the spec mentions splitting primitives at all (even though the hardware might do that in screen space, ignoring gl_FragDepth); it only mentions skipping primitives as a whole (in the case that no vertex lies in the view frustum).
Also, relying on a wiki for word-exact rules is always a bad idea.
PS: http://fgiesen.wordpress.com/2011/07/05/a-trip-through-the-graphics-pipeline-2011-part-5/ explains the actual border (guard-band) and near/far clipping very well.