I have an art application I'm dabbling with that uses OpenGL for accelerated graphics rendering. I'd like to be able to add the ability to draw arbitrary piecewise curves - pretty much the same sort of shapes that can be defined by the SVG 'path' element.
Rather than tessellating my paths into polygons on the CPU, I thought it might be better to pass an array of values in a buffer to my shader defining the pieces of my curve and then using an in/out test to check which pixels were actually inside. In other words, I'd be iterating through a potentially large array of data describing each segment in my path.
From what I remember back when I learned shader programming years ago, GPUs handle if statements by evaluating both branches and then throwing away the branch that wasn't used. This would effectively mean that it would end up silently running through my entire buffer even if I only used a small part of it (i.e., my buffer has the capacity to handle 1024 curve segments, but the simple rectangle I'm drawing only uses the first four of them).
How do I write my code to deal with this variable data? Can modern GPUs handle conditional code like this well?
GPUs can handle arbitrary-length buffers and conditionals (or fake it convincingly). The problem is that a vertex and geometry shaders cannot generate arbitrary number of triangles from a short description.
OpenGL 4.0 added two new types of shaders: Tessellation Control shaders and Tessellation Evaluation shaders. These shaders give you the ability to tessellate curves and surfaces on the GPU.
I found this tutorial to be quite useful in showing how to tessellate Bezier curves on the GPU.
Related
I am currently implementing the pose estimation algorithm proposed in Oikonomidis et al., 2011, which involves rendering a mesh in N different hypothesised poses (N will probably be about 64). Section 2.5 suggests speeding up the computation by using instancing to generate multiple renderings simultaneously (after which they reduce each rendering to a single number on the GPU), and from their description, it sounds like they found a way to produce N renderings simultaneously.
In my implementation's setup phase, I use an OpenGL viewport array to define GL_MAX_VIEWPORTS viewports. Then in the rendering phase, I transfer an array of GL_MAX_VIEWPORTS model-pose matrices to a mat4 uniform array in GPU memory (I am only interested in estimating position and orientation), and use gl_InvocationID in my geometry shader to select the appropriate pose matrix and viewport for each polygon of the mesh.
GL_MAX_VIEWPORTS is 16 on my machine (I have a GeForce GTX Titan), so this method will allow me to render up to 16 hypotheses at a time on the GPU. This may turn out to be fast enough, but I am nonetheless curious about the following:
Is there is a workaround for the GL_MAX_VIEWPORTS limitation that is likely to be faster than calling my render function ceil(double(N)/GL_MX_VIEWPORTS) times?
I only started learning the shader-based approach to OpenGL a couple of weeks ago, so I don't yet know all the tricks. I initially thought of replacing my use of the built-in viewport support with a combination of:
a geometry shader that adds h*gl_InvocationID to the y coordinates of the vertices after perspective projection (where h is the desired viewport height) and passes gl_InvocationID onto the fragment shader; and
a fragment shader that discards fragments with y coordinates that satisfy y<gl_InvocationID*h || y>=(gl_InvocationID+1)*h.
But I was put off investigating this idea further by the fear that branching and discard would be very detrimental to performance.
The authors of the paper above released a technical report describing some of their GPU acceleration methods, but it's not detailed enough to answer my question. Section 3.2.3 says "During geometry instancing, viewport information is attached to every vertex... A custom pixel shader clips pixels that are outside their pre-defined viewports". This sounds similar to the workaround that I've described above, but they were using Direct3D, so it's not easy to compare what they were able to achieve with that in 2011 to what I can achieve today in OpenGL.
I realise that the only definitive answer to my question is to implement the workaround and measure its performance, but it's currently a low-priority curiosity, and I haven't found answers anywhere else, so I hoped that a more experienced GLSL user might be able to offer their time-saving wisdom.
From a cursory glance at the paper, it seems to me that the actual viewport doesn't change. That is, you're still rendering to the same width/height and X/Y positions, with the same depth range.
What you want is to change which image you're rendering to. Which is what gl_Layer is for; to change which layer within the layered array of images attached to the framebuffer you are rendering to.
So just set the gl_ViewportIndex to 0 for all vertices. Or more specifically, don't set it at all.
The number of GS instancing invocations does not have to be a restriction; that's your choice. GS invocations can write multiple primitives, each to a different layer. So you could have each instance write, for example, 4 primitives, each to 4 separate layers.
Your only limitations should be the number of layers you can use (governed by GL_MAX_ARRAY_TEXTURE_LAYERS and GL_MAX_FRAMEBUFFER_LAYERS, both of which must be at least 2048), and the number of primitives and vertex data that a single GS invocation can emit (which is kind of complicated).
I have been researching different approaches to terrain systems in game engines for a bit now, trying to familiarize myself with the work. A number of the details seem straightforward, but I am getting hung up on a single detail.
For performance reasons many terrain solutions utilize shaders to generate parts or all of the geometry, such as vertex shaders to generate positions or tessellation shaders for LoD. At first I figured those approaches were exclusively for renders that weren't concerned about physics simulations.
The reason I say that is because as I understand shaders at the moment, the results of a shader computation generally are discarded at the end of the frame. So if you rely on shaders heavily then the geometry information will be gone before you could access it and send it off to another system (such as physics running on the CPU).
So, am I wrong about shaders? Can you store the results of them generating geometry to be accessed by other systems? Or am I forced to keep the terrain geometry on CPU and leave the shaders to the other details?
Shaders
You understand parts of the shaders correctly, that is: after a frame, the data is stored as a final composed image in the backbuffer.
BUT: Using transform feedback it is possible to capture transformed geometry into a vertex buffer and reuse it. Transform Feedback happens AFTER the vertex/geometry/tessellation shader, so you could use the geometry shader to generate a terrain (or visible parts of it once), push it through transform-feedback and store it.
This way, you potentially could use CPU collision detection with your terrain! You can even combine this with tessellation.
You will love this: A Framework for Real-Time, Deformable Terrain.
For the LOD and tessellation: LOD is not the prerequisite of tessellation. You can use tessellation to allow some more sophisticated effects such as adding a detail by recursive subdivision of rough geometry. Linking it with LOD is simply a very good optimization avoiding RAM-memory based LOD-mesh-levels, since you just have your "base mesh" and subdivide it (Although this will be an unsatisfying optimization imho).
Now some deeper info on GPU and CPU exclusive terrain.
GPU Generated Terrain (Procedural)
As written in the NVidia article Generating Complex Procedural Terrains Using the GPU:
1.2 Marching Cubes and the Density Function Conceptually, the terrain surface can be completely described by a single function, called the
density function. For any point in 3D space (x, y, z), the function
produces a single floating-point value. These values vary over
space—sometimes positive, sometimes negative. If the value is
positive, then that point in space is inside the solid terrain.
If the value is negative, then that point is located in empty space
(such as air or water). The boundary between positive and negative
values—where the density value is zero—is the surface of the terrain.
It is along this surface that we wish to construct a polygonal mesh.
Using Shaders
The density function used for generating the terrain, must be available for the collision-detection shader and you have to fill an output buffer containing the collision locations, if any...
CUDA
See: https://www.youtube.com/watch?v=kYzxf3ugcg0
Here someone used CUDA, based on the NVidia article, which however implies the same:
In CUDA, performing collision detection, the density function must be shared.
This will however make the transform feedback techniques a little harder to implement.
Both, Shaders and CUDA, imply resampling/recalculation of the density at at least one location, just for the collision detection of a single object.
CPU Terrain
Usually, this implies a RAM-memory stored set of geometry in the form of vertex/index-buffer pairs, which are regularly processed by the shader-pipeline. As you have the data available here, you will also most likely have a collision mesh, which is a simplified representation of your terrain, against which you perform collision.
Alternatively you could spend your terrain a set of colliders, marking the allowed paths, which is imho performed in the early PS1 Final Fantasy games (which actually don't really have a terrain in the sense we understand terrain today).
This short answer is neither extensively deep nor complete. I just tried to give you some insight into some concepts used in dozens of solutions.
Some more reading: http://prideout.net/blog/?tag=opengl-transform-feedback.
I recently started programming with openGL. I've done code creating basic primitives and have used shaders in webGL. I've googled the subject extensively but it's still not that clear to me. Basically, here's what I want to know. Is there anything that can be done in GLSL that can't be done in plain openGL, or does GLSL just do things more efficiently?
The short version is: OpenGL is an API for rendering graphics, while GLSL (which stands for GL shading language) is a language that gives programmers the ability to modify pipeline shaders. To put it another way, GLSL is a (small) part of the overall OpenGL framework.
To understand where GLSL fits into the big picture, consider a very simplified graphics pipeline.
Vertexes specified ---(vertex shader)---> transformed vertexes ---(primitive assembly)---> primitives ---(rasterization)---> fragments ---(fragment shader)---> output pixels
The shaders (here, just the vertex and fragment shaders) are programmable. You can do all sorts of things with them. You could just swap the red and green channels, or you could implement a bump mapping to make your surfaces appear much more detailed. Writing these shaders is an important part of graphics programming. Here's a link with some nice examples that should help you see what you can accomplish with custom shaders: http://docs.unity3d.com/Documentation/Components/SL-SurfaceShaderExamples.html.
In the not-too-distant past, the only way to program them was to use GPU assembler. In OpenGL's case, the language is known as ARB assembler. Because of the difficulty of this, the OpenGL folks gave us GLSL. GLSL is a higher-level language that can be compiled and run on graphics hardware. So to sum it all up, programmable shaders are an integral part of the OpenGL framework (or any modern graphics API), and GLSL makes it vastly easier to program them.
As also covered by Mattsills answer GL Shader Language or GLSL is a part of OpenGL that enables the creation of algorithms called shaders in/for OpenGL. Shaders run on the GPU.
Shaders make decisions about factors such as the color of parts of surfaces, and the way surfaces share information such as reflected light. Vertex Shaders, Geometry Shaders, Tesselation Shaders and Pixel Shaders are types of shader that can be written in GLSL.
Q1:
Is there anything that can be done in GLSL that can't be done in plain OpenGL?
A:
You may be able to use just OpenGL without the GLSL parts, but if you want your own surface properties you'll probably want a shader make this reasonably simple and performant, created in something like GLSL. Here are some examples:
Q2:
Or does GLSL just do things more efficiently?
A:
Pixel shaders specifically are very parallel, calculating values independently for every cell of a 2D grid, while also containing significant caveats, like not being unable to handle "if" statement like conditions very performantly, so it's a case of using different kinds of shaders to there strengths, on surfaces described and dealt with in the rest of OpenGL.
Q3:
I suspect you want to know if just using GLSL is an option, and I can only answer this with my knowledge of one kind of shader, Pixel Shaders. The rest of this answer covers "just" using GLSL as a possible option:
A:
While GLSL is a part of OpenGL, you can use the rest of OpenGL to set up the enviroment and write your program almost entirly as a pixel shader, where each element of the pixel shader colours a pixel of the whole screen.
For example:
(Note that WebGL has a tendency to hog CPU to the point of stalling the whole system, and Windows 8.1 lets it do so, Chrome seems better at viewing these links than Firefox.)
No, this is not a video clip of real water:
https://www.shadertoy.com/view/Ms2SD1
The only external resources fed to this snail some easily generatable textures:
https://www.shadertoy.com/view/ld3Gz2
Rendering using a noisy fractal clouds of points:
https://www.shadertoy.com/view/Xtc3RS
https://www.shadertoy.com/view/MsdGzl
A perfect sphere: 1 polygon, 1 surface, no edges or vertices:
https://www.shadertoy.com/view/ldS3DW
A particle system like simulation with cars on a racetrack, using a 2nd narrow but long pixel shader as table of data about car positions:
https://www.shadertoy.com/view/Md3Szj
Random values are fairly straightforward:
fract(sin(p)*10000.)
I've found the language in some respects to be hard to work with and it may or may not be particularly practical to use GLSL in this way for a large project such as a game or simulation, however as these demos show, a computer game does not have to look like a computer game and this sort of approach should be an option, perhaps used with generated content and/or external data.
As I understand it to perform reasonably Pixel Shaders in OpenGL:
Have to be loaded into a small peice of memory.
Do not support:
"if" statement like conditions.
recursion or while loop like flow control.
Are restricted to a small pool of valid instructions and data types.
Including "sin", mod, vector multiplication, floats and half precision floats.
Lack high level features like objects or lambdas.
And effectively calculate values all at once in parallel.
A consequence of all this is that code looks more like lines of closed form equations and lacks algorythms or higher level structures, using modular arithmetic for something akin to conditions.
I have a huge mesh(100k triangles) that needs to be drawn a few times and blend together every frame. Is it possible to reuse the vertex shader output of the first pass of mesh, and skip the vertex stage on later passes? I am hoping to save some cost on the vertex pipeline and rasterization.
Targeted OpenGL 3.0, can use features like transform feedback.
I'll answer your basic question first, then answer your real question.
Yes, you can store the output of vertex transformation for later use. This is called Transform Feedback. It requires OpenGL 3.x-class hardware or better (aka: DX10-hardware).
The way it works is in two stages. First, you have to set your program up to have feedback-based varyings. You do this with glTransformFeedbackVaryings. This must be done before linking the program, in a similar way to things like glBindAttribLocation.
Once that's done, you need to bind buffers (given how you set up your transform feedback varyings) to GL_TRANSFORM_FEEDBACK_BUFFER with glBindBufferRange, thus setting up which buffers the data are written into. Then you start your feedback operation with glBeginTransformFeedback and proceed as normal. You can use a primitive query object to get the number of primitives written (so that you can draw it later with glDrawArrays), or if you have 4.x-class hardware (or AMD 3.x hardware, all of which supports ARB_transform_feedback2), you can render without querying the number of primitives. That would save time.
Now for your actual question: it's probably not going to help buy you any real performance.
You're drawing terrain. And terrain doesn't really get any transformation. Typically you have a matrix multiplication or two, possibly with normals (though if you're rendering for shadow maps, you don't even have that). That's it.
Odds are very good that if you shove 100,000 vertices down the GPU with such a simple shader, you've probably saturated the GPU's ability to render them all. You'll likely bottleneck on primitive assembly/setup, and that's not getting any faster.
So you're probably not going to get much out of this. Feedback is generally used for either generating triangle data for later use (effectively pseudo-compute shaders), or for preserving the results from complex transformations like matrix palette skinning with dual-quaternions and so forth. A simple matrix multiply-and-go will barely be a blip on the radar.
You can try it if you like. But odds are you won't have any problems. Generally, the best solution is to employ some form of deferred rendering, so that you only have to render an object once + X for every shadow it casts (where X is determined by the shadow mapping algorithm). And since shadow maps require different transforms, you wouldn't gain anything from feedback anyway.
To preface this question, I have a competent understanding of OpenGL and the maths behind it, and while I have never touched anything related to DirectX I imagine the concepts are similar.
There is plenty of information around about why triangles are used for 3D graphics (they are necessarily planar, are indivisible except into smaller triangles, etc). However, I would like to know if triangles are merely a convenient way of storing and manipulating 3D data (simpler maths regarding interpolation, etc), or if there is a hardware limitation in the graphics card that only realistically allows the rendering of triangles (e.g. instructions that can essentially ONLY be applied to triangles).
Following on from this, is there any way to achieve pixel-by-pixel control of graphics rendering (as outlined briefly by the answer to this question). While I appreciate direct control over individual pixels is done through a driver, is there any way I can get this kind of control over a rendering environment? Is there away to 'avoid triangles' completely?
Yes and no. Kind of.
Current GPUs are designed to render triangles because triangles are nice to work with. And because current GPUs are designed to work with triangles, people use triangles and so GPUs only need to process triangles, and so they're designed to process only triangles.
As you say, triangles just have advantages that make them convenient to use. GPUs can be made (and have been made) to render other primitives natively, but it's just not really worth it. If you tell a modern GPU to render a quad, it splits it up into two triangles and renders those.
Not because there's a technical reason why a GPU can't render quads natively, but because it's not worth spending transistors on. It's much more useful to focus the GPU on doing triangles as fast as possible, and then just emulate other primitives if they're needed.
So yes, modern GPUs have hardware limitations so they don't work with quads, for example, but not because it's impossible to design a GPU which works with quads. It'd just be less efficient to do so. :)
As for "avoiding triangles", sure, that's basically what the fragment shader does: it fills in one single pixel. The GPU just runs it a few million times in parallel to fill in the entire screen. You could draw two big triangles, which form a quad filling the entire screen, and then just specify a fragment shader which fills that with the content you like.
If you want more control over the process, do it in software instead: paint one pixel at a time to a memory surface, and then load that as a texture on the GPU. But it's slow. :)
As far as i know every modern CAN render quads and some even N-gons but it comparing the render time of a quad to 2 triangles shows the triangle advantage.
This is mainly because GPU's have been optimized to render triangles and that the accual hardware has way more "steam processors" (for triangles) then others such as textures ones. Some other processor types on the GPU can render quads directly but normally you would find a thousand steam to a few texture processors
Note that getting a texure unit to render a quad is EXTREMELY difficult. It is possible in theory but no one used the pricip for a serius case.
Unless you work with very hardware close operation the software will take care of the triangles, (eg, Auto-Convert them from quads)