I'm writing a program that draws a number of moving/rotating polygons using OpenGL. Each polygon has a location in world coordinates while its vertices are expressed in local coordinates (relative to polygon location). Each polygon also has a rotation.
The only way I can think of doing this is to calculate the vertex positions by translation/rotation each frame and push them to the GPU to be drawn, but I was wondering if this could be performed in the vertex shader.
I thought I might express vertex locations in local coordinates and then add location and rotation attributes to each vertex, but then it occurred to me that this won't be any better than pushing new vertex positions on each frame.
Should I do this kind of calculation on the CPU, or is there a way to do it efficiently in the vertex shader?
The vertex shader is indeed responsible for transforming your geometry. However, the vertex shader runs for every single vertex of your scene. If you build the transformation inside the vertex shader, you'll redo the same calculation over and over, yielding the same result every time (as opposed to simply multiplying a precomputed model-view-projection matrix with the vertex coordinate). So in terms of efficiency you're best off building that matrix on the CPU side.
If the models are small, like in your case, I don't expect there to be too much of a difference, because you still have to set the coordinates where the polygons are supposed to be drawn somehow. In this case, doing the calculation once on the CPU side is still best: the work is done once, independent of the vertex count of your polygons, and it will probably result in clearer code since it's easier to see what you're doing.
These calculations are usually done on the CPU, since doing them there is efficient in general. Your best bet is to send the rotation matrices in as uniforms and do the multiplication on the GPU. Sending uniforms is not a very expensive operation in general, so you shouldn't be worrying about that.
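A minimal sketch of what both replies suggest, assuming one draw call per polygon; the uniform and attribute names (`u_model`, `u_viewProjection`, `a_localPosition`) are invented for the example. The per-polygon matrix is built once per frame on the CPU from the polygon's location and rotation, so the vertex buffer itself never changes:

```glsl
#version 330 core
// Vertex shader sketch: the CPU rebuilds u_model once per polygon per frame
// (translation * rotation), while the vertex data stays in local coordinates
// and never has to be re-uploaded.

layout(location = 0) in vec2 a_localPosition;  // vertex in polygon-local coordinates

uniform mat4 u_model;           // per-polygon location + rotation, built on the CPU
uniform mat4 u_viewProjection;  // shared camera transform, built once per frame

void main()
{
    gl_Position = u_viewProjection * u_model * vec4(a_localPosition, 0.0, 1.0);
}
```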
Related
I am learning to make a graphics engine with OpenGL. I wanted to know: should repetitive operations be moved from the fragment shader to the vertex shader, since from what I understand the vertex shader is only run once per vertex?
For instance, when normalizing a vector for the light direction: since this direction is the same across the whole primitive, should the normalization be moved to the vertex shader instead of calculating it for every pixel? Is there a particular reason to keep it in the fragment shader?
If the calculation is exactly the same: yes, it should usually be more efficient to do it in the vertex shader than the fragment shader. Some situations where it might not be more efficient:
when drawing geometry that results in fewer shaded pixels than transformable vertices -- either due to dense geometry or extreme discards/occlusion. If this is the case, usually you would want to address it by switching to lower level-of-detail geometry or smarter geometry culling.
when doing the calculation in the vertex shader requires you to send more data to the fragment shader in order to use the calculation's results. Sending more data can be slower because it requires more memory manipulation and because the rasterizer needs to interpolate more "varying" values across each polygon.
For light calculations, specifically, be mindful that moving calculations from the fragment shader to the vertex shader can affect the quality of your rendering. Particularly, normalized direction vectors at each vertex can become shorter after "varying" interpolation, which can slightly darken triangle interiors if used directly without renormalization. And, of course, moving the entire lighting calculation to the vertex shader has even more drastic effects.
But how visible these effects are depends on the frequency of textures, the resolution of geometry, the size on screen, how far away the lights are, etc. -- in some cases, the quality/performance tradeoff may make sense.
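To make the trade-off concrete, here is a hedged sketch (all uniform and varying names are invented for the example) of doing the normalization once per vertex and renormalizing the interpolated result per fragment, which is the quality caveat mentioned above:

```glsl
#version 330 core
// Vertex shader: compute and normalize the light direction per vertex
// instead of per fragment; the result is interpolated as a varying.
layout(location = 0) in vec3 a_position;
layout(location = 1) in vec3 a_normal;

uniform mat4 u_modelViewProjection;
uniform mat4 u_model;
uniform vec3 u_lightPosition;   // light position in world space (illustrative)

out vec3 v_normal;
out vec3 v_toLight;

void main()
{
    vec3 worldPosition = (u_model * vec4(a_position, 1.0)).xyz;
    v_normal  = mat3(u_model) * a_normal;                    // fine for rigid transforms
    v_toLight = normalize(u_lightPosition - worldPosition);  // normalized once per vertex
    gl_Position = u_modelViewProjection * vec4(a_position, 1.0);
}
```

```glsl
#version 330 core
// Fragment shader: interpolated unit vectors can shrink inside the triangle,
// so renormalize them before use.
in vec3 v_normal;
in vec3 v_toLight;
out vec4 fragColor;

void main()
{
    float diffuse = max(dot(normalize(v_normal), normalize(v_toLight)), 0.0);
    fragColor = vec4(vec3(diffuse), 1.0);
}
```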
Related
I'm trying to wrap my head around the GPU pipeline and the performance implications...
I create a coordinate system and put a million vertices on it; all of them are now in memory usable by the GPU. I assume the performance hit in this step is moving all the floating-point values into GPU memory, assuming the points were already created.
Then I transform my million points' coordinates into clip coordinates. Here I'm applying a transformation to each point.
As a result of this transformation some points are now outside the clip volume; let's say only a thousand points are in. Does the vertex shader run on the thousand or on all million points? What about the fragment shader? And the assembly of triangles? Does the transformation into the final device coordinates only take the thousand points?
My guess is that the vertex shader runs on all of them, but the fragment shader only on the fragments interpolated from the visible vertices.
Is the only possible optimization then to include as few vertices as possible in the first place? If I'm looking at a full 3D world with buildings, trees, roads... and then zoom in on just one rock, I'm running all the shaders on all the objects anyway... so the only solution would be to not submit those trees and buildings in the first place? Or can I keep this world in GPU memory but only compute the rock? Could I apply the coordinate transformation to just the rock somehow? Where in the pipeline do techniques like GPU culling, level of detail, or dynamic tessellation take place?
Vertex shaders are executed for each vertex submitted with the glDrawArrays and glDrawElements function families, possibly even multiple times per vertex. The transformed vertices are then assembled into primitives and clipped; if a primitive is entirely outside the view volume, its processing ends there. To reduce the overhead of processing vertices of objects that are out of view, multiple techniques are employed. The simplest one is "frustum culling": submit an object for rendering only if its bounding box intersects the camera frustum.
Fragment shaders are executed for each fragment ("pixel") in the framebuffer that passes the depth test. One way to reduce their count is to render front to back—so that only the front-visible fragments are ever calculated.
Related
I am using OpenGL ES 2 to render a fairly large number of mostly 2D items, and so far I have gotten away with sending a premultiplied model/view/projection matrix to the vertex shader as a uniform and then multiplying my vertices with the resulting MVP in there.
All items are batched using texture atlases and I use one MVP per batch. So all my vertices are relative to the translation of that MVP.
Now I want to have rotation and scaling for each of the separate items, which means I need a different model matrix for each of them. So I modified my vertex format to include the model matrix (16 floats!) and added a mat4 attribute to my shader, and it all works well. But I'm kind of disappointed with this solution, since it dramatically increased the vertex size.
So as I was staring at my screen trying to think of a different solution, I thought about transforming my vertices to world space before I send them over to the shader. Or even to screen space, if it's possible. The vertices I use are unnormalized coordinates in pixels.
So the question is: is such a thing possible? And if yes, how do you do it? I can't think why it shouldn't be, since it's just maths, but after a fairly long search on Google it doesn't look like a lot of people are actually doing this...
Strange, because if it is indeed possible, it would be quite a major optimization in cases like this one.
If the number of matrices per batch is limited, then you can pass all those matrices as uniforms (preferably in a UBO) and expand the vertex data with an index which specifies which matrix to use.
This is similar to GPU skinning used for skeletal animation.
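A sketch of that idea in GLSL; the block, uniform, and attribute names are made up for the example, and uniform blocks need ES 3.0 or desktop GL 3.1+ (on ES 2 a plain `uniform mat4` array works the same way, just with tighter size limits):

```glsl
#version 330 core
// Per-item transforms live in a uniform block; each vertex carries only a
// small index selecting its matrix, much like one-bone GPU skinning.
#define MAX_ITEMS 256   // bounded by GL_MAX_UNIFORM_BLOCK_SIZE (at least 16 KB)

layout(location = 0) in vec2  a_position;     // batch-local vertex position
layout(location = 1) in vec2  a_texCoord;
layout(location = 2) in float a_matrixIndex;  // which item this vertex belongs to

layout(std140) uniform ItemTransforms
{
    mat4 u_model[MAX_ITEMS];   // rotation/scale/translation per item, updated per batch
};

uniform mat4 u_viewProjection;

out vec2 v_texCoord;

void main()
{
    mat4 model = u_model[int(a_matrixIndex)];
    gl_Position = u_viewProjection * model * vec4(a_position, 0.0, 1.0);
    v_texCoord  = a_texCoord;
}
```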
Related
Say I wanted to build matrices inside the GPU pipeline for vertex transforms. I realized that my current implementation is quite inefficient, because it rebuilds the matrices from the source data for every single vertex, while they really only need to be built once per group of affected vertices. Is there any way to modify the whole array of vertices that gets drawn in a single draw call? Calculating the matrices and storing them in VRAM doesn't seem like a very good option, since multiple vertices will be processed at the same time and I don't think I can synchronize them efficiently. The only other option I can think of is a compute shader; I haven't looked into its uses yet, but would it be possible to have it calculate the matrices and store them on the GPU so I can access them later when drawing?
Do you have any source code? I never calculate matrices in the shaders; I normally do it on the CPU and pass them over in a constant buffer.
One way of achieving this is to precompute the matrix and send it to the shader as a uniform variable. For example, if your shaders only ever need to multiply the MVP matrix with the vertex positions, then you could precompute the MVP matrix outside the shader and send it as a float4x4 uniform to the shader; all the vertex shader does then is multiply that single matrix with each vertex. It doesn't get much more optimal than that, since vertices are processed in parallel on the GPU and the GPU has instruction sets optimized for vector math.
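In shader terms that answer boils down to a single multiply per vertex; a minimal GLSL sketch (the uniform name is illustrative), with projection * view * model computed once per object on the CPU:

```glsl
#version 330 core
layout(location = 0) in vec3 a_position;

uniform mat4 u_modelViewProjection;  // premultiplied on the CPU, once per object per frame

void main()
{
    gl_Position = u_modelViewProjection * vec4(a_position, 1.0);
}
```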
Related
I'm working on a Minecraft-like engine as a hobby project to see how far the concept of voxel terrains can be pushed on modern hardware and OpenGL >= 3. So, all my geometry consists of quads, or squares to be precise.
I've built a raycaster to estimate ambient occlusion, and use the technique of "bent normals" to do the lighting. So my normals aren't perpendicular to the quad, nor do they have unit length; rather, they point roughly towards the space where least occlusion is happening, and are shorter when the quad receives less light. The advantage of this technique is that it just requires a one-time calculation of the occlusion, and is essentially free at render time.
However, I run into trouble when I try to assign different normals to different vertices of the same quad in order to get smooth lighting. Because the quad is split up into triangles, and linear interpolation happens over each triangle, the result of the interpolation clearly shows the presence of the triangles as ugly diagonal artifacts.
The problem is that OpenGL uses barycentric interpolation over each triangle, which is a weighted sum over 3 out of the 4 corners. Ideally, I'd like to use bilinear interpolation, where all 4 corners are being used in computing the result.
I can think of some workarounds:
Stuff the normals into a 2x2 RGB texture, and let the texture processor do the bilinear interpolation. This happens at the cost of a texture lookup in the fragment shader. I'd also need to pack all these mini-textures into larger ones for efficiency.
Use vertex attributes to attach all 4 normals to each vertex. Also attach some [0..1] coefficients to each vertex, much like texture coordinates, and do the bilinear interpolation in the fragment shader. This happens at the cost of passing 4 normals to the shader instead of just 1.
I think both these techniques can be made to work, but they strike me as kludges for something that should be much simpler. Maybe I could transform the normals somehow, so that OpenGL's interpolation would give a result that does not depend on the particular triangulation used.
(Note that the problem is not specific to normals; it is equally applicable to colours or any other value that needs to be smoothly interpolated across a quad.)
Any ideas how else to approach this problem? If not, which of the two techniques above would be best?
As you clearly understand, the triangle interpolation that GL does is not what you want.
So the normal data can't come directly from the per-vertex data.
I'm afraid the solutions you're envisioning are about the best you can achieve. And no matter which you pick, you'll need to pass [0..1] coefficients down from the vertex shader to the fragment shader (including with the 2x2 textures, where you need them as texture coordinates).
There are some tricks you can do to somewhat simplify the process, though.
Using the vertex ID can help you find which "corner" value to pass from the vertex shader to the fragment shader (our [0..1] values). A simple bit test on the lowest 2 bits tells you which corner to pass down, without any extra vertex data input (see the sketch after these notes). If packing texture data, you still need to pass an identifier inside the texture, so this may be moot.
If you use 2x2 textures to allow the interpolation, there are (were?) some gotchas. Some texture interpolators don't necessarily give high-precision interpolation if the source is low precision to begin with. This may require you to change the texture data type to something of higher precision to avoid banding artifacts.
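A sketch of the vertex-ID trick (assuming each quad owns four unique, consecutively numbered vertices stored in a fixed corner order, which is an assumption about your mesh layout; the names are illustrative):

```glsl
#version 330 core
// Derive the quad-local [0..1] coordinates from gl_VertexID instead of storing
// them as an attribute. Assumes each quad has four unique vertices laid out
// in the fixed corner order (0,0), (1,0), (1,1), (0,1).
layout(location = 0) in vec3 a_position;

uniform mat4 u_modelViewProjection;

out vec2 v_corner;   // interpolated [0..1] position inside the quad

void main()
{
    int corner = gl_VertexID & 3;   // bit test on the lowest 2 bits
    v_corner = vec2((corner == 1 || corner == 2) ? 1.0 : 0.0,
                    (corner >= 2) ? 1.0 : 0.0);
    gl_Position = u_modelViewProjection * vec4(a_position, 1.0);
}
```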
Well... since you're using the bent-normals technique, the best way to improve the result is to pre-tessellate the mesh and recompute the occlusion on the more highly tessellated mesh.
Another way would be some tricks within the pixel shader... one possibility: you can actually do the interpolation on your own in the pixel shader (and not use the built-in interpolator), which could help you a lot. And you're not limited to just bilinear interpolation, you could do better, e.g. bicubic interpolation ;)
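For reference, a sketch of the second workaround from the question combined with the per-quad [0..1] coordinates above (all names are illustrative): every vertex of a quad carries the same four corner normals, and the fragment shader blends them bilinearly, so the result no longer depends on the triangulation.

```glsl
#version 330 core
// Fragment shader: bilinear blend of the quad's four corner normals, making the
// interpolation independent of how the quad was split into triangles.
in vec2 v_corner;          // [0..1] coordinates inside the quad
flat in vec3 v_normal00;   // corner normals; identical on every vertex of the quad
flat in vec3 v_normal10;
flat in vec3 v_normal11;
flat in vec3 v_normal01;

uniform vec3 u_lightDirection;   // direction the light travels, world space (illustrative)

out vec4 fragColor;

void main()
{
    vec3 bottom = mix(v_normal00, v_normal10, v_corner.x);
    vec3 top    = mix(v_normal01, v_normal11, v_corner.x);
    vec3 normal = mix(bottom, top, v_corner.y);   // bent normal, deliberately not renormalized
    float light = max(dot(normal, -u_lightDirection), 0.0);
    fragColor = vec4(vec3(light), 1.0);
}
```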