Performance of using OpenGL UBOs for camera matrices


I've read about how to use UBOs in OpenGL from this tutorial, in which it is suggested that both the projection and camera position matrix be stored in a single UBO, shared between shaders, like this:
layout(std140) uniform global_transforms
{
mat4 camera_orientation;
mat4 camera_projection;
};
But it seems to me to make more sense to store the product camera_projection * camera_orientation in the UBO, so that the matrix multiplication doesn't have to occur once for every vertex the shader processes. You'd be sending no more data to the buffer each step, but trading potentially many matrix multiplications on the GPU for just one on the CPU.
My question: am I right in thinking that this would be even just a wee bit more performant? Perhaps the shader compiler is smart enough to perform operations involving only uniforms once per draw?
(I'd just test it myself with a few thousand polygons, but it's my first time working with a programmable pipeline and I haven't quite gotten UBOs working yet :P)
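The idea in this question can be sketched on the CPU side roughly as follows. This is only a minimal sketch: Mat4, multiply, translation, GlobalTransforms and build_global_transforms are hypothetical names invented for illustration, and the actual buffer upload is left out so that the code runs without a GL context.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical column-major 4x4 matrix (same element order as a GLSL mat4).
struct Mat4 {
    std::array<float, 16> m{};  // zero-initialised

    float& at(int row, int col) {
        assert(row >= 0 && row < 4 && col >= 0 && col < 4);
        return m[static_cast<std::size_t>(col * 4 + row)];
    }
    float at(int row, int col) const {
        assert(row >= 0 && row < 4 && col >= 0 && col < 4);
        return m[static_cast<std::size_t>(col * 4 + row)];
    }
};

Mat4 identity() {
    Mat4 r;
    for (int i = 0; i < 4; ++i) r.at(i, i) = 1.0f;
    return r;
}

Mat4 translation(float x, float y, float z) {
    Mat4 r = identity();
    r.at(0, 3) = x;
    r.at(1, 3) = y;
    r.at(2, 3) = z;
    return r;
}

// Returns a * b.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 c;
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) sum += a.at(row, k) * b.at(k, col);
            c.at(row, col) = sum;
        }
    return c;
}

// Assumed CPU mirror of a UBO block that holds only the combined matrix,
// e.g. layout(std140) uniform global_transforms { mat4 view_projection; };
struct GlobalTransforms {
    Mat4 view_projection;
};

// Called once per frame: one CPU multiplication instead of one per vertex.
GlobalTransforms build_global_transforms(const Mat4& projection, const Mat4& view) {
    return GlobalTransforms{multiply(projection, view)};
}
```

With this layout the block holds a single mat4 (64 bytes under std140) instead of two, and the vertex shader does one matrix-vector multiplication per vertex. Whether that is measurably faster than multiplying two uniforms in the shader is something only profiling on the target driver can confirm.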

Related

Permanently move vertices using vertex shader GLSL

I am learning GLSL and some OpenGL in general, and I am having some trouble with vertex movement and management.
I am good with camera rotations and translation but now I need to move a few vertices and have them stay in their new positions.
What I would like to do is move them using the vertex shader but also not keep track of their new positions through matrices (as I need to move them around independently and it would be very pricey in terms of memory and computing power to store that many matrices).
If there were a way to change their position values in the VBO directly from the vertex shader, that would be optimal.
Is there a way to do that? What other ways do you suggest?
Thanks in advance.
PS I am using GLSL version 1.30

While it's possible to write values from a shader into a buffer and later read them back on the CPU/client side (e.g., by using glReadPixels()), I don't think that applies to your case.
You can move a group of vertices, all with the same movement, with a single matrix. Why don't you do it on the CPU and store the results, updating their GL buffer when needed? (The VAO remains unchanged if you just update the buffer contents.) Once they are moved, you don't need that matrix anymore, right? Or, if you want to be able to undo the movement, then yes, you also need to store the matrix.
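The CPU-side approach described above could look roughly like this. It is a sketch under assumptions: Vec3, Affine, DirtyRange and bake_transform are hypothetical names, and the vertex positions are assumed to be mirrored in a std::vector on the client side.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical affine transform: 3x3 linear part (row-major) plus a translation.
struct Affine {
    float r[3][3];
    Vec3 t;
};

Vec3 apply(const Affine& a, const Vec3& p) {
    return {
        a.r[0][0] * p.x + a.r[0][1] * p.y + a.r[0][2] * p.z + a.t.x,
        a.r[1][0] * p.x + a.r[1][1] * p.y + a.r[1][2] * p.z + a.t.y,
        a.r[2][0] * p.x + a.r[2][1] * p.y + a.r[2][2] * p.z + a.t.z
    };
}

// Inclusive index range of vertices whose stored position changed.
struct DirtyRange { std::size_t first, last; };

// Overwrites the stored positions of the selected vertices with their
// transformed positions, so the matrix can be discarded afterwards.
DirtyRange bake_transform(std::vector<Vec3>& positions,
                          const std::vector<std::size_t>& selected,
                          const Affine& a) {
    assert(!selected.empty());
    DirtyRange range{selected.front(), selected.front()};
    for (std::size_t i : selected) {
        assert(i < positions.size());
        positions[i] = apply(a, positions[i]);
        range.first = std::min(range.first, i);
        range.last = std::max(range.last, i);
    }
    return range;
}
```

The returned range is the part of the buffer that would need re-uploading, for example with glBufferSubData using an offset of range.first * sizeof(Vec3) and a size of (range.last - range.first + 1) * sizeof(Vec3).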

It seems that transform feedback is exactly what you need.

What I would like to do is move them using the vertex shader but also not keep track of their new positions through matrices
If I understand you correctly, you want to send some vertices to the GPU and then have the vertex shader move them permanently. You can't, because a vertex shader is only able to read from the vertex buffer; it isn't able to write back to it.
it would be very pricey in terms of memory and computing power to store that many matrices.
Considering:
I am good with camera rotations and translation
Then it wouldn't be expensive at all. You already have a view and a projection matrix for the camera and viewport, so adding a model matrix that contains the translation, rotation and scaling of each object isn't anywhere near a bottleneck.
In the vertex shader you'd simply have:
uniform mat4 mvp; // model view projection matrix
...
gl_Position = mvp * vec4(position, 1.0);
On the CPU side of things you'd do:
mvp = projection * view * model;
GLint mvpLocation = glGetUniformLocation(shaderGeometryPass, "mvp");
glUniformMatrix4fv(mvpLocation, 1, GL_FALSE, (const GLfloat*)&mvp);
If this gives you performance issues then the problem lies elsewhere.
If you really want to "save" whichever changes you make on the GPU side of things, then you'd have to look into Shader Storage Buffer Objects and/or Transform Feedback.

Opengl 3.0+ : How to efficiently draw hierarchical (i.e. chain transform-matrix) meshes with Shader?

This is a sample 3D scene:-
Mesh A is the parent of Mesh B. (parent like 3D modeling program Ex.Maya or Blender)
Transformation matrix of Mesh A and B = MA and MB.
In the old OpenGL, Mesh A and Mesh B can be drawn by :-
glLoadIdentity();
glMultMatrixf(MA);
MeshA.draw();
glMultMatrixf(MB);
MeshB.draw();
In the new shader-based OpenGL 3.0+, it can be drawn by :-
shader.bind();
passToShader(MA);
MeshA.draw();
passToShader(MA*MB);
MeshB.draw();
Shader is:-
uniform mat4 multiplicationResult;
gl_Position = multiplicationResult * meshPosition;
When MA is changed in a timestep: in the old way, only MA has to be recomputed. But in the new way, using a shader, the whole MA x MB has to be recomputed on the CPU.
The problem becomes severe in a scene where the hierarchy (parenting) of the meshes is deep (e.g. 5 levels) and has many branches (e.g. one Mesh A has many Mesh B children). The CPU has to recompute the whole MA x MB x MC x MD x ME for every related Mesh E, even if only MA changed.
How to optimize it? Or is it the way to go?
My poor solutions :-
add more slots in Shader like this:-
uniform mat4 MA;
uniform mat4 MB;
uniform mat4 MC;
uniform mat4 MD;
uniform mat4 ME;
gl_Position = MA*MB*MC*MD*ME*meshPosition;
But the shader would never know how many MX would be enough. The count is hard-coded, it wastes GPU time for shallow hierarchies, it is harder to maintain, and it doesn't support more complex scenes.
use a compatibility context - not good practice

But in the new way, using a shader, the whole MA x MB has to be recomputed on the CPU.
What did you think that glMultMatrix was doing? It too was computing MA x MB. And that computation almost certainly happened on the CPU.
What you want is a matrix stack that works like OpenGL's matrix stack. So... just write one. There's nothing magical about what OpenGL was doing. You can write a data type that mirrors OpenGL's matrix operations, then pass it around when you render.
Alternatively, you can just use the C++ stack:
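void render(const matrix &parent)
{
    // Full transform of this object: the parent's transform times the local one.
    matrix me = parent * my_transform;
    passToShader(me);
    my_mesh.draw();
    // `children` stands for whatever container holds this object's child objects.
    for(auto &child : children)
        child.render(me);
}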
There, problem solved. Each child of an object receives its parent matrix, which it uses to compute its own full modelview matrix.
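A self-contained version of that recursion might look like the sketch below. Mat4, Node and the drawn list are hypothetical stand-ins: instead of calling passToShader and drawing, the function just records the matrix each object would be drawn with, so the result can be inspected without a GL context.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column-major 4x4 matrix.
struct Mat4 {
    std::array<float, 16> m{};

    float& at(int row, int col) {
        assert(row >= 0 && row < 4 && col >= 0 && col < 4);
        return m[static_cast<std::size_t>(col * 4 + row)];
    }
    float at(int row, int col) const {
        assert(row >= 0 && row < 4 && col >= 0 && col < 4);
        return m[static_cast<std::size_t>(col * 4 + row)];
    }
};

Mat4 identity() {
    Mat4 r;
    for (int i = 0; i < 4; ++i) r.at(i, i) = 1.0f;
    return r;
}

Mat4 translation(float x, float y, float z) {
    Mat4 r = identity();
    r.at(0, 3) = x;
    r.at(1, 3) = y;
    r.at(2, 3) = z;
    return r;
}

// Returns a * b.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 c;
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) sum += a.at(row, k) * b.at(k, col);
            c.at(row, col) = sum;
        }
    return c;
}

// Hypothetical scene-graph node: a local transform plus child nodes.
struct Node {
    Mat4 local;
    std::vector<Node> children;
};

// One matrix multiplication per node; the C++ call stack plays the role
// of the old OpenGL matrix stack. `drawn` replaces passToShader + draw.
void render(const Node& node, const Mat4& parent, std::vector<Mat4>& drawn) {
    const Mat4 world = multiply(parent, node.local);
    drawn.push_back(world);
    for (const Node& child : node.children) render(child, world, drawn);
}
```

Calling render(root, identity(), drawn) visits every node once, and a child's matrix is built from its parent's already-combined matrix rather than from the whole chain.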
I hope to use something faster because they are "relatively-static" objects.
OK, let's do a full performance analysis of this.
The code I posted above performs exactly the same number of matrix multiplications on the CPU as the glMultMatrix calls did. So your code is as fast now as it used to be (give or take).
So, let's consider the case where you minimize the number of matrix multiplications you do on the CPU. Right now, you're doing one matrix multiplication per object. Instead, let's do no matrix multiplications per object.
So let's say your shader has 4 matrix uniforms (whether a 4-element array of matrices or just 4 separate uniforms, it doesn't matter). So you're limited to a maximum stack depth of 4, but never mind that now.
This way, you only change the matrices that change. So if a parent matrix changes, the child matrix doesn't have to be recomputed.
OK... so what?
You still have to give that child matrix to the shader. So you're still paying the price of changing program uniform state. You're still uploading 16 floats to the shader per-object.
Not only that, consider what your vertex shader has to do now. It must perform 4 vector/matrix multiplications. And it must do this for every single vertex of every single object. After all, the shader doesn't know which of those matrices are empty and which ones aren't. So it must assume that they all have data and it must therefore multiply against them all.
So the question is, which is faster:
A single matrix multiplication per object on the CPU
3 additional vector/matrix multiplications for every vertex on the GPU (you need to do at least one either way).

How to multiply vertices with model matrix outside the vertex shader

I am using OpenGL ES2 to render a fairly large number of mostly 2d items and so far I have gotten away by sending a premultiplied model/view/projection matrix to the vertex shader as a uniform and then multiplying my vertices with the resulting MVP in there.
All items are batched using texture atlases and I use one MVP per batch. So all my vertices are relative to the translation of that MVP.
Now I want to have rotation and scaling for each of the separate items, which means I need a different model matrix for each of them. So I modified my vertex to include the model matrix (16 floats!) and added a mat4 attribute in my shader and it all works well. But I'm kind of disappointed with this solution since it dramatically increased the vertex size.
So as I was staring at my screen trying to think of a different solution, I thought about transforming my vertices to world space before I send them over to the shader. Or even to screen space if it's possible. The vertices I use are unnormalized coordinates in pixels.
So the question is, is such a thing possible? And if yes, how do you do it? I can't think why it shouldn't be, since it's just maths, but after a fairly long search on Google, it doesn't look like a lot of people are actually doing this...
Strange, because if it is indeed possible, it would be quite a major optimization in cases like this one.

If the number of matrices per batch is limited, then you can pass all those matrices as uniforms (preferably in a UBO) and extend the vertex data with an index that specifies which matrix to use.
This is similar to GPU skinning used for skeletal animation.
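The indexed-matrix idea can be emulated on the CPU to show the data layout. This is a rough sketch only: Vertex, Vec2, Mat4 and transform_vertex are hypothetical names, and the index is stored as a float on the assumption that the target is OpenGL ES 2.0, which as far as I know has neither UBOs nor integer vertex attributes, so the matrices would live in a plain uniform array there.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column-major 4x4 matrix.
struct Mat4 {
    std::array<float, 16> m{};

    float& at(int row, int col) {
        assert(row >= 0 && row < 4 && col >= 0 && col < 4);
        return m[static_cast<std::size_t>(col * 4 + row)];
    }
    float at(int row, int col) const {
        assert(row >= 0 && row < 4 && col >= 0 && col < 4);
        return m[static_cast<std::size_t>(col * 4 + row)];
    }
};

Mat4 identity() {
    Mat4 r;
    for (int i = 0; i < 4; ++i) r.at(i, i) = 1.0f;
    return r;
}

Mat4 translation(float x, float y, float z) {
    Mat4 r = identity();
    r.at(0, 3) = x;
    r.at(1, 3) = y;
    r.at(2, 3) = z;
    return r;
}

struct Vec2 { float x, y; };

// Vertex with one extra float instead of sixteen: 20 bytes in total.
struct Vertex {
    float x, y;          // position in the item's local space
    float u, v;          // texture atlas coordinates
    float matrix_index;  // which entry of the per-batch matrix array to use
};

// CPU emulation of what the vertex shader would do with the index:
// pick the item's model matrix and transform the 2D position (z = 0, w = 1).
Vec2 transform_vertex(const Vertex& vtx, const std::vector<Mat4>& palette) {
    const std::size_t i = static_cast<std::size_t>(vtx.matrix_index);
    assert(i < palette.size());
    const Mat4& m = palette[i];
    return {
        m.at(0, 0) * vtx.x + m.at(0, 1) * vtx.y + m.at(0, 3),
        m.at(1, 0) * vtx.x + m.at(1, 1) * vtx.y + m.at(1, 3)
    };
}
```

The trade-off is that the batch size becomes bounded by how many matrices fit in the available uniform storage, so large batches may have to be split.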

Accessing all vertices in a draw call from hlsl in SM4+

Say I wanted to build matrices inside the GPU pipeline for vertex transforms. I realized that my current implementation is quite inefficient because it rebuilds the matrices from the source material for every single vertex (while it really only needs to build them once per group of affected vertices). Is there any way to modify the whole array of vertices that gets drawn in a single draw call? Calculating the matrices and storing them in VRAM doesn't seem to be a very good option, since multiple vertices will be processed at the same time and I don't think I can sync them efficiently. The only other option I can think of is a compute shader. I haven't looked into its uses yet, but would it be possible to have it calculate the matrices and store them on the GPU so I can access them later on when drawing?

Do you have any source code? I never calculate matrices in the shaders; I normally do it on the CPU and pass them over in a constant buffer.

One way of achieving this is to precompute the matrix and send it to the shader as a uniform variable. For example, if your shaders only ever need to multiply the MVP matrix with the vertex positions, then you could precompute the MVP matrix outside the shader and send it to the shader as a float4x4 uniform; all the vertex shader does then is multiply that single matrix with each vertex. It doesn't get much more optimal than that, since vertices are processed in parallel on the GPU and the GPU has instructions optimized for vector and matrix arithmetic.

Create view matrices in GLSL shader

I have many positions and directions stored in 1D textures on the GPU. I want to use those as rendersources in a GLSL geometry shader. To do this, I need to create corresponding view matrices from those textures.
My first thought is to take a detour to the CPU, read the textures to memory and create a bunch of view matrices from there, with something like glm::lookAt(). Then send the matrices as uniform variables to the shader.
My question is whether it is possible to skip this detour and instead create the view matrices directly in the GLSL geometry shader. Also, is this feasible performance-wise?

Nobody says (or nobody should say) that your view matrix has to come from the CPU through a uniform. You can just generate the view matrix from the vectors in your texture right inside the shader. Maybe the implementation of the good old gluLookAt is of help to you there.
Whether this approach is a good idea performance-wise is another question, but if this texture is quite large or changes frequently, this approach might be better than reading it back to the CPU.
But maybe you can pre-generate the matrices into another texture/buffer using a simple GPGPU-like shader that does nothing more than generate a matrix for each position/vector in the textures and store this in another texture (using FBOs) or buffer (using transform feedback). This way you don't need to make a roundtrip to the CPU and you don't need to generate the matrices anew for each vertex/primitive/whatever. On the other hand this will increase the required memory as a 4x4 matrix is a bit more heavy than a position and a direction.
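For reference, the look-at construction mentioned above can be written out as follows. This is a C++ sketch of the usual gluLookAt-style math, taking a position and a viewing direction as the question describes; Vec3 and look_along are hypothetical names. Since GLSL provides dot, cross and normalize as built-ins, the same steps should port to a shader almost line for line.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

Vec3 normalize(const Vec3& v) {
    const float len = std::sqrt(dot(v, v));
    assert(len > 0.0f);  // a zero-length vector has no direction
    return {v.x / len, v.y / len, v.z / len};
}

// Builds a column-major view matrix for a camera at `eye` looking along
// `direction`, following the same construction as gluLookAt.
std::array<float, 16> look_along(const Vec3& eye, const Vec3& direction, const Vec3& up) {
    const Vec3 f = normalize(direction);      // forward
    const Vec3 s = normalize(cross(f, up));   // side (right)
    const Vec3 u = cross(s, f);               // corrected up
    return {
        s.x, u.x, -f.x, 0.0f,                             // column 0
        s.y, u.y, -f.y, 0.0f,                             // column 1
        s.z, u.z, -f.z, 0.0f,                             // column 2
        -dot(s, eye), -dot(u, eye), dot(f, eye), 1.0f     // column 3
    };
}
```

Note that the construction breaks down when direction is parallel to up, because the cross product is then zero; the directions stored in the texture would need to avoid that case or pick a different up vector.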

Sure. Read the texture, and build the matrices from the values...
vec4 x = texture(YourSampler, WhateverCoords1);
vec4 y = texture(YourSampler, WhateverCoords2);
vec4 z = texture(YourSampler, WhateverCoords3);
vec4 w = texture(YourSampler, WhateverCoords4);
mat4 matrix = mat4(x,y,z,w);
Any problem with this? Or did I miss something?

The view matrix is a uniform, and uniforms don't change in the middle of a render batch, nor can they be written to from a shader (directly). So I don't see how generating it could be possible, at least not directly.
Also note that the geometry shader runs after vertices have been transformed with the modelview matrix, so it does not make all too much sense (at least during the same pass) to re-generate that matrix or part of it.
You could of course probably still do some hack with transform feedback, writing some values to a buffer, and either copy/bind this as uniform buffer later or just read the values from within a shader and multiply as a matrix. That would at least avoid a roundtrip to the CPU -- the question is whether such an approach makes sense and whether you really want to do such an obscure thing. It is hard to tell what's best without knowing exactly what you want to achieve, but quite probably just transforming things in the vertex shader (read those textures, build a matrix, multiply) will work better and easier.
