How to multiply vertices with model matrix outside the vertex shader - c++

I am using OpenGL ES2 to render a fairly large number of mostly 2d items and so far I have gotten away by sending a premultiplied model/view/projection matrix to the vertex shader as a uniform and then multiplying my vertices with the resulting MVP in there.
All items are batched using texture atlases and I use one MVP per batch. So all my vertices are relative to the translation of that MVP.
Now I want to have rotation and scaling for each of the separate items, which means I need a different model for each of them. So I modified my vertex to include the model (16 floats!) and added a mat4 attribute in my shader and it all works well. But I'm kinda dissapointed with this solution since it dramatically increased the vertex size.
So as I was staring at my screen trying to think of a different solution I thought about transforming my vertices to world space before I send them over to the shader. Or even to screen space if its possible. The vertices I use are unnormalized coordinates in pixels.
So the question is, is such a thing possible? And if yes how do you do it? I can't think why it shouldn't be since its just maths but after a fairly long search on google, it doesn't look like a lot of people are actually doing this...
Strange cause if it is indeed possible, it would be quite a major optimization in cases like this one.

If the number of matrices per batch are limited then you can pass all those matrices as uniforms (preferably in a UBO) and expand the vertex data with an index which specifies which matrix you need to use.
This is similar to GPU skinning used for skeletal animation.

Related

multiple shadowmaps in deferred shading

I have question about usage of multiple shadowmaps in deferred shading. I have implemented a single shadowmap in forward shading.
In forward rendering, in the vertex shader of each object I calculated it's position in lightspace and compared it to the shadowmap in the fragment shader. I can see that working with multiple maps with an array of projection matrices and an array of shadowmaps as uniforms.
In the case of deferred shading I was wondering what is the common practice. The way I see it there are a few options:
In the deferred shading, for each pixel I calculate it's position in each lightspace and compare it to the corresponding lightmap. (that way I do the calculation for each fragment and each matrix which might too be expensive?)
In forward rendering I calculate the position of each vertex in each projection and there is a G-buffer output for each position. I then do the comparison in the deferred shading. (that way I do the computation of the position only once per vertex instead of once per pixel but I have a shadowmap and lightspace position for each shadow which seems suboptimal)
A bit like 2. But I do the verification in forward rendering. That way I can store many booleans if it's in the light or not for each shadow in one int texture. The problem is that I can't do soft shadows that way.
Maybe something better?
To synthesise: 1 needs many matrices multipication but is easy to implement. 2 needs few matrices multiplication but many textures and outputs (which is limited by the graphic card). and 3 needs few output and few calculations per pixel. But I can't' get soft shadows because the result is an array of boolean.
I am not doing it really for better performance but mostly to learn new stuffs. I'm open to suggestions. Maybe I'm misunderstanding something. Is there a standard way to do it?

How to get the indexes of vertices that was finally rendered?

What is the final stage that is still possible to return the indexes that was not clipped or culled or occluded, and that are going to be rendered?
To answer the question asked, there isn't one. All vertex processing rendering stages happen before triangle clipping. As does transform feedback. And fragment shaders don't get vertex indices; they only get the per-vertex values from the vertex processing stage, after interpolation.
In theory, you could do something like this. Your VS outputs an integer index for the vertex, taken from gl_VertexID. You would need a GS that takes the three indices and packages them together into a flat uvec3. Each output vertex would be given the same values. And then, the fragment shader could get the uvec3 and write each of those indices out to a buffer via SSBO and an atomic counter.
Of course, you'll get the same index multiple times (assuming that triangles share indices). But you can do it.
It just doesn't serve much point. Rendering part of a mesh is a lot more trouble than it's worth. For performance, it's better to render either all of it or none of it, based on its visibility. And detecting that is best done via occlusion tests on a different, less complex shape.

Moving/rotating shapes in the vertex shader

I'm writing a program that draws a number of moving/rotating polygons using OpenGL. Each polygon has a location in world coordinates while its vertices are expressed in local coordinates (relative to polygon location). Each polygon also has a rotation.
The only way I can think of doing this is calculate vertex positions by translation/rotation in each frame and push them to the GPU be drawn, but I was wondering if this could be performed in the vertex shader.
I thought I might express vertex locations in local coordinates and then add location and rotation attributes to each vertex, but then it occurred to me that this won't be any better than pushing new vertex positions on each frame.
Should I do this kind of calculation on the CPU, or is there a way to do it efficiently in the vertex shader?
The vertex shader is indeed responsible for transforming your geometry. However, the vertex shader is run for every single vertex of your scene. If you do transformations inside the vertex shader, you'll do the same calculation over and over again which yields the same result every time (as opposed to simply multiplying the model view projection matrix with the vertex coordinate). So in terms of efficiency you're best off doing that on the CPU side.
If the models are small, like in your case, I don't expect there to be too much of a difference, because you still have to set the coordinates where the polygons are supposed to be drawn somehow. In this case doing the calculations once on the CPU side is still the best, given that it does the calculation once independent of the vertex count of your polygons, as well as that it will probably result in clearer code since it's easier to see what you're doing.
These calculations are usually done on CPU only. As doing them on CPU is efficient in general. your best shot is to send these rotation matrices in as uniform and do multiplication on GPU. Sending uniforms is not very expensive operation in general so u should be be worrying about that.

Accessing all vertices in a draw call from hlsl in SM4+

Say if i wanted to build matrices inside the gpu pipeline for vertex transforms, i realized that my current implementation is quite inefficient because it rebuilds the matrices from the source material for every single vertex (while it only needs to build it once per affected vertices really). Is there any way to modify the whole array of vertices that get drawn in a single draw call? Calculating the matrices and storing them in vram doesn't seem to be a very good option since multiple vertices will be getting processed at the same time and i dont think i can sync them efficiently. The only other option i can think of is compute shader, i havent looked into its uses yet but would it be possible to have it calculate the matrices and store them in the gpu so i can access them later on when drawing?
Do you have any source code? I never calculate matrices in the shaders, normally do it on the CPU and pass them over in a constant buffer.
One way of achieving this is to precompute the matrix and send them to the shader as a uniform variable. For example, if your shaders only ever need to multiply the MVP matrix with the vertex positions, then you could pre-compute the MVP matrix outside the shader and send it as a float4x4 uniform to the shader, all the vertex shader does then is to multiply that single matrix with each vertex. It doesn't get much more optimal than that, since vertices are processed in parallel on the GPU and the GPU has instruction sets optimized for vector calculus.

Create view matrices in GLSL shader

I have many positions and directions stored in 1D textures on the GPU. I want to use those as rendersources in a GLSL geometry shader. To do this, I need to create corresponding view matrices from those textures.
My first thought is to take a detour to the CPU, read the textures to memory and create a bunch of view matrices from there, with something like glm::lookat(). Then send the matrices as uniform variables to the shader.
My question is, wether it is possible to skip this detour and instead create the view matrices directly in the GLSL geometry shader? Also, is this feasible performance wise?
Nobody says (or nobody should say) that your view matrix has to come from the CPU through a uniform. You can just generate the view matrix from the vectors in your texture right inside the shader. Maybe the implementation of the good old gluLookAt is of help to you there.
If this approach is a good idea performance-wise, is another question, but if this texture is quite large or changes frequently, this aproach might be better than reading it back to the CPU.
But maybe you can pre-generate the matrices into another texture/buffer using a simple GPGPU-like shader that does nothing more than generate a matrix for each position/vector in the textures and store this in another texture (using FBOs) or buffer (using transform feedback). This way you don't need to make a roundtrip to the CPU and you don't need to generate the matrices anew for each vertex/primitive/whatever. On the other hand this will increase the required memory as a 4x4 matrix is a bit more heavy than a position and a direction.
Sure. Read the texture, and build the matrices from the values...
vec4 x = texture(YourSampler, WhateverCoords1);
vec4 y = texture(YourSampler, WhateverCoords2);
vec4 z = texture(YourSampler, WhateverCoords3);
vec4 w = texture(YourSampler, WhateverCoords4);
mat4 matrix = mat4(x,y,z,w);
Any problem with this ? Or did I miss something ?
The view matrix is a uniform, and uniforms don't change in the middle of a render batch, nor can they be written to from a shader (directly). Insofar I don't see how generating it could be possible, at least not directly.
Also note that the geometry shader runs after vertices have been transformed with the modelview matrix, so it does not make all too much sense (at least during the same pass) to re-generate that matrix or part of it.
You could of course probably still do some hack with transform feedback, writing some values to a buffer, and either copy/bind this as uniform buffer later or just read the values from within a shader and multiply as a matrix. That would at least avoid a roundtrip to the CPU -- the question is whether such an approach makes sense and whether you really want to do such an obscure thing. It is hard to tell what's best without knowing exactly what you want to achieve, but quite probably just transforming things in the vertex shader (read those textures, build a matrix, multiply) will work better and easier.