OpenGL instanced drawing: how to deal with the vertex input limitation - C++

I'd like to add instanced rendering to my OpenGL engine, but I've just learned that the maximum number of vertex shader inputs supported by my GPU is only 16.
These are the matrices I would need to move from uniforms to inputs:
uniform mat4 MVP;
uniform mat4 modelMatrix;
uniform mat3 normalMatrix;
uniform mat4 DepthBiasMVP;
If I understand correctly I will need an attribute for every column of each matrix, so I'll need 4+4+3+4 = 15 attribute slots. That makes 19 with the attributes I already use (pos, color, texCoord, normal), and it will grow to 20+ if I add tangents and other data.
Is there a way to deal with this, or will I have to forget about instanced drawing? Say I manage to get rid of one of these matrices (modelMatrix) and end up with about 15-16 attributes: will it work on different GPUs? The 16 limit is the minimum for all GPUs, right?

Note that 16 is only the minimum number of vertex attributes an implementation must support; most of the time more are available, and you can query the actual limit via:
glGetIntegerv(GL_MAX_VERTEX_ATTRIBS, &n);
Now, when trying to store all your data in instanced arrays, you should first work out which data actually needs to differ per instance. Do you have a lot of instanced objects that only differ in location and/or orientation? Then you probably only need to turn your modelMatrix uniform into an instanced array (which requires 4 vertex attributes, an acceptable cost). Do you really need a different view and projection matrix for each instance? Probably not. The same holds for DepthBiasMVP.
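For illustration, here is a minimal sketch of that per-instance modelMatrix setup on the CPU side (the buffer name instanceVBO and attribute locations 4-7 are assumptions, matching a shader declaration of layout (location = 4) in mat4 instanceModel;):
glBindBuffer(GL_ARRAY_BUFFER, instanceVBO); // one glm::mat4 per instance
for (int i = 0; i < 4; ++i)
{
    glEnableVertexAttribArray(4 + i);
    // each column of the mat4 occupies one vec4 attribute slot
    glVertexAttribPointer(4 + i, 4, GL_FLOAT, GL_FALSE,
                          sizeof(glm::mat4), (void*)(sizeof(glm::vec4) * i));
    // advance this attribute once per instance instead of once per vertex
    glVertexAttribDivisor(4 + i, 1);
}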
The normalMatrix is required if you perform non-uniform scaling, and if you plan to do that per instance you also need a normalMatrix per instance. You could calculate those on the CPU beforehand and send them as vertex attributes, costing you another 3 vertex attributes (one per mat3 column). Another option is to calculate the normalMatrix in the vertex shader, though this might slow your vertex shader down a little (perhaps an acceptable tradeoff?).
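The in-shader variant is essentially a one-liner, at the price of a matrix inverse per vertex (instanceModel here stands for whatever per-instance model matrix you feed in):
// derive the normal matrix per vertex; inverse() is not cheap, so profile first
mat3 normalMatrix = transpose(inverse(mat3(instanceModel)));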
These steps should reduce the information you need per instance to just the modelMatrix and perhaps the normalMatrix, already cutting the attribute count in half. Maybe you only have a different position per instance? In that case even a single vec4 will do.
Basically, think carefully about what data you actually need to update per instance, and you'll most likely be surprised how little it is.

One can store the per-instance data in uniform arrays, uniform buffer objects (UBOs) or texture buffer objects (TBOs) and use the gl_InstanceID variable in GLSL to index into them. Uniform arrays might seem the easiest, but they are the most limited in size and hence only applicable for a low number of instances. UBOs can be a bit bigger, but are also quite limited. TBOs, on the other hand, allow many megabytes of data, but you have to pack your data appropriately. In your case it seems you only need float types, so a base format with 32-bit floats should suffice.
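As a sketch of the TBO route (the sampler name modelMatrices is an assumption; the CPU side would attach a buffer of per-instance matrices to it with glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA32F, buf), four vec4 columns per matrix):
uniform samplerBuffer modelMatrices; // 4 consecutive RGBA32F texels per instance

mat4 fetchModelMatrix(int instance)
{
    int base = 4 * instance;
    return mat4(texelFetch(modelMatrices, base + 0),
                texelFetch(modelMatrices, base + 1),
                texelFetch(modelMatrices, base + 2),
                texelFetch(modelMatrices, base + 3));
}

void main()
{
    mat4 modelMatrix = fetchModelMatrix(gl_InstanceID);
    //etc
}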

Related

Sending 2D and 3D shadowmaps to shaders

I'm trying to implement shadow mapping for my simple engine and I've figured out I should combine omnidirectional shadow mapping (cubemaps for point lights) with 2D shadow mapping (for directional and spot lights).
My uniform block looks like this:
#define MAX_LIGHTS 128
//...
struct Light
{
    //light data...
};
//...
layout (std140) uniform Lights
{
    int lightCount; //how many lights were passed into the shader (from 0 to MAX_LIGHTS)
    Light lights[MAX_LIGHTS];
};
I have two questions for you.
Are sampler objects costly? Is the following code optimal for multiple lights?
sampler2D shadowMaps2D[MAX_LIGHTS];
samplerCube shadowCubemaps[MAX_LIGHTS];
//...
if (lights[index].type == POINT_LIGHT)
    CalculateShadow(shadowCubemaps[lights[index].shadowMapNr]);
else
    CalculateShadow(shadowMaps2D[lights[index].shadowMapNr]);
Only lightCount of these samplers would actually have a texture bound, leaving a lot of undefined samplers, and I think that can cause problems.
If I understand correctly, I mustn't declare samplers in uniform blocks. So am I really forced to cycle through all of my shaders and update the samplers each time the shadow maps get updated? It's a waste of time!
Are sampler objects costly?
That question is a bit misleading, since the sampler data types in GLSL are only opaque handles which reference texture units. What is costly is the actual sampling operation. Also, the number of texture units available to a particular shader stage is limited; the spec only guarantees 16. Since you can't reuse the same unit for different sampler types, this would limit your MAX_LIGHTS to just 8.
However, one seldom needs arrays of samplers. Instead, you can use array textures, which allow you to store all of your shadow maps (per texture type) in a single texture object, so you need only one sampler per map type.
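On the shader side that could look roughly like this (uv and the reuse of the question's shadowMapNr field as the layer index are assumptions; the actual shadow comparison is omitted):
uniform sampler2DArray shadowMaps2D;     // all 2D shadow maps in one object
uniform samplerCubeArray shadowCubemaps; // needs GL 4.0 / ARB_texture_cube_map_array
//...
// select the map by layer instead of by sampler
vec4 s = texture(shadowMaps2D, vec3(uv, float(lights[index].shadowMapNr)));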
Having said all that, I still think that your light count is completely unrealistic. Applying 128 shadow maps in real time won't work even on the fastest GPUs out there...
If I understand correctly, I mustn't declare samplers in uniform blocks.
Correct.
So am I really forced to cycle through all of my shaders and update the samplers each time the shadow maps get updated? It's a waste of time!
No. The sampler uniforms only need to be updated when the index of the texture unit you want to sample from changes (which is ideally never), not when a different texture is bound, and not when some texture's contents change.
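Typically you assign the units once after linking and afterwards only rebind textures to those units (program and the texture names here are placeholders):
// one-time setup: pin each sampler uniform to a fixed texture unit
glUseProgram(program);
glUniform1i(glGetUniformLocation(program, "shadowMaps2D"), 0);
glUniform1i(glGetUniformLocation(program, "shadowCubemaps"), 1);

// per frame: only the textures bound to those units change
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D_ARRAY, shadowMap2DArrayTex);
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_CUBE_MAP_ARRAY, shadowCubeArrayTex);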

In OpenGL can I access the buffer I'm drawing in a vertex shader?

I know that each time a vertex shader runs it basically accesses part of the buffer (VBO) being drawn; when drawing vertex number 7, for example, it indexes 7 vertices into that VBO, based on the vertex attributes and so on.
layout (location = 0) in vec3 position;
layout (location = 1) in vec3 normal;
layout (location = 2) in vec3 texCoords; // This may be running on the 7th vertex for example.
What I want to do is access an earlier part of the VBO: when the shader is processing the 7th vertex, for example, I would like to also have access to vertex number 1, so that I can interpolate with it.
Seeing that the shader is already indexing into the VBO when it runs, I would think this is possible, but I don't know how to do it.
Thank you.
As you can see in the documentation, vertex attributes are expected to change on every shader invocation. So no, you can't access the attributes of other vertices in a vertex shader.
You can probably do this:
Define a uniform array and pass in the values you need. But keep in mind that you are using more memory this way, and you need to pass more data.
As #Reaper said, you can use a uniform buffer, which can be accessed freely. But the GPU doesn't like random access; it's usually more efficient to stream the data.
You can also solve this by simply adding the data for the later/earlier vertices to the array, because on the C++ side all vertices are at your disposal.
For example if this is the "normal" array:
{
vertex1_x, vertex1_y, vertex1_z, normal1_x, normal1_y, normal1_z, texCoord1_x, texCoord1_y,
...
}
Then you could extend it with data for the other vertex to interpolate with:
{
    vertex1_x, vertex1_y, vertex1_z, normal1_x, normal1_y, normal1_z, texCoord1_x, texCoord1_y, // the vertex's own data
    vertex2_x, vertex2_y, vertex2_z, normal2_x, normal2_y, normal2_z, texCoord2_x, texCoord2_y, // the partner vertex's data in the same record
    ...
}
Actually you can pass any data per vertex. Just make sure that the stride and offset parameters in the glVertexAttribPointer calls are adjusted accordingly.
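For the extended layout above, the setup might look like this (the attribute locations and the 16-float stride are assumptions matching the example):
// each record is now 16 floats: own pos/normal/uv plus the partner's
GLsizei stride = 16 * sizeof(float);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, stride, (void*)0);                    // position
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, stride, (void*)(3 * sizeof(float)));  // normal
glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, stride, (void*)(6 * sizeof(float)));  // texCoords
glVertexAttribPointer(3, 3, GL_FLOAT, GL_FALSE, stride, (void*)(8 * sizeof(float)));  // partner position
glVertexAttribPointer(4, 3, GL_FLOAT, GL_FALSE, stride, (void*)(11 * sizeof(float))); // partner normal
glVertexAttribPointer(5, 2, GL_FLOAT, GL_FALSE, stride, (void*)(14 * sizeof(float))); // partner texCoords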

Do I need to take care to pack vertex attributes together?

If I want to pass two nominally independent attribute arrays of floats to a draw call, can I happily have a GLSL in float variable for each of them, or do I need to ensure to pack them into an in vec2 or similar and use the various components to ensure not consuming unnecessary GL_MAX_VERTEX_ATTRIBS "slots"?
Or, in other words; GL_MAX_VERTEX_ATTRIBS specifies, according to the docs, "the maximum number of 4-component generic vertex attributes accessible to a vertex shader". Does an attribute that is less than 4 components always count as one attribute towards this limit?
Does an attribute that is less than 4 components always count as one attribute towards this limit?
Yes, that is exactly what it means.
Each vertex attribute is 4-component; declaring it as float in your shader does not change that. If you want to see this in action, try setting up a 1-component vertex attribute pointer and then declaring that attribute as vec4 in your vertex shader -- GL will automatically fill in 0.0, 0.0 and 1.0 for y, z and w respectively.
If you are hitting the vertex attribute limit (minimum 16) because you're using a bunch of scalars, you should consider packing them into vec4s instead for optimal utilization.
There is a minor exception to the rule described above for data types with more than 4 components (e.g. mat4). A vertex attribute declared as mat4 has 16 components and consumes 4 sequential attribute locations.
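To make the location accounting concrete (the locations chosen here are just examples):
layout (location = 0) in vec4 posAndU; // 1 location, all 4 components used
layout (location = 1) in float v;      // still consumes 1 full location
layout (location = 2) in mat4 model;   // consumes locations 2, 3, 4 and 5
// the next free location is 6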

How do I efficiently handle a large number of per vertex attributes in OpenGL?

The number of per vertex attributes that I need to calculate my vertex shader output is bigger than GL_MAX_VERTEX_ATTRIBS. Is there an efficient way to e.g. point to a number of buffers using a uniform array of indices and to access the per vertex data this way?
This is a hardware limitation, so the short answer is no.
If you consider workarounds like uniforms, those have their own limits too, so that is no way to go either.
One possible way I can think of, which is rather hackish, is to fetch the extra data from a texture: you can access textures from the vertex shader (texture filtering is not supported there, but you won't need it, so it doesn't matter for you).
With newer OpenGL versions it's possible to store a rather large amount of data in textures and access it without limitation even in the vertex shader, so this seems to be one way to go.
Although with this approach there is a problem you need to solve: how do you know the current index, i.e. which vertex is being processed?
You can use the gl_VertexID built-in for that.
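A sketch of that texture-based fetch, here with a buffer texture (the sampler name extraData and the one-RGBA32F-texel-per-vertex packing are assumptions):
uniform samplerBuffer extraData; // one RGBA32F texel of extra data per vertex

void main()
{
    vec4 extra = texelFetch(extraData, gl_VertexID);
    //etc
}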
You could bypass the input assembler and bind the extra attributes as an SSBO or texture instead. Then you can use gl_VertexID in the vertex shader to get the index buffer entry currently being rendered (i.e. the index of the vertex data you need to read from).
So, for example, the following two vertex shader snippets are essentially identical (they may, however, have different performance characteristics depending on your hardware):
in vec3 myAttr;

void main() {
    vec3 vertexValue = myAttr;
    //etc
}

vs.

buffer myAttrBuffer {
    vec3 myAttr[]; // note: under std430 layout a vec3 array gets a vec4 (16-byte) stride
};

void main() {
    vec3 vertexValue = myAttr[gl_VertexID];
    //etc
}
The CPU-side binding code is different, but generally that's the concept. myAttr counts towards GL_MAX_VERTEX_ATTRIBS, but myAttrBuffer does not since it is loaded explicitly by the shader.
You could even use the same buffer object in both cases by binding it with a different target.
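The CPU-side difference, roughly (the name buf and binding point 0 are placeholders; the SSBO variant assumes the block is declared with layout(std430, binding = 0)):
// as a regular vertex attribute, fed through the input assembler
glBindBuffer(GL_ARRAY_BUFFER, buf);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (void*)0);

// as an SSBO, fetched explicitly in the shader via gl_VertexID
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, buf);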
If you absolutely cannot limit yourself to GL_MAX_VERTEX_ATTRIBS attributes, I would advise using multi-pass shaders: redesign your code so that a first pass works with half of the attributes and a second pass with the rest.

Create view matrices in GLSL shader

I have many positions and directions stored in 1D textures on the GPU. I want to use those as rendersources in a GLSL geometry shader. To do this, I need to create corresponding view matrices from those textures.
My first thought is to take a detour through the CPU: read the textures into memory and create a bunch of view matrices from there, with something like glm::lookAt(), then send the matrices as uniform variables to the shader.
My question is whether it is possible to skip this detour and instead create the view matrices directly in the GLSL geometry shader. Also, is this feasible performance-wise?
Nobody says (or nobody should say) that your view matrix has to come from the CPU through a uniform. You can generate the view matrix from the vectors in your texture right inside the shader. Maybe the implementation of the good old gluLookAt is of help to you there.
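As a sketch, the gluLookAt construction translated to GLSL could look like this (the up vector is a free choice, and dir is assumed to be the viewing direction read from your texture):
// build a view matrix from an eye position and viewing direction,
// following the gluLookAt recipe (column-major, like GLSL's mat4)
mat4 lookAt(vec3 eye, vec3 dir, vec3 up)
{
    vec3 f = normalize(dir);          // forward
    vec3 s = normalize(cross(f, up)); // side
    vec3 u = cross(s, f);             // corrected up
    return mat4(vec4(s.x, u.x, -f.x, 0.0),
                vec4(s.y, u.y, -f.y, 0.0),
                vec4(s.z, u.z, -f.z, 0.0),
                vec4(-dot(s, eye), -dot(u, eye), dot(f, eye), 1.0));
}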
Whether this approach is a good idea performance-wise is another question, but if the texture is quite large or changes frequently, it might be better than reading the data back to the CPU.
But maybe you can pre-generate the matrices into another texture/buffer, using a simple GPGPU-like shader that does nothing more than generate a matrix for each position/direction in the textures and store it in another texture (using FBOs) or buffer (using transform feedback). This way you don't need a roundtrip to the CPU and you don't need to generate the matrices anew for each vertex/primitive. On the other hand, this increases the required memory, as a 4x4 matrix is a bit heavier than a position and a direction.
Sure. Read the texture, and build the matrices from the values...
vec4 x = texture(YourSampler, WhateverCoords1);
vec4 y = texture(YourSampler, WhateverCoords2);
vec4 z = texture(YourSampler, WhateverCoords3);
vec4 w = texture(YourSampler, WhateverCoords4);
mat4 matrix = mat4(x,y,z,w);
Any problem with this? Or did I miss something?
The view matrix is a uniform, and uniforms don't change in the middle of a render batch, nor can they be written to from a shader (directly). So I don't see how generating it in the shader could be possible, at least not directly.
Also note that the geometry shader runs after vertices have been transformed with the modelview matrix, so it does not make much sense (at least within the same pass) to re-generate that matrix or part of it.
You could of course still do some hack with transform feedback: write some values to a buffer and either copy/bind it as a uniform buffer later, or just read the values from within a shader and multiply as a matrix. That would at least avoid a roundtrip to the CPU, but the question is whether such an approach makes sense and whether you really want to do such an obscure thing. It is hard to tell what's best without knowing exactly what you want to achieve, but quite probably just transforming things in the vertex shader (read those textures, build a matrix, multiply) will work better and be easier.