GLSL per vertex fixed size array - c++

Is it possible in desktop GLSL to pass a fixed size array of floats to the vertex shader as an attribute? If yes, how?
I want to have per vertex weights for character animation so I would like to have something like the following in my vertex shader:
attribute float weights[25];
How would I fill the attribute array from my C++ & OpenGL program? I have seen in another question that I could get the attribute location of the array attribute and then just add the index to that location. Could someone give an example on that for my pretty large array?
Thanks.

Let's start with what you asked for.
On pretty much no hardware that exists currently will attribute float weights[25]; compile. While shaders can have arrays of attributes, each array index represents a new attribute index. And on all hardware the currently exists, the maximum number of attribute indices is... 16. You'd need 25, and that's just for the weights.
Now, you can mitigate this easily enough by remembering that you can use vec4 attributes. Thus, you store every four array elements in a single attribute. Your array would be attribute vec4 weights[7]; which is doable. Your weight-fetching logic will have to change of course.
Even so, you don't seem to be taking in the ramifications of what this would actually mean for your vertex data. Each attribute represents a component of a vertex's data. Each vertex for a rendering call will have the same amount of data; the contents of that data will differ, but not how much data.
In order to do what you're suggesting, every vertex in your mesh would need 25 floats describing the weight. Even if this was stored as normalized unsigned bytes, that's still 25 extra bytes worth of data at a minimum. That's a lot. Especially considering that for the vast majority of vertices, most of these values will be 0. Even in the worst case, you'd be looking at maybe 6-7 bones affecting an single vertex.
The way skinning is generally done in vertex shaders is to limit the number of bones that affects a single vertex to four. This way, you don't use an array of attributes; you just use a vec4 attribute for the weights. Of course, you also now need to say which bone is associated with which weight. So you have a second vec4 attribute that specifies the bone index for that weight.
This strikes a good balance. You only take up 2 extra attributes (which can be unsigned bytes in terms of size). And for the vast majority of vertices, you'll never even notice, because most vertices are only influenced by 1-3 bones. A few uses 4, and fewer still use 5+. In those cases, you just cut off the lowest weights and recompute the weights of the others proportionately.

Nicol Bolas already gave you an answer how to restructure your task. You should do it, because processing 25 floats for a vertex, probably through some quaternion multiplication will waste a lot of good GPU processing power; most of the attributes for a vertex will translate close to an identity transform anyway.
However for academic reasons I'm going to tell you, how to pass 25 floats per vertex. The key is not using attributes for this, but fetching the data from some buffer, a texture. The GLSL vertex shader stage has the builtin variable gl_VertexID, which passes the index of the currently processed vertex. With recent OpenGL you can access textures from the vertex shader as well. So you have a texture of size vertex_count × 25 holding the values. In your vertex shader you can access them using the texelFetch function, i.e. texelFetch(param_buffer, vec2(gl_VertexID, 3));
If used in skeletal animation this system is often referred to as texture skinning. However it should be used sparingly, as it's a real performance hog. But sometimes you can't avoid it, for example when implementing a facial animation system where you have to weight all the vertices to 26 muscles, if you want to accurately simulate a human face.

Related

OpenGL, are unused shadow maps in the shader bad?

Is it a bad idea to have something like this in your shader code?
uniform sampler2DShadow shadowMaps[8];
uniform int numShadowMaps;
With this one could have an universal shader that works for all cases from 0 to 8 shadow maps. Or is it better for performance to have 8 different shaders with hardcoded shadowmap numbers?
The issue isn't really one of performance so much as taking up so many binding points. Many implementations only allow 16 texture binding points (per shader stage), so you'd be using up half of them just for shadows. Even if they go unused, they're still taking up those resources.
It would be better to use a 2D array texture shadow map sampler (sampler2DArrayShadow). You would still have a uniform to tell you how many array layers are populated with useful data.
Plus, this way you are not hard-coding a limit into your shader. So if you decide that the upper limit of shadow maps is only 6 or should be expanded to 10, you don't have to change your shader code.
The downside of course is that all layers of an array texture must be the same size. So if you need some of your shadow maps to be smaller, you can't do that.

Is it possible to construct vertex on GPU from a non-XYZ vertex buffer?

I'm writing a particle simulation where the logic is updated using Intel AVX. I'm using a SoA approach to maximize my "SIMD-friendliness" but I shuffle the particle position components into XYZ-format
when updating the vertex buffer.
Is it possible to exclude the shuffle part and simply pass the vertex data in
XXYYZZ-format and construct each vertex in a shader stage?
My first thought was using three vertex buffers with x, y and z components separated and construct each vertex using the same subscript index to access the x, y and z component of a vertex.
I'm aware that this is not the conventional way but I would like to emphasize that this is just an experiment. Perhaps anyone got some knowledge about this approach (if it is even possible) and/or could point me in the right direction? Perhaps there is a name to it aswell?
Thank you!
There is no restriction on how you feed the GPU with your vertices. You can customize the input layout to read values from any number of vertex buffers, in your example, you will have at least three elements. In the vertex shader, you receive your three elements as three scalars and swizzle them back. The only real limitation is that each value are at the same index in each buffer.
In regards to performance, unless you want to get the top 1% performance of the GPU, you will see no difference compared to a well interleaved vertex. This influence mostly the bandwidth and L2 cache miss, so unless you have crazy millions of particles, it is unlikely to happen. And if you have, you can use a compute shader to interleave the data in a pre-process.

How do I deal with many variables per triangle in OpenGL?

I'm working with OpenGL and am not totally happy with the standard method of passing values PER TRIANGLE (or in my case, quads) that need to make it to the fragment shader, i.e., assign them to each vertex of the primitive and pass them through the vertex shader to presumably be unnecessarily interpolated (unless using the "flat" directive) in the fragment shader (so in other words, non-varying per fragment).
Is there some way to store a value PER triangle (or quad) that needs to be accessed in the fragment shader in such a way that you don't need redundant copies of it per vertex? Is so, is this way better than the likely overhead of 3x (or 4x) the data moving code CPU side?
I am aware of using geometry shaders to spread the values out to new vertices, but I heard geometry shaders are terribly slow on non up to date hardware. Is this the case?
OpenGL fragment language supports the gl_PrimitiveID input variable, which will be the index of the primitive for the currently processed fragment (starting at 0 for each draw call). This can be used as an index into some data store which holds per-primitive data.
Depending on the amount of data that you will need per primitive, and the number of primitives in total, different options are available. For a small number of primitives, you could just set up a uniform array and index into that.
For a reasonably high number of primitives, I would suggest using a texture buffer object (TBO). This is basically an ordinary buffer object, which can be accessed read-only at random locations via the texelFetch GLSL operation. Note that TBOs are not really textures, they only reuse the existing texture object interface. Internally, it is still a data fetch from a buffer object, and it is very efficient with none of the overhead of the texture pipeline.
The only issue with this approach is that you cannot easily mix different data types. You have to define a base data type for your TBO, and every fetch will get you the data in that format. If you just need some floats/vectors per primitive, this is not a problem at all. If you e.g. need some ints and some floats per primitive, you could either use different TBOs, one for each type, or with modern GLSL (>=3.30), you could use an integer type for the TBO and reinterpret the integer bits as floating point with intBitsToFloat(), so you can get around that limitation, too.
You can use one element in the vertex array for rendering multiple vertices. It's called instanced vertex attributes.

How to Make Large Matrix Multiply

I'm trying to make an GLSL shader that multiplies a 90x10 matrix with an 10x1 one. The 90x1 result corresponds to the xyz values of 30 vertices. The first large matrix is only loaded at startup. The other matrix, on the other hand, can change at each render.
How could this be done? I'm guessing the first matrix could be stored as a texture, but I have no idea what to do with the second.
Just pass the second matrix as a uniform array of floats.
uniform float vec10[10];
and perform the multiplication element by element.
Note that if that's too slow, you can try packing your large texture in such a way that you can read 4 elements with a single texelfetch.
If you want to see the syntax for binding uniform arrays, consult http://www.opengl.org/wiki/Uniform_(GLSL) .
Note, that its also completely legal to store this second matrix in texture as well; I'm just not sure of the performance impact of doing so as opposed to sending as a uniform. But get it working first, profile and optimize later.

How does interleaved vertex submission help performance?

I have read and seen other questions that all generally point to the suggestion to interleav vertex positions and colors, etc into one array, as this minimizes the data that gets sent from cpu to gpu.
What I'm not clear on is how OpenGL does this when, even with an interleaved array, you must still make separate GL calls for position and color pointers. If both pointers use the same array, just set to start at different points in that array, does the draw call not copy the array twice since it was the object of two different pointers?
This is mostly about cache. For example, imagine we have 4 vertex and 4 colors. You can provide the information this way (excuse me but I don't remember the exact function names)
glVertexPointer(..., vertex);
glColorPointer(..., colors);
What it internally does, is read vertex[0], then apply colors[0], then again vertex[1] with colors[1]. As you can see, if vertex is, for example, 20 megs long, vertex[0] and colors[0] will be, to say the least, 20 megabytes apart from each other.
Now, on the other hand, if you provide a structure like { vertex0, color0, vertex1, color1, etc.} there will be a lot of cache hits because, well, vertex0 and color0 are together, and so are vertex1 and color1.
Hope this helps answer the question
edit: on second read, I may not have answered the question. You might probably be wondering how does OpenGL know which values to read from that structure, maybe? Like I said before with a structure such as { vertex, color, vertex, color } you tell OpenGL that vertex is at position 0, with an offset of 2 (so next one will be at position 2, then 4, etc) and color starts at position 1, with an offset of 2 also (so position 1, then 3, etc).
addition: In case you want a more practical example, look at this link http://www.lwjgl.org/wiki/index.php?title=Using_Vertex_Buffer_Objects_(VBO). You can see there how it only provides the buffer once and then uses offsets to render efficiently.
I suggest reading: Vertex_Specification_Best_Practices
h4lc0n provided quite nice explanation, but I would like add some additional info:
interleaved data can actually hurt performance when your data often changes. For instance when you change position of point sprites, you update POS, but COLOR and TEXCOORD are usually the same. Then, when data is interleaved you must "touch" additional data. In that case it would be better to have one VBO for POS only (or in general for data that changes often) and the second VBO for data that is constant.
it is not easy to give strict rules about VBO layout, since it is very vendor/driver specific. Also your usage can be different from others. In general it is needed to make some benchmarks for your particular test cases
You could also make an argument for separating different attributes. Assuming a GPU does not process one vertex after another but rather a bunch (ex. 16) of them in parallel, you would would get something like this while executing a vertex shader:
read attribute A for all 16 vertices
perform some computations
read attribute B for all 16 vertices
perform some more computations
....
So you read one attribute for many vertices at once. From this reasoning it would seem that interleaving the attributes actually hurts the performance. Of cours this would only be visible if you are either bandwidth constrained or if the memory latency cannot be hidden for some reason (ex. a complex shader that requires many registers will reduce the number of vertices that can be in flight at a given time).