Does OpenGL 3+ admit a specific state to render 2D-only graphics?

I'm currently working on 2D graphics, and as far as I can tell every vertex is ultimately processed as a 4D point in homogeneous space. So I say to myself: what a waste of resources! I gather that the hardware is essentially designed to handle 3D scenes, and as such may be hardcoded to do 4D linear algebra. Yet, is there a way to write shaders (or enable a set of options) so that only genuine 2D coordinates are used in memory? I know one could embed two 2x2 matrices in a 4x4 matrix, but the gl_Position variable being a vec4 seems to be a dead end there. I'm not looking for a "workaround" hack like that, but rather for a canonical way to make OpenGL do it, such as a specific mode/state.
I've not been able to find sample code or even a simple mention of such a feature on the net, so I gather it is simply impossible or not desirable, say for performance reasons. Is that so?

Modern GPUs are actually scalar architectures. In GLSL you can also use shorter vectors: vec2 is a perfectly valid type, and you can create vertex arrays with just 2 scalar elements per attribute, as defined by the size parameter of glVertexAttribPointer.
As Andon M. Coleman commented, OpenGL will internally expand any vertex attribute with fewer than 4 components to a full vec4, filling the missing components with 0 and the W component with 1 (so a vec2 v effectively becomes vec4(v, 0.0, 1.0)).
In the vertex shader you must assign a vec4 to gl_Position. But you can trivially expand a vec2 to a vec4:
vec2 v2;
gl_Position = vec4(v2, 0, 1);
Yes, the gl_Position output must always be a vec4, because OpenGL specifies its operations in homogeneous clip space. But this is not really a bottleneck at all.
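For illustration, a minimal vertex shader along these lines could look like the sketch below (the transform2D uniform and the attribute location are made up for the example; the matching buffer would be set up on the C side with glVertexAttribPointer using a size of 2):
#version 330 core

// 2-component position; the buffer is bound with glVertexAttribPointer(0, 2, GL_FLOAT, ...)
layout(location = 0) in vec2 in_Position;

// Hypothetical 2D affine transform packed into a mat3 (rotation/scale plus translation).
uniform mat3 transform2D;

void main()
{
    vec2 p = (transform2D * vec3(in_Position, 1.0)).xy;
    // Pad to the mandatory vec4 clip-space position.
    gl_Position = vec4(p, 0.0, 1.0);
}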

All credit goes to Andon M. Coleman, who perfectly answered the question in a comment. I just quote it here for the sake of completeness:
«Absolutely not. Hardware itself is/was designed around 4-component data and instructions for many years. Modern GPUs are scalar friendly, and they have to be considering the push for GPGPU (but older NV GPUs pre-GeForce 8xxx have a purely vector ALU). Now, as for vertex attributes, you get 16 slots of size (float * 4) for storage. This means whether you use a vec2 or vec4 vertex attribute, it actually behaves like a vec4. This can be seen if you ever write vec4 and only give enough data for 2 of the components - GL automatically assigns Z = 0.0 and W = 1.0.
Furthermore, you could not implement clipping in 2D space with 2D coordinates. You need homogeneous coordinates to produce NDC coordinates. You would need to use window space coordinates, which you cannot do from a vertex shader. After the vertex shader finishes, GL will perform clipping, perspective divide and viewport mapping to arrive at window space. But window space coordinates are still 4D (the Z component may not contribute to a location in window space, but it does affect fragment tests). »

Related

OpenGL, are unused shadow maps in the shader bad?

Is it a bad idea to have something like this in your shader code?
uniform sampler2DShadow shadowMaps[8];
uniform int numShadowMaps;
With this, one could have a universal shader that works for all cases from 0 to 8 shadow maps. Or is it better for performance to have 8 different shaders with hardcoded shadow map counts?
The issue isn't really one of performance so much as taking up so many binding points. Many implementations only allow 16 texture binding points (per shader stage), so you'd be using up half of them just for shadows. Even if they go unused, they're still taking up those resources.
It would be better to use a 2D array texture shadow map sampler (sampler2DArrayShadow). You would still have a uniform to tell you how many array layers are populated with useful data.
Plus, this way you are not hard-coding a limit into your shader. So if you decide that the upper limit of shadow maps is only 6 or should be expanded to 10, you don't have to change your shader code.
The downside of course is that all layers of an array texture must be the same size. So if you need some of your shadow maps to be smaller, you can't do that.
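As a rough sketch of that approach (the uniform and varying names here are invented for the example, and the shadow coordinates are assumed to be already projected and divided by w), the fragment shader could look like this:
#version 330 core

uniform sampler2DArrayShadow shadowMaps;  // one array texture instead of eight samplers
uniform int numShadowLayers;              // how many layers actually contain shadow data

in vec4 shadowCoord[8];                   // assumed: per-light shadow coordinates from the vertex shader
out vec4 fragColor;

void main()
{
    float lit = 1.0;
    for (int i = 0; i < numShadowLayers; ++i)
    {
        // For an array shadow sampler: xy = texture coords, z = layer index, w = depth to compare.
        lit *= texture(shadowMaps, vec4(shadowCoord[i].xy, float(i), shadowCoord[i].z));
    }
    fragColor = vec4(vec3(lit), 1.0);
}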

How to Make Large Matrix Multiply

I'm trying to make a GLSL shader that multiplies a 90x10 matrix with a 10x1 one. The 90x1 result corresponds to the xyz values of 30 vertices. The first, large matrix is only loaded at startup. The other matrix, on the other hand, can change at each render.
How could this be done? I'm guessing the first matrix could be stored as a texture, but I have no idea what to do with the second.
Just pass the second matrix as a uniform array of floats.
uniform float vec10[10];
and perform the multiplication element by element.
Note that if that's too slow, you can try packing your large texture in such a way that you can read 4 elements with a single texelFetch.
If you want to see the syntax for binding uniform arrays, consult http://www.opengl.org/wiki/Uniform_(GLSL) .
Note that it's also completely legal to store this second matrix in a texture as well; I'm just not sure of the performance impact of doing so compared to sending it as a uniform. But get it working first; profile and optimize later.
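To make this concrete, here is a hedged sketch of the vertex shader, assuming the 90x10 matrix is stored in a GL_R32F texture named bigMatrix that is 10 texels wide and 90 texels high (texel (k, r) holding element (r, k)), and that rows 3i, 3i+1 and 3i+2 hold the x, y and z of vertex i:
#version 330 core

uniform sampler2D bigMatrix;  // assumed: 10 x 90 GL_R32F texture, loaded once at startup
uniform float coeff[10];      // the 10x1 matrix, updated every render

void main()
{
    vec3 pos = vec3(0.0);
    for (int j = 0; j < 3; ++j)
    {
        int row = 3 * gl_VertexID + j;   // rows 3i..3i+2 are the x, y, z of vertex i
        float sum = 0.0;
        for (int k = 0; k < 10; ++k)
            sum += texelFetch(bigMatrix, ivec2(k, row), 0).r * coeff[k];
        pos[j] = sum;
    }
    gl_Position = vec4(pos, 1.0);
}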

Fixed-point arithmetic with vertex shaders

If I use fixed-point values (or integers with 1 representing the smallest game unit) to describe my vertex vectors, how can I set up OpenGL/Eigen transformations to work with them? If I'm doing this in my vertex shader:
gl_Position = projectionMatrix * viewMatrix * modelMatrix * vec4(in_Position, 1.0);
If I pass in_Position in as a vec3 of GL_INT, while I pass in the matrices as GL_FLOAT mat4, will the proper casting be done? Is there a performance cost?
Is it possible to prepare my transformation matrices to be in fixed-point as well?
This is being done with a 2D game, which I think makes it more feasible than with 3D. I would really prefer the accuracy, since it seems there is degradation of position on large maps when things get far away from the origin. I realize I could probably get away with only object position being an integer while the vertices are still described as floats. However, I think my collision scheme will work better with fixed-point vertices. What is generally the performance difference?
This will imply an int-to-float conversion that will penalize your performance. You should cast in_Position to floats at CPU-to-GPU copy time. If you use an Eigen Matrix object to store them on the CPU, you can cast them with:
MatrixXf data_as_float = data_as_int.cast<float>();
Then call glBufferData with the data from data_as_float (e.g. data_as_float.data()).
Ok, after some experimentation, I've settled on a solution.
gl_Position = projviewMatrix * vec4(ivec3(in_Position - camera), 1.0);
camera is a uniform uvec3, and in_Position is the uvec3 position input. Translation is performed as a separate operation, while the view scaling, rotation, and projection are done with a mat4 of floats (projviewMatrix) as usual.
Care must be taken to ensure the proper types and input commands (glVertexAttribIPointer) are used. OpenGL seems very eager to cast to float yet leave the data in an integer type, so any small error will result in mangled input.
It simply is not feasible to perform the projviewMatrix multiply in fixed point, since you do not have access to intermediate 64-bit storage for the multiplications. Only if the bits used by in_Position and projviewMatrix summed to 32 would it approach usability; but considering that coordinates for rendering will be so close to the origin, and that no operations are saved (you still need to shift after the multiply, and the GPU takes as long for floats as for ints), there is no reason to perform fixed-point arithmetic after the position has been centered on the camera.
Of course, this is ignoring the royal pain it is to actually manipulate the integer position data. I really wouldn't recommend it.
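For reference, the vertex shader for that solution could be sketched roughly as follows (names as above; note that the C-side setup must use glVertexAttribIPointer for in_Position so the integers are not silently converted to floats):
#version 330 core

layout(location = 0) in uvec3 in_Position;  // fixed-point world position (set up with glVertexAttribIPointer)

uniform uvec3 camera;          // fixed-point camera position
uniform mat4 projviewMatrix;   // float scale/rotation/projection, no translation

void main()
{
    // The subtraction happens in integers at full precision; only the small,
    // camera-relative offset is converted to float for the matrix multiply.
    gl_Position = projviewMatrix * vec4(ivec3(in_Position - camera), 1.0);
}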

Is it possible to loop through a second VBO in the vertex shader?

So, let's say that I have two vertex buffers: one that describes the actual shape I want to draw, and another one that is able to influence the first one.
So, what I actually want to be able to do is something like this:
uniform VBO second_one;   // pseudocode: not valid GLSL
void main()
{
    for (int i = 0; i < size_of_array(second_one); ++i)
    {
        // do things with second_one[i] to alter the values
    }
    // create the output information
}
One thing I might want to do is gravity: each point in second_one drags the current point a bit closer to itself, and so on; then, after the point is adjusted, the matrices are applied to get its actual location.
I would be really surprised if this, or something close to it, is possible. But the whole point is to be able to use a second VBO, or to expose it as a uniform array of vec3, say, so I can access it.
For what you're wanting, you have three options.
1. An array of uniforms. GLSL lets you declare uniform vec3 stuff[50];, and arrays in GLSL have a .length() method, so you can find out how big they are. Of course, there are limits to the number of uniforms you can use, but you shouldn't need more than 20-30 of these. Anything more than that and you'll really feel the performance drain.
2. Uniform buffer objects. These can store a bit more data than non-block uniforms, but they still have limits. The storage comes from a buffer object, but accesses to them are, depending on the hardware, slightly slower than accesses to direct uniforms.
3. Buffer textures. This is a way to attach a buffer object to a texture. With this, you can access vast amounts of memory from within a shader (see the sketch below). But be warned: they're not fast to access. If you can make do with one of the above methods, do so.
Note that #2 and #3 will only be found on hardware capable of supporting GL 3.x and above. So DX10-class hardware.
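As a sketch of option 3 (all names here are made up for the example: a samplerBuffer called influencers backed by the second buffer object, plus a count uniform), the vertex shader could look like this:
#version 330 core

layout(location = 0) in vec3 in_Position;

uniform samplerBuffer influencers;  // buffer texture holding the influencing points (RGB32F texels)
uniform int numInfluencers;         // how many entries are valid
uniform mat4 mvp;

void main()
{
    vec3 p = in_Position;
    for (int i = 0; i < numInfluencers; ++i)
    {
        vec3 attractor = texelFetch(influencers, i).xyz;
        // Toy "gravity": nudge the vertex a little towards each influencing point.
        p += 0.01 * (attractor - p);
    }
    gl_Position = mvp * vec4(p, 1.0);
}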

GLSL per vertex fixed size array

Is it possible in desktop GLSL to pass a fixed size array of floats to the vertex shader as an attribute? If yes, how?
I want to have per vertex weights for character animation so I would like to have something like the following in my vertex shader:
attribute float weights[25];
How would I fill the attribute array from my C++ & OpenGL program? I have seen in another question that I could get the attribute location of the array attribute and then just add the index to that location. Could someone give an example of that for my pretty large array?
Thanks.
Let's start with what you asked for.
On pretty much no hardware that currently exists will attribute float weights[25]; compile. While shaders can have arrays of attributes, each array index represents a new attribute index. And on all hardware that currently exists, the maximum number of attribute indices is... 16. You'd need 25, and that's just for the weights.
Now, you can mitigate this easily enough by remembering that you can use vec4 attributes. Thus, you store every four array elements in a single attribute. Your array would be attribute vec4 weights[7]; which is doable. Your weight-fetching logic will have to change of course.
Even so, you don't seem to be taking into account the ramifications of what this would actually mean for your vertex data. Each attribute represents a component of a vertex's data. Every vertex in a rendering call has the same amount of data; the contents of that data will differ, but not its size.
In order to do what you're suggesting, every vertex in your mesh would need 25 floats describing the weights. Even if these were stored as normalized unsigned bytes, that's still 25 extra bytes worth of data per vertex at a minimum. That's a lot. Especially considering that for the vast majority of vertices, most of these values will be 0. Even in the worst case, you'd be looking at maybe 6-7 bones affecting a single vertex.
The way skinning is generally done in vertex shaders is to limit the number of bones that affects a single vertex to four. This way, you don't use an array of attributes; you just use a vec4 attribute for the weights. Of course, you also now need to say which bone is associated with which weight. So you have a second vec4 attribute that specifies the bone index for that weight.
This strikes a good balance. You only take up 2 extra attributes (which can be unsigned bytes in terms of size). And for the vast majority of vertices, you'll never even notice, because most vertices are only influenced by 1-3 bones. A few use 4, and fewer still use 5+. In those cases, you just cut off the lowest weights and recompute the remaining weights proportionately.
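A hedged sketch of that layout (the bone matrix array, its size of 64 and the attribute locations are assumptions for the example; the ivec4 attribute must be fed with glVertexAttribIPointer):
#version 330 core

layout(location = 0) in vec3  in_Position;
layout(location = 1) in vec4  in_Weights;      // four weights, normalized to sum to 1
layout(location = 2) in ivec4 in_BoneIndices;  // which bone each weight belongs to

uniform mat4 bones[64];  // assumed upper bound on the number of bones
uniform mat4 mvp;

void main()
{
    vec4 p = vec4(in_Position, 1.0);
    vec4 skinned = in_Weights.x * (bones[in_BoneIndices.x] * p)
                 + in_Weights.y * (bones[in_BoneIndices.y] * p)
                 + in_Weights.z * (bones[in_BoneIndices.z] * p)
                 + in_Weights.w * (bones[in_BoneIndices.w] * p);
    gl_Position = mvp * skinned;
}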
Nicol Bolas already gave you an answer on how to restructure your task. You should do it, because processing 25 floats per vertex, probably through some quaternion multiplication, will waste a lot of good GPU processing power; most of the attributes for a vertex will amount to close to an identity transform anyway.
However, for academic reasons, I'm going to tell you how to pass 25 floats per vertex. The key is not using attributes for this, but fetching the data from a buffer or a texture. The GLSL vertex shader stage has the built-in variable gl_VertexID, which provides the index of the currently processed vertex. With recent OpenGL you can access textures from the vertex shader as well. So you could have a texture of size vertex_count × 25 holding the values. In your vertex shader you can access them using the texelFetch function, e.g. texelFetch(param_buffer, ivec2(gl_VertexID, 3), 0);
If used in skeletal animation this system is often referred to as texture skinning. However it should be used sparingly, as it's a real performance hog. But sometimes you can't avoid it, for example when implementing a facial animation system where you have to weight all the vertices to 26 muscles, if you want to accurately simulate a human face.
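Putting that together, a rough sketch of the fetch (assuming a GL_R32F texture that is vertex_count texels wide and 25 texels high, bound to the param_buffer sampler; what you do with the weights afterwards is elided):
#version 330 core

layout(location = 0) in vec3 in_Position;

uniform sampler2D param_buffer;  // column gl_VertexID holds the 25 weights of this vertex
uniform mat4 mvp;

void main()
{
    float weights[25];
    for (int i = 0; i < 25; ++i)
        weights[i] = texelFetch(param_buffer, ivec2(gl_VertexID, i), 0).r;

    // ... blend the 25 muscle/morph contributions into in_Position here ...
    gl_Position = mvp * vec4(in_Position, 1.0);
}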