glTexSubImage3D(GL_TEXTURE_2D_ARRAY, ...) and GL_TEXTURE_SWIZZLE_RGBA - opengl

I have a texture array (~512 layers).
Some of the textures I upload have 4 channels (RGBA), some have only one (RED).
When creating individual textures, I can do this:
GLint swizzleMask[] = { GL_ONE, GL_ONE, GL_ONE, GL_RED };
glTexParameteriv(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_RGBA, swizzleMask);
Can I do this for specific layers of my texture array? (Swizzling should apply to one texture in the array only, not the others).
I suspect this is not possible, and if so, what's the preferred method? (Vertex attributes would be my last resort option).
(i) EDIT: Looking preferably for an OpenGL 3.3 or below solution.
(ii) EDIT: The idea is that I have RGBA bitmaps for my game (grass, wall, etc...) and I also have font bitmaps. I'm trying to render these in the same draw call.
In my fragment shader, I have something like:
uniform sampler2DArray TextureArraySampler;
out vec4 FragmentColor;
in VertexOut
{
vec2 UV;
vec4 COLOR;
flat uint TEXTURE_INDEX;
} In;
void main(void)
{
FragmentColor = In.COLOR * texture(TextureArraySampler, vec3(In.UV.x, In.UV.y, In.TEXTURE_INDEX));
}
So, when rendering fonts, I would like the shader to sample like:
FragmentColor = In.COLOR * vec4(1, 1, 1, texture(TextureArraySampler, vec3(In.UV.x, In.UV.y, In.TEXTURE_INDEX)).r);
And, when rendering bitmaps:
FragmentColor = In.COLOR * texture(TextureArraySampler, vec3(In.UV.x, In.UV.y, In.TEXTURE_INDEX)).rgba;

To start with, no, there's no way to do what you want. Well, there is a way, but it involves sticking a non-dynamically uniform conditional branch in your fragment shader, which is not a cost worth paying.
"I'm trying to render these in the same draw call."
Performance presentations around OpenGL often talk about reducing draw calls being an important aspect of performance. This is very true, particularly for high-performance applications.
That being said, this does not mean that one should undertake Herculean efforts to reduce the number of draw calls to 1. The point of the advice is to get people to structure their engines so that the number of draw calls does not increase with the complexity of the scene.
For example, consider your tile map. Issuing a draw call per-tile is bad because the number of draw calls increases linearly with the number of tiles being drawn. So it makes sense to draw the entire tile map in a single call.
Now, let's say that your scene consists of tile maps and font glyphs, and it will always be exactly that. You could render this in two calls (one for the maps and one for the glyphs), or you could do it in one call. But the performance difference between them will be negligible. What matters is that adding more tiles/glyphs does not mean adding more draw calls.
So you should not be concerned about adding a new draw call to your engine. What should concern you is if you're adding a new draw call per-X to your engine.
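As a rough illustration of that point, here is a minimal C++ sketch that sorts quads into one batch for bitmaps and one for glyphs. The Quad, QuadKind and Batches types are hypothetical names made up for this example, not something from the question. The number of batches, and so the number of draw calls, stays at two at most regardless of how many quads are submitted.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical quad record; the names are illustrative only.
enum class QuadKind { Bitmap, Glyph };

struct Quad {
    QuadKind kind;
    float x, y, w, h;
};

struct Batches {
    std::vector<Quad> bitmaps; // would be drawn with the RGBA sampling shader
    std::vector<Quad> glyphs;  // would be drawn with the single-channel (font) shader
};

// Splits the quads into two batches according to their kind.
Batches splitByKind(const std::vector<Quad>& quads) {
    Batches out;
    for (const Quad& q : quads) {
        (q.kind == QuadKind::Bitmap ? out.bitmaps : out.glyphs).push_back(q);
    }
    return out;
}

// One draw call per non-empty batch: never more than 2, however many quads exist.
std::size_t drawCallCount(const Batches& b) {
    return (b.bitmaps.empty() ? 0u : 1u) + (b.glyphs.empty() ? 0u : 1u);
}
```

Each batch would then be uploaded and drawn once with its own shader, so adding more tiles or glyphs grows the buffers but not the call count.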

Related

Is this a good way to render multiple lights in OpenGL?

I am currently programming a graphics renderer in OpenGL by following several online tutorials. I've ended up with an engine which has a rendering pipeline which basically consists of rendering an object using a simple Phong Shader. My Phong Shader has a basic vertex shader which modifies the vertex based on a transformation and a fragment shader which looks something like this:
// PhongFragment.glsl
uniform DirectionalLight dirLight;
...
vec3 calculateDirLight() { /* Calculates Directional Light using the uniform */ }
...
void main() {
gl_FragColor = vec4(calculateDirLight(), 1.0); // gl_FragColor is a vec4
}
The actual drawing of my object looks something like this:
// Render a Mesh
bindPhongShader();
setPhongShaderUniform(transform);
setPhongShaderUniform(directionalLight1);
mesh->draw(); // glDrawElements using the Phong Shader
This technique works well, but has the obvious downside that I can only have one directional light, unless I use uniform arrays. I could do that but instead I wanted to see what other solutions were available (mostly since I don't want to make an array of some large amount of lights in the shader and have most of them be empty), and I stumbled on this one, which seems really inefficient but I am not sure. It basically involves redrawing the mesh every single time with a new light, like so:
// New Render
bindBasicShader(); // just transforms vertices, and sets the frag color to white.
setBasicShaderUniform(transform); // Set transformation uniform
mesh->draw();
// Enable Blending so that all light contributions are added up...
bindDirectionalShader();
setDirectionalShaderUniform(transform); // Set transformation uniform
setDirectionalShaderUniform(directionalLight1);
mesh->draw(); // Draw the mesh using the directionalLight1
setDirectionalShaderUniform(directionalLight2);
mesh->draw(); // Draw the mesh using the directionalLight2
setDirectionalShaderUniform(directionalLight3);
mesh->draw(); // Draw the mesh using the directionalLight3
This seems terribly inefficient to me, though. Aren't I redrawing all the mesh geometry over and over again? I have implemented this and it does give me the result I was looking for, multiple directional lights, but the frame rate has dropped considerably. Is this a stupid way of rendering multiple lights, or is it on par with using shader uniform arrays?
For forward rendering engines where lighting is handled in the same shader as the main geometry processing, the only really efficient way of doing this is to generate lots of shaders which can cope with the various combinations of light source, light count, and material under illumination.
In your case you would have one shader for 1 light, one for 2 lights, one for 3 lights, etc. It's a combinatorial nightmare in terms of number of shaders, but you really don't want to send all of your meshes multiple times (especially if you are writing games for mobile devices - geometry is very bandwidth heavy and sucks power out of the battery).
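One common way to get "one shader per light count" without hand-writing each variant is to prepend a #define to a shared shader body before compiling it. The following is only a sketch under that assumption: makePhongFragmentSource, NUM_LIGHTS, kPhongBody and the uniform names are hypothetical and not part of the question's engine.

```cpp
#include <cassert>
#include <string>

// Shared GLSL body (hypothetical); NUM_LIGHTS is supplied by the generated header.
const char* kPhongBody = R"(
uniform vec3 lightDir[NUM_LIGHTS];
uniform vec3 lightColor[NUM_LIGHTS];
varying vec3 vNormal;
void main() {
    vec3 n = normalize(vNormal);
    vec3 sum = vec3(0.0);
    for (int i = 0; i < NUM_LIGHTS; ++i) {
        sum += lightColor[i] * max(dot(n, -lightDir[i]), 0.0);
    }
    gl_FragColor = vec4(sum, 1.0);
}
)";

// Builds the fragment shader source for a given light count by prepending
// a #version line and a NUM_LIGHTS define to the shared body.
std::string makePhongFragmentSource(int numLights, const std::string& body) {
    assert(numLights >= 1);
    return "#version 120\n#define NUM_LIGHTS " + std::to_string(numLights) + "\n" + body;
}
```

The resulting string for each light count would be passed to glShaderSource and compiled once, with the program for the right count selected per mesh at draw time.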
The other common approach is a deferred lighting scheme. These schemes store albedo, normals, material properties, etc. into a "Geometry Buffer" (e.g. a set of multiple-render-target FBO attachments), and then apply lighting after the fact as a set of post-processing operations. The complex geometry is sent once, with the resulting data stored in the MRT+depth render targets as a set of texture data. The lighting is then applied as a set of basic geometry (typically spheres or 2D quads), using the depth texture as a means to clip and cull light sources, and the other MRT attachments to compute the lighting intensity and color. It's a bit of a long topic for an SO post, but there are lots of good presentations around on the web from GDC and SIGGRAPH.
Basic idea outlined here:
https://en.wikipedia.org/wiki/Deferred_shading

Why is a simple shader slower than the standard pipeline?

I want to write a very simple shader which is equivalent to (or faster than) the standard pipeline. However, even the simplest shader possible:
Vertex Shader
void main(void)
{
gl_TexCoord[0] = gl_MultiTexCoord0;
gl_Position = ftransform();
}
Fragment Shader
uniform sampler2D Texture0;
void main(void)
{
gl_FragColor = texture2D(Texture0, gl_TexCoord[0].xy);
}
cuts my framerate in half in my game compared to the standard shader, and performs horribly if some transparent images are displayed. I don't understand this, because the standard shader (glUseProgram(0)) does lighting and alpha blending, while this shader only draws flat textures. What makes it so slow?
It looks like this massive slowdown of custom shaders is a problem with old Intel Graphics chips, which seem to emulate the shaders on the CPU.
I tested the same program on recent hardware and the frame drop with the custom shader activated is only about 2-3 percent.
EDIT: wrong theory. See new answer below
I think you might bump into overdraw.
I don't know what engine you are using your shader in, but if you have alpha blending on then you might end up overdrawing a lot.
Think about it this way:
If you have an 800x600 screen and a 2D quad covering the whole screen, that quad will generate 480000 fragment shader invocations, although it has only 4 vertices.
Now, moving further, let's assume you have 10 such quads, one on top of another. If you don't sort your geometry front to back, or if you are using alpha blending with no depth test, then you will end up with 10x800x600 = 4800000 fragment shader invocations.
2D is usually quite expensive in OpenGL because of overdraw. 3D rejects many fragments. Even though the shaders are more complicated, the number of invocations is greatly reduced for 3D objects compared to 2D objects.
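The arithmetic in this answer can be checked with a tiny helper. This is only a worst-case estimate, assuming every quad covers the whole screen and no fragment is rejected; the function name is made up for the example.

```cpp
#include <cstdint>

// Fragment shader invocations for `layers` full-screen quads when nothing is
// rejected (blending on, no depth test): width * height * layers.
std::uint64_t fragmentInvocations(std::uint32_t width,
                                  std::uint32_t height,
                                  std::uint32_t layers) {
    return static_cast<std::uint64_t>(width) * height * layers;
}
```

For the 800x600 case, one quad gives 480000 invocations and ten stacked quads give 4800000, matching the numbers quoted in the answer.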
After long investigation, the slowdown of the simple shader was caused by the shader being too simple.
In my case, the slowdown was caused by the text rendering engine, which made heavy use of "glBitmap", which would be very slow with textures enabled (for whatever reason I cannot understand; these letters are tiny).
However, this did not affect the standard pipeline, as it would honor glDisable(GL_LIGHTING) and glDisable(GL_TEXTURE_2D), which circumvents the slowdown, whereas the simple shader failed to do so and would thus do even more work than the standard pipeline. After introducing these two features to the custom shader, it is as fast as the standard pipeline, plus the ability to add random effects without any performance impact!

Efficiently making a particle system without textures

I am trying to make a particle system where instead of a texture, a quad is rendered by a fragment shader such as below.
uniform vec3 color;
uniform float radius;
uniform float edge;
uniform vec2 position;
uniform float alpha;
void main()
{
float dist = distance(gl_FragCoord.xy, position);
float intensity = smoothstep(dist-edge, dist+edge, radius);
gl_FragColor = vec4(color, intensity*alpha);
}
Each particle is an object of a C++ class that wraps this shader and all the variables together and draws it. I use openFrameworks, so the exact OpenGL calls are hidden from me.
I've read that usually particle systems are done with textures, however, I prefer to do it like this because this way I can add more functionality to the particles. The problem is that after only 30 particles, the framerate drops dramatically. Is there a more efficient way of doing this? I was thinking of maybe putting the variables for each particle into an array and sending these arrays into one fragment shader that then renders all particles in one go. But this would mean that the amount of particles would be fixed because the uniform arrays in the shader would have to be declared beforehand.
Are non-texture-based particle systems simply too inefficient to be realistic, or is there a way of designing this that I'm overlooking?
The reason textures are used is because you can move the particles using the GPU, which is very fast. You'd double buffer a texture which stores particle attributes (like position) per texel, and ping-pong data between them, using a framebuffer object to draw to them and the fragment shader to do the computation, rendering a full screen polygon. Then you'd draw an array of quads and read the texture to get the positions.
Instead of a texture storing attributes you could pack them directly into your VBO data. This gets complicated because you have multiple vertices per particle, but can still be done a number of ways. glVertexBindingDivisor (requires instancing), drawing points, or using the geometry shader come to mind. Transform feedback or image_load_store could be used to update VBOs with the GPU instead of textures.
If you move particles with the CPU, you also need to copy the data to the GPU every frame. This is quite slow, but not so slow that 30 particles would be a problem. Your slowdown probably has to do with the number of draw calls. Each time you draw something there's a tonne of stuff GL does to set up the operation. Setting uniform values (nearly) per-primitive is very expensive for the same reason. Particles work well if you have arrays of data that get processed by a manager all at once. They parallelize very well in such cases. Generally their computation is cheap and it all comes down to minimizing memory and keeping good locality.
If you want to keep particle updating CPU side, I'd go with this:
Create a VBO full of -1 to 1 quads (two triangles, 6 verts) and an element array buffer to draw them. This buffer will remain static in GPU memory and is what you use to draw the particles all at once with a single draw call.
Create a texture (could be 1D) or VBO (if you choose one of the above methods) that contains positions and particle attributes that update pretty much every frame (using glTexImage1D/glBufferData/glMapBuffer).
Create another texture with particle attributes that rarely update (e.g. only when you spawn them). You can send updates with glTexSubImage1D/glBufferSubData/glMapBufferRange.
When you draw the particles, read position and other attributes from the texture (or attributes if you used VBOs) and use the -1 to 1 quads in the main geometry VBO as offsets to your position.
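The data layout described in these steps can be sketched on the CPU side as follows. This is a minimal illustration, not the answerer's code: Vec2, buildUnitQuads and expandVertex are hypothetical names, and expandVertex only mimics what a vertex shader would compute after reading the particle position from the texture or attribute buffer. It assumes 6 non-indexed vertices per particle, so the particle index is the vertex index divided by 6.

```cpp
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// Static buffer contents: 6 corner offsets in [-1, 1] (two triangles) per particle.
std::vector<Vec2> buildUnitQuads(std::size_t particleCount) {
    static const Vec2 corners[6] = {
        {-1.f, -1.f}, {1.f, -1.f}, {1.f, 1.f},
        {-1.f, -1.f}, {1.f, 1.f}, {-1.f, 1.f},
    };
    std::vector<Vec2> out;
    out.reserve(particleCount * 6);
    for (std::size_t i = 0; i < particleCount; ++i) {
        out.insert(out.end(), corners, corners + 6);
    }
    return out;
}

// CPU stand-in for the vertex shader: offset the particle centre by the
// unit-quad corner scaled by the particle radius.
Vec2 expandVertex(const std::vector<Vec2>& unitQuads,
                  const std::vector<Vec2>& positions,
                  float radius,
                  std::size_t vertexIndex) {
    const Vec2 corner = unitQuads[vertexIndex];
    const Vec2 center = positions[vertexIndex / 6]; // 6 vertices per particle
    return { center.x + corner.x * radius, center.y + corner.y * radius };
}
```

Only the positions array changes per frame; the unit-quad buffer is built once and reused, which is what allows all particles to go out in a single draw call.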

Better to update a small vertex buffer, or send a uniform?

I'm writing/planning a GUI renderer for my OpenGL (core profile) game engine, and I'm not completely sure how I should be representing the vertex data for my quads. So far, I've thought of 2 possible solutions:
1) The straightforward way, every GuiElement keeps track of its own vertex array object, containing 2d screen co-ordinates and texture co-ordinates, and is updated (glBufferSubData()) any time the GuiElement is moved or resized.
2) I globally store a single vertex array object, whose co-ordinates are (0,0)(1,0)(0,1)(1,1), and upload a rect as a vec4 uniform (x, y, w, h) every frame, and transform the vertex positions in the vertex shader (vertex.xy *= guiRect.zw; vertex.xy += guiRect.xy;).
I know that method #2 works, but I want to know which one is better.
I do like the idea of option two, however, it would be quite inefficient because it requires a draw call for each element. As was mentioned by other replies, the biggest performance gains lie in batching geometry and reducing the number of draw calls. (In other words, reducing the time your application spends communicating with the GL driver).
So I think the fastest possible way of drawing 2D objects with OpenGL is by using a technique similar to your option one, but adding batching to it.
The smallest possible vertex format you need in order to draw a quadrilateral on the screen is a simple vec2, with 4 vec2s per quadrilateral. The texture coordinates can be generated in a very lightweight vertex shader, such as this:
// xy = vertex position in normalized device coordinates ([-1,+1] range).
attribute vec2 vertexPositionNDC;
varying vec2 vTexCoords;
const vec2 scale = vec2(0.5, 0.5);
void main()
{
vTexCoords = vertexPositionNDC * scale + scale; // scale vertex attribute to [0,1] range
gl_Position = vec4(vertexPositionNDC, 0.0, 1.0);
}
On the application side, you can set up a double buffer to optimize throughput, by using two vertex buffers, so you can write to one of them on a given frame then flip the buffers and send it to GL, while you start writing to the next buffer right away:
// Update:
GLuint vbo = vbos[currentVBO];
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferSubData(GL_ARRAY_BUFFER, dataOffset, dataSize, data);
// Draw:
glDrawElements(...);
// Flip the buffers:
currentVBO = (currentVBO + 1) % NUM_BUFFERS;
Or another simpler option is to use a single buffer, but allocate new storage on every submission, to avoid blocking, like so:
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, dataSize, data, GL_STREAM_DRAW);
This is a well known and used technique for simple async data transfers. Read this for more.
It is also a good idea to use indexed geometry. Keep an index buffer of unsigned shorts with the vertex buffer. A 2-byte per element IB will reduce data traffic quite a bit and should have an index range big enough for any amount of 2D/UI elements that you might wish to draw.
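A batch of the kind described here could be accumulated on the CPU roughly as follows. This is a sketch under assumptions: QuadBatch and addRect are hypothetical names, each quad contributes 4 vec2 vertices and 6 unsigned short indices, and the assertion reflects the 65536-vertex limit that 16-bit indices impose (16384 quads per batch).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vec2 { float x, y; };

struct QuadBatch {
    std::vector<Vec2> vertices;          // 4 per quad
    std::vector<std::uint16_t> indices;  // 6 per quad (two triangles)

    // Appends one axis-aligned rectangle to the batch.
    void addRect(float x, float y, float w, float h) {
        assert(vertices.size() + 4 <= 65536); // must stay addressable by 16-bit indices
        const std::uint16_t base = static_cast<std::uint16_t>(vertices.size());
        vertices.push_back({x, y});
        vertices.push_back({x + w, y});
        vertices.push_back({x, y + h});
        vertices.push_back({x + w, y + h});
        const std::uint16_t quad[6] = {0, 1, 2, 2, 1, 3};
        for (std::uint16_t i : quad) {
            indices.push_back(static_cast<std::uint16_t>(base + i));
        }
    }
};
```

Once all elements are added, the two vectors would be uploaded with the glBufferData or glBufferSubData pattern shown earlier and drawn with a single glDrawElements call using GL_UNSIGNED_SHORT indices.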
For GUI elements you could use a dynamic vertex buffer (ring buffer) and just upload the geometry every frame, because this is quite a small amount of geometry data. Then you can batch your GUI element rendering, unlike in both of your proposed methods.
Batching is quite important if you render a large number of GUI elements, such as text. You can quite easily build a generic GUI rendering system with this which caches the GUI element draw calls and flushes the draws to the GPU upon state changes.
I would recommend doing it like DXUT does it, where it takes the rects from each element, and renders them with a single universal method that takes an element as a parameter, which contains a rect. Each control can have many elements. It adds the four points of the rect to a buffer in a specific order in STREAM_DRAW mode and a constant index buffer. This does draw each rect individually, but performance is not completely vital, because your geometry is simple, and when you are in a dialog, you can usually put the rendering of the 3d scene on the back burner. EDIT: even using this to do HUD items, it has a negligible performance penalty.
This is a simple and organized way to do it, where it works well with textures, and there are only two shaders, one for drawing textured components, and one for non-textured. Then there is a special way to do text.
If you want to see how I did it, you can look at this:
https://github.com/kevinmackenzie/ObjGLUF
It is in GLUFGui.h/.cpp

Altering brightness on OpenGL texture

I would like to increase the brightness on a texture used in OpenGL rendering. Such as making it bright red or white. This is a 2D rendering environment, where every sprite is mapped as a texture to an OpenGL polygon.
I know little to nothing on manipulating data, and my engine works with a texture cache, so altering the whole surface would affect everything using the texture.
I can simulate the effect by having a "mask" and overlaying it, allowing me to make the sprite having solid colors, but that takes away memory.
Is there any other solution to this?
If your requirements allow it, you can always write a very simple GLSL fragment shader which does this. It's literally a one-liner.
Something like:
uniform sampler2D tex;
void main()
{
gl_FragColor = texture2D(tex, gl_TexCoord[0].xy) + gl_Color;
}
Perhaps GL_ADD instead of GL_MODULATE?
Use GL_MODULATE to multiply the texture color by the current color.
See the texture tutorial in this page.
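To make the difference between the additive and multiplicative suggestions concrete, here is a small CPU-side sketch of the per-channel math. The names Color, addTint and modulateTint are hypothetical, and addTint follows the fragment shader above (all four channels added, then clamped as the framebuffer would), which is not necessarily identical to what the fixed-function GL_ADD mode does with alpha.

```cpp
#include <algorithm>

struct Color { float r, g, b, a; };

// Clamp to the [0, 1] range a normalized framebuffer would store.
static float clamp01(float v) { return std::min(std::max(v, 0.0f), 1.0f); }

// texture + color: pushes channels towards white, so it can brighten.
Color addTint(Color t, Color c) {
    return { clamp01(t.r + c.r), clamp01(t.g + c.g),
             clamp01(t.b + c.b), clamp01(t.a + c.a) };
}

// texture * color: with inputs in [0, 1] this can only darken or keep a channel.
Color modulateTint(Color t, Color c) {
    return { t.r * c.r, t.g * c.g, t.b * c.b, t.a * c.a };
}
```

Adding (0.5, 0, 0, 0) to a texel shifts it towards bright red, while modulating by a color with components at or below 1 never makes a channel brighter than it already was.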