What are the semantics of glBindMultiTexture and glEnableIndexed?
I have seen glBindMultiTexture used together with glEnableIndexed, where it seems to do something similar to e.g. glEnable(GL_TEXTURE_2D). I am unsure whether it is required, whether it replaces glEnable(GL_TEXTURE_2D), or whether both should be used. The DSA spec doesn't seem to mention glEnableIndexed in the context of glBindMultiTextureEXT.
What is the correct usage?
// Init 1
glEnable(GL_TEXTURE_2D);
for(int n = 0; n < 4; ++n)
    glEnableIndexed(GL_TEXTURE_2D, n);

// Init 2
for(int n = 0; n < 4; ++n)
    glEnableIndexed(GL_TEXTURE_2D, n);

// Init 3
glEnable(GL_TEXTURE_2D);

// For each frame 1
for(int n = 0; n < 4; ++n)
    glBindMultiTexture(GL_TEXTURE0 + n, GL_TEXTURE_2D, textureIds[n]);

// For each frame 2
for(int n = 0; n < 4; ++n)
{
    glEnableIndexed(GL_TEXTURE_2D, n);
    glBindMultiTexture(GL_TEXTURE0 + n, GL_TEXTURE_2D, textureIds[n]);
}
glEnableIndexed does not exist. glEnableIndexedEXT does, however, as does glEnablei (the core OpenGL 3.0 equivalent). I'll assume you're talking about them. The same goes for glBindMultiTexture: only glBindMultiTextureEXT exists.
Now that that bit of nomenclature is out of the way, it's not entirely clear what you mean by "correct usage".
If the intent of the "Init" code is to enable GL_TEXTURE_2D for fixed-function use across the first four fixed-function texture units, then Init 1 and Init 2 will do that; Init 3 will only enable it for the currently active texture unit. Do note that this only matters for fixed-function texture use.
Which is where the other point comes in: generally, you do not simply enable a bunch of texture targets globally like that in an initialization routine. This would only make sense if everything you are rendering in the entire scene uses 4 2D textures in the first four texture units. Generally speaking, you enable and disable texture targets as needed for each object.
So I would say that having no enables in your initialization and enabling (and disabling) targets around your rendering calls is the "correct usage".
Also, be advised that this is no different from directly using glActiveTexture in this regard. So the fact that you're using the DSA switch-less commands is irrelevant.
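As a rough sketch of that per-object pattern with the EXT entry points (the loop bound, texture ids and drawObject() here are placeholders, not code taken from the question):

// Enable only the targets this object needs, bind its textures, draw, then disable again.
for (int n = 0; n < numObjectTextures; ++n)
{
    glEnableIndexedEXT(GL_TEXTURE_2D, n);                                     // fixed-function enable for unit n
    glBindMultiTextureEXT(GL_TEXTURE0 + n, GL_TEXTURE_2D, objectTextureIds[n]);
}

drawObject();   // whatever issues the actual draw calls

for (int n = 0; n < numObjectTextures; ++n)
    glDisableIndexedEXT(GL_TEXTURE_2D, n);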
I'm currently writing a gravity-simulation and I have a small problem displaying the particles with OpenGL.
To get "round" particles, I create a small float-array like this:
for (int n = 0; n < 16; n++)
    for (int m = 0; m < 16; m++)
    {
        AlphaData[n * 16 + m] = ((n - 8) * (n - 8) + (m - 8) * (m - 8) < 64);
    }
I then put this in a GL_TEXTURE_2D with format GL_RED. The particles are drawn via glDrawArraysInstanced, and in the fragment shader I compute the color like this:
color = vec4(ParticleColor.rgb, texture(Sampler, UV).r);
This works as it should, producing a picture like this (particles enlarged for demonstration):
As you can see, no artifacts. Every particle here is the same size, so every smaller one you see on a "larger" particle is in the background and should not be visible. When I turn on depth-testing with
glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_LESS);
I get something like this:
So for the most part, this looks correct ("smaller" particles being behind the "bigger" ones), but I now have artifacts from the underlying quads. Weirdly, not ALL particles show this behavior.
Can anybody tell me what I'm doing wrong? Or do depth testing and blending not work nicely together?
I'm not sure what other code you might need for a diagnosis (everything else seems to work correctly), so just tell me if you need additional code.
I'm using a perspective projection here (of course, since the particles are in 3D space).
You're in a special case where your fragments are either fully opaque or fully transparent, so it is possible to get depth testing and blending to work at the same time. The actual problem is that, with depth testing enabled, even a fully transparent fragment still stores its depth value. You can prevent that write by explicitly discarding the fragment in the shader. Something like:
color = vec4(ParticleColor.rgb, texture(Sampler, UV).r);
if (color.a == 0.0)
    discard;
Note that conditional branching might introduce some additional overhead, but I wouldn't expect too many problems in your case.
For the general case with semi-transparent fragments, blending and depth-testing at the same time will not work. In order for blending to produce the correct result, you have to depth sort your geometry prior to rendering and render from back to front.
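For completeness, a minimal sketch of the GL state this relies on (the blend function is an assumption; the question only shows the depth-test calls):

// Standard alpha blending plus depth testing; fully transparent fragments
// are discarded in the shader, so they neither blend nor write depth.
glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_LESS);
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);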
So I'm currently trying out some things in SDL_gpu/C++ and I have the following setup: the images are 32 by 32 pixels each, and the second image is transparent.
//..sdl init..//
GPU_Image* image = GPU_LoadImage("path");
GPU_Image* image2 = GPU_LoadImage("otherpath");

for (int i = 0; i < screenheight; i += 32) {
    for (int j = 0; j < screenwidth; j += 32) {
        GPU_Blit(image, NULL, screen, j, i);
        GPU_Blit(image2, NULL, screen, j, i);
    }
}
This code runs at ~20 FPS on a WQHD-sized screen. When I do the following, however,
for (int i = 0; i < screenheight; i += 32) {
    for (int j = 0; j < screenwidth; j += 32) {
        GPU_Blit(image, NULL, screen, j, i);
    }
}
for (int i = 0; i < screenheight; i += 32) {
    for (int j = 0; j < screenwidth; j += 32) {
        GPU_Blit(image2, NULL, screen, j, i);
    }
}
i.e. when I separate the two blit calls into two different for loops, I get 300 FPS.
Can someone try to explain this to me, or does anyone have an idea of what might be going on here?
While cache locality might have an impact, I don't think it is the main issue here, especially considering the drop in frame time from 50 ms to 3.3 ms.
The call of interest is of course GPU_Blit, which is defined as doing some checks followed by a call to _gpu_current_renderer->impl->Blit. This Blit function appears to be the same one regardless of the renderer.
A lot of code in there makes use of the image parameter, but two functions in particular, prepareToRenderImage and bindTexture, call FlushBlitBuffer several times if you are not rendering the same thing as in the previous blit. That looks to me like an expensive operation. I haven't used SDL_gpu before, so I can't guarantee anything, but rendering something other than what you rendered previously necessarily results in more glDraw* calls than rendering the same thing again and again, and glDraw* calls are usually among the most expensive API calls in an OpenGL application.
It's relatively well known in 3D graphics that making as few changes to the context (in this case, the image to blit) as possible can improve performance, simply because it makes better use of the bandwidth between CPU and GPU. A typical example is grouping together all the rendering that uses some particular set of textures (e.g. materials). In your case, it's grouping all the rendering of one image, and then of the other image.
While both examples render the same number of textured tiles, the first one forces hundreds or thousands of texture binds (depending on screen size), while the second makes only 2 texture binds.
Rendering with a texture is very cheap on modern GPUs, while texture binds (switching to another texture) are quite expensive.
Note that you can use a texture atlas to alleviate the texture-bind bottleneck while retaining the desired render order.
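A minimal sketch of that atlas idea, assuming both 32x32 images were packed side by side into a single 64x32 GPU_Image called atlas (the packing itself is not shown; names are hypothetical):

// Source rectangles selecting each tile inside the hypothetical 64x32 atlas.
GPU_Rect tile0 = {  0.0f, 0.0f, 32.0f, 32.0f };   // first image
GPU_Rect tile1 = { 32.0f, 0.0f, 32.0f, 32.0f };   // second image

for (int i = 0; i < screenheight; i += 32) {
    for (int j = 0; j < screenwidth; j += 32) {
        // Same interleaved draw order as before, but only one texture is ever bound.
        GPU_Blit(atlas, &tile0, screen, j, i);
        GPU_Blit(atlas, &tile1, screen, j, i);
    }
}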
Background
2D "Infinite" World separated into chunks
One VAO (& VBO/EBO) per chunk
Nested for loop in chunk render; one draw call per block.
Code
void Chunk::Render(/* ... */) {
    glBindVertexArray(vao);
    for (int x = 0; x < 64; x++) {
        for (int y = 0; y < 64; y++) {
            if (blocks[x][y] == 1) {
                /* ... Uniforms ... */
                glDrawElements(GL_TRIANGLE_STRIP, 6, GL_UNSIGNED_INT, (void*)0);
            }
        }
    }
    glBindVertexArray(0);
}
There is a generation algorithm in the constructor. This could be anything: noise, random, etc. The algorithm goes through and sets each element in the blocks array to 1 (meaning: render the block) or 0 (meaning: do not render it).
Problem
How would I go about combining these triangle strips in order to minimize draw calls? I can think of a few algorithms to find the triangles that should be merged into a single draw call, but I am confused as to how to merge them. Do I need to add them to the vertices array and call glBufferData again? Would it be bad to call glBufferData that many times per frame?
I'm not really rendering that many triangles, am I? I think I've heard of people who can easily draw ten thousand triangles with minimal CPU usage (or even millions). So what is wrong with how I am drawing currently?
EDIT
_Andon M. Coleman_ has given me a lot of information in chat. I have now switched over to using instanced arrays; I cannot believe how much of a difference it makes in performance. For a minute I thought Linux's `top` command was malfunctioning. It's _very_ significant: instead of only being able to render, say, 60 triangles, I can now render over a million with barely any change in CPU usage.
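Roughly, the new setup looks like this (a simplified sketch; attribute location 1 and the variable names here are illustrative, not my exact code):

// The quad geometry stays in the existing VAO/VBO/EBO; per-block offsets go
// into a second buffer whose attribute advances once per instance.
std::vector<float> offsets;                    // x,y pairs for blocks[x][y] == 1
for (int x = 0; x < 64; x++)
    for (int y = 0; y < 64; y++)
        if (blocks[x][y] == 1) {
            offsets.push_back((float)x);
            offsets.push_back((float)y);
        }

GLuint instanceVbo;
glGenBuffers(1, &instanceVbo);
glBindBuffer(GL_ARRAY_BUFFER, instanceVbo);
glBufferData(GL_ARRAY_BUFFER, offsets.size() * sizeof(float),
             offsets.data(), GL_STATIC_DRAW);

glBindVertexArray(vao);
glEnableVertexAttribArray(1);                  // location 1 = per-instance offset
glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(float), (void*)0);
glVertexAttribDivisor(1, 1);                   // advance once per instance, not per vertex

// The whole chunk then becomes a single draw call:
glDrawElementsInstanced(GL_TRIANGLE_STRIP, 6, GL_UNSIGNED_INT, (void*)0,
                        (GLsizei)(offsets.size() / 2));
glBindVertexArray(0);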
I'm programming with OpenGL under MSVC 2010.
One of my goals is to pick objects in the scene. I designed it by assigning each object a unique color, rendering the objects into a framebuffer, and then reading the color under the cursor, from which the corresponding object can be identified.
Now the picking works well. However, whenever a pick happens, memory usage increases rapidly. In detail, the following code renders the objects into a framebuffer:
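(The read-back step is just a glReadPixels of the pixel under the cursor; a simplified sketch with placeholder variable names, not my exact code:)

// Read the unique color at the cursor position (window coordinates flipped to
// OpenGL's bottom-left origin); cursorX, cursorY and windowHeight are placeholders.
unsigned char pixel[3];
glReadPixels(cursorX, windowHeight - cursorY - 1, 1, 1,
             GL_RGB, GL_UNSIGNED_BYTE, pixel);
// pixel[] is then mapped back to the object with that color.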
for (unsigned i = 0; i < objects.size(); ++i)
{
    // some code computing color;
    Color color;
    for (unsigned j = 0; j < objects[i].listOfPrimitives.size(); ++j)
    {
        objects[i].listOfPrimitives[j]->color = color;
    }
    objects[i].Render();
    for (unsigned j = 0; j < objects[i].listOfPrimitives.size(); ++j)
    {
        objects[i].listOfPrimitives[j]->color = colorStorage[i][j];
    }
}
where objects are the objects to be rendered. Since every object consists of a number of primitives (which may be cylinders, spheres, etc.), this piece of code changes the color of each object's primitives to a unique computed one, renders the object, then changes the colors back (colorStorage stores the original colors). There is some code afterwards that deals with the picked object, but I'm sure it has nothing to do with this issue.
The Render method is implemented as follows for most objects:
glColor3ub(color[0], color[1], color[2]);
glBegin(GL_TRIANGLES);
for (unsigned i = 0; i < mesh.faces.size(); ++i)
{
    glNormal3d(mesh.faces[i].normal.x, mesh.faces[i].normal.y, mesh.faces[i].normal.z);
    for (unsigned j = 0; j < 3; ++j)
    {
        glVertex3d(mesh.vertices[mesh.faces[i].verts[j]].x,
                   mesh.vertices[mesh.faces[i].verts[j]].y,
                   mesh.vertices[mesh.faces[i].verts[j]].z);
    }
}
glEnd();
But some objects contain concave polygons (even with holes), so I use the gluTess* group of functions from GLU to render them, and to speed up rendering I put that part into a display list.
Now, as I've mentioned, this picking procedure increases memory usage rapidly. There are two more phenomena I can't explain:
If I comment out the objects[i].Render() call in the first piece of code, the memory does not change at all when the code runs (of course, nothing gets rendered then);
After the memory increases, if I refresh the scene a few times (I implemented an interactive trackball), the memory drops back down again.
So I'm wondering which part could be the cause of this issue? The display lists? The gluTess*() calls? Or even something related to the framebuffer?
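For reference, the display-list part is set up roughly like this (a simplified sketch, not the exact code):

// Compile the tessellated geometry once, then replay it on later draws.
GLuint tessList = glGenLists(1);
glNewList(tessList, GL_COMPILE);
// ... gluTessBeginPolygon / gluTessVertex / gluTessEndPolygon calls go here ...
glEndList();

// Per frame (and during picking):
glCallList(tessList);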
I want to use texture arrays to reduce the high texture-binding cost, but I can't upload the data to the texture array. I use the Tao framework. Here's my code:
Gl.glEnable(Gl.GL_TEXTURE_2D_ARRAY_EXT);
Gl.glGenTextures(1, out textureArray);
Gl.glBindTexture(Gl.GL_TEXTURE_2D_ARRAY_EXT, textureArray);

var data = new uint[textureWidth, textureHeight, textureCount];
for (var x = 0; x < textureWidth; x++)
{
    for (var y = 0; y < textureHeight; y++)
    {
        for (var z = 0; z < textureCount; z++)
            data[x, y, z] = GetRGBAColor(1, 1, 1, 1);
    }
}

Gl.glTexImage3D(Gl.GL_TEXTURE_2D_ARRAY_EXT, 0, Gl.GL_RGBA, textureWidth,
                textureHeight, textureCount, 0, Gl.GL_RGBA, Gl.GL_UNSIGNED_BYTE, data);
Console.WriteLine(Glu.gluErrorString(Gl.glGetError()));
The glTexImage3D function says there is an invalid enumerant.
The most likely cause for a GL_INVALID_ENUM in the above code is the
Gl.glEnable(Gl.GL_TEXTURE_2D_ARRAY_EXT);
call.
This is simply not allowed. Array textures cannot be used with the fixed-function pipeline, but only with shaders (which do not need those texture enables at all). The GL_EXT_texture_array spec makes this quite clear:
This extension does not provide for the use of array textures with fixed-function fragment processing. Such support could be added by providing an additional extension allowing applications to pass the new target enumerants (TEXTURE_1D_ARRAY_EXT and TEXTURE_2D_ARRAY_EXT) to Enable and Disable.
There never was any further extension allowing array textures for fixed-function processing (AFAIK)...
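For reference, with shaders the setup needs no Enable call at all; a minimal sketch in raw C-style OpenGL (width, height, layerCount and pixels are placeholders, not taken from the question):

// Create a 2D array texture and upload `layerCount` RGBA layers at mip level 0.
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D_ARRAY, tex);
glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_RGBA8,
             width, height, layerCount, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, pixels);
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
// The texture is then sampled in GLSL through a sampler2DArray uniform.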
Change the 2nd parameter of glTexImage3D to 1.
I don't know why, but NVIDIA's OpenGL driver seems to need at least 1 level for a texture 2D array object.