I want to draw two cubes with a rectangle between them, so I stored the vertex data in a VBO, then created an EBO (Element Buffer Object) to avoid extra vertices (42 vs. 12).
I need to draw them separately, because I want the rectangle to reflect the upper cube, using a stencil test and disabling the depth mask while drawing the rectangle.
I thought I could draw the first cube with a glDrawElements call:
glDrawElements(GL_TRIANGLES, 36, GL_UNSIGNED_INT, 0);
Then, to draw the rectangle, I'm trying to use glDrawRangeElements:
glDrawRangeElements(GL_TRIANGLES, 36, 41, 6, GL_UNSIGNED_INT, 0);
but it is just drawing the base of the cube.
For the last cube I am using the same draw call as for the first, just mirrored along the z axis.
I think I did something wrong with the glDrawRangeElements parameters, because I tried a single call covering the first cube and then the rectangle
glDrawElements(GL_TRIANGLES, 42, GL_UNSIGNED_INT, 0);
and it works.
What's wrong with that glDrawRangeElements call?
EDIT: I solved it by replacing the glDrawRangeElements call with a simple glDrawArrays call, rearranging the rectangle's vertices to draw two triangles.
glDrawRangeElements doesn't do what you think it does. The functionality of glDrawRangeElements is identical to glDrawElements. The only difference is that glDrawRangeElements takes a range that acts as a hint to the implementation as to which vertices you'll be using.
See, because your indices are in an array, the driver doesn't automatically know what section of the vertex data you're using. You use glDrawRangeElements as a potential performance enhancer; it lets you tell the driver what range of vertices your draw call uses.
Nowadays, glDrawRangeElements is pointless. See, the range used to matter, because implementations used to read vertex arrays from CPU memory. So when you did a regular glDrawElements, the driver had to read your index buffer, figure out what the range of vertex data was, and then copy that data from your vertex buffers into GPU memory and then issue the draw call. The Range version allows it to skip the expensive index reading step.
That doesn't matter much anymore, since now we store vertex data in buffer objects on the GPU. So you shouldn't be using glDrawRangeElements at all.
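To select the rectangle, what probably matters is the count and the last argument: the start/end pair bounds the index values, while the last argument is a byte offset into the bound element buffer. Below is a minimal sketch, assuming the layout described in the question (36 cube indices followed by 6 rectangle indices, all GL_UNSIGNED_INT). IndexRange and indexRange are hypothetical helpers, and std::uint32_t stands in for GLuint so the arithmetic can be checked without a GL context:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: the count/offset pair a glDrawElements call needs.
struct IndexRange {
    int count;               // number of indices to draw
    std::size_t byteOffset;  // offset into the bound element buffer, in bytes
};

// firstIndex and indexCount are positions in the index array, not vertex numbers.
constexpr IndexRange indexRange(std::size_t firstIndex, int indexCount) {
    return IndexRange{indexCount, firstIndex * sizeof(std::uint32_t)};
}

// Layout assumed from the question: cube indices [0, 36), rectangle indices [36, 42).
constexpr IndexRange kCube = indexRange(0, 36);
constexpr IndexRange kRect = indexRange(36, 6);

// With a GL context and the EBO bound, the rectangle draw would then look like:
//   glDrawElements(GL_TRIANGLES, kRect.count, GL_UNSIGNED_INT,
//                  reinterpret_cast<const void*>(kRect.byteOffset));
```

With 4-byte indices the rectangle starts 144 bytes into the buffer, which is why passing 0 as the last argument draws the first 6 indices (the base of the cube) instead.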
I'm using OpenGL 4 and C++11.
Currently I make a whole bunch of individual calls to glDrawElements using separate VAOs with a separate VBO and an IBO.
I do this because the texture coords change for each, and my vertex data includes the texture coords. I understand that there's some redundant position information in this vertex data; however, it's always -1,-1,1,1 because I use a translation and a scale matrix in my vertex shader to position and scale the vertex data.
The VAO, VBO, IBO, position and scale matrix and texture ID are stored in an object. It's one object per quad.
Currently, some of the drawing would occur like this:
Draw a quad object via glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0). The bound VBO holds the -1,-1,1,1 positions plus the texture coords into a common texture (the same texture is used for all drawn quads), and the IBO draws me a quad. Matrix transformations in the shader position it.
Repeat with another quad object
glEnable(GL_SCISSOR_TEST) is called and the position information of the previous quad is used in a call to glScissor.
Next quad object is drawn; only the parts of it that fall inside the previous quad are actually shown.
Draw another quad object
The performance I'm getting now is acceptable but I want it faster because I've only scratched the surface of what I have in mind. So I'm looking at optimizing. So far I've read that I should:
Remove the position information from my vertex data and just keep texture coords. Instead bind a single position VBO at the start of drawing quads so it's used by all of them.
But I'm unsure how this would work? Because I can only have one VBO active at any one time.
Would I then have to call glBufferSubData and update the texture coordinates prior to drawing each quad? Would this be better performance or worse (a call to glBindVertexArray for every object or a call to glBufferSubData?)
Would I still pass the position and scale as matrices to the shader, or would I take that opportunity to also update the position info of the vertices as well as the texture coords? Which would be faster?
Create one big VBO with or without an IBO and update the vertex data for the position (rather than use a transformation and scale matrix) of each quad within this. It seems like this would be difficult to manage.
Even if I did manage to do this, I would only have a single glDraw call, which sounds fast. Is this true? What sort of performance impact does a single glBindVertexArray call have compared with multiple?
I don't think there's any way to use this method to implement something like the glScissor call that I'm making now?
Another option I've read is instancing. So I draw the quad however many times I need it; which means I would pass the shader an array of translation matrices and an array of texture coords?
Would this be a lot faster?
I think I could do something like the glScissor test by passing an additional array of booleans which defines whether the current quad should be only drawn within the bounds of the previous one. However, I think this means that for each gl_InstanceID I would have to traverse all previous instances looking for true and false values, and it seems like it would be slow.
I'm trying to save time by not implementing all of these individually. Hopefully an expert can point me towards which is probably better. If anyone has an even better idea, please let me know.
You can have multiple VBOs attached to different attributes!
The following sequence binds two VBOs to attributes 0 and 1. Note that glBindBuffer() only sets the current binding; the actual association of a VBO with an attribute is made by glVertexAttribPointer().
glBindBuffer(GL_ARRAY_BUFFER,buf1);
glVertexAttribPointer(0, ...);
glEnableVertexAttribArray(0);
glBindBuffer(GL_ARRAY_BUFFER,buf2);
glVertexAttribPointer(1, ...);
glEnableVertexAttribArray(1);
The fastest way to provide quad positions and sizes is to store them in a texture and sample it inside the vertex shader. You'd need at least an RGBA texture (x, y, width, height) with 16 bits per channel. You can then update quad positions using glTexSubImage2D(), or even render them via an FBO.
Everything else will perform slower. If you want, we can elaborate on using uniforms, attributes in VBOs, or attributes without enabled arrays.
Putting all together:
use a single VBO and store the quad id (int) plus your texturing data in it
prepare the x,y,w,h texture and define a mapping from quad id to a texcoord in this texture, e.g. u=quad_id&0xFF, v=(quad_id>>8) (a 256x256 texture holds at most 65536 quads)
use the vertex shader to sample displacement and size from that texture for the given quad_id, stored in an attribute (or use vertex_ID/4 or vertex_ID/6)
fill the VBO and the texture
draw everything with a single glDrawArrays or glDrawElements call
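The id-to-texel mapping from the steps above can be sketched on the CPU as follows. This is only an illustration of the arithmetic; Texel, quadIdToTexel and quadIdFromVertex are hypothetical names, and in the vertex shader the same expressions would feed a texel lookup instead:

```cpp
#include <cstdint>

// Hypothetical helper type: integer texel coordinates in the x,y,w,h data texture.
struct Texel {
    std::uint32_t u;
    std::uint32_t v;
};

// Maps a quad id to a texel of a 256x256 texture:
// u = quad_id & 0xFF, v = quad_id >> 8. Valid for ids 0..65535.
inline Texel quadIdToTexel(std::uint32_t quadId) {
    return Texel{quadId & 0xFFu, quadId >> 8};
}

// When the id is not stored as an attribute it can be derived from the vertex
// number: 4 vertices per quad with indexed drawing, 6 with plain triangles.
inline std::uint32_t quadIdFromVertex(std::uint32_t vertexId,
                                      std::uint32_t verticesPerQuad) {
    return vertexId / verticesPerQuad;
}
```

Deriving the id from the vertex number only works if every quad uses the same number of vertices and the quads are stored in id order.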
Attempting to switch drawing mode to GL_LINES, GL_LINE_STRIP or GL_LINE_LOOP when your cube's vertex data is constructed mainly for use with GL_TRIANGLES presents some interesting results but none that provide a good wireframe representation of the cube.
Is there a way to construct the cube's vertex and index data so that simply toggling the draw mode between GL_LINES/GL_LINE_STRIP/GL_LINE_LOOP and GL_TRIANGLES provides nice results? Or is the only way to get a good wireframe to re-create the vertices specifically for use with one of the line modes?
The most practical approach is most likely the simplest one: Use separate index arrays for line and triangle rendering. There is certainly no need to replicate the vertex attributes, but drawing entirely different primitive types with the same indices sounds highly problematic.
To implement this, you could use two different index (GL_ELEMENT_ARRAY_BUFFER) buffers. Or, more elegantly IMHO, use a single buffer, and store both sets of indices in it. Say you need triIdxCount indices for triangle rendering, and lineIdxCount for line rendering. You can then set up your index buffer with:
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexBuf);
glBufferData(GL_ELEMENT_ARRAY_BUFFER,
             (triIdxCount + lineIdxCount) * sizeof(GLushort), 0,
             GL_STATIC_DRAW);
glBufferSubData(GL_ELEMENT_ARRAY_BUFFER,
                0, triIdxCount * sizeof(GLushort), triIdxArray);
glBufferSubData(GL_ELEMENT_ARRAY_BUFFER,
                triIdxCount * sizeof(GLushort), lineIdxCount * sizeof(GLushort),
                lineIdxArray);
Then, when you're ready to draw, set up all your state, including the index buffer binding (ideally using a VAO for all of the state setup) and then render conditionally:
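glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexBuf);
if (renderTri) {
    // Triangle indices sit at the start of the buffer.
    glDrawElements(GL_TRIANGLES, triIdxCount, GL_UNSIGNED_SHORT, 0);
} else {
    // Line indices follow the triangle indices; the last argument is a byte offset.
    glDrawElements(GL_LINES, lineIdxCount, GL_UNSIGNED_SHORT,
                   (const GLvoid*)(triIdxCount * sizeof(GLushort)));
}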
From a memory usage point of view, having two sets of indices is a moderate amount of overhead. The actual vertex attribute data is normally much bigger than the index data, and the key point here is that the attribute data is not replicated.
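For illustration, here is a minimal CPU-side sketch of how the two index sets could be assembled before uploading them. It assumes a cube with 8 shared corners, numbered so that corner i sits at x = i & 1, y = (i >> 1) & 1, z = (i >> 2) & 1. That numbering and the names CombinedIndices, combineIndices and cubeLineIndices are hypothetical, and std::uint16_t stands in for GLushort:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the combined index data and the two draw ranges.
struct CombinedIndices {
    std::vector<std::uint16_t> data;  // triangle indices followed by line indices
    std::size_t triCount;
    std::size_t lineCount;
    std::size_t lineByteOffset;       // where the line indices start, in bytes
};

inline CombinedIndices combineIndices(const std::vector<std::uint16_t>& tri,
                                      const std::vector<std::uint16_t>& line) {
    CombinedIndices out;
    out.data = tri;
    out.data.insert(out.data.end(), line.begin(), line.end());
    out.triCount = tri.size();
    out.lineCount = line.size();
    out.lineByteOffset = tri.size() * sizeof(std::uint16_t);
    return out;
}

// Edge list for GL_LINES on the assumed 8-corner cube.
inline std::vector<std::uint16_t> cubeLineIndices() {
    std::vector<std::uint16_t> idx;
    for (std::uint16_t i = 0; i < 8; ++i) {
        for (std::uint16_t bit = 1; bit < 8; bit <<= 1) {
            // Connect each corner to the neighbour that differs in exactly one
            // axis, emitting every edge once (from the corner where that bit is clear).
            if ((i & bit) == 0) {
                idx.push_back(i);
                idx.push_back(static_cast<std::uint16_t>(i | bit));
            }
        }
    }
    return idx;  // 12 edges, 24 indices
}
```

A cube that duplicates corners per face (for flat normals) would need a different edge list, but the combining step stays the same.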
If you don't strictly want to render lines, but just have a requirement for wireframe types of rendering, there are other options. There is for example an elegant approach (never implemented it myself, but it looks clever) where you only draw pixels close to the boundary of polygons, and discard the interior pixels in the fragment shader based on the distance to the polygon edge. This question (where I contributed an answer) elaborates on the approach: Wireframe shader - Issue with Barycentric coordinates when using shared vertices.
Since GL_QUADS has been removed from OpenGL 3.1 and above, what is the fastest way to draw lots of quads without using it? I've tried several different methods (below) and have ranked them on speed on my machine, but I was wondering if there is some better way, since the fastest way still seems wasteful and inelegant. I should mention that in each of these methods I'm using VBOs with interleaved vertex and texture coordinates, since I believe that to be best practice (though I may be wrong). Also, I should say that I can't reuse any vertices between separate quads because they will have different texture coordinates.
glDrawElements with GL_TRIANGLE_STRIP using a primitive restart index, so that the index array looks like {0, 1, 2, 3, PRI, 4, 5, 6, 7, PRI, ...}. This takes in the first 4 vertices in my VBO, treats them as a triangle strip to make a rectangle, and then treats the next 4 vertices as a separate strip. The problem here is just that the index array seems like a waste of space. The nice thing about GL_QUADS in earlier versions of OpenGL is that it automatically restarts primitives every 4 vertices. Still, this is the fastest method I can find.
Geometry shader. I pass in 1 vertex for each rectangle and then construct the appropriate triangle strip of 4 vertices in the shader. This seems like it would be the fastest and most elegant, but I've read, and now seen, that geometry shaders are not that efficient compared to passing in redundant data.
glDrawArrays with GL_TRIANGLES. I just draw every triangle independently, reusing no vertices.
glMultiDrawArrays with GL_TRIANGLE_STRIP, an array of all multiples of 4 for the "first" array, and an array of a bunch of 4's for the "count" array. This tells the video card to draw the first 4 starting at 0, then the first 4 starting at 4, and so on. The reason this is so slow, I think, is that you can't put these index arrays in a VBO.
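For reference, the index array for the primitive-restart approach can be generated once and reused for any number of quads. This is a minimal sketch; kRestartIndex and quadStripIndices are hypothetical names, and the restart value is an assumption that has to match whatever is passed to glPrimitiveRestartIndex:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Restart marker chosen for this sketch; it must match the value given to
// glPrimitiveRestartIndex and must not collide with a real vertex index.
constexpr std::uint32_t kRestartIndex = 0xFFFFFFFFu;

// Builds {0, 1, 2, 3, PRI, 4, 5, 6, 7, PRI, ...} without a trailing restart.
inline std::vector<std::uint32_t> quadStripIndices(std::size_t quadCount) {
    std::vector<std::uint32_t> idx;
    if (quadCount == 0) return idx;
    idx.reserve(quadCount * 5 - 1);
    for (std::size_t q = 0; q < quadCount; ++q) {
        if (q != 0) idx.push_back(kRestartIndex);
        for (std::uint32_t corner = 0; corner < 4; ++corner) {
            idx.push_back(static_cast<std::uint32_t>(q * 4) + corner);
        }
    }
    return idx;
}
```

Since the pattern never changes, the array only needs to be as long as the largest batch and can stay in a static element buffer.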
You've covered all the typical good ways, but I'd like to suggest a few less typical ones that I suspect may have higher performance. Based on the wording of the question, I shall assume that you're trying to draw an m*n array of tiles, and they all need different texture coordinates.
A geometry shader is not the right tool to add and remove vertices. It's capable of doing that, but it's really intended for cases when you actually change the number of primitives you're rendering dynamically (e.g. shadow volume generation). If you just want to draw a whole bunch of adjacent different primitives with different texture coordinates, I suspect the absolute fastest way would be to use tessellation shaders. Just pass in a single quad and have the tessellator compute texture coordinates procedurally.
A similar and more portable method would be to look up each quad's texture coordinate. This is trivial: say you're drawing 50x20 quads, you would have a 50x20 texture that stores all your texture coordinates. Tap this texture in your vertex program (or perhaps more efficiently in your geometry program) and send the result in a varying to the fragment program for actual rendering.
Note that in both of the above cases, you can reuse vertices. In the first method, the intermediate vertices are generated on the fly. In the second, the vertices' texture coordinates are replaced in the shader with cached values from the texture.
Context
I am doing swarm simulation using GPU programming (both OpenCL and CUDA, but not at the same time of course) for scientific purposes.
I use OpenGL for display.
Goal
I would like to draw the same object (namely the swarming particle, which can be a simple triangle in 2D) N times at different positions and with different orientations in the most efficient way, knowing that:
the object is always exactly the same
the positions and orientations are calculated on the GPU and thus stored in the GPU memory
the number of particles N can be large
Current solution
So far, to avoid sending the data back to the CPU, I store the position and orientation arrays in a VBO and use:
glBindBuffer(GL_ARRAY_BUFFER, position_vbo);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(2, GL_FLOAT, 0, 0);
glBindBuffer(GL_ARRAY_BUFFER, velocity_vbo);
glEnableClientState(GL_COLOR_ARRAY);
glColorPointer(4, GL_FLOAT, 0, 0);
glDrawArrays(GL_POINTS, 0, N);
glDisableClientState(GL_VERTEX_ARRAY);
glDisableClientState(GL_COLOR_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, 0);
to draw a set of points with color-coded velocity without copying back the arrays to the CPU.
What I would like to do is draw a full object instead of a simple point in a similar way, i.e. without copying the VBOs back to the CPU.
Basically I would like to store the model of an object on the GPU (a display list? a vertex array?) and use the positions and orientations on the GPU to draw the object N times without sending data back to the CPU.
Is it possible and how? Else, how should I do it?
PS: I like keeping the code clean so I would rather separate the display issues from the swarming kernel.
I believe you can do this with a geometry shader (available in OpenGL 3.2). See this tutorial for specific information.
In your case, you need to set the input primitive type of the geometry shader to GL_POINTS and the output type to GL_TRIANGLE_STRIP (geometry shaders can only output points, line strips, or triangle strips), and in your geometry shader, emit the 3 vertices of your triangle for each incoming point vertex.
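The per-point work of such a shader is just a rotation and a translation. Below is a CPU-side C++ reference of that math, not shader code, assuming each particle carries a 2D position and a heading angle in radians. Vec2, particleTriangle, the triangle shape and the size parameter are all hypothetical choices for illustration:

```cpp
#include <array>
#include <cmath>

struct Vec2 {
    float x;
    float y;
};

// Expands one particle (position + heading in radians) into the three corners
// of a triangle pointing along the heading. `size` is a hypothetical scale.
inline std::array<Vec2, 3> particleTriangle(Vec2 pos, float heading, float size) {
    const float c = std::cos(heading);
    const float s = std::sin(heading);
    // Local-space corners: tip on +x, two base corners behind it.
    const Vec2 local[3] = {{size, 0.0f},
                           {-0.5f * size, 0.5f * size},
                           {-0.5f * size, -0.5f * size}};
    std::array<Vec2, 3> out{};
    for (int i = 0; i < 3; ++i) {
        // Rotate by the heading, then translate to the particle position.
        out[i].x = pos.x + c * local[i].x - s * local[i].y;
        out[i].y = pos.y + s * local[i].x + c * local[i].y;
    }
    return out;
}
```

In a geometry shader the same three positions would be emitted as one triangle strip per input point, so the position and orientation buffers never leave the GPU.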
Thus far I have only used glDrawArrays and would like to move over to using an index buffer and indexed triangles. I am drawing a somewhat complicated object with texture coords, normals and vertex coords. All this data is gathered into a single interleaved vertex buffer and drawn using calls similar to (assuming all the setup is done correctly):
glVertexPointer( 3, GL_FLOAT, 22, (char*)m_vertexData );
glNormalPointer( GL_SHORT, 22, (char*)m_vertexData+(12) );
glTexCoordPointer( 2, GL_SHORT, 22, (char*)m_vertexData+(18) );
glDrawElements(GL_TRIANGLES, m_numTriangles, GL_UNSIGNED_SHORT, m_indexData );
Does this allow for m_indexData to also be interleaved with the indices of my normals and texture coords as well as the standard position index array? Or does it assume a single linear list of indices that apply to the entire vertex format (POS, NOR, TEX)? If the latter is true, how is it possible to render the same vertex with different texture coords or normals?
I guess this question could also be rephrased as: if I had 3 separate indexed lists (POS, NOR, TEX) where the latter 2 cannot be rearranged to share the same index list as the first, what is the best way to render that?
You cannot have different indexes for the different lists. When you specify glArrayElement(3), OpenGL is going to take the element at index 3 from every enabled array.
What you can do is play with the pointer you specify, since the place in the list that is eventually accessed is the pointer offset from the start of the list plus the index you specify. This is useful if you have a constant offset between the lists. If the lists are just a random permutation, then this kind of play for every vertex is probably going to be as costly as just using plain old glVertex3fv(), glNormal3fv() and glTexCoord3fv().
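A different route, not the pointer trick described above, is to re-index the data once at load time: every distinct (POS, NOR, TEX) combination becomes its own vertex, and a single index list refers to those combined vertices. Here is a minimal sketch, where Corner, Unified and unifyIndices are hypothetical names and the indices are assumed to fit in 16 bits:

```cpp
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// One corner of a face as stored in formats with separate index lists.
struct Corner {
    std::uint32_t pos;
    std::uint32_t nor;
    std::uint32_t tex;
};

struct Unified {
    std::vector<Corner> vertices;        // unique (pos, nor, tex) combinations
    std::vector<std::uint16_t> indices;  // single index list for glDrawElements
};

// Assumes fewer than 65536 unique combinations so the indices fit in 16 bits.
inline Unified unifyIndices(const std::vector<Corner>& corners) {
    Unified out;
    std::map<std::tuple<std::uint32_t, std::uint32_t, std::uint32_t>,
             std::uint16_t> seen;
    for (const Corner& c : corners) {
        const auto key = std::make_tuple(c.pos, c.nor, c.tex);
        auto it = seen.find(key);
        if (it == seen.end()) {
            // First time this combination appears: emit a new combined vertex.
            const auto newIndex = static_cast<std::uint16_t>(out.vertices.size());
            it = seen.emplace(key, newIndex).first;
            out.vertices.push_back(c);
        }
        out.indices.push_back(it->second);
    }
    return out;
}
```

Each entry of vertices then tells you which position, normal and texture coordinate to copy into the interleaved buffer, so a position that needs two different normals simply appears twice.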
I am having similar trouble attempting to do the same in Direct3D 9.0
For my OpenGL 3 implementation it was rather easy, and my source code is available online if it might help you any...
https://github.com/RobertBColton/enigma-dev/blob/master/ENIGMAsystem/SHELL/Graphics_Systems/OpenGL3/GL3model.cpp