I'm a bit confused as to why you can set an array of vertex buffers active, but only one index buffer. Could that one index buffer address the vertices from all the vertex buffers? And if so, how would I specify which buffer each index belongs to?
Another question I have: since I'm using indexed triangle lists, the index data is roughly the same size as the vertex data per mesh. I was thinking of creating one index buffer per vertex buffer, dynamically adding meshes until one of the buffers runs out of space, at which point another pair is created. Inevitably, one buffer in the pair will fill up before the other, and that leftover space will never get used. Does that space actually get marked as reserved on the GPU?
For instance, could I fit four buffers that contain 32MB of data each, but are created with a byte width of 64MB, into 128MB of VRAM?
The same indices must be used with all vertex buffers at the same time. The purpose of this is to allow different vertex buffers to contain different components of the vertex data. For example, you might decide to store the positions in one vertex buffer and the texture coordinates in a second buffer. The zeroth index would access the first position from the first vertex buffer and the first texture coordinate from the second.
This would save bandwidth if you wanted to update the texture coordinates every frame but never change the positions.
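As a rough Direct3D 11 sketch of that split (the buffer names, formats and strides are illustrative assumptions, not something from the question), the input layout assigns each attribute to its own input slot and both buffers are bound at once:

// Positions come from slot 0, texture coordinates from slot 1.
D3D11_INPUT_ELEMENT_DESC layout[] =
{
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT,    1, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};

// Bind both buffers; a single index buffer then indexes both streams in lockstep.
ID3D11Buffer* buffers[2] = { positionBuffer, texcoordBuffer }; // placeholder buffers
UINT strides[2] = { 3 * sizeof(float), 2 * sizeof(float) };
UINT offsets[2] = { 0, 0 };
context->IASetVertexBuffers(0, 2, buffers, strides, offsets);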
Multiple vertex buffers are also used in instancing.
When you create a vertex or index buffer, you specify the size of the buffer. This amount of memory is then reserved in video RAM and cannot be used by anything else.
So if I understand your question correctly: no, you can't fit four 64MB buffers into 128MB of RAM.
Related
I used to have many VBOs, but I've now combined them into one and just index into it depending on what I draw. The issue is that when I load vertices into the VBO and keep a record of the byte offset at which they start, that offset doesn't match the argument to glDrawArrays, whose offset argument is "first", an index rather than a byte offset. If all the vertices in the VBO have the same layout, am I supposed to do a division? For example, if sizeof(Vertex) == 12 and the mesh is placed in the VBO at byte offset 48, do I need to call glDrawArrays(GL_TRIANGLE_STRIP, 48 / sizeof(Vertex), 4) to draw four vertices?
What if it's a big buffer that contains two different vertex types with different layouts? As buffer regions are freed to make space for others, and they're moved around or resized, how is this supposed to be done?
To set up the vertices I call glVertexAttribPointer and glEnableVertexAttribArray four or five times; I was trying to avoid calling these for each draw call.
If the vertex format has changed, then (using the old API) you would have to make some number of glVertexAttribPointer calls (possibly as part of a new VAO you're going to bind) in order to change the format of the vertex data. Either way, calling this function gives you the opportunity to change the byte offset at which each attribute starts in the buffer.
So if your buffer contains 256 bytes of data in layout 1, followed by 256 bytes of data in layout 2, then the "pointer" you provide as the byte offset when setting up layout 2 should be offset by 256: the starting offset for the new vertex data.
That way, your glDrawArrays function would take an index of 0 for both meshes, since any offsetting is part of the vertex format.
And FYI yes, the "first" parameter to glDrawArrays is an index, not a byte offset.
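To make that concrete, here is a minimal sketch (Vertex1/Vertex2 and the vertex counts are made-up names; the 256-byte split is just the example above):

// One VBO: bytes [0, 256) hold vertices in layout 1, bytes [256, ...) hold layout 2.
glBindBuffer(GL_ARRAY_BUFFER, vbo);

// Layout 1: base byte offset 0, so drawing starts at index 0.
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex1), (void*)0);
glEnableVertexAttribArray(0);
glDrawArrays(GL_TRIANGLE_STRIP, 0, firstMeshVertexCount);

// Layout 2: same attribute, but the "pointer" now starts at byte 256.
// The draw call again starts at index 0 because the offset is baked into the format.
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex2), (void*)256);
glDrawArrays(GL_TRIANGLE_STRIP, 0, secondMeshVertexCount);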
I was trying to avoid calling these for each draw call.
If you want to avoid that, then you need to sort your models by vertex format, so that you render all models that use one format, then all models that use another. Layout changes aren't cheap, so if performance matters, pick a small pool of vertex formats and make your meshes conform to them.
My current approach to removing vertices from the buffer is to copy the portions of the buffer before and after the section of vertices I want to remove into a new buffer.
However, this process becomes very slow as the buffer size increases.
So is there a way to just remove the vertices and have the vertices after them take their place in the buffer?
Let's say I create two vertex buffers, for two different meshes.
(I'm assuming creating separate buffers for separate meshes is how it's usually done)
Now, let's say I want to draw one of the meshes using an index buffer.
Looking at the book Practical Rendering and Computation with Direct3D 11, it doesn't seem like the creation of an index buffer references a vertex buffer in any way, so how does the index buffer know (during input assembly) which vertex buffer to act on?
I've done some googling without answers, which leads me to assume there's something obvious about it that I'm missing.
You are right: index buffers do not reference specific vertex buffers. During DrawIndexed, the currently bound index buffer is used to supply indices into the currently bound vertex buffers (the ones you set with IASetIndexBuffer/IASetVertexBuffers).
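A minimal sketch of that (buffer and count names are placeholders): whatever happens to be bound when DrawIndexed is issued is what the indices address.

UINT stride = sizeof(Vertex); // your vertex struct
UINT offset = 0;
context->IASetVertexBuffers(0, 1, &vertexBuffer, &stride, &offset);
context->IASetIndexBuffer(indexBuffer, DXGI_FORMAT_R16_UINT, 0);
context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
context->DrawIndexed(indexCount, 0, 0);

// Bind a different vertex buffer and the very same index buffer is reused
// against the new vertices on the next draw.
context->IASetVertexBuffers(0, 1, &otherVertexBuffer, &stride, &offset);
context->DrawIndexed(indexCount, 0, 0);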
Indeed, Index Buffers and Vertex Buffers are completely independent.
The index buffer only "knows about" the vertex buffer at draw time (i.e. when both are bound to the pipeline).
You can think of the index buffer as a lookup table, where you keep a list of element indices to draw.
That also means you can attach two completely "logically unrelated" buffers to the pipeline and draw; nothing will prevent you from doing that, but you will of course get some strange visual results.
Decoupling the two has many advantages; here are a few examples:
You can reuse an index buffer (for example, two displaced grids with identical resolution can share the same index buffer). That can be a decent memory gain.
You can draw your vertex buffer on its own and do some processing per vertex (draw a point list for sprites, for example, or apply skinning/displacement into a Stream Output buffer, then draw the resulting vertex buffer using DrawIndexed).
Both vertex and index buffers can also be bound as a ByteAddressBuffer, so you can process your geometry in a compute shader and build another, optimized index buffer (with culled triangles, for example), then issue the indexed draw with the optimized buffer. Applying those culls to indices instead of vertices is often faster, as you move much less memory around.
This is a niche case, but sometimes I have to draw a mesh as a set of triangles and then draw it again as a set of lines (some form of wireframe). If, for example, you take a single box, you will not want to draw the diagonals as lines, so I keep a shared vertex buffer with the box vertices, then one index buffer dedicated to the triangle list and another to the line list (sketched below). For large models, this can also be an effective memory gain.
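A rough sketch of that box setup (the index values depend on how you order the eight corners, so treat them as illustrative):

// One shared vertex buffer with the 8 box corners, two index buffers.
unsigned short triangleIndices[36] = { 0,1,2,  2,1,3,  /* ...remaining 10 triangles... */ };
unsigned short lineIndices[24]     = { 0,1, 1,3, 3,2, 2,0, /* ...remaining 8 edges... */ };

// Solid pass.
context->IASetIndexBuffer(triangleIB, DXGI_FORMAT_R16_UINT, 0);
context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
context->DrawIndexed(36, 0, 0);

// Wireframe pass over the same vertex buffer: only the 12 edges, no diagonals.
context->IASetIndexBuffer(lineIB, DXGI_FORMAT_R16_UINT, 0);
context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_LINELIST);
context->DrawIndexed(24, 0, 0);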
So I have a system (using OpenGL 4.x) where I am receiving a stream of points (potentially with color and/or normal) from an external source, and I need to draw these points as GL_POINTS, running custom switchable shaders for coloring (the color could be procedurally generated, or come from vertex color or normal direction).
The stream delivers a group of points (with or without normal or color) of an arbitrary count (typically 1k to 70k points) at a fairly regular interval (4 to 10 Hz). I need to add these points to the ones I already have and draw all the points received so far.
I am guaranteed that my vertex type will not change; I am told at the beginning of the stream which one to expect, so I am using an interleaved vertex with either pos+normal+color, pos+normal, pos+color, or just pos.
My current solution is to allocate interleaved vertex VBOs (with surrounding VAOs) of the appropriate vertex type, at a config-file-specified maximum vertex count (allocated with the DYNAMIC hint).
As new points come in, I fill up my current non-full VBO via glBufferSubData. I keep a count (activePoints) of how many vertices the current frontier VBO holds so far, and use glBufferSubData to fill a range starting at activePoints. If the current update group has more vertices than fit in the frontier buffer (since I limit the vertex count per VBO), I allocate a new VBO and fill the range starting at 0 and ending with the number of points left over from the update group (the ones not added to the last buffer). If I still have points left, I repeat this. It is rare that an update group straddles more than two buffers.
When rendering, I draw all my VBOs except the frontier one with glDrawArrays(m_DrawMode, 0, numVertices), where numVertices equals the maximum allowed buffer size, and the frontier buffer with glDrawArrays(m_DrawMode, startElem, numElems) to account for it not being completely filled with valid vertices.
Of course, at some point I will have more points than I can draw interactively, so I have an LRU mechanism that deallocates the oldest sets of VBOs (according to the LRU policy) as needed.
Is there a more optimal method for doing this? Buffer orphaning? Streaming hint? Map vs SubData? Something else?
The second issue is that I am now asked to remove points (at irregular intervals), ranging from 10 to 2000 at a time, but these points are irregularly spaced within the order I received them. I can find out which offsets in which buffers they currently live at, but it's more of a scattering than a range. I have been "removing" them by finding their offsets in the right buffers and calling glBufferSubData one by one with a range of 1 (it's rare that they are next to each other in a buffer), changing their position to somewhere far away where they will never be seen. Eventually, I guess, buffers should be deleted as these remove requests add up, but I don't currently do that.
What would be a better way to handle that?
Mapping may be more efficient than glBufferSubData, especially when having to "delete" points. Explicit flush may be of particular help. Also, mapping allows you to offload the filling of a buffer to another thread.
Be positively sure to get the access bits correct (or performance will be abysmal); in particular, do not map a region for reading if all you do is write.
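For example, a write-only mapping with explicit flushing might look roughly like this (the buffer name, offsets and sizes are placeholders):

glBindBuffer(GL_ARRAY_BUFFER, frontierVbo);
void* dst = glMapBufferRange(GL_ARRAY_BUFFER, writeOffsetBytes, writeSizeBytes,
                             GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT |
                             GL_MAP_FLUSH_EXPLICIT_BIT);       // note: no READ bit
memcpy(dst, newPoints, writeSizeBytes);        // this copy could run on another thread
glFlushMappedBufferRange(GL_ARRAY_BUFFER, 0, writeSizeBytes);  // offset is relative to the mapped range
glUnmapBuffer(GL_ARRAY_BUFFER);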
Deleting points from a vertex buffer is not easily possible, as you probably know. For "few" points (e.g. 10 or 20) I would just set w = 0, which moves them to infinity, and keep drawing the whole thing as before. If your far clip plane is not at infinity, this will simply discard them. With explicit flushing, you would not even need to keep a separate copy in memory.
For "many" points (e.g. 1,000), you may consider using glCopyBufferSubData to remove the "holes". Moving memory on the GPU is fast, and for thousands of points it's probably worth the trouble. You then need to maintain a count for every vertex buffer, so you draw fewer points after removing some.
To "delete" entire vertex buffers, you should just orphan them (and reuse). OpenGL will do the right thing on its own behalf then, and it's the most efficient way to keep drawing and reusing memory.
Using glDrawElements instead of glDrawArrays, as suggested in Andon M. Coleman's comment, is usually good advice, but it will not help you in this case. The reason one would want to do that is that the post-transform cache works by tagging vertices by their index, so drawing elements takes advantage of the post-transform cache whereas drawing arrays does not. However, the post-transform cache is only useful on connected geometry such as triangle lists or triangle strips. You're drawing points, so you will not use the post-transform cache in any case -- but using indices increases memory bandwidth both on the GPU and on the PCIe bus.
I was just looking at my animated sprite code and got an idea.
The animation is done by altering texture coords. There is a buffer object that holds the current frame's texture coords; when a new frame is requested, new texture coords are fed into the buffer with glBufferData().
What if we pre-calculate the texture coords for all animation frames, put them in the buffer object, and create an index buffer object containing just the number of the frame we need to draw?
GLbyte cur_frames = 0; //1,2,3 etc
Then, when we need to update the animation, all we have to update is 1 byte of our IBO with glBufferData (instead of 4 [quad vertex count] * 2 [s, t] * sizeof(GLfloat) bytes for a quad drawn with TRIANGLE_STRIP); we don't need to hold any texture coords after initializing our BO.
Am I missing something? What are the cons?
Edit: of course your vertex data may not be GLfloat; that's just an example.
As Tim correctly states, this depends on your application. Let's talk some numbers: you mention both IBOs and inserting the texture coordinates for all frames into one VBO, so let's take a look at the impact of each.
Suppose a typical vertex looks like this:
struct vertex
{
    float x, y, z;   // position
    float tx, ty;    // texture coordinates
};
I added a z-component, but the calculations are similar if you don't use it, or if you have more attributes. So it is clear each vertex takes 20 bytes.
Let's assume a simple sprite: a quad consisting of 2 triangles. In a very naive approach you just send 2x3 = 6 vertices, i.e. 6*20 = 120 bytes, to the GPU.
In comes indexing: you actually have only four vertices, 1, 2, 3, 4, and two triangles, 1,2,3 and 2,3,4. So we send two buffers to the GPU: one containing the 4 vertices (4*20 = 80 bytes) and one containing the list of indices for the triangles ([1,2,3,2,3,4]). Let's say we can use 2 bytes per index (65535 indices should be enough), so that comes down to 6*2 = 12 bytes. In total 92 bytes; we saved 28 bytes, or about 23%. Also, when rendering, the GPU is likely to process each vertex only once in the vertex shader, which saves us some processing power as well.
So, now you want to add the texture coordinates for all animation frames at once. The first thing to note is that a vertex in indexed rendering is defined by all its attributes; you can't split it into an index for positions and an index for texture coordinates. So if you want to add extra texture coordinates, you will have to repeat the positions. Each "frame" that you add will therefore add 80 bytes to the VBO and 12 bytes to the IBO. Suppose you have 64 frames; you end up with 64*(80+12) = 5888 bytes. Say you have 1000 sprites, then this becomes about 6MB. That does not seem too bad, but note that it scales quite rapidly: each frame adds to the size, but so does each attribute (because they all have to be repeated).
So, what does it gain you?
You don't have to send data to the GPU dynamically. Note that updating the whole VBO for a sprite would require sending 80 bytes, or 640 bits. Suppose you need to do this for 1000 sprites per frame at 30 frames per second: you get to 19,200,000 bps, or 19.2 Mbps (no overhead included). This is quite low (e.g. 16x PCI-e can handle 32 Gbps), but it could be worthwhile if you have other bandwidth issues (e.g. due to texturing). Also, if you construct your VBOs carefully (e.g. separate or non-interleaved VBOs), you could reduce this to updating only the texture-coordinate part, which is 32 bytes per sprite in the above example (4 vertices * 2 floats * 4 bytes); that could reduce bandwidth even more.
You don't have to waste time computing the next frame's texture coordinates. However, this is usually just a few additions and a few ifs to handle the edges of your texture. I doubt you will gain much CPU time here.
Finally, you also have the possibility to simply split the animation image over a lot of textures. I have absolutely no idea how this scales, but in this case you don't even have to work with more complex vertex attributes; you just bind a different texture for each frame of the animation.
Edit: another method would be to pass the frame number in a uniform and do the calculations in your fragment shader, before sampling. Setting a single integer uniform shouldn't be that much of an overhead.
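On the CPU side that is just a couple of calls per frame (the uniform name u_frame is made up; the shader then offsets into the sprite sheet based on it):

GLint frameLoc = glGetUniformLocation(program, "u_frame"); // cache this, don't query every frame
glUseProgram(program);
glUniform1i(frameLoc, currentFrame);
// In the shader, something like: u = (baseU + float(u_frame)) / float(FRAME_COUNT)
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, 0);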
For a modern GPU, accessing/unpacking single bytes is not necessarily faster than accessing integer types or even vectors (register sizes, load instructions, etc.). You only save memory and therefore memory bandwidth, but I wouldn't expect this to make much of a difference relative to all the other vertex attribute array accesses.
I think the fastest way to supply a frame index for animated sprites is either a uniform or, if multiple sprites have to be rendered with one draw call, instanced vertex attrib arrays. With the latter, you can provide a single index for fixed-size subsequences of vertices.
For example, when drawing 'sprite-quads', you'd have one frame index fetch per 4 vertices.
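A sketch of that with an instanced attribute (attribute location 3 and the buffer name are arbitrary), so one index is fetched per quad instance rather than per vertex:

glBindBuffer(GL_ARRAY_BUFFER, frameIndexVbo);     // one GLuint frame index per sprite
glVertexAttribIPointer(3, 1, GL_UNSIGNED_INT, 0, (void*)0);
glEnableVertexAttribArray(3);
glVertexAttribDivisor(3, 1);                      // advance once per instance, not per vertex

// Draw all sprites in one call: 4 vertices per quad, one instance per sprite.
glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, spriteCount);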
A third approach would be a buffer-texture, when using instanced rendering.
I recommend a global (shared) uniform for the time/frame index, so you can calculate the animation index on the fly within your shader; this doesn't require you to update the index buffer (which then just represents the relative animation state among the sprites).