Hi, I have a problem drawing with a VBO, so I asked a question here. I could not find the answer to my problem, but while discussing one of the answers given there I ran into another question, about strides in VBOs. I am confused about what a stride is and what it does.
There I found someone who answered:
If you have all your vertex data in one array (array: read as a malloc'ed pointer), all your normals in another array, etc., then your stride is 0. For instance, if vertices, normals, etc. are stored like this:
[vertex0][vertex1][vertex2]...
[normal0][normal1][normal2]...
[texcoord0][texcoord1][texcoord2]...
If your vertices, normals, etc. are packed like this:
[vertex0][normal0][texcoord0][vertex1][normal1][texcoord1][vertex2][normal2][texcoord2]...
Then you should set a non-null stride, which corresponds to the offset needed to step from one element to the next. (This stride is counted in bytes, by the way.)
From that explanation I thought the stride actually means the distance between the end of one vertex and the start of the next vertex in the buffer. In the first case it is 0 because all the vertices are stored contiguously, and the same goes for the texture coordinates. But then I read another answer in that same thread about the definition of stride.
There tends to be a bit of confusion regarding VBO stride, mostly
because of its special meaning for 0.
"Stride" in this context means the distance between the beginning of a
value in memory, and the beginning of the next value in memory. It is
not the distance between the end of one and the beginning of the next.
So in a VBO that is an array of a structure, the stride for each
element of that structure will be the sizeof the structure as a whole.
Keep in mind that struct padding can affect this.
which says just the opposite of what the other answer says. Or am I wrong about what the first answer meant? Can anyone please help me resolve this? I would really appreciate an answer with an example. I have given the link to my VBO implementation at the start of this question, and it is still unsolved. Thanks.
What the first answer is trying to say is that the "stride" between two elements is the offset in bytes between the beginning of one element and the beginning of the next.
However, if the elements you're passing are contiguous (i.e. there's no space between them), you can pass 0 for the stride parameter.
I would say that it's wrong to claim that "stride is 0" in this case - the stride is sizeof(element), but the value 0 gets special treatment and is taken to mean sizeof(element).
This is most likely done so the poor programmer doesn't have to use two (bug-prone) sizeof parameters in the common case when they are the same.
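To make the two layouts concrete, here is a hedged sketch using glVertexAttribPointer (the Vertex struct, the buffer names and the attribute locations 0/1/2 are assumptions for the example, not taken from the question's code):
#include <stddef.h>   /* offsetof */
/* Interleaved layout: one struct per vertex, so the stride is sizeof(Vertex),
   i.e. the byte distance from the start of one vertex to the start of the next. */
typedef struct {
    float position[3];
    float normal[3];
    float texcoord[2];
} Vertex;
glBindBuffer(GL_ARRAY_BUFFER, interleavedVbo);         /* holds an array of Vertex */
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                      (void*)offsetof(Vertex, position));
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                      (void*)offsetof(Vertex, normal));
glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                      (void*)offsetof(Vertex, texcoord));
/* Separate, tightly packed arrays: stride 0 is shorthand for "one element size",
   here 3 * sizeof(float). */
glBindBuffer(GL_ARRAY_BUFFER, positionVbo);            /* holds only float[3] positions */
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (void*)0);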
I was looking for ways to associate attributes with arbitrary groupings of vertices. At first, instancing appeared to be the only way to accomplish this, but then I stumbled upon this question, and this answer states:
However, what is possible with newer versions of OpenGL is setting the rate at which a certain vertex attribute's buffer offset advances. Effectively this means that the data for a given vertex array gets duplicated to n vertices before the buffer offset for an attribute advances. The function to set this divisor is glVertexBindingDivisor.
(emphasis mine)
Which to me seems as if the answer is claiming I can divide by the number of vertices instead of the number of instances. However, when I look at glVertexBindingDivisor's documentation and compare it to glVertexAttribDivisor's, they both appear to refer to the division taking place over instances and not vertices. For example, glVertexBindingDivisor's documentation states:
glVertexBindingDivisor and glVertexArrayBindingDivisor modify the rate at which generic vertex attributes advance when rendering multiple instances of primitives in a single draw command. If divisor is zero, the attributes using the buffer bound to bindingindex advance once per vertex. If divisor is non-zero, the attributes advance once per divisor instances of the set(s) of vertices being rendered. An attribute is referred to as instanced if the corresponding divisor value is non-zero.
(emphasis mine)
So what is the actual difference between these two functions?
OK, first a little backstory.
As of OpenGL 4.3/ARB_vertex_attrib_binding (AKA: where glVertexBindingDivisor comes from, so this is relevant), VAOs are conceptually split into two parts: an array of vertex formats that describe a single attribute's worth of data, and an array of buffer binding points which describe how to fetch arrays of data (the buffer object, the offset, the stride, and the divisor). The vertex format specifies which buffer binding point its data comes from, so that multiple attributes can get data from the same array (ie: interleaving).
When VAOs were split into these two parts, the older APIs were re-defined in terms of the new system. So if you call glVertexAttribPointer with an attribute index, this function will set the vertex format data for the format at the given index, and it will set the buffer binding state (buffer object, byte offset, etc) for the same index. Now, these are two separate arrays of VAO state data (vertex format and buffer binding); this function is simply using the same index in both arrays.
But since the vertex format and buffer bindings are separate now, glVertexAttribPointer also does the equivalent of saying that the vertex format at index index gets its data from the buffer binding at index index. This is important because that's not automatic; the whole point of vertex_attrib_binding is that a vertex format at one index can use a buffer binding from a different index. So when you're using the old API, it's resetting itself to the old behavior by linking format index to binding index.
Now, what does all that have to do with the divisor? Well, that last step I just described is literally the only difference between the two functions.
glVertexAttribDivisor is the old-style API for setting the divisor. It takes an attribute index, but it acts on state which is part of the buffer binding point (instancing is a per-array construct, not a per-attribute construct now). This means that the function assumes (in the new system) that the attribute at index fetches its data from the buffer binding point at index.
And what I just said is a bit of a lie. It enforces this "assumption" by directly setting the vertex format to use that buffer binding point. That is, it does the same last step as glVertexAttribPointer did.
glVertexBindingDivisor is the modern function. It is not passed an attribute index; it is passed a buffer binding index. As such, it does not change the attribute's buffer binding index.
So glVertexAttribDivisor is exactly equivalent to this:
void glVertexAttribDivisor(GLuint index, GLuint divisor)
{
    glVertexBindingDivisor(index, divisor);
    glVertexAttribBinding(index, index);
}
Obviously, glVertexBindingDivisor doesn't do that last part.
So what is the actual difference between these two functions?
Modern OpenGL has two different APIs for specifying vertex attribute arrays and their properties. The traditional one is glVertexAttribPointer and friends, which glVertexAttribDivisor is also part of.
With ARB_vertex_attrib_binding (in core since GL 4.3), a new API was introduced which separates the vertex format from the pointers. It is expected that switching the data pointers is fast, while switching the vertex format can be more expensive. The new API allows both aspects to be controlled explicitly and separately, while the old API always sets both at once.
For the new API, a new layer of indirection was introduced: the buffer binding points. (See the OpenGL wiki for more details.) glVertexBindingDivisor specifies the attribute instancing divisor for such a binding point, so it is the conceptual equivalent of the glVertexAttribDivisor function in the new API.
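For comparison, a rough sketch of the same per-instance setup in both styles (the buffer name instanceVbo, attribute index 3 and binding index 1 are arbitrary examples, not prescribed values):
/* Old style: one call sets format, buffer binding and the attribute->binding link. */
glBindBuffer(GL_ARRAY_BUFFER, instanceVbo);
glEnableVertexAttribArray(3);
glVertexAttribPointer(3, 4, GL_FLOAT, GL_FALSE, 0, (void*)0);
glVertexAttribDivisor(3, 1);                              /* also re-links attribute 3 to binding 3 */
/* New style (GL 4.3 / ARB_vertex_attrib_binding): format and binding are separate. */
glEnableVertexAttribArray(3);
glVertexAttribFormat(3, 4, GL_FLOAT, GL_FALSE, 0);        /* vertex format of attribute 3 */
glVertexAttribBinding(3, 1);                              /* attribute 3 reads from binding point 1 */
glBindVertexBuffer(1, instanceVbo, 0, 4 * sizeof(float)); /* buffer, offset, stride for binding 1 */
glVertexBindingDivisor(1, 1);                             /* binding point 1 advances once per instance */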
This question is a continuation of this subject:
How to bind thousands of buffers properly
This problem is related to the particle simulation subject.
Let's say I need a global structure that includes:
A 3D matrix (32*32*32) of uints (holding the header ids of a hashed linked list).
A counter that tells me the number of particles in my hashed linked list.
A hashed linked list of particles.
The first idea was to use a 3D texture for the first item, an atomic counter buffer for the second, and an SSB for the third.
Each entry in the SSB is a particle plus a uint whose value points to the location of the next particle in the same voxel.
Nothing magical here.
Now, in order to be space-independent (not bound to a single cubic space) I must be able to pass particles from one cube to the cubes surrounding it. Since I'm in 3D space, that means 27 cubes (front) as input for the physics computation, but also 27 cubes (back) as output, since I may write a particle from one cube (front) to another (back) covering a different part of space.
This leads to a binding requirement of 54 textures, 54 SSBs and 54 atomic counter buffers. While the first two may not be a problem (my hardware limit is around 90 for both), the ACB binding limit is 8.
Assume that having a single ACB containing the particle count of each cube is not easy to maintain (I haven't given it much thought; it may be the solution, but that is not the question here).
CONTEXT:
An SSB can contain anything, so a solution would be to concatenate the three structures (header matrix, counter and linked list) inside one SSB that becomes my cube super-structure.
Before each pass I need to know how many particles are inside the SSB in order to make a proper glDispatchCompute() call.
QUESTION:
Would it be bad to bind the SSB just to read the uint that contains the number of particles?
If not, is one of the two methods below better than the other for accessing the count? Why?
GLuint value;
//1st method
m_pFunctions->glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, m_buffer);
m_pFunctions->glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, OFFSET_TO_VALUE, sizeof(GLuint), &value);
m_pFunctions->glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, 0);
//2nd method
m_pFunctions->glBindBufferRange(GL_SHADER_STORAGE_BUFFER, 0, m_buffer, OFFSET_TO_VALUE, sizeof(GLuint));
m_pFunctions->glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, sizeof(GLuint), &value);
m_pFunctions->glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, 0);
If not, is there a good way to do it, or should I separate the counter from the SSB?
Even though binding the buffer and reading the counter from it is theoretically possible, there is a simpler way, by pushing the "everything-in-one-SSB" concept further.
By reserving space for 3 consecutive uints in the SSB, we can use their values as the dispatch parameters X, Y and Z. X would still be the particle count, while Y and Z would simply be hard-set to 1.
Then instead of glDispatchCompute(), call glDispatchComputeIndirect() after binding the proper SSB to the GL_DISPATCH_INDIRECT_BUFFER target.
However, when using part of an SSB as a fake atomic counter, the buffer variable must be declared with the coherent qualifier "to enforce coherency of memory accesses". Memory barriers should also be issued to achieve visibility between coherent variables.
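A minimal sketch of that idea (the buffer name cubeSsb and the exact layout are assumptions; the point is only to show the indirect dispatch mechanism):
/* The first three uints of the SSB double as the indirect dispatch command:
   { num_groups_x, num_groups_y, num_groups_z } = { particle count, 1, 1 }
   (assuming a local work-group size of 1 in X). */
GLuint init[3] = { 0u, 1u, 1u };
glBindBuffer(GL_SHADER_STORAGE_BUFFER, cubeSsb);
glBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, sizeof(init), init);
/* Compute shaders bump the counter (declared coherent) with atomicAdd; before the
   next pass, make those writes visible to the indirect command processor. */
glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT | GL_COMMAND_BARRIER_BIT);
glBindBuffer(GL_DISPATCH_INDIRECT_BUFFER, cubeSsb);
glDispatchComputeIndirect(0);   /* byte offset 0: the three uints at the start of the SSB */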
I am confused about glDrawElements(). I was following a tutorial which said that the 4th argument of glDrawElements() is the "offset within the GL_ELEMENT_ARRAY_BUFFER". But I was getting the error "Access violation: trying to read 0x0000" when I passed 0 as the offset.
So I dug further into the problem and found that the OpenGL documentation provides two different definitions of the 4th argument:
First:
indices: Specifies a byte offset (cast to a pointer type) into the
buffer bound to GL_ELEMENT_ARRAY_BUFFER to start reading indices
from.
(Found Here: https://www.opengl.org/wiki/GLAPI/glDrawElements)
Second:
indices: Specifies a pointer to the location where the indices are stored.
(Found Here: https://www.opengl.org/sdk/docs/man4/index.php
and Here: http://www.khronos.org/opengles/sdk/docs/man/xhtml/glDrawElements.xml)
Which one is true, and how do I use it correctly?
EDIT: Here is my code: http://pastebin.com/fdxTMjnC
Both are correct. The two cases relate to how the buffer of indices is uploaded to the GL hardware and how it is used to draw. They are described below:
(1) Without a buffer object for the indices (client-side array):
In this case, the parameter is a pointer to the array of indices. Every time glDrawElements is called, the index data is uploaded to the GL hardware.
(2) With a buffer object for the indices:
For this case, see the definition of indices as "Specifies a byte offset (cast to a pointer type) into the buffer bound to GL_ELEMENT_ARRAY_BUFFER to start reading indices from". This means that the data has already been uploaded separately using glBufferData, and the parameter is used only as an offset. Every time glDrawElements is called, the buffer is not uploaded; only the offset can change if required. This makes it more efficient, especially when a large number of vertices is involved.
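A hedged sketch of both cases (buffer names are placeholders; case (1) is legacy client-side usage and is not valid in a core profile):
GLushort quadIndices[] = { 0, 1, 2, 2, 1, 3 };
/* (1) Client-side indices: nothing bound to GL_ELEMENT_ARRAY_BUFFER,
       so the last argument really is a pointer into application memory. */
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, quadIndices);
/* (2) Index buffer object: the indices were uploaded once with glBufferData,
       so the last argument is only a byte offset cast to a pointer. */
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(quadIndices), quadIndices, GL_STATIC_DRAW);
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, (void*)0);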
If you use direct drawing, then
indices defines the byte offset into the index buffer object (bound to GL_ELEMENT_ARRAY_BUFFER, stored in the VAO) at which to begin reading data.
That means you need to create a VAO, bind it, and then use glDrawElements() for rendering.
The interpretation of that argument depends on whether an internal array has been bound in the GL state machine. If such an array is bound, it's an offset. If no such array is bound, it's a pointer to your memory.
Using an internal array is more performant, so recent documentation (especially wikis) is strongly biased toward that usage. However, there's a long song and dance to set them up.
The function that binds or unbinds the internal array is glBindVertexArray. Check the documentation for these related functions:
glGenVertexArrays
glBufferData
(this is an incomplete list. I've got to run so I'll have to edit later.)
Whatever is on opengl.org/wiki is most likely to be correct. And I can tell you from working projects that it is indeed defined as follows:
Specifies a byte offset (cast to a pointer type) into the buffer bound to
GL_ELEMENT_ARRAY_BUFFER to start reading indices from.
Just cast some integer to GLvoid* and you should be good to go. That is at least true for the modern programmable pipeline; I have no idea about the fixed-function pipeline (nor should you use it!).
If you are using the modern pipeline, then without further information I would bet on the function not being loaded correctly, i.e. GLEW not being initialized correctly, or, if you are loading things manually, trying to use GetProcAddress for it on Win32 (glDrawElements has been core since 1.1, and 1.0/1.1 functions shouldn't be loaded with GetProcAddress on Win32).
And indeed, note that you should have bound an element buffer to the binding point, as others have said. In practice you can get away without binding a VAO; I've done this on NVIDIA, AMD and even Intel. But in production you should use VAOs.
I am having a hard time matching up the OpenGL specification (version 3.1, page 27) with common example usage all over the internet.
The OpenGL spec version 3.1 states for DrawElements:
The command
void DrawElements(enum mode, sizei count, enum type, void *indices);
constructs a sequence of geometric primitives by successively transferring the
count elements whose indices are stored in the currently bound element array
buffer (see section 2.9.5) at the offset defined by indices to the GL. The i-th element transferred by DrawElements will be taken from element indices[i] of
each enabled array.
I tend to interpret this as follows:
The indices parameter holds at least count values of type type. Its elements serve as offsets into the actual element buffer. Since for every usage of DrawElements an element buffer must be currently bound, we actually have 2 obligatory sets of indices here: one in the element buffer and another in the indices array.
This would seem somewhat wasteful in most situations, unless one has to draw a model which is defined with an element array buffer but needs its elements sorted back to front due to transparency or the like. But how would we then render with the plain element array buffer (no sorting)?
Now, strangely enough, most examples and tutorials on the internet (here, here half a page down under 'Indexed drawing') give a single integer as the indices parameter, mostly 0, sometimes (void*)0. It is always only a single integer offset, clearly no array for the indices parameter!
I have used the last variant (passing a single pointerized integer for indices) successfully with some NVIDIA graphics cards, but I get crashes on Intel onboard chips. And I am wondering who is wrong: me, the spec, or the thousands of examples? What are the correct parameters and usage of DrawElements? If the single integer is allowed, how does this square with the spec?
You're tripping over the legacy that glDrawElements has carried ever since OpenGL 1.1. Back then there were no VBOs, only client-side arrays, and the program would actually pass a pointer (= an array in C terms) of indices into the arrays set with the gl…Pointer functions.
Now, with index buffers, the parameter is actually just an offset into the server-side buffer. You might be very interested in this SO question: What is the result of NULL + int?
I also gave an exhaustive answer there; I strongly recommend reading https://stackoverflow.com/a/8284829/524368
What I wrote about function signatures and typecasts also applies to glDraw… calls.
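Concretely, drawing from somewhere in the middle of a bound index buffer looks something like this (indexCount is a placeholder; the cast expresses a byte offset, not a real pointer):
/* Skip the first 100 indices of the buffer bound to GL_ELEMENT_ARRAY_BUFFER. */
glDrawElements(GL_TRIANGLES, indexCount - 100, GL_UNSIGNED_INT,
               (const void*)(100 * sizeof(GLuint)));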
I have read and seen other questions that all generally point to the suggestion of interleaving vertex positions, colors, etc. into one array, as this minimizes the data that gets sent from the CPU to the GPU.
What I'm not clear on is how OpenGL does this when, even with an interleaved array, you must still make separate GL calls for the position and color pointers. If both pointers use the same array, just set to start at different points in that array, does the draw call not copy the array twice, since it is the target of two different pointers?
This is mostly about the cache. For example, imagine we have 4 vertices and 4 colors. You can provide the information this way (excuse me, I don't remember the exact function names):
glVertexPointer(..., vertex);
glColorPointer(..., colors);
What it does internally is read vertex[0], then apply colors[0], then again vertex[1] with colors[1]. As you can see, if the vertex array is, for example, 20 megabytes long, vertex[0] and colors[0] will be, to say the least, 20 megabytes apart from each other.
Now, on the other hand, if you provide a structure like { vertex0, color0, vertex1, color1, etc. }, there will be a lot of cache hits because vertex0 and color0 are together, and so are vertex1 and color1.
Hope this helps answer the question
edit: on second read, I may not have answered the question. You are probably wondering how OpenGL knows which values to read from that structure. Like I said before, with a structure such as { vertex, color, vertex, color } you tell OpenGL that vertex data starts at position 0 with a stride of 2 (so the next one is at position 2, then 4, etc.), and color data starts at position 1, also with a stride of 2 (so position 1, then 3, etc.).
addition: In case you want a more practical example, look at this link: http://www.lwjgl.org/wiki/index.php?title=Using_Vertex_Buffer_Objects_(VBO). You can see there how it provides the buffer only once and then uses offsets to render efficiently.
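To make that concrete in terms of the gl*Pointer calls used above, here is a hedged sketch of an interleaved { x,y,z, r,g,b } buffer (the buffer name and vertex count are placeholders): the stride is the byte distance between consecutive vertices, and each pointer starts at its attribute's byte offset within the first vertex.
GLsizei stride = 6 * sizeof(GLfloat);                /* bytes from one vertex to the next */
glBindBuffer(GL_ARRAY_BUFFER, interleavedVbo);
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_COLOR_ARRAY);
glVertexPointer(3, GL_FLOAT, stride, (void*)0);                     /* positions start at byte 0 */
glColorPointer (3, GL_FLOAT, stride, (void*)(3 * sizeof(GLfloat))); /* colors start 12 bytes in  */
glDrawArrays(GL_TRIANGLES, 0, vertexCount);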
I suggest reading: Vertex_Specification_Best_Practices
h4lc0n provided quite a nice explanation, but I would like to add some additional info:
interleaved data can actually hurt performance when your data changes often. For instance, when you change the positions of point sprites, you update POS, but COLOR and TEXCOORD usually stay the same; when the data is interleaved you must "touch" that additional data as well. In that case it would be better to have one VBO for POS only (or, in general, for the data that changes often) and a second VBO for the data that is constant (see the sketch after these points).
it is not easy to give strict rules about VBO layout, since it is very vendor/driver specific. Also, your usage may differ from others'. In general you need to run some benchmarks for your particular test cases.
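As a rough sketch of the "separate VBO for frequently changing data" point above (buffer names, sizes and usage hints are just an example):
/* Positions change every frame -> their own VBO with a dynamic usage hint. */
glBindBuffer(GL_ARRAY_BUFFER, posVbo);
glBufferData(GL_ARRAY_BUFFER, vertexCount * 3 * sizeof(GLfloat), NULL, GL_DYNAMIC_DRAW);
/* Colors and texcoords never change -> a second VBO, uploaded once as static data. */
glBindBuffer(GL_ARRAY_BUFFER, staticVbo);
glBufferData(GL_ARRAY_BUFFER, vertexCount * 6 * sizeof(GLfloat), staticData, GL_STATIC_DRAW);
/* Per frame: only the position buffer is re-uploaded. */
glBindBuffer(GL_ARRAY_BUFFER, posVbo);
glBufferSubData(GL_ARRAY_BUFFER, 0, vertexCount * 3 * sizeof(GLfloat), newPositions);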
You could also make an argument for separating different attributes. Assuming a GPU does not process one vertex after another but rather a bunch of them (e.g. 16) in parallel, you would get something like this while executing a vertex shader:
read attribute A for all 16 vertices
perform some computations
read attribute B for all 16 vertices
perform some more computations
....
So you read one attribute for many vertices at once. From this reasoning it would seem that interleaving the attributes actually hurts performance. Of course, this would only be visible if you are either bandwidth constrained or if the memory latency cannot be hidden for some reason (e.g. a complex shader that requires many registers will reduce the number of vertices that can be in flight at a given time).