OpenGL single VBO vs multiple VBOs - c++

Currently, in my rendering engine, I have a VBO for each mesh data (1 VBO for vertices, 1 VBO for normals, 1 VBO for texture coordinates, 1 VBO for tangents and 1 VBO for bitangents) and all of them are bound together with a VAO.
I'm now thinking about changing the system to hold a single VBO containing all the mesh data (vertices, normals, etc.), but how much will I gain from this in terms of speed and utility? (I may not have all the data, and may provide only vertices and normals if my mesh isn't textured.)

You'll be seeking to reduce overall memory bandwidth. If your buffer object contains all of your attributes interleaved together, then your entire array object references only one single contiguous section of memory, which is much easier for a memory subsystem to cache. It's exactly the same principle as for CPUs: the more local consecutive memory accesses are, the faster they're likely to be.
There is also a potential detriment: the general rule is that you should align your elements to whichever is the greater of the size of the element and four bytes, which leads to some wasted space. But the benefit almost always outweighs the detriment.
Obviously the only thing that will be affected is the time the GPU takes to fetch the vertices. If you're tessellation or fill bound you won't immediately see any improvement.
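A minimal sketch of the interleaving described above, assuming float-only attributes and a mesh where normals and texture coordinates are optional. The names MeshArrays, InterleavedLayout, computeLayout and interleave are hypothetical and are not part of OpenGL; the code only builds the CPU-side array and reports the stride and byte offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mesh description: positions are required, the other
// attributes are optional (an empty vector means "not present").
struct MeshArrays {
    std::vector<float> positions;  // 3 floats per vertex
    std::vector<float> normals;    // 3 floats per vertex, or empty
    std::vector<float> texcoords;  // 2 floats per vertex, or empty
};

// Byte layout of one interleaved vertex; an offset of -1 means "absent".
struct InterleavedLayout {
    std::size_t stride = 0;
    std::ptrdiff_t positionOffset = -1;
    std::ptrdiff_t normalOffset = -1;
    std::ptrdiff_t texcoordOffset = -1;
};

InterleavedLayout computeLayout(const MeshArrays& m) {
    InterleavedLayout layout;
    std::size_t offset = 0;
    layout.positionOffset = static_cast<std::ptrdiff_t>(offset);
    offset += 3 * sizeof(float);
    if (!m.normals.empty()) {
        layout.normalOffset = static_cast<std::ptrdiff_t>(offset);
        offset += 3 * sizeof(float);
    }
    if (!m.texcoords.empty()) {
        layout.texcoordOffset = static_cast<std::ptrdiff_t>(offset);
        offset += 2 * sizeof(float);
    }
    layout.stride = offset;  // every attribute is 4-byte aligned already
    return layout;
}

// Builds one contiguous array: [pos, normal?, uv?] repeated per vertex.
std::vector<float> interleave(const MeshArrays& m) {
    assert(m.positions.size() % 3 == 0);
    const std::size_t vertexCount = m.positions.size() / 3;
    assert(m.normals.empty() || m.normals.size() == vertexCount * 3);
    assert(m.texcoords.empty() || m.texcoords.size() == vertexCount * 2);

    std::vector<float> out;
    out.reserve(vertexCount * computeLayout(m).stride / sizeof(float));
    for (std::size_t v = 0; v < vertexCount; ++v) {
        for (std::size_t i = 0; i < 3; ++i) out.push_back(m.positions[3 * v + i]);
        if (!m.normals.empty())
            for (std::size_t i = 0; i < 3; ++i) out.push_back(m.normals[3 * v + i]);
        if (!m.texcoords.empty())
            for (std::size_t i = 0; i < 2; ++i) out.push_back(m.texcoords[2 * v + i]);
    }
    return out;
}
```

The resulting array would be uploaded to a single VBO, and the stride and offsets would be passed as the stride and pointer arguments of glVertexAttribPointer for each attribute that is present.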


How to efficiently align vertices in VBO for indexed rendering?

I have a VBO of 1 050 625 vertices representing a height map. I draw the mesh with GL_TRIANGLE_STRIP by frustum-culled chunks of 32*32 cells with indexed rendering.
Should I care about how my vertices are aligned in the VBO in terms of performance? I mean is there any information about how distance between different elements affects performance, like: [100,101,102] or [10,1017,2078]?
The distance between indices determines which memory positions are read. The effect is related to caching: if a position is not in the current cache, it must be read from main memory.
At least theoretically. In practice, it depends on the hardware and the driver implementation. Cache size and bus speed have an influence.
As a starting point, anything smaller than a few MB should be the quickest solution.
In any case, when performance matters, the only real way to know is to benchmark the different options, on different hardware if possible.
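To make the two index patterns from the question concrete, here is a toy model that counts memory-line misses for a sequence of indices. It is only a sketch: the LRU policy, the line size and the number of cached lines are assumptions chosen for illustration, and real GPUs may behave quite differently. The function name countLineMisses is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Counts how many vertex fetches miss a tiny LRU cache of memory lines.
// Assumes a vertex never straddles two lines, which holds when
// lineSizeBytes is a multiple of vertexSizeBytes.
std::size_t countLineMisses(const std::vector<std::uint32_t>& indices,
                            std::size_t vertexSizeBytes,
                            std::size_t lineSizeBytes,
                            std::size_t cachedLines) {
    assert(vertexSizeBytes > 0 && lineSizeBytes > 0 && cachedLines > 0);
    std::deque<std::size_t> lru;  // front = most recently used line
    std::size_t misses = 0;
    for (std::uint32_t index : indices) {
        const std::size_t line =
            (static_cast<std::size_t>(index) * vertexSizeBytes) / lineSizeBytes;
        auto it = std::find(lru.begin(), lru.end(), line);
        if (it != lru.end()) {
            lru.erase(it);  // hit: line will be moved to the front
        } else {
            ++misses;       // miss: evict the least recently used line if full
            if (lru.size() == cachedLines) lru.pop_back();
        }
        lru.push_front(line);
    }
    return misses;
}
```

With 32-byte vertices and 128-byte lines, the indices [100,101,102] all fall in the same line, while [10,1017,2078] touch three different lines. A model like this only hints at the trend; benchmarking on the target hardware remains the real test.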

reduced vertex buffer with indexed triangles

In my OpenGL program I have a huge vertex buffer with data (normals, position, texcoords) for 2048x2048 points.
In each frame I reduce my index buffer with a LOD algorithm and bind GL_ELEMENT_ARRAY_BUFFER again.
I wonder if it makes sense to additionally reduce the vertex buffer as well, so that it only contains the used vertices from the index buffer.
So the question is, if there is a performance gain even with rebuilding and rebinding the vertex array per frame.
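For reference, the vertex-buffer reduction described in the question could look roughly like the following sketch. The Vertex and CompactMesh types and the compact function are hypothetical; the code only shows the CPU-side remapping and does not touch OpenGL.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical vertex format matching the attributes named in the question.
struct Vertex {
    float position[3];
    float normal[3];
    float texcoord[2];
};

struct CompactMesh {
    std::vector<Vertex> vertices;        // only the vertices that are referenced
    std::vector<std::uint32_t> indices;  // indices rewritten to the new order
};

// Keeps only the vertices referenced by `indices`, in first-use order.
CompactMesh compact(const std::vector<Vertex>& vertices,
                    const std::vector<std::uint32_t>& indices) {
    const std::uint32_t unused = 0xFFFFFFFFu;
    std::vector<std::uint32_t> remap(vertices.size(), unused);
    CompactMesh out;
    out.indices.reserve(indices.size());
    for (std::uint32_t oldIndex : indices) {
        assert(oldIndex < vertices.size());
        if (remap[oldIndex] == unused) {
            remap[oldIndex] = static_cast<std::uint32_t>(out.vertices.size());
            out.vertices.push_back(vertices[oldIndex]);
        }
        out.indices.push_back(remap[oldIndex]);
    }
    return out;
}
```

This pass costs time proportional to the number of vertices plus indices on the CPU, and the compacted array still has to be uploaded each frame, so whether it is a net gain would need to be measured.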

Use of Vertex Array Objects and Vertex Buffer Objects

I am trying to understand these two, how to use them and how they are related. Let's say I want to create a simple terrain and a textured cube. For both objects I have the array of triangles vertices and for the cube I have an array containing the texture's data. My question is: how do I use VAOs and VBOs to create and render these two?
Would I have to create a VAO and VBO for each object?
Or should I create a VAO for each object's VBO (vertices, texture data, etc.)?
There are many tutorials and books but I still don't get the very idea of how these concepts must be understood and used.
Fundamentally, you need to understand two things:
Vertex Array Objects (VAOs) are conceptually nothing but thin state wrappers.
Vertex Buffer Objects (VBOs) store actual data.
Another way of thinking about this is that VAOs describe the data stored in one or more VBOs.
Think of VBOs (and buffer objects in general) as unstructured arrays of data stored in server (GPU) memory. You can layout your vertex data in multiple arrays if you want, or you can pack them into a single array. In either case, buffer objects boil down to locations where you will store data.
Vertex Array Objects track the actual pointers to VBO memory needed for draw commands.
They are a little bit more sophisticated than pointers as you would know them in a language like C, however. Vertex pointers keep track of the buffer object that was bound when they were specified, the offset into its address space, stride between vertex attributes and how to interpret the underlying data (e.g. whether to keep integer values or to convert them to floating-point [0.0,1.0] by normalizing to the data type's range).
For example, integer data is usually converted to floating-point, but it is the command you use to specify the vertex pointer (glVertexAttribPointer (...) vs. glVertexAttribIPointer (...)) that determines this behavior.
Vertex Array Objects also track the buffer object currently bound to GL_ELEMENT_ARRAY_BUFFER.
GL_ELEMENT_ARRAY_BUFFER is where the command: glDrawElements (...) sources its list of indices from (assuming a non-zero binding) and there is no glElementArrayPointer (...) command. glDrawElements (...) combines the pointer and draw command into a single operation, and will use the binding stored in the active Vertex Array Object to accomplish this.
With that out of the way, unless your objects share vertex data you are generally going to need a unique set of VBOs for each.
You can use a single VAO for your entire software if you want, or you can take advantage of the fact that changing the bound VAO changes nearly the entire set of states necessary to draw different objects.
Thus, drawing your terrain and cube could be as simple as changing the bound VAO. You may have to do more than that if you need to apply different textures to each of them, but the VAO takes care of all vertex data related setup.
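The state tracking described above can be pictured with a small conceptual model. This is a sketch, not the OpenGL API: every type and function here is hypothetical, the limit of 16 attributes is an arbitrary assumption, and plain integers stand in for buffer object names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Conceptual model only: these types are hypothetical, not the OpenGL API.
using BufferName = std::uint32_t;  // stands in for a buffer object name

// How the underlying data is interpreted when it is fetched.
enum class Conversion { ToFloat, ToNormalizedFloat, KeepInteger };

struct AttribPointerState {
    bool enabled = false;
    BufferName buffer = 0;   // buffer bound when the pointer was specified
    std::size_t offset = 0;  // offset into that buffer
    std::size_t stride = 0;  // distance between consecutive vertices
    int components = 4;
    Conversion conversion = Conversion::ToFloat;
};

struct VertexArrayState {
    std::array<AttribPointerState, 16> attribs{};
    BufferName elementArrayBuffer = 0;  // tracked by the VAO itself
};

struct ContextModel {
    BufferName arrayBufferBinding = 0;  // context state, not stored in the VAO
    VertexArrayState* boundVao = nullptr;

    void bindArrayBuffer(BufferName b) { arrayBufferBinding = b; }

    void bindElementArrayBuffer(BufferName b) {
        assert(boundVao != nullptr);
        boundVao->elementArrayBuffer = b;
    }

    // Captures the currently bound array buffer into the attribute.
    void attribPointer(unsigned index, int components, std::size_t stride,
                       std::size_t offset, Conversion conversion) {
        assert(boundVao != nullptr && index < boundVao->attribs.size());
        AttribPointerState& a = boundVao->attribs[index];
        a.buffer = arrayBufferBinding;
        a.components = components;
        a.stride = stride;
        a.offset = offset;
        a.conversion = conversion;
    }

    void enableAttrib(unsigned index) {
        assert(boundVao != nullptr && index < boundVao->attribs.size());
        boundVao->attribs[index].enabled = true;
    }
};
```

In this model, rebinding the array buffer after attribPointer does not change what the attribute refers to, and switching boundVao swaps every pointer and the element array binding at once, which is the property that makes drawing the terrain and the cube a matter of changing the bound VAO.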
Your question is not easily answerable here, but rather in a tutorial. You probably already know these two websites, but if not, I'm leaving the references.
Now trying to elucidate your questions, a Vertex Array Object is an OpenGL object designed with the goal of reducing API overhead for draw calls. You can think of it as a container for a Vertex Buffer and its associated states. Something similar perhaps to the old display-lists.
Normally, there is a 1 to 1 relationship between a VAO and a VBO; that is, each VAO contains a unique VBO. But this is not strictly necessary. You could have several VAOs referencing the same VBO.
The simplest way to model this in code, I think, would be for you to have a VAO class/type and a method to attach a VBO to it. Then give an instance of VAO to each mesh. The mesh in turn can have a reference to a VBO type that may be its own or a shared one.
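A minimal sketch of that class design, assuming hypothetical VertexBuffer, VertexArray and Mesh types. No OpenGL calls are made here; a real version would hold GL object names and issue the corresponding gl* calls inside attach.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical wrapper types for illustration only.
struct VertexBuffer {
    std::vector<float> data;  // stands in for GPU-side storage
};

class VertexArray {
public:
    // Attaches a VBO; the buffer may be owned by one mesh or shared by many.
    void attach(std::shared_ptr<VertexBuffer> vbo) {
        assert(vbo != nullptr);
        buffers_.push_back(std::move(vbo));
    }

    const std::vector<std::shared_ptr<VertexBuffer>>& buffers() const {
        return buffers_;
    }

private:
    std::vector<std::shared_ptr<VertexBuffer>> buffers_;
};

struct Mesh {
    VertexArray vao;  // each mesh gets its own VAO instance
};
```

Using std::shared_ptr is one possible design choice: two meshes can attach the same buffer, and the buffer is released when the last mesh referencing it goes away.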

What is the purpose of OpenGL texture buffer objects?

We use buffer objects for reducing copy operations from CPU-GPU and for texture buffer objects we can change target from vertex to texture in buffer objects. Is there any other advantage here of texture buffer objects? Also, it does not allow filtering, is there any disadvantage of this?
A buffer texture is similar to a 1D-texture but has a backing buffer store that's not part of the texture object (in contrast to any other texture object) but realized with an actual buffer object bound to TEXTURE_BUFFER. Using a buffer texture has several implications and, AFAIK, one use-case that can't be mapped to any other type of texture.
Note that a buffer texture is not a buffer object - a buffer texture is merely associated with a buffer object using glTexBuffer.
By comparison, buffer textures can be huge. Table 23.53 and following of the core OpenGL 4.4 spec defines a minimum maximum (i.e. the minimal value that implementations must provide) number of texels MAX_TEXTURE_BUFFER_SIZE. The potential number of texels being stored in your buffer object is computed as follows (as found in GL_ARB_texture_buffer_object):
floor(<buffer_size> / (<components> * sizeof(<base_type>)))
The resulting value clamped to MAX_TEXTURE_BUFFER_SIZE is the number of addressable texels.
You have a buffer object storing 4MiB of data. What you want is a buffer texture for addressing RGBA texels, so you choose an internal format RGBA8. The addressable number of texels is then
floor(4MiB / (4 * sizeof(UNSIGNED_BYTE))) == 1024^2 texels == 2^20 texels
If your implementation supports this number, you can address the full range of values in your buffer object. The above isn't too impressive and can simply be achieved with any other texture on current implementations. However, the machine on which I'm writing this answer supports 2^28 == 268435456 texels.
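The computation above can be written as a small helper. This is a sketch: the function name addressableTexels is hypothetical, and the limit is passed in as a plain number, whereas a real program would query GL_MAX_TEXTURE_BUFFER_SIZE with glGetIntegerv.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of texels addressable through a buffer texture:
// floor(buffer_size / (components * sizeof(base_type))), clamped to the
// implementation limit.
std::uint64_t addressableTexels(std::uint64_t bufferSizeBytes,
                                std::uint64_t components,
                                std::uint64_t baseTypeSizeBytes,
                                std::uint64_t maxTextureBufferSize) {
    assert(components > 0 && baseTypeSizeBytes > 0);
    // Unsigned integer division truncates, which is the floor here.
    const std::uint64_t texels =
        bufferSizeBytes / (components * baseTypeSizeBytes);
    return std::min(texels, maxTextureBufferSize);
}
```

For the 4 MiB buffer with RGBA8 texels from the example, this gives 2^20 texels whenever the limit is at least that large; with a smaller limit the result is the limit itself.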
With OpenGL 4.4 (and 4.3 and possibly with earlier 4.x versions), the MAX_TEXTURE_SIZE is 2 ^ 16 texels per 1D-texture, so a buffer texture can still be 4 times as large. On my local machine I can allocate a 2GiB buffer texture (even larger actually), but only a 1GiB 1D-texture when using RGBA32F texels.
A use-case for buffer textures is random (and atomic, if desired) read-/write-access (the latter via image load/store) to a large data store inside a shader. Yes, you can do random read-access on arrays of uniforms inside one or multiple blocks, but it gets very tedious if you have to process a lot of data and have to work with multiple blocks. Even then, the maximum combined size of all uniform components (where a single float component has a size of 4 bytes) in all uniform blocks for a single stage isn't really a lot of space to work with in a shader stage (depending on how large your implementation allows that number to be).
An important difference between textures and buffer textures is that the data store, as a regular buffer object, can be used in operations where a texture simply does not work. The extension mentions:
The use of a buffer object to provide storage allows the texture data to
be specified in a number of different ways: via buffer object loads
(BufferData), direct CPU writes (MapBuffer), framebuffer readbacks
(EXT_pixel_buffer_object extension). A buffer object can also be loaded
by transform feedback (NV_transform_feedback extension), which captures
selected transformed attributes of vertices processed by the GL. Several
of these mechanisms do not require an extra data copy, which would be
required when using conventional TexImage-like entry points.
An implication of using buffer textures is that look-ups inside a shader can only be done via texelFetch. Buffer textures also aren't mip-mapped and, as you already mentioned, during fetches there is no filtering.
Since OpenGL 4.3, we have what is called a Shader Storage Buffer. These too provide random (atomic) read-/write-access to a large data store but don't need to be accessed with texelFetch() or image load/store functions as is the case for buffer textures. Using buffer textures also implies having to deal with gvec4 return values, both with texelFetch() and imageLoad() / imageStore(). This becomes very tedious as soon as you want to work with structures (or arrays thereof) and you don't want to think of some stupid packing scheme using multiple instances of vec4 or using multiple buffer textures to achieve something similar. With a buffer accessed as shader storage, you can simply index into the data store and pull one or more instances of some struct {} directly from the buffer.
Also, since they are very similar to uniform blocks, using them should be fairly straightforward: if you know how to use uniform buffers, you don't have a long way to go to learn how to use shader storage buffers.
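To illustrate the kind of packing scheme a buffer texture forces on structured data, here is a CPU-side sketch. The Particle struct, the two-texels-per-struct layout and the function names are all hypothetical; the Texel type stands in for one four-float texel, i.e. what a fetch would return as a vec4.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical struct we would like to read in a shader.
struct Particle {
    float position[3];
    float mass;
    float velocity[3];
    float age;
};

using Texel = std::array<float, 4>;  // one four-component float texel
constexpr std::size_t kTexelsPerParticle = 2;

// Packs each particle into two consecutive texels:
// texel 0 = (position.xyz, mass), texel 1 = (velocity.xyz, age).
std::vector<Texel> packForBufferTexture(const std::vector<Particle>& particles) {
    std::vector<Texel> texels;
    texels.reserve(particles.size() * kTexelsPerParticle);
    for (const Particle& p : particles) {
        texels.push_back({p.position[0], p.position[1], p.position[2], p.mass});
        texels.push_back({p.velocity[0], p.velocity[1], p.velocity[2], p.age});
    }
    return texels;
}

// Mirrors what shader code would do: two fetches plus manual reassembly.
Particle unpack(const std::vector<Texel>& texels, std::size_t particleIndex) {
    assert((particleIndex + 1) * kTexelsPerParticle <= texels.size());
    const Texel& a = texels[particleIndex * kTexelsPerParticle];
    const Texel& b = texels[particleIndex * kTexelsPerParticle + 1];
    return Particle{{a[0], a[1], a[2]}, a[3], {b[0], b[1], b[2]}, b[3]};
}
```

Every change to the struct means updating both the packing and the unpacking by hand, which is the tedium a shader storage buffer avoids by letting the shader index an array of structs directly.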
It's also absolutely worth browsing the Issues section of the corresponding ARB extension.
Performance Implications
Daniel Rakos did some performance analysis years ago, both as a comparison of uniform buffers and buffer textures, and also on a little more general note based on information from AMD's OpenCL programming guide. There is now a very recent version, specifically targeting OpenCL optimization on AMD platforms.
There are many factors influencing performance:
access patterns and resulting caching behavior
cache line sizes and memory layout
what kind of memory is accessed (registers, local, global, L1/L2 etc.) and its respective memory bandwidth
how well memory fetching latency is hidden by doing something else in the meantime
what kind of hardware you're on, i.e. a dedicated graphics card with dedicated memory or some unified memory architecture
etc., etc.
As always when worrying about performance: implement something that works and see if that solution is fast enough for your needs. Otherwise, implement two or more approaches to solving the problem, profile them and compare.
Also, vendor specific guides can offer a great deal of insight. The above mentioned OpenCL user and optimization guides provide a high-level architectural perspective and specific hints on how to optimize your CL kernels - stuff that's also relevant when developing shaders.
One use case I have found was to store per-primitive attributes (accessed in the fragment shader with the help of gl_PrimitiveID) while still maintaining unique vertices in the indexed mesh.

OpenGL vertex buffer confusion

Would someone care to explain the difference between a VertexBuffer, a VertexArray, a VertexBufferObject, and a VertexArrayObject? I'm not even sure if these are all terms for different things, but I've seen all of them appear in the OpenGL spec.
I know that a VertexBuffer simply contains vertices and nothing else, once bound, and once I've set the vertex pointers, I can use DrawArrays to draw it. I've done it this way many times.
I am using what I think is a VertexArray, which stores the state of any vertex buffers that are set, and also any vertex pointers. Binding a VertexArray automatically binds the vertex buffer and sets the vertex pointers. I have used this (mostly) successfully too.
But what is a VertexBufferObject, and a VertexArrayObject? Are they better? Doesn't VertexArray give me everything I need?
A vertex array is simply some data in your program (inside your address space) that you tell OpenGL about by providing a pointer to it.
While more efficient than specifying every single vertex individually, they still have performance issues. The GL must make a copy at the time you call DrawElements (or a similar function), because that is the only time it can be certain that the data is valid (after all, nothing prevents you from overwriting the data right away). This means that there is a significant hindrance to parallelism, and thus a performance issue.
Vertex buffer objects ("vertex buffers") are raw blocks of data that you do not own, i.e. they are not in your address space. You can either copy data into the buffer object with Buffer(Sub)Data or by temporarily mapping it to your address space. Once you unmap the buffer, it no longer belongs to you. The huge advantage is that now the GL can decide what to do with it, and when to upload it. It knows that the data will be valid, because you cannot access it. This makes CPU/GPU parallelism a lot easier.
Vertex array objects are a bit of a misnomer. There are no vertices or arrays in them. They are merely a kind of "state description block" which encapsulates the bindings of one or several vertex buffer objects (including any VertexAttribPointer calls). As such, they are both a convenience and somewhat more efficient (fewer function calls), but not strictly necessary. You could do anything that a VAO does by hand, too.
BufferObject: a GPU allocated memory buffer
Vertex Buffer Object: a BufferObject containing vertex information (colors, position, custom data used by a shader, ...)
Pixel Buffer Object: a BufferObject containing pixel or texel information. Mainly used to upload textures.
Element Buffer Object: a BufferObject containing indices (used by glDrawElements).
Vertex Array: memory used by gl*Pointer call. Might be host memory or a Vertex Buffer Object if it is bound using glBindBuffer command with GL_ARRAY_BUFFER.
Element Array: memory used by glDrawElements call. Might be host memory or an Element Buffer Object if it is bound using glBindBuffer command with GL_ELEMENT_ARRAY_BUFFER.