I recently re-wrote some code to use Shader Storage Buffer Object in OpenGL, to send in dynamic sized arrays into GLSL, vastly increasing performance when drawing many procedurally generated objects.
What I have is a couple of thousand points, and for each point I render a procedurally generated circular billboard. Each one can in turn have different colors and radius, as well as a few other characteristics (represented as bools or enums)
I fill a vector with these positions, packed together with the radius and color. Then I upload it as a Shader Storage Buffer Object with dynamic size. I create a dummy VAO, containing 0 vbos, but call the draw command with the same amount of points that I have.
Inside the shader, I then iterate through this array, using the gl_VertexID, and generate a quad (two triangles) with texture coordinates, for each point.
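To make the indexing concrete, here is a minimal CPU-side sketch in C++ of the arithmetic such a vertex shader would perform. It assumes six vertices (two triangles) per point with no index buffer; the names cornerFromVertexId and drawVertexCount and the corner order are illustrative assumptions, not taken from the actual shader.

```cpp
#include <cassert>
#include <cstdint>

// One corner of a billboard quad: which point it belongs to, plus its
// texture coordinate in the unit square.
struct QuadCorner {
    std::uint32_t pointIndex;
    float u;
    float v;
};

// Maps a flat vertex index (the role gl_VertexID plays in the shader) to a
// quad corner, assuming six vertices per point.
inline QuadCorner cornerFromVertexId(std::uint32_t vertexId) {
    // Two triangles covering the unit square:
    // (0,0) (1,0) (1,1) and (0,0) (1,1) (0,1).
    static const float kCorners[6][2] = {
        {0.0f, 0.0f}, {1.0f, 0.0f}, {1.0f, 1.0f},
        {0.0f, 0.0f}, {1.0f, 1.0f}, {0.0f, 1.0f},
    };
    const std::uint32_t corner = vertexId % 6;
    return QuadCorner{vertexId / 6, kCorners[corner][0], kCorners[corner][1]};
}

// Number of vertices to request in the draw call for a given point count
// under the same six-vertices-per-point assumption.
inline std::uint32_t drawVertexCount(std::uint32_t pointCount) {
    assert(pointCount <= UINT32_MAX / 6);  // guard against overflow
    return pointCount * 6;
}
```

The pointIndex would be used to look up position, radius and color in the storage buffer, and (u, v) both offsets the corner from the point and serves as the texture coordinate.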
I'm looking for a way of doing the same in Vulkan. Is there some way in Vulkan to pass a dynamically sized array into a shader? Reading about Shader Storage Buffer Objects in Graham Sellers' Vulkan book, it only mentions them being read-write, but not capable of dynamically sized arrays.
Edit: It seems that storage buffers are in fact capable of dynamically sized arrays, based on Sascha Willems' particle example. Is there a way of doing the same thing via uniforms?
I may be misunderstanding your question. SSBOs have identical behavior and functionality between the two APIs. That's why they're named the same. An unbounded array at the end of a storage block will have its length defined at runtime, based on what data you provide.
The size of a buffer descriptor is not hard-coded into the descriptor set layout; it's something you set with VkWriteDescriptorSet (through the range member of VkDescriptorBufferInfo). Now unlike the offset, you cannot change a descriptor's size without changing the descriptor itself. That is, there is no size equivalent to vkCmdBindDescriptorSets's pDynamicOffsets field. So you have to actually update the descriptor in-situ to change the length.
But that just requires double-buffering your descriptor set; it shouldn't be a problem.
Is there a way of doing the same thing via uniforms?
Again, the answer is the same for Vulkan as for OpenGL: no.
Related
I'm working with OpenGL and am not totally happy with the standard method of passing values PER TRIANGLE (or in my case, quads) that need to make it to the fragment shader, i.e., assign them to each vertex of the primitive and pass them through the vertex shader to presumably be unnecessarily interpolated (unless using the "flat" directive) in the fragment shader (so in other words, non-varying per fragment).
Is there some way to store a value PER triangle (or quad) that needs to be accessed in the fragment shader in such a way that you don't need redundant copies of it per vertex? If so, is this way better than the likely overhead of 3x (or 4x) the data moving code CPU side?
I am aware of using geometry shaders to spread the values out to new vertices, but I heard geometry shaders are terribly slow on non up to date hardware. Is this the case?
OpenGL fragment language supports the gl_PrimitiveID input variable, which will be the index of the primitive for the currently processed fragment (starting at 0 for each draw call). This can be used as an index into some data store which holds per-primitive data.
Depending on the amount of data that you will need per primitive, and the number of primitives in total, different options are available. For a small number of primitives, you could just set up a uniform array and index into that.
For a reasonably high number of primitives, I would suggest using a texture buffer object (TBO). This is basically an ordinary buffer object, which can be accessed read-only at random locations via the texelFetch GLSL operation. Note that TBOs are not really textures, they only reuse the existing texture object interface. Internally, it is still a data fetch from a buffer object, and it is very efficient with none of the overhead of the texture pipeline.
The only issue with this approach is that you cannot easily mix different data types. You have to define a base data type for your TBO, and every fetch will get you the data in that format. If you just need some floats/vectors per primitive, this is not a problem at all. If you e.g. need some ints and some floats per primitive, you could either use different TBOs, one for each type, or with modern GLSL (>=3.30), you could use an integer type for the TBO and reinterpret the integer bits as floating point with intBitsToFloat(), so you can get around that limitation, too.
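The integer-reinterpretation trick has a CPU-side counterpart: the float's bit pattern has to be written into the integer buffer unchanged. Below is a minimal C++ sketch of that packing. The PrimitiveData record and the function names are hypothetical, chosen only for illustration, and a 32-bit IEEE 754 float is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Stores a float's raw bit pattern in a 32-bit integer slot, so that
// intBitsToFloat() in the shader can recover the original value.
inline std::int32_t floatBitsToInt32(float value) {
    static_assert(sizeof(float) == sizeof(std::int32_t), "expects a 32-bit float");
    std::int32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// CPU-side inverse, mirroring what GLSL's intBitsToFloat() does.
inline float int32BitsToFloat(std::int32_t bits) {
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Hypothetical per-primitive record: one integer id and one float weight.
struct PrimitiveData {
    std::int32_t materialId;
    float weight;
};

// Flattens the records into an integer array suitable for an integer-format
// TBO, two ints per primitive.
inline std::vector<std::int32_t> packForIntegerTbo(const std::vector<PrimitiveData>& prims) {
    std::vector<std::int32_t> out;
    out.reserve(prims.size() * 2);
    for (const PrimitiveData& p : prims) {
        out.push_back(p.materialId);
        out.push_back(floatBitsToInt32(p.weight));
    }
    return out;
}
```

In the fragment shader, primitive i would then read element 2*i as the integer and pass element 2*i+1 through intBitsToFloat().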
You can use one element in the vertex array for rendering multiple vertices. It's called instanced vertex attributes.
I am trying to understand these two, how to use them and how they are related. Let's say I want to create a simple terrain and a textured cube. For both objects I have the array of triangles vertices and for the cube I have an array containing the texture's data. My question is: how do I use VAOs and VBOs to create and render these two?
Would I have to create a VAO and VBO for each object?
Or should I create a VAO for each object's VBO (vertices, texture data, etc.)?
There are many tutorials and books but I still don't get the very idea of how these concepts must be understood and used.
Fundamentally, you need to understand two things:
Vertex Array Objects (VAOs) are conceptually nothing but thin state wrappers.
Vertex Buffer Objects (VBOs) store actual data.
Another way of thinking about this is that VAOs describe the data stored in one or more VBOs.
Think of VBOs (and buffer objects in general) as unstructured arrays of data stored in server (GPU) memory. You can lay out your vertex data in multiple arrays if you want, or you can pack them into a single array. In either case, buffer objects boil down to locations where you will store data.
Vertex Array Objects track the actual pointers to VBO memory needed for draw commands.
They are a little bit more sophisticated than pointers as you would know them in a language like C, however. Vertex pointers keep track of the buffer object that was bound when they were specified, the offset into its address space, stride between vertex attributes and how to interpret the underlying data (e.g. whether to keep integer values or to convert them to floating-point [0.0,1.0] by normalizing to the data type's range).
For example, integer data is usually converted to floating-point, but it is the command you use to specify the vertex pointer (glVertexAttribPointer (...) vs. glVertexAttribIPointer (...)) that determines this behavior.
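To illustrate what a vertex pointer records, here is a small CPU-side model in C++. It is only a sketch: AttribPointer is not a GL type, and the interleaved Vertex layout is a hypothetical example. It shows how offset and stride locate an attribute for a given vertex, and what normalizing an unsigned byte means.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical interleaved vertex: position followed by an 8-bit RGBA color.
struct Vertex {
    float position[3];
    std::uint8_t color[4];
};

// CPU-side model of the state a vertex pointer keeps track of (not a GL type).
struct AttribPointer {
    std::size_t offset;  // byte offset of the attribute within the buffer
    std::size_t stride;  // bytes between consecutive vertices
    int components;      // number of components per vertex
    bool normalized;     // map integer values onto [0.0, 1.0]?
};

inline AttribPointer positionPointer() {
    return AttribPointer{offsetof(Vertex, position), sizeof(Vertex), 3, false};
}

inline AttribPointer colorPointer() {
    return AttribPointer{offsetof(Vertex, color), sizeof(Vertex), 4, true};
}

// Byte offset at which the attribute data for a given vertex begins.
inline std::size_t attribByteOffset(const AttribPointer& p, std::size_t vertexIndex) {
    return p.offset + p.stride * vertexIndex;
}

// Normalizing an unsigned 8-bit value maps 0..255 onto 0.0..1.0.
inline float normalizeUByte(std::uint8_t c) {
    return static_cast<float>(c) / 255.0f;
}
```

The offset, stride, component count and normalized flag correspond to the arguments you would hand to glVertexAttribPointer (...) for each attribute while the VBO is bound.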
Vertex Array Objects also track the buffer object currently bound to GL_ELEMENT_ARRAY_BUFFER.
GL_ELEMENT_ARRAY_BUFFER is where the command: glDrawElements (...) sources its list of indices from (assuming a non-zero binding) and there is no glElementArrayPointer (...) command. glDrawElements (...) combines the pointer and draw command into a single operation, and will use the binding stored in the active Vertex Array Object to accomplish this.
With that out of the way, unless your objects share vertex data you are generally going to need a unique set of VBOs for each.
You can use a single VAO for your entire software if you want, or you can take advantage of the fact that changing the bound VAO changes nearly the entire set of states necessary to draw different objects.
Thus, drawing your terrain and cube could be as simple as changing the bound VAO. You may have to do more than that if you need to apply different textures to each of them, but the VAO takes care of all vertex data related setup.
Your question is not easily answerable here, but rather in a tutorial. You probably already know these two websites, but if not, I'm leaving the references.
OGLDEV
OpenGL-Tutorial.org
Now trying to elucidate your questions, a Vertex Array Object is an OpenGL object designed with the goal of reducing API overhead for draw calls. You can think of it as a container for a Vertex Buffer and its associated states. Something similar perhaps to the old display-lists.
Normally, there is a 1 to 1 relationship between a VAO and a VBO; that is, each VAO contains a unique VBO. But this is not strictly necessary. You could have several VAOs referencing the same VBO.
The simplest way to model this in code, I think, would be for you to have a VAO class/type and a method to attach a VBO to it. Then give an instance of VAO to each mesh. The mesh in turn can have a reference to a VBO type that may be its own or a shared one.
I have a couple questions about how OpenGL handles these drawing operations.
So let's say I pass OpenGL the pointer to my vertex array. Then I can call glDrawElements with an array of indexes. It will draw the requested shapes using those indexes in the vertex array, correct?
After that glDrawElements call, could I then do another glDrawElements call with another set of indexes? Would it then draw the new index array using the original vertex array?
Does OpenGL keep my vertex data around for the next frame when I redo all of these calls? So the next vertex pointer call would be a lot quicker?
Assuming the answer to the last three questions is yes, what if I want to do this on multiple vertex arrays every frame? I'm assuming doing this on any more than 1 vertex array would cause OpenGL to drop the last used array from graphics memory and start using the new one. But in my case the vertex arrays are never going to change. So what I want to know is: does OpenGL keep my vertex arrays around in case the next time I send it vertex data it will be the same data? If not, is there a way I can optimize this to allow something like this? Basically I want to draw procedurally between the vertices using indices without updating the vertex data, in order to reduce overhead and speed up complicated rendering that requires constant procedurally changing shapes that will always use the vertices from the original vertex array. Is this possible or am I just fantasizing?
If I'm just fantasizing about my fourth question, what are some good fast ways of drawing a whole lot of polygons each frame where only a few will change? Do I always have to pass in a totally new set of vertex data for even small changes? Does it already do this anyway when the vertex data doesn't change? I notice I can't really get around the vertex pointer call each frame.
Feel free to totally slam any logic errors I've made in my assertions. I'm trying to learn everything I can about how opengl works and it's entirely possible my current assumptions on how it works are all wrong.
1. So let's say I pass OpenGL the pointer to my vertex array. Then I can call glDrawElements with an array of indexes. It will draw the
requested shapes using those indexes in the vertex array, correct?
Yes.
2. After that glDrawElements call, could I then do another glDrawElements
call with another set of indexes? Would it then draw the new index
array using the original vertex array?
Yes.
3. Does OpenGL keep my vertex data around for the next frame when I redo
all of these calls? So the next vertex pointer call would be a lot
quicker?
Answering that is a bit more tricky than you might think. The way you ask these questions makes me assume that you use client-side vertex arrays, that is, you have some arrays in your system memory and let your vertex pointers point directly to those. In that case, the answer is no. The GL cannot "cache" that data in any useful way. After the draw call is finished, it must assume that you might change the data, and it would have to compare every single bit to make sure you have not changed anything.
However, client-side VAs are not the only way to have VAs in the GL. Actually, they are completely outdated: they have been deprecated since GL 3.0 and have been removed from modern versions of OpenGL. The modern way of doing things is using Vertex Buffer Objects, which basically are buffers which are managed by the GL, but manipulated by the user. Buffer objects are just a chunk of memory, but you will need special GL calls to create them, read or write or change data and so on. And the buffer object might very well not be stored in system memory, but directly in VRAM, which is very useful for static data which is used over and over again. Have a look at the GL_ARB_vertex_buffer_object extension spec, which originally introduced that feature in 2003 and became core in GL 1.5.
4. Assuming the answer to the last three questions is yes, what if I want
to do this on multiple vertex arrays every frame? I'm assuming doing
this on any more than 1 vertex array would cause OpenGL to drop the
last used array from graphics memory and start using the new one. But
in my case the vertex arrays are never going to change. So what I want
to know is: does OpenGL keep my vertex arrays around in case the next time
I send it vertex data it will be the same data? If not, is there a way
I can optimize this to allow something like this? Basically I want to
draw procedurally between the vertices using indices without updating
the vertex data, in order to reduce overhead and speed up complicated
rendering that requires constant procedurally changing shapes that
will always use the vertices from the original vertex array. Is this
possible or am I just fantasizing?
VBOs are exactly what you are looking for, here.
5. If I'm just fantasizing about my fourth question, what are some good
fast ways of drawing a whole lot of polygons each frame where only a
few will change? Do I always have to pass in a totally new set of
vertex data for even small changes? Does it already do this anyway
when the vertex data doesn't change? I notice I can't really get
around the vertex pointer call each frame.
You can also update just parts of a VBO. However, it might become inefficient if you have many small parts which are randomly distributed in your buffer; it will be more efficient to update contiguous (sub-)regions. But that is a topic of its own.
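One possible way to get from many small scattered changes to a few contiguous updates is to merge nearby dirty ranges before uploading. The following C++ sketch shows the idea; DirtyRange, coalesce and the maxGap threshold are hypothetical names introduced for illustration, and whether merging actually pays off depends on the driver and on your data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A half-open byte range [begin, end) of modified data inside a buffer.
struct DirtyRange {
    std::size_t begin;
    std::size_t end;
};

// Merges ranges that overlap or lie within maxGap bytes of each other, so
// that fewer, larger contiguous sub-regions are uploaded.
inline std::vector<DirtyRange> coalesce(std::vector<DirtyRange> ranges, std::size_t maxGap) {
    std::sort(ranges.begin(), ranges.end(),
              [](const DirtyRange& a, const DirtyRange& b) { return a.begin < b.begin; });
    std::vector<DirtyRange> merged;
    for (const DirtyRange& r : ranges) {
        assert(r.begin <= r.end);
        if (!merged.empty() && r.begin <= merged.back().end + maxGap) {
            // Close enough to the previous range: extend it.
            merged.back().end = std::max(merged.back().end, r.end);
        } else {
            merged.push_back(r);
        }
    }
    return merged;
}
```

Each merged range could then be passed to one glBufferSubData call, with begin as the offset and end - begin as the size.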
1. Yes.
2. Yes.
3. No. As soon as you create a Vertex Buffer Object (VBO) it will stay in the GPU memory. Otherwise vertex data needs to be re-transferred (an old method of avoiding this was Display Lists). In both cases the performance of subsequent frames should stay similar (but much better with the VBO method): you can do the VBO creation and upload before rendering the first frame.
4. The VBO was introduced to provide you exactly with this functionality. Just create several VBOs. Things get messy when you need more GPU memory than available though.
5. VBO is still the answer, and see Modifying only a specific element type of VBO buffer data?
It sounds like you should try something called Vertex Buffer Objects. It offers the same benefits as Vertex Arrays, but you can create multiple vertex buffers and store them in "named slots". This method has much better performance as data is stored directly in Graphic Card memory.
Here is a good tutorial in C++ to start with.
We use buffer objects for reducing copy operations from CPU-GPU and for texture buffer objects we can change target from vertex to texture in buffer objects. Is there any other advantage here of texture buffer objects? Also, it does not allow filtering, is there any disadvantage of this?
A buffer texture is similar to a 1D-texture but has a backing buffer store that's not part of the texture object (in contrast to any other texture object) but realized with an actual buffer object bound to TEXTURE_BUFFER. Using a buffer texture has several implications and, AFAIK, one use-case that can't be mapped to any other type of texture.
Note that a buffer texture is not a buffer object - a buffer texture is merely associated with a buffer object using glTexBuffer.
By comparison, buffer textures can be huge. Table 23.53 and following of the core OpenGL 4.4 spec defines a minimum maximum (i.e. the minimal value that implementations must provide) number of texels MAX_TEXTURE_BUFFER_SIZE. The potential number of texels being stored in your buffer object is computed as follows (as found in GL_ARB_texture_buffer_object):
floor(<buffer_size> / (<components> * sizeof(<base_type>)))
The resulting value clamped to MAX_TEXTURE_BUFFER_SIZE is the number of addressable texels.
Example:
You have a buffer object storing 4MiB of data. What you want is a buffer texture for addressing RGBA texels, so you choose an internal format RGBA8. The addressable number of texels is then
floor(4MiB / (4 * sizeof(UNSIGNED_BYTE))) == 1024^2 texels == 2^20 texels
If your implementation supports this number, you can address the full range of values in your buffer object. The above isn't too impressive and can simply be achieved with any other texture on current implementations. However, the machine on which I'm writing this answer supports 2^28 == 268435456 texels.
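For reference, the texel-count computation above can be written as a small C++ helper. This is only a sketch of the arithmetic; addressableTexels is a hypothetical name, and the limit passed in would come from querying MAX_TEXTURE_BUFFER_SIZE on your implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of texels addressable through a buffer texture:
// floor(bufferSize / (components * sizeof(baseType))), clamped to the
// implementation's MAX_TEXTURE_BUFFER_SIZE.
inline std::uint64_t addressableTexels(std::uint64_t bufferSizeBytes,
                                       std::uint64_t components,
                                       std::uint64_t baseTypeSizeBytes,
                                       std::uint64_t maxTextureBufferSize) {
    assert(components > 0 && baseTypeSizeBytes > 0);
    // Unsigned integer division already rounds down, i.e. it is the floor.
    const std::uint64_t texels = bufferSizeBytes / (components * baseTypeSizeBytes);
    return std::min(texels, maxTextureBufferSize);
}
```

With the numbers from the example (4MiB, RGBA8, a limit of 2^28) this gives 2^20 texels; with a limit of only 65536 the result is clamped to 65536.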
With OpenGL 4.4 (and 4.3 and possibly with earlier 4.x versions), the minimum required MAX_TEXTURE_SIZE is 2^14 texels per 1D-texture, while the minimum required MAX_TEXTURE_BUFFER_SIZE is 2^16 texels, so a buffer texture can still be 4 times as large. On my local machine I can allocate a 2GiB buffer texture (even larger actually), but only a 1GiB 1D-texture when using RGBA32F texels.
A use-case for buffer textures is random (and atomic, if desired) read-/write-access (the latter via image load/store) to a large data store inside a shader. Yes, you can do random read-access on arrays of uniforms inside one or multiple blocks, but it gets very tedious if you have to process a lot of data and have to work with multiple blocks. Even then, the maximum combined size of all uniform components (where a single float component has a size of 4 bytes) in all uniform blocks for a single stage,
MAX_(stage)_UNIFORM_BLOCKS *
MAX_UNIFORM_BLOCK_SIZE +
MAX_(stage)_UNIFORM_COMPONENTS * 4
isn't really a lot of space to work with in a shader stage (depending on how large your implementation allows the above number to be).
An important difference between textures and buffer textures is that the data store, as a regular buffer object, can be used in operations where a texture simply does not work. The extension mentions:
The use of a buffer object to provide storage allows the texture data to
be specified in a number of different ways: via buffer object loads
(BufferData), direct CPU writes (MapBuffer), framebuffer readbacks
(EXT_pixel_buffer_object extension). A buffer object can also be loaded
by transform feedback (NV_transform_feedback extension), which captures
selected transformed attributes of vertices processed by the GL. Several
of these mechanisms do not require an extra data copy, which would be
required when using conventional TexImage-like entry points.
An implication of using buffer textures is that look-ups inside a shader can only be done via texelFetch. Buffer textures also aren't mip-mapped and, as you already mentioned, during fetches there is no filtering.
Addendum:
Since OpenGL 4.3, we have what is called a Shader Storage Buffer. These too provide random (atomic) read-/write-access to a large data store but don't need to be accessed with texelFetch() or image load/store functions as is the case for buffer textures. Using buffer textures also implies having to deal with gvec4 return values, both with texelFetch() and imageLoad() / imageStore(). This becomes very tedious as soon as you want to work with structures (or arrays thereof) and you don't want to think of some stupid packing scheme using multiple instances of vec4 or using multiple buffer textures to achieve something similar. With a buffer accessed as shader storage, you can simply index into the data store and pull one or more instances of some struct {} directly from the buffer.
Also, since they are very similar to uniform blocks, using them should be fairly straightforward: if you know how to use uniform buffers, you don't have a long way to go to learn how to use shader storage buffers.
It's also absolutely worth browsing the Issues section of the corresponding ARB extension.
Performance Implications
Daniel Rakos did some performance analysis years ago, both as a comparison of uniform buffers and buffer textures, and also on a little more general note based on information from AMD's OpenCL programming guide. There is now a very recent version, specifically targeting OpenCL optimization on AMD platforms.
There are many factors influencing performance:
access patterns and resulting caching behavior
cache line sizes and memory layout
what kind of memory is accessed (registers, local, global, L1/L2 etc.) and its respective memory bandwidth
how well memory fetching latency is hidden by doing something else in the meantime
what kind of hardware you're on, i.e. a dedicated graphics card with dedicated memory or some unified memory architecture
etc., etc.
As always when worrying about performance: implement something that works and see if that solution is fast enough for your needs. Otherwise, implement two or more approaches to solving the problem, profile them and compare.
Also, vendor specific guides can offer a great deal of insight. The above mentioned OpenCL user and optimization guides provide a high-level architectural perspective and specific hints on how to optimize your CL kernels - stuff that's also relevant when developing shaders.
One use case I have found was to store per-primitive attributes (accessed in the fragment shader with the help of gl_PrimitiveID) while still maintaining unique vertices in the indexed mesh.
I am implementing a voxel raycaster in OpenGL 4.3.0. I have got a basic version going where I store a 256x256x256 voxel data set of float values in a 3D texture of the same dimensions.
However, I want to make a LOD scheme using an octree. I have the data stored in a 1D array on the host side. The root has index 0, the root's children have indices 1-8, the next level have indices 9-72 and so on. The octree has 9 levels in total (where the last level has the full 256x256x256 resolution). Since the octree will always be full the structure is implicit and there is no need to store pointers, just the one float value per voxel. I have the 1D indexing and traversal algorithms all set.
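For readers following along, the implicit layout described here can be sketched in C++ as below. This assumes a breadth-first layout in which the eight children of a node are stored contiguously, which matches the index ranges given (0, 1-8, 9-72); the function names are hypothetical and the actual traversal code may order children differently.

```cpp
#include <cassert>
#include <cstdint>

// Index of the first node on a level of a full octree stored breadth-first:
// 1 + 8 + 8^2 + ... + 8^(level-1) = (8^level - 1) / 7.
// levelStart(9) is the total node count of a 9-level octree.
inline std::uint32_t levelStart(unsigned level) {
    assert(level <= 9);  // keeps the running count within 32 bits
    std::uint32_t start = 0;
    std::uint32_t count = 1;  // nodes on the current level
    for (unsigned l = 0; l < level; ++l) {
        start += count;
        count *= 8;
    }
    return start;
}

// Index of child number `child` (0..7) of a node.
inline std::uint32_t childIndex(std::uint32_t node, unsigned child) {
    assert(child < 8);
    return 8 * node + 1 + child;
}

// Index of a node's parent; the root (index 0) has none.
inline std::uint32_t parentIndex(std::uint32_t node) {
    assert(node > 0);
    return (node - 1) / 8;
}
```

Under these assumptions the whole 9-level octree holds 19173961 values, slightly more than the 256^3 = 16777216 voxels of the finest level alone.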
My problem is that I don't know how to store this in a texture. GL_MAX_TEXTURE_SIZE is way too small (16384) for using the 1D array approach for which I have figured out the indexing. I need to store this in a 3D texture, and I don't know what will happen when I try to cram my 1D array in there, nor do I know how to choose a size and a 1D->3D conversion scheme to not waste space or time.
My question is if someone has a good strategy for storing this whole octree structure in one 3D texture, and in that case how to choose dimensions and indexing for it.
First some words on porting your 1D-array solution directly:
First of all, as Mortennobel says in his comment, the max texture size is very likely not 3379, that's just the enum value of GL_MAX_TEXTURE_SIZE (how should the OpenGL header that defines this value know your hardware and driver limits?). To get the actual value from your implementation use int size; glGetIntegerv(GL_MAX_TEXTURE_SIZE, &size);. But even then this might be too small for you (maybe 8192 or something similar).
But to get much larger 1D arrays into your shaders, you can use buffer textures (which are core since OpenGL 3, and therefore present on DX10 class hardware). Those are textures sourcing their data from standard OpenGL buffer objects. But those textures are always 1D, accessed by integer texCoords (array indices, so to say) and not filtered. So they are effectively not really textures, but a way to access a buffer object as a linear 1D array inside a shader, which is a perfect fit for your needs (and in fact a much better fit than a normal filtered and normalized 1D texture).
EDIT: You might also think about using a straight-forward 3D texture like you did before, but with homemade mipmap levels (yes, a 3D texture can have mipmaps, too) for the higher parts of the hierarchy. So mipmap level 0 is the fine 256 grid, level 1 contains the coarser 128 grid, ... But to work with this data structure effectively, you will probably need explicit LOD texture access in the shader (using textureLod or, even better without filtering, texelFetch), which requires OpenGL 3, too.
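As a rough sketch of how octree depth could map onto such homemade mipmap levels, assuming a 256^3 base level and 9 octree levels as in the question (the function names are hypothetical):

```cpp
#include <cassert>

constexpr unsigned kBaseResolution = 256;  // edge length of mip level 0
constexpr unsigned kOctreeLevels = 9;      // root (depth 0) to finest (depth 8)

// Mip level that holds the nodes of a given octree depth: the finest depth
// lives in level 0, the root in the last level.
inline unsigned mipLevelForDepth(unsigned depth) {
    assert(depth < kOctreeLevels);
    return (kOctreeLevels - 1) - depth;
}

// Edge length of the 3D texture at a given mip level: 256, 128, ..., 1.
inline unsigned mipResolution(unsigned mipLevel) {
    assert(mipLevel < kOctreeLevels);
    return kBaseResolution >> mipLevel;
}
```

In the shader, the level computed this way would be the LOD argument of texelFetch, together with integer coordinates in the range of that level's resolution.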
EDIT: If you don't have support for OpenGL 3, I would still not suggest to use 3D textures to put your 1D array into, but rather 2D textures, like Rahul suggests in his answer (the 1D-2D index magic isn't really that hard). But if you have OpenGL 3, then I would either use buffer textures for using your linear 1D array layout directly, or a 3D texture with mipmaps for a straight-forward octree mapping (or maybe come up with a completely different and more sophisticated data structure for the voxel grid in the first place).
EDIT: Of course a fully subdivided octree is not really using the memory saving features of octrees to its advantage. For a more dynamic and memory efficient method of packing octrees into 3D textures, you might also take some inspiration from this classic GPU Gems article on octree textures. They basically store all octree cells as 2x2x2 grids arbitrarily into a 3D texture using the internal nodes' values as pointers to the children in this texture. Of course nowadays you can employ all sorts of refinements on this (since it seems you want the internal nodes to store data, too), like storing integers alongside floats and using nice bit encodings and the like, but the basic idea is pretty simple.
Here's a solution sketch/outline:
Use a 2D texture to store your 256x256x256 (it'll be 4096x4096 -- I hope you're using an OpenGL platform that supports 4k x 4k textures).
Now store your 1D data in row-major order. Inside your raycaster, simply do a row/col conversion (from 1D address to 4k x 4k) and look up the value you need.
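The row/col conversion could look roughly like this in C++; the same integer arithmetic carries over to the shader. The 4096 width follows from the 4k x 4k texture above, while Texel2D, toTexel and toLinear are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kTextureWidth = 4096;  // 4096 * 4096 == 256^3

// Integer texel coordinate inside the 2D texture.
struct Texel2D {
    std::uint32_t col;
    std::uint32_t row;
};

// Row-major conversion from a 1D address to a 2D texel coordinate.
inline Texel2D toTexel(std::uint32_t linearIndex) {
    assert(linearIndex < kTextureWidth * kTextureWidth);
    return Texel2D{linearIndex % kTextureWidth, linearIndex / kTextureWidth};
}

// Inverse conversion, useful when filling the texture on the host.
inline std::uint32_t toLinear(Texel2D t) {
    return t.row * kTextureWidth + t.col;
}
```

In GLSL the resulting (col, row) pair would go into an ivec2 for texelFetch, which avoids filtering and normalized coordinates.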
I trust that you will figure out the rest yourself :)