In one of my previous questions (How to instance draw with different transformations for multiple objects) I asked how to instance draw with different transformations, one person answered that the proper way to do it is using instanced arrays.
This lead me to tutorials where they send transformation data through VAO, which is exactly what I was looking for.
But a new problem arose. Since my objects are dynamic (I want to move them) how do I update the buffer with their transformation?
Most of the tutorials Ive seen usually only render instanced objects once and thus they have no need to update the buffer. For a fact I wouldnt even know how to update a buffer to begin with, as I declare VAO with mesh at the beginning and it is not changed during the runtime of program.
What I think I should be doing: Store the transformations on CPU side in some array, when I do something which results in changing a specific transformation I will update this array and then update the transformation buffer.
Probably the actual question:
How do I update a buffer during the runtime of program?
Just use glBufferSubData to update the corresponding data on your GPU, from which your shaders get the transformations per instance (e.g. the vertex buffer for which the divisor is set to 1 with glVertexAttribDivisor(idx, 1)).
The other possibility would be to use glMapBuffer or glMapBufferRange to update that buffer.
Related
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
Nowadays I'm hearing from different places about the so called GPU driven rendering which is a new paradigm of rendering which doesn't require draw calls at all, and that it is supported by the new versions of OpenGL and Vulkan APIs. Can someone explain how it actually works on conceptual level and what are the main differences with the traditional approach?
Overview
In order to render a scene, a number of things have to happen. You need to walk your scene graph to figure out which objects exist. For each object which exists, you now need to determine if it is visible. For each object which is visible, you need to figure out where its geometry is stored, which textures and buffers will be used to render that object, which shaders to use to render the object, and so forth. Then you render that object.
The "traditional" method handling this is for the CPU to handle this process. The scene graph lives in CPU-accessible memory. The CPU does visibility culling on that scene graph. The CPU takes the visible objects and access some CPU data about the geometry (OpenGL buffer object and texture names, Vulkan descriptor sets and VkBuffers, etc), shaders, etc, transferring this as state data to the GPU. Then the CPU issues a GPU command to render that object with that state.
Now, if we go back farther, the most "traditional" method doesn't involve a GPU at all. The CPU would just take this mesh and texture data, do vertex transformations, rasterizatization, and so forth, producing an image in CPU memory. However, we started off-loading some of this to a separate processor. We started with the rasterization stuff (the earliest graphics chips were just rasterizers; the CPU did all the vertex T&L). Then we incorporated the vertex transformations into the GPU. When we did that, we started having to store vertex data in GPU accessible memory so the GPU could read it on its own time.
We did all of that, off-loading these things to a separate processor for two reasons: the GPU was (much) faster at it, and the CPU can now spend its time doing something else.
GPU driven rendering is just the next stage in that process. We went from no GPU, to rasterization GPU, to vertex GPU, and now to scene-graph-level GPU. The "traditional" method offloads how to render to the GPU; GPU driven rendering offloads the decision of what to render.
Mechanism
Now, the reason we haven't been doing this all along is because the basic rendering commands all take data that comes from the CPU. glDrawArrays/Elements takes a number of parameters from the CPU. So even if we used the GPU to generate that data, we would need a full GPU/CPU synchronization so that the CPU could read the data... and give it right back to the GPU.
That's not helpful.
OpenGL 4 gave us indirect rendering of various forms. The basic idea is that, instead of taking those parameters from a function call, they're just data stored in GPU memory. The CPU still has to make a function call to start the rendering operation, but the actual parameters to that call are just data stored in GPU memory.
The other half of that requires the ability of the GPU to write data to GPU memory in a format that indirect rendering can read. Historically, data on GPUs goes in one direction: data gets read for the purpose of being converted into pixels in a render target. We need a way to generate semi-arbitrary data from other arbitrary data, all on the GPU.
The older mechanism for this was to (ab)use transform feedback for this purpose, but nowadays we just use SSBOs or failing that, image load/store. Compute shaders help here as well, since they are designed to be outside of the standard rendering pipeline and therefore are not bound to its limitations.
The ideal form of GPU-driven rendering makes the scene-graph part of the rendering operation. There are lesser forms, such as having the GPU do nothing more than per-object viewport culling. But let's look at the most ideal process. From the perspective of the CPU, this looks like:
Update the scene graph in GPU memory.
Issue one or more compute shaders that generate multi-draw indirect rendering commands.
Issue a single multi-draw indirect call that draws everything.
Now of course, there's no such thing as a free lunch. Doing full scene graph processing on the GPU requires building your scene graph in a way that is efficient for GPU processing. Even more importantly, visibility culling mechanisms have to be engineered with efficient GPU processing in mind. That's complexity I'm not going to address here.
Implementation
Instead, let's look at the nuts-and-bolts of making the drawing part work. We have to sort out a lot of things here.
See, the indirect rendering command is still a regular old rendering command. While the multi-draw form draws multiple distinct "objects", it's still one CPU rendering command. This means that, for the duration of this command, all rendering state is fixed.
So everything under the purview of this multi-draw operation must use the same shader, bound buffers&textures, blending parameters, stencil state, and so forth. This makes implementing a GPU-driven rendering operation a bit complicated.
State and Shaders
If you need blending, or similar state-based differences in rendering operations, then you are going to have to issue another rendering command. So in the blending case, your scene-graph processing is going to have to compute multiple sets of rendering commands, with each set being for a specific set of blending modes. You may also need to have this system sort transparent objects (unless you're rendering them with an OIT mechanism). So instead of having just one rendering command, you have a small number of them.
But the point of this exercise however isn't to have only one rendering command; the point is that the number of CPU rendering commands does not change with regard to how much stuff you're rendering. It shouldn't matter how many objects are in the scene; the CPU will be issuing the same number of rendering commands.
When it comes to shaders, this technique requires some degree of "ubershader" style: where you have a very few number of rather flexible shaders. You want to parameterize your shader rather than having dozens or hundreds of them.
However things were probably going to fall out that way anyway, particularly with regard to deferred rendering. The geometry pass of deferred renderers tends to use the same kind of processing, since they're just doing vertex transformation and extracting material parameters. The biggest difference usually is with regard to doing skinned vs. non-skinned rendering, but that's really only 2 shader variations. Which you can handle similarly to the blending case.
Speaking of deferred rendering, the GPU driven processes can also walk the graph of lights, thus generating the draw calls and rendering data for the lighting passes. So while the lighting pass will need a separate draw call, it will still only need a single multidraw call regardless of the number of lights.
Buffers
Here's where things start to get interesting. See, if the GPU is processing the scene graph, that means that the GPU needs to somehow associate a particular draw within the multi-draw command with the resources that particular draw needs. It may also need to put the data into those resources, like the matrix transforms for a given object and so forth.
Oh, and you also somehow need to tie the vertex input data to a particular sub-draw.
That last part is probably the most complicated. The buffers which OpenGL/Vulkan's standard vertex input method pull from are state data; they cannot change between sub-draws of a multi-draw operation.
Your best bet is to try to put every object's data in the same buffer object, using the same vertex format. Essentially, you have one gigantic array of vertex data. You can then use the drawing parameters for the sub-draw to select which parts of the buffer(s) to use.
But what do we do about per-object data (matrices, etc), things you would typically use a UBO or global uniform for? How do you effectively change the buffer binding state within a CPU rendering command?
Well... you can't. So you cheat.
First, you realize that SSBOs can be arbitrarily large. So you don't really need to change buffer binding state. What you need is a single SSBO that contains everyone's per-object data. For each vertex, the VS simply needs to pick out the correct data for that sub-draw from the giant list of data.
This is done via a special vertex shader input: gl_DrawID. When you issue a multi-draw command, the VS gets an input value that represents the index of this sub-draw operation within the multidraw command. So you can use gl_DrawID to index into a table of per-object data to fetch the appropriate data for that particular object.
This also means that the compute shader which generates this sub-draw also needs use the index of that sub-draw to define where in the array to put the per-object data for that sub-draw. So the CS that writes a sub-draw also needs to be responsible for setting up the per-object data that matches the sub-draw.
Textures
OpenGL and Vulkan have pretty strict limits on the number of textures that can be bound. Well actually those limits are quite large relative to traditional rendering, but in GPU driven rendering land, we need a single CPU rendering call to potentially access any texture. That's harder.
Now, we do have gl_DrawID; coupled with the table mentioned above, we can retrieve per-object data. So: how do we convert this to a texture?
There are multiple ways. We could put a bunch of our 2D textures into an array texture. We can then use gl_DrawID to fetch an array index from our SSBO of per-object data; that array index becomes the array layer we use to fetch "our" texture. Note that we don't use gl_DrawID directly because multiple different sub-draws could use the same texture, and because the GPU code that sets up the array of draw calls does not control the order in which textures appear in our array.
Array textures have obvious downsides, the most notable of which is that we must respect the limitations of an array texture. All elements in the array must use the same image format. They must all be of the same size. Also, there are limits on the number of array layers in an array texture, so you might encounter them.
The alternatives to array textures differ along API lines, though they basically boil down to the same thing: convert a number into a texture.
In OpenGL land, you can employ bindless texturing (for hardware that supports it). This system provides a mechanism that allows one to generate a 64-bit integer handle which represents a particular texture, pass this handle to the GPU (since it is just an integer, use whatever mechanism you want), and then convert this 64-bit handle into a sampler type. So you use gl_DrawID to fetch a 64-bit handle from the per-object data, then convert that into a sampler of the appropriate type and use it.
In Vulkan land, you can employ sampler arrays (for hardware that supports it). Note that these are not array textures; in GLSL, these are sampler types which are arrayed: uniform sampler2D my_2D_textures[6000];. In OpenGL, this would be a compile error because each array element represents a distinct bind point for a texture, and you cannot have 6000 distinct bind points. In Vulkan, an arrayed sampler only represents a single descriptor, no matter how many elements are in that array. Vulkan implementations have limits on how many elements there can be in such arrays, but hardware that supports the feature you need to employ this (shaderSampledImageArrayDynamicIndexing) will typically offer a generous limit.
So your shader uses gl_DrawID to get an index from the per-object data. The index is turned into a sampler by just fetching the value from the sampler array. The only limitation for textures in that arrayed descriptor is that they must all be of the same type and basic data format (floating-point 2D for sampler2D, unsigned integer cubemap for usamplerCube, etc). The specifics of formats, texture sizes, mipmap counts, and the like are all irrelevant.
And if you're concerned about the cost difference of Vulkan's array of samplers compared to OpenGL's bindless, don't be; implementations of bindless are just doing this behind your back anyway.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
I am working on a little project in C++ using OpenGL. I want to be able to render multiple 2D rectangles either in some color or with texture as efficiently as possible.
However, I am doing this in modern C++, so I am writing many wrappers over OpenGL API so that I have better logical structure, RAII, ...
In this spirit I wanted to create a class Rectangle, which would have a method draw() and this method would activate corresponding OpenGL context and call glDrawArrays(). This works fine, but then I realized, that if I wanted to render more rectangles, I would have to cycle through many instances. They would each switch context and I don't think this is an effective solution.
After a bit of thinking my solution was to create a Renderer object, which would hold a single VAO for all Rectangles, associated program, and huge buffer in which I would hold all coordinates for my objects (Rectangle instance would then be like a smarter pointer to this buffer) and then draw them all at once. Of course it would add me a lot of work with managing the buffer itself (adding/removing rectangles). Would it be better?
Also, do you have any tips what else should I focus on?
In general you want to minimize the number of drawing calls. Placing a lot of geometry data in a single buffer object and batching it all up into a single glDraw… call is definitely the way to go. The best way to go about this is to not think things drawn using OpenGL being individual objects (there is no spoon) but just patches of colour, which by happenstance appear to be rectangles, boxes, spheres, spoons…
A way to implement this using C++ idioms is to have a buffer object class, which internally manages chunks of data contained inside a buffer object. Keep in mind that buffer objects by themself are pretty much formless, and only by using them as a source for vertex attributes they get meaning. Then your individual rectangles would each allocate from such a buffer object instance; the instance in return could deliver the rectangle attributes' indices relative to a certain internal offset (what's passed to glVertexAttribPointer); it actually kind of makes sense to have a buffer object class, a object attribute view class (which manages the attribute pointers) and the actual geometry classes which operate on attribute views.
In preparation of the actual drawing call the geometry instances would just emit the instances of their respective vertices; concatenate them up and use that for a glDrawElements call.
If your geometry is not changing then best way is to create VAO with one VBO for geometry of renctangle, one vbo with transformations for multiple rectangles you may draw and one vbo for texture coordinates and do instancing you remove lot of trafic from cpu to gpu by this.Also try to cache uniforms and dont set their value every rendering call if the values are not changing. Try to use tool like gltrace and see if you can reduce unnecessary state changes. Collect as much data as possible and then only do rendering call.
This question comes in two (mostly) independent parts
My current setup is that I have a lot of Objects in gamespace. Each has a VBO assigned to it, which holds Vertex Attribute data for each vertex. If the Object wants to change its vertex data (position etc) it does so in an internal array and then call glBufferSubDataARB to update the version in the GPU.
Now I understand that this is a horrible thing to do and so I am looking for alternatives. One that presents itself is to have some managing thing that has a large VBO in the beginning and Objects can request space from it, and edit points in it. This drops the overhead of loading VBOs but comes with a large energy/time expenditure in creating and debugging such a beast (basically an entire memory management system).
My question (part (a)) is if this is the "best" method for doing this, or if there is something better that I have not thought of.
Such a system should allow easy addition/removal of vertices and editing them, as fast as possible.
Part (b) is about some simple actions taken on every object, ie those of rotation and translation. At the moment I am moving each vertex (ouch), but this must have a better option. I am considering uploading rotation and translation matrices to my shader to do there. This seems fine, but I am slightly worried about the overhead of changing uniform variables. Would it ultimately be to my advantage to do this? How fast is changing uniform variables?
Last time I checked the preferred way to do buffer updates was orphaning.
Basically, whenever you want to update your buffered data, you call glBindBuffer on your buffer, which invalidates the current content of the buffer, and then you write your new data with glMapBuffer / glBufferSubdata.
Hence:
Use a single big VBO for your static data is indeed a good idea. You must take care of the maximum allowed VBO size, and split your static data into multiple VBOs if necessary. But this is probably an over-optimization in most cases (i.e. "I wouldn't bother").
Data which is updated frequently should be grouped in the same VBO (with usage = GL_STREAM_DRAW), and you shall use orphaning to update that.
Unfortunately, the actual performance of this stuff varies on different implementations. This guy made some tests on an actual game, it may be worth reading.
For the second part of your question, obviously using uniforms is the way to do it. Yes there is some (little) overhead, but it's sure 1000 times better than streaming all your data at every frame.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
What are the fastest and the most flexible (applicable in most situations) ways to use VBOs?
I am developing an openGL application and i want it to get to the best performance, so i need someone to answer these questions. I read many questions and anwers but i guess there's to much information that i don't need which messes up my brain...
How many vbos should i use?
How should i create vbos?
How should i update vbos data, if the data size is not fixed?
How should i render vbos?
How should i deal with data in vbos that i don't want to render anymore?
How many vbos should i use?
As few as possible. Switching VBOs comes with a small, but measureable cost. In general you'll try to group similar data into VBOs. For example in a FPS game all the different kinds of garbage lying on the street, small props, etc., will be usually located in the same or only a small number of VBOs.
It also comes down to drawing batch sizes. glDraw… calls which render less than about 100 primitives are suboptimal (this has always been the case, even 15 years ago). So you want to batch at least 100 primitives where possible. But if a single mesh has only, say 20 triangles (low polycount props for instancing or such), each in its own VBO you can no longer batch more.
How should i create vbos?
glGenBuffers → glBindBuffer → glBufferData
UPDATE You can pass a null pointer to the data parameter of glBufferData to initialize the buffer object without setting the data.
How should i update vbos data, if the data size is not fixed?
Create VBOs with a coarser size granularity than your data size is. Your operating system is doing this anyway for your host side data, it's called paging. Also if you want to use glMapBuffer making the buffer object a multiple of the host page size is very nice to the whole system.
The usual page size on current systems is 4kiB. So that's the VBO size granularity I'd choose. UPDATE: You can BTW ask your operating system which page size it is using. That's OS dependent though, I'd ask another question for that.
Update the data using glBufferSubData or map it with glMapBuffer modify in the host side mapped memory, then glUnmapBuffer.
If the data outgrows the buffer object, create a new, larger one and copy with glCopyBufferSubData. See the lase paragraph.
How should i render vbos?
glBindBuffer → glDraw…
How should i deal with data in vbos that i don't want to render anymore?
If the data consumes only a part of the VBO and shares it with other data and you're not running out of memory then, well, just don't access it. Ideally you keep around some index in which you keep track of which VBO has which parts of it available for what kind of task. This is very much like memory management, specifically a scheme known as object stacks (obstacks).
However eventually it may make sense to compactify an existing buffer object. For this you'd create a new buffer object, bind it as writing target, with the old buffer object being selected as reading target. Then use glCopyBufferSubData to copy the contents into a new, tightened buffer object. Of course you will then have to update all references to buffer object name (=OpenGL ID) and offsets.
For this reason it makes sense to write a thin abstraction layer on top of OpenGL buffer objects that keeps track of the actual typed data within the structureless blobs OpenGL buffer objects are.
How many vbo's should i use?
As many as you need, sounds silly but well, its that way
How should i update vbos data, if the data size is not fixed?
Overwrite and render the same VBO with different data and lengths.
How should i create vbos?
How should i render vbos?
see VBO tutorial
How should i deal with data in vbos that i don't want to render anymore?
create a new vbo and copy the data into it or render only parts of that vbo which is in memory.
To render only parts see glVertexBuffer glTexCoordPointer(just calculate the new pointer and new size based on the offset)
Edit 1:
Using a single VBO for everything feels wrong, because you have to manage the allcation of new vertex positions/texture coordinates yourself which gets really messy really fast.
It is better to group small props into VBO's and batch the drawing commands.
Can i add data to a buffer with glBufferSubData (adding 100 elements to a buffer with size x)?
No, its not possible because the description says updates a subset of a buffer object's data store and a subset is a smaller set inside a set.
Edit 2
A good tutorial is also Learning Modern 3D Graphics Programming but its not VBO specific.
So I understand how to use a Vertex Buffer Object, and that it offers a large performance increase over immediate mode drawing. I will be drawing a lot of 2D quads (sprites), and I am wanting to know if I am supposed to create a VBO for each one, or create one VBO to hold all of the data?
You shouldn't use a new VBO for each sprite/quad. So putting them all in a single VBO would be the better solution in your case.
But in general i don't think this can be answered in one sentence.
Creating a new VBO for each Quad won't give you a real performance increase. If you do so, a lot of time will be wasted with glBindBuffer calls for switching the VBOs. However if you create VBOs that hold too much data you can run into other problems.
Small VBOs:
Are often easier to handle in your program code. You can use a new VBO for each Object you render. This way you can manage your objects very easy in your world
If VBOs are too small (only a few triangles) you don't gain much benefit. A lot of time will be lost with switching buffers (and maybe switching shaders/textures) between buffers
Large VBOs:
You can render tons of objects with a single drawArrays() call for best performance.
Depending on your data its possible that you create overhead for managing a lot of objects in a single VBO (what if you want to move one of these objects and rotate another object?).
If your VBOs are too large its possible that they can't be moved into VRAM
The following links can help you:
Vertex Specification Best Practices
Use one (or a small number of) VBO(s) to hold all/most of your geometry.
Generally the fewer API calls it takes to render your scene, the better.
It also depends what d you want to do with those sprites?
Are they dynamic? Do you want to change only the centre of quad or maybe modify all four points?
This is important because if your data are dynamic then, in the simplest way, you will have to transfer from cpu to gpu each frame. Maybe you could perform all computation on the GPU - for instance by using geometry shaders?
Also for very simple quads/sprites one can use GL_POINT_SPRITE. With that one need to send only one vertex for whole quad. But the drawback is that it is hard to rotate it...