What's the best way to keep adding new points to glBufferData? - c++

I am making an experiment with OpenGL to find what's the best/most efficient way to very frequently add new data to glBufferData.
To do this, I wrote a small 2D paint program and simply keep adding points when I move the mouse.
The whole function looks like this:
void addPoint(double x, double y)
{
    glBindBuffer(GL_ARRAY_BUFFER, pointVertBuffObj);
    if (arrayOfPointCapacity < numOfPoints + 1) {
        U32 size = (arrayOfPointCapacity + 8) * sizeof(Point2);
        Point2 *tmp = (Point2*)realloc(arrayOfPoints, size);
        if (!tmp) return; // allocation failed, keep the old array
        arrayOfPoints = tmp;
        arrayOfPointCapacity += 8;
    }
    arrayOfPoints[numOfPoints].x = x;
    arrayOfPoints[numOfPoints].y = y;
    numOfPoints++;
    // Re-upload the whole array, including the point that was just added.
    glBufferData(GL_ARRAY_BUFFER, numOfPoints * sizeof(Point2), arrayOfPoints, GL_DYNAMIC_DRAW);
}
Having to reset glBufferData with new data each time I add a point seems absolutely crazy. I thought about using glBufferData to allocate a large array of points and setting these points up with glBufferSubData. When the size of the buffer becomes too small, then I call glBufferData again increasing the size of the buffer, and copying back existing points to it.
Ideally, I would prefer to avoid storing the point data in the computer memory and keep everything in the GPU memory. But when I would resize the buffer, I would have to copy the data back from the buffer to the CPU, then resize the buffer, and finally copy the data back to the buffer from the CPU. All this, also seems inefficient.
Any idea? What's best practice?

When the size of the buffer becomes too small, then I call glBufferData again increasing the size of the buffer, and copying back existing points to it.
Not a bad idea. In fact that's the recommended way of doing these things. But don't make the chunks too small.
Ideally, I would prefer to avoid storing the point data in the computer memory and keep everything in the GPU memory.
That's not how OpenGL works. The contents of a buffer object can be freely swapped between CPU and GPU memory as needed.
But when I would resize the buffer, I would have to copy the data back from the buffer to the CPU, then resize the buffer, and finally copy the data back to the buffer from the CPU. All this, also seems inefficient.
Correct. You want to avoid copies between OpenGL and the host program. That's why OpenGL 3.1 and later provide the function glCopyBufferSubData to copy data between buffers. When you need to resize a buffer, you can just as well create a new buffer object and copy from the old one to the new one [1].
[1]: Maybe you can also do resizing copies within the same buffer object name by exploiting name orphaning, but I'd first have to read the specs to see whether this is actually defined, and then cross my fingers that all implementations get it right.
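The advice not to make the chunks too small can be sketched as a capacity policy. This is only an illustration, not the answerer's code: the helper name `grownCapacity`, the doubling factor and the initial size of 1024 points are assumptions. The GL calls in the trailing comment show where glCopyBufferSubData would fit and need a GL 3.1+ context.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns the capacity (in points) a buffer should be
// resized to so that `needed` points fit. Doubling keeps the number of
// reallocations logarithmic in the final point count.
std::size_t grownCapacity(std::size_t current, std::size_t needed)
{
    assert(needed > 0);
    std::size_t capacity = current > 0 ? current : 1024; // assumed initial size
    while (capacity < needed)
        capacity *= 2;
    return capacity;
}

// With a GL 3.1+ context, the resize itself would then look roughly like:
//   glBindBuffer(GL_COPY_READ_BUFFER, oldBuf);
//   glBindBuffer(GL_COPY_WRITE_BUFFER, newBuf);
//   glBufferData(GL_COPY_WRITE_BUFFER, newCapacity * sizeof(Point2),
//                nullptr, GL_DYNAMIC_DRAW);          // allocate only
//   glCopyBufferSubData(GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER,
//                       0, 0, usedPoints * sizeof(Point2));
//   glDeleteBuffers(1, &oldBuf);
```

With this policy, a buffer that starts at 8 points and must hold 100 grows to 128 points rather than being reallocated a dozen times in steps of 8.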

I made a program for scientific graphing before, that could add new data points in real-time. What I did was create a fairly large fixed size buffer with flag GL_DYNAMIC_DRAW, and added individual points to it with glBufferSubData. Once it filled, I created a new buffer with flag GL_STATIC_DRAW and moved all the data there, then started filling the GL_DYNAMIC_DRAW buffer again from the beginning. So I ended up with a small number of static buffers, one dynamic buffer, and since they were all equal size (with monotonically increasing x coordinates) calculating which buffers to use to draw any given segment of the data was easy. And I never had to resize any of them, just keep track of how much of the dynamic buffer was used and only draw that many vertices from it.
I don't think I used glCopyBufferSubData as datenwolf suggests, I kept a copy in CPU memory of the data in the dynamic buffer, until I could flush it to a new static buffer. But GPU->GPU copy would be better. I still would allocate more chunk-sized buffers and avoid resizing.
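The "which buffers do I draw" bookkeeping described above might look like the following sketch. It assumes every chunk holds exactly `chunkSize` vertices; the names `ChunkRange` and `splitIntoChunks` are made up for illustration and are not from the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One draw call's worth of work: which chunk buffer, and which vertices in it.
struct ChunkRange {
    std::size_t chunk;  // index of the fixed-size buffer
    std::size_t first;  // first vertex inside that buffer
    std::size_t count;  // number of vertices to draw from it
};

// Splits the global vertex range [first, first + count) into per-chunk ranges,
// assuming every chunk holds exactly `chunkSize` vertices.
std::vector<ChunkRange> splitIntoChunks(std::size_t first, std::size_t count,
                                        std::size_t chunkSize)
{
    assert(chunkSize > 0);
    std::vector<ChunkRange> ranges;
    while (count > 0) {
        const std::size_t chunk = first / chunkSize;
        const std::size_t local = first % chunkSize;
        const std::size_t room = chunkSize - local;
        const std::size_t n = room < count ? room : count;
        ranges.push_back({chunk, local, n});
        first += n;
        count -= n;
    }
    return ranges;
}
```

Each returned range would map to one bind of that chunk's buffer followed by one glDrawArrays(mode, first, count) call.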

Related

How to send only new data to OpenGL without losing previous data?

I am working with point cloud data that I need to render using OpenGL. I get a new vector of data points every frame. I want to cache the data previously sent to OpenGL and send only the newest frame's data. How can I do so?
I did some searching and found this idea here:
// Bind the old buffer to `GL_COPY_READ_BUFFER`
glBindBuffer (GL_COPY_READ_BUFFER, old_buffer);
// Allocate data for a new buffer
glGenBuffers (1, &new_buffer);
glBindBuffer (GL_COPY_WRITE_BUFFER, new_buffer);
glBufferData (GL_COPY_WRITE_BUFFER, ...);
// Copy `old_buffer_size`-bytes of data from `GL_COPY_READ_BUFFER`
// to `GL_COPY_WRITE_BUFFER` beginning at 0.
glCopyBufferSubData (GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER, 0, 0, old_buffer_size);
But it looks like it ends up sending both the previous and the new data to the new buffer, instead of caching the old data and sending only the latest. So I am not sure if it's the best way. Please correct me if I am wrong, or suggest an alternative.
So you store some data in your CPU memory, and you append more data to this storage. Then you want to send only the appended data to GPU, not the whole buffer.
Your code example is irrelevant for this task, as glCopyBufferSubData copies data from a location in GPU memory to another location in GPU memory again.
You need a combination of glBufferData and glBufferSubData. glBufferData allocates memory on the GPU and optionally initializes it. glBufferSubData writes some data to an already allocated GPU buffer. You may treat glBufferData as C's malloc or C++ new, while glBufferSubData is like a special version of C's memcpy or C++ std::copy. More precisely, glBufferSubData is a memcpy from CPU to GPU, and glCopyBufferSubData is a memcpy from GPU to GPU.
How to cook them together? The same way as in C. Call glBufferData once at initialization time (when the program starts), and call glBufferSubData when you need to append data. Be sure to allocate enough space! A buffer allocated by glBufferData does not grow, just like a malloc'ed buffer. glBufferSubData cannot write past the end of the buffer: if offset + size exceeds the buffer size, the call fails with GL_INVALID_VALUE and nothing is uploaded.
Try to predict space requirement for your buffer, and call glBufferData only if your data does not fit into the buffer.
Remember that calling glBufferData on a binding whose buffer already has storage will discard the existing storage and create a new one.
glBufferSubData will not reallocate your buffer, but will overwrite data which is already there.
Let me illustrate it with C translation:
glGenBuffers(..., buf); // float* buf;
glBindBuffer(buf); // Tell opengl that we will use buf pointer, no analog in C.
glBufferData(/*non-null pointer*/); // buf = malloc(/*..*/); memcpy(to_gpu, from_cpu);
glBufferData(/*another non-null pointer*/); // free(buf); buf = malloc(/*..*/); memcpy(to_gpu, from_cpu);
glBufferSubData(...); // memcpy(to_gpu, from_cpu);
Idiomatic approach
What you need is:
glGenBuffers(..., buf); // float* buf;
glBindBuffer(buf); // Tell opengl that we will use buf pointer, no analog in C.
// Initialization
glBufferData(/*non-null pointer*/); // buf = malloc(/*..*/); memcpy(to_gpu, from_cpu);
// Hot loop
while (needToRender) {
if(needToAppend) {
if (dataDoesNotFit) glBufferData(...); // Reallocate, same buffer name
else glBufferSubData(...); // memcpy(to_gpu, from_cpu);
}
}
Here we reallocate memory only occasionally, when we need to append something and buffer is too small.
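The decision inside the hot loop can be written out as a small planning function. This is a sketch under assumptions: `AppendPlan` and `planAppend` are hypothetical names, sizes are in bytes, and the buffer doubles when it runs out of room (the answer itself does not prescribe a growth factor).

```cpp
#include <cassert>
#include <cstddef>

// What the hot loop should do for one append.
struct AppendPlan {
    bool reallocate;       // true: glBufferData with the whole CPU array
    std::size_t capacity;  // buffer capacity in bytes after this append
    std::size_t offset;    // byte offset at which the upload starts
    std::size_t size;      // number of bytes to upload
};

// Decides between glBufferSubData (data fits) and glBufferData (it does not).
AppendPlan planAppend(std::size_t usedBytes, std::size_t capacityBytes,
                      std::size_t newBytes)
{
    assert(usedBytes <= capacityBytes);
    const std::size_t required = usedBytes + newBytes;
    if (required <= capacityBytes) {
        // Fits: upload only the appended bytes behind the existing data.
        return {false, capacityBytes, usedBytes, newBytes};
    }
    // Does not fit: grow (doubling is an assumption) and re-upload everything.
    std::size_t grown = capacityBytes > 0 ? capacityBytes : 1;
    while (grown < required)
        grown *= 2;
    return {true, grown, 0, required};
}
```

When `reallocate` is false, the plan maps to glBufferSubData(target, offset, size, newData). When it is true, it maps to glBufferData(target, capacity, nullptr, usage) followed by one glBufferSubData(target, 0, size, cpuArray) covering all the data.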
Other approaches
I advised to reallocate with glBufferData as you already have all data in a single buffer on CPU. If not (i.e. you have a chunk of data on GPU and another chunk on CPU, but not together), you could use glCopyBufferSubData for reallocating:
glBufferData(/*alloc new_gpu_buffer*/);
glCopyBufferSubData(/*from old_gpu_buffer to new_gpu_buffer*/);
glDeleteBuffers(/*old_gpu_buffer*/);
glBufferSubData(/*from_cpu_buffer to new_gpu_buffer*/); // Add some new data from CPU.
Another approach for updating GPU data is mapping it to the CPU, so you just access GPU memory through a pointer. It's likely to be slow (it blocks the buffer and stalls the pipeline), and is useful only in special cases. Use it only if you know what you are doing.
Since OpenGL is an API focused on drawing things (ignoring compute shaders for the moment), and when drawing a scene you normally start from an empty canvas, you'll have to retain the complete backlog of point cloud data for the whole span of time over which you want to be able to redraw it.
Assuming that redrawing the whole set might take some time for large amounts of point cloud data, some form of caching might seem reasonable. But let's do some back-of-the-envelope calculations first:
Typical GPUs these days are perfectly capable of performing full vertex setup at a rate well over 10^9 vertices / second (already 20 years ago GPUs were able to do something on the order of 20·10^6 vertices / second). Your typical computer display has less than 10·10^6 pixels. So because of the pigeonhole principle, if you were to draw more than 10·10^6 points you're either producing serious overdraw or filling up most of the pixels; in practice it's going to be somewhere in between.
But as we've already seen, GPUs are more than capable of drawing that many points at interactive framerates. And drawing any more of them will likely fill up your screen or occlude data.
Some form of data retirement is required, if you want the whole thing to remain readable. And for any size of pointcloud that is readable your GPU will be able to redraw the whole thing just fine.
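The estimate above can be spelled out as arithmetic. The sketch below only restates the figures quoted in this answer (10^9 vertices per second, about 10·10^6 points); it counts vertex setup alone and ignores fill rate and upload time, so treat it as an upper bound on achievable speed rather than a measurement. The function name is hypothetical.

```cpp
#include <cassert>

// Time in milliseconds to run vertex setup for `points` vertices on a GPU
// that processes `verticesPerSecond` vertices per second.
double redrawMilliseconds(double points, double verticesPerSecond)
{
    assert(verticesPerSecond > 0.0);
    return points / verticesPerSecond * 1000.0;
}
```

For 10·10^6 points at 10^9 vertices per second this gives about 10 ms per full redraw, which fits within a 60 Hz frame budget of roughly 16.7 ms.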
Considering the need for data retirement, I suggest you allocate a large buffer that can hold a whole set of points over their lifetime (before eviction) and use it as a circular (round-robin) buffer. Keep an offset at which new data overwrites old data as it arrives (using glBufferSubData); at the edges you may have to split this into two calls. Pass the latest write index as a uniform so the shader can fade out points by their age, and then submit a single glDrawElements call to draw the whole content of that buffer in one go.
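The "split this in two calls at the edges" step is the only fiddly part of the ring buffer, so here is a minimal sketch of it. `Span`, `ringSpans` and `advanceHead` are hypothetical names, and all quantities are counted in points rather than bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A contiguous run of points inside the ring buffer.
struct Span {
    std::size_t offset;  // first point of the run
    std::size_t count;   // number of points in the run
};

// Returns the one or two contiguous spans that `count` new points occupy
// when written at `head` into a ring that holds `capacity` points.
std::vector<Span> ringSpans(std::size_t head, std::size_t count,
                            std::size_t capacity)
{
    assert(capacity > 0 && head < capacity && count <= capacity);
    std::vector<Span> spans;
    if (count == 0)
        return spans;
    const std::size_t untilEnd = capacity - head;
    if (count <= untilEnd) {
        spans.push_back({head, count});           // no wrap-around
    } else {
        spans.push_back({head, untilEnd});        // tail of the buffer
        spans.push_back({0, count - untilEnd});   // wrapped part at the start
    }
    return spans;
}

// New write position after `count` points have been written at `head`.
std::size_t advanceHead(std::size_t head, std::size_t count,
                        std::size_t capacity)
{
    assert(capacity > 0);
    return (head + count) % capacity;
}
```

Each span would become one glBufferSubData call, with offset and size multiplied by the size of a point; the value returned by `advanceHead` is what would be passed to the shader as the latest write index.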

QT QOpenGLWidget : how to modify individual vertices values in VBO without using data block copy?

I don't know if it is possible or not:
I have an array of QVector3D vertices that I copy to a VBO
sometimes I want to modify only the z value of a range of vertices between the values (x1, y1) and (x2, y2) - the concerned vertices strictly follow each other
my "good" idea is to only modify the z values with a direct access to the VBO.
I have searched a lot, but all the solutions I saw use memcpy, something like this :
m_vboPos.bind();
GLfloat* PosBuffer = (GLfloat*) (m_vboPos.map(QOpenGLBuffer::WriteOnly));
if (PosBuffer != (GLfloat*) NULL) {
    memcpy(PosBuffer, m_Vertices.constData(), m_Vertices.size() * sizeof(QVector3D));
    m_vboPos.unmap();
}
m_vboPos.release();
But it is to copy blocks of data.
I don't think using memcpy to change only 1 float value in every concerned vertex would be very efficient (I have several millions of vertices in the VBO).
I'd just like to optimize because copying millions of vertices takes a (too) long time : is there a way to achieve my goal (without memcpy ?), for only one float here and there ? (already tried that but couldn't make it, I must be missing something)
This call here
GLfloat* PosBuffer = (GLfloat*) (m_vboPos.map(QOpenGLBuffer::WriteOnly));
will internally call glMapBuffer, which means that it just maps the buffer contents into the address space of your process (see also the OpenGL Wiki on Buffer Object Mapping).
Since you map it write-only, you can simply overwrite each and every bit of the buffer, as you see fit. There is no need to use memcpy, you can just use any means to write to memory, e.g. you can directly do
PosBuffer[3*vertex_id + 2] = 42.0f; // assuming 3 floats per vertex
I don't think using memcpy to change only 1 float value in every concerned vertex would be very efficient (I have several millions of vertices in the VBO).
Yes, doing a million separate memcpy() calls for 4 bytes each will not be a good idea. A modern compiler might actually inline it, so it might be equivalent to just individual assignments, though. But you can also do the assignments directly, since memcpy is not gaining you anything here.
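Writing through the mapped pointer for a whole run of vertices can be sketched as below. The function name `setZ` is hypothetical, and the layout is assumed to be three tightly packed floats (x, y, z) per vertex, as in the one-line example above. In real code `mapped` would be the pointer returned by map(); for experimenting, any float array works.

```cpp
#include <cassert>
#include <cstddef>

// Writes `z` into the z component of `count` consecutive vertices, starting
// at `firstVertex`. The code only writes and never reads through `mapped`,
// which is what a write-only mapping permits.
void setZ(float* mapped, std::size_t firstVertex, std::size_t count, float z)
{
    assert(mapped != nullptr);
    for (std::size_t v = firstVertex; v < firstVertex + count; ++v)
        mapped[3 * v + 2] = z;  // 3 floats per vertex, z is the third
}
```

The x and y components of the touched vertices, and all other vertices, are left exactly as they were.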
However, it is not clear what the performance impact of all this is. glMapBuffer might return a pointer to:
- some local copy of the VBO in system memory, whose contents have to be copied to the GPU later. Since the driver does not know which values you changed and which you did not, it might have to re-transmit the whole buffer.
- some system memory inside the GART area, which is mapped on the GPU, so the GPU accesses this memory directly when reading from the buffer.
- some I/O-mapped region in VRAM. In this case, the caching behavior of the memory region might be significantly different, and changing 4 bytes in every 12-byte block might not be the ideal approach. Just re-copying the whole sub-block as one big chunk might yield better performance.
The mapping itself is also not free: it involves changing the page tables, and the GL driver might have to synchronize its threads or, in the worst case, synchronize with the GPU (to prevent you from overwriting data the GPU is still using for a previous draw call that is still in flight).
sometimes I want to modify only the z value of a range of vertices between the values (x1, y1) and (x2, y2) - the concerned vertices strictly follow each other
So you have a contiguous sub-region of the buffer which you want to modify. I would recommend looking at two alternatives:
- Use glMapBufferRange (if available in your OpenGL version) to map only the region you care about.
- Forget about buffer mapping completely, and try glBufferSubData(). Not individually on each z component of each vertex, but as one big chunk for the whole range of modified vertices. This implies that you keep a local copy of the buffer contents in your memory somewhere, update it, and send the result to the GL.
Which option is better will depend on a lot of different factors, and I would not rule one of them out without benchmarking in the actual scenario, on the actual implementations you care about. Also have a look at the general strategies for Buffer Object Streaming in OpenGL. A persistently mapped buffer might or might not be also a good option for your use case.
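Both alternatives need the same two numbers: the byte offset and byte length of the modified vertex range. A minimal sketch, assuming three tightly packed 4-byte floats per vertex (`ByteRange` and `vertexByteRange` are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>

static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");

// Byte offset and length of a run of vertices inside the buffer.
struct ByteRange {
    std::size_t offset;
    std::size_t length;
};

// Computes the bytes covered by `vertexCount` vertices from `firstVertex` on.
ByteRange vertexByteRange(std::size_t firstVertex, std::size_t vertexCount)
{
    assert(vertexCount > 0);
    const std::size_t stride = 3 * sizeof(float);  // x, y, z
    return {firstVertex * stride, vertexCount * stride};
}

// With a context bound, the range `r` would be used roughly like this:
//   glBufferSubData(GL_ARRAY_BUFFER, r.offset, r.length,
//                   cpuCopy + 3 * firstVertex);   // cpuCopy: float* mirror
// or
//   void* p = glMapBufferRange(GL_ARRAY_BUFFER, r.offset, r.length,
//                              GL_MAP_WRITE_BIT);
```

In Qt, QOpenGLBuffer::write(offset, data, count) and QOpenGLBuffer::mapRange(offset, count, access) take the same offset and byte count.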
The glMap method works great and is really FAST !
Thanks a lot genpfault, the speed gain is so great that the 3D rendering isn't choppy anymore.
Here is my new code, simplified to offer an easy to understand answer :
vertexbuffer.bind();
GLfloat* posBuffer = (GLfloat*) (vertexbuffer.map(QOpenGLBuffer::WriteOnly));
if (posBuffer != (GLfloat*) NULL) {
int index = NumberOfVertices(area.y + 1, image.cols); // index of first vertex on line area.y
for (row = ...) for (col = ...) {
if (mask.at<uchar>(row, col) != 0)
posBuffer[3 * index + 2] = depthmap.at<uchar>(row, col) * depth;
index++;
}
}
vertexbuffer.unmap();
vertexbuffer.release();

Does a VBO need "glBufferData" before the first render loop?

I am a newbie to OpenGL. I can now render something on screen, which is great. Next I want to stream some data points (GL_POINTS) to the screen. However, initially nothing showed up, and it took me several days to find out how to make it work.
The point is, I use a VBO to store my data and call glBufferSubData() to update the buffer. However, it ONLY works if I call glBufferData() before the first render loop. In other words, if I just do
glGenVertexArrays(1, &VAO);
glGenBuffers(1, &VBO);
in my initializing function (before first render loop), and in my updateData loop I do
glBindVertexArray(VAO);
glBindBuffer(GL_ARRAY_BUFFER, VBO);
glBufferData(GL_ARRAY_BUFFER, myData.size() * sizeof(float), &myData[0], GL_DYNAMIC_DRAW);
... // glVertexAttribPointer and other required general process
It won't render anything. It seems I have to call glBufferData(/*some random data here*/) to "allocate" the buffer (I thought it was allocated when calling glGenBuffer()), and then use
glBufferSubData(/*override previous data*/) or
glMapBuffer()
to update that buffer.
So my question is: what is the proper/normal way to initialize a VBO if I don't know at compile time how much data (how many vertices) I need to draw? (Right now I just glBufferData() a loooooong buffer and update it when needed.)
Note: OpenGL3.3, Ubuntu16.04
Update:
In my opinion, it works like allocating a char[]: I have to state up front how much I need for the string, e.g. char[200], and then fill in the data via snprintf(). But we now have vector and string, which let us change the memory allocation dynamically. Why doesn't OpenGL support that? If it does, how should I use it?
I thought it was allocated when calling glGenBuffer()
The statement:
int *p;
Creates an object in C/C++. That object is a pointer. It does not point to an actual object or any kind of storage; it's just a pointer, waiting to be given a valid address.
So too with the result of glGenBuffers. It's a buffer object, but it has no storage. And like an uninitialized pointer, there's not much you can do with a buffer object with no storage.
what is the proper/normal way to initialize an VBO if I only don't know how much data/vertex I need to draw in the compile time?
Pick a reasonable starting number. If you overflow that buffer, then either terminate the application or allocate additional storage.
But we now have vector or string, that allows us to dynamically change the memory location. Why OpenGL doesn't support that?
Because OpenGL is a low-level API (relatively speaking).
In order to support dynamic reallocation of storage in the way you suggest, the OpenGL implementation would have to allocate new storage of the size you request, copy the old data into the new storage, and then destroy the old storage. std::vector and std::string have to do the same stuff; they just do it invisibly.
Also, repeatedly doing that kind of operation horribly fragments the GPU's address space. That's not too much of a problem with vector and string, since those arrays are usually quite small (on the order of kilobytes or low megabytes). Whereas single buffer objects that take up 10% of a GPU's total memory are hardly uncommon in serious graphics applications. Reallocating that is not a pleasant idea.
OpenGL gives you the low-level pieces to build that, if you want to suffer the performance penalty for doing so. But it doesn't implement it for you.

glBufferData set to null for constantly changing vbo

I have a huge vbo, and the entire thing changes every frame.
I have heard of different methods of quickly changing buffer data, but only one of them seems like a good idea for my program. However, I don't understand it and can't find any code samples for it.
I have heard people claim that you should call glBufferData with "null" as the data then fill it with your real data each frame. What is the goal of this? What does this look like in code?
It's all in the docs.
https://www.opengl.org/sdk/docs/man/html/glBufferData.xhtml
If you pass NULL to glBufferData(), it looks something like this:
int bufferSize = ...;
glBufferData(GL_ARRAY_BUFFER, bufferSize, NULL, GL_DYNAMIC_DRAW);
void *ptr = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
...
Ignore most of that function call, the only two important parts are bufferSize and NULL. This tells OpenGL that the buffer has size bufferSize and the contents are uninitialized / undefined. In practice, this means that OpenGL is free to continue using any previous data in the buffer as long as it needs to. For example, a previous draw call using the buffer may not have finished yet, and using glBufferData() allows you to get a new piece of memory for the buffer instead of waiting for the implementation to finish using the old piece of memory.
This is an old technique and it works fairly well. There are a couple other common techniques. One such technique is to double buffer, and switch between two VBOs every frame. A more sophisticated technique is to use a persistent buffer mapping, but this requires you to manage memory fences yourself in order for it to work correctly.
Note that if you are uploading data with glBufferData() anyway, then calling glBufferData() beforehand with NULL doesn't actually accomplish anything.
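Why orphaning avoids a stall can be illustrated with a CPU-side analogy. This is not OpenGL code and says nothing about how any particular driver implements it: `BufferModel` is a made-up stand-in for a buffer name, and a std::shared_ptr plays the role of the driver keeping old storage alive while a draw call still uses it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Storage = std::shared_ptr<std::vector<unsigned char>>;

// CPU-side analogy of a buffer object: a name that points at some storage.
struct BufferModel {
    Storage storage;

    // Analogy for glBufferData(target, size, NULL, usage): attach fresh
    // storage to the same name. The old storage is not destroyed as long as
    // someone else (an in-flight draw) still holds a reference to it.
    void orphan(std::size_t size)
    {
        assert(size > 0);
        storage = std::make_shared<std::vector<unsigned char>>(size);
    }
};
```

If a pending draw holds a reference to the old storage, calling orphan() and then writing new data leaves the old bytes untouched, so the writer never has to wait for the reader.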

OpenGL VBO updating data

I have to draw a buffer that holds a couple thousand vertices. I am using a vbo to store the data.
I know I will have to update the VBO many times - but only in small parts at a time.
So I am wondering what the best method to doing so is:
1. Split the VBO up into smaller VBOs (that hold like 300 verts) and then update individual VBOs with 1 call?
2. One big VBO and use lots of glBufferSubData() calls?
3. Use glMapBuffer() and one big VBO?
There is another option, which is a bit like option 3 - use one big VBO (probably with GL_STREAM_DRAW mode) that is reset each frame (by calling glBufferData with a NULL buffer pointer and the same size each time) then glMapBuffer-ed right away. The buffer is left mapped as it is filled in, then unmapped just before drawing. Repeat.
The call to glBufferData tells OpenGL that the old buffer contents aren't needed, so glMapBuffer doesn't have to wait for the GPU to finish with them.
This approach seems to be the one officially sanctioned by the vertex_buffer_object extension. See the "Vertex arrays using a mapped buffer object" example:
http://www.opengl.org/registry/specs/ARB/vertex_buffer_object.txt
This suggests that OpenGL (or the driver?) will be watching for this sort of behaviour, and (when spotted) arrange things so that it is performed efficiently.
1. Doesn't sound like a good idea: it forces you to draw it in several calls while changing the bound buffer between each draw call.
2. Might do the trick if your buffer is huge.
3. The whole buffer will certainly be uploaded to the GPU. This will certainly be as efficient as one glBufferData, but you can do it asynchronously.
I think that glBufferData or glMapBuffer is the better solution if your buffer is small. 100000 * sizeof(float) * 3 = 1.2 MB. There should be no problem with that.