OpenGL warning after calling glCopyNamedBufferSubData and glNamedBufferSubData - opengl

I'm getting the following warning:
Buffer performance warning: Buffer object 19 (bound to NONE, usage hint is GL_DYNAMIC_DRAW) is being copied/moved from VIDEO memory to HOST memory.
I allocate this buffer with glNamedBufferStorage. I copy data into it from another buffer (allocated the same way) with glCopyNamedBufferSubData, then update some of its data with glNamedBufferSubData. This gives me the warning. I don't use mapping, just these functions. (Using OpenGL 4.5 on an NVIDIA card.)
So far I couldn't reproduce this warning with minimal code, but it appears consistently. After searching, I found that people who got this message were using some kind of buffer mapping. What can cause this warning in my case? How can I find out more about what is causing it?
EDIT:
Minimal reproducible example:
GLuint test_buffer_1_ID;
GLuint test_buffer_2_ID;
glCreateBuffers(1, &test_buffer_1_ID);
glCreateBuffers(1, &test_buffer_2_ID);
GLubyte data_source[100];
const GLbitfield flags = GL_DYNAMIC_STORAGE_BIT;
glNamedBufferStorage(test_buffer_1_ID, 100, data_source, flags);
glNamedBufferStorage(test_buffer_2_ID, 100, data_source, flags);
glCopyNamedBufferSubData(
    test_buffer_1_ID, // read buffer
    test_buffer_2_ID, // write buffer
    0,                // read offset
    0,                // write offset
    10                // size
);
glNamedBufferSubData(
    test_buffer_2_ID, // buffer
    0,                // offset
    10,               // size
    data_source
);
The warning comes when glNamedBufferSubData is called. Whether I pass real source data to glNamedBufferStorage or a nullptr makes no difference. If I call SubData first and Copy after, or just SubData alone, there is no warning.

Because people have misused and mischaracterized buffer objects so frequently in the past, implementers have basically been forced to move buffer storage around based on how it is used, not how you say it will be used.
You said that your buffer would be filled by the CPU (otherwise you couldn't use BufferSubData on it). But if you write data into it from the CPU enough times, the driver will conclude that you're serious about doing that. So it will move the storage to directly CPU-accessible memory, where such copies are cheaper, potentially at the cost of GPU read performance.
If you're updating the storage of the buffer so frequently that you're triggering this condition, it might be better to just map it persistently and write to that when you need to. Of course, you'll now have to manage synchronization, but you should have been doing that anyway if performance mattered to you.

Deriving the `VkMemoryRequirements`

Is there a way to get the right values for a VkMemoryRequirements structure without having to allocate a buffer first and without using vkGetBufferMemoryRequirements?
Is it supported/compliant?
Motivation - Short version
I have an application that does the following, and everything works as expected.
VkMemoryRequirements memReq;
vkGetBufferMemoryRequirements(application.shell->device, uniformBuffer, &memReq);
int memType = application.shell->findMemoryType(memReq.memoryTypeBits, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);
Internally, findMemoryType loops over the memory types and checks that they have the required property flags.
If I replace the call to vkGetBufferMemoryRequirements with hardcoded values (which are not portable, specific to my system, and obtained through debugging), everything still works and I don't get any validation errors.
VkMemoryRequirements memReq = { 768, 256, 1665 };
int memType = application.shell->findMemoryType(memReq.memoryTypeBits, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);
The above code is IMHO neat because it enables you to pre-allocate memory before you actually need it.
Motivation - Long version
In Vulkan you create buffers which initially are not backed by device memory and at a later stage you allocate the memory and bind it to the buffer using vkBindBufferMemory:
VkResult vkBindBufferMemory(
    VkDevice device,
    VkBuffer buffer,
    VkDeviceMemory memory,
    VkDeviceSize memoryOffset);
The Vulkan spec states that:
memory must have been allocated using one of the memory types allowed
in the memoryTypeBits member of the VkMemoryRequirements structure
returned from a call to vkGetBufferMemoryRequirements with buffer
Which implies that before allocating the memory for a buffer, you should have already created the buffer.
I have a feeling that in some circumstances it would be useful to pre-allocate a chunk of memory before you actually need it; in most of the OpenGL flavors I have experience with this was not possible, but Vulkan should not suffer from this limitation, right?
Is there a (more or less automatic) way to get the memory requirements before creating the first buffer?
Is it supported/compliant?
Obviously, when you do allocate the memory for the first buffer you can allocate a little more so that when you need a second buffer you can bind it to another range in the same chunk. But my understanding is that, to comply with the spec, you would still need to call vkGetBufferMemoryRequirements on the second buffer, even if it is exactly the same type and the same size as the first one.
This question already recognizes that the answer is "no"; you just seem to want to do an end-run around what you already know. Which you can't.
The code you showed with the hard-coded values works because you already know the answer. It's not that Vulkan requires you to ask the question; Vulkan requires you to provide buffers that use the answer.
However, since "the answer" is implementation-specific, it changes depending on the hardware. It could change when you install a new driver. Indeed, it could change even depending on which extensions or Vulkan features you activate when creating the VkDevice.
That having been said:
Which implies that before allocating the memory for a buffer, you should have already created the buffer.
Incorrect. It requires that you have the answer and have selected memory and byte offsets appropriate to that answer. But Vulkan is specifically loose about what "the answer" actually means.
Vulkan has specific guarantees in place which allow you to know the answer for a particular buffer/image without necessarily having asked about that specific VkBuffer/Image object. The details are kind of complicated, but for buffers they are pretty lax.
The basic idea is that you can create a test VkBuffer/Image and ask about its memory properties. You can then use that answer for the buffers/images you intend to use that are "similar" to it. At the very least, Vulkan guarantees that two identical buffers/images (formats, usage flags, sizes, etc.) will always produce identical memory properties.
But Vulkan also offers a few other guarantees. There are basically 3 things that the memory properties tell you:
The memory types that this object can be bound to.
The alignment requirement for the offset for the memory object.
The byte size the object will take up in memory.
For the size, you get only the most basic guarantee: equivalent buffer/images will produce equivalent sizes.
For the alignment, images are as strict as sizes: only equivalent images are guaranteed to produce equivalent alignment. But for buffers, things are more lax. If the test buffer differs only in usage flags, and the final buffer uses a subset of the usage flags, then the alignment for the final buffer will not be more restrictive than the test buffer. So you can use the alignment from the test buffer.
For the memory types, things are even more loose. For images, the only things that matter are:
Tiling
Certain memory flags (sparse/split-instance binding)
Whether the image format is color or depth/stencil
If the image format is depth/stencil, then the formats must match
External memory
Transient allocation usage
If all of these are the same for two VkImage objects, then the standard guarantees that all such images will support the same set of memory types.
For buffers, things are even more lax. For non-sparse buffers, if your test buffer differs from the final buffer only by usage flags, then if the final one has a subset of the usage flags of the test buffer, then the set of memory types it supports must include all of the ones from the test buffer. The final buffer could support more, but it must support at least those of such a test buffer.
Oh, and linear images and buffers must always be usable in at least one mappable, coherent memory type. Of course, this requires that you have created a valid VkDevice/Image with those usage and flags fields, so if the device doesn't allow (for example) linear images to be used as textures, then that gets stopped well before asking about memory properties.

Does a VBO need glBufferData before the first render loop?

I am a newbie to OpenGL. I can now render something on screen, which is great. Next I want to stream some data points (GL_POINTS) to my screen. However, initially it doesn't show anything, and it took me several days to find out how to make it work.
The point is, I use a VBO to store my data and call glBufferSubData() to update the buffer. However, it ONLY works if I call glBufferData() before the first render loop. In other words, if I just do
glGenVertexArrays(1, &VAO);
glGenBuffers(1, &VBO);
in my initializing function (before first render loop), and in my updateData loop I do
glBindVertexArray(VAO);
glBindBuffer(GL_ARRAY_BUFFER, VBO); // glBindBuffer takes a target and a buffer
glBufferData(GL_ARRAY_BUFFER, myData.size() * sizeof(float), &myData[0], GL_DYNAMIC_DRAW);
... // glVertexAttribPointer and other required general process
it won't render anything. It seems I have to call glBufferData(/* some data here */) to "allocate" the buffer (I thought it was allocated when calling glGenBuffers()). And then use
glBufferSubData(/* override previous data */) or
glMapBuffer()
to update that buffer.
So my question would be... what is the proper/normal way to initialize a VBO if I don't know how much data/how many vertices I need to draw at compile time? (Right now I just glBufferData() a very long buffer and update it when needed.)
Note: OpenGL 3.3, Ubuntu 16.04
Update:
In my opinion, it works like allocating a char[]: I have to say up front how much I need (char[200]) and fill in the data via snprintf(). But we now have std::vector and std::string, which allow the memory to be relocated dynamically. Why doesn't OpenGL support that? And if it does, how should I use it?
I thought it was allocated when calling glGenBuffer()
The statement:
int *p;
Creates an object in C/C++. That object is a pointer. It does not point to an actual object or any kind of storage; it's just a pointer, waiting to be given a valid address.
So too with the result of glGenBuffers. It's a buffer object, but it has no storage. And like an uninitialized pointer, there's not much you can do with a buffer object with no storage.
what is the proper/normal way to initialize a VBO if I don't know how much data/how many vertices I need to draw at compile time?
Pick a reasonable starting number. If you overflow that buffer, then either terminate the application or allocate additional storage.
But we now have vector or string, that allows us to dynamically change the memory location. Why OpenGL doesn't support that?
Because OpenGL is a low-level API (relatively speaking).
In order to support dynamic reallocation of storage in the way you suggest, the OpenGL implementation would have to allocate new storage of the size you request, copy the old data into the new storage, and then destroy the old storage. std::vector and std::string have to do the same stuff; they just do it invisibly.
Also, repeatedly doing that kind of operation horribly fragments the GPU's address space. That's not too much of a problem with vector and string, since those arrays are usually quite small (on the order of kilobytes or low megabytes). Whereas a single buffer object that takes up 10% of a GPU's total memory is hardly uncommon in serious graphics applications. Reallocating that is not a pleasant idea.
OpenGL gives you the low-level pieces to build that, if you want to suffer the performance penalty for doing so. But it doesn't implement it for you.

glBufferData set to null for constantly changing vbo

I have a huge VBO, and the entire thing changes every frame.
I have heard of different methods of quickly changing buffer data, but only one of them seems like a good idea for my program. However, I don't understand it and can't find any code samples for it.
I have heard people claim that you should call glBufferData with "null" as the data and then fill it with your real data each frame. What is the goal of this? What does it look like in code?
It's all in the docs.
https://www.opengl.org/sdk/docs/man/html/glBufferData.xhtml
If you pass NULL to glBufferData(), it looks something like this:
int bufferSize = ...;
glBufferData(GL_ARRAY_BUFFER, bufferSize, NULL, GL_DYNAMIC_DRAW);
void *ptr = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
...
Ignore most of that function call, the only two important parts are bufferSize and NULL. This tells OpenGL that the buffer has size bufferSize and the contents are uninitialized / undefined. In practice, this means that OpenGL is free to continue using any previous data in the buffer as long as it needs to. For example, a previous draw call using the buffer may not have finished yet, and using glBufferData() allows you to get a new piece of memory for the buffer instead of waiting for the implementation to finish using the old piece of memory.
This is an old technique and it works fairly well. There are a couple other common techniques. One such technique is to double buffer, and switch between two VBOs every frame. A more sophisticated technique is to use a persistent buffer mapping, but this requires you to manage memory fences yourself in order for it to work correctly.
Note that if you are uploading data with glBufferData() anyway, then calling glBufferData() beforehand with NULL doesn't actually accomplish anything.

glMapBufferRange error "Buffer must be bound and not mapped" while trying to implement VAOs

So the trouble is that when I try to write to the buffer:
void* dst = glMapBufferRange(target, offset, size, GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
it works fine for the first call, but not the second, and I'm not sure exactly why that's the case.
Out of the possible causes:
GL_INVALID_OPERATION is generated by glMapBufferRange if zero is bound to target.
GL_INVALID_VALUE is generated if offset or length is negative, if offset+length is greater than the value of GL_BUFFER_SIZE for the buffer object, or if access has any bits set other than those defined above.
GL_INVALID_OPERATION is generated for any of the following conditions:
- length is zero.
- The buffer object is already in a mapped state.
- Neither GL_MAP_READ_BIT nor GL_MAP_WRITE_BIT is set.
- GL_MAP_READ_BIT is set and any of GL_MAP_INVALIDATE_RANGE_BIT, GL_MAP_INVALIDATE_BUFFER_BIT or GL_MAP_UNSYNCHRONIZED_BIT is set.
- GL_MAP_FLUSH_EXPLICIT_BIT is set and GL_MAP_WRITE_BIT is not set.
- Any of GL_MAP_READ_BIT, GL_MAP_WRITE_BIT, GL_MAP_PERSISTENT_BIT, or GL_MAP_COHERENT_BIT are set, but the same bit is not included in the buffer's storage flags.
...the only one that I'm not sure how to verify/fix would be "The buffer object is already in a mapped state". Would that possibly be related to the error I'm seeing?
In case it was, I read about GL_MAP_PERSISTENT_BIT, but we're targeting 4.2 and glBufferStorage requires 4.4, so I'm not sure how else I could remedy the situation in that case.
Unfortunately I can't post all the code involved without modifying a ton of stuff, but the basic setup I have is to create a VAO for each newly-created GL_ARRAY_BUFFER, and whenever the buffer would have been bound (code without VAOs), I instead bind the VAO.
The VAO that's bound is for a buffer that we've dubbed "rolling VBO", as it's used as a sort-of catch all for all of our text and other misc. rendering. Perhaps it's because it's used for a lot of dissimilar things...but I'm not sure how to pinpoint what it would be exactly.
If any code piece or other information would help please let me know, I'm not sure what exactly would be needed for this issue. Thanks a lot for any help!

glMapBufferRange() downloads full buffer?

I noticed a 15 ms slowdown when calling some OpenGL functions. After some tests I believe I have narrowed down the issue. I have a buffer of a couple of megabytes (containing mostly particles), and sometimes I need to add particles. To do so I bind the buffer, read the current number of particles to know the offset at which to write, then write the particles. As expected, the slowdown is in the reading part. (For this problem, assume that keeping track of the number of particles on the CPU side is impossible.)
glBindBuffer(GL_ARRAY_BUFFER, m_buffer);
GLvoid* rangePtr = glMapBufferRange( // this call takes 15 ms to return
    GL_ARRAY_BUFFER,
    m_offsetToCounter,
    sizeof(GLuint),
    GL_MAP_READ_BIT); // the literal 1 in the original is GL_MAP_READ_BIT
if (rangePtr != NULL)
    value = *(GLuint*) rangePtr;
glUnmapBuffer(GL_ARRAY_BUFFER); // unmap before unbinding
m_functions->glBindBuffer(GL_ARRAY_BUFFER, 0);
I assumed that by requesting a really limited size (here a single GLuint), only a GLuint would be downloaded. However, by drastically reducing the size of my buffer to 200 KB, the execution time of the function drops to 8 ms.
Two questions:
Do glMapBufferRange and glGetBufferSubData download the full buffer even though the user only asks for a portion of it?
The math doesn't add up; any idea why? It still takes 8 ms to download a really small buffer. The execution time looks like y = ax + b, where b is 7-8 ms. While trying to find the source of the problem, before suspecting the buffer size, I also found that glUniform* functions took ~10 ms as well, but only on the first call: if there are multiple glUniform* calls one after the other, only the first one takes a long time; the others are instantaneous. And when the buffer is later accessed for reading, there is no download time either. Is glUniform* triggering something?
I'm using the Qt 5 API. I would like to be sure that I'm using OpenGL properly before suspecting that Qt's layer causes the slowdown and re-implementing the whole program with GLU/GLUT.
8 ms sounds like an awful lot of time... How do you measure that time?
glMapBufferRange as well as glGetBufferSubData do download the full buffer even though the user only ask for a portion of it?
The OpenGL specification does not define how buffer mapping is to be implemented. It may be a full download of the buffer's contents. It may be a single-page I/O memory mapping. It may be anything that makes the contents of the buffer object appear in the host process's address space.
The math doesn't add up, any idea why?
For one thing, the smallest unit of a memory map is the system's page size. Whether it's done by a full object copy, by an I/O memory mapping, or by something else entirely, you're always dealing with memory chunks at least a few KiB in size.
I'm using the Qt 5 API
Could it be that you're using the Qt5 OpenGL functions class? AFAIK this class loads function pointers on demand, so the first invocation of a function may trigger a chain of actions that takes a few moments to complete.