OpenGL Compute shader lockup after glMapBuffer

I am doing a simple cloth simulation based on some existing code, working against the OpenGL 4.3 core profile. The problem I am facing is that I am trying to incorporate a simple compute shader which takes in a buffer and just adds some value to it.
Once it's done, I map the buffer and then unmap it. After the first 3 frames, glDispatchCompute locks up. However, if I comment out the map & unmap, it runs fine. I tried checking error codes, but glGetError returns 0 (GL_NO_ERROR) every frame. Any ideas on what could be going wrong?
glUseProgram(computeShader);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, cloth1.vertex_vbo_storage); // Buffer Binding 1
glBufferData(GL_SHADER_STORAGE_BUFFER, cloth1.particles.size() * sizeof(Particle), &(cloth1.particles[0]), GL_DYNAMIC_COPY);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, cloth1.vertex_vbo_storage);
glDispatchCompute(6, 6, 1);
glBindBuffer(GL_ARRAY_BUFFER, cloth1.vertex_vbo_storage);
Particle * ptr = reinterpret_cast<Particle *>(glMapBufferRange(GL_ARRAY_BUFFER, 0, cloth1.particles.size() * sizeof(Particle), GL_MAP_READ_BIT));
{
    GLenum err = glGetError();
    if (err > 0)
    {
        // Note: glGetString does not accept error codes (only GL_VENDOR,
        // GL_RENDERER, GL_VERSION, ...), so this lookup will not produce
        // a readable error name.
        std::string name = std::string((char*)(glGetString(err)));
    }
}
// memcpy(&cloth1.particles[0], ptr, cloth1.particles.size() * sizeof(Particle));
glUnmapBuffer(GL_ARRAY_BUFFER);
glBindBuffer(GL_ARRAY_BUFFER, 0);

I figured it out. I was missing an unbind of the GL_SHADER_STORAGE_BUFFER binding point between the dispatch and the subsequent map:
glDispatchCompute(6, 6, 1);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, 0); // <-- the missing unbind
glBindBuffer(GL_ARRAY_BUFFER, cloth1.vertex_vbo_storage);
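A note for completeness: shader storage writes are incoherent, so the spec also calls for a glMemoryBarrier between the dispatch and a mapped read. A minimal sketch of the whole dispatch-then-readback sequence, reusing the names from the question:
glUseProgram(computeShader);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, cloth1.vertex_vbo_storage);
glDispatchCompute(6, 6, 1);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, 0);
// Make the compute shader's SSBO writes visible to buffer mapping/readback.
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);
glBindBuffer(GL_ARRAY_BUFFER, cloth1.vertex_vbo_storage);
Particle * ptr = reinterpret_cast<Particle *>(glMapBufferRange(GL_ARRAY_BUFFER, 0, cloth1.particles.size() * sizeof(Particle), GL_MAP_READ_BIT));
// ... read back through ptr ...
glUnmapBuffer(GL_ARRAY_BUFFER);
glBindBuffer(GL_ARRAY_BUFFER, 0);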

Related

Why is OpenGL immediate mode faster than core?

I am using the following library to render text in OpenGL: fontstash. I have another header file which adds support for OpenGL 3.0+. The question is: why is the core-profile render implementation so much slower than the immediate-mode one?
Here is the render code with immediate mode:
static void glfons__renderDraw(void* userPtr, const float* verts, const float* tcoords, const unsigned int* colors, int nverts)
{
    GLFONScontext* gl = (GLFONScontext*)userPtr;
    if (gl->tex == 0) return;
    glBindTexture(GL_TEXTURE_2D, gl->tex);
    glEnable(GL_TEXTURE_2D);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);
    glVertexPointer(2, GL_FLOAT, sizeof(float)*2, verts);
    glTexCoordPointer(2, GL_FLOAT, sizeof(float)*2, tcoords);
    glColorPointer(4, GL_UNSIGNED_BYTE, sizeof(unsigned int), colors);
    glDrawArrays(GL_TRIANGLES, 0, nverts);
    glDisable(GL_TEXTURE_2D);
    glDisableClientState(GL_VERTEX_ARRAY);
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_COLOR_ARRAY);
}
Here is the core profile render code:
static void gl3fons__renderDraw(void* userPtr, const float* verts, const float* tcoords, const unsigned int* colors, int nverts)
{
    GLFONScontext* gl = (GLFONScontext*)userPtr;
    if (gl->tex == 0) return;
    if (gl->shader == 0) return;
    if (gl->vao == 0) return;
    if (gl->vbo == 0) return;
    // init shader
    glUseProgram(gl->shader);
    // init texture
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, gl->tex);
    glUniform1i(gl->texture_uniform, 0);
    // init our projection matrix
    glUniformMatrix4fv(gl->projMat_uniform, 1, false, gl->projMat);
    // bind our vao
    glBindVertexArray(gl->vao);
    // setup our buffer
    glBindBuffer(GL_ARRAY_BUFFER, gl->vbo);
    glBufferData(GL_ARRAY_BUFFER, (2 * sizeof(float) * 2 * nverts) + (sizeof(int) * nverts), NULL, GL_DYNAMIC_DRAW);
    glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(float) * 2 * nverts, verts);
    glBufferSubData(GL_ARRAY_BUFFER, sizeof(float) * 2 * nverts, sizeof(float) * 2 * nverts, tcoords);
    glBufferSubData(GL_ARRAY_BUFFER, 2 * sizeof(float) * 2 * nverts, sizeof(int) * nverts, colors);
    // setup our attributes
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, 0);
    glEnableVertexAttribArray(1);
    glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, sizeof(float) * 2, (void *) (sizeof(float) * 2 * nverts));
    glEnableVertexAttribArray(2);
    glVertexAttribPointer(2, 4, GL_UNSIGNED_BYTE, GL_TRUE, sizeof(int), (void *) (2 * sizeof(float) * 2 * nverts));
    glDrawArrays(GL_TRIANGLES, 0, nverts);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glBindVertexArray(0);
    glUseProgram(0);
}
I made a small test for each implementation and the results show that immediate mode is significantly faster than core.
Both tests fill the screen with AAA... and I log the time it took to do this for each frame. This is the loop:
// Create GL stash for 512x512 texture, our coordinate system has zero at top-left.
struct FONScontext* fs = glfonsCreate(512, 512, FONS_ZERO_TOPLEFT);
// Add font to stash.
int fontNormal = fonsAddFont(fs, "sans", "fontstash/example/DroidSerif-Regular.ttf");
// Render some text
float dx = 10, dy = 10;
unsigned int white = glfonsRGBA(255,255,255,255);
std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now();
fonsSetFont(fs, fontNormal);
fonsSetSize(fs, 20.0f);
fonsSetColor(fs, white);
for (int i = 0; i < 90; i++) {
    for (int j = 0; j < 190; j++) {
        dx += 10;
        fonsDrawText(fs, dx, dy, "A", NULL);
    }
    dy += 10;
    dx = 10;
}
std::chrono::high_resolution_clock::time_point t2 = std::chrono::high_resolution_clock::now();
std::chrono::duration<double, std::milli> time_span = t2 - t1;
std::cout << "Time to render: " << time_span.count() << "ms" << std::endl;
And the results show more than 400ms difference between the two:
[Screenshot: Core profile (left) vs Immediate mode (right)]
What should be changed in order to speed up performance?
I don't know exactly what gl is in this program, but it's pretty clear that, every time you want to render a piece of text, you perform the following operations:
Allocate storage for a buffer, reallocating whatever storage had been created from the last time this was called.
Perform three separate uploads of data to that buffer.
These are not good ways to stream vertex data to the GPU. There are specific techniques for doing this well, but this is not one of them. In particular, the fact that you are constantly reallocating the same buffer is going to kill performance.
The most effective way to deal with this is to have a single buffer with a fixed amount of storage. It gets allocated exactly once and never again. Ideally, whatever API you're getting vertex data from would provide it in an interleaved format, so that you would only need to perform one upload rather than three. But apparently Fontstash is not so generous.
In any case, the main idea is to avoid reallocation and synchronization. The latter means never trying to write over data that has been written to recently. So your buffer needs to be sufficiently large to hold twice the number of font vertices you ever expect to render. Essentially, you double-buffer the vertex data: writing to one set of data while the other set is being read from.
So at the beginning of the frame, you figure out what the byte offset to where you want to render will be. This will either be the start of the buffer or half-way through it. Then, for each blob of text, you write vertex data to this offset and increment the offset accordingly.
And to avoid having to change VAO state, you should interleave the vertex data manually. Instead of uploading three arrays, you should interleave the vertices so that you're effectively making one gigantic array of vertices. So you never need to call glVertexAttribPointer in the middle of this function; you just use the parameters to glDraw* to draw the part of the array you want.
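A rough sketch of that interleaving plus the offset bookkeeping described above (FontVertex, MAX_VERTS, writeOffsetBytes, and frameIndex are illustrative names, not part of fontstash's API):
struct FontVertex {
    float pos[2];
    float uv[2];
    unsigned int rgba; // packed color
};

// Allocated once at init: room for two frames' worth of vertices.
glBufferData(GL_ARRAY_BUFFER, 2 * MAX_VERTS * sizeof(FontVertex), NULL, GL_DYNAMIC_DRAW);

// At the start of each frame, flip between the two halves.
size_t writeOffsetBytes = (frameIndex & 1) ? MAX_VERTS * sizeof(FontVertex) : 0;

// Per text blob: interleave on the CPU, then do one upload and advance the offset.
std::vector<FontVertex> staging(nverts);
for (int i = 0; i < nverts; ++i) {
    staging[i].pos[0] = verts[i*2+0];
    staging[i].pos[1] = verts[i*2+1];
    staging[i].uv[0]  = tcoords[i*2+0];
    staging[i].uv[1]  = tcoords[i*2+1];
    staging[i].rgba   = colors[i];
}
glBufferSubData(GL_ARRAY_BUFFER, writeOffsetBytes, nverts * sizeof(FontVertex), staging.data());
writeOffsetBytes += nverts * sizeof(FontVertex);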
This also means you only need one glBufferSubData call. But if you have access to persistent mapped buffers, you don't even need that, since you can just write to the memory directly while using the other portion of it. Though if you use persistent mapping, you will need to use a fence sync object when you switch buffer regions to make sure that you're not writing to vertex data that is still being read by the GPU.
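If persistent mapping is available (GL 4.4 or ARB_buffer_storage), a minimal sketch of that setup and the per-half fencing might look like the following; bufferSize and half are assumed to come from the double-buffering scheme above:
// Allocate immutable storage once and keep it mapped for the app's lifetime.
GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
glBufferStorage(GL_ARRAY_BUFFER, bufferSize, NULL, flags);
char* mapped = (char*) glMapBufferRange(GL_ARRAY_BUFFER, 0, bufferSize, flags);

GLsync fences[2] = { 0, 0 };

// After issuing the draws that read from half `half`:
fences[half] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

// Before writing into half `half` again:
if (fences[half]) {
    while (glClientWaitSync(fences[half], GL_SYNC_FLUSH_COMMANDS_BIT, 1000000) == GL_TIMEOUT_EXPIRED) { }
    glDeleteSync(fences[half]);
    fences[half] = 0;
}
// Now it is safe to memcpy new vertex data into mapped + writeOffsetBytes.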

How come no cube is drawn on my screen with this code in a GLFW window?

I have a bunch of code (copied from various tutorials) that is supposed to draw a random color-changing cube that the camera shifts around every second or so (driven by a variable, not timers yet). It worked in the past, before I reorganized the code from one big main function into distinct classes, but now I can't see anything on the main window other than a blank background. I cannot pinpoint any particular issue: I am getting no errors or exceptions, and my own code checks out; when I debugged, every variable had a value I expected, and the shaders I used (in string form) worked before the reorganization. I can print out the vertices of the cube in the same scope as the glDrawArrays() call, and they have the correct values too. Basically, I have no idea what's wrong with my code that is causing nothing to be drawn.
My best guess is that I called - or forgot to call - some OpenGL function improperly with the wrong data in one of the three methods of my Model class. In my program, I create a Model object (after GLFW and glad are initialized, which then calls the Model constructor), update it every once in a while (time doesn't matter) through the update() function, then draw it to my screen on every pass of my main loop through the draw() function.
Possible locations of code faults:
Model::Model(std::vector<GLfloat> vertexBufferData, std::vector<GLfloat> colorBufferData) {
    mVertexBufferData = vertexBufferData;
    mColorBufferData = colorBufferData;
    // Generate 1 buffer, put the resulting identifier in vertexbuffer
    glGenBuffers(1, &VBO);
    // The following commands will talk about our 'vertexbuffer' buffer
    glBindBuffer(GL_ARRAY_BUFFER, VBO);
    // Give our vertices to OpenGL.
    glBufferData(GL_ARRAY_BUFFER, sizeof(mVertexBufferData), &mVertexBufferData.front(), GL_STATIC_DRAW);
    glGenBuffers(1, &CBO);
    glBindBuffer(GL_ARRAY_BUFFER, CBO);
    glBufferData(GL_ARRAY_BUFFER, sizeof(mColorBufferData), &mColorBufferData.front(), GL_STATIC_DRAW);
    // Create and compile our GLSL program from the shaders
    programID = loadShaders(zachos::DATA_DEF);
    glUseProgram(programID);
}
void Model::update() {
    for (int v = 0; v < 12 * 3; v++) {
        mColorBufferData[3 * v + 0] = (float)std::rand() / RAND_MAX;
        mColorBufferData[3 * v + 1] = (float)std::rand() / RAND_MAX;
        mColorBufferData[3 * v + 2] = (float)std::rand() / RAND_MAX;
    }
    glBufferData(GL_ARRAY_BUFFER, sizeof(mColorBufferData), &mColorBufferData.front(), GL_STATIC_DRAW);
}
void Model::draw() {
    // Setup some 3D stuff
    glm::mat4 mvp = Mainframe::projection * Mainframe::view * model;
    GLuint MatrixID = glGetUniformLocation(programID, "MVP");
    glUniformMatrix4fv(MatrixID, 1, GL_FALSE, &mvp[0][0]);
    glEnableVertexAttribArray(0);
    glBindBuffer(GL_ARRAY_BUFFER, VBO);
    glVertexAttribPointer(
        0,        // attribute 0. No particular reason for 0, but must match the layout in the shader.
        3,        // size
        GL_FLOAT, // type
        GL_FALSE, // normalized?
        0,        // stride
        (void*)0  // array buffer offset
    );
    glEnableVertexAttribArray(1);
    glBindBuffer(GL_ARRAY_BUFFER, CBO);
    glVertexAttribPointer(
        1,        // attribute. No particular reason for 1, but must match the layout in the shader.
        3,        // size
        GL_FLOAT, // type
        GL_FALSE, // normalized?
        0,        // stride
        (void*)0  // array buffer offset
    );
    // Draw the array
    glDrawArrays(GL_TRIANGLES, 0, mVertexBufferData.size() / 3);
    glDisableVertexAttribArray(0);
    glDisableVertexAttribArray(1);
};
My question is simple, how come my program won't draw a cube on my screen? Is the issue within these three functions or elsewhere? I can provide more general information about the drawing process if needed, though I believe the code I provided is enough, since I literally just call model.draw().
sizeof(std::vector) is usually just 24 bytes (the struct typically holds three pointers). So both of your buffers have 6 floats loaded in them, which is not enough vertices for a single triangle, let alone a cube!
You should instead use size() on the vector when loading the data into the vertex buffers:
glBufferData(GL_ARRAY_BUFFER,
             mVertexBufferData.size() * sizeof(GLfloat), ///< this!
             mVertexBufferData.data(),                   ///< prefer calling data() here!
             GL_STATIC_DRAW);
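Note that Model::update() has the same sizeof problem, and it also uploads without binding CBO first, so it clobbers whichever buffer happens to be bound at the time. A corrected sketch:
void Model::update() {
    for (int v = 0; v < 12 * 3; v++) {
        mColorBufferData[3 * v + 0] = (float)std::rand() / RAND_MAX;
        mColorBufferData[3 * v + 1] = (float)std::rand() / RAND_MAX;
        mColorBufferData[3 * v + 2] = (float)std::rand() / RAND_MAX;
    }
    glBindBuffer(GL_ARRAY_BUFFER, CBO); // bind the color buffer explicitly
    glBufferData(GL_ARRAY_BUFFER,
                 mColorBufferData.size() * sizeof(GLfloat),
                 mColorBufferData.data(),
                 GL_STATIC_DRAW);
}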

GL Error: Out of Memory when trying to render with VBO

I've been trying to use Vertex Buffer Objects to save vertex data on the GPU and reduce the overhead, but I cannot get it to work. The code is below.
From what I understand, you generate the buffer with glGenBuffers, bind it with glBindBuffer so it's ready to be used, then write data to it with glBufferData; after that it can be unbound and used again later by simply rebinding it.
However, the last part is what I'm having trouble with: when I bind the buffer after creating and loading data into it and try to draw with it, I get lots of GL Error: Out of Memory.
I doubt that I am running out of memory for my simple mesh, so I must be doing something very wrong.
Thanks.
EDIT 1: I call glGetError after every frame, but since this is the only OpenGL I do in the entire program, it shouldn't be a problem.
//when loading the mesh we create the VBO
void createBuffer()
{
    GLuint buf;
    glGenBuffers(1, &buf);
    glBindBuffer(GL_ARRAY_BUFFER, buf);
    glBufferData(GL_ARRAY_BUFFER, vertNormalBuffer->size() * sizeof(GLfloat), (GLvoid*) bufferData, GL_STATIC_DRAW);
    //EDIT 1: forgot to show how I handle the buffer
    model->vertexNormalBuffer = &buf;
    //Unbinds it
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}
void Fighter::doRedraw(GLuint shaderProgram)
{
    glm::mat4 transformationMatrix = getTransform();
    GLuint loc = glGetUniformLocation(shaderProgram, "modelviewMatrix");
    glUniformMatrix4fv(loc, 1, GL_FALSE, (GLfloat*) &transformationMatrix);
    glBindBuffer(GL_ARRAY_BUFFER, *model->vertexNormalBuffer);
    //If I uncomment the line below, all works wonderfully, but isn't the purpose of a VBO to avoid uploading the same data again and again?
    //glBufferData(GL_ARRAY_BUFFER, model->vertAndNormalArraySize * sizeof(GLfloat), model->vertAndNormalArray, GL_STATIC_DRAW);
    glEnableVertexAttribArray(0);
    glEnableVertexAttribArray(2);
    renderChild(model, model);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}
void Fighter::renderChild(ModelObject* model, ModelObject* parent)
{
    //Recursively render the mesh children
    for (int i = 0; i < model->nChildren; i++)
    {
        renderChild( dynamic_cast<ModelObject*>(model->children[i]), parent);
    }
    //Vertex and normal data are interleaved
    glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 8*sizeof(GLfloat), (void*)(model->vertexDataPosition*sizeof(GLfloat)));
    glVertexAttribPointer(2, 4, GL_FLOAT, GL_FALSE, 8*sizeof(GLfloat), (void*)((model->vertexDataPosition + 4)*sizeof(GLfloat)));
    //Draws using two sets of indices
    glDrawElements(GL_QUADS, model->nQuads * 4, GL_UNSIGNED_INT, (void*) model->quadsIndices);
    glDrawElements(GL_TRIANGLES, model->nTriangles * 3, GL_UNSIGNED_INT, (void*) model->trisIndices);
}
This is your problem:
model->vertexNormalBuffer = &buf;
/* ... */
glBindBuffer(GL_ARRAY_BUFFER, *model->vertexNormalBuffer);
You're storing the address of your local buf variable rather than its contents; buf goes out of scope when createBuffer returns and its stack slot is most likely overwritten with other data, so when you later render, you're binding a garbage buffer name. Just store the contents of buf in your vertexNormalBuffer field instead.
I'll admit I don't know why OpenGL thinks it proper to say that it's "out of memory" just because of that, but perhaps you're just invoking undefined behavior. It does explain, however, why it starts working when you re-fill the buffer with data after you rebind it, because you then implicitly initialize the buffer that you just bound.
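A sketch of the fix, assuming vertexNormalBuffer is changed from a pointer into a plain GLuint:
void createBuffer()
{
    GLuint buf;
    glGenBuffers(1, &buf);
    glBindBuffer(GL_ARRAY_BUFFER, buf);
    glBufferData(GL_ARRAY_BUFFER, vertNormalBuffer->size() * sizeof(GLfloat), (GLvoid*) bufferData, GL_STATIC_DRAW);
    model->vertexNormalBuffer = buf; // store the handle by value
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}
// ...and in doRedraw, bind without the dereference:
glBindBuffer(GL_ARRAY_BUFFER, model->vertexNormalBuffer);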

OpenGL Flicker with glBufferSubData [duplicate]

It seems like glBufferSubData is overwriting or somehow mangling data between my glDrawArrays calls. I'm working on Windows 7 64-bit, with the latest drivers for my Nvidia GeForce GT520M CUDA 1GB.
I have 2 models, each with an animation. The models have 1 mesh, and that mesh is stored in the same VAO. They also have 1 animation each, and the bone transformations used for rendering the mesh are stored in the same VBO.
My workflow looks like this:
calculate bone transformation matrices for a model
load bone transformation matrices into OpenGL using glBufferSubData, then bind the buffer
render the model's mesh using glDrawArrays
For one model, this works (at least, mostly - sometimes I get weird gaps in between the vertices).
However, for more than one model, it looks like bone transformation matrix data is getting mixed up between the rendering calls to the meshes.
[Screenshots: Single Model Animated (Windows) / Two Models Animated (Windows)]
I load my bone transformation data like so:
void Animation::bind()
{
    glBindBuffer(GL_UNIFORM_BUFFER, bufferId_);
    glBufferSubData(GL_UNIFORM_BUFFER, 0, currentTransforms_.size() * sizeof(glm::mat4), &currentTransforms_[0]);
    bindPoint_ = openGlDevice_->bindBuffer( bufferId_ );
}
And I render my mesh like so:
void Mesh::render()
{
    glBindVertexArray(vaoId_);
    glDrawArrays(GL_TRIANGLES, 0, vertices_.size());
    glBindVertexArray(0);
}
If I add a call to glFinish() after my call to render(), it works just fine! This seems to indicate to me that, for some reason, the transformation matrix data for one animation is 'bleeding' over to the next animation.
How could this happen? I was under the impression that calling glBufferSubData on a buffer that is still in use (e.g. by a pending glDrawArrays) would block. Is this not the case?
It might be worth mentioning that this same code works just fine in Linux.
Note: Related to a previous post, which I deleted.
Mesh Loading Code:
void Mesh::load()
{
    LOG_DEBUG( "loading mesh '" + name_ +"' into video memory." );
    // create our vao
    glGenVertexArrays(1, &vaoId_);
    glBindVertexArray(vaoId_);
    // create our vbos
    glGenBuffers(5, &vboIds_[0]);
    glBindBuffer(GL_ARRAY_BUFFER, vboIds_[0]);
    glBufferData(GL_ARRAY_BUFFER, vertices_.size() * sizeof(glm::vec3), &vertices_[0], GL_STATIC_DRAW);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, 0);
    glBindBuffer(GL_ARRAY_BUFFER, vboIds_[1]);
    glBufferData(GL_ARRAY_BUFFER, textureCoordinates_.size() * sizeof(glm::vec2), &textureCoordinates_[0], GL_STATIC_DRAW);
    glEnableVertexAttribArray(1);
    glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 0, 0);
    glBindBuffer(GL_ARRAY_BUFFER, vboIds_[2]);
    glBufferData(GL_ARRAY_BUFFER, normals_.size() * sizeof(glm::vec3), &normals_[0], GL_STATIC_DRAW);
    glEnableVertexAttribArray(2);
    glVertexAttribPointer(2, 3, GL_FLOAT, GL_FALSE, 0, 0);
    glBindBuffer(GL_ARRAY_BUFFER, vboIds_[3]);
    glBufferData(GL_ARRAY_BUFFER, colors_.size() * sizeof(glm::vec4), &colors_[0], GL_STATIC_DRAW);
    glEnableVertexAttribArray(3);
    glVertexAttribPointer(3, 4, GL_FLOAT, GL_FALSE, 0, 0);
    if (bones_.size() == 0)
    {
        bones_.resize( vertices_.size() );
        for (auto& b : bones_)
        {
            b.weights = glm::vec4(0.25f);
        }
    }
    glBindBuffer(GL_ARRAY_BUFFER, vboIds_[4]);
    glBufferData(GL_ARRAY_BUFFER, bones_.size() * sizeof(VertexBoneData), &bones_[0], GL_STATIC_DRAW);
    glEnableVertexAttribArray(4);
    glVertexAttribIPointer(4, 4, GL_INT, sizeof(VertexBoneData), (const GLvoid*)0);
    glEnableVertexAttribArray(5);
    glVertexAttribPointer(5, 4, GL_FLOAT, GL_FALSE, sizeof(VertexBoneData), (const GLvoid*)(sizeof(glm::ivec4)));
    glBindVertexArray(0);
}
Animation UBO Setup:
void Animation::setupAnimationUbo()
{
    bufferId_ = openGlDevice_->createBufferObject(GL_UNIFORM_BUFFER, Constants::MAX_NUMBER_OF_BONES_PER_MESH * sizeof(glm::mat4), &currentTransforms_[0]);
}
where Constants::MAX_NUMBER_OF_BONES_PER_MESH is set to 100.
In OpenGlDevice:
GLuint OpenGlDevice::createBufferObject(GLenum target, glmd::uint32 totalSize, const void* dataPointer)
{
    GLuint bufferId = 0;
    glGenBuffers(1, &bufferId);
    glBindBuffer(target, bufferId);
    glBufferData(target, totalSize, dataPointer, GL_DYNAMIC_DRAW);
    glBindBuffer(target, 0);
    bufferIds_.push_back(bufferId);
    return bufferId;
}
Those usage flags are mostly correct for this scenario, though you might consider trying GL_STREAM_DRAW.
Your driver appears to be failing to implicitly synchronize for some reason, so you might want to try a technique that eliminates the need for synchronization in the first place. I would suggest buffer orphaning: call glBufferData (...) with NULL for the data pointer prior to sending data. This allows commands that are currently using the UBO to continue using the original data store without forcing synchronization, since you allocate a new data store before sending new data. When those commands finish, the original data store will be orphaned and the GL implementation will free it.
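As a sketch, orphaning applied to the Animation::bind() from the question (same names, with GL_STREAM_DRAW per the note above):
void Animation::bind()
{
    glBindBuffer(GL_UNIFORM_BUFFER, bufferId_);
    // Orphan the old data store: draws still reading it keep their copy,
    // and we get a fresh allocation to write into, so no implicit sync.
    glBufferData(GL_UNIFORM_BUFFER, Constants::MAX_NUMBER_OF_BONES_PER_MESH * sizeof(glm::mat4), NULL, GL_STREAM_DRAW);
    glBufferSubData(GL_UNIFORM_BUFFER, 0, currentTransforms_.size() * sizeof(glm::mat4), &currentTransforms_[0]);
    bindPoint_ = openGlDevice_->bindBuffer( bufferId_ );
}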
In newer OpenGL implementations you can use glInvalidateBuffer[Sub]Data (...) to hint the driver into doing what was discussed above. Likewise, you can use glMapBufferRange (...) with appropriate flags to control all of this behavior more explicitly. Unmapping will implicitly flush and synchronize access to a buffer object unless told otherwise; this might get your driver to do its job if you do not want to mess around with synchronization-free buffer update logic.
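And a sketch of the explicit glMapBufferRange route, where GL_MAP_INVALIDATE_BUFFER_BIT requests the same orphaning behavior:
glBindBuffer(GL_UNIFORM_BUFFER, bufferId_);
void* p = glMapBufferRange(GL_UNIFORM_BUFFER, 0,
                           currentTransforms_.size() * sizeof(glm::mat4),
                           GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
memcpy(p, &currentTransforms_[0], currentTransforms_.size() * sizeof(glm::mat4));
glUnmapBuffer(GL_UNIFORM_BUFFER);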
Most of what I mentioned is discussed in more detail here.
