What the aim is:
I'm relatively new to threading. I've been trying to make quad-tree rendered terrain that renders fast and efficiently. The amount of terrain currently rendered would cause major lag if it were all at maximum detail, which is why I use a QuadTree to render it. The engine also supports input and physics, so I decided to use a separate rendering thread. This has caused lots of problems.
The problem(s):
When I wasn't threading there was a bit of lag due to the other systems in the engine. The main culprit was the loading and deletion of terrain in the QuadTree (I'm not even sure this is the optimal way to do it). Now rendering happens very fast and doesn't seem to lag. When the camera is standing still the game runs fine; I left it running for an hour and no crashes occurred.
When terrain is loaded it uses several of the variables that the rendering code also uses, namely when binding the buffers:
glBindBuffer(GL_ARRAY_BUFFER, vertexbuffer);
glBufferData(GL_ARRAY_BUFFER, vertices.size() * sizeof(glm::vec3), &vertices[0], GL_STATIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, normalbuffer);
glBufferData(GL_ARRAY_BUFFER, normals.size() * sizeof(glm::vec3), &normals[0], GL_STATIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, uvbuffer);
glBufferData(GL_ARRAY_BUFFER, uvs.size() * sizeof(glm::vec2), &uvs[0], GL_STATIC_DRAW);
I believe these buffer handles are being accessed by both threads at the same time, which causes a crash. How does one fix this? I have tried using mutexes but that doesn't seem to work. Where would I lock and unlock the mutex to fix this?
Another variable that seems to cause the same error is "IsLeaf".
Another crash (std::bad_alloc) happens after loading a lot of terrain, even though it's being cleaned up. I assume this is due to my deletion code, but I don't know what's wrong.
The way I currently add and delete tiles is to check the range from the camera and delete/create the tile accordingly. I want to render the tile I'm on and the ones around it. However, this doesn't work when transitioning from one of the 4 main tiles: creating by range fails because the range is measured to the center of the big tile rather than to the smaller ones. I've also tried deleting the whole map every few seconds, which works too, but with more lag. Is there a better way to handle creation and destruction?
Between different resolutions there are gaps. Is there any way to reduce these? Currently I render the tiles a little larger than they need to be, but this doesn't help on major resolution changes.
If you have any idea how to fix one of these errors it'd be much appreciated.
The code (too much to upload here):
http://pastebin.com/MwXaymG0
http://pastebin.com/2tRbqtEB
An OpenGL context can only be bound to one thread at a time (via wglMakeCurrent() on Windows).
Therefore you should not be using gl* functions across threads; even if you use mutexes to guard access to certain variables in memory, the GL calls themselves will fail.
What I would suggest is to move your gl* calls into your rendering thread, but keep things such as terrain loading, frustum calculations, clipping etc. in your other thread. The rendering thread just needs to check whether an object has new data and then perform the appropriate GL calls as part of its update/render.
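A minimal sketch of that hand-off, assuming a hypothetical TerrainTile type (the flag and mutex names are made up; the real engine will differ):

#include <GL/glew.h>
#include <glm/glm.hpp>
#include <mutex>
#include <vector>

struct TerrainTile {
    std::vector<glm::vec3> vertices; // filled by the worker thread
    GLuint vertexbuffer = 0;         // touched only by the render thread
    bool cpuDataReady = false;       // set by worker, consumed by renderer
    std::mutex lock;
};

// Worker thread: generate geometry, never call gl*.
void loadTile(TerrainTile& tile, std::vector<glm::vec3> generated) {
    std::lock_guard<std::mutex> guard(tile.lock);
    tile.vertices = std::move(generated);
    tile.cpuDataReady = true;
}

// Render thread: upload pending data, then draw as usual.
void updateAndRender(TerrainTile& tile) {
    std::lock_guard<std::mutex> guard(tile.lock);
    if (tile.cpuDataReady) {
        if (tile.vertexbuffer == 0)
            glGenBuffers(1, &tile.vertexbuffer);
        glBindBuffer(GL_ARRAY_BUFFER, tile.vertexbuffer);
        glBufferData(GL_ARRAY_BUFFER,
                     tile.vertices.size() * sizeof(glm::vec3),
                     tile.vertices.data(), GL_STATIC_DRAW);
        tile.cpuDataReady = false;
    }
    // ... bind the VAO and issue the draw call here ...
}

The key point is that the mutex only guards the CPU-side data; every gl* call stays on the thread that owns the context.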
Related
I have a working implementation of this technique for view frustum culling of instanced geometry. The gist of the technique is that we use a vertex shader to check if the bounds of an object lie within the view frustum, and if they do we output the position of that object, using a transform feedback buffer and a geometry shader, to a texture. We can then, during an actual rendering pass, use that texture, along with a query of how many positions we emitted, to acquire the relevant position data for the object we're rendering, and the number of instances to specify in our call to glDrawElementsInstanced. One difference between what I do and what the article does is that I emit a full transformation matrix, rather than a simple position vector, to the texture, but I doubt that has any bearing on my problem.
The actual problem: Currently I have this set up so that, for each object type being rendered (i.e. tree, box, rock, whatever), the actual rendering pass follows immediately upon the frustum cull rendering pass. This works, and gives the intended results. What I want to do instead, however, is to go over all my draw commands and do all the frustum culling for the various objects first, and only thereafter do all the actual rendering, to avoid a bunch of unnecessary state changes (i.e. switching back and forth between shader programs). When I do this, however, I encounter the problem that previously established textures -- the ones I use for reading positions from during the actual rendering passes -- all seem to be overwritten by the latest call to the frustum culling function, meaning that all established textures seemingly contain only the position information from the last frustum cull call.
For example: I render, in order, 4 trees, 10 boxes and 3 rocks, and what I will see instead is a tree, a box, and a rock, at all the (three) positions where I would expect only the 3 rocks to be. I cannot for the life of me figure out why this is, because I quite clearly bind new buffers and textures to the TRANSFORM_FEEDBACK_BUFFER every time I call the function. Why are the previously used textures still receiving the new data from the latest call?
Code, in C, for the frustum culling function:
void fcullidraw(drawcommand *tar) {
    /* printf("Fculling %s\n", tar->res->name); */
    mesh *rmesh = &tar->res->amod->meshes[0];
    /* Lazily create the feedback buffer, its texture view, and the query. */
    /* glDeleteTextures(1, &rmesh->ctex); */
    if (rmesh->ctbuf == 0)
        glGenBuffers(1, &rmesh->ctbuf);
    glBindBuffer(GL_TEXTURE_BUFFER, rmesh->ctbuf);
    glBufferData(GL_TEXTURE_BUFFER, sizeof(instancedata) * tar->nodraws, NULL, GL_DYNAMIC_COPY);
    if (rmesh->ctex == 0)
        glGenTextures(1, &rmesh->ctex);
    glBindTexture(GL_TEXTURE_BUFFER, rmesh->ctex);
    glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA32F, rmesh->ctbuf);
    if (rmesh->cquery == 0)
        glGenQueries(1, &rmesh->cquery);
    /* Cull with the "icull" shader; rasterization is discarded since we only want the feedback. */
    checkactiveshader(tar->tar, findshader("icull"));
    glEnable(GL_RASTERIZER_DISCARD);
    glUniform1f(activeshader->radius, tar->res->amesh->bbox.radius);
    glUniform3fv(activeshader->extent, 1, (const GLfloat*)&tar->res->amesh->bbox.ext);
    glUniform3fv(activeshader->cp, 1, (const GLfloat*)&tar->res->amesh->bbox.cp);
    /* Upload one point (a full transformation matrix) per potential instance. */
    glBindVertexArray(tar->res->amod->meshes[0].vao);
    glBindBuffer(GL_ARRAY_BUFFER, tar->res->amod->meshes[0].posarray);
    glBufferData(GL_ARRAY_BUFFER, sizeof(mat4_t) * tar->nodraws, tar->posarray, GL_DYNAMIC_DRAW);
    /* Capture the surviving instances into ctbuf, counting them with the query. */
    glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, rmesh->ctbuf);
    glBeginTransformFeedback(GL_POINTS);
    glBeginQuery(GL_PRIMITIVES_GENERATED, rmesh->cquery);
    glDrawArrays(GL_POINTS, 0, tar->nodraws);
    glEndQuery(GL_PRIMITIVES_GENERATED);
    glEndTransformFeedback();
    glDisable(GL_RASTERIZER_DISCARD);
    glGetQueryObjectuiv(rmesh->cquery, GL_QUERY_RESULT, &rmesh->visibleinstances);
}
tar and rmesh obviously vary between calls to this function. Do note that I have left in a few commented-out lines containing code to delete the buffers and textures between each rendering cycle, rather than simply overwriting them; using that code instead has no effect on the failure mode.
I'm stumped. I feel that the textures and buffers are well defined and clearly kept separate, so I do not understand how the textures from previous calls to fcullidraw are somehow still bound to, and being overwritten by, the transform feedback. That certainly seems to be what is happening, because the earlier objects will read in the entire transformation matrix of the rock quite neatly, with the "right" rotation, translation, and everything.
The article linked does do the operations in the order I want to do them -- i.e. first repeated frustum culls, and then repeated rendering -- and I'm not sure I see what I do differently. Might be some small and obvious thing, and I might be an idiot, but in that case I'd love to know why and how I am that.
EDIT: I pushed on and updated my implementation with a refinement of the original technique, suggested here, which gets rid of the writing-to-texture method altogether, in favor of simply writing to a buffer bound to the VAO, set to update once per rendered instance with glVertexAttribDivisor. This method looks a lot cleaner on the whole, and incidentally had the additional side effect of not exhibiting my original problem at all, as I'm no longer writing to and uploading textures. This is, thus, no longer a practical problem for me, but the answer to the theoretical question still eludes me, so if anyone has ideas I'm still all ears.
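For anyone curious, a minimal sketch of that refined setup (the attribute locations and buffer name are hypothetical); a mat4 occupies four consecutive vec4 attribute slots:

glBindVertexArray(vao);
glBindBuffer(GL_ARRAY_BUFFER, instanceMatrixBuffer); // one mat4 per instance
for (int i = 0; i < 4; ++i) {
    glEnableVertexAttribArray(4 + i); // locations 4..7 hold the matrix columns
    glVertexAttribPointer(4 + i, 4, GL_FLOAT, GL_FALSE, sizeof(float) * 16,
                          (void*)(sizeof(float) * 4 * i));
    glVertexAttribDivisor(4 + i, 1); // advance once per instance, not per vertex
}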
I'm trying to render huge point clouds (~150M points) but OpenGL only renders part (~52M) of them. When rendering smaller datasets (<40M) everything works fine. I'm using a single VBO. When using multiple VBOs, the points get rendered but rendering is awfully slow, which is expected. Each element is 44 bytes and the GPU has 3 GB of memory available. That should be enough for roughly 70M points, yet I can render as many as 100M points with multiple VBOs. Is there an OpenGL-specific limitation per VBO I'm not aware of?
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, cloud.size() * sizeof(Point), cloud.data(), GL_STATIC_DRAW);
// lot of other code
glDrawArrays(GL_POINTS, 0, cloud.size());
It looks like some part of your system uses 32-bit unsigned integers to store buffer sizes, so 148M × 44 bytes overflows: 148,000,000 × 44 = 6,512,000,000 bytes, which truncated to 32 bits becomes 6,512,000,000 − 4,294,967,296 = 2,217,032,704 bytes, i.e. about 50.4M points (about 54.9M if your megabytes are binary rather than decimal). I'd start by checking your OpenGL binding library to see that the prototypes it declares correctly use 64-bit types. If they do, then the bug must be in the OpenGL drivers.
To transfer more than 4 GB of data to the buffer you may try using the other available functions, glBufferSubData and glBufferStorage, or memory-map the buffer with glMapBufferRange, which might work around the 4 GB limitation.
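As a rough sketch of the chunked-upload idea ('cloud', 'vbo' and 'Point' are from the question; the chunk size is arbitrary), assuming the allocation itself succeeds:

#include <algorithm>

glBindBuffer(GL_ARRAY_BUFFER, vbo);
// Allocate the full storage up front, but pass no data yet.
glBufferData(GL_ARRAY_BUFFER, (GLsizeiptr)(cloud.size() * sizeof(Point)),
             nullptr, GL_STATIC_DRAW);
// Stream the points up in chunks so no single call's size parameter
// comes anywhere near the 32-bit limit.
const size_t chunk = 8 * 1024 * 1024; // 8M points per upload (arbitrary)
for (size_t first = 0; first < cloud.size(); first += chunk) {
    const size_t count = std::min(chunk, cloud.size() - first);
    glBufferSubData(GL_ARRAY_BUFFER,
                    (GLintptr)(first * sizeof(Point)),
                    (GLsizeiptr)(count * sizeof(Point)),
                    cloud.data() + first);
}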
Another thing to consider is to use one VAO but split the data between multiple buffers. Presumably your Point consists of different attributes, like position, color, etc. You can put each of them in a separate buffer and still use one VAO and one draw call. You can also optimize the types of the attributes you use (e.g. don't use floats where shorts or bytes would do) and the layout of the structure (check that there's no unnecessary padding between the fields).
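A sketch of that layout, with hypothetical attribute data and buffer names (note the single VAO and single draw call):

glBindVertexArray(vao);

glBindBuffer(GL_ARRAY_BUFFER, positionVbo);
glBufferData(GL_ARRAY_BUFFER, n * 3 * sizeof(float), positions, GL_STATIC_DRAW);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, nullptr);

glBindBuffer(GL_ARRAY_BUFFER, colorVbo); // 4 bytes per point instead of 16
glBufferData(GL_ARRAY_BUFFER, n * 4 * sizeof(GLubyte), colors, GL_STATIC_DRAW);
glEnableVertexAttribArray(1);
glVertexAttribPointer(1, 4, GL_UNSIGNED_BYTE, GL_TRUE, 0, nullptr); // normalized

glDrawArrays(GL_POINTS, 0, (GLsizei)n); // still one draw call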
I don't think memory is the problem; in fact, I'd say your program makes too many draw calls. You should try glDrawArraysInstanced(). For that you need to provide a new position for every instance in the vertex shader. Maybe that will solve your problem.
I'm sorry I can't provide more detail; my OpenGL skills have got a bit dull, but I'm planning on brushing up as soon as possible. I hope it helps.
I'm creating a tile-based renderer where each tile has a vertex model. However, from each vertex model only a small portion is rendered in one frame. These subsets change every frame.
What would be the fastest way to render this? I can think of the following options:
Make one draw call for every model. Every model is stored in full on the GPU, and the full VBO is bound for every draw call. Indices are then used to pick the appropriate small portion for the actual rendering.
Make one draw call with one VBO which gets assembled every frame by copying the necessary (small) subsets of all the other VBOs (the data is copied within VRAM).
Make one draw call with one VBO, but the VBO is recreated every frame with the (small) subset from CPU data using glBufferData.
Which do you think is fastest, or can you think of something faster?
One deciding factor is obviously if switching between larger VBOs is more expensive than switching between smaller VBOs.
It is a bad idea to issue a lot of draw calls. In OpenGL you will be CPU-bound with that approach, so it is better to batch many models together.
Actually, I would go for that method: all static geometry inside one and only one VBO and one VAO. That does not mean you only have "one draw call", however; you should use glMultiDraw*Indirect.
The underlying idea is that you perform the culling on the GPU with compute shaders, and use something like the GL_ARB_indirect_parameters extension with your multi-indirect draw call.
Indirect Drawing
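For illustration, a sketch of a multi-indirect draw; the command layout is fixed by the spec, while the buffer names and counts here are hypothetical:

// One command per mesh; instanceCount is typically written by the GPU
// culling pass (e.g. a compute shader incrementing it per visible instance).
struct DrawElementsIndirectCommand {
    GLuint count;         // index count for this sub-draw
    GLuint instanceCount; // how many instances survived culling
    GLuint firstIndex;
    GLint  baseVertex;
    GLuint baseInstance;
};

glBindVertexArray(sceneVao);
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuffer);
glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT,
                            nullptr,    // read commands from the bound buffer
                            meshCount,  // number of commands
                            0);         // tightly packed commands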
For all dynamic geometry, you can use a persistent buffer.
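A minimal sketch of such a persistent buffer (GL 4.4 / ARB_buffer_storage); the size and name are hypothetical, and you still need fences so you don't overwrite data the GPU is reading:

const GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT
                       | GL_MAP_COHERENT_BIT;
glBindBuffer(GL_ARRAY_BUFFER, dynamicVbo);
glBufferStorage(GL_ARRAY_BUFFER, bufferSize, nullptr, flags); // immutable storage
void* ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, bufferSize, flags);
// Each frame: memcpy the new vertices into ptr and draw; synchronize with
// glFenceSync/glClientWaitSync (or rotate between regions of the buffer).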
To answer your question about switching VAOs/VBOs: changing the VAO, or using glBindVertexBuffer, should not add much overhead.
But you should profile it; it can depend on your driver/hardware :)
I have a really obscure problem that I hope somebody can help with. I have implemented vertex skinning on the CPU (the GPU was too slow because of the poor performance of looking up bone transforms in a vertex shader) using a background thread on OS X. I do not need a shared context because no GL calls are made. I allocate a buffer in the process heap big enough to hold my character's vertices. I skin and animate that buffer on the background thread. In the main thread I simply glBufferSubData() that buffer down to my VBO, synchronizing with the end of the buffer update so I don't get tearing in my verts. The VBO has previously been bound to a VAO (one VAO per VBO per character instance), so I only have to bind the VAO and draw my mesh. Not very difficult so far. A single IBO is bound to all VAOs per character instance.
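For reference, roughly the pattern described above (types and names are made up, not the actual code):

std::vector<Vertex> skinned; // process-heap buffer, sized to the mesh
std::mutex skinLock;

// Background thread: CPU skinning only, no GL calls.
void skinJob(const Character& c) {
    std::lock_guard<std::mutex> g(skinLock);
    skinVertices(c, skinned); // hypothetical CPU skinning routine
}

// Main thread, per frame: upload and draw.
void drawCharacter(const Character& c) {
    {
        std::lock_guard<std::mutex> g(skinLock); // avoid tearing the verts
        glBindBuffer(GL_ARRAY_BUFFER, c.vbo);
        glBufferSubData(GL_ARRAY_BUFFER, 0,
                        skinned.size() * sizeof(Vertex), skinned.data());
    }
    glBindVertexArray(c.vao); // VBO and IBO already attached to this VAO
    glDrawElements(GL_TRIANGLES, c.indexCount, GL_UNSIGNED_INT, nullptr);
}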
Here's the rub. If I only have one character instance the code works perfectly. I have a lovely little warrior princess doing her idle animation. The moment I add a second instance nothing happens--only the first instance renders.
So the big question is: what am I doing wrong? I'm pretty sure my VBO, IBO and VAOs are correct; there's a separate VBO and VAO per instance. I turned off indexing (no IBO) and the instances still fail to draw, so it's not that, IMHO. I've verified the state using the Mac OpenGL Profiler and everything looks good per instance. Is there some kind of weird flushing not happening because of my glBufferSubData call? glMapBuffer was just too slow!
If you need source code to look at I can upload that easily enough. Just wondering if anyone's heard of weirdness like what I'm seeing when dealing with buffer objects in OpenGL on the Mac.
I have written a simple graphics engine using OpenGL and GLSL. Until now, when I needed to create a new mesh scene node I created a VAO, a VBO and an IBO for each mesh. I loaded the vertex attributes for each mesh this way:
// Allocate one buffer big enough for positions, texcoords and normals,
// then fill each attribute block at its offset.
glBufferData(GL_ARRAY_BUFFER, this->GetVerticesByteSize(VERTEX_POSITION)
    + this->GetVerticesByteSize(VERTEX_TEXTURE)
    + this->GetVerticesByteSize(VERTEX_NORMAL), NULL, this->m_Usage);
glBufferSubData(GL_ARRAY_BUFFER, 0, this->GetVerticesByteSize(VERTEX_POSITION),
    &this->m_VertexBuffer[VERTEX_POSITION][0]);
if (!this->m_VertexBuffer[VERTEX_TEXTURE].empty())
    glBufferSubData(GL_ARRAY_BUFFER, this->GetVerticesByteSize(VERTEX_POSITION),
        this->GetVerticesByteSize(VERTEX_TEXTURE),
        &this->m_VertexBuffer[VERTEX_TEXTURE][0]);
if (!this->m_VertexBuffer[VERTEX_NORMAL].empty())
    glBufferSubData(GL_ARRAY_BUFFER, this->GetVerticesByteSize(VERTEX_POSITION)
        + this->GetVerticesByteSize(VERTEX_TEXTURE),
        this->GetVerticesByteSize(VERTEX_NORMAL),
        &this->m_VertexBuffer[VERTEX_NORMAL][0]);
But if the scene is composed of a lot of meshes this is bad for performance (too many state changes). So I decided to create a single VAO, VBO and IBO (singleton classes) for all the geometry in my scene.
The way to do this is the following:
Load all the vertex attributes into a specific class (call it 'VertexAttributes') for each mesh. Once all the meshes are loaded, allocate the big vertex buffer in a single VBO: as above, first call glBufferData to allocate the whole memory with the size of all the vertex attributes in the scene, then call glBufferSubData for each kind of vertex attribute in a loop.
But is it possible to call glBufferData several times (once for each mesh) and fill the VBO step by step during scene loading, so that it behaves like a realloc? Is this possible with OpenGL, or is my first method the right one?
But is it possible to call glBufferData several times (once for each mesh) and fill the VBO step by step during scene loading, so that it behaves like a realloc? Is this possible with OpenGL, or is my first method the right one?
No. Whenever you call glBufferData, a new data storage (with the new size) is created, and the previous contents are lost.
However, combining multiple objects in the same VBO is still a valid strategy in many cases, especially if many of those objects are likely to be drawn together.
You cannot dynamically resize buffer objects. What you can do is pre-allocate bigger buffers and update parts of them. Having a bunch of reasonably-sized buffers available to fill dynamically can be a viable strategy. Note that there is also GL_ARB_copy_buffer (core since GL 3.1, so widely available), which allows quite efficient server-side copies; you can even emulate the "realloc" behavior by allocating a new buffer and copying the old contents over.
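A sketch of that "realloc" emulation with glCopyBufferSubData (the buffer names and sizes are hypothetical):

GLuint newVbo;
glGenBuffers(1, &newVbo);
glBindBuffer(GL_COPY_WRITE_BUFFER, newVbo);
glBufferData(GL_COPY_WRITE_BUFFER, newSize, NULL, GL_STATIC_DRAW); // bigger storage

glBindBuffer(GL_COPY_READ_BUFFER, oldVbo);
glCopyBufferSubData(GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER,
                    0, 0, oldSize); // server-side copy, no CPU round trip

glDeleteBuffers(1, &oldVbo); // newVbo now stands in for oldVbo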
Which strategy is best will always depend on the situation. If you often load or destroy objects dynamically, a more complex buffer allocation and management strategy might pay off.