glDrawArrays does not render entire point cloud - opengl

I'm trying to render huge point clouds (~150M) but OpenGL only renders part (~52M) of it. When rendering smaller datasets (<40M) everything works fine. I'm using single VBO. When using multiple VBOs, points get rendered but rendering is awfully slow, which is expected. My element has a size of 44bytes and GPU has 3GB of memory available. This should be enough for nearly ~70M points but I can render as much as 100M points with multiple VBOs. Is there any OpenGL specific limitation per VBO I'm not aware of?.
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, cloud.size() * sizeof(Point), cloud.data(), GL_STATIC_DRAW);
// lot of other code
glDrawArrays(GL_POINTS, 0, cloud.size());

It looks like some part of your system uses 32-bit unsigned integers to store the size of the buffers, thus passing 148M*44bytes overflows and gets converted to about 54.9M or 50.4M depending on whether your megabytes are binary or decimal. I'd start with checking your OpenGL binding library to see that the prototypes it declares correctly use 64-bit types. If it does then the bug must be in the OpenGL drivers.
To transfer more than 4GB data to the buffer you may try using one of the other available functions: glBufferSubData and glBufferStorage, or memory-map the buffer with glMapBufferRange, which might workaround the limitation of 4GB.
Another thing to consider is to use one VAO but split the data between multiple buffers. Presumably your Point consists of different attributes, like position, color, etc... You can put each of them in a separate buffer and still use one VAO and one draw call. You can also optimize the types of the attributes you use (e.g. don't use floats where shorts or bytes would do) and the layout of the structure (check that there's no unnecessary padding between the fields).

I don't think memory is a problem in fact i'd say your program makes too many drawing calls . You should try with glDrawArraysInstanced() . For that you need to provide new position for every instance in the vertex shader... maybe that will solve your problem :D.
I'm sorry if i cannot provide you with details ,my skills in OpenGL got a bit dull , but i am planning on recovering asap :D .i hope it helps .

Related

What is the best way to draw multiple VAO Using the same shader but not having the same texture or colors

I'm wondering what would be the best thing to do if I want to draw
more than ~6000 different VAOs using the same shader.
At the moment I bind my shader then give it all information needed (uniform) then looping through each VAO to binding and draw them.
This code make my computer fall at ~ 200 fps instead of 3000 or 4000.
According to https://learnopengl.com/Advanced-OpenGL/Instancing, using glDrawElementsInstanced can allow me to handle a HUGE amount of same VAO but since I have ~6000 different VAO It seems like I can't use it.
Can someone confirm me this? What you guys would do to draw so many VAO and save many performance as you can?
Step 1: do not have 6,000 different VAOs.
You are undoubtedly treating each VAO as a separate mesh. Stop doing this. You should instead treat each VAO as a separate vertex format. That is, you only need a new VAO if you're passing different kinds of vertex data. The number of attributes and the format of each attributes constitute the format information.
Ideally, you only need between 4 and 10 separate sets of vertex formats. Given that you're using the same shader on multiple VAOs, you probably already have this understanding.
So, how do you use the same VAO for multiple meshes? Ideally, you would do this by putting all of the mesh data for a particular kind of mesh (ie: vertex format) in the same buffer object(s). You would select which data to retrieve for a particular rendering operation via tricks like the baseVertex parameter of glDrawElementsBaseVertex, or just by selecting which range of index data to draw from for a particular draw command. Other alternatives include the multi-draw family of rendering functions.
If you cannot put all of the data in the same buffers for some reason, then you should adopt the glVertexAttribFormat style of VAO usage. That way, you set your vertex format data with glVertexAttribFormat calls, and you can change the buffers as needed with glBindVertexBuffers without ever having to touch the vertex format itself. This is known to be faster than changing VAOs.
And to be honest, you should adopt glVertexAttribFormat anyway, because it's a much better API that isn't stupid like glVertexAttribPointer and its ilk.
glDrawElementsInstanced can allow me to handle a HUGE amount of same VAO but since I have ~6000 differents VAO It seems like I can't use it.
So what you should do is to combine your objects into the same VAO. Then use glMultiDrawArraysIndirect or glMultiDrawElementsIndirect to issue a draw of all the different objects from within the same VAO. This answer demonstrates how to do this.
In order to handle different textures you either build a texture atlas, pack the textures into a texture array, or use the GL_ARB_bindless_texture extensions if available.

Cost of large buffer switch vs. small buffer switch

I'm creating a tile-based renderer where each tile has a vertex model. However, from each vertex model only a small portion is rendered in one frame. These subsets change every frame.
What would be the fastest way to render this? I can think of the following options:
Make one draw call for every model. Every model is stored in full on the gpu. For every draw call, the full vbo is switched every time. Indices are then used to pick the appropriate small portion for the actual rendering.
Make one draw call with one vbo which gets assembled every frame by copying the necessary (small) subset of all the other vbos (the data is copied within vram).
Make one draw call with one vbo, but the vbo is recreated every frame with the (small) subset from CPU data using glBufferData.
Which do you think is fastest, or can you think of something faster?
One deciding factor is obviously if switching between larger VBOs is more expensive than switching between smaller VBOs.
It is a bad idea to make a lot of drawcalls. In OpenGL,you will be CPU bound by this method, so it is better to batch a lot of models.
Actually, I would go for this method. All static geometry is inside one and only one VBO and one VAO. It does not mean that you only have "one draw call". However, you should use glMultiDraw*Indirect.
The idea burried that is you have to use compute shaders to perform culling on GPU, and use something like GL_INDIRECT_PARAMETERS extensions with your multi indirect draw call.
Indirect Drawing
For all dynamic geometry, you can use a persistent buffer.
To answer your question about changing vao/vbo. Change VAO, or use glBindVertexBuffer should not make a big overhead.
But you should profile it, it can depends on your driver / hardware :)

Best way to convert OpenGL immediate mode rendering utility methods to using VBOs?

I've written for myself a small utility class containing useful methods for rendering lines, quads, cubes, etc. quickly and easily in OpenGL. Up until now, I've been using almost entirely immediate mode, so I could focus on learning other aspects of OpenGL. It seems prudent to switch over to using VBOs. However, I want to keep much of the same functionality I've been using, for instance my utility class. Is there a good method of converting these simple immediate mode calls to a versatile VBO system?
I am using LWJGL.
Having converted my own code from begin..end blocks and also taught others, this is what I recommend.
I'm assuming that your utility class is mostly static methods, draw a line from this point to that point.
First step is to have each individual drawing operation create a VBO for each attribute. Replace your glBegin..glEnd block with code that creates an array (actually a ByteBuffer) for each vertex attribute: coordinates, colors, tex coords, etc. After what used to be glEnd, copy the ByteBuffers to the VBOs with glBufferData. Then set up the attributes with chunks of glEnableClientState, glBindBuffer, glVertex|Color|whateverPointer calls. Call glDrawArrays to actually draw something, and finally restore client state and delete the VBOs.
Now, this is not good OpenGL code and is horribly inefficient and wasteful. But it should work, it's fairly straightforward to write, and you can change one method at a time.
And if you don't need to draw very much, well modern GPUs are so fast that maybe you won't care that it's inefficient.
Second step is to start re-using VBOs. Have your class create one VBO for each possible attribute at init time or first use. The drawing code still creates ByteBuffer data arrays and copies them over, but doesn't delete the VBOs.
Third step, if you want to move into OpenGL 4 and are using shaders, would be to replace glVertexPointer with glVertexAttribPointer(0, glColorPointer with glVertexAttribPointer(1, etc. You should also create a Vertex Array Object along with the VBOs at init time. (You'll still have to enable/disable attrib pointers individually depending on whether each draw operation needs colors, tex coords, etc.)
And the last step, which would require changes elsewhere to your program(s), would be to go for 3D "objects" rather than methods. Your utility class would no longer contain drawing methods. Instead you create a line, quad, or cube object and draw that. Each of these objects would (probably) have its own VBOs. This is more work, but really pays off in the common case when a lot of your 3D geometry doesn't change from frame to frame. But again, you can start with the more "wasteful" approach of replacing each method call to draw a line from P1 to P2 with something like l = new Line3D(P1, P2) ; l.draw().
Hope this helps.

Draw a bunch of elements generated by CUDA/OpenCL?

I'm new to graphics programming, and need to add on a rendering backend for a demo we're creating. I'm hoping you guys can point me in the right direction.
Short version: Is there any way to send OpenGL an array of data for distinct elements, without having to issue a draw command for each element distinctly?
Long version: We have a CUDA program (will eventually be OpenCL) which calculates a bunch of data for a bunch of objects for us. We then need to render these objects using, e.g., OpenGL.
The CUDA kernel can generate our vertices, and using OpenGL interop, it can shove these in an OpenGL VBO and not have to transfer the data back to host device memory. But the problem is we have a bunch (upwards of a million is our goal) distinct objects. It seems like our best bet here is allocating one VBO and putting every object's vertices into it. Then we can call glDrawArrays with offsets and lengths of each element inside that VBO.
However, each object may have a variable number of vertices (though the total vertices in the scene can be bounded.) I'd like to avoid having to transfer a list of start indices and lengths from CUDA -> CPU every frame, especially given that these draw commands are going right back to the GPU.
Is there any way to pack a buffer with data such that we can issue only one call to OpenGL to render the buffer, and it can render a number of distinct elements from that buffer?
(Hopefully I've also given enough info to avoid a XY problem here.)
One way would be to get away from understanding these as individual objects and making them a single large object drawn with a single draw call. The question is, what data is it that distinguishes the objects from each other, meaning what is it you change between the individual calls to glDrawArrays/glDrawElements?
If it is something simple, like a color, it would probably be easier to supply this an additional per-vertex attribute. This way you can render all objects as one single large object using a single draw call with the indiviudal sub-objects (which really only exist conceptually now) colored correctly. The memory cost of the additional attribute may be well worth it.
If it is something a little more complex (like a texture), you may still be able to index it using an additional per-vertex attribute, being either an index into a texture array (as texture arrays should be supported on CUDA/OpenCL-able hardware) or a texture coordinate into a particular subregion of a single large texture (a so-called texture atlas).
But if the difference between those objects is something more complex, as a different shader or something, you may really need to render individual objects and make individual draw calls. But you still don't need to neccessarily make a round-trip to the CPU. With the use of the ARB_draw_indirect extension (which is core since GL 4.0, I think, but may be supported on GL 3 hardware (and thus CUDA/CL-hardware), don't know) you can source the arguments to a glDrawArrays/glDrawElements call from an additional buffer (into which you can write with CUDA/CL like any other GL buffer). So you can assemble the offset-length-information of each individual object on the GPU and store them in a single buffer. Then you do your glDrawArraysIndirect loop offsetting into this single draw-indirect-buffer (with the offset between the individual objects now being constant).
But if the only reason for issuing multiple draw calls is that you want to render the objects as single GL_TRIANGLE_STRIPs or GL_TRIANGLE_FANs (or, god beware, GL_POLYGONs), you may want to reconsider just using a bunch of GL_TRIANGLES so that you can render all objects in a single draw call. The (maybe) time and memory savings from using triangle strips are likely to be outweight by the overhead of multiple draw calls, especially when rendering many small triangle strips. If you really want to use strips or fans, you may want to introduce degenerate triangles (by repeating vertices) to seprate them from each other, even when drawn with a single draw call. Or you may look into the glPrimitiveRestartIndex function introduced with GL 3.1.
Probably not optimal, but you could make a single glDrawArray on your whole buffer...
If you use GL_TRIANGLES, you can fill your buffer with zeroes, and write only the needed vertices in your kernel. This way "empty" regions of your buffer will be drawn as 0-area polygons ( = degenerate polygons -> not drawn at all )
If you use GL_TRIANGLE_STRIP, you can do the same, but you'll have to duplicate your first vertex in order to make a fake triangle between (0,0,0) and your mesh.
This can seem overkill, but :
- You'll have to be able to handle as many vertices anyway
- degenerate triangles use no fillrate, so they are almost free (the vertex shader is still computed, though)
A probably better solution would be to use glDrawElements instead : In you kernel, you also generate an index list for your whole buffer, which will be able to completely skip regions of your buffer.

OpenGL vertex array pointers, different buffers per component

A bit of context :
I'm working on a GPU emulator (the NV2A if you want to know) at the push-buffer level, and I'm trying to implement the drawing using OpenGL. The GPU commands that I have to emulate contain separate pointers for each vertex component (so positions are in an entirely different memory address than fog coordinates, colors, texture coordinates, etc.)
Other data, like vertex component size, type and stride are also present in the push-buffer, but those are not really relevant to this question.
I've been reading about Vertex Array Objects, but as far as my tests go, the pointers you can set with glVertexAttribPointer should all be relative to a Vertex Buffer Object - something I would like to avoid, as I've already got a copy of the data in memory.
The question :
Is it possible in OpenGL to draw vertices using separate pointers (not managed by any OpenGL API) per vertex component? And how would the code look like, roughy?
PS: Since I'm emulating a GPU, I have to take vertex shader programs into account too. I haven't worked on these yet, so any suggestion on that is welcome too. TIA!
You don't need to use VBOs, glVertexAttribPointer takes a normal CPU-pointer if no VBO is bound (you can call glBindBuffer(GL_ARRAY_BUFFER, 0) to make sure). And yes, you can set up one address per attribute stream.