OpenGL triangle fan

I have a vertex buffer of points, each of which is to be the center of a circle (triangle fan). How do I take these points, draw, say, 10 triangles around one point, and then move on to the next point? I haven't been able to find any example code.

You have to glEnd() then glBegin() if you're using the old pipeline.
If you're using indexed draw calls, you can use primitive restart, where a designated index value begins a new primitive. All credit to datenwolf for pointing that out; I've clearly blanked it from my memory.
Personally however I still think that you're better off just using indexed triangles, and then you can re-use whatever vertices you want, whenever you want. It's simpler and the driver/hardware will thank you for it.
(in other words, don't use GL_TRIANGLE_FAN - just use GL_TRIANGLES. It's all the hardware draws anyway).
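As a sketch of that GL_TRIANGLES approach (the function and vertex layout here are illustrative, not from the question): each center point is expanded on the CPU into 10 triangles, so everything can live in one VBO and be drawn with a single glDrawArrays(GL_TRIANGLES, ...) call.

```cpp
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };

// Expand one circle center into `segments` triangles (GL_TRIANGLES layout).
std::vector<Vec2> circleTriangles(Vec2 center, float radius, int segments) {
    std::vector<Vec2> out;
    out.reserve(segments * 3);
    const float step = 2.0f * 3.14159265f / segments;
    for (int i = 0; i < segments; ++i) {
        float a0 = step * i, a1 = step * (i + 1);
        out.push_back(center);  // every triangle starts at the fan center
        out.push_back({center.x + radius * std::cos(a0),
                       center.y + radius * std::sin(a0)});
        out.push_back({center.x + radius * std::cos(a1),
                       center.y + radius * std::sin(a1)});
    }
    return out;
}
```

Call this once per center point, append the results into one array, upload it, and draw the whole thing with one glDrawArrays(GL_TRIANGLES, 0, totalVertexCount).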

You can do this using a primitive restart index. You set a special index value (usually the largest number representable by the type used for indexing). Then, whenever this index is encountered in the index buffer, a new primitive is started, as if multiple calls to glDraw… had been issued.
http://www.opengl.org/sdk/docs/man3/xhtml/glPrimitiveRestartIndex.xml
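A sketch of what the index buffer for several fans might look like, assuming the fans' vertices are packed back-to-back in one vertex buffer (helper name hypothetical); the GL side is shown in comments:

```cpp
#include <cstdint>
#include <vector>

const std::uint32_t RESTART = 0xFFFFFFFFu;  // glPrimitiveRestartIndex(RESTART)

// Build one index array for several triangle fans, where fanSizes[f] is the
// number of vertices in fan f and the fans are packed back-to-back in the VBO.
std::vector<std::uint32_t>
buildFanIndices(const std::vector<std::uint32_t>& fanSizes) {
    std::vector<std::uint32_t> indices;
    std::uint32_t base = 0;
    for (std::size_t f = 0; f < fanSizes.size(); ++f) {
        if (f != 0) indices.push_back(RESTART);  // restart: new fan begins here
        for (std::uint32_t i = 0; i < fanSizes[f]; ++i)
            indices.push_back(base + i);
        base += fanSizes[f];
    }
    return indices;
}

// Draw side (sketch):
//   glEnable(GL_PRIMITIVE_RESTART);
//   glPrimitiveRestartIndex(RESTART);
//   glDrawElements(GL_TRIANGLE_FAN, indices.size(), GL_UNSIGNED_INT, 0);
```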

Related

Is there a defined draw order for OpenGL instanced drawing? [duplicate]

Here's what I'm trying to do: I want to render a 2D scene, consisting of a number of objects (quads), using instancing. Objects with a lower y value (towards the bottom of the screen) need to be rendered in front of the ones with higher y values. And alpha blending also needs to work.
So my first idea was to use the Z value for depth, but I soon realized alpha blending will not work unless the objects are drawn in the right order. I'm not issuing one call per quad, though; I use a single instanced call to render the whole scene. Putting the instance data in the correct sorted order seems to work for me, but I doubt this is something I can rely on, since the GPU is supposed to run those computations in parallel as much as possible.
So the question is, is there a way to make this work? The best thing I can think of right now is to issue an instanced call for each separate y value (and issue those in order, back to front). Is there a better way to do this?
Instancing is best used for cases where each instance is medium-sized: hundreds or maybe thousands of triangles. Quads are not a good candidate for instancing.
Just build and render a sequence of triangles. There are even ways to efficiently get around the lack of a GL_QUADS primitive type in modern OpenGL.
Putting the instance data in the correct sorted order seems to work for me, but I doubt this is something I can rely on, since the GPU is supposed to run those computations in parallel as much as possible.
That's not how GPUs work.
When you issue a rendering command, what you (eventually) get is a sequence of primitives. Because the vertices that were given to that command are ordered (first to last), and the instances in that command are ordered, and even the draws within a single draw command are ordered, an order can be assigned to every primitive in the draw call with respect to every other primitive based on the order of vertices, instances, and draws.
This defines the primitive order for a drawing command. GPUs guarantee that blending (and logical operations and other visible post-fragment shader operations) will respect the primitive order of a rendering command and between rendering commands. That is, if you draw 2 triangles in a single call, and the first is behind the second (with depth testing turned off), then blending for the second triangle will respect the data written by the first.
Basically, if you give primitives to the GPU in an order, the GPU will respect that order with regard to blending and such.
So again, just build an ordered stream of triangles to represent your quads and render them.
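A minimal sketch of that suggestion (types and names hypothetical): sort the quads back-to-front, which per the question means descending y, then expand them into one ordered GL_TRIANGLES stream for a single draw call.

```cpp
#include <algorithm>
#include <vector>

struct Quad   { float x, y, w, h; };
struct Vertex { float x, y; };

// Emit quads as GL_TRIANGLES, back to front (higher y drawn first,
// matching the question's "lower y is in front" rule).
std::vector<Vertex> buildOrderedTriangles(std::vector<Quad> quads) {
    std::sort(quads.begin(), quads.end(),
              [](const Quad& a, const Quad& b) { return a.y > b.y; });
    std::vector<Vertex> out;
    out.reserve(quads.size() * 6);
    for (const Quad& q : quads) {
        Vertex bl{q.x, q.y}, br{q.x + q.w, q.y},
               tl{q.x, q.y + q.h}, tr{q.x + q.w, q.y + q.h};
        out.insert(out.end(), {bl, br, tr,  bl, tr, tl});  // two triangles
    }
    return out;
}
```

Since blending respects primitive order, the earlier (higher-y) quads are guaranteed to be blended under the later ones.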

OpenGL Separating Polygons Inside VBO

I am trying to use one VBO to draw polygons that are separate from each other. When I draw the polygons, OpenGL cannot know where each new polygon starts and draws them all joined together.
How can I put a break point in the VBO (or IBO) to tell OpenGL to start a fresh polygon?
Sorry, it seems a newbie question; I searched for questions and tutorials but couldn't find the answer.
Thank you for your comments :)
I am trying to create a DXF viewer. DXF files have polygons with solid hatches, so I thought it would be faster to use GL_POLYGON rather than tessellation (I might have been wrong here).
My drawing function looks at each object and makes a draw call for every primitive in the file. This approach turns out to be slow, so I decided to use one vertex array and one buffer for every unique primitive type.
For lines there is no need to mark the endings, since every line must have exactly 2 vertices. But polygons may have different numbers of vertices, and I need to tell OpenGL where a new polygon begins.
void draw_scene(/*function variables*/){
    Shader::shader_use();
    glEnableClientState(GL_VERTEX_ARRAY);

    unsigned long int i;
    unsigned long int count = objects.size();
    for(i = 0; i < count; i++){
        glBindVertexArray(objects[i]->vao);
        glDrawArrays(objects[i]->primitive_type, 0, objects[i]->vertice_count);
        glBindVertexArray(0);
    }

    glDisableClientState(GL_VERTEX_ARRAY);
    Shader::shader_close();
}
The above code hasn't been optimized yet. I want to draw the objects that share a primitive type in one call. I thought that if I could put them in one vertex buffer, then I would be able to draw them in one call, so there would be one VAO for each type (points, lines, etc.).
In fact, I don't know if there might be a better solution. Even if I can put those in one buffer, it would be hard to modify primitive vertices, since they would all have been put in one array; modifying one vertex would mean re-uploading all vertices of the same type.
I also searched for a way to bind multiple VAOs at once, so as not to make too many draw calls, but I couldn't find a method.
Thank you very much for your help :)
Part of this was already covered in the comments.
Replacement of GL_POLYGON
The GL_POLYGON primitive type is deprecated and not available in the OpenGL Core Profile. Fortunately, it can easily be replaced with GL_TRIANGLE_FAN, which uses the same vertex order, so you can directly substitute GL_TRIANGLE_FAN for GL_POLYGON without changing anything else.
Only convex polygons can directly be drawn with GL_TRIANGLE_FAN without using any more elaborate triangulation. But the same was true for GL_POLYGON, so that's not a new restriction.
Drawing multiple polygons with a single draw call
There are a couple of approaches:
You can use glMultiDrawArrays(). Where glDrawArrays() takes one start index and one size, glMultiDrawArrays() takes an array of each. So it can be used to replace an arbitrary number of glDrawArrays() calls with a single call.
You can use primitive restart. The downside of this is that you need an index array, which may not have been necessary otherwise. Apart from that, it's straightforward. You enable it with:
glPrimitiveRestartIndex(0xffff);
glEnable(GL_PRIMITIVE_RESTART);
You can use any index you want as the restart index, but it's common policy to use the maximum possible index, which is 0xffff if you use indices of type GL_UNSIGNED_SHORT. If you use at least OpenGL 4.3, or OpenGL ES 3.0, you can also replace the above with:
glEnable(GL_PRIMITIVE_RESTART_FIXED_INDEX);
Then you set up an index array where you insert a 0xffff value at every position you want to start a new polygon, and bind the index array as usual. Then you can draw all the polygons with a single glDrawElements() call.
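For the first approach, the bookkeeping for glMultiDrawArrays() can be sketched like this (helper name hypothetical), assuming the polygons are packed back-to-back in one VBO:

```cpp
#include <vector>

// Given the vertex count of each polygon packed back-to-back in one VBO,
// build the `first` and `count` arrays that glMultiDrawArrays() expects.
void buildMultiDrawArgs(const std::vector<int>& polySizes,
                        std::vector<int>& firsts, std::vector<int>& counts) {
    int offset = 0;
    for (int size : polySizes) {
        firsts.push_back(offset);  // where this polygon starts in the VBO
        counts.push_back(size);    // how many vertices it has
        offset += size;
    }
    // Single draw call for all polygons (sketch):
    //   glMultiDrawArrays(GL_TRIANGLE_FAN, firsts.data(), counts.data(),
    //                     (GLsizei)polySizes.size());
}
```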

Fastest way to upload streaming points, and removing occasionally

So I have a system (using OpenGL 4.x) where I am receiving a stream of points (potentially with color and/or normal) from an external source, and I need to draw these points as GL_POINTS, running custom switchable shaders for coloring (the color could be procedurally generated, or come from the vertex color or the normal direction).
The stream consists of groups of points (with or without normal or color) of arbitrary count (typically 1k to 70k points) arriving at a fairly regular interval (4 to 10 Hz). I need to add these points to my current set and draw all the points received so far.
I am guaranteed that my vertex type will not change; I am told at the beginning of the stream which to expect, so I am using an interleaved vertex with pos+normal+color, pos+normal, pos+color, or just pos.
My current solution is to allocate interleaved vertex VBOs (with surrounding VAOs) of the appropriate vertex type, at a maximum vertex count specified in a config file (allocated with the DYNAMIC hint).
As new points come in, I fill up my current non-full VBO via glBufferSubData. I keep a count (activePoints) of how many vertices the current frontier VBO holds so far, and use glBufferSubData to fill a range starting at activePoints. If my current update group has more vertices than can fit in my frontier buffer (since I limit the vertex count per VBO), then I allocate a new VBO and fill the range starting at 0 and ending with the number of points left over from my update group. If I still have points left, I do this again and again. It is rare that an update group straddles more than 2 buffers.
When rendering, I render all my VBOs but the last with glDrawArrays(m_DrawMode, 0, numVertices), where numVertices equals the maximum allowed buffer size, and my frontier buffer with glDrawArrays(m_DrawMode, startElem, numElems) to account for it not being completely filled with valid vertices.
Of course, at some point I will have more points than I can draw interactively, so I have an LRU mechanism that deallocates the oldest (according to the LRU algorithm) sets of VBOs as needed.
Is there a more optimal method for doing this? Buffer orphaning? A streaming hint? Map vs. SubData? Something else?
The second issue is that I am now asked to remove points (at irregular intervals), ranging from 10 to 2000 at a time, and these points are irregularly spaced within the order I received them. I can find out which offsets in which buffers they currently exist at, but it's more of a scattering than a range. I have been "removing" them by finding their offsets into the right buffers and, one by one, calling glBufferSubData with a range of 1 (it's rare that they are beside each other in a buffer), changing their position to be somewhere far off where they will never be seen. Eventually, I guess, buffers should be deleted as these remove requests add up, but I don't currently do that.
What would be a better way to handle that?
Mapping may be more efficient than glBufferSubData, especially when having to "delete" points. Explicit flush may be of particular help. Also, mapping allows you to offload the filling of a buffer to another thread.
Be positively sure to get the access bits correct (or performance is abysmal), in particular do not map a region "read" if all you do is write.
Deleting points from a vertex buffer is not easily possible, as you probably know. For "few" points (e.g. 10 or 20) I would just set w = 0, which moves them to infinity and keep drawing the whole thing as before. If your clip plane is not at infinity, this will just discard them. With explicit flushing, you would not even need to keep a separate copy in memory.
For "many" points (e.g. 1,000), you may consider using glCopyBufferSubData to remove the "holes". Moving memory on the GPU is fast, and for thousands of points it's probably worth the trouble. You then need to maintain a count for every vertex buffer, so you draw fewer points after removing some.
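The hole-removal bookkeeping might look like the following sketch (names hypothetical); on the GPU the same surviving ranges would be moved with glCopyBufferSubData rather than CPU copies:

```cpp
#include <vector>

// Compact a vertex array in place by removing the vertices at the given
// offsets (sorted ascending, no duplicates). Returns the new live vertex
// count, which becomes the `count` argument of the next glDrawArrays call.
// On the GPU, each surviving run would be moved with glCopyBufferSubData.
template <typename Vertex>
std::size_t compactBuffer(std::vector<Vertex>& verts,
                          const std::vector<std::size_t>& deadOffsets) {
    std::size_t write = 0;
    std::size_t nextDead = 0;
    for (std::size_t read = 0; read < verts.size(); ++read) {
        if (nextDead < deadOffsets.size() && deadOffsets[nextDead] == read) {
            ++nextDead;  // this vertex was deleted: don't copy it forward
            continue;
        }
        verts[write++] = verts[read];
    }
    verts.resize(write);
    return write;
}
```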
To "delete" entire vertex buffers, you should just orphan them (and reuse). OpenGL will do the right thing on its own behalf then, and it's the most efficient way to keep drawing and reusing memory.
Using glDrawElements instead of glDrawArrays, as suggested in Andon M. Coleman's comment, is usually good advice, but it will not help you in this case. The reason one would want to do that is that the post-transform cache works by tagging vertices with their index, so drawing elements takes advantage of the post-transform cache whereas drawing arrays does not. However, the post-transform cache is only useful on complex geometry such as triangle lists or triangle strips. You're drawing points, so you will not use the post-transform cache in any case -- and using indices increases memory bandwidth both on the GPU and on the PCIe bus.

Draw a bunch of elements generated by CUDA/OpenCL?

I'm new to graphics programming, and need to add on a rendering backend for a demo we're creating. I'm hoping you guys can point me in the right direction.
Short version: Is there any way to send OpenGL an array of data for distinct elements, without having to issue a draw command for each element distinctly?
Long version: We have a CUDA program (will eventually be OpenCL) which calculates a bunch of data for a bunch of objects for us. We then need to render these objects using, e.g., OpenGL.
The CUDA kernel can generate our vertices and, using OpenGL interop, shove them into an OpenGL VBO without having to transfer the data back to host memory. But the problem is that we have a bunch of (upwards of a million is our goal) distinct objects. It seems like our best bet here is allocating one VBO and putting every object's vertices into it. Then we can call glDrawArrays with the offset and length of each element inside that VBO.
However, each object may have a variable number of vertices (though the total vertices in the scene can be bounded.) I'd like to avoid having to transfer a list of start indices and lengths from CUDA -> CPU every frame, especially given that these draw commands are going right back to the GPU.
Is there any way to pack a buffer with data such that we can issue only one call to OpenGL to render the buffer, and it can render a number of distinct elements from that buffer?
(Hopefully I've also given enough info to avoid a XY problem here.)
One way would be to get away from treating these as individual objects and instead make them a single large object drawn with a single draw call. The question is: what data distinguishes the objects from each other, i.e. what do you change between the individual calls to glDrawArrays/glDrawElements?
If it is something simple, like a color, it would probably be easier to supply it as an additional per-vertex attribute. This way you can render everything as one single large object, using a single draw call, with the individual sub-objects (which really only exist conceptually now) colored correctly. The memory cost of the additional attribute may be well worth it.
If it is something a little more complex (like a texture), you may still be able to index it using an additional per-vertex attribute, being either an index into a texture array (as texture arrays should be supported on CUDA/OpenCL-able hardware) or a texture coordinate into a particular subregion of a single large texture (a so-called texture atlas).
But if the difference between those objects is something more complex, like a different shader, you may really need to render individual objects and make individual draw calls. But you still don't necessarily need to make a round-trip to the CPU. With the ARB_draw_indirect extension (which is core since GL 4.0, I think, but may be supported on GL 3 hardware (and thus CUDA/CL-capable hardware), I don't know) you can source the arguments to a glDrawArrays/glDrawElements call from an additional buffer (into which you can write with CUDA/CL like any other GL buffer). So you can assemble the offset-length information of each individual object on the GPU and store it in a single buffer. Then you do your glDrawArraysIndirect loop, offsetting into this single draw-indirect buffer (with the offset between the individual objects now being constant).
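For reference, the record layout glDrawArraysIndirect reads is fixed by ARB_draw_indirect; here is a sketch of building one record per object from per-object vertex counts (helper name hypothetical; on the real path the CUDA/CL kernel would write these records into a GL buffer):

```cpp
#include <cstdint>
#include <vector>

// Layout mandated by ARB_draw_indirect for glDrawArraysIndirect.
struct DrawArraysIndirectCommand {
    std::uint32_t count;          // vertices in this object
    std::uint32_t instanceCount;  // 1 for plain, non-instanced draws
    std::uint32_t first;          // offset of this object in the shared VBO
    std::uint32_t baseInstance;   // reserved/baseInstance; keep 0 here
};

// Build one command per object, assuming objects are packed back-to-back.
std::vector<DrawArraysIndirectCommand>
buildIndirectCommands(const std::vector<std::uint32_t>& objectSizes) {
    std::vector<DrawArraysIndirectCommand> cmds;
    std::uint32_t first = 0;
    for (std::uint32_t n : objectSizes) {
        cmds.push_back({n, 1u, first, 0u});
        first += n;
    }
    return cmds;
}
```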
But if the only reason for issuing multiple draw calls is that you want to render the objects as single GL_TRIANGLE_STRIPs or GL_TRIANGLE_FANs (or, god beware, GL_POLYGONs), you may want to reconsider and just use a bunch of GL_TRIANGLES, so that you can render all objects in a single draw call. The (possible) time and memory savings from using triangle strips are likely to be outweighed by the overhead of multiple draw calls, especially when rendering many small triangle strips. If you really want to use strips or fans, you may want to introduce degenerate triangles (by repeating vertices) to separate them from each other, even when drawn with a single draw call. Or you may look into the glPrimitiveRestartIndex function introduced with GL 3.1.
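The degenerate-triangle stitching can be sketched as follows (helper name hypothetical): repeating the boundary indices of adjacent strips produces zero-area triangles between them, which the rasterizer discards.

```cpp
#include <cstdint>
#include <vector>

// Concatenate several triangle strips (given as index runs) into one strip,
// inserting degenerate triangles between them by repeating boundary indices.
std::vector<std::uint32_t>
stitchStrips(const std::vector<std::vector<std::uint32_t>>& strips) {
    std::vector<std::uint32_t> out;
    for (std::size_t s = 0; s < strips.size(); ++s) {
        if (s != 0 && !strips[s].empty() && !out.empty()) {
            out.push_back(out.back());         // repeat last index of previous strip
            out.push_back(strips[s].front());  // repeat first index of next strip
        }
        out.insert(out.end(), strips[s].begin(), strips[s].end());
    }
    return out;
}
```

Note that to keep the winding order consistent you may need to repeat one extra index when the preceding strip has an odd number of triangles.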
Probably not optimal, but you could make a single glDrawArrays call on your whole buffer.
If you use GL_TRIANGLES, you can fill your buffer with zeroes and write only the needed vertices in your kernel. This way, "empty" regions of your buffer will be drawn as zero-area polygons (= degenerate polygons -> not drawn at all).
If you use GL_TRIANGLE_STRIP, you can do the same, but you'll have to duplicate your first vertex in order to make a fake triangle between (0,0,0) and your mesh.
This can seem overkill, but:
- You'll have to be able to handle that many vertices anyway
- Degenerate triangles use no fillrate, so they are almost free (the vertex shader is still computed, though)
A probably better solution would be to use glDrawElements instead: in your kernel, you also generate an index list for your whole buffer, which can completely skip regions of your buffer.
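A sketch of that glDrawElements idea under the zero-filled-buffer assumption from above (names hypothetical): index only the slots the kernel actually wrote. In a real kernel you would emit the indices directly; this CPU version just shows the selection logic.

```cpp
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// Build an index list referencing only the vertices that were written,
// assuming untouched slots are left at (0,0,0). Caveat: a legitimate vertex
// at exactly the origin would be skipped too, so a separate occupancy flag
// per slot is the safer variant.
std::vector<std::uint32_t> buildOccupiedIndices(const std::vector<Vec3>& verts) {
    std::vector<std::uint32_t> indices;
    for (std::uint32_t i = 0; i < verts.size(); ++i) {
        const Vec3& v = verts[i];
        if (v.x != 0.0f || v.y != 0.0f || v.z != 0.0f)
            indices.push_back(i);
    }
    return indices;
}
```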