I have a draw buffer and a transform feedback buffer of the same length (say, 1000 vertices), but the data in the draw buffer is not contiguous: for example, the data I'm interested in is at indices 0-100 and 900-1000. I'd rather not process an extra 800 vertices or make two draw calls, so I use glMultiDraw* to batch the two ranges together. I have yet to find documentation that says whether the transform feedback buffer will be populated the same way (with data at indices 0-100 and 900-1000), condensed into a contiguous section (0-100, 101-201), or something else entirely. Does anyone know what happens, or where this behaviour is specified?
Transform feedback stores primitives. For each primitive that you render in a glBeginTransformFeedback/glEndTransformFeedback block, it writes each of the primitive's vertices to the bound feedback buffer in sequential order. It has no concept of indices, so the output of your two ranges ends up packed back to back rather than at the source positions. Primitives generated from more advanced draw modes (GL_LINE_STRIP, GL_TRIANGLE_STRIP, etc.) are split up into the most basic primitive types: GL_POINTS, GL_LINES, and GL_TRIANGLES.
More reading: https://www.opengl.org/wiki/Transform_Feedback
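To make the sequential-write behaviour concrete, here is a small CPU-side model of it. This is only a sketch: DrawRange and simulateFeedback are made-up names, it assumes GL_POINTS with one float per vertex, and it does not call OpenGL at all.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One range of a multi-draw call: first vertex and vertex count
// (the same pair of values glMultiDrawArrays takes per draw).
struct DrawRange {
    std::size_t first;
    std::size_t count;
};

// CPU model of a transform feedback capture: every processed vertex is
// appended at the current write position, whatever its source index was.
std::vector<float> simulateFeedback(const std::vector<float>& source,
                                    const std::vector<DrawRange>& ranges) {
    std::vector<float> feedback;
    for (const DrawRange& r : ranges) {
        assert(r.first + r.count <= source.size());
        for (std::size_t i = 0; i < r.count; ++i) {
            feedback.push_back(source[r.first + i]);
        }
    }
    return feedback;
}
```

Feeding it the ranges (0, 100) and (900, 100) gives 200 captured values, with the vertex from source index 900 landing at output index 100.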
I have a grayscale texture (8000*8000) in which the value of each pixel is an ID (specifically, the ID of the triangle the fragment belongs to; I want to use this method to calculate how many triangles, and which triangles, are visible in my scene).
Now I need to count how many unique IDs there are and find out what they are. I want to implement this in GLSL and minimize the data transfer between GPU memory and system RAM.
My initial idea is to use a shader storage buffer bound to an array in GLSL, with a size of totalTriangleNum. A shader then iterates through the ID texture and, for each texel, increments the array element whose index equals the ID stored in that texel.
After that, I read the buffer back in the OpenGL application and get what I want. Is this an efficient way to do it? Or are there better solutions, such as a compute shader (which I'm not familiar with) or something else?
I want to use this method to calculate how many triangles and which triangles are visible in my scene
Given your description of your data let me rephrase that a bit:
You want to determine how many distinct values there are in your dataset, and how often each value appears.
This is commonly known as a histogram. Unfortunately (for you), generating histograms is among the problems that are not trivially solved on GPUs. Essentially you have to divide your image into smaller and smaller subimages (BSP, quadtree, etc.) until you are down to single pixels, on which you perform the evaluation. Then you backtrack, propagating the sub-histograms upwards, essentially performing an insertion or merge sort on the histogram.
Generating histograms on GPUs is still actively researched, so I suggest you read up on the published academic work (usually accompanied by source code). Keywords: Histogram, GPU
This one is a nice paper done by the AMD GPU researchers: https://developer.amd.com/wordpress/media/2012/10/GPUHistogramGeneration_preprint.pdf
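As a rough illustration of the "build sub-histograms, then merge them" idea, here is a CPU-side sketch. It is not the method from the paper: the function names are invented, the image is treated as a flat array cut into fixed-size slices instead of a quadtree, and it assumes every ID is smaller than idCount.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts how often each ID occurs in one slice [begin, end) of the image.
std::vector<std::uint32_t> partialHistogram(const std::vector<std::uint32_t>& ids,
                                            std::size_t begin, std::size_t end,
                                            std::size_t idCount) {
    std::vector<std::uint32_t> bins(idCount, 0);
    for (std::size_t i = begin; i < end; ++i) {
        assert(ids[i] < idCount);  // assumption: IDs are dense, 0..idCount-1
        ++bins[ids[i]];
    }
    return bins;
}

// Builds one sub-histogram per slice and merges them into the total.
std::vector<std::uint32_t> histogram(const std::vector<std::uint32_t>& ids,
                                     std::size_t idCount, std::size_t sliceSize) {
    assert(sliceSize > 0);
    std::vector<std::uint32_t> total(idCount, 0);
    for (std::size_t begin = 0; begin < ids.size(); begin += sliceSize) {
        const std::size_t end = std::min(begin + sliceSize, ids.size());
        const std::vector<std::uint32_t> part =
            partialHistogram(ids, begin, end, idCount);
        for (std::size_t b = 0; b < idCount; ++b) total[b] += part[b];
    }
    return total;
}

// IDs with a non-zero count are the triangles that are visible.
std::vector<std::uint32_t> visibleIds(const std::vector<std::uint32_t>& bins) {
    std::vector<std::uint32_t> result;
    for (std::size_t id = 0; id < bins.size(); ++id) {
        if (bins[id] != 0) result.push_back(static_cast<std::uint32_t>(id));
    }
    return result;
}
```

On a GPU each slice would presumably map to a work group with its own local bins, but that mapping is my assumption and not something the answer spells out.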
I am looking for a sorting method to optimize the rendering of a scene (regardless of the number of meshes and their sizes) by minimizing state changes and maximizing geometry gathering, so that there are as few glDraw* calls as possible.
In my proposal, the most sensible choice for rendering a specific batch is glMultiDrawElements, because it takes an array of index pointers (one for each model contained in the batch). It should also work well with space partitioning: if a mesh is not visible in the camera frustum, a boolean flag m_IsVisible is set to false. In that case, the indices of that geometry are not included and thus it is not rendered.
So, here is my proposal, which I want to share with anyone interested in the subject. For the sake of readability, I took some time to lay out my demonstration properly in a Word document:
Here are the criteria I need to sort my geometry by during initialization (they are ordered by relevance). They could be gathered in an enum:
enum E__SortingCriteria
GEOMETRY_CRITERION (geometry storage -> VBOs)
As you can see, the state changes and the geometry gathering seem to be optimized. However, I am concerned that something may not be correct or that there is room for improvement.
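To make the batching idea easier to discuss, here is a minimal CPU-side sketch of it. Everything in it is an assumption made for illustration: the Mesh and Batch structs, the three sort keys (shader, texture, VBO) and the field names are hypothetical, and no OpenGL call is made. It only prepares the count/offset arrays that one glMultiDrawElements call per batch would consume.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical mesh record; the sort keys are placeholders for real criteria.
struct Mesh {
    std::uint32_t shaderId;
    std::uint32_t textureId;
    std::uint32_t vboId;
    std::size_t indexOffsetBytes;  // byte offset into the shared index buffer
    int indexCount;
    bool isVisible;                // result of frustum culling
};

// One batch = one state combination = one multi-draw call.
struct Batch {
    std::uint32_t shaderId;
    std::uint32_t textureId;
    std::uint32_t vboId;
    std::vector<int> counts;           // per-mesh index counts
    std::vector<std::size_t> offsets;  // per-mesh byte offsets
};

std::vector<Batch> buildBatches(std::vector<Mesh> meshes) {
    auto key = [](const Mesh& m) {
        return std::make_tuple(m.shaderId, m.textureId, m.vboId);
    };
    // Sort so that meshes sharing the same state end up next to each other.
    std::stable_sort(meshes.begin(), meshes.end(),
                     [&](const Mesh& a, const Mesh& b) { return key(a) < key(b); });

    std::vector<Batch> batches;
    for (const Mesh& m : meshes) {
        if (!m.isVisible) continue;  // culled meshes contribute no indices
        const bool newState =
            batches.empty() ||
            std::make_tuple(batches.back().shaderId, batches.back().textureId,
                            batches.back().vboId) != key(m);
        if (newState) {
            batches.push_back(Batch{m.shaderId, m.textureId, m.vboId, {}, {}});
        }
        batches.back().counts.push_back(m.indexCount);
        batches.back().offsets.push_back(m.indexOffsetBytes);
    }
    return batches;
}
```

With a real context, each offset would be cast to a pointer and the two arrays passed to glMultiDrawElements after binding the batch's state. Sorting once at initialization and only re-filtering by visibility each frame is one possible refinement.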
I read about octrees and I didn't fully understand how they would work or be implemented in a voxel world, where the octree's purpose is to lower the number of voxels you render by merging repeating voxels into one big "voxel".
Here are the questions I want clarification about:
What type of data structure would you use? How could you turn a 3-D array of voxels into an array that has different-sized voxels, each taking up multiple locations in the array?
What are the nodes and what are they used for?
Does the octree connect the voxels so there are ONLY square shapes, or could it be a rectangle, an L shape, an entire Y column of voxels, or something else?
Do the octrees really improve performance of a voxel game? If so usually by how much?
Quick answers:
A tree: each node has 8 children (top-back-left, top-back-right, etc.), down to a certain level. The code for this can get quite complex, especially if the voxels can change at runtime.
The type of voxel (colour, material, a list of items).
Yep, cubes only. More specifically 1x1x1, 2x2x2, 4x4x4, 8x8x8, etc. It must be an entire node. If you really want to, you could define some sort of patterns, but then it's no longer an octree.
Yeah, but it depends on your data. Imagine describing 256 identical blocks individually versus describing them once (like air in Minecraft).
I'd start with trying to understand quadtrees first. You can do that on paper, or make a test program with it. You'll answer these questions yourself if you experiment
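If it helps to see the quick answers in code, here is a minimal sketch of an octree built from a dense voxel array, where any node whose eight children are identical leaves collapses into one bigger cube. The names (VoxelGrid, OctreeNode, buildOctree, countLeaves) are made up, the grid side is assumed to be a power of two, and nothing here handles voxels changing at runtime.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Dense cubic grid of side `size` (assumed power of two); value = voxel type.
struct VoxelGrid {
    std::size_t size;
    std::vector<std::uint8_t> types;
    std::uint8_t at(std::size_t x, std::size_t y, std::size_t z) const {
        return types[(z * size + y) * size + x];
    }
};

// A leaf stores one voxel type for its whole cube; an inner node has 8 children.
struct OctreeNode {
    bool isLeaf = true;
    std::uint8_t type = 0;
    std::array<std::unique_ptr<OctreeNode>, 8> children;
};

// Builds the subtree covering the cube of side `size` with corner (x, y, z).
std::unique_ptr<OctreeNode> buildOctree(const VoxelGrid& grid, std::size_t x,
                                        std::size_t y, std::size_t z,
                                        std::size_t size) {
    assert(size >= 1);
    auto node = std::make_unique<OctreeNode>();
    if (size == 1) {
        node->type = grid.at(x, y, z);
        return node;
    }
    const std::size_t half = size / 2;
    bool uniform = true;
    for (std::size_t i = 0; i < 8; ++i) {
        // Bits 0, 1 and 2 of i select the x, y and z half of the cube.
        node->children[i] = buildOctree(grid, x + (i & 1) * half,
                                        y + ((i >> 1) & 1) * half,
                                        z + ((i >> 2) & 1) * half, half);
        const OctreeNode& child = *node->children[i];
        if (!child.isLeaf || child.type != node->children[0]->type) uniform = false;
    }
    if (uniform) {
        // All eight children are identical leaves: collapse into one cube.
        node->type = node->children[0]->type;
        for (auto& child : node->children) child.reset();
    } else {
        node->isLeaf = false;
    }
    return node;
}

std::size_t countLeaves(const OctreeNode& node) {
    if (node.isLeaf) return 1;
    std::size_t total = 0;
    for (const auto& child : node.children) total += countLeaves(*child);
    return total;
}
```

A 4x4x4 grid of pure air collapses to a single leaf. Changing one corner voxel gives 15 leaves: seven 2x2x2 cubes plus eight 1x1x1 cubes, instead of 64 individual voxels.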
An octree done correctly can also help you with neighbour searches, which let you determine whether a face should be considered "visible" (i.e. you end up with only a hull of visible voxels). Once you've built your octree, you use it to store your XYZ coords, which you then extract into a single array. You feed this array into your vertex buffer (GL solutions require this), which you can then render in chunks as needed (as the camera moves forward, etc.).
Octrees also, by their very nature, collapse cubes into bigger ones when neighbouring cubes are of the same type, much like Tetris does when you have colors/shapes that "fit" one another. This in turn can reduce your vertex count, and at render time you're really drawing a combination of squares and rectangles.
If done correctly, you will end up with a lot of chunks that have only their outward-facing "faces" in the vertex buffers. You then also have to build your own occlusion culling algorithm, which reduces visibility further on top of this and results in less rendering work.
I did an example here:
Notice how only the outside is being rendered, but the chunks themselves go all the way down to the bottom, even though the faces where chunks meet should cancel each other out (this needs more optimisation). Also note how the faces are removed from the rendering buffers as the camera turns around.
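The "only outward-facing faces" rule can be sketched with a plain neighbour check on a dense grid. This is not the code from my example above, just a hypothetical illustration: Grid and countVisibleFaces are made-up names, and an octree would answer the same neighbour query with fewer lookups.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense cubic grid; true means the cell holds a solid voxel.
struct Grid {
    int size;
    std::vector<bool> solid;
    bool isSolid(int x, int y, int z) const {
        // Cells outside the grid count as empty, so border faces stay visible.
        if (x < 0 || y < 0 || z < 0 || x >= size || y >= size || z >= size) {
            return false;
        }
        return solid[static_cast<std::size_t>((z * size + y) * size + x)];
    }
};

// A face is emitted only when the neighbour across it is empty.
int countVisibleFaces(const Grid& grid) {
    assert(grid.solid.size() ==
           static_cast<std::size_t>(grid.size) * grid.size * grid.size);
    static const int dirs[6][3] = {{1, 0, 0}, {-1, 0, 0}, {0, 1, 0},
                                   {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};
    int faces = 0;
    for (int z = 0; z < grid.size; ++z) {
        for (int y = 0; y < grid.size; ++y) {
            for (int x = 0; x < grid.size; ++x) {
                if (!grid.isSolid(x, y, z)) continue;
                for (const auto& d : dirs) {
                    if (!grid.isSolid(x + d[0], y + d[1], z + d[2])) ++faces;
                }
            }
        }
    }
    return faces;
}
```

For a solid 2x2x2 block this yields 24 faces instead of the 48 you would get by emitting all six faces of every voxel.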
I have a process that accumulates mostly static data over time--and a lot of it, millions of data elements. It is possible that small parts of the data may change occasionally, but mostly, it doesn't change.
However, I want to allow the user the freedom to change how this data is viewed, both in shape and color.
Is there a way to store the data on the GPU purely as data, and then have several ways of converting that data into something renderable on the GPU? The user could then choose between those algorithms, and we would swap one in efficiently without having to touch the data at all. Also, color ids would be in the data, but the user could change which color each id maps to, again without touching the data.
So, for example, maybe there are the following data:
[1000, 602, 1, 1]
[1003, 602.5, 2, 2]
NOTE: the data is NOT vertices, but rather may require some computation or lookup to be converted to vertices.
The user can choose between visualization algorithms. Let's say one would display 2 cubes each at (0, 602, 0) and (3, 602.5, 100). The user chooses that color id 1 = blue and 2 = green. So the origin cube is shown as blue and the other as green.
Then, without any modification to the data at all, the user chooses a different visualization and now spheres are shown at (10, 602, 10) and (13, 602.5, 20), and the colors are different because the user changed the color mapping.
Yet another visualization might show lines between all the data elements, or a rectangle for each set of 4, etc.
Is the above description something that can be done in a straightforward way? How would it best be done?
Note that we would be adding new data, appending to the end, a lot. Bursts of thousands per second are likely. Modifications of existing data would be more rare and taking a performance hit for those cases is acceptable. User changing algorithm and color mapping would be relatively rare.
I'd prefer to do this using a cross-platform API (across OSes and GPUs), so I'm assuming OpenGL.
You can store your data in a VBO (in GPU memory) and update it when it changes.
On the GPU side, you can use a geometry shader to generate more geometry. Not sure how to switch from line to cube to sphere, but if you are drawing a triangle at each location, your GS can output "extra" triangles (ditto for lines and points).
As for the color change feature, you can bake that logic into the vertex shader. The idx (1, 2, ...) should be a vertex attribute; have the VS look up a table giving idx -> color mappings (this could be stored as a small texture). You can update the texture to change the color mapping on the fly.
For applications like yours there are dedicated GPGPU programming frameworks: CUDA and OpenCL. OpenCL is the cross-vendor system. CUDA is cross-platform, but supports only NVIDIA GPUs. OpenGL itself also introduced general-purpose compute functionality (compute shaders) in OpenGL 4.3.
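The idx -> color lookup is simple enough to model on the CPU. In this sketch Color and resolveColors are made-up names, and the palette vector stands in for the small lookup texture. The point is that the per-element ids never change; only the table does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Color {
    float r, g, b;
};

// Models what the vertex shader would do per element: fetch palette[idx].
// `palette` plays the role of the small texture holding idx -> color.
std::vector<Color> resolveColors(const std::vector<std::uint32_t>& colorIds,
                                 const std::vector<Color>& palette) {
    std::vector<Color> colors;
    colors.reserve(colorIds.size());
    for (std::uint32_t id : colorIds) {
        assert(id < palette.size());  // assumption: every id has a palette entry
        colors.push_back(palette[id]);
    }
    return colors;
}
```

Remapping id 1 from blue to red only means rewriting one palette entry, which corresponds to updating one texel; the data buffer itself is left alone.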
and a lot of it, millions of data elements
Millions is not a lot. Even if a single element consumed 100 bytes, that would be only roughly 100 MiB per million elements to transfer. Modern GPUs can transfer about 10 GiB/s from/to host system memory.
Is the above description something that can be done in a straightforward way? How would it best be done?
Yes, it can be done. However, you'll only really see good performance if you can parallelize your problem and make its memory access pattern cater to what GPUs prefer. Bad memory access patterns in particular can cost several orders of magnitude in performance.
I know that glVertexAttribDivisor can be used to modify the rate at which generic vertex attributes advance during instanced rendering, but I was wondering if there was any way to advance attributes at a specific rate WITHOUT instancing.
Here is an example of what I mean:
Let's say you are defining a list of vertex positions that make up a series of lines, and with each line you wish to associate an ID. So, you create two VBOs that each hold the data for one of those attributes (either all the vertex positions or all the IDs). Traditionally, this means each VBO must have a size (in elements) of the number of lines x 2 (as each line contains two points). This of course means I am duplicating the same ID value for both points of a line.
What I would like to do instead is specify that the IDs advance 1 element for every 2 elements the vertex position buffer advances. I know this requires that my vertex position buffer is declared first (so that I may reference it to tell OpenGL how often to advance the ID buffer) but it still seems like it would be possible. However, I cannot find any functions in the OpenGL specification that allow such a maneuver.
What you want is not generally possible in OpenGL. It's effectively a restricted form of multi-indexed rendering, so you'll have to use one of the techniques for that to get it.
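One such technique, sketched here on the CPU, is to keep the IDs in a separate buffer and compute the element to fetch from the vertex index. This is my assumption about what would fit your case, not something the spec offers as a divisor: with a non-indexed GL_LINES draw starting at vertex 0, a shader could read gl_VertexID, divide by 2, and fetch the ID from a buffer texture or storage buffer. lineIdForVertex below is a made-up name that models that fetch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models the per-vertex fetch: two consecutive vertices share one line ID.
// `vertexIndex` stands in for gl_VertexID in a non-indexed GL_LINES draw.
std::uint32_t lineIdForVertex(std::size_t vertexIndex,
                              const std::vector<std::uint32_t>& lineIds) {
    const std::size_t line = vertexIndex / 2;  // integer division drops the remainder
    assert(line < lineIds.size());
    return lineIds[line];
}
```

The ID buffer then holds one element per line instead of one per point, at the cost of a manual fetch in the shader instead of a regular vertex attribute.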