How do I update part of a 3D texture efficiently? - opengl

I am writing a voxel ray caster that implements a form of an N^3 tree. The voxels are stored in a large chunk pool of size 1024^3, which is segmented into 32^3 chunks. The chunk pool needs to be updated one or more times every frame.
I have all the rendering working on sample data in a 3D texture that I generated with a compute shader. I just can't figure out how to update individual N^3 chunks of the 3D texture.
I have come across glMapBufferRange and glTexSubImage3D. How do I efficiently stream portions of a 3D texture to the GPU?
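A minimal sketch of the usual streaming path, assuming OpenGL 4.x, a GL_R8UI voxel texture and hypothetical names (voxelTex, chunkPbo, chunkData, chunkX/Y/Z): stage the chunk in a pixel unpack buffer with glMapBufferRange, then let glTexSubImage3D copy that 32^3 region into the large texture on the GPU side.

    // Hedged sketch: upload one 32^3 chunk of a 1024^3 GL_R8UI 3D texture.
    const int CHUNK = 32;
    const GLsizeiptr chunkBytes = CHUNK * CHUNK * CHUNK;   // 1 byte per voxel here

    // One-time setup of the streaming pixel unpack buffer (PBO).
    GLuint chunkPbo;
    glGenBuffers(1, &chunkPbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, chunkPbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, chunkBytes, nullptr, GL_STREAM_DRAW);

    // Per update: orphan the old storage, map, fill from the CPU, unmap.
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, chunkPbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, chunkBytes, nullptr, GL_STREAM_DRAW);
    void* dst = glMapBufferRange(GL_PIXEL_UNPACK_BUFFER, 0, chunkBytes,
                                 GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
    memcpy(dst, chunkData, chunkBytes);                    // chunkData: CPU-side chunk
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);

    // With a PBO bound to GL_PIXEL_UNPACK_BUFFER, the last argument is a byte
    // offset into the PBO rather than a client pointer.
    glBindTexture(GL_TEXTURE_3D, voxelTex);
    glTexSubImage3D(GL_TEXTURE_3D, 0,
                    chunkX * CHUNK, chunkY * CHUNK, chunkZ * CHUNK,  // offset in voxels
                    CHUNK, CHUNK, CHUNK,
                    GL_RED_INTEGER, GL_UNSIGNED_BYTE, (void*)0);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);

If several chunks change per frame, a small ring of two or three PBOs (or one persistently mapped buffer) avoids stalling on a buffer that the previous transfer is still reading from.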

Related

Voxel rendering: pass data from CPU to GPU?

I'm trying to make a voxel-style game, like Voxatron, but I think my rendering method is not efficient. The idea is to set the position and color of each voxel and send this data to the GPU every frame. The problem is that with a big 50x50x50 block I have 125000 voxels, each carrying 4 floats and 3 ints, to send, and that clearly saturates the CPU-GPU bus (if I send the data only once for the whole run, I get much higher FPS with the same scene). I could do some spatial optimization, but my requirement is that every voxel can change arbitrarily each frame, so I don't think it's worth computing.
The rendering itself uses OpenGL instanced drawing, so I only draw the voxels that are set. How can I do this efficiently? (Again, please don't suggest chunking or similar; I want the voxels to be able to change arbitrarily every frame.) Or should I drop OpenGL and do everything in software, like Voxatron does?
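For reference, a hedged sketch of the setup being described, with hypothetical names (cubeVao, instanceVbo, voxels): one static cube mesh, plus a per-instance buffer that is re-uploaded every frame. Packing the color into 4 normalized bytes instead of separate ints also shrinks the per-frame upload.

    // Per-instance data: 16 bytes per voxel (position + packed color).
    struct VoxelInstance { float pos[3]; GLubyte rgba[4]; };

    // One-time setup: instance attributes on top of an existing cube VAO.
    glBindVertexArray(cubeVao);                  // VAO already holding the cube vertices
    GLuint instanceVbo;
    glGenBuffers(1, &instanceVbo);
    glBindBuffer(GL_ARRAY_BUFFER, instanceVbo);
    glEnableVertexAttribArray(3);                // instance position
    glVertexAttribPointer(3, 3, GL_FLOAT, GL_FALSE, sizeof(VoxelInstance), (void*)0);
    glVertexAttribDivisor(3, 1);                 // advance once per instance
    glEnableVertexAttribArray(4);                // instance color as normalized bytes
    glVertexAttribPointer(4, 4, GL_UNSIGNED_BYTE, GL_TRUE, sizeof(VoxelInstance),
                          (void*)offsetof(VoxelInstance, rgba));
    glVertexAttribDivisor(4, 1);

    // Every frame: orphan the old storage and upload only the voxels that are set.
    glBindBuffer(GL_ARRAY_BUFFER, instanceVbo);
    glBufferData(GL_ARRAY_BUFFER, voxels.size() * sizeof(VoxelInstance),
                 voxels.data(), GL_STREAM_DRAW);
    glBindVertexArray(cubeVao);
    glDrawArraysInstanced(GL_TRIANGLES, 0, 36, (GLsizei)voxels.size());

At 16 bytes per voxel, even the full 125000 voxels come to about 2 MB per frame, which is a fairly modest streaming upload.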

How to do particle binning in OpenGL?

Currently I'm creating a particle system and I would like to transfer most of the work to the GPU using OpenGL, for gaining experience and performance reasons. At the moment, there are multiple particles scattered through the space (these are currently still created on the CPU). I would more or less like to create a histogram of them. If I understand correctly, for this I would first translate all the particles from world coordinates to screen coordinates in a vertex shader. However, now I want to do the following:
So, for each pixel I want a hit count of how many particles fall inside it. Each particle also has several properties (e.g. a colour), and I would like to sum them per pixel (as shown in the lower-right corner). Would this be possible using OpenGL? If so, how?
The best tool I can recommend for holding the whole data set (if it fits in GPU memory) is an SSBO (Shader Storage Buffer Object).
Note that you need the data after it has been transformed (e.g. by a projection); an SSBO is still your best option:
In the fragment shader you read the properties already accumulated for that pixel and write the modified properties (number of particles at this pixel, colour, etc.) back to the same index in the buffer.
Due to the parallel nature of the GPU, several invocations coming from different particles may be working on the same index concurrently, so you need to handle that yourself. Read about the memory model and atomic operations (see the sketch below).
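A minimal sketch of the SSBO route under those caveats, with hypothetical names (binSsbo, width, height); the shader keeps one uint bin per pixel and increments it with atomicAdd while the particles are rendered as GL_POINTS:

    // Fragment shader (GLSL 4.30) as a C++ string: one bin per screen pixel.
    const char* binningFrag = R"(
        #version 430
        layout(std430, binding = 0) buffer Bins { uint hitCount[]; };
        uniform int screenWidth;
        void main() {
            ivec2 p = ivec2(gl_FragCoord.xy);
            atomicAdd(hitCount[p.y * screenWidth + p.x], 1u);   // safe under concurrency
        }
    )";

    // Host side: allocate and zero the bins, bind them to binding point 0.
    GLuint binSsbo;
    glGenBuffers(1, &binSsbo);
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, binSsbo);
    glBufferData(GL_SHADER_STORAGE_BUFFER, width * height * sizeof(GLuint),
                 nullptr, GL_DYNAMIC_COPY);
    const GLuint zero = 0;
    glClearBufferData(GL_SHADER_STORAGE_BUFFER, GL_R32UI, GL_RED_INTEGER,
                      GL_UNSIGNED_INT, &zero);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, binSsbo);
    // ...render the particles as GL_POINTS with this fragment shader, then issue
    // glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT) before anything reads the counts.

Summed properties (e.g. colour) work the same way, either as extra integer fields accumulated with atomicAdd in some fixed-point scale or via imageAtomicAdd on an integer image.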
Another, more limited, approach is to use blending.
The idea is that each fragment increments the current colour value in the framebuffer. This can be done by using GL_FUNC_ADD as the blend equation (glBlendEquationSeparate) and outputting a fragment colour of 1/255 (a normalized integer step) for each RGBA component you want to accumulate.
The limitation comes from the [0, 255] range: only up to 255 particles can be counted in the same pixel; anything beyond that is clamped to the range and therefore lost.
You have four components (RGBA), so four properties can be handled, but you can attach several renderbuffers to an FBO.
You can read the result back with glReadPixels. Call glReadBuffer with a GL_COLOR_ATTACHMENTi first if you render into an FBO instead of the default framebuffer.
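A short sketch of this blending setup, assuming rendering into GL_COLOR_ATTACHMENT0 of an FBO (width and height are placeholders):

    // Accumulate: every particle fragment adds its output to the pixel.
    glEnable(GL_BLEND);
    glBlendEquation(GL_FUNC_ADD);          // result = src + dst
    glBlendFunc(GL_ONE, GL_ONE);
    // In the fragment shader, output the per-particle increment, e.g.
    //     fragColor = vec4(1.0 / 255.0, someProperty / 255.0, 0.0, 0.0);
    // so the red channel ends up holding the (clamped) hit count.

    // Read back the accumulated values.
    std::vector<GLubyte> pixels(width * height * 4);
    glReadBuffer(GL_COLOR_ATTACHMENT0);    // when rendering into an FBO
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());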

How do I get started with a GPU voxelizer?

I've been reading various articles about how to write a GPU voxelizer. From my understanding the process goes like this:
Inspect each triangle individually and decide along which axis its projection appears largest. Call this the dominant axis.
Render the triangle along its dominant axis and sample the texels that come out.
Write that texel data into a 3D texture and then do what you will with the data.
Disregarding conservative rasterization, I have a lot of questions regarding this process.
I've gotten as far as rendering each triangle, choosing a dominant axis and projecting it orthographically. What should the values of the orthographic projection be? Should they be based on the size of the voxels or on how large an area the map should cover?
What am I supposed to do in the fragment shader? How do I write to my 3D texture so that it stores the voxel data? From my understanding, because we chose the dominant axis we can't have more than a depth of 1 voxel per fragment. However, since we projected orthographically I don't see how that maps onto the 3D texture.
Finally, I am wondering where to store the texture data. I know it's a bad idea to store data on the CPU side, since you have to upload it all to use it on the GPU; however, the source code I am loosely following stores all of its textures on the CPU side, such as those for a light map. My assumption is that data used only on the GPU should live there, and data used by both should be kept on the CPU side. Based on that, I store my data on the CPU side. Is that correct?
My main sources have been:
OpenGL Insights, "Sparse Voxelization" chapter: https://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf
https://github.com/otaku690/sparsevoxeloctree - an SVO built with a voxelizer. The issue is that the shader code is not in the GitHub repository.
In my own implementation, the whole scene is positioned and scaled into a unit cube centered on the world origin. The model-view-projection matrices are then straightforward, and the viewport is simply the desired voxel resolution.
I use a 2-pass approach to output the voxel fragments: the first pass counts the output voxel fragments by accumulating a single variable with an atomic counter. I then use that count to allocate a linear buffer.
In the second pass the rasterized voxel fragments are stored into the allocated linear buffer, again using the atomic counter to avoid write conflicts (see the sketch below).
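A hedged sketch of what that second pass can look like (not the exact code from the repository above; the varyings, bindings and voxelResolution uniform are assumptions): each rasterized fragment takes a unique slot from the atomic counter and writes itself into the pre-allocated buffer. In the first, counting pass the same shader would only perform the atomicCounterIncrement.

    const char* voxelFragListShader = R"(
        #version 430
        // Atomic counter buffer bound at binding point 0 on the host side.
        layout(binding = 0, offset = 0) uniform atomic_uint fragCount;
        struct VoxelFragment { uvec4 gridPos; vec4 color; };
        // The linear buffer allocated after pass 1, bound at binding point 1.
        layout(std430, binding = 1) buffer FragmentList { VoxelFragment fragments[]; };
        uniform uint voxelResolution;
        in vec3 unitPos;     // position remapped to [0,1]^3 by the vertex/geometry shader
        in vec4 vertColor;
        void main() {
            uint idx = atomicCounterIncrement(fragCount);   // unique index, no write conflict
            fragments[idx].gridPos = uvec4(uvec3(unitPos * float(voxelResolution)), 0u);
            fragments[idx].color   = vertColor;
        }
    )";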

OpenGL rendering several mesh instances

I started learning OpenGL, and now I'm going to try to develop something on my own, but I've got stuck on a question.
I'm going to render models that have about 50k primitives (cylinders, cubes, cones, etc.). Less than 1/4 of them are 'unique'; the rest share the dimensions of one of those but have different positions and rotations. So I thought I could fill a data buffer with only the basic vertices and then draw them with individual transformation matrices.
From what I've read, I should use a buffer for vertices and a buffer for indices, so I don't waste memory storing repeated vertices.
All of them are stored in a single big buffer (because I read that a single buffer is more efficient as long as it does not exceed a limit of roughly 1-3 MB).
To draw them I'm trying to use glDrawElements, but since they are all in a single buffer I don't see how to update the individual matrices in the shader so that each mesh is drawn in its correct position.
One solution would be to use thousands of small buffers and then update the matrices between the glDrawElements calls.
Another would be to discard the index buffer and store only vertices, so I can draw them using glDrawArrays, which allows me to draw only a small part of the buffer.
Is anything I said above wrong? Which option would give better performance? Is there a better way to do this?
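For what it's worth, a hedged sketch of how per-mesh matrices and a single shared buffer can coexist (meshes, sharedVao and modelMatrixLoc are hypothetical names): glDrawElements takes a byte offset into the bound index buffer, and glDrawElementsBaseVertex adds a vertex offset, so one uniform update per draw is enough.

    struct MeshRange {
        GLsizei    indexCount;     // number of indices of this primitive
        GLsizeiptr indexOffset;    // byte offset of its first index in the shared EBO
        GLint      baseVertex;     // index of its first vertex in the shared VBO
        float      transform[16];  // column-major model matrix (position/rotation)
    };

    glBindVertexArray(sharedVao);  // VAO referencing the shared vertex + index buffers
    for (const MeshRange& m : meshes) {
        glUniformMatrix4fv(modelMatrixLoc, 1, GL_FALSE, m.transform);
        glDrawElementsBaseVertex(GL_TRIANGLES, m.indexCount, GL_UNSIGNED_INT,
                                 (void*)m.indexOffset, m.baseVertex);
    }

For the many copies that share the same geometry, glDrawElementsInstancedBaseVertex together with a per-instance matrix attribute can reduce this to one call per unique shape.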

OpenGL Gaussian Kernel on 3D texture

I would like to perform a blur on a 3D texture in OpenGL. Since the kernel is separable, I should be able to do it in 3 passes. My question is: what would be the best way to go about it?
I currently have the 3D texture and fill it using imageStore. Should I create two more copies of the texture for the blur, or is there a way to do it with a single texture?
I am already using a compute shader to generate the mipmaps of the 3D texture, but there I read from the texture at level 0 and write to the next level, so there is no conflict, whereas here I would need some kind of copy.
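One way to avoid in-place read/write hazards with only a single extra texture is to ping-pong between the existing texture and a scratch copy, one dispatch per blur direction. This is only a hedged sketch assuming a compute-shader blur similar to the mipmapping one (blurProgram, axisLoc, voxelTex and texSize are hypothetical, and the shader is assumed to use an 8x8x8 local size, reading image binding 0 and writing binding 1):

    GLuint scratchTex;
    glGenTextures(1, &scratchTex);
    glBindTexture(GL_TEXTURE_3D, scratchTex);
    glTexStorage3D(GL_TEXTURE_3D, 1, GL_RGBA16F, texSize, texSize, texSize);

    GLuint src = voxelTex, dst = scratchTex;
    glUseProgram(blurProgram);
    for (int axis = 0; axis < 3; ++axis) {
        glUniform1i(axisLoc, axis);       // which direction this pass blurs along
        glBindImageTexture(0, src, 0, GL_TRUE, 0, GL_READ_ONLY,  GL_RGBA16F);
        glBindImageTexture(1, dst, 0, GL_TRUE, 0, GL_WRITE_ONLY, GL_RGBA16F);
        glDispatchCompute((texSize + 7) / 8, (texSize + 7) / 8, (texSize + 7) / 8);
        glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);  // finish before the next pass reads
        std::swap(src, dst);              // this pass's output becomes the next pass's input
    }
    // After three passes the final result is in scratchTex (the current `src`).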
In short, it can't be done in 3 passes, because it is not a 2D image, even if the kernel is separable.
You have to blur each image slice separately, which is 2 passes per slice (if you are using a 256x256x256 texture, that is 512 passes just for blurring along the U and V coordinates). Then you still have to blur along the T and U (or T and V, it makes no difference) coordinates, which is another 512 passes. You can gain some performance by using bilinear filtering and reading between texels to save some constant processing cost. The 3D blur will be very costly.
Performance tip: maybe you don't need to blur the whole texture but only a part of it (the visible part?).
The problem with such a high number of passes is the number of interactions between the GPU and CPU: draw calls and FBO setup are both slow operations that stall the CPU (a different API with lower CPU overhead would probably be faster).
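Roughly, one of those slice passes looks like the following (a hedged sketch; srcTex, dstTex, blurFbo, blurSliceProgram, the uniform locations and the fullscreen-triangle vertex shader are all assumptions). glFramebufferTextureLayer attaches a single z-slice of the destination 3D texture as the render target:

    glBindFramebuffer(GL_FRAMEBUFFER, blurFbo);
    glUseProgram(blurSliceProgram);
    glBindTexture(GL_TEXTURE_3D, srcTex);          // source texture being sampled
    glViewport(0, 0, texSize, texSize);
    glUniform2f(blurDirLoc, 1.0f / texSize, 0.0f); // horizontal pass; (0, 1/texSize) for vertical
    for (int z = 0; z < texSize; ++z) {
        glFramebufferTextureLayer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, dstTex, 0, z);
        glUniform1i(sliceLoc, z);                  // which slice of srcTex to sample
        glDrawArrays(GL_TRIANGLES, 0, 3);          // fullscreen triangle
    }

This loop runs once per slice and per direction, which is exactly where the draw-call and FBO-binding cost mentioned above comes from.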
Try not to separate the kernel:
If you have a small kernel (I'd guess up to 5^3; only profiling will show the maximum viable kernel size), the fastest way is probably NOT to separate the kernel: you save a lot of draw calls and FBO bindings and leave everything to GPU fill rate and bandwidth.
Spread the work over time:
This applies whether or not your kernel is separated. Instead of computing a Gaussian blur every frame, you could compute it only once per second (maybe with a bigger kernel) and then use the interpolation between the previous blur and the next blur as your source of "continuous blurring data". That costs two 3D texture samples per frame, which is much cheaper than blurring continuously.
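A tiny sketch of that interpolation on the shader side (previousBlur, nextBlur and blendFactor are hypothetical names; blendFactor ramps from 0 to 1 between two blur recomputations):

    const char* blendedBlurSample = R"(
        uniform sampler3D previousBlur;   // last completed blur
        uniform sampler3D nextBlur;       // most recent blur
        uniform float blendFactor;        // 0 right after a recompute, near 1 just before the next

        vec4 sampleBlurred(vec3 uvw) {
            return mix(texture(previousBlur, uvw), texture(nextBlur, uvw), blendFactor);
        }
    )";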