I've been reading various articles about how to write a GPU voxelizer. From my understanding, the process goes like this:
Inspect each triangle individually and decide which axis gives the triangle its largest projected area. Call this the dominant axis.
Render the triangle along its dominant axis and collect the texels (fragments) that come out.
Write that texel data into a 3D texture and then do what you will with the data.
Disregarding conservative rasterization, I have a lot of questions about this process.
I've gotten as far as rendering each triangle, choosing a dominant axis, and orthogonally projecting it. What should the extents of the orthogonal projection be? Should they be based on the size of the voxels, or on how large an area the map should cover?
What am I supposed to do in the fragment shader? How do I write to my 3D texture so that it stores the voxel data? From my understanding, choosing the dominant axis means each fragment can cover no more than a depth of one voxel. However, since we projected orthogonally, I don't see how that carries over to the 3D texture.
Finally, I am wondering where to store the texture data. I know it's a bad idea to store data on the CPU side, since you have to pass it all in to use it on the GPU; however, the source code I am loosely following stores all of its textures on the CPU side, such as those for a light map. My assumption is that data used only on the GPU should be stored there, and data used on both should be stored on the CPU side. Following that, I store my data on the CPU side. Is that correct?
My main sources have been:
OpenGL Insights: https://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf
https://github.com/otaku690/sparsevoxeloctree - an SVO using a voxelizer. The issue is that the shader code is not in the GitHub repository.
In my own implementation, the whole scene is positioned and scaled into one unit cube centered on the world origin. The model-view-projection matrices are then straightforward, and the viewport is simply the desired voxel resolution.
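For reference, this is roughly what that setup can look like (a sketch only, assuming GLM and an already-loaded GL context; the function name and voxelRes parameter are made up):

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
// ... plus your GL loader header (GLEW, glad, ...)

// Returns the view-projection matrix for the voxelization pass and sets the
// viewport so that one fragment corresponds to one voxel column.
glm::mat4 setupVoxelizationView(int voxelRes)
{
    glm::mat4 proj = glm::ortho(-1.0f, 1.0f, -1.0f, 1.0f, -1.0f, 1.0f); // covers the unit cube
    glm::mat4 view = glm::mat4(1.0f);      // scene is already scaled/centered on the origin
    glViewport(0, 0, voxelRes, voxelRes);  // viewport = desired voxel resolution
    glDisable(GL_DEPTH_TEST);              // keep every fragment, not only the nearest
    glDisable(GL_CULL_FACE);               // back-facing triangles must be voxelized too
    return proj * view;
}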
I use a two-pass approach to output those voxel fragments: the first pass calculates the number of output voxel fragments by accumulating a single variable with an atomic counter. I then use that count to allocate a linear buffer.
In the second pass, the rasterized voxel fragments are stored into the allocated linear buffer, again using an atomic counter to avoid write conflicts.
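A minimal sketch of what the second pass's fragment shader can look like with that scheme (illustrative only, not the shader from the repository; it assumes GLSL 4.3, an atomic counter bound at binding 0, an SSBO at binding 1, and that the earlier stages pass in a voxel-space position and a color):

// Illustrative GLSL kept as a C++ string constant.
static const char* kVoxelStoreFragSrc = R"GLSL(
#version 430 core
layout(binding = 0) uniform atomic_uint voxelFragCount;   // same counter as pass 1
layout(std430, binding = 1) writeonly buffer VoxelFragments {
    uvec4 voxelFrag[];     // xyz = voxel coordinate, w = packed RGBA color
};
in vec3 voxelPos;          // position already mapped into [0, voxelRes) by earlier stages
in vec4 fragColor;
void main() {
    uint index = atomicCounterIncrement(voxelFragCount);  // unique slot, no write conflicts
    voxelFrag[index] = uvec4(uvec3(voxelPos), packUnorm4x8(fragColor));
}
)GLSL";
// Pass 1 would use the same shader with the buffer write removed, so only the counter runs.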
I'm trying to wrap my head around the GPU pipeline and the performance implications...
I create a coordinate system and put a million vertices in it; all of them are now in memory usable by the GPU. I assume the performance hit at this step is moving all the floating-point values into GPU memory (assuming the points were already created).
Then I transform my million points' coordinates into clip coordinates. Here I'm applying a transformation to each point.
As a result of this transformation some points are now outside the clip volume; let's say only a thousand points remain inside. Does the vertex shader run on the thousand or on all million points? What about the fragment shader? And the assembly of triangles? Does the transformation into final device coordinates only process the thousand points?
My guess is that the vertex shader runs on all of them, but the fragment shader only on the fragments interpolated from the visible vertices.
Is the only possible optimization, then, to include as few vertices as possible in the first place? If I'm looking at a full 3D world with buildings, trees, roads... and then zoom in on just one rock, I'm running all the shaders on all the objects anyway... so is the only solution not to submit those trees and buildings in the first place? Or can I keep this world in GPU memory but only compute the rock? Could I apply the coordinate transformation to just the rock somehow? Where in the pipeline do techniques like GPU culling, level of detail, or dynamic tessellation take place?
Vertex shaders are executed for each vertex submitted with the glDrawArrays and glDrawElements function families, perhaps even multiple times per vertex. The transformed vertices are then assembled into primitives and clipped; if a primitive lies entirely outside the view volume, its processing ends there. To reduce the overhead of processing vertices of objects outside the view, multiple techniques are employed. The simplest one is "frustum culling": submit an object for rendering only if its bounding box intersects the camera frustum.
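As an illustration, a CPU-side frustum-culling test against a bounding sphere can be as small as this sketch (assuming GLM and that the six frustum planes have already been extracted from the view-projection matrix, with normals pointing inwards):

#include <glm/glm.hpp>

struct Frustum { glm::vec4 planes[6]; };   // each plane as (normal.xyz, d), normals point inward

// Returns false if the bounding sphere lies completely outside any frustum plane;
// only objects that return true are submitted with glDrawArrays/glDrawElements.
bool sphereInFrustum(const Frustum& f, const glm::vec3& center, float radius)
{
    for (const glm::vec4& p : f.planes) {
        if (glm::dot(glm::vec3(p), center) + p.w < -radius)
            return false;                  // entirely behind this plane: cull
    }
    return true;                           // inside or intersecting: draw it
}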
Fragment shaders are executed for each fragment ("pixel") in the framebuffer that passes the depth test. One way to reduce their count is to render front to back, so that occluded fragments fail the depth test and are never shaded.
I'm pretty new to OpenGL and am trying to implement a simple program where I can draw cubes, move them around with the mouse, and delete them.
Previously I had done my drag operations by translating on the CPU. In this way I was able to use ray-tracing to pick out the element I wanted because the vertices themselves were being updated.
However, I'm trying to move all of the transformations to the GPU, and in doing so I realized that I would then give up access to the updated vertices on the CPU (the CPU still thinks the vertices are the untransformed ones). How does one handle this so that I don't have to do the transformations on the CPU as well as in the vertex shader?
No matter where you do your transformations, you will typically have a model matrix that describes where each object is in the scene. Instead of transforming each object into world space just so you can check for intersection with a world-space ray, you can transform the ray into each object's local space using the inverse model matrix.
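A sketch of that ray transformation (assuming GLM; the Ray struct and names are just for illustration):

#include <glm/glm.hpp>

struct Ray { glm::vec3 origin, direction; };

// Bring a world-space picking ray into an object's local space so the intersection
// test can run against the original, untransformed (CPU-side) vertices.
Ray rayToObjectSpace(const Ray& worldRay, const glm::mat4& modelMatrix)
{
    const glm::mat4 inv = glm::inverse(modelMatrix);
    Ray local;
    local.origin    = glm::vec3(inv * glm::vec4(worldRay.origin, 1.0f));    // point: w = 1
    local.direction = glm::vec3(inv * glm::vec4(worldRay.direction, 0.0f)); // vector: w = 0
    return local;   // renormalize the direction if your test assumes unit length
}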
One general issue with ray-tracing is that, as your scene gets larger, brute-force testing of each object gets increasingly slow. You can use acceleration structures like an octree or a bounding volume hierarchy to speed things up. A completely different approach to picking would be to render an ID buffer, i.e. a buffer that has the same resolution as your currently rendered frame and, for each pixel, stores the ID of the object visible at that pixel. Then you can simply read back the value of the pixel underneath the cursor to find out which object you hit, without the need to do any ray tracing. Rendering the ID buffer could be done as a separate pass, or it can likely just be added as an additional render target to a pass you're already doing, e.g. prefilling the depth buffer, or just when rendering the scene in case you only do one pass.
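The read-back part of the ID-buffer approach can look roughly like this (a sketch; it assumes the IDs were written to a GL_R32UI color attachment of a hypothetical pickFbo, and that the mouse y coordinate was already flipped to GL's bottom-left origin):

#include <cstdint>
// ... plus your GL loader header (GLEW, glad, ...)

// Read back the object ID under the cursor from a previously rendered ID buffer.
uint32_t pickObjectId(GLuint pickFbo, int mouseX, int mouseYFlipped)
{
    uint32_t id = 0;
    glBindFramebuffer(GL_READ_FRAMEBUFFER, pickFbo);
    glReadBuffer(GL_COLOR_ATTACHMENT0);               // the attachment holding the IDs
    glReadPixels(mouseX, mouseYFlipped, 1, 1, GL_RED_INTEGER, GL_UNSIGNED_INT, &id);
    glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);
    return id;                                        // 0 could be reserved for "nothing hit"
}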
Currently I'm creating a particle system and I would like to transfer most of the work to the GPU using OpenGL, for gaining experience and performance reasons. At the moment, there are multiple particles scattered through the space (these are currently still created on the CPU). I would more or less like to create a histogram of them. If I understand correctly, for this I would first translate all the particles from world coordinates to screen coordinates in a vertex shader. However, now I want to do the following:
So, for each pixel, I want a hit count of how many particles land inside it. Each particle will also have several properties (e.g. a colour) and I would like to sum them for every pixel (as shown in the lower-right corner). Would this be possible using OpenGL? If so, how?
The best tool I can recommend for keeping the whole data set (if it fits in GPU memory) is an SSBO.
Nevertheless, you need the data after transforming it (e.g. by a projection). An SSBO is still your best option:
In the fragment shader you read the properties of already-handled particles (at, let's say, the rendered pixel) and write the modified properties (number of particles at this pixel, color, etc.) to the same index in the buffer.
Due to the parallel nature of the GPU, several invocations coming from different particles may be doing the work for the same index concurrently. You need to handle this on your own; read up on the memory model and atomic operations.
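A minimal sketch of such a fragment shader (illustrative names; it assumes GLSL 4.3, an SSBO at binding 0 with width * height counters, and that each particle is drawn as a point after the vertex-shader transform):

// Illustrative GLSL kept as a C++ string constant: one atomicAdd per particle fragment.
static const char* kHistogramFragSrc = R"GLSL(
#version 430 core
layout(std430, binding = 0) buffer Histogram {
    uint hitCount[];               // one counter per pixel, width * height entries
};
uniform uint screenWidth;
out vec4 outColor;
void main() {
    uint index = uint(gl_FragCoord.y) * screenWidth + uint(gl_FragCoord.x);
    atomicAdd(hitCount[index], 1u);   // count this particle for the pixel it covers
    outColor = vec4(0.0);             // the color output is irrelevant here
}
)GLSL";
// Summed properties (e.g. color) can be accumulated the same way with additional
// uint buffers and atomicAdd on fixed-point values.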
Another approach, but a limited one, is to use blending.
The idea is that each fragment increments the current color value in the framebuffer. This can be done using GL_FUNC_ADD with glBlendEquationSeparate and outputting, as the fragment color, a value of 1/255 (normalized integer) for each RGBA component.
Limitations come from the [0-255] range: only up to 255 particles can be counted in the same pixel; anything beyond that is clamped to the range and thus "lost".
You have four components (RGBA), so four properties can be handled, but you can have several color buffers attached to an FBO.
You can read the result back with glReadPixels. Call glReadBuffer first with GL_COLOR_ATTACHMENTi if you use an FBO instead of the default framebuffer.
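The GL state for that blending approach could be set up roughly like this (a sketch; the fragment shader then simply outputs vec4(1.0/255.0)):

// ... assumes your GL loader header is already included
void setupAdditiveCountingBlend()
{
    glEnable(GL_BLEND);
    glBlendEquationSeparate(GL_FUNC_ADD, GL_FUNC_ADD); // add source to destination
    glBlendFunc(GL_ONE, GL_ONE);                       // dst = dst + src, no alpha scaling
    glDisable(GL_DEPTH_TEST);                          // overlapping particles must all count
    // Draw the particles afterwards; each written channel then holds hitCount / 255.
}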
I've written my own 3D Game Engine in the past few years and wanted to actually use it for a game.
I stumbled across the following problem:
I have multiple planes in my game, but let's talk about one single plane.
Naturally, planes should not be able to dive into the ground and fly under the terrain.
Therefore, I need to implement something that detects collisions between a plane/jet and my ground.
The information given is the following:
Grid of the terrain [2-dimensional array; stores the height at the corresponding x,z coordinate]
Hitbox of my plane (it moves with my plane, so the bounds etc. are all already calculated and given)
So about the hitboxes:
I thought about which method to use. The best one in terms of performance seems to be simple spheres with different radii.
About the ground: Graphically, the ground is subdivided into triangles:
So what I need now is the optimal type of hitbox (sphere, AABB, ...) and the corresponding most efficient calculations.
My attempt was to get every surrounding triangle and calculate the distance from it to the center of each of my hitbox spheres. If the distance is less than the radius, a collision has been detected. But when I have 10-20 spheres on my plane and around 100 triangles to check, it takes too much time.
Another attempt was to get the vertical distance to the ground from each hitbox sphere. This one needs far fewer calculations but fails near steep surfaces.
I would be very happy if someone could help me implement an efficient version of plane/terrain collision detection :)
1. render terrain
Maybe you could try a linear depth buffer to improve accuracy.
2. read depth texture
You can use glReadPixels with GL_DEPTH_COMPONENT and GL_FLOAT. That will copy the depth buffer into CPU-side memory, so you can then also do the collision on the CPU side, or any computation related to the ground in view...
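That read-back can be as simple as this sketch (width and height being the framebuffer size):

#include <vector>
// ... plus your GL loader header

// Copy the current depth buffer to CPU memory as floats in [0, 1].
std::vector<float> readDepthBuffer(int width, int height)
{
    std::vector<float> depth(static_cast<size_t>(width) * height);
    glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT, depth.data());
    return depth;   // depth[y * width + x]; linearize it if you need view-space distances
}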
3. use the depth buffer as a texture
So copy it back to the GPU with glTexImage2D. I know this is slow (but most likely much faster than your current collision computation). If you are not using Intel HD Graphics, you can, instead of steps #2 and #3, use an FBO with a depth texture attachment, which renders the depth buffer directly into a texture. But on Intel this does not work reliably (or at all).
4. now render your objects (off-screen) with GLSL
Inside the fragment shader, just compare the rendered position with the depth (attached as a texture). If it is below the ground, output the collision somewhere. If this is done in a compute shader, you can store the results in some texture, or you could use some attachment or FBO for this.
If you cannot use an FBO, you could render to the "screen" with collisions encoded as a specific color, then read it back with glReadPixels and scan for that color to handle whatever collision logic you have on the CPU side...
Do not write to the depth buffer in this pass! Also, do not use CULL_FACE, because that could miss collisions on the back side of your object.
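As a rough sketch of the idea (not the exact shader; it assumes the terrain depth from steps #2/#3 is bound as terrainDepth, the object is rendered with the same top-down camera, and a collision is color-encoded as in the no-FBO variant above):

// Illustrative GLSL kept as a C++ string constant.
static const char* kCollisionFragSrc = R"GLSL(
#version 330 core
uniform sampler2D terrainDepth;    // terrain depth from the earlier passes
uniform vec2 screenSize;
out vec4 outColor;
void main() {
    vec2 uv = gl_FragCoord.xy / screenSize;
    float ground = texture(terrainDepth, uv).r;
    // Looking straight down, a larger depth value means farther from the camera,
    // i.e. this object fragment lies below the terrain surface -> collision.
    outColor = (gl_FragCoord.z > ground) ? vec4(1.0, 0.0, 0.0, 1.0) : vec4(0.0);
}
)GLSL";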
5. now render the objects normally
If you did not render in step #4, or you encoded collisions into the screen buffer, you need to overwrite/render the objects again; otherwise this step is not needed. Rendering after collision detection is good anyway, because in case of a collision you will most likely change the object's position/orientation/mesh, and the already rendered object could be in the way of the altered one.
[Notes]
Copying images between the CPU and GPU is slow, so use an FBO and render to texture instead if you can.
If you are not familiar with multiple-pass rendering, see these Q&As for inspiration:
OpenGL Scale Single Pixel Line
Render filled complex polygons with large number of vertices with OpenGL
This works only for what is in view ... but you can do a dedicated collision rendering pass (per object). Render with the camera set to look from top to bottom (bird's-eye view), covering only the area around your object... You also do not need a very big resolution for this, so it should be relatively fast ... You can divide your screen into square areas (using glViewport) and test more objects in a single frame to lower the sync-time slowdowns as much as possible (use fewer glReadPixels calls). You also do not need any vertex colors or textures for this.
I'm currently looking into raycasting and voxels, which is a nice combination. A Voxelrenderer by Sebastian Scholz implements this pretty nicely, but also uses OpenGL. I'm wondering how his formula works; how can you use OpenGL with raycasting and voxels? Isn't the idea of raycasting that a ray is cast for every pixel (or per line, as in Doom) and the result is then drawn?
The mentioned raycaster is a voxel renderer, i.e. a method to visualize volumetric data, such as opacities stored in a 3D texture. Doom's raycasting algorithm has another intention: for every pixel on the screen, find the first planar surface of the map and draw its color there. The rasterizing capabilities of modern GPUs have made this use of raycasters obsolete.
Visualizing volumetric data in real time is still a task done by special hardware, typically found in medical and geodetic imaging systems. Basically those are huge banks of RAM (several dozen GB) holding volumetric RGBA data. Then, for every on-screen pixel, a ray is cast through the volume and the RGBA data is integrated along that ray. A GPU voxel renderer does the same thing in a fragment shader; in pseudocode:
vec4 prev_color = vec4(0.0);                                      // nothing accumulated yet
for (int i = 0; i < STEPS; i++) {
    vec3 p = ray_entry + ray_direction * float(i) * STEP_DELTA;   // march through the volume
    vec4 voxel = texture3D(volumedata, p);                        // sample the 3D texture
    prev_color = combine(voxel, prev_color);                      // accumulate along the ray
}
final_color = finalize(prev_color);
finalize and combine depend on the kind of data and what you want to visualize. For example, if you want to integrate the density (like in an X-ray image), combine would be a summing operation and finalize a normalization. If you were visualizing a cloud, you would alpha-blend between voxels.
Raycasting in a voxel space wouldn't use pixels; that would be inefficient.
You already have an array that says which cells are empty and which ones contain a voxel cube.
So a fast version traces a line, checking the emptiness of every voxel along the line's direction until it reaches a filled voxel.
That would take a few hundred reads from memory and 2-3 multiplications of the ray vector for every read.
Reading a billion voxel memory locations takes about a second, so a few hundred reads are very fast and always fit within a frame.
Raycasting often uses optimizations to detect the fractional places in space where a mathematical surface starts, or where a mesh is hit by first testing its bounding box and then its triangles; with voxels it's just a progressive check of a line through an integer array until you find a non-void cell.
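A minimal sketch of such a trace through a dense voxel grid (a simple fixed-step march rather than an exact cell-by-cell traversal; the grid layout and all names are assumed):

#include <cmath>
#include <cstdint>

// Dense voxel grid: occupancy[(z * size + y) * size + x] != 0 means the cell is filled.
struct VoxelGrid {
    int size;
    const uint8_t* occupancy;
    bool filled(int x, int y, int z) const {
        if (x < 0 || y < 0 || z < 0 || x >= size || y >= size || z >= size) return false;
        return occupancy[(static_cast<size_t>(z) * size + y) * size + x] != 0;
    }
};

// March along the ray in half-voxel steps until a filled voxel is found.
bool traceVoxelRay(const VoxelGrid& grid, float ox, float oy, float oz,
                   float dx, float dy, float dz, int maxSteps,
                   int& hitX, int& hitY, int& hitZ)
{
    const float step = 0.5f;   // in voxel units; a proper DDA would visit each cell exactly once
    for (int i = 0; i < maxSteps; ++i) {
        const float t = step * static_cast<float>(i);
        const int x = static_cast<int>(std::floor(ox + dx * t));
        const int y = static_cast<int>(std::floor(oy + dy * t));
        const int z = static_cast<int>(std::floor(oz + dz * t));
        if (grid.filled(x, y, z)) { hitX = x; hitY = y; hitZ = z; return true; }
    }
    return false;
}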