Tree to optimize an OpenGL point cloud - C++

I would like to optimize my OpenGL program.
It consists of loading a vector of 3D points and then applying shaders to them.
But I have billions of points, and my FPS drops to 2 when I try to view the points.
Currently I'm sending every point, and I believe this is what is too much for my computer.
Is making a KD-tree (for example) to store my points and then send to my shaders only the points contained in the viewing frustum an efficient way to optimize my program?
And, since my goal isn't point lookup but only using the points inside the viewing frustum, which tree would be better? Octree? KD-tree?

Using trees is definitely a good way to deal with large point clouds. I worked on point-cloud rendering software for a while, and we used kd-trees for rendering and regular voxel grids for analysis.
I don't exactly remember the reasons for/against using an octree, but I guess it depends on the density distribution of your clouds: if you have large point clouds with a few small high-density areas, you would get lots of empty cells in the octree, whereas for evenly distributed point clouds, octrees might be simpler. We also had 2.5D maps (from aerial scans: several square kilometers of terrain but only little deviation in height) where we used quadtrees for some tasks.
Also, we did not render all the points that were in the frustum, because that degenerates, e.g., when you zoom all the way out so that the whole point cloud is in the frustum again.
Instead, all the inner (non-leaf) nodes in the kd-tree contained a "representative" selection of the points in their children, and we rendered the tree only up to a depth that seemed appropriate given the distance from the camera to the bounding volume of each node. This way, for areas that are far away from the camera, you render a thinned-out version of the point cloud, an LOD of sorts.
If you want to go fancy: we actually maintained a "front line" of nodes, that is, a cut from left to right through the tree up to which all nodes should be rendered. This way we did not need to check every node for whether its status ("rendered" or "not rendered") should change, only the ones on the cut. Additionally, we had out-of-core point clouds that were larger than the (V)RAM, where we allowed the front to move farther down the tree only once the parent node had been loaded from disk.
Kd-trees are a bit harder to build because you need to determine where the split plane is located. For this we used a first pass that reads the locations of all the points in the node to determine the split plane, and a second pass that does the actual split.
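In C++, std::nth_element can fold those two passes into one call: it finds the median along the chosen axis and partitions the range around it in a single pass. A minimal sketch (the point type is my own):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Point { float p[3]; };  // hypothetical point type

// Split a node's point range at the median along `axis`.
// std::nth_element places the median element at `mid` and partitions
// the range around it, folding the two passes into one call.
std::size_t splitNode(std::vector<Point>& pts,
                      std::size_t begin, std::size_t end, int axis)
{
    std::size_t mid = begin + (end - begin) / 2;
    std::nth_element(pts.begin() + begin, pts.begin() + mid, pts.begin() + end,
                     [axis](const Point& a, const Point& b) {
                         return a.p[axis] < b.p[axis];
                     });
    return mid;  // the split plane passes through pts[mid].p[axis]
}
```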
I think we had 4096 points per node (we experimented with more, and 8k or 16k were fine as well) and did one draw call per node. As long as your point cloud fits in VRAM, you can simply put all of it in one large buffer and issue draw calls with offsets into that buffer.
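A rough sketch of how the per-node draw calls and the distance-based depth cutoff described above fit together (the node layout, helper, and LOD constant are assumptions, not the original software's code):

```cpp
#include <GL/gl.h>
#include <algorithm>

struct KdNode {
    KdNode* child[2];          // nullptr for leaves
    float   bmin[3], bmax[3];  // bounding box of the node
    GLint   offset;            // first index of this node's points in the shared VBO
    GLsizei count;             // number of (representative) points stored here
};

// Squared distance from the camera position to the node's bounding box.
static float distSqToAabb(const float c[3], const float bmin[3], const float bmax[3])
{
    float d2 = 0.0f;
    for (int i = 0; i < 3; ++i) {
        float v = std::max({bmin[i] - c[i], 0.0f, c[i] - bmax[i]});
        d2 += v * v;
    }
    return d2;
}

// Render the tree only as deep as the camera distance justifies:
// far-away nodes draw just their representative subset (the LOD effect).
void drawNode(const KdNode* n, const float camPos[3], float lodFactor)
{
    float dx = n->bmax[0] - n->bmin[0];
    float dy = n->bmax[1] - n->bmin[1];
    float dz = n->bmax[2] - n->bmin[2];
    float diagSq = dx * dx + dy * dy + dz * dz;
    bool leaf = (n->child[0] == nullptr);
    if (leaf || distSqToAabb(camPos, n->bmin, n->bmax) > lodFactor * diagSq) {
        glDrawArrays(GL_POINTS, n->offset, n->count);  // one draw call per node
        return;
    }
    drawNode(n->child[0], camPos, lodFactor);
    drawNode(n->child[1], camPos, lodFactor);
}
```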


How do you store voxel data?

I've been looking online and I'm impressed by the capabilities of using voxel data, especially for terrain building and manipulation. The problem is that voxels are never clearly explained on any site that I visited, nor how to use/implement them. All I find is that voxels are volumetric data. Please provide a more complete answer: what is volumetric data? It may seem like a simple question but I'm still unsure.
Also, how would you implement voxel data? (I aim to implement this in a C++ program.) What sort of data type would you use to store the voxel data so that the contents can be modified at run time as fast as possible? I have looked online and couldn't find anything explaining how to store the data: lists of objects, arrays, etc.
How do you use voxels?
EDIT:
Since I'm just beginning with voxels, I'll probably start by using them only to model simple objects, but I will eventually be using them for rendering terrain and world objects.
In essence, voxels are a three-dimensional extension of pixels ("volumetric pixels"), and they can indeed be used to represent volumetric data.
What is volumetric data
Mathematically, volumetric data can be seen as a three-dimensional function F(x,y,z). In many applications this function is scalar, i.e., it has one scalar value at each point (x,y,z) in space. In medical applications, for instance, this could be the density of certain tissues. To represent this digitally, one common approach is to simply make slices of the data: imagine images in the (X,Y)-plane, with the z-value shifted to get a stack of images. If the slices are close to each other, the images can be displayed as a video sequence, as seen for instance on the wiki page for MRI scans (https://upload.wikimedia.org/wikipedia/commons/transcoded/4/44/Structural_MRI_animation.ogv/Structural_MRI_animation.ogv.360p.webm). As you can see, each point in space has one scalar value, which is represented as a grayscale.
Instead of slices or a video, one can also represent this data using voxels. Instead of dividing a 2D plane into a regular grid of pixels, we now divide a 3D region into a regular grid of voxels, and again a scalar value can be assigned to each voxel. However, visualizing this is not as trivial: whereas we could just give a gray value to pixels, this does not work for voxels (we would only see the colors of the outermost voxels, not of the interior). In fact, this problem is caused by the fact that we live in a 3D world: we can look at a 2D image from a third dimension and observe it completely, but we cannot look at a 3D voxel space and observe it completely, as we have no 4th dimension to look from (unless you count time as a 4th dimension, i.e., creating a video).
So we can only look at parts of the data. One way, as indicated above, is to make slices. Another way is to look at so-called "iso-surfaces": surfaces in the 3D space on which every point has the same scalar value. For a medical scan, this allows one to extract, for instance, the brain from the volumetric data (not just as a slice, but as a 3D model).
Finally, note that surfaces (meshes, terrains, ...) are not volumetric; they are 2D shapes bent, twisted, stretched and deformed to be embedded in 3D space. Ideally they represent the border of a volumetric object, but not necessarily (e.g., terrain data will probably not be a closed mesh). A way to represent surfaces using volumetric data is to make sure the surface is an iso-surface of some function. As an example: F(x,y,z) = x^2 + y^2 + z^2 - R^2 represents a sphere with radius R centered at the origin: for all points (x',y',z') on the sphere, F(x',y',z') = 0. What's more, for points inside the sphere F < 0, and for points outside F > 0.
A way to "construct" such a function is by creating a distance map, i.e., creating volumetric data such that every point F(x,y,z) indicates the distance to the surface. Of course, the surface is the collection of all the points for which the distance is 0 (so, again, the iso-surface with value 0 just as with the sphere above).
How to implement
As mentioned by others, this indeed depends on the usage. In essence, the data can be given in a 3D matrix. However, this is huge! If you want the resolution doubled, you need 8x as much storage, so in general this is not an efficient solution. This will work for smaller examples, but does not scale very well.
An octree structure is, afaik, the most common structure to store this. Many implementations and optimizations for octrees exist, so have a look at what can be (re)used. As pointed out by Andreas Kahler, sparse voxel octrees are a recent approach.
Octrees allow easier navigation to neighbouring cells, parent cells, child cells, etc. (I am assuming here that the concept of octrees (or quadtrees in 2D) is known?) However, if many leaf cells are located at the finest resolution, this data structure will come with huge overhead! So, is this better than a 3D array? It somewhat depends on what volumetric data you want to work with, and what operations you want to perform.
If the data is used to represent surfaces, octrees will in general be much better: as stated before, surfaces are not really volumetric, hence not many voxels will hold relevant data (hence: "sparse" octrees). Referring back to the distance maps, the only relevant data are the points having value 0. The other points can have any value, but they do not matter (in some cases the sign is still kept, to distinguish "interior" from "exterior", but the value itself is not required if only the surface is needed).
How to use
If by "use", you are wondering how to render them, then you can have a look at "marching cubes" and its optimizations. MC will create a triangle mesh from volumetric data, to be rendered in any classical way. Instead of translating to triangles, you can also look at volume rendering to render a "3D sampled data set" (i.e., voxels) as such (https://en.wikipedia.org/wiki/Volume_rendering). I have to admit that I am not that familiar with volume rendering, so I'll leave it at just the wiki-link for now.
Voxels are just 3D pixels, i.e. 3D space regularly subdivided into blocks.
How do you use them? It really depends on what you are trying to do. A ray casting terrain game engine? A medical volume renderer? Something completely different?
Plain 3D arrays might be the best for you, but it is memory intensive. As BWG pointed out, octree is another popular alternative. Search for Sparse Voxel Octrees for a more recent approach.
In popular usage during the 90's and 00's, 'voxel' could mean somewhat different things, which is probably one reason you have been finding it hard to find consistent information. In technical imaging literature, it means 3D volume element. Oftentimes, though, it is used to describe what is somewhat-more-clearly termed a high-detail raycasting engine (as opposed to the low-detail raycasting engine in Doom or Wolfenstein). A popular multi-part tutorial lives in the Flipcode archives. Also check out this brief one by Jacco.
There are many old demos you can find out there that should run under emulation. They are good for inspiration and dissection, but tend to use a lot of assembly code.
You should think carefully about what you want to support with your engine: car-racing, flying, 3D objects, planets, etc., as these constraints can change the implementation of your engine. Oftentimes, there is not a data structure, per se, but the terrain heightfield is represented procedurally by functions. Otherwise, you can use an image as a heightfield. For performance, when rendering to the screen, think about level-of-detail, in other words, how many actual pixels will be taken up by the rendered element. This will determine how much sampling you do of the heightfield. Once you get something working, you can think about ways you can blend pixels over time and screen space to make them look better, while doing as little rendering as possible.
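As a toy illustration of "the heightfield is represented procedurally by functions" (the constants here are arbitrary):

```cpp
#include <cmath>

// Toy procedural heightfield: height is a pure function of (x, z), so no
// terrain storage is needed and the sampling rate can follow screen-space LOD.
float terrainHeight(float x, float z)
{
    return 8.0f * std::sin(0.05f * x) * std::cos(0.05f * z)  // rolling hills
         + 1.5f * std::sin(0.31f * x + 0.17f * z);           // fine detail
}
```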

Clarification about octrees and how they work in a Voxel world

I read about octrees and I didn't fully understand how they would work / be implemented in a voxel world, where the octree's purpose is to lower the number of voxels you render by merging repeating voxels into one big "voxel".
Here are the questions I want clarification about:
What type of data structure would you use? How could you turn a 3-D array of voxels into an array that has different-sized voxels occupying multiple locations in the array?
What are the nodes and what are they used for?
Does the octree connect the voxels so there are ONLY square shapes, or could it be a rectangle, or an L shape, or an entire Y column of voxels, or what?
Do octrees really improve performance of a voxel game? If so, usually by how much?
Quick answers:
A tree: each node has 8 children (top-back-left, top-back-right, etc.), down to a certain level. The code for this can get quite complex, especially if the voxels can change at runtime.
The type of voxel (colour, material, a list of items)
Yep, cubes only. More specifically 1x1x1, 2x2x2, 4x4x4, 8x8x8, etc.; it must be an entire node. If you really want to, you could define some sort of patterns, but then it's no longer an octree.
Yeah, but it depends on your data. Imagine describing 256 identical blocks individually versus describing them once (like air in Minecraft).
I'd start with trying to understand quadtrees first. You can do that on paper, or make a test program with it. You'll answer these questions yourself if you experiment.
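To make the quick answers above concrete, a minimal node sketch (field names and types are my own), including the collapse of eight identical children into one bigger voxel:

```cpp
#include <cstdint>

// Eight children per node, a payload for voxels, and a "uniform" flag for
// nodes whose whole cube is one voxel type (e.g. a solid block of air).
struct VoxelNode {
    VoxelNode* child[8];   // top-back-left, top-back-right, ...; all nullptr in a leaf
    uint16_t   material;   // colour / material / item id
    bool       uniform;    // true if the whole cube is a single voxel type
};

// A node can be collapsed into one big voxel when all eight children are
// uniform and share the same material.
bool tryCollapse(VoxelNode& n)
{
    for (int i = 0; i < 8; ++i)
        if (!n.child[i] || !n.child[i]->uniform ||
            n.child[i]->material != n.child[0]->material)
            return false;
    n.material = n.child[0]->material;
    n.uniform  = true;     // the children could now be freed
    return true;
}
```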
An octree done correctly can also help you with neighbour searches, which let you determine whether a face is "visible" (i.e., so you end up with just the hull of visible voxels). Once you've established your octree, you use it to store your XYZ coords, which you then extract into a single array. You then feed this array into your vertex buffer (GL solutions require this), which you can render in chunk form as needed (as the camera moves forward, etc.).
Octrees also, by their very nature, collapse cubes into bigger ones when neighbours are of the same type... much like Tetris does when you have colors/shapes that "fit" one another. This in turn can reduce your vertex count, and at render time you're really drawing a combination of squares and rectangles.
If done correctly you will end up with a lot of chunks that only have the outward-facing "faces" visible in the vertex buffers. You then also have to build your own occlusion-culling algorithm, which reduces the visibility on top of this, resulting in even less rendering required.
I did an example here:
https://vimeo.com/71330826
Notice how only the outside is being rendered, but the chunks themselves go all the way down to the bottom, even though the chunks' depth faces should cancel each other out (needs more optimisation). Also note how, as the camera turns around, faces are removed from the rendering buffers.

Ray-mesh intersection or AABB tree implementation in C++ with little overhead?

Can you recommend me...
either a proven lightweight C / C++ implementation of an AABB tree?
or, alternatively, another efficient data-structure, plus a lightweight C / C++ implementation, to solve the problem of intersecting a large number of rays with a large number of triangles?
"Large number" means several 100k for both rays and triangles.
I am aware that AABB trees are part of the CGAL library and probably of game physics libraries like Bullet. However, I don't want the overhead of an enormous additional library in my project. Ideally, I'd like a small, float-type-templated, header-only implementation. I would also go for something with a bunch of CPP files, as long as it integrates easily into my project. A dependency on Boost is OK.
Yes, I have googled, but without success.
I should mention that my application context is mesh processing, and not rendering. In a nutshell, I'm transferring the topology of a reference mesh to the geometry of a mesh from a 3D scan. I'm shooting rays from vertices and along the normals of the reference mesh towards the 3D scan, and I need to recover the intersection of these rays with the scan.
Edit
Several answers / comments pointed to nearest-neighbor data structures. I have created a small illustration regarding the problems that arise when ray-mesh intersections are approached with nearest-neighbor methods. Nearest-neighbor methods can be used as heuristics that work in many cases, but I'm not convinced that they actually solve the problem systematically, the way AABB trees do.
While this code is a bit old and using the 3DS Max SDK, it gives a fairly good tree system for object-object collision deformations in C++. Can't tell at a glance if it is Quad-tree, AABB-tree, or even OBB-tree (comments are a bit skimpy too).
http://www.max3dstuff.com/max4/objectDeform/help.html
It will require translation from Max to your own system but it may be worth the effort.
Try the ANN library:
http://www.cs.umd.edu/~mount/ANN/
It's "Approximate Nearest Neighbors". I know, you're looking for something slightly different, but here's how you can use this to speed up your data processing:
Feed points into ANN.
Query a user-selectable (think of this as a "per-mesh knob") radius around each vertex that you want to ray-cast from and find out the mesh vertices that are within range.
Select only the triangles that are within that range, and ray trace along the normal to find the one you want.
By judiciously choosing the search radius, you will definitely get a sizable speed-up without compromising on accuracy.
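If memory serves, the ANN calls for such a fixed-radius query look roughly like this; treat the exact signatures as something to verify against the ANN manual:

```cpp
#include <ANN/ANN.h>   // http://www.cs.umd.edu/~mount/ANN/
#include <vector>

// Fixed-radius query: collect indices of scan vertices within `radius`
// of `query`; their incident triangles are then the only ray-cast candidates.
void candidatesNearVertex(ANNkd_tree* tree, const double query[3],
                          double radius, std::vector<int>& outIdx)
{
    ANNpoint q = annAllocPt(3);
    q[0] = query[0]; q[1] = query[1]; q[2] = query[2];

    double sqRad = radius * radius;              // ANN expects a squared radius
    int k = tree->annkFRSearch(q, sqRad, 0);     // first call: just count hits
    outIdx.resize(k);
    std::vector<ANNdist> dists(k);
    tree->annkFRSearch(q, sqRad, k, outIdx.data(), dists.data());

    annDeallocPt(q);
}
```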
If there are no real-time requirements, I'd first try brute force.
1M * 1M ray->triangle tests shouldn't take much more than a few minutes to run (in CPU).
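For reference, the inner loop of the brute-force approach is just a standard ray/triangle test such as Möller-Trumbore; a self-contained sketch:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
static Vec3  sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3  cross(Vec3 a, Vec3 b) { return {a.y * b.z - a.z * b.y,
                                             a.z * b.x - a.x * b.z,
                                             a.x * b.y - a.y * b.x}; }
static float dot(Vec3 a, Vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns true and the ray parameter t if the ray (orig + t*dir) hits
// the triangle (v0, v1, v2). Brute force is this in a doubly nested loop.
bool intersect(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2, float& t)
{
    const float eps = 1e-7f;
    Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    Vec3 p = cross(dir, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < eps) return false;      // ray parallel to triangle
    float inv = 1.0f / det;
    Vec3 s = sub(orig, v0);
    float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return false;
    Vec3 q = cross(s, e1);
    float v = dot(dir, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;
    t = dot(e2, q) * inv;
    return t >= 0.0f;
}
```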
If that's a problem, the second best thing to do would be to restrict the search area by calculating an adjacency graph/relation between the triangles/polygons in the target mesh. After an initial guess fails, one can try the adjacent triangles. This of course relies on a lack of self-occlusion / multiple hit points (which I think is one interpretation of "visibility doesn't apply to this problem").
Also, depending on how pathological the topologies are, one could try environment-mapping the target mesh onto a unit cube (each pixel would consist of a list of triangles projected onto it) and test the initial candidate by a single ray->AABB test + lookup.
Given the feedback, there's one more simple option to consider: partitioning space into a simple 3D grid, where each dimension can be subdivided using the histogram of the x/y/z locations, or even regularly.
A 100x100x100 grid has a very manageable size of 1e6 entries.
The maximum number of cells to visit is proportional to the diameter (max ~300).
There are ~60000 outer ("extreme") cells, which suggests on the order of 10 triangles per cell.
Caveat: triangles must be placed in every cell they occupy; a conservative algorithm will also place them in cells they don't strictly belong to, and large triangles will probably require clipping and reassembly.
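A sketch of that conservative placement (grid layout and types are my own):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Uniform grid over the mesh's bounding box; caller fills `lo` and `cell`.
struct Grid {
    float lo[3];                          // grid origin
    float cell[3];                        // per-axis cell size
    int   n;                              // n x n x n cells, e.g. n = 100
    std::vector<std::vector<int>> tris;   // triangle indices per cell

    explicit Grid(int n_) : n(n_), tris(static_cast<std::size_t>(n_) * n_ * n_) {}

    int clampIdx(float v, int axis) const {
        int i = static_cast<int>(std::floor((v - lo[axis]) / cell[axis]));
        return std::min(std::max(i, 0), n - 1);
    }

    // Push triangle `tri` into every cell its bounding box touches
    // (conservative: may over-place; exact placement would clip per cell).
    void insert(int tri, const float bmin[3], const float bmax[3]) {
        int i0 = clampIdx(bmin[0], 0), i1 = clampIdx(bmax[0], 0);
        int j0 = clampIdx(bmin[1], 1), j1 = clampIdx(bmax[1], 1);
        int k0 = clampIdx(bmin[2], 2), k1 = clampIdx(bmax[2], 2);
        for (int k = k0; k <= k1; ++k)
            for (int j = j0; j <= j1; ++j)
                for (int i = i0; i <= i1; ++i)
                    tris[i + n * (j + static_cast<std::size_t>(n) * k)].push_back(tri);
    }
};
```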

Fastest way to perform rotational transformations on a chain of dependent, attached objects

Suppose I have two (two for the example, it will actually be some n > 1) sort of rectangular prisms "attached to each other" such that the 4 vertices on their adjacent faces are the same vertex in memory. So like two wooden blocks, one stacked on the other, with 4 vertices on the bottom, 4 in the middle that are shared between the two, and 4 on the top. Now, I want to be able to first do a specific rotation on the "top" wooden block, as if it were on a hinge that has a centerpoint of those 4 shared vertices.
So like an elbow, let's say it can only flex up to 45 degrees at a specific angle, and to perform the rotation I rotate the 8 vertices that make up the object around that invisible hinge center point. In the process, the 4 shared vertices of the other block get somewhat moved, but since the hinge is the center point among them they aren't getting "translated" away from the bottom block. I guess calling them wooden is counter-intuitive, since they will morph in specific ways, but I was trying to set it up to visualize. Anyway, let's say I want to be able to rotate this bottom block in a different manner, but have the top block act like it is attached. Thus, if the bottom block moves, the top block is swung around with it, but also with whatever flex it has on the hinge between them.
I was considering incrementally doing the transformations, either via axis-angle or quaternions, starting with the "top-most" block and working my way down the dependency chain, performing the rotation on the current block and every vertex on blocks "above" it. However, this would require offsetting all the vertices to put the current hinge at the origin, performing the rotation, then reversing the offset, for each step in the chain. Is there a more efficient way of handling this? I mean efficiency in speed; having extra preprocessed data in memory isn't a big deal. There may also come a time when I can't count on having such a linear dependency chain (such as the top block ending up attached to the bottom block to form a ring, perhaps). What would be the proper way to handle these kinds of possibilities?
Sounds to me from your description that you basically want something like a long piece of "jello", i.e., if the top section of the block/prism moves, then there is some secondary movement in the rest of the segments of the block/prism-chain, sort of like how moving a chain or some soft-body will create secondary-movements in the rest of the segments that make-up the chain or ring.
If that is the case, then I suggest actually constructing some "bones", where each bone segment starts and ends at the center point of the 4 vertices that make up each start and end face of the prism/blocks. Then you can calculate, when you move one segment of the bone chain, how much the other bones in the chain should move relative to the bone that was moved. From there, you can weight the rest of the vertices in the prism/block against this central "bone" so that they move the appropriate amount as the bone moves. You may also want to average the vertices attached to one "bone" against another bone segment as well, so that there is a fall-off in the weights of the attached vertices, creating smoother movement if you end up with too much pinching at each "joint".
Using bones with the vertices weighted against them should reduce the number of rotational transforms you need to calculate. Only the movement of the bone joints needs the heavy-lifting calculations; the vertices themselves are simply interpolated from the locations of the bones in the chain.
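A minimal sketch of that per-vertex weighting, assuming the GLM math library (two bones per vertex, weights summing to 1):

```cpp
#include <glm/glm.hpp>

// Blend a rest-pose vertex by two bone transforms; w0 + w1 == 1.
// The weights give the fall-off that avoids pinching at the joint.
glm::vec3 skinVertex(const glm::vec3& restPos,
                     const glm::mat4& bone0, float w0,
                     const glm::mat4& bone1, float w1)
{
    glm::vec4 p(restPos, 1.0f);
    return glm::vec3(w0 * (bone0 * p) + w1 * (bone1 * p));
}
```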
Consider using an existing tool. Have a look at this question about linking rigid bodies:
https://physics.stackexchange.com/questions/19724/how-to-represent-the-effect-of-linking-rigid-bodies-together
The standard way to handle an articulated, single-ended chain is skeletal animation -- using a chain of "bone" elements (defined by a relative translation/rotation relation), with the option of doing linear interpolation based on the bones to determine the position of the "skin" vertices. (Note that you will need to determine the rotation angle of each "joint" to fully define the pose.)
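A sketch of such a bone chain, assuming GLM: each bone's world transform is its parent's world transform times a local translate-then-rotate, so the hinge offset/rotate/undo bookkeeping from the question disappears:

```cpp
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <vector>

struct Bone {
    glm::vec3 offsetFromParent;  // hinge position relative to the parent bone
    float     angleRadians;      // current flex of this joint
    glm::vec3 axis;              // hinge axis (unit length)
};

// Compose transforms down the chain: moving a lower bone automatically
// swings every bone (and its skinned vertices) above it.
std::vector<glm::mat4> worldTransforms(const std::vector<Bone>& chain)
{
    std::vector<glm::mat4> world(chain.size());
    glm::mat4 parent(1.0f);
    for (std::size_t i = 0; i < chain.size(); ++i) {
        glm::mat4 local =
            glm::translate(glm::mat4(1.0f), chain[i].offsetFromParent) *
            glm::rotate(glm::mat4(1.0f), chain[i].angleRadians, chain[i].axis);
        world[i] = parent * local;  // rotate about the hinge, then inherit
        parent = world[i];
    }
    return world;
}
```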
A ring of elements is more difficult to handle, because you can no longer define the rotation of each joint independently of all the others. To solve this, set up a physical simulation or other solver that includes all the constraints. Exactly what to do depends on how you need to manipulate the object: if it's part of a game engine, physical simulation makes sense, but if it's to be hand-animated, you have a wide range of possibilities for semi-automated rigging (keyword: inverse kinematics).

Mesh rendering using Octree algorithm

I'm the founder of SceneMax, a 3D scripting language around since 2005. I want to add scene rendering using a single mesh object built with a 3D package like 3ds Max and split by an octree algorithm for optimized performance.
Do you know where I can find an algorithm which takes a mesh .X file, splits it into nodes (octree), and knows how to render it? I want to add it to my engine. The engine is open source (google for SceneMax if you're interested).
There are several variations, but when building an octree, one approach is this:
Start with one node (i.e. a cube) encompassing your entire scene or object being partitioned.
For each element in your scene/object (e.g. a mesh, or a poly, or whatever granularity you're working to):
Check if that element fits completely inside the node.
If yes, subdivide the node into eight children, then recursively do Step 2 for each child.
If no, then continue to the next node until there are no nodes left.
Add the element to the smallest node that can contain it.
It's also common to stop recursion based on some heuristic, like if the size of the nodes or the number of elements within the node is smaller than a certain threshold.
See this Flipcode tutorial for some more details about building the octree.
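As a rough illustration of those steps (all types, helpers, and the depth cap here are my own, not from the tutorial):

```cpp
#include <array>
#include <vector>

struct Aabb { float lo[3], hi[3]; };

// True if box `e` fits completely inside box `n`.
static bool fits(const Aabb& e, const Aabb& n)
{
    for (int i = 0; i < 3; ++i)
        if (e.lo[i] < n.lo[i] || e.hi[i] > n.hi[i]) return false;
    return true;
}

// Bounds of octant `i` (one bit per axis) of node box `n`.
static Aabb childBounds(const Aabb& n, int i)
{
    Aabb c;
    for (int a = 0; a < 3; ++a) {
        float mid = 0.5f * (n.lo[a] + n.hi[a]);
        bool hiHalf = (i >> a) & 1;
        c.lo[a] = hiHalf ? mid : n.lo[a];
        c.hi[a] = hiHalf ? n.hi[a] : mid;
    }
    return c;
}

struct OctNode {
    Aabb bounds;
    std::array<OctNode*, 8> child{};  // all nullptr until subdivided
    std::vector<int> elements;        // indices of elements stored at this node
};

// Push the element into the smallest node that fully contains it,
// with `depth` acting as the stop-recursion heuristic.
void insert(OctNode& node, int elem, const Aabb& elemBox, int depth)
{
    if (depth > 0) {
        if (!node.child[0])           // subdivide lazily on first demand
            for (int i = 0; i < 8; ++i)
                node.child[i] = new OctNode{childBounds(node.bounds, i)};
        for (int i = 0; i < 8; ++i)
            if (fits(elemBox, node.child[i]->bounds))
                return insert(*node.child[i], elem, elemBox, depth - 1);
    }
    node.elements.push_back(elem);    // doesn't fit any child: it lives here
}
```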
Once you have the octree, there are several approaches you can take to render it. The basic idea is that if you can't "see" a node, then you can't see its children either, so everything inside that node (and its children) doesn't need to be rendered.
Frustum culling is easy to implement, and does the "can you see it?" test using your view projection's frustum. Gamedev.net has an article discussing frustum culling and some other approaches.
You can also go further and implement occlusion culling after frustum culling, which will let you skip rendering any nodes which are covered up by nodes in front of them, using the z-buffer to determine if a node is hidden. This involves being able to traverse your octree nodes from closest to furthest. This technique is discussed in this Gamasutra article.
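For the "can you see it?" test on a node's bounding box, a common sketch (assuming GLM, with each frustum plane stored as (a,b,c,d) and its normal pointing into the frustum) checks the box corner farthest along the plane normal:

```cpp
#include <glm/glm.hpp>

// Cull a node if its AABB lies entirely behind any of the six frustum
// planes; planes are (a,b,c,d) with normals pointing into the frustum.
bool aabbInFrustum(const glm::vec4 planes[6],
                   const glm::vec3& bmin, const glm::vec3& bmax)
{
    for (int i = 0; i < 6; ++i) {
        // The AABB corner farthest along the plane normal (the "p-vertex").
        glm::vec3 p(planes[i].x >= 0.0f ? bmax.x : bmin.x,
                    planes[i].y >= 0.0f ? bmax.y : bmin.y,
                    planes[i].z >= 0.0f ? bmax.z : bmin.z);
        if (glm::dot(glm::vec3(planes[i]), p) + planes[i].w < 0.0f)
            return false;  // fully outside: skip this node and all children
    }
    return true;           // inside or intersecting: render / recurse
}
```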
It's important to first consider the types of meshes that you will have to support.
There are effectively 3 different kinds of objects.
1. Open natural terrain. This fits with a quadtree very well as an octree brings complexity that is unnecessary. There's little reason to introduce a 3rd dimension if it won't give you much performance gain.
2. Open terrain with many tall objects. This fits into an octree very well as an octree allows you to remove things on the vertical axis from rendering and a scene such as this has a lot of those. I honestly can't name any that fit into this off the top of my head.
3. Enclosed spaces or character models/static meshes. This fits very well into BSP trees. A BSP tree allows for objects that are fully onscreen, or enclosed spaces to only render as many polygons as needed.
I would recommend adding 1 and 3, assuming 1 is even a model type you need to support. #3 is very standard for first-person shooters or character models, and adding support for that may give you the best bang for your buck. The overall idea is to remove as much geometry from a render as possible; a quadtree is perfect for the mostly-flat outdoor terrain that makes up 95% of many game worlds. An octree has uses, but fewer than you may think.
Each of these algorithms is relatively easy to write. For example, I wrote an octree in 3 hours many years ago. Generally these are processed at load time, pieces of geometry being added to each square (if quad/octree) or to the tree (if BSP), and then rendering follows suit. There are a lot of great articles out there via a quick Google search that I will leave for your research. A quick note: BSP trees can also handle collision detection, making them ideal candidates for character models and static meshes. Thus this is an algorithm I would recommend taking your time on, to ensure it is flexible enough for multiple uses.