I've added a particle system to a voxel game I'm working on. At the moment, all the physics are done on the CPU, and it's pretty slow (my CPU struggles with 2000 particles).
For each particle, I determine the range of voxels that it could theoretically collide with (based on the particle's position and velocity, and the elapsed time), and then check for collisions with all the voxels in that range and use the nearest collision.
To increase performance, I'm exploring whether I can use a compute shader to perform the physics.
If I convert my voxel world into a bit array and toss that in an SSBO, then the compute shader has all the geometry information required to do the collision detection. However...
The collision detection code that I wrote for the CPU is not going to be efficient at all on the GPU; there's just way too much branching/looping. Is there an efficient way to have particles collide with a voxel grid in a compute shader?
For simplicity, consider your particles as point-shaped objects with only a position P and a velocity V (in units per tick). Instead of computing exactly which voxels your particles will touch, a first approximation for particle motion could be to check whether P + V is occupied by a solid voxel (using a 3D sampler), and set V to zero (or to a fraction of itself) if it is, otherwise increment P by V. These conditional updates can be done efficiently with arithmetic, for example by multiplying by a 0-or-1 occupancy value, so no branching is required.
If this approximation is too rough, because V is often several voxels long and your voxel geometry is fine enough that particles could clip through solid walls, you can simply repeat the above operation N times inside the shader, using V/N instead of V, where N is the smallest constant integer that makes the clipping stop most of the time. For loops with a constant trip count can be unrolled by the shader compiler, so you still need no true branching.
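A minimal Python sketch of the two paragraphs above, written per particle the way one compute-shader invocation would run. It is an illustration under assumptions, not a drop-in shader: the world is packed into a bit array of 32-bit words (as suggested in the question), voxels are 1 unit wide, out-of-range lookups are clamped to the grid edge, and `keep` (a hypothetical parameter) is the fraction of V retained after a hit. The 0/1 occupancy value is used as a blend factor, which in GLSL would typically be a plain multiplication or `mix()`.

```python
import math

def voxel_index(dims, x, y, z):
    """Flattened index of voxel (x, y, z); coordinates are clamped to the grid like GLSL clamp()."""
    nx, ny, nz = dims
    x = min(max(x, 0), nx - 1)
    y = min(max(y, 0), ny - 1)
    z = min(max(z, 0), nz - 1)
    return x + nx * (y + ny * z)

def make_bit_grid(dims, solid_voxels):
    """Pack solid voxels into 32-bit words, like a bit array uploaded to an SSBO."""
    nx, ny, nz = dims
    words = [0] * ((nx * ny * nz + 31) // 32)
    for x, y, z in solid_voxels:
        i = voxel_index(dims, x, y, z)
        words[i >> 5] |= 1 << (i & 31)
    return words

def solid_at(words, dims, p):
    """1 if the voxel containing point p is solid, else 0."""
    i = voxel_index(dims, math.floor(p[0]), math.floor(p[1]), math.floor(p[2]))
    return (words[i >> 5] >> (i & 31)) & 1

def step_particle(words, dims, pos, vel, substeps=4, keep=0.0):
    """One tick, split into `substeps` moves of vel / substeps.

    The 0/1 occupancy s is used as a blend factor instead of an if:
        P += v * (1 - s)          (only move if the target voxel is empty)
        V *= 1 - s * (1 - keep)   (on a hit, keep only the fraction `keep` of V)
    """
    for _ in range(substeps):  # constant trip count in the shader version
        v = [c / substeps for c in vel]
        s = solid_at(words, dims, [pos[k] + v[k] for k in range(3)])
        pos = [pos[k] + v[k] * (1 - s) for k in range(3)]
        vel = [vel[k] * (1 - s * (1 - keep)) for k in range(3)]
    return pos, vel
```

With a one-voxel-thick wall at x = 4, a single step of V = 4 jumps straight through it, while four sub-steps of V/4 stop the particle in front of it.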
Now with this algorithm, the behavior of your particles will be to cease all (or most) motion once they reach an obstacle. If they are affected by gravity (best done inside the shader as well), they will fall straight down, but only after losing their vertical velocity. If they reach a floor, they will stay where they are.
If you want your particles to skid across horizontal surfaces instead of stopping where they land, and retain their vertical momentum when hitting walls, you can separate V into a horizontal and a vertical component, and do the above steps separately for the two components.
You also have the option to separate all three coordinates of V, so particles hitting walls with diagonal horizontal motion will follow the wall instead of dropping straight down, but the performance penalty may outweigh the benefits compared to two components.
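A sketch of the two-component variant, under similar assumptions: y is up, the solid voxels are given as a Python set of integer coordinates for brevity (a shader would read the bit array or 3D texture instead), and `gravity` is a hypothetical per-tick constant. The axis masks keep it branch-free: a hit only zeroes the velocity components that were being moved.

```python
import math

HORIZONTAL = (1.0, 0.0, 1.0)   # x and z
VERTICAL = (0.0, 1.0, 0.0)     # y is "up"

def is_solid(solid_set, p):
    """1 if the voxel containing p is in solid_set (integer (x, y, z) tuples), else 0."""
    return int((math.floor(p[0]), math.floor(p[1]), math.floor(p[2])) in solid_set)

def move_masked(solid_set, pos, vel, mask, frac):
    """Move by vel * frac along the masked axes only; on a hit, zero only those axes of vel."""
    target = [pos[k] + vel[k] * frac * mask[k] for k in range(3)]
    s = is_solid(solid_set, target)
    pos = [pos[k] + vel[k] * frac * mask[k] * (1 - s) for k in range(3)]
    vel = [vel[k] * (1 - mask[k] * s) for k in range(3)]
    return pos, vel

def tick(solid_set, pos, vel, gravity=-0.25, substeps=4):
    """Apply gravity, then move horizontally and vertically in separate masked sub-steps."""
    vel = [vel[0], vel[1] + gravity, vel[2]]   # gravity applied once per tick
    for _ in range(substeps):
        pos, vel = move_masked(solid_set, pos, vel, HORIZONTAL, 1.0 / substeps)
        pos, vel = move_masked(solid_set, pos, vel, VERTICAL, 1.0 / substeps)
    return pos, vel
```

A particle thrown sideways onto a floor keeps its horizontal velocity and skids, instead of stopping where it lands. Using three masks, one per axis, gives the three-component variant.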
I am attempting to create a reasonably interactive N-body simulation, with the novelty of being able to observe the simulation from the surface of one of the bodies. By this, I mean that I have some randomly placed 'stars' of very high masses with random velocities and 'planets' of smaller masses given initial circular velocities around these stars. I am then rendering this in real-time via OpenGL on Linux and DirectX11 on Windows.
My question is in regards to rendering the scene out, NOT the N-body simulation. I have a very efficient/accurate solver working now, and it can always be improved later without affecting the rendering.
The obvious problem is that stars are obscenely far away from each other, so distant stars are fractions of a pixel in size and the fragment shader cannot render them reliably. Using a logarithmic depth buffer works fine for standing on a planet and looking at a moon and the host star, but I am really struggling with how to deal with the distant stars. I am not interested in 'faking' it, or rendering a star map centered on the player, as the whole point is to be able to view the simulation in real time. For example, the star your planet is orbiting is ~1e6 m away and is rendered as a sphere, as it has a radius of ~1e4 m. Other stars are ~1e8 m away from you, so they show up as single lit pixels (sometimes), with a far Z-plane of ~1e13 m.
I think I have an idea/plan, but I think it involves knowledge/techniques I am not aware of yet.
Rationale:
We have the world-space positions of the stars on a given frame
This gives us 'screen' space, or fragment position, of star's center of mass in fragment shader
Rather than render this as a scaled sphere, we can try to mimic what our eyes actually do: convolve this point (pixel) with an Airy disc (or a Gaussian, or whatever is most efficient; it doesn't matter) so that stars are rendered instead as 'blurs' on the sky, with their 'bigness' depending on their luminosity and distance (in essence re-creating the magnitude system for free)
Theoretically this would enable me to change the 'lens' parameters of my airy disc at will in order to produce things that look reasonably accurate/artistic.
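The first steps of this plan can be sketched in Python as below, under stated assumptions: the star's center is already in camera space, the camera looks down -Z as in OpenGL, `fovy` is the vertical field of view, and pixel rows count from the top. The received flux falls off as L / (4πd²), and Pogson's relation m1 - m2 = -2.5 log10(F1 / F2) turns flux ratios into magnitude differences, which could then drive the size or brightness of each star's blur. The function names are hypothetical.

```python
import math

def project_to_pixel(p, fovy, width, height):
    """Project a camera-space point (camera looks down -Z) to pixel coordinates.

    Returns None for points behind the camera. Pixel (0, 0) is the top-left corner.
    """
    x, y, z = p
    if z >= 0.0:
        return None
    f = 1.0 / math.tan(fovy / 2.0)          # same focal factor as a perspective matrix
    aspect = width / height
    ndc_x = (f / aspect) * x / -z
    ndc_y = f * y / -z
    px = (ndc_x * 0.5 + 0.5) * width
    py = (1.0 - (ndc_y * 0.5 + 0.5)) * height
    return px, py

def relative_flux(luminosity, distance):
    """Received flux F = L / (4 pi d^2)."""
    return luminosity / (4.0 * math.pi * distance * distance)

def magnitude_difference(flux_a, flux_b):
    """Pogson's relation: m_a - m_b = -2.5 * log10(F_a / F_b)."""
    return -2.5 * math.log10(flux_a / flux_b)
```

A star 100 times brighter in received flux comes out exactly 5 magnitudes brighter, which is the scale the magnitude system uses.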
The problem: I have no idea how to achieve this blurring effect!
I have some basic understanding of shaders, and have several render passes going already, but this seems to involve techniques I have not come across, and I don't know how to achieve this effect.
TLDR: given an input of a fragment position, how can I blur it in a fragment/pixel shader with an airy disc/gaussian/etc.?
I thought a logarithmic depth buffer would work initially, but obviously that only helps with z-fighting, not dealing with angular size of far away objects.
You are over-thinking it. For stars smaller than a pixel, just render a square with an Airy disc texture. This is not "faking" - this is just how [real-time] computer graphics works.
If the lens diameter changes, calculate a new Airy disc texture.
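One way to build that texture on the CPU, sketched in Python. Assumption: `first_ring_radius` is a hypothetical parameter giving the radius of the first dark ring in texels, which is where the lens parameters come in (physically that ring sits at an angle of about 1.22 λ/D for an aperture of diameter D). The normalized Airy pattern is I(x) = (2 J1(x) / x)², where J1 is the Bessel function of the first kind; here J1 is computed from Bessel's integral so that no SciPy dependency is needed.

```python
import math

FIRST_ZERO = 3.8317059702075125  # first positive zero of J1: the first dark ring

def bessel_j1(x, samples=64):
    """J1(x) = (1 / 2pi) * integral over one period of cos(t - x sin t) dt.

    The trapezoidal rule over a full period converges very fast for moderate x."""
    total = 0.0
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        total += math.cos(t - x * math.sin(t))
    return total / samples

def airy_intensity(r, first_ring_radius):
    """Normalized Airy pattern (peak 1.0) at distance r (texels) from the center."""
    x = FIRST_ZERO * r / first_ring_radius
    if x < 1e-8:
        return 1.0                      # limit of (2 J1(x) / x)^2 as x -> 0
    return (2.0 * bessel_j1(x) / x) ** 2

def airy_texture(size, first_ring_radius):
    """size x size grayscale texture (list of rows); an odd size keeps the peak on one texel."""
    c = (size - 1) / 2.0
    return [[airy_intensity(math.hypot(i - c, j - c), first_ring_radius)
             for i in range(size)] for j in range(size)]
```

The result can be uploaded as a single-channel texture and drawn on a small square centered on the star's projected position, scaled by the star's brightness; regenerate it only when the lens parameters change.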
For stars that are a few pixels big (do they exist?), maybe you want to render a few-pixel sphere convolved with an Airy disc, then use that texture. Asking the GPU to do a convolution every frame is a waste of time, unless you really need it. If the size really is only a few pixels, you could alternatively render a few copies of the single-pixel texture, overlapping each other and 1 pixel apart. Computing the texture, though, would give you precision finer than a pixel, if that's something you need.
For the nearby stars, the Airy disc from each pixel sums up to make a halo, I think? Then you just render a halo, instead of doing the convolution. It isn't cheating, I swear.
If you really do want to do a convolution, you can do it directly: render everything to a texture by using a framebuffer, and then render that texture onto the screen, using a shader that reads several adjacent texels, multiplies each by the corresponding kernel weight, and sums the results. Since this costs one texture read per kernel sample for every pixel, it quickly gets expensive as the kernel grows, so you may prefer to skip some samples and accept an approximation. If you are not doing real-time rendering, then you can make it as slow as you want, of course.
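A CPU sketch in Python of what that post-processing shader computes for each output pixel. Assumptions: a grayscale image stored as a list of rows, an odd-sized square kernel, and edge texels clamped like `GL_CLAMP_TO_EDGE`. The four nested loops make the cost explicit: width × height × k² texel reads. Strictly this is a correlation, since the kernel is not flipped, which makes no difference for symmetric kernels such as the Airy disc or a Gaussian.

```python
def convolve2d(image, kernel):
    """Direct 2D convolution of a grayscale image with an odd-sized square kernel.

    Per output pixel: read the k x k neighboring texels (clamped to the edge)
    and accumulate texel * weight, as a fragment shader would."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(k):
                for kx in range(k):
                    sy = min(max(y + ky - half, 0), h - 1)
                    sx = min(max(x + kx - half, 0), w - 1)
                    acc += image[sy][sx] * kernel[ky][kx]
            out[y][x] = acc
    return out
```

Convolving a single bright pixel with a kernel reproduces the kernel around that pixel, which is exactly the "star as a blur" effect.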
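When game developers do a Gaussian blur (quite common) or a box blur, they do a separate X blur and Y blur. This works because those kernels are separable: the 2D kernel is the product of a 1D kernel in X and a 1D kernel in Y, so an X blur followed by a Y blur gives the full 2D blur. For a k × k kernel, it reduces the number of pixels sampled per output pixel from k² to 2k. It does not work exactly for the Airy disc function: its dark rings are circles, whereas a separable kernel f(x)·g(y) can only be zero along lines parallel to the axes.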
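A Python sketch of the two-pass Gaussian blur, on the CPU for clarity (same list-of-rows image and clamp-to-edge assumptions as a shader pass would use). The 2D Gaussian factors as exp(-(x² + y²) / 2σ²) = exp(-x² / 2σ²) · exp(-y² / 2σ²), so a horizontal pass followed by a vertical pass with the same 1D weights equals one pass with the 2D kernel.

```python
import math

def gaussian_kernel_1d(radius, sigma):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(w)
    return [v / s for v in w]

def blur_pass(image, kernel, horizontal):
    """One 1D pass along x or y; edge texels are clamped like GL_CLAMP_TO_EDGE."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i, wt in enumerate(kernel):
                d = i - r
                sx = min(max(x + d, 0), w - 1) if horizontal else x
                sy = y if horizontal else min(max(y + d, 0), h - 1)
                acc += image[sy][sx] * wt
            out[y][x] = acc
    return out

def gaussian_blur_separable(image, radius, sigma):
    """X pass then Y pass: 2 * (2r + 1) samples per pixel instead of (2r + 1)^2."""
    k = gaussian_kernel_1d(radius, sigma)
    return blur_pass(blur_pass(image, k, True), k, False)
```

Blurring a single bright pixel reproduces the outer product of the 1D kernel with itself, i.e. the full 2D Gaussian.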
Let's say you want to make a simple rain/snow/dust/starfield effect in a 3D scene. Putting raindrops, snowflakes, dust particles or stars into a scene hierarchy as individual nodes would be too slow.
Another solution would be to use some kind of particle system that generates a cube of particles around the camera, using a formula that generates infinite stars in all directions. In this case I'm effectively using
for N particles
  particlePos = cameraPos +
                euclideanModulo(pseudoRandomVector(0, 1) - cameraPos, 1) *
                boxSize - boxSize / 2;
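A runnable Python version of that pseudocode, for experimenting outside the engine. Assumptions, labeled as such: each particle's `pseudoRandomVector` comes from a fixed seed, so it is the same every frame; positions are in world units, so cameraPos is divided by boxSize before the modulo (if cameraPos in the pseudocode is already in box units, drop that division); and `euclidean_modulo` is just Python's `%`, which already returns a value in [0, n) for n > 0. With these choices each particle keeps a fixed world position until it leaves the cube and wraps to the opposite face.

```python
import random

def euclidean_modulo(a, n):
    """Remainder in [0, n) for n > 0, also for negative a (Python's % already does this)."""
    return a % n

def particle_positions(camera_pos, box_size, count, seed=1234):
    """Wrap `count` particles into a cube of side box_size centered on the camera."""
    rng = random.Random(seed)      # same seed every frame -> same per-particle random vector
    positions = []
    for _ in range(count):
        p = []
        for axis in range(3):
            r = rng.random()       # this particle's fixed coordinate in [0, 1)
            f = euclidean_modulo(r - camera_pos[axis] / box_size, 1.0)
            p.append(camera_pos[axis] + f * box_size - box_size / 2.0)
        positions.append(tuple(p))
    return positions
```

Moving the camera slightly leaves almost every particle where it was in the world; only the few that cross a face of the cube jump to the other side.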
The problem with this system is most of the particles would be outside the view.
Here the camera is the green dot in the center. It is spinning around the Y axis. The frustum is shown. There's a big waste as 85% of the particles are not in the frustum.
Yet another solution is to use, as boxSize, the largest axis-aligned bounding box the view frustum can have at any orientation (which is smaller than the box above), and then, instead of the camera position, use the min corner of the frustum's current bounding box as the origin:
boxSize = sizeOfAxiallyAlignedBoxThatContainsTheFrustumAtAnyOrientation()
minCorner = minCornerOfFrustumAtCurrentOrientation()
for N particles
  particlePos = minCorner +
                euclideanModulo(pseudoRandomVector(0, 1) - minCorner, 1) *
                boxSize;
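A Python sketch of the two helpers named in that pseudocode, restricted to the rotation the figures show (yaw about the Y axis only). Assumptions: the camera looks down -Z; the box size is approximated by sampling yaw angles and padding by a small `margin` (a hypothetical parameter) to cover angles between samples; each particle's random vector comes from a fixed seed; and, as in the camera-centered case, minCorner is divided by boxSize so the modulo is taken in box units.

```python
import math
import random

def frustum_corners(cam, yaw, fovy, aspect, near, far):
    """World-space corners of a perspective frustum looking down -Z, rotated by yaw about +Y."""
    c, s = math.cos(yaw), math.sin(yaw)
    t = math.tan(fovy / 2.0)
    corners = []
    for d in (near, far):
        h = d * t
        w = h * aspect
        for x in (-w, w):
            for y in (-h, h):
                z = -d
                corners.append((cam[0] + c * x + s * z, cam[1] + y, cam[2] - s * x + c * z))
    return corners

def aabb(points):
    """Axis-aligned bounding box (min corner, max corner) of a list of 3D points."""
    lo = tuple(min(p[i] for p in points) for i in range(3))
    hi = tuple(max(p[i] for p in points) for i in range(3))
    return lo, hi

def box_size_any_yaw(fovy, aspect, near, far, steps=720, margin=1.01):
    """Sampled stand-in for sizeOfAxiallyAlignedBoxThatContainsTheFrustumAtAnyOrientation()."""
    size = [0.0, 0.0, 0.0]
    for k in range(steps):
        lo, hi = aabb(frustum_corners((0.0, 0.0, 0.0), 2.0 * math.pi * k / steps,
                                      fovy, aspect, near, far))
        size = [max(size[i], hi[i] - lo[i]) for i in range(3)]
    return [v * margin for v in size]

def particles_from_min_corner(min_corner, box_size, count, seed=1234):
    """Wrap particles into [min_corner, min_corner + box_size) on each axis."""
    rng = random.Random(seed)
    particles = []
    for _ in range(count):
        particles.append(tuple(
            min_corner[i] + ((rng.random() - min_corner[i] / box_size[i]) % 1.0) * box_size[i]
            for i in range(3)))
    return particles
```

Per frame, the min corner would come from `aabb(frustum_corners(...))[0]` at the current orientation, while the box size is computed once.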
This works much better, as far fewer particles are wasted now. Both images above draw 200 particles, and you can see that the particles in the frustum are far denser in this version.
Still, it looks like 30% of the particles are outside the frustum. In fact, it gets much worse with a wider-angle frustum.
Now we're back to most particles outside the frustum.
In case it's not clear: the particles need to stay static relative to the camera (if they are not moving), so at least with my current algorithm I don't think I can do any better. Moving particles in and out of the frustum would make the density change based on the view direction.
For this reason the 200 particles are spread through a box the size of the smallest box that always contains the frustum at any orientation. When the frustum is oblong then that box becomes much larger than any individual orientation of the frustum since sometimes the frustum will be tall and thin and other times wide and short.
Are there any solutions that waste fewer particles but still keep the same property of a static position relative to the camera?
Can someone tell me the best way to estimate the normal at a point on CAD STL geometry?
This is not exactly a question on code, but rather about efficiency and approach.
I have used an approach in which I compare the point whose normal needs to be estimated with all the triangles in the mesh, and check whether it lies inside each triangle using the barycentric coordinates test. (If each barycentric coordinate lies between 0 and 1, the point lies inside.) This post explains it:
https://math.stackexchange.com/questions/4322/check-whether-a-point-is-within-a-3d-triangle
Then I compute the normal of that triangle to get the point normal.
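For reference, the brute-force approach from the question as a Python sketch. One detail it makes explicit: for an arbitrary 3D point, these barycentric coordinates describe its projection onto the triangle's plane, so a separate distance-to-plane check (with the hypothetical tolerance `plane_eps`) is needed to reject points above or below the triangle. The normal is the normalized cross product of two edges; STL facets list their vertices counter-clockwise as seen from outside, so this normal points outward.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def triangle_normal(a, b, c):
    """Unit normal of triangle (a, b, c) by the right-hand rule."""
    n = cross(sub(b, a), sub(c, a))
    length = math.sqrt(dot(n, n))
    return (n[0] / length, n[1] / length, n[2] / length)

def point_in_triangle(p, a, b, c, plane_eps=1e-6, bary_eps=1e-9):
    """True if p lies on the triangle's plane (within plane_eps) and inside it (barycentric test)."""
    v0, v1, v2 = sub(b, a), sub(c, a), sub(p, a)
    if abs(dot(v2, triangle_normal(a, b, c))) > plane_eps:
        return False
    d00, d01, d11 = dot(v0, v0), dot(v0, v1), dot(v1, v1)
    d20, d21 = dot(v2, v0), dot(v2, v1)
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    u = 1.0 - v - w
    return u >= -bary_eps and v >= -bary_eps and w >= -bary_eps

def normal_at(p, triangles):
    """Brute force, as in the question: every point is tested against every triangle."""
    for a, b, c in triangles:
        if point_in_triangle(p, a, b, c):
            return triangle_normal(a, b, c)
    return None
```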
The problem with my approach is that if I have some 1000 points, and the mesh has, say, 500 triangles, that means doing some 500 × 1000 = 500,000 checks. This takes a lot of time.
Is there an efficient data structure or approach I could use, to pinpoint the right triangle? Or a library that could get the work done?
A relatively easy solution is by using a grid: decompose the space in a 3D array of voxels, and for every voxel keep a list of the triangles that interfere with it.
By interfere, I mean that there is a nonempty intersection between the voxel and the bounding box of the triangle. (When you know the bounding box, it is straightforward to tell which voxels it covers.)
When you want to test a point, find the voxel it belongs to and test it only against the triangles in that voxel's list. You will achieve a speedup of about N/M, where N is the total number of triangles and M is the average number of triangles per voxel.
The voxel size should be chosen carefully. Too small, and the data structure becomes too big (each triangle is listed in many voxels); too large, and the method becomes ineffective. If possible, aim for "a few" triangles per voxel. (As a starting value, use the average triangle size, i.e. the square root of twice the triangle area.)
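A Python sketch of the grid described above, with two assumptions: the "3D array" is replaced by a dictionary keyed by integer voxel coordinates, so empty voxels cost nothing, and triangles are assigned by bounding-box overlap rather than exact clipping. `suggested_cell_size` implements the starting value from the last paragraph; all function names are hypothetical.

```python
import math
from collections import defaultdict

def tri_bbox(tri):
    lo = tuple(min(v[i] for v in tri) for i in range(3))
    hi = tuple(max(v[i] for v in tri) for i in range(3))
    return lo, hi

def suggested_cell_size(triangles):
    """Average 'triangle size' sqrt(2 * area); |u x v| is already 2 * area."""
    total = 0.0
    for a, b, c in triangles:
        u = [b[i] - a[i] for i in range(3)]
        v = [c[i] - a[i] for i in range(3)]
        n = (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])
        double_area = math.sqrt(sum(x * x for x in n))
        total += math.sqrt(double_area)
    return total / len(triangles)

def cell_of(p, cell):
    return tuple(math.floor(p[i] / cell) for i in range(3))

def build_grid(triangles, cell):
    """Map each voxel to the indices of the triangles whose bounding box overlaps it."""
    grid = defaultdict(list)
    for t, tri in enumerate(triangles):
        lo, hi = tri_bbox(tri)
        (x0, y0, z0), (x1, y1, z1) = cell_of(lo, cell), cell_of(hi, cell)
        for x in range(x0, x1 + 1):
            for y in range(y0, y1 + 1):
                for z in range(z0, z1 + 1):
                    grid[(x, y, z)].append(t)
    return grid

def candidates(grid, p, cell):
    """Triangles worth testing for point p: only those listed in p's voxel."""
    return grid.get(cell_of(p, cell), [])
```

Each query then runs the point-in-triangle test only on `candidates(...)`, so the total work is roughly points × M instead of points × N.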
For better efficiency, you can compute the exact intersections between the triangles and the voxels, using a 3D polygon clipping algorithm (rather than a mere bounding box test), but this is more complex to implement.
I have a 3D texture containing voxels, and I am ray tracing it; every time I hit a voxel, I display its color. The result is nice, but you can clearly see the individual blocks separated from one another. I would like the color to transition smoothly from one voxel to the next, so I was thinking of using interpolation.
My problem is that when I hit the voxel I am not sure which other neighbouring voxels to extract the colors from because i don't know if the voxel is part of a wall parallel to some axis or if it is a floor or an isolate part of the scene. Ideally I would have to get, for every voxel, the 26 neighbouring voxels, but that can be quite expensive. Is there any fast and approximate solution for such thing?
PS: I noticed that Minecraft has smooth shadows that form when voxels are placed near each other. Maybe it uses a technique that could be adapted for this purpose?
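A sketch of the interpolation idea in Python, assuming the hit point is known in voxel units and voxel centers lie at integer + 0.5. It blends only the 8 voxels whose centers surround the hit point (trilinear weights) rather than all 26 neighbors, and gives empty voxels zero weight and renormalizes the rest, so surfaces are not darkened by the air next to them; that renormalization is a design choice, not the only option. Without it, this is the same trilinear filtering the GPU performs when a 3D texture is sampled with `GL_LINEAR` filtering.

```python
import math

def smooth_color(voxels, p):
    """Blend the colors of the 8 voxel centers surrounding point p (trilinear weights).

    voxels: dict mapping integer (x, y, z) -> (r, g, b), solid voxels only.
    Returns None if none of the 8 neighbors is solid.
    """
    q = [p[i] - 0.5 for i in range(3)]          # voxel centers now sit on integers
    base = [math.floor(q[i]) for i in range(3)]
    f = [q[i] - base[i] for i in range(3)]      # fractional position inside the cell
    acc = [0.0, 0.0, 0.0]
    wsum = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                c = voxels.get((base[0] + dx, base[1] + dy, base[2] + dz))
                if c is None:
                    continue                    # empty voxel: zero weight
                w = ((f[0] if dx else 1 - f[0]) *
                     (f[1] if dy else 1 - f[1]) *
                     (f[2] if dz else 1 - f[2]))
                acc = [acc[k] + w * c[k] for k in range(3)]
                wsum += w
    if wsum == 0.0:
        return None
    return tuple(a / wsum for a in acc)
```

On the top face of two adjacent voxels, the color at their shared edge comes out as the average of the two, and it returns to each voxel's own color at its center.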
For example, if we have an orthographic projection:
left = -aspect, right = aspect, top = 1.0, bottom = -1.0, far = 1.0, near = -1.0
and draw a triangle at -2.0, it will be cut off by the near clipping plane. Will this really save precious rendering time?
Culling determines whether we need to draw something and discards it if it is out of view (written by the programmer, in the vertex shader or in the main program). Is clipping just cheap, automatic culling?
Also, on the topic of cheap culling, will
if (dist(cam.pos, sprite.pos) < MAX_RENDER_DIST)
    draw_sprite(sprite);
be enough for a simple 2D game?
The default OpenGL clip volume, in normalized device coordinates, is -1 to +1 for x, y and z.
The conditional test for sprite distance will work, but it is mostly unnecessary, as the far clipping plane does almost the same thing. Usually that is good enough, but there are cases where the test is needed. Objects near a corner of the view volume, inside the clipping planes, may end up beyond the far clipping plane when the camera turns. The reason is that the distance from the camera to the corner is longer than the perpendicular distance from the camera to the far clipping plane. This is not a problem if you have a 2D game and do not allow the camera viewing angle to change.
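A small Python illustration of that corner case. Assumptions: a camera looking down -Z, rotated by `yaw` about the Y axis, with a horizontal field of view wide enough that the point starts inside the side planes. The far-plane test measures depth along the view direction, which changes as the camera turns; the distance test does not depend on orientation.

```python
import math

def view_depth(cam, yaw, p):
    """Distance of p along the view direction (camera looks down -Z, rotated by yaw about Y)."""
    forward = (-math.sin(yaw), 0.0, -math.cos(yaw))
    d = [p[i] - cam[i] for i in range(3)]
    return sum(d[i] * forward[i] for i in range(3))

def inside_far_plane(cam, yaw, p, far):
    """What the far clipping plane effectively checks."""
    return view_depth(cam, yaw, p) <= far

def inside_radius(cam, p, max_dist):
    """The sprite-distance test: independent of where the camera is looking."""
    return math.dist(cam, p) <= max_dist
```

The point (7, 0, -8) is at depth 8 while the camera looks straight down -Z, but at depth √113 ≈ 10.6 once the camera turns to face it, so it drops out past a far plane at 10; the radial test rejects it in both orientations.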
If you have a simple 2D game, chances are high that you do not need to worry about graphics optimization. If sprites you draw lie outside the clipping planes, clipping saves you time, but how much depends. If a large fraction of the sprites are outside, you may save considerable time; but then you should probably reconsider your algorithm and not submit things that are never going to be shown anyway. If only a small percentage of the sprites are outside, the time saved will be negligible.
The problem with clipping in the GPU is that it happens relatively late in the pipeline, just before rasterization, so a lot of computations could already be done for nothing.
Doing it on the CPU can save these computations from happening and, also very important, reduce the number of actual draw commands (which can also be a bottleneck).
However, you need to do this quickly on the CPU. Typically you'll use an octree or a similar structure to represent the data, so you can discard an entire subtree at once. If you have to go over each polygon, or even each object, separately, this can become too expensive.
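The octree idea in its 2D form (a quadtree), sketched in Python for a sprite game. Assumptions: sprites are treated as points, so in practice you would pad the view rectangle by the largest sprite half-size; `max_items` and `max_depth` are hypothetical tuning parameters; and all points lie inside the root bounds. The key line is the early `return` in `query`, which rejects a whole subtree with one rectangle test.

```python
class QuadTree:
    """Point quadtree, the 2D analogue of an octree."""

    def __init__(self, x0, y0, x1, y1, depth=0, max_items=8, max_depth=10):
        self.bounds = (x0, y0, x1, y1)
        self.depth, self.max_items, self.max_depth = depth, max_items, max_depth
        self.items = []          # (x, y, payload), stored in leaves only
        self.children = None

    def _child(self, x, y):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(x >= mx) + 2 * (y >= my)]

    def insert(self, x, y, payload):
        if self.children is not None:
            self._child(x, y).insert(x, y, payload)
            return
        self.items.append((x, y, payload))
        if len(self.items) > self.max_items and self.depth < self.max_depth:
            x0, y0, x1, y1 = self.bounds
            mx, my = (x0 + x1) / 2, (y0 + y1) / 2
            d, m, md = self.depth + 1, self.max_items, self.max_depth
            self.children = [QuadTree(x0, y0, mx, my, d, m, md), QuadTree(mx, y0, x1, my, d, m, md),
                             QuadTree(x0, my, mx, y1, d, m, md), QuadTree(mx, my, x1, y1, d, m, md)]
            items, self.items = self.items, []
            for it in items:
                self._child(it[0], it[1]).insert(*it)

    def query(self, view, out):
        """Append payloads of points inside view = (vx0, vy0, vx1, vy1) to out."""
        vx0, vy0, vx1, vy1 = view
        x0, y0, x1, y1 = self.bounds
        if vx1 < x0 or vx0 > x1 or vy1 < y0 or vy0 > y1:
            return out           # whole subtree is off-screen: skipped in one test
        for x, y, p in self.items:
            if vx0 <= x <= vx1 and vy0 <= y <= vy1:
                out.append(p)
        for c in self.children or ():
            c.query(view, out)
        return out
```

Only the nodes that overlap the view are visited, so off-screen regions cost one rectangle test each instead of one test per sprite.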
So in conclusion: the usefulness depends on where your bottleneck lies (cpu, vertex shader, transmission rate, ...).