Ray Tracing: Only use single ray instead of both reflection & refraction rays - C++

I am currently trying to understand the ray tracer developed by Kevin Beason (smallpt: http://www.kevinbeason.com/smallpt/) and if I understand the code correctly he randomly chooses to either reflect or refract the ray (if the surface is both reflective and refractive).
Lines 71-73:
return obj.e + f.mult(depth>2 ? (erand48(Xi)<P ? // Russian roulette
radiance(reflRay,depth,Xi)*RP:radiance(Ray(x,tdir),depth,Xi)*TP) :
radiance(reflRay,depth,Xi)*Re+radiance(Ray(x,tdir),depth,Xi)*Tr);
Can anybody please explain the disadvantages of only casting a single ray instead of both of them? I had never heard of this technique and I am curious what the trade-off is, given that it results in a huge complexity reduction.

This is a Monte Carlo ray tracer. Its advantage is that you don't spawn an exponentially increasing number of rays - which can occur in some simple geometries. The downside is that you need to average over a large number of samples. Typically you sample until the expected deviation from the true value is "low enough". Working out how many samples are required takes some statistics - or you just take a lot of samples.

Presumably he's relying on super-sampling pixels and trusting that the average colour will work out roughly correct, although not as accurate.
i.e. fire 4 rays through one pixel and on average 2 are reflected, 2 are refracted.
Combine them to get an approximation of one ray reflected and refracted.
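To make the trade-off concrete, here is a hedged C++ sketch - not Beason's actual code; `radianceSplit`, `radianceRoulette` and the `Rng` wrapper are illustrative names - of the two strategies: tracing both branches versus picking one at random and reweighting by the pick probability, as smallpt does past depth 2:

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Illustrative stand-in for erand48(Xi).
struct Rng {
    std::mt19937 gen{42};
    double next() { return std::uniform_real_distribution<double>(0.0, 1.0)(gen); }
};

// Re and Tr are the Fresnel reflectance/transmittance (Re + Tr == 1);
// 'reflected' and 'refracted' stand in for the two recursive radiance calls.
double radianceSplit(double Re, double Tr, double reflected, double refracted) {
    return reflected * Re + refracted * Tr;      // trace both branches (depth <= 2)
}

double radianceRoulette(double Re, double Tr, double reflected, double refracted,
                        Rng& rng) {
    double P = 0.25 + 0.5 * Re;                  // pick probability, as in smallpt
    if (rng.next() < P)
        return reflected * (Re / P);             // RP = Re / P
    return refracted * (Tr / (1.0 - P));         // TP = Tr / (1 - P)
}
```

Averaged over many samples, `radianceRoulette` converges to the same value as `radianceSplit`: the estimator stays unbiased, you just trade fewer rays per path for more variance per sample.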

Linear Interpolation and Object Collision

I have a physics engine that uses AABB testing to detect object collisions and an animation system that does not use linear interpolation. Because of this, my collisions act erratically at times, especially at high speeds. Here is a glaringly obvious problem in my system...
For the sake of demonstration, assume a frame in our animation system lasts 1 second and we are given the following scenario at frame 0.
At frame 1, the collision of the objects will not be detected, because c1 will have traveled past c2 on the next draw.
Although I'm not using it, I have a bit of a grasp on how linear interpolation works because I have used linear extrapolation in this project in a different context. I'm wondering if linear interpolation will solve the problems I'm experiencing, or if I will need other methods as well.
There is a part of me that is confused about how linear interpolation is used in the context of animation. The idea is that we can achieve smooth animation at low frame rates. In the above scenario, we cannot simply just set c1 to be centered at x=3 in frame 1. In reality, they would have collided somewhere between frame 0 and frame 1. Does linear interpolation automatically take care of this and allow for precise AABB testing? If not, what will it solve and what other methods should I look into to achieve smooth and precise collision detection and animation?
The phenomenon you are experiencing is called tunnelling, and is a problem inherent to discrete collision detection architectures. You are correct in feeling that linear interpolation may have something to do with the solution, as it can allow you to, within a margin of error (usually), predict the path of an object between frames, but this is just one piece of a much larger solution. The terminology I've seen associated with these types of solutions is "Continuous Collision Detection". The topic is large and gets quite complex, and there are books that discuss it, such as Real-Time Collision Detection, along with other online resources.
So to answer your question: no, linear interpolation on its own won't solve your problems - unless you're only dealing with circles or spheres.
What to Start Thinking About
The way the solutions look and behave depends on your design decisions, and they are generally large. So just to point in the direction of the solution, the fundamental idea of continuous collision detection is to figure out: how far between the earlier frame and the later frame does the collision happen, and in what position and rotation are the two objects at that point. Then you must calculate the configuration the objects will be in at the later frame time in response to this. Things get very interesting addressing these problems for anything other than circles in two dimensions.
I haven't implemented this, but I've seen a solution described where you march the two candidates forward between the frames, advancing their positions with linear interpolation and their orientations with spherical linear interpolation, and checking with discrete algorithms whether they're intersecting (the Gilbert-Johnson-Keerthi algorithm). From there you continue to apply discrete algorithms to get the smallest penetration depth (the Expanding Polytope algorithm) and pass that, along with the remaining time between the frames, to a solver to get how the objects look at your later frame time. This doesn't give an analytic answer, but I'm not aware of an analytic answer for the generalized 2D or 3D case.
If you don't want to go down this path, your best weapon in the fight against complexity is assumptions: If you can assume your high velocity objects can be represented as a point things get easier, if you can assume the orientation of the objects doesn't matter (circles, spheres) things get easier, and it keeps going and going. The topic is beyond interesting and I'm still on the path of learning it, but it has provided some of the most satisfying moments in my programming period. I hope these ideas get you on that path as well.
Edit: Since you specified you're working on a billiard game.
First we'll check whether discrete or continuous is needed
Is any amount of tunnelling acceptable in this game? Not in billiards, no.
What is the speed at which we will see tunnelling? Using a 0.0285 m radius for the ball (standard American) and a 0.01 s physics step, we get 2.85 m/s as the minimum speed at which collisions start giving a bad response. I'm not familiar with the speed of billiard balls, but that number feels too low.
So just checking on every frame if two of the balls are intersecting is not enough, but we don't need to go completely continuous. If we use interpolation to subdivide each frame we can increase the velocity needed to create incorrect behaviour: With 2 subdivisions we get 5.7m/s, which is still low; 3 subdivisions gives us 8.55m/s, which seems reasonable; and 4 gives us 11.4m/s which feels higher than I imagine billiard balls are moving. So how do we accomplish this?
Discrete Collisions with Frame Subdivisions using Linear Interpolation
Using subdivisions is expensive so it's worth putting time into candidate detection to use it only where needed. This is another problem with a bunch of fun solutions, and unfortunately out of scope of the question.
So you have two candidate circles which will very probably collide between the current frame and the next frame. So in pseudo code the algorithm looks like:
dt = 0.01
subdivisions = 4
circle1.next_position = circle1.position + (circle1.velocity * dt)
circle2.next_position = circle2.position + (circle2.velocity * dt)
for i from 0 to subdivisions:
    temp_c1.position = interpolate(circle1.position, circle1.next_position, (i + 1) / subdivisions)
    temp_c2.position = interpolate(circle2.position, circle2.next_position, (i + 1) / subdivisions)
    if intersecting(temp_c1, temp_c2):
        return intersection confirmed
return no intersection
Where the interpolate signature is interpolate(start, end, alpha)
So here you have interpolation being used to "move" the circles along the path they would take between the current and the next frame. On a confirmed intersection you can get the penetration depth and pass the delta time (dt / subdivisions), the two circles, the penetration depth and the collision points along to a resolution step that determines how they should respond to the collision.
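A minimal, self-contained C++ version of the pseudocode above might look like this (the `Circle`, `interpolate` and `intersecting` names are illustrative, not from any particular engine):

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { double x, y; };
struct Circle { Vec2 position; Vec2 velocity; double radius; };

// interpolate(start, end, alpha): lerp between two positions.
Vec2 interpolate(Vec2 start, Vec2 end, double alpha) {
    return { start.x + (end.x - start.x) * alpha,
             start.y + (end.y - start.y) * alpha };
}

bool intersecting(Vec2 a, double ra, Vec2 b, double rb) {
    double dx = a.x - b.x, dy = a.y - b.y;
    double r = ra + rb;
    return dx * dx + dy * dy <= r * r;
}

// Returns true if the circles touch at any subdivision step within
// one physics step of length dt.
bool sweptCollision(const Circle& c1, const Circle& c2, double dt, int subdivisions) {
    Vec2 next1 = { c1.position.x + c1.velocity.x * dt, c1.position.y + c1.velocity.y * dt };
    Vec2 next2 = { c2.position.x + c2.velocity.x * dt, c2.position.y + c2.velocity.y * dt };
    for (int i = 0; i < subdivisions; ++i) {
        double alpha = double(i + 1) / subdivisions;
        Vec2 p1 = interpolate(c1.position, next1, alpha);
        Vec2 p2 = interpolate(c2.position, next2, alpha);
        if (intersecting(p1, c1.radius, p2, c2.radius))
            return true;    // intersection confirmed at step i
    }
    return false;           // no intersection
}
```

Note that with `subdivisions = 1` this degenerates to the plain discrete endpoint check, which is exactly the case that tunnels.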

Demosaicing algorithm that contains downsampling

Introduction: What I am working on.
Hello everyone! I am working on a demosaicing algorithm which I use to transform images that have a Bayer pattern into images that represent the red, green and blue channels. I would like the algorithm to have the following properties:
It preserves as much raw information as possible.
It does not obscure details in the image, even if that means absence of denoising.
It produces as few artifacts as possible.
If the size of the mosaic image is N x N, the three color images should each have size N/2 x N/2.
Algorithm should be fast. To put "fast" into a context, let me say this: I will settle for something that is at least twice as fast as OpenCV's algorithm which uses bilinear interpolation.
What I have achieved so far.
So far, I've come up with an algorithm that uses bilinear interpolation and produces three images that are half the size of the mosaic image. The algorithm is approximately 3-4 times faster than OpenCV's cvtColor algorithm that performs CV_BayerBG2BGR conversion (bilinear interpolation).
See the sketch of the Bayer pattern below to get an idea about how it works. I perform the interpolation at points marked by circles. The numbers represent the coefficients by which I multiply the underlying pixels in order to get interpolated value in the point marked by black circle.
You can observe the results of my algorithm below. I've also added the results of both demosaicing algorithms that are available in OpenCV (bilinear interpolation and variable number of gradients). Please note that while results of my algorithm look really poor in comparison, the OpenCV's bilinear interpolation results look almost exactly the same if I downsample them. This is of course expected as the underlying algorithm is the same.
... so finally: the question.
My current solution gives acceptable results for my project and it is also acceptably fast. However, I would be willing to use an algorithm up to twice as slow if that would bring improvements to any of the 5 criteria listed above. The question then is: how do I improve my algorithm without significantly hindering the performance?
I have enough programming experience for this task so I am not specifically asking for code snippets - the answers of any kind (code, links, suggestions - especially the ones based on past experiences) are welcome.
Some additional information:
I am working in C++.
The algorithm is highly optimized, it uses SSE instructions and it is non-parallel.
I work with large images (few MB in size); cache-awareness and avoiding multiple passes through image are very important.
I am not looking for general programming advice (such as optimization in general, etc.), but on the other hand some task-specific answers are more than welcome. Thank you in advance.
High quality results are obtained by filtering the samples to their Nyquist frequency. Usually that's half the sample rate, but in this case since your red and blue samples only come at 1/2 the pixel rate then Nyquist will be 1/4 the sample rate. When you resize you need to filter to the lower of the Nyquist rate of the input and output, but since your output is 1/2 your input you again need to filter to 1/4.
The perfect filter is the Sinc function; it delivers 100% of the signal below the cutoff frequency and none above the cutoff. Unfortunately it's completely impractical, extending as it does to infinity. For practical applications a windowed Sinc is used instead, the most common of these being the Lanczos kernel. The window size is chosen on the basis of quality vs. speed, with higher orders being closer to the ideal Sinc function. In your case since speed is paramount I will suggest Lanczos2.
The cutoff frequency of the filter is inversely proportional to its width. Since in this case we want the cutoff to be half of the normal cutoff, the filter will be stretched to twice its normal width. Ordinarily a Lanczos2 filter will require inputs up to but not including +/-2 pixels from the center point; stretching it by 2 requires inputs up to +/-4 from the center.
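As a sketch (the function names are mine, not from any library), the stretched kernel can be evaluated directly; sampling it at integer offsets gives the unnormalized tap weights, and the even offsets land exactly on the kernel's zeros:

```cpp
#include <cassert>
#include <cmath>

const double PI = 3.14159265358979323846;

double sinc(double x) {
    if (x == 0.0) return 1.0;
    return std::sin(PI * x) / (PI * x);
}

// Lanczos2 kernel: sinc(x) windowed by sinc(x/2), support |x| < 2.
double lanczos2(double x) {
    if (std::fabs(x) >= 2.0) return 0.0;
    return sinc(x) * sinc(x / 2.0);
}

// Stretching by 2 halves the cutoff: support grows to |x| < 4, and the
// even integer offsets (0 excluded) fall exactly on the kernel's zeros.
double lanczos2Stretched(double x) { return lanczos2(x / 2.0); }
```

Sampling at offsets +/-1 and +/-3 gives roughly 0.5731 and -0.0637; after normalizing each color's taps so they sum to 1.0, you arrive at coefficients like the 0.28125 and -0.03125 used below.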
The choice of a center point is completely arbitrary once you have a good cutoff filter. In your case you chose a point that was midway between 4 sample points. If we choose instead a point that is exactly on one of our input samples, some interesting things happen. Many of the filter coefficients become zero, which means those pixels don't have to be included in the calculations. In the example below I've centered on the Red pixel, and we find that red pixels need no filtering at all! Following is a diagram with Lanczos2 filter values scaled so that they total to 1.0 for each color, followed by the formulas that result.
red = p[x][y]
green = (p[x][y-3] + p[x-3][y] + p[x+3][y] + p[x][y+3]) * -0.03125 +
(p[x][y-1] + p[x-1][y] + p[x+1][y] + p[x][y+1]) * 0.28125
blue = (p[x-3][y-3] + p[x+3][y-3] + p[x-3][y+3] + p[x+3][y+3]) * 0.00391 +
(p[x-1][y-3] + p[x+1][y-3] + p[x-3][y-1] + p[x+3][y-1] + p[x-3][y+1] + p[x+3][y+1] + p[x-1][y+3] + p[x+1][y+3]) * -0.03516 +
(p[x-1][y-1] + p[x+1][y-1] + p[x-1][y+1] + p[x+1][y+1]) * 0.31641
If you'd prefer to keep everything in the integer domain this works very well with fixed point numbers too.
red = p[x][y]
green = ((p[x][y-3] + p[x-3][y] + p[x+3][y] + p[x][y+3]) * -32 +
(p[x][y-1] + p[x-1][y] + p[x+1][y] + p[x][y+1]) * 288) >> 10
blue = ((p[x-3][y-3] + p[x+3][y-3] + p[x-3][y+3] + p[x+3][y+3]) * 4 +
(p[x-1][y-3] + p[x+1][y-3] + p[x-3][y-1] + p[x+3][y-1] + p[x-3][y+1] + p[x+3][y+1] + p[x-1][y+3] + p[x+1][y+3]) * -36 +
(p[x-1][y-1] + p[x+1][y-1] + p[x-1][y+1] + p[x+1][y+1]) * 324) >> 10
The green and blue pixel values may end up outside of the range 0 to max, so you'll need to clamp them when the calculation is complete.
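Putting the fixed-point formulas and the clamp together, a hedged sketch - the `Mosaic` accessor is hypothetical, and `x`, `y` is assumed to sit on a red sample at least 3 pixels from the border - could look like:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mosaic container; p(x, y) reads one Bayer sample.
struct Mosaic {
    int width, height;
    std::vector<uint16_t> data;
    int p(int x, int y) const { return data[y * width + x]; }
};

int clampTo(int v, int maxVal) { return std::min(std::max(v, 0), maxVal); }

// Fixed-point demosaic at a red sample, per the formulas above.
void demosaicAt(const Mosaic& m, int x, int y, int maxVal,
                int& red, int& green, int& blue) {
    red = m.p(x, y);
    green = clampTo(((m.p(x, y-3) + m.p(x-3, y) + m.p(x+3, y) + m.p(x, y+3)) * -32 +
                     (m.p(x, y-1) + m.p(x-1, y) + m.p(x+1, y) + m.p(x, y+1)) * 288) >> 10,
                    maxVal);
    blue  = clampTo(((m.p(x-3, y-3) + m.p(x+3, y-3) + m.p(x-3, y+3) + m.p(x+3, y+3)) * 4 +
                     (m.p(x-1, y-3) + m.p(x+1, y-3) + m.p(x-3, y-1) + m.p(x+3, y-1) +
                      m.p(x-3, y+1) + m.p(x+3, y+1) + m.p(x-1, y+3) + m.p(x+1, y+3)) * -36 +
                     (m.p(x-1, y-1) + m.p(x+1, y-1) + m.p(x-1, y+1) + m.p(x+1, y+1)) * 324) >> 10,
                    maxVal);
}
```

Since the weights for each channel sum to exactly 1024, a flat input region passes through unchanged, which is a handy sanity check.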
I'm a bit puzzled by your algorithm and won't comment on it... but to put some things into perspective...
OpenCV is a library which contains a lot of generic code to get the job done, and it is sometimes deliberately not optimized to the limit for performance; there is a cost/maintainability tradeoff, and "good enough is better than better".
There is a trove of people selling performance-optimized libraries implementing some of OpenCV's features, sometimes with the exact same API.
I have not used it, but OpenCV has a cv::gpu::cvtColor() which could be achieving your goals, out of the box, assuming it's implemented for demosaicing, and that you have a suitable GPU.
Considering the bilinear demosaicing, a less-maintainable but more-optimized CPU implementation can run much faster than the one from OpenCV, I'd estimate above 250 Mpx/s on one mainstream CPU core.
Now to elaborate on the optimization path...
First, because demosaicing is a local operation, cache awareness is really not a significant problem.
A performance-optimized implementation will have different code paths depending on the image dimensions, the Bayer pattern type, and the instruction sets supported by the CPU (and their speed/latency); for such a simple algorithm, it's going to become a lot of code.
There are SIMD instructions to perform shuffling, arithmetic including averaging, streaming memory writes, which you'd find useful. Intel's summary is not so bad to navigate, and Agner Fog's site is also valuable for any kind of implementation optimization. AVX & AVX2 provide several interesting instructions for pixel processing.
If you are more the 80/20 kind of person (good for you!), you'll appreciate working with a tool like Halide, which can generate optimized stencil code like a breeze (modulo the learning curve, which already sets you back a few days compared to a 1-hour naive implementation or the 10 minutes it takes using OpenCV), and which especially handles the boundary conditions (image borders).
You can get a little further (or take an alternative road) by using compiler intrinsics to access specific CPU instructions, at this point your code is now 4x costlier (in terms of development cost), and will probably get you 99% as far as hand-crafted assembly ($$$ x4 again).
If you want to squeeze the last drop (not generally advisable), you will definitely have to perform days of implementation benchmarks, to see which sequence of instructions can get you the best performance.
But also, GPUs... you can use your integrated GPU to perform demosaicing, it may be a little faster than the CPU and has access to the main memory... of course you'd have to care about pre-allocating shared buffers. A discrete GPU would have a more significant transfer overhead, at these fill rates.
I know I'm late to the discussion, but I want to contribute my ideas, as this is a problem that I have been thinking about myself. I see how you determined your coefficients from a linear interpolation of the 4 closest red or blue pixels. This would give you the best possible result if the intensity per color channel varied linearly.
De-bayering artifacts, however, are most significant at color edges. In your case you would interpolate over the color edge, which would give you worse results than simply picking the closest red or blue pixel.
This is why I average the green pixels and take the closest red and blue pixel for my combined De-bayering and down-sampling. I believe that this should work better at color edges, but it would work less well for image areas with gradually varying color.
I haven't yet had an idea of what would be the optimal way to do this.
I actually implemented exactly what you're talking about here, in Halide.
You should read the paper by Morgan McGuire that I used as a reference... it's less about how much the neighbors factor into the output identity pixel and more about which pixels you're looking at to do a straight average.

Simplest way to simulate basic diffusion over a 3D matrix?

I'm currently writing a program that will simulate in very basic terms the diffusion and pressure of a gas in a 3D volume with boundaries throughout - Think for example an ant cave.
The effects I want to achieve:
1. Gas diffuses throughout the environment over time, respecting walls.
2. I'd like to measure pressure, or the compression of the gas, per grid point. The effect of this should be that if a room is opened the gas will diffuse out of the opening in a speed that reflects the pressure difference.
My problem is that I lack the knowledge to fully understand theoretical math equations, and to be honest I'm really not looking for an accurate simulation. I'd just want to achieve some of the prominent effects of the physics at play. I'm not interested in fluid dynamics (For example the simulation of smoke).
I'll be writing this program in OpenCL but happy to take any form of code examples, be it C or pseudo code.
I'm thinking I should pass in 3 3D arrays - One for the gas, one that defines the walls (eg 1 at xyz = wall), and one to store the pressure.
I'm currently assuming checking for a wall is easy enough. One simply checks each neighbouring cell, and accounts for the cell if it's not a wall:
for each grid point:
    if wallmatrix[x+1] is not a wall:
        [diffusion math here]
    if wallmatrix[x-1] is not a wall:
        [diffusion math here]
    if wallmatrix[y+1] is not a wall:
        [diffusion math here]
    etc...
But I'm just not sure what to do with the diffusion math, nor how I would include pressure in all this?
Diffusion is one of the easiest things to simulate because it's self smoothing.
For example, you could run your simulation in terms of constant time steps and keeping track of the individual particle positions, and at each time step move each particle a fixed (small) distance, but in a random direction.
There are other ways too; for example, you can use a grid-based approach, where you track the number of particles in each grid location.
A slight issue with your question is where you say, "diffuse out of the opening in a speed that reflects the pressure difference". Diffusion doesn't really do this, since diffusion is just the random motion of particles. I think, though, that even straight diffusion might look satisfying to you here, since the gas will diffuse out of an opening, and it will look faster. Really what will be happening though is that it will be diffusing out at the same speed as everywhere else, it's just that nothing will be diffusing back in. Still, if this isn't satisfying, then you will need to get into fluid dynamics, at least a bit, since this is how one describes how fluid behaves when there's a pressure gradient, not diffusion.
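As a concrete sketch of the random-walk picture (grid size, names and the wall rule are all illustrative): each particle takes one fixed-length step per tick in a random axis direction, and a step into a wall cell is simply rejected:

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

struct Particle { int x, y, z; };

const int N = 16;                       // domain is an N x N x N grid
bool wall(int x, int y, int z) {        // walls here: just the domain boundary
    return x < 0 || y < 0 || z < 0 || x >= N || y >= N || z >= N;
}

// One diffusion tick: every particle moves one cell along a random axis;
// moves into walls are rejected, so the gas respects the geometry.
void step(std::vector<Particle>& particles, std::mt19937& gen) {
    std::uniform_int_distribution<int> axis(0, 2), sign(0, 1);
    for (auto& p : particles) {
        int d[3] = {0, 0, 0};
        d[axis(gen)] = sign(gen) ? 1 : -1;
        int nx = p.x + d[0], ny = p.y + d[1], nz = p.z + d[2];
        if (!wall(nx, ny, nz)) { p.x = nx; p.y = ny; p.z = nz; }
    }
}
```

Starting all particles in one cell and ticking repeatedly, the cloud spreads out over time while never leaving the walled domain - which is exactly the qualitative behaviour the question asks for.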
Well, this is not an easy task!
First of all: you want to simulate basic diffusion OR the complete motion of the gas? The second case isn't easy at all, but you can get an idea here.
If you just want to diffuse a gas in a static environment, things are easier, but you can't measure the total pressure; your only variable will be the local concentration of the gas.
This phenomenon is governed by Fick's laws; the second one is probably what you are looking for.
Read about finite difference methods to understand how to discretize the diffusion equation.
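As a hedged starting point, here is a 1D explicit (FTCS) finite-difference step for Fick's second law, dc/dt = D * d2c/dx2; the function name and the no-flux wall treatment are illustrative, and the scheme is only stable for D*dt/dx^2 <= 0.5:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One explicit time step of the 1D diffusion equation. Wall neighbours
// are mirrored (zero flux), so gas never leaks through a wall cell.
void diffuseStep(std::vector<double>& c, const std::vector<bool>& isWall,
                 double D, double dt, double dx) {
    double k = D * dt / (dx * dx);      // must be <= 0.5 for stability
    std::vector<double> next = c;
    for (size_t i = 1; i + 1 < c.size(); ++i) {
        if (isWall[i]) continue;
        double left  = isWall[i - 1] ? c[i] : c[i - 1];
        double right = isWall[i + 1] ? c[i] : c[i + 1];
        next[i] = c[i] + k * (left - 2.0 * c[i] + right);
    }
    c = next;
}
```

Because wall neighbours are mirrored, the exchange between any two open cells is antisymmetric, so the total mass in a closed box is conserved exactly while an initial concentration spike smooths out.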
The subject is quite big to write a complete answer here.

Finding curvature from a noisy set of data points using 2D/3D splines? (C++)

I am trying to extract the curvature of a pulse along its profile (see the picture below). The pulse is calculated on a grid of length and height: 150 x 100 cells by using Finite Differences, implemented in C++.
I extracted all the points with the same value (contour/ level set) and marked them as the red continuous line in the picture below. The other colors are negligible.
Then I tried to find the curvature from this already noisy (due to grid discretization) contour line by the following means:
(moving average already applied)
1) Curvature via Tangents
The curvature of the line at point P is defined by:
So the curvature is the limit of the angle delta over the arc length between P and N. Since my points have a certain distance between them, I could not approximate the limit closely enough, so the curvature was not calculated correctly. I tested it with a circle, which naturally has a constant curvature, but I could not reproduce this (only 1 significant digit was correct).
2) Second derivative of the line parametrized by arclength
I calculated the first derivative of the line with respect to arclength, smoothed with a moving average and then took the derivative again (2nd derivative). But here I also got only 1 significant digit correct.
Unfortunately, taking a derivative amplifies the already inherent noise.
3) Approximating the line locally with a circle
Since the reciprocal of the circle radius is the curvature I used the following approach:
This worked best so far (2 correct significant digits), but I need to refine even further. So my new idea is the following:
Instead of using the values at the discrete points to determine the curvature, I want to approximate the pulse profile with a 3 dimensional spline surface. Then I extract the level set of a certain value from it to gain a smooth line of points, which I can find a nice curvature from.
So far I could not find a C++ library which can generate such a Bezier spline surface. Could you maybe point me to any?
Also do you think this approach is worth giving a shot, or will I lose too much accuracy in my curvature?
Do you know of any other approach?
With very kind regards,
Jan
edit: It seems I can not post pictures as a new user, so I removed all of them from my question, even though I find them important to explain my issue. Is there any way I can still show them?
edit2: ok, done :)
There is ALGLIB, which supports various flavours of interpolation:
Polynomial interpolation
Rational interpolation
Spline interpolation
Least squares fitting (linear/nonlinear)
Bilinear and bicubic spline interpolation
Fast RBF interpolation/fitting
I don't know whether it meets all of your requirements. I personally have not worked with this library yet, but I believe cubic spline interpolation could be what you are looking for (twice differentiable).
In order to prevent overfitting to your noisy input points, you should apply some sort of smoothing mechanism; e.g. you could try whether things like Moving Window Average, Gaussian, or FIR filters are applicable. Also have a look at (cubic) smoothing splines.
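On approach 3 from the question (locally fitting a circle): the radius of the circle through three nearby contour points can be computed in closed form, which avoids an explicit fit. A sketch (names are mine) using the Menger curvature, kappa = 4 * area / (|ab| * |bc| * |ca|):

```cpp
#include <cassert>
#include <cmath>

struct Pt { double x, y; };

// Curvature of the circle through three points a, b, c.
double menger(Pt a, Pt b, Pt c) {
    double ab = std::hypot(b.x - a.x, b.y - a.y);
    double bc = std::hypot(c.x - b.x, c.y - b.y);
    double ca = std::hypot(a.x - c.x, a.y - c.y);
    // cross product gives twice the signed triangle area
    double cross = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    return 2.0 * std::fabs(cross) / (ab * bc * ca);   // 4*area / (ab*bc*ca)
}
```

For three points sampled from an exact circle of radius R this returns exactly 1/R; on a noisy contour you would feed it smoothed, well-separated point triples.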

How to go about benchmarking a software rasterizer

OK, I've been developing a software rasterizer for some time now, but have no idea how to go about benchmarking it to see if it's actually any good. I mean, say you can render X vertices at Y frames per second - what would be a good way to analyse this data to see if it's any good, rather than someone just saying
"30 fps with 1 light is good"?
What do you want to measure? I suggest fillrate and triangle rate. Basically, fillrate is how many pixels your rasterizer can spit out each second; triangle rate is how many triangles your rasterizer + affine transformation functions can push out each second, independent of the fillrate. Here's my suggestion for measuring both:
To measure the fillrate without getting noise from the time used for triangle setup, use only two triangles, which form a quad. Start with a small size, and then increase it in small intervals. You should eventually find the size at which rendering takes about one second. If you don't, you can perform blending with full-screen triangle pairs, which is a pretty slow operation and which only burns fillrate. The fillrate is then the width x height of your rendered triangles per second. For example, 4 megapixels/second.
To measure the triangle rate, do the same thing, only with many small triangles this time. Start with two tiny triangles, and increase the number of triangles until the rendering time reaches one second. The time used by the triangle/transformation setup is much more apparent with small triangles than the time used to fill them. The unit is triangles/second.
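As a sketch of the harness itself (the `Framebuffer` and `fillQuad` names are stand-ins for your rasterizer's own types and draw call):

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

struct Framebuffer { int width, height; std::vector<uint32_t> pixels; };

// Stand-in for the rasterizer's two-triangle full-screen quad draw.
void fillQuad(Framebuffer& fb, uint32_t color) {
    for (auto& p : fb.pixels) p = color;
}

// Draw the quad repeatedly for roughly 'seconds' and report pixels/second.
double measureFillrate(Framebuffer& fb, double seconds) {
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    long long pixelsDrawn = 0;
    while (std::chrono::duration<double>(clock::now() - start).count() < seconds) {
        fillQuad(fb, 0xFF00FF00u);
        pixelsDrawn += (long long)fb.width * fb.height;
    }
    double elapsed = std::chrono::duration<double>(clock::now() - start).count();
    return pixelsDrawn / elapsed;     // pixels per second
}
```

The same loop, drawing many tiny triangles instead of one full-screen quad, measures the triangle rate.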
Also, the overall time used to render a frame might be comparable too. The render time for a frame is the difference in the global time between frames, i.e. the delta time. The reciprocal of the delta time is the number of frames per second, if that delta time were constant across frames.
Of course, for these numbers to be half-way comparable across rasterizers, you have to use the same techniques and features. Comparing numbers from a rasterizer which uses per-pixel lighting against another which uses flat-shading doesn't make much sense. Resolution and color depth should also be equal.
As for optimization, getting a proper profiler should do the trick. GCC has the GNU profiler, gprof. If you want an opinion on clever things to optimize in a rasterizer, ask that as a separate question. I'll answer to the best of my ability.
If you want to determine if it's "any good" you will need to compare your rasterizer with other rasterizers. "30 fps with 1 light" might be extremely good, if no-one else has ever managed to go beyond, say, 10 fps.