How to port and debug math functions into shader code? - glsl

I'm trying to learn to apply the techniques from Painting a Landscape with Maths to render a landscape with SDFs. In the video, Quilez shows this formula:
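The formula itself was an image and is not reproduced here. Reconstructed from the comments and code below, it reads as follows. It describes what the code is meant to compute and may not match the notation in the video exactly:

```latex
\begin{aligned}
n(x,z) &= a + (b-a)\,S(x-i) + (c-a)\,S(z-j) + (a-b-c+d)\,S(x-i)\,S(z-j),\\
&\quad i = \lfloor x \rfloor,\; j = \lfloor z \rfloor,\; a = a_{i,j},\; b = a_{i+1,j},\; c = a_{i,j+1},\; d = a_{i+1,j+1},\\
S(\lambda) &= 3\lambda^2 - 2\lambda^3,\\
a_{i,j} &= 2\{uv(u+v)\} - 1,\quad u = 500\{i/\pi\},\; v = 500\{j/\pi\}
\end{aligned}
```

Here {·} is the fractional part (GLSL fract), and the factor 500 is the s used in the code.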
But I seem to be implementing it incorrectly in GLSL. I'm very new to shaders. This is what I have so far:
#define PI 3.141593
// a_ij = 2 {uv(u+v)} - 1
float uv_to_cooef(in vec2 uv) {
return 2.0 * fract(uv.x * uv.y * (uv.x + uv.y)) - 1.0;
}
// S(λ) = 3λ^2 - 2λ^3, i.e. smoothstep(0.0, 1.0, λ)
float smoothstep01(in float x) {
return smoothstep(0.0, 1.0, x);
}
float terrain(in vec2 p) {
vec2 ij = floor(p);
vec2 i = vec2(1.0, 0.0);
vec2 j = vec2(0.0, 1.0);
float s = 500.0;
float a = uv_to_cooef(s * fract(ij / PI));
float b = uv_to_cooef(s * fract((ij + i) / PI));
float c = uv_to_cooef(s * fract((ij + j) / PI));
float d = uv_to_cooef(s * fract((ij + i + j) / PI));
float s_xi = smoothstep01(p.x - ij.x);
float s_yj = smoothstep01(p.y - ij.y);
return a
+ (b - a) * s_xi
+ (c - a) * s_yj;
+ (a - b - c + d) * s_xi * s_yj;
}
// Use raymarching setup in https://www.shadertoy.com/view/tdS3DG but change map to:
vec2 map( in vec3 p, int id ) {
return vec2(p.y - terrain(p.xz), 1.0);
}
I get discontinuous junk.
What are some good ways to debug math functions converted to GLSL?
Replacing map with a simpler continuous function produces nice results:
vec2 map( in vec3 p, int id ) {
float d2 = p.y + sin(p.x / 3.0) * cos(p.y / 7.0);
return vec2(d2,2.0);
}
So I think that means my function isn't continuous, but how do I debug that? Break it into 2D functions and graph them in Desmos?
The completed SDF scene is on Shadertoy, but I want to understand how to get from the math to that.

It turns out my problem was an extra semicolon, and GLSL didn't complain about the pointless statement on the next line:
return a
+ (b - a) * s_xi
+ (c - a) * s_yj; // bad semicolon
+ (a - b - c + d) * s_xi * s_yj;
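C++ follows the same expression-statement rule, so the bug can be reproduced on the CPU. In this minimal sketch the function names are made up for illustration. The second function compiles, but the stray semicolon ends the return statement, which turns the bilinear term into a separate, unreachable statement. C++ compilers typically only warn about the discarded value (for example with -Wall), so compiling a CPU port with warnings enabled is one way to surface this kind of mistake.

```cpp
// Hypothetical CPU copies of the blend at the end of terrain().
float blendFixed(float a, float b, float c, float d, float sx, float sy) {
    return a
        + (b - a) * sx
        + (c - a) * sy
        + (a - b - c + d) * sx * sy;
}

float blendBuggy(float a, float b, float c, float d, float sx, float sy) {
    return a
        + (b - a) * sx
        + (c - a) * sy;               // stray semicolon ends the return here...
        + (a - b - c + d) * sx * sy;  // ...so this is a separate, unreachable statement
}
```

At the far corner of a cell (sx = sy = 1) the blend should return d. With a = b = c = 0 and d = 1, blendFixed returns 1 and blendBuggy returns 0.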
The steps I took to try to debug:
Visualize the function in another format. I ported the function to Desmos, and it was discontinuous.
I fixed my Desmos version (the definition of a(i,j) was wrong), but couldn't see how the fixed version differed from my GLSL code. My next step was going to be to reduce the function to a simpler continuous one in Desmos and then apply the same changes to my GLSL code.
Instead, I noodled around with other code and noticed the semicolon by accident. So luck?
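A less luck-dependent approach is to port the function to the CPU and measure the jumps numerically. The sketch below is only an illustration. corner() is a hypothetical stand-in for uv_to_cooef() that reads a small fixed table so the numbers are easy to check by hand, and the buggy flag switches between the full blend and the one truncated by the stray semicolon. maxEdgeJump() compares the height just left and just right of each vertical cell edge. For the full formula the jump is essentially zero. Dropping the last term leaves a jump of |a - b - c + d| · S(y - j) at each edge.

```cpp
#include <cmath>

// Hypothetical stand-in for uv_to_cooef(): corner heights come from a fixed
// 4x4 table that repeats every 4 cells, so values are easy to check by hand.
float corner(int i, int j) {
    static const float table[4][4] = {
        { 0.2f, -0.7f,  0.9f, -0.1f},
        {-0.5f,  0.4f, -0.3f,  0.8f},
        { 0.6f, -0.9f,  0.1f, -0.4f},
        {-0.2f,  0.7f, -0.8f,  0.3f}};
    return table[((i % 4) + 4) % 4][((j % 4) + 4) % 4];
}

// S(t) = 3t^2 - 2t^3; matches smoothstep(0.0, 1.0, t) for t in [0, 1].
float smooth01(float t) { return t * t * (3.0f - 2.0f * t); }

// CPU port of terrain(); buggy == true drops the last term, like the stray semicolon.
float terrain(float x, float y, bool buggy) {
    float fi = std::floor(x), fj = std::floor(y);
    int i = static_cast<int>(fi), j = static_cast<int>(fj);
    float a = corner(i, j), b = corner(i + 1, j);
    float c = corner(i, j + 1), d = corner(i + 1, j + 1);
    float sx = smooth01(x - fi), sy = smooth01(y - fj);
    float h = a + (b - a) * sx + (c - a) * sy;
    if (!buggy) h += (a - b - c + d) * sx * sy;
    return h;
}

// Largest height difference across the cell edges x = 1..cells, along row y.
float maxEdgeJump(float y, int cells, bool buggy) {
    const float eps = 1e-4f;
    float worst = 0.0f;
    for (int k = 1; k <= cells; ++k) {
        float left = terrain(k - eps, y, buggy);
        float right = terrain(k + eps, y, buggy);
        worst = std::fmax(worst, std::fabs(right - left));
    }
    return worst;
}
```

Along y = 1.5 (so S = 0.5), the full formula gives a jump near zero at every edge. The truncated one jumps by about 1.15 at x = 1.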

Related

How does Mathf.SmoothDamp() work? What is its algorithm?

I was wondering how SmoothDamp works in Unity. I'm trying to re-create the function outside Unity, but I don't know how it works.
From the Unity C# reference source code:
// Gradually changes a value towards a desired goal over time.
public static float SmoothDamp(float current, float target, ref float currentVelocity, float smoothTime, [uei.DefaultValue("Mathf.Infinity")] float maxSpeed, [uei.DefaultValue("Time.deltaTime")] float deltaTime)
{
// Based on Game Programming Gems 4 Chapter 1.10
smoothTime = Mathf.Max(0.0001F, smoothTime);
float omega = 2F / smoothTime;
float x = omega * deltaTime;
float exp = 1F / (1F + x + 0.48F * x * x + 0.235F * x * x * x);
float change = current - target;
float originalTo = target;
// Clamp maximum speed
float maxChange = maxSpeed * smoothTime;
change = Mathf.Clamp(change, -maxChange, maxChange);
target = current - change;
float temp = (currentVelocity + omega * change) * deltaTime;
currentVelocity = (currentVelocity - omega * temp) * exp;
float output = target + (change + temp) * exp;
// Prevent overshooting
if (originalTo - current > 0.0F == output > originalTo)
{
output = originalTo;
currentVelocity = (output - originalTo) / deltaTime;
}
return output;
}
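What the code implements is a critically damped spring (the Game Programming Gems 4 chapter cited in the comment). omega = 2 / smoothTime sets how stiff the spring is, and exp = 1 / (1 + x + 0.48x² + 0.235x³) is a cheap approximation of e^(-x), the decay over one frame. Since the question is about re-creating it outside Unity, here is a sketch of a direct C++ port. Putting deltaTime before maxSpeed, and defaulting maxSpeed to infinity, are choices made for this sketch, not Unity's API.

```cpp
#include <algorithm>
#include <limits>

// Sketch of a C++ port of the Unity SmoothDamp code above.
// `velocity` plays the role of the C# `ref float currentVelocity`.
float smoothDamp(float current, float target, float& velocity, float smoothTime,
                 float deltaTime,
                 float maxSpeed = std::numeric_limits<float>::infinity()) {
    smoothTime = std::max(0.0001f, smoothTime);
    float omega = 2.0f / smoothTime;  // spring frequency derived from smoothTime
    float x = omega * deltaTime;
    // Rational approximation of exp(-x): decay factor for this frame.
    float decay = 1.0f / (1.0f + x + 0.48f * x * x + 0.235f * x * x * x);
    float change = current - target;
    float originalTo = target;

    // Clamp the maximum speed by limiting how far away the target may be.
    float maxChange = maxSpeed * smoothTime;
    change = std::min(std::max(change, -maxChange), maxChange);
    target = current - change;

    float temp = (velocity + omega * change) * deltaTime;
    velocity = (velocity - omega * temp) * decay;
    float output = target + (change + temp) * decay;

    // Prevent overshooting: if the step passed the original target, snap to it.
    if ((originalTo - current > 0.0f) == (output > originalTo)) {
        output = originalTo;
        velocity = (output - originalTo) / deltaTime;  // evaluates to 0 when deltaTime > 0
    }
    return output;
}
```

Called once per frame with the same velocity variable, the value eases toward the target without overshooting it.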

Optimizing sphere fold function (GLSL)

There is a sphere fold function, which transforms space based on the distance from the origin. It has two constant parameters, fR2 and mR2. The function looks like this:
const float fR2 = 1.0;
const float mR2 = 0.25;
vec3 s_fold(in vec3 v) {
float mag = dot(v, v);
if (mag < mR2)
{
v = v * fR2 / mR2;
}
else if (mag < fR2)
{
v = v * fR2 / mag;
}
return v;
}
It works, but because of the if-else branch, it runs slowly. I can get rid of the branching by using the step function:
float a = step(mR2 , mag);
float b = step(fR2 , mag);
float sc = dot( vec3(
1.0-a,
a*(1.0-b),
b
), vec3(
fR2 / mR2,
fR2 / mag,
1.0)
);
return sc*v;
It is now a bit faster, but I would like to optimize it further. I've found a one-line solution, but it gives me a different result:
return v*clamp(max(mR2/mag,mR2),0.0,fR2);
It is unclear to me how it can be computed with a single clamp.
Your if-else branch computes res = v * F, where F is one of three possible values: a constant (fR2/mR2), a (hyperbolic) variable (fR2/mag), and a constant again (1.0).
clamp(x, minVal, maxVal) does this same logic (const-var-const). So you can write:
res = v * clamp(fR2/mag, 1.0, fR2/mR2); //with fR2 >= mR2
Let's write it another way:
res = v * fR2 * clamp(1.0/mag, 1.0/fR2, 1.0/mR2);
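We can avoid two of the divisions: because 1/x is decreasing for x > 0, clamp(1.0/mag, 1.0/fR2, 1.0/mR2) equals 1.0/clamp(mag, mR2, fR2) (this relies on 0 < mR2 <= fR2):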
res = v * fR2 / clamp(mag, mR2, fR2);
The final code is
const float fR2 = 1.0;
const float mR2 = 0.25;
vec3 s_fold(in vec3 v) {
return v * fR2 / clamp(dot(v, v), mR2, fR2);
}
Both your step() version and the posted one-liner suffer a potential division by zero when mag = 0. (The final clamp version does not, since its divisor is at least mR2.)
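A quick way to convince yourself that the versions agree is to port them to the CPU and compare. The sketch below uses a minimal, hypothetical Vec3 struct in place of GLSL's vec3 and the same constants (fR2 = 1, mR2 = 0.25). It picks one test vector from each of the three regions. It also tries the zero vector, where the step() version computes 0 * (1/0) = NaN, while the clamp version simply divides by mR2.

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };  // minimal stand-in for GLSL vec3

const float fR2 = 1.0f;
const float mR2 = 0.25f;

float dot3(Vec3 v) { return v.x * v.x + v.y * v.y + v.z * v.z; }
Vec3 scaled(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }
float step(float edge, float x) { return x < edge ? 0.0f : 1.0f; }  // GLSL step()

// Original if/else version.
Vec3 foldBranch(Vec3 v) {
    float mag = dot3(v);
    if (mag < mR2) return scaled(v, fR2 / mR2);
    if (mag < fR2) return scaled(v, fR2 / mag);
    return v;
}

// step() version: exactly one of the three selector weights is 1.
Vec3 foldStep(Vec3 v) {
    float mag = dot3(v);
    float a = step(mR2, mag), b = step(fR2, mag);
    float sc = (1.0f - a) * (fR2 / mR2) + a * (1.0f - b) * (fR2 / mag) + b * 1.0f;
    return scaled(v, sc);
}

// Clamp version from the answer: v * fR2 / clamp(dot(v, v), mR2, fR2).
Vec3 foldClamp(Vec3 v) {
    float c = std::min(std::max(dot3(v), mR2), fR2);  // GLSL clamp()
    return scaled(v, fR2 / c);
}
```

For these inputs all three versions return bit-identical results, because each region multiplies by the same factor (4, 1/mag, or 1).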

Perlin Noise algorithm does not seem to produce gradient noise

I am attempting to implement Perlin noise in C++.
First, the problem (I think) is that the output is not what I expect. Currently I simply write the generated Perlin noise values into a greyscale image, and these are the results I get:
However, from my understanding, it's supposed to look more along the lines of:
That is, the noise I am producing currently seems to be more along the lines of "standard" irregular noise.
This is the Perlin Noise Algorithm I have implemented so far:
float perlinNoise2D(float x, float y)
{
// Find grid cell coordinates
int x0 = (x > 0.0f ? static_cast<int>(x) : (static_cast<int>(x) - 1));
int x1 = x0 + 1;
int y0 = (y > 0.0f ? static_cast<int>(y) : (static_cast<int>(y) - 1));
int y1 = y0 + 1;
float s = calculateInfluence(x0, y0, x, y);
float t = calculateInfluence(x1, y0, x, y);
float u = calculateInfluence(x0, y1, x, y);
float v = calculateInfluence(x1, y1, x, y);
// Local position in the grid cell
float localPosX = 3 * ((x - (float)x0) * (x - (float)x0)) - 2 * ((x - (float)x0) * (x - (float)x0) * (x - (float)x0));
float localPosY = 3 * ((y - (float)y0) * (y - (float)y0)) - 2 * ((y - (float)y0) * (y - (float)y0) * (y - (float)y0));
float a = s + localPosX * (t - s);
float b = u + localPosX * (v - u);
return lerp(a, b, localPosY);
}
The function calculateInfluence has the job of generating the random gradient vector and distance vector for one of the corner points of the current grid cell and returning the dot product of these. It is implemented as:
float calculateInfluence(int xGrid, int yGrid, float x, float y)
{
// Calculate gradient vector
float gradientXComponent = dist(rdEngine);
float gradientYComponent = dist(rdEngine);
// Normalize gradient vector
float magnitude = sqrt( pow(gradientXComponent, 2) + pow(gradientYComponent, 2) );
gradientXComponent = gradientXComponent / magnitude;
gradientYComponent = gradientYComponent / magnitude;
magnitude = sqrt(pow(gradientXComponent, 2) + pow(gradientYComponent, 2));
// Calculate distance vectors
float dx = x - (float)xGrid;
float dy = y - (float)yGrid;
// Compute dot product
return (dx * gradientXComponent + dy * gradientYComponent);
}
Here, dist is a random number generator from C++11:
std::mt19937 rdEngine(1);
std::normal_distribution<float> dist(0.0f, 1.0f);
And lerp is simply implemented as:
float lerp(float v0, float v1, float t)
{
return ( 1.0f - t ) * v0 + t * v1;
}
To implement the algorithm, I primarily made use of the following two resources:
Perlin Noise FAQ
Perlin Noise Pseudo Code
It's difficult for me to pinpoint exactly where I'm messing up. It could be that I am generating the gradient vectors incorrectly, as I'm not quite sure what distribution they should have. I have tried a uniform distribution; however, this seemed to generate repeating patterns in the texture!
Likewise, it could be that I am averaging the influence values incorrectly. It has been a bit difficult to discern exactly how it should be done from the Perlin Noise FAQ article.
Does anyone have any hints as to what might be wrong with the code? :)
It seems like you are only generating a single octave of Perlin Noise. To get a result like the one shown, you need to generate multiple octaves and add them together. In a series of octaves, each octave should have a grid cell size double that of the last.
To generate multi-octave noise, use something similar to this:
float multiOctavePerlinNoise2D(float x, float y, int octaves)
{
float v = 0.0f;
float scale = 1.0f;
float weight = 1.0f;
float weightTotal = 0.0f;
for(int i = 0; i < octaves; i++)
{
v += perlinNoise2D(x * scale, y * scale) * weight;
weightTotal += weight;
// "ever-increasing frequencies and ever-decreasing amplitudes"
// (or conversely decreasing freqs and increasing amplitudes)
scale *= 0.5f;
weight *= 2.0f;
}
return v / weightTotal;
}
For extra randomness you could use a differently seeded random generator for each octave. Also, the weights given to each octave can be varied to adjust the aesthetic quality of the noise. If the weight variable is not adjusted each iteration, then the example above is "pink noise" (each doubling of frequency carries the same weight).
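Also, calculateInfluence() draws a new random gradient from rdEngine on every call, so a grid corner gets a different gradient for each cell (and each pixel) that touches it. You need a random number generator that returns the same value each time for a given xGrid, yGrid pair.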
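Here is a minimal sketch of that idea. It assumes you replace the std::mt19937 draws with a hash of the grid coordinates. hash2() is an arbitrary integer hash picked for illustration (any decent one would do), and its output is turned into a gradient angle. Because the hash depends only on the corner, both cells that share a corner now see the same unit gradient. The rest mirrors perlinNoise2D(), using std::floor for the cell index.

```cpp
#include <cmath>
#include <cstdint>

// Illustrative integer hash: the same (x, y) always gives the same bits.
std::uint32_t hash2(int x, int y) {
    std::uint32_t h = static_cast<std::uint32_t>(x) * 374761393u +
                      static_cast<std::uint32_t>(y) * 668265263u;
    h = (h ^ (h >> 13)) * 1274126177u;
    return h ^ (h >> 16);
}

// Dot product of the corner's unit gradient with the offset from the corner to (x, y).
float calculateInfluence(int xGrid, int yGrid, float x, float y) {
    float angle = static_cast<float>(hash2(xGrid, yGrid)) * (6.2831853f / 4294967296.0f);
    return (x - xGrid) * std::cos(angle) + (y - yGrid) * std::sin(angle);
}

float fade(float t) { return t * t * (3.0f - 2.0f * t); }  // 3t^2 - 2t^3

float perlinNoise2D(float x, float y) {
    int x0 = static_cast<int>(std::floor(x));
    int y0 = static_cast<int>(std::floor(y));
    float sx = fade(x - x0), sy = fade(y - y0);
    float s = calculateInfluence(x0, y0, x, y);
    float t = calculateInfluence(x0 + 1, y0, x, y);
    float u = calculateInfluence(x0, y0 + 1, x, y);
    float v = calculateInfluence(x0 + 1, y0 + 1, x, y);
    float a = s + sx * (t - s);
    float b = u + sx * (v - u);
    return a + sy * (b - a);
}
```

With consistent gradients the noise is 0 at every lattice point, varies smoothly across cell edges, and returns the same value for the same input on every call.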

Making a 3D graphics engine, my Translation matrix doesn't work for positions equal to 0

Hi, I'm making a 3D graphics engine for an assignment that is due later tonight. It's going smoothly except for one thing: I'm loading a cube model from an .obj file, and the positions start at 0.
My transformation matrix works for coordinates that aren't 0. For example, if X = 0 and I try to translate it by 10 along the X axis, it returns 0.
Matrix * Vector:
Vec4 Mat4::operator*(const Vec4& v) const
{
Vec4 tmp(0, 0, 0, 0, 255, 255, 255, 255);
tmp.x = (this->data[0] * v.x) + (this->data[4] * v.y) + (this->data[8] * v.z) + (this->data[12] * v.w);
tmp.y = (this->data[1] * v.x) + (this->data[5] * v.y) + (this->data[9] * v.z) + (this->data[13] * v.w);
tmp.z = (this->data[2] * v.x) + (this->data[6] * v.y) + (this->data[10] * v.z) + (this->data[14] * v.w);
tmp.w = (this->data[3] * v.x) + (this->data[7] * v.y) + (this->data[11] * v.z) + (this->data[15] * v.w);
return tmp;
}
Translate Matrix:
Mat4 Mat4::translate(float x, float y, float z)
{
Mat4 tmp;
tmp.data[12] = x;
tmp.data[13] = y;
tmp.data[14] = z;
return tmp;
}
A Mat4 class by default is an identity matrix.
It is too late now, but... it might be helpful to know the following:
A vector strictly equal to 0.0 (e.g. <0,0,0,0>) cannot be translated using matrix multiplication and technically should not be considered a position in this context. In fact, such a vector is not even representative of a direction because it has 0 length. It is simply zero; there are not a whole lot of uses for a vector that cannot be rotated or translated.
You can rotate vectors with 0.0 for the W coordinate, but the value 0.0 for W prevents translation.
Generally you want a W coordinate of 1.0 for spatial (e.g. position) vectors and 0.0 for directional (e.g. normal).
If you want to understand this better, you need to consider how your 4x4 matrix is setup. The first 3 rows or columns (depending on which convention you use) store rotation, and the 4th stores translation.
Consider how translation is applied when you multiply your matrix and vector:
x = ... + (this->data[12] * v.w);
y = ... + (this->data[13] * v.w);
z = ... + (this->data[14] * v.w);
w = ... + (this->data[15] * v.w);
If v.w is 0.0, then translation evaluates to 0.0 for all coordinates.
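A minimal sketch mirroring the question's column-major data[] layout. The Vec4 here drops the question's color fields for brevity. Translating the point (0, 0, 0, 1) by 10 on X moves it. The same vector with w = 0 stays put, and so does a direction such as (1, 0, 0, 0).

```cpp
#include <array>

// Simplified Vec4 (the question's version also carries color fields).
struct Vec4 { float x, y, z, w; };

// Column-major 4x4 matrix like the question's: data[12..14] hold the
// translation, and a default-constructed Mat4 is the identity.
struct Mat4 {
    std::array<float, 16> data{1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1};

    static Mat4 translate(float x, float y, float z) {
        Mat4 m;
        m.data[12] = x;
        m.data[13] = y;
        m.data[14] = z;
        return m;
    }

    // Same arithmetic as the question's operator*: translation is scaled by v.w.
    Vec4 operator*(const Vec4& v) const {
        const std::array<float, 16>& d = data;
        return {d[0] * v.x + d[4] * v.y + d[8] * v.z + d[12] * v.w,
                d[1] * v.x + d[5] * v.y + d[9] * v.z + d[13] * v.w,
                d[2] * v.x + d[6] * v.y + d[10] * v.z + d[14] * v.w,
                d[3] * v.x + d[7] * v.y + d[11] * v.z + d[15] * v.w};
    }
};
```

For the .obj loader, this means building position vectors with w = 1.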

Efficient Bicubic filtering code in GLSL?

I'm wondering if anyone has complete, working, and efficient code to do bicubic texture filtering in GLSL. There is this:
http://www.codeproject.com/Articles/236394/Bi-Cubic-and-Bi-Linear-Interpolation-with-GLSL
or
https://github.com/visionworkbench/visionworkbench/blob/master/src/vw/GPU/Shaders/Interp/interpolation-bicubic.glsl
but both do 16 texture reads where only 4 are necessary:
https://groups.google.com/forum/#!topic/comp.graphics.api.opengl/kqrujgJfTxo
However, the method above calls a "cubic()" function that is not defined there, so I don't know what it is supposed to do, and it also takes an unexplained "texscale" parameter.
There is also the NVidia version:
https://developer.nvidia.com/gpugems/gpugems2/part-iii-high-quality-rendering/chapter-20-fast-third-order-texture-filtering
but I believe this uses CUDA, which is specific to NVidia's cards. I need GLSL.
I could probably port the NVidia version to GLSL, but thought I'd ask first to see if anyone already has a complete, working GLSL bicubic shader.
I found this implementation, which can be used as a drop-in replacement for texture() (from http://www.java-gaming.org/index.php?topic=35123.0, with one typo fixed):
// from http://www.java-gaming.org/index.php?topic=35123.0
vec4 cubic(float v){
vec4 n = vec4(1.0, 2.0, 3.0, 4.0) - v;
vec4 s = n * n * n;
float x = s.x;
float y = s.y - 4.0 * s.x;
float z = s.z - 4.0 * s.y + 6.0 * s.x;
float w = 6.0 - x - y - z;
return vec4(x, y, z, w) * (1.0/6.0);
}
vec4 textureBicubic(sampler2D sampler, vec2 texCoords){
vec2 texSize = vec2(textureSize(sampler, 0)); // textureSize() returns an ivec2
vec2 invTexSize = 1.0 / texSize;
texCoords = texCoords * texSize - 0.5;
vec2 fxy = fract(texCoords);
texCoords -= fxy;
vec4 xcubic = cubic(fxy.x);
vec4 ycubic = cubic(fxy.y);
vec4 c = texCoords.xxyy + vec2 (-0.5, +1.5).xyxy;
vec4 s = vec4(xcubic.xz + xcubic.yw, ycubic.xz + ycubic.yw);
vec4 offset = c + vec4 (xcubic.yw, ycubic.yw) / s;
offset *= invTexSize.xxyy;
vec4 sample0 = texture(sampler, offset.xz);
vec4 sample1 = texture(sampler, offset.yz);
vec4 sample2 = texture(sampler, offset.xw);
vec4 sample3 = texture(sampler, offset.yw);
float sx = s.x / (s.x + s.y);
float sy = s.z / (s.z + s.w);
return mix(
mix(sample3, sample2, sx), mix(sample1, sample0, sx)
, sy);
}
[Image: example of nearest, bilinear, and bicubic filtering]
The ImageData of this image (4×4 pixels, RGB) is:
{{{0.698039, 0.996078, 0.262745}, {0., 0.266667, 1.}, {0.00392157, 0.25098, 0.996078}, {1., 0.65098, 0.}},
 {{0.996078, 0.823529, 0.}, {0.498039, 0., 0.00392157}, {0.831373, 0.00392157, 0.00392157}, {0.956863, 0.972549, 0.00784314}},
 {{0.909804, 0.00784314, 0.}, {0.87451, 0.996078, 0.0862745}, {0.196078, 0.992157, 0.760784}, {0.00392157, 0.00392157, 0.498039}},
 {{1., 0.878431, 0.}, {0.588235, 0.00392157, 0.00392157}, {0.00392157, 0.0666667, 0.996078}, {0.996078, 0.517647, 0.}}}
I tried to reproduce this (along with many other interpolation techniques), but the reference uses clamped padding, while I have repeating (wrapping) boundaries, so the results are not exactly the same.
It seems this bicubic filter is not a proper interpolation, i.e. it does not take on the original values at the points where the data is defined (this is a property of the cubic B-spline weights it uses).
I decided to take a minute to dig my old Perforce activities and found the missing cubic() function; enjoy! :)
vec4 cubic(float v)
{
vec4 n = vec4(1.0, 2.0, 3.0, 4.0) - v;
vec4 s = n * n * n;
float x = s.x;
float y = s.y - 4.0 * s.x;
float z = s.z - 4.0 * s.y + 6.0 * s.x;
float w = 6.0 - x - y - z;
return vec4(x, y, z, w);
}
Wow. I recognize the code above (I can't comment with reputation < 50), as I came up with it in early 2011. The problem I was trying to solve was related to an old IBM T42 laptop (sorry, the exact model number escapes me) and its ATI graphics stack. I developed the code on an NV card and originally used 16 texture fetches. That was kind of slow but fast enough for my purposes. When someone reported it did not work on his laptop, it became apparent that the laptop did not support enough texture fetches per fragment. I had to engineer a workaround, and the best I could come up with was to do it with a number of texture fetches that would work.
I thought about it like this: okay, so if I handle each quad (2x2) with a linear filter, the remaining question is whether the rows and columns can share the weights. That was the only problem on my mind when I set out to craft the code. Of course they could be shared; the weights are the same for each column and row; perfect!
Now I had four samples. The remaining problem was how to correctly combine the samples. That was the biggest obstacle to overcome. It took about 10 minutes with pencil and paper. With trembling hands I typed the code in and it worked, nice. Then I uploaded the binaries to the guy who promised to check it out on his T42 (?) and he reported it worked. The end. :)
I can assure you that the equations check out and give results mathematically identical to computing the samples individually. FYI: on the CPU it's faster to do the horizontal and vertical passes separately. On the GPU, multiple passes are not a great idea, especially since they're probably not feasible anyway in the typical use case.
Food for thought: it is possible to use a texture lookup for the cubic() function. Which is faster depends on the GPU, but generally speaking the sampling is light on the ALU side, so just doing the arithmetic would balance things out. YMMV.
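The 1D version of this reasoning can be checked on the CPU. In this sketch, linearFetch() stands in for a GL_LINEAR texture fetch with texel k centered at coordinate k. The GLSL code instead places texel centers at k + 0.5, which is where its -0.5 and +1.5 offsets come from. Each pair of neighboring taps becomes one linear fetch, placed between the two texels at the ratio of their weights and scaled by the sum of the weights. Since the B-spline weights are all non-negative, each fetch position stays between its two texels. Applying the same step in x and y gives the four fetches of the 2D code.

```cpp
#include <cmath>
#include <vector>

// Stand-in for a GL_LINEAR fetch: texel k is centered at coordinate k.
float linearFetch(const std::vector<float>& tex, float x) {
    int i = static_cast<int>(std::floor(x));
    float t = x - i;
    return tex[i] * (1.0f - t) + tex[i + 1] * t;
}

// Cubic B-spline weights (same polynomials as the cubic() answers).
void bsplineWeights(float v, float w[4]) {
    float v2 = v * v, v3 = v2 * v;
    w[0] = (-v3 + 3 * v2 - 3 * v + 1) / 6;
    w[1] = (3 * v3 - 6 * v2 + 4) / 6;
    w[2] = (-3 * v3 + 3 * v2 + 3 * v + 1) / 6;
    w[3] = v3 / 6;
}

// Reference: four direct taps around texel i at fractional position v.
float cubicFourTaps(const std::vector<float>& tex, int i, float v) {
    float w[4];
    bsplineWeights(v, w);
    return w[0] * tex[i - 1] + w[1] * tex[i] + w[2] * tex[i + 1] + w[3] * tex[i + 2];
}

// Same result with two linear fetches, one per pair of taps.
float cubicTwoFetches(const std::vector<float>& tex, int i, float v) {
    float w[4];
    bsplineWeights(v, w);
    float g0 = w[0] + w[1], g1 = w[2] + w[3];  // both are at least 1/6
    float h0 = (i - 1) + w[1] / g0;            // lies in [i-1, i]
    float h1 = (i + 1) + w[3] / g1;            // lies in [i+1, i+2)
    return g0 * linearFetch(tex, h0) + g1 * linearFetch(tex, h1);
}
```

Apart from float rounding, both functions return the same value for any v in [0, 1).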
The missing function cubic() in JAre's answer could look like this:
vec4 cubic(float x)
{
float x2 = x * x;
float x3 = x2 * x;
vec4 w;
w.x = -x3 + 3.0*x2 - 3.0*x + 1.0;
w.y = 3.0*x3 - 6.0*x2 + 4.0;
w.z = -3.0*x3 + 3.0*x2 + 3.0*x + 1.0;
w.w = x3;
return w / 6.0;
}
It returns the four weights for cubic B-Spline.
It is all explained in NVidia Gems.
(EDIT)
Cubic() is a cubic spline function
Example:
texscale is a sampling-window size coefficient. You can start with a value of 1.0.
// "filter" is a reserved word in GLSL, so the function is named filterBicubic here
vec4 filterBicubic(sampler2D tex, vec2 texcoord, vec2 texscale)
{
float fx = fract(texcoord.x);
float fy = fract(texcoord.y);
texcoord.x -= fx;
texcoord.y -= fy;
vec4 xcubic = cubic(fx);
vec4 ycubic = cubic(fy);
vec4 c = vec4(texcoord.x - 0.5, texcoord.x + 1.5, texcoord.y - 0.5, texcoord.y + 1.5);
vec4 s = vec4(xcubic.x + xcubic.y, xcubic.z + xcubic.w, ycubic.x + ycubic.y, ycubic.z + ycubic.w);
vec4 offset = c + vec4(xcubic.y, xcubic.w, ycubic.y, ycubic.w) / s;
vec4 sample0 = texture2D(tex, vec2(offset.x, offset.z) * texscale);
vec4 sample1 = texture2D(tex, vec2(offset.y, offset.z) * texscale);
vec4 sample2 = texture2D(tex, vec2(offset.x, offset.w) * texscale);
vec4 sample3 = texture2D(tex, vec2(offset.y, offset.w) * texscale);
float sx = s.x / (s.x + s.y);
float sy = s.z / (s.z + s.w);
return mix(
mix(sample3, sample2, sx),
mix(sample1, sample0, sx), sy);
}
Source
For anybody interested in GLSL code to do tri-cubic interpolation, ray-casting code using cubic interpolation can be found in the examples/glCubicRayCast folder in:
http://www.dannyruijters.nl/cubicinterpolation/CI.zip
edit: The cubic interpolation code is now available on GitHub: a CUDA version, a WebGL version, and a GLSL sample.
I've been using @Maf's cubic spline recipe for over a year, and I recommend it if a cubic B-spline meets your needs.
But I recently realized that, for my particular application, it is important for the intensities to match exactly at the sample points. So I switched to using a Catmull-Rom spline, which uses a slightly different recipe like so:
// Catmull-Rom spline actually passes through control points
vec4 cubic(float x) // cubic_catmullrom(float x)
{
const float s = 0.5; // potentially adjustable parameter
float x2 = x * x;
float x3 = x2 * x;
vec4 w;
w.x = -s*x3 + 2.0*s*x2 - s*x + 0.0;
w.y = (2.0-s)*x3 + (s-3.0)*x2 + 1.0;
w.z = (s-2.0)*x3 + (3.0-2.0*s)*x2 + s*x + 0.0;
w.w = s*x3 - s*x2 + 0.0;
return w;
}
I found these coefficients, plus those for a number of other flavors of cubic splines, in the lecture notes at:
http://www.cs.cmu.edu/afs/cs/academic/class/15462-s10/www/lec-slides/lec06.pdf
I think it is possible that the Catmull version could be done with 4 texture lookups by (a) arranging the input texture like a chessboard with alternate slots saved as positives and as negatives, and (b) an associated modification of textureBicubic. That would rely on the contributions/weights w.x/w.w always being negative, and the contributions w.y/w.z always being positive. I haven't double-checked if this is true, or exactly how the modified textureBicubic would look.
... I have verified that w contributions do satisfy the +ve -ve rules.
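The two weight recipes are easy to compare on the CPU. This sketch ports both cubic() variants to C++: the B-spline weights from the earlier answers and the Catmull-Rom weights with s = 0.5 from the answer above. At x = 0 the Catmull-Rom weights are (0, 1, 0, 0), so the curve passes through the sample. The B-spline weights are (1/6, 4/6, 1/6, 0), which is why the B-spline filter does not reproduce the original texel values. The sign check covers the +ve/-ve pattern the 4-lookup idea relies on: on [0, 1], the outer Catmull-Rom weights are never positive and the inner ones are never negative.

```cpp
#include <array>

using Weights = std::array<float, 4>;

// Cubic B-spline weights (same polynomials as the cubic() answers above).
Weights bspline(float x) {
    float x2 = x * x, x3 = x2 * x;
    return {(-x3 + 3 * x2 - 3 * x + 1) / 6, (3 * x3 - 6 * x2 + 4) / 6,
            (-3 * x3 + 3 * x2 + 3 * x + 1) / 6, x3 / 6};
}

// Catmull-Rom weights with tension s (0.5 is the standard choice).
Weights catmullRom(float x, float s = 0.5f) {
    float x2 = x * x, x3 = x2 * x;
    return {-s * x3 + 2 * s * x2 - s * x, (2 - s) * x3 + (s - 3) * x2 + 1,
            (s - 2) * x3 + (3 - 2 * s) * x2 + s * x, s * x3 - s * x2};
}

// True if the outer weights are <= 0 and the inner ones >= 0
// at n + 1 evenly spaced points in [0, 1].
bool catmullRomSignsHold(int n) {
    for (int k = 0; k <= n; ++k) {
        Weights w = catmullRom(static_cast<float>(k) / n);
        if (w[0] > 0 || w[3] > 0 || w[1] < 0 || w[2] < 0) return false;
    }
    return true;
}
```

Both weight sets sum to 1 for any x, so both preserve constant signals. Only the Catmull-Rom set is interpolating.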