Linear/Nonlinear Texture Mapping a Distorted Quad - opengl

In my previous question, it was established that, when texturing a quad, the face is broken down into triangles and the texture coordinates are interpolated in an affine manner.
Unfortunately, I do not know how to fix that. The provided link was useful, but it doesn't give the desired effect. The author concludes: "Note that the image looks as if it's a long rectangular quad extending into the distance. . . . It can become quite confusing . . . because of the "false depth perception" that this produces."
What I would like to do is to have the texturing preserve the original scaling of the texture. For example, in the trapezoidal case, I want the vertical spacing of the texels to be the same (example created with paint program):
Notice that, by virtue of the vertical spacing being identical, yet the quad's obvious distortion, straight lines in texture space are no longer straight lines in world space. Thus, I believe the required mapping to be nonlinear.
The question is: is this even possible in the fixed function pipeline? I'm not even sure exactly what the "right answer" is for more general quads; I imagine that the interpolation functions could get very complicated very fast, and I realize that "preserve the original scaling" isn't exactly an algorithm. World-space triangles are no longer linear in texture space.
As an aside, I do not really understand the 3rd and 4th texture coordinates; if someone could point me to a resource, that would be great.

The best approach to a solution with modern GPU APIs can be found on Nathan Reed's blog.
But you end up with a problem similar to the original perspective problem :( I am trying to solve it and will post a solution once I have it.
Edit:
There is a working sample of quad texturing on Shadertoy.
Here is a simplified, modified version I did; all credit goes to Inigo Quilez:
float cross2d( in vec2 a, in vec2 b )
{
    return a.x*b.y - a.y*b.x;
}

// given a point p and a quad defined by four points {a,b,c,d}, return the bilinear
// coordinates of p in the quad. Returns (-1,-1) if the point is outside of the quad.
vec2 invBilinear( in vec2 p, in vec2 a, in vec2 b, in vec2 c, in vec2 d )
{
    vec2 e = b-a;
    vec2 f = d-a;
    vec2 g = a-b+c-d;
    vec2 h = p-a;

    float k2 = cross2d( g, f );
    float k1 = cross2d( e, f ) + cross2d( h, g );
    float k0 = cross2d( h, e );

    float k2u = cross2d( e, g );
    float k1u = cross2d( e, f ) + cross2d( g, h );
    float k0u = cross2d( h, f );

    float v1, u1, v2, u2;

    float w = k1*k1 - 4.0*k0*k2;
    if( w < 0.0 ) return vec2(-1.0);   // no real solution: point is outside the quad
    w = sqrt( w );

    // first root of the quadratic
    v1 = (-k1 - w)/(2.0*k2);
    u1 = (-k1u - w)/(2.0*k2u);
    bool b1 = v1>0.0 && v1<1.0 && u1>0.0 && u1<1.0;
    if( b1 )
        return vec2( u1, v1 );

    // second root of the quadratic
    v2 = (-k1 + w)/(2.0*k2);
    u2 = (-k1u + w)/(2.0*k2u);
    bool b2 = v2>0.0 && v2<1.0 && u2>0.0 && u2<1.0;
    if( b2 )
        return vec2( u2, v2 )*.5;

    return vec2(-1.0);
}

float sdSegment( in vec2 p, in vec2 a, in vec2 b )
{
    vec2 pa = p - a;
    vec2 ba = b - a;
    float h = clamp( dot(pa,ba)/dot(ba,ba), 0.0, 1.0 );
    return length( pa - ba*h );
}

vec3 hash3( float n )
{
    return fract(sin(vec3(n,n+1.0,n+2.0))*43758.5453123);
}

// added for dithering
bool even(float a)
{
    return fract(a/2.0) < 0.5;
}

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    vec2 p = (-iResolution.xy + 2.0*fragCoord.xy)/iResolution.y;

    // background
    vec3 bg = vec3(sin(iTime+(fragCoord.y * .01)),
                   sin(3.0*iTime+2.0+(fragCoord.y * .01)),
                   sin(5.0*iTime+4.0+(fragCoord.y * .01)));
    vec3 col = bg;

    // move points
    vec2 a = sin( 0.11*iTime + vec2(0.1,4.0) );
    vec2 b = sin( 0.13*iTime + vec2(1.0,3.0) );
    vec2 c = cos( 0.17*iTime + vec2(2.0,2.0) );
    vec2 d = cos( 0.15*iTime + vec2(3.0,1.0) );

    // area of the quad
    vec2 uv = invBilinear( p, a, b, c, d );
    if( uv.x>-0.5 )
    {
        col = texture( iChannel0, uv ).xyz;
    }

    // mesh or screen door or dithering like in many Sega Saturn games
    fragColor = vec4( col, 1.0 );
}
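If you want to use invBilinear outside of Shadertoy, a minimal fragment-shader sketch could look like the following. This is only an illustration of the idea; the uniform and varying names, and the convention of passing the four quad corners in the same space as the fragment position, are my assumptions, not part of the original sample:

#version 330 core
uniform sampler2D uTexture;    // assumed: texture to stretch over the quad
uniform vec2 uA, uB, uC, uD;   // assumed: quad corners, same space as vPos
in vec2 vPos;                  // assumed: interpolated position of this fragment
out vec4 fragColor;

// cross2d() and invBilinear() from above go in the same shader

void main()
{
    vec2 uv = invBilinear(vPos, uA, uB, uC, uD);
    if (uv.x < -0.5) discard;            // outside the quad
    fragColor = texture(uTexture, uv);
}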

Wow, this is quite a complex problem. Basically, you aren't bringing perspective into the equation. Your final x,y coords are divided by the z value you get when you rotate. This means you get a smaller change in texture space (s,t) the further you get from the camera.
However, by doing this you effectively interpolate linearly as z increases. I wrote an ANCIENT demo that did this back in 1997. This is called "affine" texture mapping.
The thing is, because you are dividing x and y by z, you actually need to interpolate your texture values using "1/z". That is, you interpolate s/z, t/z and 1/z linearly across the triangle and then divide the first two by the third at each pixel. This properly takes into account the perspective that is being applied to x and y when you divide by z. Hence you end up with "perspective correct" texture mapping.
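For what it's worth, that 1/z (strictly, 1/w) trick can be spelled out by hand in modern GLSL by disabling the hardware's perspective correction and redoing it manually. This is only an illustrative sketch of the idea; the uniform and attribute names are assumptions, not from the question:

// vertex shader
#version 330 core
layout(location = 0) in vec3 inPos;
layout(location = 1) in vec2 inUV;
uniform mat4 mvp;                  // assumed model-view-projection matrix
noperspective out vec3 uvw;        // force plain affine (screen-space) interpolation
void main()
{
    vec4 clipPos = mvp * vec4(inPos, 1.0);
    gl_Position = clipPos;
    // store (s/w, t/w, 1/w); these are the quantities that may be interpolated linearly
    uvw = vec3(inUV, 1.0) / clipPos.w;
}

// fragment shader
#version 330 core
uniform sampler2D tex;             // assumed sampler name
noperspective in vec3 uvw;
out vec4 fragColor;
void main()
{
    // divide out the interpolated 1/w to recover perspective-correct (s, t)
    vec2 uv = uvw.xy / uvw.z;
    fragColor = texture(tex, uv);
}

Dropping the division (or the 1/w scaling in the vertex shader) gives exactly the "affine" look described above.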
I wish I could go into more detail than that but over the last 15 odd years of alcohol abuse my memory has got a bit hazy ;) One of these days I'd love to re-visit software rendering as I'm convinced that with the likes of OpenCL taking off it will soon end up being a better way to write a rendering engine!
Anyway, I hope that's of some help, and apologies that I can't be of more.
(As an aside, when I was figuring all that rendering out 15 years ago I'd have loved to have had all the resources available to me now; the internet makes my job and life so much easier. Working everything out from first principles was incredibly painful. I recall initially trying the 1/z interpolation, but it made things smaller as they got closer and I gave up on it when I should've inverted things ... to this day, as my knowledge has increased exponentially, I STILL really wish I'd written a "slow" but fully perspective-correct renderer ... one day I'll get round to it ;))
Good luck!

Related

Draw a line segment in a fragment shader

I'm struggling to understand the following code; the idea is to draw a simple segment in a fragment shader. I tried to decompose it, but I still don't get the line marked ???.
It would be awesome to have a nice explanation. I couldn't find anything on SO or Google.
float lineSegment(vec2 p, vec2 a, vec2 b) {
    float thickness = 1.0/100.0;
    vec2 pa = p - a;
    vec2 ba = b - a;
    float h = clamp( dot(pa,ba)/dot(ba,ba), 0.0, 1.0 );
    // ????????
    float idk = length(pa - ba*h);
    return smoothstep(0.0, thickness, idk);
}
Original code comes from TheBookOfShaders.
Assuming the line is defined by the points a, b, and p is the point to evaluate, then pa is the vector that goes from point a to point p and ba the vector from a to b.
Now, dot(ba, ba) is equal to length(ba) ^ 2, and dot(pa, ba) / length(ba) is the length of the projection of the vector pa onto your line. So dot(pa, ba) / dot(ba, ba) is that projection normalized by the length of your line. That value is clamped between 0 and 1, so the projected point always lies between the two points that define your line.
Then, in length(pa - ba * h), the term ba * h is the projection vector of pa onto the line, now clamped between points a and b. The subtraction pa - ba * h gives a vector from that closest point to p, so its length is the minimum distance between your segment and point p. Using that length and comparing it against the thickness, you can determine whether the point falls inside the line you want to draw.
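As a usage sketch (my own illustration, not from the original post): the function returns roughly 0 on the segment and 1 once you are a full thickness away, so you blend toward the line colour where the return value is small. In a Shadertoy-style entry point, assuming lineSegment from above is defined in the same shader:

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    vec2 uv = fragCoord / iResolution.xy;              // normalized coordinates
    vec3 col = vec3(0.0);                              // black background
    // 0 on the segment, 1 away from it
    float m = lineSegment(uv, vec2(0.2, 0.2), vec2(0.8, 0.7));
    col = mix(vec3(1.0), col, m);                      // white line on black
    fragColor = vec4(col, 1.0);
}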

What is this technique, if not bilinear filtering?

I'm trying to replicate the automatic bilinear filtering of Unity3D using the following code:
fixed4 GetBilinearFilteredColor(float2 texcoord)
{
    fixed4 s1 = SampleSpriteTexture(texcoord + float2(0.0, _MainTex_TexelSize.y));
    fixed4 s2 = SampleSpriteTexture(texcoord + float2(_MainTex_TexelSize.x, 0.0));
    fixed4 s3 = SampleSpriteTexture(texcoord + float2(_MainTex_TexelSize.x, _MainTex_TexelSize.y));
    fixed4 s4 = SampleSpriteTexture(texcoord);

    float2 TexturePosition = float2(texcoord) * _MainTex_TexelSize.z;

    float fu = frac(TexturePosition.x);
    float fv = frac(TexturePosition.y);

    float4 tmp1 = lerp(s4, s2, fu);
    float4 tmp2 = lerp(s1, s3, fu);

    return lerp(tmp1, tmp2, fv);
}

fixed4 frag(v2f IN) : SV_Target
{
    fixed4 c = GetBilinearFilteredColor(IN.texcoord) * IN.color;
    c.rgb *= c.a;
    return c;
}
I thought I was using the correct algorithm because it is the only one I have seen out there for bilinear. But I tried it in Unity with the same texture duplicated:
1st texture: uses Point filtering and the custom bilinear shader (made from the default sprite shader).
2nd texture: uses Bilinear filtering with the default sprite shader.
And this is the result:
You can see that they are different, and there is also some displacement in my custom shader that makes the sprite off-center when rotating around the Z axis.
Any idea what I'm doing wrong?
Any idea what Unity3D is doing differently?
Is there another algorithm that matches Unity3D's default filtering?
Solution
Updated with the complete code solution, incorporating Nico's fix, for other people who find this later:
fixed4 GetBilinearFilteredColor(float2 texcoord)
{
    fixed4 s1 = SampleSpriteTexture(texcoord + float2(0.0, _MainTex_TexelSize.y));
    fixed4 s2 = SampleSpriteTexture(texcoord + float2(_MainTex_TexelSize.x, 0.0));
    fixed4 s3 = SampleSpriteTexture(texcoord + float2(_MainTex_TexelSize.x, _MainTex_TexelSize.y));
    fixed4 s4 = SampleSpriteTexture(texcoord);

    float2 TexturePosition = float2(texcoord) * _MainTex_TexelSize.z;

    float fu = frac(TexturePosition.x);
    float fv = frac(TexturePosition.y);

    float4 tmp1 = lerp(s4, s2, fu);
    float4 tmp2 = lerp(s1, s3, fu);

    return lerp(tmp1, tmp2, fv);
}

fixed4 frag(v2f IN) : SV_Target
{
    fixed4 c = GetBilinearFilteredColor(IN.texcoord - 0.498 * _MainTex_TexelSize.xy) * IN.color;
    c.rgb *= c.a;
    return c;
}
And the image test with the result:
Why not subtract exactly 0.5?
If you test it, you will see some edge cases where it jumps to (pixel - 1).
Let's take a closer look at what you are actually doing. I will stick to the 1D case because it is easier to visualize.
You have an array of pixels and a texture position. I assume _MainTex_TexelSize.z is set such that it gives pixel coordinates. This is what you get (the boxes represent pixels, the numbers in the boxes the pixel number, and the numbers below them the pixel-space coordinates):
With your sampling (assuming nearest-point sampling), you will get pixels 2 and 3. However, you see that the interpolation coordinate for lerp is actually wrong. You will pass the fractional part of the texture position (i.e. 0.8), but it should be 0.3 (= 0.8 - 0.5). The reasoning behind this is quite simple: if you land at the center of a pixel, you want to use that pixel's value. If you land right in the middle between two pixels, you want to use the average of both pixel values (i.e. an interpolation value of 0.5). Right now, you basically have an offset of half a pixel to the left.
When you solve the first problem, there is a second one:
In this case, you actually want to blend between pixel 1 and 2. But because you always go to the right in your sampling, you will blend between 2 and 3. Again, with a wrong interpolation value.
The solution should be quite simple: Subtract half of the pixel width from the texture coordinate before doing anything with it, which is probably just the following (assuming that your variables hold the things I think):
fixed4 c = GetBilinearFilteredColor(IN.texcoord - 0.5 * _MainTex_TexelSize.xy) * IN.color;
Another reason why the results are different could be that Unity actually uses a different filter, e.g. bicubic (but I don't know). Also, the usage of mipmaps could influence the result.
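For comparison, here is the same idea written as a self-contained GLSL sketch that folds the half-texel shift into the function itself, so integer texel coordinates land on texel centers. This is my own illustration (the texture is assumed to be set to nearest/point filtering), not the Unity shader:

// Manual bilinear fetch; texSize is the texture size in texels,
// e.g. vec2(textureSize(tex, 0)).
vec4 bilinearSample(sampler2D tex, vec2 uv, vec2 texSize)
{
    // go to texel space and shift by half a texel so that
    // integer coordinates correspond to texel centers
    vec2 t = uv * texSize - 0.5;
    vec2 base = floor(t);
    vec2 f = fract(t);                 // interpolation weights in [0,1)

    vec2 invSize = 1.0 / texSize;
    // fetch the four surrounding texel centers
    vec4 s00 = texture(tex, (base + vec2(0.5, 0.5)) * invSize);
    vec4 s10 = texture(tex, (base + vec2(1.5, 0.5)) * invSize);
    vec4 s01 = texture(tex, (base + vec2(0.5, 1.5)) * invSize);
    vec4 s11 = texture(tex, (base + vec2(1.5, 1.5)) * invSize);

    return mix(mix(s00, s10, f.x), mix(s01, s11, f.x), f.y);
}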

Best way to interpolate triangle surface using 3 positions and normals for ray tracing

I am working on conventional Whitted ray tracing, and trying to interpolate the surface of the hit triangle as if it were convex instead of flat.
The idea is to treat triangle as a parametric surface s(u,v) once the barycentric coordinates (u,v) of hit point p are known.
This surface equation should be calculated using triangle's positions p0, p1, p2 and normals n0, n1, n2.
The hit point itself is calculated as
p = (1-u-v)*p0 + u*p1 + v*p2;
I have found three different solutions so far.
Solution 1. Projection
The first solution I came up with: project the hit point onto the planes that pass through each of the vertices p0, p1, p2 perpendicular to the corresponding normals, and then interpolate the results.
vec3 r0 = p0 + dot( p0 - p, n0 ) * n0;
vec3 r1 = p1 + dot( p1 - p, n1 ) * n1;
vec3 r2 = p2 + dot( p2 - p, n2 ) * n2;
p = (1-u-v)*r0 + u*r1 + v*r2;
Solution 2. Curvature
Suggested in the paper by Takashi Nagata, "Simple local interpolation of surfaces using normal vectors", and discussed in the question "Local interpolation of surfaces using normal vectors", but it seems overcomplicated and not very fast for real-time ray tracing (unless you precompute all the necessary coefficients). The triangle here is treated as a surface of second order.
Solution 3. Bezier curves
This solution is inspired by Brett Hale's answer. It is about using some interpolation of the higher order, cubic Bezier curves in my case.
E.g., for an edge p0p1 the Bezier curve should look like
B(t) = (1-t)^3*p0 + 3(1-t)^2*t*(p0+n0*adj) + 3*(1-t)*t^2*(p1+n1*adj) + t^3*p1,
where adj is some adjustment parameter.
Computing Bezier curves for edges p0p1 and p0p2 and interpolating them gives the final code:
float u1 = 1 - u;
float v1 = 1 - v;
vec3 b1 = u1*u1*(3-2*u1)*p0 + u*u*(3-2*u)*p1 + 3*u*u1*(u1*n0 + u*n1)*adj;
vec3 b2 = v1*v1*(3-2*v1)*p0 + v*v*(3-2*v)*p2 + 3*v*v1*(v1*n0 + v*n2)*adj;
float w = abs(u-v) < 0.0001 ? 0.5 : ( 1 + (u-v)/(u+v) ) * 0.5;
p = (1-w)*b1 + w*b2;
Alternatively, one can interpolate between three edges:
float u1 = 1.0 - u;
float v1 = 1.0 - v;
float w = abs(u-v) < 0.0001 ? 0.5 : ( 1 + (u-v)/(u+v) ) * 0.5;
float w1 = 1.0 - w;
vec3 b1 = u1*u1*(3-2*u1)*p0 + u*u*(3-2*u)*p1 + 3*u*u1*( u1*n0 + u*n1 )*adj;
vec3 b2 = v1*v1*(3-2*v1)*p0 + v*v*(3-2*v)*p2 + 3*v*v1*( v1*n0 + v*n2 )*adj;
vec3 b0 = w1*w1*(3-2*w1)*p1 + w*w*(3-2*w)*p2 + 3*w*w1*( w1*n1 + w*n2 )*adj;
p = (1-u-v)*b0 + u*b1 + v*b2;
Maybe I messed something up in the code above, but this option does not seem to be very robust inside a shader.
P.S. The intention is to get more correct origins for shadow rays when they are cast from low-poly models. Here you can find the resulting images from the test scene. The big white numbers indicate the solution number (zero for the original image).
P.P.S. I still wonder if there is another efficient solution which can give a better result.
Keeping triangles 'flat' has many benefits and simplifies several stages required during rendering. Approximating a higher-order surface, on the other hand, introduces quite significant tracing overhead and requires adjustments to your BVH structure.
When the geometry is treated as a collection of flat facets, the shading information can still be interpolated to achieve smooth shading while remaining very efficient to process.
There are adaptive tessellation techniques which approximate the limit surface (OpenSubdiv is a great example). Pixar's Photorealistic RenderMan has a long history of using subdivision surfaces. When they switched their rendering algorithm to path tracing, they also introduced a pretessellation step for their subdivision surfaces. This stage is executed right before rendering begins and builds an adaptive, triangulated approximation of the limit surface. This seems to be more efficient to trace and tends to use fewer resources, especially for the high-quality assets used in this industry.
So, to answer your question. I think the most efficient way to achieve what you're after is to use an adaptive subdivision scheme which spits out triangles instead of tracing against a higher order surface.
Dan Sunday describes an algorithm that calculates the barycentric coordinates on the triangle once the ray-plane intersection has been calculated. The point lies inside the triangle if:
(s >= 0) && (t >= 0) && (s + t <= 1)
You can then use, say, n(s, t) = nu * s + nv * t + nw * (1 - s - t) to interpolate a normal, as well as the point of intersection, though n(s, t) will not, in general, be normalized, even if (nu, nv, nw) are. You might find higher order interpolation necessary. PN-triangles were a similar hack for visual appeal rather than mathematical precision. For example, true rational quadratic Bezier triangles can describe conic sections.
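A minimal sketch of that interpolation step (my own illustration; the names nu, nv, nw and the barycentrics s, t are assumed to come from the intersection routine):

// Interpolate the vertex normals at barycentric coordinates (s, t)
// and renormalize, since the weighted sum is generally not unit length.
vec3 shadingNormal(vec3 nu, vec3 nv, vec3 nw, float s, float t)
{
    vec3 n = nu * s + nv * t + nw * (1.0 - s - t);
    return normalize(n);
}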

glsl (GPU) matrix/vector calculations yielding different results than CPU

I can't find any documentation of different behavior, so this is just a sanity check that I'm not doing anything wrong...
I've created some helper functions in GLSL to output float/vec/mat comparisons as a color:
note: pretty sure there aren't any errors here, just including it so you know exactly what I'm doing...
// returns true or false if floats are eq (within some epsilon)
bool feq(float a, float b)
{
    float c = a-b;
    return (c > -0.05 && c < 0.05);
}

// returns true or false if vecs are eq
bool veq(vec4 a, vec4 b)
{
    return
    (
        feq(a.x, b.x) &&
        feq(a.y, b.y) &&
        feq(a.z, b.z) &&
        feq(a.w, b.w) &&
        true
    );
}

// returns color indicating where first diff lies between vecs
// white for "no diff"
vec4 cveq(vec4 a, vec4 b)
{
    if(!feq(a.x, b.x)) return vec4(1.,0.,0.,1.);
    else if(!feq(a.y, b.y)) return vec4(0.,1.,0.,1.);
    else if(!feq(a.z, b.z)) return vec4(0.,0.,1.,1.);
    else if(!feq(a.w, b.w)) return vec4(1.,1.,0.,1.);
    else return vec4(1.,1.,1.,1.);
}

// returns true or false if mats are eq
bool meq(mat4 a, mat4 b)
{
    return
    (
        veq(a[0],b[0]) &&
        veq(a[1],b[1]) &&
        veq(a[2],b[2]) &&
        veq(a[3],b[3]) &&
        true
    );
}

// returns color indicating where first diff lies between mats
// white means "no diff"
vec4 cmeq(mat4 a, mat4 b)
{
    if(!veq(a[0],b[0])) return vec4(1.,0.,0.,1.);
    else if(!veq(a[1],b[1])) return vec4(0.,1.,0.,1.);
    else if(!veq(a[2],b[2])) return vec4(0.,0.,1.,1.);
    else if(!veq(a[3],b[3])) return vec4(1.,1.,0.,1.);
    else return vec4(1.,1.,1.,1.);
}
So I have a model mat, a view mat, and a proj mat. I'm rendering a rectangle on screen (that is correctly projected/transformed...), and setting its color based on how well each step of the calculations matches my CPU-calculated equivalents.
uniform mat4 model_mat;
uniform mat4 view_mat;
uniform mat4 proj_mat;

attribute vec4 position;
varying vec4 var_color;

void main()
{
    // this code works (at least visually) - the rect is transformed as expected
    vec4 model_pos = model_mat * position;
    gl_Position = proj_mat * view_mat * model_pos;

    // this is the test code that does the same as above, but tests its results
    // against CPU-calculated equivalents
    mat4 m;

    // test proj
    // compares the passed-in uniform 'proj_mat' against a hardcoded rep of 'proj_mat' as printf'd by the CPU
    m[0] = vec4(1.542351,0.000000,0.000000,0.000000);
    m[1] = vec4(0.000000,1.542351,0.000000,0.000000);
    m[2] = vec4(0.000000,0.000000,-1.020202,-1.000000);
    m[3] = vec4(0.000000,0.000000,-2.020202,0.000000);
    var_color = cmeq(proj_mat,m); // THIS PASSES (the rect is white)

    // view
    // compares the passed-in uniform 'view_mat' against a hardcoded rep of 'view_mat' as printf'd by the CPU
    m[0] = vec4(1.000000,0.000000,-0.000000,0.000000);
    m[1] = vec4(-0.000000,0.894427,0.447214,0.000000);
    m[2] = vec4(0.000000,-0.447214,0.894427,0.000000);
    m[3] = vec4(-0.000000,-0.000000,-22.360680,1.000000);
    var_color = cmeq(view_mat,m); // THIS PASSES (the rect is white)

    // projview
    mat4 pv = proj_mat*view_mat;

    // proj_mat*view_mat
    // compares the result of GPU-computed proj*view against a hardcoded rep of proj*view **<- NOTE ORDER** as printf'd by the CPU
    m[0] = vec4(1.542351,0.000000,0.000000,0.000000);
    m[1] = vec4(0.000000,1.379521,-0.689760,0.000000);
    m[2] = vec4(0.000000,-0.456248,-0.912496,20.792208);
    m[3] = vec4(0.000000,-0.447214,-0.894427,22.360680);
    var_color = cmeq(pv,m); // THIS FAILS (the rect is green)

    // view_mat*proj_mat
    // compares the result of GPU-computed proj*view against a hardcoded rep of view*proj **<- NOTE ORDER** as printf'd by the CPU
    m[0] = vec4(1.542351,0.000000,0.000000,0.000000);
    m[1] = vec4(0.000000,1.379521,0.456248,0.903462);
    m[2] = vec4(0.000000,0.689760,21.448183,-1.806924);
    m[3] = vec4(0.000000,0.000000,-1.000000,0.000000);
    var_color = cmeq(pv,m); // THIS FAILS (the rect is green)

    // view_mat_t*proj_mat_t
    // compares the result of GPU-computed proj*view against a hardcoded rep of view_t*proj_t **<- '_t' = transpose, also note order** as printf'd by the CPU
    m[0] = vec4(1.542351,0.000000,0.000000,0.000000);
    m[1] = vec4(0.000000,1.379521,-0.456248,-0.447214);
    m[2] = vec4(0.000000,-0.689760,-0.912496,-0.894427);
    m[3] = vec4(0.000000,0.000000,20.792208,22.360680);
    var_color = cmeq(pv,m); // THIS PASSES (the rect is white)
}
And here are my CPU vector/matrix calcs (matrices are column-order: m.x is the first column, not the first row):
fv4 matmulfv4(fm4 m, fv4 v)
{
return fv4
{ m.x[0]*v.x+m.y[0]*v.y+m.z[0]*v.z+m.w[0]*v.w,
m.x[1]*v.x+m.y[1]*v.y+m.z[1]*v.z+m.w[1]*v.w,
m.x[2]*v.x+m.y[2]*v.y+m.z[2]*v.z+m.w[2]*v.w,
m.x[3]*v.x+m.y[3]*v.y+m.z[3]*v.z+m.w[3]*v.w };
}
fm4 mulfm4(fm4 a, fm4 b)
{
return fm4
{ { a.x[0]*b.x[0]+a.y[0]*b.x[1]+a.z[0]*b.x[2]+a.w[0]*b.x[3], a.x[0]*b.y[0]+a.y[0]*b.y[1]+a.z[0]*b.y[2]+a.w[0]*b.y[3], a.x[0]*b.z[0]+a.y[0]*b.z[1]+a.z[0]*b.z[2]+a.w[0]*b.z[3], a.x[0]*b.w[0]+a.y[0]*b.w[1]+a.z[0]*b.w[2]+a.w[0]*b.w[3] },
{ a.x[1]*b.x[0]+a.y[1]*b.x[1]+a.z[1]*b.x[2]+a.w[1]*b.x[3], a.x[1]*b.y[0]+a.y[1]*b.y[1]+a.z[1]*b.y[2]+a.w[1]*b.y[3], a.x[1]*b.z[0]+a.y[1]*b.z[1]+a.z[1]*b.z[2]+a.w[1]*b.z[3], a.x[1]*b.w[0]+a.y[1]*b.w[1]+a.z[1]*b.w[2]+a.w[1]*b.w[3] },
{ a.x[2]*b.x[0]+a.y[2]*b.x[1]+a.z[2]*b.x[2]+a.w[2]*b.x[3], a.x[2]*b.y[0]+a.y[2]*b.y[1]+a.z[2]*b.y[2]+a.w[2]*b.y[3], a.x[2]*b.z[0]+a.y[2]*b.z[1]+a.z[2]*b.z[2]+a.w[2]*b.z[3], a.x[2]*b.w[0]+a.y[2]*b.w[1]+a.z[2]*b.w[2]+a.w[2]*b.w[3] },
{ a.x[3]*b.x[0]+a.y[3]*b.x[1]+a.z[3]*b.x[2]+a.w[3]*b.x[3], a.x[3]*b.y[0]+a.y[3]*b.y[1]+a.z[3]*b.y[2]+a.w[3]*b.y[3], a.x[3]*b.z[0]+a.y[3]*b.z[1]+a.z[3]*b.z[2]+a.w[3]*b.z[3], a.x[3]*b.w[0]+a.y[3]*b.w[1]+a.z[3]*b.w[2]+a.w[3]*b.w[3] } };
}
A key thing to notice is that the view_mat_t * proj_mat_t on the CPU matched the proj_mat * view_mat on the GPU. Does anyone know why? I've done tests on matrices on the CPU and compared them to results of online matrix multipliers, and they seem correct...
I know that the GPU does things between vert shader and frag shader (I think it like, divides gl_Position by gl_Position.w or something?)... is there something else I'm not taking into account going on here in just the vert shader? Is something being auto-transposed at some point?
You may wish to consider GLM for CPU-side Matrix instantiation and calculations. It'll help reduce possible sources of errors.
Secondly, GPUs and CPUs do not perform identical calculations. The IEEE 754 standard for computing Floating Point Numbers has relatively rigorous standards for how these calculations have to be performed and to what degree they have to be accurate, but:
It's still possible for numbers to come up different in the least significant bit (and more than that depending on the specific operation/function being used)
Some GPU vendors opt out of ensuring strict IEEE compliance in the first place (Nvidia has been known in the past to prioritize Speed over strict IEEE compliance)
I would finally note that your CPU-side computations leave a lot of room for rounding errors, which can add up. The usual advice for these kinds of questions, then, is to include tolerance in your code for small amounts of deviation. Usually, code that checks for 'equality' of two floating point numbers presumes that abs(x-y) < 0.000001 means x and y are essentially equal. Naturally, the specific number will have to be calibrated for your personal use.
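If a fixed absolute epsilon becomes awkward at larger magnitudes, a relative tolerance is a common variant. A small GLSL sketch of that idea (the 1e-5 factor is an arbitrary illustration, to be tuned for your data):

// approximate equality with a tolerance that scales with the magnitudes involved
bool approxEq(float a, float b)
{
    float scale = max(1.0, max(abs(a), abs(b)));
    return abs(a - b) <= 1e-5 * scale;
}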
And of course, you'll want to check to make sure that all your matrices/uniforms are being passed in correctly.
Ok, I've found an answer. There is nothing special about matrix operations within a single shader. There are, however, a couple of things you should be aware of:
:1: OpenGL (GLSL) uses column-major matrices. So to construct the matrix that would be visually represented in a mathematical context as this:
 1  2  3  4
 5  6  7  8
 9 10 11 12
13 14 15 16
you would, from within GLSL use:
mat4 m = mat4(
    vec4( 1, 5, 9,13),
    vec4( 2, 6,10,14),
    vec4( 3, 7,11,15),
    vec4( 4, 8,12,16)
);
:2: If you instead use row-major matrices on the CPU, make sure to set the "transpose" flag to true when uploading the matrix uniforms to the shader, and make sure to set it to false if you're using col-major matrices.
So long as you are aware of these two things, you should be good to go.
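As an aside (my addition, not part of the original answer): if you would rather write the matrix in its visual row order inside GLSL, you can let transpose() do the reordering for you, which yields the same column-major matrix as above:

// same matrix as above, written row by row and then transposed
// into OpenGL's column-major layout
mat4 m = transpose(mat4(
    vec4( 1,  2,  3,  4),
    vec4( 5,  6,  7,  8),
    vec4( 9, 10, 11, 12),
    vec4(13, 14, 15, 16)
));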
My particular problem above was that I was in the middle of switching from row-major to col-major in my CPU implementation and wasn't thorough in ensuring that implementation was taken into account across all my CPU matrix operations.
Specifically, here is my now-correct mat4 multiplication implementation, assuming col-major matrices:
fm4 mulfm4(fm4 a, fm4 b)
{
return fm4
{ { a.x[0]*b.x[0] + a.y[0]*b.x[1] + a.z[0]*b.x[2] + a.w[0]*b.x[3], a.x[1]*b.x[0] + a.y[1]*b.x[1] + a.z[1]*b.x[2] + a.w[1]*b.x[3], a.x[2]*b.x[0] + a.y[2]*b.x[1] + a.z[2]*b.x[2] + a.w[2]*b.x[3], a.x[3]*b.x[0] + a.y[3]*b.x[1] + a.z[3]*b.x[2] + a.w[3]*b.x[3] },
{ a.x[0]*b.y[0] + a.y[0]*b.y[1] + a.z[0]*b.y[2] + a.w[0]*b.y[3], a.x[1]*b.y[0] + a.y[1]*b.y[1] + a.z[1]*b.y[2] + a.w[1]*b.y[3], a.x[2]*b.y[0] + a.y[2]*b.y[1] + a.z[2]*b.y[2] + a.w[2]*b.y[3], a.x[3]*b.y[0] + a.y[3]*b.y[1] + a.z[3]*b.y[2] + a.w[3]*b.y[3] },
{ a.x[0]*b.z[0] + a.y[0]*b.z[1] + a.z[0]*b.z[2] + a.w[0]*b.z[3], a.x[1]*b.z[0] + a.y[1]*b.z[1] + a.z[1]*b.z[2] + a.w[1]*b.z[3], a.x[2]*b.z[0] + a.y[2]*b.z[1] + a.z[2]*b.z[2] + a.w[2]*b.z[3], a.x[3]*b.z[0] + a.y[3]*b.z[1] + a.z[3]*b.z[2] + a.w[3]*b.z[3] },
{ a.x[0]*b.w[0] + a.y[0]*b.w[1] + a.z[0]*b.w[2] + a.w[0]*b.w[3], a.x[1]*b.w[0] + a.y[1]*b.w[1] + a.z[1]*b.w[2] + a.w[1]*b.w[3], a.x[2]*b.w[0] + a.y[2]*b.w[1] + a.z[2]*b.w[2] + a.w[2]*b.w[3], a.x[3]*b.w[0] + a.y[3]*b.w[1] + a.z[3]*b.w[2] + a.w[3]*b.w[3] } };
}
Again, the above implementation is for column-major matrices. That means a.x is the first column of the matrix, not the first row.
A key thing to notice is that the view_mat_t * proj_mat_t on the CPU
matched the proj_mat * view_mat on the GPU. Does anyone know why?
The reason for this is that for two matrices A, B: A * B = (B' * A')', where ' indicates the transpose operation. As you already pointed out yourself, your math code uses a row-major representation of matrices, while OpenGL (by default) uses a column-major representation. What this means is that the matrix A,
    (a b c)
A = (d e f)
    (g h i)
in your CPU math library is stored in memory as [a, b, c, d, e, f, g, h, i], whereas defined in a GLSL shader, it would be stored as [a, d, g, b, e, h, c, f, i]. So if you upload the data [a, b, c, d, e, f, g, h, i] of the GLM matrix with glUniformMatrix3fv with the transpose parameter set to GL_FALSE, then the matrix you will see in GLSL is
     (a d g)
A' = (b e h)
     (c f i)
which is the transposed original matrix. Having realized that changing the interpretation of the matrix data between row-major and column-major leads to a transposed version of the original matrix, you can now explain why suddenly the matrix multiplication works the other way around. Your view_mat_t and proj_mat_t on the CPU are interpreted as view_mat_t' and proj_mat_t' in your GLSL shader, so uploading the pre-calculated view_mat_t * proj_mat_t to the shader will lead to the same result as uploading both matrices separately and then calculating proj_mat_t * view_mat_t.

Efficient Bicubic filtering code in GLSL?

I'm wondering if anyone has complete, working, and efficient code to do bicubic texture filtering in glsl. There is this:
http://www.codeproject.com/Articles/236394/Bi-Cubic-and-Bi-Linear-Interpolation-with-GLSL
or
https://github.com/visionworkbench/visionworkbench/blob/master/src/vw/GPU/Shaders/Interp/interpolation-bicubic.glsl
but both do 16 texture reads where only 4 are necessary:
https://groups.google.com/forum/#!topic/comp.graphics.api.opengl/kqrujgJfTxo
However, the method above relies on a "cubic()" function that is missing, and I don't know what it is supposed to do; it also takes an unexplained "texscale" parameter.
There is also the NVidia version:
https://developer.nvidia.com/gpugems/gpugems2/part-iii-high-quality-rendering/chapter-20-fast-third-order-texture-filtering
but I believe this uses CUDA, which is specific to NVIDIA's cards. I need GLSL.
I could probably port the NVIDIA version to GLSL, but thought I'd ask first to see if anyone already has a complete, working GLSL bicubic shader.
I found this implementation which can be used as a drop-in replacement for texture() (from http://www.java-gaming.org/index.php?topic=35123.0 (one typo fixed)):
// from http://www.java-gaming.org/index.php?topic=35123.0
vec4 cubic(float v){
    vec4 n = vec4(1.0, 2.0, 3.0, 4.0) - v;
    vec4 s = n * n * n;
    float x = s.x;
    float y = s.y - 4.0 * s.x;
    float z = s.z - 4.0 * s.y + 6.0 * s.x;
    float w = 6.0 - x - y - z;
    return vec4(x, y, z, w) * (1.0/6.0);
}

vec4 textureBicubic(sampler2D sampler, vec2 texCoords){
    vec2 texSize = textureSize(sampler, 0);
    vec2 invTexSize = 1.0 / texSize;

    texCoords = texCoords * texSize - 0.5;

    vec2 fxy = fract(texCoords);
    texCoords -= fxy;

    vec4 xcubic = cubic(fxy.x);
    vec4 ycubic = cubic(fxy.y);

    vec4 c = texCoords.xxyy + vec2(-0.5, +1.5).xyxy;

    vec4 s = vec4(xcubic.xz + xcubic.yw, ycubic.xz + ycubic.yw);
    vec4 offset = c + vec4(xcubic.yw, ycubic.yw) / s;

    offset *= invTexSize.xxyy;

    vec4 sample0 = texture(sampler, offset.xz);
    vec4 sample1 = texture(sampler, offset.yz);
    vec4 sample2 = texture(sampler, offset.xw);
    vec4 sample3 = texture(sampler, offset.yw);

    float sx = s.x / (s.x + s.y);
    float sy = s.z / (s.z + s.w);

    return mix(
        mix(sample3, sample2, sx), mix(sample1, sample0, sx),
        sy);
}
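In case it helps, a trivial usage sketch in a fragment shader (my own; the uniform and varying names are assumptions, not from the original post):

uniform sampler2D uTexture;   // texture to filter
in vec2 vUV;                  // interpolated texture coordinate
out vec4 fragColor;

// cubic() and textureBicubic() from above go in the same shader

void main()
{
    fragColor = textureBicubic(uTexture, vUV);
}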
Example: Nearest, bilinear, bicubic:
The ImageData of this image is
{{{0.698039, 0.996078, 0.262745}, {0., 0.266667, 1.}, {0.00392157, 0.25098, 0.996078}, {1., 0.65098, 0.}},
 {{0.996078, 0.823529, 0.}, {0.498039, 0., 0.00392157}, {0.831373, 0.00392157, 0.00392157}, {0.956863, 0.972549, 0.00784314}},
 {{0.909804, 0.00784314, 0.}, {0.87451, 0.996078, 0.0862745}, {0.196078, 0.992157, 0.760784}, {0.00392157, 0.00392157, 0.498039}},
 {{1., 0.878431, 0.}, {0.588235, 0.00392157, 0.00392157}, {0.00392157, 0.0666667, 0.996078}, {0.996078, 0.517647, 0.}}}
I tried to reproduce this (and many other interpolation techniques),
but they use clamped padding, while I have repeating (wrapping) boundaries, so it is not exactly the same.
It seems this bicubic business is not a proper interpolation, i.e. it does not take on the original values at the points where the data is defined.
I decided to take a minute to dig through my old Perforce activity and found the missing cubic() function; enjoy! :)
vec4 cubic(float v)
{
    vec4 n = vec4(1.0, 2.0, 3.0, 4.0) - v;
    vec4 s = n * n * n;
    float x = s.x;
    float y = s.y - 4.0 * s.x;
    float z = s.z - 4.0 * s.y + 6.0 * s.x;
    float w = 6.0 - x - y - z;
    return vec4(x, y, z, w);
}
Wow. I recognize the code above (I cannot comment with reputation < 50), as I came up with it in early 2011. The problem I was trying to solve was related to an old IBM T42 (sorry, the exact model number escapes me) laptop and its ATI graphics stack. I developed the code on an NV card, and originally I used 16 texture fetches. That was kind of slow, but fast enough for my purposes. When someone reported it did not work on his laptop, it became apparent that it did not support enough texture fetches per fragment. I had to engineer a work-around, and the best I could come up with was to do it with a number of texture fetches that would work.
I thought about it like this: okay, so if I handle each quad (2x2) with a linear filter, the remaining problem is: can the rows and columns share the weights? That was the only problem on my mind when I set out to craft the code. Of course they could be shared; the weights are the same for each column and row; perfect!
Now I had four samples. The remaining problem was how to correctly combine the samples. That was the biggest obstacle to overcome. It took about 10 minutes with pencil and paper. With trembling hands I typed the code in and it worked, nice. Then I uploaded the binaries to the guy who promised to check it out on his T42 (?) and he reported it worked. The end. :)
I can assure you that the equations check out and give mathematically identical results to computing the samples individually. FYI: on the CPU it's faster to do the horizontal and vertical passes separately. On the GPU, multiple passes are not that great an idea, especially when they are probably not feasible anyway in the typical use case.
Food for thought: it is possible to use a texture lookup for the cubic() function. Which is faster depends on the GPU, but generally speaking the sampler takes some load off the ALU side, so just doing the arithmetic tends to balance things out. YMMV.
The missing function cubic() in JAre's answer could look like this:
vec4 cubic(float x)
{
    float x2 = x * x;
    float x3 = x2 * x;
    vec4 w;
    w.x =   -x3 + 3*x2 - 3*x + 1;
    w.y =  3*x3 - 6*x2       + 4;
    w.z = -3*x3 + 3*x2 + 3*x + 1;
    w.w =   x3;
    return w / 6.f;
}
It returns the four weights for cubic B-Spline.
It is all explained in NVIDIA's GPU Gems.
(EDIT)
Cubic() is a cubic spline function.
Example:
Texscale is a sampling-window size coefficient; you can start with a value of 1.0.
vec4 filter(sampler2D texture, vec2 texcoord, vec2 texscale)
{
    float fx = fract(texcoord.x);
    float fy = fract(texcoord.y);
    texcoord.x -= fx;
    texcoord.y -= fy;

    vec4 xcubic = cubic(fx);
    vec4 ycubic = cubic(fy);

    vec4 c = vec4(texcoord.x - 0.5, texcoord.x + 1.5, texcoord.y - 0.5, texcoord.y + 1.5);
    vec4 s = vec4(xcubic.x + xcubic.y, xcubic.z + xcubic.w, ycubic.x + ycubic.y, ycubic.z + ycubic.w);
    vec4 offset = c + vec4(xcubic.y, xcubic.w, ycubic.y, ycubic.w) / s;

    vec4 sample0 = texture2D(texture, vec2(offset.x, offset.z) * texscale);
    vec4 sample1 = texture2D(texture, vec2(offset.y, offset.z) * texscale);
    vec4 sample2 = texture2D(texture, vec2(offset.x, offset.w) * texscale);
    vec4 sample3 = texture2D(texture, vec2(offset.y, offset.w) * texscale);

    float sx = s.x / (s.x + s.y);
    float sy = s.z / (s.z + s.w);

    return mix(
        mix(sample3, sample2, sx),
        mix(sample1, sample0, sx), sy);
}
Source
For anybody interested in GLSL code to do tri-cubic interpolation, ray-casting code using cubic interpolation can be found in the examples/glCubicRayCast folder in:
http://www.dannyruijters.nl/cubicinterpolation/CI.zip
edit: The cubic interpolation code is now available on github: CUDA version and WebGL version, and GLSL sample.
I've been using @Maf's cubic spline recipe for over a year, and I recommend it, if a cubic B-spline meets your needs.
But I recently realized that, for my particular application, it is important for the intensities to match exactly at the sample points. So I switched to using a Catmull-Rom spline, which uses a slightly different recipe like so:
// Catmull-Rom spline actually passes through control points
vec4 cubic(float x) // cubic_catmullrom(float x)
{
    const float s = 0.5; // potentially adjustable parameter
    float x2 = x * x;
    float x3 = x2 * x;
    vec4 w;
    w.x =    -s*x3 +     2*s*x2 - s*x + 0;
    w.y = (2-s)*x3 +   (s-3)*x2        + 1;
    w.z = (s-2)*x3 + (3-2*s)*x2 + s*x + 0;
    w.w =     s*x3 -       s*x2        + 0;
    return w;
}
I found these coefficients, plus those for a number of other flavors of cubic splines, in the lecture notes at:
http://www.cs.cmu.edu/afs/cs/academic/class/15462-s10/www/lec-slides/lec06.pdf
I think it is possible that the Catmull version could be done with 4 texture lookups by (a) arranging the input texture like a chessboard with alternate slots saved as positives and as negatives, and (b) an associated modification of textureBicubic. That would rely on the contributions/weights w.x/w.w always being negative, and the contributions w.y/w.z always being positive. I haven't double-checked if this is true, or exactly how the modified textureBicubic would look.
... I have since verified that the w contributions do satisfy the positive/negative rules.