I want to draw a frustum using GL_LINE_STRIP. What are the coordinates of the frustum vertices? I have the model-view and projection matrices. Is it possible to calculate the coordinates in the shader itself using these matrices?
If you want the world space coordinates for the frustum corners, all you need to do is project the 8 corner points from NDC space (which is going from -1 to 1 in every dimension, so the corner points are easy to enumerate) back to world space. But do not forget that you have to divide by w:
c_world = inverse(projection * view) * vec4(c_ndc, 1);
c_world = c_world / c_world.w;
While I wrote this in GLSL syntax, this is meant as pseudocode only. You can do it in the shader, but that means that this has to be calculated many times (depending on which shader stage you put this into). It is typically much faster to at least pre-calculate that inverted matrix on the CPU.
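The CPU-side precalculation could look roughly like the following sketch. It is plain Python rather than GLSL, and the helper names (`mat_mul`, `mat_vec`, `inverse`, `perspective`, `frustum_corners_world`) are made up for illustration. The sample matrix assumes a glFrustum-style projection and an identity view matrix, so "world space" here coincides with eye space.

```python
from itertools import product

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def inverse(m):
    """Invert a 4x4 matrix by Gauss-Jordan elimination with partial pivoting."""
    n = 4
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def perspective(l, r, b, t, n, f):
    """glFrustum-style projection matrix (assumed convention, row-major here)."""
    return [
        [2 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2 * n / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def frustum_corners_world(projection, view):
    """Unproject the 8 NDC cube corners, dividing by w after the transform."""
    inv = inverse(mat_mul(projection, view))
    corners = []
    for x, y, z in product((-1.0, 1.0), repeat=3):
        c = mat_vec(inv, [x, y, z, 1.0])
        corners.append([c[i] / c[3] for i in range(3)])
    return corners

identity = [[float(i == j) for j in range(4)] for i in range(4)]
corners = frustum_corners_world(perspective(-1, 1, -1, 1, 1, 10), identity)
for corner in corners:
    print([round(v, 6) for v in corner])
```

The inverse is computed once per frame here; the resulting 8 points can then be uploaded as ordinary vertex data for the line strip.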
Variable gl_Position output from a GLSL vertex shader must have 4 coordinates. In OpenGL, it seems w coordinate is used to scale the vector, by dividing the other coordinates by it. What is the purpose of w in Vulkan?
Shaders and projections in Vulkan behave exactly the same as in OpenGL. There are small differences in depth ranges ([-1, 1] in OpenGL, [0, 1] in Vulkan) or in the origin of the coordinate system (lower-left in OpenGL, upper-left in Vulkan), but the principles are exactly the same. The hardware is still the same and it performs calculations in the same way both in OpenGL and in Vulkan.
4-component vectors serve multiple purposes:
Different transformations (translation, rotation, scaling) can be represented in the same way, with 4x4 matrices.
Projection can also be represented with a 4x4 matrix.
Multiple transformations can be combined into one 4x4 matrix.
The .w component you mention is used during perspective projection.
All this we can do with 4x4 matrices and thus we need 4-component vectors (so they can be multiplied by 4x4 matrices). Again, I write about this because the above rules apply both to OpenGL and to Vulkan.
As for the purpose of the .w component of the gl_Position variable: it is exactly the same in Vulkan. It is used to scale the position vector. During perspective calculations (multiplication by the projection matrix), the original depth is remapped and stored in the .z component of the gl_Position variable. Additionally, the original depth (the eye-space distance along the view axis) is stored in the .w component. After that, as a fixed-function step, the hardware performs the perspective division and divides the position stored in the gl_Position variable by its .w component.
In orthographic projection, the steps performed by the hardware are exactly the same, but the values used in the calculations are different. The perspective division step is still performed by the hardware, but it does nothing (the position is divided by 1.0).
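To make the fixed-function step concrete, here is a tiny sketch of the perspective division in plain Python. The function name `perspective_divide` and the sample clip-space values are made up for illustration; on real hardware this happens after the vertex shader, not in user code.

```python
def perspective_divide(clip):
    """Divide a clip-space position (x, y, z, w) by w to get NDC (x, y, z)."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)

# Perspective case: w carries the distance, so the farther point
# (larger w) ends up closer to the centre of the screen.
near_point = perspective_divide((1.0, 1.0, 0.5, 2.0))
far_point = perspective_divide((1.0, 1.0, 7.0, 8.0))

# Orthographic case: w is 1.0, so the division changes nothing.
ortho_point = perspective_divide((0.25, -0.5, 0.1, 1.0))

print(near_point, far_point, ortho_point)
```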
gl_Position is a homogeneous coordinate. The w component plays a role in perspective projection.
The projection matrix describes the mapping from 3D points of the view on a scene to 2D points on the viewport. It transforms from eye space to clip space, and the coordinates in clip space are transformed to normalized device coordinates (NDC) by dividing by the w component of the clip coordinates (perspective divide).
In perspective projection, the projection matrix describes the mapping from 3D points in the world, as they are seen from a pinhole camera, to 2D points of the viewport. The eye space coordinates in the camera frustum (a truncated pyramid) are mapped to a cube (the normalized device coordinates).
Perspective Projection Matrix (listed in OpenGL's column-major memory order, so each line below is one column of the matrix):
r = right, l = left, b = bottom, t = top, n = near, f = far
2*n/(r-l)      0              0               0
0              2*n/(t-b)      0               0
(r+l)/(r-l)    (t+b)/(t-b)    -(f+n)/(f-n)    -1
0              0              -2*f*n/(f-n)    0
When a Cartesian coordinate in view space is transformed by the perspective projection matrix, the result is a homogeneous coordinate. Its w component grows with the distance from the point of view. This is why objects become smaller after the perspective divide the further away they are.
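A short numeric sketch of that effect, in plain Python. The helper names `perspective` and `transform` and the chosen frustum values are assumptions for illustration; the matrix is the one above, written in conventional row/column form for column vectors.

```python
def perspective(l, r, b, t, n, f):
    """Perspective projection matrix from the frustum bounds (row-major)."""
    return [
        [2 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2 * n / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

proj = perspective(-1.0, 1.0, -1.0, 1.0, 1.0, 100.0)

results = {}
for z_eye in (-2.0, -20.0):
    # The same x and y offset, placed at two different distances.
    clip = transform(proj, [1.0, 1.0, z_eye, 1.0])
    w_clip = clip[3]            # equals -z_eye for this matrix
    ndc_x = clip[0] / w_clip    # perspective divide
    results[z_eye] = (w_clip, ndc_x)
    print(z_eye, w_clip, ndc_x)
```

The point ten times farther away gets a w ten times larger and therefore lands ten times closer to the centre in NDC.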
In computer graphics, transformations are represented with matrices. If you want something to rotate, you multiply all its vertices (a vector) by a rotation matrix. Want it to move? Multiply by translation matrix, etc.
tl;dr: You can't describe translation along the z-axis with 3D matrices and vectors. You need at least 1 more dimension, so they just added a dummy dimension w. But things break if it's not 1, so keep it at 1 :P.
Anyway, now we begin with a quick review on matrix multiplication:
You basically put x above a, y above b, z above c. Multiply the whole column by the variable you just moved, and sum up everything in the row.
So if you were to translate a vector, you'd want something like:
[1 0 a]   [x]   [x + a*z]
[0 1 b] * [y] = [y + b*z]
[0 0 1]   [z]   [z]
See how x and y are now translated by az and bz? That's pretty awkward though:
You'd have to account for how big z is whenever you move things (what if z was negative? You'd have to move in opposite directions. That's cumbersome as hell if you just want to move something an inch over...)
You can't move along the z axis. You'll never be able to fly or go underground
But, if you can make sure z = 1 at all times:
Now it's much clearer that this matrix allows you to move in the x-y plane by a, and b amounts. Only problem is that you're conceptually levitating all the time, and you still can't go up or down. You can only move in 2D.
But you see a pattern here? With 3D matrices and 3D vectors, you can describe all the fundamental movements in 2D. So what if we added a 4th dimension?
Looks familiar. If we keep w = 1 at all times:
There we go, now you get translation along all 3 axes. This is what's called homogeneous coordinates.
But what if you were doing some big & complicated transformation, resulting in w != 1, and there's no way around it? OpenGL (and basically any other CG system I think) will do what's called normalization: divide the resultant vector by the w component. I don't know enough to say exactly why ('cause scaling is a linear transformation?), but it has favorable implications (can be used in perspective transforms). Anyway, the translation matrix would actually look like:
And there you go, see how each component is shrunken by w, then it's translated? That's why w controls scaling.
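The same idea in a few lines of plain Python, as a sketch only. The names `translation`, `mat_vec` and `normalize` are made up for illustration, and the numbers are arbitrary.

```python
def mat_vec(m, v):
    """Multiply a square matrix (row-major nested lists) by a column vector."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def translation(a, b, c):
    """4x4 homogeneous matrix that moves a point by (a, b, c)."""
    return [
        [1.0, 0.0, 0.0, a],
        [0.0, 1.0, 0.0, b],
        [0.0, 0.0, 1.0, c],
        [0.0, 0.0, 0.0, 1.0],
    ]

def normalize(v):
    """Divide a homogeneous vector by its w component."""
    return [x / v[3] for x in v]

t = translation(5.0, -2.0, 3.0)

# With w = 1 the offsets are simply added to x, y and z.
moved = mat_vec(t, [1.0, 1.0, 1.0, 1.0])

# With w = 2 the result after normalization is (x/w + a, y/w + b, z/w + c, 1).
scaled = normalize(mat_vec(t, [1.0, 1.0, 1.0, 2.0]))

print(moved, scaled)
```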
I am trying to calculate a normal map from subdivisions of a mesh. There are two meshes: a UV-unwrapped base mesh that contains quads and triangles, and a subdivision mesh that contains only quads.
Suppose I have a quad with the coordinates of all its vertices in both object space and UV space (the quad is not flat), the quad's face normal, and a pixel with its position in UV space.
Can I calculate the TBN matrix for the given quad and write colors to the pixel? If so, is it different for quads?
I ask because I couldn't find any examples of calculating a TBN matrix for quads, only for triangles.
Before answering your question, let me start by explaining what the tangents and bitangents that you need actually are.
Let's forget about triangles, quads, or polygons for a minute. We just have a surface (given in whatever representation) and a parameterization in form of texture coordinates that are defined at every point on the surface. We could then define the surface as: xyz = s(uv). uv are some 2D texture coordinates and the function s turns these texture coordinates into 3D world positions xyz. Now, the tangent is the direction in which the u-coordinate increases. I.e., it is the derivative of the 3D position with respect to the u-coordinate: T = d s(uv) / du. Similarly, the bitangent is the derivative with respect to the v-coordinate. The normal is a vector that is perpendicular to both of them and usually points outwards. Remember that the three vectors are usually different at every point on the surface.
Now let's go over to discrete computer graphics, where we approximate our continuous surface s with a polygon mesh. The problem is that there is no way to get the exact tangents and bitangents anymore. We just lost too much information in our discrete approximation. So, there are three common ways to approximate the tangents anyway:
Store the vectors with the model (this is usually not done).
Estimate the vectors at the vertices and interpolate them in the faces.
Calculate the vectors for each face separately. This will give you a discontinuous tangent space, which produces artifacts when the dihedral angle between two neighboring faces is too big. Still, this is apparently what most people are doing. And it is apparently also what you want to do.
Let's focus on the third method. For triangles, this is especially simple because the texture coordinates are interpolated linearly (barycentric interpolation) across the triangle. Hence, the derivatives are all constant (it's just a linear function). This is why you can calculate tangents/bitangents per triangle.
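For the triangle case, the constant derivatives can be obtained by solving a small linear system built from two edges and their UV deltas. The following is a plain Python sketch; the function name `triangle_tangent_bitangent` is hypothetical, and the vectors are left unnormalized.

```python
def triangle_tangent_bitangent(p0, p1, p2, uv0, uv1, uv2):
    """Per-triangle tangent (d pos / du) and bitangent (d pos / dv).

    Solves  e1 = du1 * T + dv1 * B  and  e2 = du2 * T + dv2 * B
    for T and B, where e1 and e2 are two triangle edges.
    """
    e1 = [b - a for a, b in zip(p0, p1)]
    e2 = [b - a for a, b in zip(p0, p2)]
    du1, dv1 = uv1[0] - uv0[0], uv1[1] - uv0[1]
    du2, dv2 = uv2[0] - uv0[0], uv2[1] - uv0[1]
    det = du1 * dv2 - du2 * dv1
    if abs(det) < 1e-12:
        raise ValueError("degenerate UV mapping")
    tangent = [(dv2 * a - dv1 * b) / det for a, b in zip(e1, e2)]
    bitangent = [(du1 * b - du2 * a) / det for a, b in zip(e1, e2)]
    return tangent, bitangent

tangent, bitangent = triangle_tangent_bitangent(
    (0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 3.0, 0.0),
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
)
print(tangent, bitangent)
```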
For quads, this is not so simple. First, you must agree on a way to interpolate positions and texture coordinates from the vertices of the quad to its inside. Oftentimes, bilinear interpolation is used. However, this is not a linear interpolation, i.e. the tangents and bitangents will not be constant anymore. They are constant only in special cases (if the quad is planar and the quad in uv space is a parallelogram). In general, these assumptions do not hold and you end up with different tangents/bitangents/normals for every point on the quad.
One way to calculate the required derivatives is by introducing an auxiliary coordinate system. Let's define a coordinate system st, where the first corner of the quad has coordinates (0, 0) and the diagonally opposite corner has (1, 1) (the other corners have (0, 1) and (1, 0)). These are actually our interpolation coordinates. Therefore, given an arbitrary interpolation scheme, it is relatively simple to calculate the derivatives d xyz / d st and d uv / d st. The first one will be a 3x2 matrix and the second one will be a 2x2 matrix (these matrices are called Jacobians of the interpolation). Then, given these matrices, you can calculate:
d xyz / d uv = (d xyz / d st) * (d st / d uv) = (d xyz / d st) * (d uv / d st)^-1
This will give you a 3x2 matrix where the first column is the tangent and the second column is the bitangent.
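As a sketch of the formula above under the assumption of bilinear interpolation, the following plain Python code evaluates both Jacobians at a given (s, t) and combines them. The names `bilinear_derivs` and `quad_tangent_bitangent` are hypothetical, and the corner order (0,0), (1,0), (0,1), (1,1) is a choice made here for illustration.

```python
def bilinear_derivs(c00, c10, c01, c11, s, t):
    """Partial derivatives of a bilinear interpolant with respect to s and t."""
    d_ds = [(1 - t) * (b - a) + t * (d - c)
            for a, b, c, d in zip(c00, c10, c01, c11)]
    d_dt = [(1 - s) * (c - a) + s * (d - b)
            for a, b, c, d in zip(c00, c10, c01, c11)]
    return d_ds, d_dt

def quad_tangent_bitangent(pos, uv, s, t):
    """Columns of (d xyz / d st) * (d uv / d st)^-1 at the point (s, t)."""
    p_s, p_t = bilinear_derivs(*pos, s, t)      # 3x2 Jacobian, as two columns
    uv_s, uv_t = bilinear_derivs(*uv, s, t)     # 2x2 Jacobian, as two columns
    det = uv_s[0] * uv_t[1] - uv_t[0] * uv_s[1]
    if abs(det) < 1e-12:
        raise ValueError("UV mapping is singular at this point")
    # Inverse of [[du/ds, du/dt], [dv/ds, dv/dt]]
    inv = [[uv_t[1] / det, -uv_t[0] / det],
           [-uv_s[1] / det, uv_s[0] / det]]
    tangent = [a * inv[0][0] + b * inv[1][0] for a, b in zip(p_s, p_t)]
    bitangent = [a * inv[0][1] + b * inv[1][1] for a, b in zip(p_s, p_t)]
    return tangent, bitangent

# A non-planar quad: the (1, 1) corner is lifted out of the plane.
positions = ((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 1.0))
uvs = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0))

at_first_corner = quad_tangent_bitangent(positions, uvs, 0.0, 0.0)
at_opposite_corner = quad_tangent_bitangent(positions, uvs, 1.0, 1.0)
print(at_first_corner, at_opposite_corner)
```

The two evaluations give different vectors, which illustrates why a single TBN matrix per quad is only an approximation. The normal can be taken as the normalized cross product of the two columns.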
I am drawing a stack of decals on a quad: same geometry, different textures. Z-fighting is the obvious result. I cannot control the rendering order or use glPolygonOffset because of batched rendering, so I adjust the depth values inside the vertex shader.
gl_Position = uMVPMatrix * pos;
gl_Position.z += aDepthLayer * uMinStep * gl_Position.w;
gl_Position holds clip coordinates. That means a change in z will move a vertex along its view ray and bring it to the front or push it to the back. For normalized device coordinates the clip coords get divided by gl_Position.w (= -z_eye for a perspective projection). As a result the depth buffer does not have a linear distribution and has higher resolution towards the near plane. By premultiplying with gl_Position.w that should be fixed, and I should be able to apply a flat amount (uMinStep) to the NDC.
That minimum step should be something like 1/(2^GL_DEPTH_BITS - 1). Or, since NDC space goes from -1.0 to 1.0, it might have to be twice that amount. However, it does not work with these values. The minStep is roughly 0.00000006, but it does not bring a texture to the front. Neither does doubling that value. If I drop a zero (scale by 10), it works. (Yay, that's something!)
But it does not work evenly along the frustum. A value that brings a texture in front of another while the quad is close to the near plane does not necessarily do the same when the quad is close to the far plane. The same effect happens when I make the frustum deeper. I would expect that behaviour if I was changing eye coordinates, because of the nonlinear z-Buffer distribution. But it seems that premultiplying gl_Position.w is not enough to counter that.
Am I missing some part of the transformations that happen to clip coords? Do I need to use a different formula in general? Do I have to include the depth range [0,1] somehow?
Could the different behaviour along the frustum be a result of nonlinear floating point precision instead of nonlinear z-Buffer distribution? So maybe the calculation is correct, but the minStep just cannot be handled correctly by floats at some point in the pipeline?
The general question: How do I calculate a z-Shift for gl_Position (clip coordinates) that will create a fixed change in the depth buffer later? How can I make sure that the z-Shift will bring one texture in front of another no matter where in the frustum the quad is placed?
Some material:
OpenGL depth buffer faq
https://www.opengl.org/archives/resources/faq/technical/depthbuffer.htm
Same with better readable formulas (but some typos, be careful)
https://www.opengl.org/wiki/Depth_Buffer_Precision
Calculation from eye coords to z-buffer. Most of that happens already when I multiply the projection matrix.
http://www.sjbaker.org/steve/omniv/love_your_z_buffer.html
Explanation about the elements in the projection matrix that turn into the A and B parts in most depth buffer calculation formulas.
http://www.songho.ca/opengl/gl_projectionmatrix.html
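The algebra behind the shader lines in the question can be checked numerically. The sketch below is plain Python in double precision; it assumes a standard OpenGL perspective matrix, the default glDepthRange(0, 1) and a 24-bit depth buffer, and the names `clip_z_w` and `ndc_shift` are made up. It does not model GPU float32 arithmetic or depth-buffer quantization, which is where the behaviour described in the question may come from.

```python
def clip_z_w(z_eye, n, f):
    """Clip-space z and w of an eye-space depth under a perspective matrix."""
    z_clip = -(f + n) / (f - n) * z_eye - 2.0 * f * n / (f - n)
    w_clip = -z_eye
    return z_clip, w_clip

def ndc_shift(z_eye, step, n=1.0, f=100.0):
    """Change in NDC z caused by 'gl_Position.z += step * gl_Position.w'."""
    z_clip, w_clip = clip_z_w(z_eye, n, f)
    before = z_clip / w_clip
    after = (z_clip + step * w_clip) / w_clip
    return after - before

depth_bits = 24                               # assumed depth-buffer precision
window_step = 1.0 / (2 ** depth_bits - 1)     # one increment in window depth [0, 1]
ndc_step = 2.0 * window_step                  # the same increment in NDC [-1, 1]

# In exact arithmetic the shift is the same anywhere in the frustum.
shifts = [ndc_shift(z, ndc_step) for z in (-1.5, -10.0, -90.0)]
print(ndc_step, shifts)

# For comparison: spacing of float32 values just below 1.0.
float32_spacing_below_one = 2.0 ** -24
print(window_step, float32_spacing_below_one)
```

Since one depth increment and the float32 spacing near 1.0 are of the same order of magnitude (about 6e-8), an offset of a single step can plausibly be lost to rounding before it reaches the depth buffer. This is only a hint in the direction of the floating-point suspicion raised in the question, not a verified diagnosis.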
As far as I understand, the location of a point/pixel cannot be a fraction, at least on a raster graphics system where the hardware uses pixels to display images.
Then, why and how does OpenGL use fractional values for plotting pixels?
For example, how is it possible: glVertex2f(0.15f, 0.51f); ?
This command does not plot any pixels. It merely defines the location of a point in 3D space (glVertex2f supplies x and y; z defaults to 0 and w to 1, so there are more coordinates than the 2 a pixel on the screen would need). This is the starting point for the OpenGL pipeline. This point then goes through a lot of transformations before it ends up on the screen.
Also, the coordinates are unitless. For example, if you say that your viewport spans 0.0f to 1.0f, then these coordinates make a lot of sense. Basically, you have to think of these points in terms of mathematics, not pixels.
I would suggest some reading on how OpenGL transformations work, for example here, here or the tutorial here.
The vectors you pass into OpenGL are not viewport positions but arbitrary numbers in some vector space. Only after a chain of transformations are these numbers mapped to viewport pixel positions. With the old fixed-function pipeline, this could be anything that can be represented by a vector–matrix multiplication.
These days, when everything is programmable (shaders), the mapping can be any kind of function you can think of. For example, the values you pass into glVertex (an immediate-mode call, but available to shaders with OpenGL-2.1) may be interpreted as polar coordinates in the vertex shader:
#version 110
void main() {
    gl_Position =
        gl_ModelViewProjectionMatrix
        * vec4(gl_Vertex.y * vec2(sin(gl_Vertex.x), cos(gl_Vertex.x)), 0, 1);
}
This is a perfectly valid OpenGL-2.1 vertex shader that interprets the vertex position as polar coordinates. Note that because triangles and lines have straight edges while polar coordinates are curvilinear, this gives good visual results only for points or highly tessellated primitives.
As you can see here, the values passed to glVertex are actually arbitrary, unitless components of vectors in some vector space. Only by applying some transformation to viewport space do these vectors gain meaning. Hence it makes no sense to impose a certain value range onto the values that go into the vertex attribute.
Vertex and pixel are very different things.
It's quite possible to have all your vertices within one pixel (although in this case you probably need help with LODing).
You might want to start here...
http://www.glprogramming.com/blue/ch01.html
Specifically...
Primitives are defined by a group of one or more vertices. A vertex defines a point, an endpoint of a line, or a corner of a polygon where two edges meet. Data (consisting of vertex coordinates, colors, normals, texture coordinates, and edge flags) is associated with a vertex, and each vertex and its associated data are processed independently, in order, and in the same way.
And...
Rasterization produces a series of frame buffer addresses and associated values using a two-dimensional description of a point, line segment, or polygon. Each fragment so produced is fed into the last stage, per-fragment operations, which performs the final operations on the data before it's stored as pixels in the frame buffer.
For your example, before glVertex2f(0.15f, 0.51f) ends up on the screen, many transforms have to be applied. Crudely simplifying a complex process: after moving your vertex to view space (applying the camera position and direction), the magic here is (1) the projection matrix and (2) the viewport setting.
Internally, OpenGL "screen coordinates" are in a cube (-1, -1, -1) - (1, 1, 1):
http://www.matrix44.net/cms/wp-content/uploads/2011/03/ogl_coord_object_space_cube.png
The projection matrix 'squeezes' the frustum into this cube (which you do in the vertex shader), assuming you have a perspective transform. If the projection is orthographic, the view volume is just a box, limited by the near and far values (and, as in both cases, scaling factors):
http://www.songho.ca/opengl/files/gl_projectionmatrix01.png
EDIT: Maybe better example here:
http://www.opengl-tutorial.org/beginners-tutorials/tutorial-3-matrices/#The_Projection_matrix
(EDIT: The Z-coordinate is used as the depth value.) When fragments are finally written as pixels to a texture/framebuffer/screen, their coordinates are scaled by the viewport settings:
https://www3.ntu.edu.sg/home/ehchua/programming/opengl/images/GL_2DViewportAspectRatio.png
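As a rough illustration of that last step, here is the viewport mapping in plain Python. It assumes identity model-view and projection matrices (so the values given to glVertex2f arrive unchanged as NDC) and a hypothetical 800x600 viewport at the origin; the function name `ndc_to_window` is made up.

```python
def ndc_to_window(x_ndc, y_ndc, vx, vy, width, height):
    """Map NDC x and y in [-1, 1] to window coordinates, as glViewport does."""
    x_w = (x_ndc + 1.0) * width / 2.0 + vx
    y_w = (y_ndc + 1.0) * height / 2.0 + vy
    return x_w, y_w

# glVertex2f(0.15f, 0.51f) with a viewport set by glViewport(0, 0, 800, 600):
window_x, window_y = ndc_to_window(0.15, 0.51, 0, 0, 800, 600)
print(window_x, window_y)   # approximately 460 and 453
```

The fractional input is therefore just a position inside the [-1, 1] square; which pixel it touches depends entirely on the viewport size.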
Hope this helps!
I have a very general question. I wish to determine the boundary points of a number of objects (comprising 30-50 closed polygons (z) each having around 300 points(x,y,z)). I am working with a fixed viewport which is rotated about x,y and z-axes (alpha, beta, gamma) wrt origin of coordinate system for polygons.
As I see it there are two possibilities: perspective projection or raytracing. Perspective projection would seem to require a large number of matrix operations for each point to determine whether its position is within or without the viewport.
Or, given the large number of points, would I be better off raytracing from the viewport pixels to the objects,
i.e. determining whether there is an intersection and then whether the intersection occurs within or without the object(s)?
In either case I will write this result as 0 (outside) or 1 (inside) to a 200x200 integer matrix representing the viewport.
Thank you in anticipation
Perspective projection (and then scan-converting the polygons in image coordinates) is going to be a lot faster.
The matrix transform that is required in the case of perspective projection (essentially the world-to-camera matrix) is required in exactly the same way when raytracing. However, with perspective projection, you're only transforming the corner points, whereas with raytracing, you're transforming all the points in the image.
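A minimal sketch of the projection route in plain Python: project only the polygon vertices, then scan-convert in image coordinates with an even-odd point-in-polygon test. Everything here is an assumption for illustration: the function names, the pinhole camera looking down -z with focal length 1, the image plane spanning [-1, 1] in both directions, and vertices that have already been rotated into camera space.

```python
def project(point, focal=1.0):
    """Pinhole projection of a camera-space point (camera looks down -z)."""
    x, y, z = point
    return (focal * x / -z, focal * y / -z)

def inside(px, py, poly):
    """Even-odd rule point-in-polygon test for a 2D polygon."""
    hit = False
    j = len(poly) - 1
    for i in range(len(poly)):
        xi, yi = poly[i]
        xj, yj = poly[j]
        crosses = (yi > py) != (yj > py)
        if crosses and px < (xj - xi) * (py - yi) / (yj - yi) + xi:
            hit = not hit
        j = i
    return hit

def rasterize(polygon_cam, size=200):
    """Return a size x size matrix with 1 where a pixel centre is covered."""
    poly2d = [project(p) for p in polygon_cam]   # only the vertices are transformed
    mask = [[0] * size for _ in range(size)]
    for row in range(size):
        for col in range(size):
            # Pixel centre mapped to the image plane [-1, 1] x [-1, 1].
            x = (col + 0.5) / size * 2.0 - 1.0
            y = 1.0 - (row + 0.5) / size * 2.0
            if inside(x, y, poly2d):
                mask[row][col] = 1
    return mask

square = [(-1.0, -1.0, -2.0), (1.0, -1.0, -2.0), (1.0, 1.0, -2.0), (-1.0, 1.0, -2.0)]
mask = rasterize(square)
print(sum(map(sum, mask)))
```

A real scan converter would restrict the inner loops to each polygon's bounding box or walk scanlines instead of testing every pixel, but the amount of matrix work stays proportional to the number of vertices either way.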
You should be able to use perspective projection and a perspective projection matrix to compute the position of the vertices in screen space. It's hard to understand what you really want to do. If you want to create an image of that 3D scene, then with only a few polygons it would be hard to see any difference between ray tracing and rasterisation if your code is optimised (you will still need an acceleration structure for the ray tracing approach); however, yes, rasterisation is likely to be faster anyway.
Now, if you need to compute the distance between the eye (the camera's origin) and the geometry visible through the camera's view, then I don't see why you can't take the depth value of any sample for any pixel in the image and use the inverse of the perspective projection matrix to find its distance in camera space.
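For the depth-to-distance step, a small sketch in plain Python. It assumes an OpenGL-style perspective matrix with near and far planes n and f, and a stored depth in [0, 1]; the function names are hypothetical. Only the z row of the inverse projection is needed, so it is written out in closed form. The result is the z coordinate in camera space (negative in front of the camera), not the Euclidean distance along the ray.

```python
def eye_z_from_depth(depth, n, f):
    """Recover camera-space z from a depth value in [0, 1]."""
    z_ndc = 2.0 * depth - 1.0
    return 2.0 * f * n / ((f - n) * z_ndc - (f + n))

def depth_from_eye_z(z_eye, n, f):
    """Forward mapping, used here only to check the round trip."""
    z_ndc = (f + n) / (f - n) + 2.0 * f * n / ((f - n) * z_eye)
    return 0.5 * z_ndc + 0.5

near, far = 1.0, 100.0
stored = depth_from_eye_z(-5.0, near, far)
recovered = eye_z_from_depth(stored, near, far)
print(stored, recovered)
```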
Why is speed an issue in your problem? Otherwise use RT indeed.
Most of this information can be found on www.scratchapixel.com