OpenGL: Size of a 3D bounding box on screen

I need a simple and fast way to find out how big a 3D bounding box appears on screen (for LOD calculation) by using OpenGL Modelview and Projection matrices and the OpenGL Viewport dimensions.
My first intention is to project all 8 box corners on screen by using gluProject() and calculate the area of the convex hull afterwards. This solution works only with bounding boxes that are fully within the view frustum.
But how can I get the covered area on screen for boxes that are not fully within the viewing volume? Imagine a box where 7 corners are behind the near plane and only one corner is in front of the near plane and thus within the view frustum.
I have found another very similar question Screen Projection and Culling united but it does not cover my problem.

What about using query objects to count the samples that pass rendering?
See http://www.opengl.org/wiki/Query_Object and GL_SAMPLES_PASSED.
That way you could measure how many fragments are rendered and use that count for proper LOD selection.

Why not just manually multiply the vertex positions by the world-view-projection matrix? This will give you the vertices in "normalized device coordinates", where -1 is the bottom left of the screen and +1 is the top right of the screen.
The only thing is that if the projection is perspective, you have to divide your vertices by their 4th component, i.e. if the final vertex is (x,y,z,w) you would divide by w.
Take for example a position vector
v = {x, 0, -z, 1}
Given a vertical viewing angle 'a' and an aspect ratio 'r', the position x' in normalized device coordinates (range -1 to 1) is this (this formula is taken directly out of a graphics programming book):
x' = x * cot(a/2) / ( r * z )
So the perspective projection matrix for these parameters is as follows (shown in row-major format, for use with row vectors):
cot(a/2) / r 0 0 0
0 cot(a/2) 0 0
0 0 z1 -1
0 0 z2 0
When you multiply your vector by the projection matrix (assuming the world and view matrices are identity in this example) you get the following (I'm only computing the new "x" and "w" values because only they matter in this example):
v' = { x * cot(a/2) / r, newY, newZ, z }
So finally when we divide the new vector by its fourth component we get
v' = { x * cot(a/2) / (r*z), newY/z, newZ/z, 1 }
So v'.x is now the screen-space x coordinate of v. This is exactly what the graphics pipeline does to figure out where your vertex is on screen.
I've used this basic method before to figure out the size of geometry on screen. The nice part about it is that the math works regardless of whether the projection is perspective or orthographic, as long as you divide by the 4th component of the vector (for orthographic projections, the 4th component will be 1).
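As a rough illustration of the approach above, here is a minimal Python sketch that projects the 8 corners of a box and measures their extent in normalized device coordinates. The helper names (`perspective`, `mat_vec`, `to_ndc`, `ndc_extent`) are made up for this example, and the matrix uses the column-vector layout of gluPerspective rather than the row-vector layout shown in the answer. It assumes every corner lies in front of the eye (w > 0), so it does not solve the near-plane case from the question; it raises an error instead.

```python
import math

def perspective(fovy_deg, aspect, near, far):
    # Column-vector perspective matrix, same layout as gluPerspective.
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)  # cot(a/2)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def mat_vec(m, v):
    # 4x4 matrix times 4-component column vector.
    return [sum(m[row][col] * v[col] for col in range(4)) for row in range(4)]

def to_ndc(mvp, point):
    # Transform to clip space, then divide by w to reach NDC.
    x, y, z, w = mat_vec(mvp, [point[0], point[1], point[2], 1.0])
    if w <= 0.0:
        raise ValueError("point is at or behind the eye plane; clip before dividing")
    return (x / w, y / w, z / w)

def ndc_extent(mvp, corners):
    # Width and height of the axis-aligned rectangle around the projected corners.
    projected = [to_ndc(mvp, c) for c in corners]
    xs = [p[0] for p in projected]
    ys = [p[1] for p in projected]
    return (max(xs) - min(xs), max(ys) - min(ys))

# Hypothetical box in eye space: x, y in [-1, 1], z in [-4, -2].
proj = perspective(90.0, 1.0, 0.1, 100.0)
box = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-2.0, -4.0)]
width, height = ndc_extent(proj, box)
print(width, height)  # both close to 1.0, i.e. about half of the [-1, 1] range
```

The bounding rectangle overestimates the convex hull area, which is usually acceptable for LOD selection. Multiplying the NDC extent by half the viewport size would give a size in pixels.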

Related

My understanding on the projection matrix, perspective division, NDC and viewport transform

I was quite confused on how the projection matrix worked so I researched and I discovered a few other things but after researching a few days, I just wanted to confirm my understanding is correct. I might use a few wrong terms but my brain was exhausted after writing this. A few topics I just researched briefly like screen coordinates and window transform so I didn’t write much about it and my knowledge might be incorrect. Is everything I’ve written here correct or mostly correct? Correct me on anything if I’m wrong.
What does the projection matrix do?
So the perspective projection matrix defines a frustum, which is a truncated pyramid. Anything outside of that frustum will be clipped. I'll get to that later. The perspective projection matrix also adds perspective. To make the vertices follow the rules of perspective, the perspective projection matrix sets the vertex's w component (the homogeneous component) depending on how far the vertex is from the viewer (the farther the vertex is, the larger w becomes).
Why and how does the w component give the world perspective?
The w component gives the world perspective because in the perspective division (which happens in the vertex post-processing stage), when x, y and z are divided by the w component, the vertex coordinates are scaled down depending on how big the w component is. So essentially, the w component scales an object smaller the farther away it is.
Example:
Vertex position (1, 1, 2, 2).
Here, the vertex is 2 away from the viewer. In perspective division the x, y, and z will be divided by 2 because 2 is the w component.
(1/2, 1/2, 2/2) = (0.5, 0.5, 1).
As shown here, the vertex coordinate has been scaled by half.
How does the projection matrix decide what will be clipped?
The near and far plane are the limits of where the viewer can see (anything beyond the far plane and before the near plane will be clipped). Any coordinate will also have to go through a clipping check to see if it has to be clipped. The clipping check is checking whether the vertex coordinate is within a frustum range of -w to w.  If it is outside of that range, it will be clipped.
Let's say I have a vertex with a position of (2, 130, 90, 90).
x value is 2
y value is 130
z value is 90
w value is 90
Each of the x, y and z components of this vertex must be within the range of -90 to 90. The x and z values are within the range but the y value goes beyond the range, thus the vertex will be clipped.
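The clipping check and the perspective division described above can be sketched in a few lines of Python. The function names are hypothetical, and the sketch tests single vertices only; real hardware clips whole primitives against the volume, so a triangle with one vertex outside is cut rather than discarded.

```python
def inside_clip_volume(clip):
    # A clip-space vertex is inside when -w <= x, y, z <= w.
    x, y, z, w = clip
    return all(-w <= component <= w for component in (x, y, z))

def perspective_divide(clip):
    # Convert clip space to normalized device coordinates.
    x, y, z, w = clip
    return (x / w, y / w, z / w)

print(inside_clip_volume((2.0, 130.0, 90.0, 90.0)))  # y = 130 exceeds w = 90
print(inside_clip_volume((1.0, 1.0, 2.0, 2.0)))
print(perspective_divide((1.0, 1.0, 2.0, 2.0)))
```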
So after the vertex shader is finished, the next step is vertex post processing. In vertex post processing the clipping happens and also perspective division happens where clip space is converted into NDC (normalized device coordinates). Also, viewport transform happens where NDC is converted to window space.
What does perspective division do?
Perspective division divides the x, y, and z components of a vertex by the w component. Doing this does two things: it converts clip space to normalized device coordinates, and it adds perspective by scaling the vertices.
What is Normalized Device Coordinates?
Normalized Device Coordinates is the coordinate system where all coordinates are condensed into an NDC box where each axis is in the range of -1 to +1.
After the conversion to NDC, the viewport transform happens, where all the NDC coordinates are converted to screen coordinates. NDC space becomes window space.
If an NDC coordinate is (0.5, 0.5, 0.3), it will be mapped onto the window based on what the programmer provided to the function glViewport. If the viewport is 400x300 with its origin at (0, 0), the NDC origin (0, 0) lands at the viewport center, pixel (200, 150), and the NDC coordinate (0.5, 0.5) will be placed at pixel 300 on the x axis and 225 on the y axis.
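For reference, the viewport transform for x and y is a scale and shift from the [-1, 1] range to the rectangle given to glViewport. A minimal sketch, with a made-up function name and assuming the usual OpenGL convention of a lower-left window origin:

```python
def ndc_to_window(ndc_x, ndc_y, vp_x, vp_y, vp_width, vp_height):
    # Map [-1, 1] to [vp_x, vp_x + vp_width] and [vp_y, vp_y + vp_height].
    win_x = vp_x + (ndc_x + 1.0) * 0.5 * vp_width
    win_y = vp_y + (ndc_y + 1.0) * 0.5 * vp_height
    return (win_x, win_y)

# NDC (0, 0) is the center of the viewport, NDC (-1, -1) its lower-left corner.
print(ndc_to_window(0.0, 0.0, 0, 0, 400, 300))
print(ndc_to_window(-1.0, -1.0, 0, 0, 400, 300))
```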
The perspective projection matrix does not decide what is clipped. After transforming a view-space coordinate with the projection matrix, you get a clip-space coordinate. This is a homogeneous coordinate. Based on this coordinate, the rendering pipeline clips the scene. The clipping rule is -w <= x, y, z <= w. In the following stage of the rendering pipeline, the clip-space coordinate is transformed into normalized device space by the perspective divide (x, y, z)' = (x/w, y/w, z/w). This division by the w component gives the perspective effect. (See also What exactly are eye space coordinates? and Transform the modelMatrix)

What is the role of gl_Position.w in Vulkan?

Variable gl_Position output from a GLSL vertex shader must have 4 coordinates. In OpenGL, it seems w coordinate is used to scale the vector, by dividing the other coordinates by it. What is the purpose of w in Vulkan?
Shaders and projections in Vulkan behave exactly the same as in OpenGL. There are small differences in depth ranges ([-1, 1] in OpenGL, [0, 1] in Vulkan) or in the origin of the coordinate system (lower-left in OpenGL, upper-left in Vulkan), but the principles are exactly the same. The hardware is still the same and it performs calculations in the same way both in OpenGL and in Vulkan.
4-component vectors serve multiple purposes:
Different transformations (translation, rotation, scaling) can be represented in the same way, with 4x4 matrices.
Projection can also be represented with a 4x4 matrix.
Multiple transformations can be combined into one 4x4 matrix.
The .w component you mention is used during perspective projection.
All this we can do with 4x4 matrices and thus we need 4-component vectors (so they can be multiplied by 4x4 matrices). Again, I write about this because the above rules apply both to OpenGL and to Vulkan.
So the purpose of the .w component of the gl_Position variable is exactly the same in Vulkan. It is used to scale the position vector: during perspective calculations (projection matrix multiplication) the original depth is modified by the original .w component and stored in the .z component of the gl_Position variable. Additionally, the original depth is also stored in the .w component. After that (as a fixed-function step) the hardware performs the perspective division and divides the position stored in the gl_Position variable by its .w component.
In orthographic projection the steps performed by the hardware are exactly the same, but the values used for the calculations are different. So the perspective division step is still performed by the hardware but it does nothing (the position is divided by 1.0).
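To see why the division does nothing in the orthographic case, here is a small sketch that builds an orthographic matrix in the layout documented for glOrtho and transforms a point with it. The helper names and the sample values are assumptions for illustration only.

```python
def ortho(left, right, bottom, top, near, far):
    # Column-vector orthographic matrix, same layout as glOrtho.
    return [
        [2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
        [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
        [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
        [0.0, 0.0, 0.0, 1.0],
    ]

def mat_vec(m, v):
    # 4x4 matrix times 4-component column vector.
    return [sum(m[row][col] * v[col] for col in range(4)) for row in range(4)]

clip = mat_vec(ortho(-4.0, 4.0, -3.0, 3.0, 1.0, 10.0), [3.0, 2.0, -5.0, 1.0])
print(clip)  # the last component stays 1.0, so dividing by it changes nothing
```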
gl_Position is a homogeneous coordinate. The w component plays a role in perspective projection.
The projection matrix describes the mapping from 3D points of a scene to 2D points on the viewport. It transforms from eye space to clip space, and the coordinates in clip space are transformed to normalized device coordinates (NDC) by dividing by the w component of the clip coordinates (perspective divide).
In perspective projection, the projection matrix describes the mapping from 3D points in the world, as they are seen from a pinhole camera, to 2D points on the viewport. The eye space coordinates in the camera frustum (a truncated pyramid) are mapped to a cube (the normalized device coordinates).
Perspective projection matrix (each line below is one column of the matrix, i.e. column-major order):
r = right, l = left, b = bottom, t = top, n = near, f = far
2*n/(r-l) 0 0 0
0 2*n/(t-b) 0 0
(r+l)/(r-l) (t+b)/(t-b) -(f+n)/(f-n) -1
0 0 -2*f*n/(f-n) 0
When a Cartesian coordinate in view space is transformed by the perspective projection matrix, the result is a homogeneous coordinate. The w component grows with the distance to the point of view. This causes objects to become smaller after the perspective divide if they are further away.
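The effect can be checked numerically. The sketch below builds the matrix above in mathematical row/column form (so the four lines listed above appear as its columns) and transforms two points at different depths. The function names and the sample frustum are assumptions for illustration.

```python
def frustum(l, r, b, t, n, f):
    # Same matrix as listed above, written as rows for column-vector math.
    return [
        [2.0 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2.0 * n / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, -(f + n) / (f - n), -2.0 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def mat_vec(m, v):
    # 4x4 matrix times 4-component column vector.
    return [sum(m[row][col] * v[col] for col in range(4)) for row in range(4)]

proj = frustum(-1.0, 1.0, -1.0, 1.0, 1.0, 10.0)
near_point = mat_vec(proj, [1.0, 1.0, -2.0, 1.0])  # 2 units in front of the eye
far_point = mat_vec(proj, [1.0, 1.0, -4.0, 1.0])   # 4 units in front of the eye

# w equals the distance along the view direction, so x/w halves when depth doubles.
print(near_point[3], near_point[0] / near_point[3])
print(far_point[3], far_point[0] / far_point[3])
```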
In computer graphics, transformations are represented with matrices. If you want something to rotate, you multiply all its vertices (a vector) by a rotation matrix. Want it to move? Multiply by translation matrix, etc.
tl;dr: You can't describe translation along the z-axis with 3D matrices and vectors. You need at least 1 more dimension, so they just added a dummy dimension w. But things break if it's not 1, so keep it at 1 :P.
Anyway, now we begin with a quick review on matrix multiplication:
You basically put x above a, y above b, z above c. Multiply the whole column by the variable you just moved, and sum up everything in the row.
So if you were to translate a vector, you'd want something like:
See how x and y are now translated by az and bz? That's pretty awkward though:
You'd have to account for how big z is whenever you move things (what if z was negative? You'd have to move in opposite directions. That's cumbersome as hell if you just want to move something an inch over...)
You can't move along the z axis. You'll never be able to fly or go underground
But, if you can make sure z = 1 at all times:
Now it's much clearer that this matrix allows you to move in the x-y plane by a, and b amounts. Only problem is that you're conceptually levitating all the time, and you still can't go up or down. You can only move in 2D.
But you see a pattern here? With 3D matrices and 3D vectors, you can describe all the fundamental movements in 2D. So what if we added a 4th dimension?
Looks familiar. If we keep w = 1 at all times:
There we go, now you get translation along all 3 axes. This is what's called homogeneous coordinates.
But what if you were doing some big and complicated transformation, resulting in w != 1, and there's no way around it? OpenGL (and basically any other CG system, I think) will do what's called normalization: divide the resultant vector by the w component. I don't know enough to say exactly why (because scaling is a linear transformation?), but it has favorable implications (it can be used in perspective transforms). Anyway, the translation matrix would actually look like:
And there you go, see how each component is shrunk by w, then it's translated? That's why w controls scaling.
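A minimal Python sketch of the idea described above, with made-up helper names: a 4x4 translation matrix moves a point when w = 1, leaves a direction untouched when w = 0, and after dividing by w each component ends up as the shrunken value plus the offset.

```python
def translation(a, b, c):
    # 4x4 homogeneous translation matrix for column vectors.
    return [
        [1.0, 0.0, 0.0, a],
        [0.0, 1.0, 0.0, b],
        [0.0, 0.0, 1.0, c],
        [0.0, 0.0, 0.0, 1.0],
    ]

def mat_vec(m, v):
    # 4x4 matrix times 4-component column vector.
    return [sum(m[row][col] * v[col] for col in range(4)) for row in range(4)]

def normalize(v):
    # Divide by w to get back to ordinary 3D coordinates.
    return (v[0] / v[3], v[1] / v[3], v[2] / v[3])

move = translation(1.0, 2.0, 3.0)
print(mat_vec(move, [5.0, 5.0, 5.0, 1.0]))             # point: translated
print(mat_vec(move, [5.0, 5.0, 5.0, 0.0]))             # direction: unchanged
print(normalize(mat_vec(move, [5.0, 5.0, 5.0, 2.0])))  # (5/2 + 1, 5/2 + 2, 5/2 + 3)
```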

The result of projMat * viewMat * modelMat * vertPos should result in a screen-space position... right?

And given that, shouldn't all position values that end up being rendered be between the values -1 and 1?
I tried passing said position value as the "color" value from my vertex shader to my fragment shader to see what would happen, and expected a gradient the full way across the screen (at least, where geometry exists).
I would EXPECT the top-right corner of the screen to have color value rgb(1.0,1.0,?.?) (? because the z value might vary), and that would fade towards (0.0,0.0,?.?) at the very center of the screen (and anything to the bottom left would have 0'd r and g components because their value would be negative).
But instead, what I'm getting looks like the gradient is happening on a smaller scale towards the center of my screen (attached):
Why is this? This makes it look like the geometry position resulting from the composition of my matrices and position is a value between -10ish and 10ish...?
Any ideas what I might be missing?
edit- don't worry about the funky geometry. if it helps, what's being rendered is a quad behind ~100 unit triangles randomly rotated about the origin. debugging stuff.
You are confusing several things here.
projMat * viewMat * modelMat * vPos will transform vPos into clip space, not screen space. However, using your coordinates as colors, you don't even want screen space (which holds the pixel coordinates relative to the output window, accessible via gl_FragCoord in the fragment shader); you want [-1,1] normalized device coordinates. You get to NDC by dividing by the w component of your clip-space value.
From the images you get I can guess that projMat is a perspective transform, since in the orthographic case, clip.w will typically be 1 for all vertices (not necessarily, but very likely), and the results would look more like what you expect.
You can see from your image that x and y (r and g) are zero in the center. However, those values are not limited to the interval [-1,1], and your shader output values are clamped to [0,1], so you don't see much of a gradient. Your z coord is >= 1 everywhere, so it does not change at all in the image.
If you used the NDC coords as colors, you would indeed see a red gradient in the right half and a green gradient in the upper half, but the other areas where the value is below 0 would still be clamped to zero. You would also get some more info in the blue channel (although it might be possible that NDC z is <= 0 for your whole scene), but you should be aware of the nonlinear z distortions introduced by the divide.
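The difference can be reproduced on the CPU. This is only a sketch with hypothetical names and a made-up clip-space position; it mimics the [0,1] clamp applied to the color output and compares using clip coordinates directly against using NDC.

```python
def clamp01(value):
    # Color outputs are clamped to the [0, 1] range.
    return max(0.0, min(1.0, value))

def color_from_clip(clip):
    # What the question does: use clip-space x, y, z directly as r, g, b.
    x, y, z, _w = clip
    return (clamp01(x), clamp01(y), clamp01(z))

def color_from_ndc(clip):
    # What the answer suggests: divide by w first, then use the result as a color.
    x, y, z, w = clip
    return (clamp01(x / w), clamp01(y / w), clamp01(z / w))

clip_position = (5.0, -3.0, 8.0, 10.0)  # assumed clip-space value in the +/-10 range
print(color_from_clip(clip_position))   # saturates: no visible gradient
print(color_from_ndc(clip_position))    # values spread across the screen
```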

Rendering infinitely large plane

I want to render a plane so that it looks as if it goes to infinity in all directions. I want the plane boundary in the distance to be the horizon.
Using a simple mesh does not work - the computer can't render infinitely many triangles. Even if this was possible, the camera frustum would cut out the distant polygons and create a gap between the plane boundary and the horizon.
A workaround is to compute the horizon mathematically: finding points on the plane which also lie on the plane at infinity. Connecting these points and two corners of the viewport creates a trapezoid which represents the sought plane. However, this way the plane cannot be lit properly, or textured, or given anything else which requires a fine triangulation...
You can draw an infinite plane using the standard rasterization pipeline. The homogeneous coordinates it uses can represent "ideal" points (otherwise known as vanishing points or points at infinity) just as happily as regular Euclidean points, and likewise it is perfectly practical to set up a projection matrix which places the far plane at infinity.
A simple way to do this would be to use one triangle per quadrant, as follows:
vertices [x,y,z,w], for drawing an (x,y) coordinate plane, at (z==0):
0: [ 0, 0, 0, 1 ]
1: [ 1, 0, 0, 0 ]
2: [ 0, 1, 0, 0 ]
3: [-1, 0, 0, 0 ]
4: [ 0,-1, 0, 0 ]
draw 4 triangles using indices:
(0,1,2); (0,2,3); (0,3,4); (0,4,1)
If you want a test pattern (like an infinite checkerboard), you will have to deal with the fact that stretching your triangles to infinity will distort any standard texture. However, you can write a pixel shader that determines the color based on the actual 3D point (i.e., use x and y from the worldspace (x,y,z) coordinates), ignoring the (distorted) texture coords altogether.
You could choose between two constant colors based on parity (for a checkerboard), or tile a texture by sampling it based on the fractional part of your chosen coordinates.
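Both options can be sketched as plain functions of the world-space x and y, which is what a pixel shader would compute per fragment. The names and the `tile` parameter are assumptions for illustration.

```python
import math

def checker_index(x, y, tile=1.0):
    # 0 or 1 depending on the parity of the tile containing (x, y).
    return (math.floor(x / tile) + math.floor(y / tile)) % 2

def tiled_uv(x, y, tile=1.0):
    # Fractional part of the scaled coordinates, usable as repeating texture coords.
    u = x / tile - math.floor(x / tile)
    v = y / tile - math.floor(y / tile)
    return (u, v)

print(checker_index(0.5, 0.5), checker_index(1.5, 0.5), checker_index(-0.5, 0.5))
print(tiled_uv(2.25, -0.25))
```

Using floor rather than integer truncation keeps the pattern consistent across the origin, where coordinates become negative.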
Note that OpenGL's normalized device coordinates span [-1..1] for each of x, y, and z. You can compute the appropriate projection matrix by evaluating the limits as the far clip distance f increases without bound:
clip coords: [x]   [ n/r   0    0    0  ]               [x]
             [y] = [  0   n/t   0    0  ] * view coords [y]
             [z]   [  0    0   -1  -2n  ]               [z]
             [w]   [  0    0   -1    0  ]               [w]
Where (as in the link): n is the near clip plane, r is half the frustum width at the near clip plane, and t is half the frustum height at the near clip plane.
I have not tested the above matrix, so it's worth what you paid for it. Also be aware that the depth value will lose its precision as you approach infinity...
Although, the precision at closer distances will likely be fine -- e.g., at any given distance, the depth resolution in the (near:infinity) case should be about 10% less than the case where the (near:far) ratio is (1:10).
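As a quick numerical check of the matrix above (still only a sketch, with made-up helper names and n = r = t = 1 chosen for convenience), the code below transforms a point on the near plane, a distant point, and an ideal point with w = 0.

```python
def infinite_perspective(n, r, t):
    # Limit of the standard perspective matrix as the far distance goes to infinity.
    return [
        [n / r, 0.0, 0.0, 0.0],
        [0.0, n / t, 0.0, 0.0],
        [0.0, 0.0, -1.0, -2.0 * n],
        [0.0, 0.0, -1.0, 0.0],
    ]

def mat_vec(m, v):
    # 4x4 matrix times 4-component column vector.
    return [sum(m[row][col] * v[col] for col in range(4)) for row in range(4)]

def to_ndc(clip):
    return (clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3])

proj = infinite_perspective(1.0, 1.0, 1.0)
print(to_ndc(mat_vec(proj, [0.0, 0.0, -1.0, 1.0])))    # on the near plane: z = -1
print(to_ndc(mat_vec(proj, [0.0, 0.0, -100.0, 1.0])))  # far away: z approaches +1
print(to_ndc(mat_vec(proj, [1.0, 0.0, -1.0, 0.0])))    # ideal point: z is exactly +1
```

An ideal point whose direction has no component along the view axis ends up with w = 0 in clip space, so it must go through the pipeline's clipping rather than this naive divide.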
Your viewing frustum is a capped pyramid built from 4 clip planes (left/right and top/bottom) plus a near and a far plane. It is not actually infinite, but since you cannot see anything outside the frustum, it is as infinite as it can be.
Drawing the bottom side of your capped pyramid (a quad, or two triangles) therefore gives you a plane that looks "infinite", going to the horizon. The same holds, for that matter, for any quad with its corner points on the near and far planes.

OpenGL: 2D Vertex coordinates to 2D viewing coordinates?

I'm implementing a rasterizer for a class project, and currently I'm stuck on how I should convert vertex coordinates to viewing pane coordinates.
I'm given a list of vertices with 2D coordinates for a triangle, like
0 0 1
2 0 1
0 1 1
and I'm drawing in a viewing pane (using OpenGL and GLUT) of size 400x400 pixels, for example.
My question is how do I decide where in the viewing pane to put these vertices, assuming
1) I want the coordinates to be centered around 0,0 at the center of the screen
2) I want to fill up most of the screen (let's say for this example, the screen is the maximum x coordinate + 1 lengths wide, etc)
3) I have any and all of OpenGL's and GLUT's standard library functions at my disposal.
Thanks!
http://www.opengl.org/sdk/docs/man/xhtml/glOrtho.xml
To center around 0 use symmetric left/right and bottom/top. Beware the near/far which are somewhat arbitrary but are often chosen (in examples) as -1..+1 which might be a problem for your triangles at z=1.
If you care about the aspect ratio make sure that right-left and bottom-top are proportional to the window's width/height.
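One way to pick such bounds is sketched below. The function name and the `half_extent` parameter (the largest absolute coordinate you want visible, plus any margin) are assumptions; the returned values would be passed as the left, right, bottom and top arguments of glOrtho.

```python
def centered_ortho_bounds(half_extent, window_width, window_height):
    # Symmetric bounds around 0 whose proportions match the window's aspect ratio.
    aspect = window_width / window_height
    if aspect >= 1.0:
        half_w, half_h = half_extent * aspect, half_extent
    else:
        half_w, half_h = half_extent, half_extent / aspect
    return (-half_w, half_w, -half_h, half_h)

# For the example triangle, the maximum x coordinate is 2, so use 2 + 1 = 3.
print(centered_ortho_bounds(3.0, 400, 400))
print(centered_ortho_bounds(3.0, 800, 400))
```

The shorter window side always covers at least `half_extent` in each direction, so the geometry stays fully visible when the window is not square.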
You should consider the frustum, which is your view volume, and calculate the coordinates by transforming your objects according to their position; this explains the theory quite thoroughly.
Basically you have to project the object using a projection matrix that is calculated based on the characteristics of your view:
scale the coordinates according to a z (depth) value: you scale both x and y inversely proportionally to z
scale and shift the coordinates in order to fit the width of your view