Clipping in clipping coordinate system and normalized device coordinate system (OpenGL)

I heard clipping should be done in clipping coordinate system.
The book describes a situation where a line runs from behind the camera into the viewing volume. (Call this line PQ, where P is the point behind the camera.)
I cannot understand why this is a problem.
(The book says that after the normalizing transformation, P will end up in front of the camera.)
I think that before the clipping coordinate system is constructed, the camera is at the origin (0, 0, 0, 1), because we applied the viewing transformation.
However, in NDCS, I cannot work out where the camera is located.
And I have a second question.
In the vertex shader we apply the model-view transformation and then the projection transformation. Finally, we output these vertices to the rasterizer.
(Some vertices' w is not equal to 1.)
Here is my question: will the rendering pipeline automatically perform the division by w after clipping is finished?

Sometimes not all of the model can be seen on screen, mostly because some of its objects lie behind the camera (or "point of view"). Those objects are clipped out. If only part of an object can't be seen, then only that part must be clipped, leaving the rest visible.
OpenGL clips
OpenGL does this clipping in Clipping Coordinate Space (CCS). This is a cube of size 2w x 2w x 2w, where 'w' is the fourth coordinate resulting from the (4x4) matrix by (4x1) point multiplication. A mere comparison of each coordinate against ±w is enough to tell whether the point is clipped or not. If the point passes the test, then its coordinates are divided by 'w' (the so-called "perspective division"). Notice that for orthogonal projections 'w' is always 1, while with perspective it generally is not.
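As a rough sketch of that per-vertex arithmetic (plain Python, not any real OpenGL API; the function names are made up for illustration):

```python
# Sketch of the CCS clip test and perspective division described above.
# This is just the arithmetic the pipeline performs per vertex.

def inside_clip_volume(x, y, z, w):
    # A point survives clipping when every coordinate lies within [-w, w],
    # i.e. inside the 2w x 2w x 2w cube.
    return -w <= x <= w and -w <= y <= w and -w <= z <= w

def perspective_divide(x, y, z, w):
    # Applied only after the clip test passes; maps CCS to NDC.
    return (x / w, y / w, z / w)

print(inside_clip_volume(1.0, -1.0, 0.0, 2.0))  # True: all within [-2, 2]
print(inside_clip_volume(3.0, 0.0, 0.0, 2.0))   # False: x is outside [-2, 2]
print(perspective_divide(1.0, -1.0, 0.0, 2.0))  # (0.5, -0.5, 0.0)
```

Note that the test is done before the division: in CCS the comparison is cheap, and points behind the camera (negative w) fail it without ever risking a division by zero.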
CPU clips
If the model is too big, perhaps you want to save GPU resources or improve the frame rate. So you decide to skip those objects that would be clipped anyhow. Then you do the math yourself (on the CPU) and send to the GPU only the vertices that passed the test. Be aware that an object may have some vertices clipped while other vertices of the same object are not.
Perhaps you send them to the GPU anyway and let it handle these special cases.
You have a volume within which objects are visible. This volume is defined by six planes. Let's put ourselves at the camera and look at this volume: if your projection is perspective, the six planes build a "frustum", a sort of truncated pyramid. If your projection is orthogonal, the planes form a parallelepiped.
In order to decide whether or not to clip a vertex, you must use the distance from the vertex to each of these six planes. You need a signed distance, meaning that the sign tells you which side of the plane the vertex is on. If any of the six distance signs is not the right one, the vertex is discarded: clipped.
If a plane is defined by the equation A*x + B*y + C*z + D = 0, then the signed distance from the point (p1, p2, p3) is (A*p1 + B*p2 + C*p3 + D) / sqrt(A*A + B*B + C*C). You only need the sign, so don't bother calculating the denominator.
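A minimal sketch of that sign-only test (plain Python; the plane tuple layout and function names are just for illustration):

```python
import math

def signed_distance(plane, p):
    # Full signed distance from point p to the plane Ax + By + Cz + D = 0.
    A, B, C, D = plane
    return (A * p[0] + B * p[1] + C * p[2] + D) / math.sqrt(A*A + B*B + C*C)

def same_side_test(plane, p):
    # Sign-only version: the denominator is always positive,
    # so it can be skipped entirely when only the side matters.
    A, B, C, D = plane
    return A * p[0] + B * p[1] + C * p[2] + D >= 0.0

plane = (0.0, 0.0, -1.0, -0.1)                  # the plane z = -0.1
print(same_side_test(plane, (0.0, 0.0, -5.0)))  # True: beyond the plane
print(same_side_test(plane, (0.0, 0.0, 1.0)))   # False: on the other side
```

Running the full `signed_distance` on the first point gives 4.9; the cheap test only evaluates its numerator's sign.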
Now you have all the tools. If you know your planes in "camera view", you can calculate the six distances and clip or keep the vertex. But this may be an expensive operation, considering that you must transform the vertex coordinates from model to camera (view) space, a View·Model matrix calculation. At the same cost you can use your precomputed Projection·View·Model matrix instead and obtain CCS coordinates, which are much easier to test against the [-w, w] bounds of the CCS cube.
Sometimes you want to skip some vertices not because they are clipped, but because their depth breaks some criterion you are using. In that case CCS is not the right space to work in, because the Z coordinate is transformed into the [-w, w] range and depth is somehow "lost". Instead, do your test in "view space".

Related

How to determine the XYZ coords of a point on the back buffer

If I pick a spot on my monitor in screen X/Y, how can I obtain the point in 3D space, based on my projection and view matrices?
For example, I want to put an object at depth and have it located at 10,10 in screen coordinates, so that when I update its world matrix it renders on screen at 10,10.
I presume it's fairly straightforward given I have my camera matrices, but I'm not sure offhand how to 'reverse' the normal process.
DirectXTk XMMath would be best, but I can no doubt sort it out from any linear algebra system (OpenGL, D3DX, etc).
What I'm actually trying to do is find a random point on the back clipping plane where I can start an object that then drifts straight towards the camera along its projection line. So I want to keep picking points in deep space that are still within my view (no point creating ones outside in my case) and starting alien ships (or whatever) at that point.
As discussed in my comments, you need four things to do this generally.
ModelView (GL) or View (D3D / general) matrix
Projection matrix
Viewport
Depth Range (let us assume default, [0, 1])
What you are trying to do is locate in world-space a point that lies on the far clipping plane at a specific x,y coordinate in window-space. The point you are looking for is <x,y,1> (z=1 corresponds to the far plane in window-space).
Given this point, you need to transform it back to NDC-space.
The specifics are actually API-dependent since D3D's definition of NDC is different from OpenGL's -- they do not agree on the range of Z (D3D = [0, 1], GL = [-1, 1]).
Once in NDC-space, you can apply the inverse Projection matrix to transform back to view-space.
These are homogeneous coordinates and division by W is necessary.
From view-space, apply the inverse View matrix to arrive at a point in world-space that satisfies your criteria.
Most math libraries have a function called UnProject (...) that will do all of this for you. I would suggest using it, because you tagged this question D3D and OpenGL and the specifics of some of these transformations are different depending on the API.
You are better off knowing how they work, even if you never implement them yourself. I think the key thing you were missing was the viewport; I have an answer here that explains this step visually.
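To make the steps concrete, here is a rough, unoptimised sketch of what an UnProject-style function does, using GL conventions (NDC z in [-1, 1], depth range [0, 1]) and row-major 4x4 matrices in plain Python; the helper names `mat_mul`, `mat_inv` and `unproject` are made up for illustration:

```python
def mat_mul(a, b):
    # Row-major 4x4 matrix product a * b.
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def mat_inv(m):
    # 4x4 inverse via Gauss-Jordan elimination with partial pivoting.
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(4)]
         for i, row in enumerate(m)]
    for col in range(4):
        pivot = max(range(col, 4), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [x / pv for x in a[col]]
        for r in range(4):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[4:] for row in a]

def unproject(win, viewport, proj, view):
    # Step 1: window space -> NDC (GL maps all three axes to [-1, 1]).
    x0, y0, w, h = viewport
    ndc = [2.0 * (win[0] - x0) / w - 1.0,
           2.0 * (win[1] - y0) / h - 1.0,
           2.0 * win[2] - 1.0,        # depth range assumed [0, 1]
           1.0]
    # Step 2: NDC -> world via the inverse of (Projection * View),
    # followed by the homogeneous divide.
    p = mat_inv(mat_mul(proj, view))
    v = [sum(p[r][c] * ndc[c] for c in range(4)) for r in range(4)]
    return [v[0] / v[3], v[1] / v[3], v[2] / v[3]]

# Identity view and a simple orthographic projection over [-10, 10]^3.
view = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
proj = [[0.1, 0, 0, 0], [0, 0.1, 0, 0], [0, 0, -0.1, 0], [0, 0, 0, 1.0]]

# Center of a 200x200 viewport at window depth 1 -> (0, 0, -10),
# a point on the far plane, as in the spawning-ships use case above.
print(unproject((100.0, 100.0, 1.0), (0, 0, 200, 200), proj, view))
```

For a perspective projection the same code applies unchanged; the homogeneous divide at the end is what makes it work there, since the inverse projection generally produces a w that is not 1.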

Perspective Projection - OpenGL

I am confused about the position of objects in OpenGL. The eye position is (0,0,0), and the projection plane is at z = -1. Will the objects be between the eye position and the plane (z from 0 to -1)? Or behind the projection plane? And is there any particular reason for this?
First of all, there is no eye in modern OpenGL. There is also no camera. There is no projection plane. You define these concepts by yourself; the graphics library does not give them to you. It is your job to transform your object from your coordinate system into clip space in your vertex shader.
I think you are thinking about projection wrongly. Projection doesn't move the objects in the same sense that a translation or rotation matrix might. If you take a look at the link above, you can see that in order to render a perspective projection, you calculate the x and y components of the projected coordinate with R = V(ez/pz), where ez is the depth of the projection plane, pz is the depth of the object, V is the coordinate vector, and R is the projection. Almost always you will use ez = 1, which turns that equation into R = V/pz, allowing you to place pz in the w coordinate and let OpenGL do the "perspective divide" for you. Assuming you have your eye and plane in the correct places, projecting a coordinate is almost as simple as dividing by its z coordinate. Your objects can be anywhere in 3D space (even behind the eye), and you can project them onto your plane so long as you don't divide by zero or invalidate the z coordinate that you use for depth testing.
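As a toy illustration of the R = V/pz idea above (plain Python, treating pz as a positive depth toward the plane at ez = 1; in real OpenGL you would put pz into w and let the pipeline divide):

```python
def project(px, py, pz):
    # Plane at e_z = 1: projecting is just dividing x and y by the depth.
    return (px / pz, py / pz)

# Two points on the same line through the eye land on the same spot on
# the projection plane, which is exactly the perspective effect.
print(project(1.0, 2.0, 2.0))  # (0.5, 1.0)
print(project(2.0, 4.0, 4.0))  # (0.5, 1.0)
```

The division is also where the pz = 0 case breaks down, which is one reason the near plane must sit strictly in front of the eye.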
There is no "projection plane" at z=-1. I don't know where you got this from. The classic GL perspective matrix assumes an eye space where the camera is located at origin and looking into -z direction.
However, there is the near plane at some z < 0, and everything in front of the near plane is clipped. You cannot put the near plane at z = 0, because then you would end up with a division by zero when trying to project points on that plane. So that is one reason why the viewing volume isn't a pyramid with the eye point at the apex, but a pyramid frustum.
This is, by the way, also true for real-world eyes and cameras. The projection center lies behind the lens, so no object can get infinitely close to the optical center in either case.
The other reason why you want a sufficiently large near clipping distance is the precision of the depth buffer. The whole depth range between the near and the far plane has to be mapped to depth values with a limited number of bits, typically 24. So you want to keep the far plane as close as possible and push the near plane as far away as possible. The non-linear mapping of the screen-space z coordinate makes this even more important, as the precision is non-uniformly distributed over that range.
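A small numeric sketch of that non-linearity (plain Python; `ndc_depth` is a made-up name for the mapping the classic GL perspective matrix produces, with d the positive eye-space distance along the view direction):

```python
def ndc_depth(d, n, f):
    # Maps eye-space distance d (d > 0 along the -z view direction) to
    # NDC depth in [-1, 1], as the classic GL perspective matrix does.
    return (f + n) / (f - n) - (2.0 * f * n) / (d * (f - n))

n, f = 0.1, 1000.0
for d in (0.1, 1.0, 10.0, 100.0, 1000.0):
    print(f"d = {d:7.1f}  ndc z = {ndc_depth(d, n, f):+.4f}")
```

With n = 0.1, the first unit of distance already consumes roughly 90% of the [-1, 1] depth range, leaving very little of it for everything between d = 1 and the far plane, which is why pushing the near plane outward helps so much.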

Questions about Orthogonal vs Perspective in OpenGL [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
I've got a 3-vertex triangle rotating around the y-axis. One of the things I find "weird" is normalized coordinates in GL's default orthogonal projection. I've used 2D libraries like SDL and SFML, which almost always deal with pixels: you say you want an image surface that is 50x50 pixels, and that is what you get. So initially it was strange for me to limit my vertex position choices to [-1, 1].
Why do orthogonal coordinates have to be normalized? Is perspective projection the same? If so, how would you say you want the origin of your object to be at z = -10? (My quick look over matrix math says perspective is different. Something about division by 'w' creating homogeneous (same thing as normalized?) coordinates, but I'm not sure.)
gl_Position = View * Model * Project * Vertex;
I've seen the equation above, and I'm boggled by how the variable gl_Position used in shaders can represent both the position of the current vertex of a model/object and, at the same time, a view/projection, or the position of the camera. How does that work? I understand that by multiplication all that information is stored in one matrix, but how does OpenGL use that one combined matrix to say, "ok, this part of gl_Position is for the camera, and that other part is information for where the model is going to go"? (BTW, I'm not quite sure what the Vertex vec4 represents. I thought all vertices of a model were inside Model. Any ideas?)
One more question: if you just wanted to move the camera, for example in FPS games where you move the mouse up to look up, but no objects are being rotated or translated (I think) other than the camera, would the equation above look something like this?
gl_Position = View * Project;
Why do orthogonal coordinates have to be normalized?
They don't. You can set the limits of the orthographic projection volume however you desire. The left, right, bottom, top, near and far parameters of the glOrtho call define the limits of the projection volume. If you choose left=0, right=win_pixel_width, bottom=0, top=win_pixel_height, you end up with a pixel-unit projection volume like you're used to. However, why bother with pixels? You'd just have to compensate for the actual window size later. Just choose the ortho projection volume extents to match the scene you want to draw.
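A sketch of the matrix such a glOrtho call produces, and of the resulting pixel-unit mapping (plain Python; the `ortho` helper mirrors the documented glOrtho matrix, in row-major layout):

```python
def ortho(l, r, b, t, n, f):
    # Row-major version of the matrix glOrtho(l, r, b, t, n, f) builds.
    return [
        [2.0 / (r - l), 0.0, 0.0, -(r + l) / (r - l)],
        [0.0, 2.0 / (t - b), 0.0, -(t + b) / (t - b)],
        [0.0, 0.0, -2.0 / (f - n), -(f + n) / (f - n)],
        [0.0, 0.0, 0.0, 1.0],
    ]

# left=0, right=640, bottom=0, top=480: vertex coordinates are now pixels.
m = ortho(0.0, 640.0, 0.0, 480.0, -1.0, 1.0)
center = (m[0][0] * 320.0 + m[0][3], m[1][1] * 240.0 + m[1][3])
corner = (m[0][0] * 640.0 + m[0][3], m[1][1] * 480.0 + m[1][3])
print(center)  # middle of the window maps to the NDC center (0, 0)
print(corner)  # top-right pixel maps to the NDC corner (1, 1)
```

The matrix simply rescales and shifts whatever volume you chose into the fixed [-1, 1] NDC box, which is why the volume extents themselves are entirely up to you.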
Maybe you're confusing this with normalized device coordinates. For those it has simply been defined that the value range [-1, 1] is mapped to the viewport extents.
Update
BTW, I'm not quite sure what the Vertex vec4 represents. I thought all vertices of a model were inside Model. Any ideas?
I'm getting quite fatigued right now, because I've been answering several questions like this numerous times over the last few days. So, here it goes again:
In OpenGL there is no camera.
In OpenGL there is no scene.
In OpenGL there are no models.
"Wait, what?!" you may wonder now. But it's true.
All OpenGL cares about is that there is some target framebuffer, i.e. a canvas it can draw to, and a stream of vertex attributes that make up geometric primitives. The primitives are points, lines and triangles. Somehow the vertex attributes for, say, a triangle must be mapped to a position on the framebuffer canvas. For this, a vertex attribute we call position goes through a number of transformations.
The first is from a local model space into world space: the Model transform.
From world space into eye space: the View transform. It is this view transform which acts like placing the camera in a scene.
After that, the position is put through the equivalent of a camera's lens, which is the Projection transform.
After the Projection transform the position is in clip space, where it undergoes some operations that are not essential to understand for the time being. After clipping, the so-called homogeneous divide is applied to reach normalized device coordinate space, by dividing the clip-space position vector by its own w component.
v_position_ndc = v_position_clip / v_position_clip.w
This step is what makes a perspective projection actually work. The z distance of a vertex's position is worked into the clip-space w component, and through the homogeneous divide, vertices with a larger w get scaled proportionally to 1/w in the XY plane, which creates the perspective effect.
You mistook this operation for normalization, but it is not!
After the homogeneous divide, the vertex position has been mapped from clip space to NDC space. And OpenGL defines that the visible volume of NDC space is the box [-1, 1]^3; vertices outside this box are clipped.
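A tiny numeric sketch of that divide (plain Python; the clip-space values are made up for illustration):

```python
def homogeneous_divide(clip):
    # Clip space -> NDC: divide the position by its own w component.
    x, y, z, w = clip
    return (x / w, y / w, z / w)

# Two vertices with identical clip-space x and y, but different w:
near_vertex = (1.0, 1.0, 0.0, 1.0)
far_vertex  = (1.0, 1.0, 0.5, 4.0)
print(homogeneous_divide(near_vertex))  # (1.0, 1.0, 0.0)
print(homogeneous_divide(far_vertex))   # (0.25, 0.25, 0.125)
```

The farther vertex (larger w) lands closer to the screen center after the divide: that 1/w scaling is the perspective effect described above.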
It's crucial to understand that the View transform and the Projection are different things. For a position this is not so obvious, but another vertex attribute called the normal, which is an important ingredient in lighting calculations, must be transformed in a slightly different way (instead of Projection · View · Model it must be transformed by inverse(transpose(View · Model)), i.e. the Projection takes no part in it, but the viewpoint does).
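A small numeric sketch of why normals need inverse(transpose(View · Model)) rather than the matrix itself (plain Python, 3x3 only; the matrices are hand-picked for illustration):

```python
def mat3_vec(m, v):
    # 3x3 matrix times column vector.
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

scale = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]          # non-uniform scale M
inv_transpose = [[0.5, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # inverse(transpose(M))

# Surface direction (1, -2, 0) with normal (2, 1, 0): dot product is 0.
tangent = mat3_vec(scale, [1.0, -2.0, 0.0])         # [2, -2, 0]
wrong   = mat3_vec(scale, [2.0, 1.0, 0.0])          # transformed by M itself
right   = mat3_vec(inv_transpose, [2.0, 1.0, 0.0])  # transformed correctly

print(dot(tangent, wrong))  # 6.0 -> no longer perpendicular to the surface
print(dot(tangent, right))  # 0.0 -> still perpendicular
```

Under a uniform scale or pure rotation both variants agree; it is the non-uniform case that makes the inverse-transpose necessary for lighting to stay correct.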
The matrices themselves are 4×4 grids of real-valued scalars (ignore for the time being that numbers in a computer are always rational). Since the matrix is 4×4, it must be multiplied with vectors of dimension 4 (hence the type vec4).
OpenGL treats vertex attributes as column vectors, so matrices are applied from the left, i.e. a vector enters an expression on the right side and comes out on the left. The order of matrix multiplication matters; you cannot freely reorder things!
The statement
gl_Position = Projection * View * Model * vertex_position; // note the order
makes the vertex shader perform this very transformation process I just described.
"Note that there is no separate camera (view) matrix in OpenGL. Therefore, in order to simulate transforming the camera or view, the scene (3D objects and lights) must be transformed with the inverse of the view transformation. In other words, OpenGL defines that the camera is always located at (0, 0, 0) and facing to -Z axis in the eye space coordinates, and cannot be transformed. See more details of GL_MODELVIEW matrix in ModelView Matrix."
Source: http://www.songho.ca/opengl/gl_transform.html
That is what I was getting hung up on. I thought there would be a separate camera/view matrix in OpenGL.

How to render a plane of seemingly infinite size?

How can I render a textured plane at some z position so that it appears to extend to infinity?
I could achieve this by drawing a really huge plane, but if I move my camera off the ground to a higher altitude, I would start to see the plane's edges, which I want to avoid.
If this is even possible, I would prefer a non-shader method.
Edit: I tried the 4D coordinate system as suggested, but it works horribly badly: my textures get distorted even at camera position 100, so I would have to draw multiple textured quads anyway. Perhaps I could do that, and draw only the farthest quads with the 4D coordinate system? Any better ideas?
Edit 2: for those who don't have a clue what OpenGL texture distortion is, here's an example from the tests I did with 4D vertex coordinates:
(in case image not visible: http://img828.imageshack.us/img828/469/texturedistort.jpg )
Note that it only happens when the camera gets far enough away; in this case it's only 100.0 units away from the middle! (middle = (0,0), where my 4 triangles start to go towards infinity). Usually this happens at around 100000.0 or something, but with 4D vertices it seems to happen earlier for some reason.
You cannot render an object of infinite size.
You are more than likely confusing the concept of projection with rendering objects of infinite size. A 4D homogeneous coordinate whose W is 0 represents a 3D position that is at infinity relative to the projection. But that doesn't mean a point infinitely far from the camera; it means a point infinitely close to the camera. That is, it represents a point whose Z coordinate (before multiplication with the perspective projection matrix) was equal to the camera's position (in camera space, this is 0).
See, under perspective projection, a point that is in the same plane as the camera is infinitely far away on the X and Y axes. That is the nature of the perspective projection. 4D homogeneous coordinates allow you to give them all finite numbers, and therefore you can do useful mathematics with them (like clipping).
4D homogeneous coordinates do not allow you to represent an infinitely large surface.
Drawing an infinitely large plane is easy; all you need is to compute the horizon line in screen coordinates. To do so, simply take two non-collinear 4D directions (say, [1, 0, 0, 0] and [0, 0, 1, 0]), then compute their positions on the screen (by multiplying manually with the view matrix and the projection matrix, and then clipping into viewport coordinates). When you have these two points, you can compute a 2D line across the screen and clip the screen against it. There you have your infinite plane (the lower polygon). However, it is difficult to display a texture on this plane, because it would be infinitely large. But if your texture is simple (e.g. a grid), then you can compute it yourself with 4D coordinates, using the same scheme as above: computing points and their corresponding vanishing points and connecting them.
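A sketch of computing such a vanishing point (plain Python; the projection matrix is a standard 90-degree-FOV GL perspective matrix with assumed near/far values, and the direction is arbitrary):

```python
# Standard GL perspective matrix, 90-degree FOV, aspect 1, n = 1, f = 100.
n, f = 1.0, 100.0
proj = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -(f + n) / (f - n), -2.0 * f * n / (f - n)],
    [0.0, 0.0, -1.0, 0.0],
]

def mat_vec(m, v):
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

# A direction (w = 0) pointing 45 degrees right of straight ahead (-z).
direction = [1.0, 0.0, -1.0, 0.0]
clip = mat_vec(proj, direction)
ndc = [clip[0] / clip[3], clip[1] / clip[3]]
print(ndc)  # [1.0, 0.0]: the vanishing point sits at the right screen edge
```

A direction is transformed and divided exactly like a regular point; because w starts at 0, the result depends only on the direction, which is why every line with that direction converges to the same vanishing point on screen.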

Efficiency of perspective projection vs raytracing/ray casting

I have a very general question. I wish to determine the boundary points of a number of objects (each comprising 30-50 closed polygons, each polygon having around 300 points (x, y, z)). I am working with a fixed viewport which is rotated about the x, y and z axes (alpha, beta, gamma) with respect to the origin of the polygons' coordinate system.
As I see it there are two possibilities: perspective projection or raytracing. Perspective projection would seem to require a large number of matrix operations for each point to determine whether its position is within or outside the viewport.
Or, given the large number of points, would I be better off raytracing from the viewport pixels to the objects?
i.e. determining whether there is an intersection, and then whether the intersection occurs inside or outside the object(s).
In either case I will write the result as 0 (outside) or 1 (inside) into a 200x200 integer matrix representing the viewport.
Thank you in anticipation
Perspective projection (and then scan-converting the polygons in image coordinates) is going to be a lot faster.
The matrix transform that is required in the case of perspective projection (essentially the world-to-camera matrix) is required in exactly the same way when raytracing. However, with perspective projection you only transform the corner points, whereas with raytracing you effectively work through all the points (pixels) in the image.
You should be able to use perspective projection and a perspective projection matrix to compute the position of the vertices in screen space. It's hard to understand what you really want to do. If you want to create an image of that 3D scene, then with only a few polygons it would be hard to see any difference between ray tracing and rasterisation if your code is optimised (you would still need an acceleration structure for the ray tracing approach). However, yes, rasterisation is likely to be faster anyway.
Now, if you need to compute the distance between the eye (the camera's origin) and the geometry visible through the camera's view, then I don't see why you can't use the depth value of any sample for any pixel in the image and use the inverse of the perspective projection matrix to find its distance in camera space.
Why is speed an issue in your problem? If it isn't, ray tracing would indeed work.
Most of this information can be found on www.scratchapixel.com