Is it possible to get NDC of a point behind the view with perspective projection? - opengl

I want to get normalized device coordinates (NDC) of a point that is in view/world space by projecting it using a perspective projection matrix, which I use it to get screen position afterwards. A common issue with this approach is that points behind the view appears to be on front of the view after the homogeneous division, because w is negative behind the view, after dividing xyz it will invert the actual position (even if I invert again the position, it appears to be a bit off the actual view position).
So I was thinking if there is another way to get the NDC coordinates even if it's behind the view, every source I find always say to clip the homogeneous position (-w <= xyz <= w) before dividing it to get the NDC, but I wanted the actual non-clipped screen position for a line shader.
Is there someway to get the actual NDC position of a point that is behind the view with a projection matrix? Even if just xy components are correct, which is what I need.

Related

Direct3D find 2D screen coordinate works but mirrored glitch when stressed

I have this function to get a 2D pixel location from 3D coordinate position. The x y z are pre-transform coordinates (1 to -1). This is a model view architecture with camera permanently at -3.5,0,0 looking at 0,0,0 while the object/scenes
coordinates are transformed by a horizontal xz rotation and vertical y rotation, etc to produce the final frame.
This function is mostly used to overlay 2D text on top of the 3D scene. Where the 2D text is positioned relative to the 3D underlying scene.
void My3D::Get2Dfrom3Dx(float x, float y, float z, float* psx, float* psy) {
XMVECTOR xmScreenCoord = XMLoadFloat3( (XMFLOAT3*) &screenCoord);
XMMATRIX xmWorldViewProjection = XMLoadFloat4x4( (XMFLOAT4X4*) &m_WorldViewProjection);
XMVECTOR result = XMVector3TransformCoord( xmScreenCoord, xmWorldViewProjection);
XMStoreFloat3( (XMFLOAT3*) &screenCoord, result);
screenCoord.x = ((screenCoord.x + 1.0f) / 2.0f) * m_nCurrWidth;
screenCoord.y = ((-screenCoord.y + 1.0f) / 2.0f) * m_nCurrHeight;
*psx = screenCoord.x;
*psy = screenCoord.y; }
This function works perfectly when the scene is fully/mostly visible, (the eyeat between -4 and -1.5.)
I have a nagging problem with text showing up mirrored in 3D position where it should not be.
This happens when for example I'm viewing the image from below (60+ degrees upward below object), and zooming(moving the eyeat location closer to say -.5,0,0.) The text should not be visible as it should be behind the eye (note eyeat is not past 0,0,0 which really messes the image up),
but somehow the above function causes the calculated screen x y coordinates to show within the viewport in situations where they should not.
I seem to think there is a simple solution to this side effect but can't find it. Hopefully someone has seen this 2d mirrored problem/effect before and knows the simple tweak.
I realize I could go down a more complex path of determining if the view vector is opposite the target point and filter this way, but I seem to think there should be a simpler solution.
Again, the camera is permanently on the line -3.5, 0, 0 to say -.5,0,0 as the world is transformed around it.
The problem lies in the way the projection works. Basically, the perspective projection will divide the x and y coordinates by the z coordinate. That's how you get the effect of perspective, i.e., that things that are farther away (larger z coordinate) appear smaller on screen. One issue with this perspective division is (simplified) that it doesn't work correctly for stuff that's behind the camera. Stuff behind the camera will have a negative z coordinate. When you divide x and y by a negative value, you'll have your point reflected around the origin. Which is exactly what you see. Since stuff that's located behind the camera is not going to be visible anyways, one way to solve this problem is to simply clip all geometry before dividing by z such that everything that has a negative z value is cut off and removed.
I assume the division in your code here happens inside XMVector3TransformCoord(). As you note yourself, the text should not be visible in the problematic cases anyways. So I suggest you simply check whether the text is behind the camera and don't render it if it is. One way to do so would be to simply check the result of transforming your world-space position with the xmWorldViewProjection matrix and only continue if it happens to be in front of the camera. xmScreenCoord holds the homogeneous clipspace coordinates of your point. The point will be in front of the camera iff the z coordinate of xmScreenCoord is larger than zero. So I guess you'd want to do something like
if (XMVectorGetZ(xmScreenCoord) > 0)
{
…
}
Sidenote due to the discussion in the comments below: When one wants to solve a problem involving the projections of objects on screen, one can often avoid explicitly computing the projection by instead transforming the problem into its dual and working directly in projective space on homogeneous coordinates. Since your problem is about placing text in 2D on screen, however, I don't think this is an option here. You could place the geometry for drawing your text in clip-space directly. You would start again by computing the clip-space coordinates of the 3D point to which you want your 2D text attached (by multiplying them with m_WorldViewProjection but not dividing by w). You can then generate homogeneous coordinates for the geometry for drawing your text by simply offsetting the x- and y- coordinates from that point to get the corners of a quad or whatever you need to construct. If you then also scale the size of the quad by the w coordinate of the point, you will get a quad at that position that projects to always the same size on the screen (since the premultiplication with w effectively cancels out the projection). However, all you're effectively doing then is leaving the application of the projection and necessarily clipping to the GPU. If you want to render a large number of quads, that might be an option to consider as it could be done completely on the GPU, e.g., using a geometry shader. However, if you just have a few text elements, it would be much simpler and probably also more efficient to just skip the drawing of text elements that would be behind the camera as described above…
Michael's response was very helpful in making me continue down a path that the solution should be a simple comparison. In my case, I had to re-evaluate the screen coordinates by only applying the World transform, versus the full WorldViewProjection. I call this TargetTransformed. My comparison value was then simply the Eye/Camera location (this never gets adjusted (except zoom) as the world is transformed around the Eye.) And again note my Camera in this case is at -3.5,0,0 looking at 0,0,0 (center of model, really 8,0,0 thus a line through the center). So I had to compare the x component, not the z component. I add a bit of fudge .1F as the mirror artifact happens when the target is significantly behind the camera. In which case I return the final screenCoord locations translated (-8000) way out in outer space as to guarantee they are not seen in the viewport.
if ((Eye.x + 0.1F) > TargetTransformed.x)
{
screenCoord.x += -8000;
screenCoord.y += -8000;
//TRACE("point is behind camera.\n");
*psx = screenCoord.x;
*psy = screenCoord.y;
}
else
{
*psx = screenCoord.x;
*psy = screenCoord.y;
}
And for completeness, my project has 2 view models: a) looking along a line through the center of the model. Which this can be translated to look from any direction and offset by screen x and y. The first view model works fine with this above code. The second view model b) targets the camera to look at a focal point of the model and then allows full rotation around that random point (not the center of the model) which requires calculating a tricky additional translation matrix and vector I call TargetViewTranslation. And for this additional translation, the formula adds the z component of the additional transform.
if ((Eye.x + 0.1F - m_structTargetViewTranslation.Z) > TargetTransformed.x)
{
screenCoord.x += -8000;
screenCoord.y += -8000;
//TRACE("point is behind camera.\n");
*psx = screenCoord.x;
*psy = screenCoord.y;
}
else
{
*psx = screenCoord.x;
*psy = screenCoord.y;
}
And success, my mirrored text problem is resolved. Hopefully this helps others with this mirrored text problem. Realizing that one may need to only transform the test case by the World transform, and it should be a simple comparison, and the location of the camera may impact which x or z component is used. And if you are translating the world in any additional ways, then this translation could also impact if x or z is compared. Using TRACE and looking at the x y z values was helpful in figuring out what components I needed to use in my specific case.

Clipping in clipping coordinate system and normalized device coordinate system (OpenGL)

I heard clipping should be done in clipping coordinate system.
The book suggests a situation that a line is laid from behind camera to in viewing volume. (We define this line as PQ, P is behind camera point)
I cannot understand why it can be a problem.
(The book says after finishing normalizing transformation, the P will be laid in front of camera.)
I think before making clipping coordinate system, the camera is on original point (0, 0, 0, 1) because we did viewing transformation.
However, in NDCS, I cannot think about camera's location.
And I have second question.
In vertex shader, we do model-view transformation and then projection transformation. Finally, we output these vertices to rasterizer.
(some vertices's w is not equal to '1')
Here, I have curiosity. The rendering pipeline will automatically do division procedure (by w)? after finishing clipping.
Sometimes not all the model can be seen on screen, mostly because some objects of it lie behind the camera (or "point of view"). Those objects are clipped out. If just a part of the object can't be seen, then just that part must be clipped leaving the rest as seen.
OpenGL clips
OpenGL does this clipping in Clipping Coordinate Space (CCS). This is a cube of size 2w x 2w x 2w where 'w' is the fourth coordinate resulting of (4x4) x (4x1) matrix and point multiplication. A mere comparison of coordinates is enough to tell if the point is clipped or not. If the point passes the test then its coordinates are divided by 'w' (so called "perspective division"). Notice that for ortogonal projections 'w' is always 1, and with perspective it's generally not 1.
CPU clips
If the model is too big perhaps you want to save GPU resources or improve the frame rate. So you decide to skip those objects that are going to get clipped anyhow. Then you do the maths on your own (on CPU) and only send to the GPU the vertices that passed the test. Be aware that some objects may have some vertices clipped while other vertices of this same object may not.
Perhaps you do send them to GPU and let it handle these special cases.
You have a volume defined where only objects inside are seen. This volume is defined by six planes. Let's put ourselves in the camera and look at this volume: If your projection is perspective the six planes build a "fustrum", a sort of truncated pyramid. If your projection is orthogonal, the planes form a parallelepiped.
In order to clip or not to clip a vertex you must use the distance form the vertex to each of these six planes. You need a signed distance, this means that the sign tells you what side of the plane is seen form the vertex. If any of the six distance signs is not the right one, the vertex is discarded, clipped.
If a plane is defined by equation Ax+By+Cz+D=0 then the signed distance from p1,p2,p3 is (Ap1+Bp2+Cp3+D)/sqrt(AA+BB+C*C). You only need the sign, so don't bother to calculate the denominator.
Now you have all tools. If you know your planes on "camera view" you can calculate the six distances and clip or not the vertex. But perhaps this is an expensive operation considering that you must transform the vertex coodinates from model to camera (view) spaces, a ViewModel matrix calculation. With the same cost you use your precalculated ProjectionViewModel matrix instead and obtain CCS coordinates, which are much easier to compare to '2w', the size of the CCS cube.
Sometimes you want to skip some vertices not due to they are clipped, but because their depth breaks a criteria you are using. In this case CCS is not the right space to work with, because Z-coordinate is transformed into [-w, w] range, depth is somehow "lost". Instead, you do your clip test in "view space".

Calculating the perspective projection matrix according to the view plane

I'm working with openGL but this is basically a math question.
I'm trying to calculate the projection matrix, I have a point on the view plane R(x,y,z) and the Normal vector of that plane N(n1,n2,n3).
I also know that the eye is at (0,0,0) which I guess in technical terms its the Perspective Reference Point.
How can I arrive the perspective projection from this data? I know how to do it the regular way where you get the FOV, aspect ration and near and far planes.
I think you created a bit of confusion by putting this question under the "opengl" tag. The problem is that in computer graphics, the term projection is not understood in a strictly mathematical sense.
In maths, a projection is defined (and the following is not the exact mathematical definiton, but just my own paraphrasing) as something which doesn't further change the results when applied twice. Think about it. When you project a point in 3d space to a 2d plane (which is still in that 3d space), each point's projection will end up on that plane. But points which already are on this plane aren't moving at all any more, so you can apply this as many times as you want without changing the outcome any further.
The classic "projection" matrices in computer graphics don't do this. They transfrom the space in a way that a general frustum is mapped to a cube (or cuboid). For that, you basically need all the parameters to describe the frustum, which typically is aspect ratio, field of view angle, and distances to near and far plane, as well as the projection direction and the center point (the latter two are typically implicitely defined by convention). For the general case, there are also the horizontal and vertical asymmetries components (think of it like "lens shift" with projectors). And all of that is what the typical projection matrix in computer graphics represents.
To construct such a matrix from the paramters you have given is not really possible, because you are lacking lots of parameters. Also - and I think this is kind of revealing - you have given a view plane. But the projection matrices discussed so far do not define a view plane - any plane parallel to the near or far plane and in front of the camera can be imagined as the viewing plane (behind the camere would also work, but the image would be mirrored), if you should need one. But in the strict sense, it would only be a "view plane" if all of the projected points would also end up on that plane - which the computer graphics perspective matrix explicitely does'nt do. It instead keeps their 3d distance information - which also means that the operation is invertible, while a classical mathematical projection typically isn't.
From all of that, I simply guess that what you are looking for is a perspective projection from 3D space onto a 2D plane, as opposed to a perspective transformation used for computer graphics. And all parameters you need for that are just the view point and a plane. Note that this is exactly what you have givent: The projection center shall be the origin and R and N define the plane.
Such a projection can also be expressed in terms of a 4x4 homogenous matrix. There is one thing that is not defined in your question: the orientation of the normal. I'm assuming standard maths convention again and assume that the view plane is defined as <N,x> + d = 0. From using R in that equation, we can get d = -N_x*R_x - N_y*R_y - N_z*R_z. So the projection matrix is just
( 1 0 0 0 )
( 0 1 0 0 )
( 0 0 1 0 )
(-N_x/d -N_y/d -N_z/d 0 )
There are a few properties of this matrix. There is a zero column, so it is not invertible. Also note that for every point (s*x, s*y, s*z, 1) you apply this to, the result (after division by resulting w, of course) is just the same no matter what s is - so every point on a line between the origin and (x,y,z) will result in the same projected point - which is what a perspective projection is supposed to do. And finally note that w=(N_x*x + N_y*y + N_z*z)/-d, so for every point fulfilling the above plane equation, w= -d/-d = 1 will result. In combination with the identity transform for the other dimensions, which just means that such a point is unchanged.
Projection matrix must be at (0,0,0) and viewing in Z+ or Z- direction
this is a must because many things in OpenGL depends on it like FOG,lighting ... So if your direction or position is different then you need to move this to camera matrix. Let assume your focal point is (0,0,0) as you stated and the normal vector is (0,0,+/-1)
Z near
is the distance between focal point and projection plane so znear is perpendicular distance of plane and (0,0,0). If assumption is correct then
znear=R.z
otherwise you need to compute that. I think you got everything you need for it
cast line from R with direction N
find closest point to focal point (0,0,0)
and then the z near is the distance of that point to R
Z far
is determined by the depth buffer bit width and z near
zfar=znear*(1<<(cDepthBits-1))
this is the maximal usable zfar (for mine purposes) if you need more precision then lower it a bit do not forget precision is higher near znear and much much worse near zfar. The zfar is usually set to the max view distance and znear computed from it or set to min focus range.
view angle
I use mostly 60 degree view. zang=60.0 [deg]
Common males in my region can see up to 90 degrees but that is peripherial view included the 60 degree view is more comfortable to view.
Females have a bit wider view ... but I did not heard any complains from them on 60 degree views ever so let assume its comfortable for them too...
Aspect
aspect ratio is determined by your OpenGL window dimensions xs,ys
aspect=(xs/ys)
This is how I set the projection matrix:
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
gluPerspective(zang/aspect,aspect,znear,zfar);
// gluPerspective has inacurate tangens so correct perspective matrix like this:
double perspective[16];
glGetDoublev(GL_PROJECTION_MATRIX,perspective);
perspective[ 0]= 1.0/tan(0.5*zang*deg);
perspective[ 5]=aspect/tan(0.5*zang*deg);
glLoadMatrixd(perspective);
deg = M_PI/180.0
perspective is projection matrix copy I use it for mouse position conversions etc ...
If you do not correct the matrix then you will be off when using advanced things like overlapping more frustrum to get high precision depth range. I use this to obtain <0.1m,1000AU> frustrum with 24bit depth buffer and the inaccuracy would cause the images will not fit perfectly ...
[Notes]
if the focal point is not really (0,0,0) or you are not viewing in Z axis (like you do not have camera matrix but instead use projection matrix for that) then on basic scenes/techniques you will see no problem. They starts with use of advanced graphics. If you use GLSL then you can handle this without problems but fixed OpenGL function can not handle this properly. This is also called PROJECTION_MATRIX abuse
[edit1] few links
If your view is standard frustrum then write the matrix your self gluPerspective otherwise look here Projections for some ideas how to construct it
[edit2]
From your comment I see it like this:
f is your viewing point (axises are the global world axises)
f' is viewing point if R would be the center of screen
so create projection matrix for f' position (as explained above), create transform matrix to transform f' to f. The transformed f must have Z axis the same as in f' the other axises can be obtained by cross product and use that as camera or multiply booth together and use as abused Projection matrix
How to construct the matrix is explained in the Understanding transform matrices link from my earlier comments

Perspective Projection - OpenGL

I am confused about the position of objects in opengl .The eye position is 0,0,0 , the projection plane is at z = -1 . At this point , will the objects be in between the eye position and and the plane (Z =(0 to -1)) ? or its behind the projection plane ? and also if there is any particular reason for being so?
First of all, there is no eye in modern OpenGL. There is also no camera. There is no projection plane. You define these concepts by yourself; the graphics library does not give them to you. It is your job to transform your object from your coordinate system into clip space in your vertex shader.
I think you are thinking about projection wrong. Projection doesn't move the objects in the same sense that a translation or rotation matrix might. If you take a look at the link above, you can see that in order to render a perspective projection, you calculate the x and y components of the projected coordinate with R = V(ez/pz), where ez is the depth of the projection plane, pz is the depth of the object, V is the coordinate vector, and R is the projection. Almost always you will use ez=1, which makes that equation into R = V/pz, allowing you to place pz in the w coordinate allowing OpenGL to do the "perspective divide" for you. Assuming you have your eye and plane in the correct places, projecting a coordinate is almost as simple as dividing by its z coordinate. Your objects can be anywhere in 3D space (even behind the eye), and you can project them onto your plane so long as you don't divide by zero or invalidate your z coordinate that you use for depth testing.
There is no "projection plane" at z=-1. I don't know where you got this from. The classic GL perspective matrix assumes an eye space where the camera is located at origin and looking into -z direction.
However, there is the near plane at z<0 and eveything in front of the near plane is going to be clipped. You cannot put the near plane at z=0, because then, you would end up with a division by zero when trying to project points on that plane. So there is one reasin that the viewing volume isn't a pyramid with they eye point at the top but a pyramid frustum.
This is btw. also true for real-world eyes or cameras. The projection center lies behind the lense, so no object can get infinitely close to the optical center in either case.
The other reason why you want a big near clipping distance is the precision of the depth buffer. The whole depth range between the front and the near plane has to be mapped to some depth value with a limited amount of bits, typically 24. So you want to keep the far plane as close as possible, and shift away the near plane as far as possible. The non-linear mapping of the screen-space z coordinate makes this even more important, as that the precision is non-uniformely distributed over that range.

How to use OpenGL orthographic projection with the depth buffer?

I've rendered a 3D scene using glFrustum() perspective mode. I then have a 2D object that I place over the 3D scene to act as a label for a particular 3D object. I have calculated the 2D position of the 3D object using using gluProject() at which position I then place my 2D label object. The 2D label object is rendered using glOrtho() orthographic mode. This works perfectly and the 2D label object hovers over the 3D object.
Now, what I want to do is to give the 2D object a z value so that it can be hidden behind other 3D objects in the scene using the depth buffer. I have given the 2D object a z value that I know should be hidden by the depth buffer, but when I render the object it is always visible.
So the question is, why is the 2D object still visible and not hidden?
I did read somewhere that orthographic and perspective projections store incompatible depth buffer values. Is this true, and if so how do I convert between them?
I don't want it to be transformed, instead I want it to appear as flat 2D label that always faces the camera and remains the same size on the screen at all times. However, if it is hidden behind something I want it to appear hidden.
First, you should have put that in your question; it explains far more about what you're trying to do than your question.
To achieve this, what you need to do is find the z-coordinate in the orthographic projection that matches the z-coordinate in pre-projective space of where you want the label to appear.
When you used gluProject, you got three coordinates back. The Z coordinate is important. What you need to do is reverse-transform the Z coordinate based on the zNear and zFar values you give to glOrtho.
Pedantic note: gluProject doesn't transform the Z coordinate to window space. To do so, it would have to take the glDepthRange parameters. What it really does is assume a depth range of near = 0.0 and far = 1.0.
So our first step is to transform from window space Z to normalized device coordinate (NDC) space Z. We use this simple equation:
ndcZ = (2 * winZ) - 1
Simple enough. Now, we need to go to clip space. Which is a no-op, because with an orthographic projection, the W coordinate is assumed to be 1.0. And the division by W is the difference between clip space and NDC space.
clipZ = ndcZ
But we don't need clip space Z. We need pre-orthographic projection space Z (aka: camera space Z). And that requires the zNear and zFar parameters you gave to glOrtho. To get to camera space, we do this:
cameraZ = ((clipZ + (zFar + zNear)/(zFar - zNear)) * (zFar - zNear))/-2
And you're done. Use that Z position in your rendering. Oh, and make sure your modelview matrix doesn't include any transforms in the Z direction (unless you're using the modelview matrix to apply this Z position to the label, which is fine).
Based on Nicol's answer you can simply set zNear to 0 (which generally makes sense for 2D elements that are acting as part of the GUI) and then you simply have:
cameraZ = -winZ*zFar