I want to get the normalized device coordinates (NDC) of a point in view/world space by projecting it with a perspective projection matrix, and then use them to compute the screen position. A common issue with this approach is that points behind the view appear to be in front of it after the homogeneous division: w is negative behind the view, so dividing xyz by it inverts the actual position (even if I invert the position again, it ends up a bit off from the actual view position).
So I was wondering whether there is another way to get the NDC coordinates even if the point is behind the view. Every source I find says to clip the homogeneous position (-w <= xyz <= w) before dividing it to get the NDC, but I want the actual non-clipped screen position for a line shader.
Is there some way to get the actual NDC position of a point that is behind the view with a projection matrix? Even if only the xy components are correct, which is what I need.
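The sign flip described in this question can be reproduced with a small sketch. It assumes an OpenGL-style perspective matrix and column vectors; the helper names `perspective` and `mat_vec` are made up for illustration and are not part of any library. It only demonstrates the problem, it does not solve it.

```python
import math

def perspective(fovy_deg, aspect, near, far):
    """OpenGL-style perspective matrix as a list of rows (column-vector convention)."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],  # this row makes w_clip = -z_eye
    ]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

proj = perspective(90.0, 1.0, 0.1, 100.0)

# Same x and y in eye space, one point in front of the eye (z < 0), one behind (z > 0).
clip_front = mat_vec(proj, [1.0, 1.0, -2.0, 1.0])
clip_behind = mat_vec(proj, [1.0, 1.0, 2.0, 1.0])

# Perspective divide on x and y only.
ndc_front = [clip_front[0] / clip_front[3], clip_front[1] / clip_front[3]]
ndc_behind = [clip_behind[0] / clip_behind[3], clip_behind[1] / clip_behind[3]]

print("w in front:", clip_front[3], "NDC xy:", ndc_front)
print("w behind:  ", clip_behind[3], "NDC xy:", ndc_behind)
```

The clip-space x and y are identical for both points; only the sign of w differs, which is why the divided result for the point behind the eye ends up mirrored through the center of the screen.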
I was wondering about the problem of getting a 3D position from a 2D one. In my application, I have everything set up (MVP matrix, screen position of the mouse).
We go from 3D to 2D by the following transform.
gl_Position = projection * view * model * vertex;
Then these clip coordinates are divided by w to get NDC, and then converted to screen coordinates. I want to do the reverse (mouse click to 3D). Screen to NDC is easy for x and y, but the z data is lost.
As far as I can tell, the ONLY method to recover the 3D data is ray casting with a tree-based spatial data structure and maybe CUDA/OpenCL. But I am not sure. Is there any other way? Is there a method to recover the 3D position given that we have the MVP matrix and the screen coordinates?
Read the depth value at the mouse pointer position (glReadPixels(…, GL_DEPTH_COMPONENT, …)) and use that to recover Z. Or do the raycasting into a scene structure. Depending on your application, either one may be the better solution.
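Once a depth value is available, the rest is a matter of inverting the pipeline. The following is a minimal sketch, similar in spirit to gluUnProject, written in plain Python to show the math. The names `unproject`, `invert` and `mat_vec` are hypothetical helpers; it assumes the default depth range [0, 1], window coordinates with the origin at the bottom left (mouse y from most window systems has to be flipped first), and an example matrix chosen only for the demonstration.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def invert(m):
    """Invert a 4x4 matrix with Gauss-Jordan elimination and partial pivoting."""
    n = 4
    a = [list(row) + [1.0 if r == c else 0.0 for c in range(n)]
         for r, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def unproject(win_x, win_y, depth, mvp, viewport):
    """Window coordinates plus a depth-buffer value -> position in the space mvp maps from."""
    vx, vy, vw, vh = viewport
    ndc = [
        2.0 * (win_x - vx) / vw - 1.0,
        2.0 * (win_y - vy) / vh - 1.0,
        2.0 * depth - 1.0,  # assumes glDepthRange(0, 1)
        1.0,
    ]
    x, y, z, w = mat_vec(invert(mvp), ndc)
    return [x / w, y / w, z / w]  # undo the homogeneous scale

# Example: a perspective matrix with near = 1, far = 3 (identity model and view).
mvp = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -2.0, -3.0],
    [0.0, 0.0, -1.0, 0.0],
]
recovered = unproject(500.0, 262.5, 0.75, mvp, (0, 0, 800, 600))
print(recovered)
```

In the example, the point (0.5, -0.25, -2) projects to window position (500, 262.5) with depth 0.75 in an 800x600 viewport, so the call recovers that point up to rounding.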
Given a model-view-projection matrix, how would I determine if an object is displayed on the screen? Determining if it is within the clipping bounds is easy, but how do I use the numbers of the MVP matrix to determine if the object is too far left/right/high/low, given the object position and the screen width and height in pixels? (For simplicity, we can say that we only care about the object's center of mass.)
Simply apply the MVP matrices to the center: centerInScreen = projMatrix * viewMatrix * modelMatrix * center
then divide by centerInScreen.w and see if the result is inside the -1,-1 to 1,1 box (which OpenGL maps to the viewport).
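As a rough sketch of that test, the function below transforms the center, rejects it if it lies behind the eye or outside the [-1, 1] box in x and y, and otherwise returns the pixel position. The function name `center_on_screen` and the example matrix are assumptions made for illustration; near/far clipping is deliberately left out because the question only asks about left/right/high/low.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def center_on_screen(mvp, center, width, height):
    """Return the pixel position of `center`, or None if it is off screen.

    `mvp` is projection * view * model; `width` and `height` are the
    viewport size in pixels (viewport origin assumed at 0, 0).
    """
    x, y, z, w = mat_vec(mvp, [center[0], center[1], center[2], 1.0])
    if w <= 0.0:
        return None  # on or behind the eye plane; the divide would mirror it
    ndc_x, ndc_y = x / w, y / w
    if not (-1.0 <= ndc_x <= 1.0 and -1.0 <= ndc_y <= 1.0):
        return None  # too far left/right/high/low
    # Map [-1, 1] to [0, width] and [0, height].
    return ((ndc_x + 1.0) * 0.5 * width, (ndc_y + 1.0) * 0.5 * height)

# Example perspective matrix with near = 1, far = 3 (identity model and view).
mvp = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -2.0, -3.0],
    [0.0, 0.0, -1.0, 0.0],
]
print(center_on_screen(mvp, (0.5, -0.25, -2.0), 800, 600))  # visible
print(center_on_screen(mvp, (5.0, 0.0, -2.0), 800, 600))    # too far right
print(center_on_screen(mvp, (0.0, 0.0, 2.0), 800, 600))     # behind the viewer
```

Checking the sign of w before dividing matters: a point behind the viewer can otherwise land inside the box after the division.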
I am reading a book about 3D concepts and OpenGL. The book always talks about world space, eye space, and so on.
What exactly is a world inside the computer monitor screen?
What is the world space?
What is eye space? Is it synonymous to projection?
World space
World space is the (arbitrarily chosen) frame of reference in which everything within the world is located in absolute coordinates.
Local space
Local space is space relative to another local frame of reference, in coordinates relative to the local frame.
For example, the mesh of a model will be constructed in relation to a coordinate system local to the model. When you move the model around in the world, the positions of the points making up the model don't change relative to each other, but they do change within the world.
Hence there exists a model-to-world transformation from local to world space.
Eye (or view) space
Eye (or view) space is the world as seen by the viewer, i.e. all the positions of things in the world are no longer in relation to the (arbitrary) world coordinate system, but in relation to the viewer.
View space is somewhat special, because it is not arbitrarily chosen. The coordinate (0, 0, 0) in view space is the position of the viewer, and a certain direction (usually parallel to Z) is the direction of viewing.
So there exists a transformation world-to-view. Now because the viewer is always at the origin of the view space, setting the viewpoint is done by defining the world-to-view transformation.
Since for the purposes of rendering the graphics world space is of little use, you normally coalesce model-to-world and world-to-view transformations into a single model-to-view transformation.
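To make the coalescing concrete, here is a small sketch with two translation matrices. The helper names and the chosen positions are made up for the example; it assumes column vectors, so the combined matrix is view * model.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def translation(tx, ty, tz):
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

# Model-to-world: the model's local origin sits at x = 10 in the world.
model = translation(10.0, 0.0, 0.0)
# World-to-view: the viewer stands at world (0, 0, 5), so the world is
# shifted by -5 along Z to put the viewer at the origin.
view = translation(0.0, 0.0, -5.0)
# Coalesced model-to-view transform.
model_view = mat_mul(view, model)

local_point = [1.0, 2.0, 0.0, 1.0]
world_point = mat_vec(model, local_point)     # position in world space
eye_point = mat_vec(model_view, local_point)  # position in view space
print(world_point, eye_point)
```

Applying `model_view` once gives the same result as applying `model` and then `view`, which is why the intermediate world-space position is usually never computed during rendering.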
Note that eye (or view) space is not the projection. Projection happens by a separate projection transform that transforms view-to-clip space.
You should read this: http://www.opengl-tutorial.org/beginners-tutorials/tutorial-3-matrices/
That tutorial uses the term "camera space" instead of "eye space", but they are the same.
As I am learning OpenGL I often stumble upon so-called eye space coordinates.
If I am right, you typically have three matrices: model matrix, view matrix and projection matrix. Though I am not entirely sure how the mathematics behind that works, I do know that they convert coordinates to world space, view space and screen space.
But where is the eye space, and which matrices do I need to convert something to eye space?
Perhaps the following illustration showing the relationship between the various spaces will help:
Depending if you're using the fixed-function pipeline (you are if you call glMatrixMode(), for example), or using shaders, the operations are identical - it's just a matter of whether you code them directly in a shader, or the OpenGL pipeline aids in your work.
While there's distaste in discussing things in terms of the fixed-function pipeline, it makes the conversation simpler, so I'll start there.
In legacy OpenGL (i.e., versions before OpenGL 3.1, or using compatibility profiles), two matrix stacks are defined: model-view and projection. When an application starts, the matrix at the top of each stack is an identity matrix (1.0 on the diagonal, 0.0 for all other elements). If you draw coordinates in that space, you're effectively rendering in normalized device coordinates (NDCs), which clips out any vertices outside of the range [-1,1] in X, Y, and Z. The viewport transform (as set by calling glViewport()) is what maps NDCs into window coordinates (well, viewport coordinates, really, but most often the viewport and the window are the same size and location), and the depth value to the depth range (which is [0,1] by default).
Now, in most applications, the first transformation that's specified is the projection transform, which comes in two varieties: orthographic and perspective projections. An orthographic projection preserves parallel lines, and is usually used in scientific and engineering applications, since it doesn't distort the relative lengths of line segments. In legacy OpenGL, orthographic projections are specified by either glOrtho or gluOrtho2D. More commonly used are perspective transforms, which mimic how the eye works (i.e., objects far from the eye are smaller than those close), and are specified by either glFrustum or gluPerspective. Perspective projections define a viewing frustum, which is a truncated pyramid anchored at the eye's location and specified in eye coordinates. In eye coordinates, the "eye" is located at the origin, looking down the -Z axis. Your near and far clipping planes are specified as distances along the -Z axis. If you render in eye coordinates, any geometry specified between the near and far clipping planes and inside of the viewing frustum will not be culled, and will be transformed to appear in the viewport. Here's a diagram of a perspective projection, and its relationship to the image plane.
The eye is located at the apex of the viewing frustum.
The last transformation to discuss is the model-view transform, which is responsible for moving coordinate systems (and not objects; more on that in a moment) such that they are well positioned relative to the eye and the viewing frustum. Common modeling transforms are translations, scales, rotations, and shears (for which there's no native support in OpenGL).
Generally speaking, 3D models are modeled around a local coordinate system (e.g., specifying a sphere's coordinates with the origin at the center). Modeling transforms are used to move the "current" coordinate system to a new location so that when you render your locally-modeled object, it's positioned in the right place.
There's no mathematical difference between a modeling transform and a viewing transform. It's just that modeling transforms are usually applied to specific models and are controlled by glPushMatrix() and glPopMatrix() operations, while a viewing transformation is usually specified first and affects all of the subsequent modeling operations.
Now, if you're doing this in modern OpenGL (core profile versions 3.1 and forward), you have to do all these operations logically yourself (you might only specify one transform, folding both the model-view and projection transformations into a single matrix multiply). Matrices are usually specified as shader uniforms. There are no matrix stacks and no separation of model-view and projection transformations, and you need to get your math correct to emulate the pipeline. (BTW, the perspective division and viewport transform steps are performed by OpenGL after the completion of your vertex shader - you don't need to do the math [you can, and it doesn't hurt anything unless you fail to set w to 1.0 in your gl_Position vertex shader output].)
Eye space, view space, and camera space are all synonyms for the same thing: the world relative to the camera.
In a rendering, each mesh of the scene usually is transformed by the model matrix, the view matrix and the projection matrix. Finally the projected scene is mapped to the viewport.
The projection, view and model matrix interact together to present the objects (meshes) of a scene on the viewport.
The model matrix defines the position, orientation and scale of a single object (mesh) in the world space of the scene.
The view matrix defines the position and viewing direction of the observer (viewer) within the scene.
The projection matrix defines the area (volume) with respect to the observer (viewer) which is projected onto the viewport.
Coordinate Systems:
Model coordinates (Object coordinates)
The model space is the coordinate system which is used to define or model a mesh. The vertex coordinates are defined in model space.
World coordinates
The world space is the coordinate system of the scene. Different models (objects) can be placed multiple times in the world space to form a scene together.
The model matrix defines the location, orientation and the relative size of a model (object, mesh) in the scene. The model matrix transforms the vertex positions of a single mesh to world space for a single specific positioning. There are different model matrices, one for each combination of a model (object) and a location of the object in the world space.
View space (Eye coordinates)
The view space is the local system which is defined by the point of view onto the scene.
The position of the view, the line of sight and the upwards direction of the view, define a coordinate system relative to the world coordinate system. The objects of a scene have to be drawn in relation to the view coordinate system, to be "seen" from the viewing position. The inverse matrix of the view coordinate system is named the view matrix. This matrix transforms from world coordinates to view coordinates.
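The relationship between the view coordinate system and the view matrix can be sketched as follows. This follows the construction commonly used by gluLookAt-style helpers, under the assumption of a right-handed view space looking down -Z; the function name `look_at` and the example positions are chosen for illustration only.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mat_vec(m, v):
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def look_at(eye, target, up):
    """World-to-view matrix: the inverse of the camera's frame in world space."""
    forward = normalize([t - e for t, e in zip(target, eye)])  # line of sight
    right = normalize(cross(forward, up))                      # view X axis
    true_up = cross(right, forward)                            # view Y axis
    back = [-c for c in forward]                               # view Z axis
    # The axes are orthonormal, so the inverse rotation is the transpose
    # (axes as rows) and the inverse translation is -axis . eye.
    return [
        right + [-dot(right, eye)],
        true_up + [-dot(true_up, eye)],
        back + [-dot(back, eye)],
        [0.0, 0.0, 0.0, 1.0],
    ]

eye = [0.0, 0.0, 5.0]
view = look_at(eye, [0.0, 0.0, 0.0], [0.0, 1.0, 0.0])
eye_in_view = mat_vec(view, eye + [1.0])                    # viewer ends up at the origin
target_in_view = mat_vec(view, [0.0, 0.0, 0.0, 1.0])        # target lies on the -Z axis
print(eye_in_view, target_in_view)
```

The viewer's own position maps to the origin and the looked-at point lands on the negative Z axis, which matches the description of view space above.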
In general, world coordinates and view coordinates are Cartesian coordinates.
The view coordinates system describes the direction and position from which the scene is looked at. The view matrix transforms from the world space to the view (eye) space.
If the coordinate system of the view space is a right-handed system, where the X-axis points to the right and the Y-axis points up, then the Z-axis points out of the view (note that in a right-handed system the Z-axis is the cross product of the X-axis and the Y-axis).
Clip space coordinates are homogeneous coordinates. In clip space the clipping of the scene is performed.
A point is inside the clip volume if its x, y and z components are in the range defined by the negated w component and the w component of the point's homogeneous coordinates:
-w <= x, y, z <= w.
The projection matrix describes the mapping from 3D points of a scene to 2D points of the viewport. The projection matrix transforms from view space to clip space. The coordinates in clip space are transformed to the normalized device coordinates (NDC) in the range (-1, -1, -1) to (1, 1, 1) by dividing by the w component of the clip coordinates.
With orthographic projection, this area (volume) is defined by 6 distances (left, right, bottom, top, near and far) to the viewer's position.
If the left, bottom and near distances are negative and the right, top and far distances are positive (as in normalized device space), this can be imagined as a box around the viewer.
All the objects (meshes) which are in the space (volume) are "visible" on the viewport. All the objects (meshes) which are out (or partly out) of this space are clipped at the borders of the volume.
This means that with orthographic projection, objects "behind" the viewer may be "visible". This may seem unnatural, but this is how orthographic projection works.
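A short sketch of this effect, using the matrix layout documented for glOrtho. The function name `ortho`, the box dimensions and the test point are assumptions picked for the demonstration.

```python
def ortho(left, right, bottom, top, near, far):
    """Orthographic projection matrix in the glOrtho layout (list of rows)."""
    return [
        [2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
        [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
        [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
        [0.0, 0.0, 0.0, 1.0],
    ]

def mat_vec(m, v):
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

# A box around the viewer: negative near distance, positive far distance.
proj = ortho(-10.0, 10.0, -10.0, 10.0, -10.0, 10.0)

# The viewer looks down -Z, so z = +5 in view space is behind the viewer.
clip = mat_vec(proj, [2.0, 3.0, 5.0, 1.0])
x, y, z, w = clip
inside = all(-w <= c <= w for c in (x, y, z))
print(clip, inside)
```

Because w stays 1 for an orthographic projection, there is no sign flip; the point behind the viewer simply passes the -w <= x, y, z <= w test like any other point inside the box.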
With perspective projection, the viewing volume is a frustum (a truncated pyramid), where the top of the pyramid is the viewing position.
The direction of view (line of sight) and the near and far distances define the planes which truncate the pyramid to a frustum (the direction of view is the normal vector of these planes).
The left, right, bottom and top distances define the distances from the intersection of the line of sight with the near plane to the side faces of the frustum (on the near plane).
As a result, the scene looks as if it were seen through a pinhole camera.
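The frustum description above can be turned into a matrix. The sketch below uses the layout documented for glFrustum; the function name `frustum` and the sample distances are chosen for illustration. It checks that a corner of the near plane and the center of the far plane land on the boundary of the normalized device cube.

```python
def frustum(left, right, bottom, top, near, far):
    """Perspective projection matrix in the glFrustum layout (list of rows)."""
    return [
        [2.0 * near / (right - left), 0.0, (right + left) / (right - left), 0.0],
        [0.0, 2.0 * near / (top - bottom), (top + bottom) / (top - bottom), 0.0],
        [0.0, 0.0, -(far + near) / (far - near), -2.0 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def mat_vec(m, v):
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def to_ndc(clip):
    """Perspective divide: clip coordinates -> normalized device coordinates."""
    return [clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3]]

proj = frustum(-1.0, 1.0, -1.0, 1.0, 1.0, 3.0)

# Left/bottom corner of the near plane (the viewer looks down -Z).
near_corner = to_ndc(mat_vec(proj, [-1.0, -1.0, -1.0, 1.0]))
# Point on the line of sight at the far distance.
far_center = to_ndc(mat_vec(proj, [0.0, 0.0, -3.0, 1.0]))
print(near_corner, far_center)
```

Geometry outside these bounds ends up outside the [-1, 1] range after the divide, which is the usual reason a mesh is missing from the viewport.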
One of the most common mistakes, when an object is not visible on the viewport (screen is all "black"), is that the mesh is not within the view volume which is defined by the projection and view matrix.
Normalized device coordinates
The normalized device space is a cube with a left, bottom, near corner of (-1, -1, -1) and a right, top, far corner of (1, 1, 1).
The normalized device coordinates are the clip space coordinates divided by the w component of the clip coordinates. This is called the perspective divide.
Window coordinates (Screen coordinates)
The window coordinates are the coordinates of the viewport rectangle. The window coordinates are decisive for the rasterization process.
The normalized device coordinates are linearly mapped to the viewport rectangle (Window Coordinates / Screen Coordinates) and to the depth for the depth buffer.
The viewport rectangle is defined by glViewport. The depth range is set by glDepthRange and is by default [0, 1].
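The linear mapping from normalized device coordinates to window coordinates can be written out as a small sketch. The function name `ndc_to_window` is made up; the `viewport` tuple mirrors the (x, y, width, height) arguments of glViewport and `depth_range` mirrors the (near, far) arguments of glDepthRange.

```python
def ndc_to_window(ndc, viewport, depth_range=(0.0, 1.0)):
    """Map NDC in [-1, 1]^3 to window x, y and a depth-buffer value."""
    vx, vy, vw, vh = viewport
    d_near, d_far = depth_range
    win_x = vx + (ndc[0] + 1.0) * 0.5 * vw
    win_y = vy + (ndc[1] + 1.0) * 0.5 * vh
    depth = d_near + (ndc[2] + 1.0) * 0.5 * (d_far - d_near)
    return (win_x, win_y, depth)

viewport = (0, 0, 800, 600)
print(ndc_to_window((0.0, 0.0, 0.0), viewport))     # center of the viewport
print(ndc_to_window((-1.0, -1.0, -1.0), viewport))  # lower-left corner, nearest depth
print(ndc_to_window((1.0, 1.0, 1.0), viewport))     # upper-right corner, farthest depth
```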