Using GLM's UnProject - c++

I'm not sure how to use the Unproject method provided by GLM.
Specifically, in what format is the viewport passed in? And why doesn't the function require a view matrix as well as a projection and world matrix?

A bit of history is required here. GLM's unProject is more or less a direct replacement for gluUnProject, which relied on the now-deprecated OpenGL fixed-function pipeline. In that mode the model and view matrices were combined into a single "ModelView" matrix. Apparently the GLM author dropped the 'view' part of the name, which confuses things even more, but it comes down to passing something like view*model.
Now for the actual use:
win is a vector whose three components are window coordinates: x and y are the position in your viewport, and z is usually retrieved by reading the depth buffer at (x,y).
The model, view and projection matrices should speak for themselves if you're even thinking of using this function, but a good (OpenGL-specific) refresher can be useful.
The viewport is defined as in glViewport, i.e. (x,y,w,h): x and y specify the lower left corner of the viewport (usually 0,0), and w and h its width and height. Note that many other systems put the origin at the upper left corner, in which case you have to flip your y coordinate; this is shown in the NeHe code I link to below.
Applying the function converts the provided window coordinates back to object coordinates, more or less the inverse of what your render code usually does.
A half-decent explanation of the original gluUnProject can be found in a NeHe article. But of course that is OpenGL-specific, while GLM can be used in other contexts.
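To make the viewport convention concrete, here is a minimal sketch of the first thing an unproject routine has to do: turn window coordinates into normalized device coordinates. The names Viewport, flipY and windowToNdc are made up for illustration and are not part of GLM, and the default depth range of [0, 1] is assumed.

```cpp
#include <cassert>

// Viewport in glViewport order: lower-left corner (x, y), then width and height.
struct Viewport { double x, y, width, height; };

struct Vec3 { double x, y, z; };

// Convert a y coordinate measured from the top edge of the window (as many
// windowing systems report it) into OpenGL's bottom-up convention.
double flipY(double yFromTop, double windowHeight) {
    return windowHeight - yFromTop;
}

// Map window coordinates to normalized device coordinates in [-1, 1].
// win.z is a depth-buffer value in [0, 1] (default depth range assumed).
Vec3 windowToNdc(const Vec3& win, const Viewport& vp) {
    assert(vp.width > 0.0 && vp.height > 0.0);
    Vec3 ndc;
    ndc.x = 2.0 * (win.x - vp.x) / vp.width  - 1.0;
    ndc.y = 2.0 * (win.y - vp.y) / vp.height - 1.0;
    ndc.z = 2.0 * win.z - 1.0;
    return ndc;
}
```

The remaining work of an unproject is multiplying that NDC point by the inverse of projection * modelview and dividing by w.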

The viewport is passed in as four floats: the x and y window coordinates of the viewport, followed by its width and height. That's the same order as used e.g. by glGetFloatv(GL_VIEWPORT, ...). So in most cases, the first two values should be 0.
As KillianDS already pointed out, the model argument is in fact a modelview matrix; see the example use of unProject() in gtx_simd_mat4.cpp, function test_compute_gtx():
glm::mat4 E = glm::translate(D, glm::vec3(1.4f, 1.2f, 1.1f));
glm::mat4 F = glm::perspective(i, 1.5f, 0.1f, 1000.f);
glm::mat4 G = glm::inverse(F * E);
glm::vec3 H = glm::unProject(glm::vec3(i), G, F, E[3]);
As you can see, the matrix passed as the second argument is derived from the product of a perspective matrix and a translation (the test actually passes the inverse of that product), rather than from a bare model matrix.

Related

What does the glFrustum function do?

According to MSDN, we are multiplying the current matrix with the perspective matrix. What matrix are we talking about here? Also from MSDN:
"The glFrustum function multiplies the current matrix by this matrix, with the result replacing the current matrix. That is, if M is the current matrix and F is the frustum perspective matrix, then glFrustum replaces M with M • F."
Now, from my current knowledge of grade 12 calculus (in which I am currently), M is a vector and M dot product F returns a scalar, so how can one replace a vector with a scalar?
I'm also not sure what "clipping" planes are and how they can be referenced via one float value.
Please phrase your answer in terms of the parameters, and also conceptually. I'd really appreciate that.
I'm trying to learn openGL via this tutorial: http://www.youtube.com/watch?v=WdGF7Bw6SUg. It is actually not really good because it explains nothing or else it assumes one knows openGL. I'd really appreciate a link to a tutorial otherwise, I really appreciate your time and responses! I'm sure I can struggle through and figure things out.
You misunderstand what it's saying. M is a matrix. M•F therefore is also a matrix. It constructs a perspective matrix. See this article for an explanation of how it is constructed and when you want to use glFrustum() vs. gluPerspective():
glFrustum() and gluPerspective() both produce perspective projection matrices that you can use to transform from eye coordinate space to clip coordinate space. The primary difference between the two is that glFrustum() is more general and allows off-axis projections, while gluPerspective() only produces symmetrical (on-axis) projections. Indeed, you can use glFrustum() to implement gluPerspective().
Clipping planes are planes that cut out sections of the world so they don't have to be rendered. The frustum describes where the planes are in space. Its sides define the view volume.
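As a rough sketch of how gluPerspective can be expressed through glFrustum, the function below computes the six glFrustum arguments from a vertical field of view and an aspect ratio. FrustumBounds and perspectiveToFrustum are illustrative names, not OpenGL or GLU calls.

```cpp
#include <cassert>
#include <cmath>

// The six values glFrustum takes, in the same order.
struct FrustumBounds { double left, right, bottom, top, zNear, zFar; };

// fovyDegrees: full vertical field of view; aspect: width / height.
FrustumBounds perspectiveToFrustum(double fovyDegrees, double aspect,
                                   double zNear, double zFar) {
    assert(zNear > 0.0 && zFar > zNear && aspect > 0.0);
    const double pi = 3.14159265358979323846;
    const double halfAngle = fovyDegrees * pi / 360.0;  // half the FOV, in radians
    const double top = zNear * std::tan(halfAngle);     // half height of the near plane
    const double right = top * aspect;                  // half width of the near plane
    FrustumBounds b = { -right, right, -top, top, zNear, zFar };
    return b;
}
```

The result is always symmetric (left = -right, bottom = -top), which is exactly the on-axis restriction mentioned above; an off-axis projection would need unequal bounds passed to glFrustum directly.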
glFrustum generates a perspective projection matrix.
This matrix maps a portion of the space (the "frustum") to your screen. Many caveats apply (normalized device coordinates, perspective divide, etc), but that's the idea.
The part that puzzles you is a relic of the deprecated OpenGL API. You can safely ignore it, because you usually apply glFrustum() to an identity projection matrix, so the multiplication mentioned in the doc has no effect.
A perhaps more comprehensible way to do things is the following :
// Projection matrix : 45° Field of View, 4:3 ratio, display range : 0.1 unit <-> 100 units
glm::mat4 Projection = glm::perspective(glm::radians(45.0f), 4.0f / 3.0f, 0.1f, 100.0f); // recent GLM versions expect the angle in radians
// Or, for an ortho camera :
//glm::mat4 Projection = glm::ortho(-10.0f,10.0f,-10.0f,10.0f,0.0f,100.0f); // In world coordinates
// Camera matrix
glm::mat4 View = glm::lookAt(
glm::vec3(4,3,3), // Camera is at (4,3,3), in World Space
glm::vec3(0,0,0), // and looks at the origin
glm::vec3(0,1,0) // Head is up (set to 0,-1,0 to look upside-down)
);
// Model matrix : an identity matrix (model will be at the origin)
glm::mat4 Model = glm::mat4(1.0f);
// Our ModelViewProjection : multiplication of our 3 matrices
glm::mat4 MVP = Projection * View * Model; // Remember, matrix multiplication is the other way around
Here, I used glm::perspective (the equivalent of old-style gluPerspective()), but you can use glFrustum too.
As for the arguments of the function, they are best explained by this part of the doc :
left bottom - nearVal and right top - nearVal specify the points on
the near clipping plane that are mapped to the lower left and upper
right corners of the window, assuming that the eye is located at (0,
0, 0)
If you're unfamiliar with transformation matrices, I wrote a tutorial here; the "Projection Matrix" section especially should help you.
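To check the quoted description of the parameters, here is a small sketch that builds the matrix documented for glFrustum and pushes the two near-plane corners through it. Mat4, Vec4, frustum, transform and perspectiveDivide are illustrative helpers; the matrix is stored as m[row][col] for readability, which is not OpenGL's column-major memory layout.

```cpp
#include <cassert>

struct Vec4 { double x, y, z, w; };
struct Mat4 { double m[4][4]; };  // m[row][col], applied to column vectors

// The perspective matrix documented for glFrustum(l, r, b, t, n, f).
Mat4 frustum(double l, double r, double b, double t, double n, double f) {
    assert(r != l && t != b && f != n);
    Mat4 F = {{
        { 2.0 * n / (r - l), 0.0,               (r + l) / (r - l),   0.0 },
        { 0.0,               2.0 * n / (t - b), (t + b) / (t - b),   0.0 },
        { 0.0,               0.0,               -(f + n) / (f - n), -2.0 * f * n / (f - n) },
        { 0.0,               0.0,               -1.0,                0.0 }
    }};
    return F;
}

// Matrix times column vector.
Vec4 transform(const Mat4& M, const Vec4& v) {
    const double in[4] = { v.x, v.y, v.z, v.w };
    double out[4];
    for (int r = 0; r < 4; ++r) {
        out[r] = 0.0;
        for (int c = 0; c < 4; ++c) out[r] += M.m[r][c] * in[c];
    }
    Vec4 result = { out[0], out[1], out[2], out[3] };
    return result;
}

// Clip space to normalized device coordinates.
Vec4 perspectiveDivide(const Vec4& clip) {
    assert(clip.w != 0.0);
    Vec4 ndc = { clip.x / clip.w, clip.y / clip.w, clip.z / clip.w, 1.0 };
    return ndc;
}
```

With the eye at the origin, the point (left, bottom, -nearVal) should come out at the NDC corner (-1, -1, -1) and (right, top, -nearVal) at (1, 1, -1), even for an off-axis frustum.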

Confused about OpenGL transformations

In opengl there is one world coordinate system with origin (0,0,0).
What confuses me is what all the transformations like glTranslate, glRotate, etc. do? Do they move
objects in world coordinates, or do they move the camera? As you know, the same movement can be achieved by either moving objects or camera.
I am guessing that glTranslate, glRotate, change objects, and gluLookAt changes the camera?
In opengl there is one world coordinate system with origin (0,0,0).
Well, technically no.
What confuses me is what all the transformations like glTranslate, glRotate, etc. do? Do they move objects in world coordinates, or do they move the camera?
Neither. OpenGL doesn't know objects, OpenGL doesn't know a camera, OpenGL doesn't know a world. All that OpenGL cares about are primitives (points, lines or triangles), per-vertex attributes, normalized device coordinates (NDC) and a viewport to which the NDC are mapped.
When you tell OpenGL to draw a primitive, each vertex is processed according to its attributes. The position is one of the attributes and usually a vector with 1 to 4 scalar elements within local "object" coordinate system. The task at hand is to somehow transform the local vertex position attribute into a position on the viewport. In modern OpenGL this happens within a small program, running on the GPU, called a vertex shader. The vertex shader may process the position in an arbitrary way. But the usual approach is by applying a number of nonsingular, linear transformations.
Such transformations can be expressed in terms of homogeneous transformation matrices. For a 3-dimensional vector, the homogeneous representation is a vector with 4 elements, where the 4th element is 1.
In computer graphics a 3-fold transformation pipeline has become sort of the standard way of doing things. First the object local coordinates are transformed into coordinates relative to the virtual "eye", hence into eye space. In OpenGL this transformation used to be called the modelview transformation. With the vertex positions in eye space several calculations, like illumination, can be expressed in a generalized way, hence those calculations happen in eye space. Next the eye space coordinates are transformed into the so-called clip space. This transformation maps some volume in eye space to a specific volume with certain boundaries, to which the geometry is clipped. Since this transformation effectively applies a projection, in OpenGL this used to be called the projection transformation.
After clip space the positions get "normalized" by their homogeneous component, yielding normalized device coordinates, which are then plainly mapped to the viewport.
To recapitulate:
A vertex position is transformed from local to clip space by
vpos_eye = MV · vpos_local
eyespace_calculations(vpos_eye);
vpos_clip = P · vpos_eye
·: inner product column on row vector
Then to reach NDC
vpos_ndc = vpos_clip / vpos_clip.w
and finally to the viewport (NDC coordinates are in the range [-1, 1])
vpos_viewport = (vpos_ndc + (1,1,1,1)) * (viewport.width, viewport.height) / 2 + (viewport.x, viewport.y)
*: vector component wise multiplication
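The last two steps of the recap can be sketched in plain C++ as follows. clipToNdc and ndcToViewport are illustrative names, and the depth mapping assumes the default depth range of [0, 1], which the formulas above do not spell out.

```cpp
#include <cassert>

struct Vec3 { double x, y, z; };
struct Vec4 { double x, y, z, w; };

// Viewport as passed to glViewport: lower-left corner, then size.
struct Viewport { double x, y, width, height; };

// vpos_ndc = vpos_clip / vpos_clip.w
Vec3 clipToNdc(const Vec4& clip) {
    assert(clip.w != 0.0);
    Vec3 ndc = { clip.x / clip.w, clip.y / clip.w, clip.z / clip.w };
    return ndc;
}

// vpos_viewport = (vpos_ndc + 1) * size / 2 + origin, per component.
Vec3 ndcToViewport(const Vec3& ndc, const Viewport& vp) {
    Vec3 win;
    win.x = (ndc.x + 1.0) * vp.width  / 2.0 + vp.x;
    win.y = (ndc.y + 1.0) * vp.height / 2.0 + vp.y;
    win.z = (ndc.z + 1.0) / 2.0;  // assumes the default depth range [0, 1]
    return win;
}
```

For an 800x600 viewport at the origin, the clip-space position (2, -2, 0, 2) becomes NDC (1, -1, 0) and lands on the right edge at the bottom of the viewport with depth 0.5.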
The OpenGL functions glRotate, glTranslate, glScale, glMatrixMode merely manipulate the transformation matrices. OpenGL used to have four transformation matrices:
modelview
projection
texture
color
Which of them the matrix manipulation functions act on can be set using glMatrixMode. Each of the matrix manipulating functions composes a new matrix by multiplying the transformation matrix it describes on top of the selected matrix, thereby replacing it. The function glLoadIdentity replaces the current matrix with identity, glLoadMatrix replaces it with a user-defined matrix, and glMultMatrix multiplies a user-defined matrix on top of it.
So how does the modelview matrix then emulate both object placement and a camera? Well, as you already stated
As you know, the same movement can be achieved by either moving objects or camera.
You can not really discern between them. The usual approach is by splitting the object local to eye transformation into two steps:
Object to world – OpenGL calls this the "model transform"
World to eye – OpenGL calls this the "view transform"
Together they form the model-view, in fixed function OpenGL described by the modelview matrix. Now since the order of transformations is
local to world, Model matrix vpos_world = M · vpos_local
world to eye, View matrix vpos_eye = V · vpos_world
we can substitute by
vpos_eye = V · ( M · vpos_local ) = V · M · vpos_local
replacing V · M by the ModelView matrix =: MV
vpos_eye = MV · vpos_local
Thus you can see that what's V and what's M of the compound matrix MV is only determined by the order of operations in which you multiply onto the modelview matrix, and at which step you decide to "call it the model transform from here on".
I.e. right after a
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
the view is defined. But at some point you'll start applying model transformations and everything after is model.
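Here is a small sketch of that behaviour, using a hypothetical CurrentMatrix struct as a stand-in for the fixed-function modelview matrix. Every call multiplies on the right of the current matrix, so the transform issued last is the one applied to the vertex first; the helper names are made up and the scale is uniform to keep things short.

```cpp
#include <cassert>

struct Vec4 { double x, y, z, w; };
struct Mat4 { double m[4][4]; };  // m[row][col], applied to column vectors

Mat4 identity() {
    Mat4 I = {{ {1.0, 0.0, 0.0, 0.0}, {0.0, 1.0, 0.0, 0.0},
                {0.0, 0.0, 1.0, 0.0}, {0.0, 0.0, 0.0, 1.0} }};
    return I;
}

Mat4 translation(double tx, double ty, double tz) {
    Mat4 T = identity();
    T.m[0][3] = tx; T.m[1][3] = ty; T.m[2][3] = tz;
    return T;
}

Mat4 scaling(double s) {
    assert(s != 0.0);  // keep the transformation nonsingular
    Mat4 S = identity();
    S.m[0][0] = s; S.m[1][1] = s; S.m[2][2] = s;
    return S;
}

Mat4 multiply(const Mat4& A, const Mat4& B) {
    Mat4 C;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) {
            C.m[r][c] = 0.0;
            for (int k = 0; k < 4; ++k) C.m[r][c] += A.m[r][k] * B.m[k][c];
        }
    return C;
}

Vec4 transform(const Mat4& M, const Vec4& v) {
    const double in[4] = { v.x, v.y, v.z, v.w };
    double out[4];
    for (int r = 0; r < 4; ++r) {
        out[r] = 0.0;
        for (int c = 0; c < 4; ++c) out[r] += M.m[r][c] * in[c];
    }
    Vec4 result = { out[0], out[1], out[2], out[3] };
    return result;
}

// Hypothetical stand-in for the fixed-function "current matrix".
struct CurrentMatrix {
    Mat4 value;
    CurrentMatrix() : value(identity()) {}          // like glLoadIdentity
    void translate(double x, double y, double z) {  // like glTranslate
        value = multiply(value, translation(x, y, z));
    }
    void scale(double s) {                          // like a uniform glScale
        value = multiply(value, scaling(s));
    }
};
```

Calling translate(0, 0, -5) first (the "view" part) and scale(2) second (the "model" part) leaves MV = V · M in value, so a vertex is scaled first and moved away from the eye afterwards.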
Note that in modern OpenGL all the matrix manipulation functions have been removed. OpenGL's matrix stack never was feature complete and no serious application actually used it. Most programs just glLoadMatrix-ed their self-calculated matrices and didn't bother with the OpenGL built-in matrix manipulation routines.
And ever since shaders were introduced, the whole OpenGL matrix stack got awkward to use, to say it nicely.
The verdict: If you plan on using OpenGL the modern way, don't bother with the built-in functions. But keep in mind what I wrote, because what your shaders do will be very similar to what OpenGL's fixed function pipeline did.
OpenGL is a low-level API; there are no higher-level concepts like an "object" and a "camera" in the "scene", so there are only two matrix modes: MODELVIEW (the product of the "camera" matrix and the "object" transformation) and PROJECTION (the projective transformation from world-space to post-perspective space).
Distinction between "Model" and "View" (object and camera) matrices is up to you. glRotate/glTranslate functions just multiply the currently selected matrix by the given one (without even distinguishing between ModelView and Projection).
Those functions multiply (transform) the current matrix set by glMatrixMode() so it depends on the matrix you're working on. OpenGL has 4 different types of matrices; GL_MODELVIEW, GL_PROJECTION, GL_TEXTURE, and GL_COLOR, any one of those functions can change any of those matrices. So, basically, you don't transform objects you just manipulate different matrices to "fake" that effect.
Note that gluLookAt() is just a convenience function equivalent to a translation followed by some rotations; there's nothing special about it.
All transformations are transformations on objects. Even gluLookAt is just a transformation to transform the objects as if the camera was where you tell it to be. Technically they are transformations on the vertices, but that's just semantics.
That's true, glTranslate, glRotate change the object coordinates before rendering and gluLookAt changes the camera coordinate.

How to use OpenGL orthographic projection with the depth buffer?

I've rendered a 3D scene using glFrustum() perspective mode. I then have a 2D object that I place over the 3D scene to act as a label for a particular 3D object. I have calculated the 2D position of the 3D object using gluProject(), and I place my 2D label object at that position. The 2D label object is rendered using glOrtho() orthographic mode. This works perfectly and the 2D label object hovers over the 3D object.
Now, what I want to do is to give the 2D object a z value so that it can be hidden behind other 3D objects in the scene using the depth buffer. I have given the 2D object a z value that I know should be hidden by the depth buffer, but when I render the object it is always visible.
So the question is, why is the 2D object still visible and not hidden?
I did read somewhere that orthographic and perspective projections store incompatible depth buffer values. Is this true, and if so how do I convert between them?
I don't want it to be transformed, instead I want it to appear as flat 2D label that always faces the camera and remains the same size on the screen at all times. However, if it is hidden behind something I want it to appear hidden.
First, you should have put that in your question; it explains far more about what you're trying to do than your question.
To achieve this, what you need to do is find the z-coordinate in the orthographic projection that matches the z-coordinate in pre-projective space of where you want the label to appear.
When you used gluProject, you got three coordinates back. The Z coordinate is important. What you need to do is reverse-transform the Z coordinate based on the zNear and zFar values you give to glOrtho.
Pedantic note: gluProject doesn't transform the Z coordinate to window space. To do so, it would have to take the glDepthRange parameters. What it really does is assume a depth range of near = 0.0 and far = 1.0.
So our first step is to transform from window space Z to normalized device coordinate (NDC) space Z. We use this simple equation:
ndcZ = (2 * winZ) - 1
Simple enough. Now, we need to go to clip space. Which is a no-op, because with an orthographic projection, the W coordinate is assumed to be 1.0. And the division by W is the difference between clip space and NDC space.
clipZ = ndcZ
But we don't need clip space Z. We need pre-orthographic projection space Z (aka: camera space Z). And that requires the zNear and zFar parameters you gave to glOrtho. To get to camera space, we do this:
cameraZ = ((clipZ + (zFar + zNear)/(zFar - zNear)) * (zFar - zNear))/-2
And you're done. Use that Z position in your rendering. Oh, and make sure your modelview matrix doesn't include any transforms in the Z direction (unless you're using the modelview matrix to apply this Z position to the label, which is fine).
Based on Nicol's answer you can simply set zNear to 0 (which generally makes sense for 2D elements that are acting as part of the GUI) and then you simply have:
cameraZ = -winZ*zFar
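The chain of conversions above fits in one small function. This is only a sketch with an illustrative name; it assumes winZ came from gluProject (depth range 0 to 1) and that zNear and zFar are the values passed to glOrtho.

```cpp
#include <cassert>

// Window-space depth -> camera-space Z under an orthographic projection.
double orthoCameraZFromWindowZ(double winZ, double zNear, double zFar) {
    assert(zFar != zNear);
    const double ndcZ = 2.0 * winZ - 1.0;  // window space -> NDC
    const double clipZ = ndcZ;             // w is 1 under an orthographic projection
    return (clipZ + (zFar + zNear) / (zFar - zNear)) * (zFar - zNear) / -2.0;
}
```

With zNear = 0 this reduces to -winZ * zFar, matching the shortcut above; in general winZ = 0 maps to -zNear and winZ = 1 to -zFar.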

Glut Mouse Coordinates

I am trying to obtain the Cartesian coordinates of a point. For that I am using a simple function that returns window coordinates, something like (100,200). I want to convert these to Cartesian coordinates; the problem is that the window size is variable, so I really don't know how to implement this.
Edit:
Op:
I tried to use gluUnProject by doing something like this
GLdouble modelMatrix[16];
glGetDoublev(GL_MODELVIEW_MATRIX,modelMatrix);
GLdouble projMatrix[16];
glGetDoublev(GL_PROJECTION_MATRIX,projMatrix);
double position[3];
gluUnProject(
x,
y,
1,
modelMatrix,
projMatrix,
viewport,
&position[0], //-> pointer to your own position (optional)
&position[1], // id
&position[2] //
);
cout<<position[0];
However the position I got seems completely random
Mouse coordinates are already in Cartesian coordinates, namely the coordinates of window space. I think what you're looking for is a transformation into the world space of your OpenGL scene. The 2D mouse coordinates in the window lack some information though: the depth.
So what you can obtain is actually a ray from the "camera" into the scene. For this you need to perform the back projection from screen space to world space. Take the projection matrix P and the view matrix V (what's generated by gluLookAt or similar), form the product P*V and invert it, i.e. (P*V)^-1 = V^-1 * P^-1. Take notice that inversion is not transposition. Inverting a matrix is done by Gauss-Jordan elimination, preferably with some pivoting. Any math textbook on linear algebra explains it.
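For reference, a minimal sketch of Gauss-Jordan elimination with partial pivoting for a 4x4 matrix could look like this. Mat4 and inverse are illustrative names, the matrix is stored as m[row][col], and the singularity check is a plain assert with an arbitrary threshold rather than proper error handling.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

struct Mat4 { double m[4][4]; };  // m[row][col]

// Invert a 4x4 matrix by reducing the augmented matrix [M | I] to [I | M^-1].
Mat4 inverse(const Mat4& in) {
    double a[4][8];
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) {
            a[r][c] = in.m[r][c];
            a[r][c + 4] = (r == c) ? 1.0 : 0.0;
        }

    for (int col = 0; col < 4; ++col) {
        // Partial pivoting: pick the row with the largest entry in this column.
        int pivot = col;
        for (int r = col + 1; r < 4; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        assert(std::fabs(a[pivot][col]) > 1e-12);  // matrix must be nonsingular
        if (pivot != col)
            for (int c = 0; c < 8; ++c) std::swap(a[pivot][c], a[col][c]);

        // Scale the pivot row so the pivot becomes 1.
        const double scale = 1.0 / a[col][col];
        for (int c = 0; c < 8; ++c) a[col][c] *= scale;

        // Eliminate this column from every other row.
        for (int r = 0; r < 4; ++r) {
            if (r == col) continue;
            const double factor = a[r][col];
            for (int c = 0; c < 8; ++c) a[r][c] -= factor * a[col][c];
        }
    }

    Mat4 out;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) out.m[r][c] = a[r][c + 4];
    return out;
}
```

Applied to the product P*V, this yields the matrix used for the back projection.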
Then you need two vectors to form a ray. For this you take two screen space positions with the same XY coordinates but differing depth (say, 0 and 1) and do the backprojection:
(P*V)^-1 * (x, y, {0, 1})
The difference of the resulting vectors gives you a ray direction.
The whole backprojection process has been wrapped up into gluUnProject. However, I recommend not using it in its GLU form; either look at the source code to learn from it, or use a modern-day substitute such as the one offered by GLM.
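As an illustration of the two-point approach, the sketch below builds a ray for the special case of a symmetric gluPerspective-style projection with an identity view matrix, so the general matrix inverse can be replaced by a closed-form one and the ray comes out in eye space. That restriction, and the names ndcToEye and pickRay, are assumptions made for brevity; the inputs are NDC coordinates in [-1, 1], with depths -1 and 1 standing for window depths 0 and 1.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
struct Ray { Vec3 origin; Vec3 direction; };

// Unproject an NDC point for a symmetric perspective projection.
// tanHalfFovy = tan(fovy / 2), aspect = width / height.
Vec3 ndcToEye(const Vec3& ndc, double tanHalfFovy, double aspect,
              double zNear, double zFar) {
    assert(zNear > 0.0 && zFar > zNear);
    const double zEye = 2.0 * zFar * zNear /
                        ((zFar - zNear) * ndc.z - (zFar + zNear));
    const double w = -zEye;  // clip-space w of a perspective projection
    Vec3 eye = { ndc.x * w * aspect * tanHalfFovy, ndc.y * w * tanHalfFovy, zEye };
    return eye;
}

// Ray through the given NDC position: starts on the near plane and
// points towards the matching point on the far plane.
Ray pickRay(double ndcX, double ndcY, double tanHalfFovy, double aspect,
            double zNear, double zFar) {
    const Vec3 nearNdc = { ndcX, ndcY, -1.0 };
    const Vec3 farNdc  = { ndcX, ndcY,  1.0 };
    const Vec3 p0 = ndcToEye(nearNdc, tanHalfFovy, aspect, zNear, zFar);
    const Vec3 p1 = ndcToEye(farNdc,  tanHalfFovy, aspect, zNear, zFar);
    Vec3 d = { p1.x - p0.x, p1.y - p0.y, p1.z - p0.z };
    const double len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    d.x /= len; d.y /= len; d.z /= len;
    Ray ray = { p0, d };
    return ray;
}
```

To get a world-space ray with a real camera, the origin and direction would still have to be transformed by the inverse view matrix.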

OpenGL define vertex position in pixels

I've been writing a 2D basic game engine in OpenGL/C++ and learning everything as I go along. I'm still rather confused about defining vertices and their "position". That is, I'm still trying to understand the vertex-to-pixels conversion mechanism of OpenGL. Can it be explained briefly or can someone point to an article or something that'll explain this. Thanks!
This is rather basic knowledge that your favourite OpenGL learning resource should teach you as one of the first things. But anyway the standard OpenGL pipeline is as follows:
1. The vertex position is transformed from object-space (local to some object) into world-space (with respect to some global coordinate system). This transformation specifies where your object (to which the vertices belong) is located in the world.
2. Now the world-space position is transformed into camera/view-space. This transformation is determined by the position and orientation of the virtual camera by which you see the scene. In OpenGL these two transformations are actually combined into one, the modelview matrix, which directly transforms your vertices from object-space to view-space.
3. Next the projection transformation is applied. Whereas the modelview transformation should consist only of affine transformations (rotation, translation, scaling), the projection transformation can be a perspective one, which basically distorts the objects to realize a real perspective view (with farther away objects being smaller). But in your case of a 2D view it will probably be an orthographic projection, which does nothing more than a translation and scaling. This transformation is represented in OpenGL by the projection matrix.
4. After these 3 (or 2) transformations (and the subsequent perspective division by the w component, which actually realizes the perspective distortion, if any) what you have are normalized device coordinates. This means that after these transformations the coordinates of the visible objects should be in the range [-1,1]. Everything outside this range is clipped away.
5. In a final step the viewport transformation is applied and the coordinates are transformed from the [-1,1] range into the [0,w]x[0,h]x[0,1] cube (assuming a glViewport(0, 0, w, h) call), which are the vertex's final positions in the framebuffer and therefore its pixel coordinates.
When using a vertex shader, steps 1 to 3 are actually done in the shader and can therefore be done in any way you like, but usually one conforms to this standard modelview -> projection pipeline, too.
The main thing to keep in mind is, that after the modelview and projection transforms every vertex with coordinates outside the [-1,1] range will be clipped away. So the [-1,1]-box determines your visible scene after these two transformations.
So from your question I assume you want to use a 2D coordinate system with units of pixels for your vertex coordinates and transformations? In this case this is best done by using glOrtho(0.0, w, 0.0, h, -1.0, 1.0) with w and h being the dimensions of your viewport. This basically counters the viewport transformation and therefore transforms your vertices from the [0,w]x[0,h]x[-1,1]-box into the [-1,1]-box, which the viewport transformation then transforms back to the [0,w]x[0,h]x[0,1]-box.
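A one-dimensional sketch shows why the two mappings cancel out. orthoToNdc mirrors what glOrtho does to a single axis and ndcToWindow what the viewport transformation does; both names are made up for illustration.

```cpp
#include <cassert>

// One axis of glOrtho: maps the range [lo, hi] to [-1, 1].
double orthoToNdc(double v, double lo, double hi) {
    assert(hi != lo);
    return 2.0 * (v - lo) / (hi - lo) - 1.0;
}

// One axis of the viewport transformation: maps [-1, 1] to [origin, origin + size].
double ndcToWindow(double ndc, double origin, double size) {
    return (ndc + 1.0) * size / 2.0 + origin;
}
```

With an 800-pixel-wide viewport, x = 200 becomes -0.5 in NDC and 200 again in window coordinates. Swapping the bottom and top arguments, as in glOrtho(0, 800, 600, 0, ...), maps y = 0 to NDC 1, so the origin ends up at the top of the window.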
These have been quite general explanations without mentioning that the actual transformations are done by matrix-vector multiplications and without talking about homogeneous coordinates, but they should have explained the essentials. This documentation of gluProject might also give you some insight, as it actually models the transformation pipeline for a single vertex. But in this documentation they actually forgot to mention the division by the w component (v" = v' / v'(3)) after the v' = P x M x v step.
EDIT: Don't forget to look at the first link in epatel's answer, which explains the transformation pipeline a bit more practical and detailed.
It is called transformation.
Vertices are specified in 3D coordinates, which are transformed into viewport coordinates (into your window view). This transformation can be set up in various ways. An orthographic transformation is probably the easiest to understand as a starter.
http://www.songho.ca/opengl/gl_transform.html
http://www.opengl.org/wiki/Vertex_Transformation
http://www.falloutsoftware.com/tutorials/gl/gl5.htm
First, be aware that OpenGL does not use standard pixel coordinates. By that I mean that for a particular resolution, e.g. 800x600, you don't have horizontal coordinates in the range 0-799 or 1-800 in steps of one. Instead, coordinates range from -1 to 1; they are later sent to the graphics card's rasterizing unit and only then matched to the particular resolution.
I omitted one step here: before all that, a ModelViewProjection matrix (or a ViewProjection matrix in some simple cases) maps the coordinates you use onto a projection plane. Its default use is to implement a camera that converts the 3D space of the world: View places the camera in the right position, Projection casts the 3D coordinates onto the screen plane, and in ModelViewProjection the Model step also places a model in the right spot in the world.
Another use (and you can use the Projection matrix this way to achieve what you want) is to have these matrices convert one range of coordinates to another.
And that's the trick you will need. You should read about the ModelViewProjection matrix and cameras in OpenGL if you want to get serious. But for now I will tell you that with the proper matrix you can map your own coordinate system (e.g. ranges 0-799 horizontally and 0-599 vertically) to the standardized -1 to 1 range. That way you will not notice that the underlying OpenGL API uses its own -1 to 1 system.
The easiest way to achieve this is the glOrtho function. Here's the link to the documentation:
http://www.opengl.org/sdk/docs/man/xhtml/glOrtho.xml
This is an example of proper usage:
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(0, 800, 600, 0, 0, 1); // left, right, bottom, top, near, far: origin at the top left
glMatrixMode(GL_MODELVIEW);
Now you can use your own modelview matrix, e.g. for translating (moving) objects, but don't touch the projection matrix. This code should be executed before any drawing commands. (In fact it can go right after initializing OpenGL if you won't use 3D graphics.)
And here's working example: http://nehe.gamedev.net/tutorial/2d_texture_font/18002/
Just draw your figures instead of drawing text. One more thing: glPushMatrix and glPopMatrix for the chosen matrix (the projection matrix in this example); you won't need them until you combine 3D with 2D rendering.
And you can still use a model matrix (e.g. for placing tiles somewhere in the world) and a view matrix (for example for zooming the view, or scrolling through the world; in this case your world can be larger than the resolution and you can crop the view with simple translations).
Looking at my answer I see it's a little chaotic, but if you're confused, just read about the Model, View, and Projection matrices and try the example with glOrtho. If you're still confused, feel free to ask.
MSDN has a great explanation. It may be in terms of DirectX but OpenGL is more-or-less the same.
Google for "opengl rendering pipeline". The first five articles all provide good expositions.
The key transition from vertices to pixels (actually, fragments, but you won't be too far off if you think "pixels") is in the rasterization stage, which occurs after all vertices have been transformed from world-coordinates to screen coordinates and clipped.