Questions about Orthogonal vs Perspective in OpenGL [closed] - opengl

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
I've gotten a 3 vertices triangle rotating around the y-axis. One of the things I find "weird" is normalized coordinates in GL's default orthogonal projection. I've used 2-D libraries like SDL and SFML, which almost always deal with pixels. You say you want an image surface that is 50x50 pixels, that is what you get. So initially it was strange for me to say limit my vertex position choices from [-1,1].
Why do orthogonal coordinates have to be normalized? Is perspective projection the same? If so, how would you say you want the origin of your object to be at z=-10? (My quick look over matrix m ath says perspective is different. Something about division by 'w' creating homogenous (same thing as normalized?) coordinates, but I'm not sure).
gl_Position = View * Model * Project * Vertex;
I've seen that equation above and I'm boggled by how the variable gl_Position used in shaders can represent both the position of the current vertices of a model/object, and at the same time a view/projection, or the position of the camera. How does that work? I understand by multiplication all that information is stored in one matrix, but how does OpenGL use that one matrix whose information is now combined to say, "ok, this part/fraction of gl_Position is for the camera, and that other part is information for where the model is going to go."? (BTW, I'm not quite sure what the Vertex vec4 represents. I thought all vertices of a model were inside Model. Any ideas?
One more question, if you just wanted to move the camera, for example in FPS games you move the mouse up to look up, but no objects are being rotated or translated (I think) other than the camera, would the equation above look something like this?
gl_Position = View * Project;

Why do orthogonal coordinates have to be normalized?
They don't. You can set the limits of the orthogonal projection volume however you desire. The left, right, bottom, top, near and far parameters of the glOrtho call define the limits of the viewport volume. If you chose them left=0, right=win_pixel_width, bottom=0, top=win_pixel_height you end up with a pixel unit projection volume as you're used to. However why bother with pixels? You'd just have to compensate for the actual window size later. Just choose the ortho projection volume extents to match the scene you want to draw.
Maybe you're confusing this with normalized device coordinates. And for those it simply has been defined that it is the value range [-1, 1] that's mapped to the viewport extents.
Update
BTW, I'm not quite sure what the Vertex vec4 represents. I thought all vertices of a model were inside Model. Any ideas?
I'm getting quite fatigued right now, because I've been answering several questions like this numerous times over the last few days. So, here it goes again:
In OpenGL there is no camera.
In OpenGL there is no scene.
In OpenGL there are no models.
"Wait, what?!" you may wonder now. But it's true.
All OpenGL cares about is, that there is some target framebuffer, i.e. a canvas it can draw to, and a stream of vertex attributes that make geometric primitives. The primitives are points, lines and triangles. Somehow the vertex attributes for, say a triangle, must be mapped to a position on the framebuffer canvas. For this an vertex attribute we call position goes through a number of affine transformations.
The first is from a local model space into world space , the Model transform.
From world space into eye space, the View transform. It is this view transform, which acts like placing the camera in a scene.
After that it put through the equivalent of a camera's lens, which is a Projection transform.
After the Projection transform the position is in clip space where it undergoes some operations, that are not essential to understand for the time being. After clipping the so called homogenous divide is applied to reach normalized device coordinate space, by dividing the clip space position vector by its own w-component.
v_position_ndc = v_position_clip / v_position_clip.w
This step is, what makes a perspective projection actually work. The z-distance of a vertex' position is worked into the clip space w-component. And by the homogenous divide vertices with a larger position w get scaled proportionally to 1/w in the XY plane which creates a perspective effect.
You mistook this operation as normalization, but it is not!
After the homogenous divide vertex position has been mapped from clip to NDC space. And OpenGL defines, that the visible volume of NDC space is the box [-1, 1]^3 ; vertices outside this box are clipped.
It's crucial to understand that View transform and Projection are different. For a position it's not so obvious, but another vertex attribute called the normal, which is an important ingredient for lighting calculations, must be transformed in a slightly different way (instead of Projection · View · Model it must be transformed by inverse(transpose(View · Model)), i.e. the Projection takes no part in it but the viewpoint does).
The matrices itself are 4×4 grids of real valued scalars (ignore for the time being that numbers in a computer are always rational numbers). So the rank of the matrix is 4 and hence it must be multiplied of vectors of dimension 4 (hence the type vec4)
OpenGL treats vertex attributes as column vectors so matrix multiplication is left associative i.e. a vector enters an expression on the right side and comes out on the left. The order of matrix multiplication matters. You can not freely reorder things!
The statement
gl_Position = Projection * View * Model * vertex_position; // note the order
makes the vertex shader perform this very transformation process I just described.

"Note that there is no separate camera (view) matrix in OpenGL. Therefore, in order to simulate transforming the camera or view, the scene (3D objects and lights) must be transformed with the inverse of the view transformation. In other words, OpenGL defines that the camera is always located at (0, 0, 0) and facing to -Z axis in the eye space coordinates, and cannot be transformed. See more details of GL_MODELVIEW matrix in ModelView Matrix."
Source: http://www.songho.ca/opengl/gl_transform.html
That is what I was getting hung up on. I thought there would a separate camera view matrix in OpenGL.

Related

Clipping in clipping coordinate system and normalized device coordinate system (OpenGL)

I heard clipping should be done in clipping coordinate system.
The book suggests a situation that a line is laid from behind camera to in viewing volume. (We define this line as PQ, P is behind camera point)
I cannot understand why it can be a problem.
(The book says after finishing normalizing transformation, the P will be laid in front of camera.)
I think before making clipping coordinate system, the camera is on original point (0, 0, 0, 1) because we did viewing transformation.
However, in NDCS, I cannot think about camera's location.
And I have second question.
In vertex shader, we do model-view transformation and then projection transformation. Finally, we output these vertices to rasterizer.
(some vertices's w is not equal to '1')
Here, I have curiosity. The rendering pipeline will automatically do division procedure (by w)? after finishing clipping.
Sometimes not all the model can be seen on screen, mostly because some objects of it lie behind the camera (or "point of view"). Those objects are clipped out. If just a part of the object can't be seen, then just that part must be clipped leaving the rest as seen.
OpenGL clips
OpenGL does this clipping in Clipping Coordinate Space (CCS). This is a cube of size 2w x 2w x 2w where 'w' is the fourth coordinate resulting of (4x4) x (4x1) matrix and point multiplication. A mere comparison of coordinates is enough to tell if the point is clipped or not. If the point passes the test then its coordinates are divided by 'w' (so called "perspective division"). Notice that for ortogonal projections 'w' is always 1, and with perspective it's generally not 1.
CPU clips
If the model is too big perhaps you want to save GPU resources or improve the frame rate. So you decide to skip those objects that are going to get clipped anyhow. Then you do the maths on your own (on CPU) and only send to the GPU the vertices that passed the test. Be aware that some objects may have some vertices clipped while other vertices of this same object may not.
Perhaps you do send them to GPU and let it handle these special cases.
You have a volume defined where only objects inside are seen. This volume is defined by six planes. Let's put ourselves in the camera and look at this volume: If your projection is perspective the six planes build a "fustrum", a sort of truncated pyramid. If your projection is orthogonal, the planes form a parallelepiped.
In order to clip or not to clip a vertex you must use the distance form the vertex to each of these six planes. You need a signed distance, this means that the sign tells you what side of the plane is seen form the vertex. If any of the six distance signs is not the right one, the vertex is discarded, clipped.
If a plane is defined by equation Ax+By+Cz+D=0 then the signed distance from p1,p2,p3 is (Ap1+Bp2+Cp3+D)/sqrt(AA+BB+C*C). You only need the sign, so don't bother to calculate the denominator.
Now you have all tools. If you know your planes on "camera view" you can calculate the six distances and clip or not the vertex. But perhaps this is an expensive operation considering that you must transform the vertex coodinates from model to camera (view) spaces, a ViewModel matrix calculation. With the same cost you use your precalculated ProjectionViewModel matrix instead and obtain CCS coordinates, which are much easier to compare to '2w', the size of the CCS cube.
Sometimes you want to skip some vertices not due to they are clipped, but because their depth breaks a criteria you are using. In this case CCS is not the right space to work with, because Z-coordinate is transformed into [-w, w] range, depth is somehow "lost". Instead, you do your clip test in "view space".

Why does OpenGL allow/use fractional values as the location of vertices?

As far as I understand, location of a point/pixel cannot be a fraction, at least on a raster graphics system where hardwares use pixels to display images.
Then, why and how does OpenGL use fractional values for plotting pixels?
For example, how is it possible: glVertex2f(0.15f, 0.51f); ?
This command does not plot any pixels. It merely defines the location of a point in 3D space (you'll notice that there are 3 coordinates, while for a pixel on the screen you'd only need 2). This is the starting point for the OpenGL pipeline. This point then goes through a lot of transformations before it ends up on the screen.
Also, the coordinates are unitless. For example, you can say that your viewport is between 0.0f and 1.0f, then these coordinates make a lot of sense. Basically you have to think of these point in terms of mathematics, not pixels.
I would suggest some reading on how OpenGL transformations work, for example here, here or the tutorial here.
The vectors you pass into OpenGL are not viewport positions but arbitrary numbers in some vector space. Only after a chain of transformations these numbers are mapped into viewport pixel positions. With the old fixed function pipeline this could be anything that can be represented by a vector–matrix multiplication.
These days, where everything is programmable (shaders) the mapping can very well be any kind of function you can think of. For example the values you pass into glVertex (immediate mode call, but available to shaders with OpenGL-2.1) may be interpreted as polar coordinates in the vertex shader:
This is a perfectly valid OpenGL-2.1 vertex shader that interprets the vertex position to be in polar coordinates. Note that due to triangles and lines being straight edges and polar coordinates being curvilinear this gives good visual results only for points or highly tesselated primitives.
#version 110
void main() {
gl_Position =
gl_ModelViewProjectionMatrix
* vec4( gl_Vertex.y*vec2(sin(gl_Vertex.x),cos(gl_Vertex.x)) , 0, 1);
}
As you can see here the valus passed to glVertex are actually arbitrary, unitless components of vectors in some vector space. Only by applying some transformation to the viewport space these vectors gain meaning. Hence it makes no way to impose a certain value range onto the values that go into the vertex attribute.
Vertex and pixel are very different things.
It's quite possible to have all your vertices within one pixel (although in this case you probably need help with LODing).
You might want to start here...
http://www.glprogramming.com/blue/ch01.html
Specifically...
Primitives are defined by a group of one or more vertices. A vertex defines a point, an endpoint of a line, or a corner of a polygon where two edges meet. Data (consisting of vertex coordinates, colors, normals, texture coordinates, and edge flags) is associated with a vertex, and each vertex and its associated data are processed independently, in order, and in the same way.
And...
Rasterization produces a series of frame buffer addresses and associated values using a two-dimensional description of a point, line segment, or polygon. Each fragment so produced is fed into the last stage, per-fragment operations, which performs the final operations on the data before it's stored as pixels in the frame buffer.
For your example, before glVertex2f(0.15f, 0.51f) is on the screen, there are many transforms to be done. Making complex thing crudely simpler, after moving your vertex to view space (applying camera position and direction), the magic here is (1) projection matrix, and (2) viewport setting.
Internally, OpenGL "screen coordinates" are in a cube (-1, -1, -1) - (1, 1, 1), :
http://www.matrix44.net/cms/wp-content/uploads/2011/03/ogl_coord_object_space_cube.png
Projection matrix 'squeezes' the frustum in this cube (which you do in vertex shader), assuming you have perspective transform - if projection is orthogonal, the projection is just a tube, limited by near and far values (and like in both cases, scaling factors):
http://www.songho.ca/opengl/files/gl_projectionmatrix01.png
EDIT: Maybe better example here:
http://www.opengl-tutorial.org/beginners-tutorials/tutorial-3-matrices/#The_Projection_matrix
(EDIT: The Z-coordinate is used as depth value) When fragments are finally transferred to pixels on texture/framebuffer/screen, these are multiplied with viewport settings:
https://www3.ntu.edu.sg/home/ehchua/programming/opengl/images/GL_2DViewportAspectRatio.png
Hope this helps!

Confused about OpenGL transformations

In opengl there is one world coordinate system with origin (0,0,0).
What confuses me is what all the transformations like glTranslate, glRotate, etc. do? Do they move
objects in world coordinates, or do they move the camera? As you know, the same movement can be achieved by either moving objects or camera.
I am guessing that glTranslate, glRotate, change objects, and gluLookAt changes the camera?
In opengl there is one world coordinate system with origin (0,0,0).
Well, technically no.
What confuses me is what all the transformations like glTranslate, glRotate, etc. do? Do they move objects in world coordinates, or do they move the camera?
Neither. OpenGL doesn't know objects, OpenGL doesn't know a camera, OpenGL doesn't know a world. All that OpenGL cares about are primitives, points, lines or triangles, per vertex attributes, normalized device coordinates (NDC) and a viewport, to which the NDC are mapped to.
When you tell OpenGL to draw a primitive, each vertex is processed according to its attributes. The position is one of the attributes and usually a vector with 1 to 4 scalar elements within local "object" coordinate system. The task at hand is to somehow transform the local vertex position attribute into a position on the viewport. In modern OpenGL this happens within a small program, running on the GPU, called a vertex shader. The vertex shader may process the position in an arbitrary way. But the usual approach is by applying a number of nonsingular, linear transformations.
Such transformations can be expressed in terms of homogenous transformation matrices. For a 3 dimensional vector, the homogenous representation in a vector with 4 elements, where the 4th element is 1.
In computer graphics a 3-fold transformation pipeline has become sort of the standard way of doing things. First the object local coordinates are transformed into coordinates relative to the virtual "eye", hence into eye space. In OpenGL this transformation used to be called the modelview transformaion. With the vertex positions in eye space several calculations, like illumination can be expressed in a generalized way, hence those calculations happen in eye space. Next the eye space coordinates are tranformed into the so called clip space. This transformation maps some volume in eye space to a specific volume with certain boundaries, to which the geometry is clipped. Since this transformation effectively applies a projection, in OpenGL this used to be called the projection transformation.
After clip space the positions get "normalized" by their homogenous component, yielding normalized device coordinates, which are then plainly mapped to the viewport.
To recapitulate:
A vertex position is transformed from local to clip space by
vpos_eye = MV · vpos_local
eyespace_calculations(vpos_eye);
vpos_clip = P · vpos_eye
·: inner product column on row vector
Then to reach NDC
vpos_ndc = vpos_clip / vpos_clip.w
and finally to the viewport (NDC coordinates are in the range [-1, 1]
vpos_viewport = (vpos_ndc + (1,1,1,1)) * (viewport.width, viewport.height) / 2 + (viewport.x, viewport.y)
*: vector component wise multiplication
The OpenGL functions glRotate, glTranslate, glScale, glMatrixMode merely manipulate the transformation matrices. OpenGL used to have four transformation matrices:
modelview
projection
texture
color
On which of them the matrix manipulation functions act on can be set using glMatrixMode. Each of the matrix manipulating functions composes a new matrix by multiplying the transformation matrix they describe on top of the select matrix thereby replacing it. The functions glLoadIdentity replace the current matrix with identity, glLoadMatrix replaces it with a user defined matrix, and glMultMatrix multiplies a user defined matrix on top of it.
So how does the modelview matrix then emulate both object placement and a camera. Well, as you already stated
As you know, the same movement can be achieved by either moving objects or camera.
You can not really discern between them. The usual approach is by splitting the object local to eye transformation into two steps:
Object to world – OpenGL calls this the "model transform"
World to eye – OpenGL calls this the "view transform"
Together they form the model-view, in fixed function OpenGL described by the modelview matrix. Now since the order of transformations is
local to world, Model matrix vpos_world = M · vpos_local
world to eye, View matrix vpos_eye = V · vpos_world
we can substitute by
vpos_eye = V · ( M · vpos_local ) = V · M · vpos_local
replacing V · M by the ModelView matrix =: MV
vpos_eye = MV · vpos_local
Thus you can see that what's V and what's M of the compund matrix M is only determined by the order of operations in which you multiply onto the modelview matrix, and at which step you decide to "call it the model transform from here on".
I.e. right after a
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
the view is defined. But at some point you'll start applying model transformations and everything after is model.
Note that in modern OpenGL all the matrix manipulation functions have been removed. OpenGL's matrix stack never was feature complete and no serious application did actually use it. Most programs just glLoadMatrix-ed their self calculated matrices and didn't bother with the OpenGL built-in matrix maniupulation routines.
And ever since shaders were introduced, the whole OpenGL matrix stack got awkward to use, to say it nicely.
The verdict: If you plan on using OpenGL the modern way, don't bother with the built-in functions. But keep in mind what I wrote, because what your shaders do will be very similar to what OpenGL's fixed function pipeline did.
OpenGL is a low-level API, there is no higher-level concepts like an "object" and a "camera" in the "scene", so there are only two matrix modes: MODELVIEW (a multiplication of "camera" matrix by the "object" transformation) and PROJECTION (the projective transformation from world-space to post-perspective space).
Distinction between "Model" and "View" (object and camera) matrices is up to you. glRotate/glTranslate functions just multiply the currently selected matrix by the given one (without even distinguishing between ModelView and Projection).
Those functions multiply (transform) the current matrix set by glMatrixMode() so it depends on the matrix you're working on. OpenGL has 4 different types of matrices; GL_MODELVIEW, GL_PROJECTION, GL_TEXTURE, and GL_COLOR, any one of those functions can change any of those matrices. So, basically, you don't transform objects you just manipulate different matrices to "fake" that effect.
Note that glulookat() is just a convenient function equivalent to a translation followed by some rotations, there's nothing special about it.
All transformations are transformations on objects. Even gluLookAt is just a transformation to transform the objects as if the camera was where you tell it to be. Technically they are transformations on the vertices, but that's just semantics.
That's true, glTranslate, glRotate change the object coordinates before rendering and gluLookAt changes the camera coordinate.

Why would it be beneficial to have a separate projection matrix, yet combine model and view matrix?

When you are learning 3D programming, you are taught that it's easiest think in terms of 3 transformation matrices:
The Model Matrix. This matrix is individual to every single model and it rotates and scales the object as desired and finally moves it to its final position within your 3D world. "The Model Matrix transforms model coordinates to world coordinates".
The View Matrix. This matrix is usually the same for a large number of objects (if not for all of them) and it rotates and moves all objects according to the current "camera position". If you imaging that the 3D scene is filmed by a camera and what is rendered on the screen are the images that were captured by this camera, the location of the camera and its viewing direction define which parts of the scene are visible and how the objects appear on the captured image. There are little reasons for changing the view matrix while rendering a single frame, but those do in fact exists (e.g. by rendering the scene twice and changing the view matrix in between, you can create a very simple, yet impressive mirror within your scene). Usually the view matrix changes only once between two frames being drawn. "The View Matrix transforms world coordinates to eye coordinates".
The Projection Matrix. The projection matrix decides how those 3D coordinates are mapped to 2D coordinates, e.g. if there is a perspective applied to them (objects get smaller the farther they are away from the viewer) or not (orthogonal projection). The projection matrix hardly ever changes at all. It may have to change if you are rendering into a window and the window size has changed or if you are rendering full screen and the resolution has changed, however only if the new window size/screen resolution has a different display aspect ratio than before. There are some crazy effects for that you may want to change this matrix but in most cases its pretty much constant for the whole live of your program. "The Projection Matrix transforms eye coordinates to screen coordinates".
This makes all a lot of sense to me. Of course one could always combine all three matrices into a single one, since multiplying a vector first by matrix A and then by matrix B is the same as multiplying the vector by matrix C, where C = B * A.
Now if you look at the classical OpenGL (OpenGL 1.x/2.x), OpenGL knows a projection matrix. Yet OpenGL does not offer a model or a view matrix, it only offers a combined model-view matrix. Why? This design forces you to permanently save and restore the "view matrix" since it will get "destroyed" by model transformations applied to it. Why aren't there three separate matrices?
If you look at the new OpenGL versions (OpenGL 3.x/4.x) and you don't use the classical render pipeline but customize everything with shaders (GLSL), there are no matrices available any longer at all, you have to define your own matrices. Still most people keep the old concept of a projection matrix and a model-view matrix. Why would you do that? Why not using either three matrices, which means you don't have to permanently save and restore the model-view matrix or you use a single combined model-view-projection (MVP) matrix, which saves you a matrix multiplication in your vertex shader for ever single vertex rendered (after all such a multiplication doesn't come for free either).
So to summarize my question: Which advantage has a combined model-view matrix together with a separate projection matrix over having three separate matrices or a single MVP matrix?
Look at it practically. First, the fewer matrices you send, the fewer matrices you have to multiply with positions/normals/etc. And therefore, the faster your vertex shaders.
So point 1: fewer matrices is better.
However, there are certain things you probably need to do. Unless you're doing 2D rendering or some simple 3D demo-applications, you are going to need to do lighting. This typically means that you're going to need to transform positions and normals into either world or camera (view) space, then do some lighting operations on them (either in the vertex shader or the fragment shader).
You can't do that if you only go from model space to projection space. You cannot do lighting in post-projection space, because that space is non-linear. The math becomes much more complicated.
So, point 2: You need at least one stop between model and projection.
So we need at least 2 matrices. Why model-to-camera rather than model-to-world? Because working in world space in shaders is a bad idea. You can encounter numerical precision problems related to translations that are distant from the origin. Whereas, if you worked in camera space, you wouldn't encounter those problems, because nothing is too far from the camera (and if it is, it should probably be outside the far depth plane).
Therefore: we use camera space as the intermediate space for lighting.
In most cases your shader will need the geometry in world or eye coordinates for shading so you have to seperate the projection matrix from the model and view matrices.
Making your shader multiply the geometry with two matrices hurts performance. Assuming each model have thousends (or more) vertices it is more efficient to compute a model view matrix in the cpu once, and let the shader do one less mtrix-vector multiplication.
I have just solved a z-buffer fighting problem by separating the projection matrix. There is no visible increase of the GPU load. The two folowing screenshots shows the two results - pay attention to the green and white layers fighting.

OpenGL define vertex position in pixels

I've been writing a 2D basic game engine in OpenGL/C++ and learning everything as I go along. I'm still rather confused about defining vertices and their "position". That is, I'm still trying to understand the vertex-to-pixels conversion mechanism of OpenGL. Can it be explained briefly or can someone point to an article or something that'll explain this. Thanks!
This is rather basic knowledge that your favourite OpenGL learning resource should teach you as one of the first things. But anyway the standard OpenGL pipeline is as follows:
The vertex position is transformed from object-space (local to some object) into world-space (in respect to some global coordinate system). This transformation specifies where your object (to which the vertices belong) is located in the world
Now the world-space position is transformed into camera/view-space. This transformation is determined by the position and orientation of the virtual camera by which you see the scene. In OpenGL these two transformations are actually combined into one, the modelview matrix, which directly transforms your vertices from object-space to view-space.
Next the projection transformation is applied. Whereas the modelview transformation should consist only of affine transformations (rotation, translation, scaling), the projection transformation can be a perspective one, which basically distorts the objects to realize a real perspective view (with farther away objects being smaller). But in your case of a 2D view it will probably be an orthographic projection, that does nothing more than a translation and scaling. This transformation is represented in OpenGL by the projection matrix.
After these 3 (or 2) transformations (and then following perspective division by the w component, which actually realizes the perspective distortion, if any) what you have are normalized device coordinates. This means after these transformations the coordinates of the visible objects should be in the range [-1,1]. Everything outside this range is clipped away.
In a final step the viewport transformation is applied and the coordinates are transformed from the [-1,1] range into the [0,w]x[0,h]x[0,1] cube (assuming a glViewport(0, w, 0, h) call), which are the vertex' final positions in the framebuffer and therefore its pixel coordinates.
When using a vertex shader, steps 1 to 3 are actually done in the shader and can therefore be done in any way you like, but usually one conforms to this standard modelview -> projection pipeline, too.
The main thing to keep in mind is, that after the modelview and projection transforms every vertex with coordinates outside the [-1,1] range will be clipped away. So the [-1,1]-box determines your visible scene after these two transformations.
So from your question I assume you want to use a 2D coordinate system with units of pixels for your vertex coordinates and transformations? In this case this is best done by using glOrtho(0.0, w, 0.0, h, -1.0, 1.0) with w and h being the dimensions of your viewport. This basically counters the viewport transformation and therefore transforms your vertices from the [0,w]x[0,h]x[-1,1]-box into the [-1,1]-box, which the viewport transformation then transforms back to the [0,w]x[0,h]x[0,1]-box.
These have been quite general explanations without mentioning that the actual transformations are done by matrix-vector-multiplications and without talking about homogenous coordinates, but they should have explained the essentials. This documentation of gluProject might also give you some insight, as it actually models the transformation pipeline for a single vertex. But in this documentation they actually forgot to mention the division by the w component (v" = v' / v'(3)) after the v' = P x M x v step.
EDIT: Don't forget to look at the first link in epatel's answer, which explains the transformation pipeline a bit more practical and detailed.
It is called transformation.
Vertices are set in 3D coordinates which is transformed into a viewport coordinates (into your window view). This transformation can be set in various ways. Orthogonal transformation can be easiest to understand as a starter.
http://www.songho.ca/opengl/gl_transform.html
http://www.opengl.org/wiki/Vertex_Transformation
http://www.falloutsoftware.com/tutorials/gl/gl5.htm
Firstly be aware that OpenGL not uses standard pixel coordinates. I mean by that for particular resolution, ie. 800x600 you dont have horizontal coordinates in range 0-799 or 1-800 stepped by one. You rather have coordinates ranged from -1 to 1 later send to graphic card rasterizing unit and after that matched to particular resolution.
I ommited one step here - before all that you have an ModelViewProjection matrix (or viewProjection matrix in some simple cases) which before all that will cast coordinates you use to an projection plane. Default use of that is to implement a camera which converts 3D space of world (View for placing an camera into right position and Projection for casting 3d coordinates into screen plane. In ModelViewProjection it's also step of placing a model into right place in world).
Another case (and you can use Projection matrix this way to achieve what you want) is to use these matrixes to convert one range of resolutions to another.
And there's a trick you will need. You should read about modelViewProjection matrix and camera in openGL if you want to go serious. But for now I will tell you that with proper matrix you can just cast your own coordinate system (and ie. use ranges 0-799 horizontaly and 0-599 verticaly) to standarized -1:1 range. That way you will not see that underlying openGL api uses his own -1 to 1 system.
The easiest way to achieve this is glOrtho function. Here's the link to documentation:
http://www.opengl.org/sdk/docs/man/xhtml/glOrtho.xml
This is example of proper usage:
glMatrixMode (GL_PROJECTION)
glLoadIdentity ();
glOrtho (0, 800, 600, 0, 0, 1)
glMatrixMode (GL_MODELVIEW)
Now you can use own modelView matrix ie. for translation (moving) objects but don't touch your projection example. This code should be executed before any drawing commands. (Can be after initializing opengl in fact if you wont use 3d graphics).
And here's working example: http://nehe.gamedev.net/tutorial/2d_texture_font/18002/
Just draw your figures instead of drawing text. And there is another thing - glPushMatrix and glPopMatrix for choosen matrix (in this example projection matrix) - you wont use that until you combining 3d with 2d rendering.
And you can still use model matrix (ie. for placing tiles somewhere in world) and view matrix (in example for zooming view, or scrolling through world - in this case your world can be larger than resolution and you could crop view by simple translations)
After looking at my answer I see it's a little chaotic but If you confused - just read about Model, View, and Projection matixes and try example with glOrtho. If you're still confused feel free to ask.
MSDN has a great explanation. It may be in terms of DirectX but OpenGL is more-or-less the same.
Google for "opengl rendering pipeline". The first five articles all provide good expositions.
The key transition from vertices to pixels (actually, fragments, but you won't be too far off if you think "pixels") is in the rasterization stage, which occurs after all vertices have been transformed from world-coordinates to screen coordinates and clipped.