I am implementing a custom mathematics library to create model, view and projection matrices in OpenGL and DirectX. I am doing this, to improve my understanding in the mathematics behind 3D applications.
At first I took a look at right-handed and left-handed coordinate systems. I realized, that if I create my view matrix as well as my projection matrix with the same handedness, the scene renders correctly no matter if I use OpenGL or DirectX (except inverted translation/rotation/etc.). This is a bit confusing because every article I read pointed out, that OpenGL uses a right-handed and DirectX a left-handed coordinate system. Is my assumption right, that the coordinate system completely depends on the handedness of the view and projection matrices?
When I took a look at clip spaces, things became more difficult to understand. The articles stated, that the clip volume on the X and Y axis ranges from -1.0 to 1.0 in both APIs. On the Z axis OpenGL clips from -1.0 to 1.0 and DirectX clips from 0.0 to 1.0. However, in OpenGL the scene renders correctly no matter if I use a projection matrix for clipping between -1.0 and 1.0 or 0.0 and 1.0. In DirectX only the projection matrix for clipping between 0.0 and 1.0 works. If I use the other one, nothing is rendered.
So my question to this topic is: Can the clipping volume be changed on the Z axis or are those fixed implementations. If they are fixed, then why can I use both types of clipping volumes in OpenGL and only a single one in DirectX?
However, in OpenGL the scene renders correctly no matter if I use a projection matrix for clipping between -1.0 and 1.0 or 0.0 and 1.0. In DirectX only the projection matrix for clipping between 0.0 and 1.0 works
OpenGL renders because [0.0, 1.0] falls within the range [-1.0, 1.0], but DirectX does not because [-1.0, 1.0] can fall outside of [0.0, 1.0]. If you invert the depth of your test scene, you'll likely be able to see stuff in DirectX using the wrong projection matrix.
Can the clipping volume be changed on the Z axis or are those fixed implementations. If they are fixed, then why can I use both types of clipping volumes in OpenGL and only a single one in DirectX?
No, you can't change that behavior as it's part of the spec. The point of all the transformations you do as part of the rendering pipeline is to get your scene into clipping space. That means, for example, mapping your scene's depth from [-9999, 9999] down to [-1.0, 1.0] for OpenGL, or [0.0, 1.0] for DirectX
Is my assumption right, that the coordinate system completely depends on the handedness of the view and projection matrices?
Not sure what you're asking here. You already observed that a scene in OpenGL looks inverted in DirectX if you don't account for the difference in handedness. The view and projection matrix that your application uses doesn't change the behavior of the OpenGL and DirectX implementations.
Related
What is difference between gl clipdistance and clipplane?
Which will be effective for clipping? Is clip plane is contrained to normalised device coordinates ie. From -0.1 to 1.0.
Assuming you are refering to the OpenGL method glClipPlane and the glsl variable gl_ClipDistance: Those two are not directly related.
glClipPlane controls clipping planes in the fixed-function pipeline and is deprecated.
gl_ClipDistance is the modern version and is set from inside the shader. It has to contain the distance of the current vertex to a clip-plane. OpenGL does in this case not know anything about the clipplane itself, since the only relevant value are the distances to this planes.
The values of the plane (in both cases) are technically not constrained to any range, but in practice only planes intersecting the [-1, 1] cube will have any effect, since clipping against the unit cube still happens.
I am having profound issues regarding understanding the transformations involved in VTK. OpenGL has fairly good documentation and I was of the impression that VTK is verym similar to OpenGL (it is, in many ways). But when it comes to transformations, it seems to be an entirely different story.
This is a good OpenGL documentation about transforms involved:
http://www.songho.ca/opengl/gl_transform.html
The perspective projection matrix in OpenGL is:
I wanted to see if this formula applied in VTK will give me the projection matrix of VTK (by cross-checking with VTK projection matrix).
Relevant Camera and Renderer Parameters:
camera->SetPosition(0,0,20);
camera->SetFocalPoint(0,0,0);
double crSet[2] = {10, 1000};
renderer->GetActiveCamera()->SetClippingRange(crSet);
double windowSize[2];
renderWindow->SetSize(1280,720);
renderWindowInteractor->GetSize(windowSize);
proj = renderer->GetActiveCamera()->GetProjectionTransformMatrix(windowSize[0]/windowSize[1], crSet[0], crSet[1]);
The projection transform matrix I got for this configuration is:
The (3,3) and (3,4) values of the projection matrix (lets say it is indexed 1 to 4 for rows and columns) should be - (f+n)/(f-n) and -2*f*n/(f-n) respectively. In my VTK camera settings, the nearz is 10 and farz is 1000 and hence I should get -1.020 and -20.20 respectively in the (3,3) and (3,4) locations of the matrix. But it is -1010 and -10000.
I have changed my clipping range values to see the changes and the (3,3) position is always nearz+farz which makes no sense to me. Also, it would be great if someone can explain why it is 3.7320 in the (1,1) and (2,2) positions. And this value DOES NOT change when I change the window size of the renderer window. Quite perplexing to me.
I see in VTKCamera class reference that GetProjectionTransformMatrix() returns the transformation matrix that maps from camera coordinates to viewport coordinates.
VTK Camera Class Reference
This is a nice depiction of the transforms involved in OpenGL rendering:
OpenGL Projection Matrix is the matrix that maps from eye coordinates to clip coordinates. It is beyond doubt that eye coordinates in OpenGL is the same as camera coordinates in VTK. But is the clip coordinates in OpenGL same as viewport coordinates of VTK?
My aim is to simulate a real webcam camera (already calibrated) in VTK to render a 3D model.
Well, the documentation you linked to actually explains this (emphasis mine):
vtkCamera::GetProjectionTransformMatrix:
Return the projection transform matrix, which converts from camera
coordinates to viewport coordinates. This method computes the aspect,
nearz and farz, then calls the more specific signature of
GetCompositeProjectionTransformMatrix
with:
vtkCamera::GetCompositeProjectionTransformMatrix:
Return the concatenation of the ViewTransform and the
ProjectionTransform. This transform will convert world coordinates to
viewport coordinates. The 'aspect' is the width/height for the
viewport, and the nearz and farz are the Z-buffer values that map to
the near and far clipping planes. The viewport coordinates of a point located inside the frustum are in the range
([-1,+1],[-1,+1], [nearz,farz]).
Note that this neither matches OpenGL's window space nor normalized device space. If find the term "viewport coordinates" for this aa poor choice, but be it as it may. What bugs me more with this is that the matrix actually does not transform to that "viewport space", but to some clip-space equivalent. Only after the perspective divide, the coordinates will be in the range as given for the above definition of the "viewport space".
But is the clip coordinates in OpenGL same as viewport coordinates of
VTK?
So that answer is a clear no. But it is close. Basically, that projection matrix is just a scaled and shiftet along the z dimension, and it is easy to convert between those two. Basically, you can simply take znear and zfar out of VTK's matrix, and put it into that OpenGL projection matrix formula you linked above, replacing just those two matrix elements.
I've been writing a 2D basic game engine in OpenGL/C++ and learning everything as I go along. I'm still rather confused about defining vertices and their "position". That is, I'm still trying to understand the vertex-to-pixels conversion mechanism of OpenGL. Can it be explained briefly or can someone point to an article or something that'll explain this. Thanks!
This is rather basic knowledge that your favourite OpenGL learning resource should teach you as one of the first things. But anyway the standard OpenGL pipeline is as follows:
The vertex position is transformed from object-space (local to some object) into world-space (in respect to some global coordinate system). This transformation specifies where your object (to which the vertices belong) is located in the world
Now the world-space position is transformed into camera/view-space. This transformation is determined by the position and orientation of the virtual camera by which you see the scene. In OpenGL these two transformations are actually combined into one, the modelview matrix, which directly transforms your vertices from object-space to view-space.
Next the projection transformation is applied. Whereas the modelview transformation should consist only of affine transformations (rotation, translation, scaling), the projection transformation can be a perspective one, which basically distorts the objects to realize a real perspective view (with farther away objects being smaller). But in your case of a 2D view it will probably be an orthographic projection, that does nothing more than a translation and scaling. This transformation is represented in OpenGL by the projection matrix.
After these 3 (or 2) transformations (and then following perspective division by the w component, which actually realizes the perspective distortion, if any) what you have are normalized device coordinates. This means after these transformations the coordinates of the visible objects should be in the range [-1,1]. Everything outside this range is clipped away.
In a final step the viewport transformation is applied and the coordinates are transformed from the [-1,1] range into the [0,w]x[0,h]x[0,1] cube (assuming a glViewport(0, w, 0, h) call), which are the vertex' final positions in the framebuffer and therefore its pixel coordinates.
When using a vertex shader, steps 1 to 3 are actually done in the shader and can therefore be done in any way you like, but usually one conforms to this standard modelview -> projection pipeline, too.
The main thing to keep in mind is, that after the modelview and projection transforms every vertex with coordinates outside the [-1,1] range will be clipped away. So the [-1,1]-box determines your visible scene after these two transformations.
So from your question I assume you want to use a 2D coordinate system with units of pixels for your vertex coordinates and transformations? In this case this is best done by using glOrtho(0.0, w, 0.0, h, -1.0, 1.0) with w and h being the dimensions of your viewport. This basically counters the viewport transformation and therefore transforms your vertices from the [0,w]x[0,h]x[-1,1]-box into the [-1,1]-box, which the viewport transformation then transforms back to the [0,w]x[0,h]x[0,1]-box.
These have been quite general explanations without mentioning that the actual transformations are done by matrix-vector-multiplications and without talking about homogenous coordinates, but they should have explained the essentials. This documentation of gluProject might also give you some insight, as it actually models the transformation pipeline for a single vertex. But in this documentation they actually forgot to mention the division by the w component (v" = v' / v'(3)) after the v' = P x M x v step.
EDIT: Don't forget to look at the first link in epatel's answer, which explains the transformation pipeline a bit more practical and detailed.
It is called transformation.
Vertices are set in 3D coordinates which is transformed into a viewport coordinates (into your window view). This transformation can be set in various ways. Orthogonal transformation can be easiest to understand as a starter.
http://www.songho.ca/opengl/gl_transform.html
http://www.opengl.org/wiki/Vertex_Transformation
http://www.falloutsoftware.com/tutorials/gl/gl5.htm
Firstly be aware that OpenGL not uses standard pixel coordinates. I mean by that for particular resolution, ie. 800x600 you dont have horizontal coordinates in range 0-799 or 1-800 stepped by one. You rather have coordinates ranged from -1 to 1 later send to graphic card rasterizing unit and after that matched to particular resolution.
I ommited one step here - before all that you have an ModelViewProjection matrix (or viewProjection matrix in some simple cases) which before all that will cast coordinates you use to an projection plane. Default use of that is to implement a camera which converts 3D space of world (View for placing an camera into right position and Projection for casting 3d coordinates into screen plane. In ModelViewProjection it's also step of placing a model into right place in world).
Another case (and you can use Projection matrix this way to achieve what you want) is to use these matrixes to convert one range of resolutions to another.
And there's a trick you will need. You should read about modelViewProjection matrix and camera in openGL if you want to go serious. But for now I will tell you that with proper matrix you can just cast your own coordinate system (and ie. use ranges 0-799 horizontaly and 0-599 verticaly) to standarized -1:1 range. That way you will not see that underlying openGL api uses his own -1 to 1 system.
The easiest way to achieve this is glOrtho function. Here's the link to documentation:
http://www.opengl.org/sdk/docs/man/xhtml/glOrtho.xml
This is example of proper usage:
glMatrixMode (GL_PROJECTION)
glLoadIdentity ();
glOrtho (0, 800, 600, 0, 0, 1)
glMatrixMode (GL_MODELVIEW)
Now you can use own modelView matrix ie. for translation (moving) objects but don't touch your projection example. This code should be executed before any drawing commands. (Can be after initializing opengl in fact if you wont use 3d graphics).
And here's working example: http://nehe.gamedev.net/tutorial/2d_texture_font/18002/
Just draw your figures instead of drawing text. And there is another thing - glPushMatrix and glPopMatrix for choosen matrix (in this example projection matrix) - you wont use that until you combining 3d with 2d rendering.
And you can still use model matrix (ie. for placing tiles somewhere in world) and view matrix (in example for zooming view, or scrolling through world - in this case your world can be larger than resolution and you could crop view by simple translations)
After looking at my answer I see it's a little chaotic but If you confused - just read about Model, View, and Projection matixes and try example with glOrtho. If you're still confused feel free to ask.
MSDN has a great explanation. It may be in terms of DirectX but OpenGL is more-or-less the same.
Google for "opengl rendering pipeline". The first five articles all provide good expositions.
The key transition from vertices to pixels (actually, fragments, but you won't be too far off if you think "pixels") is in the rasterization stage, which occurs after all vertices have been transformed from world-coordinates to screen coordinates and clipped.
I have an application which uses DirectX, and hence a left-handed world-coordinate systen, a view which looks down positive z and a projection which projects into the unit cube but with z from 0..1.
I'm porting this application to OpenGL, and OpenGL has a different coordinate system etc. In particular, it's right-handed, the view looks down along negative z, and the projection is into the real unit cube.
What is the easiest way to adapt the DirectX matrices to OpenGL? The world-space change seems to be easy, just by flipping the x-axis, but I'm not entirely sure how to change the view/projection. I can modify all of the matrices individually before multiplying them together and sending them off to my shader. If possible, I would like to stick with the DirectX based coordinate system as far down the pipeline as possible (i.e. I don't want to change where I place my objects, how I compute my view frustum, etc. -- ideally, I'll only apply some modification to the matrices right before handing them over to my shaders.)
The answer is: There's no need to flip anything, only to tweak the depth range. OpenGL has -1..1, and DX has 0..1. You can use the DX matrices just as before, and only multiply a scale/bias matrix at the end on top of the projection matrix to scale the depth values from 0..1 to -1..1.
Not tested or anything, just out of the top of my head :
oglProj = transpose(dxProj)
glScalef (1., 1., -1.); // invert z
glScalef(0.5, 0.5, 1); // from [-1,1] to [-0.5, 0.5]
glTranslatef(0.5, 0.5, 0) // from [-0.5, 0.5] to [0,1]
oglView = transpose(dxView)
glScalef (1., 1., -1.); // invert z
oglModel = transpose(dwModel)
glScalef (1., 1., -1.); // invert z
oglModelView = oglView * oglModel
There is an age-old extension for OpenGL exactly for that: GL_ARB_TRANSPOSE_MATRIX. It transposes matrices on GPU and should be available on every video card.
I'm planning to create a tiled world with OpenGL, with slightly rotated tiles and houses and building in the world will be made of models.
Can anybody suggest me what projection(Orthogonal, Perspective) should I use, and how to setup the View matrix(using OpenGL)?
If you can't figure what style of world I'm planning to create, look at this game:
http://www.youtube.com/watch?v=i6eYtLjFu-Y&feature=PlayList&p=00E63EDCF757EADF&index=2
Using Orhtogonal vs Perspective projection is entirely an art style choice. The Pokemon serious you're talking about is orthogonal -- in fact, it's entirely layered 2D sprites (no 3D involved).
OpenGL has no VIEW matrix. It has a MODELVIEW matrix and a PROJECTION matrix. For Pokemon-style levels, I suggest using simple glOrtho for the projection.
Let's assume your world is in XY space (coordinates for tiles, cameras, and other objects are of the form [x, y, 0]). If a single tile is sized 1,1, then something like glOrtho(12, 9, -10, 10) would be a good projection matrix (12 wide, 9 tall, and Z=0 is the ground plane).
For MODELVIEW, you can start by loading identity, glTranslate() by the tile position, and then glTranslate() by the negative of the camera position, before you draw your geometry. If you want to be able to rotate the camera, you glRotate() by the negative (inverse) of the camera rotation between the two Translate()s. In the end, you end up with the following matrix chain:
output = Projection × (CameraTranslation-1 × CameraRotation-1 × ModelLocation × ModelRotation) × input
The parts in parens are MODELVIEW, and the "-1" means "inverse" which really is negative for translation and transpose for rotation.
If you want to rotate your models, too, you generally do that first of all (before the first glTranslate().
Finally, I suggest the OpenGL forums (www.opengl.org) or the OpenGL subforums of www.gamedev.net might be a better place to ask this question :-)
The projection used by that video game looks Oblique to me. There are many different projections, not just perspective and orthographic. See here for a list of the most common ones: http://en.wikipedia.org/wiki/File:Graphical_projection_comparison.png
You definitely want perspective, with a fixed rotation around the X-axis only. Around 45-60 degrees or thereof. If you don't care about setting up the projection code yourself, the gluPerspective function from the GLU library is handy.
Assuming OpenGL 2.1:
glMatrixMode(GL_PROJECTION); //clip matrix
glLoadIdentity();
gluPerspective(90.0, width/height, 1.0, 20.0);
glMatrixMode(GL_MODELVIEW); //world/object matrix
glLoadIdentity();
glRotatef(45.0f, 1.0f, 0.0f, 0.0f);
/* render */
The last two parameters to gluPerspective is the distance to the near and far clipping planes. Their values depend on the scale you use for the environment.