I am trying to understand OpenGL concepts. While reading this tutorial,
I came across this statement:
This is because camera space and NDC space have different viewing directions. In camera space, the camera looks down the -Z axis; more negative Z values are farther away. In NDC space, the camera looks down the +Z axis; more positive Z values are farther away. The diagram flips the axis so that the viewing direction can remain the same between the two images (up is away).
I am confused as to why the viewing direction has to change. Could someone please help me understand this with an example?

This is mostly just a convention. OpenGL clip space (and NDC space and screen space) has always been defined as left-handed (with z pointing away into the screen) by the spec.
OpenGL eye space was defined with the camera at the origin, looking in the -z direction (so right-handed). However, this convention was only meaningful in the fixed-function pipeline: fixed-function per-vertex lighting was carried out in eye space, so the viewing direction did matter in cases such as when GL_LIGHT_MODEL_LOCAL_VIEWER was disabled (as it was by default).
The classic GL projection matrix typically flips the handedness, and the perspective division typically uses a divisor of -z_eye, so the last row of the projection matrix is typically (0, 0, -1, 0). The old glFrustum(), glOrtho(), and gluPerspective() supported that convention by negating the z_near and z_far clipping distances, so you had to specify positive values for clip planes that lie in front of the camera at z<0.
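As a rough illustration of the handedness flip, here is a minimal sketch that builds the matrix documented for glFrustum by hand and checks where the near and far planes end up in NDC. The helper names (frustum, mul, ndcDepth) are made up for this example and are not GL calls.

```cpp
#include <array>
#include <cassert>

// Column-vector convention, indexed as m[row][col].
using Mat4 = std::array<std::array<double, 4>, 4>;
using Vec4 = std::array<double, 4>;

// Builds the matrix documented for glFrustum. nearDist and farDist are the
// positive distances the legacy API expects, although the planes themselves
// sit at z_eye = -nearDist and z_eye = -farDist.
Mat4 frustum(double left, double right, double bottom, double top,
             double nearDist, double farDist) {
    assert(nearDist > 0.0 && farDist > nearDist);
    assert(right != left && top != bottom);
    Mat4 m{};  // all zeros
    m[0][0] = 2.0 * nearDist / (right - left);
    m[0][2] = (right + left) / (right - left);
    m[1][1] = 2.0 * nearDist / (top - bottom);
    m[1][2] = (top + bottom) / (top - bottom);
    m[2][2] = -(farDist + nearDist) / (farDist - nearDist);
    m[2][3] = -2.0 * farDist * nearDist / (farDist - nearDist);
    m[3][2] = -1.0;  // last row (0, 0, -1, 0), so w_clip = -z_eye
    return m;
}

// Matrix times column vector.
Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r] += m[r][c] * v[c];
    return out;
}

// NDC depth of a point on the view axis at the given eye-space z.
double ndcDepth(const Mat4& proj, double zEye) {
    const Vec4 clip = mul(proj, Vec4{0.0, 0.0, zEye, 1.0});
    assert(clip[3] != 0.0);
    return clip[2] / clip[3];  // perspective division by w_clip = -z_eye
}
```

With frustum(-1, 1, -1, 1, 1, 10), a point at z_eye = -1 lands at NDC z = -1 and a point at z_eye = -10 lands at NDC z = +1: more negative eye-space z becomes more positive NDC z, which is the flip the quoted tutorial describes.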
However, with modern GL, this convention is more or less meaningless. There is no fixed-function unit left that works in eye space, so eye space (and anything before it) is totally under the user's control. You can use anything you like here. Clip space and all the later spaces are still used by fixed-function units (clipping, rasterization, ...), so there must be some convention to define the interface, and it is still a left-handed system.
Even in modern GL, the old right-handed eye space convention is still in use. The popular glm library for example reimplements the old GL matrix functions the same way.
There is really no reason to prefer one of the possible conventions over the other, but at some point, you have to choose and stick to one.


I need to implement raycasting. For this I need to convert the mouse cursor to world space.
I use the unproject function for this. I need to first find a point on the near plane, then one on the far plane. After that, subtracting the first from the second gives the direction of the ray.
But I don't understand how to set winZ correctly. Because in some implementations I see two ways: winZ = 0 (near plane), winZ = 1 (far plane) or winZ = -1 (near plane), winZ = 1 (far plane).
What's the difference in these ranges?
If it really is window space z, then there is only the [0,1] range. Note that 0 doesn't necessarily mean the near plane and 1 the far plane, though. One can set up a projection matrix where near ends up at 1 and far at 0 (reversed z, which, in combination with the [0,1] z clip condition as explained below, has some precision advantages).
Also note that glDepthRange can further modify which values (inside [0,1]) these two planes are mapped to.
To understand the unproject operation, you first need to understand the different coordinate spaces. Typically in a render API, you have to deal with these 3 spaces at the end of the transform chain:
clip space: This is what the output of the vertex shader is in, and where the actual clipping happens, at least on a conceptual level. This space is still homogeneous, with an arbitrary value for the w coordinate.
normalized device coordinates (NDC): This is what you get after the perspective division by the clip space w coordinate (after the clipping has been applied, which eliminates the w<=0 cases completely).
window space: The 2D xy part is the actual pixel coordinates inside your output window (or your render target), and the transformation from NDC xy to window space xy is defined by the viewport settings. The z coordinate is the value that goes into the depth test and the depth buffer; it is in the range [0,1] or some sub-range of that (controlled via glDepthRange in GL).
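The last step of that chain, NDC to window space, can be sketched as follows for the default GL convention (NDC z in [-1,1]). The struct and function names are hypothetical; only the arithmetic mirrors what glViewport and glDepthRange configure.

```cpp
#include <cassert>

struct Vec3 { double x, y, z; };
struct Viewport { double x, y, width, height; };  // values given to glViewport
struct DepthRange { double nearVal, farVal; };    // values given to glDepthRange

// Maps NDC to window space, assuming the default GL clip volume (z in [-1,1]).
Vec3 ndcToWindow(const Vec3& ndc, const Viewport& vp, const DepthRange& dr) {
    assert(vp.width > 0.0 && vp.height > 0.0);
    return {
        vp.x + (ndc.x * 0.5 + 0.5) * vp.width,
        vp.y + (ndc.y * 0.5 + 0.5) * vp.height,
        // [-1,1] is squeezed into [nearVal, farVal]; the default range is [0,1].
        dr.nearVal + (ndc.z * 0.5 + 0.5) * (dr.farVal - dr.nearVal)
    };
}
```

With an 800x600 viewport at the origin and the default depth range, NDC (-1,-1,-1) maps to window (0, 0, 0) and NDC (1,1,1) maps to (800, 600, 1). A function that accepts -1 for its "near" z value is therefore most likely taking NDC z rather than window z.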
Other spaces before these in this list, like model/object space, world space, eye/view space, are completely up to you and do not concern the GPU directly at all (legacy fixed-function GL did care about eye space for lighting and fog, but nowadays, implementing this is all your job, and you can use whatever spaces you see fit).
Having established the spaces, the next relevant thing here is the viewing volume. This is the volume of space which will actually be mapped to your viewport, and it is a 3D volume bounded by six planes: left, right, bottom, top, near, and far.
However, the actual view volume is set up by pure convention in the various render APIs (and the actual GPU hardware).
Since you tagged this question with "OpenGL", I'm going to begin with the default GL convention first:
The standard GL convention is that the view volume is the completely symmetrical [-1,1] cube in NDC. This means that the clip condition in clip space is -w <= x,y,z <= w.
Direct3D uses a different convention: it uses [-1,1] in NDC for x and y just like GL does, but [0,1] for the z dimension. This also means that the depth range transformation into window space can be the identity in many cases (you seldom need to limit it to a sub-range of [0,1]). This convention has some numerical advantages, because the GL convention of moving the [-1,1] range to [0,1] for window space makes it lose precision around the (NDC) zero point.
Modern GL since GL 4.5 optionally allows you to switch to the [0,1] convention for z via glClipControl. Vulkan also supports both conventions, but uses [0,1] as the default.
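To make the two conventions concrete, here is a small sketch of the clip test and the perspective division that follows it. DepthMode, insideClipVolume and perspectiveDivide are illustrative names, not part of any API.

```cpp
#include <cassert>

struct Vec4 { double x, y, z, w; };
struct Vec3 { double x, y, z; };

// NegativeOneToOne: default GL. ZeroToOne: Direct3D, Vulkan default, or GL
// after glClipControl has selected the [0,1] depth convention.
enum class DepthMode { NegativeOneToOne, ZeroToOne };

// True if a clip-space point lies inside the view volume.
bool insideClipVolume(const Vec4& c, DepthMode mode) {
    const bool xyInside = -c.w <= c.x && c.x <= c.w &&
                          -c.w <= c.y && c.y <= c.w;
    const double zMin = (mode == DepthMode::NegativeOneToOne) ? -c.w : 0.0;
    return xyInside && zMin <= c.z && c.z <= c.w;
}

// Clip space to NDC. Only meaningful for points that survived clipping,
// which is why w > 0 is asserted here.
Vec3 perspectiveDivide(const Vec4& c) {
    assert(c.w > 0.0);
    return {c.x / c.w, c.y / c.w, c.z / c.w};
}
```

For example, the clip-space point (0, 0, -0.5, 1) is inside the volume under the default GL convention but is clipped under the [0,1] convention.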
There is no single "the" unproject function, but the concept of "unprojecting" a point means calculating these transformations in reverse, going from window space back to some space before clip space, undoing the projection matrix. To implement an unproject function, you need to know which conventions were used.
Because in some implementations I see two ways: winZ = 0 (near plane), winZ = 1 (far plane) or winZ = -1 (near plane), winZ = 1 (far plane). What's the difference in these ranges?
Maybe they are not taking in a window space Z, but NDC z directly. Maybe the parameters are just named in a confusing or wrong manner. Maybe some of the implementations out there are just flat-out wrong.
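A minimal sketch of such an unproject function, with the depth convention made explicit, could look like the following. It assumes the caller supplies the inverse of (projection * view) as invViewProj and that glDepthRange is left at its default of [0,1]; all names here are hypothetical.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<std::array<double, 4>, 4>;  // indexed m[row][col]
struct Vec3 { double x, y, z; };
struct Viewport { double x, y, width, height; };

// Window space -> the space that invViewProj leads back to (e.g. world space).
// win.x and win.y use GL's bottom-left origin; a mouse position reported with
// a top-left origin has to be flipped in y first. win.z is window-space depth
// in [0,1]. zeroToOneDepth selects the NDC z convention of the projection.
Vec3 unproject(const Vec3& win, const Mat4& invViewProj, const Viewport& vp,
               bool zeroToOneDepth) {
    assert(vp.width > 0.0 && vp.height > 0.0);
    const double ndc[4] = {
        (win.x - vp.x) / vp.width * 2.0 - 1.0,
        (win.y - vp.y) / vp.height * 2.0 - 1.0,
        zeroToOneDepth ? win.z : win.z * 2.0 - 1.0,
        1.0
    };
    double out[4] = {0.0, 0.0, 0.0, 0.0};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r] += invViewProj[r][c] * ndc[c];
    assert(out[3] != 0.0);
    // Undo the perspective division.
    return {out[0] / out[3], out[1] / out[3], out[2] / out[3]};
}
```

For picking, one would call this once with win.z = 0 and once with win.z = 1 and use the difference of the two results as the ray direction. With a conventional projection these are points on the near and far planes; with reversed z the roles are swapped.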

From what I understand, OpenGL uses a right-hand coordinate system, that, at least in clip space, works like this:
X points right
Y points up
Z points into the screen
This means that, without any modifications to all the matrices used for transformations, world space coordinates work like this:
The X-Z plane is horizontal
The X-Y and Z-Y planes are vertical
What if I want to change it so that the Z axis is the one pointing up? How could I go about doing this? I've thought about multiplying all matrices by a rotation matrix that just shifts all coordinates by 90 degrees, or maybe I could change the Y and Z components of a vector once I send data to the GPU, but those seem more like workarounds than actual solutions, and they might also take a hit on performance if done for every mesh in the scene. Is there any standard way to do this? Am I getting something wrong?
Clip space and NDC space are left-handed, not right-handed: X right, Y up, and Z into the screen, as you listed them, form a left-handed system.
You can have several axis systems. For example, some "objects store" use a left-handed system. If you're starting with OpenGL, try to set everything up in a right-handed system; it will be easier for you to understand.
Your objects are normally defined in their own local system (right-handed or not). You place them with a "world" matrix. You see the world from a camera position, which requires a "view" matrix. And then you project all of them with another matrix, the "proj" matrix.
As you can see, matrices are used everywhere. Don't be afraid of them.
Changing from one axis system to another is just another matrix. There are many examples on the web.
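As one such example, here is a sketch of a matrix that converts a Z-up world into the usual Y-up layout. It assumes the Z-up world is right-handed with X right and Y forward; the names kZUpToYUp, mul and checkAxes are made up for illustration.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<std::array<double, 4>, 4>;  // indexed m[row][col]
using Vec4 = std::array<double, 4>;

// Assumed source system: X right, Y forward, Z up (right-handed).
// Target system: X right, Y up, Z toward the viewer (right-handed).
// This is a rotation of -90 degrees about X, so handedness is preserved.
const Mat4 kZUpToYUp = {{
    {1.0,  0.0, 0.0, 0.0},
    {0.0,  0.0, 1.0, 0.0},
    {0.0, -1.0, 0.0, 0.0},
    {0.0,  0.0, 0.0, 1.0},
}};

// Matrix times column vector.
Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r] += m[r][c] * v[c];
    return out;
}

// Small self-check of where the source axes end up.
void checkAxes() {
    const Vec4 up = mul(kZUpToYUp, Vec4{0.0, 0.0, 1.0, 1.0});
    assert(up[1] == 1.0);        // source "up" (+Z) becomes +Y
    const Vec4 forward = mul(kZUpToYUp, Vec4{0.0, 1.0, 0.0, 1.0});
    assert(forward[2] == -1.0);  // source "forward" (+Y) becomes -Z
}
```

Because it is just one more matrix, it can be multiplied into the view matrix once per frame on the CPU (view * kZUpToYUp), so it costs nothing per mesh or per vertex.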

I am learning OpenGL from Scratchapixel, and here is a quote from the perspective projection matrix chapter:
Cameras point along the world coordinate system negative z-axis so that when a point is converted from world space to camera space (and then later from camera space to screen space), if the point is to left of the world coordinate system y-axis, it will also map to the left of the camera coordinate system y-axis. In other words, we need the x-axis of the camera coordinate system to point to the right when the world coordinate system x-axis also points to the right; and the only way you can get that configuration, is by having camera looking down the negative z-axis.
I think it has something to do with the mirror image? But this explanation just confused me. Why doesn't the camera's coordinate system coincide with the world coordinate system by default (like every other 3D object we create in OpenGL)? I mean, we will need to transform the camera coordinates anyway with a transformation matrix (whatever we want from the negative z setup, we can simulate it), so why bother?
It is totally arbitrary what to pick for z direction.
But your pick has a lot of deep impact.
One reason to stick with the GL -z way is that the culling of faces will match GL constant names like GL_FRONT. I'd advise just to roll with the tutorial.
Flipping the sign on just one axis also flips the "parity". So a front face becomes a back face. A znear depth test becomes zfar. So it is wise to pick one early on and stick with it.
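The parity flip can be checked numerically. The sketch below uses a 2D triangle and a mirror in x for brevity; the assumption is that the same sign change applies to a flip of z, since any reflection of a single axis has determinant -1. All names are illustrative.

```cpp
#include <cassert>

struct Vec2 { double x, y; };

// Twice the signed area of triangle (a, b, c). Positive means the vertices
// run counter-clockwise, which is what GL treats as front-facing by default.
double signedArea2(const Vec2& a, const Vec2& b, const Vec2& c) {
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

// Reflects a point across the y axis, i.e. flips the sign of one coordinate.
Vec2 mirrorX(const Vec2& p) { return {-p.x, p.y}; }

// Small self-check: mirroring one axis reverses the winding order.
void checkMirrorFlipsWinding() {
    const Vec2 a{0.0, 0.0}, b{1.0, 0.0}, c{0.0, 1.0};
    assert(signedArea2(a, b, c) > 0.0);  // counter-clockwise
    assert(signedArea2(mirrorX(a), mirrorX(b), mirrorX(c)) < 0.0);  // clockwise
}
```

This is why switching conventions late in a project tends to make faces disappear: whatever was front-facing is suddenly culled as back-facing.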
By default, yes, it's a "right-hand" system (as used in physics, for example). Your thumb is the X axis, your index finger is the Y axis, and when you point those in the right directions, the Z axis (middle finger) points toward you. Why was the Z axis chosen to point into or out of the screen? Because then the X and Y axes lie on the screen, like in 2D graphics.
But in reality, OpenGL has no preferred coordinate system. You can tweak it as you like. For example, if you are making a maze game, you might want Y to go into or out of the screen (and Z upwards), so that you can move nicely in the XY plane. You modify your view/perspective matrices, and you get it.
What is this "camera" you're talking about? In OpenGL there is no such thing as a "camera". All you've got is a two stage transformation chain:
vertex position → viewspace position (by modelview transform)
viewspace position → clipspace position (by projection transform)
To see why by default OpenGL is "looking down" -z, we have to look at what happens if both transformation steps do "nothing", i.e. are full identity transforms.
In that case all vertex positions passed to OpenGL are unchanged. X maps to window width, Y maps to window height. All calculations in OpenGL by default (you can change that) have been chosen to adhere to the rules of a right-handed coordinate system, so if +X points right and +Y points up, then +Z must point "out of the screen" for the right-hand rule to be consistent.
And that's all there is about it. No camera. Just linear transformations and the choice of using right handed coordinates.

In OpenGL (all versions, though I happen to be working in OpenGL ES 2.0) there is the option of using a perspective projection versus an orthogonal one. Is there a way to control the degree of orthogonality?
For the sake of picturing the issue (and please don't take this as the actual question, I am well aware there is no camera in OpenGL) assume that a scene is rendered with the viewport "looking" down the -z axis. Two parallel lines extending a finite distance down the -z axis at (x,y)=1,1 and (x,y)=-1,1 will appear as points in orthogonal projection, or as two lines that eventually converge to a single pixel in perspective projection. Is there a way to have the x- and y- values represented by the outer edges of the screen remain the same as in projection space - I assume this requires not changing the frustum - but have the lines only converge part of the way to a single pixel?
Is there a way to control the degree of orthogonality?
Either something is orthogonal, or it is not. There's no such thing like "just a little orthogonal".
Anyway, from a mathematical point of view, a perspective projection with an infinitely narrow field of view is orthogonal. So you can use glFrustum with a very large near and far plane distance, together with a countering translation in modelview to bring the far away viewing volume back to the origin.
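A small numeric sketch of that idea: keep a reference plane whose on-screen size stays fixed, and move the pinhole camera farther away from it. The function projectX and its parameters are hypothetical, and this models only the foreshortening factor, not a full glFrustum setup.

```cpp
#include <cassert>

// Projected x of a point that lies `depth` behind a reference plane, seen by
// a pinhole camera placed `cameraDistance` in front of that plane. Points on
// the reference plane (depth == 0) keep their x regardless of the distance,
// so the edges of the view stay where they are.
double projectX(double x, double depth, double cameraDistance) {
    assert(cameraDistance > 0.0 && cameraDistance + depth > 0.0);
    return x * cameraDistance / (cameraDistance + depth);
}
```

For a point at x = 1 that is 10 units behind the plane, a camera 10 units away projects it to 0.5, a camera 90 units away projects it to 0.9, and as the distance grows the result approaches 1, which is what an orthographic projection would give. The camera distance therefore acts as the dial the question asks for, as long as the frustum is rebuilt so that it still frames the reference plane.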

As I have understood, it is recommended to use glTranslate / glRotate instead of gluLookAt. I am not going to seek the reasons beyond the obvious HW vs SW computation mode, but just go with the wave. However, this is giving me some headaches, as I do not exactly know how to efficiently stop the camera from breaking through walls. I am only interested in point-plane intersections, not AABB or anything else.
So, using glTranslates and glRotates means that the viewpoint stays still (at (0,0,0) for simplicity) while the world revolves around it. This means to me that in order to check for any intersection points, I now need to recompute the coordinates of the world's vertices (which was not needed with the gluLookAt approach) for every camera movement.
As there is no way in obtaining the needed new coordinates from GPU-land, they need to be calculated in CPU land by hand. For every camera movement ... :(
It seems there is a need to keep track of the current rotation about each of the 3 axes, and the same for translations. There is no scaling used in my program. My questions:
1 - is the above reasoning flawed ? How ?
2 - if not, there has to be a way to avoid such recalculations.
The way I see it, this needs one matrix multiplication for the translation and another one for the rotation (only rotation about the y axis is needed). However, having to compute so many additions / multiplications and especially the sine / cosine will certainly be killing FPS. There are going to be thousands or even tens of thousands of vertices to compute on. Every frame... all the maths... After having computed the new coordinates of the world, things seem to be very easy: just see if there is any plane that changed its 'd' sign (from the plane equation ax + by + cz + d = 0). If it did, use a lightweight cross-product approach to test whether the point lies inside each 'moving' triangle of that plane.
edit: I have found about glGet and I think it is the way to go but I do not know how to properly use it:
// Retains the current modelview matrix
glGetFloatv(GL_MODELVIEW_MATRIX, m_vt16CurrentMatrixVerts);
m_vt16CurrentMatrixVerts is a float[16] which gets filled with 0.f or 8.67453e-13 or something similar. Where am I screwing up?
gluLookAt is a very handy function with absolutely no performance penalty. There is no reason not to use it, and, above all, there is no "HW vs SW" consideration about it. As Mk12 stated, glRotatef is also done on the CPU. The GPU part is: gl_Position = ProjectionMatrix x ViewMatrix x ModelMatrix x VertexPosition.
"using glTranslates and glRotates means that the viewpoint stays still" -> same thing for gluLookAt
"at (0,0,0) for simplicity" -> not for simplicity, it's a fact. However, this (0,0,0) is in the Camera coordinate system. It makes sense : relatively to the camera, the camera is at the origin...
Now, if you want to prevent the camera from going through the walls, the usual method is to trace a ray from the camera. I suspect this is what you're talking about ("to check for any intersection points"). But there is no need to do this in camera space. You can do this in world space. Here's a comparison :
Tracing rays in camera space : ray always starts from (0,0,0) and goes to (0,0,-1). Geometry must be transformed from Model space to World space, and then to Camera space, which is what annoys you
Tracing rays in world space : ray starts from camera position (in world space) and goes to (eyeCenter - eyePos).normalize(). Geometry must be transformed from Model space to World space.
Note that there is no third option (tracing rays in Model space) which would avoid transforming the geometry from Model space to World space. However, you have a pair of workarounds:
First, your game's world is probably still : the Model matrix is probably always identity. So transforming its geometry from Model to World space is equivalent to doing nothing at all.
Secondly, for all other objects, you can take the opposite approach. Instead of transforming the entire geometry in one direction, transform only the ray the other way around: take your Model matrix, invert it, and you've got a matrix which goes from world space to model space. Multiply your ray's origin and direction by this matrix: your ray is now in model space. Intersect the normal way. Done.
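A rough sketch of that second workaround, followed by a ray/plane test using the plane equation ax + by + cz + d = 0 from the question. It assumes the inverse model matrix is already available and is affine (last row 0, 0, 0, 1); the type and function names are made up for this example.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<std::array<double, 4>, 4>;  // indexed m[row][col]
struct Vec3 { double x, y, z; };
struct Ray { Vec3 origin; Vec3 direction; };
struct Plane { Vec3 normal; double d; };  // normal . p + d = 0

// Applies an affine matrix to (v, w): w = 1 for points, w = 0 for directions,
// so that directions ignore the translation part.
Vec3 transformAffine(const Mat4& m, const Vec3& v, double w) {
    assert(m[3][0] == 0.0 && m[3][1] == 0.0 && m[3][2] == 0.0 && m[3][3] == 1.0);
    return {
        m[0][0] * v.x + m[0][1] * v.y + m[0][2] * v.z + m[0][3] * w,
        m[1][0] * v.x + m[1][1] * v.y + m[1][2] * v.z + m[1][3] * w,
        m[2][0] * v.x + m[2][1] * v.y + m[2][2] * v.z + m[2][3] * w
    };
}

// Moves a world-space ray into model space instead of moving the geometry.
Ray toModelSpace(const Ray& worldRay, const Mat4& inverseModel) {
    return {transformAffine(inverseModel, worldRay.origin, 1.0),
            transformAffine(inverseModel, worldRay.direction, 0.0)};
}

double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Returns true and writes the ray parameter t if the ray hits the plane at
// t >= 0. The hit point is origin + t * direction.
bool intersectPlane(const Ray& ray, const Plane& plane, double& tOut) {
    const double denom = dot(plane.normal, ray.direction);
    if (denom == 0.0) return false;  // ray is parallel to the plane
    const double t = -(dot(plane.normal, ray.origin) + plane.d) / denom;
    if (t < 0.0) return false;       // plane is behind the ray origin
    tOut = t;
    return true;
}
```

Only two vectors are transformed per object, regardless of how many vertices the object has. To stop the camera at a wall, compare the returned t against the distance the camera wants to move this frame.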
Note that all I've said is standard techniques. No hacks or other weird stuff, just math :)