Does the OpenGL fixed function pipeline compute lighting in view-space?
If the answer is yes, then how does it cope with view transformations with non-uniform scale? Actually, how does it cope with view transformations incorporating any scale at all?
If this is true then scaling the view space will result in different light-to-vertex distances, meaning the lighting intensity for point-lights will change as the view matrix is scaled.
Lighting in world-space would make the computed point-light intensity independent of view space scaling, but would require:
That an object-to-world matrix is supplied to the API (such as in DirectX, where the light positions are specified in world-space).
That the API transform all geometry twice when drawing. Once by world*view*proj into clip space, and again by world, in order to compute lighting at the vertices in world space.
Points awarded for a good answer with any additional background info you can dig up on fixed-function lighting pipelines in general.
I think your question comes from a confusion of "view space" with "post-projection space". They are not the same.
View space, or camera space, is the space of the scene relative to the camera. Thus, the camera is sitting at the origin, looking down the -Z axis, with +Y being up. In terms of OpenGL fixed-function, camera space is the space after multiplying positions and normals by the GL_MODELVIEW matrix.
Post-projection space is what you get after multiplying camera space values by the GL_PROJECTION matrix. This is in fact why there are two separate matrices. You do lighting in camera space, and you send the post-projection positions off for rasterization.
OpenGL does not do lighting in post-projection space. So the aspect ratio, camera zoom, and so forth does not affect lighting. Nor does the perspective divide.
Does the OpenGL fixed function pipeline compute lighting in view-space?
Yes, and so should you.
If the answer is yes, then how does it cope with view transformations with non-uniform scale? Actually, how does it cope with view transformations incorporating any scale at all?
The exact same way that it copes with the model-to-world transform incorporating scale.
It's just a matrix. The math neither knows nor cares where a particular scale transform happens to be, whether it is in the model-to-world part or the world-to-camera part. All that matters is that a scale is present. Or a skew or any other form of transform.
And remember: it is far more likely that the model-to-world transform uses scales than the world-to-camera transform does. You are more likely to need to rescale geometry to fit into the world than you are to need to rescale geometry for the camera matrix. The scaling for camera zooms, aspect ratio, and the like is a part of the perspective matrix, not the camera matrix.
It "copes" with this in the usual way: normals are transformed by the inverse-transpose of the model-to-view matrix. This alters the normals (full disclosure: that's my eBook tutorial) so that they still fit the model after the scaling. This is necessary regardless of what space you're in.
If this is true then scaling the view space will result in different light-to-vertex distances, meaning the lighting intensity for point-lights will change as the view matrix is scaled.
... and? Since all of the objects are transformed by the same camera matrix (within a single scene), all of the objects will have the same scale applied. Therefore, if they were all in the same scale in world-space, they will all be in the same scale in camera-space.
So what's the problem? Yes, the attenuation changes, but it changes equally for all objects. Thus, there isn't a problem, so long as your attenuation factors are designed for this camera space.
Related
I have four arbitrary points (lt,rt,rb,lb) in 3d space and I would like these points to define my near clipping plane (lt stands for left-top, rt for right-top and so on).
Unfortunately, these points are not necessarily a rectangle (in screen space). They are however a rectangle in world coordinates.
The context is that I want to have a mirror surface by computing the mirrored world into a texture. The mirror is an arbitary translated and rotated rectangle in 3d space.
I do not want to change the texture coordinates on the vertices, because that would lead to ugly pixelisation when you e.g. look at the mirror from the side. When I would do that, also culling would not work correctly which would lead to huge performance impacts in my case (small mirror, huge world).
I also cannot work with the stencil buffer, because in some scenarios I have mirrors facing each other which would also lead to a huge performance drop. Furthermore, I would like to keep my rendering pipeline simple.
Can anyone tell me how to compute the according projection matrix?
Edit: Of cause I already have moved my camera accordingly. That is not the problem here.
Instead of tweaking the projection matrix (which I don't think can be done in the general case), you should define an additional clipping plane. You do that by enabling:
glEnable(GL_CLIP_DISTANCE0);
And then set gl_ClipDistance vertex shader output to be the distance of the vertex from the mirror:
gl_ClipDistance[0] = dot(vec4(vertex_position, 1.0), mirror_plane);
I usually find matrix libraries building both modelview and cameras matrices from the RUB (right-up-back) vectors, as depicted in these pages:
http://3dengine.org/Right-up-back_from_modelview
http://3dengine.org/Modelview_matrix
Is the RUB tuple just a common standard?
Otherwise, is there a reason the RUB vectors are preferred over any other orientation (such as forward-up-right)?
Particularly if you're using the programmable pipeline, you have almost complete freedom about the coordinate system you work in, and how you transform your geometry. But once all your transformations are applied in the vertex shader (resulting in the vector assigned to gl_Position), there is still a fixed function block in the pipeline between the vertex shader and fragment shader. That fixed function block relies on the transformed vertices being in a well defined coordinate system.
gl_Position is in a coordinate system called "clip coordinates", which then turns into "normalized device coordinates" (NDC) after dividing by the w coordinate of the vector.
Based on the vector in NDC, the fixed function rasterization block generates pixels. It will use the first coordinate to map to the horizontal window direction, and the second coordinate to map to the vertical window direction. The third coordinate will be used to calculate the depth, which can be used for depth testing.
This means that after all transformations are applied, the first coordinate has to be left-right, the second coordinate has to be bottom-up, and the third coordinate has to be front-back (well, it could be back-front if you change the depth test).
If you use a classic setup with modelview and projection matrix, it makes sense to use the modelview matrix to transform the original geometry into this orientation, and then use the projection matrix to apply e.g. a perspective.
I don't think there's anything stopping you from using a different orientation as the result of the modelview transformation, and then include a rotation in the projection matrix to transform the whole thing into the correct clip coordinate space. But I don't see a benefit, and it looks like it would just add unnecessary confusion.
In OpenGL (all versions, though I happen to be working in OpenGL ES 2.0) there is the option of using a perspective projection versus an orthogonal one. Is there a way to control the degree of orthogonality?
For the sake of picturing the issue (and please don't take this as the actual question, I am well aware there is no camera in OpenGL) assume that a scene is rendered with the viewport "looking" down the -z axis. Two parallel lines extending a finite distance down the -z axis at (x,y)=1,1 and (x,y)=-1,1 will appear as points in orthogonal projection, or as two lines that eventually converge to a single pixel in perspective projection. Is there a way to have the x- and y- values represented by the outer edges of the screen remain the same as in projection space - I assume this requires not changing the frustum - but have the lines only converge part of the way to a single pixel?
Is there a way to control the degree of orthogonality?
Either something is orthogonal, or it is not. There's no such thing like "just a little orthogonal".
Anyway, from a mathematical point of view, a perspective projection with an infinitely narrow field of view is orthogonal. So you can use glFrustum with a very large near and far plane distance, together with a countering translation in modelview to bring the far away viewing volume back to the origin.
When you are learning 3D programming, you are taught that it's easiest think in terms of 3 transformation matrices:
The Model Matrix. This matrix is individual to every single model and it rotates and scales the object as desired and finally moves it to its final position within your 3D world. "The Model Matrix transforms model coordinates to world coordinates".
The View Matrix. This matrix is usually the same for a large number of objects (if not for all of them) and it rotates and moves all objects according to the current "camera position". If you imaging that the 3D scene is filmed by a camera and what is rendered on the screen are the images that were captured by this camera, the location of the camera and its viewing direction define which parts of the scene are visible and how the objects appear on the captured image. There are little reasons for changing the view matrix while rendering a single frame, but those do in fact exists (e.g. by rendering the scene twice and changing the view matrix in between, you can create a very simple, yet impressive mirror within your scene). Usually the view matrix changes only once between two frames being drawn. "The View Matrix transforms world coordinates to eye coordinates".
The Projection Matrix. The projection matrix decides how those 3D coordinates are mapped to 2D coordinates, e.g. if there is a perspective applied to them (objects get smaller the farther they are away from the viewer) or not (orthogonal projection). The projection matrix hardly ever changes at all. It may have to change if you are rendering into a window and the window size has changed or if you are rendering full screen and the resolution has changed, however only if the new window size/screen resolution has a different display aspect ratio than before. There are some crazy effects for that you may want to change this matrix but in most cases its pretty much constant for the whole live of your program. "The Projection Matrix transforms eye coordinates to screen coordinates".
This makes all a lot of sense to me. Of course one could always combine all three matrices into a single one, since multiplying a vector first by matrix A and then by matrix B is the same as multiplying the vector by matrix C, where C = B * A.
Now if you look at the classical OpenGL (OpenGL 1.x/2.x), OpenGL knows a projection matrix. Yet OpenGL does not offer a model or a view matrix, it only offers a combined model-view matrix. Why? This design forces you to permanently save and restore the "view matrix" since it will get "destroyed" by model transformations applied to it. Why aren't there three separate matrices?
If you look at the new OpenGL versions (OpenGL 3.x/4.x) and you don't use the classical render pipeline but customize everything with shaders (GLSL), there are no matrices available any longer at all, you have to define your own matrices. Still most people keep the old concept of a projection matrix and a model-view matrix. Why would you do that? Why not using either three matrices, which means you don't have to permanently save and restore the model-view matrix or you use a single combined model-view-projection (MVP) matrix, which saves you a matrix multiplication in your vertex shader for ever single vertex rendered (after all such a multiplication doesn't come for free either).
So to summarize my question: Which advantage has a combined model-view matrix together with a separate projection matrix over having three separate matrices or a single MVP matrix?
Look at it practically. First, the fewer matrices you send, the fewer matrices you have to multiply with positions/normals/etc. And therefore, the faster your vertex shaders.
So point 1: fewer matrices is better.
However, there are certain things you probably need to do. Unless you're doing 2D rendering or some simple 3D demo-applications, you are going to need to do lighting. This typically means that you're going to need to transform positions and normals into either world or camera (view) space, then do some lighting operations on them (either in the vertex shader or the fragment shader).
You can't do that if you only go from model space to projection space. You cannot do lighting in post-projection space, because that space is non-linear. The math becomes much more complicated.
So, point 2: You need at least one stop between model and projection.
So we need at least 2 matrices. Why model-to-camera rather than model-to-world? Because working in world space in shaders is a bad idea. You can encounter numerical precision problems related to translations that are distant from the origin. Whereas, if you worked in camera space, you wouldn't encounter those problems, because nothing is too far from the camera (and if it is, it should probably be outside the far depth plane).
Therefore: we use camera space as the intermediate space for lighting.
In most cases your shader will need the geometry in world or eye coordinates for shading so you have to seperate the projection matrix from the model and view matrices.
Making your shader multiply the geometry with two matrices hurts performance. Assuming each model have thousends (or more) vertices it is more efficient to compute a model view matrix in the cpu once, and let the shader do one less mtrix-vector multiplication.
I have just solved a z-buffer fighting problem by separating the projection matrix. There is no visible increase of the GPU load. The two folowing screenshots shows the two results - pay attention to the green and white layers fighting.
I have a very general question. I wish to determine the boundary points of a number of objects (comprising 30-50 closed polygons (z) each having around 300 points(x,y,z)). I am working with a fixed viewport which is rotated about x,y and z-axes (alpha, beta, gamma) wrt origin of coordinate system for polygons.
As I see it there are two possibilities: perspective projection or raytracing. Perspective projection would seem to requires a large number of matrix operations for each point to determine its position is within or without the viewport.
Or given the large number of points would I better to raytrace the viewport pixels to object?
i.e. determine whether there is an intersection and then whether intersection occurs within or without object(s).
In either case I will write this result as 0 (outside) or 1 (inside) to 200x200 an integer matrix representing the viewport
Thank you in anticipation
Perspective projection (and then scan-converting the polygons in image coordinates) is going to be a lot faster.
The matrix transform that is required in the case of perspective projection (essentially the world-to-camera matrix) is required in exactly the same way when raytracing. However, with perspective projection, you're only transforming the corner points, whereas with raytracing, you're transforming all the points in the image.
You should be able to use perspective projection and a perspective projection matrix to compute the position of the vertices in screen space? It's hard to understand what you want to do really. If you want to create an image of that 3D scene then with only few polygons it would be hard to see any difference anyway between ray tracing and rasterisation if your code is optimised (you will still need to use an acceleration structure for the ray tracing approach), however yes rasterisation is likely to be faster anyway.
Now if you need to compute the distance from between the eye (the camera's origin) and the geometry visible through the camera's view, the I don't see why you can't use the depth value of any sample for any pixel in the image and use the inverse of the perspective projection matrix to find its distance in camera space.
Why is speed an issue in your problem? Otherwise use RT indeed.
Most of this information can be found on www.scratchapixel.com