Modify vertices after the modelview but before the projection matrix is applied (OpenGL)

This is for a Minecraft mod, so I'd like to keep most of the rendering pipeline untouched.
I want to arbitrarily modify (and possibly duplicate) vertex data based on the values of all three vertices of each triangle after the modelview matrix has been applied.
Should this be done in a shader? If so, which one?

Related

Why do modelview and camera matrices use RUB orientation

I usually find matrix libraries building both modelview and camera matrices from the RUB (right-up-back) vectors, as depicted in these pages:
http://3dengine.org/Right-up-back_from_modelview
http://3dengine.org/Modelview_matrix
Is the RUB tuple just a common standard?
Otherwise, is there a reason the RUB vectors are preferred over any other orientation (such as forward-up-right)?
Particularly if you're using the programmable pipeline, you have almost complete freedom in choosing the coordinate system you work in and how you transform your geometry. But once all your transformations are applied in the vertex shader (resulting in the vector assigned to gl_Position), there is still a fixed-function block in the pipeline between the vertex shader and the fragment shader. That fixed-function block relies on the transformed vertices being in a well-defined coordinate system.
gl_Position is in a coordinate system called "clip coordinates", which then turns into "normalized device coordinates" (NDC) after dividing by the w coordinate of the vector.
Based on the vector in NDC, the fixed function rasterization block generates pixels. It will use the first coordinate to map to the horizontal window direction, and the second coordinate to map to the vertical window direction. The third coordinate will be used to calculate the depth, which can be used for depth testing.
This means that after all transformations are applied, the first coordinate has to be left-right, the second coordinate has to be bottom-up, and the third coordinate has to be front-back (well, it could be back-front if you change the depth test).
If you use a classic setup with modelview and projection matrix, it makes sense to use the modelview matrix to transform the original geometry into this orientation, and then use the projection matrix to apply e.g. a perspective.
I don't think there's anything stopping you from using a different orientation as the result of the modelview transformation, and then include a rotation in the projection matrix to transform the whole thing into the correct clip coordinate space. But I don't see a benefit, and it looks like it would just add unnecessary confusion.
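To make that last point concrete, here is a minimal vertex shader sketch. The uniform names (ModelView, OrientationFix, Projection) are assumptions for illustration, not anything OpenGL prescribes: whatever orientation your modelview transform produces, only the clip-space value written to gl_Position matters, and a corrective rotation can simply be folded in before the projection.

#version 330 core

// Illustrative names only: nothing here is defined by OpenGL itself.
uniform mat4 ModelView;       // however you chose to orient your eye space
uniform mat4 OrientationFix;  // optional rotation into the conventional RUB eye space
uniform mat4 Projection;      // eye space -> clip space

in vec4 position;

void main()
{
    // The fixed-function block only ever sees this clip-space result.
    gl_Position = Projection * OrientationFix * ModelView * position;
}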

How to manage the Model, View and Projection matrices in modern OpenGL

I have a couple of questions about modern OpenGL:
(i) A Model matrix is described as one that "contains every translation, rotation or scaling applied to an object" (1)
(ii) So that must mean that for every VAO (which contains a scene object, such as a chair) there must be a vertex attribute, a 4x4 Model matrix, that contains the translation, rotation and scaling of that object, in order for the vertex shader to transform each vertex into world space, right?
Therefore, would I have 32 Model matrices if I had 32 scene objects (1 Model matrix per scene object)?
(iii) Then could I pass the View and Projection matrices to the shader as a couple of uniforms?
(iv) If a program has more than 1 scene object, such as a table and a chair with different translations, rotations and scalings, is it possible to have 1 Model matrix that accommodates each scene object's different translation, rotation and scaling?
(i) A Model matrix is described as one that "contains every translation, rotation or scaling applied to an object" (1)
No. A matrix is just a matrix. Only in a certain context does a matrix gain additional meaning.
(ii) So that must mean that for every VAO (which contains a scene object, such as a chair) there must be a vertex attribute, a 4x4 Model matrix, that contains the translation, rotation and scaling of that object, in order for the vertex shader to transform each vertex into world space, right?
No.
VAOs are not models. VAOs are just collections of references to chunks of memory. Any kind of data can be contained in a VAO. And if the VAO contains geometry data, there can be multiple, independent models in a single VAO.
Therefore, would I have 32 Model matrices if I had 32 scene objects (1 Model matrix per scene object)?
Not necessarily. You could just as well have 16 matrices, where two objects each share their model transformation.
(iv) If a program has more than 1 scene object, such as a table and a chair with different translations, rotations and scalings, is it possible to have 1 Model matrix that accommodates each scene object's different translation, rotation and scaling?
Well, yes, but don't overthink the problem. There's no strict tie between objects and transformation matrices.
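As a sketch of how this usually looks in practice (the uniform names are made up for illustration): the per-object Model matrix is typically a uniform updated once per draw call, not a vertex attribute, while View and Projection are shared by everything drawn with the shader, which also addresses (iii).

#version 330 core

// Hypothetical uniform names; the point is one Model uniform updated per
// draw call, with View and Projection shared across all objects.
uniform mat4 Model;       // set before each object's draw call
uniform mat4 View;        // typically set once per frame
uniform mat4 Projection;  // typically set on window resize

in vec4 position;

void main()
{
    gl_Position = Projection * View * Model * position;
}

So 32 scene objects need at most 32 uniform updates between draw calls, not 32 per-vertex matrix attributes.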

Questions about Orthogonal vs Perspective in OpenGL [closed]

I've got a triangle with 3 vertices rotating around the y-axis. One of the things I find "weird" is normalized coordinates in GL's default orthogonal projection. I've used 2-D libraries like SDL and SFML, which almost always deal with pixels. If you say you want an image surface that is 50x50 pixels, that is what you get. So initially it was strange for me to limit my vertex position choices to [-1, 1].
Why do orthogonal coordinates have to be normalized? Is perspective projection the same? If so, how would you say you want the origin of your object to be at z=-10? (My quick look over matrix math says perspective is different. Something about division by 'w' creating homogeneous (same thing as normalized?) coordinates, but I'm not sure.)
gl_Position = View * Model * Project * Vertex;
I've seen that equation above and I'm boggled by how the variable gl_Position used in shaders can represent both the position of the current vertex of a model/object and, at the same time, a view/projection or the position of the camera. How does that work? I understand that by multiplication all that information is stored in one matrix, but how does OpenGL use that one matrix, whose information is now combined, to say, "ok, this part/fraction of gl_Position is for the camera, and that other part is information for where the model is going to go"? (BTW, I'm not quite sure what the Vertex vec4 represents. I thought all vertices of a model were inside Model. Any ideas?)
One more question: if you just wanted to move the camera (for example, in FPS games you move the mouse up to look up, but no objects other than the camera are being rotated or translated, I think), would the equation above look something like this?
gl_Position = View * Project;
Why do orthogonal coordinates have to be normalized?
They don't. You can set the limits of the orthogonal projection volume however you desire. The left, right, bottom, top, near and far parameters of the glOrtho call define the limits of the viewing volume. If you choose left=0, right=win_pixel_width, bottom=0, top=win_pixel_height, you end up with a pixel-unit projection volume, as you're used to. However, why bother with pixels? You'd just have to compensate for the actual window size later. Just choose the ortho projection volume extents to match the scene you want to draw.
Maybe you're confusing this with normalized device coordinates. For those it has simply been defined that the value range [-1, 1] is mapped to the viewport extents.
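As an illustration of the first point, this is roughly the matrix that glOrtho(0, width, 0, height, near, far) builds, written here as a GLSL mat4 in column-major order (the helper name is made up for this sketch). With it, vertex positions given directly in pixel units land where you expect.

// Hypothetical helper, for illustration: the orthographic matrix for a
// projection volume with left=0, right=width, bottom=0, top=height.
mat4 pixelOrtho(float width, float height, float zNear, float zFar)
{
    float r = width, t = height, n = zNear, f = zFar;      // l = 0, b = 0
    return mat4(
        vec4(2.0 / r, 0.0,      0.0,              0.0),    // column 0
        vec4(0.0,     2.0 / t,  0.0,              0.0),    // column 1
        vec4(0.0,     0.0,     -2.0 / (f - n),    0.0),    // column 2
        vec4(-1.0,   -1.0,     -(f + n) / (f - n), 1.0));  // column 3
}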
Update
BTW, I'm not quite sure what the Vertex vec4 represents. I thought all vertices of a model were inside Model. Any ideas?
I'm getting quite fatigued right now, because I've been answering several questions like this numerous times over the last few days. So, here it goes again:
In OpenGL there is no camera.
In OpenGL there is no scene.
In OpenGL there are no models.
"Wait, what?!" you may wonder now. But it's true.
All OpenGL cares about is that there is some target framebuffer, i.e. a canvas it can draw to, and a stream of vertex attributes that make up geometric primitives. The primitives are points, lines and triangles. Somehow the vertex attributes for, say, a triangle must be mapped to a position on the framebuffer canvas. For this, a vertex attribute we call the position goes through a number of transformations.
The first is from a local model space into world space, the Model transform.
From world space into eye space, the View transform. It is this view transform which acts like placing the camera in a scene.
After that it is put through the equivalent of a camera's lens, which is the Projection transform.
After the Projection transform the position is in clip space, where it undergoes some operations that are not essential to understand for the time being. After clipping, the so-called homogeneous divide is applied to reach normalized device coordinate space, by dividing the clip space position vector by its own w component.
v_position_ndc = v_position_clip / v_position_clip.w
This step is what makes a perspective projection actually work. The z-distance of a vertex's position is worked into the clip space w component. And by the homogeneous divide, vertices with a larger w get scaled proportionally to 1/w in the XY plane, which creates the perspective effect.
You mistook this operation as normalization, but it is not!
After the homogeneous divide the vertex position has been mapped from clip space to NDC space. And OpenGL defines that the visible volume of NDC space is the box [-1, 1]^3; vertices outside this box are clipped.
It's crucial to understand that View transform and Projection are different. For a position it's not so obvious, but another vertex attribute called the normal, which is an important ingredient for lighting calculations, must be transformed in a slightly different way (instead of Projection · View · Model it must be transformed by inverse(transpose(View · Model)), i.e. the Projection takes no part in it but the viewpoint does).
The matrices themselves are 4×4 grids of real-valued scalars (ignore for the time being that numbers in a computer are always rational). So a matrix is of dimension 4×4 and hence must be multiplied with vectors of dimension 4 (hence the type vec4).
OpenGL treats vertex attributes as column vectors, so matrices are multiplied onto them from the left, i.e. a vector enters an expression on the right side and comes out on the left. The order of matrix multiplication matters; you can not freely reorder things!
The statement
gl_Position = Projection * View * Model * vertex_position; // note the order
makes the vertex shader perform this very transformation process I just described.
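Putting the pieces together, a minimal vertex shader for this process could look like the sketch below. The uniform and attribute names are assumptions for illustration; note that the normal is transformed with the inverse transpose of View · Model, as described above.

#version 330 core

// Illustrative names; nothing here is prescribed by OpenGL itself.
uniform mat4 Model;       // local -> world
uniform mat4 View;        // world -> eye
uniform mat4 Projection;  // eye   -> clip

in vec4 vertex_position;  // local-space position
in vec3 vertex_normal;    // local-space normal

out vec3 eye_normal;      // for lighting calculations in eye space

void main()
{
    // right-to-left: Model is applied first, Projection last
    gl_Position = Projection * View * Model * vertex_position;

    // normals use inverse(transpose(View * Model)); Projection takes no part in it
    eye_normal = mat3(transpose(inverse(View * Model))) * vertex_normal;
}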
"Note that there is no separate camera (view) matrix in OpenGL. Therefore, in order to simulate transforming the camera or view, the scene (3D objects and lights) must be transformed with the inverse of the view transformation. In other words, OpenGL defines that the camera is always located at (0, 0, 0) and facing to -Z axis in the eye space coordinates, and cannot be transformed. See more details of GL_MODELVIEW matrix in ModelView Matrix."
Source: http://www.songho.ca/opengl/gl_transform.html
That is what I was getting hung up on. I thought there would be a separate camera (view) matrix in OpenGL.

Confused about OpenGL transformations

In OpenGL there is one world coordinate system with origin (0, 0, 0).
What confuses me is what all the transformations like glTranslate, glRotate, etc. do? Do they move objects in world coordinates, or do they move the camera? As you know, the same movement can be achieved by either moving objects or camera.
I am guessing that glTranslate and glRotate change objects, and gluLookAt changes the camera?
In OpenGL there is one world coordinate system with origin (0, 0, 0).
Well, technically no.
What confuses me is what all the transformations like glTranslate, glRotate, etc. do? Do they move objects in world coordinates, or do they move the camera?
Neither. OpenGL doesn't know objects, OpenGL doesn't know a camera, OpenGL doesn't know a world. All that OpenGL cares about are primitives (points, lines or triangles), per-vertex attributes, normalized device coordinates (NDC) and a viewport, to which the NDC are mapped.
When you tell OpenGL to draw a primitive, each vertex is processed according to its attributes. The position is one of the attributes and is usually a vector with 1 to 4 scalar elements within a local "object" coordinate system. The task at hand is to somehow transform the local vertex position attribute into a position on the viewport. In modern OpenGL this happens within a small program, running on the GPU, called a vertex shader. The vertex shader may process the position in an arbitrary way. But the usual approach is to apply a number of nonsingular, linear transformations.
Such transformations can be expressed in terms of homogeneous transformation matrices. For a 3-dimensional vector, the homogeneous representation is a vector with 4 elements, where the 4th element is 1.
In computer graphics a 3-fold transformation pipeline has become sort of the standard way of doing things. First the object-local coordinates are transformed into coordinates relative to the virtual "eye", hence into eye space. In OpenGL this transformation used to be called the modelview transformation. With the vertex positions in eye space, several calculations, like illumination, can be expressed in a generalized way, hence those calculations happen in eye space. Next the eye space coordinates are transformed into the so-called clip space. This transformation maps some volume in eye space to a specific volume with certain boundaries, to which the geometry is clipped. Since this transformation effectively applies a projection, in OpenGL it used to be called the projection transformation.
After clip space the positions get "normalized" by their homogeneous component, yielding normalized device coordinates, which are then plainly mapped to the viewport.
To recapitulate:
A vertex position is transformed from local to clip space by
vpos_eye = MV · vpos_local
eyespace_calculations(vpos_eye);
vpos_clip = P · vpos_eye
·: matrix-vector product (matrix times column vector)
Then to reach NDC
vpos_ndc = vpos_clip / vpos_clip.w
and finally to the viewport (NDC coordinates are in the range [-1, 1]):
vpos_viewport = (vpos_ndc.xy + (1, 1)) * (viewport.width, viewport.height) / 2 + (viewport.x, viewport.y)
*: componentwise vector multiplication
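For illustration only, those last two steps could be written as the GLSL-style function below. This is what the fixed-function stages after the vertex shader effectively do, not something you write yourself; the function name and the packing of the viewport into a vec4 are made up for this sketch.

// Illustration of the fixed-function steps after the vertex shader.
// viewport is packed as (x, y, width, height) for this sketch.
vec2 clip_to_viewport(vec4 vpos_clip, vec4 viewport)
{
    // homogeneous divide: clip space -> normalized device coordinates
    vec3 vpos_ndc = vpos_clip.xyz / vpos_clip.w;

    // viewport mapping: [-1, 1] in NDC -> window pixel coordinates
    return (vpos_ndc.xy + vec2(1.0)) * viewport.zw * 0.5 + viewport.xy;
}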
The OpenGL functions glRotate, glTranslate, glScale, glMatrixMode merely manipulate the transformation matrices. OpenGL used to have four transformation matrices:
modelview
projection
texture
color
Which of them the matrix manipulation functions act on is selected using glMatrixMode. Each of the matrix-manipulating functions composes a new matrix by multiplying the transformation it describes onto the selected matrix, thereby replacing it. The function glLoadIdentity replaces the current matrix with the identity, glLoadMatrix replaces it with a user-defined matrix, and glMultMatrix multiplies a user-defined matrix on top of it.
So how does the modelview matrix then emulate both object placement and a camera? Well, as you already stated:
As you know, the same movement can be achieved by either moving objects or camera.
You cannot really distinguish between them. The usual approach is to split the object-local to eye transformation into two steps:
Object to world – OpenGL calls this the "model transform"
World to eye – OpenGL calls this the "view transform"
Together they form the model-view, in fixed function OpenGL described by the modelview matrix. Now since the order of transformations is
local to world, Model matrix vpos_world = M · vpos_local
world to eye, View matrix vpos_eye = V · vpos_world
we can substitute by
vpos_eye = V · ( M · vpos_local ) = V · M · vpos_local
replacing V · M by the ModelView matrix =: MV
vpos_eye = MV · vpos_local
Thus you can see that what's V and what's M of the compound matrix MV is only determined by the order of operations in which you multiply onto the modelview matrix, and at which step you decide to "call it the model transform from here on".
I.e. right after a
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
the view is defined. But at some point you'll start applying model transformations and everything after is model.
Note that in modern OpenGL all the matrix manipulation functions have been removed. OpenGL's matrix stack was never feature-complete and no serious application actually used it. Most programs just glLoadMatrix-ed their self-calculated matrices and didn't bother with the OpenGL built-in matrix manipulation routines.
And ever since shaders were introduced, the whole OpenGL matrix stack got awkward to use, to say it nicely.
The verdict: If you plan on using OpenGL the modern way, don't bother with the built-in functions. But keep in mind what I wrote, because what your shaders do will be very similar to what OpenGL's fixed function pipeline did.
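For example, a vertex shader that mirrors what the fixed-function pipeline did might look like this sketch, with a combined modelview matrix and a separate projection matrix computed on the CPU and passed in as uniforms (the names are illustrative assumptions):

#version 330 core

// Uniform names are assumptions; the structure mirrors the old
// fixed-function modelview/projection split.
uniform mat4 ModelView;   // local -> eye, premultiplied on the CPU
uniform mat4 Projection;  // eye   -> clip

in vec4 vpos_local;

void main()
{
    vec4 vpos_eye = ModelView * vpos_local;  // local -> eye
    gl_Position   = Projection * vpos_eye;   // eye   -> clip
}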
OpenGL is a low-level API; there are no higher-level concepts like an "object" and a "camera" in the "scene", so there are only two relevant matrix modes: MODELVIEW (a multiplication of the "camera" matrix by the "object" transformation) and PROJECTION (the projective transformation from eye space to clip space).
Distinction between "Model" and "View" (object and camera) matrices is up to you. glRotate/glTranslate functions just multiply the currently selected matrix by the given one (without even distinguishing between ModelView and Projection).
Those functions multiply (transform) the current matrix set by glMatrixMode(), so it depends on the matrix you're working on. OpenGL has 4 different types of matrices: GL_MODELVIEW, GL_PROJECTION, GL_TEXTURE, and GL_COLOR; any one of those functions can change any of those matrices. So, basically, you don't transform objects, you just manipulate different matrices to "fake" that effect.
Note that gluLookAt() is just a convenience function equivalent to a translation followed by some rotations; there's nothing special about it.
All transformations are transformations on objects. Even gluLookAt is just a transformation to transform the objects as if the camera was where you tell it to be. Technically they are transformations on the vertices, but that's just semantics.
That's true: glTranslate and glRotate change the object coordinates before rendering, and gluLookAt changes the camera coordinates.

Why would it be beneficial to have a separate projection matrix, yet combine model and view matrix?

When you are learning 3D programming, you are taught that it's easiest to think in terms of 3 transformation matrices:
The Model Matrix. This matrix is individual to every single model and it rotates and scales the object as desired and finally moves it to its final position within your 3D world. "The Model Matrix transforms model coordinates to world coordinates".
The View Matrix. This matrix is usually the same for a large number of objects (if not for all of them) and it rotates and moves all objects according to the current "camera position". If you imagine that the 3D scene is filmed by a camera and what is rendered on the screen are the images captured by this camera, the location of the camera and its viewing direction define which parts of the scene are visible and how the objects appear on the captured image. There is little reason to change the view matrix while rendering a single frame, but such reasons do in fact exist (e.g. by rendering the scene twice and changing the view matrix in between, you can create a very simple, yet impressive mirror within your scene). Usually the view matrix changes only once between two frames being drawn. "The View Matrix transforms world coordinates to eye coordinates".
The Projection Matrix. The projection matrix decides how those 3D coordinates are mapped to 2D coordinates, e.g. whether a perspective is applied to them (objects get smaller the farther they are away from the viewer) or not (orthogonal projection). The projection matrix hardly ever changes at all. It may have to change if you are rendering into a window and the window size has changed, or if you are rendering full screen and the resolution has changed, but only if the new window size/screen resolution has a different display aspect ratio than before. There are some crazy effects for which you may want to change this matrix, but in most cases it's pretty much constant for the whole life of your program. "The Projection Matrix transforms eye coordinates to screen coordinates".
This all makes a lot of sense to me. Of course one could always combine all three matrices into a single one, since multiplying a vector first by matrix A and then by matrix B is the same as multiplying the vector by matrix C, where C = B * A.
Now if you look at the classical OpenGL (OpenGL 1.x/2.x), OpenGL knows a projection matrix. Yet OpenGL does not offer a model or a view matrix, it only offers a combined model-view matrix. Why? This design forces you to permanently save and restore the "view matrix" since it will get "destroyed" by model transformations applied to it. Why aren't there three separate matrices?
If you look at the new OpenGL versions (OpenGL 3.x/4.x) and you don't use the classical render pipeline but customize everything with shaders (GLSL), there are no built-in matrices available any longer at all; you have to define your own matrices. Still most people keep the old concept of a projection matrix and a model-view matrix. Why would you do that? Why not use either three matrices, which means you don't have to permanently save and restore the model-view matrix, or a single combined model-view-projection (MVP) matrix, which saves you a matrix multiplication in your vertex shader for every single vertex rendered (after all, such a multiplication doesn't come for free either)?
So to summarize my question: what advantage does a combined model-view matrix together with a separate projection matrix have over three separate matrices or a single MVP matrix?
Look at it practically. First, the fewer matrices you send, the fewer matrices you have to multiply positions/normals/etc. with, and therefore the faster your vertex shaders.
So point 1: fewer matrices is better.
However, there are certain things you probably need to do. Unless you're doing 2D rendering or some simple 3D demo-applications, you are going to need to do lighting. This typically means that you're going to need to transform positions and normals into either world or camera (view) space, then do some lighting operations on them (either in the vertex shader or the fragment shader).
You can't do that if you only go from model space to projection space. You cannot do lighting in post-projection space, because that space is non-linear. The math becomes much more complicated.
So, point 2: You need at least one stop between model and projection.
So we need at least 2 matrices. Why model-to-camera rather than model-to-world? Because working in world space in shaders is a bad idea. You can encounter numerical precision problems related to translations that are distant from the origin. Whereas, if you worked in camera space, you wouldn't encounter those problems, because nothing is too far from the camera (and if it is, it should probably be outside the far depth plane).
Therefore: we use camera space as the intermediate space for lighting.
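A sketch of what that looks like in a vertex shader (the uniform and attribute names are illustrative): positions and normals are taken into eye space, where the lighting inputs are produced, and only then is the projection applied.

#version 330 core

// Illustrative names; NormalMatrix is assumed to be the inverse transpose
// of the upper 3x3 of ModelView, computed on the CPU.
uniform mat4 ModelView;     // model -> eye
uniform mat4 Projection;    // eye   -> clip
uniform mat3 NormalMatrix;

in vec4 position;
in vec3 normal;

out vec3 eye_position;      // lighting is done with these eye-space values
out vec3 eye_normal;

void main()
{
    vec4 pos_eye = ModelView * position;  // the intermediate "stop" used for lighting
    eye_position = pos_eye.xyz;
    eye_normal   = NormalMatrix * normal;

    gl_Position = Projection * pos_eye;   // projection applied last
}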
In most cases your shader will need the geometry in world or eye coordinates for shading, so you have to separate the projection matrix from the model and view matrices.
Making your shader multiply the geometry by two matrices hurts performance. Assuming each model has thousands (or more) of vertices, it is more efficient to compute the model-view matrix on the CPU once and let the shader do one less matrix-vector multiplication per vertex.
I have just solved a z-buffer fighting problem by separating the projection matrix. There is no visible increase in GPU load. The two screenshots that accompanied this answer (not reproduced here) showed the two results, with the green and white layers fighting in one of them.