Transform to NDC, calculate and transform back to world space - OpenGL

I have a problem converting world-space coordinates to NDC, doing a calculation with them, and converting the result back to world space inside the shader.
The code looks like this:
vec3 testFunc(vec3 pos, vec3 dir) {
    // pos and dir are in world space, convert to NDC
    vec4 NDC_dir = MVP * vec4(dir, 0.0);
    vec4 NDC_pos = MVP * vec4(pos, 1.0);
    NDC_dir /= NDC_dir.w;
    NDC_pos /= NDC_pos.w;
    // ... do some calculations => get newPos in NDC
    // Transform newPos back to world space
    vec4 WS_newPos = inverse(MVP) * vec4(newPos, 1.0);
    return WS_newPos.xyz / WS_newPos.w;
}
What I found when testing is that while NDC_dir.x and NDC_dir.y seem reasonable, NDC_dir.w and NDC_dir.z are always almost equal. Therefore, when dividing by "w", the z value is always about 1. I don't think this is how it should be, right? (Same for NDC_pos.)
On the other hand, when I transform "pos" to NDC and then transform it back to world space (without any calculations), I seem to get the original point, which would actually mean the transformations are correct.
Can anybody tell me if I am doing something wrong here, and if not, why the z value is always 1?
Update:
This is a little embarrassing. I had two problems: 1. one was the direction problem @Arne pointed out. The other was just a scaling problem on my side. I had a really small near clip and a big far clip, and since the depth values are distributed very non-linearly (most of the precision sits near the near plane), it took moving in two big steps to realize that the z value actually does go from -1 to 1.

What I found when testing is that while NDC_dir.x and NDC_dir.y seem reasonable, NDC_dir.w and NDC_dir.z are always almost equal. Therefore, when dividing by "w", the z value is always about 1. I don't think this is how it should be, right? (Same for NDC_pos.)
Actually yes, that is how it should be. If your input is formed as {x,y,z,0}, then it will be interpreted as a point infinitely far away in the direction of {x,y,z}. Therefore the depth component should always be the most distant, if visible at all. By the way, a point infinitely far away is still translation invariant.
But you should know that a point infinitely far away is probably not what you want within NDC.
A vector in 3D space is invariant to translation; it just stays the same vector, because a vector has only a direction, not a position. This is not really possible in the distorted NDC space.
If you want to transform a vector, you are better off transforming two points and then taking the difference in NDC again. But you should know that your result then depends on the position.
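For illustration, a minimal sketch of that two-point approach, written with GLM (which mirrors the GLSL in the question); MVP, pos and dir are the question's names:
#include <glm/glm.hpp>

// Sketch: take a world-space direction into NDC by transforming two points
// (its start and end) and differencing them after the perspective divide.
glm::vec3 dirToNDC(const glm::mat4& MVP, const glm::vec3& pos, const glm::vec3& dir)
{
    glm::vec4 a = MVP * glm::vec4(pos, 1.0f);        // start point
    glm::vec4 b = MVP * glm::vec4(pos + dir, 1.0f);  // end point
    glm::vec3 aNdc = glm::vec3(a) / a.w;
    glm::vec3 bNdc = glm::vec3(b) / b.w;
    return bNdc - aNdc;                              // depends on pos, as noted above
}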

Related

"Specular" color with phong shading and raytracing overflows

I implemented Phong shading in my raytracer. Unfortunately I am receiving a segmentation fault when running it. The likely cause is related to the "specular_color" Vec3f variable (which is an object made of three floats). When I print its values, I receive:
specular_color: inf inf inf
Figuring out the reason was not complicated:
It is calculated this way:
specular_color += light_intensity * std::pow (reflected*camera_dir,mat.ns);
where mat.ns is the specular exponent, that exhibits the expected values.
reflected*camera_dir is the cause, because it always ends up being a big value. In three of my four test cases it is a five-digit value; in the other it is a six-digit value (close to seven digits). Five- or six-digit values raised to any exponent that isn't fairly close to one are likely to overflow the float variables.
Now the question is why these values are so big, and whether they can or cannot be that way. If they cannot, then there is likely a bug in the preceding code. If they can, all that is missing is to normalize the reflected vector (camera_dir is already unitary).
Reflected vector is calculated this way:
Vec3f reflected = light_ray_dir - 2 * (light_ray_dir * hit_normal) * hit_normal;
where light_ray_dir is calculated as:
Vec3f light_ray_dir (current.pos - intersection);
with current being the name of the light_source and intersection being the point where the ray hits the triangle.
It always worked fine when it only did diffuse light.
hit_normal, however, is new code and is calculated as:
Vec3f hit_normal = (1 - u - v) * vert1 + u * vert2 + v * vert3;
with u and v being the barycentric coordinates and vertX the triangle's vertices.
So what seems more likely? A more complex bug, perhaps in the code above, or just a missing normalization of the reflected vector? Or perhaps something else? Thanks for your time.
Yes, the Phong reflection model uses normalised vectors to compute the intensity of the reflected light. Note the hats in the expression: the specular term for a light m is k_s * (R_m · V)^alpha * i_m,s, with R_m = 2 (L_m · N) N − L_m, where all vectors are unit length (the hats) and V is the direction toward the viewer.
source: https://en.wikipedia.org/wiki/Phong_reflection_model
R denotes the reflection vector, L the light direction, and N the normal of the surface at the point being rendered. Note also that you've transposed the two parts of the expression; this could be a problem if you expect both the light vector and reflection vector to point away from the surface, but depending on your model you might have computed this differently.
I had a look at your code and your light direction is away from the light, rather than away from the surface, so I guess the main thing to be mindful of is that your direction vectors might be pointing into or out of the surface, i.e. with or against the normal. I don't think that would have caused the NaN though. I'd ensure that the camera direction is also normalised, and check everywhere for zero-length vectors, as these can blow out your normal calculations when you divide by zero.
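As a minimal sketch of the kind of normalisation described above (written with GLM rather than the asker's Vec3f; all names here are assumptions):
#include <glm/glm.hpp>
#include <algorithm>
#include <cmath>

// Sketch: specular term with the inputs normalised first.
glm::vec3 specularTerm(const glm::vec3& light_intensity,
                       const glm::vec3& light_ray_dir,  // surface -> light, not yet unit length
                       const glm::vec3& camera_dir,     // surface -> eye
                       const glm::vec3& hit_normal,     // may not be unit length
                       float ns)                        // specular exponent
{
    glm::vec3 N = glm::normalize(hit_normal);
    glm::vec3 L = glm::normalize(light_ray_dir);
    glm::vec3 R = 2.0f * glm::dot(L, N) * N - L;        // unit length if N and L are
    float spec = std::pow(std::max(glm::dot(R, glm::normalize(camera_dir)), 0.0f), ns);
    return light_intensity * spec;
}
With unit-length vectors the dot product can never exceed 1, so the pow can no longer overflow; clamping at zero also avoids raising a negative base to a fractional exponent.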
I also had a bit of a look at your normal calculation. Did you mean to calculate the normal from the triangle's vertices? Or did you mean the normal at each of the vertices?
Hope this helps

Direct3D find 2D screen coordinate works but mirrored glitch when stressed

I have this function to get a 2D pixel location from a 3D coordinate position. The x y z are pre-transform coordinates (-1 to 1). This is a model view architecture with the camera permanently at -3.5,0,0 looking at 0,0,0, while the object/scene coordinates are transformed by a horizontal xz rotation and a vertical y rotation, etc., to produce the final frame.
This function is mostly used to overlay 2D text on top of the 3D scene. Where the 2D text is positioned relative to the 3D underlying scene.
void My3D::Get2Dfrom3Dx(float x, float y, float z, float* psx, float* psy)
{
    // assumption: screenCoord (presumably a member) is seeded from the input point;
    // this step was not shown in the posted snippet
    screenCoord.x = x;
    screenCoord.y = y;
    screenCoord.z = z;

    XMVECTOR xmScreenCoord = XMLoadFloat3((XMFLOAT3*) &screenCoord);
    XMMATRIX xmWorldViewProjection = XMLoadFloat4x4((XMFLOAT4X4*) &m_WorldViewProjection);
    XMVECTOR result = XMVector3TransformCoord(xmScreenCoord, xmWorldViewProjection);
    XMStoreFloat3((XMFLOAT3*) &screenCoord, result);

    screenCoord.x = ((screenCoord.x + 1.0f) / 2.0f) * m_nCurrWidth;
    screenCoord.y = ((-screenCoord.y + 1.0f) / 2.0f) * m_nCurrHeight;

    *psx = screenCoord.x;
    *psy = screenCoord.y;
}
This function works perfectly when the scene is fully/mostly visible (the eye at x between -4 and -1.5).
I have a nagging problem with text showing up mirrored in 3D position where it should not be.
This happens when, for example, I'm viewing the image from below (60+ degrees upward, below the object) and zooming (moving the eye location closer, to say -0.5,0,0). The text should not be visible, as it should be behind the eye (note the eye is not past 0,0,0, which really messes the image up),
but somehow the above function causes the calculated screen x y coordinates to show up within the viewport in situations where they should not.
I seem to think there is a simple solution to this side effect but can't find it. Hopefully someone has seen this 2d mirrored problem/effect before and knows the simple tweak.
I realize I could go down a more complex path of determining if the view vector is opposite the target point and filter this way, but I seem to think there should be a simpler solution.
Again, the camera is permanently on the line -3.5, 0, 0 to say -.5,0,0 as the world is transformed around it.
The problem lies in the way the projection works. Basically, the perspective projection will divide the x and y coordinates by the z coordinate. That's how you get the effect of perspective, i.e., that things that are farther away (larger z coordinate) appear smaller on screen. One issue with this perspective division is (simplified) that it doesn't work correctly for stuff that's behind the camera. Stuff behind the camera will have a negative z coordinate. When you divide x and y by a negative value, you'll have your point reflected around the origin. Which is exactly what you see. Since stuff that's located behind the camera is not going to be visible anyways, one way to solve this problem is to simply clip all geometry before dividing by z such that everything that has a negative z value is cut off and removed.
I assume the division in your code here happens inside XMVector3TransformCoord(). As you note yourself, the text should not be visible in the problematic cases anyway. So I suggest you simply check whether the text is behind the camera and don't render it if it is. One way to do so would be to check the result of transforming your world-space position with the xmWorldViewProjection matrix (using XMVector3Transform, which does not perform the w divide) and only continue if it happens to be in front of the camera. In that case xmScreenCoord holds the homogeneous clip-space coordinates of your point. The point will be in front of the camera iff the z coordinate of xmScreenCoord is larger than zero. So I guess you'd want to do something like
if (XMVectorGetZ(xmScreenCoord) > 0)
{
…
}
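For illustration, a fuller sketch of that check with DirectXMath (variable and member names are assumptions based on the question):
// Sketch only: transform with XMVector4Transform (no implicit w divide) so the
// clip-space components are still available for the behind-the-camera test.
XMVECTOR worldPos = XMVectorSet(x, y, z, 1.0f);
XMMATRIX wvp      = XMLoadFloat4x4((XMFLOAT4X4*) &m_WorldViewProjection);
XMVECTOR clipPos  = XMVector4Transform(worldPos, wvp);

if (XMVectorGetZ(clipPos) > 0.0f)   // in front of the camera, per the answer above
{
    XMVECTOR ndc = XMVectorScale(clipPos, 1.0f / XMVectorGetW(clipPos));  // manual w divide
    *psx = (XMVectorGetX(ndc) + 1.0f) * 0.5f * m_nCurrWidth;
    *psy = (-XMVectorGetY(ndc) + 1.0f) * 0.5f * m_nCurrHeight;
}
else
{
    // Point is behind the camera: skip drawing this text element.
}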
Sidenote due to the discussion in the comments below: When one wants to solve a problem involving the projections of objects on screen, one can often avoid explicitly computing the projection by instead transforming the problem into its dual and working directly in projective space on homogeneous coordinates. Since your problem is about placing text in 2D on screen, however, I don't think this is an option here.
You could place the geometry for drawing your text in clip space directly. You would start again by computing the clip-space coordinates of the 3D point to which you want your 2D text attached (by multiplying it with m_WorldViewProjection but not dividing by w). You can then generate homogeneous coordinates for the geometry for drawing your text by simply offsetting the x and y coordinates from that point to get the corners of a quad or whatever you need to construct. If you then also scale the size of the quad by the w coordinate of the point, you will get a quad at that position that always projects to the same size on the screen (since the premultiplication with w effectively cancels out the projection).
However, all you're effectively doing then is leaving the application of the projection and the necessary clipping to the GPU. If you want to render a large number of quads, that might be an option to consider, as it could be done completely on the GPU, e.g., using a geometry shader. However, if you just have a few text elements, it would be much simpler and probably also more efficient to just skip the drawing of text elements that would be behind the camera as described above…
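A minimal sketch of that clip-space quad idea (the struct and names are assumptions):
// Sketch: build a quad in clip space around a clip-space point p so that it
// projects to a constant on-screen size (halfSizeX/halfSizeY in NDC units).
struct ClipVertex { float x, y, z, w; };

void makeScreenSizedQuad(const ClipVertex& p, float halfSizeX, float halfSizeY, ClipVertex out[4])
{
    // Premultiply the offsets by p.w so the GPU's later divide by w cancels them out.
    const float dx = halfSizeX * p.w;
    const float dy = halfSizeY * p.w;
    out[0] = { p.x - dx, p.y - dy, p.z, p.w };
    out[1] = { p.x + dx, p.y - dy, p.z, p.w };
    out[2] = { p.x - dx, p.y + dy, p.z, p.w };
    out[3] = { p.x + dx, p.y + dy, p.z, p.w };
}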
Michael's response was very helpful in making me continue down the path that the solution should be a simple comparison. In my case, I had to re-evaluate the screen coordinates by applying only the World transform, rather than the full WorldViewProjection; I call this TargetTransformed. My comparison value was then simply the Eye/Camera location, which never gets adjusted (except for zoom) as the world is transformed around the Eye. And again, my camera in this case is at -3.5,0,0 looking at 0,0,0 (the center of the model, really 8,0,0, thus a line through the center). So I had to compare the x component, not the z component. I add a bit of fudge (0.1F), as the mirror artifact happens when the target is significantly behind the camera, in which case I return the final screenCoord locations translated (-8000) way out into outer space so as to guarantee they are not seen in the viewport.
if ((Eye.x + 0.1F) > TargetTransformed.x)
{
    screenCoord.x += -8000;
    screenCoord.y += -8000;
    //TRACE("point is behind camera.\n");
    *psx = screenCoord.x;
    *psy = screenCoord.y;
}
else
{
    *psx = screenCoord.x;
    *psy = screenCoord.y;
}
And for completeness, my project has two view models: a) looking along a line through the center of the model, which can be translated to look from any direction and offset by screen x and y. The first view model works fine with the code above. The second view model, b), targets the camera to look at a focal point of the model and then allows full rotation around that arbitrary point (not the center of the model), which requires calculating a tricky additional translation matrix and vector that I call TargetViewTranslation. For this additional translation, the formula adds the z component of the additional transform.
if ((Eye.x + 0.1F - m_structTargetViewTranslation.Z) > TargetTransformed.x)
{
    screenCoord.x += -8000;
    screenCoord.y += -8000;
    //TRACE("point is behind camera.\n");
    *psx = screenCoord.x;
    *psy = screenCoord.y;
}
else
{
    *psx = screenCoord.x;
    *psy = screenCoord.y;
}
And success: my mirrored text problem is resolved. Hopefully this helps others with this mirrored text problem. The takeaways are that one may need to transform the test point by only the World transform, that the check can be a simple comparison, and that the location of the camera determines whether the x or the z component is compared. If you are translating the world in any additional ways, that translation can also affect whether x or z is compared. Using TRACE and looking at the x y z values was helpful in figuring out which components I needed to use in my specific case.

Should all vectors be transformed into perspective space when working with them in the fragment shader?

I'm implementing the Phong shading model in OpenGL. I need the normal, the viewer direction, and the light direction for each fragment. A lot of demos pass these vectors in world coordinates from the vertex shader. Maybe that's because there isn't much difference between the normalized world-coordinate vectors and the normalized perspective-coordinate vectors?
I'm thinking that for the "true" Phong solution, these vectors should be transformed into the perspective coordinate system in the vertex shader, and the .w divide should then be performed in the fragment shader, because they are not gl_Position. Is this thinking correct?
Edit:
This link seems to suggest that OpenGL's varying qualifiers require the original 'Z' coordinate of the fragment to perform correct perspective interpolation. See https://www.opengl.org/wiki/Type_Qualifier_(GLSL)#Interpolation_qualifiers
So the question I'm wondering about is: can OpenGL derive the Z value from the depth value?
Edit: Yes it can. Getting the true z value from the depth buffer
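For reference, a minimal sketch of that reconstruction (the function name is an assumption), assuming a standard OpenGL perspective projection with near plane n, far plane f, and a [0,1] depth-buffer value:
// Sketch: recover positive view-space depth from a [0,1] depth-buffer value.
float viewSpaceDepth(float depth, float n, float f)
{
    float zNdc = 2.0f * depth - 1.0f;                  // window depth [0,1] -> NDC [-1,1]
    return (2.0f * n * f) / (f + n - zNdc * (f - n));  // invert the hyperbolic depth mapping
}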
First, you cannot forgo the division-by-W step. Why? Because it's hard-wired. It happens as part of OpenGL's fixed-functionality. The gl_Position your last vertex processing step generates will have its W component divided into the other three.
Now, you could try to trick your way around that, by sticking 1.0 in the gl_Position's W, and passing it as some unrelated output. But the W component is a crucial part of perspective-correct interpolation. By faking your transforms this way, you lose that.
And that's kinda important. So unless you intend to re-interpolate all of your per-vertex outputs in the FS and perform perspective-correct interpolation, this just isn't going to work.
Second, post-projective space, when using a perspective projection, is related to world space by a non-linear transformation. This means that parallel lines are no longer parallel. It also means that vector directions don't point at what they used to point at. So your light direction doesn't necessarily point at where your light is.
Oh, and distances are not linear either. So light attenuation no longer makes sense, since the attenuation factors were designed in a space linearly equivalent to world space. And post-projection space is not.
Here's an image to give you an idea of what I'm talking about:
What you see on the left is a rendering in world space. What you see on the right is the same scene as on the left, only viewed in post-projection space.
That is not a reasonable space to do lighting in.
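By contrast, a minimal sketch (with GLM; all names are assumptions) of one common alternative, computing the Phong vectors in world space, where directions and distances still behave the way the lighting model expects:
#include <glm/glm.hpp>

// Sketch: Phong lighting inputs computed in world space.
struct LightingVectors { glm::vec3 N, L, V; float dist; };

LightingVectors worldSpaceVectors(const glm::vec3& fragPosWS, const glm::vec3& normalWS,
                                  const glm::vec3& lightPosWS, const glm::vec3& eyePosWS)
{
    LightingVectors r;
    r.N    = glm::normalize(normalWS);
    r.L    = glm::normalize(lightPosWS - fragPosWS);  // toward the light
    r.V    = glm::normalize(eyePosWS - fragPosWS);    // toward the viewer
    r.dist = glm::length(lightPosWS - fragPosWS);     // usable for attenuation
    return r;
}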

GLSL Shader - change 'camera' position

I'm trying to create some kind of 'camera' object with OpenGL. By changing its values, you can zoom in/out and move the camera around (imagine a 2D world that you are looking down on). This results in the variables center.x, center.y, and center.z.
attribute vec2 in_Position;
attribute vec4 in_Color;
attribute vec2 in_TexCoords;

uniform vec2 uf_Projection;
uniform vec3 center;

varying vec4 var_Color;
varying vec2 var_TexCoords;

void main() {
    var_Color = in_Color;
    var_TexCoords = in_TexCoords;
    gl_Position = vec4(in_Position.x / uf_Projection.x - center.x,
                       in_Position.y / -uf_Projection.y + center.y,
                       0, center.z);
}
I'm using uniform vec3 center to manipulate the camera location. (I feel it should be called an attribute, but I don't know for sure; I only know how to manipulate the uniform values.)
uf_Projection has values of half the screen height and width. This was already the case (I forked someone's code), and I can only assume it's there to make sure the values in gl_Position are normalized?
Entering values for e.g. center.x does change the camera angle correctly. However, it does not match the location at which certain things appear to be rendered.
In addition to the question: how bad is the code?, I'm actually asking these concrete questions:
What is in_Position supposed to be? I've seen several code examples use it, but no-one explains it. It's not explicitly defined either; which values does it take?
What values is gl_Position supposed to take? uf_Projection seems to normalize the values, but when adding values (more than 2000) at center.x, it still works (correctly moved the screen).
Is this the correct way to create a kind of "camera" effect? Or is there a better way? (the idea is that things that aren't on the screen, don't have to get rendered)
The questions you ask can only be answered if one considers the bigger picture. In this case, this means we should have a look at the vertex shader and the typical coordinate transformations which are used for rendering.
The purpose of the vertex shader is to calculate a clip space position for each vertex of the object(s) to be drawn.
In this context, an object is just a sequence of geometrical primitives like points, lines or triangles, each specified by some vertices.
These vertices typically specify some position with respect to some completely user-defined coordinate frame of reference. The space those vertex positions are defined in is typically called object space.
Now the vertex shader's job is to transform from object space to clip space in some mathematical or algorithmic way. Typically, these transformation rules also implicitly or explicitly contain some "virtual camera", so that the object is transformed as if observed by said camera.
However, what rules are used, and how they are described, and which inputs are needed is completely free.
What is in_Position supposed to be? I've seen several code examples use it, but no-one explains it. It's not explicitly defined either; which values does it take?
So in_Position in your case is just some attribute (meaning it is a value which is specified per vertex). The "meaning" of this attribute depends solely on how it is used. Since you are using it as input for some coordinate transformation, we could interpret it as the object-space position of the vertex. In your case, that is a 2D object space. The values it "takes" are completely up to you.
What values is gl_Position supposed to take? uf_Projection seems to normalize the values, but when adding values (more than 2000) at center.x, it still works (correctly moved the screen).
gl_Position is the clip space position of the vertex. Now clip space is a bit hard to describe. The "normalization" you see here has to do with the fact that there is another space, the normalized device coords (NDC). And in the GL, the convention for NDC is such that the viewing volume is represented by the -1 <= x,y,z <=1 cube in NDC.
So if x_ndc is -1, the object will appear at the left border of your viewport, x=1 at the right border, y=-1 at the bottom border, and so on. You also have clipping at z, so objects which are too far away, or too close to the hypothetical camera position, will not be visible either. (Note that the near clipping plane will also exclude everything which is behind the observer.)
The rule to transform from clip space to NDC is to divide the clip space x, y and z values by the clip space w value.
The rationale for this is that clip space represents a so-called projective space, and the clip space coordinates are homogeneous coordinates. It would be far too much to explain the theory behind this in a StackOverflow answer.
But what this means is that by setting gl_Position.w to center.z, the GL will later effectively divide gl_Position.xyz by center.z to reach NDC coordinates. Such a division basically creates the perspective effect that points which are farther away appear closer together.
It is unclear to me if this is exactly what you want. Your current solution has the effect that increasing center.z will increase the object space range that is mapped to the viewing volume, so it does give a zoom effect. Let's consider the x coordinate:
x_ndc = (in_Position.x / uf_Projection.x - center.x) / center.z
= in_Position.x / (uf_Projection.x * center.z) - center.x / center.z
To put it the other way around, the object space x range you can see on the screen will be the inverse transformation applied to x_ndc=-1 and x_ndc=1:
x_obj = (x_ndc + center.x/center.z) * (uf_Projection.x * center.z)
= x_ndc * uf_Projection.x * center.z + center.x * uf_Projection.x
= uf_Projection.x * (x_ndc * center.z + center.x);
So basically, the visible object space range will be center.xy +- uf_Projection.xy * center.z.
Is this the correct way to create a kind of "camera" effect? Or is there a better way? (the idea is that things that aren't on the screen, don't have to get rendered)
Conceptually, the steps are right. Usually, one uses transformation matrices to define the necessary steps. But in your case, directly applying the transformations as a few multiplications and additions is even more efficient (but less flexible).
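For illustration only, a hedged sketch (with GLM; column-major indexing) of the matrix that would reproduce the shader's arithmetic, derived from the expressions above; the function name is an assumption:
#include <glm/glm.hpp>

// Sketch: the shader's per-component arithmetic expressed as a single clip-space matrix.
glm::mat4 cameraMatrix(glm::vec2 uf_Projection, glm::vec3 center)
{
    glm::mat4 m(0.0f);                  // all zeros
    m[0][0] =  1.0f / uf_Projection.x;  // scale x
    m[1][1] = -1.0f / uf_Projection.y;  // scale and flip y
    m[3][0] = -center.x;                // translate x (in clip space)
    m[3][1] =  center.y;                // translate y (in clip space)
    m[3][3] =  center.z;                // ends up in w, i.e. a uniform divide (zoom)
    return m;
}
// Then gl_Position would equal cameraMatrix(uf_Projection, center) * vec4(in_Position, 0.0, 1.0).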
I'm using uniform vec3 center to manipulate the camera location. (I'm feeling it should be called an attribute, but I don't know for sure.)
Actually, using a uniform for this is the right thing to do. Attributes are for values which can change per vertex. Uniforms are for values which are constant during the draw call (hence, are "uniform" for all shader invocations they are accessed by). Your camera specification should be the same for each vertex you are processing. Only the vertex position varies between vertices, so that each vertex ends up at a different point with respect to some fixed camera location (and parameters).
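As a minimal sketch of the application side (the program handle and value names are assumptions), the uniform is set once per draw call:
// Sketch: set the camera uniform once; every vertex of the draw call sees the same value.
GLint centerLoc = glGetUniformLocation(program, "center");
glUseProgram(program);
glUniform3f(centerLoc, cameraX, cameraY, zoom);
// Attributes such as in_Position, by contrast, are sourced per vertex from a vertex buffer.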

point - plane collision without the glutLookAt* functions

As I have understood it, it is recommended to use glTranslate / glRotate in favour of gluLookAt. I am not going to seek the reasons beyond the obvious HW vs SW computation mode, but just go with the wave. However, this is giving me some headaches, as I do not exactly know how to efficiently stop the camera from breaking through walls. I am only interested in point-plane intersections, not AABB or anything else.
So, using glTranslate and glRotate means that the viewpoint stays still (at (0,0,0) for simplicity) while the world revolves around it. This means to me that in order to check for any intersection points, I now need to recompute the world's vertex coordinates (which was not needed with the gluLookAt approach) for every camera movement.
As there is no way of obtaining the needed new coordinates from GPU-land, they need to be calculated in CPU-land by hand. For every camera movement ... :(
It seems there is a need to retain the current rotations around each of the 3 axes, and the same for translations. There is no scaling used in my program. My questions:
1 - is the above reasoning flawed? How?
2 - if not, there has to be a way to avoid such recalculations.
The way I see it (and by looking at http://www.glprogramming.com/red/appendixf.html), it needs one matrix multiplication for translations and another one for rotating (only around the y axis is needed). However, having to compute so many additions / multiplications, and especially the sines / cosines, will certainly kill FPS. There are going to be thousands or even tens of thousands of vertices to compute on. Every frame... all the maths...
After having computed the new coordinates of the world, things seem to be very easy - just see if there is any plane whose 'd' changed sign (from the plane equation ax + by + cz + d = 0). If it did, use a lightweight cross-product approach to test if the point is inside each 'moving' triangle of that plane.
Thanks
edit: I have found out about glGet and I think it is the way to go, but I do not know how to properly use it:
// Retains the current modelview matrix
//glPushMatrix();
glGetFloatv(GL_MODELVIEW_MATRIX, m_vt16CurrentMatrixVerts);
//glPopMatrix();
m_vt16CurrentMatrixVerts is a float[16] which gets filled with 0.f or 8.67453e-13 or something similar. Where am I screwing up?
gluLookAt is a very handy function with absolutely no performance penalty. There is no reason not to use it, and, above all, no "HW vs SW" consideration about that. As Mk12 stated, glRotatef is also done on the CPU. The GPU part is: gl_Position = ProjectionMatrix x ViewMatrix x ModelMatrix x VertexPosition.
"using glTranslates and glRotates means that the viewpoint stays still" -> same thing for gluLookAt
"at (0,0,0) for simplicity" -> not for simplicity, it's a fact. However, this (0,0,0) is in the Camera coordinate system. It makes sense : relatively to the camera, the camera is at the origin...
Now, if you want to prevent the camera from going through the walls, the usual method is to trace a ray from the camera. I suspect this is what you're talking about ("to check for any intersection points"). But there is no need to do this in camera space. You can do this in world space. Here's a comparison :
Tracing rays in camera space : ray always starts from (0,0,0) and goes to (0,0,-1). Geometry must be transformed from Model space to World space, and then to Camera space, which is what annoys you
Tracing rays in world space : ray starts from camera position (in world space) and goes to (eyeCenter - eyePos).normalize(). Geometry must be transformed from Model space to World space.
Note that there is no third option (tracing rays in Model space) which would avoid transforming the geometry from Model space to World space. However, you have a pair of workarounds:
First, your game's world is probably static: the Model matrix is probably always the identity. So transforming its geometry from Model to World space is equivalent to doing nothing at all.
Secondly, for all other objects, you can take the opposite approach. Instead of transforming the entire geometry in one direction, transform only the ray the other way around: take your Model matrix, invert it, and you've got a matrix which goes from world space to model space. Multiply your ray's origin and direction by this matrix: your ray is now in model space. Intersect the normal way. Done.
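A minimal sketch of that second workaround (with GLM; names are assumptions):
#include <glm/glm.hpp>

// Sketch: move the ray into model space instead of moving the geometry into world space.
void rayToModelSpace(const glm::mat4& modelMatrix,
                     const glm::vec3& rayOriginWS, const glm::vec3& rayDirWS,
                     glm::vec3& rayOriginMS, glm::vec3& rayDirMS)
{
    glm::mat4 worldToModel = glm::inverse(modelMatrix);
    rayOriginMS = glm::vec3(worldToModel * glm::vec4(rayOriginWS, 1.0f));  // point: w = 1
    rayDirMS    = glm::vec3(worldToModel * glm::vec4(rayDirWS,    0.0f));  // direction: w = 0
    // Intersect against the untransformed model geometry as usual.
}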
Note that all I've said is standard techniques. No hacks or other weird stuff, just math :)