Almost all the theoretical stuff I read about projection matrices have the first element being 2n/(r-l), but most of the open source implementations I've seen have it as 2n/((t-b)*a), -- which makes sense to me at first since (r-l) should be ((t-b)*a), but when I actually run the numbers, something feels off.
If we have a vertical field of view of 65 degrees, a near plane of .1, and an aspect ratio of 4:3, then I seem to get:
2n/(r-l) = .2 / (tan(65*(4/3)*.5) * .2) = 1.0599
but
2n((t-b)*a) = .2 / (tan(65*.5) * (4/3) * .2) = 1.1773
Why is there a difference between everything I read, and everything I see implemented? I didn't notice until I started implementing the same analytical inverse I see whose first element is (r-l)/2n, which isn't the inverse of these other implementations.
You can't multiply the aspect ratio into the angle. The tangens isn't a linear function. Having 65 degress vertical field of view does not mean that you're going to have 86,67 degrees horizontal FOV with 4:3 aspect, but ~80.69 degrees.
Related
For a university project I am currently working on, I have to create a point cloud by reading images from this dataset. These are basically video frames, and for each frame there is an rgb image along with a corresponding depth image.
I am familiar with the equation z = f*b/d, however I am unable to figure out how the data should be interpreted. Information about the camera that was used to take the video is not provided, and also the project states the following:
"Consider a horizontal/vertical field of view of the camera 48.6/62
degrees respectively"
I have little to no experience in computer vision, and I have never encountered 2 fields of view being used before. Assuming I use the depth from the image as is (for the z coordinate), how would I go about calculating the x and y coordinates of each point in the point cloud?
Here's an example of what the dataset looks like:
Yes, it's unusual to specify multiple fields of view. Given a typical camera (squarish pixels, minimal distortion, view vector through the image center), usually only one field-of-view angle is given -- horizontal or vertical -- because the other can then be derived from the image aspect ratio.
Specifying a horizontal angle of 48.6 and a vertical angle of 62 is particularly surprising here, since the image is a landscape view, where I'd expect the horizontal angle to be greater than the vertical. I'm pretty sure it's a typo:
When swapped, the ratio tan(62 * pi / 360) / tan(48.6 * pi / 360) is the 640 / 480 aspect ratio you'd expect, given the image dimensions and square pixels.
At any rate, a horizontal angle of t is basically saying that the horizontal extent of the image, from left edge to right edge, covers an arc of t radians of the visual field, so the pixel at the center of the right edge lies along a ray rotated t / 2 radians to the right from the central view ray. This "righthand" ray runs from the eye at the origin through the point (tan(t / 2), 0, -1) (assuming a right-handed space with positive x pointing right and positive y pointing up, looking down the negative z axis). To get the point in space at distance d from the eye, you can just normalize a vector along this ray and multiply by it by d. Assuming the samples are linearly distributed across a flat sensor, I'd expect that for a given pixel at (x, y) you could calculate its corresponding ray point with:
p = (dx * tan(hfov / 2), dy * tan(vfov / 2), -1)
where dx is 2 * (x - width / 2) / width, dy is 2 * (y - height / 2) / height, and hfov and vfov are the field-of-view angles in radians.
Note that the documentation that accompanies your sample data links to a Matlab file that shows the recommended process for converting the depth images into a point cloud and distance field. In it, the fields of view are baked with the image dimensions to a constant factor of 570.3, which can be used to recover the field of view angles that the authors believed their recording device had:
atan(320 / 570.3) * (360 / pi / 2) * 2 = 58.6
which is indeed pretty close to the 62 degrees you were given.
From the Matlab code, it looks like the value in the image is not distance from a given point to the eye, but instead distance along the view vector to a perpendicular plane containing the given point ("depth", or basically "z"), so the authors can just multiply it directly with the vector (dx * tan(hfov / 2), dy * tan(vfov / 2), -1) to get the point in space, skipping the normalization step mentioned earlier.
I'm actually realising a C++ raytracer and I'm confronting a classic problem on raytracing. When putting a high vertical FOV, the shapes get a bigger distortion the nearer they are from the edges.
I know why this distortion happens, but I don't know to resolve it (of course, reducing the FOV is an option but I think that there is something to change in my code). I've been browsing different computing forums but didn't find any way to resolve it.
Here's a screenshot to illustrate my problem.
I think that the problem is that the view plane where I'm projecting my rays isn't actually flat, but I don't know how to resolve this. If you have any tip to resolve it, I'm open to suggestions.
I'm on a right-handed oriented system.
The Camera system vectors, Direction vector and Light vector are normalized.
If you need some code to check something, I'll put it in an answer with the part you ask.
code of ray generation :
// PixelScreenX = (pixelx + 0.5) / imageWidth
// PixelCameraX = (2 ∗ PixelScreenx − 1) ∗
// ImageAspectRatio ∗ tan(fov / 2)
float x = (2 * (i + 0.5f) / (float)options.width - 1) *
options.imageAspectRatio * options.scale;
// PixelScreeny = (pixely + 0.5) / imageHeight
// PixelCameraY = (1 − 2 ∗ PixelScreeny) ∗ tan(fov / 2)
float y = (1 - 2 * (j + 0.5f) / (float)options.height) * options.scale;
Vec3f dir;
options.cameraToWorld.multDirMatrix(Vec3f(x, y, -1), dir);
dir.normalize();
newColor = _renderer->castRay(options.orig, dir, objects, options);
There is nothing wrong with your projection. It produces exactly what it should produce.
Let's consider the following figure to see how all the quantities interact:
We have the camera position, the field of view (as an angle) and the image plane. The image plane is the plane that you are projecting your 3D scene onto. Essentially, this represents your screen. When you are viewing your rendering on the screen, your eye serves as the camera. It sees the projected image and if it is positioned at the right point, it will see exactly what it would see if the actual 3D scene was there (neglecting effects like depth of field etc.)
Obviously, you cannot modify your screen (you could change the window size but let's stick with a constant-size image plane). Then, there is a direct relationship between the camera's position and the field of view. As the field of view increases, the camera moves closer and closer to the image plane. Like this:
Thus, if you are increasing your field of view in code, you need to move your eye closer to the screen to get the correct perception. You can actually try that with your image. Move your eye very close to the screen (I'm talking about 3cm). If you look at the outer spheres now, they actually look like real balls again.
In summary, the field of view should approximately match the geometry of the viewing setup. For a given screen size and average watch distance, this can be calculated easily. If your viewing setup does not match your assumptions in code, 3D perception will break down.
NOTICE: I have edited the question below which is more relevant to my real issue than the text right below, you can skip this if you but I'll leave it here for historic reasons.
To see if I get this right, a float in C is the same as a value in radians right? I mean, 360º = 6.28318531 radians and I just noticed on my OpenGL app that a full rotation goes from 0.0 to 6.28, which seems to add up correctly. I just want to make sure I got that right.
I'm using a float (let's call it anglePitch) from 0.0 to 360.0 (it's easier to read in degrees and avoids casting int to float all the time) and all the code I see on the web uses some kind of DEG2RAD() macro which is defined as DEG2RAD 3.141593f / 180. In the end it would be something like this:
anglePitch += direction * 1; // direction will be 1 or -1
refY = tan(anglePitch * DEG2RAD);
This really does a full rotation but that full rotation will be when anglePitch = 180 and anglePitch * DEG2RAD = 3.14, but a full rotation should be 360|6.28. If I change the macro to any of the following:
#define DEG2RAD 3.141593f / 360
#define DEG2RAD 3.141593f / 2 / 180
It works as expected, a full rotation will happen when anglePitch = 360.
What am I missing here and what should I use to properly convert angles to radians/floats?
IMPORTANT EDIT (REAL QUESTION):
I understand now the code I see everywhere on the web about DEG2RAD, I'm just too dumb at math (yeah, I know, it's important when working with this kind of stuff). So I'm going to rephrase my question:
I have now added this to my code:
#define PI 3.141592654f
#define DEG2RAD(d) (d * PI / 180)
Now, when working the pitch/yawn angles in degrees, which are floats, once again, to avoid casting all the time, I just use the DEG2RAD macro and the degree value will be correctly converted to radians. These values will be passed to sin/cos/tan functions and will return the proper values to be used in GLUT camera.
Now the real question, where I was really confused before but couldn't explain myself better:
angleYaw += direction * ROTATE_SPEED;
refX = sin(DEG2RAD(angleYaw));
refZ = -cos(DEG2RAD(angleYaw));
This code will be executed when I press the LEFT/RIGHT keys and the camera will rotate in the Y axis accordingly. A full rotation goes from 0º to 360º.
anglePitch += direction * ROTATE_SPEED;
refY = tan(DEG2RAD(anglePitch));
This is similar code and will be executed when I press the UP/DOWN keys and the camera will rotate in the X axis. But in this situation, a full rotation goes from 0º to 180º degrees and that's what's really confusing me. I'm sure it has something to do with the tangent function but I can't get my head around it.
Is there way I could use sin/cos (as I do in the yawn code) to achieve the same rotation? What is the right way, the most simple code I can add/fix and what makes more sense to create a full pitch rotation from 0º to 360º?
360° = 2 * Pi, Pi = 3.141593…
Radians are defined by the arc length of an angle along a circle of radius 1. The circumfence of a circle is 2*r*Pi, so one full turn on a unit circle has an arc length of 2*Pi = 6.28…
The measure of angles in degrees stem from the fact, that by aligning 6 equilateral triangles you span a full turn. So we have 6 triangles, each making up a 6th of the turn, so the old babylonians divided a circle into pieces of 1/(6*6) = 1/36, and to further refine it this was subdivded by 10. That's why we ended up with 360° in a full circle. This number is arbitrarily choosen, though.
So if there are 2*Pi/360° this makes Pi/180° = 3.141593…/180° which is the conversion factor from degrees to radians. The reciprocal, 180°/Pi = 180/3.141593…
Why on earth the old OpenGL function glRotate and GLU's gluPerspective used degrees instead of radians I cannot fathom. From a mathematical point of view only radians make sense. Which I think is most beautifully demonstrated by Euler's equation
e^(i*Pi) - 1 = 0
There you have it, all the important numbers of mathematics in one single equation. What's this got to do with angles? Well:
e^(i*alpha) = cos(alpha) + i * sin(alpha), alpha is in radians!
EDIT, with respect to modified question:
Your angles being floats is all fine. Why would you even think degress being integers I cannot understand. Normally you don't have to define PI yourself, it comes predefined in math.h, usually called M_PI, M_2PI, and M_PI2 for Pi, 2*Pi and Pi/2. You also should change your macro, the way it's written now can create strange effects.
#define DEG2RAD(d) ( (d) * M_PI/180. )
GLUT has no camera at all. GLUT is a rather dumb OpenGL framework I recommend not using. You probably refer to gluLookAt.
Those obstacles out of the way let's see what you're doing there. Remember that trigonometric functions operate on the unit circle. Let the angle 0 point towards the right and angles increment counterclockwise. Then sin(a) is defined as the amount of rightwards and cos(a) and the amount of forwards to reach the point at angle a on the unit circle. This is what the refX and refZ are getting assigned to.
refY however makes no sense written that way. tan = sin/cos so as we approach n*pi/2 (i.e. 90°) it diverges to +/- infinity. At least it explains your pi/180° cyclic range, because that's the period of tan.
I was first thinking that tan may have been used to normalize the direction vector, but didn't make sense either. The factor would have been 1./sqrt(sin²(Pitch) + 1)
I double checked: using tan there does the right thing.
EDIT2: I don't see where your problem is: The pitch angle is -90° to +90°, which makes perfect sense. Go get yourself a globe (of the earth): The east-west coordinates (longitude) go from -180° to +180°, the south-north coordinate (latitude) goes -90° to +90°. Think about it: Any larger coordinate range would create ambiguities.
The only good suggestion I offer you is: Grab some math text book and bend your mind around spherical coordinates! Sorry to tell you that way. Whatever you have works perfectly fine, you just need to understand sperical geometry.
You're using the terms Yaw and Pitch. Those are normally used in Euler angles. Now unfortunately Euler angles, which compelling at first, cause serious trouble later on (like gimbal lock). You should not use them at all. It may also be a good idea if you used some pencil/sticks/whatever to decompose the rotations you're intending with your hands to understand their mechanics.
And by the way: There are also non-integer degrees. Just hop over to http://maps.google.com to see them in action (just select some place and let http://maps.google.com give you the link to it).
'float' is a type, like int or double. radians and degrees are units of measure, both of which can be represented with any precision you want. i.e., there's no reason you can't have 22.5 degrees, and keep that value in a float.
a full rotation in radians is 2*pi, about 6.283, whereas a full rotation in degrees is 360. You can convert between them by dividing out the starting unit's full circle, then multiplying by the desired unit's full circle.
for example, to get from 90 degrees to radians, first divide out the degrees. 90 over 360 is 0.25 (note this value is in 'revolutions'). Now multiply that 0.25 by 6.283 to arrive at 1.571 radians.
follow up
the reason you're seeing your pitch cycle twice as fast as it should is precisely because you're using tan(pitch) to compute the Y component. What you should have is that the Y component depends on sin(pitch). i.e., try changing
refY = tan(DEG2RAD(anglePitch));
to
refY = sin(DEG2RAD(anglePitch));
a technical detail: the numbers that go into the look matrix should all be in the range of -1 to +1, and if you were to inspect the values you're feeding to refY, and run your pitch outside of -45 to +45 degrees, you'd see the problem; tan() runs off to infinity at +/-90 degrees.
also, note that casting a value from int to float in no sense converts between degrees and radians. casting just gives you the nearest equivalent value in the new storage type. for example, if you cast the integer 22 to floating point, you get 22.0f, whereas if you cast 33.3333f to type int, you'd be left with 33. when working with angles, you really should just stick with floating point, unless you're constrained by working with an embedded processor or something. this is especially important with radians, where whole number increments represent leaps of (about) 57.3 degrees.
Assuming that your ref components are intended to be used as your look-at vector, I think what you need is
refY = sin(DEG2RAD(anglePitch));
XZfactor = cos(DEG2RAD(anglePitch));
refX = XZfactor*sin(DEG2RAD(angleYaw));
refZ = -XZfactor*cos(DEG2RAD(angleYaw));
This one has been eating up my entire night, and I'm finally throwing up my hands for some assistance. Basically, it's fairly straightforward to calculate the Pitch and Yaw from the View Matrix right after you do a camera update:
D3DXMatrixLookAtLH(&m_View, &sCam.pos, &vLookAt, &sCam.up);
pDev->SetTransform(D3DTS_VIEW, &m_View);
// Set the camera axes from the view matrix
sCam.right.x = m_View._11;
sCam.right.y = m_View._21;
sCam.right.z = m_View._31;
sCam.up.x = m_View._12;
sCam.up.y = m_View._22;
sCam.up.z = m_View._32;
sCam.look.x = m_View._13;
sCam.look.y = m_View._23;
sCam.look.z = m_View._33;
// Calculate yaw and pitch and roll
float lookLengthOnXZ = sqrtf( sCam.look.z^2 + sCam.look.x^2 );
fPitch = atan2f( sCam.look.y, lookLengthOnXZ );
fYaw = atan2f( sCam.look.x, sCam.look.z );
So my problem is: there must be some way to obtain the Roll in radians similar to how the Pitch and Yaw are being obtained at the end of the code there. I've tried several dozen algorithms that seemed to make sense to me, but none of them gave quite the desired result. I realize that many developers don't track these values, however I am writing some re-centering routines and setting clipping based on the values, and manual tracking breaks down as you apply mixed rotations to the View Matrix. So, does anyone have the formula for getting the Roll in radians (or degrees, I don't care) from the View Matrix?
Woohoo, got it! Thank you stacker for the link to the Euler Angles entry on Wikipedia, the gamma formula at the end was indeed the correct solution! So, to calculate Roll in radians from a "vertical plane" I'm using the following line of C++ code:
fRoll = atan2f( sCam.up.y, sCam.right.y ) - D3DX_PI/2;
As Euler pointed out, you're looking for the arc tangent of the Y (up) matrix's Z value against the X (right) matrix's Z value - though in this case I tried the Z values and they did not yield the desired result so I tried the Y values on a whim and I got a value which was off by +PI/2 radians (right angle to the desired value), which is easily compensated for. Finally I have a gryscopically accurate and stable 3D camera platform with self-correcting balance. =)
It seems the Wikipedia article about Euler angles contains the formula you're lookin for at the end.
I'm making a software rasterizer for school, and I'm using an unusual rendering method instead of traditional matrix calculations. It's based on a pinhole camera. I have a few points in 3D space, and I convert them to 2D screen coordinates by taking the distance between it and the camera and normalizing it
Vec3 ray_to_camera = (a_Point - plane_pos).Normalize();
This gives me a directional vector towards the camera. I then turn that direction into a ray by placing the ray's origin on the camera and performing a ray-plane intersection with a plane slightly behind the camera.
Vec3 plane_pos = m_Position + (m_Direction * m_ScreenDistance);
float dot = ray_to_camera.GetDotProduct(m_Direction);
if (dot < 0)
{
float time = (-m_ScreenDistance - plane_pos.GetDotProduct(m_Direction)) / dot;
// if time is smaller than 0 the ray is either parallel to the plane or misses it
if (time >= 0)
{
// retrieving the actual intersection point
a_Point -= (m_Direction * ((a_Point - plane_pos).GetDotProduct(m_Direction)));
// subtracting the plane origin from the intersection point
// puts the point at world origin (0, 0, 0)
Vec3 sub = a_Point - plane_pos;
// the axes are calculated by saying the directional vector of the camera
// is the new z axis
projected.x = sub.GetDotProduct(m_Axis[0]);
projected.y = sub.GetDotProduct(m_Axis[1]);
}
}
This works wonderful, but I'm wondering: can the algorithm be made any faster? Right now, for every triangle in the scene, I have to calculate three normals.
float length = 1 / sqrtf(GetSquaredLength());
x *= length;
y *= length;
z *= length;
Even with a fast reciprocal square root approximation (1 / sqrt(x)) that's going to be very demanding.
My questions are thus:
Is there a good way to approximate the three normals?
What is this rendering technique called?
Can the three vertex points be approximated using the normal of the centroid? ((v0 + v1 + v2) / 3)
Thanks in advance.
P.S. "You will build a fully functional software rasterizer in the next seven weeks with the help of an expert in this field. Begin." I ADORE my education. :)
EDIT:
Vec2 projected;
// the plane is behind the camera
Vec3 plane_pos = m_Position + (m_Direction * m_ScreenDistance);
float scale = m_ScreenDistance / (m_Position - plane_pos).GetSquaredLength();
// times -100 because of the squared length instead of the length
// (which would involve a squared root)
projected.x = a_Point.GetDotProduct(m_Axis[0]).x * scale * -100;
projected.y = a_Point.GetDotProduct(m_Axis[1]).y * scale * -100;
return projected;
This returns the correct results, however the model is now independent of the camera position. :(
It's a lot shorter and faster though!
This is called a ray-tracer - a rather typical assignment for a first computer graphics course* - and you can find a lot of interesting implementation details on the classic Foley/Van Damm textbook (Computer Graphics Principes and Practice). I strongly suggest you buy/borrow this textbook and read it carefully.
*Just wait until you get started on reflections and refraction... Now the fun begins!
It is difficult to understand exactly what your code doing, because it seems to be performing a lot of redundant operations! However, if I understand what you say you're trying to do, you are:
finding the vector from the pinhole to the point
normalizing it
projecting backwards along the normalized vector to an "image plane" (behind the pinhole, natch!)
finding the vector to this point from a central point on the image plane
doing dot products on the result with "axis" vectors to find the x and y screen coordinates
If the above description represents your intentions, then the normalization should be redundant -- you shouldn't have to do it at all! If removing the normalization gives you bad results, you are probably doing something slightly different from your stated plan... in other words, it seems likely that you have confused yourself along with me, and that the normalization step is "fixing" it to the extent that it looks good enough in your test cases, even though it probably still isn't doing quite what you want it to.
The overall problem, I think, is that your code is massively overengineered: you are writing all your high-level vector algebra as code to be executed in the inner loop. The way to optimize this is to work out all your vector algebra on paper, find the simplest expression possible for your inner loop, and precompute all the necessary constants for this at camera setup time. The pinhole camera specs would only be the inputs to the camera setup routine.
Unfortunately, unless I miss my guess, this should reduce your pinhole camera to the traditional, boring old matrix calculations. (ray tracing does make it easy to do cool nonstandard camera stuff -- but what you describe should end up perfectly standard...)
Your code is a little unclear to me (plane_pos?), but it does seem that you could cut out some unnecessary calculation.
Instead of normalizing the ray (scaling it to length 1), why not scale it so that the z component is equal to the distance from the camera to the plane-- in fact, scale x and y by this factor, you don't need z.
float scale = distance_to_plane/z;
x *= scale;
y *= scale;
This will give the x and y coordinates on the plane, no sqrt(), no dot products.
Well, off the bat, you can calculate normals for every triangle when your program starts up. Then when you're actually running, you just have to access the normals. This sort of startup calculation to save costs later tends to happen a lot in graphics. This is why we have large loading screens in a lot of our video games!