I'm currently working on a STL file viewer which provides two types of views.
A perspective view (it's working perfectly):
and a parallel view (light calculation having trouble on this one but it's not the subject):
As you can see the parallel view is deformed. Here is how I calculate the view matrix:
float distance = glm::distance(focus,transform.GetPosition())* (ScreenSize.cx/ScreenSize.cy);
float distanceY = glm::distance(focus,transform.GetPosition())* (ScreenSize.cy/ScreenSize.cx);
return glm::ortho(-distance ,distance ,-distanceY ,distanceY, 0.00001f, 10000.0f);
I must explain a bit this code:
focus is the central point of the model I'm viewing (0 0 0)
transform.GetPosition() return the actual position of the camera
glm::distance(...) return the length between both vector ( the camera vector can zoom or dezoom to the focus point)
My idea to avoid distortion was to get the screenSize of my actual ratio of my OpenGL context (stored in ScreenSize) and multiply my distance with it in order to keep the rabbit unchanged whatever the screensize is. But it's not working, the rabbit are still stretched or crushed.
how could I prevent my 3D model to get distorted whatever the screensize is on a parallel view ?
Delete one of the screen size bits? Right now, if your window is 3/4 as tall as it is wide, then you're making the height 3/4 as tall (so far so good) but you're also making the width 4/3 as wide. You need to do one or the other, but not both.
Most computer games seem to keep the height the same - so the vertical view distance is always 1 (for example) - and adjust the width:
float distance = glm::distance(focus,transform.GetPosition())*(ScreenSize.cx/ScreenSize.cy);
// delete this part vvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
float distanceY = glm::distance(focus,transform.GetPosition())*(ScreenSize.cy/ScreenSize.cx);
For a university project I am currently working on, I have to create a point cloud by reading images from this dataset. These are basically video frames, and for each frame there is an rgb image along with a corresponding depth image.
I am familiar with the equation z = f*b/d, however I am unable to figure out how the data should be interpreted. Information about the camera that was used to take the video is not provided, and also the project states the following:
"Consider a horizontal/vertical field of view of the camera 48.6/62
degrees respectively"
I have little to no experience in computer vision, and I have never encountered 2 fields of view being used before. Assuming I use the depth from the image as is (for the z coordinate), how would I go about calculating the x and y coordinates of each point in the point cloud?
Here's an example of what the dataset looks like:
Yes, it's unusual to specify multiple fields of view. Given a typical camera (squarish pixels, minimal distortion, view vector through the image center), usually only one field-of-view angle is given -- horizontal or vertical -- because the other can then be derived from the image aspect ratio.
Specifying a horizontal angle of 48.6 and a vertical angle of 62 is particularly surprising here, since the image is a landscape view, where I'd expect the horizontal angle to be greater than the vertical. I'm pretty sure it's a typo:
When swapped, the ratio tan(62 * pi / 360) / tan(48.6 * pi / 360) is the 640 / 480 aspect ratio you'd expect, given the image dimensions and square pixels.
At any rate, a horizontal angle of t is basically saying that the horizontal extent of the image, from left edge to right edge, covers an arc of t radians of the visual field, so the pixel at the center of the right edge lies along a ray rotated t / 2 radians to the right from the central view ray. This "righthand" ray runs from the eye at the origin through the point (tan(t / 2), 0, -1) (assuming a right-handed space with positive x pointing right and positive y pointing up, looking down the negative z axis). To get the point in space at distance d from the eye, you can just normalize a vector along this ray and multiply by it by d. Assuming the samples are linearly distributed across a flat sensor, I'd expect that for a given pixel at (x, y) you could calculate its corresponding ray point with:
p = (dx * tan(hfov / 2), dy * tan(vfov / 2), -1)
where dx is 2 * (x - width / 2) / width, dy is 2 * (y - height / 2) / height, and hfov and vfov are the field-of-view angles in radians.
Note that the documentation that accompanies your sample data links to a Matlab file that shows the recommended process for converting the depth images into a point cloud and distance field. In it, the fields of view are baked with the image dimensions to a constant factor of 570.3, which can be used to recover the field of view angles that the authors believed their recording device had:
atan(320 / 570.3) * (360 / pi / 2) * 2 = 58.6
which is indeed pretty close to the 62 degrees you were given.
From the Matlab code, it looks like the value in the image is not distance from a given point to the eye, but instead distance along the view vector to a perpendicular plane containing the given point ("depth", or basically "z"), so the authors can just multiply it directly with the vector (dx * tan(hfov / 2), dy * tan(vfov / 2), -1) to get the point in space, skipping the normalization step mentioned earlier.
I am implementing a 3D engine for spatial visualisation, and am writing a camera with the following navigation features:
Rotate the camera (ie, analogous to rotating your head)
Rotate around an arbitrary 3D point (a point in space, which is probably not in the center of the screen; the camera needs to rotate around this keeping the same relative look direction, ie the look direction changes too. This does not look directly at the chosen rotation point)
Pan in the camera's plane (so move up/down or left/right in the plane orthogonal to the camera's look vector)
The camera is not supposed to roll - that is, 'up' remains up. Because of this I represent the camera with a location and two angles, rotations around the X and Y axes (Z would be roll.) The view matrix is then recalculated using the camera location and these two angles. This works great for pan and rotating the eye, but not for rotating around an arbitrary point. Instead I get the following behaviour:
The eye itself apparently moving further up or down than it should
The eye not moving up or down at all when m_dRotationX is 0 or pi. (Gimbal lock? How can I avoid this?)
The eye's rotation being inverted (changing the rotation makes it look further up when it should look further down, down when it should look further up) when m_dRotationX is between pi and 2pi.
(a) What is causing this 'drift' in rotation?
This may be gimbal lock. If so, the standard answer to this is 'use quaternions to represent rotation', said many times here on SO (1, 2, 3 for example), but unfortunately without concrete details (example. This is the best answer I've found so far; it's rare.) I've struggled to implemented a camera using quaternions combining the above two types of rotations. I am, in fact, building a quaternion using the two rotations, but a commenter below said there was no reason - it's fine to immediately build the matrix.
This occurs when changing the X and Y rotations (which represent the camera look direction) when rotating around a point, but does not occur simply when directly changing the rotations, i.e. rotating the camera around itself. To me, this doesn't make sense. It's the same values.
(b) Would a different approach (quaternions, for example) be better for this camera? If so, how do I implement all three camera navigation features above?
If a different approach would be better, then please consider providing a concrete implemented example of that approach. (I am using DirectX9 and C++, and the D3DX* library the SDK provides.) In this second case, I will add and award a bounty in a couple of days when I can add one to the question. This might sound like I'm jumping the gun, but I'm low on time and need to implement or solve this quickly (this is a commercial project with a tight deadline.) A detailed answer will also improve the SO archives, because most camera answers I've read so far are light on code.
Thanks for your help :)
Some clarifications
Thanks for the comments and answer so far! I'll try to clarify a few things about the problem:
The view matrix is recalculated from the camera position and the two angles whenever one of those things changes. The matrix itself is never accumulated (i.e. updated) - it is recalculated afresh. However, the camera position and the two angle variables are accumulated (whenever the mouse moves, for example, one or both of the angles will have a small amount added or subtracted, based on the number of pixels the mouse moved up-down and/or left-right onscreen.)
Commenter JCooper states I'm suffering from gimbal lock, and I need to:
add another rotation onto your transform that rotates the eyePos to be
completely in the y-z plane before you apply the transformation, and
then another rotation that moves it back afterward. Rotate around the
y axis by the following angle immediately before and after applying
the yaw-pitch-roll matrix (one of the angles will need to be negated;
trying it out is the fastest way to decide which).
double fixAngle = atan2(oEyeTranslated.z,oEyeTranslated.x);
Unfortunately, when implementing this as described, my eye shoots off above the scene at a very fast rate due to one of the rotations. I'm sure my code is simply a bad implementation of this description, but I still need something more concrete. In general, I find unspecific text descriptions of algorithms are less useful than commented, explained implementations. I am adding a bounty for a concrete, working example that integrates with the code below (i.e. with the other navigation methods, too.) This is because I would like to understand the solution, as well as have something that works, and because I need to implement something that works quickly since I am on a tight deadline.
Please, if you answer with a text description of the algorithm, make sure it is detailed enough to implement ('Rotate around Y, then transform, then rotate back' may make sense to you but lacks the details to know what you mean. Good answers are clear, signposted, will allow others to understand even with a different basis, are 'solid weatherproof information boards.')
In turn, I have tried to be clear describing the problem, and if I can make it clearer please let me know.
My current code
To implement the above three navigation features, in a mouse move event moving based on the pixels the cursor has moved:
// Adjust this to change rotation speed when dragging (units are radians per pixel mouse moves)
// This is both rotating the eye, and rotating around a point
static const double dRotatePixelScale = 0.001;
// Adjust this to change pan speed (units are meters per pixel mouse moves)
static const double dPanPixelScale = 0.15;
switch (m_eCurrentNavigation) {
case ENavigation::eRotatePoint: {
// Rotating around m_oRotateAroundPos
const double dX = (double)(m_oLastMousePos.x - roMousePos.x) * dRotatePixelScale * D3DX_PI;
const double dY = (double)(m_oLastMousePos.y - roMousePos.y) * dRotatePixelScale * D3DX_PI;
// To rotate around the point, translate so the point is at (0,0,0) (this makes the point
// the origin so the eye rotates around the origin), rotate, translate back
// However, the camera is represented as an eye plus two (X and Y) rotation angles
// This needs to keep the same relative rotation.
// Rotate the eye around the point
const D3DXVECTOR3 oEyeTranslated = m_oEyePos - m_oRotateAroundPos;
D3DXMATRIX oRotationMatrix;
D3DXMatrixRotationYawPitchRoll(&oRotationMatrix, dX, dY, 0.0);
D3DXVECTOR4 oEyeRotated;
D3DXVec3Transform(&oEyeRotated, &oEyeTranslated, &oRotationMatrix);
m_oEyePos = D3DXVECTOR3(oEyeRotated.x, oEyeRotated.y, oEyeRotated.z) + m_oRotateAroundPos;
// Increment rotation to keep the same relative look angles
RotateXAxis(dX);
RotateYAxis(dY);
break;
}
case ENavigation::ePanPlane: {
const double dX = (double)(m_oLastMousePos.x - roMousePos.x) * dPanPixelScale;
const double dY = (double)(m_oLastMousePos.y - roMousePos.y) * dPanPixelScale;
m_oEyePos += GetXAxis() * dX; // GetX/YAxis reads from the view matrix, so increments correctly
m_oEyePos += GetYAxis() * -dY; // Inverted compared to screen coords
break;
}
case ENavigation::eRotateEye: {
// Rotate in radians around local (camera not scene space) X and Y axes
const double dX = (double)(m_oLastMousePos.x - roMousePos.x) * dRotatePixelScale * D3DX_PI;
const double dY = (double)(m_oLastMousePos.y - roMousePos.y) * dRotatePixelScale * D3DX_PI;
RotateXAxis(dX);
RotateYAxis(dY);
break;
}
The RotateXAxis and RotateYAxis methods are very simple:
void Camera::RotateXAxis(const double dRadians) {
m_dRotationX += dRadians;
m_dRotationX = fmod(m_dRotationX, 2 * D3DX_PI); // Keep in valid circular range
}
void Camera::RotateYAxis(const double dRadians) {
m_dRotationY += dRadians;
// Limit it so you don't rotate around when looking up and down
m_dRotationY = std::min(m_dRotationY, D3DX_PI * 0.49); // Almost fully up
m_dRotationY = std::max(m_dRotationY, D3DX_PI * -0.49); // Almost fully down
}
And to generate the view matrix from this:
void Camera::UpdateView() const {
const D3DXVECTOR3 oEyePos(GetEyePos());
const D3DXVECTOR3 oUpVector(0.0f, 1.0f, 0.0f); // Keep up "up", always.
// Generate a rotation matrix via a quaternion
D3DXQUATERNION oRotationQuat;
D3DXQuaternionRotationYawPitchRoll(&oRotationQuat, m_dRotationX, m_dRotationY, 0.0);
D3DXMATRIX oRotationMatrix;
D3DXMatrixRotationQuaternion(&oRotationMatrix, &oRotationQuat);
// Generate view matrix by looking at a point 1 unit ahead of the eye (transformed by the above
// rotation)
D3DXVECTOR3 oForward(0.0, 0.0, 1.0);
D3DXVECTOR4 oForward4;
D3DXVec3Transform(&oForward4, &oForward, &oRotationMatrix);
D3DXVECTOR3 oTarget = oEyePos + D3DXVECTOR3(oForward4.x, oForward4.y, oForward4.z); // eye pos + look vector = look target position
D3DXMatrixLookAtLH(&m_oViewMatrix, &oEyePos, &oTarget, &oUpVector);
}
It seems to me that "Roll" shouldn't be possible given the way you form your view matrix. Regardless of all the other code (some of which does look a little funny), the call D3DXMatrixLookAtLH(&m_oViewMatrix, &oEyePos, &oTarget, &oUpVector); should create a matrix without roll when given [0,1,0] as an 'Up' vector unless oTarget-oEyePos happens to be parallel to the up vector. This doesn't seem to be the case since you're restricting m_dRotationY to be within (-.49pi,+.49pi).
Perhaps you can clarify how you know that 'roll' is happening. Do you have a ground plane and the horizon line of that ground plane is departing from horizontal?
As an aside, in UpdateView, the D3DXQuaternionRotationYawPitchRoll seems completely unnecessary since you immediately turn around and change it into a matrix. Just use D3DXMatrixRotationYawPitchRoll as you did in the mouse event. Quaternions are used in cameras because they're a convenient way to accumulate rotations happening in eye coordinates. Since you're only using two axes of rotation in a strict order, your way of accumulating angles should be fine. The vector transformation of (0,0,1) isn't really necessary either. The oRotationMatrix should already have those values in the (_31,_32,_33) entries.
Update
Given that it's not roll, here's the problem: you create a rotation matrix to move the eye in world coordinates, but you want the pitch to happen in camera coordinates. Since roll isn't allowed and yaw is performed last, yaw is always the same in both the world and camera frames of reference. Consider the images below:
Your code works fine for local pitch and yaw because those are accomplished in camera coordinates.
But when you rotate around a reference point, you are creating a rotation matrix that is in world coordinates and using that to rotate the camera center. This works okay if the camera's coordinate system happens to line up with the world's. However, if you don't check to see if you're up against the pitch limit before you rotate the camera position, you will get crazy behavior when you hit that limit. The camera will suddenly start to skate around the world--still 'rotating' around the reference point, but no longer changing orientation.
If the camera's axes don't line up with the world's, strange things will happen. In the extreme case, the camera won't move at all because you're trying to make it roll.
The above is what would normally happen, but since you handle the camera orientation separately, the camera doesn't actually roll.
Instead, it stays upright, but you get strange translation going on.
One way to handle this would be to (1)always put the camera into a canonical position and orientation relative to the reference point, (2)make your rotation, and then (3)put it back when you're done (e.g., similar to the way that you translate the reference point to the origin, apply the Yaw-Pitch rotation, and then translate back). Thinking more about it, however, this probably isn't the best way to go.
Update 2
I think that Generic Human's answer is probably the best. The question remains as to how much pitch should be applied if the rotation is off-axis, but for now, we'll ignore that. Maybe it'll give you acceptable results.
The essence of the answer is this: Before mouse movement, your camera is at c1 = m_oEyePos and being oriented by M1 = D3DXMatrixRotationYawPitchRoll(&M_1,m_dRotationX,m_dRotationY,0). Consider the reference point a = m_oRotateAroundPos. From the point of view of the camera, this point is a'=M1(a-c1).
You want to change the orientation of the camera to M2 = D3DXMatrixRotationYawPitchRoll(&M_2,m_dRotationX+dX,m_dRotationY+dY,0). [Important: Since you won't allow m_dRotationY to fall outside of a specific range, you should make sure that dY doesn't violate that constraint.] As the camera changes orientation, you also want its position to rotate around a to a new point c2. This means that a won't change from the perspective of the camera. I.e., M1(a-c1)==M2(a-c2).
So we solve for c2 (remember that the transpose of a rotation matrix is the same as the inverse):
M2TM1(a-c1)==(a-c2) =>
-M2TM1(a-c1)+a==c2
Now if we look at this as a transformation being applied to c1, then we can see that it is first negated, then translated by a, then rotated by M1, then rotated by M2T, negated again, and then translated by a again. These are transformations that graphics libraries are good at and they can all be squished into a single transformation matrix.
#Generic Human deserves credit for the answer, but here's code for it. Of course, you need to implement the function to validate a change in pitch before it's applied, but that's simple. This code probably has a couple typos since I haven't tried to compile:
case ENavigation::eRotatePoint: {
const double dX = (double)(m_oLastMousePos.x - roMousePos.x) * dRotatePixelScale * D3DX_PI;
double dY = (double)(m_oLastMousePos.y - roMousePos.y) * dRotatePixelScale * D3DX_PI;
dY = validatePitch(dY); // dY needs to be kept within bounds so that m_dRotationY is within bounds
D3DXMATRIX oRotationMatrix1; // The camera orientation before mouse-change
D3DXMatrixRotationYawPitchRoll(&oRotationMatrix1, m_dRotationX, m_dRotationY, 0.0);
D3DXMATRIX oRotationMatrix2; // The camera orientation after mouse-change
D3DXMatrixRotationYawPitchRoll(&oRotationMatrix2, m_dRotationX + dX, m_dRotationY + dY, 0.0);
D3DXMATRIX oRotationMatrix2Inv; // The inverse of the orientation
D3DXMatrixTranspose(&oRotationMatrix2Inv,&oRotationMatrix2); // Transpose is the same in this case
D3DXMATRIX oScaleMatrix; // Negative scaling matrix for negating the translation
D3DXMatrixScaling(&oScaleMatrix,-1,-1,-1);
D3DXMATRIX oTranslationMatrix; // Translation by the reference point
D3DXMatrixTranslation(&oTranslationMatrix,
m_oRotateAroundPos.x,m_oRotateAroundPos.y,m_oRotateAroundPos.z);
D3DXMATRIX oTransformMatrix; // The full transform for the eyePos.
// We assume the matrix multiply protects against variable aliasing
D3DXMatrixMultiply(&oTransformMatrix,&oScaleMatrix,&oTranslationMatrix);
D3DXMatrixMultiply(&oTransformMatrix,&oTransformMatrix,&oRotationMatrix1);
D3DXMatrixMultiply(&oTransformMatrix,&oTransformMatrix,&oRotationMatrix2Inv);
D3DXMatrixMultiply(&oTransformMatrix,&oTransformMatrix,&oScaleMatrix);
D3DXMatrixMultiply(&oTransformMatrix,&oTransformMatrix,&oTranslationMatrix);
D3DXVECTOR4 oEyeFinal;
D3DXVec3Transform(&oEyeFinal, &m_oEyePos, &oTransformMatrix);
m_oEyePos = D3DXVECTOR3(oEyeFinal.x, oEyeFinal.y, oEyeFinal.z)
// Increment rotation to keep the same relative look angles
RotateXAxis(dX);
RotateYAxis(dY);
break;
}
I think there is a much simpler solution that lets you sidestep all rotation issues.
Notation: A is the point we want to rotate around, C is the original camera location, M is the original camera rotation matrix that maps global coordinates to the camera's local viewport.
Make a note of the local coordinates of A, which are equal to A' = M × (A - C).
Rotate the camera like you would in normal "eye rotation" mode. Update the view matrix M so that it is modified to M2 and C remains unchanged.
Now we would like to find C2 such that A' = M2 × (A - C2).
This is easily done by the equation C2 = A - M2-1 × A'.
Voilà, the camera has been rotated and because the local coordinates of A are unchanged, A remains at the same location and the same scale and distance.
As an added bonus, the rotation behavior is now consistent between "eye rotation" and "point rotation" mode.
You rotate around the point by repeatedly applying small rotation matrices, this probably cause the drift (small precision errors add up) and I bet you will not really do a perfect circle after some time. Since the angles for the view use simple 1-dimension double, they have much less drift.
A possible fix would be to store a dedicated yaw/pitch and relative position from the point when you enter that view mode, and using those to do the math. This requires a bit more bookkeeping, since you need to update those when moving the camera. Note that it will also make the camera move if the point move, which I think is an improvement.
If I understand correctly, you are satisfied with the rotation component in the final matrix (save for inverted rotation controls in the problem #3), but not with the translation part, is that so?
The problem seems to come from the fact that you treating them differently: you are recalculating the rotation part from scratch every time, but accumulate the translation part (m_oEyePos). Other comments mention precision problems, but it's actually more significant than just FP precision: accumulating rotations from small yaw/pitch values is simply not the same---mathematically---as making one big rotation from the accumulated yaw/pitch. Hence the rotation/translation discrepancy. To fix this, try recalculating eye position from scratch simultaneously with the rotation part, similarly to how you find "oTarget = oEyePos + ...":
oEyePos = m_oRotateAroundPos - dist * D3DXVECTOR3(oForward4.x, oForward4.y, oForward4.z)
dist can be fixed or calculated from the old eye position. That will keep the rotation point in the screen center; in the more general case (which you are interested in), -dist * oForward here should be replaced by the old/initial m_oEyePos - m_oRotateAroundPos multiplied by the old/initial camera rotation to bring it to the camera space (finding a constant offset vector in camera's coordinate system), then multiplied by the inverted new camera rotation to get the new direction in the world.
This will, of course, be subject to gimbal lock when the pitch is straight up or down. You'll need to define precisely what behavior you expect in these cases to solve this part. On the other hand, locking at m_dRotationX=0 or =pi is rather strange (this is yaw, not pitch, right?) and might be related to the above.
I'm making a software rasterizer for school, and I'm using an unusual rendering method instead of traditional matrix calculations. It's based on a pinhole camera. I have a few points in 3D space, and I convert them to 2D screen coordinates by taking the distance between it and the camera and normalizing it
Vec3 ray_to_camera = (a_Point - plane_pos).Normalize();
This gives me a directional vector towards the camera. I then turn that direction into a ray by placing the ray's origin on the camera and performing a ray-plane intersection with a plane slightly behind the camera.
Vec3 plane_pos = m_Position + (m_Direction * m_ScreenDistance);
float dot = ray_to_camera.GetDotProduct(m_Direction);
if (dot < 0)
{
float time = (-m_ScreenDistance - plane_pos.GetDotProduct(m_Direction)) / dot;
// if time is smaller than 0 the ray is either parallel to the plane or misses it
if (time >= 0)
{
// retrieving the actual intersection point
a_Point -= (m_Direction * ((a_Point - plane_pos).GetDotProduct(m_Direction)));
// subtracting the plane origin from the intersection point
// puts the point at world origin (0, 0, 0)
Vec3 sub = a_Point - plane_pos;
// the axes are calculated by saying the directional vector of the camera
// is the new z axis
projected.x = sub.GetDotProduct(m_Axis[0]);
projected.y = sub.GetDotProduct(m_Axis[1]);
}
}
This works wonderful, but I'm wondering: can the algorithm be made any faster? Right now, for every triangle in the scene, I have to calculate three normals.
float length = 1 / sqrtf(GetSquaredLength());
x *= length;
y *= length;
z *= length;
Even with a fast reciprocal square root approximation (1 / sqrt(x)) that's going to be very demanding.
My questions are thus:
Is there a good way to approximate the three normals?
What is this rendering technique called?
Can the three vertex points be approximated using the normal of the centroid? ((v0 + v1 + v2) / 3)
Thanks in advance.
P.S. "You will build a fully functional software rasterizer in the next seven weeks with the help of an expert in this field. Begin." I ADORE my education. :)
EDIT:
Vec2 projected;
// the plane is behind the camera
Vec3 plane_pos = m_Position + (m_Direction * m_ScreenDistance);
float scale = m_ScreenDistance / (m_Position - plane_pos).GetSquaredLength();
// times -100 because of the squared length instead of the length
// (which would involve a squared root)
projected.x = a_Point.GetDotProduct(m_Axis[0]).x * scale * -100;
projected.y = a_Point.GetDotProduct(m_Axis[1]).y * scale * -100;
return projected;
This returns the correct results, however the model is now independent of the camera position. :(
It's a lot shorter and faster though!
This is called a ray-tracer - a rather typical assignment for a first computer graphics course* - and you can find a lot of interesting implementation details on the classic Foley/Van Damm textbook (Computer Graphics Principes and Practice). I strongly suggest you buy/borrow this textbook and read it carefully.
*Just wait until you get started on reflections and refraction... Now the fun begins!
It is difficult to understand exactly what your code doing, because it seems to be performing a lot of redundant operations! However, if I understand what you say you're trying to do, you are:
finding the vector from the pinhole to the point
normalizing it
projecting backwards along the normalized vector to an "image plane" (behind the pinhole, natch!)
finding the vector to this point from a central point on the image plane
doing dot products on the result with "axis" vectors to find the x and y screen coordinates
If the above description represents your intentions, then the normalization should be redundant -- you shouldn't have to do it at all! If removing the normalization gives you bad results, you are probably doing something slightly different from your stated plan... in other words, it seems likely that you have confused yourself along with me, and that the normalization step is "fixing" it to the extent that it looks good enough in your test cases, even though it probably still isn't doing quite what you want it to.
The overall problem, I think, is that your code is massively overengineered: you are writing all your high-level vector algebra as code to be executed in the inner loop. The way to optimize this is to work out all your vector algebra on paper, find the simplest expression possible for your inner loop, and precompute all the necessary constants for this at camera setup time. The pinhole camera specs would only be the inputs to the camera setup routine.
Unfortunately, unless I miss my guess, this should reduce your pinhole camera to the traditional, boring old matrix calculations. (ray tracing does make it easy to do cool nonstandard camera stuff -- but what you describe should end up perfectly standard...)
Your code is a little unclear to me (plane_pos?), but it does seem that you could cut out some unnecessary calculation.
Instead of normalizing the ray (scaling it to length 1), why not scale it so that the z component is equal to the distance from the camera to the plane-- in fact, scale x and y by this factor, you don't need z.
float scale = distance_to_plane/z;
x *= scale;
y *= scale;
This will give the x and y coordinates on the plane, no sqrt(), no dot products.
Well, off the bat, you can calculate normals for every triangle when your program starts up. Then when you're actually running, you just have to access the normals. This sort of startup calculation to save costs later tends to happen a lot in graphics. This is why we have large loading screens in a lot of our video games!