Transforming an object between two coordinate spaces - c++

So I'm reading the "3D Math Primer For Graphics And Game Development" book, coming from pretty much a non-math background I'm finally starting to grasp vector/matrix math - which is a relief.
But, yes there's always a but, I'm having trouble understanding the transformation of an object from one coordinate space to another. In the book the author uses an example of a gun shooting at a car (image) that is turned 20 degrees (just a 2D space for simplicity) in "world space". So we have three spaces: World Space, Gun Object Space and Car Object Space - correct? The book then states this:
"In this figure, we have introduced a rifle that is firing a bullet at the car. As indicated by the
coordinate space on the left, we would normally begin by knowing about the gun and the trajectory
of the bullet in world space. Now, imagine transforming the coordinate space in line with the
car’s object space while keeping the car, the gun, and the trajectory of the bullet still. Now we
know the position of the gun and the trajectory of the bullet in the object space of the car, and we
could perform intersection tests to see if and where the bullet would hit the car."
And I follow this explanation, and when I know beforehand that the car is rotated 20 degrees in world space this isn't a problem - but how does this translate into a situation, say, when I have an archer in a game shooting from a hill down at someone else? I don't know the angle at which everything is displaced there.
And which object space is rotated here? The World or Gun space? Yeah as you can see I'm a bit confused.
I think the ideal response would be using the car and gun example using arbitrary variables for positions, angle, etc.

You should read up on how to change basis, and think in vectors - not arrays, but the math ones :P

I used to be a game programmer and I did that time after time. Eventually, I got away from using angles. For every object, I had a forward-facing vector and an up vector. You can get the right-facing vector, then, from a cross-product. And all the conversions between spaces become dot products.
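For example, here is a minimal C++ sketch of that idea (the Vec3 type and function names are just illustrative, not taken from any particular engine):

```cpp
struct Vec3 { float x, y, z; };

Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

Vec3 sub(const Vec3& a, const Vec3& b) {
    return { a.x - b.x, a.y - b.y, a.z - b.z };
}

// Express a world-space point in an object's local space, given the object's
// position plus its forward and up vectors (assumed unit length and orthogonal).
Vec3 worldToLocal(const Vec3& point, const Vec3& objPos,
                  const Vec3& forward, const Vec3& up) {
    Vec3 right = cross(up, forward);   // left-handed convention; swap operands for right-handed
    Vec3 rel   = sub(point, objPos);   // vector from the object's origin to the point
    return { dot(rel, right),          // local x
             dot(rel, up),             // local y
             dot(rel, forward) };      // local z
}
```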

Do you understand how coordinate spaces and transforms work in 2D? I find that coordinate spaces and transforms are a lot easier to visualize in 2D before trying to move to 3D. That way you can work "what-if" scenarios out on paper, which helps you grok the major concepts.
In the image you posted I think the interpretation is that the car itself has not changed in its internal coordinate system, but that its system has been rotated with respect to the World's system.
You have to understand that the car has its own local coordinate system. The geometry of the car is defined in terms of its local coordinate system. So the length of the car always extends along the x-axis in its own local system regardless of its orientation in the World. The car can be oriented by transforming its local coordinate system.
Coordinate systems are always defined relative to another system, except for the root, in this case the World. So the gun has its own system, the car has its own system and they are both embedded into the World's system. If I rotate or move the car's system with respect to the World then the car will appear to rotate even though the geometry is unchanged.
This is something that is very hard to explain without being able to draw out visual scenarios and my google-fu is failing to find good descriptions of the basics.

As a previous reply suggests, keeping an up, forward and right vector is a good way to define a (Euclidean) coordinate space. It's even better if you add an origin as well, since you can represent a wider range of spaces.
Let's say we have two spaces A and B. In A, up, forward and right are (0,1,0), (0,0,1) and (1,0,0) respectively, and the origin is at zero; this gives the usual left-handed xyz coordinates for A. Say for B we have u=(ux,uy,uz), f=(fx,fy,fz) and r=(rx,ry,rz) with origin o = (ox,oy,oz). Then for a point at p = (x,y,z) in B we have in A (x*rx + y*ux + z*fx + ox, x*ry + y*uy + z*fy + oy, x*rz + y*uz + z*fz + oz).
This can be arrived at by inspection. Observe that, since the right, up and forward vectors for B have components in each axis of A, a component of some coordinates in B must contribute to all three components of the coordinates in A. i.e. since (0,1,0) in B is equal to (ux,uy,uz), then (x,y,z) = y*u + (some other stuff). If we do this for each coordinate we have that (x,y,z) = x*r + y*u + z*f + (some other stuff). If we observe that at the origin these terms vanish except for (some other stuff), then we realise that (some other stuff) must in fact be o, which gives the coordinates in A as x*r + y*u + z*f + o, which is (x*rx + y*ux + z*fx + ox, x*ry + y*uy + z*fy + oy, x*rz + y*uz + z*fz + oz) once the vector operations are expanded.
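For concreteness, here is that formula as a small C++ sketch (the Vec3 type and function name are just assumptions for illustration):

```cpp
struct Vec3 { float x, y, z; };

// Coordinates of a point p given in space B, converted into space A, where
// r, u, f and o are B's right/up/forward axes and origin expressed in A.
Vec3 bToA(const Vec3& p, const Vec3& r, const Vec3& u,
          const Vec3& f, const Vec3& o) {
    return { p.x * r.x + p.y * u.x + p.z * f.x + o.x,
             p.x * r.y + p.y * u.y + p.z * f.y + o.y,
             p.x * r.z + p.y * u.z + p.z * f.z + o.z };
}
```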
This operation can be reversed as well, we just set the coordinates in A and solve equations to find them in B. e.g. (1,1,1) in A is equal to x*r + y*u + z*f + o in B. This gives three equations in three unknowns and can be solved by the method of simultaneous equations. I won't bother explaining that here... but here is a link if you get stuck: link
How does all of this relate to your original example of a bullet and a car? Well, if you rotate a set of up/right/forward vectors with the car, and update the origin as the car is translated you can move from world space to the car's local space and make some tests easier. e.g instead of transforming vertices for a collision model, you can transform the bullet into 'car local' space and use the local coordinates. This is handy if you are going to transform the car's vertices for rendering on a GPU, but don't want to suffer the overhead of reading that information back to use for physics calculations on the CPU.
In other uses it can save work when applying several transformations: by applying them to the three basis vectors (and the origin) and then performing this operation on the points, you can combine many transformations on a large number of points without a significant performance hit over a single transformation across the same number of points.

In a game situation generally you wouldn't know the car was rotated 20 degrees, per se; instead your positioning information for the car would implicitly contain that knowledge. So in this two-dimensional example, you'd know the x,y coordinates of the center of the car and the x,y vector the car is pointing along (both pieces of information in world space) -- otherwise you wouldn't be able to draw it. Those two pieces of information are all you need to find the matrix to transform between world space and the car's object space. (And then a person could look at that matrix in this example and say, oh, look, rotation by 20 degrees -- but that's not a piece of information you'd normally worry about in the game.)
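As a 2D sketch of that idea in C++ (the names carPos and heading are illustrative; heading is assumed to be a unit vector, e.g. (cos 20°, sin 20°) for the book's example):

```cpp
struct Vec2 { float x, y; };

// Take a world-space point (e.g. the gun, or a point on the bullet's path)
// into the car's object space. heading is the unit vector the car points
// along (its local x-axis in world space); the local y-axis is perpendicular.
Vec2 worldToCar(const Vec2& p, const Vec2& carPos, const Vec2& heading) {
    Vec2 side = { -heading.y, heading.x };            // perpendicular to heading
    Vec2 rel  = { p.x - carPos.x, p.y - carPos.y };   // relative to the car's centre
    return { rel.x * heading.x + rel.y * heading.y,   // along the car's length
             rel.x * side.x    + rel.y * side.y };    // across the car
}
```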
The problem of the gun and the car can be solved in any of the three spaces. So the question is, which is it easiest in? Presumably the gun's space is set up so that the bullet is fired down the X axis. So it's easy to translate that into either of the other spaces. A 2D car is probably going to be represented in its own object space -- maybe as a set of 2D line segments or 2D pixels or something. You certainly could translate those into world space or the gun's object space, but if you solve the problem in car object space you don't have to translate them at all, so that's the easiest one to work in for this problem.
It's sort of like relativity: from its own perspective, none of the spaces are rotated. Unlike relativity, though, we treat the world space as a privileged fixed frame of reference. So the objects' model spaces are rotated, mirrored, scaled, translated, etc with respect to the world space.

Related

What is the standard place to keep the Model Matrix?

I have a "3D engine" which has a single model matrix.
All of my 3D objects use this model matrix (for transformation stuff).
For each object I set the model matrix to identity before using it.
So far so great, as it appears to be working just fine, and as expected.
But now I am suddenly wondering if I have a design flaw.
Should each 3D object (the base object) have their own model matrix?
Is there any value in doing it this way (model matrix per 3D object)?
That is literally what the point of the model matrix is. It is an optional transformation for each object you draw from object-space (its local coordinate system) to world-space (a shared coordinate system). You can define your objects such that their positions are already in world-space and then no model matrix is necessary (as you are doing if you use an identity model matrix).
GL uses (or at least historically, it did) a matrix stack for this and it is technically the very reason it uses column-major matrices and post-multiplication. That allows you to push/pop your local coordinate transformations to the top of the stack immediately before/after you draw an object while leaving the other transformations that rarely change such as world->eye (view matrix) or eye->clip (projection matrix) untouched.
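A common per-object arrangement, sketched here with GLM (the Object3D struct and its fields are assumptions for illustration, not part of your engine):

```cpp
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

struct Object3D {
    glm::vec3 position{0.0f};
    float     yawRadians = 0.0f;

    // Each object carries its own model matrix (object space -> world space).
    glm::mat4 modelMatrix() const {
        glm::mat4 m(1.0f);                                   // start from identity
        m = glm::translate(m, position);                     // place in the world
        m = glm::rotate(m, yawRadians, glm::vec3(0, 1, 0));  // orient about Y
        return m;
    }
};

// At draw time, upload object.modelMatrix() (or projection * view * model)
// to the shader for each object, instead of sharing one global matrix.
```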
Elaborating in a separate answer because a comment is too short, but Andon's answer is the right one.
Think, for instance, of loading two 3D meshes made by two different artists. ¹
The first artist assumed that 1 unit in your model space is 1 meter, while the other artist assumed 1 inch.
So, you load a car mesh done by the first artist, which maybe is 3 units long; and a banana mesh, which then is 8 units long.
Also, the first artist decided to put the origin of the points for the car mesh in its center, while the artist who did the banana mesh put the banana lying on the X axis, with the base of the fruit on the X=2000 point.
So how do you show both of these meshes in your 3D world? Are you going to have a banana whose length is almost thrice the length of your car? That makes absolutely no sense.
How are you going to place them next to each other? Or how are you going to place them lying on a plane? The fact that the local coordinate systems are totally random makes it impossible to translate your objects in a coherent way.
This is where the model->world matrix comes in.
It allows you to specify a per-model transformation that brings all models into a "shared", unique, coherent space -- the world space.
For instance, you could translate both models so that their origins would lie "in a corner" and all the mesh's vertices in the first octant. Also, if your world space uses "meters", you would need to scale the banana mesh by 0.0254 to bring its size into meters as well. Also, maybe you'd like to rotate the banana and have it lying on the Y axis instead of the X.
At the end of the game, for each "unique model" in your world, you'd have its associated model matrix, and use it when drawing it.
Also: for each instance of a model, you could think of applying an extra local transformation (in world coordinates). For instance, you'd like to translate the car model to a certain point in the world. So instead of having
(Model space) --> Model matrix ---> (World space) ---> World matrix ---> (Final world space)
you could multiply the two matrices together and have only one matrix that brings points from model space straight to final world space.
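A sketch of what the banana's model matrix might look like with GLM (the world position and rotation axis are made up for illustration; the 0.0254 scale and the X=2000 origin come from the example above):

```cpp
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Model matrix for the banana mesh: undo the artist's odd origin, bring
// inches into metres, stand it up along Y instead of lying along X, and
// finally place it somewhere in the world.
glm::mat4 bananaModel()
{
    glm::mat4 m(1.0f);
    m = glm::translate(m, glm::vec3(5.0f, 0.0f, 0.0f));           // hypothetical world position
    m = glm::rotate(m, glm::radians(90.0f), glm::vec3(0, 0, 1));  // X axis -> Y axis
    m = glm::scale(m, glm::vec3(0.0254f));                        // inches -> metres
    m = glm::translate(m, glm::vec3(-2000.0f, 0.0f, 0.0f));       // move origin to the fruit's base
    return m;   // note: the last transform listed is applied to the vertices first
}

// An extra per-instance transform (in world coordinates) can simply be
// multiplied on: glm::mat4 finalMatrix = instanceTransform * bananaModel();
```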
¹ This point is a bit moot in that in any proper asset pipeline the first thing you'd do would be to bring all the models into a coherent coordinate system; this is just an example...

My plane is not vertical - how to update point cloud coordinates to lie on a vertical plane

I have a bunch of points lying on a vertical plane. In reality this plane should be exactly vertical. But, when I visualize the point cloud, there is a slight inclination (nearly 2 degrees) from vertical. At the moment, I can only calculate this inclination. Concerning other errors, I assume there are no shifts or anything like that.
So, I want to update the coordinates of my point data so that they lie on the vertical plane. I think I should do some kind of transformation. It may be just a rotation about the X-axis; I'm not sure what it would be.
I guess you understand my question. Honestly, I am poor at mathematics, so please let me know how to update my point coordinates to lie on the exact vertical plane.
Note: As I am implementing this in C++ and there are many programmers who have sound knowledge of these things, I am posting this question under c++.
UPDATES
If I say exactly what I have done so far;
I have point cloud data representing a vertical object + its surroundings things. (The data is collected by a moving scanner and may have axes deviations from the correct world axes). The problem is, I cannot say exactly that there is an error on my data or not. Therefore, I checked this with a vertical planar object (which is the dominated object in my data as well). In reality that plane is truly vertical. But, when I fit a plane by removing outliers, then that plane is not truly vertical and has nearly 2 degree inclination. Therefore, I am suspecting that my data has some error. So I want to update all my point clouds (including points on the plane and points which represent other objects) in a way to lay that particular planar points exactly on the vertical plane. Then, I guess, all the points will be updated into their correct positions as in the reality. That is all (x,y,z) coordinates should be updated.
As an example please refer the below figure.
The left side shows the original point cloud (as you can see, the points themselves are not vertical), the black line is the vertical plane which I fitted, and the red line is the zenith line. As you can see, the fitted plane has an inclination.
So, I want to update my whole data set as in the right figure. Then, after updating, if I fit a plane again (removing outliers), it should be exactly parallel to the zenith line. Please help me.
I may be able to help you out, considering I worked with planes recently. First of all, how come the points aren't coplanar from the get go? I'd make the points coplanar in the first place instead of them being at an inclination (from what origin?), and then having to fix them. Also, having the points be coplanar on your first go would increase efficiency.
Sorry if this is the answer you're not looking for, but I need more information before I can help you out. Also, 3D math is hard. If you work with it enough, it starts to get pounded into your head, where you will NEVER forget it, especially if you went through the headaches I had to go through.
I did a bit of thinking on it, and since you want to rotate along the x-axis, your rotation will be done on the xz-plane, which means we can make this a 2D problem. After doing a bit of research on Wikipedia, this may be your solution.
new x = ((x - intended x) * cos(angle)) - (z * sin(angle)) + intended x
new z = ((x - intended x) * sin(angle)) + (z * cos(angle))
What I'm doing here is subtracting our intended x value from our current x value, so that we make (intended x, 0) our point of origin to rotate around. After the point is rotated, I add intended x back to the x coordinate so that we get the correct result.
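In code, that rotation about (intended x, 0) in the xz-plane might look like the following sketch (the type and function names are just illustrative; for the question's 2 degree tilt, angle would be roughly 2 * 3.14159 / 180 radians, with the sign chosen to undo the inclination):

```cpp
#include <cmath>

struct Point { double x, y, z; };

// Rotate a point by 'angle' radians in the xz-plane about the pivot
// (pivotX, 0), leaving the y coordinate untouched.
Point rotateAboutPivot(const Point& p, double pivotX, double angle)
{
    const double dx = p.x - pivotX;        // shift the pivot to the origin
    const double c  = std::cos(angle);
    const double s  = std::sin(angle);
    return { dx * c - p.z * s + pivotX,    // rotate, then shift back
             p.y,
             dx * s + p.z * c };
}
```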
Depending on where you got your points from (some kind of measurement, I guess) and what you want to do with them, there are several different things you could do with your data.
The search keyword "regression plane" might help - there are several ways of finding planes approximating point clouds, and several ways to "snap" points to planes.
Edit: You want to apply a rotation around the axis defined by the cross product of the normal vector of your regression plane and the normal of your desired plane, about a point of your choice. From your illustration I take it that you probably want the bottom of your vertical planar object to be the point of reference for the rotation.
So you've got your point of reference, you know the axis around which you want to rotate, and the angle. All you need to do is (see the sketch below):
Translate (so your point of reference becomes the origin)
Rotate around the axis
Translate back
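A sketch of those steps in C++, using Rodrigues' rotation formula; here axis would be the normalized cross product of the fitted and desired plane normals and angle the roughly 2 degree inclination (all type and function names are illustrative):

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3   sub(Vec3 a, Vec3 b)     { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
Vec3   add(Vec3 a, Vec3 b)     { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
Vec3   scale(Vec3 a, double s) { return { a.x * s, a.y * s, a.z * s }; }
double dot(Vec3 a, Vec3 b)     { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3   cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Rodrigues' formula: rotate v by 'angle' radians about the unit axis k.
Vec3 rotateAxisAngle(Vec3 v, Vec3 k, double angle) {
    const double c = std::cos(angle), s = std::sin(angle);
    return add(add(scale(v, c), scale(cross(k, v), s)),
               scale(k, dot(k, v) * (1.0 - c)));
}

// Correct one point of the cloud: translate so 'ref' becomes the origin,
// rotate about the axis, then translate back.
Vec3 correctPoint(Vec3 p, Vec3 ref, Vec3 axis, double angle) {
    return add(rotateAxisAngle(sub(p, ref), axis, angle), ref);
}
```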
I read your question again, and hopefully this answer will help you out. If there's anything else I need to know, please tell me.
Now, In order to rotate anything, there must be a center point to rotate around. Now you've already been able to detect the angle of inclination, so now we need a formula for rotating a point a certain angle around an origin. In addition, since this problem only occurs on a 2D plane, we can use this basic formula to readjust the points. For any two axis x and y:
Theta is the angle that you will rotate around in a counter-clockwise direction. x' and y' are your new points. x.origin and y.origin are the coordinates for the point you will be going around. Now I don't know if my math is 100% correct on this but if it's not, hopefully you can change a thing or two and it will work.

Should Euler rotations be stored as three matrices or one matrix?

I am trying to create a simple matrix library in C++ that I will hopefully be able to use in game development afterwards.
I have the basic implementation done, but I have just realized a problem with storing only one matrix per object: the rotation order will get mixed up fairly quickly.
To the best of my knowledge: AB != BA
Therefore, if I am continually multiplying arbitrary rotations into my matrix, then the rotations will get mixed up, correct? In my case, I need to rotate globally on the Y axis, and locally on the X axis (and locally on the Z axis would be nice as well). These seem like the qualities of the average first person shooter. So by "mixed up", I mean that if I go to rotate on the Y axis (or Z axis), then it will start rotating around the local X axis, instead of the intended axis (if that makes any sense).
So, these are the solutions I came up with:
Keep 3 Euler angles, and rebuild the matrix in the correct order when one angle changes
Keep 3 Matrices, one for each axis
Somehow destruct the matrix during multiplication, and reconstruct it properly afterwards (?)
Or am I worrying about nothing? Are my qualms false, and the order will somehow magically solve itself?
You are correct that the order of rotation matrices can be an issue here.
Especially if you use Euler angles, you can suffer from the issue of gimbal lock: let's say your first rotation is +90° of "pitch", meaning you're looking straight upward; if the next rotation is +45° of "roll", you're still just looking straight up. But if you do the rotations in the opposite order, you end up looking somewhere different altogether. (See the Wikipedia link for an illustration that makes this clearer.)
One common answer in game development is what you've got in (1): store the Euler angles independently, and then build the rotation matrix out of all three of them at once every time you want to get the object's orientation in world space.
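As a sketch of that option, assuming a yaw/pitch pair and a hand-rolled 3x3 matrix type acting on column vectors (all names are illustrative):

```cpp
#include <cmath>

struct Mat3 { float m[3][3]; };

// Rebuild the orientation from the stored angles every time it is needed,
// always composing in the same order: R = Ry(yaw) * Rx(pitch), i.e. a global
// yaw about Y combined with a pitch about the resulting local X axis.
Mat3 orientationFromEuler(float yaw, float pitch)
{
    const float cy = std::cos(yaw),   sy = std::sin(yaw);
    const float cp = std::cos(pitch), sp = std::sin(pitch);

    return {{
        {  cy,  sy * sp,  sy * cp },
        { 0.f,       cp,      -sp },
        { -sy,  cy * sp,  cy * cp }
    }};
}
```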
Another common solution is to store rotation as an angle around a single axis, rather than as Euler angles. (That is often less convenient for animators and player motion.)
We also often use quaternions as a more efficient way of storing and combining rotations.
Each of the links above should take you to an article illustrating the relevant math. I also like Eric Lengyel's Mathematics for 3D Game Programming and Computer Graphics book, which explains this whole subject very well.
I don't know how other people usually do this, but I generally just store the angles, and then reconstruct a matrix if necessary.
You are right that if you had one matrix and kept multiplying something onto it, you would end up messing things up. But again, I don't think this is the route you probably want to take.
I don't know what sort of graphics system you want to be using, but with OpenGL, you don't even have to worry about the matrix representation (unless you're doing something super performance-critical), and can simply use some calls to glRotate and the like.

OpenGL LookAt function: is the up vector arbitrary?

I am trying to understand the glLookAt function.
It takes 3 triplets. The first is the eye position, the second is the point at which the eye stares. That point will appear in the center of my viewport, right? The third is the 'up' vector. I understand the meaning of the 'up' vector if it is perpendicular to the vector from eye to starepoint. The question is, is it allowed to specify other vectors for up, and, if yes, what's the meaning then?
A link to a graphical detailed explanation of gluPerspective, glLookAt and glFrustum would also be much appreciated. The official OpenGL documentation appears not to be intended for newbies.
Please note that I understand the meaning of up vector when it is perpendicular to eye->object vector. The question is what is the meaning (if any), if it is not. I can't figure that out with playing with parameters.
It works as long as the up vector is "sufficiently perpendicular" to the looking direction. What matters is the plane spanned by the up vector and the look-at vector.
If these two become aligned, the up direction will be more or less random (based on the very small bits in your values), as a small adjustment of it will leave it pointing above/left/right of the look-at vector.
If they have a sufficiently large separating angle (in 32-bit floating point math) it will work well. This angle usually needs to be no more than a degree or so, so they can be very close. But if the difference is down to a few bits, each changed bit will yield a huge directional change.
It comes down to numerical precision.
(I'm sure there are more mathematical terms & definitions for this, but it's been a few years since college.. :)
final word: If the vectors are parallel, then the up-direction is completely undefined and you'll get a degenerate view matrix.
The up vector lets OpenGL know which way you have your camera oriented.
Think of the real world: if you have two points in space, you can draw a line from one to the other. You can then align an object, such as a camera, so that it points from one to the other. But you have no way of knowing how your object should be rotated around the axis that the line makes. The up vector dictates which direction the camera should be standing.
Most of the time, your up vector will be (0,1,0), which means that the camera will be rotated just like you would normally hold a camera, or as if you held your head up straight. If you set your up vector to (1,0,0), it would be like holding your head on its side, so from the base of your head to the top of your head it points to the right. You are still looking from the same point (more or less) to the same point, but your 'up' has changed. An up vector of (0,-1,0) would make the camera upside down, as if you were doing a handstand.
One way you could think about this: your arm is a vector from the camera position (your shoulder) to the camera look-at point (your index finger); if you stick your thumb out, that is your up vector.
This picture may help you http://images.gamedev.net/features/programming/oglch3excerpt/03fig11.jpg
EDIT
Perpendicular or not:
I see what you are asking now. For example, if you are at (10,10,10) looking at (0,0,0), the resulting vector for your looking direction is (-10,-10,-10). The vector perpendicular to this does not matter for the purposes of the up vector in glLookAt. If you want the view oriented so that you are like a normal person just looking down a bit, just set your up vector to (0,1,0). In fact, unless you want to be able to roll the camera, you don't need it to be anything else.
This website has a great tutorial:
http://www.xmission.com/~nate/tutors.html
http://users.polytech.unice.fr/~buffa/cours/synthese_image/DOCS/www.xmission.com/Nate/tutors.html
Download the executables and you can change the values of the parameters to the glLookAt function and see what happens "in real-time".
The up vector does not need to be perpendicular to the looking direction. As long as it is not parallel (or very close to being parallel) to the looking direction, you should be fine.
Given that you have a view plane normal N (the looking direction) and an up vector UV (which mustn't be parallel to N), you calculate the up vector actually used in the camera transform by removing UV's component along N: V = UV - (N . UV)N. The side vector of the camera basis, perpendicular to both N and V, is then obtained from the cross product U = N x V.
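A small sketch of that orthogonalisation in C++ (types and names are just illustrative):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3  sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
Vec3  mul(const Vec3& a, float s)       { return { a.x * s, a.y * s, a.z * s }; }

Vec3 normalize(const Vec3& v) {
    const float len = std::sqrt(dot(v, v));
    return { v.x / len, v.y / len, v.z / len };
}

// Given the (unit) looking direction N and a rough up vector UV that is not
// parallel to N, remove UV's component along N to get the up vector the
// camera basis actually uses.
Vec3 orthogonalizedUp(const Vec3& N, const Vec3& UV) {
    const Vec3 v = sub(UV, mul(N, dot(N, UV)));   // V = UV - (N . UV) N
    return normalize(v);                          // degenerate if UV is parallel to N
}
```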
Yes. It is arbitrary, which lets you make the camera "roll", i.e. appear as if the scene is rotating around the eye axis.

How can I determine distance from an object in a video?

I have a video file recorded from the front of a moving vehicle. I am going to use OpenCV for object detection and recognition but I'm stuck on one aspect. How can I determine the distance from a recognized object.
I can know my current speed and real-world GPS position but that is all. I can't make any assumptions about the object I'm tracking. I am planning to use this to track and follow objects without colliding with them. Ideally I would like to use this data to derive the object's real-world position, which I could do if I could determine the distance from the camera to the object.
Your problem's quite standard in the field.
Firstly,
you need to calibrate your camera. This can be done offline (makes life much simpler) or online through self-calibration.
Calibrate it offline - please.
Secondly,
Once you have the calibration matrix of the camera K, determine the projection matrix of the camera in a successive scene (you need to use parallax as mentioned by others). This is described well in this OpenCV tutorial.
You'll have to use the GPS information to find the relative orientation between the cameras in the successive scenes (that might be problematic due to noise inherent in most GPS units), i.e. the R and t mentioned in the tutorial or the rotation and translation between the two cameras.
Once you've resolved all that, you'll have two projection matrices --- representations of the cameras at those successive scenes. Using one of these so-called camera matrices, you can "project" a 3D point M on the scene to the 2D image of the camera on to pixel coordinate m (as in the tutorial).
We will use this to triangulate the real 3D point from 2D points found in your video.
Thirdly,
use an interest point detector to track the same point in your video which lies on the object of interest. There are several detectors available, I recommend SURF since you have OpenCV which also has several other detectors like Shi-Tomasi corners, Harris, etc.
Fourthly,
Once you've tracked points of your object across the sequence and obtained the corresponding 2D pixel coordinates you must triangulate for the best fitting 3D point given your projection matrix and 2D points.
The above image nicely captures the uncertainty and how a best fitting 3D point is computed. Of course in your case, the cameras are probably in front of each other!
Finally,
Once you've obtained the 3D points on the object, you can easily compute the Euclidean distance between the camera center (which is the origin in most cases) and the point.
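As a rough sketch of the triangulation and distance steps with OpenCV's C++ API (header paths and details may differ between OpenCV versions; P1, P2 and the matched pixel coordinates are assumed to come from the calibration and tracking steps above):

```cpp
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <cmath>
#include <vector>

// Given the 3x4 projection matrices P1 and P2 of two successive frames and
// the matched pixel coordinates m1, m2 of a tracked point, triangulate the
// 3D point and return its distance from the first camera centre (assuming
// P1 = K [I | 0], i.e. the first camera sits at the world origin).
double distanceToTrackedPoint(const cv::Mat& P1, const cv::Mat& P2,
                              const cv::Point2f& m1, const cv::Point2f& m2)
{
    std::vector<cv::Point2f> pts1{ m1 }, pts2{ m2 };

    cv::Mat points4D;                                    // 4 x N homogeneous points
    cv::triangulatePoints(P1, P2, pts1, pts2, points4D);
    points4D.convertTo(points4D, CV_64F);                // be explicit about element type

    const double w = points4D.at<double>(3, 0);          // dehomogenise
    const double X = points4D.at<double>(0, 0) / w;
    const double Y = points4D.at<double>(1, 0) / w;
    const double Z = points4D.at<double>(2, 0) / w;
    return std::sqrt(X * X + Y * Y + Z * Z);             // Euclidean distance
}
```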
Note
This is obviously not easy stuff but it's not that hard either. I recommend Hartley and Zisserman's excellent book Multiple View Geometry which has described everything above in explicit detail with MATLAB code to boot.
Have fun and keep asking questions!
When you have moving video, you can use temporal parallax to determine the relative distance of objects. Parallax: (definition).
The effect would be the same as the one we get with our eyes, which gain depth perception by looking at the same object from slightly different angles. Since you are moving, you can use two successive video frames to get your slightly different angle.
Using parallax calculations, you can determine the relative size and distance of objects (relative to one another). But, if you want the absolute size and distance, you will need a known point of reference.
You will also need to know the speed and direction being traveled (as well as the video frame rate) in order to do the calculations. You might be able to derive the speed of the vehicle using the visual data but that adds another dimension of complexity.
The technology already exists. Satellites determine topographic prominence (height) by comparing multiple images taken over a short period of time. We use parallax to determine the distance of stars by taking photos of night sky at different points in earth's orbit around the sun. I was able to create 3-D images out of an airplane window by taking two photographs within short succession.
The exact technology and calculations (even if I knew them off the top of my head) are way outside the scope of discussion here. If I can find a decent reference, I will post it here.
You need to identify the same points on the same object in two different frames taken a known distance apart. Since you know the location of the camera in each frame, you have a baseline (the vector between the two camera positions). Construct a triangle from the known baseline and the angles to the identified points. Trigonometry gives you the lengths of the unknown sides of the triangle from the known length of the baseline and the known angles between the baseline and the unknown sides.
You can use two cameras, or one camera taking successive shots. So, if your vehicle is moving at 1 m/s and you take frames every second, then successive frames will give you a 1 m baseline, which should be good for measuring the distance of objects up to, say, 5 m away. If you need to range objects further away, the frames used need to be further apart; however, more distant objects stay in view for longer.
Observer at F1 sees target at T with angle a1 to velocity vector. Observer moves distance b to F2. Sees target at T with angle a2.
Required to find r1, range from target at F1
The trigonometric identities for cosine give
cos(90 - a1) = x / r1 = c1
cos(90 - a2) = x / r2 = c2
cos(a1) = (b + z) / r1 = c3
cos(a2) = z / r2 = c4
where x is the distance to the target orthogonal to the observer's velocity vector,
and z is the distance from F2 to the intersection with x.
Solving for r1:
r1 = b / (c3 - c1 * c4 / c2)
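For example, as a small C++ function (angles in radians; variable names follow the derivation above):

```cpp
#include <cmath>

// Range r1 from the first observation point F1 to the target, given the two
// bearing angles a1, a2 (measured from the velocity vector, in radians) and
// the baseline b travelled between F1 and F2.
double rangeFromParallax(double a1, double a2, double b)
{
    const double c1 = std::sin(a1);   // cos(90 - a1) = x / r1
    const double c2 = std::sin(a2);   // cos(90 - a2) = x / r2
    const double c3 = std::cos(a1);   // (b + z) / r1
    const double c4 = std::cos(a2);   // z / r2
    return b / (c3 - c1 * c4 / c2);
}
```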
Two cameras so you can detect parallax. It's what humans do.
edit
Please see ravenspoint's answer for more detail. Also, keep in mind that a single camera with a splitter would probably suffice.
Use stereo disparity maps. Lots of implementations are afloat; here are some links:
http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/OWENS/LECT11/node4.html
http://www.ece.ucsb.edu/~manj/ece181bS04/L14(morestereo).pdf
In your case you don't have a stereo camera, but depth can be evaluated using video:
http://www.springerlink.com/content/g0n11713444148l2/
I think the above will be what might help you the most.
Research has progressed so far that depth can be evaluated (though not to a satisfactory extent) from a single monocular image:
http://www.cs.cornell.edu/~asaxena/learningdepth/
Someone please correct me if I'm wrong, but it seems to me that if you're going to use a single camera and rely only on a software solution, any processing you might do would be prone to false positives. I highly doubt that there is any processing that could tell the difference between objects that really are at the perceived distance and those which only appear to be at that distance (like the "forced perspective" used in movies).
Any chance you could add an ultrasonic sensor?
First, you should calibrate your camera so you can get the relation between the positions of objects in the camera plane and their positions in the real-world plane. If you are using a single camera, you can use the "optical flow" technique.
If you are using two cameras, you can use the triangulation method to find the real position (it will be easy to find the distance of the objects), but the problem with the second method is matching: how can you find the position of an object 'x' in camera 2 if you already know its position in camera 1? Here you can use the 'SIFT' algorithm.
I just gave you some keywords; I hope they help you.
Put an object of known size in the camera's field of view. That way you can have a more objective metric to measure angular distances. Without a second viewpoint/camera you'll be limited to estimating size/distance, but at least it won't be a complete guess.