2D Matrix Transformation (with a Player and Ground) - c++

I have a simple game that I'm trying to do for learning purposes, but Matrices are a bit hard, especially in DirectX.
I currently have a tilesystem that renders tiles at the screen and a character position at the center (on startup)
When the player move to the left, the ground should move to the right so he's always at the center of the screen. I use a Sprite to transform the Tiles and the player, but having trouble because the ground is moving and the player stands still on the ground still positioned at the center.
This is a topdown 2D game so I only need to transform Position (and perhaps rotation)
My Camera class has this method:
D3DXMatrixTransformation2D(&View, NULL, 0, &Scale, &Center, D3DXToRadian(Rotation), &Position);
I've also tried (for the Camera):
D3DXMatrixTranslation(&View, Position.x, Position.y, 0);
then when I render the ground sprite I set it to transform to the camera, but the Player has it's own Matrix that I use when he's moving.
Exactly like the one above but with his position, rotation and such...
What am I missing and what do I need?
I only have a tilesystem that fits the width/height of the screen so I should see when he's moving but he's standing still at the center and the ground and the player is moving.
Do I need to invert the matrix so that the ground moves in the opposite direction?
Only the player moves here and the world transforms around him.
I must say that being the number #1 language in game programming there's very little guides that explains simple things as "TopDown Camera in 2D"...

For your problem you should read something about the different spaces: object space, world space and camera space. (Mostly referred in relation to a 3D-space, but I'll try to explain it for 2D)
Object space
Each object should be modelled or in your case drawn to be in objectspace. This means that for each object their is a matrix, which determine the center, e.g. of your player.
World space
This is your game world. It describes the position and orientation of each object in your scene.
Camera space
This describes the current view at your game world. The camera is assumed to be fix at the middle of your screen and you move the whole world to fit to the current viewpoint (e.g. translating with the negative position).
For your case it means, that you need a matrix for each object, which describes the offset that the image is correctly centered at the position (objectspace). Then you need to transform the object into the world space. Therefore you need a matrix with the position and orientation of the object and multiply it with the objectmatrix. Finally you have to take the viewpoint into account and multiply the cameramatrix, so that the object are transformed from worldspace to viewspace. The resulting matrix is the one you use for rendering.
Hope that helps :)

With the help of people at the chat I realized what the problem was with my system, altough I tried similiar things but probably in the wrong way / order.
For a simple 2D TopDown camera with just positioning:
Camera2D Translation:
D3DXMatrixTranslation(&CameraView, Position.x, Position.y, 0);
For each object in the camera's view:
D3DXMATRIX ObjectWorld;
D3DXMatrixTransformation2D(&ObjectWorld, NULL, 0, NULL, NULL, 0, &Position);
D3DXMATRIX Translation;
D3DXMatrixMultiply(&Translation, ObjectWorld, CameraView);
and then to render the sprite:
sprite->SetTransform(&Translation);
sprite->Draw(Texture, NULL, NULL, NULL, Color);
altough this method requires me to move the camera in the opposite direction that the player is moving, so if player is moving +100, the camera should move -100 to still be positioned at the same location relative to the player.

Related

Get screen position of the center of a mesh

My goal is to create an intuitive 3D manipulator to handle rotations of meshes displayed in my 3D editor, made with Qt / QML.
To do that, when the user clicks on an entity, 3 tori are spawned around the mesh, representing the euler angles the user can act on. If the user then clicks on one torus, I want him to be able to rotate the mesh by dragging the mouse. The natural way users seem to do that is by dragging the mouse around the torus in the direction they want the mesh to rotate.
I therefore need a way to know how the user is rotating his mouse. I thought of a way: when the user clicks on the torus, I retrieve the position of the center of the torus. Then, I translate this world position to its screen position. Then, I monitor the angle between the cursor of the mouse and the center of the torus. The evolution of this angle should tell me everything I need: if the angle increases clockwise, the mesh should rotate clockwise and vice versa. This solution should yield a result good enough for my application, since it won't depend on the angle of the camera, or only very minimally.
However, I can't find a way to translate a world position to its screen position with Qt. I found the function QVector3D::project(const QMatrix4x4 &modelView, const QMatrix4x4 &projection, const QRect &viewport), but its documentation is very scarce and I couldn't find anyone using it... I might have found what to feed in for the projection argument (the projectionMatrix property from QCamera, here https://doc.qt.io/qt-5/qml-qt3d-render-camera.html), but that's it. What is the modelView ? And viewport ? Is it simply QRect(0, 0, 1920, 1080) ?
If anyone have any kind of lead, it would be amazing, I can't find anything anywhere and I'm kind of losing hope now. Or maybe another, simpler, solution to my problem ? Please note that the user can also freely move the camera around the mesh, which adds in complexity.
Thanks a lot for your time, and have a nice day !
Yes, you should be able to translate from world position to screen position using the mentioned function. You are correct about the projection argument. As for the modelView argument, you should use viewMatrix property from QCamera, which is missing from the official documentation, but it works for me. The viewport parameter represents the dimensions of the part of the screen, you are projecting on. You could use QRect(0, 0, 1920, 1080) if you use full screen Full HD projection, otherwise use something like QRect(QPoint(0, 0), view->size()), where view is the wigdet or window with your 3D image. Be careful, that the resulting screen position will have y = 0 being down and positive values being above, which the opposite to usual screen coordinates.

What are we trying to achieve by creating a right and down vectors in this ray tracing depiction?

Often, we see the following picture when talking about ray tracing.
Here, I see the Z axis as the sort of direction if the camera pointed straight ahead, and the XY grid as the grid that the camera is seeing here. From the camera's point of view, we see the usual Cartesian grid me and my classmates are used to.
Recently I was examining code that simulates this. One thing that is not obvious from this picture to me is the requirement for the "right" and "down" vectors. Obviously we have look_at, which shows where the camera is looking. And campos is where the camera is located. But why do we need camright and camdown? What are we trying to construct?
Vect X (1, 0, 0);
Vect Y (0, 1, 0);
Vect Z (0, 0, 1);
Vect campos (3, 1.5, -4);
Vect look_at (0, 0, 0);
Vect diff_btw (
campos.getVectX() - look_at.getVectX(),
campos.getVectY() - look_at.getVectY(),
campos.getVectZ() - look_at.getVectZ()
);
Vect camdir = diff_btw.negative().normalize();
Vect camright = Y.crossProduct(camdir);
Vect camdown = camright.crossProduct(camdir);
Camera scene_cam (campos, camdir, camright, camdown);
I was searching about this question recently and found this post as well: Setting the up vector in the camera setup in a ray tracer
Here, the answerer says this: "My answer assumes that the XY plane is the "ground" in world space and that Z is the altitude. Imagine standing on the floor holding a camera. It's position has a positive Z component, and it's view vector is nearly parallel to the ground. (In my diagram, it's pointing slightly upwards.) The plane of the camera's film (the uv grid) is perpendicular to the view grid. A common approach is to transform everything until the film plane coincides with the XY plane and the camera points along the Z axis. That's equivalent, but not what I'm describing here."
I'm not entirely sure why "transformations" are necessary.. How is this point of view different from the picture at the top? Here they also say that they need an "up" vector and "right" vector to "construct an image plane". I'm not sure what an image plane is..
Could someone explain better the relationship between the physical representation and code representation?
How do you know that you always want the camera's "up" to be aligned with the vertical lines in the grid in your image?
Trying to explain it another way: The grid is not really there. That imaginary grid is the result of the calculations of camera's directional vectors and the resolution you are rendering in. The grid is not what decides the camera angle.
When you are holding a physical camera in your hand, like the camera in the cell phone, don't you ever rotate the camera little bit for effect? Or when filming, you may want to slowly rotate the camera? Have you not seen any movies where the camera is rotated?
In the same way, you may want to rotate the "camera" in your ray traced world. And rotating only the camera is much easier than rotating all your objects in the scene(may be millions!)
Check out the example of rotating the camera from the movie Ice Age here:
https://youtu.be/22qniGtZhZ8?t=61
The (up or down) and right vectors constructs the plane you project the scene onto. Since the scene is in 3D you need to project the scene onto a 2D scene in order to render a picture to display on your screen.
If you have the camera position and direction you still don't know whether you're holding the camera right-side up, upside down, or tilted to the left and right.
Using camera position, lookat, up (down) and right vectors we can uniquely define the 3D scene is projected into a 2D picture.
Concretely, if you look at the code and the picture. The 3D scene are the objects displayed. The image/projection plane is the grid infront of the camera. It's orientation is defined by the the camright and camdir vectors (because we are assuming the cameras line of sight is perpendicular to camdir, camdown is uniquely defined by the other two).
The placement of the grid is based on the camera's position and intrinsic properties (it's not being displayed here, but the camera will have a specific field of view).

OpenGL - have object follow mouse

I want to have an object follow around my mouse on the screen in OpenGL. (I am also using GLEW, GLFW, and GLM). The best idea I've come up with is:
Get the coordinates within the window with glfwGetCursorPos.
The window was created with
window = glfwCreateWindow( 1024, 768, "Test", NULL, NULL);
and the code to get coordinates is
double xpos, ypos;
glfwGetCursorPos(window, &xpos, &ypos);
Next, I use GLM unproject, to get the coordinates in "object space"
glm::vec4 viewport = glm::vec4(0.0f, 0.0f, 1024.0f, 768.0f);
glm::vec3 pos = glm::vec3(xpos, ypos, 0.0f);
glm::vec3 un = glm::unProject(pos, View*Model, Projection, viewport);
There are two potential problems I can already see. The viewport is fine, as the initial x,y, coordinates of the lower left are indeed 0,0, and it's indeed a 1024*768 window. However, the position vector I create doesn't seem right. The Z coordinate should probably not be zero. However, glfwGetCursorPos returns 2D coordinates, and I don't know how to go from there to the 3D window coordinates, especially since I am not sure what the 3rd dimension of the window coordinates even means (since computer screens are 2D). Then, I am not sure if I am using unproject correctly. Assume the View, Model, Projection matrices are all OK. If I passed in the correct position vector in Window coordinates, does the unproject call give me the coordinates in Object coordinates? I think it does, but the documentation is not clear.
Finally, to each vertex of the object I want to follow the mouse around, I just increment the x coordinate by un[0], the y coordinate by -un[1], and the z coordinate by un[2]. However, since my position vector that is being unprojected is likely wrong, this is not giving good results; the object does move as my mouse moves, but it is offset quite a bit (i.e. moving the mouse a lot doesn't move the object that much, and the z coordinate is very large). I actually found that the z coordinate un[2] is always the same value no matter where my mouse is, possibly because the position vector I pass into unproject always has a value of 0.0 for z. However, varying the last coordinate in the position vector doesn't fix the fact that the object moves much slower than the mouse.
I think your main issue is actually the z coordinate. When you consider a point on the screen, this will not just specify a point in object space, but a straight line. When you use a persepctive projection, you can draw a line from the eye position to any object space point, and every point on such a line will be projected to the same screen-space point.
So what you get when you unproject with z=0 is actually the point on the near plane. Now it will depend on how far your object is away from the camera, as sketched here in top view.
What you get is point A in coordinates relative to the object space origin. You could get point B back if you just read the Z value from the pixel under the mouse curser back from the depth buffer.
However, I think that neither point A or B are really helping you. You need some further constraint. For example, you could specify that the distance of the object is not changed, and the pixel the mouse is pointing at shall be the point where the object center (or any other reference point in object space) should appear. Then, you could just compute the point on the ray you have to move the point to. But it is not really clear what kind of movement you actually want to implement.
As a side note: Manually adjusting the vertex coordinates is not a good idea. The GPU can do this far better. You should just manipulate the model matrix to move the object around. And it would be useful in such a scheme not to project the point or ray into the object space, but into world space, and use that to update the model matrix of the object.

How do I implement basic camera operations in OpenGL?

I'm trying to implement an application using OpenGL and I need to implement the basic camera movements: orbit, pan and zoom.
To make it a little clearer, I need Maya-like camera control. Due to the nature of the application, I can't use the good ol' "transform the scene to make it look like the camera moves". So I'm stuck using transform matrices, gluLookAt, and such.
Zoom I know is dead easy, I just have to hook to the depth component of the eye vector (gluLookAt), but I'm not quite sure how to implement the other two, pan and orbit. Has anyone ever done this?
I can't use the good ol' "transform the scene to make it look like the camera moves"
OpenGL has no camera. So you'll end up doing exactly this.
Zoom I know is dead easy, I just have to hook to the depth component of the eye vector (gluLookAt),
This is not a Zoom, this is a Dolly. Zooming means varying the limits of the projection volume, i.e. the extents of a ortho projection, or the field of view of a perspective.
gluLookAt, which you've already run into, is your solution. First three arguments are the camera's position (x,y,z), next three are the camera's center (the point it's looking at), and the final three are the up vector (usually (0,1,0)), which defines the camera's y-z plane.*
It's pretty simple: you just glLoadIdentity();, call gluLookAt(...), and then draw your scene as normally. Personally, I always do all the calculations in the CPU myself. I find that orbiting a point is an extremely common task. My template C/C++ code uses spherical coordinates and looks like:
double camera_center[3] = {0.0,0.0,0.0};
double camera_radius = 4.0;
double camera_rot[2] = {0.0,0.0};
double camera_pos[3] = {
camera_center[0] + camera_radius*cos(radians(camera_rot[0]))*cos(radians(camera_rot[1])),
camera_center[1] + camera_radius* sin(radians(camera_rot[1])),
camera_center[2] + camera_radius*sin(radians(camera_rot[0]))*cos(radians(camera_rot[1]))
};
gluLookAt(
camera_pos[0], camera_pos[1], camera_pos[2],
camera_center[0],camera_center[1],camera_center[2],
0,1,0
);
Clearly you can adjust camera_radius, which will change the "zoom" of the camera, camera_rot, which will change the rotation of the camera about its axes, or camera_center, which will change the point about which the camera orbits.
*The only other tricky bit is learning exactly what all that means. To clarify, because the internet is lacking:
The position is the (x,y,z) position of the camera. Pretty straightforward.
The center is the (x,y,z) point the camera is focusing at. You're basically looking along an imaginary ray from the position to the center.
Now, your camera could still be looking any direction around this vector (e.g., it could be upsidedown, but still looking along the same direction). The up vector is a vector, not a position. It, along with that imaginary vector from the position to the center, form a plane. This is the camera's y-z plane.

How does zooming, panning and rotating work?

Using OpenGL I'm attempting to draw a primitive map of my campus.
Can anyone explain to me how panning, zooming and rotating is usually implemented?
For example, with panning and zooming, is that simply me adjusting my viewport? So I plot and draw all my lines that compose my map, and then as the user clicks and drags it adjusts my viewport?
For panning, does it shift the x/y values of my viewport and for zooming does it increase/decrease my viewport by some amount? What about for rotation?
For rotation, do I have to do affine transforms for each polyline that represents my campus map? Won't this be expensive to do on the fly on a decent sized map?
Or, is the viewport left the same and panning/zooming/rotation is done in some otherway?
For example, if you go to this link you'll see him describe panning and zooming exactly how I have above, by modifying the viewport.
Is this not correct?
They're achieved by applying a series of glTranslate, glRotate commands (that represent camera position and orientation) before drawing the scene. (technically, you're rotating the whole scene!)
There are utility functions like gluLookAt which sorta abstract some details about this.
To simplyify things, assume you have two vectors representing your camera: position and direction.
gluLookAt takes the position, destination, and up vector.
If you implement a vector class, destinaion = position + direction should give you a destination point.
Again to make things simple, you can assume the up vector to always be (0,1,0)
Then, before rendering anything in your scene, load the identity matrix and call gluLookAt
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt( source.x, source.y, source.z, destination.x, destination.y, destination.z, 0, 1, 0 );
Then start drawing your objects
You can let the user span by changing the position slightly to the right or to the left. Rotation is a bit more complicated as you have to rotate the direction vector. Assuming that what you're rotating is the camera, not some object in the scene.
One problem is, if you only have a direction vector "forward" how do you move it? where is the right and left?
My approach in this case is to just take the cross product of "direction" and (0,1,0).
Now you can move the camera to the left and to the right using something like:
position = position + right * amount; //amount < 0 moves to the left
You can move forward using the "direction vector", but IMO it's better to restrict movement to a horizontal plane, so get the forward vector the same way we got the right vector:
forward = cross( up, right )
To be honest, this is somewhat of a hackish approach.
The proper approach is to use a more "sophisticated" data structure to represent the "orientation" of the camera, not just the forward direction. However, since you're just starting out, it's good to take things one step at a time.
All of these "actions" can be achieved using model-view matrix transformation functions. You should read about glTranslatef (panning), glScalef (zoom), glRotatef (rotation). You also should need to read some basic tutorial about OpenGL, you might find this link useful.
Generally there are three steps that are applied whenever you reference any point in 3d space within opengl.
Given a Local point
Local -> World Transform
World -> Camera Transform
Camera -> Screen Transform (usually a projection. depends on if you're using perspective or orthogonal)
Each of these transforms is taking your 3d point, and multiplying by a matrix.
When you are rotating the camera, it is generally changing the world -> camera transform by multiplying the transform matrix by your rotation/pan/zoom affine transformation. Since all of your points are re-rendered each frame, the new matrix gets applied to your points, and it gives the appearance of a rotation.