Using glOrtho for scrolling game. Spaceship "shudders" - c++

I want to draw a spaceship in the centre of a window as it powers through space. To scroll the world window, I compute the new position of the ship and centre a 4000x3000 window around this using glOrtho. My test is to press the forward button and move the ship upwards. On most machines there is no problem. On a slower linux laptop, where I am getting only 30 frames per second, the ship shudders back and forth. Comparing its position relative to the mouse pointer (which does not move), the ship can clearly be seen jumping forward and back by a couple of pixels. The stars are also shown to be blurred into short lines.
I would like to query the pixel value of the centre of the ship to see if it is changing.
Is there an OpenGL way to supply a world point and get back the pixel point it will be transformed to?

I'd consider doing it the other way around. Keep the ship at 0,0 in world coordinates, and move the world relative to it. Then you only need the glOrtho call when the size of the window changes (the camera stays in a fixed position). Apart from not having to recalculate the projection matrix every frame, you also have the benefit that if your world ends up being massive in the future, you have the option of using double-precision positions, since applying large offsets to floats results in inaccurate positioning.
To draw your space scene, you then manipulate the modelview matrix before drawing any objects, and use a different matrix when drawing the ship.
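To make the split concrete, a minimal fixed-function sketch, where drawWorld() and drawShip() are hypothetical helpers and shipX/shipY is the ship's position in world coordinates:

#include <GL/gl.h>

void drawWorld();  // hypothetical helpers
void drawShip();

void render(double shipX, double shipY)
{
    glMatrixMode(GL_MODELVIEW);

    // World: translate everything so the ship's world position lands at the origin.
    glLoadIdentity();
    glTranslated(-shipX, -shipY, 0.0);
    drawWorld();

    // Ship: identity modelview, so it is always drawn at the fixed origin.
    glLoadIdentity();
    drawShip();
}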
To get a pixel coordinate from a world coordinate, take the point and multiply it by the projection matrix times the modelview matrix (make sure you get the multiplication order right). You'll then have a value with x and y in the range -1 to 1. Add [1,1] and multiply by half the screen size to get the pixel position. (If you wanted, you could fold this step into the matrix transformation.)
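For reference, GLU's gluProject wraps exactly this chain (projection * modelview, then the viewport mapping). A minimal sketch, assuming a current GL context:

#include <GL/glu.h>

// Map a world-space point to window pixels using the current matrices.
bool worldToPixel(double wx, double wy, double wz, double& px, double& py)
{
    GLdouble model[16], proj[16];
    GLint viewport[4];
    glGetDoublev(GL_MODELVIEW_MATRIX, model);
    glGetDoublev(GL_PROJECTION_MATRIX, proj);
    glGetIntegerv(GL_VIEWPORT, viewport);

    GLdouble winX, winY, winZ;
    if (gluProject(wx, wy, wz, model, proj, viewport,
                   &winX, &winY, &winZ) != GL_TRUE)
        return false;

    px = winX;  // pixels from the left edge
    py = winY;  // pixels from the BOTTOM edge (GL convention); flip if comparing with mouse coords
    return true;
}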

Related

Why does the camera face the negative end of the z-axis by default?

I am learning OpenGL from Scratchapixel, and here is a quote from the perspective projection matrix chapter:
Cameras point along the world coordinate system's negative z-axis so that when a point is converted from world space to camera space (and then later from camera space to screen space), if the point is to the left of the world coordinate system's y-axis, it will also map to the left of the camera coordinate system's y-axis. In other words, we need the x-axis of the camera coordinate system to point to the right when the world coordinate system's x-axis also points to the right; and the only way you can get that configuration is by having the camera look down the negative z-axis.
I think it has something to do with mirror images? But this explanation just confused me... why does the camera's coordinate system not coincide with the world coordinate system by default (like every other 3D object we create in OpenGL)? I mean, we will need to transform the camera with a transformation matrix anyway (whatever we want with the negative-z setup, we can simulate it)... why bother?
Which direction to pick for z is totally arbitrary.
But your pick has deep consequences.
One reason to stick with the GL -z way is that the culling of faces will match GL constant names like GL_FRONT. I'd advise just to roll with the tutorial.
Flipping the sign on just one axis also flips the "parity", so a front face becomes a back face, and a depth test against znear becomes one against zfar. So it is wise to pick one convention early on and stick with it.
By default, yes, it's a "right-handed" system (used in physics, for example). Your thumb is the X-axis, your index finger the Y-axis, and when you point those in the right directions, Z (your middle finger) points toward you. Why was the Z-axis chosen to point into/out of the screen? Because then the X- and Y-axes lie on the screen, just like in 2D graphics.
But in reality, OpenGL has no preferred coordinate system. You can set it up however you like. For example, if you are making a maze game, you might want Y to go into/out of the screen (and Z upwards), so that you can move nicely in the XY plane. Modify your view/perspective matrices and you get it.
What is this "camera" you're talking about? In OpenGL there is no such thing as a "camera". All you've got is a two stage transformation chain:
vertex position → viewspace position (by modelview transform)
viewspace position → clipspace position (by projection transform)
To see why by default OpenGL is "looking down" -z, we have to look at what happens if both transformation steps do "nothing", i.e. are the full identity transform.
In that case all vertex positions passed to OpenGL are unchanged: X maps across the window width, Y across the window height. All calculations in OpenGL by default (you can change that) have been chosen to adhere to the rules of a right-handed coordinate system, so if +X points right and +Y points up, then +Z must point "out of the screen" for the right-hand rule to be consistent.
And that's all there is about it. No camera. Just linear transformations and the choice of using right handed coordinates.
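To make that concrete, a minimal sketch: with both matrices left at identity, the coordinates you pass in already are normalized device coordinates.

#include <GL/gl.h>

// With both transforms at identity, vertex coords ARE normalized device
// coordinates: x and y in [-1,1] span the viewport, and by the right-hand
// rule +Z points out of the screen, toward the viewer.
void drawIdentityTriangle()
{
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();

    glBegin(GL_TRIANGLES);
    glVertex3f(-0.5f, -0.5f, 0.0f);  // lower-left region of the window
    glVertex3f( 0.5f, -0.5f, 0.0f);  // lower-right
    glVertex3f( 0.0f,  0.5f, 0.0f);  // top centre
    glEnd();
}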

Keeping OpenGL object a fixed size on screen

I'm trying to keep a simple cube a fixed size on screen no matter how far it translates into the scene (slides along the z axis).
I know that using an orthogonal projection, I could draw an object at fixed size, but I'm not interested in just having it look right. I need to read out the x and z coordinates (in world space) at any given time.
So the further along the -z axis the cube translates, the larger its x and z values must get in order for the cube to remain a defined pixel size on screen (let's say the cube should be 50x50x50 pixels).
I'm not sure how to begin tackling this, any suggestions?
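One sketch of the relationship described above, assuming a standard perspective projection; fovyDegrees and viewportHeight are hypothetical parameters that must match your actual projection setup. Under perspective, on-screen size falls off linearly with distance, so the world-space scale must grow linearly with it:

#include <cmath>

// World-space size that covers `pixels` on screen at `distance` from the camera.
double worldSizeForPixels(double pixels, double distance,
                          double fovyDegrees, int viewportHeight)
{
    const double pi = 3.14159265358979323846;
    // Focal length in pixels for a vertical field of view of fovyDegrees.
    double f = (viewportHeight / 2.0) / std::tan(fovyDegrees * pi / 360.0);
    return pixels * distance / f;
}

// e.g. scale the cube each frame by worldSizeForPixels(50.0, -cubeZ, 60.0, 768)
// and read its x/z world coordinates back as usual.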

convert 2D screen coordinates to 3D space coordinates in C++?

I'm writing a 2D game using a wrapper of OpenGLES. There is a camera aiming at a bunch of textures, which are the sprites for the game. The user should be able to move the view around by moving their fingers around on the screen. The problem is, the camera is about 100 units away from the textures, so when a finger is slid across the screen to pan the camera, the sprites move faster than the finger due to the parallax effect.
So basically, I need to convert 2D screen coordinates, to 3D coordinates at a specific z distance away (in my case 100 away because that's how far away the textures are).
There are some "Unproject" functions in C#, but I'm using C++ so I need the math behind this function. I'm extremely new to 3D stuff and I'm very bad at math so if you can explain like you are explaining to a 10 year old that would be much appreciated.
If I can do this, I can pan the camera at such a speed so it looks like the distant sprites are panning with the users finger.
For picking purposes, there are better ways than doing a reverse projection. See this answer: https://stackoverflow.com/a/1114023/252687
In general, you will have to scale your finger-movement distance to use it on a far-away plane (z units away).
That is, if l is the length of the finger movement and you want to find the corresponding effect z units away, the length is l' = l/z.
But please check the effect and adjust l' (double it, halve it, etc.) to get the desired feel.
Found the answer at:
Wikipedia
It has the following formula:
To determine which screen x-coordinate corresponds to a point at (Ax, Az), multiply the point coordinates by:

Bx = Ax * (Bz / Az)

where
Bx is the screen x coordinate
Ax is the model x coordinate
Bz is the focal length—the axial distance from the camera center to the image plane
Az is the subject distance.
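Inverting that formula gives the scaling from the previous answer: Ax = Bx * Az / Bz. A minimal sketch, where focalPixels (Bz expressed in pixels) is an assumed parameter derived from your projection:

// World-space movement on a plane subjectDist units away (Az, e.g. 100)
// that corresponds to a finger movement of screenDx pixels (Bx).
double screenToWorldAtDepth(double screenDx, double subjectDist, double focalPixels)
{
    // Bx = Ax * Bz / Az  =>  Ax = Bx * Az / Bz
    return screenDx * subjectDist / focalPixels;
}

// Pan the camera by exactly this amount per frame and the sprites at
// z = 100 will appear to track the finger 1:1.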

labels in an opengl map application

Short Version
How can I draw short text labels in an OpenGL mapping application without having to manually recompute coordinates as the user zooms in and out?
Long Version
I have an OpenGL-based mapping application where I need to be able to draw data sets with up to about 250k points. Each point can have a short text label, usually about 4 or 5 characters long.
Currently, I do this using a single texture containing all the characters. For each point, I define a quad for each character in its label. So a point with the label "Fred" would have four quads associated with it, and each quad uses texture coordinates into that single texture to draw its corresponding character.
When I draw the map, I draw the map points themselves in map coordinates (e.g., longitude/latitude). Then I compute the position of each point in screen coordinates and update the four corner points for each of that point's label quads, again in screen coordinates. (For instance, if I determine the point is drawn at screen point 100, 150, I could set the quad for the first character in the point's label to be the rectangle starting with left-top point of 105, 155 and having a width of 6 pixels and a height of 12 pixels, as appropriate for the particular character. Then the second character might start at 120, 155, and so on.) Then once all these label character quads are positioned correctly, I draw them using an orthogonal screen projection.
The problem is that the process of updating all of those character quad coordinates is slow, taking about half a second for a particular test data set with 150k points (meaning that, since each label is about four characters long, there are about 150k * [4 characters per point] * [4 coordinate pairs per character] coordinate pairs that need to be set on each update).
If the map application didn't involve zooming, I would not need to recompute all these coordinates on each refresh. I could just compute the label coordinates once and then simply shift my viewing rectangle to show the right area. But with zooming, I can't see how to make it work without doing coordinate computation, because otherwise the characters will grow huge as you zoom in and tiny as you zoom out.
What I want (and what I understand OpenGL doesn't provide) is a way to tell OpenGL that a quad should be drawn in a fixed screen-coordinate rectangle, but that the top-left position of that rectangle should be a fixed distance from a given point in map coordinate space. So I want both a primitive hierarchy (a given map point is the parent of its label character quads) and the ability to mix two different coordinate systems within this hierarchy.
I'm trying to understand whether there is some magic transformation matrix I can set that will do all this for me, but I can't see how to do it.
The other alternative I've considered is using a shader on each point to handle computing the label character quad coordinates for that point. I haven't worked with shaders before, and I'm just trying to understand (a) if it's possible to use shaders to do this, and (b) whether computing all those points in shader code actually buys me anything over computing them myself. (By the way, I have confirmed that the big bottleneck is computing the quad coordinates, not in uploading the updated coordinates to the GPU. The latter takes a bit of time, but it's the computation, the sheer number of coordinates being updated, that takes up the bulk of that half second.)
(Of course, the other other alternative is to be smarter about which labels need to be drawn in a given view in the first place. But for now I'd like to concentrate on the solution assuming all labels need to be drawn.)
So the basic problem ("because otherwise the characters will grow huge as you zoom in and tiny as you zoom out") is that you are doing calculations in map coordinates rather than screen coordinates? And if you did it in screen coords, this would require more computations? Obviously, any rendering needs to translate from map coordinates to screen coordinates. The problem seems to be that you are translating from map to screen too late. Rather than doing a single map-to-screen conversion for each point and then working in screen coords, you are working mostly in map coords and translating per character to screen coords at the very end. The slow part is that you compute positions in screen coords, manually translate them back into the coordinates you hand to OpenGL, and OpenGL then converts those back to screen coords anyway! Is that a fair assessment of your problem?
The solution therefore is to push that transformation earlier in your pipeline. However, I can see why it is tricky, because at first glance OpenGL seems to want to do everything in "world coordinates" (for you, map coords), and not in screen coords.
Firstly, I am wondering why you are doing separate coordinate calculations for each character. What font rendering system are you using? Something like FreeType will automatically generate a bitmap image of an entire string, and doesn't require you to work per-character [edit: this isn't quite true; see comments]. You definitely shouldn't need to calculate the map coordinate (or even screen coordinate) for every character. Calculate the screen coordinate for the top-left corner of the label, and have your font rendering system produce the bitmap of the entire label in one go. That should speed things up about fourfold (since you assume 4 characters per label).
Now as for working in screen coords, it may be helpful to learn a bit about shaders. The more you learn about OpenGL, the more you learn that it really isn't a 3D rendering engine at all. It's just a 2D graphics library with some very fast matrix primitives built in. OpenGL actually works, at the lowest level, in screen coordinates (not pixel coordinates -- it works in normalized screen space, I think from memory from -1 to 1 in both the X and Y axes). The only reason it "feels" like you're working in world coordinates is because of these matrices you have set up.
So I think the reason why you are working in map coords all the way until the end is because it's easiest: OpenGL naturally does the map-to-screen transform for you (using the matrices). You have to change that, because you want to work in screen coords yourself, and therefore you need to make the transformation a long time before OpenGL gets its hands on your data. So when you go to draw a label, you should manually apply the map-to-screen transformation matrix on each point, as follows:
You have a particular point (which needs a label drawn) in map coords.
Apply the map-to-screen matrix to convert the point to screen coords. This probably means multiplying the point by the MODELVIEW and PROJECTION matrices, using the same algorithm that OpenGL does when it's rendering a vertex. So you could either glGet the GL_MODELVIEW_MATRIX and GL_PROJECTION_MATRIX to extract OpenGL's current matrices, or you could manually keep around a copy of the matrix yourself.
Now that you have the map label in screen coords, compute the position of the label's text. This is simply adding 5 pixels in the X and Y axes, as you said above. However, remember that you aren't in pixel space but normalised screen space, so you are working in fractions of the screen (adding 0.05 units adds 2.5% of the screen width, since the normalised range spans 2 units). It's probably better not to think in pixels, because then your application will scale to match the resolution. But if you really want to think in pixels, you will have to calculate the pixels-to-units ratio based on the resolution.
Use glPushMatrix to save the current matrix, then glLoadIdentity to set the current matrix to the identity -- tell OpenGL not to transform your vertices. (I think you will have to do this for both the PROJECTION and MODELVIEW matrices.)
Draw your label, in screen coordinates.
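Putting those steps together, a minimal sketch. drawLabelQuads() is a hypothetical helper that emits the character quads at a given pixel origin; a pixel-space ortho is loaded in place of raw NDC so the 5-pixel offset stays literal:

#include <GL/glu.h>

void drawLabelQuads(double x, double y);  // hypothetical helper

void drawLabel(double mapX, double mapY)
{
    GLdouble model[16], proj[16];
    GLint view[4];
    glGetDoublev(GL_MODELVIEW_MATRIX, model);
    glGetDoublev(GL_PROJECTION_MATRIX, proj);
    glGetIntegerv(GL_VIEWPORT, view);

    // Step 2: map coords -> window pixel coords (projection * modelview + viewport).
    GLdouble winX, winY, winZ;
    gluProject(mapX, mapY, 0.0, model, proj, view, &winX, &winY, &winZ);

    // Step 4: save the matrices and switch to a pixel-space projection.
    glMatrixMode(GL_PROJECTION);
    glPushMatrix();
    glLoadIdentity();
    glOrtho(0, view[2], 0, view[3], -1, 1);
    glMatrixMode(GL_MODELVIEW);
    glPushMatrix();
    glLoadIdentity();

    // Steps 3 and 5: offset by 5 pixels and draw the label's quads.
    drawLabelQuads(winX + 5.0, winY + 5.0);

    // Restore both matrices.
    glPopMatrix();
    glMatrixMode(GL_PROJECTION);
    glPopMatrix();
    glMatrixMode(GL_MODELVIEW);
}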
So you don't really need to write a shader. You could certainly do this in a shader, and it would certainly make step 2 faster (no need to write your own software matrix multiply code; multiplying matrices on the GPU is extremely fast). But that would be a later optimisation, and a lot of work. I think the above steps will help you work in screen coordinates and avoid having to waste a lot of time just to give OpenGL map coordinates.
Side comment on:
"""
generate a bitmap image of an entire string, and doesn't require you to work per-character
...
Calculate the screen coordinate for the top-left corner of the label, and have your font rendering system produce the bitmap of the entire label in one go. That should speed things up about fourfold (since you assume 4 characters per label).
"""
Freetype or no, you could certainly compute a bitmap image for each label, rather than each character, but that would require one of:
storing thousands of different textures, one for each label
It seems like a bad idea to store that many textures, but maybe it's not.
or
rendering each label, for each point, at each screen update.
this would certainly be too slow.
Just to follow up on the resolution:
I didn't really solve this problem, but I ended up being smarter about when I draw labels in the first place. I was able to quickly determine whether I was about to draw too many characters (i.e., so many characters that on a typical screen with a typical density of points the labels would be too close together to read in a useful way), and in that case I simply don't label at all. When drawing up to about 5,000 characters at a time, there isn't a noticeable slowdown recomputing the character coordinates as described above.

How does zooming, panning and rotating work?

Using OpenGL I'm attempting to draw a primitive map of my campus.
Can anyone explain to me how panning, zooming and rotating is usually implemented?
For example, with panning and zooming, is that simply me adjusting my viewport? So I plot and draw all my lines that compose my map, and then as the user clicks and drags it adjusts my viewport?
For panning, does it shift the x/y values of my viewport and for zooming does it increase/decrease my viewport by some amount? What about for rotation?
For rotation, do I have to do affine transforms for each polyline that represents my campus map? Won't this be expensive to do on the fly on a decent sized map?
Or, is the viewport left the same and panning/zooming/rotation done in some other way?
For example, if you go to this link you'll see him describe panning and zooming exactly how I have above, by modifying the viewport.
Is this not correct?
They're achieved by applying a series of glTranslate and glRotate commands (which represent camera position and orientation) before drawing the scene. (Technically, you're rotating the whole scene!)
There are utility functions like gluLookAt which sorta abstract some details about this.
To simplify things, assume you have two vectors representing your camera: position and direction.
gluLookAt takes the position, destination, and up vector.
If you implement a vector class, destination = position + direction should give you the destination point.
Again, to keep things simple, you can assume the up vector is always (0,1,0).
Then, before rendering anything in your scene, load the identity matrix and call gluLookAt
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt( position.x, position.y, position.z, destination.x, destination.y, destination.z, 0, 1, 0 );
Then start drawing your objects
You can let the user pan by changing the position slightly to the right or to the left. Rotation is a bit more complicated, as you have to rotate the direction vector (assuming that what you're rotating is the camera, not some object in the scene).
One problem is, if you only have a "forward" direction vector, how do you move it? Where are right and left?
My approach in this case is to just take the cross product of "direction" and (0,1,0).
Now you can move the camera to the left and to the right using something like:
position = position + right * amount; //amount < 0 moves to the left
You can move forward using the "direction vector", but IMO it's better to restrict movement to a horizontal plane, so get the forward vector the same way we got the right vector:
forward = cross( up, right )
To be honest, this is somewhat of a hackish approach.
The proper approach is to use a more "sophisticated" data structure to represent the "orientation" of the camera, not just the forward direction. However, since you're just starting out, it's good to take things one step at a time.
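A minimal sketch of those cross products, with Vec3 and its operators as assumed helpers (none of this is OpenGL API):

struct Vec3 { float x, y, z; };

Vec3 cross(const Vec3& a, const Vec3& b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

Vec3 operator+(const Vec3& a, const Vec3& b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
Vec3 operator*(const Vec3& a, float s)       { return { a.x * s, a.y * s, a.z * s }; }

// Strafe and walk in the horizontal plane; normalize right/forward first
// if direction is not a horizontal unit vector.
void moveCamera(Vec3& position, const Vec3& direction, float strafe, float walk)
{
    const Vec3 up{ 0.0f, 1.0f, 0.0f };
    Vec3 right   = cross(direction, up);  // strafe < 0 moves to the left
    Vec3 forward = cross(up, right);      // movement restricted to the XZ plane
    position = position + right * strafe + forward * walk;
}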
All of these "actions" can be achieved using model-view matrix transformation functions. You should read about glTranslatef (panning), glScalef (zoom), glRotatef (rotation). You also should need to read some basic tutorial about OpenGL, you might find this link useful.
Generally there are three steps that are applied whenever you reference any point in 3D space within OpenGL.
Given a Local point
Local -> World Transform
World -> Camera Transform
Camera -> Screen Transform (usually a projection; depends on whether you're using a perspective or orthographic projection)
Each of these transforms is taking your 3d point, and multiplying by a matrix.
When you rotate the camera, you are generally changing the world -> camera transform by multiplying that transform matrix by your rotation/pan/zoom affine transformation. Since all of your points are re-rendered each frame, the new matrix gets applied to your points, which gives the appearance of a rotation.
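A sketch of that chain as plain matrix math; Mat4, Vec4 and transform() are assumed helpers, and in legacy OpenGL the local -> world and world -> camera stages are fused into the single MODELVIEW matrix:

#include <array>

using Mat4 = std::array<float, 16>;  // column-major, matching OpenGL
using Vec4 = std::array<float, 4>;

// r = m * v for a column-major 4x4 matrix.
Vec4 transform(const Mat4& m, const Vec4& v)
{
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[col * 4 + row] * v[col];
    return r;
}

// Local -> World -> Camera -> Clip; divide by w afterwards for screen coords.
Vec4 projectPoint(const Vec4& local, const Mat4& localToWorld,
                  const Mat4& worldToCamera, const Mat4& projection)
{
    Vec4 world  = transform(localToWorld, local);
    Vec4 camera = transform(worldToCamera, world);
    return transform(projection, camera);
}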