So I've written a program that renders a mesh using a Vertex Buffer Object, and lets me move the camera around. I now want to make the object move independently of the camera/view.
However, I'm not sure how to go about moving my meshes through space. Googling turns up sources that either tell me to rotate the objects with glRotatef() and friends, or tell me that glRotatef() and its siblings are a bad idea because they are deprecated. Perhaps I'm not using the right search terms, but I'm not finding much that seems like a good starting point. I see vague references to matrix math, but I don't know how to apply it or which approach to take. Other sources say I should use a vertex shader to transform the objects.
I suppose I could manually rebuild my mesh each frame, but that seems like a horrible idea (the meshes frequently have upwards of 50k triangles, and I'd like to have at least dozens of them), and I don't really need to keep the vertices around in main memory if they are already stored in a VBO... right?
So how do I go about manipulating meshes that are stored in VBOs independently of the global space? What resources should I use in learning to do so?
You should be using your ModelView matrix to apply transformations to your vertices. To apply a transformation to a particular object/mesh and not to the entire screen, push a copy of your ModelView matrix onto the stack, apply your transformation, draw your object, then pop that matrix off to go back to your old ModelView matrix.
No need to recompute your vertex positions! That's exactly what these matrices are designed to help you avoid. And the fact that they're stored in a VBO won't matter to you - vertices sourced from a VBO are transformed exactly the same way as vertices passed to OpenGL manually.
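In legacy OpenGL that pattern looks roughly like this (a minimal sketch; drawMesh() and the obj* variables stand in for your own code):

    glMatrixMode(GL_MODELVIEW);
    glPushMatrix();                        // copy the current ModelView matrix
    glTranslatef(objX, objY, objZ);        // move just this object
    glRotatef(objAngle, 0.0f, 1.0f, 0.0f); // spin it around its own Y axis
    drawMesh();                            // e.g. glBindBuffer + glDrawArrays on your VBO
    glPopMatrix();                         // back to the camera-only matrix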
And you might want to check out this question, and the transformation article its accepted answer links to - they'll be useful if you're still getting the hang of transformations and the matrix stack.
Hope that helps!
Edit: A quick example of why the stack is useful. Say you're drawing a simple scene: a guy on a raft (with a sail) in the ocean.
First, you'll want to set up your camera angle, so do whatever transformations you need to set that up. You don't need - and in fact don't want - to push and pop matrices here, because these transformations apply to everything in your scene (In OpenGL, moving the camera = moving the entire world. Weird to think about, but you get used to it.).
Then you draw your ocean. No need to transform it, 'cause it's a static object, and doesn't move.
Then you draw your raft. But your raft has moved! It's drifted along the X axis. Now, since the raft is an independent object and transformations that apply to the raft shouldn't apply to the larger world, you push a matrix onto the stack. This copies the existing ModelView matrix. All those camera transformations are already applied; your "drifting" transformation on the raft is in addition to the transformations you did at lower levels of the stack.
Draw the raft. Then, before you pop that matrix off the stack, draw the things that are on the raft - the guy and the sail. Since they move with the raft, all the transformations that apply to the raft should be applied to them, too.
Say you draw your castaway first. But he's moved too - he's jumping into the air. So you push another matrix onto the stack, apply a "jumping" transformation, and then render your person. If there's anything that should move with the person - if he were holding anything, say - you'd draw it here, too. But he's not. So pop the "jumping" matrix off the stack.
Now you're back in the "raft" context. Since you applied the "jumping" transformation to a copy, the "drifting" transformation was left untouched a stack level down. Draw the sail now, and it'll be on top of the raft, right where it should be.
And then you're done with raft, so you can pop that matrix off the stack too. You're back down to your plain camera transform. Draw some more static geometry - islands or something.
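Expressed as code, the whole scene above is nothing more than nested push/pop pairs (all the draw* and transform functions here are hypothetical stand-ins):

    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    applyCameraTransform();                   // applies to everything: no push/pop

    drawOcean();                              // static, uses the camera matrix as-is

    glPushMatrix();                           // enter the raft's context
        glTranslatef(driftX, 0.0f, 0.0f);     // the raft's drift along X
        drawRaft();

        glPushMatrix();                       // enter the castaway's context
            glTranslatef(0.0f, jumpHeight, 0.0f); // his jump, relative to the raft
            drawCastaway();
        glPopMatrix();                        // back to the raft's context

        drawSail();                           // moves with the raft, not the jump
    glPopMatrix();                            // back to the plain camera transform

    drawIslands();                            // more static geometry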
And that's why the matrix stack is useful. It's also why people build more complicated scenes as "scene graphs" - so they can keep track of the nesting of transformations. It's also useful in skeletal animation, where the position of the wrist depends on the position of the elbow, which depends on the position of the shoulder, and so forth.
And that was way longer than I expected - but hopefully useful. Cheers!
I'm pretty new to OpenGL and am trying to implement a simple program where I can draw cubes, move them around with the mouse, and delete them.
Previously I had done my drag operations by translating on the CPU. In this way I was able to use ray-tracing to pick out the element I wanted because the vertices themselves were being updated.
However, I'm trying to move all of the transformations to the GPU, and in doing so I realized that I would be giving up updated access to the vertices on the CPU (the CPU still thinks the vertices are the untransformed ones). How does one handle this so that I don't have to do the transformations manually on the CPU as well as in the vertex shader?
No matter where you're doing your transformations, you will typically have a model matrix that describes where each object is in the scene. Instead of transforming each object into world space just so you can check for intersection with a world-space ray, you can also transform the ray into the object space of each object by transforming the ray with the inverse model matrix.
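A sketch of that with GLM (the Ray struct is just for illustration; note that points are transformed with w = 1 and directions with w = 0, so directions don't pick up the translation):

    #include <glm/glm.hpp>

    struct Ray { glm::vec3 origin; glm::vec3 dir; };

    // Transform a world-space ray into an object's local space,
    // so it can be tested against the untransformed vertices in the VBO.
    Ray toObjectSpace(const Ray& worldRay, const glm::mat4& model) {
        glm::mat4 invModel = glm::inverse(model);
        Ray local;
        local.origin = glm::vec3(invModel * glm::vec4(worldRay.origin, 1.0f)); // point: w = 1
        local.dir    = glm::vec3(invModel * glm::vec4(worldRay.dir,    0.0f)); // direction: w = 0
        return local;
    }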
One general issue with ray-tracing is that, as your scene gets larger, brute-force testing of each object gets increasingly slow. You can use acceleration structures like an octree or a bounding volume hierarchy to speed things up.

A completely different approach to picking would be to render an ID buffer: a buffer that has the same resolution as your currently rendered frame and, for each pixel, stores the ID of the object visible at that pixel. Then you can simply read back the value of the pixel underneath the cursor to find out what object you hit, without doing any ray-tracing at all. Rendering the ID buffer can be done as a separate pass, or can likely just be added as an additional render target to a pass you're already doing, e.g., prefilling the depth buffer, or the main scene pass if you only do one.
In my program I have a gizmo which moves objects in the scene. As I understand it, the usual way to handle transformations is to store them in each object's model matrix and apply them in the shader. BUT my program also implements a classic ray-picking algorithm, which only works on actually transformed data: the ray detects intersections against the real (transformed) vertex positions. What is the common way to resolve this conflict?

1. Apply every transformation immediately on the CPU and store the transformed data. I think this is the straightforward way, but it's expensive: for example, if I drag my object across the screen for 100 frames, then each frame I have to convert the drag delta into a matrix and multiply the whole data set by it.

2. Keep the transformations in matrices until mouse picking starts, and only then multiply the vertices by the matrix to prepare the data for picking. This seems nicer, but maybe there are ways to optimize it.

Which way is more performant? Or is there some other method?
Update for Robinson:
I think you misunderstood me, or I did not fully understand you. I have a box and a sphere, and I move them with the gizmo (I edit their model matrices) by (1,0,0) and (0,1,0) respectively. Their model matrices are now different. THIS is where I need data for ray-picking - every object has its own individual place.

Then I transform the entire scene to eye space (view matrix) and then to clip space (projection matrix) and render it. My ray makes the return journey from the viewport back to world space (unprojecting through the projection and view matrices) and should interact with the actual scene. The ray is transformed, rather than the scene!

My question was: how do I interact with objects whose real positions are unknown until they are rendered (transformed)? Or maybe I'm not on the right track and should have done it differently - multiplying all the data every step (which is expensive; see my first question).
You use ray-picking, which technically means: take the x,y screen coordinates, transform them to NDC, set z to any value in the range [-1,1], and finally transform the result back to world coordinates.

This is useful when you want to cast a ray from the point of view (the camera) through the mouse coordinates AND you want to do all of these intersection calculations on the CPU side.

Notice you can do this even when nothing is drawn on the screen; only the mouse coordinates are needed - well, plus the viewport and the current transformations, but you know those before any glDrawxxx command.
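For example, with GLM's unProject this might look like the following sketch (mouseX/mouseY are window coordinates; note the Y flip, since window Y usually grows downward while OpenGL's grows upward):

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>  // glm::unProject

    struct Ray { glm::vec3 origin; glm::vec3 dir; };

    // Build a world-space picking ray from mouse coordinates.
    // view, proj and viewport are the same ones you render with.
    Ray mouseRay(float mouseX, float mouseY,
                 const glm::mat4& view, const glm::mat4& proj,
                 const glm::vec4& viewport) {
        float y = viewport.w - mouseY;  // flip to OpenGL's bottom-left origin
        glm::vec3 pNear = glm::unProject(glm::vec3(mouseX, y, 0.0f), view, proj, viewport); // near plane
        glm::vec3 pFar  = glm::unProject(glm::vec3(mouseX, y, 1.0f), view, proj, viewport); // far plane
        return { pNear, glm::normalize(pFar - pNear) };
    }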
Next, the question is: what are you going to do with that ray or intersections?
You may wish to modify some property (like color) or position. Right?
How many objects are to be modified? If it's just a handful, then it's fine to do it on the CPU, modifying the data before sending it to the GPU. But if you have thousands of objects, think of the hardware-accelerated way: keep their coordinates as they are, but send the new transformation matrices and properties to the GPU and let it do the hard work.

If you are worried about some objects staying as they are while others get modified, remember that you can draw groups of objects that share matrices and other uniforms with a single glDrawxxx call, as in the sketch below. If you have different groups, use several glDrawxxx calls with different uniforms, or even different shaders.
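As a rough sketch of that idea (the Group struct and the uModel uniform name are made up for illustration; this assumes a shader program that transforms positions by uModel, and GLEW for function loading):

    #include <vector>
    #include <GL/glew.h>
    #include <glm/glm.hpp>
    #include <glm/gtc/type_ptr.hpp>

    struct Group {
        GLuint    vao;          // geometry shared by the whole group
        GLsizei   vertexCount;
        glm::mat4 modelMatrix;  // one transformation for the whole group
    };

    // One draw call per group: the vertices never change on the CPU,
    // only the matrix sent to the GPU does.
    void drawGroups(GLuint program, const std::vector<Group>& groups) {
        GLint loc = glGetUniformLocation(program, "uModel");
        for (const Group& g : groups) {
            glUniformMatrix4fv(loc, 1, GL_FALSE, glm::value_ptr(g.modelMatrix));
            glBindVertexArray(g.vao);
            glDrawArrays(GL_TRIANGLES, 0, g.vertexCount);
        }
    }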
So let's say I have a single vertex (to make things easy) in my program, at (0, 0, 0) - right at the origin. I render a single frame with a simple translation matrix, moving the vertex two units down the x-axis. The vertex is rendered accordingly. Does the same vertex now show up in VRAM as (2, 0, 0)? I've read that it's important to load all the respective identity matrices in OpenGL every time a frame is rendered - and I assume that's because everything would otherwise keep moving, rotating, etc. further and further, implying that applying transformations DOES modify the actual data, not just the appearance onscreen.
Strictly speaking, OpenGL is just an API definition. An implementation can do whatever it wants as long as it meets the specifications.
That being said, the answer to your question is generally: NO. It's hard to picture how storing transformed vertices back into the memory that also contained the original vertices would ever make sense.
The original vertex positions are passed into the vertex shader, where they are processed, which can include transformations. Once they exit the vertex shader, the transformed positions will most likely be stored in some kind of cache or dedicated on-chip GPU memory until they are processed by the next steps of the pipeline, which includes perspective division, application of the viewport transform, and rasterization. Once those vertex processing steps are completed, the transformed vertices can be discarded. They may stay in a cache for a little longer, for possible reuse of the processed vertex in case the same original vertex is used again. But they are not stored in any persistent way.
The way I interpret it, what you heard about having to reset the matrices for each frame was probably a misunderstanding. If you want to apply the same matrices in the next frame, you don't have to do anything at all.
What they were most likely talking about is related to how the matrix stack in legacy OpenGL works. Most calls that modify the current matrix, like glTranslatef(), glRotatef(), etc., are applied incrementally to the current matrix. For example, if you call glRotatef(), the rotation is combined with the transformation that was already on the matrix stack. The result is that your newly specified rotation is applied to the vertices first, followed by the transformations that were already on the matrix stack.
Based on this, if you want to specify transformations from scratch at the start of each frame, you will call glLoadIdentity() to reset the current transformation on the matrix stack before you start specifying your new transformations. Or you can use glPushMatrix()/glPopMatrix() to save and restore the desired state of the matrix stack.
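In legacy OpenGL that typically means something like this at the top of each frame (a minimal sketch; applyCameraTransform() and drawObject() are placeholders):

    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();                   // wipe whatever accumulated last frame
    applyCameraTransform();             // rebuild the per-frame camera setup

    glPushMatrix();                     // save the camera-only matrix
    glRotatef(angle, 0.0f, 0.0f, 1.0f); // this rotation affects only drawObject()
    drawObject();
    glPopMatrix();                      // restore it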
If you use what many people call "modern OpenGL", meaning that you don't use the legacy fixed pipeline functionality, you don't have to worry about any of that. The matrix stack is gone for good, and you get to calculate your own transformation matrices, and pass them to your shader code.
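With GLM, for example, that might look like the following sketch (the uMVP uniform name is arbitrary, and aspect and program are assumed to exist; the vertex shader would contain something like gl_Position = uMVP * vec4(position, 1.0)):

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>
    #include <glm/gtc/type_ptr.hpp>

    // Build model, view and projection yourself - no matrix stack involved.
    glm::mat4 model = glm::translate(glm::mat4(1.0f), glm::vec3(2.0f, 0.0f, 0.0f));
    glm::mat4 view  = glm::lookAt(glm::vec3(0.0f, 0.0f, 5.0f),  // eye
                                  glm::vec3(0.0f),              // target
                                  glm::vec3(0.0f, 1.0f, 0.0f)); // up
    glm::mat4 proj  = glm::perspective(glm::radians(60.0f), aspect, 0.1f, 100.0f);
    glm::mat4 mvp   = proj * view * model;

    // Hand the finished matrix to the shader.
    glUniformMatrix4fv(glGetUniformLocation(program, "uMVP"),
                       1, GL_FALSE, glm::value_ptr(mvp));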
Here is a link to the Wikipedia article on the mathematics involved with transformation matrices: https://en.wikipedia.org/wiki/Transformation_matrix - it will give you an understanding of the math behind the scenes. Another way to look at this is through linear or vector algebra.

What happens under the hood when you render a scene is that all of the vertex data is sent from the CPU to the GPU to be rasterized and drawn to the screen. That is your batch process or render call. You also have a frame function that runs x times per second, which gives you your frames per second; so if you are rendering at, say, 60 FPS, those vertices and triangles are drawn 60 times each second. When you apply a transformation to a set of vertices, a transformation matrix is multiplied into your model-view-projection matrix, MVP * T, and the result may be saved back into your existing MVP matrix if that is how you have your calculations set up.

There are some differences between versions of OpenGL, from v1.0's pure CPU calls up to v4.5. As far as I know, from around version 3.2 or 3.3 (I don't remember which off hand) you have to implement the MVP yourself, whereas earlier versions - shaders were first introduced around v1.5 - handled it for you. Here is the documentation for OpenGL: https://www.opengl.org/ - on the main page there is a Documentation topic, from which you can select the OpenGL Registry or whichever specific version you want to look at, and read about the OpenGL API, since that site covers everything available in the API.

So, as you begin to understand this process: yes, the transformed coordinates of those vertices do change, but they will not continuously change unless you keep incrementing some variable with a factor of time, giving you a simulation of movement or animation. If you apply only a single transformation, those vertices will either rotate, translate, scale, or shear depending on which transformation you apply. I will tell you that the order of these operations does matter, but I will not tell you which order; that is for you to read up on and figure out. The reason order matters is that matrix multiplication is not commutative: A * B is generally not equal to B * A. Loading the identity each frame guards against round-off error and floating-point precision loss, so that if you happen to apply, say, 1,000 transformations over about 10 seconds, you do not accumulate astronomical errors. This should be enough to point you in the right direction and serve as a guide to how the OpenGL API works.
So I'm learning OpenGL, and one thing I find very strange is that the camera has to stay at the origin and look in the same direction. To achieve camera movement and rotation, you have to move and rotate the entire world instead of the camera.

My question is: why can't you move the camera? Does DirectX allow you to move the camera?
This is an interesting question. I think the answer depends on what you actually mean, when you are talking about a fixed camera.
As a matter of fact, instead of saying OpenGL has a fixed camera, I'd rather say there isn't any camera at all in OpenGL.

On the other hand, I wouldn't agree with your interpretation that the OpenGL API moves or rotates the world.

Instead, I'd say the OpenGL API doesn't move or rotate the world at all.

I think the reason there isn't any concept of a camera in the OpenGL API is that it isn't meant as a high-level abstraction layer, but is rather tied to the computational necessities of displaying computer graphics.

Since I assume you're mainly talking about displaying 3-dimensional scenes, this means transforming 3D vertex data into a 2D raster image.

For every frame rendered, this involves transforming the 3D coordinates of every vertex in your scene to its corresponding 2D location on the screen.
As every vertex has to be placed at the right position on screen, it makes no computational difference at all whether you conceptually move something like a camera around or move the whole world; you have to do the same transformation either way.

The mathematics involved in computing the "right" position for a vertex on screen can be described by a mathematical object called a matrix which, when applied to the 3D data (the mathematical term for this application is matrix multiplication), yields the desired 2D screen coordinates.

So essentially what happens in rendering a 3D scene - regardless of whether there is any camera at all - is that every vertex is processed by some transformation matrix, leaving the original 3D data of your vertex intact.

As the 3D vertex data doesn't get changed at all, I'd say OpenGL doesn't move or rotate the world at all - though this "observation" may depend on the observer's perspective.

As a matter of fact, leaving the 3D vertex data intact, without changing it at all, is essential to prevent your 3D scene from deforming due to accumulated rounding inaccuracy.

I hope I could help by expressing my opinion on who or what moves whom, when, and why in the OpenGL API.

Even if I couldn't convince you that there is no world-moving involved in using the OpenGL API, don't forget that the world doesn't weigh anything at all, so moving it around shouldn't be too painful.

BTW, don't bother investigating the proprietary library mentioned in your question; keep relying on open standards.
What's the difference between moving the world and moving the camera? Mathematically... there isn't any; it's the same number either way. It's all a matter of perspective. As long as you code your camera abstraction correctly, you don't have to think of it as moving the world if you don't want to.
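For instance, a sketch of such an abstraction with GLM: store the camera's position and orientation like any other object's, and derive the view matrix as the inverse of the camera's own transform:

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>

    struct Camera {
        glm::vec3 position{0.0f};
        float     yaw = 0.0f;  // radians, around the Y axis

        // "Moving the world" is just applying the inverse of the camera's transform.
        glm::mat4 viewMatrix() const {
            glm::mat4 camToWorld =
                glm::translate(glm::mat4(1.0f), position) *
                glm::rotate(glm::mat4(1.0f), yaw, glm::vec3(0.0f, 1.0f, 0.0f));
            return glm::inverse(camToWorld);
        }
    };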
Since OpenGL is a state machine, I am constantly glEnable() and glDisable()-ing things in my program. There are a select few calls that I make only at the beginning (such as glClearColor) but most others I flip on and off (like lighting, depending on if I'm rendering a model or 3d text or the gui).
How do you keep track of what state things are in? Do you constantly set/reset these things at the top of each function? Isn't that a lot of unnecessary overhead?
For example, when I write a new function, sometimes I know what state things will be in when the function is called, and I leave out glEnable or glDisable or other related state-switching calls at the top of the function. Other times, I'm just writing the function in advance and I add in these sorts of things. So my functions end up being very messy, some of them modifying OpenGL state and others just making assumptions (that are later broken, and then I have to go back and figure out why something turned yellow or why another thing is upside down, etc.).
How do you keep track of OpenGL across functions in an object oriented environment?
Also related to this question, is how to know when to use push and pop, and when to just set the value.
For example, let's say you have a program that draws some 3D stuff, then draws some 2D stuff. Obviously the projection matrix is different in each case. So do you:
1. Set up a 3D projection matrix, draw the 3D stuff, set up a 2D projection matrix, draw the 2D stuff, loop.

2. Set up a 3D projection matrix once at program start; then draw 3D, push the matrix, draw 2D, pop the matrix, loop.
And why?
Great question. Think about how textures work. There is an insane number of textures for OpenGL to switch between, and you need to enable/disable texturing after every object is drawn. You generally try to optimize this by drawing all objects with the same texture at once, but there's still a remarkable amount of state-switching that goes on. Because of this, I do my best to draw everything with the same state at once, but I'm not sure it's terribly important to optimize it the way you're thinking.
As for pushing and popping between 3D and 2D projection modes, pushing and popping is intended to be used for hierarchical modeling. If you need your 2D projection to be at a location relative to a 3D object, by all means, push the matrix and switch to 2D projection mode, then pop when you're done. However, if you're writing a 2D GUI overlay that has a constant location on the screen, I see no reason why pushing and popping is important.
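Option 1 from the question, sketched in legacy OpenGL (resetting the projection each pass is cheap; gluPerspective comes from GLU, and draw3DScene()/drawGUI() are placeholders):

    // 3D pass
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective(60.0, aspect, 0.1, 100.0);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    draw3DScene();

    // 2D overlay pass
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0.0, width, height, 0.0, -1.0, 1.0); // pixel coordinates, Y down
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    drawGUI();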
You might be interested to read more about scene graph libraries. They are meant to manage your graphics from a higher level. They don't handle everything well, but they excel at organizing your geometry for optimized rendering. They can additionally be used for sorting your scenes based on OpenGL state, although many do not do this. By sorting on state, you can render your geometry in an order that results in the fewest state transitions. Here's a nice overview of scene graph libraries:
http://www.realityprime.com/articles/scenegraphs-past-present-and-future
I think the simplest approach would be to write your own renderer class, wrapping all the OpenGL state-manipulation functions you use and keeping track of the states you set. You also need to take into account that changing the screen resolution or toggling fullscreen mode can invalidate your current OpenGL render context's state and data, which means that after such an event you will have to set all states again, re-upload all textures and shader programs, and so on.
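A minimal sketch of what such a class could look like (the StateTracker name and design are made up for illustration): it skips redundant glEnable/glDisable calls and can replay the whole recorded state after a context reset.

    #include <map>
    #include <GL/gl.h>

    // Wraps glEnable/glDisable: skips redundant calls and remembers every
    // capability so the whole state can be re-applied after a context reset.
    class StateTracker {
        std::map<GLenum, bool> caps;
    public:
        void enable(GLenum cap)  { set(cap, true);  }
        void disable(GLenum cap) { set(cap, false); }

        void set(GLenum cap, bool on) {
            auto it = caps.find(cap);
            if (it != caps.end() && it->second == on) return; // no change: skip
            caps[cap] = on;
            if (on) { glEnable(cap); } else { glDisable(cap); }
        }

        // Call after the context was recreated (resolution change, fullscreen toggle).
        void reapplyAll() const {
            for (const auto& entry : caps) {
                if (entry.second) { glEnable(entry.first); }
                else              { glDisable(entry.first); }
            }
        }
    };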