Transformation of vertex data. Mouse picking algorithm - opengl

In my program I have a gizmo which moves any object in the scene. As far as I know, the usual way of storing transformations is to keep them in the objects' model matrices and apply them directly in the shader. BUT my program also implements a classic ray-picking algorithm, which works only with real transformed data: the ray detects intersections with the real (transformed) vertex positions. What is the common way to resolve this conflict?
Apply every transformation immediately on the CPU and store the transformed data. I think this is the clean way, but it's expensive: for example, if I drag my object across the screen for 100 frames, then each frame I convert the movement delta to a matrix and multiply the whole vertex data by it.
Store the transformations in matrices until mouse picking starts, and then quickly multiply the vertices by the matrix to prepare the data for picking. This is rather elaborate, but there are ways to optimize it.
Which is the more performant way? Maybe there is some other method?
Update for Robinson:
I think you misunderstood me, or I did not fully understand you. I have a box and a sphere, and I move them with the gizmo (I edit their model matrices) by 1,0,0 and 0,1,0 respectively. Their model matrices are now different. HERE I get the data that I need for ray-picking: every object has its own individual place.
Then I transform the entire scene to eye space (view matrix) and then to clip space (projection matrix) and render it. My ray makes the return journey from the viewport to world space (unprojecting the view and projection matrices) and should interact with the actual scene. My ray is transformed rather than the scene!
My question was how to interact with objects whose real position is unknown until they are rendered (or transformed). Or maybe I'm not on the right track and should have done it differently: multiply the entire data each step (it's expensive, see my first question).

You use ray-picking, which technically is "get the x,y screen coordinates, transform them to NDC and set z to any value in the range [-1,1]; and finally transform them back to world coordinates".
This is useful when you want to intersect a ray from the point of view (the camera) through the "mouse coordinates" AND you want to do all of these intersection calculations on the CPU side.
Notice you can do it even when nothing is drawn on the screen; only the mouse coordinates are needed. Well, plus the viewport and the current transformations, but you know them before any glDrawxxx command.
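To make the "screen coordinates to NDC and back to world" step concrete, here is a minimal sketch in plain Python. It assumes a window origin at the top-left, column-vector matrices stored as nested lists, and a gluPerspective-style projection; the helper names (mat_mul, invert, perspective, mouse_ray) are made up for illustration and are not part of OpenGL. A math library such as GLM would normally do this for you.

```python
import math

def mat_mul(a, b):
    """4x4 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """4x4 matrix times 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def invert(m):
    """Invert a 4x4 matrix with Gauss-Jordan elimination and partial pivoting."""
    n = 4
    a = [[float(x) for x in m[i]] + [1.0 if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def perspective(fovy_deg, aspect, near, far):
    """Projection matrix laid out like the classic gluPerspective one."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [[f / aspect, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
            [0.0, 0.0, -1.0, 0.0]]

def mouse_ray(mx, my, width, height, view, proj):
    """Return (origin, unit direction) of the world-space ray under the mouse."""
    # Window coordinates (origin top-left, assumed) -> NDC in [-1, 1].
    x = 2.0 * mx / width - 1.0
    y = 1.0 - 2.0 * my / height
    inv = invert(mat_mul(proj, view))

    def unproject(z):
        p = mat_vec(inv, [x, y, z, 1.0])
        return [c / p[3] for c in p[:3]]  # undo the perspective divide

    near_pt, far_pt = unproject(-1.0), unproject(1.0)
    d = [f - n for f, n in zip(far_pt, near_pt)]
    length = math.sqrt(sum(c * c for c in d))
    return near_pt, [c / length for c in d]

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
proj = perspective(60.0, 800 / 600, 0.1, 100.0)
origin, direction = mouse_ray(400, 300, 800, 600, identity, proj)
```

With an identity view matrix and the mouse in the middle of an 800x600 window, the ray starts on the near plane and points straight down the negative z axis, which is what you would expect from a camera at the origin.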
Next, the question is: what are you going to do with that ray or intersections?
You may wish to modify some property (like color) or position. Right?
How many objects are to be modified? If it's just a handful, then it's OK to do it on the CPU, modifying the data you send to the GPU. But if you have thousands of objects, then think of the hardware-accelerated way: keep their coordinates but send the new transformation matrices and properties to the GPU and let it do the hard work.
If you are worried that some objects stay as before while others get modified, remember that you can draw groups of objects that share the matrices and other uniforms with a single glDrawxxx call. If you have different groups, use several glDrawxxx calls with different uniforms, even different shaders.


I want to scale a 3D object so that it always appears to be the same size, no matter how far from the camera

I have the world, view, and projection matrices. I have an object that is part of the UI and should be the same size no matter how far from the camera it is. I'm using this to create a gizmo similar to the "move" gizmos you see in 3D modelling programs.
How can I extract a proper scale to do this from world/view/projection (or from the view frustum if preferred; I have the six planes)?
Bonus question: Using these same matrices, is there a decent way to figure out how much I'd need to move a 3D vector to make it move ONE PIXEL visually?

Select object in OpenGL when doing transformations in the vertex shader

I'm pretty new to OpenGL and am trying to implement a simple program where I can draw cubes, move them around with the mouse, and delete them.
Previously I had done my drag operations by translating on the CPU. In this way I was able to use ray-tracing to pick out the element I wanted because the vertices themselves were being updated.
However, I'm trying to move all of the transformations to the GPU and in doing so realized that I would then be giving up updated access to the vertices on the CPU (as the CPU still thinks the vertices are the un-transformed ones). How does one do this communication so that I wouldn't have to manually do transformations on the CPU as well as in the Vertex Shader?
No matter where you're doing your transformations, you will typically have a model matrix that describes where each object is in the scene. Instead of transforming each object into world space just so you can check for intersection with a world-space ray, you can also transform the ray into the object space of each object by transforming the ray with the inverse model matrix.
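As a rough sketch of the "transform the ray, not the object" idea: the ray origin is a point (w = 1) and the direction is a vector (w = 0), and both go through the inverse model matrix. The example below is plain Python with made-up helper names, and it assumes the object's shape in its own space is a unit sphere at the origin, purely to keep the intersection test short.

```python
import math

def mat_vec(m, v):
    """4x4 matrix times 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def invert(m):
    """Invert a 4x4 matrix with Gauss-Jordan elimination and partial pivoting."""
    n = 4
    a = [[float(x) for x in m[i]] + [1.0 if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def translation(x, y, z):
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def ray_to_object_space(origin, direction, model):
    """Move a world-space ray into the object's local space."""
    inv = invert(model)
    o = mat_vec(inv, list(origin) + [1.0])[:3]     # point: affected by translation
    d = mat_vec(inv, list(direction) + [0.0])[:3]  # vector: translation ignored
    return o, d

def hit_unit_sphere(o, d):
    """Nearest t >= 0 with |o + t*d| = 1, or None.

    Simplification: a ray that starts inside the sphere reports no hit.
    """
    a = sum(c * c for c in d)
    b = 2.0 * sum(p * q for p, q in zip(o, d))
    c = sum(p * p for p in o) - 1.0
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t >= 0.0 else None

# A sphere moved by (1, 0, 0) through its model matrix; the vertices never change.
model = translation(1.0, 0.0, 0.0)
o, d = ray_to_object_space([1.0, 0.0, 5.0], [0.0, 0.0, -1.0], model)
t_hit = hit_unit_sphere(o, d)
o2, d2 = ray_to_object_space([3.0, 0.0, 5.0], [0.0, 0.0, -1.0], model)
t_miss = hit_unit_sphere(o2, d2)
```

Because the direction is deliberately not re-normalized in object space, the returned t is the same parameter as along the world-space ray, so hits on different objects can be compared directly to find the closest one.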
One general issue with ray-tracing is that, as your scene gets larger, brute-force testing of each object will get increasingly slow. You can use acceleration structures like an octree or a Bounding Volume Hierarchy to speed things up. A completely different approach to picking would be to just render an ID buffer, i.e. a buffer that has the same resolution as your currently rendered frame and that saves, for each pixel, the ID of the object that is visible at that pixel. Then you can simply read back the value of the pixel underneath the cursor to find out which object you hit, without the need to do any raytracing. Rendering the ID buffer could be done as a separate pass, or it can likely just be added as an additional render target to a pass you're already doing, e.g. prefilling the depth buffer, or just when rendering the scene in case you only do one pass.
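If the ID buffer is an ordinary 8-bit RGB target, the object ID has to be packed into a color when drawing and unpacked after the pixel is read back (for example with glReadPixels). A small sketch of that packing, assuming IDs fit in 24 bits; the function names are made up for illustration:

```python
def id_to_rgb(object_id):
    """Pack a 24-bit object ID into an (r, g, b) triple of bytes."""
    if not 0 <= object_id < (1 << 24):
        raise ValueError("object ID must fit in 24 bits")
    return ((object_id >> 16) & 0xFF, (object_id >> 8) & 0xFF, object_id & 0xFF)

def rgb_to_id(r, g, b):
    """Recover the object ID from the bytes read back under the cursor."""
    return (r << 16) | (g << 8) | b

color = id_to_rgb(70000)
picked = rgb_to_id(*color)
```

This only works if the ID pass is drawn without blending, lighting or anti-aliasing, since any of those would alter the bytes. An integer render target avoids the packing altogether.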

Why would it be beneficial to have a separate projection matrix, yet combine model and view matrix?

When you are learning 3D programming, you are taught that it's easiest to think in terms of 3 transformation matrices:
The Model Matrix. This matrix is individual to every single model and it rotates and scales the object as desired and finally moves it to its final position within your 3D world. "The Model Matrix transforms model coordinates to world coordinates".
The View Matrix. This matrix is usually the same for a large number of objects (if not for all of them) and it rotates and moves all objects according to the current "camera position". If you imagine that the 3D scene is filmed by a camera and what is rendered on the screen are the images that were captured by this camera, the location of the camera and its viewing direction define which parts of the scene are visible and how the objects appear on the captured image. There are few reasons for changing the view matrix while rendering a single frame, but those do in fact exist (e.g. by rendering the scene twice and changing the view matrix in between, you can create a very simple, yet impressive mirror within your scene). Usually the view matrix changes only once between two frames being drawn. "The View Matrix transforms world coordinates to eye coordinates".
The Projection Matrix. The projection matrix decides how those 3D coordinates are mapped to 2D coordinates, e.g. if there is a perspective applied to them (objects get smaller the farther they are away from the viewer) or not (orthogonal projection). The projection matrix hardly ever changes at all. It may have to change if you are rendering into a window and the window size has changed or if you are rendering full screen and the resolution has changed, however only if the new window size/screen resolution has a different display aspect ratio than before. There are some crazy effects for which you may want to change this matrix, but in most cases it's pretty much constant for the whole lifetime of your program. "The Projection Matrix transforms eye coordinates to screen coordinates".
This all makes a lot of sense to me. Of course one could always combine all three matrices into a single one, since multiplying a vector first by matrix A and then by matrix B is the same as multiplying the vector by matrix C, where C = B * A.
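The C = B * A claim is easy to check numerically. Here is a small sketch in plain Python, using column vectors and two made-up example matrices (a uniform scale for A and a translation for B); it also shows that the order matters:

```python
def mat_mul(a, b):
    """4x4 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """4x4 matrix times 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

A = [[2.0, 0.0, 0.0, 0.0],   # scale by 2
     [0.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
B = [[1.0, 0.0, 0.0, 1.0],   # translate by (1, 0, 0)
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
v = [1.0, 1.0, 1.0, 1.0]

step_by_step = mat_vec(B, mat_vec(A, v))   # first A, then B
combined = mat_vec(mat_mul(B, A), v)       # C = B * A, applied once
wrong_order = mat_vec(mat_mul(A, B), v)    # A * B is a different transform
```

Both step_by_step and combined come out as (3, 2, 2, 1), while wrong_order is (4, 2, 2, 1), because there the translation gets scaled as well.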
Now if you look at the classical OpenGL (OpenGL 1.x/2.x), OpenGL knows a projection matrix. Yet OpenGL does not offer a model or a view matrix, it only offers a combined model-view matrix. Why? This design forces you to permanently save and restore the "view matrix" since it will get "destroyed" by model transformations applied to it. Why aren't there three separate matrices?
If you look at the new OpenGL versions (OpenGL 3.x/4.x) and you don't use the classical render pipeline but customize everything with shaders (GLSL), there are no matrices available any longer at all; you have to define your own matrices. Still, most people keep the old concept of a projection matrix and a model-view matrix. Why would you do that? Why not use either three matrices, which means you don't have to permanently save and restore the model-view matrix, or a single combined model-view-projection (MVP) matrix, which saves you a matrix multiplication in your vertex shader for every single vertex rendered (after all, such a multiplication doesn't come for free either)?
So to summarize my question: What advantage does a combined model-view matrix together with a separate projection matrix have over having three separate matrices or a single MVP matrix?
Look at it practically. First, the fewer matrices you send, the fewer matrices you have to multiply with positions/normals/etc. And therefore, the faster your vertex shaders.
So point 1: fewer matrices is better.
However, there are certain things you probably need to do. Unless you're doing 2D rendering or some simple 3D demo-applications, you are going to need to do lighting. This typically means that you're going to need to transform positions and normals into either world or camera (view) space, then do some lighting operations on them (either in the vertex shader or the fragment shader).
You can't do that if you only go from model space to projection space. You cannot do lighting in post-projection space, because that space is non-linear. The math becomes much more complicated.
So, point 2: You need at least one stop between model and projection.
So we need at least 2 matrices. Why model-to-camera rather than model-to-world? Because working in world space in shaders is a bad idea. You can encounter numerical precision problems related to translations that are distant from the origin. Whereas, if you worked in camera space, you wouldn't encounter those problems, because nothing is too far from the camera (and if it is, it should probably be outside the far depth plane).
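The precision argument can be seen with a tiny numeric experiment. The sketch below rounds values to 32-bit floats (roughly what a shader works with) and compares subtracting the camera position in float32 against doing the subtraction in double precision on the CPU first, as you effectively do when you compose a model-view matrix there. The coordinates are made-up values chosen to make the effect obvious.

```python
import struct

def f32(x):
    """Round a Python float (double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

camera_x = 10_000_000.0    # camera far away from the world origin
vertex_x = 10_000_000.25   # a vertex 0.25 units in front of it

# World-space route: both positions are stored as float32, then subtracted.
world_route = f32(f32(vertex_x) - f32(camera_x))

# Camera-space route: subtract in double on the CPU, then hand over a float32.
camera_route = f32(vertex_x - camera_x)
```

At this distance from the origin, neighbouring float32 values are a whole unit apart, so world_route collapses to 0.0 while camera_route keeps the 0.25 offset.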
Therefore: we use camera space as the intermediate space for lighting.
In most cases your shader will need the geometry in world or eye coordinates for shading, so you have to separate the projection matrix from the model and view matrices.
Making your shader multiply the geometry with two matrices hurts performance. Assuming each model has thousands (or more) of vertices, it is more efficient to compute a model-view matrix on the CPU once and let the shader do one less matrix-vector multiplication.
I have just solved a z-buffer fighting problem by separating the projection matrix. There is no visible increase of the GPU load. The two following screenshots show the two results; pay attention to the green and white layers fighting.

OpenGL - Object Transformations and VBOs

So I've written a program that renders a mesh using a Vertex Buffer Object, and lets me move the camera around. I now want to make the object move independently of the camera/view.
However, I'm not sure how to go about moving my meshes through space. Googling tends to find sources either telling me to rotate the objects with glRotatef(), etc., or that using glRotatef() and its siblings is a bad idea because they are deprecated. Perhaps I'm not using the right search terms, but I'm not finding all that much that seems like a good starting point. I see vague references to matrix math, but I don't know how to use that and what approach to take. Other sources say I should apply a vertex shader to transform the objects.
I suppose I could manually reconstruct my mesh each frame, but that seems like a horrible idea (the meshes frequently have upwards of 50k triangles, and I'd like to have dozens of them at least), and I don't really need to have the vertices constantly in use in the rest of my memory if they are already stored in a VBO... right?
So how do I go about manipulating meshes that are stored in VBOs independently of the global space? What resources should I use in learning to do so?
You should be using your ModelView matrix to apply transformations to your vertices. To apply a transformation to a particular object/mesh and not to the entire screen, push a copy of your ModelView matrix onto the stack, apply your transformation, draw your object, then pop that matrix off to go back to your old ModelView matrix.
No need to recompute your vertex positions! That's exactly what these matrices are designed to help you avoid. And the fact that they're stored in a VBO won't matter to you - vertices passed to OpenGL manually are treated exactly the same.
And you might want to check out this question, and the transformation article its accepted answer links to - they'll be useful if you're still getting the hang of transformations and the matrix stack.
Hope that helps!
Edit: A quick example of why the stack is useful. Say you're drawing a simple scene: a guy on a raft (with a sail) in the ocean.
First, you'll want to set up your camera angle, so do whatever transformations you need to set that up. You don't need - and in fact don't want - to push and pop matrices here, because these transformations apply to everything in your scene (In OpenGL, moving the camera = moving the entire world. Weird to think about, but you get used to it.).
Then you draw your ocean. No need to transform it, 'cause it's a static object, and doesn't move.
Then you draw your raft. But your raft has moved! It's drifted along the X axis. Now, since the raft is an independent object and transformations that apply to the raft shouldn't apply to the larger world, you push a matrix onto the stack. This copies the existing ModelView matrix. All those camera transformations are already applied; your "drifting" transformation on the raft is in addition to the transformations you did at lower levels of the stack.
Draw the raft. Then, before you pop that matrix off the stack, draw the things that are on the raft - the guy and the sail. Since they move with the raft, all the transformations that apply to the raft should be applied to them, too.
Say you draw your castaway first. But he's moved too - he's jumping into the air. So you push another matrix onto the stack, apply a "jumping" transformation, and then render your person. If there's anything that should move with the person - if he were holding anything, say - you'd draw it here, too. But he's not. So pop the "jumping" matrix off the stack.
Now you're back in the "raft" context. Since you applied the "jumping" transformation to a copy, the "drifting" transformation was left untouched a stack level down. Draw the sail now, and it'll be on top of the raft, right where it should be.
And then you're done with raft, so you can pop that matrix off the stack too. You're back down to your plain camera transform. Draw some more static geometry - islands or something.
And that's why the matrix stack is useful. It's also why people build more complicated scenes as "scene graphs" - so they can keep track of the nesting of transformations. It's also useful in skeletal animation, where the position of the wrist depends on the position of the elbow, which depends on the position of the shoulder, and so forth.
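If you want to see the raft example without any OpenGL at all, here is a toy matrix stack in plain Python. It only mimics the push/pop/translate behaviour described above (translate multiplies onto the top matrix, the way glTranslatef does); the class and the "draw" bookkeeping are made up for illustration, and the offsets are arbitrary.

```python
def identity():
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def translation(x, y, z):
    m = identity()
    m[0][3], m[1][3], m[2][3] = x, y, z
    return m

def mat_mul(a, b):
    """4x4 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

class MatrixStack:
    """Toy stand-in for the fixed-function ModelView stack."""

    def __init__(self):
        self._stack = [identity()]

    def push(self):
        # Duplicate the current top, so later changes can be undone by pop().
        self._stack.append([row[:] for row in self._stack[-1]])

    def pop(self):
        if len(self._stack) == 1:
            raise IndexError("cannot pop the last matrix")
        self._stack.pop()

    def translate(self, x, y, z):
        self._stack[-1] = mat_mul(self._stack[-1], translation(x, y, z))

    def origin(self):
        """Where an object drawn now would put its local origin."""
        m = self._stack[-1]
        return (m[0][3], m[1][3], m[2][3])

drawn = {}
stack = MatrixStack()
stack.translate(0.0, 0.0, -10.0)      # camera setup: applies to everything
drawn["ocean"] = stack.origin()

stack.push()                          # raft context
stack.translate(3.0, 0.0, 0.0)        # "drifting"
drawn["raft"] = stack.origin()

stack.push()                          # castaway context
stack.translate(0.0, 2.0, 0.0)        # "jumping"
drawn["castaway"] = stack.origin()
stack.pop()                           # back to the raft

drawn["sail"] = stack.origin()
stack.pop()                           # back to the plain camera transform
drawn["island"] = stack.origin()
```

The castaway picks up both the drift and the jump, the sail only the drift, and the island is back to the camera transform alone.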
And that was way longer than I expected - but hopefully useful. Cheers!

The purpose of Model View Projection Matrix

For what purposes are we using Model View Projection Matrix?
Why do shaders require Model View Projection Matrix?
The model, view and projection matrices are three separate matrices. Model maps from an object's local coordinate space into world space, view from world space to camera space, projection from camera to screen.
If you compose all three, you can use the one result to map all the way from object space to screen space, making you able to work out what you need to pass on to the next stage of a programmable pipeline from the incoming vertex positions.
In the fixed functionality pipelines of old, you'd apply model and view together, then work out lighting using another result derived from them (with some fixes so that e.g. normals are still unit length even if you've applied some scaling to the object), then apply projection. You can see that reflected in OpenGL, which never separates the model and view matrices — keeping them as a single modelview matrix stack. You therefore also sometimes see that reflected in shaders.
So: the composed model view projection matrix is often used by shaders to map from the vertices you loaded for each model to the screen. It's not required, there are lots of ways of achieving the same thing, it's just usual because it allows all possible linear transforms. Because of that, a lesser composed version of it was also the norm in ye olde fixed pipeline world.
Because matrices are convenient. Matrices help to convert locations/directions with respect to different spaces (A space can be defined by 3 perpendicular axes and an origin).
Here is an example from a book suggested by @legends2k in the comments.
The residents of Cartesia use a map of their city with the origin
centered quite sensibly at the center of town and axes directed along
the cardinal points of the compass. The residents of Dyslexia use a
map of their city with the coordinates centered at an arbitrary point
and the axes running in some arbitrary directions that probably seemed
a good idea at the time. The citizens of both cities are quite happy
with their respective maps, but the State Transportation Engineer
assigned a task of running up a budget for the first highway between
Cartesia and Dyslexia needs a map showing the details of both cities,
which therefore introduces a third coordinate system that is superior
to him, though not necessarily to anybody else.
Here is another example,
Assume that you have created a car object in a game with its vertex positions given in the world's coordinates. Suppose you have to use this same car in some other game in an entirely different world: you have to define the positions again, and the calculations become complex. This is because you again have to calculate the positions of the windows, hood, headlights, wheels etc. of the car with respect to the new world.
See this video to understand the concepts of model, view and projection. (highly recommended)
Then see this to understand how the vertices in the world are represented as Matrices and how they are transformed.