I was wondering about the problem of getting a 3D position from a 2D one. In my application, I have everything set up (MVP matrix, screen position of the mouse).
We go from 3D to 2D by the following transform.
gl_Position = projection * view * model * vertex;
Then these clip coordinates are divided by w to get NDC, and then converted to screen coordinates. I want to do the reverse (mouse click to 3D). Screen to NDC is easy for x and y, but the z data is lost.
As far as I can tell, the ONLY method to recover the 3D data is ray casting with a tree-based spatial data structure and maybe CUDA/OpenCL. But I am not sure. Is there any other way? Is there a method to recover the 3D position given that we have the MVP matrix and screen coordinates?
Read the depth value at the mouse pointer position (glReadPixels(…, GL_DEPTH_COMPONENT, …)) and use that to recover Z, or do the ray casting into a scene structure. Depending on your application, either one may be the better solution.
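A minimal sketch of the first option, assuming the legacy matrix stacks and GLU are available (mouseX/mouseY are placeholders for your window-space mouse position; with your own math library you would invert the MVP matrix instead of calling gluUnProject):

    // Sketch: recover the 3D point under the mouse by reading the depth buffer.
    GLint viewport[4];
    GLdouble modelview[16], projection[16];
    glGetIntegerv(GL_VIEWPORT, viewport);
    glGetDoublev(GL_MODELVIEW_MATRIX, modelview);
    glGetDoublev(GL_PROJECTION_MATRIX, projection);

    // Window coordinates have their origin at the bottom left, so flip the mouse y.
    GLdouble winX = (GLdouble)mouseX;
    GLdouble winY = (GLdouble)(viewport[3] - mouseY);
    GLfloat winZ = 0.0f;
    glReadPixels((GLint)winX, (GLint)winY, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &winZ);

    GLdouble objX, objY, objZ;
    gluUnProject(winX, winY, (GLdouble)winZ, modelview, projection, viewport, &objX, &objY, &objZ);
    // (objX, objY, objZ) is the point under the cursor, provided something was
    // actually rendered there (winZ < 1.0 means the depth buffer was written).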
Related
I'm making an editor in which I want to build a terrain map. I want to use the mouse to increase/decrease terrain altitude to create mountains and lakes.
Technically I have a heightmap I want to modify at a certain texcoord that I pick out with my mouse. To do this I first go from screen coordinates to world position - I have done that. The next step, going from world position to picking the right texture coordinate, puzzles me though. How do I do that?
Suppose you are using a simple heightmap as a displacement map in, let's say, the y direction, so the base mesh lies in the xz plane (y = 0).
You can then discard the y coordinate from the world coordinate you have calculated and you get the point on the base mesh. From there you can map it to texture space the same way you map your texture.
I would not implement it that way.
I would render the scene to a framebuffer and, instead of rendering a texture onto the mesh, color-code the texture coordinates onto the mesh.
If I click somewhere in screen space, I can simply read the pixel value from the framebuffer and get the texture coordinate directly.
Rendering to the framebuffer should be very inexpensive anyway.
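A rough sketch of the readback half of that approach, assuming the picking pass wrote the UVs into a floating-point color attachment (pickFBO, mouseX/mouseY and viewportHeight are placeholders):

    // Sketch: the terrain was rendered into pickFBO with a fragment shader that
    // writes its texture coordinates as color (e.g. into a GL_RG32F attachment).
    glBindFramebuffer(GL_READ_FRAMEBUFFER, pickFBO);
    glReadBuffer(GL_COLOR_ATTACHMENT0);

    float uv[2] = {0.0f, 0.0f};
    glReadPixels(mouseX, viewportHeight - mouseY, 1, 1, GL_RG, GL_FLOAT, uv);

    glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);
    // uv[0], uv[1] is the texture coordinate of the terrain point under the cursor.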
Assuming your terrain is a simple rectangle, you first calculate the vector between the mouse world position and the origin of your terrain (the vertex of your terrain quad that the top-left corner of your height map is mapped to). E.g. mouse (50,25) - origin (-100,-100) = (150,125).
Now divide the x and y coordinates by the world space width and height of your terrain quad.
150 / 200 = 0.75 and 125 / 200 = 0.625. This gives you the texture coordinates; if you need them as pixel coordinates instead, simply multiply by the size of your texture.
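The same arithmetic as code, in case it helps (a sketch; Vec2, terrainOrigin and the terrain extents are placeholders for whatever your math types and terrain setup look like):

    // Sketch: map a world-space point on the terrain quad to heightmap texture coordinates.
    struct Vec2 { float x, y; };

    Vec2 worldToTexcoord(Vec2 worldPos, Vec2 terrainOrigin, float terrainWidth, float terrainHeight)
    {
        float u = (worldPos.x - terrainOrigin.x) / terrainWidth;   // e.g. (50 - (-100)) / 200 = 0.75
        float v = (worldPos.y - terrainOrigin.y) / terrainHeight;  // e.g. (25 - (-100)) / 200 = 0.625
        return Vec2{u, v};  // multiply by the texture size if you need pixel coordinates
    }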
I assume the following:
The world coordinates you computed are those of the mouse pointer within the view frustum. I name them mouseCoord.
We also have the camera coordinates, camCoord
The world consists of triangles
Each triangle vertex has texture coordinates; those are interpolated by barycentric coordinates.
If so, the solution goes like this:
Use camCoord as the origin. Compute the direction of a ray as mouseCoord - camCoord.
Compute the point of intersection with a triangle. The naive variant is to check every triangle for intersection; a more sophisticated approach would rule out most triangles first with some other scheme, like partitioning the world into cubes, tracing the ray along the cubes and only looking at the triangles that overlap those cubes. Intersection with a triangle can be computed as on this website: http://www.lighthouse3d.com/tutorials/maths/ray-triangle-intersection/
Compute the intersection point's barycentric coordinates with respect to that triangle, like this: https://www.scratchapixel.com/lessons/3d-basic-rendering/ray-tracing-rendering-a-triangle/barycentric-coordinates
Use the barycentric coordinates as weights for the texture coordinates of the corresponding triangle vertices. The result is the texture coordinates of the intersection point, aka what you want.
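Steps 2-4 are commonly done with the Moller-Trumbore intersection test, which already yields the barycentric weights as a by-product. A sketch, assuming Vec2/Vec3 types with the usual operators plus dot and cross helpers from your math library:

    #include <cmath>

    // Sketch: ray/triangle intersection that also interpolates the triangle's UVs
    // from the barycentric coordinates (u, v) produced by the test.
    bool rayTriangleUV(const Vec3& orig, const Vec3& dir,
                       const Vec3& v0, const Vec3& v1, const Vec3& v2,
                       const Vec2& uv0, const Vec2& uv1, const Vec2& uv2,
                       float& t, Vec2& uvOut)
    {
        const float EPS = 1e-7f;
        Vec3 e1 = v1 - v0, e2 = v2 - v0;
        Vec3 p  = cross(dir, e2);
        float det = dot(e1, p);
        if (std::fabs(det) < EPS) return false;          // ray parallel to the triangle
        float invDet = 1.0f / det;

        Vec3 s = orig - v0;
        float u = dot(s, p) * invDet;
        if (u < 0.0f || u > 1.0f) return false;

        Vec3 q = cross(s, e1);
        float v = dot(dir, q) * invDet;
        if (v < 0.0f || u + v > 1.0f) return false;

        t = dot(e2, q) * invDet;                         // distance along the ray
        if (t < 0.0f) return false;                      // intersection behind the origin

        // Barycentric weights: (1 - u - v) for v0, u for v1, v for v2.
        uvOut.x = (1.0f - u - v) * uv0.x + u * uv1.x + v * uv2.x;
        uvOut.y = (1.0f - u - v) * uv0.y + u * uv1.y + v * uv2.y;
        return true;
    }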
If I misunderstood what you wanted, please edit your question with additional information.
Another variant specific for a height map:
Assume that the assumptions are changed like this:
The world has ground tiles over x and y
The ground tiles have height values in their corners
For a point within the tile, the height value is interpolated somehow, like by bilinear interpolation.
The texture is interpolated in the same way, again with given texture coordinates for the corners
A feasible (approximate) algorithm for that:
Again, compute origin and direction.
Without loss of generality, we assume that the direction changes faster in x. If not, exchange x and y in the algorithm.
Trace the ray with a given step length in x, that is, in each step the x coordinate changes by that step length. (Take the direction, multiply it by the step size divided by its x value, and add that scaled direction to the current position, starting at the origin.)
For your current coordinate, check whether its z value is below the current height (i.e. the ray has just collided with the ground).
If so, either finish, or decrease the step size and do a finer search in that vicinity, going backwards until you are above the height again, then going forwards again in even finer steps, et cetera. The result is the current x and y coordinates.
Compute the relative position of your x and y coordinates within the current tile. Use that as weights for the corner texture coordinates.
This algorithm can theoretically jump over very thin peaks; choose a small enough step size to counter that. I cannot give an exact algorithm without knowing what type of interpolation the height map uses. It might not be the worst idea to create triangles anyway, maybe out of bilinearly interpolated coordinates? In any case, the algorithm is good for finding the tile in which the ray collides.
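A coarse sketch of that marching loop, assuming z is the up axis as in the description, a heightAt(x, y) lookup that samples the interpolated heightmap, and a Vec3 type with the usual operators (all of these names are placeholders):

    // Sketch: step the ray until it drops below the terrain, then bisect for a finer hit.
    // Assumes the direction changes faster in x; otherwise swap the roles of x and y.
    bool marchHeightmapRay(Vec3 origin, Vec3 dir, float stepLength, int maxSteps, Vec3& hit)
    {
        Vec3 step = dir * (stepLength / std::fabs(dir.x));   // advance so x changes by stepLength
        Vec3 prev = origin;
        for (int i = 0; i < maxSteps; ++i) {
            Vec3 curr = prev + step;
            if (curr.z < heightAt(curr.x, curr.y)) {          // just dropped below the ground
                // Refine by bisection between the last point above and this one below.
                for (int j = 0; j < 16; ++j) {
                    Vec3 mid = (prev + curr) * 0.5f;
                    if (mid.z < heightAt(mid.x, mid.y)) curr = mid; else prev = mid;
                }
                hit = curr;
                return true;
            }
            prev = curr;
        }
        return false;   // no intersection within maxSteps
    }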
Another variant would be to trace the ray over the points at which its x-y coordinates cross the tile grid and then look whether the z coordinate went below the height map; then we know that it collides in this tile. This can produce a false negative if the height inside a tile can be greater than at its edges, which certain forms of interpolation can produce, especially those that consider the neighbouring tiles. It works just fine with bilinear interpolation, though.
With bilinear interpolation, the exact intersection can be found like this: take the two (x,y) coordinates at which the grid is crossed by the ray. Compute the heights there to obtain two (x,y,z) coordinates. Create a line from them and compute the intersection of that line with the ray; that intersection is the intersection with the tile's height map.
The simplest way is to render the mesh in a pre-pass with the UVs as the colour. No screen-to-world conversion needed; the UV is the value at the mouse position. Just be careful with mips/filtering etc.
In an OpenGL context, I have seen that it is possible to convert mouse coordinates to 3D world coordinates (e.g. MFC with Opengl get 3d coordinate from 2d coordinate of mouse). However, this does not work when I simply have a set of GL_POINTS and lots of empty space: when I'm hovering the mouse over empty space, the 3D coordinates have no meaning.
How can I get the coordinates of the nearest 3D point to my mouse position?
Recognize that "unprojecting" 2D mouse coordinates into a 3D world requires additional information. A 2D position on the screen corresponds to an infinite number of points along a line, from the near plane to the far plane in 3D. What this means is that to get a 3D point, you need to provide a depth value as well.
The gluUnProject function is a convenient way of doing this and provides a parameter for this: winZ. A value of 0.0 would find the 3D point at the near plane and 1.0 would find it at the far plane.
gluUnProject(winX, winY, winZ, model, proj, view, &objX, &objY, &objZ);
Note: If gluUnProject is not available you can figure out the same thing pretty easily with matrices. source here
When the mouse is used to interact with 3D objects, this depth value is typically found by sampling the depth buffer, or by intersecting a ray with scene geometry or with a primitive (such as a sphere or box). If the objective is to interact with points, you could use a sphere around each point. If you just want a 3D point "somewhere out there", you could decide on a depth value (maybe 0.0 on the near plane, or 0.5, halfway between the near and far planes).
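If you go the sphere-per-point route, a sketch of the picking could look like this (winX/winY, the matrices, points and pickRadius are placeholders; the ray is built from two gluUnProject calls at the near and far plane):

    // Sketch: build a picking ray from the mouse position, then pick the nearest
    // point whose bounding sphere the ray passes through.
    GLdouble nx, ny, nz, fx, fy, fz;
    gluUnProject(winX, winY, 0.0, modelview, projection, viewport, &nx, &ny, &nz);  // near plane
    gluUnProject(winX, winY, 1.0, modelview, projection, viewport, &fx, &fy, &fz);  // far plane

    Vec3 rayOrigin{(float)nx, (float)ny, (float)nz};
    Vec3 rayDir = normalize(Vec3{(float)(fx - nx), (float)(fy - ny), (float)(fz - nz)});

    int best = -1;
    float bestT = std::numeric_limits<float>::max();
    for (int i = 0; i < (int)points.size(); ++i) {
        Vec3 toCenter = points[i] - rayOrigin;
        float t = dot(toCenter, rayDir);                 // closest approach along the ray
        float distSq = dot(toCenter, toCenter) - t * t;  // squared distance from ray to center
        if (t > 0.0f && distSq < pickRadius * pickRadius && t < bestT) {
            best = i;
            bestT = t;
        }
    }
    // best is the index of the picked point, or -1 if the ray missed all spheres.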
I am working on voxelisation using the rendering pipeline and now I successfully voxelise the scene using vertex+geometry+fragment shaders. Now my voxels are stored in a 3D texture which has size, for example, 128x128x128.
My original model of the scene is centered at (0,0,0) and extends along both the positive and negative axes. The texture, however, is centered at (63,63,63) in texture coordinates.
I implemented a simple ray marching for visualization, but it doesn't take camera movement into account (I can render only from very fixed positions, because my rays have to be generated taking the different coordinates of the 3D texture into account).
My question is: how can I map my rays so that they are generated at point Po with direction D in the coordinates of my 3D model, but intersect the voxels at the corresponding position in texture coordinates, and so that every movement of the camera in the 3D world is remapped into voxel coordinates?
Right now I generate the rays in this way:
create a quad in front of the camera at position (63,63,-20)
cast rays in direction towards (63,63,3)
I think you should store your entire view transform matrix in your shader uniform parameters. Then, for each shader execution, you can use its screen coordinates and the view transform to compute the view ray direction for your particular pixel.
Having the ray direction and camera position, you just use them the same way as you do currently.
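One way to compute that per-pixel direction, sketched on the CPU side for clarity (the same math works inside the shader; invViewProj is assumed to be the inverse of projection * view, and Mat4/Vec4/Vec3 come from your math library):

    // Sketch: turn a pixel position into a world-space ray direction by unprojecting
    // one point on the near plane and one on the far plane.
    Vec3 viewRayDirection(float px, float py, float screenW, float screenH, const Mat4& invViewProj)
    {
        // pixel -> normalized device coordinates in [-1, 1]
        float ndcX = 2.0f * (px + 0.5f) / screenW - 1.0f;
        float ndcY = 1.0f - 2.0f * (py + 0.5f) / screenH;  // window y grows downward, so flip it

        Vec4 nearPt = invViewProj * Vec4{ndcX, ndcY, -1.0f, 1.0f};
        Vec4 farPt  = invViewProj * Vec4{ndcX, ndcY,  1.0f, 1.0f};
        Vec3 nearWorld = Vec3{nearPt.x, nearPt.y, nearPt.z} / nearPt.w;  // perspective divide
        Vec3 farWorld  = Vec3{farPt.x,  farPt.y,  farPt.z}  / farPt.w;

        return normalize(farWorld - nearWorld);  // march from the camera position along this
    }

The resulting world-space ray can then be transformed into your 3D-texture coordinates with whatever model-to-texture mapping you use (e.g. scale and offset into the 128x128x128 volume), so the camera can move freely.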
There's also another way to do this that you can try:
Let's say you have a cube (0,0,0)->(1,1,1) and for each corner you assign a color based on its position, like (1,0,0) is red, etc.
Now for every frame you draw your cube's front faces to one texture, and its back faces to a second texture.
In the final rendering you can use both textures to get the entry and exit 3D vectors, already in texture space, which makes your final shader much simpler.
You can read better descriptions here:
http://graphicsrunner.blogspot.com/2009/01/volume-rendering-101.html
http://web.cse.ohio-state.edu/~tong/vr/
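For reference, the entry/exit pair that those two color passes encode is just the ray's intersection with the unit cube, so the same two texture-space points can also be computed directly with a slab test. A sketch (not the render-to-texture variant itself, and relying on IEEE float division so axis-parallel rays fall out naturally):

    #include <algorithm>
    #include <limits>

    // Sketch: intersect a ray with the unit cube [0,1]^3 to get the entry and exit
    // points in texture space, equivalent to the front-face / back-face color trick.
    bool rayUnitCube(const float origin[3], const float dir[3], float enter[3], float exit[3])
    {
        float tMin = 0.0f, tMax = std::numeric_limits<float>::max();
        for (int axis = 0; axis < 3; ++axis) {
            float t0 = (0.0f - origin[axis]) / dir[axis];
            float t1 = (1.0f - origin[axis]) / dir[axis];
            if (t0 > t1) std::swap(t0, t1);
            tMin = std::max(tMin, t0);
            tMax = std::min(tMax, t1);
            if (tMin > tMax) return false;                  // ray misses the cube
        }
        for (int axis = 0; axis < 3; ++axis) {
            enter[axis] = origin[axis] + dir[axis] * tMin;  // where the ray march starts
            exit[axis]  = origin[axis] + dir[axis] * tMax;  // where it stops
        }
        return true;
    }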
I've rendered a 3D scene using glFrustum() perspective mode. I then have a 2D object that I place over the 3D scene to act as a label for a particular 3D object. I have calculated the 2D position of the 3D object using gluProject(), and at that position I place my 2D label object. The 2D label object is rendered using glOrtho() orthographic mode. This works perfectly and the 2D label object hovers over the 3D object.
Now, what I want to do is to give the 2D object a z value so that it can be hidden behind other 3D objects in the scene using the depth buffer. I have given the 2D object a z value that I know should be hidden by the depth buffer, but when I render the object it is always visible.
So the question is, why is the 2D object still visible and not hidden?
I did read somewhere that orthographic and perspective projections store incompatible depth buffer values. Is this true, and if so how do I convert between them?
I don't want it to be transformed; instead I want it to appear as a flat 2D label that always faces the camera and remains the same size on screen at all times. However, if it is hidden behind something, I want it to appear hidden.
First, you should have put that in your question; it explains far more about what you're trying to do than your question.
To achieve this, what you need to do is find the z-coordinate in the orthographic projection that matches the z-coordinate in pre-projective space of where you want the label to appear.
When you used gluProject, you got three coordinates back. The Z coordinate is important. What you need to do is reverse-transform the Z coordinate based on the zNear and zFar values you give to glOrtho.
Pedantic note: gluProject doesn't transform the Z coordinate to window space. To do so, it would have to take the glDepthRange parameters. What it really does is assume a depth range of near = 0.0 and far = 1.0.
So our first step is to transform from window space Z to normalized device coordinate (NDC) space Z. We use this simple equation:
ndcZ = (2 * winZ) - 1
Simple enough. Now, we need to go to clip space. Which is a no-op, because with an orthographic projection, the W coordinate is assumed to be 1.0. And the division by W is the difference between clip space and NDC space.
clipZ = ndcZ
But we don't need clip space Z. We need pre-orthographic projection space Z (aka: camera space Z). And that requires the zNear and zFar parameters you gave to glOrtho. To get to camera space, we do this:
cameraZ = ((clipZ + (zFar + zNear)/(zFar - zNear)) * (zFar - zNear))/-2
And you're done. Use that Z position in your rendering. Oh, and make sure your modelview matrix doesn't include any transforms in the Z direction (unless you're using the modelview matrix to apply this Z position to the label, which is fine).
Based on Nicol's answer, you can simply set zNear to 0 (which generally makes sense for 2D elements acting as part of the GUI), and then you simply have:
cameraZ = -winZ*zFar
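Putting the whole chain into code (a sketch; winZ is the Z returned by gluProject, and zNear/zFar are the values passed to glOrtho):

    // Sketch: convert the window-space Z from gluProject into the camera-space Z
    // at which to render the label under glOrtho(..., zNear, zFar).
    double orthoCameraZ(double winZ, double zNear, double zFar)
    {
        double ndcZ  = 2.0 * winZ - 1.0;   // window-space [0,1] -> NDC [-1,1]
        double clipZ = ndcZ;               // w == 1 for an orthographic projection
        return ((clipZ + (zFar + zNear) / (zFar - zNear)) * (zFar - zNear)) / -2.0;
    }
    // With zNear == 0 this reduces to the shortcut above: cameraZ = -winZ * zFar.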
I have a 3D model consisting of point vertices (XYZ) and eventually triangular faces.
Using OpenGL or camera-view-matrix-projection I can project the 3D model to a 2D plane, i.e. a view window or an image with m*n resolution.
The question is how I can determine the correspondence between a pixel on the 2D projection plane and its corresponding vertex (or face) in the original 3D model.
Namely,
What is the closest vertex in the 3D model for a given pixel of the 2D projection?
It sounds like picking in OpenGL or a ray-tracing problem. Is there, however, any easy solution?
With the idea of ray tracing, it is actually about finding the first vertex/face intersected by the ray from the viewpoint. Can someone show me some tutorials or examples? I would like to find an algorithm that is independent of OpenGL.
Hit testing in OpenGL is usually done without ray tracing. Instead, as each primitive is rendered, a plane in the output is used to store the unique ID of the primitive. Hit testing is then as simple as reading the ID plane at the cursor location.
My (possibly naive) thought would be to create an array of the vertices and then sort them by their distance (or distance squared, for speed) to your screen point once projected. The first item in the list will be the closest. Sorting, however, is O(n log n) for n vertices.
Edit: better for both speed and memory: simply loop through all vertices and keep track of the vertex whose projection is closest (in squared distance) to your viewport pixel; that is O(n). This assumes that you are able to perform the projection yourself, without relying on OpenGL.
For example, in pseudo-code:
function findPointFromViewPortXY( pointOnViewport )
    closestPoint = false
    bestDistance = false
    for (each point in points)
        projectedXY = projectOntoViewport(point)
        distanceSquared = distanceBetween(projectedXY, pointOnViewport)
        if bestDistance == false or distanceSquared < bestDistance
            closestPoint = point
            bestDistance = distanceSquared
    return closestPoint
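A concrete version of that loop, using gluProject for the projection step (a sketch; substitute your own MVP multiply plus viewport mapping if you want to stay independent of OpenGL):

    #include <vector>
    #include <limits>

    // Sketch: project every vertex and keep the one whose window position is
    // closest to the mouse. O(n) in the number of vertices.
    int closestVertexToPixel(const std::vector<Vec3>& vertices,
                             double mouseX, double mouseY,
                             const GLdouble modelview[16], const GLdouble projection[16],
                             const GLint viewport[4])
    {
        int best = -1;
        double bestDistSq = std::numeric_limits<double>::max();
        for (size_t i = 0; i < vertices.size(); ++i) {
            GLdouble winX, winY, winZ;
            gluProject(vertices[i].x, vertices[i].y, vertices[i].z,
                       modelview, projection, viewport, &winX, &winY, &winZ);
            double dx = winX - mouseX;
            double dy = winY - (viewport[3] - mouseY);   // flip mouse y to window coordinates
            double distSq = dx * dx + dy * dy;
            if (distSq < bestDistSq) { bestDistSq = distSq; best = (int)i; }
        }
        return best;   // index of the nearest vertex, or -1 for empty input
    }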
In addition to Ben Voigt's answer:
If you do a separate pass over pickable objects, then you can set the viewport to contain only a single pixel that you will read.
You can also encode the triangle ID by using a geometry shader (gl_PrimitiveID).