How to calculate the 3D coordinate from a picture with given Z-value - c++

I've got a picture of a plane with 4 known points on it. I've got the intrinsic and extrinsic camera parameters and also (using the Rodriguez function) the position of the camera. The plane is defined as my ground level (Z = 0). If I select a point in my image, is there an easy way to calculate the coordinates, where this point would be on my plane?

Not much can be labeled as 'easy' when dealing with 3D rendering.
For your question, I would look into ray tracing. I am not going to try to explain it, as most sites will do a better job of explaining it then I can.

When you look at opencv in calib3d module. you will see this equation:
https://docs.opencv.org/master/d9/d0c/group__calib3d.html
Please scroll down the link and see the perspective transformation equations
From what you say, you declare the plane ground level(Z=0). you also know the internsic (focal point in pixels , image center) camera parameter and you know the exterinsic (rotation and translation) camera parameter. and you want to access some pixels in your image (is it?) and from there, you want to estimate where it is on the plane??
You can use triangulatePoints() function in calib3D module of opencv. you need at least 2 images tough.
But your case seems unlikely to me, if you try to detect 4 known points, you will have to define the world coordinate of those plane first, usually, you define top left corner of the plane as original (0,0,0), then, you will know the position of those 4 known points in world coordinate by manual calculation. when you detect it in opencv program, it gives you the pixel coordinates of those 4 points. then, usually, what people expect to calculate is the pose ( rotation and translation ).
Alternatively, if your case is what you said, you can make a simple matrix operation code based on perspective transformation equation.

Related

Rendering an atmosphere around a planet with shading

I have a made a planet and wanted to make an atmosphere around it. So I was referring to this site:
Click to visit site
I don't understand this:
As with the lookup table proposed in Nishita et al. 1993, we can get the optical depth for the ray to the sun from any sample point in the atmosphere. All we need is the height of the sample point (x) and the angle from vertical to the sun (y), and we look up (x, y) in the table. This eliminates the need to calculate one of the out-scattering integrals. In addition, the optical depth for the ray to the camera can be figured out in the same way, right? Well, almost. It works the same way when the camera is in space, but not when the camera is in the atmosphere. That's because the sample rays used in the lookup table go from some point at height x all the way to the top of the atmosphere. They don't stop at some point in the middle of the atmosphere, as they would need to when the camera is inside the atmosphere.
Fortunately, the solution to this is very simple. First we do a lookup from sample point P to the camera to get the optical depth of the ray passing through the camera to the top of the atmosphere. Then we do a second lookup for the same ray, but starting at the camera instead of starting at P. This will give us the optical depth for the part of the ray that we don't want, and we can subtract it from the result of the first lookup. Examine the rays starting from the ground vertex (B 1) in Figure 16-3 for a graphical representation of this.
First Question - isn't optical depth dependent on how you see that is, on the viewing angle? If yes, the table just gives me the optical depth of the rays going from land to the top of the atmosphere in a straight line. So what about the case where the rays pierce the atmosphere to reach the camera? How do I get the optical depth in this case?
Second Question - What is the vertical angle it is talking about...like, is it the same as the angle with the z-axis as we use in polar coordinates?
Third Question - The article talks about scattering of the rays going to the sun..shouldn't it be the other way around? like coming from the sun to a point?
Any explanation on the article or on my questions will help a lot.
Thanks in advance!
I am no expert in the matter but have played with Atmospheric scattering and various physical and optical simulations. I strongly recommend to look at this:
my VEEERRRYYY Simplified version of atmospheric scattering in GLSL
It odes not do the full volume intergration but just linear path integration along the ray and does only the Rayleight scatering with isotropic coefficients. As you can see its still good enough.
In real scattering the viewing angle is impacting the real scattering equation as the scattering coefficients are different in different angles (against main light source and viewer) So answer to your first question is Yes it does.
Not sure what you are refer to in your second question. The scattering itself is dependent on angle between light source, particle and camera. That lies on arbitrary plane. However if the Earth surface is accounted to the equation too then its dependent on the horizontal and vertical angles (against terrain) so azimuth,elevation as usually more light is reflected when camera is facing sun (azimuth) and the reflected rays are closer to your elevation. So my guess is that's what the horizontal angle is about accounting for reflected light from the surface.
To answer your 3th question is called back ray tracing. You can cast rays both ways (from camera or from sun) however if you start from light source you do not know which way to go to hit a pixel on camera screen so you need to cast a lot of rays to increase the probability of hit enough to fill the screen which is too slow and inaccurate (produce holes). If you start from screen pixel then you cast just single or per wavelength ray instead which is much much faster. The resulting color is the same.
[Edit1] vertical angle
OK I read the linked topic a bit and this is How I understand it:
So its just angle between surface normal and the casted ray. Its scaled so vert.angle=0 means that ray and normal are the same and vert.angle=1 means the are opposite directions.

OpenCV Structure from Motion Reprojection Issue

I am currently facing an issue with my Structure from Motion program based on OpenCv.
I'm gonna try to depict you what it does, and what it is supposed to do.
This program lies on the classic "structure from motion" method.
The basic idea is to take a pair of images, detect their keypoints and compute the descriptors of those keypoints. Then, the keypoints matching is done, with a certain number of tests to insure the result is good. That part works perfectly.
Once this is done, the following computations are performed : fundamental matrix, essential matrix, SVD decomposition of the essential matrix, camera matrix computation and finally, triangulation.
The result for a pair of images is a set of 3D coordinates, giving us points to be drawn in a 3D viewer. This works perfectly, for a pair.
Indeed, here is my problem : for a pair of images, the 3D points coordinates are calculated in the coordinate system of the first image of the image pair, taken as the reference image. When working with more than two images, which is the objective of my program, I have to reproject the 3D points computed in the coordinate system of the very first image, in order to get a consistent result.
My question is : How do I reproject 3D points coordinate given in a camera related system, into an other camera related system ? With the camera matrices ?
My idea was to take the 3D point coordinates, and to multiply them by the inverse of each camera matrix before.
I clarify :
Suppose I am working on the third and fourth image (hence, the third pair of images, because I am working like 1-2 / 2-3 / 3-4 and so on).
I get my 3D point coordinates in the coordinate system of the third image, how do I do to reproject them properly in the very first image coordinate system ?
I would have done the following :
Get the 3D points coordinates matrix, apply the inverse of the camera matrix for image 2 to 3, and then apply the inverse of the camera matrix for image 1 to 2.
Is that even correct ?
Because those camera matrices are non square matrices, and I can't inverse them.
I am surely mistaking somewhere, and I would be grateful if someone could enlighten me, I am pretty sure this is a relative easy one, but I am obviously missing something...
Thanks a lot for reading :)
Let us say you have a 3 * 4 extrinsic parameter matrix called P. To match the notations of OpenCV documentation, this is [R|t].
This matrix P describes the projection from world space coordinates to the camera space coordinates. To quote the documentation:
[R|t] translates coordinates of a point (X, Y, Z) to a coordinate system, fixed with respect to the camera.
You are wondering why this matrix is non-square. That is because in the usual context of OpenCV, you are not expecting homogeneous coordinates as output. Therefore, to make it square, just add a fourth row containing (0,0,0,1). Let's call this new square matrix Q.
You have one such matrix for each pair of cameras, that is you have one Qk matrix for each pair of images {k,k+1} that describes the projection from the coordinate space of camera k to that of camera k+1. Those matrices are inversible because they describe isometries in homogeneous coordinates.
To go from the coordinate space of camera 3 to that of camera 1, just apply to your points the inverse of Q2 and then the inverse of Q1.

Matching top view human detections with floor projection on interactive floor project

I'm building an interactive floor. The main idea is to match the detections made with a Xtion camera with objects I draw in a floor projection and have them following the person.
I also detect the projection area on the floor which translates to a polygon. the camera can detect outside the "screen" area.
The problem is that the algorithm detects the the top most part of the person under it using depth data and because of the angle between that point and the camera that point isn't directly above the person's feet.
I know the distance to the floor and the height of the person detected. And I know that the camera is not perpendicular to the floor but I don't know the camera's tilt angle.
My question is how can I project that 3D point onto the polygon on the floor?
I'm hoping someone can point me in the right direction. I've been reading about camera projections but I'm not seeing how to use it in this particular problem.
Thanks in advance
UPDATE:
With the awnser from Diego O.d.L I was able to get an almost perfect detection. I'll write the steps I used for those who might be looking for the same solution (I won't get into much detail on how detection is made):
Step 1 : Calibration
Here I get some color and depth frames from the camera, using openNI, with the projection area cleared.
The projection area is detected on the color frames.
I then convert the detection points to real world coordinates (using OpenNI's CoordinateConverter). With the new real world detection points I look for the plane that better fits them.
Step 2: Detection
I use the detection algorithm to get new person detections and to track them using the depth frames.
These detection points are converted to real world coordinates and projected to the plane previously computed. This corrects the offset between the person's height and the floor.
The points are mapped to screen coordinates using a perspective transform.
Hope this helps. Thank you again for the awnsers.
Work with the camera coordinate system initially. I'm assuming you don't have problems converting from (row,column,distance) to a real world system aligned with the camera axis (x,y,z):
calculate the plane with three or more points (for robustness) with
the camera projection (x,y,z). (choose your favorite algorithm,
i.e
Then Find the projection of your head point to the floor plane
(example)
Finally, you can convert it to the floor coordinate system or just
keep it in the camera system
From the description of your intended application, it is probably more useful for you to recover the image coordinates, I guess.
This type of problems usually benefits from clearly defining the variables.
In this case, you have a head at physical position {x,y,z} and you want the ground projection {x,y,0}. That's trivial, but your camera gives you {u,v,d} (d being depth) and you need to transform that to {x,y,z}.
The easiest solution to find the transform for a given camera positioning may be to simply put known markers on the floor at {0,0,0}, {1,0,0}, {0,1,0} and see where they pop up in your camera.

Get the value of a voxel from a volume

I need a function to get the value of a voxel (3d pixel) from a X.volume object, given the x,y,z coords as an input. This is to use a lablemap as a reference for an atlas function. Is there a way to do this?
Many thanks,
Edward
Currently no, though the XTK developers are creating something similar as explained in Finding world coordinates from screen coordinates, though what I believe whats being done there is an Unproject function and Ray/Triangle intersection test. A Ray/Triangle intersection tests justs casts a Ray from your screen into the 3d world and returns the first intersected triangle coordinates, but of course you need to find the voxels. You can try creating an Unproject function and something similar to the Ray/Triangle intersection but instead finding the closest intersected voxel instead of triangle. Help with an Unproject function is here http://myweb.lmu.edu/dondi/share/cg/unproject-explained.pdf, but rememeber that link explains GluUnproject from OpenGL, but still explains what we need to do, we are just making an alternative version of GluUnproject for WebGL. Any solutions you may find would be greatly appreciated if contributed to XTK. Or you may wait on the Unproject function that may possibly come from the related problem in finding the 3d coordinates.
What are your (x,y,z) ? Are they the coordinates in the world's basis or in the volume's basis ? In both case a solution could be computed (it will be a bit more complicated in the 1st case) but I think it would require a few changes in the sources.
Do you want to code a bit in the sources and participate ? It would be great for xtk users like us ! You'd just to go through the volume's hiearchy and to do a few operations :
If (x,y,z) are coordinates in the world's basis : complicated. Must do some tests :-)
Choose a direction to work in ( 1 volume can have slices in 3 directions), for ex slicesX for the 1st direction
With the volume's center X coordinate, the volume X spacing, and the X coordinate of the picked point find the good slice in slicesX
With the volume's center (y,z), the volume Y and Z spacing, and the Y and Z coordinates of the picked point get the (x',y') coordinate on the choosen slice's texture map
With the choosen slice's texture size get (x'',y'') the coordinates on the slice's texture image
Read the (x'',y'') point of the texture
I'm sorry I've not time to look further this week, but maybe later.

How to use a chessboard to find the rotation/translation between 2 cameras

I am using opencv with C, and I am trying to get the extrinsic parameters (Rotation and translation) between 2 cameras.
I'm told that a checkerboard pattern can be used to calibrate, but I can't find any good samples on this. How do I go about doing this?
edit
The suggestions given are for calibrating a single camera with a checkerboard. How would you find the rotation and translation between 2 cameras given the checkerboard images from both views?
I was using code from http://www.starlino.com/opencv_qt_stereovision.html. It has some useful information and code of the author is pretty easy to understand and analyze, it covers both - chessboard calibrate and getting depth image from stereo cameras. I think it's based on this OpenCV book
opencv library here and about 3 chapters of the opencv book
A picture from a camera is just a projection of a bunch of color samples onto a plane. Assuming that the camera itself creates pictures with square pixels, the possible position of a given pixel is a vector from the camera's origin through the plane the pixel was projected onto. We'll refer to that plane as the picture plane.
One sample doesn't give you that much information. Two samples tells you a little bit more - the position of the camera relative to the plane created by three points: the two sample points and the camera position. And a third sample tells you the relative position of the camera in the world; this will be a single point in space.
If you take the same three samples and find them in another picture taken from a different camera, you will be able to determine the relative position of the cameras from the three samples (and their orientations based on the right and up vectors of the picture plane). To the correct distance, you need to know the distance between the actual sample points. In the case of a checkerboard, it's the physical dimensions of the checkerboard.