Projection of a set of 3D points into a virtual image plane in OpenCV C++

Does anyone know how to project a set of 3D points into a virtual image plane in OpenCV C++?
Thank you

First you need to define your transformation matrix (rotation, translation, etc.) that maps 3D space onto the 2D virtual image plane, then multiply your 3D point coordinates (x, y, z) by that matrix to get the 2D coordinates in the image.
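For the simple pinhole case this multiplication (plus optional distortion) is exactly what cv::projectPoints does. A minimal sketch; the intrinsics and the identity pose below are placeholders you would replace with your own virtual camera parameters:

#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

// Project a set of 3D points into a virtual pinhole camera.
// cameraMatrix, rvec and tvec describe the virtual camera (placeholder values here).
std::vector<cv::Point2f> projectToVirtualImage(const std::vector<cv::Point3f>& objectPoints)
{
    cv::Mat cameraMatrix = (cv::Mat_<double>(3, 3) << 800, 0, 320,
                                                        0, 800, 240,
                                                        0,   0,   1);
    cv::Mat distCoeffs = cv::Mat::zeros(5, 1, CV_64F); // virtual camera: no lens distortion
    cv::Mat rvec = cv::Mat::zeros(3, 1, CV_64F);       // rotation as a Rodrigues vector
    cv::Mat tvec = cv::Mat::zeros(3, 1, CV_64F);       // translation

    std::vector<cv::Point2f> imagePoints;
    cv::projectPoints(objectPoints, rvec, tvec, cameraMatrix, distCoeffs, imagePoints);
    return imagePoints;                                // 2D pixel coordinates of each 3D point
}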

Registration (OpenNI 2) or the alternative viewpoint capability (OpenNI 1.5) does indeed help to align depth with RGB using a single line of code. The price you pay is that you cannot really restore the exact X, Y point locations in 3D space, since the row and col are shifted by the alignment.
Sometimes you need not only Z but also exact X and Y, plus the alignment of depth and RGB. Then you have to align RGB to depth. Note that this alignment is not supported by Kinect/OpenNI. The price you pay here is that there are no RGB values at the locations where depth is undefined.
If you know the extrinsic parameters, that is, the rotation and translation of the depth camera relative to the color one, then alignment is just a matter of creating an alternative viewpoint: restore 3D from depth, then look at your point cloud from the point of view of the color camera, that is, apply the inverse rotation and translation. For example, moving the camera to the right is like moving the world (points) to the left. Reproject 3D into 2D and interpolate if needed. This is really easy and is just the inverse of 3D reconstruction; below, Cx is close to w/2 and Cy to h/2:
col = focal*X/Z+Cx
row = -focal*Y/Z+Cy // this is because row in the image increases downward
A proper but also more expensive way to get a nice depth map after point cloud rotation is to trace rays from each pixel until they intersect the point cloud or come sufficiently close to one of its points. This way you will have fewer holes in your depth map due to sampling artifacts.
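A hedged sketch of the reprojection step only (not the ray tracing), assuming the point cloud has already been rotated and translated into the color camera frame, and that focal, Cx, Cy are the color camera intrinsics:

#include <opencv2/core.hpp>
#include <vector>

// Reproject 3D points (already expressed in the color camera frame) into a depth
// map aligned with the RGB image, using the col/row formulas above.
cv::Mat reprojectToDepthMap(const std::vector<cv::Point3f>& cloud,
                            int w, int h, float focal, float Cx, float Cy)
{
    cv::Mat depth(h, w, CV_32F, cv::Scalar(0));     // 0 marks holes (no depth)
    for (const cv::Point3f& p : cloud)
    {
        if (p.z <= 0) continue;                     // behind the camera
        int col = cvRound( focal * p.x / p.z + Cx);
        int row = cvRound(-focal * p.y / p.z + Cy); // row increases downward
        if (col < 0 || col >= w || row < 0 || row >= h) continue;
        float& d = depth.at<float>(row, col);
        if (d == 0.0f || p.z < d) d = p.z;          // keep the closest point (z-buffer style)
    }
    return depth;
}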

Related

Film coordinate to world coordinate

I am working on building a 3D point cloud from feature matching using OpenCV 3.1 and OpenGL.
I have implemented 1) camera calibration (hence I have the intrinsic matrix of the camera) and 2) feature extraction (hence I have 2D points in pixel coordinates).
I was going through a few websites, but they all generally suggest the flow for converting 3D object points to pixel points, while I am doing the completely backward projection. Here is the ppt that explains it well.
I have computed film coordinates (u,v) from pixel coordinates (x,y) (with the help of the intrinsic matrix). Can anyone shed light on how I can recover the "Z" of the camera coordinate (X,Y,Z) from the film coordinate (x,y)?
Please guide me on how I can use OpenCV functions such as solvePnP, recoverPose, findFundamentalMat, and findEssentialMat for this goal.
With a single camera and a rotating object on a fixed rotation platform I would implement something like this:
Each camera has a resolution xs,ys and a field of view FOV defined by two angles FOVx,FOVy, so either check your camera data sheet or measure it. From that and the perpendicular distance (z) you can convert any pixel position (x,y) into a 3D coordinate relative to the camera (x',y',z'). So first convert the pixel position to angles:
ax = (x - (xs/2)) * FOVx / xs
ay = (y - (ys/2)) * FOVy / ys
and then compute cartesian position in 3D:
x' = distance * tan(ax)
y' = distance * tan(ay)
z' = distance
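Here is the above conversion as a small helper (a sketch; angles are assumed to be in radians and the distance known):

#include <cmath>

// Convert a pixel (x,y) into a camera-relative 3D point (x',y',z') using the
// formulas above. FOVx, FOVy are in radians, xs, ys is the resolution and
// distance is the (known) perpendicular distance z.
struct Point3 { double x, y, z; };

Point3 pixelTo3D(double x, double y, double xs, double ys,
                 double FOVx, double FOVy, double distance)
{
    double ax = (x - xs * 0.5) * FOVx / xs;   // horizontal angle from the optical axis
    double ay = (y - ys * 0.5) * FOVy / ys;   // vertical angle from the optical axis
    Point3 p;
    p.x = distance * std::tan(ax);
    p.y = distance * std::tan(ay);
    p.z = distance;
    return p;
}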
That is nice, but in a common image we do not know the distance. Luckily, with such a setup, if we turn our object then any convex edge will produce a maximum ax angle at the sides when it crosses the plane perpendicular to the camera. So check a few frames, and if a maximal ax is detected you can assume it is an edge (or convex bump) of the object positioned at distance.
If you also know the rotation angle ang of your platform (relative to your camera), then you can compute the un-rotated position by using the rotation formula around the y axis (the Ay matrix in the link) and the known platform center position relative to the camera (just a subtraction before the un-rotation)... As I mentioned, all this is just simple geometry.
In a nutshell:
obtain calibration data
FOVx, FOVy, xs, ys, distance. Some camera datasheets have only FOVx, but if the pixels are square you can compute FOVy from the resolution as
FOVx/FOVy = xs/ys
Beware: with multi-resolution camera modes the FOV can be different for each resolution!!!
extract the silhouette of your object in the video for each frame
you can subtract the background image to ease up the detection
obtain platform angle for each frame
so either use IRC data or place known markers on the rotation disc and detect/interpolate...
detect ax maximum
just inspect the x coordinate of the silhouette (for each y line of the image separately) and if a peak is detected add its 3D position to your model. Let's assume a rotating rectangular box. Some of its frames could look like this:
So inspect one horizontal line across all frames and find the maximal ax. To improve accuracy you can do a closed-loop regulation by turning the platform until the peak is found "exactly". Do this for all horizontal lines separately.
btw. if you detect no ax change over a few frames, that means a circular shape with the same radius ... so you can handle each such frame as an ax maximum.
Easy as pie, resulting in a 3D point cloud, which you can sort by platform angle to ease up the conversion to a mesh ... That angle can also be used as a texture coordinate ...
But do not forget that you will lose some concave details that are hidden in the silhouette !!!
If this approach is not enough, you can use the same setup for stereoscopic 3D reconstruction, because each rotation behaves as a new (known) camera position.
You can't, if all you have are 2D images from that single camera location.
In theory you could use heuristics to infer a Z stacking. But mathematically your problem is underdetermined and there are literally infinitely many different Z coordinates that would satisfy your constraints. You have to supply some extra information. For example you could move your camera around over several frames (Google "structure from motion"), or you could use multiple cameras, or use a camera that has a depth sensor and gives you complete XYZ tuples (Kinect or similar).
Update due to comment:
For every pixel in a 2D image there is an infinite number of points that project to it. The technical term for that is a ray. If you have two 2D images of roughly the same volume of space, each image's set of rays (one per pixel) intersects with the set of rays corresponding to the other image. Which is to say that if you determine the ray for a pixel in image #1, this maps to a line of pixels covered by that ray in image #2. Selecting a particular pixel along that line in image #2 will give you the XYZ tuple for that point.
Since you're rotating the object by a certain angle θ around a certain axis a between images, you actually have a lot of images to work with. All you have to do is derive the camera location by an additional transformation: inverse(translate(-a)·rotate(θ)·translate(a)).
Then do the following: select an image to start with. For the particular pixel you're interested in, determine the ray it corresponds to. For that, simply assume two Z values for the pixel; 0 and 1 work just fine. Transform them back into the space of your object, then project them into the view space of the next camera you chose to use; the result will be two points in the image plane (possibly outside the limits of the actual image, but that's not a problem). These two points define a line within that second image. Find the pixel along that line that matches the pixel in the first image you selected and project that back into space as done with the first image. Due to numerical round-off errors you're not going to get a perfect intersection of the rays in 3D space, so find the point where the rays are closest to each other (this involves solving a quadratic polynomial, which is trivial).
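The closest-point step can be written down directly; a sketch, with the two rays given as p + t·d and q + s·e (minimizing the squared distance between them yields a small linear system, equivalent to the quadratic polynomial mentioned above):

#include <opencv2/core.hpp>

// Closest point between two rays p + t*d and q + s*e (d, e need not be unit length).
// The returned point is the midpoint of the two closest points, a common
// triangulation estimate when the rays do not intersect exactly.
cv::Point3d closestPointBetweenRays(const cv::Point3d& p, const cv::Point3d& d,
                                    const cv::Point3d& q, const cv::Point3d& e)
{
    cv::Point3d w0 = p - q;
    double a = d.dot(d), b = d.dot(e), c = e.dot(e);
    double dw = d.dot(w0), ew = e.dot(w0);
    double denom = a * c - b * b;             // near zero means (nearly) parallel rays
    double t = (b * ew - c * dw) / denom;     // parameter along the first ray
    double s = (a * ew - b * dw) / denom;     // parameter along the second ray
    cv::Point3d c1 = p + t * d;               // closest point on ray 1
    cv::Point3d c2 = q + s * e;               // closest point on ray 2
    return 0.5 * (c1 + c2);
}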
To select which pixel you want to match between images you can use a feature motion tracking algorithm, as used in video compression and similar. The basic idea is that for every pixel, a correlation of its surroundings is performed against the same region in the previous image. Where the correlation peaks is where the pixel most likely moved from.
With this pixel tracking in place you can then derive the structure of the object. This is essentially what structure from motion does.

OpenCV stereo vision 3D coordinates to 2D camera-plane projection different than triangulating 2D points to 3D

I get an image point in the left camera (pointL) and the corresponding image point in the right camera (pointR) of my stereo camera using feature matching. The two cameras are parallel and at the same "height". There is only an x-translation between them.
I also know the projection matrices for each camera (projL, projR), which I got during calibration using initUndistortRectifyMap.
For triangulating the point, I call:
triangulatePoints(projL, projR, pointL, pointR, pos3D), where pos3D is the output 3D position of the object.
Now, I want to project the 3D-coordinates to the 2D-image of the left camera:
2Dpos = projL*3dPos
The resulting x-coordinate is correct, but the y-coordinate is off by about 20 pixels.
How can I fix this?
Edit:
Of course, I need to use homogeneous coordinates, in order to multiply it with the projection matrix (3x4). For that reason, I set:
3dPos[0] = x;
3dPos[1] = y;
3dPos[2] = z;
3dPos[3] = 1;
Is it wrong to set 3dPos[3] to 1?
Note:
All images are remapped; I do this in a kind of preprocessing step.
Of course, I always use homogeneous coordinates.
You are likely projecting into the rectified camera. You need to apply the inverse of the rectification warp to obtain the point in the original (undistorted) linear camera coordinates, then apply the distortion to get into the original image.
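A hedged sketch of that idea, assuming the left rectification rotation R1 (from stereoRectify) and the original left intrinsics cameraMatrixL / distCoeffsL (from calibration) are available; these names are placeholders for whatever your calibration step produced:

#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

// Map a 3D point expressed in the rectified left camera frame back into the
// original (distorted) left image. pos3D is the 4x1 homogeneous column from
// triangulatePoints (assumed CV_64F here).
cv::Point2d projectToOriginalLeftImage(const cv::Mat& pos3D,
                                       const cv::Mat& R1,
                                       const cv::Mat& cameraMatrixL,
                                       const cv::Mat& distCoeffsL)
{
    // Homogeneous -> Euclidean: do not forget the division by w
    double w = pos3D.at<double>(3);
    cv::Mat Xrect = (cv::Mat_<double>(3, 1) << pos3D.at<double>(0) / w,
                                               pos3D.at<double>(1) / w,
                                               pos3D.at<double>(2) / w);

    // Undo the rectification rotation: rectified frame -> original camera frame
    cv::Mat Xorig = R1.t() * Xrect;

    // Project with the original intrinsics and distortion coefficients
    std::vector<cv::Point3d> obj = { cv::Point3d(Xorig.at<double>(0),
                                                 Xorig.at<double>(1),
                                                 Xorig.at<double>(2)) };
    std::vector<cv::Point2d> img;
    cv::projectPoints(obj, cv::Vec3d(0, 0, 0), cv::Vec3d(0, 0, 0),
                      cameraMatrixL, distCoeffsL, img);
    return img[0];
}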

Relate textures areas of a cube with the current Oculus viewport

I'm creating a 360° image player using the Oculus Rift SDK.
The scene is composed of a cube, and the camera is placed at its center with only the ability to rotate in yaw, pitch and roll.
I've drawn the object using OpenGL, using a 2D texture for each cube face to create the 360° effect.
I would like to find the portion of the original texture that is actually shown in the Oculus viewport at a certain instant.
Up to now, my approach has been to try to find an approximate pixel position of some significant points of the viewport (i.e. the central point and the corners) using the Euler angles, in order to identify some areas in the original textures.
Considering all the problems of using Euler angles, this does not seem the smartest way to do it.
Is there any better approach to accomplish it?
Edit
I did a small example that can be run in the render loop:
//Keep the Orientation from Oculus (Point 1)
OVR::Matrix4f rotation = Matrix4f(hmdState.HeadPose.ThePose);
//Find the vector with respect to a certain point in the viewport, in this case the center (Point 2)
FovPort fov_viewport = FovPort::CreateFromRadians(hmdDesc.CameraFrustumHFovInRadians, hmdDesc.CameraFrustumVFovInRadians);
Vector2f temp2f = fov_viewport.TanAngleToRendertargetNDC(Vector2f(0.0,0.0));// these values are the tangent at the center
Vector3f vector_view = Vector3f(temp2f.x, temp2f.y, -1.0);// add the third component (the -Z direction the view is oriented along)
vector_view.Normalize();
//Apply the rotation (Point 3)
Vector3f final_vect = rotation.Transform(vector_view);//seems the right operation.
//An example to check if we are looking at the front face (Partial point 4)
if (abs(final_vect.z) > abs(final_vect.x) && abs(final_vect.z) > abs(final_vect.y) && final_vect.z <0){
system("pause");
}
Is it right to consider the entire viewport, or should this be done for each eye separately?
How can I indicate a different point of the viewport with respect to the center? I don't really understand which values should be the input of TanAngleToRendertargetNDC().
You can get a full rotation matrix by passing the camera pose quaternion to the OVR::Matrix4 constructor.
You can take any 2D position in the eye viewport and convert it to its camera space 3D coordinate by using the fovPort tan angles. Normalize it and you get the direction vector in camera space for this pixel.
If you apply the rotation matrix gotten earlier to this direction vector you get the actual direction of that ray.
Now you have to convert from this direction to your texture UV. The component with the highest absolute value in the direction vector will give you the face of the cube it's looking at. The remaining components can be used to find the actual 2D location on the texture. This depends on how your cube faces are oriented, if they are x-flipped, etc.
If you are at the rendering part of the viewer, you will want to do this in a shader. If this is to find where the user is looking in the original image, or the extent of their field of view, then only a handful of rays will suffice, as you wrote.
edit
Here is a bit of code to go from tan angles to camera space coordinates.
float u = (x / eyeWidth) * (leftTan + rightTan) - leftTan;
float v = (y / eyeHeight) * (upTan + downTan) - upTan;
float w = 1.0f;
x and y are pixel coordinates, eyeWidth and eyeHeight are eye buffer size, and *Tan variables are the fovPort values. I first express the pixel coordinate in [0..1] range, then scale that by the total tan angle for the direction, and then recenter.
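Going the other way, from the rotated direction vector to a cube face and a texture location, could look like the sketch below. The face ordering and UV orientation here are assumptions (they follow the usual cube map convention); you will likely have to flip or swap components to match how your six textures are actually laid out:

#include <cmath>

// Map a view direction to a cube face index and a [0,1]x[0,1] UV on that face.
// Faces: 0:+X 1:-X 2:+Y 3:-Y 4:+Z 5:-Z (assumed layout).
void directionToCubeUV(float dx, float dy, float dz, int& face, float& u, float& v)
{
    float ax = std::fabs(dx), ay = std::fabs(dy), az = std::fabs(dz);
    float ma, uc, vc;                                  // major axis and raw face coordinates
    if (ax >= ay && ax >= az) { face = dx > 0 ? 0 : 1; ma = ax; uc = dx > 0 ? -dz :  dz; vc = -dy; }
    else if (ay >= az)        { face = dy > 0 ? 2 : 3; ma = ay; uc = dx;                 vc = dy > 0 ? dz : -dz; }
    else                      { face = dz > 0 ? 4 : 5; ma = az; uc = dz > 0 ?  dx : -dx; vc = -dy; }
    u = 0.5f * (uc / ma + 1.0f);                       // [-1,1] -> [0,1]
    v = 0.5f * (vc / ma + 1.0f);
}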

OpenCV triangulatePoints - what are the correct coordinates to feed it with?

Previously, I was using another method to determine 3D positions from two 2D images. For that (mediocre) method I had to get 2D coordinates with the point of origin in the center of the image. Because of that, I get a lot of negative values. OpenCV normally uses the left bottom corner as the coordinate origin (no negative values at all).
The app user is supposed to be able to use either method. Can I keep collecting 2D coordinates that way, or do I have to change it? If not, do I have to use the new center point of the image (a result of cv::stereoCalibrate) instead of the default one (frame.cols/2, frame.rows/2)?
When using OpenCV's triangulatePoints, you need to pass as arguments:
projectionMatrixA - implicitly contains the intrinsic camera parameters (focal length & principal point offset, which is the pixel offset from the left and from the top that should be considered as 0,0)
projectionMatrixB - besides the intrinsic camera parameters, the projection matrix also reflects the position of the camera in relation to some coordinate system. So if you have two identical cameras, the two projection matrices would still differ because they are positioned differently.
2D points that are a result of 3D points being projected using projectionMatrixA
2D points that are a result of 3D points being projected using projectionMatrixB
To answer the question, there is nothing wrong with the fact that 2D points have negative values.
AFAIK, in the calib module, when dealing with 2D points (pixel coordinates), the coordinate (0,0) should always be around the center of the image and not in the top left corner. So naturally, any point in the left region of the image has x < 0, and any point in the upper region of the image has y < 0.
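A minimal usage sketch (function and variable names are placeholders); note the division by the fourth homogeneous component when converting the result back to 3D:

#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

// projA / projB are the 3x4 projection matrices described above; pointsA / pointsB
// are matching 2D observations expressed in the same pixel convention that the
// projection matrices were computed with.
std::vector<cv::Point3d> triangulate(const cv::Mat& projA, const cv::Mat& projB,
                                     const std::vector<cv::Point2d>& pointsA,
                                     const std::vector<cv::Point2d>& pointsB)
{
    cv::Mat points4D;                                    // 4xN homogeneous output
    cv::triangulatePoints(projA, projB, pointsA, pointsB, points4D);
    points4D.convertTo(points4D, CV_64F);                // make the element type predictable

    std::vector<cv::Point3d> result;
    for (int i = 0; i < points4D.cols; ++i)
    {
        cv::Mat c = points4D.col(i);
        double w = c.at<double>(3);                      // divide by w to leave homogeneous space
        result.emplace_back(c.at<double>(0) / w, c.at<double>(1) / w, c.at<double>(2) / w);
    }
    return result;
}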

OpenGL/GLUT - Project ModelView Coordinate to Texture Matrix

Is there a way using OpenGL or GLUT to project a point from the model-view matrix into an associated texture matrix? If not, is there a commonly used library that achieves this? I want to modify the texture of an object according to a ray cast in 3D space.
The simplest case would be:
A ray is cast which intersects a quad, mapped with a single texture.
The point of intersection is converted to a value in texture space, clamped to [0.0, 1.0] on the x and y axes.
A 3x3 patch of pixels centered around the rounded value of the resulting texture point is set to an alpha value of 0 (or another RGBA value that is convenient for the desired effect).
To illustrate, here is a more complex version of the question using a sphere; the pink box shows the replaced pixels.
I just specify texture points for mapping in OpenGL; I don't actually know how the pixels are projected onto the sphere. Basically I need to do the inverse of that projection, but I don't quite know how to do that math, especially for more complex shapes like a sphere or an arbitrary convex hull. I assume that you can somehow find the planar polygon of the shape which the ray is intersecting, and from there the inverse projection of a quad or triangle would be trivial.
Some equations, articles and/or example code would be nice.
There are a few ways you could accomplish what you're trying to do:
Project a world coordinate point into normalized device coordinates (NDCs) by doing the model-view and projection transformation matrix multiplications yourself (or, if you're using old-style OpenGL, call gluProject), and perform the perspective division step. If you use a depth coordinate of zero, this corresponds to intersecting your ray at the imaging plane. The only other correction you need is to map from NDCs (which are in the range [-1,1] in x and y) into texture space by dividing the resulting coordinate by two and then shifting by 0.5; see the sketch after this list.
Skip the ray tracing altogether: bind your texture as a framebuffer attachment to a framebuffer object, then render a big point (or sprite) that modifies the colors in the neighborhood of the intersection as you want. You could use the same model-view and projection matrices, and will (probably) only need to update the viewport to match the texture resolution.
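The first option, written as plain math, could look like the following sketch; the 4x4 matrix is assumed to be row-major (transpose if you feed in OpenGL's column-major arrays):

#include <array>

// world -> clip -> NDC -> texture space, as in option 1 above.
using Mat4 = std::array<float, 16>;   // row-major 4x4 matrix (projection * modelview)

void worldToTexture(const Mat4& mvp, const float world[3], float& s, float& t)
{
    float clip[4];
    for (int r = 0; r < 4; ++r)                          // clip = mvp * (world, 1)
        clip[r] = mvp[r * 4 + 0] * world[0] + mvp[r * 4 + 1] * world[1] +
                  mvp[r * 4 + 2] * world[2] + mvp[r * 4 + 3];

    float ndcX = clip[0] / clip[3];                      // perspective division -> [-1, 1]
    float ndcY = clip[1] / clip[3];

    s = ndcX * 0.5f + 0.5f;                              // [-1, 1] -> [0, 1] texture space
    t = ndcY * 0.5f + 0.5f;
}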
So I found a solution that is a little complicated, but does the trick.
For complex geometry you must determine which quad or triangle was intersected, and use this as the plane. The quad must be planar (obviously).
Draw a plane in the identity matrix with dimensions 1x1x0, map the texture on points identical to the model geometry.
Transform the plane, and store the inverse of each transform matrix in a stack
Find the point at which the plane is intersected
Transform this point using the inverse matrix stack until it returns to the identity matrix (it should have no depth)
Convert this point from 1x1 space into pixel space by multiplying the point by the number of pixels and rounding. Or start your 2D combining logic here.
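The last two steps as a sketch, assuming row-major 4x4 matrices and an intersection point that ends up in the unit square after the inverse transforms (names are illustrative only):

#include <array>
#include <cmath>
#include <vector>

using Mat4f = std::array<float, 16>;   // row-major 4x4 transform

// Apply the rotation/translation part of a 4x4 transform to a 3D point.
std::array<float, 3> transformPoint(const Mat4f& m, const std::array<float, 3>& p)
{
    std::array<float, 3> r;
    for (int i = 0; i < 3; ++i)
        r[i] = m[i * 4 + 0] * p[0] + m[i * 4 + 1] * p[1] + m[i * 4 + 2] * p[2] + m[i * 4 + 3];
    return r;
}

// Step 5: undo every stored transform; step 6: unit-plane coordinates -> pixel indices.
void intersectionToPixel(std::array<float, 3> hit,
                         const std::vector<Mat4f>& inverseStack,  // most recent transform first
                         int texWidth, int texHeight, int& px, int& py)
{
    for (const Mat4f& inv : inverseStack)
        hit = transformPoint(inv, hit);
    px = static_cast<int>(std::lround(hit[0] * (texWidth  - 1)));
    py = static_cast<int>(std::lround(hit[1] * (texHeight - 1)));
}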