I have a color image and its depth map, both captured by a Kinect. I want to reproject them to another location (to see what the scene looks like from another viewpoint). Since I don't have the intrinsic parameters (camera parameters) of the Kinect, how can I implement this?
P.S.: I'm writing my code in C++.
With the depth frame and the color frame you should have enough data to achieve something similar to what you want to do.
In the color frame, you have the color of each pixel.
In the depth frame, you have the distance of each pixel.
(Keep in mind that there is a small offset between the data in the depth frame and the color frame due to the positions of the two sensors. Have a look at the mapping helper methods: MapDepthFrameToColorFrame.)
If you take the data from both the depth frame and the color frame at the same time, you can draw each pixel as a point in a 3-dimensional world. Say you have a resolution of 640x480: you'll have a scene drawn in a box of 640 (x = width) by 480 (y = height) by ~3000 (z = depth). Then you can change the point of view!
The only problem is that you won't have the right scale for the Z axis. If you want a better result, you should also use the SkeletonFrame. With that you'll have the actual X, Y, and Z values (in meters). Once again you can use a helper method (MapDepthToSkeletonPoint) to get the corresponding skeleton point for each depth point!
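For illustration, here is a rough C++ sketch of the "pixel + depth as a point cloud, then rotate the viewpoint" idea, since you said you work in C++. The container layout and function names are made up, and the depth-to-color mapping step (MapDepthFrameToColorFrame) is left out:

#include <cstddef>
#include <cstdint>
#include <cmath>
#include <vector>

struct Point3D {
    float x, y, z;        // x, y = pixel position, z = raw depth value
    std::uint8_t r, g, b; // color taken from the color frame
};

// Build the naive point cloud described above (pixel coordinates + depth).
std::vector<Point3D> buildPointCloud(const std::vector<std::uint16_t>& depth,
                                     const std::vector<std::uint8_t>& colorRGB, // 3 bytes per pixel
                                     int width, int height)
{
    std::vector<Point3D> cloud;
    cloud.reserve(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            const int i = y * width + x;
            const std::uint16_t z = depth[i];
            if (z == 0) continue; // 0 means "no depth measurement"
            cloud.push_back({ float(x), float(y), float(z),
                              colorRGB[3*i], colorRGB[3*i + 1], colorRGB[3*i + 2] });
        }
    return cloud;
}

// "Change the point of view": rotate the cloud around a vertical axis through (cx, cz).
void rotateAroundY(std::vector<Point3D>& cloud, float angleRad, float cx, float cz)
{
    const float c = std::cos(angleRad), s = std::sin(angleRad);
    for (Point3D& p : cloud) {
        const float x = p.x - cx, z = p.z - cz;
        p.x =  c * x + s * z + cx;
        p.z = -s * x + c * z + cz;
    }
}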
If you look at this post, you'll find a video that shows the result, some pieces of code in C#, and a sample project that you can reuse.
I am trying to draw a 3D cuboid by clicking on one of its corner points, extending it based on the dimensions provided by the user, and then rotating it about an arbitrary axis. However, I am not sure how to specify the (x, y, z) tuple after the mouse click, since the output window is 2D. I also don't understand how to extend the point into a cuboid.
What you want is called 3D Picking. Usually, this is done using Raycasting.
However, there is an easy solution (with acceptable performance) that involves rendering the scene off-screen into a framebuffer with 32-bit floats for the R/G/B channels.
In the shader you write the x, y and z coordinates of each fragment as its color values.
Then, when the user clicks somewhere, you simply read back that pixel's color to get its position.
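A minimal sketch of the readback side in C++/OpenGL, assuming you have already rendered into an FBO whose color attachment is a GL_RGB32F texture and whose shader writes positions as colors (the names here are illustrative, not from any particular codebase):

#include <GL/glew.h> // or whatever GL loader your project already uses

// Read the position written by the picking shader at the clicked pixel.
void readPickedPosition(GLuint pickFbo, int mouseX, int mouseY, int windowHeight,
                        float outPos[3])
{
    glBindFramebuffer(GL_READ_FRAMEBUFFER, pickFbo);
    glReadBuffer(GL_COLOR_ATTACHMENT0);
    // Window y usually grows downward, while GL reads from the bottom-left corner.
    glReadPixels(mouseX, windowHeight - mouseY - 1, 1, 1, GL_RGB, GL_FLOAT, outPos);
    glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);
    // outPos[0..2] now holds the x, y, z the shader wrote for the clicked pixel.
}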
I am working on building a 3D point cloud from feature matching using OpenCV 3.1 and OpenGL.
I have implemented 1) camera calibration (hence I have the intrinsic matrix of the camera) and 2) feature extraction (hence I have 2D points in pixel coordinates).
I was going through a few websites, but generally they all describe the flow for converting 3D object points to pixel points, while I am doing the completely backward projection. Here is the ppt that explains it well.
I have computed film coordinates (u, v) from pixel coordinates (x, y) (with the help of the intrinsic matrix). Can anyone shed some light on how I can recover the "Z" of the camera coordinate (X, Y, Z) from the film coordinates (u, v)?
Please guide me on how I can use OpenCV functions like solvePnP, recoverPose, findFundamentalMat, and findEssentialMat for this goal.
With a single camera and an object rotating on a fixed rotation platform, I would implement something like this:
Each camera has a resolution xs,ys and a field of view FOV defined by two angles FOVx,FOVy, so either check your camera data sheet or measure it. From that and the perpendicular distance (z) you can convert any pixel position (x,y) to a 3D coordinate relative to the camera (x',y',z'). So first convert the pixel position to angles:
ax = (x - (xs/2)) * FOVx / xs
ay = (y - (ys/2)) * FOVy / ys
and then compute the Cartesian position in 3D:
x' = distance * tan(ax)
y' = distance * tan(ay)
z' = distance
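A small C++ helper that transcribes these two steps directly (angles in radians; the symmetric linear pixel-to-angle mapping above is assumed):

#include <cmath>

struct Vec3 { double x, y, z; };

// Convert a pixel (x,y) with known perpendicular distance into a 3D point relative
// to the camera. xs, ys: resolution; FOVx, FOVy: field of view in radians.
Vec3 pixelTo3D(double x, double y, double distance,
               double xs, double ys, double FOVx, double FOVy)
{
    const double ax = (x - xs * 0.5) * FOVx / xs;
    const double ay = (y - ys * 0.5) * FOVy / ys;
    return { distance * std::tan(ax),
             distance * std::tan(ay),
             distance };
}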
That is nice, but from a single image we do not know the distance. Luckily, with such a setup, if we turn the object, any convex edge will reach its maximum ax angle at the sides when it crosses the plane perpendicular to the camera. So check a few frames, and if a maximal ax is detected you can assume it is an edge (or convex bump) of the object positioned at distance.
If you also know the rotation angle ang of your platform (relative to your camera), then you can compute the un-rotated position by using the rotation formula around the y axis (the Ay matrix in the link) and the known platform center position relative to the camera (just a subtraction before the un-rotation)... As I mentioned, all this is just simple geometry.
In a nutshell:
obtain calibration data
FOVx,FOVy,xs,ys,distance. Some camera datasheets list only FOVx, but if the pixels are square you can compute FOVy from the resolution as
FOVx/FOVy = xs/ys
Beware: with multi-resolution camera modes the FOV can be different for each resolution !!!
extract the silhouette of your object in the video for each frame
you can subtract the background image to ease up the detection
obtain platform angle for each frame
so either use IRC data or place known markers on the rotation disc and detect/interpolate...
detect ax maximum
just inspect the x coordinate of the silhouette (for each y line of the image separately) and if a peak is detected add its 3D position to your model. Let's assume a rotating rectangular box. Some of its frames could look like this:
So inspect one horizontal line across all frames and find the maximal ax. To improve accuracy you can do a closed-loop regulation: keep turning the platform until the peak is found "exactly". Do this for all horizontal lines separately.
btw. if you detect no ax change over a few frames, that means a circular shape with the same radius ... so you can handle each such frame as an ax maximum.
Easy as pie, resulting in a 3D point cloud, which you can sort by platform angle to ease the conversion to a mesh ... That angle can also be used as a texture coordinate ...
But do not forget that you will lose some concave details that are hidden in the silhouette !!!
If this approach is not enough you can use this same setup for stereoscopic 3D reconstruction. Because each rotation behaves as new (known) camera position.
You can't, if all you have is 2D images from that single camera location.
In theory you could use heuristics to infer a Z stacking. But mathematically your problem is underdefined and there are literally infinitely many different Z coordinates that would satisfy your constraints. You have to supply some extra information. For example, you could move your camera around over several frames (Google "structure from motion"), or you could use multiple cameras, or use a camera that has a depth sensor and gives you complete XYZ tuples (Kinect or similar).
Update due to comment:
For every pixel in a 2D image there is an infinite number of points that project to it; the technical term for this is a ray. If you have two 2D images of roughly the same volume of space, each image's set of rays (one per pixel) intersects the set of rays of the other image. Which is to say, if you determine the ray for a pixel in image #1, it maps to a line of pixels covered by that ray in image #2. Selecting a particular pixel along that line in image #2 gives you the XYZ tuple for that point.
Since you're rotating the object by a certain angle θ around a certain axis a between images, you actually have a lot of images to work with. All you have to do is derive the camera location by an additional transformation: inverse(translate(-a)·rotate(θ)·translate(a)).
Then do the following: select an image to start with and determine the ray corresponding to the pixel you're interested in. For that, simply assume two Z values for the pixel; 0 and 1 work just fine. Transform them back into the space of your object, then project them into the view space of the next camera you chose to use; the result will be two points in the image plane (possibly outside the limits of the actual image, but that's not a problem). These two points define a line within that second image. Find the pixel along that line that matches the pixel you selected in the first image and project it back into space, as done with the first image. Due to numerical round-off errors you're not going to get a perfect intersection of the rays in 3D space, so find the point where the rays are closest to each other (this involves solving a quadratic polynomial, which is trivial).
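For that last step, here is a hedged C++ sketch of the standard closest-point-between-two-rays computation (ray = origin + t·direction); the struct and function names are just illustrative:

#include <cmath>

struct Vec3 {
    double x, y, z;
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
    Vec3 operator*(double k)      const { return {x * k, y * k, z * k}; }
};
static double dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Rays: p1 + t*d1 and p2 + s*d2. Returns the midpoint of the shortest segment between
// them, i.e. the "intersection" in the presence of round-off errors.
Vec3 closestPointBetweenRays(Vec3 p1, Vec3 d1, Vec3 p2, Vec3 d2)
{
    const Vec3 w0 = p1 - p2;
    const double a = dot(d1, d1), b = dot(d1, d2), c = dot(d2, d2);
    const double d = dot(d1, w0), e = dot(d2, w0);
    const double denom = a * c - b * b;          // ~0 if the rays are (nearly) parallel
    if (std::fabs(denom) < 1e-12) return p1;     // degenerate case: just return an origin
    const double t = (b * e - c * d) / denom;
    const double s = (a * e - b * d) / denom;
    const Vec3 q1 = p1 + d1 * t;                 // closest point on ray 1
    const Vec3 q2 = p2 + d2 * s;                 // closest point on ray 2
    return (q1 + q2) * 0.5;
}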
To select which pixel you want to match between images you can use a feature motion tracking algorithm, as used in video compression or similar. The basic idea is that for every pixel a correlation of its surroundings is computed with the same region in the previous image. Where the correlation peaks is where the pixel most likely moved from.
With this pixel tracking in place you can then derive the structure of the object. This is essentially what structure from motion does.
Let's see an image first:
The model in the image is created by texture mapping. I want to click the mouse on the screen and then place a fixed point on the surface of the model. Moreover, as the model rotates, the fixed point should stay on the surface of the model.
My question is:
How can I place the fixed point on the surface of the model?
How can I get the coordinate (x, y, z) of the fixed point?
My thought is as follows:
use the gluUnProject function to get two points when the mouse is clicked on the screen. One point is on the near clip plane and the other is on the far one.
connect the two points to form a line.
iterate over points on the line from step 2 and use glReadPixels to get the pixel value at each of them. If the values jump from zero to nonzero or from nonzero to zero (the pixel value of the background is zero), the surface points are found.
This is my thought, but it seems that it does not work!!! Can anyone give me some advice? Thank you!
The model in the image is created by texture mapping.
No, it's not. First and foremost, there is no model at all. What you have there is a 3D dataset of voxels, and a volume rasterizer that "shoots" rays through the dataset, integrates along them and produces a color and opacity value for each ray.
This process is not(!!!) texture mapping. Texture mapping is when you draw a "solid" primitive and, for each fragment (a fragment is what eventually becomes a pixel), determine a single location in the texture data set and sample it. But a volume raycaster like the one you have there performs a whole ray integration, effectively sampling many voxels from the whole dataset into a single pixel. That's a completely different way of creating a color-opacity value.
My question is:
How can I place the fixed point on the surface of the model?
You can't, because the dataset you have there does not have a "fixed" surface point. You have to define some segmentation operation that decides which position along the ray counts as "this is the surface". The simplest method would be a threshold cutoff function.
How can I get the coordinate (x, y, z) of the fixed point?
Your best bet would be modifying the volume raycasting code, changing it from an integrator into a segmentizer. Assume that you want to use the threshold method.
Your typical volume raycaster works like this (usually implemented in a shader):
vec4 output;
for( vec3 pos = start
   ; length(pos - start) <= length(end - start)
   ; pos += voxel_grid_increment )
{
    vec4 t = texture3D(voxeldata, pos); /* sample the voxel at the current ray position */
    /* integrate t into output (the accumulated color + opacity) */
}
The integration step merges the incoming color and opacity of the texture voxel t into the output color+opacity. There are several methods to do this.
You'd change this into a shader that simply stops that loop at a given cutoff threshold and emits the position of that voxel:
vec3 output;
for( vec3 pos = start
   ; length(pos - start) <= length(end - start)
   ; pos += voxel_grid_increment )
{
    float t = texture3D(voxeldata, pos).r; /* scalar voxel value */
    if( t > threshold ){
        output = pos;   /* first position along the ray that exceeds the threshold */
        break;
    }
}
The result would be a picture encoding the determined voxel position in its pixels' RGB values. Use a texture format with 16 bits per channel (or single- or half-precision float) and you've got enough resolution to address within the limits of what typical GPUs can address in a 3D texture.
You'll want to do this off-screen using an FBO.
Another viable approach is taking the regular volume raycaster and, at the threshold position, modifying the depth value output for that particular fragment. The drawback of this method is that modifying the depth output trashes performance, so you won't want to do this if framerate matters. The benefit is that you could then in fact use glReadPixels on the depth buffer and gluUnProject the depth value at the position of your mouse pointer.
My thought is as follows:
use the gluUnProject function to get two points when the mouse is clicked on the screen. One point is on the near clip plane and the other is on the far one. Connect the two points to form a line.
iterate over points on the line from step 2 and use glReadPixels to get the pixel value at each of them. If the values jump from zero to nonzero or from nonzero to zero (the pixel value of the background is zero), the surface points are found.
That's not going to work, for the simple reason that glReadPixels sees exactly what you see. You cannot "select" the depth at which glReadPixels reads the pixels, because there's no depth left in the picture. glReadPixels just sees what you see: a flat image as it's shown in the window. You'll have to iterate over the voxel data, but you can't do this post-hoc. You'll have to implement or modify a volume raycaster to extract the information you need.
I am not going to write a full implementation of what you need here. Also, you can just search the web and find quite a lot of info on this subject. What you are looking for is called "decals". NVIDIA also presented a technique called "texture bombing". In a nutshell, you draw a planar (or enclosing-volume) piece of geometry and project the decal texture onto it. The actual process is a little bit more complex, as you can see from the examples.
I am working on a project for my thesis and I am building my own path tracer. Afterwards, I have to modify it so that it implements the following paper:
https://mediatech.aalto.fi/publications/graphics/GPT/kettunen2015siggraph_paper.pdf
Of course I DO NOT want you to read the paper, but I link it anyway for those who are curious. In brief, instead of rendering an image using only the normal path tracing procedure, I have to calculate the gradients for each pixel. This means that where before we were shooting rays only through each pixel, we now also shoot rays for the neighbouring pixels, 4 in total: left, right, top, bottom. In other words, I shoot one ray through a pixel and calculate its final colour as in normal path tracing, but in addition I shoot rays for its neighbouring pixels, calculate the same kind of final colour for those and, in order to obtain the gradients, subtract the main pixel's final colour from each of them. It means that for each pixel I will have 5 values in total:
colour of the pixel
gradient with right pixel = colour of the right pixel - colour of the pixel
gradient with left pixel = colour of the left pixel - colour of the pixel
gradient with top pixel = colour of the top pixel - colour of the pixel
gradient with bottom pixel = colour of the bottom pixel - colour of the pixel
The problem is that I don't know how to build the final image by both using the main colour and the gradients. What the paper says is that I have to use the screened Poisson reconstruction.
"Screened Poisson reconstruction combines the image and its
gradients using a parameter α that specifies the relative weights of
the sampled image and the gradients".
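As far as I understand that sentence, the reconstruction amounts to a least-squares problem of roughly this form (this is my own reading of the usual screened Poisson formulation, so please correct me if it's wrong):

    \hat{I} = \arg\min_I \; \alpha^2 \,\lVert I - I_{\text{base}} \rVert^2 \;+\; \lVert \nabla I - g \rVert^2

where I_base is the image sampled by the normal path tracer, \nabla I are the finite-difference gradients of the unknown image I, and g are the sampled gradients from the list above. If I've done the calculus right, setting the derivative to zero gives the linear system (\alpha^2 - \Delta)\, I = \alpha^2 I_{\text{base}} - \operatorname{div} g, which is why it is called a "screened" Poisson equation.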
Everywhere I search for this Poisson reconstruction I see, of course, a lot of math, but it is hard to apply it to my project. Any idea? Thanks in advance!
Does anyone know how to project a set of 3D points onto a virtual image plane in OpenCV C++?
Thank you.
First you need to have your transformation matrix defined (rotation, translation, etc.) to map the 3D space to the 2D virtual image plane; then just multiply your 3D point coordinates (x, y, z) by that matrix to get the 2D coordinates in the image.
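For example, if you split that transformation into a rotation/translation (extrinsics) and a camera matrix (intrinsics), OpenCV's cv::projectPoints performs the multiplication for you. A minimal sketch with made-up camera parameters:

#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

int main()
{
    // Some 3D points in world coordinates.
    std::vector<cv::Point3f> objectPoints = { {0.f, 0.f, 5.f}, {1.f, 0.f, 5.f}, {0.f, 1.f, 6.f} };

    // Extrinsics of the virtual camera: rotation (Rodrigues vector) and translation.
    cv::Mat rvec = cv::Mat::zeros(3, 1, CV_64F);
    cv::Mat tvec = cv::Mat::zeros(3, 1, CV_64F);

    // Made-up intrinsics (fx, fy, cx, cy) and no lens distortion.
    cv::Mat K = (cv::Mat_<double>(3, 3) << 525, 0, 320,
                                           0, 525, 240,
                                           0,   0,   1);
    cv::Mat dist = cv::Mat::zeros(5, 1, CV_64F);

    std::vector<cv::Point2f> imagePoints;
    cv::projectPoints(objectPoints, rvec, tvec, K, dist, imagePoints);

    // imagePoints now holds the 2D pixel coordinates on the virtual image plane.
    return 0;
}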
Registration (OpenNI 2) or the alternative viewpoint capability (OpenNI 1.5) does indeed help to align depth with RGB using a single line of code. The price you pay is that you cannot really restore the exact X, Y point locations in 3D space, since the rows and columns are moved during alignment.
Sometimes you need not only Z but also X, Y, and want them to be exact, plus you want the alignment of depth and RGB. Then you have to align RGB to depth. Note that this alignment is not supported by Kinect/OpenNI. The price you pay for this is that there are no RGB values at the locations where depth is undefined.
If one knows the extrinsic parameters, that is, the rotation and translation of the depth camera relative to the color one, then alignment is just a matter of rendering an alternative viewpoint: restore 3D from depth, and then look at your point cloud from the point of view of the color camera, i.e. apply the inverse rotation and translation. For example, moving the camera to the right is like moving the world (points) to the left. Reproject 3D into 2D and interpolate if needed. This is really just the inverse of 3D reconstruction; below, Cx is close to w/2 and Cy to h/2:
col = focal*X/Z+Cx
row = -focal*Y/Z+Cy // this is because row in the image increases downward
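Below is a minimal C++ sketch of that "inverse reconstruction + reprojection" procedure under the same assumptions (single focal length, no lens distortion); the R, t naming and the depth-to-color direction are illustrative, so adapt them to your calibration:

#include <cmath>

struct Vec3 { double x, y, z; };

// Back-project a depth pixel into 3D (inverse of the formulas above).
Vec3 depthPixelTo3D(double col, double row, double Z,
                    double focal, double Cx, double Cy)
{
    const double X =  (col - Cx) * Z / focal;
    const double Y = -(row - Cy) * Z / focal;  // minus: image rows increase downward
    return { X, Y, Z };
}

// Apply the extrinsics (rotation R as a row-major 3x3, translation t) to move the point
// into the color camera's frame, then project it with the color camera's intrinsics.
void projectIntoColorCamera(const Vec3& p, const double R[9], const double t[3],
                            double focal, double Cx, double Cy,
                            double& col, double& row)
{
    // q = R * p + t  (R, t describe depth->color here; invert them if yours go the other way)
    const Vec3 q = { R[0]*p.x + R[1]*p.y + R[2]*p.z + t[0],
                     R[3]*p.x + R[4]*p.y + R[5]*p.z + t[1],
                     R[6]*p.x + R[7]*p.y + R[8]*p.z + t[2] };
    col =  focal * q.x / q.z + Cx;
    row = -focal * q.y / q.z + Cy;
}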
A proper but also more expensive way to get a nice depth map after rotating the point cloud is to trace rays from each pixel until they intersect the point cloud or come sufficiently close to one of the points. This way you will have fewer holes in your depth map due to sampling artifacts.