Creating a steepest descent algorithm to determine a 3D affine warp - c++

I want to implement 2.5D inverse compositional image alignment, and for that I need to create a steepest descent image. I followed the Code Project implementation of 2D image alignment, but I am looking for a 3D warp and therefore also need a 3D steepest descent image.
In my project I have a 3D model representation, and I create an RGB-D image from it by raycasting. Now I want to find a 3D warp that aligns this template image with a given live image, in order to estimate the camera position.
Currently I only have the gradients in the X and Y directions:
cv::Sobel(grayImg_T, Grad_TX, CV_32F, 1, 0, 3);
cv::Sobel(grayImg_T, Grad_TY, CV_32F, 0, 1, 3);
And I am estimating the steepest descent as follows:
float* p_sd_pixel = &p_sd[cols*j * 3 + i * 3];
p_sd_pixel[0] = (float) (-cols*Tx + rows*Ty);
p_sd_pixel[1] = (float) Tx;
p_sd_pixel[2] = (float) Ty;
for(int l = 0; l < 3; l++){
for(int m = 0; m < 3; m++){
float* p_h = (float*)(H.data);
p_h[3*l+m] += p_sd_pixel[l]*p_sd_pixel[m];
}
}
Both snippets come from the 2D inverse compositional image alignment code on the website I linked above. I think I also need a gradient in the Z direction, but I have no idea how to create the steepest descent image for 2.5D alignment or how to determine the affine warp. How can I tackle the math, or is there a better way to implement this?
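For reference, the accumulation loop in the question adds the outer product of each pixel's steepest-descent row to the Hessian. A minimal sketch of that step for a row of arbitrary length is shown here; the function name accumulateHessian and the flat row-major layout are assumptions for illustration, and the sketch says nothing about how the entries of a 3D steepest-descent row should be computed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds the outer product sd^T * sd of one pixel's steepest-descent row
// to the Hessian H (n x n, row-major). The caller zeroes H beforehand.
// accumulateHessian is a hypothetical helper name, not part of OpenCV.
void accumulateHessian(const std::vector<float>& sd, std::vector<float>& H)
{
    const std::size_t n = sd.size();
    assert(H.size() == n * n);  // H must match the number of warp parameters
    for (std::size_t l = 0; l < n; ++l) {
        for (std::size_t m = 0; m < n; ++m) {
            H[n * l + m] += sd[l] * sd[m];
        }
    }
}
```

With a row of length 3 this reproduces the loop above; a warp with more parameters only changes the row length and the size of H.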

Related

unrolled label of a Cap using chessboard pattern (OpenCV c++)

I'm trying to use a chessboard pattern to get the information of the cylinder map and rectify the "distortion", so that the image shows the cap surface unrolled. As a first test I ran a one-shot calibration and used cv::fisheye::undistortImage to undistort the image (two images attached).
// runCalibrationFishEye
void runCalibrationFishEye(cv::Mat& image, cv::Matx33d&, cv::Vec4d&);
cv::Mat removeFisheyeLensDist(cv::Mat&, cv::Matx33d&, cv::Vec4d&);
Note that I am not interested in calibrating the camera to get metric values. I just want to use the chessboard information to unroll the image of the cylinder surface.
The final aim is to rectify the images from 4 cameras and stitch them into one unrolled image.
Do I need to do a full calibration of the camera, or is there another way to get a remap of the cylinder surface?
I will try to implement this interesting unwarp method: https://dsp.stackexchange.com/questions/2406/how-to-flatten-the-image-of-a-label-on-a-food-jar/2409#2409
cap with chessboard
Rectification
I found a similar approach to a different problem that involves similar mathematics, and it was solved without a calibration pattern. Link here. It is an approximation, but the result is good enough.
The user Hammer gave an answer that helped me reach a solution. I changed the way he does the mapping so that it uses OpenCV remap. The formula to recalculate the coordinates is just as he gave it, with different values and some preprocessing to adjust the image (rotation, zoom, and other adjustments). Unrolled image. I am now working on reducing the distortion at the edges so that it is less pronounced, but the main question is solved.
cv::Point2f convert_pt(cv::Point2f point, int w, int h)
{
cv::Point2f pc(point.x - w / 2, point.y - h / 2);
float f = w;
float r = w;
float omega = w / 2;
float z0 = f - sqrt(r*r - omega*omega);
// Formula to remap the cylinder
float zc = (2 * z0 + sqrt(4 * z0*z0 - 4 * (pc.x*pc.x / (f*f) + 1)*(z0*z0 - r*r))) / (2 * (pc.x*pc.x / (f*f) + 1));
cv::Point2f final_point(pc.x*zc / f, pc.y*zc / f);
final_point.x += w / 2;
final_point.y += h / 2;
return final_point;
}

Perspective Transformation for bird's eye view opencv c++

I am interested in a perspective transformation to a bird's eye view. So far I have tried getPerspectiveTransform and findHomography, passing the result to warpPerspective. The results are quite close, but there is a skew at the top-left (TL) and bottom-right (BR) corners. Also, the contour areas (contourArea) do not scale equally after the transformation.
The contour is a square with multiple shapes inside.
Any suggestions on how to proceed?
Here is the code I have so far:
std::vector<Point2f> quad_pts;
std::vector<Point2f> squre_pts;
cv::approxPolyDP( Mat(validContours[largest_contour_index]), contours_poly[0], epsilon, true );
if (approx_poly.size() > 4) return false;
for (int i=0; i< 4; i++)
quad_pts.push_back(contours_poly[0][i]);
if (! orderRectPoints(quad_pts))
return false;
float widthTop = (float)distanceBetweenPoints(quad_pts[1], quad_pts[0]); // sqrt( pow(quad_pts[1].x - quad_pts[0].x, 2) + pow(quad_pts[1].y - quad_pts[0].y, 2));
float widthBottom = (float)distanceBetweenPoints(quad_pts[2], quad_pts[3]); // sqrt( pow(quad_pts[2].x - quad_pts[3].x, 2) + pow(quad_pts[2].y - quad_pts[3].y, 2));
float maxWidth = max(widthTop, widthBottom);
float heightLeft = (float)distanceBetweenPoints(quad_pts[1], quad_pts[2]); // sqrt( pow(quad_pts[1].x - quad_pts[2].x, 2) + pow(quad_pts[1].y - quad_pts[2].y, 2));
float heightRight = (float)distanceBetweenPoints(quad_pts[0], quad_pts[3]); // sqrt( pow(quad_pts[0].x - quad_pts[3].x, 2) + pow(quad_pts[0].y - quad_pts[3].y, 2));
float maxHeight = max(heightLeft, heightRight);
int mDist = (int)max(maxWidth, maxHeight);
// transform TO points
const int offset = 50;
squre_pts.push_back(Point2f(offset, offset));
squre_pts.push_back(Point2f(mDist-1, offset));
squre_pts.push_back(Point2f(mDist-1, mDist-1));
squre_pts.push_back(Point2f(offset, mDist-1));
maxWidth += offset; maxHeight += offset;
Size matSize ((int)maxWidth, (int)maxHeight);
Mat transmtx = getPerspectiveTransform(quad_pts, squre_pts);
// Mat homo = findHomography(quad_pts, squre_pts);
warpPerspective(mRgba, mRgba, transmtx, matSize);
return true;
Link to transformed image
Image pre-transformation
corner on pre-transformed image
Corners from CornerSubPix
Your original pre-transformation image is not very good: the squares have different sizes and the pattern looks wavy. The results you get are quite good given the quality of your input.
You could try to calibrate your camera (https://docs.opencv.org/2.4/doc/tutorials/calib3d/camera_calibration/camera_calibration.html) to compensate lens distortion, and your results may improve.
EDIT: To summarize the comments below: approxPolyDP may not locate the corners properly if the square has rounded corners or is blurred. You may need to improve the corner localization by other means, such as a sharper original image, different preprocessing (a median filter or threshold, as you suggest in the comments), or other algorithms for finer corner localization (such as the cornerSubPix function, or detecting the sides with a Hough transform and then calculating their intersections).
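The last suggestion (intersecting the detected sides) can be sketched as follows. This is a minimal example that assumes each side is given in the (rho, theta) normal form x*cos(theta) + y*sin(theta) = rho, which is the form cv::HoughLines reports; the type and function names (HoughLine, Point2, nearlyParallel, intersect) are hypothetical and the code does not depend on OpenCV.

```cpp
#include <cassert>
#include <cmath>

struct Point2 { double x, y; };
struct HoughLine { double rho, theta; };  // x*cos(theta) + y*sin(theta) = rho

// True when the two lines have (almost) the same direction and so
// have no well-defined intersection point.
bool nearlyParallel(const HoughLine& a, const HoughLine& b)
{
    const double det = std::cos(a.theta) * std::sin(b.theta)
                     - std::sin(a.theta) * std::cos(b.theta);
    return std::fabs(det) < 1e-9;
}

// Solves the 2x2 linear system formed by the two line equations.
Point2 intersect(const HoughLine& a, const HoughLine& b)
{
    assert(!nearlyParallel(a, b));
    const double det = std::cos(a.theta) * std::sin(b.theta)
                     - std::sin(a.theta) * std::cos(b.theta);
    const double x = (a.rho * std::sin(b.theta) - b.rho * std::sin(a.theta)) / det;
    const double y = (b.rho * std::cos(a.theta) - a.rho * std::cos(b.theta)) / det;
    return {x, y};
}
```

Intersecting each pair of adjacent sides gives four corner estimates that could be passed to getPerspectiveTransform in place of the approxPolyDP vertices.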

Find rectangular object quality with perspective

I get images from a camera (calibrated and without lens distortion) and I need to detect a rectangular object. Markers are a good example. For markers I check corner count, minimum size, board contrast and convexity. I had an idea for improving this in cases where there is a large number of false rectangles.
Here is an example image:
Normally all of these are valid, because without knowing anything about the camera we cannot determine whether perspective allows these kinds of shapes. I know the size (or at least the ratio) of the rectangle in real life. So I had the idea that I should be able to disregard many of these shapes just by reprojecting them and checking the error.
For example, if I use solvePnPRansac, it should not be able to converge if the shape is not possible, and if it doesn't converge I can disregard the shape. Sadly, none of the OpenCV solve functions let me check for error or convergence. I actually need some ratio or quality measure, because some of the rectangles may overlap. For example, my object finder identifies these rectangles:
One of the three is actually correct, or at least "the best". But I need some way to know which one it is. I cannot use things like line lengths because of the camera perspective. So I just thought I could solve and see which has the smallest error.
There is no lens distortion in the image, but even if there were, solvePnP accepts the distortion coefficients D as well.
Is this even possible or am I missing something?
I guess I could try hacking around solvePnPRansac just to return convergence, but maybe there is a simpler way?
I figured I can do something like what is done during calibration with a grid. I can calculate the reprojection error. So first I solve to get the transformation matrix. Then I transform the points in 3D using the transformation matrix and afterwards use projectPoints to project them back in 2D. Then I check distance between original 2D points and the projected 2D points. This can then be used for quality. Objects that are not possible often have 100 pixels or more reprojection error in my images, but possible objects have less than 20px. So I just did a 25 pixel cutoff and it seems to work fine.
Note that more transformations are possible than I thought. In my original image maybe only two are impossible with my current camera, but the check still rejected a lot of fakes.
If nobody else has some ideas I will accept this as answer.
Here is some code for the method I use:
//This is the object in 3D
double width = 50.0; //Object is 50mm wide
double height = 30.0; //Object is 30mm tall
cv::Mat object_points(4,3,CV_64FC1);
object_points.at<double>(0,0)=0;
object_points.at<double>(0,1)=0;
object_points.at<double>(0,2)=0;
object_points.at<double>(1,0)=width;
object_points.at<double>(1,1)=0;
object_points.at<double>(1,2)=0;
object_points.at<double>(2,0)=width;
object_points.at<double>(2,1)=height;
object_points.at<double>(2,2)=0;
object_points.at<double>(3,0)=0;
object_points.at<double>(3,1)=height;
object_points.at<double>(3,2)=0;
//Check all rectangles for error
cv::Mat image_points(4,2,CV_64FC1);
for (size_t i = 0; i < rectangles_to_test.size(); i++) {
// Get rectangle points
for (size_t c = 0; c < 4; ++c) {
image_points.at<double>(c,0) = (rectangles_to_test[i].points[c].x);
image_points.at<double>(c,1) = (rectangles_to_test[i].points[c].y);
}
// Calculate transformation matrix
cv::Mat rvec, tvec;
cv::solvePnP(object_points, image_points, M1, D1, rvec, tvec);
cv::Mat rotation;
Matrix4<double> transform;
transform.init_identity();
cv::Rodrigues(rvec, rotation);
for(size_t row = 0; row < 3; ++row) {
for(size_t col = 0; col < 3; ++col) {
transform.set(row, col, rotation.at<double>(row, col));
}
transform.set(row, 3, tvec.at<double>(row, 0));
}
// Calculate projection
std::vector<cv::Point3f> p3(4);
std::vector<cv::Point2f> p2;
Vector4<double> p = transform * Vector4<double>(0, 0, 0, 1);
p3[0] = cv::Point3f((float)p.x, (float)p.y, (float)p.z);
p = transform * Vector4<double>(width, 0, 0, 1);
p3[1] = cv::Point3f((float)p.x, (float)p.y, (float)p.z);
p = transform * Vector4<double>(width, height, 0, 1);
p3[2] = cv::Point3f((float)p.x, (float)p.y, (float)p.z);
p = transform * Vector4<double>(0, height, 0, 1);
p3[3] = cv::Point3f((float)p.x, (float)p.y, (float)p.z);
cv::projectPoints(p3, cv::Mat::zeros(1, 3, CV_64FC1), cv::Mat::zeros(1, 3, CV_64FC1), M1, D1, p2);
// Calculate reprojection error
rectangles_to_test[i].reprojection_error = 0.0;
for (size_t c = 0; c < 4; ++c) {
double dx = p2[c].x - rectangles_to_test[i].points[c].x;
double dy = p2[c].y - rectangles_to_test[i].points[c].y;
rectangles_to_test[i].reprojection_error += std::sqrt(dx*dx + dy*dy);
}
if (rectangles_to_test[i].reprojection_error > reprojection_error_threshold) {
//rectangle is no good
}
}
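The same quality measure can be sketched without the Matrix4/Vector4 helper classes. The following is a simplified illustration, not the code used above: it assumes a distortion-free pinhole camera, a rotation matrix already obtained from rvec (for example via cv::Rodrigues), and hypothetical names (Pinhole, projectPoint, reprojectionError). As in the code above, the error is the sum of the per-corner pixel distances.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { double x, y; };
struct Vec3 { double x, y, z; };
struct Pinhole { double fx, fy, cx, cy; };  // hypothetical intrinsics holder, no distortion

// Transforms an object point into the camera frame (X_cam = R * X + t)
// and projects it. R is a 3x3 rotation matrix in row-major order.
Vec2 projectPoint(const Pinhole& k, const std::array<double, 9>& R,
                  const Vec3& t, const Vec3& X)
{
    const double xc = R[0] * X.x + R[1] * X.y + R[2] * X.z + t.x;
    const double yc = R[3] * X.x + R[4] * X.y + R[5] * X.z + t.y;
    const double zc = R[6] * X.x + R[7] * X.y + R[8] * X.z + t.z;
    assert(zc > 0.0);  // the point must lie in front of the camera
    return {k.fx * xc / zc + k.cx, k.fy * yc / zc + k.cy};
}

// Sum of pixel distances between the detected corners and the
// reprojected object corners.
double reprojectionError(const Pinhole& k, const std::array<double, 9>& R,
                         const Vec3& t, const std::vector<Vec3>& object,
                         const std::vector<Vec2>& image)
{
    assert(object.size() == image.size());
    double error = 0.0;
    for (std::size_t i = 0; i < object.size(); ++i) {
        const Vec2 p = projectPoint(k, R, t, object[i]);
        error += std::hypot(p.x - image[i].x, p.y - image[i].y);
    }
    return error;
}
```

When staying inside OpenCV, cv::projectPoints also accepts rvec and tvec directly together with the object-space points, so the manual transformation step is optional.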

Kinect for Windows v2 depth to color image misalignment

Currently I am developing a tool for the Kinect for Windows v2 (similar to the one in the Xbox One). I followed some examples and have a working program that shows the camera image, the depth image, and an image that maps the depth to the RGB image using OpenCV. However, my hand is duplicated in the mapped image, and I think this is due to something wrong in the coordinate mapper part.
Here is an example of it:
And here is the code snippet that creates the image (the RGB-D image in the example):
void KinectViewer::create_rgbd(cv::Mat& depth_im, cv::Mat& rgb_im, cv::Mat& rgbd_im){
HRESULT hr = m_pCoordinateMapper->MapDepthFrameToColorSpace(cDepthWidth * cDepthHeight, (UINT16*)depth_im.data, cDepthWidth * cDepthHeight, m_pColorCoordinates);
rgbd_im = cv::Mat::zeros(depth_im.rows, depth_im.cols, CV_8UC3);
double minVal, maxVal;
cv::minMaxLoc(depth_im, &minVal, &maxVal);
for (int i=0; i < cDepthHeight; i++){
for (int j=0; j < cDepthWidth; j++){
if (depth_im.at<UINT16>(i, j) > 0 && depth_im.at<UINT16>(i, j) < maxVal * (max_z / 100) && depth_im.at<UINT16>(i, j) > maxVal * min_z /100){
double a = i * cDepthWidth + j;
ColorSpacePoint colorPoint = m_pColorCoordinates[i*cDepthWidth+j];
int colorX = (int)(floor(colorPoint.X + 0.5));
int colorY = (int)(floor(colorPoint.Y + 0.5));
if ((colorX >= 0) && (colorX < cColorWidth) && (colorY >= 0) && (colorY < cColorHeight))
{
rgbd_im.at<cv::Vec3b>(i, j) = rgb_im.at<cv::Vec3b>(colorY, colorX);
}
}
}
}
}
Does anyone have a clue of how to solve this? How to prevent this duplication?
Thanks in advance
UPDATE:
If I do a simple depth image thresholding I obtain the following image:
This is more or less what I expected to happen, without a duplicate hand in the background. Is there a way to prevent this duplicate hand in the background?
I suggest you use the BodyIndexFrame to identify whether a specific value belongs to a player or not. This way, you can reject any RGB pixel that does not belong to a player and keep the rest of them. I do not think that CoordinateMapper is lying.
A few notes:
Include the BodyIndexFrame source in your frame reader
Use MapColorFrameToDepthSpace instead of MapDepthFrameToColorSpace; this way, you'll get the HD image for the foreground
Find the corresponding DepthSpacePoint and depthX, depthY, instead of ColorSpacePoint and colorX, colorY
Here is my approach when a frame arrives (it's in C#):
depthFrame.CopyFrameDataToArray(_depthData);
colorFrame.CopyConvertedFrameDataToArray(_colorData, ColorImageFormat.Bgra);
bodyIndexFrame.CopyFrameDataToArray(_bodyData);
_coordinateMapper.MapColorFrameToDepthSpace(_depthData, _depthPoints);
Array.Clear(_displayPixels, 0, _displayPixels.Length);
for (int colorIndex = 0; colorIndex < _depthPoints.Length; ++colorIndex)
{
DepthSpacePoint depthPoint = _depthPoints[colorIndex];
if (!float.IsNegativeInfinity(depthPoint.X) && !float.IsNegativeInfinity(depthPoint.Y))
{
int depthX = (int)(depthPoint.X + 0.5f);
int depthY = (int)(depthPoint.Y + 0.5f);
if ((depthX >= 0) && (depthX < _depthWidth) && (depthY >= 0) && (depthY < _depthHeight))
{
int depthIndex = (depthY * _depthWidth) + depthX;
byte player = _bodyData[depthIndex];
// Identify whether the point belongs to a player
if (player != 0xff)
{
int sourceIndex = colorIndex * BYTES_PER_PIXEL;
_displayPixels[sourceIndex] = _colorData[sourceIndex++]; // B
_displayPixels[sourceIndex] = _colorData[sourceIndex++]; // G
_displayPixels[sourceIndex] = _colorData[sourceIndex++]; // R
_displayPixels[sourceIndex] = 0xff; // A
}
}
}
}
Here is the initialization of the arrays:
BYTES_PER_PIXEL = (PixelFormats.Bgr32.BitsPerPixel + 7) / 8;
_colorWidth = colorFrame.FrameDescription.Width;
_colorHeight = colorFrame.FrameDescription.Height;
_depthWidth = depthFrame.FrameDescription.Width;
_depthHeight = depthFrame.FrameDescription.Height;
_bodyIndexWidth = bodyIndexFrame.FrameDescription.Width;
_bodyIndexHeight = bodyIndexFrame.FrameDescription.Height;
_depthData = new ushort[_depthWidth * _depthHeight];
_bodyData = new byte[_depthWidth * _depthHeight];
_colorData = new byte[_colorWidth * _colorHeight * BYTES_PER_PIXEL];
_displayPixels = new byte[_colorWidth * _colorHeight * BYTES_PER_PIXEL];
_depthPoints = new DepthSpacePoint[_colorWidth * _colorHeight];
Notice that the _depthPoints array has a 1920x1080 size.
Once again, the most important thing is to use the BodyIndexFrame source.
Finally I have some time to write the long-awaited answer.
Let's start with some theory to understand what is really happening, and then look at a possible answer.
We should start with how to get from a 3D point cloud, which has the depth camera as its coordinate system origin, to an image in the image plane of the RGB camera. For that, the pinhole camera model is enough:
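In its standard form the model can be written as below. The symbol names (s for the projective scale, f_x, f_y, c_x, c_y for the intrinsics, r_ij and t_i for the extrinsics) are assumptions chosen for this sketch.

```latex
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & t_1 \\
r_{21} & r_{22} & r_{23} & t_2 \\
r_{31} & r_{32} & r_{33} & t_3
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
```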
Here, u and v are the coordinates in the image plane of the RGB camera. The first matrix on the right-hand side of the equation is the camera matrix, also known as the intrinsics of the RGB camera. The following matrix is the rotation and translation of the extrinsics, in other words the transformation needed to go from the depth camera coordinate system to the RGB camera coordinate system. The last part is the 3D point.
Basically, the Kinect SDK does something like this. So what could go wrong that makes the hand get duplicated? Well, more than one point can project to the same pixel.
To put it in other words, in the context of the problem in the question:
The depth image is a representation of an ordered point cloud, and I am querying the u, v values of each of its pixels, which can easily be converted to 3D points. The SDK gives you the projection, but several points can map to the same pixel (this happens quite easily when two neighboring points are far apart along the z axis).
Now the big question: how can you avoid this? I am not sure it can be done with the Kinect SDK, since you do not know the Z value of the points AFTER the extrinsics are applied, so it is not possible to use a technique like Z-buffering. However, you may assume the Z values will be quite similar and use those from the original point cloud (at your own risk).
If you were doing it manually rather than with the SDK, you could apply the extrinsics to the points and then project them into the image plane, recording in another matrix which point is mapped to which pixel. If a point is already mapped to that pixel, compare the z values and always keep the point closest to the camera. Then you will have a valid mapping without any problems. This is a fairly naive approach and better ones probably exist, but the problem is now clear :)
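A minimal sketch of this manual Z-buffering is shown here. It assumes the points have already been transformed into the RGB camera frame (extrinsics applied) and that the RGB camera follows a distortion-free pinhole model; Point3, Intrinsics and nearestPointPerPixel are hypothetical names, and nothing here uses the Kinect SDK.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Point3 { double x, y, z; };          // already in the RGB camera frame
struct Intrinsics { double fx, fy, cx, cy; };  // assumed pinhole, no distortion

// For every color pixel, returns the index of the closest point that
// projects onto it, or -1 when no point lands on that pixel.
std::vector<int> nearestPointPerPixel(const std::vector<Point3>& points,
                                      const Intrinsics& k, int width, int height)
{
    assert(width > 0 && height > 0);
    const std::size_t pixelCount =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    std::vector<int> owner(pixelCount, -1);
    std::vector<double> zBuffer(pixelCount, std::numeric_limits<double>::infinity());

    for (std::size_t i = 0; i < points.size(); ++i) {
        const Point3& p = points[i];
        if (p.z <= 0.0) continue;  // behind the camera or invalid depth
        const long u = std::lround(k.fx * p.x / p.z + k.cx);
        const long v = std::lround(k.fy * p.y / p.z + k.cy);
        if (u < 0 || u >= width || v < 0 || v >= height) continue;
        const std::size_t idx = static_cast<std::size_t>(v) * width + u;
        if (p.z < zBuffer[idx]) {  // keep only the point nearest to the camera
            zBuffer[idx] = p.z;
            owner[idx] = static_cast<int>(i);
        }
    }
    return owner;
}
```

A depth pixel would then take its color only if it owns the color pixel it projects to; the other points that hit the same pixel are occluded and can be left black.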
I hope it is clear enough.
P.S.:
I do not have a Kinect 2 at the moment, so I can't check whether there has been an update regarding this issue or whether the same thing still happens. I used the first released version (not a pre-release) of the SDK, so a lot may have changed. If someone knows whether this was solved, just leave a comment :)

Finding extrinsics between cameras

I'm in a situation where I need to find the relative camera poses between two or more cameras based on image correspondences (so the cameras are not at the same point). To solve this I tried the approach described here (code below).
cv::Mat calibration_1 = ...;
cv::Mat calibration_2 = ...;
cv::Mat calibration_target = calibration_1;
calibration_target.at<float>(0, 2) = 0.5f * frame_width; // principal point
calibration_target.at<float>(1, 2) = 0.5f * frame_height; // principal point
auto fundamental_matrix = cv::findFundamentalMat(left_matches, right_matches, CV_RANSAC);
fundamental_matrix.convertTo(fundamental_matrix, CV_32F);
cv::Mat essential_matrix = calibration_2.t() * fundamental_matrix * calibration_1;
cv::SVD svd(essential_matrix);
cv::Matx33f w(0,-1,0,
1,0,0,
0,0,1);
cv::Matx33f w_inv(0,1,0,
-1,0,0,
0,0,1);
cv::Mat rotation_between_cameras = svd.u * cv::Mat(w) * svd.vt; //HZ 9.19
But in most of my cases I get extremely weird results. So my next thought was to use a full-fledged bundle adjuster (which should do what I am looking for?!). Currently my only big dependency is OpenCV, and it only has an undocumented bundle adjustment implementation.
So the questions are:
Is there a bundle adjuster which has no dependencies and uses a license which allows commercial use?
Is there another easy way to find the extrinsics?
Are objects with very different distances to the cameras a problem? (heavy parallax)
Thanks in advance
I'm also working on the same problem and facing similar issues.
Here are some suggestions:
Modify Essential Matrix Before Decomposition:
Modify the essential matrix before decomposition: compute [U W Vt] = SVD(E), then form the new E' = U * diag(s, s, 0) * Vt, where s = (W(0,0) + W(1,1)) / 2
2-Stage Fundamental Matrix Estimation:
Recalculate the fundamental matrix with the RANSAC inliers
These steps should make the rotation estimation less susceptible to noise.
You have to generate 4 different solutions and select the one with the largest number of points having positive Z coordinates. The solutions are generated by inverting the sign of the fundamental matrix and substituting w with w_inv, which you did not do even though you calculated w_inv. Are you reusing somebody else's code?
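A minimal sketch of enumerating the four candidates is shown here, assuming U and Vt come from the SVD of the essential matrix and are stored as row-major 3x3 arrays. It follows the textbook construction (Hartley and Zisserman, Result 9.19), which pairs the two rotations U*W*Vt and U*W^T*Vt with the translation direction +u3 or -u3, where u3 is the last column of U. The names Mat3, Pose and candidatePoses are hypothetical, and the positive-Z test itself (triangulating the matches for each candidate) is not shown.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Mat3 = std::array<double, 9>;  // row-major 3x3
using Vec3 = std::array<double, 3>;

struct Pose { Mat3 R; Vec3 t; };

Mat3 mul(const Mat3& a, const Mat3& b)
{
    Mat3 c{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                c[3 * i + j] += a[3 * i + k] * b[3 * k + j];
    return c;
}

double det(const Mat3& m)
{
    return m[0] * (m[4] * m[8] - m[5] * m[7])
         - m[1] * (m[3] * m[8] - m[5] * m[6])
         + m[2] * (m[3] * m[7] - m[4] * m[6]);
}

// Returns the four (R, t) candidates; t is only known up to scale.
std::vector<Pose> candidatePoses(const Mat3& U, const Mat3& Vt)
{
    const Mat3 W  = {0.0, -1.0, 0.0,  1.0, 0.0, 0.0,  0.0, 0.0, 1.0};
    const Mat3 Wt = {0.0,  1.0, 0.0, -1.0, 0.0, 0.0,  0.0, 0.0, 1.0};
    Mat3 Ra = mul(mul(U, W), Vt);
    Mat3 Rb = mul(mul(U, Wt), Vt);
    // A proper rotation has determinant +1; flip the sign otherwise.
    if (det(Ra) < 0.0) for (double& v : Ra) v = -v;
    if (det(Rb) < 0.0) for (double& v : Rb) v = -v;
    assert(det(Ra) > 0.0 && det(Rb) > 0.0);
    const Vec3 t  = {U[2], U[5], U[8]};  // last column of U
    const Vec3 tn = {-t[0], -t[1], -t[2]};
    return {Pose{Ra, t}, Pose{Ra, tn}, Pose{Rb, t}, Pose{Rb, tn}};
}
```

The candidate to keep is the one for which the most triangulated points end up in front of both cameras.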