Related
I'm struggling to get the OpenCV triangulatePoints function to work. I'm using the function with point matches generated from optical flow. I'm using two consecutive frames/positions from a single moving camera.
Currently these are my steps:
The intrinsics are given and look like one would expect:
2.6551e+003 0. 1.0379e+003
0. 2.6608e+003 5.5033e+002
0. 0. 1.
I then compute the two extrinsic matrices ([R|t]) based on (highly accurate) GPS and camera position relative to the GPS. Note that the GPS data uses a cartesian coordinate system around The Netherlands which uses meters as units (so no weird lat/lng math is required). This yields the following matrices:
Next, I simply remove the bottom row of these matrices and multiply them with the intrinsic matrices to get the projection matrices:
projectionMat = intrinsics * extrinsics;
This results in:
My image points simply consist of all the pixel coordinates for the first set,
(0, 0)...(1080, 1920)
and all pixel coordinates + their computed optical flow for the second set.
(0 + flowY0, 0 + flowX0)...(1080 + flowYN, 1920 + flowXN)
To compute the 3D points, I feed the image points (in the format OpenCV expects) and projection matrices to the triangulatePoints function:
cv::triangulatePoints(projectionMat1, projectionMat2, imagePoints1, imagePoints2, outputPoints);
Finally, I convert the outputPoints from homogeneous coordinates by dividing them by their fourth coordinate (w) and removing this coordinate.
What I end up with is some weird cone-shaped point cloud:
Now I've tried every combination of tweaks I could think of (inverting matrices, changing X/Y/Z order, inverting X/Y/Z axes, changing multiplication order...), but everything yields similarly strange results. The one thing that did give me better results was simply multiplying the optical flow values by 0.01. This results in the following point cloud:
This is still not perfect (areas far away from the camera look really curved), but much more like I would expect.
I'm wondering if anybody can spot something I'm doing wrong. Do my matrices look ok? Is the output I'm getting related to a certain problem?
What I'm quite certain of, is that it's not related to the GPS or optical flow, since I've tested multiple frames and they all yield the same type of output. I really think it has to do with the triangulation itself.
triangulatePoints() is for stereo camera , not for monocular camera!
In the opencv doc, I read the following express :
The function reconstructs 3-dimensional points (in homogeneous coordinates) by using their observations with a stereo camera. Projections matrices can be obtained from stereoRectify()
In my case I had to fix the convention used in the rotation matrices in order to calculate the projection matrix, please make sure you are using this convention for both cameras :
rotationMatrix0 = rotation_by_Y_Matrix_Camera_Calibration(camera_roll)*rotation_by_X_Matrix_Camera_Calibration(camera_pitch) *rotation_by_Z_Matrix_Camera_Calibration(camera_yaw);
Mat3x3 Algebra::rotation_by_Y_Matrix_Camera_Calibration(double yaw)
{
Mat3x3 matrix;
matrix[2][2] = 1.0f;
double sinA = sin(yaw), cosA = cos(yaw);
matrix[0][0] = +cosA; matrix[0][1] = -sinA;
matrix[1][0] = +sinA; matrix[1][1] = +cosA;
return matrix;
}
Mat3x3 Algebra::rotation_by_X_Matrix_Camera_Calibration(double pitch)
{
Mat3x3 matrix;
matrix[1][1] = 1.0f;
double sinA = sin(pitch), cosA = cos(pitch);
matrix[0][0] = +cosA; matrix[0][2] = +sinA;
matrix[2][0] = -sinA; matrix[2][2] = +cosA;
return matrix;
}
Mat3x3 Algebra::rotation_by_Z_Matrix_Camera_Calibration(double roll)
{
Mat3x3 matrix;
matrix[0][0] = 1.0f;
double sinA = sin(roll), cosA = cos(roll);
matrix[1][1] = +cosA; matrix[1][2] = -sinA;
matrix[2][1] = +sinA; matrix[2][2] = +cosA;
return matrix;
}
The problem is I can't fully understand the principles of convolution in frequency domain.
I have an image of size 256x256, which I want to convolve with 3x3 gaussian matrix. It's coefficients are (1/16, 1/8, 1/4):
PlainImage<float> FourierRunner::getGaussMask(int sz)
{
PlainImage<float> G(3,3);
*G.at(0, 0) = 1.0/16; *G.at(0, 1) = 1.0/8; *G.at(0, 2) = 1.0/16;
*G.at(1, 0) = 1.0/8; *G.at(1, 1) = 1.0/4; *G.at(1, 2) = 1.0/8;
*G.at(2, 0) = 1.0/16; *G.at(2, 1) = 1.0/8; *G.at(2, 2) = 1.0/16;
return G;
}
To get FFT of both image and filter kernel, I zero-pad them. sz_common stands for the extended size. Image and kernel are moved to the center of h and g ComplexImages respectively, so they are zero-padded at right, left, bottom and top.
I've read that size should be sz_common >= sz+gsz-1 because of circular convolution property: filter can change undesired image values on boundaries.
But it don't works: adequate results are only when sz_common = sz, when sz_common = sz+gsz-1 or sz_common = 2*sz, after IFFT I get 2-3 times smaller convolved image! Why?
Also I'm confused that filter matrix values should be multiplied by 256, like pixel values: other questions on SO contain Matlab code without such normalization. As in previous case, without such multiplying it works bad: I get black image. Why?
// fft_in is shifted fourier image with center in [sz/2;sz/2]
void FourierRunner::convolveImage(ComplexImage& fft_in)
{
int sz = 256; // equal to fft_in.width()
// Get original complex image (backward fft_in)
ComplexImage original_complex = fft_in;
fft2d_backward(fft_in, original_complex);
int gsz = 3;
PlainImage<float> filter = getGaussMask(gsz);
ComplexImage filter_complex = ComplexImage::fromFloat(filter);
int sz_common = pow2ceil(sz); // should be sz+gsz-1 ???
ComplexImage h = ComplexImage::zeros(sz_common,sz_common);
ComplexImage g = ComplexImage::zeros(sz_common,sz_common);
copyImageToCenter(h, original_complex);
copyImageToCenter(g, filter_complex);
LOOP_2D(sz_common, sz_common) g.setPoint(x, y, g.at(x, y)*256);
fft2d_forward(g, g);
fft2d_forward(h, h);
fft2d_fft_shift(g);
// CONVOLVE
LOOP_2D(sz_common,sz_common) h.setPoint(x, y, h.at(x, y)*g.at(x, y));
copyImageToCenter(fft_in, h);
fft2d_backward(fft_in, fft_in);
fft2d_fft_shift(fft_in);
// TEST DIFFERENCE BTW DOMAINS
PlainImage<float> frequency_res(sz,sz);
writeComplexToPlainImage(fft_in, frequency_res);
fft2d_forward(fft_in, fft_in);
}
I tried to zero-padd image at right and bottom, such that smaller image is copied to the start of bigger, but it also doesn't work.
I wrote convolution in spatial domain to compare results, frequency blur results are almost the same as in spatial domain (avg. error btw pixels is 5), only when sz_common = sz.
So, could you explain phenomena of zero-padding and normalization for this case? Thanks in advance.
Convolution in the Spatial Domain is equivalent of Multiplication in the Fourier Domain.
This is the truth for Continuous functions which are defined everywhere.
Yet in practice, we have discrete signals and convolution kernels.
Which require more gentle caring.
If you have an image of the size M x N and a Kernel of the size of MM x NN if you apply DFT (FFT is an efficient way to calculate the DFT) on them you'll get functions of the size of M x N and MM x NN respectively.
Moreover, the theorem above, about the multiplication equivalence requires to multiply the same frequencies one with each other.
Since practically the Kernel is much smaller than the image, usually it is zero padded to the size of the image.
Now, by applying the DFT you'll get to matrices of the same M x N size and will be able to multiply them.
Yet, this will be equivalent of the Circular Convolution between the Image and Kernel.
To apply the linear convolution you should make them both in the size of (M + MM - 1) x (N + NN - 1).
Usually this would be by applying "Replicate" boundary condition on the image and zero pad the Kernel.
Enjoy...
P.S.
Could you support a new Community Proposal for SE at - http://area51.stackexchange.com/proposals/86832/.
We need more people to follow, up vote questions with less than 10 up votes and more question to be asked.
Thank You.
I am trying to calculate scale, rotation and translation between two consecutive frames of a video. So basically I matched keypoints and then used opencv function findHomography() to calculate the homography matrix.
homography = findHomography(feature1 , feature2 , CV_RANSAC); //feature1 and feature2 are matched keypoints
My question is: How can I use this matrix to calculate scale, rotation and translation?.
Can anyone provide me the code or explanation as to how to do it?
if you can use opencv 3.0, this decomposition method is available
http://docs.opencv.org/3.0-beta/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#decomposehomographymat
The right answer is to use homography as it is defined dst = H ⋅ src and explore what it does to small segments around a particular point.
Translation
Given a single point, for translation do
T = dst - (H ⋅ src)
Rotation
Given two points p1 and p2
p1 = H ⋅ p1
p2 = H ⋅ p2
Now just calculate the angle between vectors p1 p2 and p1' p2'.
Scale
You can use the same trick but now just compare the lengths: |p1 p2| and |p1' p2'|.
To be fair, use another segment orthogonal to the first and average the result. You will see that there is no constant scale factor or translation one. They will depend on the src location.
Given Homography matrix H:
|H_00, H_01, H_02|
H = |H_10, H_11, H_12|
|H_20, H_21, H_22|
Assumptions:
H_20 = H_21 = 0 and normalized to H_22 = 1 to obtain 8 DOF.
The translation along x and y axes are directly calculated from H:
tx = H_02
ty = H_12
The 2x2 sub matrix on the top left corner is decomposed to calculate shear, scaling and rotation. An easy and quick decomposition method is explained here.
Note: this method assumes invertible matrix.
Since i had to struggle for a couple of days to create my homography transformation function I'm going to put it here for the benefit of everyone.
Here you can see the main loop where every input position is multiplied by the homography matrix h. Then the result is used to copy the pixel from the original position to the destination position.
for (tempIn[0] = 0; tempIn[0] < stride; tempIn[0]++)
{
for (tempIn[1] = 0; tempIn[1] < rows; tempIn[1]++)
{
double w = h[6] * tempIn[0] + h[7] * tempIn[1] + 1; // very important!
//H_20 = H_21 = 0 and normalized to H_22 = 1 to obtain 8 DOF. <-- this is wrong
tempOut[0] = ((h[0] * tempIn[0]) + (h[1] * tempIn[1]) + h[2])/w;
tempOut[1] =(( h[3] * tempIn[0]) +(h[4] * tempIn[1]) + h[5])/w;
if (tempOut[1] < destSize && tempOut[0] < destSize && tempOut[0] >= 0 && tempOut[1] >= 0)
dest_[destStride * tempOut[1] + tempOut[0]] = src_[stride * tempIn[1] + tempIn[0]];
}
}
After such process an image with some kind of grid will be produced. Some kind of filter is needed to remove the grid. In my code i have used a simple linear filter.
Note: Only the central part of the original image is really required for producing a correct image. Some rows and columns can be safely discarded.
For estimating a tree-dimensional transform and rotation induced by a homography, there exist multiple approaches. One of them provides closed formulas for decomposing the homography, but they are very complex. Also, the solutions are never unique.
Luckily, OpenCV 3 already implements this decomposition (decomposeHomographyMat). Given an homography and a correctly scaled intrinsics matrix, the function provides a set of four possible rotations and translations.
The question seems to be about 2D parameters. Homography matrix captures perspective distortion. If the application does not create much perspective distortion, one can approximate a real world transformation using affine transformation matrix (that uses only scale, rotation, translation and no shearing/flipping). The following link will give an idea about decomposing an affine transformation into different parameters.
https://math.stackexchange.com/questions/612006/decomposing-an-affine-transformation
I wanted to detect ellipse in an image. Since I was learning Mathematica at that time, I asked a question here and got a satisfactory result from the answer below, which used the RANSAC algorithm to detect ellipse.
However, recently I need to port it to OpenCV, but there are some functions that only exist in Mathematica. One of the key function is the "GradientOrientationFilter" function.
Since there are five parameters for a general ellipse, I need to sample five points to determine one. Howevere, the more sampling points indicates the lower chance to have a good guess, which leads to the lower success rate in ellipse detection. Therefore, the answer from Mathematica add another condition, that is the gradient of the image must be parallel to the gradient of the ellipse equation. Anyway, we'll only need three points to determine one ellipse using least square from the Mathematica approach. The result is quite good.
However, when I try to find the image gradient using Sobel or Scharr operator in OpenCV, it is not good enough, which always leads to the bad result.
How to calculate the gradient or the tangent of an image accurately? Thanks!
Result with gradient, three points
Result without gradient, five points
----------updated----------
I did some edge detect and median blur beforehand and draw the result on the edge image. My original test image is like this:
In general, my final goal is to detect the ellipse in a scene or on an object. Something like this:
That's why I choose to use RANSAC to fit the ellipse from edge points.
As for your final goal, you may try
findContours and [fitEllipse] in OpenCV
The pseudo code will be
1) some image process
2) find all contours
3) fit each contours by fitEllipse
here is part of code I use before
[... image process ....you get a bwimage ]
vector<vector<Point> > contours;
findContours(bwimage, contours, CV_RETR_LIST, CV_CHAIN_APPROX_NONE);
for(size_t i = 0; i < contours.size(); i++)
{
size_t count = contours[i].size();
Mat pointsf;
Mat(contours[i]).convertTo(pointsf, CV_32F);
RotatedRect box = fitEllipse(pointsf);
/* You can put some limitation about size and aspect ratio here */
if( box.size.width > 20 &&
box.size.height > 20 &&
box.size.width < 80 &&
box.size.height < 80 )
{
if( MAX(box.size.width, box.size.height) > MIN(box.size.width, box.size.height)*30 )
continue;
//drawContours(SrcImage, contours, (int)i, Scalar::all(255), 1, 8);
ellipse(SrcImage, box, Scalar(0,0,255), 1, CV_AA);
ellipse(SrcImage, box.center, box.size*0.5f, box.angle, 0, 360, Scalar(200,255,255), 1, CV_AA);
}
}
imshow("result", SrcImage);
If you focus on ellipse(no other shape), you can treat the value of the pixels of the ellipse as mass of the points.
Then you can calculate the moment of inertial Ixx, Iyy, Ixy to find out the angle, theta, which can rotate a general ellipse back to a canonical form (X-Xc)^2/a + (Y-Yc)^2/b = 1.
Then you can find out Xc and Yc by the center of mass.
Then you can find out a and b by min X and min Y.
--------------- update -----------
This method can apply to filled ellipse too.
More than one ellipse on a single image will fail unless you segment them first.
Let me explain more,
I will use C to represent cos(theta) and S to represent sin(theta)
After rotation to canonical form, the new X is [eq0] X=xC-yS and Y is Y=xS+yC where x and y are original positions.
The rotation will give you min IYY.
[eq1]
IYY= Sum(m*Y*Y) = Sum{m*(xS+yC)(xS+yC)} = Sum{ m(xxSS+yyCC+xySC) = Ixx*S^2 + Iyy*C^2 + Ixy*S*C
For min IYY, d(IYY)/d(theta) = 0 that is
2IxxSC - 2IyySC + Ixy(CC-SS) = 0
2(Ixx-Iyy)/Ixy = (SS-CC)/SC = S/C+C/S = Z+1/Z
While programming, the LHS is just a number, let's said N
Z^2 - NZ +1 =0
So there are two roots of Z hence theta, let's said Z1 and Z2, one will min the IYY and the other will max the IYY.
----------- pseudo code --------
Compute Ixx, Iyy, Ixy for a hollow or filled ellipse.
Compute theta1=atan(Z1) and theta2=atan(Z2)
Put These two theta into eq1 find which is smaller. Then you get theta.
Go back to those non-zero pixels, transfer them to new X and Y by the theta you found.
Find center of mass Xc Yc and min X and min Y by sort().
-------------- by hand -----------
If you need the original equation of the ellipse
Just put [eq0] into the canonical form
You're using terms in an unusual way.
Normally for images, the term "gradient" is interpreted as if the image is a mathematical function f(x,y). This gives us a (df/dx, df/dy) vector in each point.
Yet you're looking at the image as if it's a function y = f(x) and the gradient would be f(x)/dx.
Now, if you look at your image, you'll see that the two interpretations are definitely related. Your ellipse is drawn as a set of contrasting pixels, and as a result there are two sharp gradients in the image - the inner and outer. These of course correspond to the two normal vectors, and therefore are in opposite directions.
Also note that your image has pixels. The gradient is also pixelated. The way your ellipse is drawn, with a single pixel width means that your local gradient takes on only values that are a multiple of 45 degrees:
▄▄ ▄▀ ▌ ▀▄
I'm trying to use LogPolar transform to obtain the scale and the rotation angle from two images. Below are two 300x300 sample images. The first rectangle is 100x100, and the second rectangle is 150x150, rotated by 45 degree.
The algorithm:
Convert both images to LogPolar.
Find the translational shift using Phase Correlation.
Convert the translational shift to scale and rotation angle (how to do this?).
My code:
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/imgproc/imgproc_c.h>
#include <opencv2/highgui/highgui.hpp>
#include <iostream>
int main()
{
cv::Mat a = cv::imread("rect1.png", 0);
cv::Mat b = cv::imread("rect2.png", 0);
if (a.empty() || b.empty())
return -1;
cv::imshow("a", a);
cv::imshow("b", b);
cv::Mat pa = cv::Mat::zeros(a.size(), CV_8UC1);
cv::Mat pb = cv::Mat::zeros(b.size(), CV_8UC1);
IplImage ipl_a = a, ipl_pa = pa;
IplImage ipl_b = b, ipl_pb = pb;
cvLogPolar(&ipl_a, &ipl_pa, cvPoint2D32f(a.cols >> 1, a.rows >> 1), 40);
cvLogPolar(&ipl_b, &ipl_pb, cvPoint2D32f(b.cols >> 1, b.rows >> 1), 40);
cv::imshow("logpolar a", pa);
cv::imshow("logpolar b", pb);
cv::Mat pa_64f, pb_64f;
pa.convertTo(pa_64f, CV_64F);
pb.convertTo(pb_64f, CV_64F);
cv::Point2d pt = cv::phaseCorrelate(pa_64f, pb_64f);
std::cout << "Shift = " << pt
<< "Rotation = " << cv::format("%.2f", pt.y*180/(a.cols >> 1))
<< std::endl;
cv::waitKey(0);
return 0;
}
The log polar images:
For the sample image images above, the translational shift is (16.2986, 36.9105). I have successfully obtain the rotation angle, which is 44.29. But I have difficulty in calculating the scale. How to convert the given translational shift to obtain the scale?
You have two Images f1, f2 with f1(m, n) = f2(m/a , n/a) That is f1 is scaled by factor a
In logarithmic notation that is equivalent to f1(log m, log n) = f2(logm − log a, log n − log a) where log a is the shift in your phasecorrelated image.
Compare B. S. Reddy, B. N. Chatterji: An FFT-Based Technique for Translation, Rotation and
Scale-Invariant Image Registration, IEEE Transactions On Image Processing Vol. 5
No. 8, IEEE, 1996
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.185.4387&rep=rep1&type=pdf
here is python version
which tells
ir = abs(ifft2((f0 * f1.conjugate()) / r0))
i0, i1 = numpy.unravel_index(numpy.argmax(ir), ir.shape)
angle = 180.0 * i0 / ir.shape[0]
scale = log_base ** i1
The value for the scale factor is indeed exp(pt.y). However, since you used a "magnitude scale parameter" of 40 for the cvLogPolar function, you now need to divide pt.x by 40 to get the correct value for the displacement:
Scale = exp( pt.x / 40) = exp(16.2986 / 40) = 1.503
The value of the "magnitude scale parameter" for the cvLogPolar function does not affect the displacement produced by the rotation angle pt.x, because according to the math, it cancels out. For that reason, your formula for the rotation gives the correct value.
On another note, I believe the formula for the rotation should actually be:
Rotation = pt.y*360/(a.cols)
But, for some strange reason, the ">> 1" that you added is causing the result to be multiplied by 2 (which I believe you compensated for by multiplying by 180 instead of 360?) Remove it, and you'll see what I mean.
Also, ">>1" is causing a division by 2 in:
cvPoint2D32f(a.cols >> 1, a.rows >> 1)
If to set the center parameter of the cvLogPolar function to the center of the image (which is what you want):
cvPoint2D32f(a.cols/2, a.rows/2)
and
cvPoint2D32f(b.cols/2, b.rows/2)
then, you'll also get the correct value for the rotation (i.e. the same value that you got), and for the scale.
This thread was helpful in getting me started on rotation-invariant phase correlation, so I hope my input will help resolve any lingering issues.
We aim to calculate the scale and rotation (which is incorrectly calculated in the code). Let's start by gathering the equations from the logPolar docs. There they state the following:
(1) I = (dx,dy) = (x-center.x, y-center.y)
(2) rho = M * ln(magnitude(I))
(3) phi = Ky * angle(I)_0..360
Note: rho is pt.x and phi is pt.y in the code above
We also know that
(4) M = src.cols/ln(maxRadius)
(5) Ky = src.rows/360
First, let's solve for scale. Solving for magnitude(I) (i.e. scale) in equation 2, we get
(6) magnitude(I) = scale = exp(rho/M)
Then we substitute for M and simplify to get
(7) magnitude(I) = scale = exp(rho*ln(maxRadius)/src.cols) = pow(maxRadius, rho/src.cols)
Now let's solve for rotation. Solving for angle(I) (i.e. rotation) in equation 3, we get
(8) angle(I) = rotation = phi/Ky
Then we substitute for Ky and simplify to get
(9) angle(I) = rotation = phi*360/src.rows
So, scale and rotation can be calculated using equations 7 and 9, respectively. It might be worth noting that you should use equation 4 for calculation M and Point2f center( (float)a.cols/2, (float)a.rows/2 ) for calculating center as opposed to what is in the code above. There are good bits of info in this logpolar example opencv code.
From the values by phase correlation, the coordinates are rectangular coordinates hence (16.2986, 36.9105) are (x,y). The scale is calculated as
scale = log((x^2 + y^ 2)^0.5) which is approximately 1.6(near to 1.5).
When we calculate angle by using formulae theta = arctan(y/x) = 66(approx).
The theta value is way of the real value(45 in this case).