OpenCV estimateAffine3D breaks for coplanar points - c++

I am trying to use OpenCV's estimateAffine3D() function to get the affine transformation between two sets of coplanar points in 3D. If I hold one variable constant, I find there is a constant error in the translation component of that variable.
My test code is:
std::vector<cv::Point3f> first, second;
std::vector<uchar> inliers;
cv::Mat aff(3, 4, CV_64F);
for (int i = 0; i < 6; i++)
{
    first.push_back(cv::Point3f(i, i % 3, 1));
    second.push_back(cv::Point3f(i, i % 3, 1));
}
int ret = cv::estimateAffine3D(first, second, aff, inliers);
std::cout << aff << std::endl;
The output I expect is:
[1 0 0 0]
[0 1 0 0]
[0 0 1 0]
Edit: My expectation is incorrect. The matrix does not decompose into [R|t] for the case of constant z-coordinates.
but what I get (with some rounding for readability) is:
[1 0 0 0]
[0 1 0 0]
[0 0 0.5 0.5]
Is there a way to fix this behavior? Is there a function which does the same on sets of 2D points?

No matter how I run your code I get correct output. For example, when I run it exactly as you posted it I get:
[1,0,0 ,0]
[0,1,0 ,0]
[0,0,.5,.5]
which is correct because the 4th element of a homogeneous coordinate is assumed to be 1: the third row maps z=1 to .5*1 + .5 = 1. When I run it with 2 as the z value I get:
[1,0,0 ,0]
[0,1,0 ,0]
[0,0,.8,.4]
which also works (.8*2 + .4 = 2). Are you sure you didn't just read aff(2,2) wrong?

The key problem is:
Your goal is to estimate the rotation and translation between two sets of 3D points, but the OpenCV function estimateAffine3D() is not meant for that. As its name suggests, this function computes a general affine transformation between two sets of 3D points. When computing an affine transformation, the orthonormality constraints on the rotation matrix are not enforced, so the result is not a valid [R|t]. To obtain the rotation and translation, you need to implement the SVD-based algorithm. You may search for "absolute orientation"; this is a classic, closed-form algorithm.
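For reference, here is a minimal sketch of that closed-form, SVD-based solution (the Kabsch algorithm) using Eigen. rigidTransform is a hypothetical helper; it assumes at least three non-collinear, outlier-free correspondences and equal-length input vectors:
#include <Eigen/Dense>
#include <vector>

Eigen::Matrix4d rigidTransform(const std::vector<Eigen::Vector3d>& src,
                               const std::vector<Eigen::Vector3d>& dst)
{
    // Centroids of the two point sets.
    Eigen::Vector3d cs = Eigen::Vector3d::Zero();
    Eigen::Vector3d cd = Eigen::Vector3d::Zero();
    for (std::size_t i = 0; i < src.size(); ++i) { cs += src[i]; cd += dst[i]; }
    cs /= static_cast<double>(src.size());
    cd /= static_cast<double>(dst.size());

    // Cross-covariance of the centered point sets.
    Eigen::Matrix3d H = Eigen::Matrix3d::Zero();
    for (std::size_t i = 0; i < src.size(); ++i)
        H += (src[i] - cs) * (dst[i] - cd).transpose();

    // The optimal rotation is V * U^T, with a sign fix to avoid a reflection.
    Eigen::JacobiSVD<Eigen::Matrix3d> svd(H, Eigen::ComputeFullU | Eigen::ComputeFullV);
    Eigen::Matrix3d V = svd.matrixV();
    if ((V * svd.matrixU().transpose()).determinant() < 0.0)
        V.col(2) *= -1.0;
    Eigen::Matrix3d R = V * svd.matrixU().transpose();

    // Assemble [R | t] with t = centroid_dst - R * centroid_src.
    Eigen::Matrix4d T = Eigen::Matrix4d::Identity();
    T.topLeftCorner<3, 3>() = R;
    T.topRightCorner<3, 1>() = cd - R * cs;
    return T;
}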


I am having trouble using Eigen translate() and rotate(): they don't behave as expected

I am trying to find transformation matrices between different coordinate frames. To compose a transform, we multiply the rotation matrices and append the translation vector to obtain the final homogeneous matrix.
Here I have attached a snippet of my code where tf_matrix and output are Eigen::Transform variables.
tf_matrix.setIdentity();
tf_matrix.rotate( output.rotation() );
tf_matrix.translate( output.translation() );
When I look at their outputs, it seems like Eigen promotes the rotation and translation to 4x4 matrices and multiplies them, instead of appending the translation vector.
Output:
//This is rotation matrix
output.rotation()
1 0 0
0 0.0707372 -0.997495
0 0.997495 0.0707372
//translation vector
output.translation()
0.3
0.3
0.3
//After applying rotate() and translate() tf_matrix.transform.matrix() looks like the below
1 0 0 0.3
0 0.0707372 -0.997495 -0.278027
0 0.997495 0.0707372 0.32047
0 0 0 1
//Printing just the tf_matrix.transform.rotation()
1 0 0
0 0.0707372 -0.997495
0 0.997495 0.0707372
//Printing just the tf_matrix.transform.translation()
0.3
-0.278027
0.32047
//Ideally it should look like the below
1 0 0 0.3
0 0.0707372 -0.997495 0.3
0 0.997495 0.0707372 0.3
0 0 0 1
What did I try
I tried to generate a simple 4x4 identity Eigen::Transform and append it to the output matrix after the rotation, but the value 1 of the identity matrix gets added.
I also tried tf_matrix.col(3) += output_matrix.col(3), but it faces similar issues as above.
I am not sure how to go about the rotation, because my understanding is that I need to just multiply the 3x3 rotation matrix and append the 3x1 translation vector as the final column of this matrix. It seems like Eigen should be able to handle this without me writing extra code, but rotate() and translate() clearly don't give the right answer.
Could you please point out what am I missing if any or if there's a better way to go about it.
The order of operations is reversed from what you seem to expect: see here. Suppose you have a coordinate in R3 that you want to translate (matrix Mt) and then rotate (matrix Mr); you might expect to write Vec3 = Vec3 * Mt * Mr. Many game engines and math libraries (e.g. Ogre, XNA, CRYENGINE, and I believe Unity) use this order of operations. However, Eigen requires Vec3 = Mr * Mt * Vec3: in Eigen the coordinate being transformed is a column vector, while in those engines it is a row vector. Correspondingly, the matrices in the two conventions are transposes of one another.
To solve your problem:
tf_matrix.setIdentity();
tf_matrix = output.rotation() * tf_matrix;
tf_matrix = Eigen::Translation3d(output.translation()) * tf_matrix;
or
tf_matrix = Eigen::Translation3d(output.translation()) * output.rotation();
The pretranslate() and prerotate() methods can also be used to do this.
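For instance, here is a minimal self-contained sketch of the pretranslate() route (the rotation angle and the Affine3d type are assumptions for illustration):
#include <Eigen/Geometry>
#include <iostream>

int main() {
    // A rotation about x and the translation (0.3, 0.3, 0.3), roughly
    // mirroring the numbers in the question (the angle is made up).
    Eigen::Matrix3d R = Eigen::AngleAxisd(1.5, Eigen::Vector3d::UnitX()).toRotationMatrix();
    Eigen::Vector3d t(0.3, 0.3, 0.3);

    Eigen::Affine3d tf = Eigen::Affine3d::Identity();
    tf.rotate(R);        // tf is now [R | 0]
    tf.pretranslate(t);  // premultiplies: tf = T * R, i.e. [R | t]

    std::cout << tf.matrix() << std::endl;  // last column is (0.3, 0.3, 0.3, 1)
}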

combined Scharr derivatives in opencv

I have few questions regarding Scharr derivatives and its OpenCV implementation.
I am interested in second-order image derivatives with (3x3) kernels.
I started with the Sobel second derivative, which failed to find some thin lines in the images. After reading the Sobel and Scharr comparison at the bottom of this page, I decided to try Scharr instead by changing this line:
Sobel(gray, grad, ddepth, 2, 2, 3, scale, delta, BORDER_DEFAULT);
to this line:
Scharr(img, gray, ddepth, 2, 2, scale, delta, BORDER_DEFAULT );
My problem is that cv::Scharr seems to allow only a first-order derivative along a single axis at a time, so I get the following error:
error: (-215) dx >= 0 && dy >= 0 && dx+dy == 1 in function getScharrKernels
(see assertion line here)
Following this restriction, I have a few questions regarding Scharr derivatives:
Is it considered bad practice to use high-order Scharr derivatives? Why did OpenCV choose to assert dx+dy == 1?
If I am to call Scharr twice, once for each axis, what is the correct way to combine the results?
I am currently using:
addWeighted( abs_grad_x, 0.5, abs_grad_y, 0.5, 0, grad );
but I am not sure that this is how the Sobel function combines the two axes, nor in what order it should be done for all 4 derivatives.
If I am to compute the (dx=2,dy=2) derivative by using 4 different kernels, I would like to reduce processing time by unifying all 4 kernels into 1 before applying it to the image (I assume that this is what cv::Sobel does). Is there a reasonable way to create such a combined Scharr kernel and convolve it with my image?
Thanks!
I've never read the original Scharr paper (the dissertation is in German) so I don't know the answer to why the Scharr() function doesn't allow higher order derivatives. Maybe because of the first point I make in #3 below?
The Scharr function is supposed to be a derivative. And the total differential of a multivariable function f(x) = f(x0, ..., xN) is
df = (df/dx0)*dx0 + ... + (df/dxN)*dxN
That is, the sum of the partials, each multiplied by the change in that variable. In the case of images, the change dx in the input is a single pixel, so it's equivalent to 1. In other words, just sum the partials; don't weight them by half. You can use addWeighted() with 1s as the weights, or you can just sum them, but to make sure you won't saturate your image you'll need to convert to a float or 16-bit image first. However, it's also pretty common to compute the Euclidean magnitude of the derivatives instead, if you're trying to get the gradient rather than the sum of partials.
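For instance, a short sketch of that summing approach (assuming a single-channel 8-bit input img; the variable names are illustrative):
// Sum the two first-order Scharr responses without the 0.5 weights,
// converting to float so the result cannot saturate an 8-bit image.
cv::Mat gx, gy, grad;
cv::Scharr(img, gx, CV_32F, 1, 0);
cv::Scharr(img, gy, CV_32F, 0, 1);
cv::add(gx, gy, grad);            // total derivative: df/dx + df/dy
// or, for the gradient magnitude instead:
cv::magnitude(gx, gy, grad);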
However, that's just for the first-order derivative. For higher orders, you need to apply the chain rule. See here for the details of combining a second order.
Note that an optimized kernel for first-order derivatives is not necessarily the optimal kernel for second-order derivatives by applying it twice. Scharr himself has a paper on optimizing second-order derivative kernels, you can read it here.
With that said, filters are split into x and y directions to make linearly separable filters, which basically turn your 2D convolution problem into two 1D convolutions with smaller kernels. Think of the Sobel and Scharr kernels: for the x direction, they both just have a single column on either side with the same values (except one is negative). When you slide the kernel across the image, at the first location you're multiplying the first column and the third column by the values in your kernel. And then two steps later, you're multiplying the third and the fifth. But the third was already computed, so that's wasteful. Instead, since both sides are the same, just multiply each column by the vector, since you know you need those values, and then you can just look up the results for column 1 and 3 and subtract them.
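For example, the separable form can be written directly with OpenCV's sepFilter2D(); a sketch for the standard Scharr x-kernel (whose 3x3 form is also shown below):
// Scharr-x as two 1D passes: smooth with (3, 10, 3) along y and
// differentiate with (-1, 0, 1) along x; equivalent to the 3x3 kernel.
cv::Mat kx = (cv::Mat_<float>(1, 3) << -1, 0, 1);
cv::Mat ky = (cv::Mat_<float>(3, 1) << 3, 10, 3);
cv::Mat out;
cv::sepFilter2D(img, out, CV_32F, kx, ky);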
In short, I don't think you can combine them with built-in separable filter functions, because certain values are positive sometimes, and negative otherwise; and the only way to know when applying a filter linearly is to do them separately. However, we can examine the result of applying both filters and see how they affect a single pixel, construct the 2D kernel, and then convolve with OpenCV.
Suppose we have a 3x3 image:
image
=====
a b c
d e f
g h i
And we have the Scharr kernels:
kernel_x
========
-3 0 3
-10 0 10
-3 0 3
kernel_y
========
-3 -10 -3
0 0 0
3 10 3
The result of applying each kernel to this image gives us:
image * kernel_x
================
-3a +0b +3c
-10d +0e +10f
-3g +0h +3i
image * kernel_y
================
-3a -10b -3c
+0d +0e +0f
+3g +10h +3i
These values are summed and placed into pixel e. Since the sum of the two responses is the total derivative, everything ends up added into pixel e at the end of the day.
image * kernel_x + image * kernel_y
===================================
-3a +3c -10d +10f -3g +3i
-3a -10b -3c +3g +10h +3i
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
-6a -10b +0c -10d +10f +0g +10h +6i
And this is the same result we'd have gotten if we multiplied by the kernel
kernel_xy
=============
-6 -10 0
-10 0 10
0 10 6
So there's a single 2D kernel that computes the combined first-order derivative. Notice anything interesting? It's just the sum of the two kernels. Is that surprising? Not really, since (a+b)x = ax + bx. Now we can pass that into filter2D() to compute the sum of the derivatives. Does that actually give the same result?
import cv2
import numpy as np

img = cv2.imread('cameraman.png', 0).astype(np.float32)
kernel = np.array([[ -6, -10,  0],
                   [-10,   0, 10],
                   [  0,  10,  6]])
total_first_derivative = cv2.filter2D(img, -1, kernel)
scharr_x = cv2.Scharr(img, -1, 1, 0)
scharr_y = cv2.Scharr(img, -1, 0, 1)
print((total_first_derivative == (scharr_x + scharr_y)).all())
True
Yep. Now I guess you can just do it twice.
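Following that suggestion, a minimal C++ sketch (assuming a float single-channel img as above; note that applying the summed kernel twice computes (d/dx + d/dy)^2 = dxx + 2*dxy + dyy, with the cross term included, rather than dxx + dyy alone):
// Apply the combined 3x3 kernel twice. For interior pixels this matches
// a single pass with the 5x5 kernel conv(k, k); border pixels differ
// slightly because padding is applied between the two passes.
cv::Mat k = (cv::Mat_<float>(3, 3) << -6, -10,  0,
                                     -10,   0, 10,
                                       0,  10,  6);
cv::Mat once, twice;
cv::filter2D(img, once, CV_32F, k);
cv::filter2D(once, twice, CV_32F, k);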

What does lu_factorize return?

boost::numeric::ublas contains the function M::size_type lu_factorize(M& m). Its name suggests that it performs the LU decomposition of a given matrix m, i.e. it should produce two matrices such that m = L*U. There seems to be no documentation provided for this function.
It is easy to deduce that it returns 0 to indicate successful decomposition, and a non-zero value when the matrix is singular. However, it is completely unclear where the result is. Taking the matrix by reference suggests that it works in place, but it should produce two matrices (L and U), not one. So what does it do?
There is no documentation in Boost, but looking at the documentation of SciPy's lu_factor one can see that it's not uncommon to return a single matrix as the result of an LU decomposition.
This is enough, because in a typical approach to LU decomposition, L's diagonal consists of ones only, as presented in this answer from Mathematics, for example.
So, it is possible to fit both L and U into one matrix, putting L in the result's lower part, omitting the diagonal (which is assumed to contain only ones), and U in the upper part. For example, for a 3x3 problem the result is:
    [u11 u12 u13]
m = [l21 u22 u23]
    [l31 l32 u33]
which implies:
    [  1   0   0]
L = [l21   1   0]
    [l31 l32   1]
and
    [u11 u12 u13]
U = [  0 u22 u23]
    [  0   0 u33]
Inspecting boost's void lu_substitute(const M& m, vector_expression<E>& e) function from the same namespace seems to confirm this. It solves the equation LUx = e, where both L and U are contained in its m argument, in two steps.
First it solves Lz = e for z, where z = Ux, using the lower part of m:
inplace_solve(m, e, unit_lower_tag ());
Then, with z now stored in e (which is modified in place), it solves Ux = z using the upper part of m:
inplace_solve(m, e, upper_tag ());
inplace_solve is mentioned in the documentation, and it:
Solves a system of linear equations with triangular form, i.e. A is triangular.
So everything seems to make sense.
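For reference, a minimal sketch of the resulting usage (the 2x2 system and names are illustrative, and the pivot-free overloads described above are assumed; the factorization succeeds here because the leading pivots are nonzero):
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/vector.hpp>
#include <boost/numeric/ublas/lu.hpp>
#include <iostream>

namespace ublas = boost::numeric::ublas;

int main() {
    // Solve Ax = b: factorize A in place, then substitute into b.
    ublas::matrix<double> A(2, 2);
    A(0, 0) = 4; A(0, 1) = 3;
    A(1, 0) = 6; A(1, 1) = 3;

    ublas::vector<double> b(2);
    b(0) = 10; b(1) = 12;

    if (ublas::lu_factorize(A) != 0) {  // non-zero return means failure
        std::cerr << "singular matrix\n";
        return 1;
    }
    // A now packs L (strictly below the diagonal, unit diagonal implied)
    // and U (on and above the diagonal).
    ublas::lu_substitute(A, b);   // b is overwritten with x
    std::cout << b << std::endl;  // expect x = (1, 2)
}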
Boost doesn't document its LU factorization (into a lower triangular matrix L and an upper triangular matrix U), but the source code is shared with the public.
If the code is hard to follow, please check the webpage by Nick Higham. It has a detailed explanation. Here is an example from the link:
Let's say we need to solve Ax = b.
(1) Make L and U from the input matrix A:
A
[ 3 -1  1  1]
[-1  3  1 -1]
[-1 -1  3  1]
[ 1  1  1  3]
L
[   1    0  0  0]
[-1/3    1  0  0]
[-1/3 -1/2  1  0]
[ 1/3  1/2  0  1]
U
[3   -1    1    1]
[0  8/3  4/3 -2/3]
[0    0    4    1]
[0    0    0    3]
This example looks straightforward to a human, but algorithmically it can take numerous steps, which is why LU factorization is useful. The link also covers its relation to Gaussian elimination, Schur complements, and block implementations.
(2) Solve the triangular systems Ly = b and Ux = y, since then b = L(Ux) = Ax.

OpenGL custom rendering pipeline: Perspective matrix

I am attempting to work in LWJGL to display a simple quad using my own matrices. I've been looking around for a while and have found a few perspective matrix implementations, these two in particular:
[cot(fov/2)/a 0 0 0]
[0 cot(fov/2) 0 0]
[0 0 -f/(f-n) -1]
[0 0 -f*n/(f-n) 0]
and:
[cot(fov/2)/a 0 0 0]
[0 cot(fov/2) 0 0]
[0 0 -(f+n)/(f-n) -1]
[0 0 -(2*f*n)/(f-n) 0]
Both of these provide the same effect, as expected (I got them from here and here, respectively). The issue is in my understanding of how multiplying this by the modelview matrix, then by a vertex, then dividing each x, y, and z value by its w value gives a screen coordinate. More specifically, if I multiply either of these by the modelview matrix and then by the vertex (10, 10, 0, 1), I get w=0. That in itself is a big smack in the face. I conclude that either the matrices are wrong, or I am missing something completely. In my actual test program, the vertices don't even end up on screen, even though the camera is at (0,0,0) with no rotation, which should make them visible. I have even tried many different z values, positive and negative, to see if it was just a clipping plane. Am I missing something here?
EDIT: After a lot of checking over, I've narrowed down the problem I am facing. The biggest issue is that the z-axis does not appear to be remapped to the range I specify (n to f). Any object just zooms in or out a little bit when I translate it along the z-axis then pops out of existence as it moves past the range [-1, 1]. I think this is also making me more confused. I set my far plane to 100 and my near to 0.1, and it behaves like anything but.
Both of these provide the same effect, as expected
While the second projection matrix form is very standard, the first one gives a different effect. If you have z==1 and w==0, the projection will be:
Matrix 1: (-f/(f-n)) / (-f*n/(f-n)) = f/(f*n) = 1/n
Matrix 2: (-(f+n)/(f-n)) / (-(2*f*n)/(f-n)) = (f+n)/(2*f*n)
The result is clearly different. You should always use the second form.
if I multiply either of these by the modelview matrix then by a vertex
(10, 10, 0, 1), it gives a w=0. That in itself is a big smack in the
face
For a focal length d the projection is computed as (ignoring aspect ratio):
x'= d*x/z = x / w
y'= d*y/z = y / w
where
w = z / d
If you have z==0, this means that you are trying to project a point that is already at the eye; only points beyond d are visible. In practice this point will be clipped, because z is not within the range n (near) to f (far) (n is expected to be a positive constant).
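To make that concrete, here is a minimal sketch that builds the second (standard) matrix in the column-vector convention and projects one point; the fov, aspect ratio, and plane values are assumptions for illustration:
#include <cmath>
#include <cstdio>

int main() {
    // Standard GL-style projection (column vectors, camera looks down -z).
    const float fov = 1.0f, a = 16.0f / 9.0f, n = 0.1f, f = 100.0f;
    const float c = 1.0f / std::tan(fov / 2.0f);

    const float M[4][4] = {
        { c / a, 0.0f,  0.0f,               0.0f                   },
        { 0.0f,  c,     0.0f,               0.0f                   },
        { 0.0f,  0.0f, -(f + n) / (f - n), -2.0f * f * n / (f - n) },
        { 0.0f,  0.0f, -1.0f,               0.0f                   },
    };

    // A point 5 units in front of the camera (eye-space z = -5, w = 1).
    const float p[4] = { 1.0f, 1.0f, -5.0f, 1.0f };
    float q[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (int r = 0; r < 4; ++r)
        for (int k = 0; k < 4; ++k)
            q[r] += M[r][k] * p[k];

    // Perspective divide: w' = -z_eye = 5, so the point lands inside NDC.
    std::printf("ndc = (%f, %f, %f)\n", q[0] / q[3], q[1] / q[3], q[2] / q[3]);
}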

(Pseudo)-Inverse of N by N matrix with zero determinant

I would like to take the inverse of an n x n matrix, to use in my GraphSlam.
The issues that I encountered:
.inverse() from the Eigen library (3.1.2) doesn't allow zero values and returns NaN
The LAPACK (3.4.2) library doesn't allow a zero determinant, but does allow zero values (I used the example code from Computing the inverse of a matrix using lapack in C)
The Seldon library (5.1.2) wouldn't compile for some reason
Did anyone successfully implement an n x n matrix inversion that allows negative values, zero values, and a determinant of zero? Any good library (C++) recommendations?
I try to calculate the omega in the following for GraphSlam:
http://www.acastano.com/others/udacity/cs_373_autonomous_car.html
Simple example:
[ 1 -1 0 0 ]
[ -1 2 -1 0 ]
[ 0 -1 1 0 ]
[ 0 0 0 0 ]
Real example would be 170x170 and contain 0's, negative values, bigger positive values.
Given simple example is used to debug the code.
I can calculate this in MATLAB (Moore-Penrose pseudoinverse), but for some reason I'm not able to program this in C++.
A = [1 -1 0 0; -1 2 -1 0; 0 -1 1 0; 0 0 0 0]
B = pinv(A)
B=
[0.56 -0.12 -0.44 0]
[-0.12 0.22 -0.11 0]
[-0.44 -0.11 0.56 0]
[0 0 0 0]
For my application I can (temporarily) remove the dimension with zero's.
So I am going to remove the 4th column and the 4th row.
I can also do that for my 170x170 matrix, the 4x4 was just an example.
A:
[ 1 -1 0 ]
[ -1 2 -1 ]
[ 0 -1 1 ]
So I expected that removing the 4th column and the 4th row would avoid a zero determinant.
But I can still have a zero determinant if my matrix is as above:
this happens when the sum of each row or each column is zero (which I will have all the time in GraphSlam).
The LAPACK solution (Moore-Penrose inverse based) worked when the determinant was not zero (I used the example code from Computing the inverse of a matrix using lapack in C), but it failed as a "pseudoinverse" when the determinant was zero.
SOLUTION (all credit to Frank Reininghaus): use SVD (singular value decomposition)
http://sourceware.org/ml/gsl-discuss/2008-q2/msg00013.html
Works with:
Zero values (even full 0 rows and full 0 columns)
Negative values
Determinant of zero
A^-1:
[0.56 -0.12 -0.44]
[-0.12 0.22 -0.11]
[-0.44 -0.11 0.56]
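For reference, a minimal Eigen sketch of that SVD-based pseudoinverse (the pinv name and tolerance are illustrative, not taken verbatim from the linked post; applied to the 3x3 matrix above it should reproduce the result shown):
#include <Eigen/Dense>

// Moore-Penrose pseudoinverse via SVD: invert only the singular values
// above a small tolerance and leave the (near-)zero ones at zero.
Eigen::MatrixXd pinv(const Eigen::MatrixXd& A, double tol = 1e-9)
{
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
    Eigen::VectorXd s = svd.singularValues();
    Eigen::VectorXd sInv = Eigen::VectorXd::Zero(s.size());
    for (int i = 0; i < s.size(); ++i)
        if (s(i) > tol)
            sInv(i) = 1.0 / s(i);
    return svd.matrixV() * sInv.asDiagonal() * svd.matrixU().transpose();
}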
If all you want is to solve problems of the form Ax=b (or equivalently to compute products of the form A^-1 * b), then I recommend that you do not compute the inverse or pseudo-inverse of A, but instead directly solve Ax=b using an appropriate rank-revealing solver. For instance, using Eigen:
x = A.colPivHouseholderQr().solve(b);
x = A.jacobiSvd(ComputeThinU|ComputeThinV).solve(b);
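For completeness, a compilable sketch of the SVD variant (the right-hand side b is made up; since the 3x3 matrix from the question is singular, the solver returns the minimum-norm least-squares solution):
#include <Eigen/Dense>
#include <iostream>

int main() {
    Eigen::Matrix3d A;
    A <<  1, -1,  0,
         -1,  2, -1,
          0, -1,  1;
    Eigen::Vector3d b(1, 0, -1);

    // Rank-revealing solve: small singular values are treated as zero,
    // so the singular system is still handled gracefully.
    Eigen::Vector3d x =
        A.jacobiSvd(Eigen::ComputeFullU | Eigen::ComputeFullV).solve(b);
    std::cout << x << std::endl;  // approximately (1, 0, -1)
}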
Your MATLAB command does not calculate the inverse in your case, because the matrix has determinant zero. The pinv command calculates the Moore-Penrose pseudoinverse, and pinv(A) has some of, but not all, the properties of inv(A).
So you are not doing the same thing in C++ as in MATLAB!
Previous
As in my comment, now as an answer: you must make sure that you only invert invertible matrices. That means
det A != 0
Your example matrix has determinant equal to zero, so it is not an invertible matrix. I hope you don't try it on this one!
For example, a matrix has determinant zero if it contains a full row or column of zero entries.
Are you sure it's because of the zero/negative values, and not because your matrix is non-invertible?
A matrix only has an inverse if its determinant is nonzero (mathworld link), and the matrix example you posted in the question has a zero determinant and so it has no inverse.
That should explain why those libraries do not allow you to take the inverse of the matrix given, but I can't say if the same reasoning holds for your full size 170x170 matrix.
If your matrices are covariance or weight matrices, you can use "generalized Cholesky inversion" instead of SVD. The results will be more acceptable for practical use.