Bundle adjustment with focal length correction not converging - computer-vision

I am trying to add a new feature to our existing bundle adjustment implementation.
The algorithm uses the Gauss-Newton method and has been working for well over a decade. The least-squares "A" matrix is populated using initial approximations of the image exterior orientations as well as the object points. Kraus's book, "Photogrammetry: Fundamentals and Standard Processes", was used for this.
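In other words, each Gauss-Newton iteration solves the normal equations built from that A matrix for the corrections to the orientations and object points - this is just the standard textbook form, not our exact code (P is the weight matrix, or the identity if the observations are unweighted):

A^T P A \hat{x} = A^T P l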
A while ago, self-calibration was added to this algorithm; however, only the formulae by Ebner and Gruen were implemented (formula for Ebner here). I am now trying to add the Brown-Conrady formula, which is well documented in this paper (final algorithm under "Concluding remarks"). It uses 10 parameters to determine deltaX and deltaY.
When I include all the parameters except for deltaC (the correction to the focal length/camera constant), the adjustment converges and produces the desired residuals. However, as soon as I introduce deltaC (which mathematically I see as "allowing" the image points to scale by some amount in X and Y), the adjustment diverges.
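For reference, this is the form I am assuming for the focal-length and principal-point terms (sign conventions differ between texts, and I leave out the remaining two affinity/shear parameters of the 10), with reduced coordinates \bar{x} = x - x_0, \bar{y} = y - y_0 and r^2 = \bar{x}^2 + \bar{y}^2:

\Delta x = \Delta x_0 - \frac{\bar{x}}{c}\Delta c + \bar{x}(K_1 r^2 + K_2 r^4 + K_3 r^6) + P_1(r^2 + 2\bar{x}^2) + 2 P_2 \bar{x}\bar{y}
\Delta y = \Delta y_0 - \frac{\bar{y}}{c}\Delta c + \bar{y}(K_1 r^2 + K_2 r^4 + K_3 r^6) + P_2(r^2 + 2\bar{y}^2) + 2 P_1 \bar{x}\bar{y}

The \Delta c term is just a uniform scaling of the reduced image coordinates, which is what I mean by "allowing the points to scale".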
The input to the algorithm is a large set of already-undistorted aerial images, along with their control points and a large number of image points. We therefore expect the distortion/correction parameters to be close to zero, since the images are already undistorted. This is indeed the case for Ebner and Gruen.
For Brown, however, some of the parameters (and therefore the delta corrections) grow uncontrollably. I have tried scaling these parameters (the principal point offsets and the focal length correction deltaC) so that they are closer in magnitude to the other parameters (K1, K2, K3, P1, P2), but this did not help - the adjustment diverges all the same.
Is there any reason for this? Could it perhaps be because the images are already undistorted? Or something to do with this aerial job in particular?
I have not provided code as it is simply too complex; I suspect the problem lies in my understanding of the method rather than in a specific piece of code.
Thanks!

Related

Integrate RANSAC to compute essential matrix

I have calculated the essential matrix using the 5 point algorithm. I'm not sure how to integrate it with ransac so it gives me a better outcome.
Here is the source code. https://github.com/lunzhang/openar/blob/master/src/utils/5point/computeEssential.js
Currently I was thinking about computing the essential matrix from 5 random points, converting it to a fundamental matrix, and checking the error against a threshold using the equation x'Fx = 0. But then I'm not sure what to do after that.
How do I know which points to set as outliers? If the error is too big, do I set them as outliers right away? Could it be possible that one point could produce different essential matrices depending on what the other 4 points are?
Well, here is a short explanation, in pseudo-code, of how you can integrate this with RANSAC. Basically, all RANSAC does is compute your model (here the essential matrix) using a subset of the data, and then see if the rest of the data "is happy" with that result. It keeps the result for which the highest portion of the dataset "is happy".
highest_number_of_happy_points = -1;
best_estimated_essential_matrix = Identity;
for iter = 1 to max_iter_number:
    n_pts = get_n_random_pts(P);   // get a subset of n points from the set of points P. You can use 5, but you can also use more.
    E = compute_essential(n_pts);
    number_of_happy_points = 0;
    for pt in P:
        // we want to know if pt is happy with the computed E
        err = cost_function(pt, E);   // for example x^T F x as you propose, or x'^T E x with the essential.
        if (err < some_threshold):
            number_of_happy_points += 1;
    if (number_of_happy_points > highest_number_of_happy_points):
        highest_number_of_happy_points = number_of_happy_points;
        best_estimated_essential_matrix = E;
This should do the trick. Usually, you set some_threshold experimentally to a low value. There are of course more sophisticated RANSAC variants; you can easily find them by googling.
Your idea of using x^TFx is fine in my opinion.
Once this RANSAC completes, you will have best_estimated_essential_matrix. The outliers are those points whose x^TFx value is greater than your chosen threshold.
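In OpenCV-style C++, the "is this point happy?" test could look like the sketch below (assuming F - or E, if you work in normalized coordinates - is a 3x3 CV_64F matrix and the matches are pixel coordinates). Note that the raw algebraic error x^TFx depends on the overall scale of F, so many implementations use a point-to-epipolar-line (Sampson) distance instead:

#include <opencv2/core.hpp>
#include <cmath>
#include <vector>

std::vector<bool> classifyInliers(const cv::Mat& F,             // 3x3, CV_64F
                                  const std::vector<cv::Point2f>& pts1,
                                  const std::vector<cv::Point2f>& pts2,
                                  double threshold)
{
    std::vector<bool> isInlier(pts1.size(), false);
    for (size_t i = 0; i < pts1.size(); ++i) {
        cv::Mat x1 = (cv::Mat_<double>(3, 1) << pts1[i].x, pts1[i].y, 1.0);
        cv::Mat x2 = (cv::Mat_<double>(3, 1) << pts2[i].x, pts2[i].y, 1.0);
        // Algebraic epipolar error x2^T F x1 for this correspondence.
        double err = std::abs(cv::Mat(x2.t() * F * x1).at<double>(0, 0));
        isInlier[i] = err < threshold;
    }
    return isInlier;
}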
To answer your final question: yes, a point could produce a different matrix given 4 different points, because their spatial configuration is different (you can have degenerate situations). In an ideal setting this wouldn't be the case, but we always have noise, matching errors and so on, so what happens in the end is that the equations you obtain with 5 points won't produce exactly the same result as with 5 other points.
Hope this helps.

OpenCV - PCA analysis on BW image - area vs shape - is there already an implementation?

The shape of an object is detected on a bw image. The object is a black continuous shape, the background is white.
We use PCA (http://docs.opencv.org/3.1.0/d1/dee/tutorial_introduction_to_pca.html) to get the object direction and align the object. Currently the shape itself (the points on the contour) is the input to the OpenCV PCA implementation. This usually works very well. But from time to time there is small dirt on the object border, causing the contour to pass around the dirt. This adds more points and more weight on one side, slightly turning the object.
Idea: Instead of the contour, we use the area of the object as input for our PCA analysis. The issue there is that checking every point for whether it lies inside the contour and then using those points for PCA slows the application down; this part would be about 52352 times slower.
New Approach: We take random points in the image, check if they are inside the shape and if so, use them for our PCA. We have to see if we can get the consistent quality needed from this approach.
Is there already a similar implementation in opencv which is using the area instead of the shape?
Another approach would be to put a mesh over the object and use the mesh points inside the object for PCA.
Is there already something similar available that one can just use, or does one need to quickly implement something like this?
Going for straight lines around the object isn't an option.
Given that we have received very limited information about your problem (posting images would help a lot) and you do not seem to know the probability density function of the noise, your best bet is to consider the noise to be Gaussian.
As such, and following your intuition, my suggested approach is to take a few (by a few I mean statistically relevant but not raising the computation time that much) random points that lie inside the object and compute the PCA.
Repeat this procedure in an iterative loop and store somewhere the resulting rotation angles you get from the application of the PCA to the object shape.
Stop once you have enough points, then compute the mean of the rotation angles: this is a decent estimate of the true angle. Compute also the standard deviation to get a measure of the quality of your estimate. As for "enough points", ~30 samples are usually considered "enough" to be representative of the underlying population according to the central limit theorem.
If you want, you can improve on this approach in many ways, for example doing robust estimation of the true angle once you have collected enough points. It all depends on the data you have at hand...take my suggestion just as a starting point.
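A minimal OpenCV sketch of that sampling idea (the function name and sample count are just placeholders; it assumes you already have the object contour from cv::findContours and its bounding box):

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

cv::Vec2d areaOrientation(const std::vector<cv::Point>& contour,
                          const cv::Rect& bbox, int samples = 500)
{
    cv::RNG rng(12345);
    std::vector<cv::Point2f> inside;
    // Draw random points in the bounding box, keep those inside the contour.
    for (int tries = 0; tries < samples * 100 && (int)inside.size() < samples; ++tries) {
        cv::Point2f p((float)rng.uniform(bbox.x, bbox.x + bbox.width),
                      (float)rng.uniform(bbox.y, bbox.y + bbox.height));
        if (cv::pointPolygonTest(contour, p, false) >= 0)
            inside.push_back(p);
    }

    // One row per sample, as expected by cv::PCACompute.
    cv::Mat data((int)inside.size(), 2, CV_64F);
    for (int i = 0; i < (int)inside.size(); ++i) {
        data.at<double>(i, 0) = inside[i].x;
        data.at<double>(i, 1) = inside[i].y;
    }
    cv::Mat mean, eigenvectors;
    cv::PCACompute(data, mean, eigenvectors);
    // First eigenvector = dominant axis of the sampled area.
    return cv::Vec2d(eigenvectors.at<double>(0, 0), eigenvectors.at<double>(0, 1));
}

Running this several times with different seeds gives you the set of angles whose mean and standard deviation I described above.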
There are a few parameters that you could change which may improve your system.
First is the threshold you use to binarize your image. I don't know what your application is about, but you could use other color spaces, or normalize your image by chromaticity, and after that apply a new threshold.
Another option is to exclude shapes (contours) that have a bigger or smaller area than what you are expecting.
In addition, you could apply a blur filter before detecting contours.
I don't know what the noise looks like, but when you say "small dirt" I think it might be only a few pixels, a lot smaller than the object itself, though attached to it. To reduce this noise it might be possible to perform an opening (morphology) on the binary image.
http://docs.opencv.org/2.4/doc/tutorials/imgproc/opening_closing_hats/opening_closing_hats.html
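For completeness, a possible opening step (assuming the binarization makes the object white/foreground - invert first if it is black; the 5x5 kernel size is a guess and should roughly match the size of the dirt):

#include <opencv2/imgproc.hpp>

cv::Mat removeSmallDirt(const cv::Mat& binary)
{
    // Erosion followed by dilation removes protrusions smaller than the kernel.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5));
    cv::Mat opened;
    cv::morphologyEx(binary, opened, cv::MORPH_OPEN, kernel);
    return opened;
}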

Camera calibration with single image? It seems to work, but am I missing something?

I have to do camera calibration. I understand the general concept and I have it working, but in many guides it says to use many images, or at the very least two with different orientations. Why exactly is this necessary? I seem to be getting reasonably good results with a single image of a 14x14 grid of points.
I find the points with cv::findCirclesGrid and use cv::calibrateCamera to find the extrinsic and intrinsic parameters. Intrinsic guess is set to false. Principal point and aspect ratio are not fixed while tangential distortion is fixed to zero.
I then use cv::getOptimalNewCameraMatrix, cv::initUndistortRectifyMap and cv::remap to restore the image.
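In outline, the flow is roughly this (a simplified sketch, not my actual code; the grid spacing of 1.0 is arbitrary):

#include <opencv2/opencv.hpp>
#include <vector>

int main()
{
    cv::Mat img = cv::imread("pattern.png", cv::IMREAD_GRAYSCALE);
    cv::Size patternSize(14, 14);
    std::vector<cv::Point2f> centers;
    if (!cv::findCirclesGrid(img, patternSize, centers))
        return 1;

    // Planar object points with an arbitrary 1.0 unit spacing.
    std::vector<cv::Point3f> corners;
    for (int r = 0; r < patternSize.height; ++r)
        for (int c = 0; c < patternSize.width; ++c)
            corners.emplace_back((float)c, (float)r, 0.f);

    std::vector<std::vector<cv::Point3f>> objectPoints{corners};
    std::vector<std::vector<cv::Point2f>> imagePoints{centers};

    cv::Mat K, dist;
    std::vector<cv::Mat> rvecs, tvecs;
    // No intrinsic guess, tangential distortion fixed to zero.
    cv::calibrateCamera(objectPoints, imagePoints, img.size(), K, dist,
                        rvecs, tvecs, cv::CALIB_ZERO_TANGENT_DIST);

    cv::Mat newK = cv::getOptimalNewCameraMatrix(K, dist, img.size(), 1.0);
    cv::Mat map1, map2, undistorted;
    cv::initUndistortRectifyMap(K, dist, cv::Mat(), newK, img.size(),
                                CV_32FC1, map1, map2);
    cv::remap(img, undistorted, map1, map2, cv::INTER_LINEAR);
    return 0;
}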
It seems to me the result is pretty good, but am I missing something? Is it actually wrong and just waiting to cause problems for me later?
Also before you ask why I don't just use multiple images to be sure; the software I am writing will be used with a semi-fixed camera stand to calibrate several cameras one at a time. So first off the stand would need to be modified in order to position the pattern at an angle or off centre, as currently it can only be moved closer or further away. Secondly the process should not be unnecessarily slowed down by having to capture more images.
Edit: In reply to Micka asking "what happens if your viewing angle isn't 90° on the pattern? Can you try to rotate the pattern away from the camera?": I get a somewhat similar result, although it finds less distortion. Judging the borders with a ruler, it seems that the calibration from 90° is better, but it is really hard to tell.
Having more patterns in different orientations is necessary to avoid the situation where the intrinsic parameters are very inaccurate but the pixel reprojection error of the undistortion is still low, because different errors compensate for each other.
To illustrate this point: if you only have one image taken at 90 degree viewing angle, then a change in horizontal focal length can be poorly distinguished from viewing the pattern a little bit from the side. The only clue that sets the two parameters apart is the tapering of the lines, but that measurement is very noisy. Hence you need multiple views at significant angles to separate this aspect of the pose from the intrinsic parameters.
If you know your image is viewed at 90 degrees, you can use this to your advantage but it requires modification of the opencv algorithm. If you are certain that all images will be captured from the same pose as your calibration image, then it does not really matter as the undistortion will be good even if the individual calibration parameters are inaccurate but compensating (i.e. they compensate well for this specific pose, but poorly for other poses).
As stated here, the circle pattern (in theory) gets along quite well with only a single image. The reason that you would need multiple images is the noise present in the input data.
My suggestion would be to compare the results from different input images. If the error is low, you will probably be able to get away with one sample.
I checked the paper which the OpenCV method is based on - Zhang Zhengyou's method. With n = 1, you can only get the focal length.
In the paper, he says:
"If n ≥ 3, we will have in general a unique solution b defined up to a scale factor. If n = 2, we can impose the skewless constraint γ = 0, i.e., [0, 1, 0, 0, 0, 0] b = 0, which is added as an additional equation to (9). (If n = 1, we can only solve two camera intrinsic parameters, e.g., α and β, assuming u0 and v0 are known (e.g., at the image center) and γ = 0, and that is indeed what we did in [19] for head pose determination based on the fact that eyes and mouth are reasonably coplanar. In fact, Tsai [23] already mentions that focal length from one plane is possible, but incorrectly says that aspect ratio is not.)"

Calibrating camera with a glass-covered checkerboard

I need to find the intrinsic calibration parameters of a single camera. To do this I take several images of a checkerboard pattern from different angles and then use calibration software.
To make the calibration pattern as flat as possible, I print it on paper and cover it with 3 mm glass. Obviously the image of the pattern is modified by the glass, because it has a different refraction coefficient compared to air.
The extrinsic parameters will be distorted by the glass, because the checkerboard is not in the place we see it in. However, if the thickness of the glass and the refraction coefficients of glass and air are known, it seems possible to recover the extrinsic parameters.
So, the questions are:
Can extrinsic parameters be calculated, and if yes, then how? (This is not necessary right now, just an interesting theoretical question)
Are intrinsic calibration parameters obtained from these images equivalent to ones obtained from a usual calibration procedure (without cover glass)?
With the glass, the calibration parameters reported by the GML Camera Calibration Toolbox (based on OpenCV) become much more accurate. (Does that make any sense at all?) But this approach has a small drawback - unwanted reflections, especially from light sources.
I commend you on choosing a very flat support (which is what I recommend myself here). But, forgive me for asking the obvious question, why did you cover the pattern with the glass?
Since the point of the exercise is to ensure the target's planarity and nothing else, you might as well glue the side of the paper sheet opposite the pattern to the support and avoid all this trouble. Yes, in time the pattern will get dirty and worn and need replacement. So you just scrape it off and replace it: printing checkerboards is cheap.
If, for whatever reasons, you are stuck with the glass in the front, I recommend doing first a back-of-the-envelope calculation of the expected ray deflection due to the glass refraction, and check if it is actually measurable by your apparatus. Given the nominal focal length in mm of the lens you are using and the physical width and pixel density of the sensor, you can easily work it out at the image center, assuming an "extreme" angle of rotation of the target w.r.t the focal axis (say, 45 deg), and a nominal distance. To a first approximation, you may model the pattern as "painted" on the glass, and so ignore the first refraction and only consider the glass-to-air one.
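To make that concrete, here is a rough version of that back-of-the-envelope calculation; apart from the 3 mm thickness, everything (refraction index, focal length, distance, pixel pitch) is an illustrative assumption, not a value from the question:

#include <cmath>
#include <cstdio>

int main()
{
    const double pi  = 3.14159265358979;
    const double t   = 3.0;    // glass thickness [mm], from the question
    const double n   = 1.5;    // assumed refractive index of the glass
    const double thetaAir   = 45.0 * pi / 180.0;          // "extreme" target tilt
    const double thetaGlass = std::asin(std::sin(thetaAir) / n);

    // Apparent lateral shift of a corner "painted" on the far face of the glass.
    const double shiftTarget = t * (std::tan(thetaAir) - std::tan(thetaGlass)); // mm

    const double f = 8.0;          // assumed lens focal length [mm]
    const double Z = 500.0;        // assumed camera-to-target distance [mm]
    const double pixel = 0.003;    // assumed pixel pitch [mm] (3 um)

    const double shiftSensor = shiftTarget * f / Z;        // mm on the sensor
    std::printf("shift at target: %.2f mm, on sensor: %.1f px\n",
                shiftTarget, shiftSensor / pixel);
    return 0;
}

With these made-up numbers the shift works out to roughly 1.4 mm on the target, i.e. several pixels on the sensor, so it would be measurable; with a longer working distance or thinner glass it may well drop below a pixel.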
If the above calculation suggests that the effect is measurable (deflection >= 1 pixel), you will need to add the glass to your scene model and solve for its parameters in the bundle adjustment phase, along with the intrinsics and extrinsics. To begin with, I'd use two parameters, thickness and refraction coefficient, and assume both faces are really planar and parallel. It will just make the computation of the corner projections in the cost function a little more complicated, as you'll have to take the ray deflection into account.
Given the extra complexity of the cost function, I'd definitely write the model's code to use Automatic Differentiation (AD).
If you really want to go through this exercise, I'd recommend writing the solver on top of Google Ceres bundle adjuster, which supports AD, among many nice things.
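A bare-bones sketch of what such a Ceres residual could look like (only a plain pinhole projection is implemented here; the glass refraction would be inserted where noted, and the parameter layout and names are assumptions, not a finished model):

#include <ceres/ceres.h>
#include <ceres/rotation.h>

struct CheckerCornerResidual {
  CheckerCornerResidual(double u, double v, double X, double Y)
      : u_(u), v_(v), X_(X), Y_(Y) {}

  template <typename T>
  bool operator()(const T* const pose,        // angle-axis (3) + translation (3)
                  const T* const intrinsics,  // f, cx, cy
                  const T* const glass,       // thickness, refraction index
                  T* residuals) const {
    const T board_point[3] = {T(X_), T(Y_), T(0)};
    T p[3];
    ceres::AngleAxisRotatePoint(pose, board_point, p);
    p[0] += pose[3]; p[1] += pose[4]; p[2] += pose[5];

    // TODO: bend the ray at the glass-to-air interface here,
    // using glass[0] (thickness) and glass[1] (refraction index).
    (void)glass;

    residuals[0] = intrinsics[0] * p[0] / p[2] + intrinsics[1] - T(u_);
    residuals[1] = intrinsics[0] * p[1] / p[2] + intrinsics[2] - T(v_);
    return true;
  }

  static ceres::CostFunction* Create(double u, double v, double X, double Y) {
    return new ceres::AutoDiffCostFunction<CheckerCornerResidual, 2, 6, 3, 2>(
        new CheckerCornerResidual(u, v, X, Y));
  }

  double u_, v_, X_, Y_;
};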

3D reconstruction C++ with OpenCV..Fundamental Matrix too large

OK, I am posting my conundrums of life to Stack Overflow after 4 days of mindless programming when nothing seems to get things right, or at least close to right. Sorry for being a little dramatic, but I feel like a lousy programmer today.
Anyway, my problem is:
To obtain Fundamental matrix using RANSAC (N>8).
I have two images with a wide baseline but sufficient overlap, so that an adequate number of SURF keypoints (~308) are matched correctly (I plot them).
Now lies the problem. I pass the 2D points to cv::findFundamentalMat but I get completely baseless results. The function returns:
FundMat=[2.05148e-13 3.72341 -2.03671e+10
1.6701e+26 -4.17712 4.59533e+29
3.32414e+18 2.8843 1.91069e-26]
To circumvent the large dynamic range of the matrix, Hartley suggested normalising the data points (a normalization in Euclidean space, not in projective space). Even after doing that, the result is almost the same (10^-9 to 10^9).
I understand that FundMat is accurate only up to scale, but a spread from 10^-9 to 10^+9 is too much.
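For reference, a sketch of the kind of normalization I mean (simplified, not my actual code): translate to the centroid and scale so the mean distance from the origin is sqrt(2), with T mapping the original homogeneous points to the normalized ones.

#include <opencv2/core.hpp>
#include <cmath>
#include <vector>

cv::Matx33d normalizePoints(const std::vector<cv::Point2d>& pts,
                            std::vector<cv::Point2d>& normalized)
{
    cv::Point2d centroid(0, 0);
    for (const auto& p : pts) centroid += p;
    centroid.x /= pts.size(); centroid.y /= pts.size();

    double meanDist = 0.0;
    for (const auto& p : pts)
        meanDist += std::sqrt((p.x - centroid.x) * (p.x - centroid.x) +
                              (p.y - centroid.y) * (p.y - centroid.y));
    meanDist /= pts.size();

    const double s = std::sqrt(2.0) / meanDist;
    normalized.clear();
    for (const auto& p : pts)
        normalized.push_back(cv::Point2d(s * (p.x - centroid.x),
                                         s * (p.y - centroid.y)));

    return cv::Matx33d(s, 0, -s * centroid.x,
                       0, s, -s * centroid.y,
                       0, 0, 1);
}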
I referred to other questions here but I don't seem to get any leads:
findfundamentalmatrix-doesnt-find-fundamental-matrix
how-to-calculate-the-fundamental-matrix-for-stereo-vision
Any ideas would be great. This is a very important step when considering uncalibrated images for the rest of the software pipeline.
In case the code is helpful (it's not indented and colored though - space is too limited here):
https://sites.google.com/site/3drecon124/
It's solved... silly human error. There was a data type conversion from double to float, which caused data to be fetched from incorrect locations in memory. Now it's smooth and the epipolar constraint is satisfied up to scale.
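For reference, a minimal call that lets OpenCV keep the types consistent (assuming the matched keypoint locations are stored as cv::Point2f) would be something like:

#include <opencv2/calib3d.hpp>
#include <vector>

cv::Mat estimateF(const std::vector<cv::Point2f>& pts1,
                  const std::vector<cv::Point2f>& pts2)
{
    std::vector<uchar> inlierMask;
    return cv::findFundamentalMat(pts1, pts2, cv::FM_RANSAC,
                                  3.0,   // max distance to epipolar line [px]
                                  0.99,  // RANSAC confidence
                                  inlierMask);
}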