I have implemented the P3P algorithm described in the following paper: "A Novel Parametrization of the Perspective-Three-Point Problem for a Direct Computation of Absolute Camera Position and Orientation".
However, the procedure provides 4 solutions, i.e., four (translation, orientation) combinations.
Now I am supposed to disambiguate the 4 solutions and get a unique solution by BACK PROJECTION OF A FOURTH POINT.
My understanding is that back projection means taking the fourth point and re-projecting it onto the image plane. But how is that going to help me find a unique solution from the above 4?
Any help would be much appreciated.
Thanks
Max
I might have an idea of what this means.
Basically, you take a fourth world point and re-project it using all four rotation+translation candidates. Then you choose the solution whose re-projection lands closest to the observed image position of that fourth point.
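As a rough sketch (not code from the paper; the intrinsics K, the four candidate poses, the fourth world point X4 and its observed image location u4 are all assumed given, and the names are just placeholders):

    // Pick the pose that best re-projects the fourth point.
    #include <cfloat>

    struct Vec3 { double x, y, z; };
    struct Vec2 { double u, v; };
    struct Pose { double R[3][3]; double t[3]; };   // world -> camera

    // Project a world point: x ~ K * (R*X + t), then divide by depth.
    // Assumes standard intrinsics K = [fx s cx; 0 fy cy; 0 0 1].
    Vec2 project(const Pose& p, const Vec3& X, const double K[3][3]) {
        double Xw[3] = { X.x, X.y, X.z };
        double c[3];                                 // camera-frame point
        for (int i = 0; i < 3; ++i)
            c[i] = p.R[i][0]*Xw[0] + p.R[i][1]*Xw[1] + p.R[i][2]*Xw[2] + p.t[i];
        double x = K[0][0]*c[0] + K[0][1]*c[1] + K[0][2]*c[2];
        double y =                K[1][1]*c[1] + K[1][2]*c[2];
        return { x / c[2], y / c[2] };
    }

    // Index of the candidate with the smallest reprojection error.
    int disambiguate(const Pose cand[4], const Vec3& X4, const Vec2& u4,
                     const double K[3][3]) {
        int best = 0;
        double bestErr = DBL_MAX;
        for (int i = 0; i < 4; ++i) {
            Vec2 p = project(cand[i], X4, K);
            double du = p.u - u4.u, dv = p.v - u4.v;
            double err = du*du + dv*dv;              // squared pixel error
            if (err < bestErr) { bestErr = err; best = i; }
        }
        return best;
    }

You can also reject outright any candidate that puts the fourth point behind the camera (depth c[2] <= 0).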
I'm writing to ask about homography and perspective projection.
I'm trying to write a piece of code that will "warp" my image so that its corners align with 4 reference points in 3D space. However, the game engine I'm running it in already gives me their screen positions, so I have the screen-space coordinates of both the source corners (xi, yi) and the targets (ui, vi), normalized to values between 0 and 1.
I have to mention that I don't have a degree in mathematics, which seems to be a requirement in the posts I've seen on this topic so far, but I'm hoping there is actually a solution to this problem that one can comprehend. I never had a chance to take classes in Computer Vision.
The reason I came here is that in all the posts I've seen online, the simple explanation I came across is that each point must be written as a 3x1 homogeneous vector and multiplied by a 3x3 homography, which consists of 9 components h1,h2,h3...h9, and this transformation matrix will transform each point to the correct perspective. And that's where I'm hitting a brick wall - how do I calculate the transformation matrix? It feels like it should be a relatively simple algebraic task, but apparently it's not.
At this point I've spent days reading on the topic, and the solutions I've come across are either based on Matlab (which has a ton of mathematical functions built in), or include elaborations and discussions that don't really explain much; sometimes they suggest tons of different parameters and simplifications but rarely explain why or what their purpose is, or they reference books and studies that have since been removed from the web, and I found myself more confused than I was in the beginning. Most of the resources I managed to find online were also made in a different context - image stitching and 3D engine development.
I also want to mention that I need to run this code each frame on the CPU, and I'm fairly concerned about the effect of having to run too many matrix transformations and solving a ton of linear algebra equations.
I apologize for not asking about any specific code, but my general question is - can anyone point me in the right direction with this issue?
Limit the problem you deal with.
For example, if you always warp the entire rectangular image, you can treat the coordinates of the image corners as {(0,0), (1,0), (0,1), (1,1)}.
This simplifies the equations enough that you can solve them yourself and then implement the answer.
Note: a homography is scale invariant, so you can reduce the degrees of freedom to 8 (e.g., you can solve the equations under h9 = 1).
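For example, here is a minimal C++ sketch of the resulting closed form (my own formulation, not from a specific library), with the target corners (x[i], y[i]) given in the order that the unit-square corners (0,0), (1,0), (1,1), (0,1) map to them:

    // Closed-form homography from the unit square to a quad, with h9 = 1.
    struct Homography { double a, b, c, d, e, f, g, h; }; // [a b c; d e f; g h 1]

    Homography squareToQuad(const double x[4], const double y[4]) {
        Homography H;
        double dx1 = x[1]-x[2], dx2 = x[3]-x[2], dx3 = x[0]-x[1]+x[2]-x[3];
        double dy1 = y[1]-y[2], dy2 = y[3]-y[2], dy3 = y[0]-y[1]+y[2]-y[3];
        if (dx3 == 0.0 && dy3 == 0.0) {     // target is a parallelogram: affine
            H.a = x[1]-x[0]; H.b = x[2]-x[1]; H.c = x[0];
            H.d = y[1]-y[0]; H.e = y[2]-y[1]; H.f = y[0];
            H.g = 0.0;       H.h = 0.0;
        } else {                            // full projective case
            double det = dx1*dy2 - dx2*dy1;
            H.g = (dx3*dy2 - dx2*dy3) / det;
            H.h = (dx1*dy3 - dx3*dy1) / det;
            H.a = x[1]-x[0] + H.g*x[1];
            H.b = x[3]-x[0] + H.h*x[3];
            H.c = x[0];
            H.d = y[1]-y[0] + H.g*y[1];
            H.e = y[3]-y[0] + H.h*y[3];
            H.f = y[0];
        }
        return H;
    }

    // Map a point (u,v) in the unit square into the quad.
    void apply(const Homography& H, double u, double v, double& X, double& Y) {
        double w = H.g*u + H.h*v + 1.0;
        X = (H.a*u + H.b*v + H.c) / w;
        Y = (H.d*u + H.e*v + H.f) / w;
    }

Since this is just a handful of multiplications, recomputing it every frame on the CPU is cheap - no general linear solver needed.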
Best advice I can give: read a good book on the subject. For example, "Multiple View Geometry" by Hartley and Zisserman
I realize there are many cans of worms related to what I'm asking, but I have to start somewhere. Basically, what I'm asking is:
Given two photos of a scene, taken with unknown cameras, to what extent can I determine the (relative) warping between the photos?
Below are two images of the 1904 World's Fair. They were taken at different levels on the wireless telegraph tower, so the cameras are more or less vertically in line. My goal is to create a model of the area (in Blender, if it matters) from these and other photos. I'm not looking for a fully automated solution, e.g., I have no problem with manually picking points and features.
Over the past month, I've taught myself what I can about projective transformations and epipolar geometry. For some pairs of photos, I can do pretty well by finding the fundamental matrix F from point correspondences. But the two below are causing me problems. I suspect that there's some sort of warping - maybe just an aspect ratio change, maybe more than that.
My process is as follows:
I find correspondences between the two photos (the red jagged lines seen below).
I run the point pairs through Matlab (actually Octave) to find the epipoles. Currently, I'm using Peter Kovesi's Functions for Computer Vision.
In Blender, I set up two cameras with the images overlaid. I orient the first camera based on the vanishing points. I also determine the focal lengths from the vanishing points. I orient the second camera relative to the first using the epipoles and one of the point pairs (below, the point at the top of the bandstand).
For each point pair, I project a ray from each camera through its sample point, and mark the closest convergence of the pair (in light yellow below). I realize that this leaves out information from the fundamental matrix - see below.
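(For reference, the "closest convergence" I compute is the midpoint of the shortest segment between the two rays, roughly like this:)

    // Midpoint of the shortest segment between rays P1 + s*d1 and P2 + t*d2.
    struct V3 { double x, y, z; };
    V3 sub(V3 a, V3 b) { return { a.x-b.x, a.y-b.y, a.z-b.z }; }
    double dot(V3 a, V3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

    V3 closestConvergence(V3 P1, V3 d1, V3 P2, V3 d2) {
        V3 w0 = sub(P1, P2);
        double a = dot(d1, d1), b = dot(d1, d2), c = dot(d2, d2);
        double d = dot(d1, w0), e = dot(d2, w0);
        double denom = a*c - b*b;          // ~0 when the rays are parallel
        double s = (b*e - c*d) / denom;
        double t = (a*e - b*d) / denom;
        return { (P1.x + s*d1.x + P2.x + t*d2.x) * 0.5,
                 (P1.y + s*d1.y + P2.y + t*d2.y) * 0.5,
                 (P1.z + s*d1.z + P2.z + t*d2.z) * 0.5 };
    }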
As you can see, the points don't converge very well. The ones from the left spread out the further you go horizontally from the bandstand point. I'm guessing that this shows differences in the camera intrinsics. Unfortunately, I can't find a way to find the intrinsics from an F derived from point correspondences.
In the end, I don't think I care about the individual intrinsics per se. What I really need is a way to apply the intrinsics to "correct" the images so that I can use them as overlays to manually refine the model.
Is this possible? Do I need other information? Obviously, I have little hope of finding anything about the camera intrinsics. There is some obvious structural info though, such as which features are orthogonal. I saw a hint somewhere that the vanishing points can be used to further refine or upgrade the transformations, but I couldn't find anything specific.
Update 1
I may have found a solution, but I'd like someone with some knowledge of the subject to weigh in before I post it as an answer. It turns out that Peter's Functions for Computer Vision has a function for doing a RANSAC estimate of the homography from the sample points. Using m2 = H*m1, I should be able to plot the mapping of m1 -> m2 over top of the actual m2 points on the second image.
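(Concretely, what I'm computing for each point is a homogeneous multiply followed by a divide, along these lines:)

    // Map image points through a 3x3 homography H and dehomogenize.
    #include <vector>

    struct Pt { double x, y; };

    std::vector<Pt> mapThroughH(const double H[3][3], const std::vector<Pt>& m1) {
        std::vector<Pt> out;
        out.reserve(m1.size());
        for (const Pt& p : m1) {
            double X = H[0][0]*p.x + H[0][1]*p.y + H[0][2];
            double Y = H[1][0]*p.x + H[1][1]*p.y + H[1][2];
            double W = H[2][0]*p.x + H[2][1]*p.y + H[2][2];
            out.push_back({ X / W, Y / W }); // divide before plotting over m2
        }
        return out;
    }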
The only problem is, I'm not sure I believe what I'm seeing. Even on an image pair that lines up pretty well using the epipoles from F, the mapping from the homography looks pretty bad.
I'll try to capture an understandable image, but is there anything wrong with my reasoning?
A couple answers and suggestions (in no particular order):
A homography will only correctly map between point correspondences when either (a) the camera undergoes a pure rotation (no translation) or (b) the corresponding points are all co-planar.
The fundamental matrix only relates uncalibrated cameras; without the intrinsics it pins down the reconstruction only up to a projective ambiguity. The process of recovering a camera's calibration parameters (intrinsics) from unknown scenes, known as "auto-calibration", is a rather difficult problem. You'd need these parameters (focal length, principal point) to correctly reconstruct the scene.
If you have (many) more images of this scene, you could try using a system such as Visual SFM: http://ccwu.me/vsfm/ It will attempt to automatically solve the Structure From Motion problem, including point matching, auto-calibration and sparse 3D reconstruction.
I'm doing spline interpolation in C++. I've used code from here: http://tehc0dez.blogspot.ch/2010/04/nice-curves-catmullrom-spline-in-c.html (the code is also linked on that page; it's up on GitHub). The app works just fine for closed contours, since it copies the first three points to the end.
But in my case I need to be able to make an "open" shape, or rather a line, where the first and last points are not connected.
It is my understanding that since the Catmull-Rom spline is cubic, I won't be able to calculate the interpolated points for the first and last segment without adding any additional points.
I read that a common method to interpolate the points in those two segments is to use Quadratic Interpolation.
Unfortunately I can't wrap my head around how to do this. I've found out how to do quadratic Bezier approximation, but this is not what I want to do since I don't want to introduce any additional support points.
I found this site, which explains quite nicely how to do quadratic interpolation: http://dafeda.wordpress.com/2010/09/01/newtons-divided-difference-polynomial-quadratic-interpolation/ But I don't know how to adapt it to my case, where I want to calculate a new point rather than just y.
Any help would be appreciated. Thanks !
The usual way to do this is to add a second copy of your two end points... So if you have a spline passing through A-B-C-D then you will calculate the spline A-A-B-C-D-D.
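A minimal sketch of that idea (2D for brevity; the same blending works per coordinate in 3D):

    // Catmull-Rom through all points of an open polyline, with the two end
    // points duplicated so every segment has the four neighbors it needs.
    #include <vector>

    struct Pt { double x, y; };

    Pt catmullRom(Pt p0, Pt p1, Pt p2, Pt p3, double t) {
        double t2 = t*t, t3 = t2*t;
        auto blend = [&](double a, double b, double c, double d) {
            return 0.5 * (2.0*b + (-a + c)*t
                          + (2.0*a - 5.0*b + 4.0*c - d)*t2
                          + (-a + 3.0*b - 3.0*c + d)*t3);
        };
        return { blend(p0.x, p1.x, p2.x, p3.x), blend(p0.y, p1.y, p2.y, p3.y) };
    }

    std::vector<Pt> openCatmullRom(std::vector<Pt> pts, int stepsPerSegment) {
        Pt first = pts.front(), last = pts.back();
        pts.insert(pts.begin(), first);    // A-B-C-D becomes A-A-B-C-D-D
        pts.push_back(last);
        std::vector<Pt> curve;
        for (size_t i = 0; i + 3 < pts.size(); ++i)
            for (int s = 0; s <= stepsPerSegment; ++s)
                curve.push_back(catmullRom(pts[i], pts[i+1], pts[i+2],
                                           pts[i+3], double(s)/stepsPerSegment));
        return curve;
    }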
Managed to implement a decent solution thanks to the formula found here: http://www.doc.ic.ac.uk/~dfg/AndysSplineTutorial/Parametrics.html
They also provide a nice Java applet to check out the different parameters.
For my problem I set the t1 value to 0.5 and check if t is above/below this threshold, since I only want to draw one segment of the curve! Works out nicely.
I have N points in 3-dimensional space. I need to join them with a curve. However, if I simply connect them with straight line segments, the result is not smooth and looks ugly.
My current approach is to use a Bezier curve, using the de Casteljau algorithm for 4 points, and running that for each group of 4 points in my data set. However, the problem with this is that since I run it on points 1-4, 5-8, 9-12, etc. separately, the line is not smooth between 4-5, 8-9, etc.
I also looked for other approaches; specifically I found this article about Catmull-Rom splines, which seem even better suited for my purpose, because the curve passes through all control points, unlike the Bezier curve. So I almost started implementing that, but then, I saw on that site that the formula works "assuming uniform spacing of control points". That is not the case for my problem.
So, my question is, what approach should I use -- Bezier, Catmull-Rom, or something completely different? If Bezier, then how to fix the non-smoothness between 4-5, 8-9, etc.? If Catmull-Rom, why won't the formula work if points are not evenly spaced, and what do I need instead?
EDIT: I am now pretty sure I want the Catmull-Rom spline, as it passes through every control point, which is an advantage for my application. Therefore, the main question I would like answered is: why won't the formula on the link I provided work for non-uniformly spaced control points?
Thanks.
A couple of solutions:
Use a B-spline. This is a generalization of Bezier curves (a Bezier curve is a B-spline with no internal knot points.)
Use a cubic spline. Cubic splines are particularly easy to calculate. A cubic spline is continuous in the zeroth, first, and second derivatives across the control points. The third derivative, the cubic term, suffers a discontinuity at the control points, but it is very hard to see those discontinuities.
One key difference between a B-spline and a cubic spline is that the cubic spline will pass through all of the control points, while a B-spline does not. One way to think about it: Those internal control points are just suggestions for a B-spline but are mandatory for a cubic spline.
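For what it's worth, a natural cubic spline is short to implement. Here is a sketch for one coordinate (for a 3D curve, build one spline each for x(t), y(t), z(t) against a shared parameter t, e.g. cumulative chord length):

    // Natural cubic spline (S'' = 0 at the ends), standard tridiagonal
    // algorithm; needs at least two points and strictly increasing t.
    #include <vector>

    struct Spline {       // on [t_i, t_i+1]: y_i + b_i*h + c_i*h^2 + d_i*h^3
        std::vector<double> t, y, b, c, d;
    };

    Spline buildNaturalSpline(const std::vector<double>& t,
                              const std::vector<double>& y) {
        size_t n = t.size() - 1;
        std::vector<double> h(n), alpha(n), l(n+1), mu(n+1), z(n+1);
        Spline s{t, y, std::vector<double>(n+1),
                 std::vector<double>(n+1), std::vector<double>(n+1)};
        for (size_t i = 0; i < n; ++i) h[i] = t[i+1] - t[i];
        for (size_t i = 1; i < n; ++i)
            alpha[i] = 3.0*(y[i+1]-y[i])/h[i] - 3.0*(y[i]-y[i-1])/h[i-1];
        l[0] = 1.0; mu[0] = 0.0; z[0] = 0.0;
        for (size_t i = 1; i < n; ++i) {          // forward sweep
            l[i] = 2.0*(t[i+1]-t[i-1]) - h[i-1]*mu[i-1];
            mu[i] = h[i] / l[i];
            z[i] = (alpha[i] - h[i-1]*z[i-1]) / l[i];
        }
        s.c[n] = 0.0;
        for (size_t j = n; j-- > 0; ) {           // back substitution
            s.c[j] = z[j] - mu[j]*s.c[j+1];
            s.b[j] = (y[j+1]-y[j])/h[j] - h[j]*(s.c[j+1] + 2.0*s.c[j])/3.0;
            s.d[j] = (s.c[j+1] - s.c[j]) / (3.0*h[j]);
        }
        return s;
    }

    double eval(const Spline& s, size_t i, double x) { // x in [t_i, t_i+1]
        double h = x - s.t[i];
        return s.y[i] + s.b[i]*h + s.c[i]*h*h + s.d[i]*h*h*h;
    }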
A meaningful line (although not the simplest to evaluate) can be found via Gaussian Processes. You set (or infer) the lengthscale over which you wish the line to vary (i.e. the smoothness of the line) and then the GP line is the most probable line through the data given the lengthscale. You can add noise to the model if you don't mind the line not passing through the data points.
It's a nice interpolation method because you can also obtain the standard deviation of your line. The line becomes more uncertain where you don't have much data in the vicinity.
You can read about them in chapter 45 of David MacKay's Information Theory, Inference, and Learning Algorithms - which you can download from the author's website here.
One solution is the generalized approach for N control points described on the following Wikipedia page: http://en.wikipedia.org/wiki/Bézier_curve
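As a sketch, that generalized evaluation (de Casteljau for N control points) is just repeated linear interpolation:

    // Evaluate a Bezier curve with an arbitrary number of control points
    // at parameter t in [0,1] by repeated linear interpolation.
    #include <vector>

    struct Pt3 { double x, y, z; };

    Pt3 bezierPoint(std::vector<Pt3> ctrl, double t) {
        // Each pass shrinks the working set by one until one point remains.
        for (size_t level = ctrl.size() - 1; level > 0; --level)
            for (size_t i = 0; i < level; ++i)
                ctrl[i] = { ctrl[i].x + t*(ctrl[i+1].x - ctrl[i].x),
                            ctrl[i].y + t*(ctrl[i+1].y - ctrl[i].y),
                            ctrl[i].z + t*(ctrl[i+1].z - ctrl[i].z) };
        return ctrl[0];
    }

Keep in mind, though, that a single Bezier curve with many control points only passes through the first and last of them, which matters if you need the curve to interpolate every point.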
I apologize for the length of this question and give pre-emptive thanks to anyone who reads through it!
So I've spent the last few days going over the GJK algorithm. I understand the general concepts behind it, and most of the nitty-gritty details of its implementation in 2D, thanks to the wonderful article by William Bittle at http://www.codezealot.org/archives/88 .
I've implemented his pseudocode (found at the end of the article) in my own C++ project; however, I want to make a 3D implementation. My weakness is in using the dot products to test the Voronoi regions and the tripleProduct calls to get perpendicular lines, but I'm trying to read up more on that.
My problem comes down to the containsOrigin function. I'm having trouble visualizing and accounting for the new Voronoi regions that the z axis adds. I just can't seem to wrap my head around how to determine which region contains the origin. I assume there are 4 I have to account for, each extending from one of the triangular planes that comprise the 4 faces of the tetrahedron simplex. If the origin is not within any of those regions, then it is contained, and we have a collision.
How do I go about testing whether it is contained in a particular Voronoi region, i.e., which triangular face is pointing in the direction of the origin?
The current 2D algorithm checks if a triangle is made; if not, the simplex is a line and it finds the 3rd point. I assume the 3D algorithm will check if a tetrahedron is made; if not, it will check for a triangle, and if there is one, it will find a 4th point to make a tetrahedron (how would I get this? using a normal in the direction of the origin?). If a triangle isn't made, it will find a 3rd point to make a triangle (do I still use the triple product for this like in 2D?).
Any suggestions, outlines, resources, code augmentations, or comments are much appreciated.
Depending on what result you expect from the GJK algorithm you might want to look at this nice tutorial from Molly Rocket: https://mollyrocket.com/849
Be aware, though, that his implementation only reports whether an intersection exists (yes/no). But it might be a nice start.
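To make the tetrahedron case concrete, here is a rough sketch (my own, with degenerate simplices ignored) of one common way to write containsOrigin in 3D, where a is the most recently added simplex point:

    struct V3 { float x, y, z; };
    V3 sub(V3 a, V3 b) { return { a.x-b.x, a.y-b.y, a.z-b.z }; }
    V3 neg(V3 a) { return { -a.x, -a.y, -a.z }; }
    float dot(V3 a, V3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
    V3 cross(V3 a, V3 b) {
        return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
    }

    // Returns true if the origin is inside tetrahedron abcd. Otherwise keeps
    // the triangle whose outer region holds the origin and sets dir to that
    // face's outward normal for the next support-point search.
    bool containsOrigin(V3 a, V3& b, V3& c, V3 d, V3& dir) {
        V3 ao = neg(a);                    // from a toward the origin
        // Only the three faces containing a need checking: a was the last
        // support point toward the origin, so the origin isn't beyond bcd.
        V3 face[3][3]  = { { a, b, c }, { a, c, d }, { a, d, b } };
        V3 opposite[3] = { d, b, c };
        for (int i = 0; i < 3; ++i) {
            V3 n = cross(sub(face[i][1], face[i][0]),
                         sub(face[i][2], face[i][0]));
            if (dot(n, sub(opposite[i], face[i][0])) > 0)
                n = neg(n);                // make the normal point outward
            if (dot(n, ao) > 0) {          // origin lies beyond this face
                b = face[i][1]; c = face[i][2]; // drop to triangle a,b,c
                dir = n;
                return false;
            }
        }
        return true;                       // enclosed on all sides: collision
    }

And to your parenthetical question: yes - to grow a triangle into a tetrahedron you search along the triangle's normal, flipped toward the origin if it points away.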