Drawing fractal flames

Let me start off with the link to the PDF I've been reading to kickstart me into drawing fractal flames.
http://flam3.com/flame_draves.pdf
Following Draves' pseudo code I had no trouble drawing Sierpinski's Gasket using the three provided functions:
F0(x, y) = ( x/2 , y/2 )
F1(x, y) = ( (x+1)/2 , y/2 )
F2(x, y) = ( x/2 , (y+1)/2 )
Pseudo code:
(x, y) = a random point in the bi-unit square
iterate {
    i = a random integer from 0 to n - 1 inclusive
    (x, y) = Fi(x, y)
    plot (x, y) except during the first 20 iterations
}
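For reference, here is a minimal runnable version of this chaos game in Python (my own sketch, not from the paper; matplotlib is assumed for plotting):

import random
import matplotlib.pyplot as plt

# the three affine maps of Sierpinski's gasket, as in the paper
F = [
    lambda x, y: (x / 2, y / 2),
    lambda x, y: ((x + 1) / 2, y / 2),
    lambda x, y: (x / 2, (y + 1) / 2),
]

x, y = random.uniform(-1, 1), random.uniform(-1, 1)  # random point in the bi-unit square
xs, ys = [], []
for it in range(50000):
    x, y = random.choice(F)(x, y)
    if it >= 20:  # skip the first 20 iterations while the point settles onto the attractor
        xs.append(x)
        ys.append(y)
plt.plot(xs, ys, ',')
plt.gca().set_aspect('equal')
plt.show()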
From what I understand, fractal flames are made by applying variations (non-affine functions), but if we look at the catalog of variations in the appendix, the first image (Variation 0) is supposedly made with the identity variation.
Now I can't wrap my head around how an image like that could have been created using only one function, and an identity function at that. (Wouldn't it just plot a single point forever, since we would be applying the identity function to a randomly chosen point?)
It's not clear to me whether I should use the same pseudo code as for Sierpinski's gasket, or whether there is some other catch I'm not seeing here.
Edit: Here's the end product containing the fractal flames image generator written in Java: https://github.com/xtrinch/fractal_generator

Now I can't wrap my head around how an image like that could have been created using only one function, and an identity function at that. (Wouldn't it just plot a single point forever, since we would be applying the identity function to a randomly chosen point?)
The variations aren't used on their own, however. Rather, they are applied to the results of the affine transformations. Refer to the formula at the bottom of page 4 under the heading "3 Variations":
Fi(x, y) = Vj(ai*x + bi*y + ci, di*x + ei*y + fi)
If j is 0 and you are using variation 0:
V0(x, y) = (x, y)
then
Fi(x, y) = (ai*x + bi*y + ci, di*x + ei*y + fi)
In other words, using variation 0 just means a simple affine transform with no additional variation applied to it. a, b, c, d, e, f are the affine transform parameters for Fi, and you will have one set of them for each of your n functions.
If j is 1 then the variation function is:
V1(x, y) = (sin(x), sin(y))
so
Fi(x, y) = (sin(ai*x + bi*y + ci), sin(di*x + ei*y + fi))
and so on.
As mentioned a bit further on, each of your n functions can have a blending parameter vij for each variation Vj, so you compute your affine transformation, apply all the different variation functions to the affine-transformed point, and then blend the results together according to the blending factors.
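As a sketch of that pipeline (my own illustration, not the paper's code; the names coeffs and v are assumptions, and only the first two variations are shown):

import math

def apply_function(i, x, y, coeffs, v):
    # One flame step: affine transform i, then blended variations.
    # coeffs[i] = (a, b, c, d, e, f); v[i][j] = blending weight of variation j.
    a, b, c, d, e, f = coeffs[i]
    ax, ay = a * x + b * y + c, d * x + e * y + f   # affine part
    variations = [
        lambda px, py: (px, py),                      # V0: linear (identity)
        lambda px, py: (math.sin(px), math.sin(py)),  # V1: sinusoidal
    ]
    nx = ny = 0.0
    for j, var in enumerate(variations):
        vx, vy = var(ax, ay)   # each variation sees the affine-transformed point
        nx += v[i][j] * vx
        ny += v[i][j] * vy
    return nx, ny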


Determine if points are within a rotated rectangle (standard Python 2.7 library only) [duplicate]

I have a rotated rectangle with these coordinates as vertices:
1 670273 4879507
2 677241 4859302
3 670388 4856938
4 663420 4877144
And I have points with these coordinates:
670831 4867989
675097 4869543
Using only the Python 2.7 standard library, I want to determine if the points fall within the rotated rectangle.
I am not able to add additional Python libraries to my Jython implementation.
What would it take to do this?
A line equation of the form ax+by+c==0 can be constructed from 2 points. For a given point to be inside a convex shape, we need to test whether it lies on the same side of every line defined by the shape's edges.
In pure Python code, taking care to write the equations while avoiding divisions, this could look as follows:
def is_on_right_side(x, y, xy0, xy1):
    x0, y0 = xy0
    x1, y1 = xy1
    a = float(y1 - y0)
    b = float(x0 - x1)
    c = -a*x0 - b*y0
    return a*x + b*y + c >= 0

def test_point(x, y, vertices):
    num_vert = len(vertices)
    is_right = [is_on_right_side(x, y, vertices[i], vertices[(i + 1) % num_vert]) for i in range(num_vert)]
    all_left = not any(is_right)
    all_right = all(is_right)
    return all_left or all_right

vertices = [(670273, 4879507), (677241, 4859302), (670388, 4856938), (663420, 4877144)]
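Applied to the two points from the question using the functions above (if I've run the numbers correctly, the first point lands inside and the second outside):

points = [(670831, 4867989), (675097, 4869543)]
for x, y in points:
    print((x, y), test_point(x, y, vertices))
# (670831, 4867989) True   (inside)
# (675097, 4869543) False  (outside)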
The following plot tests the code visually for several shapes. Note that for shapes with horizontal or vertical edges, the usual slope-intercept line equations could provoke a division by zero.
import matplotlib.pyplot as plt
import numpy as np

vertices1 = [(670273, 4879507), (677241, 4859302), (670388, 4856938), (663420, 4877144)]
vertices2 = [(680000, 4872000), (680000, 4879000), (690000, 4879000), (690000, 4872000)]
vertices3 = [(655000, 4857000), (655000, 4875000), (665000, 4857000)]
k = np.arange(6)
r = 8000
vertices4 = np.vstack([690000 + r * np.cos(k * 2 * np.pi / 6), 4863000 + r * np.sin(k * 2 * np.pi / 6)]).T
all_shapes = [vertices1, vertices2, vertices3, vertices4]
for vertices in all_shapes:
    plt.plot([x for x, y in vertices] + [vertices[0][0]], [y for x, y in vertices] + [vertices[0][1]], 'g-', lw=3)
for x, y in zip(np.random.randint(650000, 700000, 1000), np.random.randint(4855000, 4880000, 1000)):
    color = 'turquoise'
    for vertices in all_shapes:
        if test_point(x, y, vertices):
            color = 'tomato'
    plt.plot(x, y, '.', color=color)
plt.gca().set_aspect('equal')
plt.show()
PS: In case you are running a 32-bit version of numpy, with integers of this size it might be necessary to convert the values to float to avoid overflow.
If this calculation needs to happen very often, the a,b,c values can be precalculated and stored. If the direction of the edges is known, only one of all_left or all_right is needed.
When the shape is fixed, a text version of the function can be generated:
def generate_test_function(vertices, is_clockwise=True, function_name='test_function'):
    ext_vert = list(vertices) + [vertices[0]]
    unequality_sign = '>=' if is_clockwise else '<='
    print(f'def {function_name}(x, y):')
    parts = []
    for (x0, y0), (x1, y1) in zip(ext_vert[:-1], ext_vert[1:]):
        a = float(y1 - y0)
        b = float(x0 - x1)
        c = a * x0 + b * y0
        parts.append(f'({a}*x + {b}*y {unequality_sign} {c})')
    print('    return', ' and '.join(parts))

vertices = [(670273, 4879507), (677241, 4859302), (670388, 4856938), (663420, 4877144)]
generate_test_function(vertices)
This would generate a function as:
def test_function(x, y):
    return (-20205.0*x + -6968.0*y >= -47543270741.0) and (-2364.0*x + 6853.0*y >= 31699798882.0) and (20206.0*x + 6968.0*y >= 47389003912.0) and (2363.0*x + -6853.0*y >= -31855406372.0)
This function can then be copy-pasted and optimized by the Jython compiler. Note that the shape doesn't need to be rectangular. Any convex shape will do, allowing you to use a tighter box.
Take three consecutive vertices A, B, C (your 1, 2, 3).
Find the lengths of sides AB and BC:
lAB = sqrt((B.x - A.x)^2 + (B.y - A.y)^2)
lBC = sqrt((C.x - B.x)^2 + (C.y - B.y)^2)
Get the unit (normalized) direction vectors:
uAB = ((B.x - A.x) / lAB, (B.y - A.y) / lAB)
uBC = ((C.x - B.x) / lBC, (C.y - B.y) / lBC)
For a tested point P, get the vector BP:
BP = ((P.x - B.x), (P.y - B.y))
and calculate the signed distances from the sides to the point using cross products:
SignedDistABP = Cross(BP, uAB) = BP.x * uAB.y - BP.y * uAB.x
SignedDistBCP = Cross(BP, uBC) = BP.x * uBC.y - BP.y * uBC.x
For points inside the rectangle, both distances have the same sign - either negative or positive depending on the vertex order (CW or CCW) - and their absolute values are not larger than lBC and lAB respectively:
Abs(SignedDistABP) <= lBC
Abs(SignedDistBCP) <= lAB
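A direct Python translation of these steps (my own sketch; it returns True for the question's first point and False for the second, matching the other answer):

import math

def in_rectangle(P, A, B, C):
    # side lengths
    lAB = math.hypot(B[0] - A[0], B[1] - A[1])
    lBC = math.hypot(C[0] - B[0], C[1] - B[1])
    # unit direction vectors of sides AB and BC
    uAB = ((B[0] - A[0]) / lAB, (B[1] - A[1]) / lAB)
    uBC = ((C[0] - B[0]) / lBC, (C[1] - B[1]) / lBC)
    # vector from corner B to the tested point
    BP = (P[0] - B[0], P[1] - B[1])
    # signed distances to the two sides (cross products)
    dist_ab = BP[0] * uAB[1] - BP[1] * uAB[0]
    dist_bc = BP[0] * uBC[1] - BP[1] * uBC[0]
    # same sign, and within the opposite side lengths
    same_sign = (dist_ab >= 0) == (dist_bc >= 0)
    return same_sign and abs(dist_ab) <= lBC and abs(dist_bc) <= lAB

A, B, C = (670273, 4879507), (677241, 4859302), (670388, 4856938)
print(in_rectangle((670831, 4867989), A, B, C))  # True  (inside)
print(in_rectangle((675097, 4869543), A, B, C))  # False (outside)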
As the shape is an exact rectangle, the easiest approach is to rotate all points by the angle
-arctan((4859302 - 4856938) / (677241 - 670388))
Doing so, the rectangle becomes axis-aligned and you just have to perform four coordinate comparisons. Rotations are easy to compute with complex numbers.
In fact you can simply represent all points as complex numbers, compute the vector defined by some side, and multiply everything by the conjugate.
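A sketch of the complex-number idea (my own; it rotates everything so the side from vertex 2 to vertex 3 becomes axis-aligned, then compares bounding-box coordinates):

# represent points as complex numbers; multiplying by the unit conjugate
# of one side rotates that side onto the real axis
verts = [670273 + 4879507j, 677241 + 4859302j, 670388 + 4856938j, 663420 + 4877144j]
side = verts[2] - verts[1]
rot = side.conjugate() / abs(side)

def inside(p):
    q = p * rot
    rs = [v * rot for v in verts]   # rotated rectangle is now axis-aligned
    xs, ys = [r.real for r in rs], [r.imag for r in rs]
    return min(xs) <= q.real <= max(xs) and min(ys) <= q.imag <= max(ys)

print(inside(670831 + 4867989j))  # True  (inside)
print(inside(675097 + 4869543j))  # False (outside)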
A slightly different approach is to consider the change of coordinate frame that brings some corner to the origin and two incident sides to (1,0) and (0,1). This is an affine transformation. Then your test boils down to checking whether the transformed point lies inside the unit square.

Given N lines on a Cartesian plane. How to find the bottommost intersection of lines efficiently?

I have N distinct lines on a Cartesian plane, given in slope-intercept form, y = mx + c, i.e. the slope and y-intercept of each line are known. I have to find the y-coordinate of the bottommost intersection point of any two lines.
I have implemented an O(N^2) brute-force solution in C++, but it is too slow for N = 10^5. Here is my code:
#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int main() {
    int n;
    cin >> n;
    vector<pair<int, int>> lines(n);
    for (int i = 0; i < n; ++i) {
        int slope, y_intercept;
        cin >> slope >> y_intercept;
        lines[i].first = slope;
        lines[i].second = y_intercept;
    }
    double min_y = 1e9;
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            if (lines[i].first == lines[j].first) // since lines are distinct, two lines with the same slope never intersect
                continue;
            double x = (double) (lines[j].second - lines[i].second) / (lines[i].first - lines[j].first); // x-coordinate of intersection point
            double y = lines[i].first * x + lines[i].second; // y-coordinate of intersection point
            min_y = min(y, min_y);
        }
    }
    cout << min_y << endl;
}
How to solve this efficiently?
In case you are considering solving this by means of Linear Programming (LP), it can be done efficiently, since the solution that minimizes or maximizes the objective function always lies at an intersection of the constraint equations. I will show you how to model this problem as a maximization LP. Suppose you have N = 2 first-degree equations to consider:
y = 2x + 3
y = -4x + 7
then you will set up your simplex tableau like this:
x0  x1  x2  x3 |  b
-2   1   1   0 |  3
 4   1   0   1 |  7
where the x0 column holds the negated coefficient of x from the original first-degree functions, the x1 column holds the coefficient of y (generally +1), the x2 and x3 columns form the N-by-N identity matrix (they are the slack variables), and b holds the value of the independent term. In this case, the constraints are subject to the <= operator.
Now, the objective function should be:
x0  x1  x2  x3
 1   1   0   0
To solve this LP, you may use the "simplex" algorithm which is generally efficient.
Furthermore, the result will be an array containing the value assigned to each variable. In this scenario the solution is:
x0            x1            x2   x3
0.6666666667  4.3333333333  0.0  0.0
The pair (x0, x1) represents the point you are looking for, where x0 is its x-coordinate and x1 is its y-coordinate. There are other possible outcomes; for example, there could exist no solution. You can find out more in plenty of books, such as "Linear Programming and Extensions" by George Dantzig.
Keep in mind that the simplex algorithm only works for non-negative values of x0, x1, ..., xn. This means that before applying the simplex, you must make sure the optimum point you are looking for is not outside of the feasible region.
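As a quick sketch of this setup (my own; the answer itself assumes a hand-rolled simplex), scipy's linprog can solve the same tableau. linprog minimizes, so the objective is negated, and its default bounds x >= 0 match the non-negativity requirement just mentioned:

from scipy.optimize import linprog

# maximize x0 + x1  subject to  -2*x0 + x1 <= 3  and  4*x0 + x1 <= 7
res = linprog(c=[-1, -1],               # negated objective: maximize x0 + x1
              A_ub=[[-2, 1], [4, 1]],   # constraint rows from the tableau above
              b_ub=[3, 7])
print(res.x)  # approximately [0.6667, 4.3333]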
EDIT 2:
I believe the problem could be made feasible in O(N) by shifting the original functions into a new position, adding a big constant to the independent term of each function. Check my comment below. (EDIT 3: this implies it won't work for every possible scenario, though it's quite easy to implement. If you want an exact answer for any possible scenario, check the following explanation on how to convert the infeasible quadrants into the feasible one and back.)
EDIT 3:
A better method to address this problem, one capable of precisely finding the minimum point even if it lies on the negative side of x or y: convert the other three quadrants into quadrant 1.
Consider the following generic first degree function template:
f(x) = mx + k
Consider the following generic cartesian plane point template:
p = (p0, p1)
Converting a function and a point from y-negative quadrants to y-positive:
y_negative_to_y_positive( f(x) ) = -mx - k
y_negative_to_y_positive( p ) = (p0, -p1)
Converting a function and a point from x-negative quadrants to x-positive:
x_negative_to_x_positive( f(x) ) = -mx + k
x_negative_to_x_positive( p ) = (-p0, p1)
Summarizing:
Quadrant     sign of (x, y)   converting f(x) or p to Q1
Quadrant 1   (+, +)           f(x)
Quadrant 2   (-, +)           x_negative_to_x_positive( f(x) )
Quadrant 3   (-, -)           y_negative_to_y_positive( x_negative_to_x_positive( f(x) ) )
Quadrant 4   (+, -)           y_negative_to_y_positive( f(x) )
Now convert the functions from quadrants 2, 3 and 4 into quadrant 1. Run simplex 4 times, one based on the original quadrant 1 and the other 3 times based on the converted quadrants 2, 3 and 4. For the cases originating from a y-negative quadrant, you will need to model your simplex as a minimization instance, with negative slack variables, which will turn your constraints to the >= format. I will leave to you the details on how to model the same problem based on a minimization task.
Once you have the results of each quadrant, you will have at hands at most 4 points (because you might find out, for example, that there is no point on a specific quadrant). Convert each of them back to their original quadrant, going back in an analogous manner as the original conversion.
Now you may freely compare the 4 points with each other and decide which one is the one you need.
EDIT 1:
Note that you may have the quantity N of first degree functions as huge as you wish.
Other methods for solving this problem could be better.
EDIT 3: Check out the complexity of simplex. In the average case scenario, it works efficiently.
Cheers!

How to do reverse perspective with OpenGL?

I'm looking for a way to do reverse perspective with OpenGL and C++. For now I'm using glFrustum for a classic perspective, but I would like to know if a reverse perspective, as presented here (https://en.wikipedia.org/wiki/Reverse_perspective and below), is possible? If not, is there any other way to do this with OpenGL?
This question really intrigued me. I'm not entirely convinced that what I'm going to refer to now as 'Byzantine perspective' can be accommodated by a transform analogous to that provided by (pre-core profile) glFrustum. I have done some work on it though, inspired by derivations in Computer Graphics: Principles and Practice (2nd Ed), and the formulation for the OpenGL CCS / NDCS.
Unfortunately, the original S.O. site doesn't allow for embedded LaTeX, so the matrices will not be pretty. Consider this answer a work in progress.
So far, I've derived the matrix transform resulting in what is sometimes referred to as the 'normalized frustum'. The far plane sits at Z = -1, the near plane at Z = -N/F, and the R, L, T, B planes have unit slope (this would be very clear given a good diagram):
[ 2N/(R-L)      0        (R+L)/(R-L)   0 ]
[     0      2N/(T-B)    (T+B)/(T-B)   0 ]
[     0         0             1        0 ]
[     0         0             0        F ]
Call this matrix [F.p]. For any point P = (x, y, -N, 1)^T on the near plane, it is easy to demonstrate that the transformed homogeneous point lies on the Z = -N/F plane. (Note: ^T is the 'transpose' operator, to make it clear that this is in fact a column vector.)
Likewise, given a point P = (x, y, -F, 1)^T on the far plane, the transformed homogeneous point lies on the Z = -1 plane.
A Byzantine perspective requires another constraint - we'll use the variable D, where Z = - D is a point analogous to the 'eye' at Z = 0, or the 'projection reference point' (PRP).
As you have already gathered from the image you've provided, parallel lines converge at Z = -D, rather than at the 'eye'. You don't want the image from the point of convergence, however. You want to visualise the effect from the 'front'. The question is - can we construct an OpenGL matrix similar to that provided by glFrustum which yields a Byzantine perspective? And can it be made to fit within a GL pipeline?
What I've derived so far, is the 'normalized Byzantine frustum'. Again, the far plane is at Z = -1, the near plane at Z = - N / F, and R, L, T, B planes have unit slope - albeit the negative slope of the regular frustum. (again a clear picture would be worth a thousand words here)
[ 2N/(R-L)      0        (R+L)/(R-L)   0 ]
[     0      2N/(T-B)    (T+B)/(T-B)   0 ]
[     0         0            N/F       0 ]
[     0         0             0        N ]
Call this matrix: [F.b]. The X,Y coordinate transforms are identical, but the Z,W component transforms differ. This is somewhat intuitive given that this is, in a sense, a 'reverse' of the orthodox view volume.
Again, given a point: P = (x, y, - N, 1)^T on the near plane, the transformed homogeneous point lies on the Z = - N / F plane, and the homogeneous transform of a point: P = (x, y, - F, 1)^T on the far plane lies on the Z = -1 plane.
Given the similarities in the matrices, and the fact that only simple perspective transforms (and a few trivial scaling and translation matrices) are required to yield parallel projection matrices that conform to OpenGL clip coordinate space (CCS), and its NDCS projection, it seems likely that an OpenGL 'Byzantine' projection could be made to work. I just need more time to work on it...
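To sanity-check the derivation numerically, here is a small numpy sketch (my own, not part of the original answer; N, F and the frustum bounds are arbitrary example values) that applies [F.b] to points on the near and far planes and confirms they land on Z = -N/F and Z = -1 after the homogeneous divide:

import numpy as np

# example frustum values (assumptions for the check)
N, F = 1.0, 10.0
L, R, B, T = -1.0, 1.0, -1.0, 1.0

Fb = np.array([
    [2*N/(R-L), 0,         (R+L)/(R-L), 0],
    [0,         2*N/(T-B), (T+B)/(T-B), 0],
    [0,         0,         N/F,         0],
    [0,         0,         0,           N],
])

for z in (-N, -F):
    p = Fb @ np.array([0.5, 0.25, z, 1.0])
    print(z, p[2] / p[3])   # z = -N gives -N/F = -0.1; z = -F gives -1.0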

Why do we need perspective division?

I know perspective division is done by dividing x,y, and z by w, to get normalized device coordinates. But I am not able to understand the purpose of doing that. Also, does it have anything to do with clipping?
Some details that complement the general answers:
The idea is to project a point (x,y,z) on screen to have (xs,ys,d).
The next figure shows this for the y coordinate.
We know from school that
tan(alpha) = ys / d = y / z
This means that the projection is computed as
ys = d*y/z = y/w
where w = z/d
This is enough to apply a projection.
However in OpenGL, you want (xs,ys,zs) to be normalized device coordinates in [-1,1] and yes this has something to do with clipping.
The extrema values for (xs,ys,zs) represent the unit cube and everything outside it will be clipped.
So a projection matrix usually takes into consideration the clipping limits (Frustum) to make a single transformation that, with the perspective division, simultaneously apply a projection and transform the projected coordinates along with the z to normalized device coordinates.
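A tiny numeric illustration of those formulas (my own sketch; d, y, z are arbitrary example values):

# project y onto a plane at distance d: ys = d*y/z, done via the w-divide
d = 2.0
y, z = 3.0, 6.0
w = z / d              # the projection matrix writes z/d into w
ys = y / w             # perspective division
print(ys, d * y / z)   # both print 1.0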
I mean why do we need that?
In layman terms: To make perspective distortion work. In a perspective projection matrix, the Z coordinate gets "mixed" into the W output component. So the smaller the value of the Z coordinate, i.e. the closer to the origin, the more things get scaled up, i.e. bigger on screen.
To really distill it to the basic concept, and why the op is division (instead of e.g. square root or some such), consider that an object twice as far should appear with dimensions exactly one half as large. Obtain 1/2 from 2 by... division.
There are many geometric ways to arrive at the same conclusion. A diagram serves as visual proof for this, really.
Dividing x, y, z by w is the "trick" of homogeneous coordinates: you convert an R⁴ vector back to R³ by dividing by the 4th component (or w component, as you said), a process called dehomogenizing.
Why use homogeneous coordinates? That topic is a little more involved; I'll try to explain and hope I do it justice.
However, I will use x1, x2, x3, x4 as the components of a vector instead of x, y, z, w:
Consider a 3x3 matrix M and column vectors x, a, b, c of R³, with x = (x1, x2, x3), where x1, x2, x3 are the scalar components of x.
With the 3x3 matrix you can do all the linear transformations on a vector x that you could do with the linear combination:
x' = x1*a + x2*b + x3*c (1).
(x' is the transformed vector that holds the result of transforming x).
Khan Academy's Linear Algebra course has a section explaining the fact that every linear transformation can be written as a matrix product with a vector.
You can try this out, for example, by putting the column vectors a, b, c into the columns of the matrix M = [ a b c ].
So with the matrix product you essentially get the linear combination above:
x' = M * x = [a b c] * x = a*x1 + b*x2 + c*x3 (2).
However this operation only accounts for rotation, scaling and shearing transformations. The origin (0, 0, 0) will always stay at (0, 0, 0).
For this you need another kind of transformation called "translation" (moving a vector, i.e. adding a vector to the vector).
Consider the translation column vector t = (t1, t2, t3) and the linear combination
x' = x1*a + x2*b + x3*c + t (3).
With this linear combination you can translate, rotate, scale and shear a vector. As you can see this Linear Combination does actually move the origin vector (0, 0, 0) to (0+t1, 0+t2, 0+t3).
However you can't put this translation into a 3x3 Matrix.
So what graphics programmers and mathematicians came up with is adding another dimension to the matrix and the vectors, like this:
M is a 4x4 matrix and x~ a vector in R⁴ with x~ = (x1, x2, x3, x4); a, b, c, t are also column vectors of R⁴ (the last component of a, b, c being 0 and the last component of t being 1 - I keep the names the same to later show the similarity between the homogeneous linear combination and (3)). x~ is the homogeneous coordinate of x.
Now watch what happens if we take a vector x of R³ and put it into x~ of R⁴.
This vector will be in homogeneous coordinates in R⁴ x~=(x1, x2, x3, 1). The last component simply being 1 if it is a point and 0 if it's simply a direction (which couldn't be translated anyway).
So you have the linear combination:
x~' = M * x~ = [a b c t] * x~ = x1*a + x2*b + x3*c + x4*t (4).
(x~' is the result vector of transforming the homogeneous vector x~.)
Since we took a vector from R³ and put it into R⁴, the x4 component is 1, so we have:
x~' = x1*a + x2*b + x3*c + 1*t
<=> x~' = x1*a + x2*b + x3*c + t (5).
which is exactly the linear transformation (3) above with the translation t. This is called an affine transformation (linear transformation + translation).
So with a 3x3 matrix and a vector of R³ you couldn't do translations; with a vector in R⁴ and a 4x4 matrix you can.
However, when you want to return to R³, you have to divide the first three components by the last one, the x4 component - or, in your variable naming, the w component. This is called "dehomogenizing". Let x be the original vector in R³, x~ its homogeneous counterpart in R⁴, and x' the result of bringing x~ back into R³:
x' = (x1/x4, x2/x4, x3/x4) (6).
Then x' is the dehomogenized vector of the vector x~.
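Here is a compact numpy illustration of (4)-(6) (my own sketch): the last column of a 4x4 matrix holds the translation t; a point (x4 = 1) is translated, a direction (x4 = 0) is not, and dividing by x4 dehomogenizes:

import numpy as np

# columns a, b, c span rotation/scale (identity here); last column is the translation t
M = np.eye(4)
M[:3, 3] = [10, 20, 30]             # t = (t1, t2, t3)

point = np.array([1, 2, 3, 1])      # x4 = 1: a point, gets translated
direction = np.array([1, 2, 3, 0])  # x4 = 0: a direction, unaffected by t

p = M @ point
print(p[:3] / p[3])                 # dehomogenize: [11. 22. 33.]
print((M @ direction)[:3])          # [1. 2. 3.]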
Coming back to perspective division:
(I will leave that part out, because many here have already explained the divide by z. It comes from the relationship of similar right triangles: with a given focal length f, a point at depth z with coordinate y lands at y' = f*y/z. Also, since you stated you already know why this is done, I'll simply leave a link to a video here: I find it very well explained in the course lecture CMU 15-462/662.)
When dehomogenizing, the division by the w component is a pretty handy property for returning to R³. When you apply a homogeneous 4x4 perspective matrix to a vector, you simply put the z component into the w component and let the dehomogenizing process (as in (6)) perform the perspective divide. You can also set up the w component so that the division by w not only divides by z but also maps the values into [0, 1] (basically you put the range from z-near to z-far into a range where floating-point values are precise).
This is also described by Ravi Ramamoorthi in his Course CSE167 when he explains how to set up the perspective projection matrix.
I hope this helped you understand the rationale of putting z into the w component. Sorry for my horrible formatting and lengthy text; yet I hope it helped more than it confused.
Best of luck!
Actually, with the standard notational convention for a 4x4 perspective matrix with the sightline along the z direction, w differs by 1 from the distance ratio. Also, that ratio, though interpreted correctly, is normally expressed as -z/d, where z is negative (therefore producing the correct ratio) because, again by common notational convention, the camera looks in the negative z direction.
The reason for the offset by 1 needs explaining. Many references put the origin at the image plane rather than at the center of projection. With that convention (again with the camera looking along the negative z direction), the distance labeled z in the similar-triangles diagram is replaced by (d - z). Substituting that for z, the expression for w becomes, instead of z/d, (d - z)/d = 1 - z/d. To some these conventions may seem unorthodox, but they are quite popular among analysts.

findHomography, getPerspectiveTransform, & getAffineTransform

This question is on the OpenCV functions findHomography, getPerspectiveTransform & getAffineTransform
What is the difference between findHomography and getPerspectiveTransform? My understanding from the documentation is that getPerspectiveTransform computes the transform using exactly 4 correspondences (which is the minimum required to compute a homography/perspective transform), whereas findHomography computes the transform even if you provide more than 4 correspondences (presumably using something like a least-squares method?).
Is this correct?
(In which case, is the only reason OpenCV still continues to support getPerspectiveTransform legacy?)
My next concern is that I want to know if there is an equivalent to findHomography for computing an affine transformation, i.e. a function which uses a least-squares or an equivalent robust method to compute an affine transformation.
According to the documentation getAffineTransform takes in only 3 correspondences (which is the min required to compute an affine transform).
Best,
Q #1: Right, findHomography tries to find the best transform between two sets of points. It uses something smarter than least squares, called RANSAC, which has the ability to reject outliers - if at least 50% + 1 of your data points are OK, RANSAC will do its best to find them, and build a reliable transform.
getPerspectiveTransform has a lot of useful reasons to stay - it is the base for findHomography, and it is useful in many situations where you only have 4 points, and you know they are the correct ones. findHomography is usually used with sets of points detected automatically - you can find many of them, but with low confidence. getPerspectiveTransform is good when you know for sure 4 corners - like manual marking, or automatic detection of a rectangle.
Q #2 There is no equivalent for affine transforms. You can use findHomography, because affine transforms are a subset of homographies.
I concur with everything #vasile has written. I just want to add some observations:
getPerspectiveTransform() and getAffineTransform() are meant to work on 4 or 3 points (respectively) that are known to be correct correspondences. On real-life images taken with a real camera, you can never get correspondences that accurate, neither with automatic nor with manual marking of the corresponding points.
There are always outliers. Just look at the simple case of fitting a curve through points (e.g. take a generative equation with noise, y1 = f(x) = 3.12x + gauss_noise or y2 = g(x) = 0.1x^2 + 3.1x + gauss_noise): it will be much easier to find a good quadratic function to estimate the points in both cases than a good linear one. Quadratic might be overkill, but in most cases it will not be (after removing outliers), and if you want to fit a straight line there, you had better be mightily sure that is the right model, otherwise you are going to get unusable results.
That said, if you are mightily sure that affine transform is the right one, here's a suggestion:
use findHomography, which has RANSAC incorporated into its functionality, to get rid of the outliers and get an initial estimate of the image transformation
select 3 correct matches/correspondences (that fit the homography found), or reproject 3 points from the 1st image to the 2nd (using the homography)
use those 3 matches (that are as close to correct as you can get) in getAffineTransform()
wrap all of that in your own findAffine() if you want - and voila! (See the sketch below.)
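A sketch of that recipe in Python with OpenCV (my own; the wrapper name findAffine comes from the suggestion above, and the RANSAC reprojection threshold of 3.0 is an assumption):

import numpy as np
import cv2

def findAffine(src_pts, dst_pts):
    # src_pts, dst_pts: Nx2 float32 arrays of corresponding points, N >= 4
    H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 3.0)
    # pick 3 inlier correspondences (taking the first 3 is naive;
    # well-spread, non-collinear points would be better)
    inliers = np.where(mask.ravel() == 1)[0][:3]
    src3 = src_pts[inliers].astype(np.float32)
    # reproject the 3 source points through the homography instead of
    # using the (noisy) measured destinations
    dst3 = cv2.perspectiveTransform(src3.reshape(-1, 1, 2), H).reshape(-1, 2)
    return cv2.getAffineTransform(src3, dst3)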
Re Q#2, estimateRigidTransform is the oversampled equivalent of getAffineTransform. I don't know if it was in OCV when this was first posted, but it's available in 2.4.
There is an easy solution for finding the affine transform of an over-determined system of equations.
Note that, in general, an affine transform solves the over-determined system of linear equations Ax = B by using a pseudo-inverse or a similar technique, so
x = (A^T A)^-1 A^T B
Moreover, this is handled in the core OpenCV functionality by a simple call to solve(A, B, X).
Familiarize yourself with the code of Affine transform in opencv/modules/imgproc/src/imgwarp.cpp: it really does just two things:
a. rearranges inputs to create a system Ax=B;
b. then calls solve(A, B, X);
NOTE: ignore the function comments in the OpenCV code - they are confusing and don't reflect the actual ordering of the elements in the matrices. If you are solving [u, v]' = Affine * [x, y, 1]', the rearrangement is:
    [ x1 y1 1  0  0  0 ]
    [ 0  0  0  x1 y1 1 ]
    [ x2 y2 1  0  0  0 ]
A = [ 0  0  0  x2 y2 1 ]
    [ x3 y3 1  0  0  0 ]
    [ 0  0  0  x3 y3 1 ]

X = [Affine11, Affine12, Affine13, Affine21, Affine22, Affine23]'

    [ u1 ]
    [ v1 ]
    [ u2 ]
B = [ v2 ]
    [ u3 ]
    [ v3 ]
All you need to do is add more points. To make solve(A, B, X) work on an over-determined system, add the DECOMP_SVD parameter. To see the PowerPoint slides on the topic, use this link. If you'd like to learn more about the pseudo-inverse in the context of computer vision, the best source is Computer Vision, see chapter 15 and appendix C.
If you are still unsure how to add more points see my code below:
// extension for n points
cv::Mat getAffineTransformOverdetermined( const Point2f src[], const Point2f dst[], int n )
{
    Mat M(2, 3, CV_64F), X(6, 1, CV_64F, M.data); // output
    double* a = (double*)malloc(12*n*sizeof(double));
    double* b = (double*)malloc(2*n*sizeof(double));
    Mat A(2*n, 6, CV_64F, a), B(2*n, 1, CV_64F, b); // input
    for( int i = 0; i < n; i++ )
    {
        int j = i*12;   // 2 equations (in x, y) with 6 members: skip 12 elements
        int k = i*12+6; // second equation: skip extra 6 elements
        a[j] = a[k+3] = src[i].x;
        a[j+1] = a[k+4] = src[i].y;
        a[j+2] = a[k+5] = 1;
        a[j+3] = a[j+4] = a[j+5] = 0;
        a[k] = a[k+1] = a[k+2] = 0;
        b[i*2] = dst[i].x;
        b[i*2+1] = dst[i].y;
    }
    solve( A, B, X, DECOMP_SVD );
    free(a); // memory from malloc must be released with free, not delete
    free(b);
    return M;
}
// call original transform
vector<Point2f> src(3);
vector<Point2f> dst(3);
src[0] = Point2f(0.0, 0.0); src[1] = Point2f(1.0, 0.0); src[2] = Point2f(0.0, 1.0);
dst[0] = Point2f(0.0, 0.0); dst[1] = Point2f(1.0, 0.0); dst[2] = Point2f(0.0, 1.0);
Mat M = getAffineTransform(Mat(src), Mat(dst));
cout << M << endl;
// call new transform
src.resize(4); src[3] = Point2f(22, 2);
dst.resize(4); dst[3] = Point2f(22, 2);
Mat M2 = getAffineTransformOverdetermined(src.data(), dst.data(), src.size());
cout << M2 << endl;
getAffineTransform: an affine transform is a combination of translation, scale, shear, and rotation
https://www.mathworks.com/discovery/affine-transformation.html
https://www.tutorialspoint.com/computer_graphics/2d_transformation.htm
getPerspectiveTransform: a perspective transform is a projective mapping