Draw dendrogram using agglomerative algorithm

Draw dendrogram using agglomerative algorithm - data-mining

I have tried making it dendrogram but it was seeming to me very wrong.
Pair I have made:
XB, XBZ, XBZY

The obvious problem is, that your distance matrix is inconsistent.
If the distence from A to B is 3 and the distance from B to Y is 2, how is it possible that the distance from A to Y is 6?
If you slightly adapt your matrix, so that it corresponds to a distance of points in a real space, you get better results.
Example Using R
data <- c(0,3,3.3,1,3,
3,0,3.3,1,3,
3.3,3.3,0,3,6,
1,1,3,0,3,
3,3,6,3,0)
dim_names <- c("X","Z","Y","B","A")
mat <- matrix(data,nrow=5,ncol=5,byrow=TRUE, dimnames = list(dim_names,dim_names) )
dist <- as.dist(mat)
dist
X Z Y B
Z 3.0
Y 3.3 3.3
B 1.0 1.0 3.0
A 3.0 3.0 6.0 3.0
hc <- hclust(dist)
plot(hc, hang = -1)
which produce

Related

How to fit a 2D ellipse to given points

I would like to fit a 2D array by an elliptic function: (x / a)² + (y / b)² = 1 ----> (and so get the a and b)
And then, be able to replot it on my graph.
I found many examples on internet, but no one with this simple Cartesian equation. I probably have searched badly ! I think a basic solution for this problem could help many people.
Here is an example of the data:
Sadly, I can not put the values... So let's assume that I have an X,Y arrays defining the coordinates of each of those points.

This can be solved directly using least squares. You can frame this as minimizing the sum of squares of quantity (alpha * x_i^2 + beta * y_i^2 - 1) where alpha is 1/a^2 and beta is 1/b^2. You have all the x_i's in X and the y_i's in Y so you can find the minimizer of ||Ax - b||^2 where A is an Nx2 matrix (i.e. [X^2, Y^2]), x is the column vector [alpha; beta] and b is column vector of all ones.
The following code solves the more general problem for an ellipse of the form Ax^2 + Bxy + Cy^2 + Dx +Ey = 1 though the idea is exactly the same. The print statement gives 0.0776x^2 + 0.0315xy+0.125y^2+0.00457x+0.00314y = 1 and the image of the ellipse generated is also below
import numpy as np
import matplotlib.pyplot as plt
alpha = 5
beta = 3
N = 500
DIM = 2
np.random.seed(2)
# Generate random points on the unit circle by sampling uniform angles
theta = np.random.uniform(0, 2*np.pi, (N,1))
eps_noise = 0.2 * np.random.normal(size=[N,1])
circle = np.hstack([np.cos(theta), np.sin(theta)])
# Stretch and rotate circle to an ellipse with random linear tranformation
B = np.random.randint(-3, 3, (DIM, DIM))
noisy_ellipse = circle.dot(B) + eps_noise
# Extract x coords and y coords of the ellipse as column vectors
X = noisy_ellipse[:,0:1]
Y = noisy_ellipse[:,1:]
# Formulate and solve the least squares problem ||Ax - b ||^2
A = np.hstack([X**2, X * Y, Y**2, X, Y])
b = np.ones_like(X)
x = np.linalg.lstsq(A, b)[0].squeeze()
# Print the equation of the ellipse in standard form
print('The ellipse is given by {0:.3}x^2 + {1:.3}xy+{2:.3}y^2+{3:.3}x+{4:.3}y = 1'.format(x[0], x[1],x[2],x[3],x[4]))
# Plot the noisy data
plt.scatter(X, Y, label='Data Points')
# Plot the original ellipse from which the data was generated
phi = np.linspace(0, 2*np.pi, 1000).reshape((1000,1))
c = np.hstack([np.cos(phi), np.sin(phi)])
ground_truth_ellipse = c.dot(B)
plt.plot(ground_truth_ellipse[:,0], ground_truth_ellipse[:,1], 'k--', label='Generating Ellipse')
# Plot the least squares ellipse
x_coord = np.linspace(-5,5,300)
y_coord = np.linspace(-5,5,300)
X_coord, Y_coord = np.meshgrid(x_coord, y_coord)
Z_coord = x[0] * X_coord ** 2 + x[1] * X_coord * Y_coord + x[2] * Y_coord**2 + x[3] * X_coord + x[4] * Y_coord
plt.contour(X_coord, Y_coord, Z_coord, levels=[1], colors=('r'), linewidths=2)
plt.legend()
plt.xlabel('X')
plt.ylabel('Y')
plt.show()

Following the suggestion by ErroriSalvo, here is the complete process of fitting an ellipse using the SVD. The arrays x, y are coordinates of the given points, let's say there are N points. Then U, S, V are obtained from the SVD of the centered coordinate array of shape (2, N). So, U is a 2 by 2 orthogonal matrix (rotation), S is a vector of length 2 (singular values), and V, which we do not need, is an N by N orthogonal matrix.
The linear map transforming the unit circle to the ellipse of best fit is
sqrt(2/N) * U * diag(S)
where diag(S) is the diagonal matrix with singular values on the diagonal. To see why the factor of sqrt(2/N) is needed, imagine that the points x, y are taken uniformly from the unit circle. Then sum(x**2) + sum(y**2) is N, and so the coordinate matrix consists of two orthogonal rows of length sqrt(N/2), hence its norm (the largest singular value) is sqrt(N/2). We need to bring this down to 1 to have the unit circle.
N = 300
t = np.linspace(0, 2*np.pi, N)
x = 5*np.cos(t) + 0.2*np.random.normal(size=N) + 1
y = 4*np.sin(t+0.5) + 0.2*np.random.normal(size=N)
plt.plot(x, y, '.') # given points
xmean, ymean = x.mean(), y.mean()
x -= xmean
y -= ymean
U, S, V = np.linalg.svd(np.stack((x, y)))
tt = np.linspace(0, 2*np.pi, 1000)
circle = np.stack((np.cos(tt), np.sin(tt))) # unit circle
transform = np.sqrt(2/N) * U.dot(np.diag(S)) # transformation matrix
fit = transform.dot(circle) + np.array([[xmean], [ymean]])
plt.plot(fit[0, :], fit[1, :], 'r')
plt.show()
But if you assume that there is no rotation, then np.sqrt(2/N) * S is all you need; these are a and b in the equation of the ellipse.

You could try a Singular Value Decomposition of the data matrix.
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linalg.svd.html
First center the data by subtracting mean values of X,Y from each column respectively.
X=X-np.mean(X)
Y=Y-np.mean(Y)
D=np.vstack(X,Y)
Then, apply SVD and extract
-eigenvalues (members of s) -> axis length
-eigenvectors(U) -> axis orientation
U, s, V = np.linalg.svd(D, full_matrices=True)
This should be a least-squares fit.
Of course, things can get more complicated than this, please see
https://www.emis.de/journals/BBMS/Bulletin/sup962/gander.pdf

How to fit a plane to a 3D point cloud?

I want to fit a plane to a 3D point cloud. I use a RANSAC approach, where I sample several points from the point cloud, calculate the plane, and store the plane with the smallest error. The error is the distance between the points and the plane. I want to do this in C++, using Eigen.
So far, I sample points from the point cloud and center the data. Now, I need to fit the plane to the samples points. I know I need to solve Mx = 0, but how do I do this? So far I have M (my samples), I want to know x (the plane) and this fit needs to be as close to 0 as possible.
I have no idea where to continue from here. All I have are my sampled points and I need more data.

From you question I assume that you are familiar with the Ransac algorithm, so I will spare you of lengthy talks.
In a first step, you sample three random points. You can use the Random class for that but picking them not truly random usually gives better results. To those points, you can simply fit a plane using Hyperplane::Through.
In the second step, you repetitively cross out some points with large Hyperplane::absDistance and perform a least-squares fit on the remaining ones. It may look like this:
Vector3f mu = mean(points);
Matrix3f covar = covariance(points, mu);
Vector3 normal = smallest_eigenvector(covar);
JacobiSVD<Matrix3f> svd(covariance, ComputeFullU);
Vector3f normal = svd.matrixU().col(2);
Hyperplane<float, 3> result(normal, mu);
Unfortunately, the functions mean and covariance are not built-in, but they are rather straightforward to code.

Recall that the equation for a plane passing through origin is Ax + By + Cz = 0, where (x, y, z) can be any point on the plane and (A, B, C) is the normal vector perpendicular to this plane.
The equation for a general plane (that may or may not pass through origin) is Ax + By + Cz + D = 0, where the additional coefficient D represents how far the plane is away from the origin, along the direction of the normal vector of the plane. [Note that in this equation (A, B, C) forms a unit normal vector.]
Now, we can apply a trick here and fit the plane using only provided point coordinates. Divide both sides by D and rearrange this term to the right-hand side. This leads to A/D x + B/D y + C/D z = -1. [Note that in this equation (A/D, B/D, C/D) forms a normal vector with length 1/D.]
We can set up a system of linear equations accordingly, and then solve it by an Eigen solver as follows.
// Example for 5 points
Eigen::Matrix<double, 5, 3> matA; // row: 5 points; column: xyz coordinates
Eigen::Matrix<double, 5, 1> matB = -1 * Eigen::Matrix<double, 5, 1>::Ones();
// Find the plane normal
Eigen::Vector3d normal = matA.colPivHouseholderQr().solve(matB);
// Check if the fitting is healthy
double D = 1 / normal.norm();
normal.normalize(); // normal is a unit vector from now on
bool planeValid = true;
for (int i = 0; i < 5; ++i) { // compare Ax + By + Cz + D with 0.2 (ideally Ax + By + Cz + D = 0)
if ( fabs( normal(0)*matA(i, 0) + normal(1)*matA(i, 1) + normal(2)*matA(i, 2) + D) > 0.2) {
planeValid = false; // 0.2 is an experimental threshold; can be tuned
break;
}
}
This method is equivalent to the typical SVD-based method, but much faster. It is suitable for use when points are known to be roughly in a plane shape. However, the SVD-based method is more numerically stable (when the plane is far far away from origin) and robust to outliers.

KalmanFilter(6,2,0) transition matrix

I am working on a object tracking project and I want to improve the results I am getting using a Kalman filter.
I have found a lot of examples on the internet which are working but I really want to understand what is behind it.
Using opencv, here is a part of the code :
KalmanFilter KF(6, 2, 0);
Mat_ state(6, 1);
Mat processNoise(6, 1, CV_32F);
...
KF.statePre.at(0) = mouse_info.x;
KF.statePre.at(1) = mouse_info.y;
KF.statePre.at(2) = 0;
KF.statePre.at(3) = 0;
KF.statePre.at(4) = 0;
KF.statePre.at(5) = 0;
KF.transitionMatrix = *(Mat_(6, 6) << 1,0,1,0,0.5,0, 0,1,0,1,0,0.5, 0,0,1,0,1,0, 0,0,0,1,0,1, 0,0,0,0,1,0, 0,0,0,0,0,1);
KF.measurementMatrix = *(Mat_(2, 6) << 1,0,1,0,0.5,0, 0,1,0,1,0,0.5);
This one gives smoother results than a KalmanFilter(4,2,0) but I don't really understand why.
Can someone explain me what is behind this (6,6) transition matrix ?
EDIT : The solution is probably here but obviously I am not good enough to find it by myself ...
Thank you for your help.

You have a state vector X made up of 6 components, the first two of which are the x and y position of an object; let's assume that the other 4 are their velocities and accelerations:
X = [x, y, v_x, v_y, a_x, a_y] t
In the Kalman filter, your next state, Xt+1, is equal to the previous state Xt multiplied by the transition matrix A, so with the transition matrix you posted, you would have:
x t+1 = x t + v_x t + 0.5 a_x t
y t+1 = y t + v_y t + 0.5 a_y t
v_x t+1 = v_x t + a_x t
v_y t+1 = v_t t + a_t t
a_x t+1 = a_x t
a_y t+1 = a_y t
Which are the discrete approximation of the equations of an object moving with constant acceleration if the time interval between the two states is equal to 1 (and that's why it makes sense to suppose that the other four variables are velocities and accelerations).
This is a Kalman filter that allows for faster variations in the velocity estimation, so it introduces a lower delay than a (4, 2, 0) filter, which would use a constant velocity model.

findHomography, getPerspectiveTransform, & getAffineTransform

This question is on the OpenCV functions findHomography, getPerspectiveTransform & getAffineTransform
What is the difference between findHomography and getPerspectiveTransform?. My understanding from the documentation is that getPerspectiveTransform computes the transform using 4 correspondences (which is the minimum required to compute a homography/perspective transform) where as findHomography computes the transform even if you provide more than 4 correspondencies (presumably using something like a least squares method?).
Is this correct?
(In which case the only reason OpenCV still continues to support getPerspectiveTransform should be legacy? )
My next concern is that I want to know if there is an equivalent to findHomography for computing an Affine transformation? i.e. a function which uses a least squares or an equivalent robust method to compute and affine transformation.
According to the documentation getAffineTransform takes in only 3 correspondences (which is the min required to compute an affine transform).
Best,

Q #1: Right, the findHomography tries to find the best transform between two sets of points. It uses something smarter than least squares, called RANSAC, which has the ability to reject outliers - if at least 50% + 1 of your data points are OK, RANSAC will do its best to find them, and build a reliable transform.
The getPerspectiveTransform has a lot of useful reasons to stay - it is the base for findHomography, and it is useful in many situations where you only have 4 points, and you know they are the correct ones. The findHomography is usually used with sets of points detected automatically - you can find many of them, but with low confidence. getPerspectiveTransform is good when you kn ow for sure 4 corners - like manual marking, or automatic detection of a rectangle.
Q #2 There is no equivalent for affine transforms. You can use findHomography, because affine transforms are a subset of homographies.

I concur with everything #vasile has written. I just want to add some observations:
getPerspectiveTransform() and getAffineTransform() are meant to work on 4 or 3 points (respectively), that are known to be correct correspondences. On real-life images taken with a real camera, you can never get correspondences that accurate, not with automatic nor manual marking of the corresponding points.
There are always outliers. Just look at the simple case of wanting to fit a curve through points (e.g. take a generative equation with noise y1 = f(x) = 3.12x + gauss_noise or y2 = g(x) = 0.1x^2 + 3.1x + gauss_noise): it will be much more easier to find a good quadratic function to estimate the points in both cases, than a good linear one. Quadratic might be an overkill, but in most cases will not be (after removing outliers), and if you want to fit a straight line there you better be mightily sure that is the right model, otherwise you are going to get unusable results.
That said, if you are mightily sure that affine transform is the right one, here's a suggestion:
use findHomography, that has RANSAC incorporated in to the functionality, to get rid of the outliers and get an initial estimate of the image transformation
select 3 correct matches-correspondances (that fit with the homography found), or reproject 3 points from the 1st image to the 2nd (using the homography)
use those 3 matches (that are as close to correct as you can get) in getAffineTransform()
wrap all of that in your own findAffine() if you want - and voila!

Re Q#2, estimateRigidTransform is the oversampled equivalent of getAffineTransform. I don't know if it was in OCV when this was first posted, but it's available in 2.4.

There is an easy solution for the finding the Affine transform for the system of over-determined equations.
Note that in general an Affine transform finds a solution to the over-determined system of linear equations Ax=B by using a pseudo-inverse or a similar technique, so
x = (A At )-1 At B
Moreover, this is handled in the core openCV functionality by a simple call to solve(A, B, X).
Familiarize yourself with the code of Affine transform in opencv/modules/imgproc/src/imgwarp.cpp: it really does just two things:
a. rearranges inputs to create a system Ax=B;
b. then calls solve(A, B, X);
NOTE: ignore the function comments in the openCV code - they are confusing and don’t reflect the actual ordering of the elements in the matrices. If you are solving [u, v]’= Affine * [x, y, 1] the rearrangement is:
x1 y1 1 0 0 1
0 0 0 x1 y1 1
x2 y2 1 0 0 1
A = 0 0 0 x2 y2 1
x3 y3 1 0 0 1
0 0 0 x3 y3 1
X = [Affine11, Affine12, Affine13, Affine21, Affine22, Affine23]’
u1 v1
B = u2 v2
u3 v3
All you need to do is to add more points. To make Solve(A, B, X) work on over-determined system add DECOMP_SVD parameter. To see the powerpoint slides on the topic, use this link. If you’d like to learn more about the pseudo-inverse in the context of computer vision, the best source is: ComputerVision, see chapter 15 and appendix C.
If you are still unsure how to add more points see my code below:
// extension for n points;
cv::Mat getAffineTransformOverdetermined( const Point2f src[], const Point2f dst[], int n )
{
Mat M(2, 3, CV_64F), X(6, 1, CV_64F, M.data); // output
double* a = (double*)malloc(12*n*sizeof(double));
double* b = (double*)malloc(2*n*sizeof(double));
Mat A(2*n, 6, CV_64F, a), B(2*n, 1, CV_64F, b); // input
for( int i = 0; i < n; i++ )
{
int j = i*12; // 2 equations (in x, y) with 6 members: skip 12 elements
int k = i*12+6; // second equation: skip extra 6 elements
a[j] = a[k+3] = src[i].x;
a[j+1] = a[k+4] = src[i].y;
a[j+2] = a[k+5] = 1;
a[j+3] = a[j+4] = a[j+5] = 0;
a[k] = a[k+1] = a[k+2] = 0;
b[i*2] = dst[i].x;
b[i*2+1] = dst[i].y;
}
solve( A, B, X, DECOMP_SVD );
delete a;
delete b;
return M;
}
// call original transform
vector<Point2f> src(3);
vector<Point2f> dst(3);
src[0] = Point2f(0.0, 0.0);src[1] = Point2f(1.0, 0.0);src[2] = Point2f(0.0, 1.0);
dst[0] = Point2f(0.0, 0.0);dst[1] = Point2f(1.0, 0.0);dst[2] = Point2f(0.0, 1.0);
Mat M = getAffineTransform(Mat(src), Mat(dst));
cout<<M<<endl;
// call new transform
src.resize(4); src[3] = Point2f(22, 2);
dst.resize(4); dst[3] = Point2f(22, 2);
Mat M2 = getAffineTransformOverdetermined(src.data(), dst.data(), src.size());
cout<<M2<<endl;

getAffineTransform:affine transform is combination of translation, scale, shear, and rotation
https://www.mathworks.com/discovery/affine-transformation.html
https://www.tutorialspoint.com/computer_graphics/2d_transformation.htm
getPerspectiveTransform：perspective transform is project mapping
enter image description here

Bezier Curve Evaluation

I am following this paper here using de Casteljau's Algorithm http://www.cgafaq.info/wiki/B%C3%A9zier_curve_evaluation and I have tried using the topic Drawing Bezier curves using De Casteljau Algorithm in C++ , OpenGL to help. No success.
My bezier curves look like this when evaluated
As you can see, even though it doesn't work the wanted I wanted it to, all the points are indeed on the curve. I do not think that this algorithm is inaccurate for this reason.
Here are my points on the top curve in that image:
(0,0)
(2,0)
(2,2)
(4,2) The second curve uses the same set of points, except the third point is (0,2), that is, two units above the first point, forming a steeper curve.
Something is wrong. I should put in 0.25 for t and it should spit out 1.0 for the X value, and .75 should always return 3. Assume t is time. It should progress at a constant rate, yeah? Exactly 25% of the way in, the X value should be 1.0 and then the Y should be associated with that value.
Are there any adequate ways to evaluate a bezier curve? Does anyone know what is going on here?
Thanks for any help! :)
EDIT------
I found this book in a google search http://www.tsplines.com/resources/class_notes/Bezier_curves.pdf and here is the page I found on explicit / non-parametric bezier curves. They are polynomials represented as bezier curves, which is what I am going for here. Here is that page from the book:
Anyone know how to convert a bezier curve to a parametric curve? I may open a different thread now...
EDIT AGAIN AS OF 1 NOVEMBER 2011-------
I've realized that I was only asking the question about half as clear as I should have. What I'm trying to build is like Maya's animation graph editor such as this http://www.youtube.com/watch?v=tckN35eYJtg&t=240 where the bezier control points that are used to modify the curve are more like tangent modifiers of equal length. I didn't remember them as being equal length, to be honest. By forcing a system like this, you can insure 100% that the result is a function and contains no overlapping segments.
I found this, which may have my answer http://create.msdn.com/en-US/education/catalog/utility/curve_editor

Here you can see the algorithm implemented in Mathematica following the nomenclature in your link, and your two plots:
(*Function Definitions*)
lerp[a_, b_, t_] := (1 - t) a + t b;
pts1[t_] := {
lerp[pts[[1]], pts[[2]], t],
lerp[pts[[2]], pts[[3]], t],
lerp[pts[[3]], pts[[4]], t]};
pts2[t_] := {
lerp[pts1[t][[1]], pts1[t][[2]], t],
lerp[pts1[t][[2]], pts1[t][[3]], t]};
pts3[t_] := {
lerp[pts2[t][[1]], pts2[t][[2]], t]};
(*Usages*)
pts = {{0, 0}, {2, 0}, {2, 2}, {4, 2}};
Framed#Show[ParametricPlot[pts3[t], {t, 0, 1}, Axes -> True],
Graphics[{Red, PointSize[Large], Point#pts}]]
pts = {{0, 0}, {2, 0}, {0, 2}, {4, 2}};
Framed#Show[ParametricPlot[pts3[t], {t, 0, 1}, Axes -> True],
Graphics[{Red, PointSize[Large], Point#pts}]]
BTW, the curves are defined by the following parametric equations, which are the functions pts3[t] in the code above:
c1[t_] := {2 t (3 + t (-3 + 2 t)), (* <- X component *)
2 (3 - 2 t) t^2} (* <- Y component *)
and
c2[t_] := {2 t (3 + t (-6 + 5 t)), (* <- X component *)
, 2 (3 - 2 t) t^2} (* <- Y component *)
Try plotting them!
Taking any of these curve equations, and by solving a cubic polynomial you can in these cases get an expression for y[x], which is certainly not always possible. Just for you to get a flavor of it, from the first curve you get (C syntax):
y[x]= 3 - x - 3/Power(-2 + x + Sqrt(5 + (-4 + x)*x),1/3) +
3*Power(-2 + x + Sqrt(5 + (-4 + x)*x),1/3)
Try plotting it!
Edit
Just an amusement:
Mathematica is a quite powerful functional language, and in fact the whole algorithm can be expressed as a one liner:
f = Nest[(1 - t) #[[1]] + t #[[2]] & /# Partition[#, 2, 1] &, #, Length## - 1] &
Such as
f#{{0, 0}, {2, 0}, {0, 2}, {4, 2}}
gives the above results, but supports any number of points.
Let's try with six random points:
p = RandomReal[1, {6, 2}];
Framed#Show[
Graphics[{Red, PointSize[Large], Point#p}],
ParametricPlot[f#p, {t, 0, 1}, Axes -> True]]
Moreover, the same function works in 3D:
p = RandomReal[1, {4, 3}];
Framed#Show[
Graphics3D[{Red, PointSize[Large], Point#p}],
ParametricPlot3D[f[p], {t, 0, 1}, Axes -> True]]

A bezier curve can be solved by solving the following parametric equations for the x, y, and z coordinates (if it's just 2D, do only x and y):
Px = (1-t)^3(P1x) + 3t(1-t)^2(P2x) + 3t^2(1-t)(P3x) + t^3(P4x)
Py = (1-t)^3(P1y) + 3t(1-t)^2(P2y) + 3t^2(1-t)(P3y) + t^3(P4y)
Pz = (1-t)^3(P1z) + 3t(1-t)^2(P2z) + 3t^2(1-t)(P3z) + t^3(P4z)
You can also solve this by multiplying the matrix equation ABC = X where:
matrix A is a 1x4 matrix and represents the values of the powers of t
matrix B are the coefficients of the powers of t, and is a lower-triangular 4x4 matrix
matrix C is a 4x3 matrix that represents each of the four bezier points in 3D-space (it would be a 4x2 matrix in 2D-space)
This would look like the following:
(Update - the bottom left 1 ought to be a -1)
An important note in both forms of the equation (the parametric and the matrix forms) is that t is in the range [0, 1].
Rather than attempting to solve the values for t that will give you integral values of x and y, which is going to be time-consuming given that you're basically solving for the real root of a 3rd-degree polynomial, it's much better to simply create a small enough differential in your t value such that the difference between any two points on the curve is smaller than a pixel-value increment. In other words the distance between the two points P(t1) and P(t2) is such that it is less than a pixel value. Alternatively, you can use a larger differential in t, and simply linearly interpolate between P(t1) and P(t2), keeping in mind that the curve may not be "smooth" if the differential between P(t1) and P(t2) is not small enough for the given range of t from [0, 1].
A good way to find the necessary differential in t to create a fairly "smooth" curve from a visual standpoint is to actually measure the distance between the four points that define the bezier curve. Measure the distance from P1 to P2, P2, to P3, and P3 to P4. Then take the longest distance, and use the inverse of that value as the differential for t. You may still need to-do some linear interpolation between points, but the number of pixels in each "linear" sub-curve should be fairly small, and therefore the curve itself will appear fairly smooth. You can always decrease the differential value on t from this initial value to make it "smoother".
Finally, to answer your question:
Assume t is time. It should progress at a constant rate, yeah? Exactly 25% of the way in, the X value should be 1.0 and then the Y should be associated with that value.
No, that is not correct, and the reason is that the vectors (P2 - P1) and (P3 - P4) are not only tangent to the bezier curve at P1 and P4, but their lengths define the velocity along the curve at those points as well. Thus if the vector (P2 - P1) is a short distance, then that means for a given amount of time t, you will not travel very far from the point P1 ... this translates into the x,y values along the curve being packed together very closely for a given fixed differential of t. You are effectively "slowing down" in velocity as you move towards P1. The same effect takes place at P4 on the curve depending on the length of the vector (P3 - P4). The only way that the velocity along the curve would be "constant", and therefore the distance between any points for a common differential of t would be the same, would be if the lengths of all three segements (P2 - P1), (P3 - P2), and (P4 - P3) were the same. That would then indicate that there was no change in velocity along the curve.

It sounds like you actually just want a 1D cubic Bezier curve instead of the 2D that you have. Specifically, what you actually want is just a cubic polynomial segment that starts at 0 and goes up to 2 when evaluated over the domain of 0 to 4. So you could use some basic math and just find the polynomial:
f(x) = a + b*x + c*x^2 + d*x^3
f(0) = 0
f(4) = 2
That leaves two degrees of freedom.
Take the derivative of the function:
f'(x) = b + 2*c*x + 3*d*x^2
If you want it to be steep at the beginning and then level off at the end you might say something like:
f'(0) = 10
f'(4) = 0
Then we can plug in values. a and b come for free because we're evaluating at zero.
a = 0
b = 10
So then we have:
f(4) = 2 = 40 + c*16 + d*64
f'(4) = 0 = 10 + c*8 + d*48
That's a pretty easy linear system to solve. For completeness, we get:
16c + 64d = -38
8c + 48d = -10
So-
1/(16*48 - 8*64)|48 -64||-38| = |c| = |-37/8 |
|-8 16||-10| |d| | 9/16|
f(x) = 10*x - (37/8)*x^2 + (9/16)*x^3
If, instead, you decide that you want to use Bezier control points, just pick your 4 y-value control points and recognize that in order to get t in [0,1], you just have to say t=x/4 (remembering that if you also need derivatives, you'll have to do a change there too).
Added:
If you happen to know the points and derivatives you want to begin and end with, but you want to use Bezier control points P1, P2, P3, and P4, the mapping is just this (assuming a curve parametrized over 0 to 1):
P1 = f(0)
P2 = f'(0)/3 + f(0)
P3 = f(1) - f'(1)/3
P4 = f(1)
If, for some reason, you wanted to stick with your 2D Bezier control points and wanted to ensure that the x dimension advanced linearly from 0 to 2 as t advanced from 0 to 1, then you could do that with control points (0,y1) (2/3,y2) (4/3,y3) (2,y4). You can see that I just made the x dimension start at 0, end at 2, and have a constant slope (derivative) of 2 (with respect to t). Then you just make the y-coordinate be whatever you need it to be. The different dimensions are essentially independent of each other.

After all this time, I was looking for hermite curves. Hermites are good because in one dimension they're guaranteed to produce a functional curve that can be evaluated to an XY point. I was confusing Hermites with Bezier.

"Assume t is time."
This is the problem - t is not the time. The curve has it's own rate of change of t, depending on the magnitude of the tangents. Like Jason said, the distance between the consequent points must be the same in order of t to be the same as the time. This is exactly what the non-weighted mode (which is used by default) in the Maya's curve editor is. So this was a perfectly good answer for how to fix this issue. To make this work for arbitrary tangents, you must convert the time to t. You can find t by calculating the bezier equation in the x (or time) direction.
Px = (1-t)^3(P1x) + 3t(1-t)^2(P2x) + 3t^2(1-t)(P3x) + t^3(P4x)
Px is your time, so you know everything here, but t. You must solve a cubic equation to find the roots. There is a tricky part to find the exact root you need though. Then you solve the other equation to find Py (the actual value you are looking for), knowing now t:
Py = (1-t)^3(P1y) + 3t(1-t)^2(P2y) + 3t^2(1-t)(P3y) + t^3(P4y)
This is what the weighted curves in Maya are.
I know the question is old, but I lost a whole day researching this simple thing, and nobody explains exactly what happens. Otherwise, the calculation process itself is written on many places, the Maya API manual for example. The Maya devkit also has a source code to do this.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js