I'm working on my fast (and accurate) sin implementation in C++, and I have a problem regarding the efficient angle scaling into the +- pi/2 range.
My sin function for +-pi/2 using Taylor series is the following
(Note: FLOAT is a macro expanded to float or double just for the benchmark)
/**
* Sin for 'small' angles, accurate on [-pi/2, pi/2], fairly accurate on [-pi, pi]
*/
// To switch between float and double
#define FLOAT float
FLOAT
my_sin_small(FLOAT x)
{
constexpr FLOAT C1 = 1. / (7. * 6. * 5. * 4. * 3. * 2.);
constexpr FLOAT C2 = -1. / (5. * 4. * 3. * 2.);
constexpr FLOAT C3 = 1. / (3. * 2.);
constexpr FLOAT C4 = -1.;
// Correction for sin(pi/2) = 1, due to the ignored taylor terms
constexpr FLOAT corr = -1. / 0.9998431013994987;
const FLOAT x2 = x * x;
return corr * x * (x2 * (x2 * (x2 * C1 + C2) + C3) + C4);
}
So far so good... The problem comes when I try to scale an arbitrary angle into the +-pi/2 range. My current solution is:
FLOAT
my_sin(FLOAT x)
{
constexpr FLOAT pi = 3.141592653589793238462;
constexpr FLOAT rpi = 1 / pi;
// convert to +-pi/2 range
int n = std::nearbyint(x * rpi);
FLOAT xbar = (n * pi - x) * (2 * (n & 1) - 1);
// (2 * (n % 2) - 1) is a sign correction (see below)
return my_sin_small(xbar);
};
I made a benchmark, and I'm losing a lot for the +-pi/2 scaling.
Tricking with int(angle/pi + 0.5) is a nope since it is limited to the int precision, also requires +- branching, and i try to avoid branches...
What should I try to improve the performance for this scaling? I'm out of ideas.
Benchmark results for float. (In the benchmark the angle could be out of the validity range for my_sin_small, but for the bench I don't care about that...):
Benchmark results for double.
Sign correction for xbar in my_sin():
Algo accuracy compared to python sin() function:
Candidate improvements
Convert the radians x to rotations by dividing by 2*pi.
Retain only the fraction so we have an angle (-1.0 ... 1.0). This simplifies the OP's modulo step to a simple "drop the whole number" step instead. Going forward with different angle units simply involves a co-efficient set change. No need to scale back to radians.
For positive values, subtract 0.5 so we have (-0.5 ... 0.5) and then flip the sign. This centers the possible values about 0.0 and makes for better convergence of the approximating polynomial as compared to the math sine function. For negative values - see below.
Call my_sin_small1() that uses this (-0.5 ... 0.5) rotations range rather than [-pi ... +pi] radians.
In my_sin_small1(), fold constants together to drop the corr * step.
Rather than use the truncated Taylor's series, use a more optimal set. IMO, this will provide better answers, especially near +/-pi.
Notes: No int to/from float code. With more analysis, possible to get a better set of coefficients that fix my_sin(+/-pi) closer to 0.0. This is just a quick set of code to demo less FP steps and good potential results.
C like code for OP to port to C++
FLOAT my_sin_small1(FLOAT x) {
static const FLOAT A1 = -5.64744881E+01;
static const FLOAT A2 = +7.81017968E+01;
static const FLOAT A3 = -4.11145353E+01;
static const FLOAT A4 = +6.27923581E+00;
const FLOAT x2 = x * x;
return x * (x2 * (x2 * (x2 * A1 + A2) + A3) + A4);
}
FLOAT my_sin1(FLOAT x) {
static const FLOAT pi = 3.141592653589793238462;
static const FLOAT pi2i = 1/(pi * 2);
x *= pi2i;
FLOAT xfraction = 0.5f - (x - truncf(x));
return my_sin_small1(xfraction);
}
For negative values, use -my_sin1(-x) or like code to flip the sign - or add 0.5 in the above minus 0.5 step.
Test
#include <math.h>
#include <stdio.h>
int main(void) {
for (int d = 0; d <= 360; d += 20) {
FLOAT x = d / 180.0 * M_PI;
FLOAT y = my_sin1(x);
printf("%12.6f %11.8f %11.8f\n", x, sin(x), y);
}
}
Output
0.000000 0.00000000 -0.00022483
0.349066 0.34202013 0.34221691
0.698132 0.64278759 0.64255589
1.047198 0.86602542 0.86590189
1.396263 0.98480775 0.98496443
1.745329 0.98480775 0.98501128
2.094395 0.86602537 0.86603642
2.443461 0.64278762 0.64260530
2.792527 0.34202022 0.34183803
3.141593 -0.00000009 0.00000000
3.490659 -0.34202016 -0.34183764
3.839724 -0.64278757 -0.64260519
4.188790 -0.86602546 -0.86603653
4.537856 -0.98480776 -0.98501128
4.886922 -0.98480776 -0.98496443
5.235988 -0.86602545 -0.86590189
5.585053 -0.64278773 -0.64255613
5.934119 -0.34202036 -0.34221727
6.283185 0.00000017 -0.00022483
Alternate code below makes for better results near 0.0, yet might cost a tad more time. OP seems more inclined to speed.
FLOAT xfraction = 0.5f - (x - truncf(x));
// vs.
FLOAT xfraction = x - truncf(x);
if (x >= 0.5f) x -= 1.0f;
[Edit]
Below is a better set with about 10% reduced error.
-56.0833765f
77.92947047f
-41.0936875f
6.278635918f
Yet another approach:
Spend more time (code) to reduce the range to ±pi/4 (±45 degrees), then possible to use only 3 or 2 terms of a polynomial that is like the usually Taylors series.
float sin_quick_small(float x) {
const float x2 = x * x;
#if 0
// max error about 7e-7
static const FLOAT A2 = +0.00811656036940792f;
static const FLOAT A3 = -0.166597759850666f;
static const FLOAT A4 = +0.999994132743861f;
return x * (x2 * (x2 * A2 + A3) + A4);
#else
// max error about 0.00016
static const FLOAT A3 = -0.160343346851626f;
static const FLOAT A4 = +0.999031566686144f;
return x * (x2 * A3 + A4);
#endif
}
float cos_quick_small(float x) {
return cosf(x); // TBD code.
}
float sin_quick(float x) {
if (x < 0.0) {
return -sin_quick(-x);
}
int quo;
float x90 = remquof(fabsf(x), 3.141592653589793238462f / 2, &quo);
switch (quo % 4) {
case 0:
return sin_quick_small(x90);
case 1:
return cos_quick_small(x90);
case 2:
return sin_quick_small(-x90);
case 3:
return -cos_quick_small(x90);
}
return 0.0;
}
int main() {
float max_x = 0.0;
float max_error = 0.0;
for (int d = -45; d <= 45; d += 1) {
FLOAT x = d / 180.0 * M_PI;
FLOAT y = sin_quick(x);
double err = fabs(y - sin(x));
if (err > max_error) {
max_x = x;
max_error = err;
}
printf("%12.6f %11.8f %11.8f err:%11.8f\n", x, sin(x), y, err);
}
printf("x:%.6f err:%.6f\n", max_x, max_error);
return 0;
}
After advice from krlzlx I have posted it as a new question.
From here:
3D Line Segment and Plane Intersection
I have a problem with this algorithm, I have implemented it like so:
template <class T>
class AnyCollision {
public:
std::pair<bool, T> operator()(Point3d &ray, Point3d &rayOrigin, Point3d &normal, Point3d &coord) const {
// get d value
float d = (normal.x * coord.x) + (normal.y * coord.y) + (normal.z * coord.z);
if (((normal.x * ray.x) + (normal.y * ray.y) + (normal.z * ray.z)) == 0) {
return std::make_pair(false, T());
}
// Compute the X value for the directed line ray intersecting the plane
float a = (d - ((normal.x * rayOrigin.x) + (normal.y * rayOrigin.y) + (normal.z * rayOrigin.z)) / ((normal.x * ray.x) + (normal.y * ray.y) + (normal.z * ray.z)));
// output contact point
float rayMagnitude = (sqrt(pow(ray.x, 2) + pow(ray.y, 2) + pow(ray.z, 2)));
Point3d rayNormalised((ray.x / rayMagnitude), (ray.y / rayMagnitude), (ray.z / rayMagnitude));
Point3d contact((rayOrigin.x + (rayNormalised.x * a)), (rayOrigin.y + (rayNormalised.y * a)), (rayOrigin.z + (rayNormalised.z * a))); //Make sure the ray vector is normalized
return std::make_pair(true, contact);
};
Point3d is defined as:
class Point3d {
public:
double x;
double y;
double z;
/**
* constructor
*
* 0 all elements
*/
Point3d() {
x = 0.0;
y = 0.0;
z = 0.0;
}
I am forced to use this structure, because in the larger system my component runs in it is defined like this and it cannot be changed.
My code compiles fine, but testing I get incorrect values for the point. The ratio of x, y, z is correct but the magnitude is wrong.
For example if:
rayOrigin.x = 0;
rayOrigin.y = 0;
rayOrigin.z = 0;
ray.x = 3;
ray.y = -5;
ray.z = 12;
normal.x = -3;
normal.y = 12;
normal.z = 0;
coord.x = 7;
coord.y = -5;
coord.z = 10;
I expect the point to be:
(0.63, 1.26, 1.89)
However, it is:
(3.52, -5.87, 14.09)
A magnitude of 5.09 too big.
And I also tested:
rayOrigin.x = 0;
rayOrigin.y = 0;
rayOrigin.z = 0;
ray.x = 2;
ray.y = 3;
ray.z = 3;
normal.x = 4;
normal.y = 1;
normal.z = 0;
p0.x = 2;
p0.y = 1;
p0.z = 5;
I expect the point to be:
(1.64, 2.45, 2.45)
However, it is:
(3.83761, 5.75642, 5.75642)
A magnitude of 2.34 too big?
Pseudocode (does not require vector normalization):
Diff = PlaneBaseCoordinate - RayOrigin
d = Normal.dot.Diff
e = Normal.dot.RayVector
if (e)
IntersectionPoint = RayOrigin + RayVector * d / e
otherwise
ray belongs to the plane or is parallel
Quick check:
Ray (0,0,0) (2,2,2) //to catch possible scale issues
Plane (0,1,0) (0,3,0) //plane y=1
Diff = (0,1,0)
d = 3
e = 6
IntersectionPoint = (0,0,0) + (2,2,2) * 3 / 6 = (1, 1, 1)
I have an ellipse, defined by Center Point, radiusX and radiusY, and I have a Point. I want to find the point on the ellipse that is closest to the given point. In the illustration below, that would be S1.
Now I already have code, but there is a logical error somewhere in it, and I seem to be unable to find it. I broke the problem down to the following code example:
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <math.h>
using namespace std;
void dostuff();
int main()
{
dostuff();
return 0;
}
typedef std::vector<cv::Point> vectorOfCvPoints;
void dostuff()
{
const double ellipseCenterX = 250;
const double ellipseCenterY = 250;
const double ellipseRadiusX = 150;
const double ellipseRadiusY = 100;
vectorOfCvPoints datapoints;
for (int i = 0; i < 360; i+=5)
{
double angle = i / 180.0 * CV_PI;
double x = ellipseRadiusX * cos(angle);
double y = ellipseRadiusY * sin(angle);
x *= 1.4;
y *= 1.4;
x += ellipseCenterX;
y += ellipseCenterY;
datapoints.push_back(cv::Point(x,y));
}
cv::Mat drawing = cv::Mat::zeros( 500, 500, CV_8UC1 );
for (int i = 0; i < datapoints.size(); i++)
{
const cv::Point & curPoint = datapoints[i];
const double curPointX = curPoint.x;
const double curPointY = curPoint.y * -1; //transform from image coordinates to geometric coordinates
double angleToEllipseCenter = atan2(curPointY - ellipseCenterY * -1, curPointX - ellipseCenterX); //ellipseCenterY * -1 for transformation to geometric coords (from image coords)
double nearestEllipseX = ellipseCenterX + ellipseRadiusX * cos(angleToEllipseCenter);
double nearestEllipseY = ellipseCenterY * -1 + ellipseRadiusY * sin(angleToEllipseCenter); //ellipseCenterY * -1 for transformation to geometric coords (from image coords)
cv::Point center(ellipseCenterX, ellipseCenterY);
cv::Size axes(ellipseRadiusX, ellipseRadiusY);
cv::ellipse(drawing, center, axes, 0, 0, 360, cv::Scalar(255));
cv::line(drawing, curPoint, cv::Point(nearestEllipseX,nearestEllipseY*-1), cv::Scalar(180));
}
cv::namedWindow( "ellipse", CV_WINDOW_AUTOSIZE );
cv::imshow( "ellipse", drawing );
cv::waitKey(0);
}
It produces the following image:
You can see that it actually finds "near" points on the ellipse, but it are not the "nearest" points. What I intentionally want is this: (excuse my poor drawing)
would you extent the lines in the last image, they would cross the center of the ellipse, but this is not the case for the lines in the previous image.
I hope you get the picture. Can anyone tell me what I am doing wrong?
Consider a bounding circle around the given point (c, d), which passes through the nearest point on the ellipse. From the diagram it is clear that the closest point is such that a line drawn from it to the given point must be perpendicular to the shared tangent of the ellipse and circle. Any other points would be outside the circle and so must be further away from the given point.
So the point you are looking for is not the intersection between the line and the ellipse, but the point (x, y) in the diagram.
Gradient of tangent:
Gradient of line:
Condition for perpedicular lines - product of gradients = -1:
When rearranged and substituted into the equation of your ellipse...
...this will give two nasty quartic (4th-degree polynomial) equations in terms of either x or y. AFAIK there are no general analytical (exact algebraic) methods to solve them. You could try an iterative method - look up the Newton-Raphson iterative root-finding algorithm.
Take a look at this very good paper on the subject:
http://www.spaceroots.org/documents/distance/distance-to-ellipse.pdf
Sorry for the incomplete answer - I totally blame the laws of mathematics and nature...
EDIT: oops, i seem to have a and b the wrong way round in the diagram xD
There is a relatively simple numerical method with better convergence than Newtons Method. I have a blog post about why it works http://wet-robots.ghost.io/simple-method-for-distance-to-ellipse/
This implementation works without any trig functions:
def solve(semi_major, semi_minor, p):
px = abs(p[0])
py = abs(p[1])
tx = 0.707
ty = 0.707
a = semi_major
b = semi_minor
for x in range(0, 3):
x = a * tx
y = b * ty
ex = (a*a - b*b) * tx**3 / a
ey = (b*b - a*a) * ty**3 / b
rx = x - ex
ry = y - ey
qx = px - ex
qy = py - ey
r = math.hypot(ry, rx)
q = math.hypot(qy, qx)
tx = min(1, max(0, (qx * r / q + ex) / a))
ty = min(1, max(0, (qy * r / q + ey) / b))
t = math.hypot(ty, tx)
tx /= t
ty /= t
return (math.copysign(a * tx, p[0]), math.copysign(b * ty, p[1]))
Credit to Adrian Stephens for the Trig-Free Optimization.
Here is the code translated to C# implemented from this paper to solve for the ellipse:
http://www.geometrictools.com/Documentation/DistancePointEllipseEllipsoid.pdf
Note that this code is untested - if you find any errors let me know.
//Pseudocode for robustly computing the closest ellipse point and distance to a query point. It
//is required that e0 >= e1 > 0, y0 >= 0, and y1 >= 0.
//e0,e1 = ellipse dimension 0 and 1, where 0 is greater and both are positive.
//y0,y1 = initial point on ellipse axis (center of ellipse is 0,0)
//x0,x1 = intersection point
double GetRoot ( double r0 , double z0 , double z1 , double g )
{
double n0 = r0*z0;
double s0 = z1 - 1;
double s1 = ( g < 0 ? 0 : Math.Sqrt(n0*n0+z1*z1) - 1 ) ;
double s = 0;
for ( int i = 0; i < maxIter; ++i ){
s = ( s0 + s1 ) / 2 ;
if ( s == s0 || s == s1 ) {break; }
double ratio0 = n0 /( s + r0 );
double ratio1 = z1 /( s + 1 );
g = ratio0*ratio0 + ratio1*ratio1 - 1 ;
if (g > 0) {s0 = s;} else if (g < 0) {s1 = s ;} else {break ;}
}
return s;
}
double DistancePointEllipse( double e0 , double e1 , double y0 , double y1 , out double x0 , out double x1)
{
double distance;
if ( y1 > 0){
if ( y0 > 0){
double z0 = y0 / e0;
double z1 = y1 / e1;
double g = z0*z0+z1*z1 - 1;
if ( g != 0){
double r0 = (e0/e1)*(e0/e1);
double sbar = GetRoot(r0 , z0 , z1 , g);
x0 = r0 * y0 /( sbar + r0 );
x1 = y1 /( sbar + 1 );
distance = Math.Sqrt( (x0-y0)*(x0-y0) + (x1-y1)*(x1-y1) );
}else{
x0 = y0;
x1 = y1;
distance = 0;
}
}
else // y0 == 0
x0 = 0 ; x1 = e1 ; distance = Math.Abs( y1 - e1 );
}else{ // y1 == 0
double numer0 = e0*y0 , denom0 = e0*e0 - e1*e1;
if ( numer0 < denom0 ){
double xde0 = numer0/denom0;
x0 = e0*xde0 ; x1 = e1*Math.Sqrt(1 - xde0*xde0 );
distance = Math.Sqrt( (x0-y0)*(x0-y0) + x1*x1 );
}else{
x0 = e0;
x1 = 0;
distance = Math.Abs( y0 - e0 );
}
}
return distance;
}
The following python code implements the equations described at "Distance from a Point to an Ellipse" and uses newton's method to find the roots and from that the closest point on the ellipse to the point.
Unfortunately, as can be seen from the example, it seems to only be accurate outside the ellipse. Within the ellipse weird things happen.
from math import sin, cos, atan2, pi, fabs
def ellipe_tan_dot(rx, ry, px, py, theta):
'''Dot product of the equation of the line formed by the point
with another point on the ellipse's boundary and the tangent of the ellipse
at that point on the boundary.
'''
return ((rx ** 2 - ry ** 2) * cos(theta) * sin(theta) -
px * rx * sin(theta) + py * ry * cos(theta))
def ellipe_tan_dot_derivative(rx, ry, px, py, theta):
'''The derivative of ellipe_tan_dot.
'''
return ((rx ** 2 - ry ** 2) * (cos(theta) ** 2 - sin(theta) ** 2) -
px * rx * cos(theta) - py * ry * sin(theta))
def estimate_distance(x, y, rx, ry, x0=0, y0=0, angle=0, error=1e-5):
'''Given a point (x, y), and an ellipse with major - minor axis (rx, ry),
its center at (x0, y0), and with a counter clockwise rotation of
`angle` degrees, will return the distance between the ellipse and the
closest point on the ellipses boundary.
'''
x -= x0
y -= y0
if angle:
# rotate the points onto an ellipse whose rx, and ry lay on the x, y
# axis
angle = -pi / 180. * angle
x, y = x * cos(angle) - y * sin(angle), x * sin(angle) + y * cos(angle)
theta = atan2(rx * y, ry * x)
while fabs(ellipe_tan_dot(rx, ry, x, y, theta)) > error:
theta -= ellipe_tan_dot(
rx, ry, x, y, theta) / \
ellipe_tan_dot_derivative(rx, ry, x, y, theta)
px, py = rx * cos(theta), ry * sin(theta)
return ((x - px) ** 2 + (y - py) ** 2) ** .5
Here's an example:
rx, ry = 12, 35 # major, minor ellipse axis
x0 = y0 = 50 # center point of the ellipse
angle = 45 # ellipse's rotation counter clockwise
sx, sy = s = 100, 100 # size of the canvas background
dist = np.zeros(s)
for x in range(sx):
for y in range(sy):
dist[x, y] = estimate_distance(x, y, rx, ry, x0, y0, angle)
plt.imshow(dist.T, extent=(0, sx, 0, sy), origin="lower")
plt.colorbar()
ax = plt.gca()
ellipse = Ellipse(xy=(x0, y0), width=2 * rx, height=2 * ry, angle=angle,
edgecolor='r', fc='None', linestyle='dashed')
ax.add_patch(ellipse)
plt.show()
Which generates an ellipse and the distance from the boundary of the ellipse as a heat map. As can be seen, at the boundary the distance is zero (deep blue).
Given an ellipse E in parametric form and a point P
the square of the distance between P and E(t) is
The minimum must satisfy
Using the trigonometric identities
and substituting
yields the following quartic equation:
Here's an example C function that solves the quartic directly and computes sin(t) and cos(t) for the nearest point on the ellipse:
void nearest(double a, double b, double x, double y, double *ecos_ret, double *esin_ret) {
double ax = fabs(a*x);
double by = fabs(b*y);
double r = b*b - a*a;
double c, d;
int switched = 0;
if (ax <= by) {
if (by == 0) {
if (r >= 0) { *ecos_ret = 1; *esin_ret = 0; }
else { *ecos_ret = 0; *esin_ret = 1; }
return;
}
c = (ax - r) / by;
d = (ax + r) / by;
} else {
c = (by + r) / ax;
d = (by - r) / ax;
switched = 1;
}
double cc = c*c;
double D0 = 12*(c*d + 1); // *-4
double D1 = 54*(d*d - cc); // *4
double D = D1*D1 + D0*D0*D0; // *16
double St;
if (D < 0) {
double t = sqrt(-D0); // *2
double phi = acos(D1 / (t*t*t));
St = 2*t*cos((1.0/3)*phi); // *2
} else {
double Q = cbrt(D1 + sqrt(D)); // *2
St = Q - D0 / Q; // *2
}
double p = 3*cc; // *-2
double SS = (1.0/3)*(p + St); // *4
double S = sqrt(SS); // *2
double q = 2*cc*c + 4*d; // *2
double l = sqrt(p - SS + q / S) - S - c; // *2
double ll = l*l; // *4
double ll4 = ll + 4; // *4
double esin = (4*l) / ll4;
double ecos = (4 - ll) / ll4;
if (switched) {
double t = esin;
esin = ecos;
ecos = t;
}
*ecos_ret = copysign(ecos, a*x);
*esin_ret = copysign(esin, b*y);
}
Try it online!
You just need to calculate the intersection of the line [P1,P0] to your elipse which is S1.
If the line equeation is:
and the elipse equesion is:
than the values of S1 will be:
Now you just need to calculate the distance between S1 to P1 , the formula (for A,B points) is:
I've solved the distance issue via focal points.
For every point on the ellipse
r1 + r2 = 2*a0
where
r1 - Euclidean distance from the given point to focal point 1
r2 - Euclidean distance from the given point to focal point 2
a0 - semimajor axis length
I can also calculate the r1 and r2 for any given point which gives me another ellipse that this point lies on that is concentric to the given ellipse. So the distance is
d = Abs((r1 + r2) / 2 - a0)
As propposed by user3235832
you shall solve quartic equation to find the normal to the ellipse (https://www.mathpages.com/home/kmath505/kmath505.htm). With good initial value only few iterations are needed (I use it myself). As an initial value I use S1 from your picture.
The fastest method I guess is
http://wwwf.imperial.ac.uk/~rn/distance2ellipse.pdf
Which has been mentioned also by Matt but as he found out the method doesn't work very well inside of ellipse.
The problem is the theta initialization.
I proposed an stable initialization:
Find the intersection of ellipse and horizontal line passing the point.
Find the other intersection using vertical line.
Choose the one that is closer the point.
Calculate the initial angle based on that point.
I got good results with no issue inside and outside:
As you can see in the following image it just iterated about 3 times to reach 1e-8. Close to axis it is 1 iteration.
The C++ code is here:
double initialAngle(double a, double b, double x, double y) {
auto abs_x = fabs(x);
auto abs_y = fabs(y);
bool isOutside = false;
if (abs_x > a || abs_y > b) isOutside = true;
double xd, yd;
if (!isOutside) {
xd = sqrt((1.0 - y * y / (b * b)) * (a * a));
if (abs_x > xd)
isOutside = true;
else {
yd = sqrt((1.0 - x * x / (a * a)) * (b * b));
if (abs_y > yd)
isOutside = true;
}
}
double t;
if (isOutside)
t = atan2(a * y, b * x); //The point is outside of ellipse
else {
//The point is inside
if (xd < yd) {
if (x < 0) xd = -xd;
t = atan2(y, xd);
}
else {
if (y < 0) yd = -yd;
t = atan2(yd, x);
}
}
return t;
}
double distanceToElipse(double a, double b, double x, double y, int maxIter = 10, double maxError = 1e-5) {
//std::cout <<"p="<< x << "," << y << std::endl;
auto a2mb2 = a * a - b * b;
double t = initialAngle(a, b, x, y);
auto ct = cos(t);
auto st = sin(t);
int i;
double err;
for (i = 0; i < maxIter; i++) {
auto f = a2mb2 * ct * st - x * a * st + y * b * ct;
auto fp = a2mb2 * (ct * ct - st * st) - x * a * ct - y * b * st;
auto t2 = t - f / fp;
err = fabs(t2 - t);
//std::cout << i + 1 << " " << err << std::endl;
t = t2;
ct = cos(t);
st = sin(t);
if (err < maxError) break;
}
auto dx = a * ct - x;
auto dy = b * st - y;
//std::cout << a * ct << "," << b * st << std::endl;
return sqrt(dx * dx + dy * dy);
}
I'm trying to rotate image with interpolation, but it's too slow for real time for big images.
the code something like:
for(int y=0;y<dst_h;++y)
{
for(int x=0;x<dst_w;++x)
{
//do inverse transform
fPoint pt(Transform(Point(x, y)));
//in coor of src
int x1= (int)floor(pt.x);
int y1= (int)floor(pt.y);
int x2= x1+1;
int y2= y1+1;
if((x1>=0&&x1<src_w&&y1>=0&&y1<src_h)&&(x2>=0&&x2<src_w&&y2>=0&&y2<src_h))
{
Mask[y][x]= 1; //show pixel
float dx1= pt.x-x1;
float dx2= 1-dx1;
float dy1= pt.y-y1;
float dy2= 1-dy1;
//bilinear
pd[x].blue= (dy2*(ps[y1*src_w+x1].blue*dx2+ps[y1*src_w+x2].blue*dx1)+
dy1*(ps[y2*src_w+x1].blue*dx2+ps[y2*src_w+x2].blue*dx1));
pd[x].green= (dy2*(ps[y1*src_w+x1].green*dx2+ps[y1*src_w+x2].green*dx1)+
dy1*(ps[y2*src_w+x1].green*dx2+ps[y2*src_w+x2].green*dx1));
pd[x].red= (dy2*(ps[y1*src_w+x1].red*dx2+ps[y1*src_w+x2].red*dx1)+
dy1*(ps[y2*src_w+x1].red*dx2+ps[y2*src_w+x2].red*dx1));
//nearest neighbour
//pd[x]= ps[((int)pt.y)*src_w+(int)pt.x];
}
else
Mask[y][x]= 0; //transparent pixel
}
pd+= dst_w;
}
How I can speed up this code, I try to parallelize this code but it seems there is no speed up because of memory access pattern (?).
The key is to do most of your computations as ints. The only thing that is necessary to do as a float is the weighting. See here for a good resource.
From that same resource:
int px = (int)x; // floor of x
int py = (int)y; // floor of y
const int stride = img->width;
const Pixel* p0 = img->data + px + py * stride; // pointer to first pixel
// load the four neighboring pixels
const Pixel& p1 = p0[0 + 0 * stride];
const Pixel& p2 = p0[1 + 0 * stride];
const Pixel& p3 = p0[0 + 1 * stride];
const Pixel& p4 = p0[1 + 1 * stride];
// Calculate the weights for each pixel
float fx = x - px;
float fy = y - py;
float fx1 = 1.0f - fx;
float fy1 = 1.0f - fy;
int w1 = fx1 * fy1 * 256.0f;
int w2 = fx * fy1 * 256.0f;
int w3 = fx1 * fy * 256.0f;
int w4 = fx * fy * 256.0f;
// Calculate the weighted sum of pixels (for each color channel)
int outr = p1.r * w1 + p2.r * w2 + p3.r * w3 + p4.r * w4;
int outg = p1.g * w1 + p2.g * w2 + p3.g * w3 + p4.g * w4;
int outb = p1.b * w1 + p2.b * w2 + p3.b * w3 + p4.b * w4;
int outa = p1.a * w1 + p2.a * w2 + p3.a * w3 + p4.a * w4;
wow you are doing a lot inside most inner loop like:
1.float to int conversions
can do all on floats ...
they are these days pretty fast
the conversion is what is killing you
also you are mixing float and ints together (if i see it right) which is the same ...
2.transform(x,y)
any unnecessary call makes heap trashing and slow things down
instead add 2 variables xx,yy and interpolate them insde your for loops
3.if ....
why to heck are you adding if ?
limit the for ranges before loop and not inside ...
the background can be filled with other fors before or later