I'm trying to implement the Runge-Kutta method in Fortran and am facing a convergence problem. I don't know how much of the code I should show, so I'll describe the problem in detail; please guide me as to what I should add to or remove from the post to make it answerable.
I have a 6-dimensional vector of the position and velocity of a ball, and a corresponding system of differential equations that describes the equations of motion, from which I want to calculate the trajectory of the ball and compare the results for different orders of the RK method.
Let's focus on 3rd order RK. The model I use is implemented as follows:
k1 = h * f(vec_old,omega,phi)
k2 = h * f(vec_old + 0.5d0 * k1,omega,phi)
k3 = h * f(vec_old + 2d0 * k2 - k1,omega,phi)
vec = vec_old + (k1 + 4d0 * k2 + k3) / 6d0
Where f is the function that constitutes the equations of motion (or equivalently the RHS of my system of differential equations). Note that f is time independent and therefore does not take t as an argument; omega and phi are fixed parameters. h takes the role of a small time step dt.
If we wish to calculate the trajectory of the ball for a finite time tot_time and allow for a total error of eps, then we need to ensure each step contributes a proportional fraction of that error. For the first step, I then did the following:
vec1 = solve(3,vec_old,h,omega,phi)
vec2 = solve(3,vec_old,h/2d0,omega,phi)
do while (maxval((/(abs(vec1(i) - vec2(i)),i=1,6)/)) > eps * h / (tot_time - current_time))
    h = h / 2d0
    vec1 = solve(3,vec_old,h,omega,phi)
    vec2 = solve(3,vec_old,h/2d0,omega,phi)
end do
vec = (8d0/7d0) * vec2 - (1d0/7d0) * vec1
Where solve(3,vec_old,h,omega,phi) is the function that calculates the single RK step described above: 3 denotes the RK order being used, vec_old is the current state of the position-velocity vector, h (or h/2d0) is the time step, and omega, phi are just the extra parameters for f. For the first step we set current_time = 0d0.
The point is that if we use a 3rd order RK, we should have an error in $O(h^3)$, which falls off faster than linearly in h. Therefore, we should expect the while loop to eventually come to a halt for small enough h.
My problem is that the loop doesn't converge, and not even close - the ratio
maxval(...) / (eps * h / (tot_time - current_time))
remains pretty much constant, all the way until eps * h / (tot_time - current_time) becomes zero due to finite precision.
For completeness, this is my definition for f:
function f(vec_old,omega,phi) result(vec)
    real(8),intent(in) :: vec_old(6),omega,phi
    real(8) :: vec(6)
    real(8) :: v,Fv
    ! speed
    v = sqrt(vec_old(4)**2+vec_old(5)**2+vec_old(6)**2)
    ! velocity-dependent drag coefficient
    Fv = 0.0039d0 + 0.0058d0 / (1d0 + exp((v-35d0)/5d0))
    ! position derivatives are the velocity components
    vec(1) = vec_old(4)
    vec(2) = vec_old(5)
    vec(3) = vec_old(6)
    ! velocity derivatives: drag plus spin (omega) terms, gravity on the z component
    vec(4) = -Fv * v * vec_old(4) + 4.1d-4 * omega * (vec_old(6)*sin(phi) - vec_old(5)*cos(phi))
    vec(5) = -Fv * v * vec_old(5) + 4.1d-4 * omega * vec_old(4)*cos(phi)
    vec(6) = -Fv * v * vec_old(6) - 4.1d-4 * omega * vec_old(4)*sin(phi) - 9.8d0
end function f
Does anyone have any idea as to why the while loop doesn't converge?
If anything else is needed (output, other pieces of code etc.) please tell me and I'll add it. Also, if trimming is required, I'll cut whatever would be considered unnecessary. Thanks!
To compute the step error using the half-step method, you need to compute the approximation at t+h in both cases, which means two steps with step size h/2. As it is now, you compare the approximation at t+h to the approximation at t+h/2, which gives you an error of size f(vec(t+h/2))*h/2.
Thus change to a 3-step procedure
vec1 = solve(3,vec_old,h,omega,phi)
vec2 = solve(3,vec_old,h/2d0,omega,phi)
vec2 = solve(3,vec2 ,h/2d0,omega,phi)
in both locations. The difference vec2 - vec1 should then be of order h^4.
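This also explains the weights in the accepted step: for a 3rd order method the local error of one full step is $Ch^4 + O(h^5)$, while two half steps accumulate roughly $2C(h/2)^4 = Ch^4/8$. The difference vec2 - vec1 is therefore about $(7/8)Ch^4$, which is the quantity the while loop should see shrinking, and the combination (8d0/7d0) * vec2 - (1d0/7d0) * vec1 cancels that leading error term (Richardson extrapolation).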
I have three sets of 2D points. What I need to do is find out where one sits in relation to the other two.
Every set has the same points, in the same order. One is 'neutral', one is 'max', and the third is unknown. What I need is to return a single value, between 0 and 1, that illustrates the amount that the unknown set is between the other two.
For example, in the image:
I would somehow get the 'distance' or 'weight' between Set A and Set B, then find out where Set C sits between them. In this example, I would expect a value of around 75%, or 0.75.
I have looked at using point set registration algorithms that return a scale amount to match Set C to Set B, but I am not convinced that this is the best way. What approach would be suitable for this problem? What algorithms should I be searching for?
You could try to solve this with a simple linear interpolation between the two sets. This works if the transition between the sets is indeed nearly linear. If you know that it is something else, you can adapt the interpolation function.
Let us focus on a single point p. We know its coordinates in all sets p_A, p_B, and p_C. Then, we specify that p_C is more or less a linear interpolation between p_A and p_B with parameter t (where t=0 represents set A and t=1 represents set B):
p_C = (1 - t) * p_A + t * p_B
= p_A - t * p_A + t * p_B
= p_A + t * (p_B - p_A)
p_C - p_A = t * (p_B - p_A)
The question now is to find a t that approximately holds for all your points.
We can solve this by stating the problem as a linear least squares problem. I.e. we want to minimize the summed residuals (difference between left-hand sides and right-hand sides of the above equation) for all points:
arg min_t Σ_i (pi_C.x - pi_A.x - t * (pi_B.x - pi_A.x))^2
+ (pi_C.y - pi_A.y - t * (pi_B.y - pi_A.y))^2
The optimal t is then:
numX = Σ_i (pi_A.x^2 - pi_A.x * pi_B.x - pi_A.x * pi_C.x + pi_B.x * pi_C.x)
numY = Σ_i (pi_A.y^2 - pi_A.y * pi_B.y - pi_A.y * pi_C.y + pi_B.y * pi_C.y)
denX = Σ_i (pi_A.x^2 - 2 * pi_A.x * pi_B.x + pi_B.x^2)
denY = Σ_i (pi_A.y^2 - 2 * pi_A.y * pi_B.y + pi_B.y^2)
t = (numX + numY) / (denX + denY)
If your points have higher dimension, just add the new dimension with the same pattern.
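For concreteness, here is a minimal C++ sketch of this estimator; the Point struct and function name are just illustrative. It uses the dot-product form Σ(p_C - p_A)·(p_B - p_A) / Σ|p_B - p_A|^2, which is algebraically the same as the sums above.
#include <vector>
#include <cstddef>

struct Point { double x, y; };

// Least-squares estimate of t such that p_C ≈ (1 - t) * p_A + t * p_B.
double interpolationParameter(const std::vector<Point>& A,
                              const std::vector<Point>& B,
                              const std::vector<Point>& C)
{
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < A.size(); ++i) {
        const double dx = B[i].x - A[i].x, dy = B[i].y - A[i].y;
        num += (C[i].x - A[i].x) * dx + (C[i].y - A[i].y) * dy;
        den += dx * dx + dy * dy;
    }
    return num / den; // caller should guard against den == 0 (A == B)
}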
I'm trying to use sympy to generate equations for non-linear least squares fitting. My goal is to make this quite complex but for the moment, here's a simple case (but not too simple!). It's basically fitting a two dimensional sinusoid to data. Here's the sympy code:
from sympy import *
S, l, m = symbols('S l m', real=True)
u, v = symbols('u v', real=True)
Vobs = symbols('Vobs', complex=True)
Vres = Vobs - S * exp(- 1j * 2 * pi * (u*l+v*m))
J=Vres*conjugate(Vres)
axes = [S, l, m]
grad = derive_by_array(J, axes)
hess = derive_by_array(grad, axes)
One element of the grad term looks like:
- 2.0*I*pi*S*u*(-S*exp(-2.0*I*pi*(l*u + m*v)) + Vobs)*exp(2.0*I*pi*(l*u + m*v)) + 2.0*I*pi*S*u*(-S*exp(2.0*I*pi*(l*u + m*v)) + conjugate(Vobs))*exp(-2.0*I*pi*(l*u + m*v))
What I'd like is to replace the expanded term (-S*exp(-2.0*I*pi*(l*u + m*v)) + Vobs) by Vres and contract the two conjugate terms into the more compact equivalent:
4.0*pi*S*u*im(Vres*exp(2.0*I*pi*(l*u + m*v)))
I cannot see how to do this with sympy. This problem is bad for the first derivative (grad) but gets really out of hand with the second derivative (hess).
First of all, let's not use 1j in SymPy, it's a float and floats are bad for symbolic math. SymPy's imaginary unit is I. So,
Vres = Vobs - S * exp(- I * 2 * pi * (u*l+v*m))
To replace the expression Vres by a symbol, we first need to create such a symbol. I'm going to call it Vres0, but its name will be Vres, so it prints as "Vres" in formulas.
Vres0 = symbols('Vres')
g1 = grad[1].subs(Vres, Vres0).conjugate().subs(Vres, Vres0).conjugate()
The conjugate-substitute-conjugate back is needed because subs doesn't quite recognize the possibility of replacing the conjugate of an expression with the conjugate of the symbol.
Now g1 is
-2*I*pi*S*Vres*u*exp(2*I*pi*(l*u + m*v)) + 2*I*pi*S*u*exp(-2*I*pi*(l*u + m*v))*conjugate(Vres)
and we want to fold the sum of conjugate terms. I use a custom transformation rule for this: the rule fold_conjugates applies to every sum (Add) of two terms (len(f.args) == 2) where the second is a conjugate of the first (f.args[1] == f.args[0].conjugate()). The transformation it performs: replace the sum by twice the real part of first argument (2*re(f.args[0])). Like so:
from sympy.core.rules import Transform
fold_conjugates = Transform(lambda f: 2*re(f.args[0]),
lambda f: isinstance(f, Add) and len(f.args) == 2 and f.args[1] == f.args[0].conjugate())
g = g1.xreplace(fold_conjugates)
Final result: 4*pi*S*u*im(Vres*exp(2*I*pi*(l*u + m*v))).
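A note on the design choice: the Transform rule is applied by xreplace to every node of the expression tree, so the same fold_conjugates rule should also fold the conjugate pairs that appear inside the hess elements, which is where the expressions otherwise get out of hand.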
I have this function to reach a certain 1-dimensional value, accelerated and damped with overshoot. That is: given an initial value, a velocity and an acceleration (force/mass), the target value is attained by accelerating towards it, increasingly damped while getting closer to the target value.
This all works fine; however, if I want to know what TotalAngle is after time 't', I have to run this function for, say, N steps with a 'small' dt to find the 'limit'.
I was wondering if (and how) I can integrate over dt so that TotalAngle can be determined directly, given a time 't'.
Thanks for any help.
dt = delta time step per frame
input = 1
TotalAngle = 0 at t=0
Velocity = 0 at t=0
void FAccelDampedWithOvershoot::Update(float dt, float input, float& Velocity, float& TotalAngle)
{
    const float Force = 500000.f;
    const float DampForce = 5000.f;
    const float MaxAngle = 45.f;
    const float InvMass = 1.f / 162400.f;

    float target = MaxAngle * input;
    float ratio = (target - TotalAngle) / MaxAngle;
    float fMove = Force * ratio;
    float fDamp = -Velocity * DampForce;
    Velocity += (fMove + fDamp) * InvMass * dt;
    TotalAngle += Velocity * dt;
}
Updated with fixed bugs in the math.
Originally I lost mass and MaxAngle a few times. This is why you should first solve it on paper and then post to SO, rather than trying to solve it in the text editor. Anyway, I've fixed the math and now it seems to work reasonably well. I put the fixed solution in place of the previous one.
Well, this looks like Newtonian mechanics, which means differential equations. Let's try to solve them.
SO is not very friendly to math formulas, and I'm a bit bored of typing characters, so here is the notation I use:
F = Force
Fd = DampForce
MA = MaxAngle
A= TotalAngle
v = Velocity
m = 1 / InvMass
' denotes the derivative with respect to t, i.e. something' is the 1st derivative of something and something'' is the 2nd derivative
If I divide your last two lines of code by dt and merge in all the other lines, I get (I also assume that input = 1, as the other case is obviously symmetrical)
v' = ([F * (1 - A / MA)] - v * Fd) / m
and applying A' = v we get
m * A'' = F(1 - A/MA) - Fd * A'
or moving to one side we get a simple 2-nd order differential equation
m * A'' + Fd * A' + F/MA * A = F
IIRC, the way to solve it is to first solve the characteristic equation, which here is
m * x^2 + Fd * x + F/MA = 0
x[1,2] = (-Fd +/- sqrt(Fd^2 - 4*F*m/MA))/ (2*m)
I expect the part under the sqrt, i.e. (Fd^2 - 4*F*m/MA), to be negative, thus the solution should be of the following form. Let
Dm = Fd/(2*m)
K = sqrt(F/MA/m - Dm^2)
(note the value under the sqrt is negated so it is now positive); then
A(t) = e^(-Dm*t) * [P * sin(K*t) + Q * cos(K*t)] + C
where P, Q and C are some constants.
The solution is easier to find as a sum of two solutions: some specific solution of
m * A'' + Fd * A' + F/MA * A = F
and a general solution of the homogeneous equation
m * A'' + Fd * A' + F/MA * A = 0
chosen so that the original initial conditions fit. Obviously the specific solution A(t) = MA works, and thus C = MA. So now we need to fit P and Q of the general solution to match the starting conditions
A(0) = 0
A'(0) = V(0) = 0
Given that e^0 = 1, sin(0) = 0 and cos(0) = 1, the first condition gives MA + Q = 0. Differentiating A(t) and evaluating at t = 0 gives -Dm*Q + K*P = 0. So
P = -(Dm/K) * MA
Q = -MA
C = MA
thus
A(t) = MA * [1 - e^(-Dm*t) * (cos(K*t) + (Dm/K) * sin(K*t))]
and since Dm/K turns out to be tiny here (about 0.001), this is very close to
A(t) = MA * [1 - e^(-Dm*t) * cos(K*t)]
where
Dm = Fd/(2*m)
K = sqrt(F/MA/m - Dm^2)
which kind of makes sense given your task.
Note also that this equation assumes that everything happens in radians rather than degrees (i.e. the derivative of sin(t) is just cos(t)), so you should transform all your constants accordingly, or transform the solution.
const float Force = 500000.f * M_PI / 180;
const float DampForce = 5000.f * M_PI / 180;
const float MaxAngle = M_PI_4;
which on my machine produces
Dm = 0.000268677541
K = 0.261568546
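Putting it together, a minimal C++ sketch of evaluating the closed form directly (assuming the radian-converted constants above; the function name is illustrative):
#include <cmath>

// Closed-form TotalAngle(t) for input = +/-1, starting from rest at angle 0.
float TotalAngleAt(float t, float input)
{
    const float Force     = 500000.f * float(M_PI) / 180.f;
    const float DampForce = 5000.f * float(M_PI) / 180.f;
    const float MaxAngle  = float(M_PI_4);
    const float Mass      = 162400.f;

    const float Dm = DampForce / (2.f * Mass);
    const float K  = std::sqrt(Force / MaxAngle / Mass - Dm * Dm);

    // A(t) = MA * [1 - e^(-Dm t) * (cos(K t) + (Dm/K) sin(K t))];
    // the sin term is nearly negligible here since Dm/K ~ 0.001.
    const float decay = std::exp(-Dm * t);
    return input * MaxAngle
         * (1.f - decay * (std::cos(K * t) + (Dm / K) * std::sin(K * t)));
}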
This seems to be similar to the original function if I step with dt = 0.01f, and the main obstacle seems to be precision loss because of float.
Hope this helps!
This is not a full answer and I am sure someone else can work it out, but there is no room in the comments and it may help you find a better solution.
The image below shows the velocity (blue) as your function integrates at time steps of 1. The red shows the function below, which calculates the value for time t.
The function F(t)
F(t) = sin((t / f) * pi * 2) * (1 / (((t / f) + a) ^ c)) * b
With f = 23.7, a = 1.4, c = 2, and b = 50, that gives the red plot in the image above.
All the values are just approximations:
f determines the frequency and is close to a match;
a, b, c control the falloff in amplitude and are a by-eye guesstimate.
If it does not matter that you have a perfect match then this will work for you. totalAngle uses the same function, but t has 0.25 added to it. Unfortunately I did not get any values for a, b, c for totalAngle, and I noticed that it was offset, so you will have to add an offset value d (I normalised everything, so I have no idea what the range of totalAngle was).
Function F(t) for totalAngle
F(t) = sin(((t+0.25) / f) * pi * 2) * (1 / ((((t+0.25) / f) + a) ^ c)) * b + d
Sorry, I only have f = 23.7, c = 2, and a ~ 1.4 - nothing for b or d.
I have this Runge-Kutta code. However, someone mentioned that my approach is wrong, and I couldn't really understand why from him. Could anyone here give a hint on why this way is wrong?
Vector3d r = P.GetAcceleration();
Vector3d s = P.GetAcceleration() + 0.5*m_dDeltaT*r;
Vector3d t = P.GetAcceleration() + 0.5*m_dDeltaT*s;
Vector3d u = P.GetAcceleration() + m_dDeltaT*t;
P.Velocity += m_dDeltaT * (r + 2.0 * (s + t) + u) / 6.0;
====EDIT====
Vector3d are storing the coordinates, x, y, z.
The GetAcceleration returns the acceleration for each x, y, and z.
You have some acceleration function
a(p,q) where p=(x,y,z) and q=(vx,vy,vz)
Your order 1 system that can be solved via RK4 is
dotp = q
dotq = a(p,q)
The stages of the RK method involve an offset of the state vector(s)
k1p = q
k1q = a(p,q)
p1 = p + 0.5*dt*k1p
q1 = q + 0.5*dt*k1q
k2p = q1
k2q = a(p1,q1)
p2 = p + 0.5*dt*k2p
q2 = q + 0.5*dt*k2q
k3p = q2
k3q = a(p2,q2)
etc. You can either adjust the state vectors of the point P at each stage, saving the original coordinates, or use a temporary copy of P to compute k2, k3, k4.
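To make the pattern concrete, here is a minimal, self-contained C++ sketch of one full RK4 step for the coupled (p, q) system; Vec3 and accel are stand-ins for your Vector3d and acceleration function.
struct Vec3 { double x, y, z; };

Vec3 operator+(Vec3 u, Vec3 v) { return {u.x + v.x, u.y + v.y, u.z + v.z}; }
Vec3 operator*(double s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }

Vec3 accel(const Vec3& p, const Vec3& q); // your model a(p, q)

void rk4Step(Vec3& p, Vec3& q, double dt)
{
    // Stage 1: slopes at the current state.
    Vec3 k1p = q;
    Vec3 k1q = accel(p, q);
    // Stage 2: slopes at the state offset by half a step along k1.
    Vec3 k2p = q + 0.5 * dt * k1q;
    Vec3 k2q = accel(p + 0.5 * dt * k1p, q + 0.5 * dt * k1q);
    // Stage 3: slopes at the state offset by half a step along k2.
    Vec3 k3p = q + 0.5 * dt * k2q;
    Vec3 k3q = accel(p + 0.5 * dt * k2p, q + 0.5 * dt * k2q);
    // Stage 4: slopes at the state offset by a full step along k3.
    Vec3 k4p = q + dt * k3q;
    Vec3 k4q = accel(p + dt * k3p, q + dt * k3q);
    // Weighted combination applied to the *original* state.
    p = p + (dt / 6.0) * (k1p + 2.0 * k2p + 2.0 * k3p + k4p);
    q = q + (dt / 6.0) * (k1q + 2.0 * k2q + 2.0 * k3q + k4q);
}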
You haven't defined your methods, but the thing that's jumping out at me is that you're mixing your results with your inputs. Since Runge-Kutta is a method for calculating y_(n+1) = y_n + h*sum(b_i*k_i), I would expect your solution to keep your _n terms on the right, and your _(n+1) terms on the left. This is NOT what you're doing. Instead, s_(n+1) is dependent on r_(n+1) instead of on r_n, t_(n+1) on s_(n+1), and so on. This smells of an error where you attempted to limit the number of variables being used.
With that in mind, can you indicate the actual intermediate values of the calculations your program generates and compare them with the intended intermediate values?
How can I rewrite the following pseudocode in C++?
real array sine_table[-1000..1000]
for x from -1000 to 1000
sine_table[x] := sine(pi * x / 1000)
I need to create a sine_table lookup table.
You can reduce the size of your table to 25% of the original by only storing values for the first quadrant, i.e. for x in [0,pi/2].
To do that your lookup routine just needs to map all values of x to the first quadrant using simple trig identities:
sin(x) = - sin(-x), to map from quadrant IV to I
sin(x) = sin(pi - x), to map from quadrant II to I
To map from quadrant III to I, apply both identities, i.e. sin(x) = - sin (pi + x)
Whether this strategy helps depends on how much memory usage matters in your case. But it seems wasteful to store four times as many values as you need just to avoid a comparison and subtraction or two during lookup.
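As a sketch of what that lookup could look like (assuming a quarter-wave table sine_table[i] = sin(pi/2 * i / N); the names and nearest-entry rounding are illustrative):
#include <cmath>

const int N = 500;           // resolution of the quarter-wave table
double sine_table[N + 1];    // sine_table[i] = sin(pi/2 * i / N)

void build_table()
{
    for (int i = 0; i <= N; ++i)
        sine_table[i] = std::sin(M_PI * 0.5 * i / N);
}

// Look up sin(x) for any x by reducing it to the first quadrant.
double lookup_sin(double x)
{
    double sign = 1.0;
    if (x < 0) { x = -x; sign = -1.0; }           // sin(-x) = -sin(x)
    x = std::fmod(x, 2.0 * M_PI);                 // use periodicity
    if (x >= M_PI) { x -= M_PI; sign = -sign; }   // quadrants III/IV -> I/II
    if (x > 0.5 * M_PI) x = M_PI - x;             // quadrant II -> I
    const int i = static_cast<int>(x / (0.5 * M_PI) * N + 0.5); // nearest entry
    return sign * sine_table[i];
}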
I second Jeremy's recommendation to measure whether building a table is better than just using std::sin(). Even with the original large table, you'll have to spend cycles during each table lookup to convert the argument to the closest increment of pi/1000, and you'll lose some accuracy in the process.
If you're really trying to trade accuracy for speed, you might try approximating the sin() function using just the first few terms of the Taylor series expansion.
sin(x) = x - x^3/3! + x^5/5! ..., where ^ represents raising to a power and ! represents the factorial.
Of course, for efficiency, you should precompute the factorials and make use of the lower powers of x to compute higher ones, e.g. use x^3 when computing x^5.
One final point: the truncated Taylor series above is more accurate for values closer to zero, so it's still worthwhile to map to the first or fourth quadrant before computing the approximate sine.
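For illustration, a three-term version of that idea (the inverse factorials are precomputed and x^5 reuses x^3; best after range reduction, as noted above):
// sin(x) ≈ x - x^3/3! + x^5/5!, accurate near zero.
double taylor_sin(double x)
{
    const double inv3f = 1.0 / 6.0;    // 1/3!
    const double inv5f = 1.0 / 120.0;  // 1/5!
    const double x2 = x * x;
    const double x3 = x2 * x;          // lower power...
    const double x5 = x3 * x2;         // ...reused for the higher one
    return x - x3 * inv3f + x5 * inv5f;
}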
Addendum:
Yet one more potential improvement based on two observations:
1. You can compute any trig function if you can compute both the sine and cosine in the first octant [0,pi/4]
2. The Taylor series expansion centered at zero is more accurate near zero
So if you decide to use a truncated Taylor series, then you can improve accuracy (or use fewer terms for similar accuracy) by mapping to either the sine or the cosine to get the angle into the range [0,pi/4], using identities like sin(x) = cos(pi/2 - x) and cos(x) = sin(pi/2 - x) in addition to the ones above (for example, if x > pi/4 once you've mapped to the first quadrant).
Or if you decide to use a table lookup for both the sine and cosine, you could get by with two smaller tables that only covered the range [0,pi/4] at the expense of another possible comparison and subtraction on lookup to map to the smaller range. Then you could either use less memory for the tables, or use the same memory but provide finer granularity and accuracy.
#include <cmath>

const double PI = 3.14159265358979323846;
long double sine_table[2001];

void build_sine_table()
{
    for (int index = 0; index < 2001; index++)
        sine_table[index] = std::sin(PI * (index - 1000) / 1000.0);
}
One more point: calling trigonometric functions is pricey. If you want to prepare the lookup table for sine with a constant step, you may save calculation time at the expense of some potential precision loss.
Consider your minimal step is "a". That is, you need sin(a), sin(2a), sin(3a), ...
Then you may do the following trick: first calculate sin(a) and cos(a). Then for every consecutive step use the following trigonometric identities:
sin([n+1] * a) = sin(n*a) * cos(a) + cos(n*a) * sin(a)
cos([n+1] * a) = cos(n*a) * cos(a) - sin(n*a) * sin(a)
The drawback of this method is that the round-off error accumulates as the procedure runs.
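A sketch of that incremental build (the whole table costs a single sin and cos call; the function name and signature are illustrative):
#include <cmath>

// Fill table[n] = sin((n + 1) * a) via the angle-addition identities,
// calling sin/cos only once. Round-off accumulates with n.
void build_sine_table(double* table, int count, double a)
{
    const double sa = std::sin(a), ca = std::cos(a);
    double s = sa, c = ca;   // running sin(n*a), cos(n*a), starting at n = 1
    for (int n = 0; n < count; ++n) {
        table[n] = s;
        const double s_next = s * ca + c * sa;  // sin((n+1)*a)
        const double c_next = c * ca - s * sa;  // cos((n+1)*a)
        s = s_next;
        c = c_next;
    }
}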
double sine_table[1000] = {0};

for (int i = 1; i <= 1000; i++)
{
    sine_table[i - 1] = std::sin(PI * i / 1000.0);
}

double getSineValue(int multipleOfPi){
    if(multipleOfPi == 0) return 0.0;
    int sign = 1;
    if(multipleOfPi < 0){
        sign = -1;
    }
    return sign * sine_table[sign * multipleOfPi - 1];
}
You can reduce the array length to 500 by the trick sin(pi/2 +/- angle) = cos(angle):
store sin and cos from 0 to pi/4.
I don't remember the numbers off the top of my head, but it increased the speed of my program.
You'll want the std::sin() function from <cmath>.
Another approximation, from a book or something:
streamin ramp;
streamout sine;
float x,rect,k,i,j;
x = ramp -0.5;
rect = x * (1 - x < 0 & 2);
k = (rect + 0.42493299) *(rect -0.5) * (rect - 0.92493302) ;
i = 0.436501 + (rect * (rect + 1.05802));
j = 1.21551 + (rect * (rect - 2.0580201));
sine = i*j*k*60.252201*x;
full discussion here:
http://synthmaker.co.uk/forum/viewtopic.php?f=4&t=6457&st=0&sk=t&sd=a
I presume you know that using a division is a lot slower than multiplying by a decimal number: /5 is always slower than *0.2.
It's just an approximation.
Also:
streamin ramp;
streamin x; // 1.5 = Saw 3.142 = Sin 4.5 = SawSin
streamout sine;
float saw,saw2;
saw = (ramp * 2 - 1) * x;
saw2 = saw * saw;
sine = -0.166667 + saw2 * (0.00833333 + saw2 * (-0.000198409 + saw2 * (2.7526e-006+saw2 * -2.39e-008)));
sine = saw * (1+ saw2 * sine);