Related
I'm using the following algorithm to solve a cubic polynomial equation (x^3 + ax^2 + bx + c = 0):
function find_roots(a, b, c, lower_bound, upper_bound)
implicit none
real*8, intent(in) :: a, b, c, lower_bound, upper_bound
real*8 :: find_roots
real*8 :: Q, R, theta, x, Au, Bu
integer :: i, iter
Q = (a**2 - 3.D0*b)/9.D0
R = (2.D0*a**3 - 9.D0*a*b + 27.D0*c)/54.D0
!If roots are all real, get root in range
if (R**2.lt.Q**3) then
iter = 0
theta = acos(R/sqrt(Q**3))
!print *, "theta = ", theta
do i=-1,1
iter = iter+1
x = -2.D0*sqrt(Q)*cos((theta + dble(i)*PI*2.D0)/3.D0)-a/3.D0
!print *, "iter = ", iter, "root = ", x
if ((x.ge.lower_bound).and.(x.le.upper_bound)) then
find_roots = x
return
end if
end do
!Otherwise, two imaginary roots and one real root, return real root
else
Au = -sign(1.D0, R)*(abs(R)+sqrt(R**2-Q**3))**(1.D0/3.D0)
if (Au.eq.0.D0) then
Bu = 0.D0
else
Bu = Q/Au
end if
find_roots = (Au+Bu)-a/3.D0
return
end if
end function find_roots
Now it turns out that it can be shown analytically that a cubic equation with the following inputs:
Q0 = 1.D0
alpha = 1.D-2
dt = 0.00001D0
Y = 1000000.D0
find_roots(-(2.D0*Q0+Y), &
-(alpha-Q0**2-2.D0*Y*Q0+dt/2.D0*alpha), &
(dt/2.D0*alpha*Q0+Y*alpha-Y*Q0**2), &
Q0-sqrt(alpha), &
Q0+sqrt(alpha)))
MUST have a root between Q0+sqrt(alpha) and Q0-sqrt(alpha). This is a mathematical certainty. However, the function as called above will return 0, not the correct root, due to floating-point error, since the required result is very close to Q0+sqrt(alpha). I've confirmed this by creating a new function which uses quadruple precision. Unfortunately, I can't just always use quadruple precision since this function will be called billions of times and is a performance bottleneck.
So my question is, are there any general ways I could re-write this code to reduce these precision errors, while also maintaining the performance? I tried using the algorithm suggested by wikipedia, but the problem actually got worse.
https://www.cliffsnotes.com/study-guides/algebra/algebra-ii/factoring-polynomials/sum-or-difference-of-cubes
This should reduce rounding error.
Likewise, you should be able to find a much better grouping of terms, where you don't make the compiler guess what you want,
https://en.wikipedia.org/wiki/Horner%27s_method
alpha-Q0**2-2.D0*Y*Q0+dt/2.D0*alpha /= (alpha+alpha*.5*dt)-Q0*(Q0+2*Y)
You might argue that any good optimizer should know what to do with .5dt vs. dt/2. ifort considers that a part of -no-prec-div even though it can't change roundoff.
It's up to you whether you choose single precision constants for readability after checking to make sure that the promotion rules cause them to promote exactly to double. It seems particularly bad style to depend on f77 D0 suffix to choose the same data type as the never-standard real*8; no doubt it does if your compiler doesn't complain.
There is something wrong with the accuracy of your calculations, either the calculation of a,b,c or the find_roots function estimates.
I used the a,b,c that are calculated and found that your lower_bound and upper_bound were better estimates of the roots.
I then modified the bounds to be +/- sqrt(alpha)*1.1 so that the range test would work for 64-bit.
I also simplified constants that promote exactly to double.
Finally I compared your estimate of the root to the fn (0.9d0) and fn (1.1d0), which shows the find_roots function does not work for the a,b,c provided.
You should check your references for the error or it may just be the approach fails when acos (+/- 1.0 ) is used.
The program I used to test this with lots of prints is:
real*8 function find_roots (a, b, c, lower_bound, upper_bound)
implicit none
real*8, intent(in) :: a, b, c, lower_bound, upper_bound
real*8 :: Q, R, theta, x, Au, Bu, thi
integer :: i, iter
real*8 :: two_pi ! = 8 * atan (1.0d0)
Q = (a**2 - 3.*b)/9.
R = (2.*a**3 - 9.*a*b + 27.*c)/54.
two_pi = 8 * atan (1.0d0)
!If roots are all real, get root in range
if (R**2 < Q**3) then
iter = 0
x = R/sqrt(Q**3)
theta = acos(x)
print *, "theta = ", theta, x
do i=-1,1
iter = iter+1
!! x = -2.D0*sqrt(Q)* cos((theta + dble(i)*PI*2.D0)/3.D0) - a/3.D0
thi = (theta + i*two_pi)/3.
x = -2.*sqrt(Q) * cos (thi) - a/3.
!print *, "iter = ", iter, "root = ", x
if ( (x >= lower_bound) .and. (x <= upper_bound) ) then
find_roots = x
print *, "find_roots = ", x
! return
end if
end do
!Otherwise, two imaginary roots and one real root, return real root
else
Au = -sign(1.D0, R)*(abs(R)+sqrt(R**2-Q**3))**(1.D0/3.D0)
if (Au.eq.0.D0) then
Bu = 0.D0
else
Bu = Q/Au
end if
find_roots = (Au+Bu)-a/3.D0
return
end if
end function find_roots
real*8 function get_cubic (x, a, b, c)
implicit none
real*8, intent(in) :: x, a, b, c
get_cubic = ( ( x + a) * x + b ) * x + c
end function get_cubic
! Now it turns out that it can be shown analytically that a cubic equation with the following inputs:
real*8 Q0, alpha, dt, Y, a, b, c, lower_bound, upper_bound, val, fn
real*8, external :: find_roots, get_cubic
!
Q0 = 1.D0
alpha = 1.0D-2
dt = 0.00001D0
Y = 1000000.0D0
!
a = -(2.*Q0 + Y)
b = -(alpha - Q0**2 - 2.*Y*Q0 + dt/2.*alpha)
c = (dt/2.*alpha*Q0 + Y*alpha - Y*Q0**2)
write (*,*) a,b,c
!
lower_bound = Q0-sqrt(alpha)*1.1
upper_bound = Q0+sqrt(alpha)*1.1
write (*,*) lower_bound, upper_bound
!
val = find_roots (a, b, c, lower_bound, upper_bound)
!
fn = get_cubic ( val, a,b,c )
write (*,*) val, fn
!
! Test the better root values
val = 0.9d0
fn = get_cubic ( val, a,b,c )
write (*,*) val, fn
!
val = 1.1d0
fn = get_cubic ( val, a,b,c )
write (*,*) val, fn
end
program main
implicit none
double precision x0, a, b
x0 = -3
call findzero(x0)
contains
function f(x)
implicit none
double precision x,f
f = SIN(x) - exp(-x)
end function
function fprime(x)
implicit none
double precision fprime, x
fprime = COS(x) + exp(-x)
end function
subroutine findbracket(x0,a,b)
implicit none
double precision x0, a, b
double precision fa, fb
double precision dx
dx = 0.001d0
a = x0
b = x0
do
fa = f(a)
fb = f(b)
!print*,"bracket[", a, b, "]"
a = a - dx
if (fa*fb < 0) then
exit
end if
b = b + dx
if (fa*fb < 0) then
exit
end if
dx = dx*2
end do
end subroutine
subroutine findzero(x0)
implicit none
double precision x0, a, b
double precision p, tol
while (x0 <11) do
x0 = -3
print*, "what is x0", x0
call findbracket(x0,a,b)
call newtonbisection (p,a,b,tol)
print*, "this is x0, x", x0, p
x0 = x0 + 1
end do
end subroutine
Using this subroutine I am trying to solve for the zeros of
f(x) = sinx - e^-x
with x0 = -3, -2, ..., 10.
I have working subroutines findbracket and newtonbisection. Using findbracket I intend to get [a,b] and using those a and b, I want to find p for each x0.
When I compile, my code falls into an infinite loop, I tried it with commenting out
call newtonbisection (p,a,b,tol)
so I assume the problem is with findbracket. But if I just use the subroutine to find [a,b] with x0 = -3, it works but not when I combine them.
Why do I have an infinite loop?
Let's narrow this loop down:
do while (x0 < 11)
x0 = -3
call findbracket(x0, a, b)
x0 = x0 + 1
end do
This is a loop that gets executed as long as x0 is smaller than eleven. Inside the loop, you explicitly set x0 to minus 3, then call a subroutine that uses x0, finally increment x0
I didn't see any location inside the findbracket routine where the value of x0 would be changed, so it will still be -3 at the end of the call. Then it gets incremented by one, so it's now -2, which is still smaller than 11, so the loop starts again. The first thing it does is to reset x0 to -3, and the whole spiel starts at the same point again.
So the condition x0 < 11 will always be true, since in every iteration, x0 will get reset to -3. The loop will never exit. Your program hangs.
You need to move the x0 = -3 to before the loop, like this:
x0 = -3
do while (x0 < 11)
call findbracket(x0, a, b)
x0 = x0 + 1
end do
That way, x0 won't get reset to -3 in every iteration of the loop and will eventually become larger than 11 so that the loop ends.
An "infinite" for loop is achieved as follows:
do
** Some executable statements
if (conditional_statement) exit
end do
In brief, a set of statements are executed repetitively until the conditional statement is satisfied, that is,
conditional_statement .eqv. .true.
When this occurs, program control jumps to the next executable statement immediately following the do construct.
The gfortran compiler gives wrong answer, when I run a parallel program using OpenMP. In the same time, ifort gives exact result.
This is the whole compilable code.
!_______________________________________________________________ !
!____________MODULE SECTION_____________________________________ !
MODULE MATRIC
IMPLICIT NONE
INTEGER , PARAMETER :: NG = 40
DOUBLE PRECISION,SAVE :: Z , PA , PB ,CMU
DOUBLE PRECISION , PARAMETER :: PI=2.0D0*ACOS(0.0D0) , &
FPI=4.0D0*PI , SQFPI = SQRT(FPI), DLAM=1.0D0
DOUBLE PRECISION , DIMENSION(450) :: DEL1, DEL2, X, R , SNLO
DOUBLE PRECISION :: XG(60) , WG(60)
END MODULE MATRIC
!_________________________________________________________________________!
! MODULE SECTION
!__________________________________________________________________________!
MODULE POTDATA
IMPLICIT NONE
INTEGER :: IA , IB , ID
DOUBLE PRECISION :: RA , RB , R1s(450)
END MODULE POTDATA
!__________________________________________________________________________!
!______________________________________________________________________!
program check
use matric
use potdata
implicit double precision(a-h,o-z)
pa = 0.72D0 ; pb = 0.19D0
mesh = 441 ; noint= 40 ; z = 2.0d0
CALL GAULEG(-1.d0,1.d0)
NB = MESH/NOINT
I = 1
X(I) = 0.0D+00
DELTAX = 0.0025D+00*40.0D+00/DBLE(NOINT)
DO J=1,NB
IMK = (J-1)*NOINT + 1
DO K=1,NOINT
AK=K
I=I+1
X(I)=X(IMK)+AK*DELTAX
END DO
DELTAX=2.0D+00*DELTAX
END DO
CMU=(9.0D00*PI*PI/(128.0D00*Z))**THIRD
R(1)=0.0D+00 ; SNLO(1) = 0.D00
DO I=2,MESH
R(I)=CMU*X(I)
SNLO(I) = R(I)*dexp(-Z*R(I))
R1S(I) = SNLO(I)/(SQFPI*R(I))
END DO
call EFFPOT(MESH,NOINT)
end program check
subroutine EFFPOT(MESH,NOINT)
USE OMP_LIB
USE MATRIC
USE POTDATA
implicit none
integer, intent(in) :: MESH, NOINT
double precision::anorm(450)
double precision, external :: funct
double precision :: asum, fac, cnorm
!$omp parallel do default(none) private(del1,ia,asum,ib,ra,rb,fac) &
!$omp shared(id,mesh,r,anorm,NOINT,del2,R1s)
do ia = 2,mesh
ra = r(ia)
if(R1s(ia).lt.1.D-7.and.R1s(ia).ge.1.D-8)id = ia
do ib = 2,mesh
rb = r(ib)
call QGAUSS(funct,-1.d0,1.d0,fac)
del1(ib) = rb**2*fac*R1s(ib)**2
end do
CALL NCDF(del1,ASUM,r(2),mesh,NOINT)
anorm(ia) = 2.0d0*pi*asum
del2(ia) = 2.0d0*pi*asum*(ra*R1s(ia))**2
end do
!$omp end parallel do
CALL NCDF(del2,ASUM,r(2),mesh,NOINT)
cnorm = 1.0/dsqrt(4.*pi*ASUM)
write(6,*)'cnorm =',cnorm
return
end
double precision function funct(x)
USE POTDATA , ONLY : RA , RB
USE MATRIC , ONLY : PA , PB , DLAM
implicit none
double precision, intent(in) :: x
double precision :: f1, f2, ramrb
ramrb = dsqrt(ra**2+rb**2-2.d0*ra*rb*x)
f1 = dcosh(pa*ra)+dcosh(pa*rb)
f2 = 1.d0+0.5*dlam*ramrb*dexp(-pb*ramrb)
funct = (f1*f2)**2
return
end
SUBROUTINE QGAUSS(func,aa,bb,ss)
USE OMP_LIB
USE MATRIC , ONLY : XG ,WG , NG
IMPLICIT DOUBLE PRECISION(A-H,O-Z)
external func
xm = 0.5d0*(bb+aa)
xl = 0.5d0*(bb-aa)
ss = 0.d0
do j=1,ng
dx = xl*xg(j)
ss = ss + wg(j)*(func(xm+dx)+func(xm-dx))
end do
ss = xl*ss/2.0
return
END
SUBROUTINE GAULEG(x1,x2)
USE MATRIC , ONLY : XG ,WG ,NG , PI
IMPLICIT DOUBLE PRECISION(A-H,O-Z)
eps = 1.d-14
m = (ng+1)/2
xm = 0.5D0*(x1+x2)
xl = 0.5D0*(x2-x1)
do i=1,m
z = dcos(pi*(dfloat(i)-0.25d0)/(dfloat(ng)+0.5d0))
1 continue
p1 = 1.d0
p2 = 0.d0
do j=1,ng
p3 = p2
p2 = p1
p1 = ((2.d0*dfloat(j)-1.d0)*z*p2 &
- (dfloat(j)-1.d0)*p3)/dfloat(j)
end do
pp = dfloat(ng)*(z*p1-p2)/(z*z-1.d0)
z1 = z
z = z1 - p1/pp
if (dabs(z-z1).gt.eps) go to 1
xg(i) = xm - xl*z
xg(ng+1-i) = xm + xl*z
wg(i) = 2.d0*xl/((1.d0-z*z)*pp*pp)
wg(ng+1-i) = wg(i)
end do
return
end
SUBROUTINE NCDF(F,ASUM,H,KKK,NOINT)
IMPLICIT DOUBLE PRECISION (A-H,O-Z)
DIMENSION F(450)
NBLOCK = (KKK-2)/NOINT + 1
C2HO45 = 2.0D+00*H/45.0D+00
ASUM = 0.0D+00
DO J=1,NBLOCK
ISTAR = NOINT*(J-1)+5
IEND = NOINT*J + 1
IEND = MIN0(KKK,IEND)
DO I=ISTAR,IEND,4
ASUM = ASUM + C2HO45*(7.0D+00*(F(I-4)+F(I)) &
+32.0D+00*(F(I-3)+F(I-1)) + 12.0D+00*F(I-2))
END DO
IF(IEND.EQ.KKK) GO TO 4
C2HO45 = 2.0D+00*C2HO45
4 END DO
RETURN
END
Thanks everybody specially #Vladimir who has taken interest in my problem. Finally i got the right answer by removing ra and rb from the module potdata and defined function as funct(x, ra, rb) and then removing ra and rb from the loop. Because i was writing ra, rb then reading their values in the above code so loop was having flow dependence. Now i get exact result from both compiler (which is 8.7933767516) parallelly, sequentially both. Exact way is this
subroutine EFFPOT(MESH,NOINT)
USE OMP_LIB
USE MATRIC
USE POTDATA
implicit none
integer, intent(in) :: MESH, NOINT
double precision::anorm(450)
double precision, external :: funct
double precision :: asum, fac, cnorm
!$omp parallel do default(none) private(del1,ia,asum,ib,fac) &
!$omp shared(id,mesh,r,anorm,NOINT,del2,R1s)
do ia = 2,mesh
if(R1s(ia).lt.1.D-7.and.R1s(ia).ge.1.D-8)id = ia
do ib = 2,mesh
call QGAUSS(funct,-1.d0,1.d0,fac,r(ia),r(ib))
del1(ib) = r(ib)**2*fac*R1s(ib)**2
end do
CALL NCDF(del1,ASUM,r(2),mesh,NOINT)
anorm(ia) = 2.0d0*pi*asum
del2(ia) = 2.0d0*pi*asum*(r(ia)*R1s(ia))**2
end do
!$omp end parallel do
CALL NCDF(del2,ASUM,r(2),mesh,NOINT)
cnorm = 1.0/dsqrt(4.*pi*ASUM)
write(6,*)'cnorm =',cnorm
return
end
double precision function funct(x,ra,rb)
USE MATRIC , ONLY : PA , PB , DLAM
implicit none
double precision, intent(in) :: x, ra, rb
double precision :: f1, f2, ramrb
ramrb = dsqrt(ra**2+rb**2-2.d0*ra*rb*x)
f1 = dcosh(pa*ra)+dcosh(pa*rb)
f2 = 1.d0+0.5*dlam*ramrb*dexp(-pb*ramrb)
funct = (f1*f2)**2
return
end
SUBROUTINE QGAUSS(func,aa,bb,ss,ra,rb)
USE OMP_LIB
USE MATRIC , ONLY : XG ,WG , NG
IMPLICIT DOUBLE PRECISION(A-H,O-Z)
external func
xm = 0.5d0*(bb+aa)
xl = 0.5d0*(bb-aa)
ss = 0.d0
do j=1,ng
dx = xl*xg(j)
ss = ss + wg(j)*(func(xm+dx,ra,rb)+func(xm-dx,ra,rb))
end do
ss = xl*ss/2.0
return
END
The cause of your problem is that the OpenMP standard does not specify what happens if a private list item is accessed in the region but outside of the construct. See example private.2f (found on page 135 of the OpenMP standard supplement) for a short version of the same problem.
Specifically, the module variables ra and rb are declared private in the OpenMP parallel region inside EFFPOT and also accessed from funct. funct is in the dynamic scope of the parallel region but (lexically) outside of it and therefore it is not specified whether ra and rb referenced by funct are the original module variables or their private copies (most compilers would go for the original variables).
You have already found one of the solutions. The other one would be to declare ra and rb threadprivate since they are only used to pass data from EFFPOT to funct:
MODULE POTDATA
IMPLICIT NONE
INTEGER :: IA , IB , ID
DOUBLE PRECISION :: RA , RB , R1s(450)
!$OMP THREADPRIVATE(RA,RB)
END MODULE POTDATA
You should then also remove ra and rb from the list of the private clause of the parallel region within EFFPOT.
On some platforms, e.g. OS X, using threadprivate with GCC (i.e. gfortran) could be slower than actually passing around the two variables as arguments because of the emulated TLS.
Note that this semantic error is really hard to detect and many OpenMP tools won't actually spot it.
First of all, it is very difficult to say something specific without seeing the actual code. However, I do have some comments on your situation and the conclusions you are drawing.
The fact that your program runs fine both in parallel and sequential execution when compiled with "ifort" doesn't mean that your program is correct. Since compiler bugs leading to programs giving wrong answers are very rare, but on the other hand manual parallel programming is very error-prone, we should assume a problem with the way you parallelized your code. We are probably talking about a race condition.
And no, the problem you are having is not at all unusual. When you have a race condition, it happens often that the sequential execution works everywhere and your parallel execution works in some environments and fails in others. It's even common that your code gives different answers every time you call it (not only depending on the compiler, but on many other factors that can change over time).
What I suggest you should do, is to get a parallel debugger, like for example TotalView that will help you keep track of the various threads and their states. Try to find a simple test environment (as few threads as possible) that fails reliably.
I read about statement functions, such as the example:
C(F) = 5.0*(F - 32.0)/9.0
Isn't this the same as:
C = 5.0*(F - 32.0)/9.0
i.e. without the function part, or maybe I'm missing something?
If they're not the same, when do I need to use a statement function?
C = 5.0*(F - 32.0)/9.0
is just assignment to a variable C, it can be anywhere and is evaluated once every time when the program flow reaches it.
C(F) = 5.0*(F - 32.0)/9.0
is a statement function, and can be evaluated any time it is in the scope by, e.g., C(100) which returns approximately 37.8.
From some code
xx(i) = dx*i
f(a) = a*a
do i = 1, nx
x = xx(i)
print *, f(x)
end do
The f(x) in the print statement is evaluated with each new value of x and yields a new value. The value of x is also result of evaluation of the statement function xx on the previous line.
But statement functions are now (in Fortran 95) declared obsolete. Better use internal functions in any new code. E.g.,
program p
implicit none
!declarations of variables x, i, nx, dx
do i = 1, nx
x = xx(i)
print *, f(x)
end do
contains
real function xx(i)
integer, intent(in) :: i
xx = dx*i
end function
real function f(a)
real, intent(in) :: a
f = a*a
end function
end program
I'm having some troubles to calcule the integral of e^x inside and interval [b.a] using fortran.
I think I'm doing something wrong in the funcion calls. Thanks for helping me.
program trapezium
implicit none
integer :: i, n, b, a
real :: sumation, mean, deltax, f(i), integral
! The value of the integral using the trapezium rule can be found using
! integral = (b - a)*((f(a) +f(b))/2 + sumation_1_n-1 )/n
write(*,*) "type the limits b, a and the number of intervals"
read *, b, a, n
deltax = (b - a)/n
mean = (f(a) + f(b))/2
sumation = 0
do i = 1, n-1
sumation = sumation + f(i)
end do
integral = deltax*(mean + sumation)
write (*,*) "the value of the integral using the trapezoidal method is", integral
end program
function f(x)
real :: f(x)
integer :: x
f(x) = EXP(x)
end function
There are a couple of issues with your code:
f is a function, but at the same time you define an array f(i)
When defining an array of fixed size, the size has to be known at compile time. So real :: f(i) is only valid for a constant i
exp() expects a real variable, not an integer
Integer arithmetic might lead to unexpected results: 1/2 = 0 and not 0.5!
What about (This does not try to fix the maths, though - see my comment):
module functions
contains
function f(x)
implicit none
real :: f
integer,intent(in) :: x
f = EXP(real(x))
end function
end module
program trapezium
use functions
implicit none
integer :: i, n, b, a
real :: sumation, mean, deltax, integral
! The value of the integral using the trapezium rule can be found using
! integral = (b - a)*((f(a) +f(b))/2 + sumation_1_n-1 )/n
write(*,*) "type the limits b, a and the number of intervals"
read *, b, a, n
deltax = real(b - a)/real(n)
mean = (f(a) + f(b))/2
sumation = 0
do i = 1, n-1
sumation = sumation + f(i)
end do
integral = deltax*(mean + sumation)
write (*,*) "the value of the integral using the trapezoidal method is", integral
end program
Note that the use of modules enables the compiler to check for the arguments of the function. Additionally, you do not need to define the return value of the function in the main program.