I'm trying to work on a problem in Fortran using MPI, and I am getting an intermittent bug with clearly incorrect values appearing. The bug seems to occur when I use MPI_REDUCE.
I've pared my code down to as short a segment as possible, with the error still happening. This segment of code is pretty useless aside from its strange behaviour. Try as I might, I can't isolate it any further.
I don't understand the behaviour of this code - for example, if I remove the subroutine at the top (which is never invoked), the bug appears to go away. If I allocate the arrays using real,dimension(10,10) when I am declaring them, the bug appears to go away, although I don't think my current allocates are incorrect. Even if I change some of the variable names in this, the bug appears to go away. None of these are telling me why the bug exists, or how to fix it in my longer code project.
It seems like either I have failed to correctly allocate memory somewhere, or I am using MPI_REDUCE incorrectly, but I can't find the problem.
subroutine foo()
use netcdf
integer :: iret,ncid
iret = nf90_open('test.nc',nf90_nowrite,ncid) !open the mask file
iret = nf90_close(ncid) !close the mask file
return
end subroutine foo
program test
use mpi
integer :: ierr,pid
real :: diffsum,total_sum
real,allocatable,dimension(:,:) :: c,h,h_old
call MPI_INIT(ierr)
total_sum = 0.0
call MPI_COMM_RANK(MPI_COMM_WORLD,pid,ierr)
if(pid.ne.0) then
allocate(h (10,10))
allocate(h_old(10,10))
h(:,:) = 1.0
h_old(:,:) = 1.0
allocate(c(10,10))
c = h_old - h
diffsum = 0.0
endif
call MPI_REDUCE(diffsum,total_sum,1,MPI_REAL,mpi_sum,0,MPI_COMM_WORLD,ierr) !to get overall threshold
if(pid.eq.0)then
print*,'sum',total_sum
endif
call MPI_FINALIZE(ierr)
end program test
The value printed should always be 0, but sometimes other values appear.
Here is an example of the outputs from 10 runs:
sum -3.66304099E+25
sum 0.00000000
sum 0.00000000
sum -3.01998057E+29
sum 0.00000000
sum 0.00000000
sum 0.00000000
sum 0.00000000
sum 0.00000000
sum 0.00000000
Thank you for any ideas!
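One thing visible in the listing above: diffsum is assigned only inside the if(pid.ne.0) block, so rank 0 passes an uninitialised diffsum to MPI_REDUCE, and whatever happens to sit in that memory is added to the sum. That would be consistent with the result changing when unrelated code or declarations change. Below is a minimal sketch of the same reduction with diffsum set on every rank. The netCDF subroutine and the arrays are left out for brevity, and the program name test_reduce is only illustrative.

```fortran
program test_reduce
  use mpi
  implicit none
  integer :: ierr, pid
  real :: diffsum, total_sum

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, pid, ierr)

  ! Initialise on every rank, including rank 0, before the reduction
  diffsum   = 0.0
  total_sum = 0.0

  if (pid /= 0) then
    ! the worker ranks would compute their real contribution here
    diffsum = 0.0
  end if

  ! Every rank in the communicator contributes its diffsum, rank 0 included
  call MPI_REDUCE(diffsum, total_sum, 1, MPI_REAL, MPI_SUM, 0, &
                  MPI_COMM_WORLD, ierr)

  if (pid == 0) print *, 'sum', total_sum

  call MPI_FINALIZE(ierr)
end program test_reduce
```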
I'm generally new to Fortran, and I have a project in which my professor wants the class to try to find pi. To do this he wants us to create our own arctan subroutine and use this specific equation: pi = 16*arctan(1/5) - 4*arctan(1/239).
Because the professor would not let me use the built in ATAN function, I made a subroutine that approximates it:
subroutine arctan(x,n,arc)
real*8::x, arc
integer::n, i
real*8::num, nm2
arc = 0.0
do i=1,n,4
num = i
nm2 = num+2
arc = arc+((x**num)/(num)) - (x**(nm2)/(nm2))
enddo
end subroutine arctan
This subroutine is based off of the Taylor series for arctan approximation, and seemed to work perfectly because I tested it by calling this.
real*8:: arc=0.0, approx
call arctan(1.d0,10000000,arc)
approx = arc*4
I called this from my main program which should return pi and I got
approx = 3.1415926335902506
which is close enough for me. The problem occurs when I try to do
pi = 16*arctan(1/5) - 4*arctan(1/239). I tried this:
real*8:: first, second
integer:: n=100
call arctan((1.d0/5.d0), n, arc)
first = 16*arc
call arctan((1.d0/239.d0), n, arc)
second = 4.d0*arc
approx = first - second
and somehow approx = 1.67363040082988898E-002, which is obviously not pi. arc resets with every call of the arctan subroutine so I know that isn't the problem. I think the problem is in how I'm calling the subroutine before I declare first and second, but I don't know what I could do to improve them.
What am I doing wrong?
EDIT:
I actually solved the problem. For some reason Fortran did not evaluate approx = first - second as I expected
and left approx == second. I have no idea why, but I solved the problem by replacing that statement with the following:
approx = (second-first)
approx = approx *(-1)
and as stupid as it looks, it works perfectly now, with a result of 3.1415926535897940!
The problem results from different types (single/double precision) of the variable arc in the call of arctan and in the implementation of the subroutine. The iteration count of 10000... is way too much and may cause numerical problems; just 100 is more than enough (and much faster).
Tip: always use implicit none in all programs and procedures. Here the compiler would have immediately told you that you forgot to declare arc.
Just make it double precision in the main program and you will get the desired answer.
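As a sketch of that advice, here is the whole calculation with implicit none and arc declared in double precision in the main program. The program name machin_pi and the dp kind parameter are illustrative choices. Putting arctan after contains is one way (an assumption about layout, not the asker's) to let the compiler check the argument types at the call.

```fortran
program machin_pi
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! double precision kind
  real(dp) :: arc, first, second, approx
  integer  :: n

  n = 100
  call arctan(1.0_dp/5.0_dp, n, arc)
  first = 16.0_dp*arc
  call arctan(1.0_dp/239.0_dp, n, arc)
  second = 4.0_dp*arc
  approx = first - second
  print *, 'approx = ', approx

contains

  ! Partial sum of the Taylor series x - x**3/3 + x**5/5 - ...
  subroutine arctan(x, n, arc)
    real(dp), intent(in)  :: x
    integer,  intent(in)  :: n
    real(dp), intent(out) :: arc
    integer :: i

    arc = 0.0_dp
    do i = 1, n, 4
      ! one positive and one negative term per pass
      arc = arc + x**i/real(i, dp) - x**(i+2)/real(i+2, dp)
    end do
  end subroutine arctan

end program machin_pi
```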
On the internet, I found this program that demonstrates evaluating complete elliptic integrals of the first and second kinds:
implicit none
real*8 e,e1,e2,xk
integer i, n
e=1.d-7
print *,' K K(K) E(K) STEPS '
print *,'------------------------------------------'
xk=0.d0
do i = 1, 20
call CElliptic(e,xk,e1,e2,n)
write(*,50) xk,e1,e2,n
xk = xk + 0.05d0
end do
print *,'1.00 INFINITY 1.0000000 0'
stop
50 format(' ',f4.2,' ',f9.7,' ',f9.7,' ',i2)
end
Complete elliptic integral of the first and second kind. The input parameter is xk, which should be between 0 and 1. Technique uses Gauss' formula for the arithmogeometrical mean. e is a measure of the convergence accuracy. The returned values are e1, the elliptic integral of the first kind, and e2, the elliptic integral of the second kind.
Subroutine CElliptic(e,xk,e1,e2,n)
! Label: et
real*8 e,xk,e1,e2,pi
real*8 A(0:99), B(0:99)
integer j,m,n
pi = 4.d0*datan(1.d0)
A(0)=1.d0+xk ; B(0)=1.d0-xk
n=0
if (xk < 0.d0) return
if (xk > 1.d0) return
if (e <= 0.d0) return
et n = n + 1
! Generate improved values
A(n)=(A(n-1)+B(n-1))/2.d0
B(n)=dsqrt(A(n-1)*B(n-1))
if (dabs(A(n)-B(n)) > e) goto et
e1=pi/2.d0/A(n)
e2=2.d0
m=1
do j = 1, n
e2=e2-m*(A(j)*A(j)-B(j)*B(j))
m=m*2
end do
e2 = e2*e1/2.d0
return
end
I have compiled it but I have received the following errors:
gfortran -Wall -c "gauss.f" (in directory: /home/pierluigi/Scrivania)
gauss.f:53.9:
50 format(' ',f4.2,' ',f9.7,' ',f9.7,' ',i2)
1
Error: Invalid character in name at (1)
gauss.f:83.72:
if (dabs(A(n)-B(n)) > e) goto et
1
Warning: Deleted feature: Assigned GOTO statement at (1)
gauss.f:83.35:
if (dabs(A(n)-B(n)) > e) goto et
1
Error: ASSIGNED GOTO statement at (1) requires an INTEGER variable
gauss.f:48.18:
write(*,50) xk,e1,e2,n
1
Error: FORMAT label 50 at (1) not defined
Compilation failed.
Any suggestions please?
EDIT
I have read all your answers and, thanks to you, I managed to compile the program. I also have another question, and I do not know whether I should post it separately, so in the meantime I am editing this one. In my program, xk is increased by 0.05. Now I want the program to read its data from a file containing: the minimum value of xk; the maximum value of xk; the number of intervals. I thought of:
open (10,file='data/test')
read (10,*) xkmi, xkma
read (10,*) nk
close (10)
lkmi = dlog(xkmi)
lkma = dlog(xkma)
ldk = (lkma-lkmi)/dfloat(nk-1)
In addition, the program must be modified in such a way that the result is written to another file. How can I change the rest of the program? Thank you very much.
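A minimal sketch of how the rest could look, assuming data/test holds xkmi and xkma on its first line and nk on its second. The output file name data/result and the program name xk_grid are hypothetical. The call to CElliptic is left as a comment so that the sketch stands on its own.

```fortran
program xk_grid
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: xkmi, xkma, lkmi, ldk, xk
  integer  :: nk, i, ios

  ! Read the grid parameters
  open(10, file='data/test', status='old', action='read', iostat=ios)
  if (ios /= 0) stop 'cannot open data/test'
  read(10,*) xkmi, xkma
  read(10,*) nk
  close(10)

  ! log() needs xkmi > 0, and nk >= 2 avoids a division by zero
  if (xkmi <= 0.0_dp .or. nk < 2) stop 'need xkmi > 0 and nk >= 2'

  lkmi = log(xkmi)
  ldk  = (log(xkma) - lkmi)/real(nk - 1, dp)

  ! Write one line per grid point to the (hypothetical) result file
  open(11, file='data/result', status='replace', action='write')
  do i = 0, nk - 1
    xk = exp(lkmi + i*ldk)      ! logarithmically spaced xk
    ! call CElliptic(e, xk, e1, e2, n) would go here,
    ! followed by writing e1 and e2 next to xk
    write(11,'(es23.15)') xk
  end do
  close(11)
end program xk_grid
```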
Your source code file extension is f which, I think (check the documentation), tells gfortran that the file contains fixed source form. Until Fortran 90, Fortran was still written as if onto punched cards, and the location of the various parts of a line was confined to certain columns. A statement label, such as 50 in the first of the error messages, had to be in columns 1 to 5 (column 6 is reserved for the continuation marker). Two solutions:
Make sure the label is in (some of) those columns. Or, better
Move to free source form, perhaps by changing the file extension to f90, perhaps by using a compilation option (check your documentation).
The error raised by the goto et phrase is, as your compiler has told you, an example of a deleted feature, in which the goto jumps to a statement whose label is provided at run-time, ie the value of et. Either tell your compiler (check ...) to conform to an old standard, or modernise your source.
Fix those errors and, I suspect, the other error messages will disappear. They are probably raised as a consequence of the compiler not correctly parsing the source after the errors.
Because the file has type ".f", gfortran is interpreting it as fixed-source layout. Try compiling with the free-form layout by using the compiler option -ffree-form and see if that works. This probably explains the error about the "invalid character". That statement not being recognized explains the "format not defined" error. The warning concerns the "assigned goto", a feature that was deleted from the standard in Fortran 95 but that gfortran still accepts. You can ignore that warning. If you wish, later you can modernize the code. For the remaining error, for the "assigned goto", declare "et" as an integer.
I would just do this
10 n = n + 1
! Generate improved values
A(n)=(A(n-1)+B(n-1))/2.d0
B(n)=dsqrt(A(n-1)*B(n-1))
if (dabs(A(n)-B(n)) > e) goto 10
and possibly compile as free form source as others have shown. The label et seems weird and non-standard, possibly a rare vendor extension.
You could also change the lines above to a do-loop with an exit statement (Fortran 90).
(The program compiled for me after the change).
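The do-loop variant mentioned above could look roughly like this. It is a sketch that keeps only the iteration from CElliptic. The sample value of xk, the name nmax and the extra guard against running past the array bounds are additions for illustration, not part of the original routine.

```fortran
program agm_do_exit
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nmax = 99        ! upper bound of A and B in the original
  real(dp) :: a(0:nmax), b(0:nmax), e, xk
  integer  :: n

  e  = 1.0e-7_dp
  xk = 0.5_dp                            ! sample modulus, chosen for illustration
  a(0) = 1.0_dp + xk
  b(0) = 1.0_dp - xk

  n = 0
  do
    n = n + 1
    ! Generate improved values
    a(n) = (a(n-1) + b(n-1))/2.0_dp
    b(n) = sqrt(a(n-1)*b(n-1))
    if (abs(a(n) - b(n)) <= e) exit      ! converged: replaces the goto
    if (n == nmax) exit                  ! guard against overrunning the arrays
  end do

  print *, 'steps:', n, ' agm:', a(n)
end program agm_do_exit
```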
I tested the subroutine and compared with matlab and it was not the same. It is very similar to the algorithm used in Abramowitz's book. Here is the one I wrote that works well, just for comparing.
subroutine CElliptic(m,K,E)
implicit none
real*8 m,alpha,E,K,A,B,C,A_p,B_p,C_0,pi,suma
integer j,N
N=100
alpha=asin(sqrt(m))
pi = 4.d0*datan(1.d0)
A_p=1.0
B_p=cos(alpha)
C_0=sin(alpha)
suma=0.0
do j=1,N
A=(A_p+B_p)/2.0d0
B=dsqrt(A_p*B_p)
C=(A_p-B_p)/2.0d0
suma=suma+2.d0**j*C**2 ! real power: integer 2**j overflows for large j
A_p=A
B_p=B
end do
K=pi/(2*A)
E=(1-1.d0/2.d0*(C_0**2+suma))*K
end Subroutine CElliptic
best regards
Ed.
I am quite new to Fortran, and I just got this program from a PhD student. It is used to count the number of beads in certain histograms. Here is the code:
program xrdf
implicit none
include 'currentconf.fi'
real drdf,rdf12(200)
real xni12, Zface
integer ibead,iconf,ii,io,i,j,k,linecount
integer mchains, iendbead, nstart
logical ifend
Zface=1.5
mchains=49
drdf=0.1
xni12=0.
io=10
nstart=12636
open(file='pcushion.tr.xmol',unit=io)
do i=1,200
rdf12(i)=0.0
end do
ifend=.false.
do iconf=1,1000000
! reading current frame
ii=iconf
call readconf(io,ii,linecount,ifend)
write(*,*)' conf ',iconf,' N=',n
if (ifend) go to 777
! if trajectory ended, exit loop
ibead=0
do i=1,mchains
iendbead=nstart+i*45
dz=abs(Zface-z(iendbead))
ii=int(dz/drdf)+1
rdf12(ii)=rdf12(ii)+1
xni12=xni12+1.0
end do
end do !iconf
777 write(*,*)' total ',iconf-1,' frames '
write(*,*)' r rho(z) '
do i=1,200
write(*,'(f10.4,e15.7)')(i-0.5)*drdf,rdf12(i)/xni12
end do
close(io)
stop
end
Because I really do not know which part is wrong, I have pasted all the code here. When I compile this program, I get this error:
ii=int(dz/drdf)+1
1
Error: Incompatible ranks 0 and 1 in assignment at (1)
How can I edit the program to fix it?
I was able to reproduce your compiler error using a simple program. It seems likely that in
ii=int(dz/drdf)+1
you are trying to assign an array (maybe dz?) to an integer (ii).
integer ibead,iconf,ii,io,i,j,k,linecount
Compare the rank of ii (a scalar, so rank 0) with the ranks of dz and drdf.
This is my program (compiled it using gfortran):
PROGRAM TEST
implicit none
integer dz(10),ii
real dy
dz=3
dy=2.0
ii=int(dz/dy)+1
END PROGRAM TEST
Using ifort the error message is more revealing:
error #6366: The shapes of the array expressions do not conform
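A sketch of two ways the assignment can be made to conform, assuming that dz really is a rank-1 array (in the question it would have to come from the include file, which is not shown). The array size, the sample values and the name bins are made up for the illustration.

```fortran
program rank_fix
  implicit none
  real    :: dz(10), drdf
  integer :: ii, bins(10)

  dz   = 0.35
  drdf = 0.1

  ! ii = int(dz/drdf) + 1     ! rejected: rank-1 expression assigned to a scalar

  ! Option 1: pick one element, so both sides are scalars
  ii = int(dz(1)/drdf) + 1

  ! Option 2: make the left-hand side an array of the same shape
  bins = int(dz/drdf) + 1

  print *, ii, bins(1)
end program rank_fix
```

If dz was meant to be a plain scalar in the original program, the alternative is to declare it as one (for example next to drdf) instead of picking it up as an array.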
I'm updating a program in Fortran to run with MPI and have run into an issue with the rank not showing up properly. At the beginning of this subroutine I call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr), and it returns the proper rank until this point:
DO IY=2,NY+1
DO IX=2,NX+1
D(IX,IY)=(h_roms(IX,IY)+zeta(IX,IY))*maskr(IX,IY)
call mpi_barrier(mpi_comm_world,ierr)
write(out,12) rank,ix,iy
12 format('disappearing?',i3,'ix:',i3,'iy',i3)
ENDDO
ENDDO
NY and NX are 124 and 84 respectively, and the rank prints properly until iy becomes 125 and ix is 3. After that it only prints as ***. It still prints everything twice (running on 2 processors), but the rank isn't valid, and no errors are reported. I've tried calling MPI_COMM_RANK after the do loop and still nothing. Any ideas would be much appreciated.
Fortran generally prints a sequence of asterisks, your ***, if a numeric output field is too small to contain the number you are trying to write into it. Try changing some of the i3s in the format statement to i6 or even i0; this last form tells the compiler to print an integer in a field wide enough for all its digits but no wider.
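A small illustration of the three edit descriptors, using a made-up value that is deliberately too wide for i3:

```fortran
program field_width
  implicit none
  integer :: ival

  ival = 12345                        ! five digits, made up for the demo

  write(*,'(a,i3)') 'i3: ', ival      ! field too narrow: prints ***
  write(*,'(a,i6)') 'i6: ', ival      ! fits, padded on the left with a blank
  write(*,'(a,i0)') 'i0: ', ival      ! exactly as wide as the number needs
end program field_width
```

In the loop from the question, writing rank, ix and iy with i0 would show which of the three values actually overflows its field.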
I have a small program that reads some data from a binary file and stores it in normal (text) files. Here is the source:
Program calki2e
IMPLICIT NONE
!
DOUBLE PRECISION VAL
INTEGER P,Q,R,S
INTEGER IREC2C
PARAMETER( IREC2C=15000)
INTEGER AND,RSHIFT,LABEL,IMBABS,NX,IB,NFT77
INTEGER IND
DIMENSION IND(IREC2C)
DOUBLE PRECISION XP
DIMENSION XP(IREC2C)
CHARACTER(LEN=12) :: FN77 = 'input08'
CONTINUE
NFT77=77
!----------------------------------------------------------------------
2 CONTINUE
c
open(unit=NFT77,file=FN77,STATUS='OLD',
+ACCESS='SEQUENTIAL',FORM='UNFORMATTED')
open(unit=13,file='calki2e.txt')
REWIND(77)
4100 continue
READ(77) NX,IND,XP
IMBABS=IABS(NX)
DO 100 IB=1,IMBABS
LABEL=IND(IB)
P= AND(RSHIFT(LABEL, 24),255)
Q= AND(RSHIFT(LABEL, 16),255)
R= AND(RSHIFT(LABEL, 8),255)
S= AND( LABEL ,255)
VAL=XP(ib)
IF(P.EQ. Q) VAL=VAL+VAL
IF(R .EQ. S) VAL=VAL+VAL
IF((P .EQ. R).AND.(Q .EQ. S)) VAL=VAL+VAL
write(13,*)P,Q,R,S,val
100 CONTINUE
IF (NX.GT.0) GOTO 4100
CRB
CLOSE(UNIT=NFT77)
!
END
When I compile it using gfortran I obtain double precision in the output file, but with g77 I get only single precision. What is wrong and how can I change it?
Do you mean the "write (13, *)" statement? This is "list-directed" output. It is a convenience I/O with few rules -- what you get will depend upon the compiler -- it is best used for debugging and "quick and dirty" programs. To reliably get all the digits of double precision, change to a formatted output statement, specifying the number of digits that you need. (It is probably best to switch to gfortran anyway, as g77 is no longer under development.)
Your numbers are double precision, but you are printing them in free format. You have to specify an explicit format.
If you want to keep your code F77, try something like
write(13,1000) P,Q,R,S,val
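1000 format(1X,4I7,1X,1E18.10)
The "1X"s mean one space, "4I7" means four seven-width integers, and 1E18.10 means one eighteen-character-wide scientific-notation real number with 10 significant digits. The field width has to leave room for the sign, the leading "0.", and the exponent as well as the digits, so it must be at least 7 larger than the number of digits; otherwise the field is filled with asterisks. Feel free to mess around with the numbers to get it to look right.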
This is a pretty good tutorial on the topic.
I would be tempted to set the format on your write statement to something explicit, rather than use * in write(13,*)P,Q,R,S,val.
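To see the difference side by side, here is a small sketch that writes the same double precision value once list-directed and once with an explicit format. The values and the es23.15 descriptor are illustrative choices; es23.15 shows 16 significant digits, which is about what double precision carries.

```fortran
program explicit_format
  implicit none
  double precision :: val
  integer :: p, q, r, s

  p = 1; q = 2; r = 3; s = 4          ! made-up indices
  val = 1.0d0/3.0d0

  ! List-directed: layout and number of digits are up to the compiler
  write(*,*) p, q, r, s, val

  ! Explicit format: four integers, then 16 significant digits
  write(*,'(4i7,1x,es23.15)') p, q, r, s, val
end program explicit_format
```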