How do I diagnose a bus error in Fortran?

I'm learning how to use Fortran to do some data analysis. I'm working through the following example:
program linalg
implicit none
real :: v1(3), v2(3), m(3,3)
integer :: i,j
v1(1) = 0.25
v1(2) = 1.2
v1(3) = 0.2
! use nested do loops to initialise the matrix
! to the unit matrix
do i=1,3
do j=1,3
m(i,j) = 0.0
end do
m(i,j) = 1.0
end do
! do a matrix multiplication of a vector equivalent to v2i = mij v1j
do i = 1,3
v2(i) = 0.0
do j = 1,3
v2(i) = v2(i) + m(i,j)*v1(j)
end do
end do
write(*,*) 'v2 = ', v2
end program linalg
which I execute with
f95 -o linalg linalg.f90
./linalg
However, I get the following message in the terminal:
Bus error
Some links that I've followed online suggest that this is to do with not having predefined a variable, but I am sure that I have done so in this script, and I cannot find where the error is coming from. Is there another reason I would be getting this error?

Your mistake is in here
do i=1,3
do j=1,3
m(i,j) = 0.0
end do
m(i,j) = 1.0 ! here be a dragon
end do
Fortran is explicit in stating that when a DO loop completes, the index variable holds the value it had on the last iteration plus the increment, here one more than the upper bound. So in this case the statement m(i,j) = 1.0 will try to address m(1,4) the first time round, then m(2,4), and so forth.
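A minimal sketch (the program name is made up for illustration) that shows the rule in isolation: the inner loop body is empty, and the value of j printed after it finishes is one past the upper bound. That is the same out-of-range column that m(i,j) = 1.0 ends up using.

```fortran
program do_index_after_loop
  implicit none
  integer :: i, j
  do i = 1, 3
     do j = 1, 3
        ! empty body: we only care about j once the loop has finished
     end do
     ! The inner loop has completed, so j is one step past its upper bound
     write(*,'(a,i0,a,i0)') 'i = ', i, ', j after inner loop = ', j
  end do
end program do_index_after_loop
```

This prints `i = 1, j after inner loop = 4`, and the same for i = 2 and i = 3.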
Sometimes you get 'lucky' with an attempt to write outside the bounds of an array and the write stays inside the address space of the process you are working in. 'Lucky' in the sense that your program is wrong but doesn't crash -- this crash is a much better situation to be in. The bus error suggests that the compiler has generated an address to write to that lies in forbidden territory for any process.
You could have found this yourself by turning on 'run-time bounds checking' with your compiler. Your compiler's documentation, or other Qs and As here on SO, will tell you how to do that.
I'll leave it to you to fix this as you wish, you show every sign of being able to figure it out now you know the rules.
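For comparison, here is one possible fix, offered as a sketch rather than the only correct version. It sets the diagonal with m(i,i) inside the outer loop, where i is still a valid index. The matmul cross-check at the end is an addition for illustration only. To catch this kind of bug at run time, GNU Fortran accepts `-fcheck=bounds` (for example `gfortran -fcheck=bounds -o linalg linalg.f90`). Whether your `f95` command is GNU Fortran is an assumption you should check.

```fortran
program linalg
  implicit none
  real :: v1(3), v2(3), m(3,3)
  integer :: i, j
  v1 = [0.25, 1.2, 0.2]
  ! initialise m to the unit matrix
  do i = 1, 3
     do j = 1, 3
        m(i,j) = 0.0
     end do
     m(i,i) = 1.0      ! diagonal: use i for both indices, not the stale j (= 4 here)
  end do
  ! v2(i) = sum over j of m(i,j)*v1(j)
  do i = 1, 3
     v2(i) = 0.0
     do j = 1, 3
        v2(i) = v2(i) + m(i,j)*v1(j)
     end do
  end do
  write(*,*) 'v2 = ', v2
  ! Cross-check against the intrinsic matmul (exact here, since m is the identity)
  if (any(abs(v2 - matmul(m, v1)) > 0.0)) error stop 'mismatch'
end program linalg
```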

Related

MPI_Bcast non-root nodes not receiving all data

So, I'm currently writing a small scale code that does output using CGNS and adaptive mesh refinement (AMR) with Amrex. This is all being done with Fortran 95, though CGNS is C with Fortran interfaces and Amrex is C++ with Fortran interfaces (those are not in the sample code). I'm using OpenMPI 1.10.7.
This will eventually go into a full CFD code, but I wanted to test it at small scale to work out the bugs before putting it in the larger code. The program below seems to work every time, but it was originally a subroutine that did not.
I'm having an issue where not all of the data from MPI_Bcast is received by every process... sometimes. I can run the same code twice in a row, and sometimes it bombs out (a segfault from CGNS elsewhere in the code) and sometimes it works. As far as I can tell, the program bombs when not all of the data from MPI_Bcast has been received in time for work to start elsewhere. Despite MPI_wait and MPI_barrier, the writes at the bottom of the subroutine spit out junk on lvl=1 for the last six indices of all the arrays. Printing information to the screen seems to help, but more processors seem to lower the likelihood of the code working.
I've currently got it as MPI_ibcast with an MPI_wait, but I've also tried MPI_Bcast with MPI_barriers after. Changing the communicator from one defined by Amrex to MPI_COMM_WORLD doesn't help.
...
program so_bcast
!
!
!
!
use mpi
implicit none
integer :: lvl,i,a,b,c,ier,state(MPI_STATUS_SIZE),d
integer :: n_elems,req,counter,tag,flavor
integer :: stat(MPI_STATUS_SIZE)
integer :: self,nprocs
type :: box_zones
integer,allocatable :: lower(:,:),higher(:,:),little_zones(:)
double precision,allocatable :: lo_corner(:,:),hi_corner(:,:)
integer :: big_zones
integer,allocatable :: zone_start(:),zone_end(:)
end type
type(box_zones),allocatable :: zone_storage(:)
call MPI_INIT(ier)
call MPI_COMM_SIZE(MPI_COMM_WORLD,nprocs,ier)
call MPI_COMM_RANK(MPI_COMM_WORLD,self,ier)
lvl = 1
! Allocate everything, this is done elsewhere in the actual code, but done here
! for simplification reasons
allocate(zone_storage(0:lvl))
zone_storage(0)%big_zones = 4
zone_storage(1)%big_zones = 20
do i = 0,lvl
allocate(zone_storage(i)%lower(3,zone_storage(i)%big_zones))
allocate(zone_storage(i)%higher(3,zone_storage(i)%big_zones))
allocate(zone_storage(i)%lo_corner(3,zone_storage(i)%big_zones))
allocate(zone_storage(i)%hi_corner(3,zone_storage(i)%big_zones))
zone_storage(i)%lower = self
zone_storage(i)%higher = self*2+1
zone_storage(i)%lo_corner = self*1.0D0
zone_storage(i)%hi_corner = self*1.0D0+1.0D0
allocate(zone_storage(i)%zone_start(0:nprocs-1))
allocate(zone_storage(i)%zone_end(0:nprocs-1))
zone_storage(i)%zone_start(self) = zone_storage(i)%big_zones/nprocs*self+1
zone_storage(i)%zone_end(self) = zone_storage(i)%zone_start(self)+zone_storage(i)%big_zones/nprocs-1
if (zone_storage(i)%zone_end(self)>zone_storage(i)%big_zones) zone_storage(i)%zone_end(self) = zone_storage(i)%big_zones
end do
do i = 0,lvl
write(*,*) 'lower check 0',self,'lower',zone_storage(i)%lower
write(*,*) 'higher check 0',self,'high',zone_storage(i)%higher
write(*,*) 'lo_corner check 0',self,'lo_corner',zone_storage(i)%lo_corner
write(*,*) 'hi_corner check 0',self,'hi_corner',zone_storage(i)%hi_corner
write(*,*) 'big_zones check 0',self,'big_zones',zone_storage(i)%big_zones
write(*,*) 'zone start/end 0',self,'lvl',i,zone_storage(i)%zone_start,zone_storage(i)%zone_end
end do
!
! Agglomerate the appropriate data to processor 0 using non-blocking receives
! and blocking sends
!
do i = 0,lvl
do a = 0,nprocs-1
call mpi_bcast(zone_storage(i)%zone_start(a),1,&
MPI_INT,a,MPI_COMM_WORLD,ier)
call mpi_bcast(zone_storage(i)%zone_end(a),1,&
MPI_INT,a,MPI_COMM_WORLD,ier)
end do
end do
call MPI_BARRIER(MPI_COMM_WORLD,ier)
counter = 0
do i = 0,lvl
n_elems = 3*zone_storage(i)%big_zones
write(*,*) 'number of elements',n_elems
if (self == 0) then
do a = 1,nprocs-1
do c = zone_storage(i)%zone_start(a),zone_storage(i)%zone_end(a)
tag = c*100000+a*1000+1!+d*10
call mpi_irecv(zone_storage(i)%lower(1:3,c),3,MPI_INT,a,&
tag,MPI_COMM_WORLD,req,ier)
tag = tag + 1
call mpi_irecv(zone_storage(i)%higher(1:3,c),3,MPI_INT,a,&
tag,MPI_COMM_WORLD,req,ier)
tag = tag +1
call mpi_irecv(zone_storage(i)%lo_corner(1:3,c),3,MPI_DOUBLE_PRECISION,a,&
tag,MPI_COMM_WORLD,req,ier)
tag = tag +1
call mpi_irecv(zone_storage(i)%hi_corner(1:3,c),3,MPI_DOUBLE_PRECISION,a,&
tag,MPI_COMM_WORLD,req,ier)
end do
end do
else
do b = zone_storage(i)%zone_start(self),zone_storage(i)%zone_end(self)
tag = b*100000+self*1000+1!+d*10
call mpi_send(zone_storage(i)%lower(1:3,b),3,MPI_INT,0,&
tag,MPI_COMM_WORLD,ier)
tag = tag + 1
call mpi_send(zone_storage(i)%higher(1:3,b),3,MPI_INT,0,&
tag,MPI_COMM_WORLD,ier)
tag = tag + 1
call mpi_send(zone_storage(i)%lo_corner(1:3,b),3,MPI_DOUBLE_PRECISION,0,&
tag,MPI_COMM_WORLD,ier)
tag = tag +1
call mpi_send(zone_storage(i)%hi_corner(1:3,b),3,MPI_DOUBLE_PRECISION,0,&
tag,MPI_COMM_WORLD,ier)
end do
end if
end do
write(*,*) 'spack'
!
call mpi_barrier(MPI_COMM_WORLD,ier)
do i = 0,lvl
write(*,*) 'lower check 1',self,'lower',zone_storage(i)%lower
write(*,*) 'higher check 1',self,'high',zone_storage(i)%higher
write(*,*) 'lo_corner check 1',self,'lo_corner',zone_storage(i)%lo_corner
write(*,*) 'hi_corner check 1',self,'hi_corner',zone_storage(i)%hi_corner
write(*,*) 'big_zones check 1',self,'big_zones',zone_storage(i)%big_zones
write(*,*) 'zone start/end 1',self,'lvl',i,zone_storage(i)%zone_start,zone_storage(i)%zone_end
end do
!
! Send all the data out to all the processors
!
do i = 0,lvl
n_elems = 3*zone_storage(i)%big_zones
req = 1
call mpi_ibcast(zone_storage(i)%lower,n_elems,MPI_INT,&
0,MPI_COMM_WORLD,req,ier)
call mpi_wait(req,stat,ier)
write(*,*) 'spiffy'
req = 2
call mpi_ibcast(zone_storage(i)%higher,n_elems,MPI_INT,&
0,MPI_COMM_WORLD,req,ier)
call mpi_wait(req,stat,ier)
req = 3
call mpi_ibcast(zone_storage(i)%lo_corner,n_elems,MPI_DOUBLE_PRECISION,&
0,MPI_COMM_WORLD,req,ier)
call mpi_wait(req,stat,ier)
req = 4
call mpi_ibcast(zone_storage(i)%hi_corner,n_elems,MPI_DOUBLE_PRECISION,&
0,MPI_COMM_WORLD,req,ier)
call mpi_wait(req,stat,ier)
call mpi_barrier(MPI_COMM_WORLD,ier)
end do
write(*,*) 'lower check 2',self,'lower',zone_storage(lvl)%lower
write(*,*) 'higher check 2',self,'high',zone_storage(lvl)%higher
write(*,*) 'lo_corner check ',self,'lo_corner',zone_storage(lvl)%lo_corner
write(*,*) 'hi_corner check ',self,'hi_corner',zone_storage(lvl)%hi_corner
write(*,*) 'big_zones check ',self,'big_zones',zone_storage(lvl)%big_zones
call MPI_FINALIZE(ier)
end program
...
As I said, this code works, but the larger version does not always work. OpenMPI throws several warnings akin to this:
mca: base: component_find: ess "mca_ess_pmi" uses an MCA interface that is not recognized (component MCA v2.1.0 != supported MCA v2.0.0) -- ignored
mca: base: component_find: grpcomm "mca_grpcomm_direct" uses an MCA interface that is not recognized (component MCA v2.1.0 != supported MCA v2.0.0) -- ignored
mca: base: component_find: rcache "mca_rcache_grdma" uses an MCA interface that is not recognized (component MCA v2.1.0 != supported MCA v2.0.0) -- ignored
etc. etc. But the program can still complete even with those warnings.
- Is there a way to ensure that MPI_bcast has emptied its buffer into the correct region of memory before moving on? It seems to miss this sometimes.
- Is there a different/better method to distribute the data? The sizes have to be able to vary, unlike in the test program.
Thank you ahead of time.
The most straightforward answer was to use MPI_allgatherv. As little as I wanted to mess with displacements, it was the best setup to share the information and reduce overall code length.
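A minimal sketch of the MPI_Allgatherv approach, under simplified assumptions. It uses a single integer array shaped like the question's lower(3, nzones) and a hypothetical fixed total of 20 zones split into contiguous blocks per rank. It also uses the Fortran datatype MPI_INTEGER. counts and displs are measured in array elements (three per zone), which is where the displacement bookkeeping comes in. Run it with something like `mpirun -np 4`.

```fortran
program allgatherv_sketch
  use mpi
  implicit none
  integer, parameter :: nzones = 20        ! hypothetical total number of zones
  integer :: ier, self, nprocs, p, first, last
  integer :: lower(3, nzones)              ! full array, wanted on every rank
  integer, allocatable :: mine(:,:), counts(:), displs(:)

  call MPI_INIT(ier)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ier)
  call MPI_COMM_RANK(MPI_COMM_WORLD, self, ier)
  allocate(counts(0:nprocs-1), displs(0:nprocs-1))

  ! Contiguous block of zones per rank; block sizes may differ by one
  do p = 0, nprocs-1
     first = p*nzones/nprocs + 1
     last  = (p+1)*nzones/nprocs
     counts(p) = 3*(last - first + 1)      ! elements contributed by rank p
     displs(p) = 3*(first - 1)             ! element offset of lower(1, first)
  end do

  ! Each rank fills only its own zones (here simply with its rank number)
  allocate(mine(3, counts(self)/3))
  mine = self

  ! Every rank receives every block, placed at displs(p) within lower
  call MPI_ALLGATHERV(mine, counts(self), MPI_INTEGER, &
                      lower, counts, displs, MPI_INTEGER, &
                      MPI_COMM_WORLD, ier)

  ! Check: zones owned by rank p must now hold the value p on all ranks
  do p = 0, nprocs-1
     first = displs(p)/3 + 1
     last  = first + counts(p)/3 - 1
     if (any(lower(:, first:last) /= p)) error stop 'allgatherv mismatch'
  end do
  if (self == 0) write(*,'(a)') 'allgatherv OK'
  call MPI_FINALIZE(ier)
end program allgatherv_sketch
```

For the real data, the same counts/displs pattern would presumably be repeated for higher, lo_corner and hi_corner (the last two with MPI_DOUBLE_PRECISION), once per level.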
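I believe an MPI_Waitall solution would work too. On rank 0 the MPI_irecv requests were never completed (each call overwrote req, and none of them was waited on), so the data was not fully received before being broadcast.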
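The MPI_Waitall alternative can be sketched as follows. It assumes a hypothetical ownership rule where zone c belongs to rank mod(c, nprocs), and it uses the zone number as the message tag. The essential differences from the question's code are these: every MPI_IRECV gets its own request handle instead of reusing req, and MPI_WAITALL completes all of them before the broadcast reads the array.

```fortran
program waitall_sketch
  use mpi
  implicit none
  integer, parameter :: nzones = 8          ! hypothetical number of zones
  integer :: ier, self, nprocs, c, owner, nreq
  integer :: lower(3, nzones)
  integer, allocatable :: reqs(:)

  call MPI_INIT(ier)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ier)
  call MPI_COMM_RANK(MPI_COMM_WORLD, self, ier)

  ! Hypothetical ownership: zone c belongs to rank mod(c, nprocs)
  lower = -1
  do c = 1, nzones
     if (mod(c, nprocs) == self) lower(:, c) = self
  end do

  if (self == 0) then
     allocate(reqs(nzones))
     nreq = 0
     do c = 1, nzones
        owner = mod(c, nprocs)
        if (owner /= 0) then
           nreq = nreq + 1                  ! keep every request handle
           call MPI_IRECV(lower(1, c), 3, MPI_INTEGER, owner, c, &
                          MPI_COMM_WORLD, reqs(nreq), ier)
        end if
     end do
     ! Complete ALL receives before lower is used or broadcast
     call MPI_WAITALL(nreq, reqs, MPI_STATUSES_IGNORE, ier)
  else
     do c = 1, nzones
        if (mod(c, nprocs) == self) &
           call MPI_SEND(lower(1, c), 3, MPI_INTEGER, 0, c, MPI_COMM_WORLD, ier)
     end do
  end if

  ! Rank 0 now holds complete data, so a plain blocking broadcast is enough
  call MPI_BCAST(lower, 3*nzones, MPI_INTEGER, 0, MPI_COMM_WORLD, ier)

  do c = 1, nzones
     if (any(lower(:, c) /= mod(c, nprocs))) error stop 'incomplete data'
  end do
  if (self == 0) write(*,'(a)') 'waitall OK'
  call MPI_FINALIZE(ier)
end program waitall_sketch
```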

How to make a file with i number of rows

I'm using the code below to generate a file with i rows (i = 200, for instance). The first and second rows are fixed; after them I want to create another i rows using random sampling in a sphere of unit radius, in Fortran. Each row should start with m1, followed by 6 random numbers in the range [0,1].
program InputGen
implicit none
integer :: i,n,j
character(len=500) :: firstLine, secondLine
real *8 :: m1
real:: r(10)
m1=1.d0
firstLine='3 5 1.d-4 1.d5 0.e-3 0.0 1. 0.1 0.e0 1'
secondLine='4.d6 0. 0. 0. 0. 0. 0. '
call random_seed()
call random_number(r)
open(unit=100, file='INPUT.TXT',Action='write', Status='replace')
write(100,'(A)') trim(firstLine)
write(100,'(A)') trim(secondLine)
do i=1,200
write(100,'(A)') '',m1,' ',(r(n),n=1,10),
' ',(r(n),n=1,10),'0.0',&
' ',(r(n),n=1,10),&
' ',(r(n),n=1,10),'0.0'
end do
write(*,*) 'Input file generated.'
write(*,*) 'Press Enter to exit...'
read(*,*)
end program InputGen
The first and second lines are written perfectly, but the rows written in the loop are not.
You did not tell us what is wrong (how the problem exhibits), but I suspect the format is incorrect. You are just specifying (A), but you have a mixed output list with strings and numbers.
You can just use a simple general format like (*(g0)), which applies the generic g0 format to every item in the output list. You will want to add some manual spaces to the output list (although you already have some) to avoid joining two unrelated output items on the line.
Or you can follow the output list and give a specific format for each item, like (a,1x,f12.6,a,10(f12.6,1x),a,... and so on. Adjust as needed, especially the spaces (either keep them as ' ' with the a descriptor, or use the 1x descriptor).
You are also currently writing the same r every time. You should generate more numbers and regenerate them on each loop iteration:
real:: r(40)
do i=1,200
call random_number(r)
write(100,'(*(g0))') '',m1,' ',r(1:10),&
' ',r(11:20),'0.0',&
' ',r(21:30),&
' ',r(31:40),'0.0'
end do
This does not do anything with points in a sphere or anything similar, this just prints random numbers. I hope that is clear.
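A complete, minimal version of the loop above makes two assumptions. Each row holds m1 followed by six fresh random numbers, as the question describes. An explicit format (f4.1,6(1x,f8.6)) is used instead of (*(g0)) so the columns line up. The interactive "Press Enter" pause is dropped, and the file is read back to count its lines as a quick check. Like the answer's version, this does not sample points in a sphere.

```fortran
program input_gen_sketch
  implicit none
  integer, parameter :: dp = kind(1.d0)
  integer, parameter :: nrows = 200          ! number of generated rows
  integer :: i, u, ios, nlines
  real(dp) :: m1
  real :: r(6)                               ! six fresh numbers per row
  character(len=200) :: line

  m1 = 1.0_dp
  call random_seed()
  open(newunit=u, file='INPUT.TXT', action='write', status='replace')
  write(u,'(a)') '3 5 1.d-4 1.d5 0.e-3 0.0 1. 0.1 0.e0 1'
  write(u,'(a)') '4.d6 0. 0. 0. 0. 0. 0.'
  do i = 1, nrows
     call random_number(r)                   ! regenerate on every iteration
     ! m1 followed by six values in [0,1), separated by single spaces
     write(u,'(f4.1,6(1x,f8.6))') m1, r
  end do
  close(u)

  ! Read the file back and count the lines as a sanity check
  open(newunit=u, file='INPUT.TXT', action='read', status='old')
  nlines = 0
  do
     read(u,'(a)', iostat=ios) line
     if (ios /= 0) exit
     nlines = nlines + 1
  end do
  close(u)
  write(*,'(a)') 'Input file generated.'
  write(*,'(a,i0)') 'lines in INPUT.TXT: ', nlines
end program input_gen_sketch
```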

How do I write a formatted matrix row by row in FORTRAN 77?

Suppose I have the matrix c(i,j). I want to write it to the screen in plain FORTRAN 77, with three digits after the decimal point. I write
do i=1,N
write(*,"(F8.3)") ( c(i,j), j=1,N )
end do
but the output is in the form
c(1,1)
c(1,2)
...
c(1,10) c(2,1)
c(2,2)
...
Alternatively, I can simply write
do i=1,N
write(*,*) ( c(i,j), j=1,N )
end do
and then the output is like the matrix, but, of course, it is not formatted.
How to get the correct output in Fortran77?
Edit: it seems that one solution is to write
do i=1, N
do j=1, N
write(*,'(F9.3,A,$)') c(i,j), ' '
end do
write(*,*) ' '
end do
Your format only specifies a single float but you actually want to write N per line.
A fairly general solution for this simple case would be something like
program temp
implicit none
integer, parameter :: N=3
real, dimension(N,N) :: c
integer :: i,j
character(len=20) :: exFmt
c = 1.0
write(exFmt,'("(",I0,"(F8.3))")') N
do i=1,N
write(*,exFmt) (c(i,j), j=1,N)
end do
end program
This will make exFmt be '(3(F8.3))', which specifies printing three floats (note you probably really want '(3(F8.3," "))' to explicitly include some spacing).
Note that some compilers allow exFmt to be just '(*(F8.3))'. This is part of the Fortran 2008 specification, so it may not be supported by all the compilers you have access to. See here for a summary of compiler support (see Unlimited format item; thanks to HighPerformanceMark for this).
Finally, an easy bodge is to use a format statement like '(1000(F8.3))', where 1000 is larger than you will ever need.
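A small self-contained sketch compares the two format strings mentioned above on a 3 by 3 matrix with c(i,j) = 10*i + j. The first is the Fortran 2008 unlimited repeat '(*(f8.3,1x))'. The second is the FORTRAN 77-friendly bodge with a large repeat count; the value 100 is arbitrary. Both loops print the same three rows.

```fortran
program matrix_rows
  implicit none
  integer, parameter :: n = 3
  real :: c(n,n)
  integer :: i, j

  ! c(i,j) = 10*i + j so each printed row is easy to recognise
  do j = 1, n
     do i = 1, n
        c(i,j) = real(10*i + j)
     end do
  end do

  ! Fortran 2008 unlimited format item: reused for every value in the row
  do i = 1, n
     write(*,'(*(f8.3,1x))') (c(i,j), j = 1, n)
  end do

  ! FORTRAN 77-compatible bodge: a repeat count larger than any row length
  do i = 1, n
     write(*,'(100(f8.3,1x))') (c(i,j), j = 1, n)
  end do
end program matrix_rows
```

The first row comes out as `  11.000   12.000   13.000` with either format.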

Syntax error in function arguments in fortran

I have a subroutine file as follows
subroutine grids(Ngrids,gridsize,boundx,boundy,boundz,occmatrix,myid)
implicit NONE
integer i,j,k,Ngrids, occmatrix(14,14,10)
integer locx,locy,locz,myid
double precision gridsize,boundx,boundy,boundz
do i = 1, 14
do j = 1, 14
do k = 1, 10
occmatrix(i,j,k) = 0
enddo
enddo
enddo
open (13, file = 'grid_data.9deg')
write(*,'(A,i2)'),' READING GRID FILE ON PROC.....',myid
read(13,*) Ngrids,gridsize
read(13,*) boundx,boundy,boundz
do i = 1, Ngrids
read(13,*) locx, locy, locz
occmatrix(locx,locy,locz) = 1
enddo
close(13)
return
end
It gives the following syntax error in compiling
subroutine grids(Ngrids,gridsize,boundx,boundy,boundz,occmatrix,my
1
Error: Unexpected junk in formal argument list at (1)
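It used to compile fine before.
I believe your line is too long. Did you add a new argument? Your code looks like free-form source, but the compiler may be treating it as fixed form, for example because of a .f suffix in the file name. In fixed form only columns 1 to 72 of a line are significant, and the error message shows the line cut off right after occmatrix,my, which is consistent with that: the compiler sees an argument list with no closing parenthesis. Convince the compiler to assume free-form source (with a compiler option, or usually a .f90 suffix).
Even in free-form files the line width is limited (132 characters by the standard), and you should break longer lines, which would for example look like: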
subroutine grids( Ngrids,gridsize,boundx,boundy,boundz, &
& occmatrix,myid )
If you are stuck with fixed format you need to indicate a continuation line by a non blank character in column 6.
Here is how it looks in fixed form (statement text starts in column 7, and the & in column 6 marks the continuation line):
      subroutine grids(Ngrids,gridsize,boundx,boundy,boundz,
     &                 occmatrix,myid)
Please do not use fixed form anymore! Instead, change your files to end with .f90; most compilers recognize this suffix as free-form source.
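A minimal free-form sketch of the same subroutine follows, placed in a hypothetical module grids_mod with a small driver so it runs on its own. It also drops the comma between the control list and the output list in the original write statement, which is a nonstandard extension. Caution: the driver overwrites grid_data.9deg in the current directory with a made-up two-cell file, so run it in a scratch directory.

```fortran
! grids_demo.f90: the .f90 suffix makes most compilers read this as free form
module grids_mod
  implicit none
contains
  subroutine grids(Ngrids, gridsize, boundx, boundy, boundz, &
                   occmatrix, myid)
    integer, intent(out) :: Ngrids
    double precision, intent(out) :: gridsize, boundx, boundy, boundz
    integer, intent(out) :: occmatrix(14,14,10)
    integer, intent(in) :: myid
    integer :: i, locx, locy, locz, u

    occmatrix = 0                      ! whole-array assignment replaces the triple loop
    open(newunit=u, file='grid_data.9deg', status='old', action='read')
    write(*,'(a,i2)') ' READING GRID FILE ON PROC.....', myid   ! no comma before the list
    read(u,*) Ngrids, gridsize
    read(u,*) boundx, boundy, boundz
    do i = 1, Ngrids
       read(u,*) locx, locy, locz
       occmatrix(locx, locy, locz) = 1
    end do
    close(u)
  end subroutine grids
end module grids_mod

program grids_demo
  use grids_mod
  implicit none
  integer :: n, occ(14,14,10), u
  double precision :: gs, bx, by, bz

  ! Made-up grid file with two occupied cells so the example is self-contained
  open(newunit=u, file='grid_data.9deg', status='replace', action='write')
  write(u,*) 2, 0.5d0
  write(u,*) 1.0d0, 2.0d0, 3.0d0
  write(u,*) 1, 2, 3
  write(u,*) 14, 14, 10
  close(u)

  call grids(n, gs, bx, by, bz, occ, 0)
  write(*,'(a,i0)') 'occupied cells: ', count(occ == 1)
end program grids_demo
```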

New to Fortran, questions about writing to file

I am completely new to Fortran and pretty new to programming in general. I am trying to compile a script someone else has written. This is giving me a few problems. The top half of the code is:
C
C Open direct-access output file ('JPLEPH')
C
      OPEN ( UNIT = 12,
     .       FILE = 'JPLEPH',
     .       ACCESS = 'DIRECT',
     .       FORM = 'UNFORMATTED',
     .       RECL = IRECSZ,
     .       STATUS = 'NEW' )
C
C Read and write the ephemeris data records (GROUP 1070).
C
      CALL NXTGRP ( HEADER )
      IF ( HEADER .NE. 'GROUP 1070' ) CALL ERRPRT(1070,'NOT HEADER')
      NROUT = 0
      IN = 0
      OUT = 0
    1 READ(*,'(2i6)')NRW,NCOEFF
      if(NRW .EQ. 0) GO TO 1
      READ (*,'(3D26.18)',IOSTAT =IN) (DB(K),K=1,NCOEFF)
      DO WHILE ( ( IN .EQ. 0 )
     .     .AND. ( DB(2) .LT. T2) )
        IF ( 2*NCOEFF .NE. KSIZE ) THEN
          CALL ERRPRT(NCOEFF,' 2*NCOEFF not equal to KSIZE')
        ENDIF
C
C Skip this data block if the end of the interval is less
C than the specified start time or if the it does not begin
C where the previous block ended.
C
        IF ( (DB(2) .GE. T1) .AND. (DB(1) .GE. DB2Z) ) THEN
          IF ( FIRST ) THEN
C
C Don't worry about the intervals overlapping
C or abutting if this is the first applicable
C interval.
C
            DB2Z = DB(1)
            FIRST = .FALSE.
          ENDIF
          IF (DB(1) .NE. DB2Z ) THEN
C
C Beginning of current interval is past the end
C of the previous one.
            CALL ERRPRT (NRW, 'Records do not overlap or abut')
          ENDIF
          DB2Z = DB(2)
          NROUT = NROUT + 1
          print*,'Out =', OUT
          WRITE (12,REC=NROUT+2,IOSTAT=OUT) (DB(K),K=1,NCOEFF)
          print*,'Out2 =', OUT
          IF ( OUT .NE. 0 ) THEN
            CALL ERRPRT (NROUT,
     .        'th record not written because of error')
          ENDIF
So, when I print "Out" and "Out2" to the screen, I find that Out=0 and Out2=110. As it is no longer equal to zero, the program gives me an error. Therefore I am basically wondering what is happening here:
WRITE (12,REC=NROUT+2,IOSTAT=OUT) (DB(K),K=1,NCOEFF)
I assume that 12 refers to the file I have opened (and created) and want to write to. What does the rest of the first pair of parentheses do? And what is the point of the second? Does that give the information I want to put in my file? If so, where does DB get filled?
Generally, I am wondering what is going wrong. Why does OUT change value? (I need t
NCOEFF is declared as an INTEGER at the beginning of the program, and DB as DOUBLE PRECISION DB(3000), DB2Z/0.d0/, so I assume DB is an array of some sort.
In this program, OUT tells you whether the write statement was successful (the IOSTAT specifier in the write statement means "I/O status", or input/output status). It is set to 0 if the I/O operation succeeds, to a positive, compiler-specific error code if an error occurs, and to a negative value for an end-of-file or end-of-record condition. You can find what the error codes mean here.
I'm not familiar with the REC parameter, but a starting place to investigate yourself can be found here.
To quote the handbook REC indicates the record number to be read or written. As advised, see the documentation which accompanies your compiler for further explanation.
(DB(K),K=1,NCOEFF) means "the elements of DB from DB(1) to DB(NCOEFF)". You are looking at an I/O implied DO.
The statement
WRITE (12,REC=NROUT+2,IOSTAT=OUT) (DB(K),K=1,NCOEFF)
is, if memory serves me, called an "implied DO-loop". As written, it will write NCOEFF values from array DB, starting at DB(1).
It is called an implied DO-loop because the explicit form would be (in FORTRAN IV, for the ancients: I know it a lot better than the more modern variations) something along the lines of:
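      DO 10 K=1,NCOEFF
      WRITE (12,REC=NROUT+2,IOSTAT=OUT) DB(K)
   10 CONTINUE
This is a DO-loop. The implied DO-loop form lets you put the "loop" right in the input/output statement. (Strictly, the two are not equivalent for output: every WRITE statement starts a new record, so this explicit loop would overwrite record NROUT+2 on every pass and only DB(NCOEFF) would survive in it, whereas the implied DO writes all NCOEFF values into that one record.)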
What makes it useful is that you can have multiple arrays, and multiple loops. For a simple example:
WRITE (12,REC=NROUT+2,IOSTAT=OUT) (DB(K), DC(K), K=1,NCOEFF)
110 is the error code thrown by the WRITE call. You need to check your FORTRAN RTL (run-time library) reference. It should list the possible error codes. I think 110 means that you're trying to convert a double-precision value to an integer, but the value is bigger than you can store in an integer. Maybe dump the values in DB and see.
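A minimal, self-contained sketch of the pieces discussed in these answers covers three things. An implied DO writes NCOEFF doubles into a single direct-access record (REC=3 here). IOSTAT plus IOMSG check the result. INQUIRE(IOLENGTH=...) computes a RECL large enough for that record in whatever units the compiler uses. The file names demo.bin and demo_small.bin and NCOEFF = 6 are hypothetical. The second half deliberately opens a file with too small a RECL: writing more data than fits in one direct-access record is an error condition, so IOSTAT becomes nonzero. In the question's case, it may be worth checking whether IRECSZ is large enough for NCOEFF double-precision values in your compiler's RECL units. That is a possibility to rule out, not a diagnosis of the 110.

```fortran
program direct_access_sketch
  implicit none
  integer, parameter :: ncoeff = 6          ! hypothetical coefficient count
  double precision :: db(ncoeff), back(ncoeff)
  integer :: k, reclen, ios, u
  character(len=200) :: msg

  db = [(dble(k), k = 1, ncoeff)]

  ! RECL needed for one record holding db(1..ncoeff), in this compiler's units
  inquire(iolength=reclen) (db(k), k = 1, ncoeff)

  open(newunit=u, file='demo.bin', access='direct', form='unformatted', &
       recl=reclen, status='replace')
  ! Implied DO: all ncoeff values go into ONE record, record number 3
  write(u, rec=3, iostat=ios, iomsg=msg) (db(k), k = 1, ncoeff)
  if (ios /= 0) then
     write(*,'(a,i0,2a)') 'write failed, iostat = ', ios, ': ', trim(msg)
     stop 1
  end if
  read(u, rec=3, iostat=ios) (back(k), k = 1, ncoeff)
  close(u, status='delete')
  if (ios /= 0) error stop 'read back failed'
  if (all(back == db)) write(*,'(a)') 'round trip OK'

  ! Same write with RECL only half as large: the output list no longer fits
  open(newunit=u, file='demo_small.bin', access='direct', form='unformatted', &
       recl=reclen/2, status='replace')
  write(u, rec=1, iostat=ios, iomsg=msg) (db(k), k = 1, ncoeff)
  write(*,'(a,l1)') 'too-small RECL gives nonzero iostat: ', ios /= 0
  if (ios /= 0) write(*,'(2a)') 'compiler message: ', trim(msg)
  close(u, status='delete')
end program direct_access_sketch
```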