MPI_Bcast non-root nodes not receiving all data - fortran

So, I'm currently writing a small-scale code that does output using CGNS and adaptive mesh refinement (AMR) with Amrex. This is all being done with Fortran 95, though CGNS is C with Fortran interfaces and Amrex is C++ with Fortran interfaces (those are not in the sample code). I'm using OpenMPI 1.10.7.
This will eventually go into a full CFD code, but I wanted to test it at small scale to work the bugs out before putting it in the larger code. The program below seems to work every time, but it was originally a subroutine that did not.
I'm having an issue where not all of the data from MPI_Bcast is being received by every process... sometimes. I can hit execute on the same code twice in a row, and sometimes it bombs out (a segfault from CGNS elsewhere in the code), and sometimes it works. As far as I can tell, the program bombs when not all of the data from MPI_Bcast is received in time to start work elsewhere. Despite MPI_Wait and MPI_Barrier, the writes at the bottom of the subroutine will spit out junk on lvl=1 for the last six indices of all the arrays. Printing information to the screen seems to help, but more processors seem to lower the likelihood of the code working.
I've currently got it as MPI_Ibcast with an MPI_Wait, but I've also tried MPI_Bcast with MPI_Barriers after. Changing the communicator from one defined by Amrex to MPI_COMM_WORLD doesn't help.
...
program so_bcast
  !
  !
  !
  !
  use mpi
  implicit none
  integer :: lvl,i,a,b,c,ier,state(MPI_STATUS_SIZE),d
  integer :: n_elems,req,counter,tag,flavor
  integer :: stat(MPI_STATUS_SIZE)
  integer :: self,nprocs
  type :: box_zones
    integer,allocatable :: lower(:,:),higher(:,:),little_zones(:)
    double precision,allocatable :: lo_corner(:,:),hi_corner(:,:)
    integer :: big_zones
    integer,allocatable :: zone_start(:),zone_end(:)
  end type
  type(box_zones),allocatable :: zone_storage(:)

  call MPI_INIT(ier)
  call MPI_COMM_SIZE(MPI_COMM_WORLD,nprocs,ier)
  call MPI_COMM_RANK(MPI_COMM_WORLD,self,ier)

  lvl = 1

  ! Allocate everything, this is done elsewhere in the actual code, but done here
  ! for simplification reasons
  allocate(zone_storage(0:lvl))
  zone_storage(0)%big_zones = 4
  zone_storage(1)%big_zones = 20

  do i = 0,lvl
    allocate(zone_storage(i)%lower(3,zone_storage(i)%big_zones))
    allocate(zone_storage(i)%higher(3,zone_storage(i)%big_zones))
    allocate(zone_storage(i)%lo_corner(3,zone_storage(i)%big_zones))
    allocate(zone_storage(i)%hi_corner(3,zone_storage(i)%big_zones))
    zone_storage(i)%lower = self
    zone_storage(i)%higher = self*2+1
    zone_storage(i)%lo_corner = self*1.0D0
    zone_storage(i)%hi_corner = self*1.0D0+1.0D0
    allocate(zone_storage(i)%zone_start(0:nprocs-1))
    allocate(zone_storage(i)%zone_end(0:nprocs-1))
    zone_storage(i)%zone_start(self) = zone_storage(i)%big_zones/nprocs*self+1
    zone_storage(i)%zone_end(self) = zone_storage(i)%zone_start(self)+zone_storage(i)%big_zones/nprocs-1
    if (zone_storage(i)%zone_end(self)>zone_storage(i)%big_zones) zone_storage(i)%zone_end(self) = zone_storage(i)%big_zones
  end do

  do i = 0,lvl
    write(*,*) 'lower check 0',self,'lower',zone_storage(i)%lower
    write(*,*) 'higher check 0',self,'high',zone_storage(i)%higher
    write(*,*) 'lo_corner check 0',self,'lo_corner',zone_storage(i)%lo_corner
    write(*,*) 'hi_corner check 0',self,'hi_corner',zone_storage(i)%hi_corner
    write(*,*) 'big_zones check 0',self,'big_zones',zone_storage(i)%big_zones
    write(*,*) 'zone start/end 0',self,'lvl',i,zone_storage(i)%zone_start,zone_storage(i)%zone_end
  end do
  !
  ! Agglomerate the appropriate data to processor 0 using non-blocking receives
  ! and blocking sends
  !
  do i = 0,lvl
    do a = 0,nprocs-1
      call mpi_bcast(zone_storage(i)%zone_start(a),1,&
        MPI_INT,a,MPI_COMM_WORLD,ier)
      call mpi_bcast(zone_storage(i)%zone_end(a),1,&
        MPI_INT,a,MPI_COMM_WORLD,ier)
    end do
  end do

  call MPI_BARRIER(MPI_COMM_WORLD,ier)

  counter = 0
  do i = 0,lvl
    n_elems = 3*zone_storage(i)%big_zones
    write(*,*) 'number of elements',n_elems
    if (self == 0) then
      do a = 1,nprocs-1
        do c = zone_storage(i)%zone_start(a),zone_storage(i)%zone_end(a)
          tag = c*100000+a*1000+1!+d*10
          call mpi_irecv(zone_storage(i)%lower(1:3,c),3,MPI_INT,a,&
            tag,MPI_COMM_WORLD,req,ier)
          tag = tag + 1
          call mpi_irecv(zone_storage(i)%higher(1:3,c),3,MPI_INT,a,&
            tag,MPI_COMM_WORLD,req,ier)
          tag = tag +1
          call mpi_irecv(zone_storage(i)%lo_corner(1:3,c),3,MPI_DOUBLE_PRECISION,a,&
            tag,MPI_COMM_WORLD,req,ier)
          tag = tag +1
          call mpi_irecv(zone_storage(i)%hi_corner(1:3,c),3,MPI_DOUBLE_PRECISION,a,&
            tag,MPI_COMM_WORLD,req,ier)
        end do
      end do
    else
      do b = zone_storage(i)%zone_start(self),zone_storage(i)%zone_end(self)
        tag = b*100000+self*1000+1!+d*10
        call mpi_send(zone_storage(i)%lower(1:3,b),3,MPI_INT,0,&
          tag,MPI_COMM_WORLD,ier)
        tag = tag + 1
        call mpi_send(zone_storage(i)%higher(1:3,b),3,MPI_INT,0,&
          tag,MPI_COMM_WORLD,ier)
        tag = tag + 1
        call mpi_send(zone_storage(i)%lo_corner(1:3,b),3,MPI_DOUBLE_PRECISION,0,&
          tag,MPI_COMM_WORLD,ier)
        tag = tag +1
        call mpi_send(zone_storage(i)%hi_corner(1:3,b),3,MPI_DOUBLE_PRECISION,0,&
          tag,MPI_COMM_WORLD,ier)
      end do
    end if
  end do
  write(*,*) 'spack'
  !
  call mpi_barrier(MPI_COMM_WORLD,ier)

  do i = 0,lvl
    write(*,*) 'lower check 1',self,'lower',zone_storage(i)%lower
    write(*,*) 'higher check 1',self,'high',zone_storage(i)%higher
    write(*,*) 'lo_corner check 1',self,'lo_corner',zone_storage(i)%lo_corner
    write(*,*) 'hi_corner check 1',self,'hi_corner',zone_storage(i)%hi_corner
    write(*,*) 'big_zones check 1',self,'big_zones',zone_storage(i)%big_zones
    write(*,*) 'zone start/end 1',self,'lvl',i,zone_storage(i)%zone_start,zone_storage(i)%zone_end
  end do
  !
  ! Send all the data out to all the processors
  !
  do i = 0,lvl
    n_elems = 3*zone_storage(i)%big_zones
    req = 1
    call mpi_ibcast(zone_storage(i)%lower,n_elems,MPI_INT,&
      0,MPI_COMM_WORLD,req,ier)
    call mpi_wait(req,stat,ier)
    write(*,*) 'spiffy'
    req = 2
    call mpi_ibcast(zone_storage(i)%higher,n_elems,MPI_INT,&
      0,MPI_COMM_WORLD,req,ier)
    call mpi_wait(req,stat,ier)
    req = 3
    call mpi_ibcast(zone_storage(i)%lo_corner,n_elems,MPI_DOUBLE_PRECISION,&
      0,MPI_COMM_WORLD,req,ier)
    call mpi_wait(req,stat,ier)
    req = 4
    call mpi_ibcast(zone_storage(i)%hi_corner,n_elems,MPI_DOUBLE_PRECISION,&
      0,MPI_COMM_WORLD,req,ier)
    call mpi_wait(req,stat,ier)
    call mpi_barrier(MPI_COMM_WORLD,ier)
  end do

  write(*,*) 'lower check 2',self,'lower',zone_storage(lvl)%lower
  write(*,*) 'higher check 2',self,'high',zone_storage(lvl)%higher
  write(*,*) 'lo_corner check ',self,'lo_corner',zone_storage(lvl)%lo_corner
  write(*,*) 'hi_corner check ',self,'hi_corner',zone_storage(lvl)%hi_corner
  write(*,*) 'big_zones check ',self,'big_zones',zone_storage(lvl)%big_zones

  call MPI_FINALIZE(ier)
end program
...
As I said, this code works, but the larger version does not always work. OpenMPI throws several warnings akin to this:
mca: base: component_find: ess "mca_ess_pmi" uses an MCA interface that is not recognized (component MCA v2.1.0 != supported MCA v2.0.0) -- ignored
mca: base: component_find: grpcomm "mca_grpcomm_direct" uses an MCA interface that is not recognized (component MCA v2.1.0 != supported MCA v2.0.0) -- ignored
mca: base: component_find: rcache "mca_rcache_grdma" uses an MCA interface that is not recognized (component MCA v2.1.0 != supported MCA v2.0.0) -- ignored
etc. etc. But the program can still complete even with those warnings.
- Is there a way to ensure that MPI_Bcast has emptied its buffer into the correct region of memory before moving on? It seems to miss this sometimes.
- Is there a different/better method to distribute the data? The sizes have to be able to vary, unlike in the test program.
Thank you ahead of time.

The most straightforward answer was to use MPI_Allgatherv. As much as I didn't want to mess with displacements, it was the best setup for sharing the information and it reduced the overall code length.
I believe an MPI_Waitall solution would work too, as the data was not being fully received before being broadcast.
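For reference, here is a rough sketch of what the MPI_Allgatherv version looks like for two of the arrays. This is a reconstruction rather than the exact code that went into the solver; it assumes zone_start/zone_end are already known on every rank (as they are after the broadcast loop above) and that each rank has only filled its own columns of the full-sized arrays:

! Sketch only: replace the send/recv + bcast pattern with MPI_Allgatherv.
! counts/displs are in units of array elements (3 values per zone), and
! MPI_INTEGER is the Fortran-side datatype name for default integers.
integer :: counts(0:nprocs-1), displs(0:nprocs-1)

do i = 0,lvl
  do a = 0,nprocs-1
    counts(a) = 3*(zone_storage(i)%zone_end(a)-zone_storage(i)%zone_start(a)+1)
    displs(a) = 3*(zone_storage(i)%zone_start(a)-1)
  end do
  ! MPI_IN_PLACE: each rank contributes the slice it already owns inside
  ! the full-sized receive array, so no separate send buffer is needed.
  call MPI_Allgatherv(MPI_IN_PLACE,0,MPI_DATATYPE_NULL, &
       zone_storage(i)%lower,counts,displs,MPI_INTEGER, &
       MPI_COMM_WORLD,ier)
  call MPI_Allgatherv(MPI_IN_PLACE,0,MPI_DATATYPE_NULL, &
       zone_storage(i)%lo_corner,counts,displs,MPI_DOUBLE_PRECISION, &
       MPI_COMM_WORLD,ier)
  ! ...and the same for higher and hi_corner.
end do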

Related

Write on a new line on every write(33,*) command in Fortran77

I created a Fortran code to calculate the temperature of a CFD model. This code is called on every iteration of a steady-state simulation and calculates the temperature. On every call/iteration I also want my Fortran code to save the temperature field to a txt file. After calculating the temperature field and saving the values in TFIELD(100,1:6), the part that saves to the txt file looks like:
OPEN(UNIT=35,FILE='W:\temperaturField.txt',
&FORM ='FORMATTED',STATUS='UNKNOWN',
&ACTION='READWRITE')
WRITE (35,FMT='(5(F8.3,2X))') TFIELD(100,1:6)
With this code it only overwrites the first line of my txt file on every iteration, but I want to write each TFIELD(100,1:6) array on a new line. How can I do this?
Add POSITION='APPEND' to the OPEN:
OPEN(UNIT=35,FILE='W:\temperaturField.txt',
&FORM ='FORMATTED',STATUS='UNKNOWN',POSITION='APPEND',
&ACTION='READWRITE')
It seems like you are opening and closing the file for each iteration. This is a quick and dirty method if you need to debug, but it's slow.
If you want to do that, you might want to do what @Jack said: include POSITION='APPEND' in the OPEN statement so the data is written at the end of the file. Also, make sure that you close it every time.
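In other words, the quick-and-dirty version looks roughly like this (a sketch in free form; the unit number is whatever you use, and I've assumed six values per record):

! Done once per iteration: open in append mode, write one record, close.
OPEN(UNIT=35,FILE='W:\temperaturField.txt',FORM='FORMATTED', &
     STATUS='UNKNOWN',POSITION='APPEND',ACTION='WRITE')
WRITE(35,'(6(F8.3,2X))') TFIELD(100,1:6)
CLOSE(35)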
A better (more efficient) method would be to keep the file open for the whole time. I'd do that with a module:
module temp_writer_module
  implicit none
  integer :: temp_writer_unit
  logical :: is_opened = .FALSE.
  private :: temp_writer_unit, is_opened
contains
  subroutine temp_writer_open()
    integer :: ios
    character(len=100) :: iomsg
    if (is_opened) then
      print*, "Warning: Temperature file already opened"
      return
    end if
    open(newunit=temp_writer_unit, file='W:\temperatureField', &
      form='FORMATTED', status='UNKNOWN', action='WRITE', &
      iostat=ios, iomsg=iomsg)
    if (ios /= 0) then
      print*, "Error opening temperature file:"
      print*, trim(iomsg)
      STOP
    end if
    is_opened = .TRUE.
  end subroutine temp_writer_open

  subroutine temp_writer_close()
    if (.not. is_opened) return
    close(temp_writer_unit)
    is_opened = .FALSE.
  end subroutine temp_writer_close

  subroutine temp_writer(temps)
    real, intent(in) :: temps(6)
    integer :: ios
    character(len=100) :: iomsg
    if (.not. is_opened) call temp_writer_open()
    write(temp_writer_unit, *, iostat=ios, iomsg=iomsg) temps
    if (ios /= 0) then
      print*, "Error writing to temperature file:"
      print*, trim(iomsg)
    end if
  end subroutine temp_writer
end module temp_writer_module
Then you can use it in your program like this:
subroutine calc_temps(...)
  use temp_writer_module
  <variable declarations>
  <calculations>
  call temp_writer(tfield(100, 1:6))
end subroutine calc_temps
Just don't forget to call the temp_writer_close routine before your program ends.
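For instance, a bare-bones driver (a sketch; the array name, its shape, and the loop count are placeholders standing in for your simulation):

program driver
  use temp_writer_module
  implicit none
  integer :: n
  real :: tfield(100,6)

  tfield = 0.0
  do n = 1, 10
    ! ... iteration work that updates tfield goes here ...
    call temp_writer(tfield(100,1:6))
  end do
  call temp_writer_close()   ! close the file before the program ends
end program driver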

How do I diagnose a bus error in Fortran

I'm learning how to use Fortran to do some data analysis. I'm working through the following example:
program linalg
  implicit none
  real :: v1(3), v2(3), m(3,3)
  integer :: i,j

  v1(1) = 0.25
  v1(2) = 1.2
  v1(3) = 0.2

  ! use nested do loops to initialise the matrix
  ! to the unit matrix
  do i=1,3
    do j=1,3
      m(i,j) = 0.0
    end do
    m(i,j) = 1.0
  end do

  ! do a matrix multiplication of a vector, equivalent to v2i = mij v1j
  do i = 1,3
    v2(i) = 0.0
    do j = 1,3
      v2(i) = v2(i) + m(i,j)*v1(j)
    end do
  end do

  write(*,*) 'v2 = ', v2
end program linalg
which I execute with
f95 -o linalg linalg.f90
./linalg
However, I get the following message in the terminal:
Bus error
Some links that I've followed online suggest that this has to do with not having pre-defined a variable, but I am sure that I have in this script and cannot find where the error is coming from. Is there another reason I would be getting this error?
Your mistake is in here:
do i=1,3
  do j=1,3
    m(i,j) = 0.0
  end do
  m(i,j) = 1.0 ! here be a dragon
end do
Fortran is explicit in stating that after the end of a loop the value of the index variable is 1 greater than the value it had on the last iteration of the loop. So in this case the statement m(i,j) = 1.0 will try to address m(1,4) at the first go round, then m(2,4), and so forth.
Sometimes you get 'lucky' with an attempt to write outside the bounds of an array, and the write stays inside the address space of the process you are working in. 'Lucky' in the sense that your program is wrong but doesn't crash; the crash you are seeing is actually the better situation to be in. The bus error suggests that the generated address being written to lies in forbidden territory for the process.
You could have found this yourself by turning on 'run-time bounds checking' with your compiler. Your compiler's documentation, or other Qs and As here on SO, will tell you how to do that.
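For example, if your f95 is actually gfortran underneath (option names differ between compilers, so check the documentation for yours), it would be something like:

f95 -fcheck=bounds -o linalg linalg.f90
./linalg

With bounds checking enabled, the run aborts with a message naming the offending array and index instead of a bare bus error.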
I'll leave it to you to fix this as you wish, you show every sign of being able to figure it out now you know the rules.

Writing a scalar variable along an unlimited dimension in NetCDF

I'm trying to write a time variable from a hydrodynamic model into a netcdf file (unlimited dimension variable). I've attached a simplified code example in Fortran90 that highlights my issue.
The subroutine to write the netcdf file is called multiple times during a simulation depending on a user specified output interval (10 times for this example). I can create the file and add attributes for the first time the subroutine is called.
I can't get the start and count variables correct for writing the time variable to the file during the subsequent calls of the subroutine. This is the error I receive, at the line writing the model time variable, when trying to compile the code: Error: There is no specific function for the generic 'nf90_put_var'
PROGRAM test_netcdf
  IMPLICIT NONE
  INTEGER :: N
  REAL :: time_step = 2.

  ! Call efdc_netcdf 10 times
  DO N=1,10
    CALL efdc_netcdf(N, time_step)
    time_step=time_step + 1.
  ENDDO
END PROGRAM test_netcdf
************************************
! Create NetCDF file and write variables
SUBROUTINE efdc_netcdf(N, time_step)
  USE netcdf
  IMPLICIT NONE

  LOGICAL,SAVE::FIRST_NETCDF=.FALSE.
  CHARACTER (len = *), PARAMETER :: FILE_NAME = "efdc_test.nc"
  INTEGER :: ncid, status
  INTEGER :: time_dimid
  INTEGER :: ts_varid, time_varid
  INTEGER :: start(1), count(1)
  INTEGER :: deltat
  INTEGER :: N
  REAL :: time_step

  start=(/N/)
  count=(/1/)

  ! Create file and add attributes during first call of efdc_netcdf
  IF(.NOT.FIRST_NETCDF)THEN
    status=nf90_create(FILE_NAME, NF90_CLOBBER, ncid)
    ! Define global attributes once
    status=nf90_put_att(ncid, NF90_GLOBAL, 'format', 'netCDF-3 64bit offset file')
    status=nf90_put_att(ncid, NF90_GLOBAL, 'os', 'Linux')
    status=nf90_put_att(ncid, NF90_GLOBAL, 'arch', 'x86_64')
    ! Define deltat variable
    status=nf90_def_var(ncid,'deltat',nf90_int,ts_varid)
    ! Define model time dimension
    status=nf90_def_dim(ncid,'efdc_time',nf90_unlimited,time_dimid)
    ! Define model time variable
    status=nf90_def_var(ncid,'efdc_time',nf90_real,time_dimid,time_varid)
    status=nf90_enddef(ncid)
    ! Put deltat during first call
    deltat=7
    status=nf90_put_var(ncid, ts_varid, deltat)
    FIRST_NETCDF=.TRUE.
  ENDIF

  ! Put model time variable
  status=nf90_put_var(ncid, time_varid, time_step, start=start, count=count)

  ! Close file at end of DO loop
  IF(N.EQ.10) THEN
    status=nf90_close(ncid)
  ENDIF

  RETURN
END SUBROUTINE efdc_netcdf
The issue is in the line the compiler flags:
status=nf90_put_var(ncid, time_varid, time_step, start=start, count=count)
You are (correctly) trying to write a scalar variable, time_step, into a specific index (start) along variable time_varid, which is defined on a 1-d, infinite-extent dimension. However, in this case the optional argument count isn't meaningful; you are writing a scalar, and count can only ever be 1. As a result, the Fortran bindings for a nf90_put_var() taking a single scalar as input don't have the optional count argument defined, and that's why you're getting the "no specific function for the generic 'nf90_put_var'" error from the compiler. This is all perfectly reasonable, but neither the error message nor the docs are super helpful in figuring out how to solve the problem.
You can fix your code by putting the time_step data into a real, dimension(1) variable, and putting that, instead; but easiest is to just get rid of the count specification, which isn't necessary here anyway:
status=nf90_put_var(ncid, time_varid, time_step, start=start)
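If you prefer the first option (keeping count around), the buffered variant would look something like this sketch:

REAL :: time_out(1)   ! rank-1 buffer so the array version of nf90_put_var is selected

time_out(1) = time_step
status=nf90_put_var(ncid, time_varid, time_out, start=start, count=count)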

MPI master unable to receive

I am using MPI in Fortran for computation on my data. I verified by printing the data that the computations are being performed on the desired range by each process just fine, but the master is unable to collate the data.
Here is the code that I am trying to make it work:
EDIT: Created a tag which is constant for the send and recv
integer :: tag
tag = 123
if(pid.ne.0) then
  print *,'pid: ',pid,'sending'
  DO j = start_index+1, end_index
    CALL MPI_SEND(datapacket(j),1, MPI_REAL,0, tag, MPI_COMM_WORLD)
    !print *,'sending'
  END DO
  print *,'send complete'
else
  DO slave_id = 1, npe-1
    rec_start_index = slave_id*population_size+1
    rec_end_index = (slave_id + 1) * population_size;
    IF (slave_id == npe-1) THEN
      rec_end_index = total-1;
    ENDIF
    print *,'received 1',rec_start_index,rec_end_index
    CALL MPI_RECV(datapacket(j),1,MPI_REAL,slave_id,tag,MPI_COMM_WORLD, &
      & status)
    !print *,'received 2',rec_start_index,rec_end_index
  END DO
It never prints 'received' or anything after the MPI_RECV call, but I can see the sending happening just fine. However, there is no way I can verify it except to rely on the print statements.
The variable datapacket is declared as follows:
real, dimension (:), allocatable :: datapacket
Is there anything that I am doing wrong here?
EDIT: For the test setup, all the processes are being run on localhost.
You are using different message tags for all the sends; however, in your receive you use just j, which is never altered on the root process. Also note that your implementation looks like an MPI_Gather, which I'd recommend you use instead of implementing this yourself.
EDIT: Sorry, after your update I now realize that you are in fact sending multiple messages from each rank > 0 (start_index+1 up to end_index). If you need that, you do need tags differentiating the individual messages. However, you then also need to have multiple receives on your master.
Maybe it would be better to state what you actually want to achieve.
Do you want something like this:
integer :: tag
tag = 123

if(pid.ne.0) then
  print *,'pid: ',pid,'sending'
  CALL MPI_SEND(datapacket(start_index+1:end_index),end_index-start_index, MPI_REAL,0, tag, MPI_COMM_WORLD)
  !print *,'sending'
  print *,'send complete'
else
  DO slave_id = 1, npe-1
    rec_start_index = slave_id*population_size+1
    rec_end_index = (slave_id + 1) * population_size;
    IF (slave_id == npe-1) THEN
      rec_end_index = total-1;
    ENDIF
    print *,'received 1',rec_start_index,rec_end_index
    CALL MPI_RECV(datapacket(rec_start_index:rec_end_index),rec_end_index-rec_start_index+1,MPI_REAL,slave_id,tag,MPI_COMM_WORLD, &
      & status)
    !print *,'received 2',rec_start_index,rec_end_index
  END DO
end if
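For completeness, here is roughly what the MPI_Gatherv version mentioned above could look like. This is a sketch only: it assumes rank 0 has already computed its own slice in place inside datapacket, uses the same index arithmetic as your loop, and declares an ierr that your snippet does not show. recvcounts and displs only matter on the root:

integer :: recvcounts(0:npe-1), displs(0:npe-1), ierr

do slave_id = 0, npe-1
  rec_start_index = slave_id*population_size + 1
  rec_end_index   = (slave_id + 1)*population_size
  if (slave_id == npe-1) rec_end_index = total - 1
  recvcounts(slave_id) = rec_end_index - rec_start_index + 1
  displs(slave_id)     = rec_start_index - 1
end do

if (pid == 0) then
  ! MPI_IN_PLACE: the root's own contribution already sits at its
  ! displacement inside datapacket, so it is not sent again.
  call MPI_GATHERV(MPI_IN_PLACE, 0, MPI_REAL, &
                   datapacket, recvcounts, displs, MPI_REAL, &
                   0, MPI_COMM_WORLD, ierr)
else
  call MPI_GATHERV(datapacket(start_index+1:end_index), end_index-start_index, MPI_REAL, &
                   datapacket, recvcounts, displs, MPI_REAL, &
                   0, MPI_COMM_WORLD, ierr)
end if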

How to go to the end of the file?

I have opened a file to write a number. I have to write the number at the end of the file, so how do I go to the last line and write there?
You should open the file with
open(..., position="append",...)
Alternatively, you can inquire for the size of the file
inquire(...,size=some_integer_variable,...)
then if the file is a direct access file, you can use this size to calculate the record number of the final record. Alternatively, if the access mode is "stream", you can use
write(..., pos=some_integer_variable)
to write starting at the end of the file.
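For the common case (just add a line at the end of an existing sequential file), a minimal sketch, with the unit number and file name as placeholders:

integer :: num
num = 42
open(unit=20, file='numbers.txt', position='append', &
     status='unknown', action='write')
write(20,*) num
close(20)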
I've been using the same trick for years and would be interested in a more elegant way, but I can propose the following method. Note that it becomes less and less efficient as the file grows in number of lines. Note also that this part of the code could end up in an elegant module dedicated to playing with input/output.
Open your file
open(11, file='monfichier')
Compute how many lines there are in your file
nbline = 0 ! should have been declared as an integer
do while(.true.)
  read(11,*,iostat=ios) ! ios should have been declared as an integer
  if( ios > 0 ) then
    stop 'problem somewhere'
  else if( ios < 0 ) then ! end of file is reached
    exit
  else
    nbline = nbline + 1
  end if
end do
close(11)
At this step, you have the total number of lines stored in the variable nbline.
If you want to print something at the Nth line before the last line, then
open(11, file='monfichier')
do i = 1, nbline - N ! see my nota bene at the end of my answer; i and N are integers
  read(11,*)
end do
write(11,*)'hello world'
Et voilĂ  !
N.B.: Please be careful about whether you count nbline-N or nbline-(N-1), depending on exactly what you want.
subroutine to_last_rec (luout)
  ! purpose: position to last record of file
  implicit none
  integer :: luout
  logical :: ende

  ! first executable statement
  ende = .FALSE.
  do while ( .NOT. ende)
    read (luout,*,end=100)
  enddo
100 return
end subroutine to_last_rec
PROGRAM example
  IMPLICIT NONE
  INTEGER :: ierr
  OPEN(UNIT=13,FILE="ex.dat")
  CALL FSEEK(13, 0, 2, ierr)
  ! DO WHATEVER YOU WANT THEN
  CLOSE(13)
END PROGRAM example
The call to FSEEK, used like that, goes to the end of the file (check the usage here: http://docs.oracle.com/cd/E19957-01/805-4942/6j4m3r8ti/index.html).