standard output in Fortran MPI code - fortran

I have a parallel fortran code in which I want only the rank=0 process to be able to write to stdout, but I don't want to have to litter the code with:
if(rank==0) write(*,*) ...
so I was wondering if doing something like the following would be a good idea, or whether there is a better way?
program test
use mpi
implicit none
integer :: ierr
integer :: nproc
integer :: rank
integer :: stdout
call mpi_init(ierr)
call mpi_comm_rank(mpi_comm_world, rank, ierr)
call mpi_comm_size(mpi_comm_world, nproc, ierr)
select case(rank)
case(0)
stdout = 6
case default
stdout = 7
open(unit=stdout, file='/dev/null')
end select
write(stdout,*) "Hello from rank=", rank
call mpi_finalize(ierr)
end program test
This gives:
$ mpirun -n 10 ./a.out
Hello from rank= 0
Thanks for any advice!

There are two disadvantages to your solution:
This "clever" solution actually obscures the code, since it lies: stdout isn't stdout any more. If someone reads the code he/she will think that all processes are writing to stdout, while in reality they aren't.
If you want all processes to write to stdout at some point, what will you do then? Add more tricks?
If you really want to stick with this trick, please don't use "stdout" as a variable for the unit number, but e.g. "master" or anything that indicates you're not actually writing to stdout. Furthermore, you should be aware that the number 6 isn't always stdout. Fortran 2003 allows you to check the unit number of stdout, so you should use that if you can.
My advice would be to stay with the if(rank==0) statements. They are clearly indicating what happens in the code. If you use lots of similar i/o statements, you could write subroutines for writing only for rank 0 or for all processes. These can have meaningful names that indicate the intended usage.

mpirun comes with the option to redirect stdout from each process into separate files. For example, -output-filename out would result in out.1.0, out.1.1, ... which you then can monitor using whatever way you like (I use tail -f). Next to if(rank.eq.0) this is the cleanest solution I think.

I am not so concerned with the two disadvantages mentioned by steabert. We can work that out by introducing another file descriptor that clearly indicates that it is stdout only on master process, e.g. stdout -> stdout0.
But my concern is here: The /dev/null will work in UNIX-like environment. Will it work on Windows environment? How about the funky BlueGene systems?

Related

MPI send-receive issue in Fortran

I am currently starting to develop a parallel code for scientific applications. I have to exchange some buffers from p0 to p1 and from p1 to p0 (I am creating ghost point between processors boundaries).
The error can be summarized by this sample code:
program test
use mpi
implicit none
integer id, ids, idr, ierr, tag, istat(MPI_STATUS_SIZE)
real sbuf, rbuf
call mpi_init(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD,id,ierr)
if(id.eq.0) then
ids=0
idr=1
sbuf=1.5
tag=id
else
ids=1
idr=0
sbuf=3.5
tag=id
endif
call mpi_send(sbuf,1,MPI_REAL,ids,tag,MPI_COMM_WORLD,ierr)
call mpi_recv(rbuf,1,MPI_REAL,idr,tag,MPI_COMM_WORLD,istat,ierr)
call mpi_finalize(ierr)
return
end
What is wrong with this?
Coding with MPI can be difficult at first, and it's good that you're going through the steps of making a sample code. Your sample code as posted hangs due to deadlock. Both processes are busy MPI_SEND-ing, and the send cannot complete until it has been MPI_RECV-ed. So the code is stuck.
There are two common ways around this problem.
Send and Receive in a Particular Order
This is the simple and easy-to-understand solution. Code your send and receive operations such that nobody ever gets stuck. For your 2-process test case, you could do:
if (id==0) then
call mpi_send(sbuf,1,MPI_REAL,ids,tag,MPI_COMM_WORLD,ierr)
call mpi_recv(rbuf,1,MPI_REAL,idr,tag,MPI_COMM_WORLD,istat,ierr)
else
call mpi_recv(rbuf,1,MPI_REAL,idr,tag,MPI_COMM_WORLD,istat,ierr)
call mpi_send(sbuf,1,MPI_REAL,ids,tag,MPI_COMM_WORLD,ierr)
endif
Now, process 1 receives first, so there is never a deadlock. This particular example is not extensible, but there are various looping structures that can help. You can imagine a routine to send data from every process to every other process as:
do sending_process=1,nproc
if (id == sending_process) then
! -- I am sending
do destination_process = 1,nproc
if (sending_process == destination_process) cycle
call MPI_SEND ! Send to destination_process
enddo
elseif
! -- I am receiving
call MPI_RECV ! Receive from sending_process
endif
enddo
This works reasonably well and is easy to follow. I recommend this structure for beginners.
However, it has several issues for truly large problems. You are sending a number of messages equal to the number of processes squared, which can overload a large network. Also, depending on your operation, you probably do not need to send data from every process to every other process. (I suspect this is true for you given you mentioned ghosts.) You can modify the above loop to only send if data are required, but for those cases there is a better option.
Use Non-Blocking MPI Operations
For many-core problems, this is often the best solution. I recommend sticking to the simple MPI_ISEND and MPI_IRECV. Here, you start all necessary sends and receives, and then wait.
Here, I am using some list structure which has been setup already which defines the complete list of necessary destinations for each process.
! -- Open sends
do d=1,Number_Destinations
idest = Destination_List(d)
call MPI_ISEND ! To destination d
enddo
! -- Open receives
do s=1,Number_Senders
isend = Senders_List(s)
call MPI_IRECV ! From source s
enddo
call MPI_WAITALL
This option may look simpler but it is not. You must set up all necessary lists beforehand, and there are a variety of potential problems with buffer size and data alignment. Even still, it is typically the best answer for big codes.
As pointed by Vladimir, your code is too incomplete to provide a definitive answer.
That being said, that could be a well known error.
MPI_Send() might block. From a pragmatic point of view, MPI_Send() is likely to return immediately when sending a short message, but is likely to block when sending a large message. Note small and large depends on your MPI library, the interconnect you are using plus other runtime parameters. MPI_Send() might block until a MPI_Recv() is posted on the other end.
It seems you MPI_Send() and MPI_Recv() in the same block of code, so you can try using MPI_Sendrecv() to do it in one shot. MPI_Sendrecv() will issue a non blocking send under the hood, so that will help if your issue is really a MPI_Send() deadlock.

How to I determine the local node placement of tasks in MPI

I have a basic program that I am trying to run to determine where MPI will place a task given that the number of tasks is greater than the number of available processors (oversubscribing). If I run, for example, mpirun -np <program name> the result will give:
processor 0 of 4
processor 1 of 4
processor 2 of 4
processor 3 of 4
But, if I run the same command on "8" processors I get :
processor 1 of 8
processor 2 of 8
processor 5 of 8
processor 6 of 8
processor 4 of 8
processor 3 of 8
processor 7 of 8
processor 0 of 8
I understand that there are not 8 actual cores running my program and instead I have multiple tasks being run on the same processors and I want to know exactly how these are distributed. Thanks in advance.
edit:
program test
! Similar to "Hello World" example- trying to determine rank/ node placement
use mpi
implicit none
integer :: procid, ierr, numprocs, name_len
integer:: local
local= 'OMPI_COMM_WORLD_LOCAL_RANK'
!character* (MPI_max_processor_name) name
call MPI_INIT(ierr)
call MPI_COMM_SIZE(MPI_COMM_World, numprocs, ierr)
call MPI_COMM_RANK(MPI_COMM_World, procid, ierr)
!call Mpi_Get_Processor_Name(name,name_len, ierr)
print*, 'processor', procid, 'of', numprocs, 'On Local Node:',' ', local
call mpi_finalize(ierr)
end program test
can you post your test program ?
is "processor x of y" coming from MPI_Comm_rank() and MPI_Comm_size() ?
in this case, these numbers are MPI ranks and have nothing to do with binding.
you'd rather read your MPI documentation and figure out how the binding is done.
an other option i sometimes use is (with Open MPI)
mpirun --tag-output grep Cpus_allowed_list /proc/self/status
note there is a possibility no binding is performed when you oversubscribe your nodes.
Unfortunately, it is quite common in MPI to use the same words for different meanings.
For example, job managers tend to mix the word processor and use it for different meanings. In this specific case, I will use the following:
This also applies to MPI_Get_processor_name. The standard does not actually require to return a name uniquely identifying a processor. The name is left to the implementations, which usually tend to report the name of the host. This is not what I assume you are looking for.
I will use the word processor to identify a CPU core or, in an hyperthreading enabled scenario, a hardware thread (although a hardware thread is not exactly a CPU core ).
A regular process (be it MPI or not) is usually allowed to execute on different processors. This does not necessarily mean that the process will use ALL this processors, but rather being able to bounce from one to another if the former is now occupied by a different process (usually due to Operative System scheduler).
To obtain the process affinity (the list of processors that a process can use), you should use a different interface. For example, you can use something like sched_getaffinity (this is C, though). Some MPI implementations such as Intel MPI, allow you to print the process affinity at MPI_Init, setting an environment variable.
I would consider using existing programs that report the affinity. Check this page in the MPICH documentation.
Also included in the MPICH source is a program for printing out the affinity of a process (src/pm/hydra/examples/print_cpus_allowed.c) according to the OS. This can be used on Linux systems to test that bindings are working correctly.
shell$ mpiexec -n 8 -bind-to socket ./print_cpus_allowed | sort
crush[0]: Cpus_allowed_list: 0,2,4,6
crush[1]: Cpus_allowed_list: 1,3,5,7
crush[2]: Cpus_allowed_list: 0,2,4,6
crush[3]: Cpus_allowed_list: 1,3,5,7
crush[4]: Cpus_allowed_list: 0,2,4,6
crush[5]: Cpus_allowed_list: 1,3,5,7
crush[6]: Cpus_allowed_list: 0,2,4,6
crush[7]: Cpus_allowed_list: 1,3,5,7

Decorate Write-Operation for MPI-usage [duplicate]

I have a parallel fortran code in which I want only the rank=0 process to be able to write to stdout, but I don't want to have to litter the code with:
if(rank==0) write(*,*) ...
so I was wondering if doing something like the following would be a good idea, or whether there is a better way?
program test
use mpi
implicit none
integer :: ierr
integer :: nproc
integer :: rank
integer :: stdout
call mpi_init(ierr)
call mpi_comm_rank(mpi_comm_world, rank, ierr)
call mpi_comm_size(mpi_comm_world, nproc, ierr)
select case(rank)
case(0)
stdout = 6
case default
stdout = 7
open(unit=stdout, file='/dev/null')
end select
write(stdout,*) "Hello from rank=", rank
call mpi_finalize(ierr)
end program test
This gives:
$ mpirun -n 10 ./a.out
Hello from rank= 0
Thanks for any advice!
There are two disadvantages to your solution:
This "clever" solution actually obscures the code, since it lies: stdout isn't stdout any more. If someone reads the code he/she will think that all processes are writing to stdout, while in reality they aren't.
If you want all processes to write to stdout at some point, what will you do then? Add more tricks?
If you really want to stick with this trick, please don't use "stdout" as a variable for the unit number, but e.g. "master" or anything that indicates you're not actually writing to stdout. Furthermore, you should be aware that the number 6 isn't always stdout. Fortran 2003 allows you to check the unit number of stdout, so you should use that if you can.
My advice would be to stay with the if(rank==0) statements. They are clearly indicating what happens in the code. If you use lots of similar i/o statements, you could write subroutines for writing only for rank 0 or for all processes. These can have meaningful names that indicate the intended usage.
mpirun comes with the option to redirect stdout from each process into separate files. For example, -output-filename out would result in out.1.0, out.1.1, ... which you then can monitor using whatever way you like (I use tail -f). Next to if(rank.eq.0) this is the cleanest solution I think.
I am not so concerned with the two disadvantages mentioned by steabert. We can work that out by introducing another file descriptor that clearly indicates that it is stdout only on master process, e.g. stdout -> stdout0.
But my concern is here: The /dev/null will work in UNIX-like environment. Will it work on Windows environment? How about the funky BlueGene systems?

Reading many files in Fortran subroutine

I have a Fortran subroutine that is called several times in a main program (which I don't have access to). In my subroutine, I wish to read data from one of several (~10^4) files in every iteration based on an input argument. Each of the files has one line of data; and the format of my data is as follows:
0.97014199999999995 0.24253600000000000 0.0000000000000000
I'm using the following lines of code to open and read the files:
program test_read
implicit none
integer :: i, iopen_status, iread_status
real :: gb
CHARACTER(len=25) :: filename
CHARACTER(*), PARAMETER :: fileplace =
& "/home/ajax/hexmesh_readn/G3/"
dimension gb(3)
i = 5
WRITE(filename,'(a,I0,a)')'GBn_',i,'.txt'
open(unit=15,
& file=fileplace//filename,IOSTAT=iopen_status)
read (15,*,IOSTAT=iread_status) gb
print *,"gb",gb(1),gb(2),gb(3)
close(15)
end program test_read
In the main program, i is a variable, but I have a file for all possible values of i.
Now, this code works perfectly well when I run on my local machine. But, when I submit it along with the main program, it behaves somewhat weirdly. Specifically, it reads some of the files, but not others.
When I print out the IOSTAT for open and read, I see that the IOSTAT for open is 0 for all the files, whereas that for the read command is 0 for some, -1 for some and 29 for others! I looked up what the error code 29 means and I learned that it might indicate that the file is not found in the path. But the file is most definitely there.
Also, I don't see anything different about the files that it isn't able to read. In fact, I have even seen the same file giving an IOSTAT value of 0 and 29!
One thing to note is that I'm running the main program on several cores. Could this have anything to do with the error?
Are you running several instances of the same program simultaneously? On some operating systems, different programs can't simultaneously open the same file. Specifying that you want read-only access might allow access by multiple programs. On the Fortran open statement: action='read'.
If you are running a multi-threaded program, then different threads might be doing IO simultaneously on different files ... different unit numbers should be used by each thread to avoid conflicts.

Stop fortran program with non-zero exit status

I'm adapting some Fortran code I haven't written, and without a lot of fortran experience myself. I just found a situation where some malformed input got silently ignored, and would like to change that code to do something more appropriate. If this were C, then I'd do something like
fprintf(stderr, "There was an error of kind foo");
exit(EXIT_FAILURE);
But in fortran, the best I know how to do looks like
write(*,*) 'There was an error of kind foo'
stop
which lacks the choice of output stream (minor issue) and exit status (major problem).
How can I terminate a fortran program with a non-zero exit status?
In case this is compiler-dependent, a solution which works with gfortran would be nice.
The stop statement allows a integer or character value. It seems likely that these will be output to stderr when that exists, but as stderr is OS dependent, it is unlikely that the Fortran language standard requires that, if it says anything at all. It is also likely that if you use the numeric option that the exit status will be set. I tried it with gfortran on a Mac, and that was the case:
program TestStop
integer :: value
write (*, '( "Input integer: " )', advance="no")
read (*, *) value
if ( value > 0 ) then
stop 0
else
stop 9
end if
end program TestStop
While precisely what stop with an integer or string will do is OS-dependent, the statement is part of the language and will always compile. call exit is a GNU extension and might not link on some OSes.
In addition to stop n, there is also error stop n since Fortran 2008.
With gfortran under Windows, they both send the error number to the OS, as can be seen with a subsequent echo %errorlevel%. The statement error stop can also be passed an error message.
program bye
read *, n
select case (n)
case (1); stop 10
case (2); error stop 20
case (3); error stop "Something went wrong"
case (4); error stop 2147483647
end select
end program
I couldn't find anything about STOP in the gfortran 4.7.0 keyword index, probably because it is a language keyword and not an intrinsic. Nevertheless, there is an EXIT intrinsic which seems to do just what I was looking for: exit with a given status. And the fortran wiki has a small example of using stderr which mentions a constant ERROR_UNIT. So now my code now looks like this:
USE ISO_FORTRAN_ENV, ONLY : ERROR_UNIT
[…]
WRITE(ERROR_UNIT,*) 'There as an error of kind foo'
CALL EXIT(1)
This at least compiles. Testing still pending, but it should work. If someone knows a more elegant or more appropriate solution, feel free to offer alternative answers to this question.