Using GDB to debug an MPI program in Fortran

I read this and arrived here, so now I think I should (if not, please tell me) rewrite the code
{
int i = 0;
char hostname[256];
gethostname(hostname, sizeof(hostname));
printf("PID %d on %s ready for attach\n", getpid(), hostname);
fflush(stdout);
while (0 == i)
sleep(5);
}
in Fortran. From this answer I understood that in Fortran I could simply use MPI_Get_processor_name in place of gethostname. Everything else is straightforward except the flush. What about it?
Where should I put it? In the main program after MPI_Init?
And then? What should I do?
For what concerns the compile options, I referred to this and used -v -da -Q as options to the mpifort wrapper.
This solution doesn't fit my case, since I need to run the program on at least 27 processes, so I'd like to examine one process only.

Simplest approach:
What I actually often do is just run the MPI job locally and see what it does, without any of the above code. Then, if it hangs, I use top to find the PIDs of the processes, and usually one can easily guess which rank is which from the PIDs (they tend to be consecutive and the lowest one is rank 0). Below, rank 0 is process 1641, then rank 1 is PID 1642, and so on...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1642 me 20 0 167328 7716 5816 R 100.0 0.047 0:25.02 a.out
1644 me 20 0 167328 7656 5756 R 100.0 0.047 0:25.04 a.out
1645 me 20 0 167328 7700 5792 R 100.0 0.047 0:24.97 a.out
1646 me 20 0 167328 7736 5836 R 100.0 0.047 0:25.00 a.out
1641 me 20 0 167328 7572 5668 R 99.67 0.046 0:24.95 a.out
Then I just do gdb -pid <PID> and examine the stack and local variables in the processes (use help stack in the GDB console).
The most important thing is to get a backtrace, so just type bt in the console.
This will work well when examining deadlocks. Less well when you have to stop at some specific place. Then you have to attach the debugger early.
Your code:
I don't think the flush is necessary in Fortran. Fortran write and print flush as necessary, at least in the compilers I use.
But you can definitely use the flush statement
use iso_fortran_env
flush(output_unit)
just put that flush after the write where you print the hostname and PID. But as I said, I would start with printing alone.
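If you do add it, the combination is just this (a sketch only; the hostname comes from MPI_Get_processor_name and getpid is a compiler extension, as in the full code below):
write(*,*) "PID ", getpid(), " on ", trim(hostname), " ready for attach"
flush(output_unit)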
What you then do is log in to that node and attach gdb to the right process with something like
gdb -pid 12345
For sleep you can use the non-standard sleep intrinsic subroutine available in many compilers or write your own.
Before or after MPI_Init? If you want to print the rank, it must be after; the same holds for using MPI_Get_processor_name. It is normally recommended to call MPI_Init as early as possible in your program.
The code is then something like
use mpi
implicit none
character(MPI_MAX_PROCESSOR_NAME) :: hostname
integer :: rank, ie, pid, hostname_len
integer, volatile :: i
call MPI_Init(ie)
call MPI_Get_processor_name(hostname, hostname_len, ie)
! non-standard extension
pid = getpid()
call MPI_Comm_rank(MPI_COMM_WORLD, rank, ie)
write(*,*) "PID ", pid, " on ", trim(hostname), " ready for attach is world rank ", rank
! this serves to block the execution at a specific place
! until you unblock it in GDB by setting i=0
i = 1
do
   ! non-standard extension
   call sleep(1)
   if (i==0) exit
end do
call MPI_Finalize(ie)
end
Important note: if you compile with optimizations then the compiler can see that i==0 is never true and will remove the check completely. You must lower your optimizations or declare i as volatile. Volatile means that the value can change at any time and the compiler must reload its value from memory for the check. That requires Fortran 2003.
Attaching the right process:
The above code will print, for example,
> mpif90 -ggdb mpi_gdb.f90
> mpirun -n 4 ./a.out
PID 2356 on linux.site ready for attach is world rank 1
PID 2357 on linux.site ready for attach is world rank 2
PID 2358 on linux.site ready for attach is world rank 3
PID 2355 on linux.site ready for attach is world rank 0
In top they look like
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2355 me 20 0 167328 7452 5564 R 100.0 0.045 1:42.55 a.out
2356 me 20 0 167328 7428 5548 R 100.0 0.045 1:42.54 a.out
2357 me 20 0 167328 7384 5500 R 100.0 0.045 1:42.54 a.out
2358 me 20 0 167328 7388 5512 R 100.0 0.045 1:42.51 a.out
and you just select which rank you want and execute
gdb -pid 2355
to attach rank 0 and so on. In a different terminal window, of course.
Then you get something like
MAIN__ () at mpi_gdb.f90:26
26 if (i==0) exit
(gdb) info locals
hostname = 'linux.site', ' ' <repeats 246 times>
hostname_len = 10
i = 1
ie = 0
pid = 2457
rank = 0
(gdb) set var i = 0
(gdb) cont
Continuing.
[Inferior 1 (process 2355) exited normally]

The posted code is basically just an infinite loop designed to "pause" execution whilst you attach the debugger. You can then use the debugger controls to exit this loop and the program will continue. You can write an equivalent loop in Fortran, so provided you're happy to get the hostname and PID by another method (see mpi_get_processor_name as mentioned by VladimirF in his answer, and, if you are happy to use compiler extensions, both the GNU and Intel compilers provide a getpid extension), you could use something like the following (thanks to this answer for the sleep example).
module fortran_sleep
   ! See https://stackoverflow.com/a/6932232
   use, intrinsic :: iso_c_binding, only: c_int
   implicit none
   interface
      ! should be unsigned int ... not available in Fortran
      ! OK until highest bit gets set.
      function FortSleep (seconds) bind ( C, name="sleep" )
         import
         integer (c_int) :: FortSleep
         integer (c_int), intent (in), VALUE :: seconds
      end function FortSleep
   end interface
end module fortran_sleep

program mpitest
   use mpi
   use fortran_sleep
   use, intrinsic :: iso_c_binding, only: c_int
   implicit none
   integer :: rank, num_process, ierr, tmp
   integer :: i
   integer (c_int) :: wait_sec, how_long
   wait_sec = 5
   call mpi_init (ierr)
   call mpi_comm_rank (MPI_COMM_WORLD, rank, ierr)
   call mpi_comm_size (MPI_COMM_WORLD, num_process, ierr)
   call mpi_barrier (MPI_COMM_WORLD, ierr)
   print *, 'rank = ', rank
   call mpi_barrier (MPI_COMM_WORLD, ierr)
   i = 0
   do while (i.eq.0)
      how_long = FortSleep(wait_sec)
   end do
   print*,"Rank ",rank," has escaped!"
   call mpi_barrier(MPI_COMM_WORLD, ierr)
   call mpi_finalize (ierr)
end program mpitest
Now compile with something like
> mpif90 prog.f90 -O0 -g -o prog.exe
If I now launch this on two cores of my local machine using
> mpirun -np 2 ./prog.exe
On screen I see just
rank = 0
rank = 1
Now in another terminal I connect to the relevant machine and find the relevant process id using
ps -ef | grep prog.exe
This gives me several process id values corresponding to the different ranks. I can then attach to one of these using
gdb --pid <pidFromPSCmd> ./prog.exe
Now that we're in gdb, we can see where we are in the program using bt (backtrace); it's likely we're in sleep. We then step through the program using s(tep) until we reach our main program. Now we set i to something non-zero and then c(ontinue) execution, which allows this rank's process to continue, and we see the "Rank ... has escaped!" message etc. The gdb session will look something like:
(gdb) bt
#0 0x00007f01354a1d70 in __nanosleep_nocancel () from /lib64/libc.so.6
#1 0x00007f01354a1c24 in sleep () from /lib64/libc.so.6
#2 0x0000000000400ef9 in mpitest () at prog.f90:35
#3 0x0000000000400fe5 in main (argc=1, argv=0x7ffecdc8d0ae) at prog.f90:17
#4 0x00007f013540cb05 in __libc_start_main () from /lib64/libc.so.6
#5 0x0000000000400d39 in _start () at ../sysdeps/x86_64/start.S:122
(gdb) s
Single stepping until exit from function __nanosleep_nocancel,
which has no line number information.
0x00007f01354a1c24 in sleep () from /lib64/libc.so.6
(gdb) s
Single stepping until exit from function sleep,
which has no line number information.
mpitest () at prog.f90:34
34 do while (i.eq.0)
(gdb) bt
#0 mpitest () at prog.f90:34
#1 0x0000000000400fe5 in main (argc=1, argv=0x7ffecdc8d0ae) at prog.f90:17
#2 0x00007f013540cb05 in __libc_start_main () from /lib64/libc.so.6
#3 0x0000000000400d39 in _start () at ../sysdeps/x86_64/start.S:122
(gdb) set var i = 1
(gdb) c
Continuing.
and in our original terminal we will see something like
Rank 0 has escaped!

Related

Run the same code with different parameterizations on multiple nodes of a slurm cluster

I have a fortran code with 3 scenarios.
I set a flag at the beginning of the code for which scenario I want to run.
integer :: scenario_no = 1 !Set 1, 2 or 3
I usually manually change this flag, compile the code, and run it on a cluster node.
Is there any way to create an sbatch file to run each of the 3 scenarios on a different node without having to recompile each time?
I recommend reading in the command line arguments to the program.
integer :: clen, status, scenario_no
character(len=4) :: buffer
! Body of runcfdsim
scenario_no = 1
clen = command_argument_count()
if(clen>0) then
   call get_command_argument (1, buffer, clen, status)
   if(status==0) then
      read(buffer, '(BN,I4)') scenario_no
      if(scenario_no<1 .or. scenario_no>4) then
         scenario_no = 1
      end if
   end if
end if
This way you can call
runcfdsim 1
runcfdsim 2
runcfdsim 3
runcfdsim 4
and each run will have a different value for the scenario_no variable.
No, if the variable is hard-coded, you have to recompile every time you change it.
EDIT: As pointed out in the comment, reading from a file may lead to problems when running in parallel. Better read from argument:
program test
   implicit none
   character(len=8) :: arg
   integer :: scenario, ios
   if(command_argument_count() >= 1) then
      call get_command_argument(1, arg)
      read(unit=arg, fmt=*, iostat=ios) scenario
      if(ios /= 0) then
         write(*,"('Invalid argument: ',a)") arg
         stop
      end if
      write(*,"('Running for scenario: ',i0)") scenario
   else
      write(*,"('Invalid argument')")
   end if
end program test
srun test.exe 1 &
srun test.exe 2 &
srun test.exe 3 &
wait
Make sure the outputs go to different files, so they are not overwritten; a sketch of building a scenario-specific output file name is shown below. You may also need to pin the tasks to different cores.
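A minimal sketch of my own (the result_<n>.dat name is just an example), deriving the output file name from the scenario_no read above:
program scenario_output
   implicit none
   integer :: scenario_no, ios
   character(len=32) :: outfile
   scenario_no = 2   ! in the real code this comes from get_command_argument, as above
   ! build a scenario-specific file name such as result_2.dat
   write(outfile, "('result_',i0,'.dat')") scenario_no
   open(unit=20, file=trim(outfile), status='replace', action='write', iostat=ios)
   if (ios /= 0) stop "could not open output file"
   write(20,*) "output for scenario ", scenario_no
   close(20)
end program scenario_output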

Fortran call to sleep does not block program

I have the most basic Fortran program:
program sleep
print*, "Sleeping"
call sleep(30)
print*, "Done"
end program sleep
which I compile with gfortran sleep.f90 (version 9.3.0). From what I understood from the sleep documentation, this program is supposed to sleep for 30 seconds, i.e. I should expect to see "Done" being printed 30 seconds after "Sleeping". This doesn't happen: I see both print statements appearing instantaneously, suggesting that call sleep(30) does not block my program in any way. Doing call sleep(10000) didn't make any difference. I am compiling and running this program on a Windows Subsystem for Linux (WSL Ubuntu 20.04).
So this problem was fixed through a combination of solutions suggested by @roygvib in the comments. The main issue is that sleep in the WSL (Ubuntu 20.04) environment is broken. The first step is to replace the broken /usr/bin/sleep with this Python script:
#!/usr/bin/env python3
import sys
import time
time.sleep(int(sys.argv[1]))
Then, the Fortran program is modified to make a system call to this new sleep executable:
program sleep
print*, "Sleeping"
call system("sleep 30")
print*, "Done"
end program sleep
Until the next update of WSL, this hack will have to do.
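As a side note (my addition, not part of the original fix): the non-standard call system can be replaced by the standard execute_command_line intrinsic from Fortran 2008, which still relies on the replaced /usr/bin/sleep:
program sleep
print*, "Sleeping"
! standard Fortran 2008 alternative to the non-standard call system
call execute_command_line("sleep 30")
print*, "Done"
end program sleep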
The sleep procedure is not part of the Fortran standard and is non-portable. Here is a solution that would potentially work on all systems with any standard-compliant Fortran compiler:
module sleep_mod
   use, intrinsic :: iso_fortran_env, only: IK => int64, RK => real64, output_unit
   implicit none
contains
   subroutine sleep(seconds)
      implicit none
      real(RK), intent(in) :: seconds ! sleep time
      integer(IK) :: countOld, countNew, countMax
      real(RK) :: countRate
      call system_clock( count=countOld, count_rate=countRate, count_max=countMax )
      if (countOld==-huge(0_IK) .or. nint(countRate)==0_IK .or. countMax==0_IK) then
         write(output_unit,"(A)") "Error occurred. There is no processor clock."
         error stop
      end if
      countRate = 1._RK / countRate
      ! busy-wait (spin) on system_clock until the requested number of seconds
      ! has elapsed; note that this keeps one core at 100% while "sleeping"
      do
         call system_clock( count=countNew )
         if (countNew==countMax) then
            write(output_unit,"(A)") "Error occurred. Maximum processor clock count reached."
            error stop
         end if
         if ( real(countNew-countOld,kind=RK) * countRate > seconds ) exit
         cycle
      end do
   end subroutine sleep
end module sleep_mod

program main
   use sleep_mod, only: output_unit, sleep, RK
   implicit none
   write(output_unit,"(A)") "Sleep for 5 second."
   call execute_command_line(" ") ! flush stdout
   call sleep(seconds = 5._RK)
   write(output_unit,"(A)") "Wake up."
end program main
You can test it online here: https://www.tutorialspoint.com/compile_fortran_online.php

SIGBUS occurs when fortran code reads file on linux cluster

When I run my fortran code in parallel on a linux cluster with mpirun I get a sigbus error.
It occurs while reading a file, the timing is irregular, and sometimes it proceeds without error.
I have tried debug compilation options like -g, but I haven't gotten any information on what line the error is coming from.
Actually the code was executed previously in three different clusters without this error, but the error is only occurring on this machine.
I personally suspect this is related to the performance of the machine (especially storage i/o), but I am not sure.
The program code is simple. Each process executed by mpirun reads the file corresponding to its rank as follows.
!!!!!!!!!! start of code
OPEN(11, FILE='FILE_NAME_WITH_RANK', FORM='UNFORMATTED')
READ(11,*) ISIZE
ALLOCATE(SOME_VARIABLE(ISIZE))
DO I = 1, ISIZE
READ(11,*) SOME_VARIABLE(I)
ENDDO
READ(11,*) ISIZE2
ALLOCATE(SOME_VARIABLE2(ISIZE2))
DO I = 1, ISIZE2
READ(11,*) SOME_VARIABLE2(I)
ENDDO
! MORE VARIABLES
CLOSE(11)
!!!!!!!!!! end of code
I used 191 cpu, and the total size of 191 files it loads is about 11 GB.
The cluster used for execution consists of 24 nodes with 16 cpu each (384 cpu total) and uses common storage that is shared with another cluster.
I ran the code in parallel by specifying nodes 1 through 12 as the hostfile.
Initially, I had 191 cpu read all files at the same time out of sequence.
After doing so, the program ended with a sigbus error. Also, for some nodes, the ssh connection was delayed, and the node could not find the bashrc file, giving a stale file handle error.
The stale file handle error seemed to recover by itself after waiting a bit, but I'm not sure what the system administrator did.
So, I changed it to the following code so that only one cpu can read the file at a time.
!!!!!!!!!! start of code
DO ICPU = 0, NUMBER_OF_PROCESS-1
IF(ICPU.EQ.MY_PROCESS) CALL READ_FILE
CALL MPI_BARRIER(MPI_COMMUNICATOR,IERR)
ENDDO
!!!!!!!!!! end of code
This seemed to work fine for single execution, but if I ran more than one of these programs at the same time, the first mpirun stopped and both ended with a sigbus error eventually.
My next attempt is to minimize the number of read statements by dropping the do loop and reading each array as a whole. However, due to limited time, I couldn't test the effectiveness of this modification.
Here are some additional information.
If I execute a search or copy a file with a file explorer such as nautilus while running a parallel program, nautilus does not respond or the running program raises sigbus. In severe cases, I wasn't able to connect to the VNC server due to stale file handle errors.
I use OpenMPI 2.1.1, GNU Fortran 4.9.4.
I compile the program with following
$OPENMPIHOME/bin/mpif90 -mcmodel=large -fmax-stack-var-size-64 -cpp -O3 $SOURCE -o $EXE
I execute the program with following in gnome terminal
$OPENMPIHOME/bin/mpirun -np $NP -x $LD_LIBRARY_PATH --hostfile $HOSTFILE $EXE
The cluster is said to be running commercial software like FLUENT without problems.
Summing up the above, my personal suspicion is that the storage of the cluster gets unmounted due to the excessive disk I/O generated by my code, but I don't know if this makes sense because I have no cluster knowledge.
If yes, I wonder whether there is a way to minimize the disk I/O, whether it is enough to proceed with the vectorized I/O mentioned above, or whether something more is needed.
I would appreciate it if you could tell me anything about the problem.
Thanks in advance.
I wrote an example code. As mentioned above, it may not be easy to reproduce because the occurrence varies depending on the machine.
PROGRAM BUSWRITE
   IMPLICIT NONE
   INTEGER, PARAMETER :: ISIZE1 = 10000, ISIZE2 = 20000, ISIZE3 = 30000
   DOUBLE PRECISION, ALLOCATABLE :: ARRAY1(:), ARRAY2(:), ARRAY3(:)
   INTEGER :: I
   INTEGER :: I1, I2, I3
   CHARACTER*3 CPUNUM
   INCLUDE 'mpif.h'
   INTEGER ISTATUS(MPI_STATUS_SIZE)
   INTEGER :: IERR, NPES, MYPE
   CALL MPI_INIT(IERR)
   CALL MPI_COMM_SIZE(MPI_COMM_WORLD,NPES,IERR)
   CALL MPI_COMM_RANK(MPI_COMM_WORLD,MYPE,IERR)
   I1=MOD(MYPE/100,10)+48
   I2=MOD(MYPE/10 ,10)+48
   I3=MOD(MYPE    ,10)+48
   CPUNUM=CHAR(I1)//CHAR(I2)//CHAR(I3)
   OPEN(11, FILE=CPUNUM//'.DAT', FORM='UNFORMATTED')
   ALLOCATE(ARRAY1(ISIZE1))
   ALLOCATE(ARRAY2(ISIZE2))
   ALLOCATE(ARRAY3(ISIZE3))
   DO I = 1, ISIZE1
      ARRAY1(I) = I
      WRITE(11) ARRAY1(I)
   ENDDO
   DO I = 1, ISIZE2
      ARRAY2(I) = I**2
      WRITE(11) ARRAY2(I)
   ENDDO
   DO I = 1, ISIZE3
      ARRAY3(I) = I**3
      WRITE(11) ARRAY3(I)
   ENDDO
   CLOSE(11)
   CALL MPI_FINALIZE(IERR)
END PROGRAM
mpif90 -ffree-line-length-0 ./buswrite.f90 -o ./buswrite
mpirun -np 32 ./buswrite
I've got 32 files: 000.DAT ~ 031.DAT
PROGRAM BUSREAD
   IMPLICIT NONE
   INTEGER, PARAMETER :: ISIZE1 = 10000, ISIZE2 = 20000, ISIZE3 = 30000
   DOUBLE PRECISION, ALLOCATABLE :: ARRAY1(:), ARRAY2(:), ARRAY3(:)
   INTEGER :: I
   INTEGER :: I1, I2, I3
   CHARACTER*3 CPUNUM
   INCLUDE 'mpif.h'
   INTEGER ISTATUS(MPI_STATUS_SIZE)
   INTEGER :: IERR, NPES, MYPE
   CALL MPI_INIT(IERR)
   CALL MPI_COMM_SIZE(MPI_COMM_WORLD,NPES,IERR)
   CALL MPI_COMM_RANK(MPI_COMM_WORLD,MYPE,IERR)
   I1=MOD(MYPE/100,10)+48
   I2=MOD(MYPE/10 ,10)+48
   I3=MOD(MYPE    ,10)+48
   CPUNUM=CHAR(I1)//CHAR(I2)//CHAR(I3)
   OPEN(11, FILE=CPUNUM//'.DAT', FORM='UNFORMATTED')
   ALLOCATE(ARRAY1(ISIZE1))
   ALLOCATE(ARRAY2(ISIZE2))
   ALLOCATE(ARRAY3(ISIZE3))
   DO I = 1, ISIZE1
      READ(11) ARRAY1(I)
      IF(ARRAY1(I).NE.I) STOP
   ENDDO
   DO I = 1, ISIZE2
      READ(11) ARRAY2(I)
      IF(ARRAY2(I).NE.I**2) STOP
   ENDDO
   DO I = 1, ISIZE3
      READ(11) ARRAY3(I)
      IF(ARRAY3(I).NE.I**3) STOP
   ENDDO
   CLOSE(11)
   CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
   IF(MYPE.EQ.0) WRITE(*,*) 'GOOD'
   CALL MPI_FINALIZE(IERR)
END PROGRAM
mpif90 -ffree-line-length-0 ./busread.f90 -o ./busread
mpirun -np 32 ./busread
I get the 'GOOD' output in the terminal as expected, but the machine in question terminates with a sigbus error while running busread.
The issue was not observed after a device reboot. Even though I ran 4 programs at the same time under the same conditions, no problem occurred. In addition, other teams that used the device also had similar problems, which were resolved after reboot. The conclusion is a bit ridiculous, but if there are any people experiencing similar problems, I would like to summarize it as follows.
If your program terminates abnormally due to a memory error (like sigbus and sigsegv) while reading or writing a file, you can check the following.
Make sure there are no errors in your program. Check whether the time of occurrence of the error is constant or irregular, whether other programs have the same symptoms, whether it runs well on other machines, and whether there is a problem when run with a memory error checking tool such as valgrind.
Optimize the file I/O part. In the case of Fortran, processing an entire array is tens of times faster than processing it element by element (see the sketch after this list).
Immediately after an error occurs, try an ssh connection to the machine (or node) to check whether the connection is smooth and the file system is accessible. If you cannot access your bashrc file or an error such as stale file handle occurs, contact the system administrator together with the information gathered above.
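To illustrate the second point, here is a sketch of my own (not the poster's code); note that an unformatted file must be read back with the same record structure it was written with, so the whole-array read only works on a file that was also written as a whole array:
program whole_array_io
   implicit none
   integer, parameter :: isize = 10000
   double precision :: array(isize)
   integer :: i
   array = [(dble(i), i = 1, isize)]
   ! element-by-element: isize records and isize write calls (the slow pattern)
   open(11, file='elementwise.dat', form='unformatted')
   do i = 1, isize
      write(11) array(i)
   end do
   close(11)
   ! whole array: one record and one call, so far fewer I/O requests
   open(12, file='wholearray.dat', form='unformatted')
   write(12) array
   close(12)
   ! reading must mirror how the file was written: one record, one read
   open(12, file='wholearray.dat', form='unformatted', status='old')
   read(12) array
   close(12)
end program whole_array_io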
If someone has anything to add or if this post isn't appropriate, please let me know.

Neither 'MPI_Barrier' nor 'BLACS_Barrier' stops a processor from executing its commands

I'm working on ScaLAPACK and trying to get used to the BLACS routines, which are essential for using ScaLAPACK.
I've had an elementary course on MPI, so I have a rough idea of the MPI_COMM_WORLD stuff, but no deep understanding of how it works internally and so on.
Anyway, I'm trying following code to say hello using BLACS routine.
program hello_from_BLACS
   use MPI
   implicit none
   integer :: info, nproc, nprow, npcol, &
              myid, myrow, mycol, &
              ctxt, ctxt_sys, ctxt_all
   call BLACS_PINFO(myid, nproc)
   ! get the internal default context
   call BLACS_GET(0, 0, ctxt_sys)
   ! set up a process grid for the process set
   ctxt_all = ctxt_sys
   call BLACS_GRIDINIT(ctxt_all, 'c', nproc, 1)
   call BLACS_BARRIER(ctxt_all, 'A')
   ! set up a process grid of size 3*2
   ctxt = ctxt_sys
   call BLACS_GRIDINIT(ctxt, 'c', 3, 2)
   if (myid .eq. 0) then
      write(6,*) ' myid myrow mycol nprow npcol'
   endif
(**) call BLACS_BARRIER(ctxt_sys, 'A')
   ! all processes not belonging to 'ctxt' jump to the end of the program
   if (ctxt .lt. 0) goto 1000
   ! get the process coordinates in the grid
   call BLACS_GRIDINFO(ctxt, nprow, npcol, myrow, mycol)
   write(6,*) 'hello from process', myid, myrow, mycol, nprow, npcol
1000 continue
   ! return all BLACS contexts
   call BLACS_EXIT(0)
   stop
end program
and the output with 'mpirun -np 10 ./exe' looks like
hello from process 0 0 0 3 2
hello from process 4 1 1 3 2
hello from process 1 1 0 3 2
myid myrow mycol nprow npcol
hello from process 5 2 1 3 2
hello from process 2 2 0 3 2
hello from process 3 0 1 3 2
Everything seems to work fine except the 'BLACS_BARRIER' line, which I marked with (**) on the left side of the code.
I put that line there to make the output look like the one below, with the title line always printed at the top:
myid myrow mycol nprow npcol
hello from process 0 0 0 3 2
hello from process 4 1 1 3 2
hello from process 1 1 0 3 2
hello from process 5 2 1 3 2
hello from process 2 2 0 3 2
hello from process 3 0 1 3 2
So my questions are:
I've tried BLACS_BARRIER on 'ctxt_sys', 'ctxt_all', and 'ctxt', but none of them produces output in which the title line is printed first. I've also tried MPI_Barrier(MPI_COMM_WORLD, info), but it didn't work either. Am I using the barriers in the wrong way?
In addition, I get SIGSEGV when I use BLACS_BARRIER on 'ctxt' and run mpirun with more than 6 processes. Why does SIGSEGV occur in this case?
Thank you for reading this question.
To answer your 2 questions (in future it is best to give them separate posts):
1) MPI_Barrier, BLACS_Barrier and any barrier in any parallel programming methodology I have come across only synchronises the actual set of processes that calls it. However, I/O is not handled just by the calling process, but by at least one and quite possibly more processes within the OS which actually process the I/O request. These are NOT synchronised by your barrier. Thus ordering of I/O is not ensured by a simple barrier. The only standard-conforming ways that I can think of to ensure ordering of I/O are to
have 1 process do all the I/O, or
better, use MPI I/O either directly, or indirectly via e.g. NetCDF or HDF5.
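As an illustration of the first option (my own sketch, not part of the original answer), you could gather everything to rank 0 and let only rank 0 write, so the title line and the greetings come out in a controlled order:
program ordered_output
   use mpi
   implicit none
   integer :: ierr, rank, nproc, i
   integer, allocatable :: all_ranks(:)
   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
   allocate(all_ranks(nproc))
   ! every process contributes its rank; only rank 0 receives and prints,
   ! so the ordering of the output is entirely under rank 0's control
   call MPI_Gather(rank, 1, MPI_INTEGER, all_ranks, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
   if (rank == 0) then
      write(6,*) ' myid'
      do i = 1, nproc
         write(6,*) 'hello from process', all_ranks(i)
      end do
   end if
   call MPI_Finalize(ierr)
end program ordered_output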
2) Your second call to BLACS_GRIDINIT
call BLACS_GRIDINIT(ctxt, 'c', 3, 2)
creates a context for a 3 by 2 process grid, so holding 6 processes. If you call it with more than 6 processes, only 6 will be returned with a valid context; for the others ctxt should be treated as an uninitialised value. So for instance if you call it with 8 processes, 6 will return with a valid ctxt and 2 will return with ctxt having no valid value. If these 2 now try to use ctxt anything is possible, and in your case you are getting a seg fault. You do seem to see that this is an issue, as later you have
! all processes not belonging to 'ctxt' jump to the end of the program
if (ctxt .lt. 0) goto 1000
but I see nothing in the description of BLACS_GRIDINIT that ensures ctxt will be less than zero for non-participating processes - at https://www.netlib.org/blacs/BLACS/QRef.html#BLACS_GRIDINIT it says
This routine creates a simple NPROW x NPCOL process grid. This process
grid will use the first NPROW x NPCOL processes, and assign them to
the grid in a row- or column-major natural ordering. If these
process-to-grid mappings are unacceptable, BLACS_GRIDINIT's more
complex sister routine BLACS_GRIDMAP must be called instead.
There is no mention of what ctxt will be if the process is not part of the resulting grid - this is the kind of problem I find regularly with the BLACS documentation. Also please don't use goto, for your own sake. You WILL regret it later. Use If ... End If. I can't remember when I last used goto in Fortran, it may well be over 10 years ago.
Finally good luck in using BLACS! In my experience the documentation is often incomplete, and I would suggest only using those calls that are absolutely necessary to use ScaLAPACK and using MPI, which is much, much better defined, for the rest. It would be so much nicer if ScaLAPACK just worked with MPI nowadays.

mpirun is not working with two nodes

I am working on a cluster where each node has 16 processors. My version of Open MPI is 1.5.3. I have written the following simple code in Fortran:
      program MAIN
      implicit none
      include 'mpif.h'
      integer status(MPI_STATUS_SIZE)
      integer ierr,my_rank,size
      integer irep, nrep, iex
      character*1 task
!     Initialize MPI
      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD,my_rank,ierr)
      call mpi_comm_size(MPI_COMM_WORLD,size,ierr)
      do iex=1,2
        if(my_rank.eq.0) then
!         Task for the master
          nrep = size
          do irep=1,nrep-1
            task='q'
            print *, 'master',iex,task
            call mpi_send(task,1,MPI_BYTE,irep,irep+1,
     &                    MPI_COMM_WORLD,ierr)
          enddo
        else
!         Here are the tasks for the slaves
!         Receive the task sent by the master node
          call mpi_recv(task,1,MPI_BYTE,0,my_rank+1,
     &                  MPI_COMM_WORLD,status,ierr)
          print *, 'slaves', my_rank,task
        endif
      enddo
      call mpi_finalize(ierr)
      end
then I compile the code with:
/usr/lib64/openmpi/bin/mpif77 -o test2 test2.f
and run it with
/usr/lib64/openmpi/bin/mpirun -np 32 -hostfile nodefile test2
my nodefile looks like this:
node1
node1
...
node2
node2
...
with node1 and node2 repeated 16 times each.
I can compile successfully. When I run it for -np 16 (so just one node) it works
fine: each slave finishes its task and I get the prompt back in the terminal. But when I try -np 32, not all the slaves finish
their work, only 16 of them.
Actually with 32 processes the program doesn't give me the prompt back, so I think the program is stuck somewhere, waiting for some task to be performed.
I would appreciate any comments, as I have already spent some time on this seemingly trivial problem.
Thanks.
I'm not sure that your nodefile is correct. I'd expect to see lines like this:
node1 slots=16
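So, assuming the two node names from your question, the whole nodefile would be just two lines:
node1 slots=16
node2 slots=16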
Open MPI is pretty well documented; have you checked out their FAQ?
did you try mpiexec instead of mpirun?