Error with MPI_COMM_SPLIT in Fortran

I have some questions about MPI_COMM_SPLIT in Fortran.
Question I)
How can I create just one communicator with MPI_COMM_SPLIT? For example, I want a communicator that contains only the processes on the top of my (Cartesian) domain. I know that I have to pass MPI_UNDEFINED as the color on the processes that should not be part of the new communicator, but my code below doesn't do what I expect.
do k=1,size(proc_up)
if(rank==proc_up(k)) then
color_up=1
else
color_up=MPI_UNDEFINED
end if
call MPI_COMM_SPLIT(comm2d ,color_up ,coords(2) ,comm_up ,code)
Why didn't it work?
Question II)
When I make several MPI_COMM_SPLIT calls (new communicators for up, down, side1, side2), I get an error:
[nin:30039] *** An error occurred in MPI_Comm_split
[nin:30039] *** on communicator MPI_COMM_WORLD
[nin:30039] *** MPI_ERR_ARG: invalid argument of some other kind
[nin:30039] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
Does anyone know why?
Question III)
I could also use MPI_Cart_sub, but it returns many groups of processes. How can I make sure I use only the group I want?
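For Question I, one possible reading of the code above is that the loop overwrites color_up on every iteration, so only the last entry of proc_up decides the color. A minimal sketch of a single split that keeps only the top row follows. It assumes that "top" means coords(1) == 0 and that the grid is built with MPI_Dims_create; both are illustrative choices, not taken from the original code.

```fortran
program split_top_row
  use mpi
  implicit none
  integer :: code, nprocs, rank, comm2d, comm_up, color_up
  integer :: dims(2), coords(2)
  logical :: periods(2)

  call MPI_Init(code)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, code)

  ! Let MPI pick a 2D decomposition that uses every process.
  dims = 0
  call MPI_Dims_create(nprocs, 2, dims, code)
  periods = .false.
  call MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, .false., comm2d, code)

  call MPI_Comm_rank(comm2d, rank, code)
  call MPI_Cart_coords(comm2d, rank, 2, coords, code)

  ! Assumption: the "top" of the domain is the row with coords(1) == 0.
  ! With a precomputed list of ranks, any(rank == proc_up) is the
  ! equivalent membership test and needs no loop.
  if (coords(1) == 0) then
     color_up = 1
  else
     color_up = MPI_UNDEFINED
  end if

  ! Collective over comm2d: every process calls it exactly once.
  call MPI_Comm_split(comm2d, color_up, coords(2), comm_up, code)

  ! Processes that passed MPI_UNDEFINED receive MPI_COMM_NULL.
  if (comm_up /= MPI_COMM_NULL) then
     ! ... communicate among the top-row processes here ...
     call MPI_Comm_free(comm_up, code)
  end if

  call MPI_Comm_free(comm2d, code)
  call MPI_Finalize(code)
end program split_top_row
```

Every process in comm2d takes part in the split, but only the top-row processes get a usable comm_up; the others must not pass their MPI_COMM_NULL handle to any communication call.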

Related

Seg fault in fortran MPI_COMM_CREATE_GROUP if using a group not directly created from MPI_COMM_WORLD

I'm getting a segmentation fault that I can't really understand in a simple code that just:
calls the MPI_INIT
duplicates the global communicator, via MPI_COMM_DUP
creates a group with half of processes of the global communicator, via MPI_COMM_GROUP
finally from this group creates a new communicator via MPI_COMM_CREATE_GROUP
Specifically I use this last call, instead of just using MPI_COMM_CREATE, because it's only collective over the group of processes contained in group, while MPI_COMM_CREATE is collective over every process in COMM.
The code is the following
program mpi_comm_create_grp
use mpi
IMPLICIT NONE
INTEGER :: mpi_size, mpi_err_code
INTEGER :: my_comm_dup, mpi_new_comm, mpi_group_world, mpi_new_group
INTEGER :: rank_index
INTEGER, DIMENSION(:), ALLOCATABLE :: rank_vec
CALL mpi_init(mpi_err_code)
CALL mpi_comm_size(mpi_comm_world, mpi_size, mpi_err_code)
!! allocate and fill the vector for the new group
allocate(rank_vec(mpi_size/2))
rank_vec(:) = (/ (rank_index , rank_index=0, mpi_size/2 - 1) /)
!! create the group directly from the comm_world: this way works
! CALL mpi_comm_group(mpi_comm_world, mpi_group_world, mpi_err_code)
!! duplicating the comm_world and creating the group from the dup: this way fails
CALL mpi_comm_dup(mpi_comm_world, my_comm_dup, mpi_err_code)
!! creating the group of all processes from the duplicated comm_world
CALL mpi_comm_group(my_comm_dup, mpi_group_world, mpi_err_code)
!! create a new group with just half of processes in comm_world
CALL mpi_group_incl(mpi_group_world, mpi_size/2, rank_vec,mpi_new_group, mpi_err_code)
!! create a new comm from the comm_world using the new group created
CALL mpi_comm_create_group(mpi_comm_world, mpi_new_group, 0, mpi_new_comm, mpi_err_code)
!! deallocate and finalize mpi
if(ALLOCATED(rank_vec)) DEALLOCATE(rank_vec)
CALL mpi_finalize(mpi_err_code)
end program !mpi_comm_create_grp
If instead of duplicating the COMM_WORLD, I directly create the group from the global communicator (commented line), everything works just fine.
The parallel debugger I'm using traces back the seg fault to a call to MPI_GROUP_TRANSLATE_RANKS, but, as far as I know, the MPI_COMM_DUP duplicates all the attributes of the copied communicator, ranks numbering included.
I am using the ifort version 18.0.5, but I also tried with the 17.0.4, and 19.0.2 with no better results.
Well, the thing is a little tricky, at least for me, but after some tests and help, the root of the problem was found.
The call
CALL mpi_comm_create_group(mpi_comm_world, mpi_new_group, 0, mpi_new_comm, mpi_err_code)
creates a new communicator for the previously created group mpi_new_group. However, mpi_comm_world, which is used as the first argument, is not in the same context as mpi_new_group, as explained in the MPICH reference:
MPI_COMM_DUP will create a new communicator over the same group as
comm but with a new context
So the correct call would be:
CALL mpi_comm_create_group(my_comm_dup, mpi_new_group, 0, mpi_new_comm, mpi_err_code)
I.e., replacing mpi_comm_world with my_comm_dup, which is the communicator from which mpi_group_world was created.
I am still not sure why it works with Open MPI, but it is generally more tolerant of this sort of thing.
As suggested in the comments, I wrote to the Open MPI users list, and they replied:
That is perfectly valid. The MPI processes that make up the group are all part of comm world. I would file a bug with Intel MPI.
So I posted a question on the Intel forum.
It is a bug, which they fixed in the latest version of the library, 19.3.
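The workaround described in this thread, passing the duplicated communicator as the parent, can be sketched as below. This is a minimal sketch, not the original poster's final code: the names group_dup, new_group and half are introduced here for illustration, and only the members of the new group make the MPI_COMM_CREATE_GROUP call.

```fortran
program comm_create_group_from_dup
  use mpi
  implicit none
  integer :: ierr, nprocs, rank, half, i
  integer :: my_comm_dup, group_dup, new_group, new_comm
  integer, allocatable :: rank_vec(:)

  call MPI_Init(ierr)
  call MPI_Comm_dup(MPI_COMM_WORLD, my_comm_dup, ierr)
  call MPI_Comm_size(my_comm_dup, nprocs, ierr)
  call MPI_Comm_rank(my_comm_dup, rank, ierr)

  ! Ranks 0 .. half-1 form the new group (at least one process).
  half = max(1, nprocs/2)
  allocate(rank_vec(half))
  rank_vec = [(i, i = 0, half - 1)]

  ! The group is derived from the duplicate ...
  call MPI_Comm_group(my_comm_dup, group_dup, ierr)
  call MPI_Group_incl(group_dup, half, rank_vec, new_group, ierr)

  ! ... so the duplicate is also passed as the parent communicator.
  ! Only the members of new_group make the call in this sketch.
  if (rank < half) then
     call MPI_Comm_create_group(my_comm_dup, new_group, 0, new_comm, ierr)
     ! ... use new_comm here ...
     call MPI_Comm_free(new_comm, ierr)
  end if

  call MPI_Group_free(new_group, ierr)
  call MPI_Group_free(group_dup, ierr)
  call MPI_Comm_free(my_comm_dup, ierr)
  deallocate(rank_vec)
  call MPI_Finalize(ierr)
end program comm_create_group_from_dup
```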

MPI reduction on user defined communicator

Currently I am working on an MPI code in Fortran. After using mpi_cart_create then mpi_group_excl to create a new group with half of the nodes in it, I am trying to perform a reduction using this communicator but I am obviously doing something wrong.
With the code
call MPI_cart_create(MPI_comm_world, 2, dims, (/.false.,.false./), reorder, comm_cart, ierr)
if (ierr/=0) stop 'Error with MPI_cart_create'
call MPI_group_excl(group_world, dims(2), excl_a, division_comm_a, ierr)
if (ierr/=0) stop 'Error with MPI_group_excl - division_comm_a'
call MPI_group_excl(group_world, dims(2), excl_b, division_comm_b, ierr)
if (ierr/=0) stop 'Error with MPI_group_excl - division_comm_b'
if (div_a_rank .gt. 0) then
call MPI_reduce(division_a(1), division_a(1), L_outer_y, MPI_DOUBLE_PRECISION, MPI_SUM, &
& 0, division_comm_a, ierr)
if (ierr/=0) stop 'Error with MPI_reduce on division_comm_a'
end if
the error I am getting is:
*** An error occurred in MPI_Reduce
*** on communicator MPI_COMM_WORLD
*** MPI_ERR_ARG: invalid argument of some other kind
*** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
Following an answer I have used MPI_comm_create_group, however I am still getting
*** An error occurred in MPI_Reduce
*** reported by process [140046521663489,140045998620672]
*** on communicator MPI_COMM_WORLD
*** MPI_ERR_ARG: invalid argument of some other kind
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
The problem is that you're mixing groups and communicators. In MPI, a group is just a logical collection of processes. It can't be used to communicate.
If you want to create a new communicator from your new group, you should use the function MPI_COMM_CREATE_GROUP. You can pass your new group into that function to create a new communicator that you can use for your reduction.
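As a rough illustration of that advice, the sketch below builds a group, turns it into a communicator with MPI_COMM_CREATE_GROUP, and reduces over the communicator rather than the group. The choice of excluding the last rank, and the names sub_group, sub_comm, local_val and total, are assumptions made for the example and do not come from the question's code. Separate send and receive buffers are used.

```fortran
program reduce_on_subgroup
  use mpi
  implicit none
  integer :: ierr, nprocs, rank, sub_rank
  integer :: group_world, sub_group, sub_comm
  integer :: excl(1)
  double precision :: local_val, total

  call MPI_Init(ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! A group is only a list of processes; it cannot be passed to MPI_Reduce.
  call MPI_Comm_group(MPI_COMM_WORLD, group_world, ierr)
  excl(1) = nprocs - 1          ! hypothetical choice: drop the last rank
  call MPI_Group_excl(group_world, 1, excl, sub_group, ierr)

  ! MPI_Group_rank returns MPI_UNDEFINED on processes outside the group.
  call MPI_Group_rank(sub_group, sub_rank, ierr)

  if (sub_rank /= MPI_UNDEFINED) then
     ! Build a communicator from the group, then reduce on the communicator.
     call MPI_Comm_create_group(MPI_COMM_WORLD, sub_group, 0, sub_comm, ierr)
     local_val = dble(rank)
     total = 0.0d0
     call MPI_Reduce(local_val, total, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                     0, sub_comm, ierr)
     if (sub_rank == 0) print *, 'sum over the subgroup =', total
     call MPI_Comm_free(sub_comm, ierr)
  end if

  call MPI_Group_free(sub_group, ierr)
  call MPI_Group_free(group_world, ierr)
  call MPI_Finalize(ierr)
end program reduce_on_subgroup
```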

How to find the origin of MPI message truncated errors?

I am currently having problems with an MPI application.
I am sporadically receiving MPI errors of the form:
Fatal error in MPI_Allreduce: Message truncated, error stack:
MPI_Allreduce(1339)...............: MPI_Allreduce(sbuf=0x7ffa87ffcb98, rbuf=0x7ffa87ffcba8, count=2, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD) failed
MPIR_Allreduce_impl(1180).........:
MPIR_Allreduce_intra(755).........:
MPIDI_CH3U_Receive_data_found(129): Message from rank 0 and tag 14 truncated; 384 bytes received but buffer size is 16
rank 1 in job 1 l1442_42561 caused collective abort of all ranks
exit status of rank 1: killed by signal 9
However, I do not know where to look. I know that the error happens in an Allreduce call, but there are several of them.
How do I find out which call produces the error? Simple printf debugging does not help, as the function could be called a million times before the error first occurs.
The error might also not occur at all, or occur immediately after the start of the program.
I have been able to track down the origin of the error by calling
MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN)
and then checking whether the return value of each of the Allreduce calls differs from MPI_SUCCESS. A call that returns anything else is a location where an error occurs.
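A Fortran sketch of the same idea is shown below. It uses MPI_Comm_set_errhandler, the current name for this call (MPI_Errhandler_set was removed in MPI-3.0). The label "call site A" and the buffer names are placeholders; in a real code each Allreduce would get its own label so the failing one can be identified.

```fortran
program allreduce_checked
  use mpi
  implicit none
  integer :: ierr, ierr2, rank, msg_len
  character(len=MPI_MAX_ERROR_STRING) :: msg
  double precision :: sbuf(2), rbuf(2)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! Return error codes to the caller instead of aborting the job.
  call MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN, ierr)

  sbuf = 1.0d0
  call MPI_Allreduce(sbuf, rbuf, 2, MPI_DOUBLE_PRECISION, MPI_SUM, &
                     MPI_COMM_WORLD, ierr)
  if (ierr /= MPI_SUCCESS) then
     ! Report which call site failed, then stop the job.
     call MPI_Error_string(ierr, msg, msg_len, ierr2)
     write(*, '(a,i0,2a)') 'rank ', rank, &
          ': Allreduce at call site A failed: ', msg(1:msg_len)
     call MPI_Abort(MPI_COMM_WORLD, 1, ierr2)
  end if

  call MPI_Finalize(ierr)
end program allreduce_checked
```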

Trouble using MPI_BCAST with MPI_CART_CREATE

I am having trouble with MPI_BCAST in Fortran. I create a new communicator using MPI_CART_CREATE (say 'COMM_NEW'). When I broadcast data from root using the old communicator (i.e. MPI_COMM_WORLD), it works fine. But when I use the new communicator that I just created, it gives the error:
[compute-4-15.local:15298] *** An error occurred in MPI_Bcast
[compute-4-15.local:15298] *** on communicator MPI_COMM_WORLD
[compute-4-15.local:15298] *** MPI_ERR_COMM: invalid communicator
[compute-4-15.local:15298] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
I do get the result from the processors involved in COMM_NEW, but also the above error. I think the problem is with the other processors, which are not included in COMM_NEW but are present in MPI_COMM_WORLD. Is it because the number of processors in COMM_NEW is less than the total number of processors? If so, how do I broadcast among a set of processors that is smaller than the total? Any help will be greatly appreciated. Thanks.
My sample code is:
!PROGRAM TO BROADCAST THE DATA FROM ROOT TO DEST PROCESSORS
PROGRAM MAIN
IMPLICIT NONE
INCLUDE 'mpif.h'
!____________________________________________________________________________________
!-------------------------------DECLARE VARIABLES------------------------------------
INTEGER :: ERROR, RANK, NPROCS, I
INTEGER :: SOURCE, TAG, COUNT, NDIMS, COMM_NEW
INTEGER :: A(10), DIMS(1)
LOGICAL :: PERIODS(1), REORDER
!____________________________________________________________________________________
!-------------------------------DEFINE VARIABLES-------------------------------------
SOURCE = 0; TAG = 1; COUNT = 10
PERIODS(1) = .FALSE.
REORDER = .FALSE.
NDIMS = 1
DIMS(1) = 6
!____________________________________________________________________________________
!--------------------INITIALIZE MPI, DETERMINE SIZE AND RANK-------------------------
CALL MPI_INIT(ERROR)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPROCS, ERROR)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, ERROR)
!
CALL MPI_CART_CREATE(MPI_COMM_WORLD, NDIMS, DIMS, PERIODS, REORDER, COMM_NEW, ERROR)
IF(RANK==SOURCE)THEN
DO I=1,10
A(I) = I
END DO
END IF
!____________________________________________________________________________________
!----------------BROADCAST VECTOR A FROM ROOT TO DESTINATIONS------------------------
CALL MPI_BCAST(A,10,MPI_INTEGER,SOURCE,COMM_NEW,ERROR)
!PRINT*, RANK
!WRITE(*, "(10I5)") A
CALL MPI_FINALIZE(ERROR)
END PROGRAM
I think the error you give at the top of your question doesn't match up with the code at the bottom since it's complaining about a Bcast on MPI_COMM_WORLD and you don't actually do one in your code.
Anyway, if you're running with more processes than the Cartesian grid holds (DIMS(1) = 6 here), some of the processes won't be included in COMM_NEW. Instead, when the call to MPI_CART_CREATE returns, they'll get MPI_COMM_NULL for COMM_NEW instead of the new communicator with the topology. You just need to check that you have a real communicator instead of MPI_COMM_NULL before doing the Bcast (or just have all of the ranks from DIMS(1) upwards not enter the Bcast).
To elaborate on Wesley Bland's answer and to clarify the apparent discrepancy in the error message: when the number of MPI processes in MPI_COMM_WORLD is larger than the number of processes in the created Cartesian grid, some of the processes won't become members of the new Cartesian communicator and will get MPI_COMM_NULL -- the invalid communicator handle -- as a result. Calling a collective communication operation requires a valid inter- or intra-communicator handle. Unlike the allowed usage of MPI_PROC_NULL in point-to-point operations, using the invalid communicator handle in collective calls is erroneous. The last statement is not explicitly written in the MPI standard - instead, the language used is:
If comm is an intracommunicator, then ... If comm is an intercommunicator, then ...
Since MPI_COMM_NULL is neither an intra-, nor an inter-communicator, it doesn't fall in any of the two categories of defined behaviour and hence leads to an error condition.
Since communication errors have to occur in some context (i.e. in a valid communicator), Open MPI substitutes MPI_COMM_WORLD in the call to the error handler and hence the error message says "*** on communicator MPI_COMM_WORLD". This is the relevant code section from ompi/mpi/c/bcast.c, where MPI_Bcast is implemented:
if (ompi_comm_invalid(comm)) {
return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_COMM,
FUNC_NAME);
}
...
if (MPI_IN_PLACE == buffer) {
return OMPI_ERRHANDLER_INVOKE(comm, MPI_ERR_ARG, FUNC_NAME);
}
Your code triggers the error handler inside the first check. In all other error checks comm is used instead (since it is determined to be a valid communicator handle) and the error message will state something like "*** on communicator MPI COMMUNICATOR 5 SPLIT FROM 0".
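The check suggested in the first answer can be sketched as follows. The grid size here (one process fewer than the job, when more than one is available) is a hypothetical choice made so that at least one process is left out of the grid; the name new_rank is also introduced only for this example.

```fortran
program bcast_on_cart
  use mpi
  implicit none
  integer :: ierr, nprocs, comm_new, new_rank, i
  integer :: a(10), dims(1)
  logical :: periods(1)

  call MPI_Init(ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Hypothetical grid: smaller than the job, so some process gets no slot.
  dims(1) = max(1, nprocs - 1)
  periods(1) = .false.
  call MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, .false., &
                       comm_new, ierr)

  a = 0
  ! Processes outside the grid hold MPI_COMM_NULL and must skip the Bcast.
  if (comm_new /= MPI_COMM_NULL) then
     call MPI_Comm_rank(comm_new, new_rank, ierr)
     if (new_rank == 0) a = [(i, i = 1, 10)]
     call MPI_Bcast(a, 10, MPI_INTEGER, 0, comm_new, ierr)
     call MPI_Comm_free(comm_new, ierr)
  end if

  call MPI_Finalize(ierr)
end program bcast_on_cart
```

The root is identified by its rank inside comm_new rather than its rank in MPI_COMM_WORLD, which keeps the example valid even if reordering were enabled.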

OpenMPI Reduce using MINLOC

I'm currently working on some MPI code for a graph theory problem in which a number of nodes can each contain an answer and the length of that answer. To get everything back to the master node I'm doing an MPI_Gather for the answers and am attempting to do an MPI_Reduce using the MPI_MINLOC operation to figure out who had the shortest solution. Right now my datatype that stores the length and node ID is defined as (per examples shown on numerous sites like http://www.open-mpi.org/doc/v1.4/man3/MPI_Reduce.3.php):
struct minType
{
float len;
int index;
};
On each node I'm initializing the local copies of this struct in the following manner:
int commRank;
MPI_Comm_rank (MPI_COMM_WORLD, &commRank);
minType solutionLen;
solutionLen.len = 1e37;
solutionLen.index = commRank;
At the end of the execution I have an MPI_Gather call that successfully pulls down all of the solutions (I've printed them out from in memory to verify them), and the call:
MPI_Reduce (&solutionLen, &solutionLen, 1, MPI_FLOAT_INT, MPI_MINLOC, 0, MPI_COMM_WORLD);
It's my understanding that the arguments are supposed to be:
The data source
The target for the result (only significant on the designated root node)
The number of items sent by each node
The datatype (MPI_FLOAT_INT appears to be defined based on the above link)
The operation (MPI_MINLOC appears to be defined as well)
The root node's ID in the specified comm group
The communications group to wait on.
When my code makes it to the reduce operation I get this error:
[compute-2-19.local:9754] *** An error occurred in MPI_Reduce
[compute-2-19.local:9754] *** on communicator MPI_COMM_WORLD
[compute-2-19.local:9754] *** MPI_ERR_ARG: invalid argument of some other kind
[compute-2-19.local:9754] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 9754 on
node compute-2-19.local exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
I'll admit to being completely stumped on this. In case it matters I'm compiling using OpenMPI 1.5.3 (built using gcc 4.4) on a Rocks cluster based on CentOS 5.5.
I think you are not allowed to use the same buffer for input and output (first two arguments). The man page says:
When the communicator is an intracommunicator, you can perform a
reduce operation in-place (the output buffer is used as the input
buffer). Use the variable MPI_IN_PLACE as the value of the root
process sendbuf. In this case, the input data is taken at the root
from the receive buffer, where it will be replaced by the output data.
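To stay with the Fortran used in most of this page, here is a hedged sketch of the in-place variant described in that man page excerpt. It is a translation, not the poster's C code: in Fortran the value/index pair for MPI_MINLOC is an array of two REALs with datatype MPI_2REAL, standing in for the C struct and MPI_FLOAT_INT. The lengths assigned to each rank are made up for the example.

```fortran
program minloc_in_place
  use mpi
  implicit none
  integer :: ierr, rank
  real :: pair(2), unused(2)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! pair(1) is the solution length, pair(2) the owning rank stored as a REAL.
  pair(1) = 100.0 - real(rank)   ! made-up lengths for illustration
  pair(2) = real(rank)

  if (rank == 0) then
     ! At the root, MPI_IN_PLACE replaces the send buffer; the input is
     ! taken from pair and then overwritten with the result.
     call MPI_Reduce(MPI_IN_PLACE, pair, 1, MPI_2REAL, MPI_MINLOC, 0, &
                     MPI_COMM_WORLD, ierr)
     print *, 'shortest length', pair(1), 'found on rank', int(pair(2))
  else
     ! On the other ranks the receive buffer is not significant.
     call MPI_Reduce(pair, unused, 1, MPI_2REAL, MPI_MINLOC, 0, &
                     MPI_COMM_WORLD, ierr)
  end if

  call MPI_Finalize(ierr)
end program minloc_in_place
```

The alternative is simply to pass two distinct variables as the first two arguments on every rank.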