`MPI_ERR_TRUNCATE: message truncated` error [closed] - fortran

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 7 years ago.
Improve this question
I was facing a problem similar to the one discussed in this topic, I have a MPI code which sum the lines of a vector which has a specific number of lines. I attach the code here.
When I try to compile with one core online mpirun -n 1 ./program I obtain:
500000 sum 125000250000.00000 calculated by root process.
The grand total is: 125000250000.00000
Because I have only one core that compute the sum, it looks OK. But when I try to use multicore mpirun -n 4 ./program I obtain:
please enter the number of numbers to sum:
500000
[federico-C660:9540] *** An error occurred in MPI_Recv
[federico-C660:9540] *** on communicator MPI_COMM_WORLD
[federico-C660:9540] *** MPI_ERR_TRUNCATE: message truncated
[federico-C660:9540] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
sum 7812562500.0000000 calculated by root process.
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 9539 on
node XXXXX1 exiting without calling "finalize".
I also red similar problem for C program here. The same with 2 and 3 processors.
Could someone help me to figure out what is the problem? My guess is that I made a mistake in the MPI_RECV calling related with the "sender".

There were a couple of problems in the code;
The most obvious problem was the syntax error with receive variable, num_rows_to_receive. You have received the rows calculated by the root_process in num_rows_to_received, but used the variable num_rows_to_receive for actually receiving the vector.
CALL mpi_recv (num_rows_to_receive, 1 , mpi_integer,
root_process, mpi_any_tag, mpi_comm_world,
STATUS, ierr)
CALL mpi_recv (vector2, num_rows_to_receive, mpi_real8, root_process,
mpi_any_tag, mpi_comm_world, STATUS, ierr)
This should resolve the error.
The second problem (atleast I could see that on my system) is the MPI_REAL datatype defaults to MPI_REAL4 and the size of the vector gets truncated. So we won't be able to receive the actual summation of all the elements. Changing mpi_real to MPI_REAL8 will fix the summation issue and you can get the exact summation value for all any number of ranks.
~/temp$ mpirun -n 8 ./a.out
please enter the number of numbers to sum:
500000
sum 1953156250.0000000 calculated by root process.
partial sum 5859406250.0000000 returned from process 1
partial sum 9765656250.0000000 returned from process 2
partial sum 17578156250.000000 returned from process 4
partial sum 21484406250.000000 returned from process 5
partial sum 13671906250.000000 returned from process 3
partial sum 25390656250.000000 returned from process 6
partial sum 29296906250.000000 returned from process 7
The grand total is: 125000250000.00000

Related

Is it possible to see MPI_ISEND not to return immediately

I realised recently that during the first iterations of my simulation, I experienced particularly slower steps which boiled down to the following section
do I= 1, hpc_nodes
call mpi_isend(cnt(i), 1, mpi_integer, whoisrootInNode(i), 10000, comm, req(i), mpierr)
enddo
I realised that one of the ith mpi_isend returns after around 2 seconds, while the rest would be a fraction of second. This eventually fades off and all mpi_isend calls will be minimal in cost. However, I am trying to understand why this could happen.
Why is it that sometimes mpi_isend would return after 2 seconds for a certain I within the same rank and the one right after I+1 is 10 times faster - why does it only happen in the beginning , i.e. the code section is within a big time loop and after 2-3 time steps this problem seems to vanish
Is this something related to the following article

Niether 'MPI_Barrier' nor 'BLACS_Barrier' doesn't stop a processors executing its commands

I'm working on ScaLAPACK and trying to get used to BLACS routines which is essential using ScaLAPACK.
I've had some elementary course on MPI, so have some rough idea of MPI_COMM_WORLD stuff, but has no deep understanding on how it works internally and so on.
Anyway, I'm trying following code to say hello using BLACS routine.
program hello_from_BLACS
use MPI
implicit none
integer :: info, nproc, nprow, npcol, &
myid, myrow, mycol, &
ctxt, ctxt_sys, ctxt_all
call BLACS_PINFO(myid, nproc)
! get the internal default context
call BLACS_GET(0, 0, ctxt_sys)
! set up a process grid for the process set
ctxt_all = ctxt_sys
call BLACS_GRIDINIT(ctxt_all, 'c', nproc, 1)
call BLACS_BARRIER(ctxt_all, 'A')
! set up a process grid of size 3*2
ctxt = ctxt_sys
call BLACS_GRIDINIT(ctxt, 'c', 3, 2)
if (myid .eq. 0) then
write(6,*) ' myid myrow mycol nprow npcol'
endif
(**) call BLACS_BARRIER(ctxt_sys, 'A')
! all processes not belonging to 'ctxt' jump to the end of the program
if (ctxt .lt. 0) goto 1000
! get the process coordinates in the grid
call BLACS_GRIDINFO(ctxt, nprow, npcol, myrow, mycol)
write(6,*) 'hello from process', myid, myrow, mycol, nprow, npcol
1000 continue
! return all BLACS contexts
call BLACS_EXIT(0)
stop
end program
and the output with 'mpirun -np 10 ./exe' is like,
hello from process 0 0 0 3 2
hello from process 4 1 1 3 2
hello from process 1 1 0 3 2
myid myrow mycol nprow npcol
hello from process 5 2 1 3 2
hello from process 2 2 0 3 2
hello from process 3 0 1 3 2
Everything seems to work fine except that 'BLACS_BARRIER' line, which I marked (**) in the code's leftside.
I've put that line to make the output like below whose title line always printed at the top of the it.
myid myrow mycol nprow npcol
hello from process 0 0 0 3 2
hello from process 4 1 1 3 2
hello from process 1 1 0 3 2
hello from process 5 2 1 3 2
hello from process 2 2 0 3 2
hello from process 3 0 1 3 2
So the question goes,
I've tried BLACS_BARRIER to 'ctxt_sys', 'ctxt_all', and 'ctxt' but all of them does not make output in which the title line is firstly printed. I've also tried MPI_Barrier(MPI_COMM_WORLD,info), but it didn't work either. Am I using the barriers in the wrong way?
In addition, I got SIGSEGV when I used BLACS_BARRIER to 'ctxt' and used more than 6 processes when executing mpirun. Why SIGSEGV takes place in this case?
Thank you for reading this question.
To answer your 2 questions (in future it is best to give then separate posts)
1) MPI_Barrier, BLACS_Barrier and any barrier in any parallel programming methodology I have come across only synchronises the actual set of processes that calls it. However I/O is not dealt with just by the calling process, but at least one and quite possibly more within the OS which actually the process the I/O request. These are NOT synchronised by your barrier. Thus ordering of I/O is not ensured by a simple barrier. The only standard conforming ways that I can think of to ensure ordering of I/O are
Have 1 process do all the I/O or
Better is to use MPI I/O either directly, or indirectly, via e.g. NetCDF or HDF5
2) Your second call to BLACS_GRIDINIT
call BLACS_GRIDINIT(ctxt, 'c', 3, 2)
creates a context for 3 by 2 process grid, so holding 6 process. If you call it with more than 6 processes, only 6 will be returned with a valid context, for the others ctxt should be treated as an uninitialised value. So for instance if you call it with 8 processes, 6 will return with a valid ctxt, 2 will return with ctxt having no valid value. If these 2 now try to use ctxt anything is possible, and in your case you are getting a seg fault. You do seem to see that this is an issue as later you have
! all processes not belonging to 'ctxt' jump to the end of the program
if (ctxt .lt. 0) goto 1000
but I see nothing in the description of BLACS_GRIDINIT that ensures ctxt will be less than zero for non-participating processes - at https://www.netlib.org/blacs/BLACS/QRef.html#BLACS_GRIDINIT it says
This routine creates a simple NPROW x NPCOL process grid. This process
grid will use the first NPROW x NPCOL processes, and assign them to
the grid in a row- or column-major natural ordering. If these
process-to-grid mappings are unacceptable, BLACS_GRIDINIT's more
complex sister routine BLACS_GRIDMAP must be called instead.
There is no mention of what ctxt will be if the process is not part of the resulting grid - this is the kind of problem I find regularly with the BLACS documentation. Also please don't use goto, for your own sake. You WILL regret it later. Use If ... End If. I can't remember when I last used goto in Fortran, it may well be over 10 years ago.
Finally good luck in using BLACS! In my experience the documentation is often incomplete, and I would suggest only using those calls that are absolutely necessary to use ScaLAPACK and using MPI, which is much, much better defined, for the rest. It would be so much nicer if ScaLAPACK just worked with MPI nowadays.

How to find the origin of MPI message truncated errors?

I am currently having problems with a MPI Application.
I am sporadically receiving MPI errors of the form:
Fatal error in MPI_Allreduce: Message truncated, error stack:
MPI_Allreduce(1339)...............: MPI_Allreduce(sbuf=0x7ffa87ffcb98, rbuf=0x7ffa87ffcba8, count=2, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD) failed
MPIR_Allreduce_impl(1180).........:
MPIR_Allreduce_intra(755).........:
MPIDI_CH3U_Receive_data_found(129): Message from rank 0 and tag 14 truncated; 384 bytes received but buffer size is 16
rank 1 in job 1 l1442_42561 caused collective abort of all ranks
exit status of rank 1: killed by signal 9
However I do not know at where to look. I know that the error is happening in an Allreduce function call however there are multiple ones.
How do I know which function call produces the error? Simple printf debugging does not help as the function could be called a million times before the error occurs the first time.
It might also not occur at all or immediately after the start of the program.
I have been able to track down the origin of the error by calling
MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN)
and then checking the return value of each of the Allreduce functions for not being equal to MPI_SUCCESS. This is a location where an error occurs

Trouble using MPI_BCAST with MPI_CART_CREATE

I am having trouble with MPI_BCAST in Fortran. I create a new communicator using MPI_CART_CREATE (say 'COMM_NEW'). When I broadcast data from root using old communicator (i.e. MPI_COMM_WORLD) it works fine. But, when i use new communicator that i just created it gives the error:
[compute-4-15.local:15298] *** An error occurred in MPI_Bcast
[compute-4-15.local:15298] *** on communicator MPI_COMM_WORLD
[compute-4-15.local:15298] *** MPI_ERR_COMM: invalid communicator
[compute-4-15.local:15298] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
It do get the result from the processors involved in COMM_NEW, and also the above error, think the problem is with other processors which are not included in COMM_NEW, but are present in MPI_COMM_WORLD. Any help will be greatly appreciated. Is it because the number of processors in COMM_NEW is less than total processors. If so how do i broadcast among a set of processors which are less than the total. Thanks.
My sample code is:
!PROGRAM TO BROADCAST THE DATA FROM ROOT TO DEST PROCESSORS
PROGRAM MAIN
IMPLICIT NONE
INCLUDE 'mpif.h'
!____________________________________________________________________________________
!-------------------------------DECLARE VARIABLES------------------------------------
INTEGER :: ERROR, RANK, NPROCS, I
INTEGER :: SOURCE, TAG, COUNT, NDIMS, COMM_NEW
INTEGER :: A(10), DIMS(1)
LOGICAL :: PERIODS(1), REORDER
!____________________________________________________________________________________
!-------------------------------DEFINE VARIABLES-------------------------------------
SOURCE = 0; TAG = 1; COUNT = 10
PERIODS(1) = .FALSE.
REORDER = .FALSE.
NDIMS = 1
DIMS(1) = 6
!____________________________________________________________________________________
!--------------------INITIALIZE MPI, DETERMINE SIZE AND RANK-------------------------
CALL MPI_INIT(ERROR)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPROCS, ERROR)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, ERROR)
!
CALL MPI_CART_CREATE(MPI_COMM_WORLD, NDIMS, DIMS, PERIODS, REORDER, COMM_NEW, ERROR)
IF(RANK==SOURCE)THEN
DO I=1,10
A(I) = I
END DO
END IF
!____________________________________________________________________________________
!----------------BROADCAST VECTOR A FROM ROOT TO DESTINATIONS------------------------
CALL MPI_BCAST(A,10,MPI_INTEGER,SOURCE,COMM_NEW,ERROR)
!PRINT*, RANK
!WRITE(*, "(10I5)") A
CALL MPI_FINALIZE(ERROR)
END PROGRAM
I think the error you give at the top of your question doesn't match up with the code at the bottom since it's complaining about a Bcast on MPI_COMM_WORLD and you don't actually do one in your code.
Anyway, if you're running with more processes than dimensions, some of the processes won't be included in COMM_NEW. Instead, when the call to MPI_CART_CREATE returns, they'll get MPI_COMM_NULL for COMM_NEW instead of the new communicator with the topology. You just need to do a check to make sure you have a real communicator instead of MPI_COMM_NULL before doing the Bcast (or just have all of the ranks above DIMS(1) not enter the Bcast.
To elaborate on Wesley Bland's answer and to clarify the apparent discrepancy in the error message. When the number of MPI processes in MPI_COMM_WORLD is larger than the number of processes in the created Cartesian grid, some of the processes won't become members of the new Cartesian communicator and will get MPI_COMM_NULL -- the invalid communicator handle -- as a result. Calling a collective communication operation requires a valid inter- or intra-communicator handle. Unlike the allowed usage of MPI_PROC_NULL in point-to-point operations, using the invalid communicator handle in collective calls is erroneous. The last statement is not explicitly written in the MPI standard - instead, the language used is:
If comm is an intracommunicator, then ... If comm is an intercommunicator, then ...
Since MPI_COMM_NULL is neither an intra-, nor an inter-communicator, it doesn't fall in any of the two categories of defined behaviour and hence leads to an error condition.
Since communication errors have to occur in some context (i.e. in a valid communicator), Open MPI substitutes MPI_COMM_WORLD in the call to the error handler and hence the error message says "*** on communicator MPI_COMM_WORLD". This is the relevant code section from ompi/mpi/c/bcast.c, where MPI_Bcast is implemented:
if (ompi_comm_invalid(comm)) {
return OMPI_ERRHANDLER_INVOKE(MPI_COMM_WORLD, MPI_ERR_COMM,
FUNC_NAME);
}
...
if (MPI_IN_PLACE == buffer) {
return OMPI_ERRHANDLER_INVOKE(comm, MPI_ERR_ARG, FUNC_NAME);
}
Your code triggers the error handler inside the first check. In all other error checks comm is used instead (since it is determined to be a valid communicator handle) and the error message will state something like "*** on communicator MPI COMMUNICATOR 5 SPLIT FROM 0".

OpenMPI Reduce using MINLOC

I'm currently working on some MPI code for a graph theory problem in which a number of nodes can each contain an answer and the length of that answer. To get everything back to the master node I'm doing an MPI_Gather for the answers and am attempting to do an MPI_Reduce using the MPI_MINLOC operation to figure out who had the shortest solution. Right now my datatype that stores the length and node ID is defined as (per examples shown on numerous sites like http://www.open-mpi.org/doc/v1.4/man3/MPI_Reduce.3.php):
struct minType
{
float len;
int index;
};
On each node I'm initializing the local copies of this struct in the following manner:
int commRank;
MPI_Comm_rank (MPI_COMM_WORLD, &commRank);
minType solutionLen;
solutionLen.len = 1e37;
solutionLen.index = commRank;
At the end of the execution I have an MPI_Gather call that successfully pulls down all of the solutions (I've printed them out from in memory to verify them), and the call:
MPI_Reduce (&solutionLen, &solutionLen, 1, MPI_FLOAT_INT, MPI_MINLOC, 0, MPI_COMM_WORLD);
It's my understanding that the arguments are supposed to be:
The data source
is the target for the result (only significant on the designated root node)
The number of items sent by each node
The datatype (MPI_FLOAT_INT appears to be defined based on the above link)
The operation (MPI_MINLOC appears to be defined as well)
The root node's ID in the specified comm group
The communications group to wait on.
When my code makes it to the reduce operation I get this error:
[compute-2-19.local:9754] *** An error occurred in MPI_Reduce
[compute-2-19.local:9754] *** on communicator MPI_COMM_WORLD
[compute-2-19.local:9754] *** MPI_ERR_ARG: invalid argument of some other kind
[compute-2-19.local:9754] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 9754 on
node compute-2-19.local exiting improperly. There are two reasons this could occur:
1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
I'll admit to being completely stumped on this. In case it matters I'm compiling using OpenMPI 1.5.3 (built using gcc 4.4) on a Rocks cluster based on CentOS 5.5.
I think you are not allowed to use the same buffer for input and output (first two arguments). The man page says:
When the communicator is an intracommunicator, you can perform a
reduce operation in-place (the output buffer is used as the input
buffer). Use the variable MPI_IN_PLACE as the value of the root
process sendbuf. In this case, the input data is taken at the root
from the receive buffer, where it will be replaced by the output data.