Error when making a Fortran MPI_Allgather code

I have a Fortran code that compiles fine but returns an error when executing this MPI_Allgather call:
call MPI_Allgather(rank, 1, MPI_INTEGER, &
                   allranks(0:np-1), np, MPI_INTEGER, MPI_COMM_WORLD, erro)
rank is an integer variable, and allranks is an integer array with np elements indexed from 0 to np-1.
The error is:
malloc.c:4630: _int_malloc: Assertion `(unsigned long)(size) >= (unsigned long)(nb)' failed.
Does anyone have an idea of the cause of the error? If so, how can I solve this?

The 5th argument, recvcount, specifies the number of elements to receive from each process, not the total. In your case it should therefore be 1, not np. The MPI standard states:
The type signature associated with sendcount, sendtype at a process must be equal to the type signature associated with recvcount, recvtype at any other process.
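In other words, keeping the variable names from the question (including erro for the error code), the corrected call would look like this:

call MPI_Allgather(rank, 1, MPI_INTEGER, &
                   allranks(0:np-1), 1, MPI_INTEGER, MPI_COMM_WORLD, erro)

Each rank contributes a single MPI_INTEGER, so both the send count and the per-process receive count are 1; the receive buffer as a whole still holds np entries.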

Using MPI_Isend with mpi_f08 subarrays for high-dimensional arrays in ifort corrupts data

I am trying to use the mpi_f08 module to do halo exchange on a series of rank 4, 5, and 6 arrays. Previously I used subarray types for this, but ended up with so many that ifort couldn't keep track of all of them and started corrupting them when compiling with -ipo.
I am using code along the lines of
call MPI_Isend(Array(1:kthird, ksizex_l, 1:ksizey_l, 1:ksizet_l, 1:size5, 1:size6), size, MPI_Double_Complex, ip_xup, 0 + tag_offset, comm, reqs(1))
call MPI_Irecv(Array(1:kthird, 0, 1:ksizey_l, 1:ksizet_l, 1:size5, 1:size6), size, MPI_Double_Complex, ip_xdn, 0 + tag_offset, comm, reqs(2))
(and then later a call to MPI_WaitAll)
ifort 2017 with Intel MPI 2017 gives the following warning for each such line:
test_mpif08.F90(51): warning #8100: The actual argument is an array section or assumed-shape array, corresponding dummy argument that has either the VOLATILE or ASYNCHRONOUS attribute shall be an assumed-shape array. [ARRAY]
In spite of this, the halo exchange works fine for rank-4 and rank-5 arrays. For rank-6 arrays, however, data goes to and comes from completely the wrong places: data from the halo on the sending process (which was not in the array section passed to MPI_Isend) appears in the bulk of the receiving process (which was not passed to MPI_Irecv).
Using ifort 2018 and Intel MPI 2019 preview gives an additional error (not warning):
test_halo_6_aio.F90(60): warning #8100: The actual argument is an array section or assumed-shape array, corresponding dummy argument that has either the VOLATILE or ASYNCHRONOUS attribute shall be an assumed-shape array. [ARRAY]
call MPI_Isend(Array(1:kthird, ksizex_l, 1:ksizey_l, 1:ksizet_l, 1:size5, 1:size6), size, MPI_Double_Complex, ip_xup, 0 + tag_offset, comm, reqs(1))
-------------------^
test_halo_6_aio.F90(60): error #7505: If an actual argument is an array section with vector subscript and corresponding dummy argument does not have VALUE attribute, it must not have ASYNCHRONOUS / VOLATILE attribute. [BUF]
call MPI_Isend(Array(1:kthird, ksizex_l, 1:ksizey_l, 1:ksizet_l, 1:size5, 1:size6), size, MPI_Double_Complex, ip_xup, 0 + tag_offset, comm, reqs(1))
^
Three interrelated questions:
Is there something incorrect about my syntax in the calls to MPI_Isend and MPI_Irecv that is causing the warnings? How can I fix it so that the warnings are no longer triggered?
Is this warning the cause of the array corruption I'm seeing with rank-6 arrays?
How can I avoid corrupting rank-6 arrays?
I've put a failing example into this gist.
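One common workaround for warning #8100 with non-blocking calls is to stage the non-contiguous section in a contiguous buffer that you manage yourself, so that no compiler-generated temporary is handed to the asynchronous call, and to keep that buffer alive until the matching wait completes. A rough sketch (the buffer name xup_sendbuf is made up for illustration):

! Hedged sketch, not the poster's code: user-managed contiguous staging buffer
complex(kind(0d0)), allocatable :: xup_sendbuf(:,:,:,:,:)

allocate(xup_sendbuf(kthird, ksizey_l, ksizet_l, size5, size6))
xup_sendbuf = Array(1:kthird, ksizex_l, 1:ksizey_l, 1:ksizet_l, 1:size5, 1:size6)
call MPI_Isend(xup_sendbuf, size, MPI_Double_Complex, ip_xup, 0 + tag_offset, comm, reqs(1))
! ... matching MPI_Irecv into a separate contiguous halo buffer ...
call MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE)
! Only after the wait is it safe to deallocate xup_sendbuf or to copy the
! received halo back into Array(:, 0, :, :, :, :)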

Seg fault when sending derived type data with an allocatable array in MPI

I'm trying to send derived type data with an allocatable array in MPI and got a seg fault.
program test_type
  use mpi
  implicit none

  type mytype
    real, allocatable :: x(:)
    integer :: a
  end type mytype

  type(mytype), allocatable :: y(:)
  type(mytype) :: z
  integer :: n, i, ierr, myid, ntasks, status, request
  integer :: datatype, oldtypes(2), blockcounts(2)
  integer(KIND=MPI_ADDRESS_KIND) :: offsets(2)

  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, myid, ierr)
  call mpi_comm_size(mpi_comm_world, ntasks, ierr)

  n = 2
  allocate(z%x(n))
  if (myid == 0) then
    allocate(y(ntasks-1))
    do i = 1, ntasks-1
      allocate(y(i)%x(n))
    enddo
  else
    call random_number(z%x)
    z%a = myid
    write(0,*) "z in process", myid, z%x, z%a
  endif

  call mpi_get_address(z%x, offsets(1), ierr)
  call mpi_get_address(z%a, offsets(2), ierr)
  offsets = offsets - offsets(1)
  oldtypes = (/ mpi_real, mpi_integer /)
  blockcounts = (/ n, 1 /)
  write(0,*) "before commit", myid, offsets, blockcounts, oldtypes
  call mpi_type_create_struct(2, blockcounts, offsets, oldtypes, datatype, ierr)
  call mpi_type_commit(datatype, ierr)
  write(0,*) "after commit", myid, datatype, ierr

  if (myid == 0) then
    do i = 1, ntasks-1
      call mpi_irecv(y(i), 1, datatype, 1, 0, mpi_comm_world, request, ierr)
      write(0,*) "received", y(i)%x, y(i)%a
    enddo
  else
    call mpi_isend(z, 1, datatype, 0, 0, mpi_comm_world, request, ierr)
    write(0,*) "sent"
    write(0,*) myid, z%x, z%a
  end if

  call mpi_finalize(ierr)
end program
And this is what I got printed out running with 2 processes:
before commit 0 0 -14898056
2 1 13 7
after commit 0 73 0
z in process 1 3.9208680E-07 2.5480442E-02 1
before commit 1 0 -491689432
2 1 13 7
after commit 1 73 0
received 0.0000000E+00 0.0000000E+00 0
forrtl: severe (174): SIGSEGV, segmentation fault occurred
It seems to get negative address offsets. Please help.
Thanks.
There are multiple issues with this code.
Allocatable arrays with most Fortran compilers are like pointers in C/C++: the real object behind the array name is something that holds a pointer to the allocated data. That data is usually allocated on the heap and that could be anywhere in the virtual address space of the process, which explains the negative offset. By the way, negative offsets are perfectly acceptable in MPI datatypes (that's why MPI_ADDRESS_KIND specifies a signed integer kind), so no big problem here.
The bigger problem is that the offsets between dynamically allocated things usually vary with each allocation. You could check that:
ADDR(y(1)%x) - ADDR(y(1)%a)
is completely different than
ADDR(y(i)%x) - ADDR(y(i)%a), for i = 2..ntasks-1
(ADDR here is just shorthand notation for the object address as returned by MPI_GET_ADDRESS.)
Even if the offsets happen to match for some value(s) of i, that is more of a coincidence than a rule.
That leads to the following: the type that you construct using offsets from the z variable cannot be used to send elements of the y array. To solve this, simply remove the allocatable property of mytype%x if that is possible (e.g. if n is known in advance).
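For the example in the question, where n = 2 is known at compile time, that first option is simply (sketch):

type mytype
  real    :: x(2)   ! fixed extent instead of allocatable
  integer :: a
end type mytype

With x stored inline in the derived type, the offset between the components is the same for z and for every element of y, so a single datatype suffices.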
Another option that should work well for small values of ntasks is to define as many MPI datatypes as the number of elements of the y array. Then use datatype(i), which is based on the offsets of y(i)%x and y(i)%a, to send y(i).
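One way to realise that second option is sketched below (not taken verbatim from this answer: it uses absolute addresses together with MPI_BOTTOM as the receive buffer, so that no base address for y(i) has to be chosen; blockcounts and oldtypes are reused from the question, and ytypes is a made-up name):

integer, allocatable :: ytypes(:)
integer(KIND=MPI_ADDRESS_KIND) :: disp(2)

allocate(ytypes(ntasks-1))
do i = 1, ntasks-1
  call mpi_get_address(y(i)%x, disp(1), ierr)   ! absolute address of the heap data
  call mpi_get_address(y(i)%a, disp(2), ierr)   ! absolute address of the integer component
  call mpi_type_create_struct(2, blockcounts, disp, oldtypes, ytypes(i), ierr)
  call mpi_type_commit(ytypes(i), ierr)
enddo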
A more severe issue is the fact that you are using non-blocking MPI operations and never wait for them to complete before accessing the data buffers. This code simply won't work:
do i = 1, ntasks-1
  call mpi_irecv(y(i), 1, datatype, 1, 0, mpi_comm_world, request, ierr)
  write(0,*) "received", y(i)%x, y(i)%a
enddo
Calling MPI_IRECV starts an asynchronous receive operation. The operation is probably still in progress by the time the WRITE statement gets executed, so completely random data is being accessed (some memory allocators might actually zero the data in debug mode). Either insert a call to MPI_WAIT in between the MPI_IRECV and WRITE calls, or use the blocking receive MPI_RECV.
A similar problem exists with the use of the non-blocking send call MPI_ISEND. Since you never wait on the completion of the request or test for it, the MPI library is allowed to postpone the actual progression of the operation indefinitely, and the send might never actually occur. Again, since there is absolutely no justification for a non-blocking send in your case, replace MPI_ISEND with MPI_SEND.
And last but not least, rank 0 is receiving messages from rank 1 only:
call mpi_irecv(y(i),1,datatype,1,0,mpi_comm_world,request,ierr)
^^^
At the same time, all other processes are sending to rank 0. Therefore, your program will only work if run with two MPI processes. You might want to replace the underlined 1 in the receive call with i.
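Putting the last three points together, the communication part could look roughly like this (a sketch only: ytypes(i) are the per-element types from the earlier sketch, and zt is a hypothetical sender-side type built in the same way from the absolute addresses of z%x and z%a):

if (myid == 0) then
  do i = 1, ntasks-1
    ! blocking receive, from rank i instead of the hard-coded 1
    call mpi_recv(MPI_BOTTOM, 1, ytypes(i), i, 0, mpi_comm_world, &
                  MPI_STATUS_IGNORE, ierr)
    write(0,*) "received", y(i)%x, y(i)%a
  enddo
else
  call mpi_send(MPI_BOTTOM, 1, zt, 0, 0, mpi_comm_world, ierr)
end if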

How do I choose my own name for the MPI communicator?

The name for the default communicator is MPI_COMM_WORLD. I want it to be mpicomm in my program. How do I set the communicator to the name I want?
Note that my program is already working using this variable name, 'mpicomm', but I never actually told the program that that should be the name of the communicator. I guess the value of MPI_COMM_WORLD is 0, and so is 'mpicomm' when I run the program, but I don't want this to work only by a fluke.
For example:
program main
  use mpi
  implicit none
  integer :: mpierr, mpicomm, rank

  call MPI_Init(mpierr)
  call MPI_Comm_rank(mpicomm, rank, mpierr)
  call MPI_Finalize(mpierr)
end program main
This works, and rank comes out to the correct value; however, I don't think this is going to work if MPI_COMM_WORLD happens to be some value besides zero.
I don't think you can rely on mpi_comm_world being 0; I just checked an mpif.h file I have lying around, and there the value is 91. I think you got lucky passing an uninitialised variable (your mpicomm) as an input argument in the call to mpi_comm_rank.
Since mpi_comm_world is, as far as Fortran is concerned, just an integer, why not insert the line
mpicomm = mpi_comm_world
before you first use mpicomm? You could probably even declare it like this:
integer, parameter :: mpicomm = mpi_comm_world
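Putting that together, a minimal corrected version of the example program could look like this (the only real change is that mpicomm explicitly receives the value of MPI_COMM_WORLD instead of being left uninitialised):

program main
  use mpi
  implicit none
  integer, parameter :: mpicomm = MPI_COMM_WORLD
  integer :: mpierr, rank

  call MPI_Init(mpierr)
  call MPI_Comm_rank(mpicomm, rank, mpierr)
  call MPI_Finalize(mpierr)
end program main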

MPI_Recv overwrites parts of memory it should not access

In the following code the value of xysize gets changed if I do not declare it as a parameter (which I generally cannot do). It happens only with optimization levels -O2 and higher, with gfortran 4.7.2 and OpenMPI 1.6. How is this possible? I cannot find the exact interface that I import from mpi.mod, but the C prototype clearly states that count is passed by value, hence it cannot change.
write(*,*) im,"receiving from",image_index([iim,jim,kim+1]),"size",&
size(D%A(D%starti:D%endi,D%startj:D%endj,D%endk)),xysize
call MPI_RECV(D%A(D%starti:D%endi,D%startj:D%endj,D%endk+1),xysize , MPI_REAL, image_index([iim,jim,kim+1])-1,&
5000, comm, status, ierr)
write(*,*) im,"received size",&
size(D%A(D%starti:D%endi,D%startj:D%endj,D%endk)),xysize
output:
1 receiving from 2 size 4096 4096
1 received size 4096 5000
For the sake of future visitors, I suppose I'll answer this even though it's all answered in the comments above.
As far as I'm aware, if your program is behaving properly, you cannot change the value of the count argument in a call to MPI_Recv.
Your status argument is too small: it should be an array status(MPI_STATUS_SIZE), so you are getting a buffer overflow. This often results in a segmentation fault, but at times (depending on how the compiler packed the variables in memory) it can result in funny behavior like this.
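For reference, the fix is just the declaration of status (a sketch, assuming it was originally declared as a scalar integer; alternatively pass MPI_STATUS_IGNORE if the status is never inspected):

integer :: status(MPI_STATUS_SIZE)   ! large enough for MPI_RECV to fill in
integer :: ierr

call MPI_RECV(D%A(D%starti:D%endi, D%startj:D%endj, D%endk+1), xysize, MPI_REAL, &
              image_index([iim, jim, kim+1]) - 1, 5000, comm, status, ierr)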