In the following code the value of xysize changes unless I declare it as a parameter (which I generally cannot do). It happens only at optimization level -O2 and higher with gfortran 4.7.2 and OpenMPI 1.6. How is this possible? I cannot find the exact interface that I import from mpi.mod, but the C prototype clearly states that count is passed by value, so it should not be able to change.
write(*,*) im,"receiving from",image_index([iim,jim,kim+1]),"size",&
size(D%A(D%starti:D%endi,D%startj:D%endj,D%endk)),xysize
call MPI_RECV(D%A(D%starti:D%endi,D%startj:D%endj,D%endk+1),xysize , MPI_REAL, image_index([iim,jim,kim+1])-1,&
5000, comm, status, ierr)
write(*,*) im,"received size",&
size(D%A(D%starti:D%endi,D%startj:D%endj,D%endk)),xysize
output:
1 receiving from 2 size 4096 4096
1 received size 4096 5000
For the sake of future visitors, I suppose I'll answer this even though it's all answered in the comments above.
As far as I'm aware, if your program is behaving properly, you cannot change the value of that parameter ("count") in a call to MPI_Recv.
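Your argument status is too small: it should be an array status(MPI_STATUS_SIZE). MPI_RECV writes MPI_STATUS_SIZE integers into it, so you're getting a buffer overflow. This often results in a segmentation fault, but at times (depending on how the compiler laid out the variables in memory) it can result in funny behavior like this. The 5000 that ends up in xysize is the message tag, which suggests that the MPI_TAG entry of the status landed on xysize.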
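A minimal sketch of the fix, assuming the mpi module and at least two ranks. The program name, the buffer buf and the length n are made up for illustration; only the declaration of status matters.

```fortran
program recv_status_demo
  use mpi
  implicit none
  integer, parameter :: n = 4096       ! hypothetical message length
  real    :: buf(n)
  integer :: xysize, rank, ntasks, ierr
  integer :: status(MPI_STATUS_SIZE)   ! an array, not a scalar integer

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, ntasks, ierr)
  xysize = n

  if (ntasks >= 2) then
    if (rank == 1) then
      buf = 1.0
      call MPI_SEND(buf, xysize, MPI_REAL, 0, 5000, MPI_COMM_WORLD, ierr)
    else if (rank == 0) then
      call MPI_RECV(buf, xysize, MPI_REAL, 1, 5000, MPI_COMM_WORLD, status, ierr)
      ! The tag is reported through the status array; xysize is left alone.
      write(*,*) "count", xysize, "tag", status(MPI_TAG)
    end if
  end if

  call MPI_FINALIZE(ierr)
end program recv_status_demo
```

If the status is not needed at all, passing MPI_STATUS_IGNORE instead of a status array avoids the declaration entirely.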
I am trying to use the mpi_f08 module to do halo exchange on a series of rank 4, 5, and 6 arrays. Previously I used subarray types for this, but ended up with so many that ifort couldn't keep track of all of them and started corrupting them when compiling with -ipo.
I am using code along the lines of
call MPI_Isend(Array(1:kthird, ksizex_l, 1:ksizey_l, 1:ksizet_l, 1:size5, 1:size6), size, MPI_Double_Complex, ip_xup, 0 + tag_offset, comm, reqs(1))
call MPI_Irecv(Array(1:kthird, 0, 1:ksizey_l, 1:ksizet_l, 1:size5, 1:size6), size, MPI_Double_Complex, ip_xdn, 0 + tag_offset, comm, reqs(2))
(and then later a call to MPI_WaitAll)
ifort 2017 with Intel MPI 2017 gives the following warning for each such line:
test_mpif08.F90(51): warning #8100: The actual argument is an array section or assumed-shape array, corresponding dummy argument that has either the VOLATILE or ASYNCHRONOUS attribute shall be an assumed-shape array. [ARRAY]
In spite of this, the halo exchange works fine for rank-4 and -5 arrays. However, when it comes to rank-6 arrays, array data goes to and comes from completely the wrong places, with data from the halo on the sending process (which was not in the array segment passed into MPI_Isend) appearing in the bulk of the receiving process (which was not passed into MPI_Irecv).
Using ifort 2018 and Intel MPI 2019 preview gives an additional error (not warning):
test_halo_6_aio.F90(60): warning #8100: The actual argument is an array section or assumed-shape array, corresponding dummy argument that has either the VOLATILE or ASYNCHRONOUS attribute shall be an assumed-shape array. [ARRAY]
call MPI_Isend(Array(1:kthird, ksizex_l, 1:ksizey_l, 1:ksizet_l, 1:size5, 1:size6), size, MPI_Double_Complex, ip_xup, 0 + tag_offset, comm, reqs(1))
-------------------^
test_halo_6_aio.F90(60): error #7505: If an actual argument is an array section with vector subscript and corresponding dummy argument does not have VALUE attribute, it must not have ASYNCHRONOUS / VOLATILE attribute. [BUF]
call MPI_Isend(Array(1:kthird, ksizex_l, 1:ksizey_l, 1:ksizet_l, 1:size5, 1:size6), size, MPI_Double_Complex, ip_xup, 0 + tag_offset, comm, reqs(1))
^
Three interrelated questions:
Is there something incorrect about my syntax in the calls to MPI_Isend and MPI_Irecv that is causing the warnings? How can I fix it so that the warnings are no longer triggered?
Is this warning the cause of the array corruption I'm seeing with rank-6 arrays?
How can I avoid corrupting rank-6 arrays?
I've put a failing example into this gist.
I'm trying to send derived-type data with an allocatable array in MPI and got a seg fault.
program test_type
use mpi
implicit none
type mytype
real,allocatable::x(:)
integer::a
end type mytype
type(mytype),allocatable::y(:)
type(mytype)::z
integer::n,i,ierr,myid,ntasks,status,request
integer :: datatype, oldtypes(2), blockcounts(2)
integer(KIND=MPI_ADDRESS_KIND) :: offsets(2)
call mpi_init(ierr)
call mpi_comm_rank(mpi_comm_world,myid,ierr)
call mpi_comm_size(mpi_comm_world,ntasks,ierr)
n=2
allocate(z%x(n))
if(myid==0)then
allocate(y(ntasks-1))
do i=1,ntasks-1
allocate(y(i)%x(n))
enddo
else
call random_number(z%x)
z%a=myid
write(0,*) "z in process", myid, z%x, z%a
endif
call mpi_get_address(z%x,offsets(1),ierr)
call mpi_get_address(z%a,offsets(2),ierr)
offsets=offsets-offsets(1)
oldtypes=(/ mpi_real,mpi_integer /)
blockcounts=(/ n,1 /)
write(0,*) "before commit",myid,offsets,blockcounts,oldtypes
call mpi_type_create_struct(2,blockcounts,offsets,oldtypes,datatype,ierr)
call mpi_type_commit(datatype, ierr)
write(0,*) "after commit",myid,datatype, ierr
if(myid==0) then
do i=1,ntasks-1
call mpi_irecv(y(i),1,datatype,1,0,mpi_comm_world,request,ierr)
write(0,*) "received", y(i)%x,y(i)%a
enddo
else
call mpi_isend(z,1,datatype,0,0,mpi_comm_world,request,ierr)
write(0,*) "sent"
write(0,*) myid, z%x, z%a
end if
call mpi_finalize(ierr)
end program
And this is what I got printed out running with 2 processes:
before commit 0 0 -14898056
2 1 13 7
after commit 0 73 0
z in process 1 3.9208680E-07 2.5480442E-02 1
before commit 1 0 -491689432
2 1 13 7
after commit 1 73 0
received 0.0000000E+00 0.0000000E+00 0
forrtl: severe (174): SIGSEGV, segmentation fault occurred
It seems to get negative address offsets. Please help.
Thanks.
There are multiple issues with this code.
Allocatable arrays with most Fortran compilers are like pointers in C/C++: the real object behind the array name is something that holds a pointer to the allocated data. That data is usually allocated on the heap and that could be anywhere in the virtual address space of the process, which explains the negative offset. By the way, negative offsets are perfectly acceptable in MPI datatypes (that's why MPI_ADDRESS_KIND specifies a signed integer kind), so no big problem here.
The bigger problem is that the offsets between dynamically allocated things usually vary with each allocation. You could check that:
ADDR(y(1)%x) - ADDR(y(1)%a)
is completely different than
ADDR(y(i)%x) - ADDR(y(i)%a), for i = 2..ntasks-1
(ADDR here is just shorthand notation for the object address as returned by MPI_GET_ADDRESS)
Even if it happens the offsets match for some value(s) of i, that is more of a coincidence than a rule.
That leads to the following: the type that you construct using offsets from the z variable cannot be used to send elements of the y array. To solve this, simply remove the allocatable attribute of mytype%x if that is possible (e.g. if n is known in advance).
Another option that should work well for small values of ntasks is to define as many MPI datatypes as the number of elements of the y array. Then use datatype(i), which is based on the offsets of y(i)%x and y(i)%a, to send y(i).
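A rough sketch of the one-datatype-per-element idea follows. It is a variation rather than exactly what is described above: each type is built from absolute addresses and used with MPI_BOTTOM as the buffer, so no base address of y(i) is needed. The helper make_type and the array dtypes are hypothetical names. The TARGET attributes are an assumption meant to keep the compiler from copying the objects or caching their values across the MPI calls.

```fortran
program per_element_types
  use mpi
  implicit none
  type mytype
    real, allocatable :: x(:)
    integer :: a
  end type mytype
  integer, parameter :: n = 2
  type(mytype), allocatable, target :: y(:)
  type(mytype), target :: z
  integer, allocatable :: dtypes(:)
  integer :: i, ierr, myid, ntasks
  integer :: status(MPI_STATUS_SIZE)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, ntasks, ierr)

  if (myid == 0) then
    allocate(y(ntasks-1), dtypes(ntasks-1))
    do i = 1, ntasks-1
      allocate(y(i)%x(n))
      call make_type(y(i), dtypes(i))     ! one datatype per element of y
      ! Absolute addresses are stored in the type, so the buffer is MPI_BOTTOM.
      call MPI_RECV(MPI_BOTTOM, 1, dtypes(i), i, 0, MPI_COMM_WORLD, status, ierr)
      write(*,*) "received", y(i)%x, y(i)%a
      call MPI_TYPE_FREE(dtypes(i), ierr)
    end do
  else
    allocate(z%x(n), dtypes(1))
    call random_number(z%x)
    z%a = myid
    call make_type(z, dtypes(1))
    call MPI_SEND(MPI_BOTTOM, 1, dtypes(1), 0, 0, MPI_COMM_WORLD, ierr)
    call MPI_TYPE_FREE(dtypes(1), ierr)
  end if

  call MPI_FINALIZE(ierr)

contains

  ! Hypothetical helper: describe one mytype object by the absolute
  ! addresses of its allocated array data and its integer component.
  subroutine make_type(item, dtype)
    type(mytype), intent(in), target :: item
    integer, intent(out) :: dtype
    integer(kind=MPI_ADDRESS_KIND) :: addrs(2)
    integer :: ierr
    call MPI_GET_ADDRESS(item%x, addrs(1), ierr)
    call MPI_GET_ADDRESS(item%a, addrs(2), ierr)
    call MPI_TYPE_CREATE_STRUCT(2, [size(item%x), 1], addrs, &
                                [MPI_REAL, MPI_INTEGER], dtype, ierr)
    call MPI_TYPE_COMMIT(dtype, ierr)
  end subroutine make_type

end program per_element_types
```

A type built this way is only valid for the object it was created from, and only until that object's x component is reallocated.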
A more severe issue is the fact that you are using non-blocking MPI operations and never wait for them to complete before accessing the data buffers. This code simply won't work:
do i=1,ntasks-1
call mpi_irecv(y(i),1,datatype,1,0,mpi_comm_world,request,ierr)
write(0,*) "received", y(i)%x,y(i)%a
enddo
Calling MPI_IRECV starts an asynchronous receive operation. The operation is probably still in progress by the time the WRITE statement gets executed, so the data being accessed is essentially random (some memory allocators might actually zero the data in debug mode). Either insert a call to MPI_WAIT between the MPI_IRECV and WRITE calls or use the blocking receive MPI_RECV.
A similar problem exists with the use of the non-blocking send call MPI_ISEND. Since you never wait on the completion of the request or test for it, the MPI library is allowed to postpone indefinitely the actual progression of the operation and the send might never actually occur. Again, since there is absolutely no justification for the use of the non-blocking send in your case, replace MPI_ISEND by MPI_SEND.
And last but not least, rank 0 is receiving messages from rank 1 only:
call mpi_irecv(y(i),1,datatype,1,0,mpi_comm_world,request,ierr)
^^^
At the same time, all other processes are sending to rank 0. Therefore, your program will only work if run with two MPI processes. You might want to replace the underlined 1 in the receive call with i.
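The three communication fixes above (wait before reading the buffer, complete the send, receive from rank i) can be sketched as follows. To keep the example independent of the datatype problem it exchanges plain real arrays. The names inbox, outbox and requests are made up for illustration.

```fortran
program wait_before_use
  use mpi
  implicit none
  integer, parameter :: n = 2
  real, allocatable :: inbox(:,:)
  real :: outbox(n)
  integer, allocatable :: requests(:)
  integer :: i, ierr, myid, ntasks, request

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, ntasks, ierr)

  if (myid == 0) then
    allocate(inbox(n, ntasks-1), requests(ntasks-1))
    do i = 1, ntasks-1
      ! Source is rank i, not the constant 1. Passing the first element of
      ! column i hands over the address of that contiguous column.
      call MPI_IRECV(inbox(1, i), n, MPI_REAL, i, 0, MPI_COMM_WORLD, &
                     requests(i), ierr)
    end do
    ! The receive buffers may only be read after the requests complete.
    call MPI_WAITALL(ntasks-1, requests, MPI_STATUSES_IGNORE, ierr)
    do i = 1, ntasks-1
      write(*,*) "received from", i, inbox(:, i)
    end do
  else
    call random_number(outbox)
    call MPI_ISEND(outbox, n, MPI_REAL, 0, 0, MPI_COMM_WORLD, request, ierr)
    ! Without a wait (or test) the send is never guaranteed to progress.
    call MPI_WAIT(request, MPI_STATUS_IGNORE, ierr)
    write(*,*) "sent", myid, outbox
  end if

  call MPI_FINALIZE(ierr)
end program wait_before_use
```

Posting all receives first and waiting once is one possible arrangement. Replacing the pairs with blocking MPI_RECV and MPI_SEND, as suggested above, would be simpler and equally correct here.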