Using an MPI subroutine changes the value of a variable [duplicate] - fortran

In the following code the value of xysize gets changed if I do not declare it as a parameter (which I generally cannot do). It happens only with optimizations -O2 and higher in gfortran 4.7.2 and OpenMPI 1.6. How is this possible? I cannot find the exact interface that I import from mpi.mod, but the C prototype clearly states that count is passed by value, hence it cannot change.
write(*,*) im,"receiving from",image_index([iim,jim,kim+1]),"size",&
           size(D%A(D%starti:D%endi,D%startj:D%endj,D%endk)),xysize
call MPI_RECV(D%A(D%starti:D%endi,D%startj:D%endj,D%endk+1),xysize,MPI_REAL,image_index([iim,jim,kim+1])-1,&
              5000,comm,status,ierr)
write(*,*) im,"received size",&
           size(D%A(D%starti:D%endi,D%startj:D%endj,D%endk)),xysize
output:
1 receiving from 2 size 4096 4096
1 received size 4096 5000

For the sake of future visitors, I suppose I'll answer this even though it's all answered in the comments above.
As far as I'm aware, if your program is behaving properly, you cannot change the value of that parameter ("count") in a call to MPI_Recv.
Your status argument is too small: it should be an array, status(MPI_STATUS_SIZE), and you're getting a buffer overflow. This often results in a segmentation fault, but at times (depending on how the compiler packed the variables in memory) it can result in funny behavior like this.
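A minimal sketch of the fix (buf and source stand in for your actual receive buffer and source rank; xysize, comm, and the tag 5000 are as in your code):

integer :: status(MPI_STATUS_SIZE)   ! an array, not a scalar
integer :: ierr

! the receive writes the source, tag, and error fields into status,
! so all MPI_STATUS_SIZE elements must be there to write into
call MPI_RECV(buf, xysize, MPI_REAL, source, 5000, comm, status, ierr)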

Related

Segmentation fault error in Fortran MPI code [duplicate]

Why do all process ranks become 0 after MPI_IRecv? [duplicate]

seg fault when sending derived type data with allocatable array in mpi

I'm trying to send derived type data with an allocatable array in MPI and got a segfault.
program test_type
  use mpi
  implicit none

  type mytype
    real, allocatable :: x(:)
    integer :: a
  end type mytype

  type(mytype), allocatable :: y(:)
  type(mytype) :: z
  integer :: n, i, ierr, myid, ntasks, status, request
  integer :: datatype, oldtypes(2), blockcounts(2)
  integer(KIND=MPI_ADDRESS_KIND) :: offsets(2)

  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, myid, ierr)
  call mpi_comm_size(mpi_comm_world, ntasks, ierr)

  n = 2
  allocate(z%x(n))
  if (myid == 0) then
    allocate(y(ntasks-1))
    do i = 1, ntasks-1
      allocate(y(i)%x(n))
    enddo
  else
    call random_number(z%x)
    z%a = myid
    write(0,*) "z in process", myid, z%x, z%a
  endif

  call mpi_get_address(z%x, offsets(1), ierr)
  call mpi_get_address(z%a, offsets(2), ierr)
  offsets = offsets - offsets(1)
  oldtypes = (/ mpi_real, mpi_integer /)
  blockcounts = (/ n, 1 /)
  write(0,*) "before commit", myid, offsets, blockcounts, oldtypes
  call mpi_type_create_struct(2, blockcounts, offsets, oldtypes, datatype, ierr)
  call mpi_type_commit(datatype, ierr)
  write(0,*) "after commit", myid, datatype, ierr

  if (myid == 0) then
    do i = 1, ntasks-1
      call mpi_irecv(y(i), 1, datatype, 1, 0, mpi_comm_world, request, ierr)
      write(0,*) "received", y(i)%x, y(i)%a
    enddo
  else
    call mpi_isend(z, 1, datatype, 0, 0, mpi_comm_world, request, ierr)
    write(0,*) "sent"
    write(0,*) myid, z%x, z%a
  end if

  call mpi_finalize(ierr)
end program
And this is what I got printed out running with 2 processes:
before commit 0 0 -14898056 2 1 13 7
after commit 0 73 0
z in process 1 3.9208680E-07 2.5480442E-02 1
before commit 1 0 -491689432 2 1 13 7
after commit 1 73 0
received 0.0000000E+00 0.0000000E+00 0
forrtl: severe (174): SIGSEGV, segmentation fault occurred
It seems to get negative address offsets. Please help.
Thanks.
There are multiple issues with this code.
Allocatable arrays with most Fortran compilers are like pointers in C/C++: the real object behind the array name is something that holds a pointer to the allocated data. That data is usually allocated on the heap and that could be anywhere in the virtual address space of the process, which explains the negative offset. By the way, negative offsets are perfectly acceptable in MPI datatypes (that's why MPI_ADDRESS_KIND specifies a signed integer kind), so no big problem here.
The bigger problem is that the offsets between dynamically allocated things usually vary with each allocation. You could check that:
ADDR(y(1)%x) - ADDR(y(1)%a)
is completely different than
ADDR(y(i)%x) - ADDR(y(i)%a), for i = 2..ntasks-1
(ADDR here is just shorthand notation for the object address as returned by MPI_GET_ADDRESS.)
Even if the offsets happen to match for some value(s) of i, that is more of a coincidence than a rule.
That leads to the following: the type that you construct using offsets from the z variable cannot be used to send elements of the y array. To solve this, simply remove the allocatable property of mytype%x if that is possible (e.g. if n is known in advance).
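For example (a sketch, valid only when the extent is fixed at compile time, as it is here with n = 2):

type mytype
  real :: x(2)      ! fixed size instead of allocatable
  integer :: a
end type mytype

With a fixed-size component every variable of the type has the same layout, so a single committed datatype describes z and every element of y alike.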
Another option that should work well for small values of ntasks is to define as many MPI datatypes as the number of elements of the y array. Then use datatype(i), which is based on the offsets of y(i)%x and y(i)%a, to send y(i).
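A sketch of that option (datatypes is a hypothetical array holding one committed handle per element; oldtypes, blockcounts, and offsets are as in your code):

integer, allocatable :: datatypes(:)

if (myid == 0) then
  allocate(datatypes(ntasks-1))
  do i = 1, ntasks-1
    ! offsets measured within y(i) itself, not within z
    call mpi_get_address(y(i)%x, offsets(1), ierr)
    call mpi_get_address(y(i)%a, offsets(2), ierr)
    offsets = offsets - offsets(1)
    call mpi_type_create_struct(2, blockcounts, offsets, oldtypes, datatypes(i), ierr)
    call mpi_type_commit(datatypes(i), ierr)
  enddo
endif

Rank 0 then receives y(i) using datatypes(i), while the workers keep sending z with the type built from z's own offsets; the type signatures of the two sides still match.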
A more severe issue is the fact that you are using non-blocking MPI operations and never wait for them to complete before accessing the data buffers. This code simply won't work:
do i=1,ntasks-1
  call mpi_irecv(y(i),1,datatype,1,0,mpi_comm_world,request,ierr)
  write(0,*) "received", y(i)%x,y(i)%a
enddo
Calling MPI_IRECV starts an asynchronous receive operation. The operation is probably still in progress by the time the WRITE statement gets executed, so completely random data is being accessed (some memory allocators might actually zero the data in debug mode). Either insert a call to MPI_WAIT between the MPI_IRECV and WRITE calls or use the blocking receive MPI_RECV.
A similar problem exists with the use of the non-blocking send call MPI_ISEND. Since you never wait on the completion of the request or test for it, the MPI library is allowed to postpone the actual progression of the operation indefinitely, and the send might never actually occur. Since there is absolutely no justification for a non-blocking send in your case, replace MPI_ISEND with MPI_SEND.
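If you do keep the non-blocking receive, the pattern would look like this (a sketch; the source rank i anticipates the fix discussed below):

call mpi_irecv(y(i), 1, datatype, i, 0, mpi_comm_world, request, ierr)
! complete the receive before touching y(i)
call mpi_wait(request, MPI_STATUS_IGNORE, ierr)
write(0,*) "received", y(i)%x, y(i)%a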
And last but not least, rank 0 is receiving messages from rank 1 only:
call mpi_irecv(y(i),1,datatype,1,0,mpi_comm_world,request,ierr)
                               ^
At the same time, all other processes are sending to rank 0. Therefore, your program will only work if run with two MPI processes. You might want to replace the underlined 1 in the receive call with i.
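Putting the last two fixes together, a blocking version of the exchange could look like this sketch (using the hypothetical per-element datatypes(i) from the earlier sketch on the receiving side):

if (myid == 0) then
  do i = 1, ntasks-1
    ! blocking receive from worker i, with that element's own datatype
    call mpi_recv(y(i), 1, datatypes(i), i, 0, mpi_comm_world, &
                  MPI_STATUS_IGNORE, ierr)
    write(0,*) "received", y(i)%x, y(i)%a
  enddo
else
  call mpi_send(z, 1, datatype, 0, 0, mpi_comm_world, ierr)
endif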

mpi alters a variable it shouldn't [duplicate]

This question already has an answer here:
MPI_Recv overwrites parts of memory it should not access
(1 answer)
Closed 3 years ago.
I have some Fortran code that I'm parallelizing with MPI which is doing truly bizarre things. First, there's a variable nstartg that I broadcast from the boss process to all the workers:
call mpi_bcast(nstartg,1,mpi_integer,0,mpi_comm_world,ierr)
The variable nstartg is never altered again in the program. Later on, I have the boss process send eproc elements of an array edge to the workers:
if (me==0) then
  do n=1,ntasks-1
    ! determine the starting point estart and the number eproc
    ! of values to send
    call mpi_send(edge(estart),eproc,mpi_integer,n,n,mpi_comm_world,ierr)
  enddo
endif
with a matching receive statement if me is non-zero. (I've left out some other code for readability; there's a good reason I'm not using scatterv.)
Here's where things get weird: the variable nstartg gets altered to n instead of keeping its actual value. For example, on process 1, after the mpi_recv, nstartg = 1, and on process 2 it's equal to 2, and so forth. Moreover, if I change the code above to
call mpi_send(edge(estart),eproc,mpi_integer,n,n+1234567,mpi_comm_world,ierr)
and change the tag accordingly in the matching call to mpi_recv, then on process 1, nstartg = 1234568; on process 2, nstartg = 1234569, etc.
What on earth is going on? All I've changed is the tag that mpi_send/recv are using to identify the message; provided the tags are unique so that the messages don't get mixed up, this shouldn't change anything, and yet it's altering a totally unrelated variable.
On the boss process, nstartg is unaltered, so I can fix this by broadcasting it again, but that's hardly a real solution. Finally, I should mention that compiling and running this code using electric fence hasn't picked up any buffer overflows, nor did -fbounds-check throw anything at me.
The most probable cause is that you pass an INTEGER scalar as the actual status argument to MPI_RECV, when it should really be declared as an array with an implementation-specific size, available as the MPI_STATUS_SIZE constant:
INTEGER, DIMENSION(MPI_STATUS_SIZE) :: status
or
INTEGER status(MPI_STATUS_SIZE)
The message tag is written to one of the status fields by the receive operation (its implementation-specific index is available as the MPI_TAG constant, and the field value can be accessed as status(MPI_TAG)). If your status is simply a scalar INTEGER, several other local variables get overwritten; in your case it simply happens that nstartg falls just above status on the stack.
If you do not care about the receive status, you can pass the special constant MPI_STATUS_IGNORE instead.
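A sketch of both variants (buf, n, and tag are placeholder names, not from your code):

INTEGER :: status(MPI_STATUS_SIZE)

! the receive fills the status array; the tag lands in status(MPI_TAG)
CALL MPI_RECV(buf, n, MPI_INTEGER, 0, tag, MPI_COMM_WORLD, status, ierr)
WRITE(*,*) 'received tag:', status(MPI_TAG)

! if the status is of no interest, pass the special constant instead
CALL MPI_RECV(buf, n, MPI_INTEGER, 0, tag, MPI_COMM_WORLD, &
              MPI_STATUS_IGNORE, ierr)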

MPI_Recv overwrites parts of memory it should not access
