My question is probably stupid, but I'm going to ask it anyway to be sure!
Question: Do you expect the two codes below to behave the same, using MPI_Comm_split to build one sub-communicator? (For example, say I'm running the code with 6 procs whose ranks go from 0 to 5.)
NB: The code is Fortran 90 built with the Intel 2019 compiler, and I use MPICH for the MPI.
CODE 1
call Mpi_Init(ierror)
call Mpi_Comm_Rank(mpi_comm_world, rank, ierror)
if (rank > 2) then
    call Mpi_Comm_Split(mpi_comm_world, 0, rank, new_comm, ierror)
else
    call Mpi_Comm_Split(mpi_comm_world, mpi_undefined, rank, new_comm, ierror)
endif
CODE 2
call Mpi_Init(ierror)
call Mpi_Comm_Rank(mpi_comm_world, rank, ierror)
if (rank > 2) then
    color = 0
else
    color = mpi_undefined
endif
call Mpi_Comm_Split(mpi_comm_world, color, rank, new_comm, ierror)
Mpi_Comm_Split is not called from the same place in the two codes, but to me they should behave the same; I'm just not sure... I read that Mpi_Comm_Split has to be invoked on the same line, but how could procs even know whether the call to Mpi_Comm_Split happens on one line or another? (It doesn't make any sense to me!)
NB: I tested it with MPICH and Intel Fortran, and both implementations of the communicator splitting work, but I'm worried about the behavior with other MPI implementations...
Assuming you declared color correctly, both codes are equivalent.
MPI_Comm_split() is a collective operation, and hence must be invoked by all the ranks of the parent communicator. That does not mean the call has to be made from the same line of code.
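To make this concrete, here is a minimal, self-contained sketch of my own (using the mpi module) that both of your codes reduce to; ranks that pass MPI_UNDEFINED simply get MPI_COMM_NULL back:
program split_demo
    use mpi
    implicit none
    integer :: ierror, rank, color, new_comm

    call MPI_Init(ierror)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierror)

    ! Every rank of the parent communicator takes part in the collective
    ! exactly once; which source line issues the call is irrelevant.
    if (rank > 2) then
        color = 0
    else
        color = MPI_UNDEFINED   ! opt out of any new communicator
    end if
    call MPI_Comm_split(MPI_COMM_WORLD, color, rank, new_comm, ierror)

    if (new_comm == MPI_COMM_NULL) then
        print *, 'rank', rank, 'is not in the sub-communicator'
    else
        print *, 'rank', rank, 'joined the sub-communicator'
    end if

    call MPI_Finalize(ierror)
end program split_demo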
Related
In the following code the value of xysize gets changed if I do not declare it as a parameter (which I generally cannot do). It happens only with optimizations -O2 and higher, in gfortran 4.7.2 and OpenMPI 1.6. How is it possible? I cannot find the exact interface that I import from mpi.mod, but the C prototype clearly states that count is passed by value, hence it cannot change.
write(*,*) im,"receiving from",image_index([iim,jim,kim+1]),"size",&
size(D%A(D%starti:D%endi,D%startj:D%endj,D%endk)),xysize
call MPI_RECV(D%A(D%starti:D%endi,D%startj:D%endj,D%endk+1), xysize, MPI_REAL, image_index([iim,jim,kim+1])-1,&
     5000, comm, status, ierr)
write(*,*) im,"received size",&
size(D%A(D%starti:D%endi,D%startj:D%endj,D%endk)),xysize
output:
1 receiving from 2 size 4096 4096
1 received size 4096 5000
For the sake of future visitors, I suppose I'll answer this even though it's all answered in the comments above.
As far as I'm aware, if your program is behaving properly, you cannot change the value of that parameter ("count") in a call to MPI_Recv.
Your argument status is too small; it should be an array status(MPI_STATUS_SIZE), and you're getting a buffer overflow. This often results in a segmentation fault, but at times (depending on how the compiler packed the variables in memory) it can result in funny behavior like this.
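A minimal sketch of the fix; buf, count, source, and tag below are placeholders for your actual arguments:
integer :: status(MPI_STATUS_SIZE)   ! MPI writes MPI_STATUS_SIZE integers here

call MPI_RECV(buf, count, MPI_REAL, source, tag, comm, status, ierr)

! Or, if you never inspect the status, let the library skip it entirely:
call MPI_RECV(buf, count, MPI_REAL, source, tag, comm, MPI_STATUS_IGNORE, ierr)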
I have a parallel Fortran code in which I want only the rank 0 process to be able to write to stdout, but I don't want to litter the code with:
if(rank==0) write(*,*) ...
so I was wondering if doing something like the following would be a good idea, or whether there is a better way?
program test
    use mpi
    implicit none

    integer :: ierr
    integer :: nproc
    integer :: rank
    integer :: stdout

    call mpi_init(ierr)
    call mpi_comm_rank(mpi_comm_world, rank, ierr)
    call mpi_comm_size(mpi_comm_world, nproc, ierr)

    select case(rank)
    case(0)
        stdout = 6
    case default
        stdout = 7
        open(unit=stdout, file='/dev/null')
    end select

    write(stdout,*) "Hello from rank=", rank

    call mpi_finalize(ierr)
end program test
This gives:
$ mpirun -n 10 ./a.out
Hello from rank= 0
Thanks for any advice!
There are two disadvantages to your solution:
This "clever" solution actually obscures the code, since it lies: stdout isn't stdout any more. If someone reads the code he/she will think that all processes are writing to stdout, while in reality they aren't.
If you want all processes to write to stdout at some point, what will you do then? Add more tricks?
If you really want to stick with this trick, please don't use "stdout" as a variable for the unit number, but e.g. "master" or anything that indicates you're not actually writing to stdout. Furthermore, you should be aware that the number 6 isn't always stdout. Fortran 2003 allows you to check the unit number of stdout, so you should use that if you can.
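For example, the intrinsic module iso_fortran_env (Fortran 2003) exposes the real unit numbers:
program stdout_demo
    use, intrinsic :: iso_fortran_env, only: output_unit, error_unit
    implicit none
    write(output_unit, *) "this really goes to stdout"
    write(error_unit, *) "and this goes to stderr"
end program stdout_demo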
My advice would be to stay with the if(rank==0) statements. They clearly indicate what happens in the code. If you use lots of similar I/O statements, you could write subroutines that write only on rank 0, or on all processes, as sketched below. These can have meaningful names that indicate the intended usage.
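A minimal sketch of such a wrapper; the name master_write and the module layout are my own invention:
module master_io
    use mpi
    implicit none
contains
    ! Write a message on rank 0 only; a no-op everywhere else.
    subroutine master_write(msg)
        character(len=*), intent(in) :: msg
        integer :: rank, ierr
        call mpi_comm_rank(mpi_comm_world, rank, ierr)
        if (rank == 0) write(*,*) msg
    end subroutine master_write
end module master_io
Then a call like call master_write("starting solver") reads naturally and states its intent.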
mpirun comes with an option to redirect stdout from each process into separate files. For example, -output-filename out results in out.1.0, out.1.1, ..., which you can then monitor however you like (I use tail -f). Next to if(rank.eq.0), this is the cleanest solution I think.
I am not so concerned with the two disadvantages mentioned by steabert. We can work that out by introducing another file descriptor whose name clearly indicates that it is stdout only on the master process, e.g. stdout -> stdout0.
But here is my concern: /dev/null will work in a UNIX-like environment, but will it work on Windows? And how about the funky BlueGene systems?
Hi,
I am trying to use a Fortran structure like this:
type some
    u    ! actual code will have 17 such scalars
end type some
TYPE(some),ALLOCATABLE,DIMENSION(:) :: metvars,newmetvars
Now, the aim of my test program is to send 10 numbers from one processor to another, but the starting point of these 10 numbers is my choice (for example, if I have a vector of, say, 20 numbers, I won't necessarily send the first 10 to the next processor; my choice might be elements 5 to 15). So first I use mpi_type_contiguous like this:
CALL MPI_TYPE_CONTIGUOUS(10,MPI_REAL,MPI_METVARS,ierr) ! declare a derived datatype so the object maps onto contiguous memory
CALL MPI_TYPE_COMMIT(MPI_METVARS,ierr)
I do the send/recv and was able to get the first 10 numbers to the other processor (I am testing with 2 processors).
if(rank.EQ.0)then
    do k = 2, nz-1
        metvars(k)%u = k
        un(k) = k
    enddo
endif
This is what I am sending. Now, for the second part I used MPI_TYPE_CREATE_SUBARRAY:
array_size    = (/20/)
array_subsize = (/10/)
array_start   = (/5/)
CALL MPI_TYPE_CREATE_SUBARRAY(1,array_size,array_subsize,array_start,MPI_ORDER_FORTRAN,MPI_METVARS,newtype,ierr)
CALL MPI_TYPE_COMMIT(newtype,ierr)

array_size    = (/20/)
array_subsize = (/10/)
array_start   = (/0/)
CALL MPI_TYPE_CREATE_SUBARRAY(1,array_size,array_subsize,array_start,MPI_ORDER_FORTRAN,MPI_METVARS,newtype2,ierr)
CALL MPI_TYPE_COMMIT(newtype2,ierr)
if(rank .EQ. 0)then
    CALL MPI_SEND(metvars,1,newtype,1,19,MPI_COMM_WORLD,ierr)
endif
if(rank .EQ. 1)then
    CALL MPI_RECV(newmetvars,1,newtype2,0,19,MPI_COMM_WORLD,MPI_STATUS_IGNORE,ierr)
endif
I don't understand how to make this work; I get an error saying:
[flatm1001:14066] *** An error occurred in MPI_Recv
[flatm1001:14066] *** on communicator MPI_COMM_WORLD
[flatm1001:14066] *** MPI_ERR_TRUNCATE: message truncated
[flatm1001:14066] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
I use Open MPI on my local machine. I was able to use the subarray command on its own, without the mpi_type_contiguous part, but here I need to combine both, since in the real code I have a Fortran structure. I don't know if there is a better way to do it either. Any sort of help and suggestions are appreciated.
Thanks in advance
I assume your custom type contains 1 real, as it's not specified. You first construct a contiguous type of 10 of these variables, i.e. MPI_METVARS represents 10 contiguous reals. Now, I don't know if this is really the problem, as the code you posted might be incomplete, but the way it looks now, you construct a subarray of 10 MPI_METVARS types, meaning you have in effect 100 contiguous reals in newtype and newtype2.
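You can verify this directly with MPI_TYPE_SIZE; a sketch, assuming the type holds a single default (4-byte) real:
integer :: tsize, ierr

call MPI_TYPE_SIZE(newtype, tsize, ierr)
write(*,*) 'newtype carries', tsize, 'bytes'   ! 10 subarray elements x 10 reals x 4 bytes = 400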
The 'correct' way to handle the structure is to create a type for it with MPI_TYPE_CREATE_STRUCT, which should be your MPI_METVARS type.
So, please provide the correct code for your custom type and check the size of the newtype type.
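For reference, a minimal sketch of such a struct type, under my assumption that type(some) holds 17 default reals:
integer :: metvars_type, ierr
integer :: blocklen(1), types(1)
integer(kind=MPI_ADDRESS_KIND) :: disp(1)

! One block of 17 reals at displacement 0. With mixed component kinds you
! would list one block per component and compute the displacements with
! MPI_GET_ADDRESS instead of hard-coding them.
blocklen(1) = 17
disp(1)     = 0
types(1)    = MPI_REAL
CALL MPI_TYPE_CREATE_STRUCT(1, blocklen, disp, types, metvars_type, ierr)
CALL MPI_TYPE_COMMIT(metvars_type, ierr)
If the compiler pads the derived type, you would additionally resize it (MPI_TYPE_CREATE_RESIZED) so its extent matches the Fortran storage size.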