There are 4 processes and one of them (rank 0) is the master, which has to build the matrix C as follows:
-1 0 0 -1 0
0 -1 0 0 -1
-1 1 1 -1 1
1 -1 1 1 -1
-1 2 2 -1 2
2 -1 2 2 -1
-1 3 3 -1 3
3 -1 3 3 -1
To do so, the matrix is declared as REAL, DIMENSION(:,:), ALLOCATABLE :: C and allocated with
IF (myid == 0) THEN
ALLOCATE(C(2*nprocs,-2:+2))
END IF
where nprocs is the number of processes. Process 0 also sets C = -1. For the communications I first tried with
CALL MPI_GATHER((/0.0+myid,0.0+myid/),&
& 2,MPI_REAL,&
& C(:,0),&
& 2,MPI_REAL,&
& 0,MPI_COMM_WORLD,ierr)
to fill up the central column, and this worked.
Then I tried with
CALL MPI_GATHER((/0.0+myid, 0.0+myid, 0.0+myid, 0.0+myid/),&
& 4,MPI_REAL,&
& (/C(1:2*nprocs:2,-1),C(2:2*nprocs:2,-2),C(1:2*nprocs:2,+2),C(2:2*nprocs:2,+1)/),&
& 4,MPI_REAL,&
& 0,MPI_COMM_WORLD,ierr)
to fill the other columns, but it didn't work, giving errors like the following
Fortran runtime error: Index '1' of dimension 1 of array 'c' outside of expected range (140735073734712:140735073734712).
To understand why, I tried to fill the first column alone with the call
CALL MPI_GATHER((/0.0-myid/),&
& 1,MPI_REAL,&
& C(1:2*nprocs:2,-2),&
& 1,MPI_REAL,&
& 0,MPI_COMM_WORLD,ierr)
but the same happened, more or less.
I solved the problem by allocating C for all the processes (i.e. regardless of the process id). Why does this make the call work?
After this, before trying again to fill all the columns at once, I made a small change: I simply wrapped the receive buffer in (/.../)
CALL MPI_GATHER((/0.0-myid/),&
& 1,MPI_REAL,&
& (/C(1:2*nprocs:2,-2)/),&
& 1,MPI_REAL,&
& 0,MPI_COMM_WORLD,ierr)
but this makes the call ineffective (no errors, but not even one element in C changed).
I hope someone can explain to me:
what is wrong with the constructor (/.../) in the receive buffer?
why does the receive buffer have to be allocated on the non-root processes?
is it necessary to use MPI_GATHERV to accomplish the task?
is there a better way to build up such a matrix?
EDIT
Is it possible to use MPI derived data types to build the matrix?
First of all, do use use mpi instead of include 'mpif.h' if you are not doing so already; the compiler may then catch some of these errors.
You cannot use an array constructor as a receive buffer. Why? The array created by a constructor is an expression. You cannot use it where a variable is required.
In the same way, you cannot pass 1+1 to a subroutine which changes its argument: 1+1 is an expression, and you need a variable if it is to be changed.
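To make the distinction concrete, here is a tiny sketch; the subroutine increment is purely illustrative and not part of the question's code:

program expr_vs_var
  implicit none
  integer :: n
  n = 1
  call increment(n)           ! fine: n is a variable and can be modified
  ! call increment(1 + 1)     ! invalid: 1+1 is an expression
  ! call increment((/1, 2/))  ! invalid for the same reason as a (/.../) receive buffer
  print *, n
contains
  subroutine increment(i)
    integer, intent(inout) :: i
    i = i + 1
  end subroutine increment
end program expr_vs_var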
Secondly, every array into which you write or from which you read must be allocated. In MPI_Gather the receive buffer is ignored on all non-root processes, BUT when you take an array section such as C(1:2*nprocs:2,-2) out of C, the parent array C must be allocated. This is a Fortran requirement, not an MPI one.
If the number of elements received from each rank is the same, you can use MPI_Gather; you don't need MPI_Gatherv.
You may consider just receiving the data into a 1D buffer and reorder them as necessary. Another option is to decompose it along the last dimension instead.
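For instance, here is a rough sketch of the first option applied to the matrix in the question; the buffer name buf and the exact reorder loop are just one possible arrangement, not the only way to do it:

program gather_reorder
  use mpi
  implicit none
  real, allocatable :: C(:,:), buf(:,:)
  real :: sendbuf(4)
  integer :: myid, nprocs, ierr, p

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  if (myid == 0) then
     allocate(C(2*nprocs,-2:+2))
     C = -1.0
  end if

  ! plain whole-array receive buffer; allocated on every rank because Fortran
  ! requires the actual argument to be allocated even where MPI ignores it
  allocate(buf(4,nprocs))
  sendbuf = real(myid)          ! the four values each rank contributes

  call MPI_GATHER(sendbuf, 4, MPI_REAL, buf, 4, MPI_REAL, 0, MPI_COMM_WORLD, ierr)

  ! the root reorders the gathered values into the strided sections of C;
  ! the central column can still be filled with the gather that already works
  if (myid == 0) then
     do p = 0, nprocs-1
        C(2*p+1,-1) = buf(1,p+1)
        C(2*p+2,-2) = buf(2,p+1)
        C(2*p+1,+2) = buf(3,p+1)
        C(2*p+2,+1) = buf(4,p+1)
     end do
  end if

  call MPI_FINALIZE(ierr)
end program gather_reorder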
I'm trying to send derived-type data with an allocatable array in MPI and got a segmentation fault.
program test_type
use mpi
implicit none
type mytype
real,allocatable::x(:)
integer::a
end type mytype
type(mytype),allocatable::y(:)
type(mytype)::z
integer::n,i,ierr,myid,ntasks,status,request
integer :: datatype, oldtypes(2), blockcounts(2)
integer(KIND=MPI_ADDRESS_KIND) :: offsets(2)
call mpi_init(ierr)
call mpi_comm_rank(mpi_comm_world,myid,ierr)
call mpi_comm_size(mpi_comm_world,ntasks,ierr)
n=2
allocate(z%x(n))
if(myid==0)then
allocate(y(ntasks-1))
do i=1,ntasks-1
allocate(y(i)%x(n))
enddo
else
call random_number(z%x)
z%a=myid
write(0,*) "z in process", myid, z%x, z%a
endif
call mpi_get_address(z%x,offsets(1),ierr)
call mpi_get_address(z%a,offsets(2),ierr)
offsets=offsets-offsets(1)
oldtypes=(/ mpi_real,mpi_integer /)
blockcounts=(/ n,1 /)
write(0,*) "before commit",myid,offsets,blockcounts,oldtypes
call mpi_type_create_struct(2,blockcounts,offsets,oldtypes,datatype,ierr)
call mpi_type_commit(datatype, ierr)
write(0,*) "after commit",myid,datatype, ierr
if(myid==0) then
do i=1,ntasks-1
call mpi_irecv(y(i),1,datatype,1,0,mpi_comm_world,request,ierr)
write(0,*) "received", y(i)%x,y(i)%a
enddo
else
call mpi_isend(z,1,datatype,0,0,mpi_comm_world,request,ierr)
write(0,*) "sent"
write(0,*) myid, z%x, z%a
end if
call mpi_finalize(ierr)
end program
And this is what I got printed out running with 2 processes:
before commit 0 0 -14898056
2 1 13 7
after commit 0 73 0
z in process 1 3.9208680E-07 2.5480442E-02 1
before commit 1 0 -491689432
2 1 13 7
after commit 1 73 0
received 0.0000000E+00 0.0000000E+00 0
forrtl: severe (174): SIGSEGV, segmentation fault occurred
It seems to get negative address offsets. Please help.
Thanks.
There are multiple issues with this code.
Allocatable arrays with most Fortran compilers are like pointers in C/C++: the real object behind the array name is something that holds a pointer to the allocated data. That data is usually allocated on the heap and that could be anywhere in the virtual address space of the process, which explains the negative offset. By the way, negative offsets are perfectly acceptable in MPI datatypes (that's why MPI_ADDRESS_KIND specifies a signed integer kind), so no big problem here.
The bigger problem is that the offsets between dynamically allocated things usually vary with each allocation. You could check that:
ADDR(y(1)%x) - ADDR(y(1)%a)
is completely different than
ADDR(y(i)%x) - ADDR(y(i)%a), for i = 2..ntasks-1
(ADDR here is just a shorthand notation for the object address as returned by MPI_GET_ADDRESS)
Even if the offsets happen to match for some value(s) of i, that is more of a coincidence than a rule.
That leads to the following: the type that you construct using offsets from the z variable cannot be used to send elements of the y array. To solve this, simply remove the allocatable property of mytype%x if that is possible (e.g. if n is known in advance).
Another option that should work well for small values of ntasks is to define as many MPI datatypes as the number of elements of the y array. Then use datatype(i), which is based on the offsets of y(i)%x and y(i)%a, to receive into y(i).
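One hedged way to realize that idea, building on the declarations already in the program above, is to keep the absolute addresses returned by MPI_GET_ADDRESS (no subtraction) and later pass MPI_BOTTOM as the receive buffer, so that each committed type points directly at the heap data of its own y(i); the handle array dtypes is hypothetical, added only for illustration:

integer, allocatable :: dtypes(:)

if (myid == 0) then
   allocate(dtypes(ntasks-1))
   do i = 1, ntasks-1
      call mpi_get_address(y(i)%x, offsets(1), ierr)
      call mpi_get_address(y(i)%a, offsets(2), ierr)
      ! absolute addresses: the buffer in the matching receive will be MPI_BOTTOM
      call mpi_type_create_struct(2, blockcounts, offsets, oldtypes, dtypes(i), ierr)
      call mpi_type_commit(dtypes(i), ierr)
   end do
end if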
A more severe issue is the fact that you are using non-blocking MPI operations and never wait for them to complete before accessing the data buffers. This code simply won't work:
do i=1,ntasks-1
call mpi_irecv(y(i),1,datatype,1,0,mpi_comm_world,request,ierr)
write(0,*) "received", y(i)%x,y(i)%a
enddo
Calling MPI_IRECV starts an asynchronous receive operation. The operation is probably still in progress by the time the WRITE statement gets executed, therefore completely random data is being accessed (some memory allocators might actually zero the data in debug mode). Either insert a call to MPI_WAIT between the MPI_IRECV and WRITE calls or use the blocking receive MPI_RECV.
A similar problem exists with the use of the non-blocking send call MPI_ISEND. Since you never wait on the completion of the request or test for it, the MPI library is allowed to postpone indefinitely the actual progression of the operation and the send might never actually occur. Again, since there is absolutely no justification for the use of the non-blocking send in your case, replace MPI_ISEND by MPI_SEND.
And last but not least, rank 0 is receiving messages from rank 1 only:
call mpi_irecv(y(i),1,datatype,1,0,mpi_comm_world,request,ierr)
^^^
At the same time, all other processes are sending to rank 0. Therefore, your program will only work if run with two MPI processes. You might want to replace the marked 1 in the receive call with i.
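Putting the fixes together, a sketch of how the communication part might then look (blocking calls, source rank i, and the absolute-address datatypes with MPI_BOTTOM from the sketch above; on the send side, datatype is assumed to have been rebuilt from the absolute addresses of z%x and z%a, without subtracting offsets(1), so that MPI_BOTTOM works there too):

if (myid == 0) then
   do i = 1, ntasks-1
      call mpi_recv(MPI_BOTTOM, 1, dtypes(i), i, 0, mpi_comm_world, &
                    MPI_STATUS_IGNORE, ierr)
      write(0,*) "received", y(i)%x, y(i)%a
   end do
else
   call mpi_send(MPI_BOTTOM, 1, datatype, 0, 0, mpi_comm_world, ierr)
   write(0,*) "sent", myid, z%x, z%a
end if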
Consider that I have two matrices of 1s and 0s. I want to save them as bool matrices, but OpenCV doesn't store them that way; instead they are stored as uchar Mat. Therefore my space usage increases by a factor of 8 (each element is 8 bits instead of 1 bit).
My code is basically as follows:
Mat mat1, mat2; //I want each index to be 1 bit
load(mat1); //data size is not important in memory
load(mat2);
corr2(mat1, mat2); //this corr2 is same as Matlab's cross correlation.
I'm doing this part 10M times, so the loading takes a lot of time. My matrices are 1K*1K, so I am able to store each as 1 MB, but I want them to take 128 KB (Matlab stores them as approximately 178 KB).
Here is my question: I want to store my matrices as strings and, instead of Mat operations, work on the strings.
For example, suppose the size of mat1 and mat2 is 2*8.
mat1:
0 1 0 0 0 0 1 0 (66=B)
0 1 1 1 1 0 0 1 (121=y)
mat2:
0 1 0 0 0 0 1 1 (67=C)
0 1 1 1 1 0 1 0 (122=z)
I will store str1=By and str2=Cz
Is there a way to cross-correlate str1 and str2?
Thanks in advance,
Note: This is not an answer, but rather a long comment. I'm posting it as an answer in order to avoid spamming the comments section of the OP.
Storing 1M elements of numeric type is never going to be a problem on any modern computer.
You should learn a little bit more about C/C++ and memory storage; a bool is not stored as a single bit (it typically occupies a whole byte), so bit-level bool storage only exists virtually. Packing several bits into a char is a good idea, but you should have a look at C++'s std::bitset if you want to be efficient.
Understand that there might be a significant difference between the way you store your data on the hard disk and the format that is best suited for processing in active memory (e.g. RAM). This is probably the reason behind the odd size of Matlab's storage; storing additional information and/or using seemingly inefficient storage units is often desirable to make the algorithms easier to write and the elementary operations execute faster on the CPU.
Overall, I think the advantage of switching to the "bool-packed chars" storage you suggest would be negligible in terms of processing speed, and it would certainly incur difficult programming work and complicate maintenance. You are better off sticking with chars for the processing and switching to single-bit storage for write-to-disk operations.
I have a requirement wherein I need to call a C++ application from the command line and pass a two-dimensional array of int type to it. Can anyone please let me know how to do that, and how to interpret it in the C++ application using the argv parameter?
thanks in advance.
In argv you can pass only a one-dimensional array containing strings; it's
char* argv[]
So, you can't really pass a 2D array, but you can "simulate" it.
For example, pass 2 parameters giving the sizes of the matrix (the number of rows and the number of columns) and then pass all elements, one by one.
Then parse the arguments in your program, knowing what format you will use.
For example: if you want to pass
1 2 3
4 5 6
you may run your program like this:
./my_program 2 3 1 2 3 4 5 6
This way, you'll know that argv[1] is the number of rows, argv[2] is the number of columns, and then come all elements of the 2D array, starting from the upper left corner.
Don't forget that argv is an array containing char* pointers. In other words, you'll need to convert all parameters to ints.
I would recommend passing a file as the only argument. Or data in the same format on stdin as #j_random_hacker suggests. If no human needs to edit it, it could be a binary file. One possible format:
4 bytes = size of first dimension
4 bytes = size of second dimension
4 bytes * size of first * size of second = contents of array
When reading, everything is aligned. Just read every four byte int and interpret as above.
If it needs to be human readable I would do csv or space-delimited. There would be no need to specify the dimensions in that case because each row ends in newline.
Hi
I am trying to use a Fortran structure like this
type some
u ! actual code will have 17 such scalars
end type some
TYPE(some),ALLOCATABLE,DIMENSION(:) :: metvars,newmetvars
Now the aim of my test program is to send 10 numbers from one processor to another, but the starting point of these 10 numbers is my choice (for example, if I have a vector of say 20 numbers, I won't necessarily send the first 10 numbers to the next processor; let's say my choice is elements 5 to 15). So first I use mpi_type_contiguous like this
CALL MPI_TYPE_CONTIGUOUS(10,MPI_REAL,MPI_METVARS,ierr) ! declaring a derived datatype of the object to make it in to contiguous memory
CALL MPI_TYPE_COMMIT(MPI_METVARS,ierr)
I do the send/receive and was able to get the first 10 numbers to the other processor (I am testing with 2 processors)
if(rank.EQ.0)then
do k= 2,nz-1
metvars(k)%u = k
un(k)=k
enddo
endif
I am sending this. Now, for the second part, I used MPI_TYPE_CREATE_SUBARRAY:
array_size = (/20/)
array_subsize =(/10/)
array_start = (/5/)
CALL MPI_TYPE_CREATE_SUBARRAY(1,array_size,array_subsize,array_start,MPI_ORDER_FORTRAN,MPI_METVARS,newtype,ierr)
CALL MPI_TYPE_COMMIT(newtype,ierr)
array_size = (/20/)
array_subsize =(/10/)
array_start = (/0/)
CALL MPI_TYPE_CREATE_SUBARRAY(1,array_size,array_subsize,array_start,MPI_ORDER_FORTRAN,MPI_METVARS,newtype2,ierr)
CALL MPI_TYPE_COMMIT(newtype2,ierr)
if(rank .EQ. 0)then
CALL MPI_SEND(metvars,1,newtype,1,19,MPI_COMM_WORLD,ierr)
endif
if(rank .eq. 1)then
CALL MPI_RECV(newmetvars,1,newtype2,0,19,MPI_COMM_WORLD,MPI_STATUS_IGNORE,ierr)
endif
I don't understand how to do this.
I get an error saying
[flatm1001:14066] *** An error occurred in MPI_Recv
[flatm1001:14066] *** on communicator MPI_COMM_WORLD
[flatm1001:14066] *** MPI_ERR_TRUNCATE: message truncated
[flatm1001:14066] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
I use OpenMPI on my local machine. I was able to make use of the subarray command without the mpi_type_contiguous part. However, in this case I need to combine both, since in the real code I have a Fortran structure, and that combination fails as shown above. I don't know whether there is a better way to do it either. Any sort of help and suggestions are appreciated.
Thanks in advance
I assume your custom type contains 1 real, as it's not specified. You first construct a contiguous type of 10 of these variables, i.e. MPI_METVARS represents 10 contiguous reals. Now, I don't know if this is really the problem, as the code you posted might be incomplete, but the way it looks now is that you construct a subarray of 10 MPI_METVARS types, meaning you have in effect 100 contiguous reals in newtype and newtype2.
The 'correct' way to handle the structure is to create a type for it with MPI_TYPE_CREATE_STRUCT, which should be your MPI_METVARS type.
So, please provide the correct code for your custom type and check the size of the newtype type.
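As a rough illustration only, assuming the type really does hold a single default REAL component u, the struct type could be built and then reused for the contiguous (and, in the same way, subarray) types along these lines:

program struct_type_sketch
  use mpi
  implicit none
  type some
     real :: u                      ! the real code would list all 17 scalars here
  end type some
  type(some), dimension(20) :: metvars
  integer :: blocklens(1), oldtypes(1)
  integer :: tmp_type, type_some, type_10, ierr
  integer(kind=MPI_ADDRESS_KIND) :: base, disps(1), a1, a2, lb, extent

  call MPI_INIT(ierr)

  ! layout of one TYPE(some): a single REAL at some displacement from the start
  call MPI_GET_ADDRESS(metvars(1), base, ierr)
  call MPI_GET_ADDRESS(metvars(1)%u, disps(1), ierr)
  disps(1) = disps(1) - base
  blocklens(1) = 1
  oldtypes(1) = MPI_REAL
  call MPI_TYPE_CREATE_STRUCT(1, blocklens, disps, oldtypes, tmp_type, ierr)

  ! resize so the extent equals the spacing between consecutive array elements
  call MPI_GET_ADDRESS(metvars(1), a1, ierr)
  call MPI_GET_ADDRESS(metvars(2), a2, ierr)
  lb = 0_MPI_ADDRESS_KIND
  extent = a2 - a1
  call MPI_TYPE_CREATE_RESIZED(tmp_type, lb, extent, type_some, ierr)
  call MPI_TYPE_COMMIT(type_some, ierr)

  ! this replaces the MPI_REAL-based MPI_TYPE_CONTIGUOUS of the question;
  ! MPI_TYPE_CREATE_SUBARRAY can be built on top of type_some in the same way
  call MPI_TYPE_CONTIGUOUS(10, type_some, type_10, ierr)
  call MPI_TYPE_COMMIT(type_10, ierr)

  call MPI_FINALIZE(ierr)
end program struct_type_sketch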
I have a two-dimensional table in a file, which looks like this:
11, 12, 13, 14, 15
21, 22, 23, 24, 25
I want it to be imported into a two-dimensional array. I wrote this code:
INTEGER :: SMALL(10)
DO I = 1, 3
READ(UNIT=10, FMT='(5I4)') SMALL
WRITE(UNIT=*, FMT='(6X,5I4)') SMALL
ENDDO
But it imports everything in one dimensional array.
EDIT:
I've updated the code:
program filet
integer :: reason
integer, dimension(2,5) :: small
open(10, file='boundary.inp', access='sequential', status='old', FORM='FORMATTED')
rewind(10)
DO
READ(UNIT=10, FMT='(5I4)', iostat=reason) SMALL
if (reason /= 0) exit
WRITE(UNIT=*, FMT='(6X,5I4)') SMALL
ENDDO
write (*,*) small(2,1)
end program
Here is the output:
11 12 13 14 15
21 22 23 24 25
12
Well, you have defined SMALL to be a 1-D array, and Fortran is just trying to be helpful. You should perhaps have defined SMALL like this:
integer, dimension(2,5) :: small
What happened when the read statement was executed was that the system ran out of edit descriptor (you specified 5 integers) before either SMALL was full or the end of the file was encountered. If I remember rightly Fortran will re-use the edit descriptor until either SMALL is full or the end-of-file is encountered. But this behaviour has been changed over the years, according to Fortran standards, and various compilers have implemented various non-standard features in this part of the language, so you may need to check your compiler's documentation or do some more experiments to figure out exactly what happens.
I also find your code a bit peculiar in that you read into SMALL 3 times. Why?
EDIT: OK, we're getting there. You have just discovered that Fortran stores arrays in column-major order. I believe that most other programming languages store them in row-major order. In other words, the first element of your array is small(1,1), the second (in memory) is small(2,1), the third is small(1,2) and so forth. I think that your read (and write) statements are not standard but widely implemented (which is not unusual in Fortran compilers). I may be wrong, it may be standard. Either way, the read statement is being interpreted to read the elements of small in column-major order. The first number read is put in small(1,1), the second in small(2,1), the third in small(1,2) and so on.
Your write statement makes use of the same feature; you might have discovered this for yourself if you had written out the elements in loops with the indices printed too.
The idiomatic Fortran way of reading an array and controlling the order in which elements are placed into the array, is to include an implied-do loop in the read statement, like this:
READ(UNIT=10, FMT='(5I4)', iostat=reason) ((SMALL(row,col), col = 1,numCol), row=1,numRow)
You can also use this approach in write statements.
You should also study your compiler documentation carefully and determine how to switch on warnings for all non-standard features.
Adding to what High Performance Mark wrote...
If you want to use commas to separate the numbers, then you should use list-directed IO rather than formatted IO. (Sometimes this is called format-free IO, but that non-standard term is easy to confuse with binary IO). This is easier to use since you don't have to arrange the numbers precisely in columns and can separate them with spaces or commas. The read is simply "read (10, *) variables"
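For instance, here is a minimal sketch of that list-directed approach for the comma-separated file shown in the question (the file name boundary.inp is taken from the updated code above):

program demo_list_directed
  implicit none
  integer, dimension(2,5) :: small
  integer :: irow, jcol

  open (unit=10, file='boundary.inp', status='old')
  do irow = 1, ubound(small, 1)
     ! list-directed read: the values may be separated by commas or spaces
     read (10, *) (small(irow, jcol), jcol = 1, ubound(small, 2))
  end do
  close (10)

  write (*, '(6X,5I4)') ((small(irow, jcol), jcol = 1, 5), irow = 1, 2)
end program demo_list_directed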
But sticking to formatted IO, here is some sample code:
program demo1
implicit none
integer, dimension (2,5) :: small
integer :: irow, jcol
open ( unit=10, file='boundary.txt', access='sequential', form='formatted' )
do irow=1, ubound (small, 1)
read (10, '(5I4)') (small (irow, jcol), jcol=1, ubound (small, 2))
end do
write (*, '( / "small (1,2) =", I2, " and small (2,1)=", I2 )' ) small (1,2), small (2,1)
end program demo1
Using the I4 formatted read, the data need to be in columns:
12341234123412341234
  11  12  13  14  15
  21  22  23  24  25
The data file shouldn't contain the first row "1234..." -- that is in the example to make the alignment required for the format 5I4 clear.
With my example program, there is an outer do loop for irow and an "implied do loop" as part of the read statement. You could also eliminate the outer do loop and use two implied do loops on the read statement, as High Performance Mark showed. In this case, if you kept the format specification (5I4), it would get reused to read the second line -- this is called format reversion. (On a more complicated format, one needs to read the rules to understand which part of the format is reused in format reversion.) This is standard, and has been so at least since FORTRAN 77 and probably FORTRAN IV. (Of course, the declarations and style of my example are Fortran 90).
I used "ubound" so that you neither have to carry around variables storing the dimensions of the array, nor use specific numeric values. The later method can cause problems if you later decide to change the dimension of the array -- then you have to hunt down all of the specific values (here 2 and 5) and change them.
There is no need for a rewind after an open statement.