I am trying to use MPI_ALLGATHERV with derived datatypes. I need to pass chunks of a small 3D array of the form:
SS(IIST:IIEND,JJST:JJEND,KKST:KKEND)
Here IIST, IIEND, JJST, JJEND, KKST and KKEND are the local index bounds of each process, so I tried to define a derived datatype as follows:
INTEGER :: MPII,MPJJ,MPKK
CALL MPI_TYPE_CONTIGUOUS(IIEND-IIST+1,MPI_DOUBLE_PRECISION,MPII,IERR)
CALL MPI_TYPE_CONTIGUOUS(JJEND-JJST+1,MPII,MPJJ,IERR)
CALL MPI_TYPE_CONTIGUOUS(KKEND-KKST+1,MPJJ,MPKK,IERR)
CALL MPI_TYPE_COMMIT(MPKK,IERR)
Next, I define a displacement array, visible to every process, for use in MPI_ALLGATHERV. There are 27 processes in total, numbered 0 to 26.
DO NT=0,26
DISPL(1)=0
IF (NT.GT.0) DISPL(NT+1)= DISPL(NT)+1
ENDDO
Then I call MPI_ALLGATHERV as follows:
CALL MPI_ALLGATHERV(SS(IIST:IIEND,JJST:JJEND,KKST:KKEND),SPANX*SPANY*SPANZ,MPI_DOUBLE_PRECISION,SS(1,1,1),1,DISPL,MPKK,MPI_COMM_WORLD,IERR)
This gives me an error. Any pointers on this problem would be much appreciated.
I then tried a subarray type instead (SPANX, SPANY and SPANZ are the local extents):
spanx = iiend-iist+1
spany = jjend-jjst+1
spanz = kkend-kkst+1
oldsize = (/spanx,spany,spanz/)
newsize = (/spanx,spany,spanz/)
starts = (/0,0,0/)
CALL MPI_TYPE_CREATE_SUBARRAY(3,OLDSIZE,NEWSIZE,STARTS,MPI_ORDER_FORTRAN,MPI_DOUBLE_PRECISION,ARR,IERR)
CALL MPI_TYPE_COMMIT(ARR,IERR)
DO NT=0,26
DISPL(1)=0
IF (NT.GT.0) DISPL(NT+1)= DISPL(NT)+1
SIZECC(NT)=1
ENDDO
CALL MPI_ALLGATHERV(SS(IIST:IIEND,JJST:JJEND,KKST:KKEND),SPANX*SPANY*SPANZ,MPI_DOUBLE_PRECISION,SS(1,1,1),SIZECC,DISPL,ARR,MPI_COMM_WORLD,IERR)
The output still doesn't match. I think something is wrong with the displacement array.
I am trying to push! BigFloat values into an array of Tuples, but I get an error with the following code:
# where test() is a function that returns BigFloat values
store = Tuple{Any, Any}[]
for i in 1:10
push!(store, test(i))
end
store
The error message mentions convert() as a solution, but I am not sure how to convert test().
You cannot push a BigFloat into a container that accepts only Tuples. Your container has to accept BigFloats instead, so initialize it with:
store = BigFloat[]
Also note that you could have just written:
store = test.(1:10)
I've been working on a program where I need to be able to sum rows in a two-dimensional array whose number of columns is variable. I should also add that the rows are "split" into two parts (part A and part B) whose sizes depend on user input.
I can obviously sum a row with a for loop, but I wanted a more elegant solution that would also be easier to set up across the whole program. I stumbled across the accumulate function from the <numeric> header, but all the examples I was able to find were for one-dimensional arrays.
Here's a sample of my problem code:
total = partNum[PART_A] + partNum[PART_B];
partStart[PART_A] = 0;
partEnd[FUNC_A] = partNum[PART_A];
partStart[PART_B] = partNum[PART_A];
partEnd[FUNC_B] = total;
double stat[5][total];
double mass_sum = 0.0;
func = PART_A;
accumulate(stat[MASS][partStart[func]], stat[MASS][partStart[func]], mass_sum);
However, I get a build-time error stating:
Indirection requires pointer operand ('double' invalid)
I assume this is a syntax error, but changing how I defined the array's start and end did nothing to fix the error.
The first two arguments of accumulate are iterators that the function uses to iterate over the range, but you are passing actual elements of the array.
An iterator in C++ is a concept: it requires certain operations to be valid on your object, as defined by the standard. For instance, pointer types satisfy the LegacyRandomAccessIterator requirements, meaning that you can use them as array-like objects (you can increment them with ++, dereference them with *, access the element at position i with [], etc.). I won't go into full detail about iterators in C++, because it's a whole topic and there are plenty of references on it.
Back to your problem: you want to give accumulate iterators to the beginning and the end of your row, or to the beginning and the end of your subrange. There are two ways to do this:
Take the address of the element stat[MASS][partStart[func]], i.e. &stat[MASS][partStart[func]], which gives you the beginning of the range; &stat[MASS][partEnd[func]] then gives you the end of the range. This works because stat is declared as double stat[5][total]: the access operator ([]) gives you a reference (a double&) whose address you can take, and the elements of a row are contiguous in memory (this would not work for a column).
Use stat[MASS] + partStart[func] and stat[MASS] + partEnd[func]. Here you take the beginning of the row (stat[MASS]), an array of double that implicitly converts to a pointer to its first element (double*), and offset that pointer by partStart[func] or partEnd[func], giving you the addresses of the elements you want in the row.
So basically:
// std::accumulate returns the sum; it does not update mass_sum in place
mass_sum = std::accumulate(&stat[MASS][partStart[func]], &stat[MASS][partEnd[func]], mass_sum);
// or
mass_sum = std::accumulate(stat[MASS] + partStart[func], stat[MASS] + partEnd[func], mass_sum);
This is my code:
Program Arrays
Implicit none
Integer::i,j
Integer,dimension(2)::V_Min
Complex,dimension(0:7,3)::V_cvo
Complex,dimension(7,3)::V_cvo_temp
Do concurrent(i=0:7,j=1:3)
V_cvo(i,j)=cmplx(i+j,2*i-j)
End Do
V_cvo_temp=V_cvo(1:,:)
V_Min=minloc(abs((/((V_cvo_temp(i,j),j=1,3),i=2,5)/)))
Stop
End Program Arrays
After compiling I got this message:
Error: Different shape for array assignment on dimension 1 (2 and 1)
What is wrong here? And how can I find the location of the minimal element within a specific section of an array?
This could be one possible solution to the problem:
Program Arrays
Implicit none
Integer::i,j
Integer,dimension(2)::V_Max
Complex,dimension(0:7,2)::V_cvo
Logical,dimension(0:7,2) :: lmask = .FALSE.
forall(i=2:5,j=1:2)lmask(i,j) = .TRUE.
Do concurrent(i=0:7,j=1:2)
V_cvo(i,j)=cmplx(i+j,2*i-j)
End Do
V_Max = Maxloc(abs(V_cvo),mask=lmask)-(/1,0/)
Open(1,File='Output.txt',Status='Unknown')
Write(1,'(2x,i0,2x,i0)') V_max
Write(1,*)
Do concurrent(i=2:5,j=1:2)
Write(1,'(1x,i0,1x,i0,2x,f7.4)')i,j,abs(V_cvo(i,j))
End Do
Close(1)
Stop
End Program Arrays
Output file is:
5 1
2 1 4.2426
3 1 6.4031
4 1 8.6023
5 1 10.8167
2 2 4.4721
3 2 6.4031
4 2 8.4853
5 2 10.6301
Opinions about this?
This expression
minloc(abs((/((V_cvo_temp(i,j),j=1,3),i=2,5)/)))
returns a rank-1 array with 1 element. The left-hand side of the assignment is a rank-1 array with 2 elements. Fortran won't assign between non-conformable arrays, hence the compiler error message.
(@gdlmx's answer is subtly wrong in its diagnosis: if the expression returned a scalar, Fortran would happily broadcast its value to every element of the array.)
If the expression did return a scalar, it would still not return the location of the minimum element in that section of V_cvo. The sub-expression
(/((V_cvo_temp(i,j),j=1,3),i=2,5)/)
produces a rank-1 array containing the specified elements of V_cvo_temp: it flattens them into a vector and loses their locations along the way. This is why the first expression returns a rank-1 array with 1 element: it's the location of an element in a rank-1 array.
The problem with this solution
V_Min=minloc(abs(V_cvo(2:5,1:3)))
is that the expression abs(V_cvo(2:5,1:3)) will return a (temporary) array indexed, as Fortran arrays are by default, from 1 on each rank. When I try the code, it returns the location (1,1), which appears to be outside the section considered: it is the location of the minimum element of the temporary array.
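The problem with the 'clever' solutions I've tried has been that abs(V_cvo(2:5,1:3)) always returns, even if hidden from view, a temporary array indexed from 1 on each rank. Any application of minloc or similar functions uses those indices, not the indices that v_cvo uses. One option is to make an explicit temporary array (suitably declared) like this:
real, allocatable :: abstemp(:,:)
allocate(abstemp(LBOUND(v_cvo,1):UBOUND(v_cvo,1),LBOUND(v_cvo,2):UBOUND(v_cvo,2)))
abstemp = abs(v_cvo)
then
v_min = minloc(abstemp(2:5,1:3)) + (/1,0/)   ! add (section start - 1) in each dimension
and
deallocate(abstemp)
Note that MINLOC always reports positions as if the lower bounds of its argument were 1 (and a section such as abstemp(2:5,1:3) has lower bounds of 1 anyway), so the section's starting indices minus one still have to be added, as above.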
It seems that the right side of
V_Min=minloc(abs((/((V_cvo_temp(i,j),j=1,3),i=2,5)/)))
returns a scalar instead of a vector of 2 components. What you need is array slicing: V_cvo_temp(2:5,1:3)
V_Min=minloc(abs(V_cvo_temp(2:5,1:3)))
or simpler
V_Min=minloc(abs(V_cvo(2:5,1:3))) ! without temp array
Also you don't need the stop at the end.
Edit1:
minloc returns the index relative to (1,1). To understand this behavior, try this example:
Program Hello
Implicit none
Integer,dimension(2)::V_Min
Complex,dimension(0:7,3)::V_cvo
V_cvo = cmplx(10,10)
V_cvo(3,2) = cmplx(0,0) ! the minimum index = [3,2]
V_Min=minloc(abs(V_cvo))
print *, 'minloc for whole array: ', V_Min
V_Min=minloc(abs(V_cvo(3:,2:)))
print *, 'minloc for sub-array: ', V_Min
End Program Hello
It outputs:
minloc for whole array: 4 2 ! offset=[-1,0]
minloc for sub-array: 1 1 ! offset=[2,1]
So if you pass a sub-array to minloc, you need to add the offset (the section's starting index minus one, in each dimension) to get the 'correct' answer.
This solution also works (perhaps the best one):
forall(i=1:7,j=1:3) V_cvo_temp(i,j)=abs(V_cvo(i,j))
V_Min = MINLOC(V_cvo_temp(m:n,:))+(/m-1,0/)
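The code is correct for any m and n in the interval 1:7 in this case (or in whatever interval V_cvo_temp is declared on). This assumes V_cvo_temp is declared Real rather than Complex, since MINLOC does not accept complex arguments.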
I have some experience with MPI, but not with some of the more advanced aspects like derived types, which is what my question is about.
The code I am working on has several arrays dimensioned (-1:nx+2,-1:ny+2,-1:nz+2). To make it clear, each process has its own values of nx, ny, and nz. There is overlap between the arrays. For instance x(:,:,-1:2) on one proc will represent the same information as x(:,:,nz-1:nz+2) on the proc just "below" it.
A derived cell_zface type has been defined:
idir = 3
sizes = (/nx_glb, ny_glb, nz_glb/) !These nums are the same for all procs.
subsizes = (/nx, ny, 2/)
mpitype = MPI_DATATYPE_NULL
CALL MPI_TYPE_CREATE_SUBARRAY(3, sizes, subsizes, starts, &
MPI_ORDER_FORTRAN, mpireal, mpitype, errcode)
CALL MPI_TYPE_COMMIT(mpitype, errcode)
cell_zface = mpitype
Now, this derived type gets used, successfully, in several MPI_SENDRECV calls. For example
CALL MPI_SENDRECV( &
x(-1,-1, 1), 1, cell_zface, proc_z_min, tag, &
x(-1,-1,nz+1), 1, cell_zface, proc_z_max, tag, &
comm, status, errcode)
As I understand it, this call is sending and receiving two "horizontal" slices (i.e. x-y slices) of the array between procs.
I want to do something a little different, namely sending four "horizontal" slices. So I try
call mpi_send(x(-1,-1,nz-1), 2, cell_zface, &
proc_z_max, rank, comm, mpierr)
with an accompanying receive.
And finally, my problem: The code runs, but erroneously. AFAICT, this sends only two horizontal slices, even though I use "2" instead of "1" as the count argument. I can fix this by making two calls to mpi_send:
call mpi_send(x(-1,-1,nz-1), 1, cell_zface, &
proc_z_max, rank, comm, mpierr)
call mpi_send(x(-1,-1,nz+1), 1, cell_zface, &
proc_z_max, rank, comm, mpierr)
with accompanying receives, but this is certainly not pretty.
So, why does the mpi_send send only two horizontal slices, even though I set the count argument to "2"? And is there a clean way to do what I want to do here?
Every MPI datatype has two sizes, so to speak. One is the true size, i.e. the amount of memory it takes to store all the significant data referred to by the datatype. One can think of it as the amount of space that an element of that datatype takes in the actual message.
Another size is the so-called extent. Each datatype in MPI is a collection of instructions of the type: "go to offset disp_i from the provided buffer location and read/write an element of basic type type_i". The set of all (type_i, disp_i) pairs is called the type map of the datatype. The minimum offset is called the lower bound, and the maximum offset plus the size of the basic type at that offset plus any padding needed is called the upper bound. The extent of a datatype is the difference between the upper bound and the lower bound and gives the size of the shortest contiguous memory region that includes all locations accessed by the datatype.
As MPI mandates that no memory location is read from or written to more than once during any communication operation, the pairs in the typemap have to refer to disjoint locations. Therefore, the true extent of a datatype is always bigger than or equal to its size.
MPI uses the extent of the datatype when accessing consecutive elements of that datatype. The following statement:
MPI_SEND(buf, n, dtype, ...)
results in:
MPI takes one element of type dtype from location buf following the rules encoded in the typemap of dtype;
MPI takes the next element starting from location buf + extent(dtype);
...
MPI takes the n-th element starting from location buf + (n-1)*extent(dtype).
Primitive datatypes such as MPI_INTEGER, MPI_REAL, etc. have their extent matching the size of the basic type (INTEGER, REAL, etc.) + any padding mandated by the architecture, which makes it possible to send arrays of the basic type by simply specifying the count of elements.
Now, back to your case. You are creating a datatype that covers an nx x ny x 2 subarray from an nx_glb x ny_glb x nz_glb array. The size of that datatype is indeed nx * ny * 2 times the size of mpireal, but the extent is actually nx_glb * ny_glb * nz_glb times the extent of mpireal. In other words:
MPI_SEND(buf, 2, cell_zface, ...)
will not extract two consecutive nx x ny x 2 slabs from the big array at buf. Rather, it will extract one slab from each of two consecutive arrays of size nx_glb x ny_glb x nz_glb, starting from location (startx, starty, startz) in each array. If your program doesn't segfault when run, consider yourself lucky.
Now comes the tricky part. MPI allows one to give each datatype a fake extent (that's why I called the extent as defined earlier "true") by artificially setting the values of the lower and upper bounds. Doing so does not affect the size of the datatype or its typemap (i.e. MPI still goes to the same offsets and manipulates elements of the same basic types), but it does affect the strides in memory that MPI makes while accessing consecutive elements of the given datatype. In the past, setting the extent was done by "sandwiching" the datatype in a structure between elements of the special pseudotypes MPI_LB and MPI_UB (deprecated since MPI-2 and removed in MPI-3.0). Since MPI-2, MPI_TYPE_CREATE_RESIZED is used to achieve the same.
integer(kind=MPI_ADDRESS_KIND) :: lb, extent
integer :: newtype
! First obtain the extent of the old type used to construct cell_zface
call MPI_TYPE_GET_EXTENT(mpireal, lb, extent, errcode)
! Adjust the extent of cell_zface
extent = (nx_glb * ny_glb * subsizes(3)) * extent
call MPI_TYPE_CREATE_RESIZED(cell_zface, lb, extent, newtype, errcode)
call MPI_TYPE_COMMIT(newtype, errcode)
! Get rid of the previous type
call MPI_TYPE_FREE(cell_zface, errcode)
cell_zface = newtype
You can now use cell_zface to send several consecutive slabs.
An alternative, and presumably simpler, approach is to set the size of the 3rd dimension of the array equal to the size of the 3rd dimension of the subarray when calling MPI_TYPE_CREATE_SUBARRAY:
idir = 3
subsizes = (/nx, ny, 2/)
sizes = (/nx_glb, ny_glb, subsizes(3)/) !These nums are the same for all procs.
mpitype = MPI_DATATYPE_NULL
CALL MPI_TYPE_CREATE_SUBARRAY(3, sizes, subsizes, starts, &
MPI_ORDER_FORTRAN, mpireal, mpitype, errcode)
CALL MPI_TYPE_COMMIT(mpitype, errcode)
cell_zface = mpitype
In both cases I assume that starts(3) is equal to 0.
Consider the following:
program main
integer, parameter :: n=10, m=20
integer ints(n,m)
real floats(m,n)
!... initialize ints
! ...
floats=transpose(ints)
!... do stuff with floats
end
Looking at the documentation for gfortran, it seems that transpose(ints) returns an integer array, which is then converted to real. In this operation, the compiler (gfortran) creates a temporary array for the transposed array, which seems like a waste (compile with gfortran -O3 -Warray-temporaries -o test test.f90). Also note that if you change the real array "floats" into an integer array, the warning goes away.
Is there a way to do this (for arbitrary types) without generating a temporary array? (I also tried floats(:,:)=transpose(ints) because I read somewhere that it mattered ... ). Does it behave this way with other compilers?
You could try
floats = transpose(real(ints))
but I wouldn't be very surprised if gfortran (or any other compiler) generated a temporary array to implement this. I'd be more surprised if it didn't.
You might also try
forall (J=1:N, K=1:M) floats(K, J) = real(ints(J, K))
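Again, I wouldn't be surprised if a compiler created a temporary array to implement this.
A plain loop nest, which converts each element as it copies it, leaves no array-valued expression for the compiler to hold in a temporary: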
do i = 1, n
do j = 1, m
floats(j,i) = real(ints(i,j))
enddo
enddo
You could make your own transpose interface for handling different data types, although it would have to be a subroutine and not a function.
interface transpose_
module procedure transpose_ints_to_reals
end interface
subroutine transpose_ints_to_reals(ints_in, reals_out)
...
end subroutine
call transpose_(ints,floats)