I have a computer with nprocs processors and I'd like to initialize two BLACS grids, one of dimension p x q = nprocs and one of dimension 1 x 1.
Assuming MPI is already initialized and a routine exists that finds good block sizes, the first grid is initialized via
call blacs_get( -1, 0, self%context )
call blacs_gridinit( self%context, 'R', self%nprows, self%npcols )
call blacs_gridinfo( self%context, self%nprows, self%npcols, self%myrow, self%mycol )
But how do I set up the second grid? Do I have to introduce another MPI communicator first?
As an answer and example, I share this implementation:
call blacs_get( -1, 0, self%context )
call blacs_gridinit( self%context, 'R', self%nprows, self%npcols )
call blacs_gridinfo( self%context, self%nprows, self%npcols, self%myrow, self%mycol )
print*, "A ", self%context, self%nprows, self%npcols, self%myrow, self%mycol
call sleep(1)
call blacs_get( -1, 0, val )
call blacs_gridinit( val, 'R', 1, 1 )
call blacs_gridinfo( val, self%nprows, self%npcols, self%myrow, self%mycol )
call sleep(1)
print*, "B ", val, self%nprows, self%npcols, self%myrow, self%mycol
call sleep(1)
call blacs_get( -1, 0, val2 )
call blacs_gridinit( val2, 'R', 2, 2 )
call blacs_gridinfo( val2, self%nprows, self%npcols, self%myrow, self%mycol )
call sleep(1)
print*, "C ", val2, self%nprows, self%npcols, self%myrow, self%mycol
This creates three BLACS contexts, with no need to initialize another MPI communicator, and produces the following output on four cores:
A 0 2 2 1 1
A 0 2 2 0 0
A 0 2 2 1 0
A 0 2 2 0 1
B -1 -1 -1 -1 -1
B -1 -1 -1 -1 -1
B -1 -1 -1 -1 -1
B 1 1 1 0 0
C 1 2 2 1 0
C 1 2 2 1 1
C 1 2 2 0 1
C 2 2 2 0 0
So, the crucial point is that the first argument of blacs_gridinit is an input/output argument: it needs the global BLACS context of all processes as input. That global context is received into a fresh variable via the third argument of the call to blacs_get.
What I found quite counterintuitive in this case is that the value of the context handle seems to follow some kind of sum rule, so after initializing the 1x1 grid and then the 2x2 grid, the values of the 2x2 grid's handle are not the same on all processes (see the C lines above).
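For reference, here is a minimal, self-contained sketch of that pattern (the variable names and the hard-coded 2 x 2 layout for four processes are illustrative, not part of the original code): both grids are created from the same global context returned by blacs_get, and blacs_gridinfo reports -1 coordinates on processes that do not belong to a grid, as the B lines of the output above show.
program two_grids
   implicit none
   integer :: ctx_full, ctx_single
   integer :: nprows, npcols, myrow, mycol
   integer :: nprow1, npcol1, myrow1, mycol1
   integer :: me, nprocs

   ! blacs_pinfo returns this process' number and the total process count
   call blacs_pinfo( me, nprocs )

   ! illustrative 2 x 2 layout for four processes; use your own factorization
   nprows = 2
   npcols = 2

   ! first grid: nprows x npcols over all processes
   call blacs_get( -1, 0, ctx_full )
   call blacs_gridinit( ctx_full, 'R', nprows, npcols )
   call blacs_gridinfo( ctx_full, nprows, npcols, myrow, mycol )

   ! second grid: 1 x 1, created from the same global context
   call blacs_get( -1, 0, ctx_single )
   call blacs_gridinit( ctx_single, 'R', 1, 1 )
   call blacs_gridinfo( ctx_single, nprow1, npcol1, myrow1, mycol1 )

   if ( myrow1 >= 0 .and. mycol1 >= 0 ) then
      print *, 'process ', me, ' belongs to the 1 x 1 grid'
   end if

   call blacs_exit( 0 )
end program two_grids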
I use MPI_Probe to determine the size of a dynamic array, which I just pass as the tag. But I send two arrays. The basic structure is:
call MPI_Isend(message, destination, MPI_INT, size, COMM, status, error)
call MPI_Isend(message2, destination, MPI_DOUBLE, size*3, COMM, status, error)
...
call MPI_Probe(sender, MPI_ANY_TAG, COMM, status, error)
size1 = status(MPI_TAG)
call MPI_Probe(sender, MPI_ANY_TAG, COMM, status, error)
size2 = status(MPI_TAG)
actual_size = MIN(size1, size2)
call MPI_Recv(size)
call MPI_Recv(size*3)
So this doesn't work, because MPI_Probe just matches the same message twice and returns the same value. Any idea how to cycle through the different pending messages, or something like that?
If that's not possible, I plan to restructure my code to send - recv - send - recv instead of send - send - recv - recv. Just checking whether someone has a better solution.
As stated in the comments, you shouldn't use the tag to send size data from one process to another: the values a tag can take are bounded above by MPI_TAG_UB, which in theory could be quite small, potentially too small to communicate a large enough integer. In fact it's bad practice to use the tag to transmit information at all; you should use the message data, which is what it is for, after all. Victor Eijkhout has the right way: inquire of the status argument, using MPI_GET_COUNT, how many things are being transmitted, and use that to allocate the dynamic array. Here is an example, which also uses the correct handles for the datatypes (you are using the C variants rather than the Fortran ones):
ijb#ijb-Latitude-5410:~/work/stack$ cat probe.f90
Program probe
Use, Intrinsic :: iso_fortran_env, Only : stdout => output_unit
Use mpi_f08, Only : mpi_status, mpi_comm_world, mpi_integer, &
mpi_init, mpi_finalize, mpi_comm_size, mpi_comm_rank, &
mpi_send, mpi_probe, mpi_get_count, mpi_recv
Implicit None
Type( mpi_status ) :: status
Real :: rnd
Integer, Dimension( : ), Allocatable :: stuff
Integer :: nprc, rank
Integer :: n
Integer :: error
Integer :: i
Call mpi_init( error )
Call mpi_comm_size( mpi_comm_world, nprc, error )
Call mpi_comm_rank( mpi_comm_world, rank, error )
If( rank == 0 ) Then
Write( stdout, * ) 'Running on ', nprc, ' procs'
End If
If( rank == 0 ) Then
! On rank zero generate a random sized array
Call Random_number( rnd )
n = Int( 10.0 * rnd + 1 )
Write( stdout, * ) 'Allocating ', n, ' elements on rank 0'
Allocate( stuff( 1:n ) )
stuff = [ ( i, i = 1, n ) ]
Write( stdout, '( "Data on proc 0: ", *( i0, 1x ) )' ) stuff
Call mpi_send( stuff, Size( stuff ), mpi_integer, 1, 10, &
mpi_comm_world, error )
Else If( rank == 1 ) Then
! On rank 1 probe the message to get the status
Call mpi_probe( 0, 10, mpi_comm_world, status, error )
! From the status find how many things are being sent
Call mpi_get_count( status, mpi_integer, n, error )
! Use that to allocate the array
Allocate( stuff( 1:n ) )
! And recv the data
Call mpi_recv( stuff, Size( stuff ), mpi_integer, 0, 10, &
mpi_comm_world, status, error )
Write( stdout, * ) 'Recvd ', n, ' integers on proc 1'
Write( stdout, '( "Data on proc 1: ", *( i0, 1x ) )' ) stuff
Else
Write( stdout, * ) 'Busy doing nothing ... ', rank
End If
Write( stdout, * ) 'done', rank
Call mpi_finalize( error )
End Program probe
ijb#ijb-Latitude-5410:~/work/stack$ mpif90 --version
GNU Fortran (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
ijb#ijb-Latitude-5410:~/work/stack$ mpif90 -std=f2018 -Wall -Wextra -pedantic -fcheck=all -fbacktrace -Wuse-without-only -Werror -g probe.f90
ijb#ijb-Latitude-5410:~/work/stack$ mpirun -np 2 ./a.out
Running on 2 procs
Allocating 8 elements on rank 0
Data on proc 0: 1 2 3 4 5 6 7 8
done 0
Recvd 8 integers on proc 1
Data on proc 1: 1 2 3 4 5 6 7 8
done 1
ijb#ijb-Latitude-5410:~/work/stack$ mpirun -np 2 ./a.out
Running on 2 procs
Allocating 5 elements on rank 0
Data on proc 0: 1 2 3 4 5
done 0
Recvd 5 integers on proc 1
Data on proc 1: 1 2 3 4 5
done 1
ijb#ijb-Latitude-5410:~/work/stack$ mpirun -np 2 ./a.out
Running on 2 procs
Allocating 2 elements on rank 0
Data on proc 0: 1 2
done 0
Recvd 2 integers on proc 1
Data on proc 1: 1 2
done 1
ijb#ijb-Latitude-5410:~/work/stack$ mpirun -np 2 ./a.out
Recvd 1 integers on proc 1
Data on proc 1: 1
done 1
Running on 2 procs
Allocating 1 elements on rank 0
Data on proc 0: 1
done 0
ijb#ijb-Latitude-5410:~/work/stack$ mpirun -np 2 ./a.out
Running on 2 procs
Allocating 3 elements on rank 0
Data on proc 0: 1 2 3
done 0
Recvd 3 integers on proc 1
Data on proc 1: 1 2 3
done 1
ijb#ijb-Latitude-5410:~/work/stack$
I have been stuck on a problem that has two recursive calls in it. I could not understand the mechanism of the loop and stack behind it. These are the lines of code of my program.
#include<iostream>
using namespace std;
void test(int num)
{
if (num != 0)
{
num = num - 1;
test(num);
cout << num << " ";
test(num);
}
}
int main()
{
test(3);
}
This is the output of the program
0 1 0 2 0 1 0
Can someone explain the output of this program to me using the stack?
Unroll the recursion. The output in the rightmost column below matches what you are getting.
executed printed
-------- -------
test(3)
test(2)
test(1)
test(0)
cout << 0 0
test(0)
cout << 1 1
test(1)
test(0)
cout << 0 0
test(0)
cout << 2 2
test(2)
test(1)
test(0)
cout << 0 0
test(0)
cout << 1 1
test(1)
test(0)
cout << 0 0
test(0)
For brevity let's call this function f.
What does this function do? Well, it prints something if its argument is positive, calling itself recursively twice with the argument decreased by 1. If the argument is zero, the function returns immediately, so the recursion starting from a positive argument stops there. For a negative argument the recursion never terminates, which is an error.
Now, what does it do exactly? It decreases its argument, then calls itself on the result, prints it, and calls itself again. We can draw a diagram like this:
n: [f(n-1) n-1 f(n-1)]
which means that (neglecting the problem of the number of spaces) it prints whatever f(n-1) prints, then n-1 (and a space), then again f(n-1). The first conclusion: the printout will be symmetric about its central element. And it actually is. If you expand this formula a step further, you'll get this:
n-1: [f(n-2) n-2 f(n-2) n-1 f(n-2) n-2 f(n-2)]
So, in the central position there will always be n-1. Its left-hand and right-hand "neighbour substrings" will be identical. n-2 will be seen in this sequence twice. It's not difficult to see that n-3 will be seen 4 times, etc., till 0 will be seen 2^{n-1} times. One can even see that 0 will occupy every second position.
How many numbers will we see? 1 + 2 + ... + 2^{n-1} = 2^n - 1. For n = 3 this gives 2^3 - 1 = 8 - 1 = 7. That's correct.
What may be the "meaning" of this sequence?
Look (n = 5):
0 1 0 2 0 1 0 3 0 1 0 2 0 1 0 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0
Now, take the consecutive integers
1 2 3 4 5 6 7 8 9 10 ... 31
and count the number of times each of them is divisible by 2. Can you see? 1 is not divisible by 2, hence 0 in your sequence. 2 is divisible once, hence 1. 3 is not divisible by 2 - hence 0. 4 is divisible twice, hence 2 on the fourth position. Etc., etc.
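If you want to check that correspondence mechanically, here is a small sketch (in Fortran, like most other examples on this page; the logic carries over to C++ directly): for each k from 1 to 2^n - 1 it counts how many times 2 divides k and prints that count, which reproduces the sequence printed by test(n).
program ruler
   implicit none
   integer, parameter :: n = 3
   integer :: k, m, twos

   do k = 1, 2**n - 1
      m = k
      twos = 0
      ! count how many times 2 divides k
      do while ( mod( m, 2 ) == 0 )
         twos = twos + 1
         m = m / 2
      end do
      write( *, '(i0, 1x)', advance = 'no' ) twos
   end do
   write( *, * )
end program ruler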
For my application I need to create a tridiagonal matrix. This is easy to do in any language: you loop through all rows and columns, then set the main-diagonal, sub-diagonal and super-diagonal values. Usually this is done on a 2D array.
For my application, I need to create a 1D array version of the tridiagonal matrix. Another way to say this: take the 2D tridiagonal matrix and turn it into 1D. I could start with 2D and then write some functions that convert the 2D array to a 1D array; this I can do. I would like to know whether we can go directly to a 1D "tridiagonal". For example, if the 2D array is 10 x 10, then my 1D array would be 100 elements long, and I would need to figure out which indices hold the main, super and sub diagonals.
Is it possible to do this? Please let me know, and thank you.
The elements on the main diagonal are at indexes (i, i) and there are n of them; the infra- and supra-diagonals are at (i, i-1) and (i, i+1), and there are n-1 of each (i starts at 2 for the former and ends at n-1 for the latter).
An option is to use three vectors and store the elements at the respective indexes i in those three vectors.
You can also pack all values in a single vector of length 3n (or 3n-2 if you want to save space). Add n or 2n to the index, depending on the diagonal you want to address. For an element (i, j), the index of the diagonal is given by j-i+2.
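Here is a minimal sketch of that packed layout under the convention just described (names are illustrative, and a full vector of length 3n is used, so the two never-touched slots simply stay zero): the diagonal number d = j - i + 2 runs from 1 (sub) through 2 (main) to 3 (super), and element i of diagonal d is stored at packed index (d - 1)*n + i.
program packed_tridiag
   implicit none
   integer, parameter :: n = 4
   real :: A(n,n)        ! the usual 2D tridiagonal matrix
   real :: packed(3*n)   ! 1D storage: sub, main and super diagonals back to back
   integer :: i, j, d

   ! build an arbitrary 2D tridiagonal matrix
   A = 0.0
   do i = 1, n
      do j = max(1, i-1), min(n, i+1)
         A(i,j) = real( 10*i + j )
      end do
   end do

   ! pack the three diagonals into the 1D array
   packed = 0.0
   do i = 1, n
      do j = max(1, i-1), min(n, i+1)
         d = j - i + 2                 ! 1 = sub, 2 = main, 3 = super
         packed( (d-1)*n + i ) = A(i,j)
      end do
   end do

   print '(*(1x,g0))', packed
end program packed_tridiag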
You can just look at your 1D array using a 2D array pointer. Fortran:
integer, target :: A(100)
integer, pointer :: B(:,:)
B(1:10,1:10) => A
B = 0
do i = 1, 10
B(i,i) = 1
end do
print '(*(1x,g0))', A
end
> gfortran diag1d.f90
> ./a.out
1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
In C++ the casting is easy as well.
Please consider also:
@YvesDaoust's answer, because it proposes a better storage strategy: instead of storing all the elements of the tridiagonal matrix, store just the non-zero ones. You could write a derived type to encapsulate the behaviour, if it's worth it.
@VladimirF's answer, because a pointer association may be a better approach (depending on your use case) than copying the whole array through reshape, if all you want is a temporary indexing change for convenience while working on the data.
That said, on to my answer.
Populating a tridiagonal matrix is not a different problem from constructing any other matrix, so I don't think it really matters here. Just bear in mind that Fortran stores arrays in column-major order, with 1-based indexing.
Changing the shape of your data is easy and obvious if the storage is contiguous. You can make a pointer association or transfer it to a new variable with reshape.
Extracting the main, super and sub diagonals is also a trivial problem, and can be done with simple manipulation of array index triplets. Look:
program tridiagonal
implicit none
integer, parameter :: n = 4
integer :: A(n, n), B(n**2), main(n), sub(n-1), sup(n-1)
A(1,:) = [1, 4, 0, 0]
A(2,:) = [3, 4, 1, 0]
A(3,:) = [0, 2, 3, 4]
A(4,:) = [0, 0, 1, 3]
! Remember, column major
B = reshape(A, shape(B)) ! 1, 3, 0, 0, 4, 4, 2, 0, 0, 1, 3, 1, 0, 0, 4, 3
main = B( 1:n**2:n+1) ! 1, 4, 3, 3
sub = B( 2:n**2:n+1) ! 3, 2, 1
sup = B(n+1:n**2:n+1) ! 4, 1, 4
end
This is my code:
Program Arrays
Implicit none
Integer::i
Integer,parameter,dimension(3,4)::Fir_array=0,Sec_array=1
Open(Unit=15,File='Output.txt',Status='Unknown',Action='Readwrite')
Do concurrent(i=1:3)
Write(15,'(1x,i0,".",4(2x,i0,1x,i0))') i,Fir_array(i,:),Sec_array(i,:)
End Do
Close(Unit=15,Status='Keep')
End Program Arrays
The content of Output.txt is:
1. 0 0 0 0 1 1 1 1
2. 0 0 0 0 1 1 1 1
3. 0 0 0 0 1 1 1 1
My intention with this code is to get this content in Output.txt:
1. 0 1 0 1 0 1 0 1
2. 0 1 0 1 0 1 0 1
3. 0 1 0 1 0 1 0 1
How to do that with do loops or implied do?
As usual, there is more than one way of going about this, but the first thing that came to my mind would be to place the desired components of fir_array and sec_array into a temporary array, in the desired order, and then print it.
! Add the following variables to your code:
integer, dimension(8) :: temp
integer :: d1
! Begin:
d1 = size(fir_array, dim=1)
do i = 1, d1
temp([1,3,5,7]) = fir_array(i,:) !! If you're clever you can create a scheme to
temp([2,4,6,8]) = sec_array(i,:) !! obtain the proper indices for arrays of any size.
write(15, '(1x,i0,".",4(2x,i0,1x,i0))') i, temp
enddo
You get the output desired:
0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1
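Since the question also mentions an implied do, here is a sketch of an alternative that avoids the temporary array altogether by interleaving the two rows with an implied-do loop in the I/O list (j is an additional integer you would need to declare, and a plain do loop is used so the record order is well defined):
Do i = 1, 3
   Write(15,'(1x,i0,".",4(2x,i0,1x,i0))') i, (Fir_array(i,j), Sec_array(i,j), j = 1, 4)
End Do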
I am trying to do a Cholesky decomposition via pdpotrf() from Intel's MKL, which uses ScaLAPACK. I read the whole matrix on the master node and then distribute it as in this example. Everything works fine when the dimension of the SPD matrix is even. However, when it is odd, pdpotrf() thinks that the matrix is not positive definite.
Could it be because the submatrices are not SPD? I am working with this matrix:
and the submatrices are (with 4 processes and blocks of size 2x2):
A_loc on node 0
4 1 2
1 0.5 0
2 0 16
nrows = 3, ncols = 2
A_loc on node 1
2 0.5
0 0
0 0
nrows = 2, ncols = 3
A_loc on node 2
2 0 0
0.5 0 0
nrows = 2, ncols = 2
A_loc on node 3
3 0
0 0.625
Here none of the submatrices is SPD; however, the overall matrix is SPD (I have checked by running with 1 process). What should I do? Or is there nothing I can do, and pdpotrf() simply does not work with matrices of odd size?
Here is how I call the routine:
int iZERO = 0;
int descA[9];
// N, M dimensions of matrix. lda = N
// Nb, Mb dimensions of block
descinit_(descA, &N, &M, &Nb, &Mb, &iZERO, &iZERO, &ctxt, &lda, &info);
...
pdpotrf((char*)"L", &ord, A_loc, &IA, &JA, descA, &info);
I also tried this:
// nrows/ncols is the number of rows/columns a submatrix has
descinit_(descA, &N, &M, &nrows, &ncols, &iZERO, &iZERO, &ctxt, &lda, &info);
but I get an error:
{ 0, 0}: On entry to { 0, 1}: On entry to PDPOTR{ 1,
0}: On entry to PDPOTRF parameter number 605 had an illegal value {
1, 1}: On entry to PDPOTRF parameter number 605 had an illegal
value F parameter number 605 had an illegal value
PDPOTRF parameter number 605 had an illegal value info < 0: If the
i-th argument is an array and the j-entry had an illegal value, then
INFO = -(i*100+j), if the i-th argument is a scalar and had an illegal
value, then INFO = -i. info = -605
From my answer, you can see what the arguments of the function mean.
The code is based on this question. Output:
gsamaras#pythagoras:~/konstantis/check_examples$ ../../mpich-install/bin/mpic++ -o test minor.cpp -I../../intel/mkl/include ../../intel/mkl/lib/intel64/libmkl_scalapack_lp64.a -Wl,--start-group ../../intel/mkl/lib/intel64/libmkl_intel_lp64.a ../../intel/mkl/lib/intel64/libmkl_core.a ../../intel/mkl/lib/intel64/libmkl_sequential.a -Wl,--end-group ../../intel/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.a -lpthread -lm -ldl
gsamaras#pythagoras:~/konstantis/check_examples$ mpiexec -n 4 ./test
Processes grid pattern:
0 1
2 3
nrows = 3, ncols = 3
A_loc on node 0
4 1 2
1 0.5 0
2 0 16
nrows = 3, ncols = 2
A_loc on node 1
2 0.5
0 0
0 0
nrows = 2, ncols = 3
A_loc on node 2
2 0 0
0.5 0 0
nrows = 2, ncols = 2
A_loc on node 3
3 0
0 0.625
Description init sucesss!
matrix is not positive definte
Matrix A result:
2 1 2 0.5 2
0.5 0.5 0 0 0
1 0 1 0 -0.25
0.25 -1 -0.5 0.625 0
1 -1 -2 -0.5 14
The issue may come from:
MPI_Bcast(&lda, 1, MPI_INT, 0, MPI_COMM_WORLD);
Before this line, lda is different on each process if the dimension of the matrix is odd. Two processes handle 2 rows and two processes handle 3 rows. But after the MPI_Bcast(), lda is the same everywhere (3).
The problem is that the argument lda of the subroutine DESCINIT must be the leading dimension of the local array, that is either 2 or 3.
By commenting out the MPI_Bcast(), I got:
Description init sucesss!
SUCCESS
Matrix A result:
2 1 2 0.5 2
0.5 0.5 0 0 0
1 -1 1 0 0
0.25 -0.25 -0.5 0.5 0
1 -1 -2 -3 1
This would also explain why the program works for even dimensions and fails for odd ones: with an even dimension every process ends up with the same local leading dimension, so the broadcast does no harm.
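As a sketch of how to get the right value without broadcasting anything (in Fortran, since DESCINIT and NUMROC are ScaLAPACK's own routines; the variable names mirror those in the question and the grid quantities are assumed to come from blacs_gridinfo): each process computes its own local row count with numroc and passes that, rather than a global value, as the leading dimension to descinit.
! assumed available (as in the question): N, M (global sizes), Nb, Mb (block sizes),
! ctxt (the BLACS context) and myrow, mycol, nprows, npcols from blacs_gridinfo
integer, external :: numroc
integer :: descA(9), info
integer :: nrows_loc, ncols_loc, lld

nrows_loc = numroc( N, Nb, myrow, 0, nprows )   ! local number of rows
ncols_loc = numroc( M, Mb, mycol, 0, npcols )   ! local number of columns
lld       = max( 1, nrows_loc )                 ! local leading dimension

call descinit( descA, N, M, Nb, Mb, 0, 0, ctxt, lld, info )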