Here's an interesting problem that doesn't seem to have an obvious solution. After allocating a 16-element PetscReal array, filling it with data, and pulling the data back out of it, this error is thrown on deallocate():
*** Error in `./myprogram': corrupted size vs. prev_size: 0x000000001930d8e0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f85076667e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x80dfb)[0x7f850766fdfb]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f850767353c]
...
I'm completely stumped as to how this error could arise, given how the array it crashes around is set up.
Here are all the code snippets related to this problem:
PetscReal, allocatable :: adj_temp_area(:), adj_local_area(:)
PetscInt :: temp_int, rec_local_tri
...do unrelated stuff...
rec_local_tri = int(sum(vec_ptr-1))
allocate(adj_local_area(rec_local_tri))
print*,rec_local_tri ! prints out '16'
temp_int=int(maxval(vec_ptr))
allocate(adj_temp_area(temp_int))
print*,temp_int ! prints out '4'
do i=1,grid%num_pts_local
...do stuff...
call MatGetRow(grid%adjmatrix_area,grid%vertex_ids(i)-1,ncols,PETSC_NULL_INTEGER,adj_temp_area,ierr);CHKERRQ(ierr)
adj_local_area(pos2+1:pos2+temp_int2) = adj_temp_area(2:temp_int)
enddo
print*, adj_local_area ! prints out 16 element array, filled with correct values
! CRASHES HERE!
deallocate(adj_local_area)
Several things:
adj_local_area is filled properly and its bounds are not overwritten. If you print out the literal bounds used on the line where the array is filled, you see this:
adj_local_area( 1 : 2 ) = adj_temp_area(2: 3 )
adj_local_area( 3 : 5 ) = adj_temp_area(2: 4 )
adj_local_area( 6 : 7 ) = adj_temp_area(2: 3 )
adj_local_area( 8 : 9 ) = adj_temp_area(2: 3 )
adj_local_area( 10 : 12 ) = adj_temp_area(2: 4 )
adj_local_area( 13 : 14 ) = adj_temp_area(2: 3 )
adj_local_area( 15 : 15 ) = adj_temp_area(2: 2 )
adj_local_area( 16 : 16 ) = adj_temp_area(2: 2 )
Several other arrays are allocated, filled, and deallocated in the same way. There are no problems. If I comment out the deallocate(adj_local_area), the code runs fine, until it tries to exit the subroutine and clear the heap - and it crashes with the same message.
I initially thought it was a type mismatch (i.e., real*8 getting written into a real*4 vector, say), but the values being put into the array are all PetscReal, the same type as the vector itself (PetscReal is compiled as real*4 for my configuration, I believe).
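One check I'm planning to add (a hedged diagnostic sketch, not code from the program above) is to compare the ncols that MatGetRow reports for each row against the size of adj_temp_area, since a row with more nonzeros than temp_int would be written past the end of that allocation and the damage might only surface later, at deallocate():
call MatGetRow(grid%adjmatrix_area,grid%vertex_ids(i)-1,ncols,PETSC_NULL_INTEGER,adj_temp_area,ierr);CHKERRQ(ierr)
! hedged diagnostic: report any row whose nonzero count exceeds the scratch array
if (ncols > size(adj_temp_area)) then
print*,'row',grid%vertex_ids(i)-1,': ncols =',ncols,' but size(adj_temp_area) =',size(adj_temp_area)
end if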
Any ideas? Let me know if you need more code and I can provide it.
Consider the following simple fortran program
program test_vec_allocation
use mpi
implicit none
integer(kind=8) :: N
! =========================BLACS and MPI=======================
integer :: ierr, size, rank,dims(2)
! -------------------------------------------------------------
integer, parameter :: block_size = 100
integer :: context, nprow, npcol, local_nprow, local_npcol
integer :: numroc, indxl2g, descmat(9),descvec(9)
integer :: mloc_mat ,nloc_mat ,mloc_vec ,nloc_vec
call blacs_pinfo(rank,size)
dims=0
call MPI_Dims_create(size, 2, dims, ierr)
nprow = dims(1);npcol = dims(2)
call blacs_get(0,0,context)
call blacs_gridinit(context, 'R', nprow, npcol)
call blacs_gridinfo(context, nprow, npcol, local_nprow,local_npcol)
N = 700
mloc_vec = numroc(N,block_size,local_nprow,0, nprow)
nloc_vec = numroc(1,block_size,local_npcol,0, npcol)
print *,"Rank", rank, mloc_vec, nloc_vec
call blacs_gridexit(context)
call blacs_exit(0)
end program test_vec_allocation
When I run it with 11 MPI ranks I get:
Rank 0 100 1
Rank 4 100 1
Rank 2 100 1
Rank 1 100 1
Rank 3 100 1
Rank 10 0 1
Rank 6 100 1
Rank 5 100 1
Rank 9 0 1
Rank 8 0 1
Rank 7 0 1
which is how I would expect ScaLAPACK to divide this array. However, for an even number of ranks I get:
Rank 0 200 1
Rank 8 200 0
Rank 9 100 1
Rank 10 100 0
Rank 1 200 0
Rank 6 200 1
Rank 11 100 0
Rank 3 200 1
Rank 4 200 0
Rank 2 200 0
Rank 7 200 0
Rank 5 200 0
which makes no sense: why would rank 0 get 200 elements when the block size is 100 and ranks * block_size > N?
Because of this my program works for 1, 2, 3, 5, 7, or 11 MPI ranks, but fails for 4, 6, 8, 9, 10, 12, etc. (I don't know why it fails for 9 ranks!). Can anyone explain what is wrong with my approach?
GFortran version: 6.1.0
ScaLAPACK version: 2.1.0
MacOS version: 10.11
There are a number of things wrong with your code:
1) Firstly, don't use Integer( 8 ). As Vladimir put it, please unlearn this. Not only is it non-portable and therefore very bad practice (please see the many examples here, e.g. Fortran 90 kind parameter), here it is also wrong, as numroc expects an integer of default kind as its first argument (see e.g. https://software.intel.com/content/www/us/en/develop/documentation/mkl-developer-reference-fortran/top/scalapack-routines/scalapack-utility-functions-and-routines/numroc.html).
2) You call an MPI routine before you call MPI_Init. With a handful of exceptions (and this isn't one of them), this results in undefined behaviour. Note that the description at https://www.netlib.org/blacs/BLACS/QRef.html#BLACS_PINFO makes no reference to actually calling MPI_Init. For the same reason I also prefer to call MPI_Finalize explicitly.
3) You have misunderstood MPI_Dims_create. You seem to assume you will get a one-dimensional distribution, but you actually ask it for a two-dimensional one. Quoting from the standard at https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf:
The entries in the array dims are set to describe a Cartesian grid with ndims dimensions and a total of nnodes nodes. The dimensions are set to be as close to each other as possible, using an appropriate divisibility algorithm. The caller may further constrain the operation of this routine by specifying elements of array dims. If dims[i] is set to a positive number, the routine will not modify the number of nodes in dimension i; only those entries where dims[i] = 0 are modified by the call.
You set dims equal to zero, so the routine is free to set both dimensions. Thus for 11 processes you will get a 1x11 or 11x1 grid, which is what you seem to expect. However, for 12 processes, as "the dimensions are set to be as close to each other as possible", you will get a 3x4 or 4x3 grid, NOT 12x1. With the 4x3 grid you actually get here (MPI_Dims_create returns the dimensions in non-increasing order), down each column of 4 process rows you expect numroc to return 200 elements (2 blocks) for 3 of the processes and 100 for the remaining one. As there are 3 process columns, you therefore expect 3x3=9 processes returning 200 and 3x1=3 returning 100, and this is exactly what you see. Also try 15 processes: you will see an odd number of ranks that, by your criterion, "does not work", because (advanced maths alert) 15=3x5. Incidentally, on my machine 9 processes does NOT give a 3x3 grid - that looks like a bug in Open MPI to me.
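If what you actually want is a one-dimensional P x 1 grid, here is a minimal sketch of the program with the three points above applied. This assumes the 1-D layout really is the intent; dims(2) is pinned to 1 so MPI_Dims_create only sets the number of process rows, N is of default kind, and MPI is initialised and finalised explicitly. Otherwise the variable names follow your original (local_nprow/local_npcol are really this process's grid coordinates):
program test_vec_allocation_1d
   use mpi
   implicit none
   integer :: N                                  ! default kind, as numroc expects
   integer, parameter :: block_size = 100
   integer :: ierr, nranks, rank, dims(2)
   integer :: context, nprow, npcol, local_nprow, local_npcol
   integer :: mloc_vec, nloc_vec
   integer, external :: numroc

   call MPI_Init(ierr)                           ! initialise MPI before any other MPI call
   call blacs_pinfo(rank, nranks)
   dims = [ 0, 1 ]                               ! pin the 2nd dimension so the grid is P x 1
   call MPI_Dims_create(nranks, 2, dims, ierr)
   nprow = dims(1); npcol = dims(2)
   call blacs_get(0, 0, context)
   call blacs_gridinit(context, 'R', nprow, npcol)
   ! local_nprow/local_npcol hold this process's row/column coordinates in the grid
   call blacs_gridinfo(context, nprow, npcol, local_nprow, local_npcol)

   N = 700
   mloc_vec = numroc(N, block_size, local_nprow, 0, nprow)
   nloc_vec = numroc(1, block_size, local_npcol, 0, npcol)
   print *, "Rank", rank, mloc_vec, nloc_vec

   call blacs_gridexit(context)
   call MPI_Finalize(ierr)                       ! finalise MPI explicitly
end program test_vec_allocation_1d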
Each processor has its own unique matrix, A, of size Nx2, where N varies with the processor. I want to collect all these matrices into one single buff matrix of size (N*P)x2, where P is the number of processors.
Size-wise, in Fortran they are allocated like:
A(N,2)
buff(N*P,2)
As an example, let P = 2 and the A matrices for each processor be,
for Proc-1
10 11
10 11
for Proc-2
20 21
20 21
To this end I use MPI_GATHERV and save the individual matrices in the buff matrix. If I do this then buff will look like this,
10 20
10 20
11 21
11 21
But what I want is the matrix to look like this,
10 11
10 11
20 21
20 21
In memory (I think) buff is: |10, 10, 20, 20, 11, 11, 21, 21|
Sample code is below,
...
! size = 2
root = 0
ALLOCATE ( count(size), num(size) )
! -----------------------------------------------------------
! Mock data
! -----------------------------------------------------------
IF(rank.eq.0) THEN
m = 2
mm = m*2
allocate(A(m,2))
A(1,1) = 10
A(1,2) = 11
A(2,1) = 10
A(2,2) = 11
ELSE
m = 2
mm = m*2
allocate(A(m,2))
A(1,1) = 20
A(1,2) = 21
A(2,1) = 20
A(2,2) = 21
END IF
! -----------------------------------------------------------
! send number of elements
! -----------------------------------------------------------
CALL MPI_GATHER(mm,1,MPI_INTEGER,count,1,MPI_INTEGER,root,cworld,ierr)
! -----------------------------------------------------------
! Figure out displacement vector needed for gatherv
! -----------------------------------------------------------
if(rank.eq.0) THEN
ALLOCATE (buff(SUM(count)/2,2), disp(size), rdisp(size))
rdisp = count
disp(1) = 0
DO i = 2,size
disp(i) = disp(i-1) + count(i-1)
END DO
END IF
! -----------------------------------------------------------
! Rank-0 gathers msg
! -----------------------------------------------------------
CALL MPI_GATHERV(A,mm,MPI_INTEGER,buff,rdisp,disp,MPI_INTEGER,root,cworld,ierr)
! -----------------------------------------------------------
! Print buff
! -----------------------------------------------------------
if(rank.eq.0) THEN
DO i = 1,sum(count)/2
print*, buff(i,:)
end do
END IF
I have looked at Using Gatherv for 2d Arrays in Fortran but am a little confused by the explanation.
I’m not very familiar with the MPI details, but is there a "simple" way to gather all the matrices and place them in the correct memory position in buff?
**** Edit ****
Following what Gilles Gouaillardet suggested, I'm trying to figure out how to do that.
The derived type for sending the rows should look something like this (I think),
CALL MPI_TYPE_vector(2,1,2,MPI_INTEGER,MPI_ROWS,ierr)
CALL MPI_TYPE_COMMIT(MPI_ROWS,ierr)
Then I extend,
call MPI_Type_size(MPI_INTEGER, msg_size, ierr)
lb = 0
extent = 2*msg_size
call MPI_Type_create_resized(MPI_ROWS, lb, extent , MPI_ROWS_extend, ierr)
CALL MPI_TYPE_COMMIT(MPI_ROWS_extend,ierr)
I'm trying to understand why I need the second derived type for receiving. I'm not sure what that one should look like.
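To make the question concrete, here is a self-contained sketch of how I imagine the two types fitting together (a uniform m = 2 rows per rank just to keep it short, MPI_COMM_WORLD instead of cworld, and names like row_send_res are mine); I'd appreciate confirmation that the receive side is the right idea:
program gather_rows_sketch
   use mpi
   implicit none
   integer, parameter :: m = 2                 ! rows per rank (uniform here for brevity)
   integer :: rank, nproc, ierr, i, nrows_tot
   integer :: row_send, row_send_res, row_recv, row_recv_res
   integer(kind=MPI_ADDRESS_KIND) :: lb, ext
   integer, allocatable :: A(:,:), buff(:,:), counts(:), displs(:)

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
   nrows_tot = m*nproc

   allocate(A(m,2), buff(nrows_tot,2), counts(nproc), displs(nproc))
   A(:,1) = 10*(rank+1)                        ! rank 0 -> 10 10, rank 1 -> 20 20, ...
   A(:,2) = 10*(rank+1) + 1                    ! rank 0 -> 11 11, rank 1 -> 21 21, ...

   call MPI_Type_get_extent(MPI_INTEGER, lb, ext, ierr)

   ! Send side: one row of A is 2 integers a stride of m apart (column-major);
   ! resize its extent to one integer so consecutive rows start one element apart.
   call MPI_Type_vector(2, 1, m, MPI_INTEGER, row_send, ierr)
   call MPI_Type_create_resized(row_send, 0_MPI_ADDRESS_KIND, ext, row_send_res, ierr)
   call MPI_Type_commit(row_send_res, ierr)

   ! Receive side: one row of buff is 2 integers a stride of nrows_tot apart;
   ! the same resize trick lets counts and displacements be expressed in rows of buff.
   call MPI_Type_vector(2, 1, nrows_tot, MPI_INTEGER, row_recv, ierr)
   call MPI_Type_create_resized(row_recv, 0_MPI_ADDRESS_KIND, ext, row_recv_res, ierr)
   call MPI_Type_commit(row_recv_res, ierr)

   counts = m                                  ! every rank contributes m rows
   displs = [( (i-1)*m, i = 1, nproc )]        ! row offset of each rank's block in buff

   call MPI_Gatherv(A, m, row_send_res, buff, counts, displs, &
                    row_recv_res, 0, MPI_COMM_WORLD, ierr)

   if (rank == 0) then
      do i = 1, nrows_tot
         print *, buff(i,:)                    ! expect 10 11 / 10 11 / 20 21 / 20 21 / ...
      end do
   end if
   call MPI_Finalize(ierr)
end program gather_rows_sketch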
I have an input data file with two columns (the first column contains the names of the variables and the second contains their values). I am trying to read this input file with my Fortran script and to print the variables I've just created to the screen.
Here are the input file, the script, and the output displayed on the terminal at execution:
input file:
a 7 2 4
b 150
vec1 1 2 3
vec2 4 5 6
c 56
script
program main
implicit none
character(16) :: cinput
integer :: a0,a1,a2,b0,c0,i,j
integer,dimension(:,:),allocatable :: gfd
open(9, file='inputdata.dat')
read(9,*) cinput,a0,a1,a2
read(9,*) cinput,b0
allocate(gfd(3,2))
read(9,*) cinput,gfd(:,1)
read(9,*) cinput,gfd(:,2)
read(9,*) cinput,c0
close(9)
write(*,*) 'a0=', a0,'a1=', a1,'a2=', a2,'b0=', b0,'c0=', c0
do j=1,2
do i=1,3
write(*,*) gfd(i,j)
enddo
enddo
end program main
Output on the terminal
a0 = 7, a1 = 2, a2 = 4, b0 = 150, c0 = 56
1
2
3
4
5
6
Now, this is good, but would there be a way to assign the values to the variable "gfd" without having to specify the size of the array in "allocate"? I could then change the input file to use a longer or shorter array without having to modify the script where I allocate the variable "gfd".
Thank you for your support if you can help me!
EDIT: Thanks for your answer; this procedure works, and it is now possible to use various array sizes in the input file without having to modify the Fortran script. Below are the modifications to the input file and the script, and the result obtained.
input file:
size 5 2
a 7 2 4
b 150
vec1 1 2 3 4 5
vec2 6 7 8 9 10
c 56
script
program main
implicit none
character(16) :: cinput
integer :: a0,a1,a2,b0,c0,i,j, rows, cols
integer,dimension(:,:),allocatable :: gfd
open(9, file='inputdata.dat')
read(9,*) cinput,rows,cols
read(9,*) cinput,a0,a1,a2
read(9,*) cinput,b0
allocate(gfd(rows,cols))
read(9,*) cinput,gfd(:,1)
read(9,*) cinput,gfd(:,2)
read(9,*) cinput,c0
close(9)
write(*,*) 'a0=', a0,'a1=', a1,'a2=', a2,'b0=', b0,'c0=', c0
do j=1,cols
do i=1,rows
write(*,*) gfd(i,j)
enddo
enddo
end program main
Output on the terminal
a0 = 7, a1 = 2, a2 = 4, b0 = 150, c0 = 56
1
2
3
4
5
6
7
8
9
10
The best way to specify the size of the array would be to include its dimensions in the input file, read them, allocate the array, then read the array.
If you need assistance programming this, modify your question. You could, if you want, post your revised code to answer your own question.
Suppose I have a Fortran program which includes the following loop:
do i=1, 10
print *, i
enddo
The output of this will be like
1
2
...
10
How can I write these values to a single line, like in the following?
1 2 ... 10
There are a number of ways; two that come to mind immediately are shown in the following little program:
$ cat loop.f90
Program loop
Implicit None
Integer :: i
Write( *, * ) 'First way - non-advancing I/O'
Do i = 1, 10
Write( *, '( i0, 1x )', Advance = 'No' ) i
End Do
Write( *, * ) ! Finish record
Write( *, * ) 'Second way - implied do loop'
Write( *, * ) ( i, i = 1, 10 )
End Program loop
$ gfortran -std=f2003 -Wall -Wextra -fcheck=all loop.f90
$ ./a.out
First way - non-advancing I/O
1 2 3 4 5 6 7 8 9 10
Second way - implied do loop
1 2 3 4 5 6 7 8 9 10
$
The first method, non-advancing I/O, suppresses writing of the end-of-record marker (normally a newline), but does require an explicit format. The second, the implied do loop, doesn't require a format, but is less flexible.
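If you want the single-line output of the implied do loop but with control over the spacing, you can also attach a format to it; a small sketch (this uses the unlimited repeat count *( ... ), which needs a Fortran 2008 compiler, so drop -std=f2003):
Write( *, '( *( i0, 1x ) )' ) ( i, i = 1, 10 )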
BTW in English they are normally called "loops"
I have two arrays: array global has 8 values, and it is scattered among local arrays of 2 values each. What I'm trying to do is take the big array, split it into small arrays, do some work, then put it back together.
Problem:
Even though I successfully scatter the data, the do loop as written only seems to work on the first sub-array local. What I want is for all of the integers in the scattered local arrays to be multiplied by 2 and then gathered back into the global array.
Code for the do loop (some work has been done here):
do j = 1,2
local(j) = j*2
print *, j
end do
Here's the full code. If you scroll down you'll see the part where I need your help.
MODULE MPI
IMPLICIT NONE
INCLUDE 'mpif.h'
INTEGER :: MYID,TOTPS, IERR, MPISTTS
CONTAINS
SUBROUTINE MPIINIT
IMPLICIT NONE
CALL MPI_INIT( IERR )
CALL MPI_COMM_RANK(MPI_COMM_WORLD,MYID,IERR)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD,TOTPS,IERR)
RETURN
END SUBROUTINE MPIINIT
END MODULE MPI
PROGRAM SCATTER
USE MPI
IMPLICIT NONE
CALL MPIINIT
CALL TEST
CALL MPI_FINALIZE(IERR)
CONTAINS
SUBROUTINE TEST
USE MPI
IMPLICIT NONE
INTEGER :: I,J
INTEGER,DIMENSION(8) :: GLOBAL
INTEGER,DIMENSION(2) :: LOCAL
if (myid .eq. 0) then
do i = 1,8
global(i) = i
end do
end if
call mpi_scatter(global,2,mpi_integer,local,2,mpi_integer,0, &
mpi_comm_world,ierr)
print*,"task",myid,":",local
call mpi_barrier(mpi_comm_world,ierr)
!!!!!!! do some work here
do j = 1,2
local(j) = j*2
print*,j
end do
!!!!!! end work
call mpi_gather(local,2,mpi_integer,global,2,mpi_integer,0, &
mpi_comm_world,ierr)
if(myid .eq. 0) then
print*,"task",myid,":",global
end if
END SUBROUTINE TEST
END PROGRAM SCATTER
Notes:
(1) I've been reading & learning from this thread but it looks challenging for now.
(2) I compile and run the code with mpif90 SCATTER.f90 followed by mpirun -np 4 ./a.out
Output:
task 0 : 1 2
task 1 : 3 4
task 2 : 5 6
task 3 : 7 8
1
2
1
2
1
2
1
2
task 0 : 2 4 2 4 2 4 2 4
What I want to get is: task 0 : 2 4 6 8 10 12 14 16
You wrote
local(j) = j * 2
print*, j
I don't think that does what you think it does.
You probably meant to write
local(j) = local(j) * 2
print*, local(j)
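In context, the work loop would then read as follows (same variable names as in your program), so that each rank doubles the values it actually received:
!!!!!!! do some work here: double the values this rank received
do j = 1,2
local(j) = local(j)*2
print*,local(j)
end do
!!!!!! end work
With that change, the gather on rank 0 should print task 0 : 2 4 6 8 10 12 14 16, which is what you were after.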