I think I'm observing a bug in ifort 2015.
$> ifort test.f90 -O1 -g && ./a.out
6 0 0 0 0 0 0
1 0
$> ifort test.f90 -O0 -g && ./a.out
6 0 0 0 0 0 0
6 0 0 0 0 0 0
The second result is the correct one, and I see no reason for the difference.
File test.f90:
module useless_module
! this module is useless
! remove it and the bug disappears
implicit none
! those variables are useless
! they will never be touched
! remove one of them and the bug disappears
! rename one of them and the bug disappears
integer,allocatable,dimension(:) :: num_dr , &
num_cf , &
num_cfi, &
num_num, &
num_typ
end module useless_module
program test_program
implicit none
! those variables are useless
! they will never be touched
! remove one of them and the bug disappears
integer,allocatable,dimension(:) :: a1, b1, c1, d1, &
e1, g1, f1, h1, &
i1, j1, k1
call routine_1(a1,b1,c1,d1,e1,f1,g1,h1,i1,j1,k1)
contains
subroutine routine_1(a3,b3,c3,d3,e3,f3,num_cf,num_dr, &
num_typ,num_num,num_cfi)
implicit none
! those arguments are useless
! they will never be touched
! remove one of them and the bug disappears
integer,allocatable,dimension(:) :: a3,b3,c3,d3,e3,f3
! those arguments are useless
! they will never be touched
! remove one of them and the bug disappears
! rename one of them and the bug disappears
integer,allocatable,dimension(:) :: num_dr , &
num_cf , &
num_cfi, &
num_num, &
num_typ
! this variable is useless
! it will never be touched
! remove it and the bug disappears
integer,allocatable,dimension(:) :: g3
! those variables are actually used!
integer,allocatable,dimension(:,:,:) :: h3,i3,j3
allocate(h3(1,1,1),i3(1,1,1),j3(1,1,1))
call routine_2(g3,i3,j3)
! here, normally, size(i3)=6 and i3 = 0 0 0 0 0 0
! but that is not what is printed: BUG?
! printing size(i3) AND i3 is mandatory to make the bug happen
write(*,'(7i2)') size(i3),i3
deallocate(h3,i3,j3)
end subroutine routine_1
subroutine routine_2(a2,b2,c2)
use useless_module
implicit none
integer,allocatable,dimension(:) :: a2,d2,e2,f2,g2
integer,allocatable,dimension(:,:,:) :: b2, c2
integer :: j2
! j2 has to be a variable
j2=1
! allocate and deallocate some arrays
! not doing that will make the bug disappear
allocate (d2(j2),e2(1),f2(1),g2(1))
deallocate(d2 ,e2 ,f2 ,g2)
call reallocate( c2,3,2,1)
call reallocate( b2,3,2,1) ; b2=0
! here, we have size(b2)=6 and b2= 0 0 0 0 0 0
! printing size(b2) AND b2 is mandatory to make the bug happen
write(*,'(7i2)') size(b2),b2
end subroutine routine_2
subroutine reallocate(a4,b4,c4,d4)
implicit none
integer,allocatable,dimension(:,:,:) :: a4
integer :: b4,c4,d4
deallocate(a4) ; allocate(a4(b4,c4,d4))
end subroutine reallocate
end program test_program
As you can see, I'm doing nothing fancy.
I tried to reduce the code as much as I could.
I tried it on three computers with three versions of ifort (15.0.0 20140723, 15.0.2 20150121 and 14.0.0 20130728),
and I always see the same thing.
I don't see it with gfortran (4.8.2 or 5.1.0).
It seems like too big a bug to be real, so I'm sure I'm making a mistake somewhere, but I don't see it.
Any help will be appreciated.
Edit: I'm on Linux (Ubuntu and Arch Linux).
You are correct - this is a bug in ifort. I have sent this on to the developers.
Consider the following Fortran program
program test_prg
use iso_fortran_env, only : real64
use mpi_f08
implicit none
real(real64), allocatable :: arr_send(:), arr_recv(:)
integer :: ierr
call MPI_Init(ierr)
allocate(arr_send(3), arr_recv(3))
arr_send = 1
print *, lbound(arr_recv)
call MPI_Gatherv(arr_send, size(arr_send), MPI_DOUBLE_PRECISION, arr_recv, [size(arr_send)], [0], MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
print *, lbound(arr_recv)
call MPI_Finalize(ierr)
end program
Execution of this program on one processor (compiled with gfortran 9.3.0 and MPICH 3.3.2) prints:
1
0
So arr_recv has changed its lower bound after the call to MPI_Gatherv. If I use arr_recv(1) instead of arr_recv in the call to MPI_Gatherv, then it doesn't change. If I replace the mpi_f08 module with mpi, then using either arr_recv(1) or arr_recv leaves the lower bound unchanged.
Why is the lower bound changing in this program?
At this stage, I believe this is a bug in gfortran affecting the MPI Fortran 2018 bindings (e.g. use mpi_f08), and I reported it at https://gcc.gnu.org/pipermail/fortran/2020-September/055068.html.
All gfortran versions are affected (I tried 9.2.0, 10.2.0 and the latest master branch; versions 8 and earlier do not support dimension(..)).
The reproducer below can be used to evidence the issue:
MODULE FOO
INTERFACE
SUBROUTINE dummyc(x0) BIND(C, name="sync")
type(*), dimension(..) :: x0
END SUBROUTINE
END INTERFACE
contains
SUBROUTINE dummy(x0)
type(*), dimension(..) :: x0
call dummyc(x0)
END SUBROUTINE
END MODULE
PROGRAM main
USE FOO
IMPLICIT NONE
integer :: before(2), after(2)
INTEGER, parameter :: n = 1
DOUBLE PRECISION, ALLOCATABLE :: buf(:)
DOUBLE PRECISION :: buf2(n)
ALLOCATE(buf(n))
before(1) = LBOUND(buf,1)
before(2) = UBOUND(buf,1)
CALL dummy (buf)
after(1) = LBOUND(buf,1)
after(2) = UBOUND(buf,1)
if (before(1) .NE. after(1)) stop 1
if (before(2) .NE. after(2)) stop 2
before(1) = LBOUND(buf2,1)
before(2) = UBOUND(buf2,1)
CALL dummy (buf2)
after(1) = LBOUND(buf2,1)
after(2) = UBOUND(buf2,1)
if (before(1) .NE. after(1)) stop 3
if (before(2) .NE. after(2)) stop 4
END PROGRAM
FWIW, the Intel ifort compiler (I tried 18.0.5) works fine with the reproducer.
This question is a bit old, but I had the same issue in the last few days with GNU Fortran (GCC) 11.2.0 when moving from use mpi to use mpi_f08, except in my case it was with MPI_Allreduce. It is clearly a bug in the MPI bindings, but one workaround is to pass the argument with its bounds written out explicitly, as var(1:end). This worked for me. In your case, you could try:
call MPI_Gatherv(arr_send(1:3), size(arr_send(1:3)), MPI_DOUBLE_PRECISION, arr_recv(1:3), [size(arr_send(1:3))], [0], MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
I have MPI ranks split up to calculate different parts of an array, and then I want to put/send those slices to a different rank that doesn't participate in the calculation. That rank is the master of a new communicator set up to do other things with the array (averaging, IO, etc.). I got it to work with MPI_Isend and MPI_Irecv, and now I want to try MPI_Put.
use mpi_f08
use iso_c_binding
implicit none
integer, parameter :: n=10, gps = 18, pes=12, dpes = 6
integer :: main=pes, d=dpes
integer :: diag_master
integer :: global_size, global_rank, diag_size, diag_rank
type(MPI_comm),allocatable :: diag_comm
integer :: pelist_diag
TYPE(MPI_Win) :: win
integer :: ierr, i, j
type(MPI_COMM) :: comm, mycomm
integer :: gsz, grk
integer :: lsz, lrk
integer(KIND=MPI_ADDRESS_KIND) :: local_group
logical :: local_flag
integer :: color,key
!!! THIS IS THE ARRAY
real, dimension(n,pes) :: r
!!!
logical :: on_dpes = .false.
logical,allocatable,dimension(:) :: dpes_list ! true if on dpes list
integer :: comm_manager
integer :: dmg
integer(KIND=MPI_ADDRESS_KIND) :: buff_size !< the size of a variable type
integer(kind=MPI_ADDRESS_KIND) :: displacement
integer :: disp_size
integer :: loc_base
integer, pointer :: fptr
!!!!!!!! THIS ALL WORKS BEGIN !!!!!!!!
comm=MPI_COMM_WORLD
call MPI_INIT(ierr)
call MPI_COMM_SIZE(COMM, gsz, ierr)
call MPI_COMM_RANK(COMM, grk, ierr)
allocate(dpes_list(gsz))
! write (6,*) "I am ",grk," of ",gsz
!> Find the group
call MPI_COMM_GET_ATTR(COMM,MPI_APPNUM,local_group,local_flag,ierr)
!> Split a new communicator as mycom
color = int(local_group)
key = 0
call MPI_COMM_SPLIT(COMM, color, key, mycomm, ierr)
!> Get information about the split communicators
call mpi_comm_size(mycomm,lsz,ierr)
call mpi_comm_rank(mycomm,lrk,ierr)
!> Create data on the main communicator
if (lsz == pes) then
comm_manager = main
on_dpes = .false.
r = 0.0
if (mod(lrk,2) == 0) then
c_loop: do concurrent (i=1:n)
r(i,lrk+1) = sin(real(i))+real(i)
enddo c_loop
else
r(:,lrk+1) = 10.0-dble(lrk)
endif
if (lsz == dpes) then
diag_size = lsz
diag_rank = lrk
comm_manager = d
on_dpes = .true.
diag_comm = mycomm
if (lrk==0) then
dmg = grk
endif
endif
call MPI_ALLGATHER(on_dpes,1,MPI_LOGICAL, &
dpes_list,gsz,MPI_LOGICAL, MPI_COMM_WORLD, ierr)
!> Get the master of dpes
do i=1,gsz
if (dpes_list(i)) then
dmg = i-1
exit
endif
enddo
diag_master = dmg
diag_global_master = dmg
!!!!!!!! THIS ALL WORKS END !!!!!!!!
!! At this point, the ranks that participate in the calculation
!! have values in r(i,lrk+1) where lrk is their rank
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!! THIS IS WHERE THINGS GO WRONG? !!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
disp_size = storage_size(r)
buff_size = disp_size*size(r)
call c_f_pointer(c_loc(r(1,1)),fptr)
loc_base = fptr
nullify(fptr)
write (6,*) loc_base, grk
call MPI_Win_create(loc_base,buff_size,disp_size,MPI_INFO_NULL,&
mpi_comm_world,win,ierr)
call MPI_Win_Fence(0,win,ierr)
displacement = loc_base + disp_size *buff_size
! if (.not.allocated(diag_comm)) then
if (grk == 11) then
call MPI_Put(r(:,global_rank+1),size(r,1),MPI_FLOAT,&
diag_master,displacement,size(r,1), MPI_FLOAT, win ,ierr)
endif
call MPI_Win_Fence(0,win,ierr)
CALL MPI_WIN_FREE(win, ierr)
call MPI_FINALIZE(ierr)
I have ! if (.not.allocated(diag_comm)) then commented out because I tried to do this with all of the ranks that calculate r, but I got the same result.
I am compiling with mpiifort -O0 -fpe0 -init=snan,arrays -no-wrap-margin -traceback -stand f18 and running with mpirun -n 12 ./$#.x : -n 6 ./$#.x in my Makefile. The version of mpiifort I am using is:
> mpiifort -v
mpiifort for the Intel(R) MPI Library 2019 Update 2 for Linux*
Copyright 2003-2019, Intel Corporation.
ifort version 19.0.2.187
The output (from write (6,*) loc_base, grk) is strange:
1072411986 0
0 1
0 2
0 3
0 4
0 5
0 6
0 7
0 8
0 9
0 10
0 11
2142952877 12
2142952877 13
2142952877 14
2142952877 15
2142952877 16
2142952877 17
Ranks 12-17 are the ranks that don't participate in "calculating r", but I'm not sure why c_loc(r(1,1)) is different for these ranks. It is also different for rank 0.
My actual questions are:
1) How do I calculate the displacement variable? Am I doing it correctly? Is it supposed to be different between ranks? Because it will be in this case.
2) Why is c_loc(r(1,1)) different for ranks 12-17? Does it have anything to do with the fact that this is an SPMD program? Why is it different for rank 0?
3) Can I do the one-sided communication with all of the ranks instead of just one? When I did this the other way, I had each rank call MPI_Isend and then just called MPI_Irecv in a loop over all of the sending ranks. Can I do something similar with MPI_Put? Should I be using MPI_Get? Something else?
4) How do I get this to work? This is just an educational example for myself, and what I actually need to do is much more complicated.
I can answer item 2, at least. You have:
call c_f_pointer(c_loc(r(1,1)),fptr)
loc_base = fptr
where loc_base is declared integer. You seem to be assuming that loc_base is some sort of address, but it is not. In Fortran, intrinsic assignment from a pointer assigns the value of the target, not the location of the target. So you're effectively doing a TRANSFER of the REAL values of r to loc_base - probably not what you want.
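To make that concrete, here is a minimal sketch (mine, not the poster's code, and addressing only item 2) of creating the window on r itself instead of on an integer copy of its first element; note that storage_size returns bits, so a division by 8 is needed to get the byte counts MPI expects:
disp_size = storage_size(r) / 8                          ! bytes per element of r, not bits
buff_size = int(disp_size, MPI_ADDRESS_KIND) * size(r)   ! window size in bytes
call MPI_Win_create(r, buff_size, disp_size, MPI_INFO_NULL, &
                    mpi_comm_world, win, ierr)
With the window created on r directly, the c_loc/c_f_pointer/loc_base block can be dropped; displacements into a window are expressed as multiples of the target's disp_unit from the start of the window, not as absolute addresses.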
It seems that whenever I try to allocate a window of around 30-32 MB, I get a segmentation fault.
I am using the routine MPI_WIN_ALLOCATE_SHARED.
Does anybody know if there is a limit to how big my window can be? If so, is there a way to compile my code to relax that limit?
I am using Intel MPI 19.0.3 and ifort 19.0.3.
The example is written in Fortran. By varying the integer size_ you can see when the segmentation fault occurs. I tested it with size_=10e3 and size_=10e4; the latter caused a segmentation fault.
C------
program TEST_STACK
use, INTRINSIC ::ISO_C_BINDING
implicit none
include 'mpif.h'
!--- Parameters (They should not be changed ! )
integer, parameter :: whoisroot = 0 ! - Root always 0 here
!--- General parallel
integer :: whoami ! - My rank
integer :: mpi_nproc ! - no. of procs
integer :: mpierr ! - Error status
integer :: status(MPI_STATUS_SIZE)! - For MPI_RECV
!--- Shared memory stuff
integer :: whoami_shm ! - Local rank in shared memory group
integer :: mpi_shm_nproc ! - No. of procs in Shared memory group
integer :: no_partners ! - No. of partners for share memory
integer :: info_alloc
!--- MPI groups
integer :: world_group ! - All procs across all nodes
integer :: shared_group ! - Only procs that share memory
integer :: MPI_COMM_SHM ! - Shared memory communicators (for those in shared_group)
type(C_PTR) :: ptr_buf
integer(kind = MPI_ADDRESS_KIND) :: size_bytes, lb
integer :: win, size_, disp_unit
call MPI_INIT ( mpierr )
call MPI_COMM_RANK ( MPI_COMM_WORLD, whoami, mpierr )
call MPI_COMM_RANK ( MPI_COMM_WORLD, whoami, mpierr )
call MPI_COMM_SIZE ( MPI_COMM_WORLD, mpi_nproc, mpierr)
call MPI_COMM_SPLIT_TYPE( MPI_COMM_WORLD
& , MPI_COMM_TYPE_SHARED
& , 0
& , MPI_INFO_NULL
& , MPI_COMM_SHM
& , mpierr )
call MPI_COMM_RANK( MPI_COMM_SHM, whoami_shm, mpierr )
call MPI_COMM_SIZE( MPI_COMM_SHM, mpi_shm_nproc, mpierr )
size_ = 10e4! - seg fault
size_bytes = size_ * MPI_REAL
disp_unit = MPI_REAL
size_bytes = size_*disp_unit
call MPI_INFO_CREATE( info_alloc, mpierr )
call MPI_INFO_SET( info_alloc
& , "alloc_shared_noncontig"
& , "true"
& , mpierr )
!
call MPI_WIN_ALLOCATE_SHARED( size_bytes
& , disp_unit
& , info_alloc
& , MPI_COMM_SHM
& , ptr_buf
& , win
& , mpierr )
call MPI_WIN_FREE(win, mpierr)
end program TEST_STACK
I run my code using the following command:
mpif90 test_stack.f90; mpirun -np 2 ./a.out
This wrapper is linked to my ifort 19.0.3 and the Intel MPI library. This has been verified by running
mpif90 -v
and, to be very precise, my mpif90 is a symbolic link to my mpiifort wrapper. This is done for personal convenience but shouldn't be causing problems, I guess?
The manual says that the call to MPI_WIN_ALLOCATE_SHARED looks like this:
USE MPI
MPI_WIN_ALLOCATE_SHARED(SIZE, DISP_UNIT, INFO, COMM, BASEPTR, WIN, IERROR)
INTEGER(KIND=MPI_ADDRESS_KIND) SIZE, BASEPTR
INTEGER DISP_UNIT, INFO, COMM, WIN, IERROR
At least the types of disp_unit and baseptr do not match in your program.
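For what it's worth, here is a minimal sketch (written free-form for brevity) of declarations and a call consistent with the mpif.h interface quoted above, assuming the window is meant to hold default 4-byte reals; with mpif.h, baseptr is an address-sized integer, while the TYPE(C_PTR) form of baseptr belongs to the use mpi / use mpi_f08 interfaces:
integer(kind=MPI_ADDRESS_KIND) :: size_bytes, baseptr
integer :: disp_unit, win, mpierr
! disp_unit is a byte count, not an MPI datatype handle
disp_unit  = 4
size_bytes = int(size_, MPI_ADDRESS_KIND) * disp_unit
! MPI_INFO_NULL here for brevity; the info_alloc object from the question works too
call MPI_WIN_ALLOCATE_SHARED(size_bytes, disp_unit, MPI_INFO_NULL, &
                             MPI_COMM_SHM, baseptr, win, mpierr)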
I was finally able to diagnose where the error stems from.
In the code I have
disp_unit = MPI_REAL
size_bytes = size_*disp_unit
MPI_REAL is a constant/parameter defined by MPI, and it is not equal to 4 as I very wrongly expected (4 for the 4 bytes of single precision)! In my version it is set to 1275069468, which is most likely a datatype handle rather than any sensible byte count.
Hence, multiplying this number by the size of my array very quickly exceeds the available memory, and it can also overflow the integer used to hold the size.
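For what it's worth, a sketch of how the byte count could be computed without hard-coding anything (assuming the window holds default reals); the standard storage_size intrinsic returns bits, hence the division by 8:
real :: sample                                 ! an element of the type stored in the window
disp_unit  = storage_size(sample) / 8          ! 4 bytes for a default real
size_bytes = int(size_, MPI_ADDRESS_KIND) * disp_unit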
I'm trying to get a one-parameter least-squares minimisation working in Fortran 77. Here's the code; it compiles and seems to work, except... it gets caught in an infinite loop between values of h1 = 1.8E-2 and 3.5E-2.
I'm having a look now but, odds are, I'm not going to have much luck sussing out the issue on my own. All help welcome!
PROGRAM assignment
! A program designed to fit experiemental data, using the method
! of least squares to minimise the associated chi-squared and
! obtain the four control parameters A,B,h1 and h2.
!*****************************************************************
IMPLICIT NONE
INTEGER i
DOUBLE PRECISION t(17),Ct(17),eCt(17)
DOUBLE PRECISION h1loop1,h1loop2,deltah,Cs
DOUBLE PRECISION chisqa,chisqb,dchisq
OPEN(21, FILE='data.txt', FORM='FORMATTED', STATUS='OLD')
DO i=1,17
READ(21,*)t(i),Ct(i),eCt(i)
END DO
CLOSE(21)
!Read in data.txt as three one dimensional arrays.
!*****************************************************************
!OPEN(21, FILE='outtest.txt', FORM='FORMATTED', STATUS='NEW')
!DO i=1,17
! WRITE(21,*)t(i),Ct(i),eCt(i)
!END DO
!CLOSE(21)
!
!Just to check input file is being read correctly.
!*****************************************************************
!**********************Minimising Lamda1 (h1)*********************
deltah= 0.0001
h1loop2= 0.001
h1loop1= 0.0 !Use initial value of 0 to calculate start-point chisq
DO 10
chisqa= 0.0
DO 20 i= 1, 17
Cs= exp(-h1loop1*t(i))
chisqa= chisqa + ((Ct(i) - Cs)/eCt(i))**2
20 END DO
chisqb= 0.0
DO 30 i= 1, 17
h1loop2= h1loop2 + deltah
Cs= exp(-h1loop2*t(i))
chisqb= chisqb + ((Ct(i) - Cs)/eCt(i))**2
30 END DO
!Print the two calculated chisq values to screen.
WRITE(6,*) 'Chi-squared a=',chisqa,'for Lamda1=',h1loop1
WRITE(6,*) 'Chi-squared b=',chisqb,'for Lamda1=',h1loop2
dchisq= chisqa - chisqb
IF (dchisq.GT.0.0) THEN
h1loop1= h1loop2
ELSE
deltah= deltah - ((deltah*2)/100)
END IF
IF (chisqb.LE.6618.681) EXIT
10 END DO
WRITE(6,*) 'Chi-squared is', chisqb,' for Lamda1 = ', h1loop2
END PROGRAM assignment
EDIT: Having looked at it again, I've decided I have no clue what's screwing it up. I should be getting a chi-squared of 6618.681 from this, but it's just stuck between 6921.866 and 6920.031. Help!
do i=1
is not starting a loop; for a loop you need to specify an upper bound as well:
do i=1,ub
That's why you get the error message about doi not having a type: in fixed format, spaces are insignificant, so do i=1 is parsed as an assignment to a variable named doi, which has no declared type under IMPLICIT NONE.
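To make the fixed-form parsing concrete, here is a tiny illustration of my own (not from the original code); both statements below are treated as assignments to the variable doi, and neither starts a loop:
      PROGRAM fixedform
      IMPLICIT NONE
      INTEGER doi
C     Blanks are insignificant in fixed form, so the next two statements
C     both assign 1 to DOI; remove the declaration and the compiler
C     complains that doi has no type.
      do i = 1
      doi = 1
      PRINT *, doi
      END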
Edit: If you want to have an infinite loop, just omit the loop control ("i=...") completely. You can use an exit statement to leave the loop when a certain criterion has been reached:
do
if (min_reached) EXIT
end do
Edit 2: I don't know why you stick to F77 fixed format. Here is your program in free format, with fixes to some places that looked weird, without digging too much into the details:
PROGRAM assignment
! A program designed to fit experiemental data, using the method
! of least squares to minimise the associated chi-squared and
! obtain the four control parameters A,B,h1 and h2.
!*****************************************************************
IMPLICIT NONE
integer, parameter :: rk = selected_real_kind(15)
integer, parameter :: nd = 17
integer :: i,t0
real(kind=rk) :: t(nd),t2(nd),Ct(nd),eCt(nd),Ctdiff(nd),c(nd)
real(kind=rk) :: Aa,Ab,Ba,Bb,h1a,h1b,h2a,h2b,chisqa,chisqb,dchisq
real(kind=rk) :: deltah,Cs(nd)
OPEN(21, FILE='data.txt', FORM='FORMATTED', STATUS='OLD')
DO i=1,nd
READ(21,*) t(i),Ct(i),eCt(i)
END DO
CLOSE(21)
!Read in data.txt as three one dimensional arrays.
!*****************************************************************
!OPEN(21, FILE='outtest.txt', FORM='FORMATTED', STATUS='NEW')
!DO i=1,17
! WRITE(21,*)t(i),Ct(i),eCt(i)
!END DO
!CLOSE(21)
!
!Just to check input file is being read correctly.
!*****************************************************************
!****************************Parameters***************************
Aa= 0
Ba= 0
h1a= 0
h2a= 0
!**********************Minimising Lamda1 (h1)*********************
deltah= 0.001_rk
h1b= deltah
minloop: DO
chisqa= 0
DO i= 1,nd
Cs(i)= exp(-h1a*t(i))!*Aa !+ Ba*exp(-h2a*t(i))
Ctdiff(i)= Ct(i) - Cs(i)
c(i)= Ctdiff(i)**2/eCt(i)**2
chisqa= chisqa + c(i)
h1a= h1a + deltah
END DO
! Use initial h1 value of 0 to calculate start-point chisq.
chisqb= 0
DO i= 1,nd
h1b= h1b + deltah
Cs(i)= exp(-h1b*t(i))!*Ab !+ Bb*exp(-h2b*t(i))
Ctdiff(i)= Ct(i) - Cs(i)
c(i)= Ctdiff(i)**2/eCt(i)**2
chisqb= chisqb + c(i)
END DO
! First-step h1 used to find competing chisq for comparison.
WRITE(6,*) 'Chi-squared a=', chisqa,'for Lamda1=',h1a
WRITE(6,*) 'Chi-squared b=', chisqb,'for Lamda1=',h1b
! Prints the two calculated chisq values to screen.
dchisq= chisqa - chisqb
IF (dchisq.GT.0) THEN
h1a= h1b
ELSE IF (dchisq.LE.0) THEN
deltah= (-deltah*2)/10
END IF
IF (chisqb.LE.6000) EXIT minloop
END DO minloop
WRITE(6,*) 'Chi-squared is', chisqb,'for Lamda1=',h1b
END PROGRAM assignment
I'd like to do this:
program main
implicit none
integer l
integer, allocatable, dimension(:) :: array
allocate(array(10))
array = 0
!$omp parallel do private(array)
do l = 1, 10
array(l) = l
enddo
!$omp end parallel do
print *, array
deallocate(array)
end
But I am running into error messages:
*** glibc detected *** ./a.out: munmap_chunk(): invalid pointer: 0x00007fff25d05a40 ***
This seems to be a bug in ifort according to some discussions on the Intel forums, but it should be resolved in the version I am using (11.1.073, Linux). This is a massively downscaled version of my code! Unfortunately, I cannot use static arrays as a workaround.
If I put the print inside the loop, I get other errors:
*** glibc detected *** ./a.out: double free or corruption (out): 0x00002b22a0c016f0 ***
I didn't get the errors you're getting, but you have an issue with privatizing array in your OpenMP call.
[mjswartz@666-lgn testfiles]$ vi array.f90
[mjswartz@666-lgn testfiles]$ ifort -o array array.f90 -openmp
[mjswartz@666-lgn testfiles]$ ./array
0 0 0 0 0 0
0 0 0 0
[mjswartz@666-lgn testfiles]$ vi array.f90
[mjswartz@666-lgn testfiles]$ ifort -o array array.f90 -openmp
[mjswartz@666-lgn testfiles]$ ./array
1 2 3 4 5 6
7 8 9 10
First run is with private array, second is without.
program main
implicit none
integer l
integer, allocatable, dimension(:) :: array
allocate(array(10))
!$omp parallel do
do l = 1, 10
array(l) = l
enddo
print*, array
deallocate(array)
end program main
I just ran your code with ifort and OpenMP and it spewed 0d0's; I had to quit the execution manually. What is your expected output? I'm not a big fan of unnecessarily dynamically allocating arrays: you know what size your matrices will be, so just make the sizes parameters and declare them statically. I'll mess with some stuff and edit this response in a bit.
OK, so here are my edits:
program main
implicit none
integer :: l, j
integer, parameter :: lmax = 15e3
integer, parameter :: jmax = 25
integer, parameter :: nk = 300
complex*16, dimension(9*nk) :: x0, xin, xout
complex*16, dimension(lmax) :: e_pump, e_probe
complex*16 :: e_pumphlp, e_probehlp
character*25 :: problemtype
real*8 :: m
! OpenMP variables
integer :: myid, nthreads, omp_get_num_threads, omp_get_thread_num
x0 = 0.0d0
problemtype = 'type1'
if (problemtype .ne. 'type1') then
write(*,*) 'Problem type not specified. Quitting'
stop
else
! Spawn a parallel region explicitly scoping all variables
!$omp parallel
myid = omp_get_thread_num()
if (myid .eq. 0) then
nthreads = omp_get_num_threads()
write(*,*) 'Starting program with', nthreads, 'threads'
endif
!$omp do private(j,l,m,e_pumphlp,e_probehlp,e_pump,e_probe)
do j = 1, jmax - 1
do l = 1, lmax
call electricfield(0.0d0, 0.0d0, e_pumphlp, &
e_probehlp, 0.0d0)
! print *, e_pumphlp, e_probehlp
e_pump(l) = e_pumphlp
e_probe(l) = e_probehlp
print *, e_pump(l), e_probe(l)
end do
end do
!$omp end parallel
end if
end program main
Notice I removed your use of a module since it was unnecessary: you have an external module containing a subroutine, so just make it an external subroutine. Also, I changed your matrices to be statically allocated. Case statements are a fancy and expensive version of if statements; you were casing 15e3*25 times rather than once (expensive), so I moved that test outside the loops. I changed the OpenMP calls, but only semantically. I added some output so that you know what OpenMP is actually doing.
Here is the new subroutine:
subroutine electricfield(t, tdelay, e_pump, e_probe, phase)
implicit none
real*8, intent(in) :: t, tdelay
complex*16, intent(out) :: e_pump, e_probe
real*8, optional, intent (in) :: phase
e_pump = 0.0d0
e_probe = 0.0d0
return
end subroutine electricfield
I just removed the module shell around it and changed some of your variable names. Fortran is not case sensitive, so don't torture yourself by writing in caps and having to keep that up throughout.
I compiled this with
ifort -o diffeq diffeq.f90 electricfield.f90 -openmp
and ran with
./diffeq > output
to catch the program vomiting 0's and to see how many threads I was using:
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
Starting program with 32 threads
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
(0.000000000000000E+000,0.000000000000000E+000)
Hope this helps!
It would appear that you are running into a compiler bug associated with the implementation of OpenMP 3.0.
If you can't update your compiler, then you will need to change your approach. There are a few options. For example, you could make the allocatable arrays shared, increase their rank by one, and have one thread allocate them such that the extent of the additional dimension is the number of workers in the team. All subsequent references to those arrays then need to have the subscript for that additional dimension be the OpenMP thread number (+ 1, depending on what you've used for the lower bound), as in the sketch just below.
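For illustration, here is a minimal sketch of that approach applied to the downscaled example from the question (the array name and sizes are mine): the array gains one trailing dimension sized to the team, one thread allocates it inside the region, and each thread only touches the slice indexed by its own thread number.
program shared_per_thread
  use omp_lib
  implicit none
  integer :: l, tid
  ! Shared array with one extra trailing dimension: one slice per thread.
  integer, allocatable, dimension(:,:) :: work

  !$omp parallel private(l, tid)
  tid = omp_get_thread_num() + 1
  !$omp single
  ! One thread allocates the shared array for the whole team.
  allocate(work(10, omp_get_num_threads()))
  work = 0
  ! The implicit barrier at "end single" makes the allocation visible to all.
  !$omp end single
  do l = 1, 10
     work(l, tid) = l                          ! each thread writes only its own slice
  end do
  !$omp end parallel

  print *, work(:, 1)                          ! the slice written by thread 0
  deallocate(work)
end program shared_per_thread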
Explicit allocation of the private allocatable arrays inside the parallel construct (only) may also be an option; see the sketch that follows.
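And a sketch of that second option, assuming the array is left unallocated outside the region so each thread can allocate and free its own private copy explicitly:
program private_alloc_inside
  implicit none
  integer :: l
  ! Deliberately not allocated outside the parallel region.
  integer, allocatable, dimension(:) :: array

  !$omp parallel private(l, array)
  ! Each thread allocates (and later frees) its own private copy.
  allocate(array(10))
  array = 0
  !$omp do
  do l = 1, 10
     array(l) = l        ! each thread fills only its share of the iterations
  end do
  !$omp end do
  deallocate(array)
  !$omp end parallel
end program private_alloc_inside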