I am trying to learn OpenMP (particularly !$omp do) in Fortran 90 and am facing a strange problem. I have written a simple program and am running it with 2 threads. I compiled the program with "gfortran -fopenmp filename.f90".
use omp_lib
implicit none
integer :: i,j,a(2)
write ( *, '(a,i8)' ) &
' The number of processors available = ', omp_get_num_procs ( )
write ( *, '(a,i8)' ) &
' The number of threads available = ', omp_get_max_threads ( )
!$OMP DO
do i = 1, 2
a(i) = i + 1
j = OMP_GET_THREAD_NUM()
print*,"a(i)=",a(i), "j = ",OMP_GET_THREAD_NUM()
enddo
!$OMP END DO
end
The output I am seeing is
The number of processors available = 6
The number of threads available = 2
a(i)= 2 j = 0
a(i)= 3 j = 0
But I was expecting j to take the values 0 and 1 for i = 1 and i = 2 respectively. Where am I going wrong? My expectation is met if I use !$omp parallel do instead of !$omp do.
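The difference between the two directives can be seen side by side in a minimal sketch. It assumes the usual OpenMP behaviour that a worksharing loop with no enclosing parallel region is executed by the encountering thread alone, so omp_get_thread_num() returns 0 for every iteration. The program name and the num_threads(2) clause are illustrative additions, and which thread receives which iteration in the second loop depends on the implementation's default schedule.

```fortran
program omp_do_demo
  use omp_lib
  implicit none
  integer :: i, a(2)

  ! Orphaned worksharing loop: there is no enclosing parallel region,
  ! so only the encountering thread (thread 0) runs the iterations.
  !$omp do
  do i = 1, 2
     a(i) = i + 1
     print *, "orphaned do: a(i)=", a(i), " thread =", omp_get_thread_num()
  end do
  !$omp end do

  ! Combined construct: a team is created first, then the iterations
  ! are shared among its threads.
  !$omp parallel do num_threads(2)
  do i = 1, 2
     a(i) = i + 1
     print *, "parallel do: a(i)=", a(i), " thread =", omp_get_thread_num()
  end do
  !$omp end parallel do
end program omp_do_demo
```

Compiled with "gfortran -fopenmp", the first loop should report thread 0 twice, while the second loop can report two different thread numbers.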
I have recently started studying Fortran and am trying to write a program that checks whether a number is prime. The function works fine without any loop: it returns 1 when the given number is prime and 0 otherwise. However, it does not work properly when it is called inside a do while loop. In the range 2 to 10, it is supposed to give 1 (for 2), 1 (for 3), 0 (for 4), 1 (for 5), 0 (for 6), etc. Instead, it keeps returning only 0. I'm pretty new to programming, so I'm not sure what I'm missing. I know there are many answers related to prime numbers, but I don't see any that covers this issue.
** Function checking prime numbers **
module prime_function
contains
integer function isPrime(inp_num)
implicit none
integer :: inp_num
integer :: i = 1
integer :: temp1 = 0
do while (i < inp_num)
i = i + 1
if(mod(inp_num, i) == 0) then
exit
end if
end do
if(inp_num == i) then
temp1 = 1
else
temp1 = 0
end if
isPrime = temp1
end function
end module
program fortran_q
use prime_function
implicit none
integer :: ii, a
a = isPrime(10)
print *, "10 is prime number, so the return : ", a
a = isPrime(11)
print *, "11 is prime number, so the return : ", a
ii = 1
do while (ii < 10)
ii = ii + 1
print *, isPrime(ii)
end do
end program
** Results **
10 is prime number, so the return : 0
11 is prime number, so the return : 1
0
0
0
0
0
0
0
0
0
You have hit a classic issue for people new to Fortran. Initializing i and temp1 in their declarations implies the SAVE attribute, so the values 1 and 0 are set only once, before the first call. On every later invocation, i and temp1 start with whatever values they had when isPrime last returned. The program below fixes the issue by assigning the initial values in executable statements.
module prime_function
implicit none
private
public isprime
contains
function isPrime(inp_num) result(res)
integer res
integer, intent(in) :: inp_num
integer i, temp1
i = 1
temp1 = 0
do while (i < inp_num)
i = i + 1
if (mod(inp_num, i) == 0) exit
end do
res = 0
if (inp_num == i) res = 1
end function
end module
program fortran_q
use prime_function
implicit none
integer :: ii, a
a = isPrime(10)
print *, "10 is prime number, so the return : ", a
a = isPrime(11)
print *, "11 is prime number, so the return : ", a
ii = 1
do while (ii < 10)
ii = ii + 1
print *, isPrime(ii)
end do
end program
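The implied SAVE behaviour can also be seen in isolation. The following is a minimal sketch, not part of the original answer; the names implicit_save_demo, saved_counter and fresh_counter are made up for illustration.

```fortran
program implicit_save_demo
  implicit none
  integer :: k

  do k = 1, 3
     print *, "saved:", saved_counter(), " fresh:", fresh_counter()
  end do

contains

  integer function saved_counter()
    integer :: calls = 0      ! initialised once; the initialiser implies SAVE
    calls = calls + 1         ! keeps counting across invocations
    saved_counter = calls
  end function saved_counter

  integer function fresh_counter()
    integer :: calls          ! no initialiser, so no implied SAVE
    calls = 0                 ! executable assignment runs on every call
    calls = calls + 1
    fresh_counter = calls
  end function fresh_counter
end program implicit_save_demo
```

The saved counter climbs 1, 2, 3 over the three calls, while the fresh counter returns 1 each time.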
I have parallelized a subroutine. It benchmarks very well: a 4x speedup on a quad core. I have the two versions in two different sources, serial.f and paral.f. The comparison is made by running them from the terminal and printing the elapsed wall clock time. Each source contains only a call to the associated subroutine. But when I modify the sources like this:
serial.f :
do i=1,100
call serial
end do
and like this
paral.f :
do i=1,100
call paral
end do
performance goes down to a 0.96x speedup: the parallel version is worse than the serial one! The code can be found in why calling many N times a serial subroutine is faster than calling N times the parallel version of the same subroutin
To obtain serial.f, just comment out the block containing the call to paral. To obtain paral.f, just comment out the block containing the call to serial.
My question: is this a common problem? How can I solve it so that I keep the 4x speedup while keeping the loop of calls?
Please note :
(1) I've tried translating to C: the timings, benchmarks and problems all remain the same.
(2) I've tried translating to modern Fortran: the timings, benchmarks and problems all remain the same.
(3) I've tried all kinds of tricks and rewrites of the code. I'm sure the problem is not how the subroutine is parallelized (I achieved 4x) but that it is called too many times inside a loop.
Thank you.
EDIT ::
As requested, I'm posting a program written in modern Fortran that exhibits the same issue:
program main
use omp_lib
implicit none
integer ( kind = 4 ), parameter :: m = 5000
integer ( kind = 4 ), parameter :: n = 5000
integer ( kind = 4 ) i
integer ( kind = 4 ) j
integer ( kind = 4 ) nn
real ( kind = 8 ) u(m,n)
real ( kind = 8 ) w(m,n)
real ( kind = 8 ) wtime,h
call random_seed()
do j=1,n
do i=1,m
call random_number(u(i,j))
end do
end do
wtime = omp_get_wtime ( )
do nn=1,100
!$omp parallel do default(none) shared(u, w) private(i,j)
do j = 2, n - 1
do i = 2, m - 1
w(i,j) = 0.25D+00 * ( u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1) )
end do
end do
!$omp end parallel do
end do
wtime = omp_get_wtime ( ) - wtime
h=0.0D+00
do j=1,n
do i=1,m
h=h+w(i,j)
end do
end do
write ( *, '(a,g14.6)' ) ' Wall clock time serial= ', wtime
write ( *, '(a,g14.6)' ) ' h ', h
stop
end
To get serial_with_loop.f90, just comment out the OpenMP directives and the nn loop. With a similar method you can also obtain parall_with_loop.f90 and the serial and parallel versions without the loop. You can compile with "gfortran -o name.out -fopenmp -O3 name.f90" and launch from the terminal with the output redirected to a text file: "name.out > time_result.txt"
The problem is that you are parallelizing the loop over j, which is located inside a loop over nn. Therefore, for each value of nn, your machine needs time to create a pool of threads that do the work for the different values of j. This time (required for creating the pool) is serial and cannot be divided by the number of threads used. As I read your code, there is no reason why you could not parallelize the nn loop and create that pool only once, instead of once for every value of nn. I think your code will work better if you write
wtime = omp_get_wtime ( )
!$omp parallel do default(none) shared(u, w) private(nn,i,j)
do nn=1,100
do j = 2, n - 1
do i = 2, m - 1
w(i,j) = 0.25D+00 * ( u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1))
end do
end do
end do
!$omp end parallel do
wtime = omp_get_wtime ( ) - wtime
I hope that this helps you.
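Another way to open the parallel region only once, sketched here as an assumption and not taken from the question, is to put !$omp parallel around the nn loop and keep !$omp do on the j loop. Every thread then runs the nn loop itself, and the j iterations of each pass are still shared among the team, so each element of w is written by only one thread per pass. The program name and the smaller array size are illustrative.

```fortran
program hoisted_region
  use omp_lib
  implicit none
  integer, parameter :: m = 500, n = 500   ! smaller than the 5000 in the question, to keep the sketch quick
  integer :: i, j, nn
  real(kind=8), allocatable :: u(:,:), w(:,:)
  real(kind=8) :: wtime

  allocate(u(m,n), w(m,n))
  call random_number(u)
  w = 0.0d0

  wtime = omp_get_wtime()
  ! The team of threads is created once, outside the nn loop.
  !$omp parallel default(none) shared(u, w) private(i, j, nn)
  do nn = 1, 100
     ! Only the j loop is work-shared; the implicit barrier at
     ! "end do" keeps the passes over nn in step.
     !$omp do
     do j = 2, n - 1
        do i = 2, m - 1
           w(i,j) = 0.25d0 * (u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1))
        end do
     end do
     !$omp end do
  end do
  !$omp end parallel
  wtime = omp_get_wtime() - wtime

  write (*, '(a,g14.6)') ' Wall clock time = ', wtime
  write (*, '(a,g14.6)') ' sum(w)          = ', sum(w)
end program hoisted_region
```

Whether this is measurably faster than a parallel do inside the nn loop depends on the OpenMP runtime, so it is worth timing both variants.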
I'm trying to use OpenMP in Fortran 90 to parallelize a do loop with a function call inside. The code listed first runs fine. The code listed next does not: I receive a segmentation fault.
First program: $ gfortran -O3 -o output -fopenmp OMP10.f90
program OMP10
!$ use omp_lib
IMPLICIT NONE
integer, parameter :: n = 100000
integer :: i
real(kind = 8) :: sum,h,x(0:n),f(0:n),ZBQLU01
!$ call OMP_set_num_threads(4)
h = 2.d0/dble(n)
!$OMP PARALLEL DO PRIVATE(i)
do i = 0,n
x(i) = -1.d0+dble(i)*h
f(i) = 2.d0*x(i)
end do
!$OMP END PARALLEL DO
sum = 0.d0
!$OMP PARALLEL DO PRIVATE(i) REDUCTION(+:SUM)
do i = 0,n-1
sum = sum + h*f(i)
end do
!$OMP END PARALLEL DO
write(*,*) "The integral is ", sum
end program OMP10
Second program: $ gfortran -O3 -o output -fopenmp randgen.f OMP10.f90
program OMP10
!$ use omp_lib
IMPLICIT NONE
integer, parameter :: n = 100000
integer :: i
real(kind = 8) :: sum,h,x(0:n),f(0:n),ZBQLU01
!$ call OMP_set_num_threads(4)
h = 2.d0/dble(n)
!$OMP PARALLEL DO PRIVATE(i)
do i = 0,n
x(i) = ZBQLU01(0.d0)
end do
!$OMP END PARALLEL DO
sum = 0.d0
!$OMP PARALLEL DO PRIVATE(i) REDUCTION(+:SUM)
do i = 0,n-1
sum = sum + h*f(i)
end do
!$OMP END PARALLEL DO
write(*,*) "The integral is ", sum
end program OMP10
In the above command, randgen.f is a library that contains the function ZBQLU01.
You cannot just call any function from a parallel region. The function must be thread safe. See What is meant by "thread-safe" code? and https://en.wikipedia.org/wiki/Thread_safety .
Your function is quite the opposite of thread safe as is quite typical for random number generators. Just notice the SAVE statements in the source code for many local variables and for a common block.
The solution is to use a good parallel random number generator. This site is not for software recommendations, but as a pointer, just search the web for "parallel prng" or "parallel random number generator". I personally use a library which I already pointed to in https://stackoverflow.com/a/38263032/721644 . A simple web search reveals another simple possibility in https://jblevins.org/log/openmp . And then there are many larger and more complex libraries.
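To illustrate what thread safe means here, the sketch below keeps the generator state in a variable owned by the caller instead of in SAVEd locals, so each thread can hold a private state. It uses the Park-Miller "minimal standard" Lehmer generator purely as a stand-in: it is not ZBQLU01 and not one of the libraries linked above, and differently seeded copies of one simple generator are not guaranteed to give statistically independent streams. The module name, function name and seeding formula are hypothetical.

```fortran
module lehmer_rng
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: next_uniform
  integer(int64), parameter :: modulus    = 2147483647_int64   ! 2**31 - 1
  integer(int64), parameter :: multiplier = 16807_int64
contains
  ! Advances the caller-owned state and returns a value in (0,1).
  ! There are no SAVEd variables, so concurrent calls with different
  ! state variables do not interfere with each other.
  function next_uniform(state) result(r)
    integer(int64), intent(inout) :: state
    real(real64) :: r
    state = mod(multiplier * state, modulus)
    r = real(state, real64) / real(modulus, real64)
  end function next_uniform
end module lehmer_rng

program thread_safe_random
  !$ use omp_lib
  use lehmer_rng
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: n = 100000
  integer :: i, tid
  integer(int64) :: state
  real(real64), allocatable :: x(:)

  allocate(x(0:n))
  !$omp parallel default(none) shared(x) private(i, tid, state)
  tid = 0
  !$ tid = omp_get_thread_num()
  ! Hypothetical per-thread seed; it must lie in 1 .. modulus-1.
  state = 12345_int64 + 7919_int64 * tid
  !$omp do
  do i = 0, n
     x(i) = next_uniform(state)
  end do
  !$omp end do
  !$omp end parallel

  write (*, *) "min, max of x: ", minval(x), maxval(x)
end program thread_safe_random
```

For real work, a generator designed for parallel streams, as recommended above, is the better choice.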
This is my code:
Program Arrays_0
Implicit none
Integer :: i , Read_number , Vig_Position , Vipg_Position , n_iter
Integer , parameter :: Br_gra = 12
Integer , parameter , dimension ( Br_gra ) :: Vig = [ ( i , i = 1 , Br_gra) ]
Integer , parameter , dimension ( Br_gra ) :: Vipg = [ 0 , 1 , 1 , 1 , 2 , 2 , 3 , 4 , 4 , 7 , 7 , 7 ]
Integer :: Result_of_calculation
Write(*,*)"Enter the number (From 1 to Br_gra):"
Read(*,*) Read_number
Vig_Position = Vig(Read_number)
Vipg_Position = Vipg(Vig_Position)
!K_str( Vig_Position_temp ) = Vig_Position_temp + 2.3
n_iter = 0
Result_of_calculation = Vig_Position
Do while( Vipg_Position .ne. Vipg(1) )
n_iter = n_iter + 1
Vig_Position = Vipg_Position
! K_str( Vig_Position_temp ) = Vig_Position_temp + 2.3
Result_of_calculation = Result_of_calculation + Vig_Position
Vipg_Position = Vipg(Vig_Position)
End Do
Write(*,'(a,1x,i0)')"The number of iteration is:",n_iter
Write(*,'(a,1x,i0)')"The result of calculation is:",Result_of_calculation
End Program Arrays_0
The code works fine for computing n_iter and Result_of_calculation, but I have a problem declaring K_str in a way that correctly follows the specific use of these two variables (the commented-out lines show how I intend to use it in the calculation).
So the question is how to declare it, for example, when Read_number is 12.
In that case I have: K_str(12), K_str(7), K_str(3) and K_str(1).
What I can do is this:
Real, dimension (Br_gra):: K_str
But in this case I must add one more loop over all elements of Vig (12 calculations). I want to avoid those extra calculations; in this case I want my code to do just 4 calculations.
How can I do that?
So you want to get an array, which e.g. starts at index 1, ends at index 12, but does not contain all the indexes in between, just some of them?
That is not possible with Fortran arrays. Actually, it is not possible with arrays in any other language I know.
One can use the dictionary data structure for something like that, which is intrinsic in some languages, but not Fortran. There are external libraries for similar data-structures in Fortran. See http://fortranwiki.org/fortran/show/Hash+tables
You could also use a linked list with all usual drawbacks and advantages (no direct indexing etc.).
Unless your need is for some very large ranges of indexes, much much larger than your example, use a regular array that contains all indexes.
You can also use one array which will contain the data (indexed 1 to 4) and another array of the same size, which will contain the global position (1,3,7 and 12). Or a derived type with these two components. But it will not be the same usage as you propose.
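The last suggestion, a derived type with a position component and a data component, could look roughly like the sketch below. It reuses Vipg and the value 12 from the question; the type name node_t, the component names and the variable n_used are hypothetical. Only the entries actually visited are computed, and they are stored compactly at indexes 1 to n_used.

```fortran
program sparse_k_str
  implicit none
  integer, parameter :: Br_gra = 12
  integer, parameter :: Vipg(Br_gra) = [0, 1, 1, 1, 2, 2, 3, 4, 4, 7, 7, 7]

  type :: node_t                  ! hypothetical type name
     integer :: position          ! global index, e.g. 12, 7, 3, 1
     real    :: value             ! the K_str value for that index
  end type node_t

  ! Vipg(i) < i, so a chain can never be longer than Br_gra entries.
  type(node_t) :: K_str(Br_gra)
  integer :: n_used, pos, k

  pos = 12                        ! stands in for Read_number
  n_used = 0
  do while (pos /= 0)             ! 0 marks the end of the chain, as Vipg(1) does
     n_used = n_used + 1
     K_str(n_used)%position = pos
     K_str(n_used)%value    = pos + 2.3
     pos = Vipg(pos)
  end do

  do k = 1, n_used
     write (*, '(a,i0,a,f6.2)') 'K_str(', K_str(k)%position, ') = ', K_str(k)%value
  end do
end program sparse_k_str
```

For a starting value of 12 this visits positions 12, 7, 3 and 1, so four values are computed. Finding the value for a given global position later requires a search through the position component, which is the usage difference mentioned above.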
I am using OpenMP with Fortran. I have boiled down my use case to a very simple example. I have an array of objects with a custom derived type, and each object contains an array with a different size. I want to make sure that whatever happens in the loop, I apply a reduction to all the values array components of the vector objects:
program main
implicit none
integer :: i
type vector
real,allocatable :: values(:)
end type vector
type(vector) :: vectors(3)
allocate(vectors(1)%values(3))
vectors(1)%values = 0
allocate(vectors(2)%values(6))
vectors(2)%values = 0
allocate(vectors(3)%values(9))
vectors(3)%values = 0
!$OMP PARALLEL REDUCTION(+:vectors%values)
!$OMP DO
do i=1,1000
vectors(1)%values = vectors(1)%values + 1
vectors(2)%values = vectors(2)%values + 2
vectors(3)%values = vectors(3)%values + 3
end do
!$OMP END DO
!$OMP END PARALLEL
print*,sum(vectors(1)%values)
print*,sum(vectors(2)%values)
print*,sum(vectors(3)%values)
end program main
In this case, REDUCTION(+:vectors%values) doesn't work because I get the following errors:
test2.f90(22): error #6159: A component cannot be an array if the encompassing structure is an array. [VALUES]
!$OMP PARALLEL REDUCTION(+:vectors%values)
-------------------------------------^
test2.f90(22): error #7656: Subobjects are not allowed in this OpenMP* clause; a named variable must be specified. [VECTORS]
!$OMP PARALLEL REDUCTION(+:vectors%values)
-----------------------------^
compilation aborted for test2.f90 (code 1)
I tried overloading the meaning of + for the vector type and then specifying REDUCTION(+:vectors), but then I still get:
test.f90(43): error #7621: The data type of the variable is not defined for the operator or intrinsic specified on the OpenMP* REDUCTION clause. [VECTORS]
!$OMP PARALLEL REDUCTION(+:vectors)
-----------------------------^
What is the recommended way to deal with derived types such as these and get the reduction to work?
Just for reference, the correct output when compiling without OpenMP is
3000.000
12000.00
27000.00
This is not just an OpenMP problem: you cannot reference vectors%values as one entity if values is an allocatable array component, because the rules of Fortran 2003 forbid it. That is because such an array would not have a regular stride in memory; the allocatable components are stored at unrelated addresses.
If the number of elements of the encompassing array is small you can do
!$OMP PARALLEL REDUCTION(+:vectors(1)%values,vectors(2)%values,vectors(3)%values)
!$OMP DO
do i=1,1000
vectors(1)%values = vectors(1)%values + 1
vectors(2)%values = vectors(2)%values + 2
vectors(3)%values = vectors(3)%values + 3
end do
!$OMP END DO
!$OMP END PARALLEL
otherwise you must add another loop, say over j, and apply the reduction to just vectors(j)%values.
If the compiler does not accept structure components in the reduction clause (have to study the latest standard to see if it hasn't been relaxed), you can make a workaround
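!$OMP PARALLEL PRIVATE(j)
do j = 1, size(vectors)
call aux(vectors(j)%values, j)
end do
!$OMP END PARALLEL
contains
subroutine aux(v, inc)
real, intent(inout) :: v(:)
! inc is passed explicitly instead of reading j from the host scope
integer, intent(in) :: inc
integer :: i
!$OMP DO REDUCTION(+:v)
do i=1,1000
v = v + inc
end do
!$OMP END DO
end subroutine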
Associate or pointers would be simpler, but they are not allowed either.
As an alternative to Vladimir's answer, you can always implement your own reduction using a temporary array and a critical section:
program main
implicit none
integer :: i
type vector
real,allocatable :: values(:)
end type vector
type(vector) :: vectors(3)
type(vector),allocatable :: tmp(:)
allocate(vectors(1)%values(3))
vectors(1)%values = 0
allocate(vectors(2)%values(6))
vectors(2)%values = 0
allocate(vectors(3)%values(9))
vectors(3)%values = 0
!$OMP PARALLEL PRIVATE(TMP)
! Use a temporary array to hold the local sum
allocate( tmp(size(vectors)) )
do i=1,size(tmp)
allocate( tmp(i)%values( size(vectors(i)%values )) )
tmp(i)%values = vectors(i)%values
enddo ! i
!$OMP DO
do i=1,1000
tmp(1)%values = tmp(1)%values + 1
tmp(2)%values = tmp(2)%values + 2
tmp(3)%values = tmp(3)%values + 3
end do
!$OMP END DO
! Get the global sum one thread at a time
!$OMP CRITICAL
vectors(1)%values = vectors(1)%values + tmp(1)%values
vectors(2)%values = vectors(2)%values + tmp(2)%values
vectors(3)%values = vectors(3)%values + tmp(3)%values
!$OMP END CRITICAL
deallocate(tmp)
!$OMP END PARALLEL
print*,sum(vectors(1)%values)
print*,sum(vectors(2)%values)
print*,sum(vectors(3)%values)
end program main
This snippet could be arranged more efficiently by a loop over all elements of vectors. Then, tmp could be a scalar.
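A rough sketch of that rearrangement follows, under the assumption that the increment for vectors(j) is simply j, as in the example. Here tmp is a single private real array that is reallocated for each j, not an array of vector objects. The program name is illustrative.

```fortran
program manual_reduction_loop
  implicit none
  integer :: i, j
  type vector
     real, allocatable :: values(:)
  end type vector
  type(vector) :: vectors(3)
  real, allocatable :: tmp(:)

  do j = 1, size(vectors)
     allocate(vectors(j)%values(3*j))
     vectors(j)%values = 0
  end do

  !$OMP PARALLEL PRIVATE(tmp, i, j)
  do j = 1, size(vectors)
     ! Thread-local partial sum for vectors(j) only
     allocate(tmp(size(vectors(j)%values)))
     tmp = 0
     !$OMP DO
     do i = 1, 1000
        tmp = tmp + j
     end do
     !$OMP END DO
     ! Add the partial sums to the shared result one thread at a time
     !$OMP CRITICAL
     vectors(j)%values = vectors(j)%values + tmp
     !$OMP END CRITICAL
     deallocate(tmp)
  end do
  !$OMP END PARALLEL

  do j = 1, size(vectors)
     print *, sum(vectors(j)%values)
  end do
end program manual_reduction_loop
```

The three sums should again come to 3000, 12000 and 27000, matching the serial output given in the question.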