OpenMP Fortran Default Clause

This parallel region works fine.
!$OMP PARALLEL Private(irep)
!$OMP DO
do irep = 1, nrep
print *, "Using thread: ", omp_get_thread_num(), "irep: ", irep
end do
!$OMP END DO NOWAIT
!$OMP END PARALLEL
This works fine too.
!$OMP PARALLEL
!$OMP DO
do irep = 1, nrep
print *, "Using thread: ", omp_get_thread_num(), "irep: ", irep
end do
!$OMP END DO NOWAIT
!$OMP END PARALLEL
Why does it print nothing when I use the DEFAULT clause?
!$OMP PARALLEL DEFAULT(Private)
!$OMP DO
do irep = 1, nrep
print *, "Using thread: ", omp_get_thread_num(), "irep: ", irep
end do
!$OMP END DO NOWAIT
!$OMP END PARALLEL
Thank you so much!

Let's just take a look at a simpler case:
program testprivate
use omp_lib
integer :: nrep
nrep=16
!$OMP PARALLEL DEFAULT(Private)
print *, "Thread: ", omp_get_thread_num(), "sees nrep = ", nrep
!$OMP END PARALLEL
end program testprivate
We run this and get:
$ gfortran -o private private.f90 -fopenmp
$ export OMP_NUM_THREADS=8
$ ./private
Thread: 3 sees nrep = 0
Thread: 0 sees nrep = 0
Thread: 1 sees nrep = 32581
Thread: 7 sees nrep = 0
Thread: 4 sees nrep = 0
Thread: 5 sees nrep = 0
Thread: 2 sees nrep = 0
Thread: 6 sees nrep = 0
OpenMP private variables, whether private by default or listed explicitly, are undefined on entry to the parallel region. That doesn't matter for the loop index, irep, which is set by the do loop; but if (say) each thread's private nrep happens to be zero inside the parallel region, the loop never executes and nothing is printed. Worse, each thread could see a different value for nrep, and anything could happen.
So you don't want nrep to be private. You could still have default(private) shared(nrep), or even firstprivate(nrep), although there's no advantage here to having each thread have its own nrep.
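A minimal sketch of the first suggestion applied to the loop from the question, assuming nrep is set before the parallel region (the program name testshared is made up for illustration):

```fortran
program testshared
  use omp_lib
  implicit none
  integer :: nrep, irep

  nrep = 16
  ! Everything defaults to private except nrep, which every thread must read.
  !$omp parallel default(private) shared(nrep)
  !$omp do
  do irep = 1, nrep
    print *, "Using thread: ", omp_get_thread_num(), "irep: ", irep
  end do
  !$omp end do nowait
  !$omp end parallel
end program testshared
```

Built with gfortran -fopenmp, this should print one line per iteration, in no particular order.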

Related

Thread Segmentation fault when calling function in loop with OpenMP

I'm trying to use OpenMP in Fortran 90 to parallelize a do loop with function call inside. The code listed first runs fine. The code listed next does not. I receive a segmentation fault.
First program: $ gfortran -O3 -o output -fopenmp OMP10.f90
program OMP10
!$ use omp_lib
IMPLICIT NONE
integer, parameter :: n = 100000
integer :: i
real(kind = 8) :: sum,h,x(0:n),f(0:n),ZBQLU01
!$ call OMP_set_num_threads(4)
h = 2.d0/dble(n)
!$OMP PARALLEL DO PRIVATE(i)
do i = 0,n
x(i) = -1.d0+dble(i)*h
f(i) = 2.d0*x(i)
end do
!$OMP END PARALLEL DO
sum = 0.d0
!$OMP PARALLEL DO PRIVATE(i) REDUCTION(+:SUM)
do i = 0,n-1
sum = sum + h*f(i)
end do
!$OMP END PARALLEL DO
write(*,*) "The integral is ", sum
end program OMP10
Second program: $ gfortran -O3 -o output -fopenmp randgen.f OMP10.f90
program OMP10
!$ use omp_lib
IMPLICIT NONE
integer, parameter :: n = 100000
integer :: i
real(kind = 8) :: sum,h,x(0:n),f(0:n),ZBQLU01
!$ call OMP_set_num_threads(4)
h = 2.d0/dble(n)
!$OMP PARALLEL DO PRIVATE(i)
do i = 0,n
x(i) = ZBQLU01(0.d0)
end do
!$OMP END PARALLEL DO
sum = 0.d0
!$OMP PARALLEL DO PRIVATE(i) REDUCTION(+:SUM)
do i = 0,n-1
sum = sum + h*f(i)
end do
!$OMP END PARALLEL DO
write(*,*) "The integral is ", sum
end program OMP10
In the above command, randgen.f is a library that contains the function ZBQLU01.
You cannot just call any function from a parallel region. The function must be thread safe. See "What is meant by "thread-safe" code?" and https://en.wikipedia.org/wiki/Thread_safety .
Your function is quite the opposite of thread safe, as is typical for random number generators. Just notice the SAVE statements in the source code for many local variables and for a common block: that saved state is shared by all threads, which update it concurrently.
The solution is to use a good parallel random number generator. The site is not for software recommendations, but as a pointer just search the web for "parallel prng" or "parallel random number generator". I personally use a library which I already pointed to in https://stackoverflow.com/a/38263032/721644 . A simple web search reveals another simple possibility in https://jblevins.org/log/openmp . And then there are many larger and more complex libraries.
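To illustrate what "thread safe" means here, the sketch below keeps the generator state in a private variable so that no thread touches another thread's state. It is not the code of any library mentioned above: the Park-Miller constants and the per-thread seeding are assumptions chosen for brevity, and nearby seeds like these are not statistically independent streams, so treat it as a demonstration of the structure only.

```fortran
program per_thread_rng
  use omp_lib
  use iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: n = 100000
  ! Park-Miller "minimal standard" multiplier and modulus (illustrative choice)
  integer(int64), parameter :: a = 16807_int64, m = 2147483647_int64
  integer :: i
  integer(int64) :: state
  real(real64) :: x(0:n)

  !$omp parallel private(state)
  ! Hypothetical seeding: one distinct nonzero seed per thread.
  state = 12345_int64 + 1000_int64 * omp_get_thread_num()
  !$omp do
  do i = 0, n
    state = mod(a * state, m)                      ! advance this thread's own state
    x(i) = real(state, real64) / real(m, real64)   ! value strictly between 0 and 1
  end do
  !$omp end do
  !$omp end parallel

  print *, "min, max = ", minval(x), maxval(x)
end program per_thread_rng
```

Because state is private, there is no SAVE variable or common block for the threads to fight over, which is the property the original ZBQLU01 lacks.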

Conditional shared status in OpenMP

In the code I am attempting to port to OpenMP, I have a parallelized loop nested in an outer loop. Depending on the iteration of the outer loop, I would like a particular array to be either shared or reduction(+). Is there a way to do this in Fortran?
Here's a mockup of what I want:
do i = 1, 2
!$omp if(i.eq.1) parallel do reduction(+:foo)
!$omp if(i.eq.2) parallel do shared(foo)
do j = 1,j_max
work on foo
enddo
!$omp end parallel
enddo
The discussion in openMP conditional pragma "if else" suggests that scheduling cannot be modified during execution. Is that also the case for shared/private/reduction/etc.?
One obvious course of action is to create foo_1 (reduction:+) and foo_2 (shared), copy foo_1 to foo_2 after the first iteration on i, and then have if statements within the loop over j to refer to the proper array. But that's not terribly elegant. I'm hoping there's a better/cleverer/cleaner way to do this.
Edit: for the unimaginative, here's the pseudocode version of my alternative
do i = 1, 2
!$omp parallel do reduction(+:foo_1), shared(foo_2)
do j = 1,j_max
if( i .eq. 1 ) then
work on foo_1
else
work on foo_2
endif
enddo
!$omp end parallel do
foo_2 = foo_1
enddo
As you don't mind having two parallel regions you could use orphaned directives - I find these great for organising the overall structure of large OpenMP codes. I mean something like
i = 1
!$omp parallel shared( i, foo, ... )
Call do_the_work( i, foo, ... )
!$omp end parallel
i = 2
!$omp parallel shared( i, ... ) reduction( +:foo )
Call do_the_work( i, foo, ... )
!$omp end parallel
...
Subroutine do_the_work( i, foo, ... )
!$omp do
do j = 1,j_max
work on foo
enddo
End Subroutine do_the_work
If the parallel region is as big as you say it probably wants to be in one or more routines by itself anyway.
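A compilable sketch of the same structure, under the assumption that the "work on foo" touches a different element in every iteration (so that the shared pass has no race). The internal procedure and the arithmetic are placeholders, not the asker's real work:

```fortran
program orphaned_demo
  implicit none
  integer, parameter :: j_max = 8
  real :: foo(j_max)

  foo = 0.0

  ! Pass 1: foo is shared; each iteration updates a different element.
  !$omp parallel shared(foo)
  call do_the_work(foo)
  !$omp end parallel

  ! Pass 2: each thread works on a private copy that is summed at the end.
  !$omp parallel reduction(+:foo)
  call do_the_work(foo)
  !$omp end parallel

  print *, foo

contains

  subroutine do_the_work(f)
    real, intent(inout) :: f(:)
    integer :: j
    ! Orphaned worksharing directive: it binds to whichever parallel
    ! region is active in the caller.
    !$omp do
    do j = 1, size(f)
      f(j) = f(j) + real(j)
    end do
    !$omp end do
  end subroutine do_the_work

end program orphaned_demo
```

The subroutine is written once and does not know whether its argument is the shared array or a thread's private reduction copy; that decision lives entirely in the clauses of the calling parallel region.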

What is the best way to reduce an array of arrays using OpenMP?

I am using OpenMP with Fortran. I have boiled down my use case to a very simple example. I have an array of objects with a custom derived type, and each object contains an array with a different size. I want to make sure that whatever happens in the loop, I apply a reduction to all the values array components of the vector objects:
program main
implicit none
integer :: i
type vector
real,allocatable :: values(:)
end type vector
type(vector) :: vectors(3)
allocate(vectors(1)%values(3))
vectors(1)%values = 0
allocate(vectors(2)%values(6))
vectors(2)%values = 0
allocate(vectors(3)%values(9))
vectors(3)%values = 0
!$OMP PARALLEL REDUCTION(+:vectors%values)
!$OMP DO
do i=1,1000
vectors(1)%values = vectors(1)%values + 1
vectors(2)%values = vectors(2)%values + 2
vectors(3)%values = vectors(3)%values + 3
end do
!$OMP END DO
!$OMP END PARALLEL
print*,sum(vectors(1)%values)
print*,sum(vectors(2)%values)
print*,sum(vectors(3)%values)
end program main
In this case, REDUCTION(+:vectors%values) doesn't work because I get the following errors:
test2.f90(22): error #6159: A component cannot be an array if the encompassing structure is an array. [VALUES]
!$OMP PARALLEL REDUCTION(+:vectors%values)
-------------------------------------^
test2.f90(22): error #7656: Subobjects are not allowed in this OpenMP* clause; a named variable must be specified. [VECTORS]
!$OMP PARALLEL REDUCTION(+:vectors%values)
-----------------------------^
compilation aborted for test2.f90 (code 1)
I tried overloading the meaning of + for the vector type and then specifying REDUCTION(+:vectors), but then I still get:
test.f90(43): error #7621: The data type of the variable is not defined for the operator or intrinsic specified on the OpenMP* REDUCTION clause. [VECTORS]
!$OMP PARALLEL REDUCTION(+:vectors)
-----------------------------^
What is the recommended way to deal with derives types such as these and getting the reduction to work?
Just for reference, the correct output when compiling without OpenMP is
3000.000
12000.00
27000.00
This is not just an OpenMP problem: you cannot reference vectors%values as one entity if values is an allocatable array component, because the rules of Fortran 2003 forbid it. Such an array would not have a regular stride in memory, since the allocatable components are stored at unrelated addresses.
If the number of elements of the encompassing array is small you can do
!$OMP PARALLEL REDUCTION(+:vectors(1)%values,vectors(2)%values,vectors(3)%values)
!$OMP DO
do i=1,1000
vectors(1)%values = vectors(1)%values + 1
vectors(2)%values = vectors(2)%values + 2
vectors(3)%values = vectors(3)%values + 3
end do
!$OMP END DO
!$OMP END PARALLEL
Otherwise you must add another loop, say over j, and reduce just vectors(j)%values inside it.
If the compiler does not accept structure components in the reduction clause (I would have to study the latest standard to see whether this has been relaxed), you can use a workaround:
!$OMP PARALLEL
do j = 1, size(vectors)
call aux(vectors(j)%values)
end do
!$OMP END PARALLEL
contains
subroutine aux(v)
real :: v(:)
!$OMP DO REDUCTION(+:v)
do i=1,1000
v = v + j
end do
!$OMP END DO
end subroutine
Associate or pointers would be simpler, but they are not allowed either.
As an alternative to Vladimir's answer, you can always implement your own reduction using a temporary array and a critical section:
program main
implicit none
integer :: i
type vector
real,allocatable :: values(:)
end type vector
type(vector) :: vectors(3)
type(vector),allocatable :: tmp(:)
allocate(vectors(1)%values(3))
vectors(1)%values = 0
allocate(vectors(2)%values(6))
vectors(2)%values = 0
allocate(vectors(3)%values(9))
vectors(3)%values = 0
!$OMP PARALLEL PRIVATE(TMP)
! Use a temporary array to hold the local sum
allocate( tmp(size(vectors)) )
do i=1,size(tmp)
allocate( tmp(i)%values( size(vectors(i)%values )) )
tmp(i)%values = 0 ! start each local sum at zero so the initial values are not counted once per thread
enddo ! i
!$OMP DO
do i=1,1000
tmp(1)%values = tmp(1)%values + 1
tmp(2)%values = tmp(2)%values + 2
tmp(3)%values = tmp(3)%values + 3
end do
!$OMP END DO
! Get the global sum one thread at a time
!$OMP CRITICAL
vectors(1)%values = vectors(1)%values + tmp(1)%values
vectors(2)%values = vectors(2)%values + tmp(2)%values
vectors(3)%values = vectors(3)%values + tmp(3)%values
!$OMP END CRITICAL
deallocate(tmp)
!$OMP END PARALLEL
print*,sum(vectors(1)%values)
print*,sum(vectors(2)%values)
print*,sum(vectors(3)%values)
end program main
This snippet could be arranged more efficiently by a loop over all elements of vectors. Then, tmp could be a scalar.
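One possible arrangement along those lines, as a sketch rather than a definitive version: the loop over the elements of vectors sits inside the parallel region, and the temporary is an automatic array local to a helper routine (the name accumulate and its interface are assumptions made for this example), so each thread gets its own copy without any PRIVATE clause for it.

```fortran
program main
  implicit none
  type vector
    real, allocatable :: values(:)
  end type vector
  type(vector) :: vectors(3)
  integer :: j

  do j = 1, size(vectors)
    allocate(vectors(j)%values(3*j))
    vectors(j)%values = 0
  end do

  !$omp parallel private(j)
  do j = 1, size(vectors)
    call accumulate(vectors(j)%values, real(j))
  end do
  !$omp end parallel

  do j = 1, size(vectors)
    print *, sum(vectors(j)%values)
  end do

contains

  subroutine accumulate(v, inc)
    real, intent(inout) :: v(:)
    real, intent(in) :: inc
    real :: tmp(size(v))   ! automatic array, so each thread has its own
    integer :: i

    tmp = 0
    !$omp do
    do i = 1, 1000
      tmp = tmp + inc
    end do
    !$omp end do nowait

    ! Add this thread's partial sum to the shared result, one thread at a time
    !$omp critical
    v = v + tmp
    !$omp end critical
  end subroutine accumulate

end program main
```

This should reproduce the serial sums quoted in the question. The nowait is safe here because a thread only touches the shared array inside the critical section, and the implicit barrier at the end of the parallel region guarantees all contributions are in before the sums are printed.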

Understanding the correct use of !$omp parallel do reduction(...)

I am trying to write a program that counts the number of primes between 1 and some number n in Fortran 90 utilizing OpenMP. The nested loop just counts the numbers that are not prime. I want to use an omp parallel do to speed this up. As far as I understand, since I am just counting numbers that are not prime, it is appropriate to just use something like !$omp parallel do reduction(+:not_primes). When I run the code below in serial without the !$omp lines I get the following output
Primes: 5134
OpenMP time elapsed 0.49368596076965332
but when I include the !$omp lines I get
Primes: -1606400834
OpenMP time elapsed 0.37933206558227539
Have I used the parallel do correctly here? (apparently not, but why?) Thanks!
program prime_counter
integer n, not_primes, i, j
real*8 :: ostart,oend, omp_get_wtime
ostart = omp_get_wtime()
n=50000
!$omp parallel do reduction(+:not_primes)
do i=2,n
do j=2,i-1
if(mod(i,j)==0) then
not_primes= not_primes+1
exit
end if
end do
end do
!$omp end parallel do
print*, 'Primes:', n-not_primes
oend = omp_get_wtime()
write(*,*) 'OpenMP time elapsed', oend-ostart
end program
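You do not initialize not_primes anywhere, so its value is undefined. The reduction combines the threads' partial sums with that original value, which is why the result is garbage. The usage of the OpenMP reduction itself is OK. I normally mark all loop indexes as private, including the inner index j, but that is not strictly necessary: in Fortran, sequential loop indexes inside a parallel construct are private automatically.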
not_primes = 0
!$omp parallel do reduction(+:not_primes) private(i,j)
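For completeness, here is the question's program with those two lines applied. The only other changes are cosmetic assumptions on my part: implicit none, use omp_lib instead of declaring omp_get_wtime by hand, and stopping the timer before printing.

```fortran
program prime_counter
  use omp_lib
  implicit none
  integer :: n, not_primes, i, j
  double precision :: ostart, oend

  ostart = omp_get_wtime()
  n = 50000
  not_primes = 0            ! the reduction adds the partial sums to this value
  !$omp parallel do reduction(+:not_primes) private(i, j)
  do i = 2, n
    do j = 2, i - 1
      if (mod(i, j) == 0) then
        not_primes = not_primes + 1
        exit
      end if
    end do
  end do
  !$omp end parallel do
  oend = omp_get_wtime()
  print *, 'Primes:', n - not_primes
  write (*, *) 'OpenMP time elapsed', oend - ostart
end program prime_counter
```

With the initialization in place, the parallel run should report the same count as the serial run in the question.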

Open MP if OpenMP:...else

Problem:
I have some code that a few others and I have been writing. I took the code and made it use MPI and OpenMP with great results (it helps that I am running it on a Blue Gene/Q).
One thing I am not a fan of is that I can no longer compile the code without the -openmp flag, because I used reduction variables to get the speedup I needed.
Example:
!$OMP parallel do schedule(DYNAMIC, 4) reduction(min:min_val)
....
min_val = some_expression(i)
....
!$OMP end parallel do
result = sqrt(min_val)
I am looking for something like:
!$OMP if OMP:
!$OMP min_val = some_expression(i)
!$OMP else:
if ( min_val .gt. some_expression(i) ) min_val = some_expression(i)
!$OMP end else
Anybody know of something like this? Notice that without -openmp the !$OMP lines are ignored and the code runs normally with the correct, er same, answer.
Thanks,
(Yes, it is Fortran code, but it's almost identical to C and C++.)
To your exact question:
!$ whatever_statement
will use that statement only when compiled with OpenMP.
Otherwise, in your specific case, can't you just use:
!$OMP parallel do schedule(DYNAMIC, 4) reduction(min:min_val)
....
min_val = min(min_val, some_expression(i))
....
!$OMP end parallel do
result = sqrt(min_val)
?
I'm using this normally with and without -openmp quite often.
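A small self-contained sketch of that pattern; the expression being minimized is a made-up stand-in for some_expression(i). It compiles and gives the same answer with or without the OpenMP flag, because the min() update is ordinary Fortran and the directives are just comments to a non-OpenMP compiler.

```fortran
program min_reduction
  implicit none
  integer :: i
  real :: min_val, result

  min_val = huge(min_val)   ! start above any value the loop can produce
  !$omp parallel do schedule(dynamic, 4) reduction(min:min_val)
  do i = 1, 100
    ! Hypothetical stand-in for some_expression(i); smallest value is 9 at i = 40
    min_val = min(min_val, real((i - 40)**2 + 9))
  end do
  !$omp end parallel do
  result = sqrt(min_val)
  print *, result
end program min_reduction
```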
If you are willing to use a preprocessed Fortran source file, you can always rely on the macro _OPENMP being defined when compiling with OpenMP. The simplest example is:
program pippo
#ifdef _OPENMP
print *, "OpenMP program"
#else
print *, "Non-OpenMP program"
#endif
end program pippo
Compiled with:
gfortran -fopenmp main.F90
the program will give the following output:
OpenMP program
If you are unwilling to use preprocessed source files, you can set a variable using the Fortran conditional compilation sentinel:
program pippo
implicit none
logical :: use_openmp = .false.
!$ use_openmp = .true.
!$ print *, "OpenMP program"
if( .not. use_openmp) then
print *, "Non-OpenMP program"
end if
end program pippo
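The same sentinel also works on use statements and on calls into the OpenMP runtime, which is usually where a non-OpenMP build would otherwise fail to compile. A minimal sketch (the program name and the fallback value of 1 are assumptions for illustration):

```fortran
program sentinel_demo
  !$ use omp_lib
  implicit none
  integer :: nthreads

  nthreads = 1                          ! value used when built without OpenMP
  !$ nthreads = omp_get_max_threads()   ! compiled only when OpenMP is enabled
  print *, "Threads available: ", nthreads
end program sentinel_demo
```

Without the OpenMP flag, both !$ lines are plain comments, so the module is never referenced and the program still builds.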