No specific subroutine for omp_set_num_threads( ) - fortran

I want to set my thread number to 10, by doing:
CALL OMP_SET_NUM_THREADS(10)
!$OMP PARALLEL
T=OMP_GET_NUM_THREADS()
!$OMP END PARALLEL
PRINT*, T
It prints out 10, which is correct. However, if I define a constant NUM_THREADS and pass it to the set-threads subroutine, like this:
INTEGER(KIND=16), PARAMETER :: NUM_THREADS=10
CALL OMP_SET_NUM_THREADS(NUM_THREADS)
and compile it, I get this error:
Error: There is no specific subroutine for the generic ‘omp_set_num_threads’ at (1).
Why is that?

I don't know which integer type your compiler provides as KIND=16, but it appears to be a non-standard kind for which the OpenMP library has no matching specific subroutine.
There is really no reason to use a non-standard integer kind for a number that can easily be represented by a standard 16- or 32-bit integer.
Leave the kind specifier out of the INTEGER declaration, and it should work.
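As a minimal sketch of that fix, assuming a compiler with OpenMP enabled (for example gfortran with -fopenmp): the program name and the SINGLE block are illustrative additions, not part of the question. The SINGLE block just makes one thread write T instead of all of them.

```fortran
program set_threads_sketch
  use omp_lib                               ! explicit interfaces for the OpenMP routines
  implicit none
  integer, parameter :: num_threads = 10    ! default integer kind, no KIND=16
  integer :: t

  t = 0
  call omp_set_num_threads(num_threads)

  !$omp parallel
  !$omp single
  t = omp_get_num_threads()                 ! size of the current team
  !$omp end single
  !$omp end parallel

  print *, t
end program set_threads_sketch
```

The runtime is still allowed to give you fewer threads than requested, so the printed value is a request being honoured, not a guarantee.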

Related

gfortran how do I increment random_seed by 2^128

The gfortran page on random_seed says that when using OMP threads, each thread increments its seed by 2^128. I am wondering how I can increment the seed by 2^128 manually. I wrote a little test program that sets the master seed to all zeros and then prints the seed of each thread, but I don't understand what I'm seeing. What I'd like to know is, for example, what to put in the subroutine increment_by_2_tothe_128:
program main
  implicit none
  character(len=32) :: arg
  integer :: n
  integer :: i
  integer :: nthreads
  integer, allocatable :: seed(:, :)
  integer, allocatable :: master_seed(:)
  real, allocatable :: rn(:)
  call get_command_argument(1, arg)
  read(arg, *) nthreads
  call random_seed(size=n)
  allocate(seed(n, nthreads))
  allocate(master_seed(n))
  allocate(rn(nthreads))
  master_seed = 0
  seed = 0
  call random_seed(put=master_seed)
  ! call increment_by_2_tothe_128(n)
  call omp_set_num_threads(nthreads)
  !$OMP PARALLEL DO
  do i=1,nthreads
    call random_number(rn(i))
    call random_seed(get=seed(:,i))
  end do
  do i=1,nthreads
    print *, i
    print *, rn(i)
    print *, seed(:,i)
  end do
end program main
subroutine increment_by_2_tothe_128(n)
  implicit none
  integer, intent(in) :: n
  integer :: current_seed(n)
  integer :: increment_seed(n)
  call random_seed(get=current_seed)
  ! what goes here:
  ! increment_seed = current_seed + 2**128
  call random_seed(put=increment_seed)
end subroutine increment_by_2_tothe_128
You cannot do that manually. You would need access to the internals of the random number generator, but those are not exposed to Fortran programmers. And you obviously cannot call the generator 2^128 times.
If you need to do the shift, you need to use a pseudo-random number generator that does expose its internals and also supports this kind of jump. That can be, for example, the xoshiro/xoroshiro PRNG family that is used internally by gfortran. These generators have a specialized function for this shift:
All generators, being based on linear recurrences, provide jump
functions that make it possible to simulate any number of calls to the
next-state function in constant time, once a suitable jump polynomial
has been computed. We provide ready-made jump functions for a number
of calls equal to the square root of the period, to make it easy
generating non-overlapping sequences for parallel computations, and
equal to the cube of the fourth root of the period, to make it
possible to generate independent sequences on different parallel
processors.
These generators are most often implemented in C, but Fortran implementations also exist (the subroutine rng_jump is the jump function; disclaimer: the link goes to my own repository, with no guarantees about the quality).

Why do I always get the same result when using function and contains in fortran

I'm trying to do a basic calculation by calling a function using contains
Program main
  implicit none
  integer*8 Nmax,i
  Parameter (Nmax=5)
  real*8 x, f(Nmax), n
  do i=1, Nmax
    n=i
    f=func(n,Nmax)
    write(*,*) f(i)
  end do
Contains
  real*8 function func(x,Nmax)
    integer*8 Nmax,i
    real*8 x, f(Nmax)
    do i=1, Nmax-1
      f(i)=i**2d0-4d0*i-7d0
    end do
  end function
end program main
I get this result:
-9.255963134931783E+061
-9.255963134931783E+061
-9.255963134931783E+061
-9.255963134931783E+061
-9.255963134931783E+061
I think I'm making the wrong variable definitions. Thank you for your help.
There are multiple problems with your program.
First, you probably meant to write:
f(i)=func(n,Nmax)
in the main program. Without the subscript, you assign the same value to every element of the array. You might think that explains the results, but it doesn't: every element would simply hold the value for the current n, so f(i) would still print what you expect.
Another problem is highlighted by the following warning I get when I compile your code with Intel Fortran:
t.f90(14): warning #6178: The return value of this FUNCTION has not been defined. [FUNC]
real*8 function func(x,Nmax)
-------------------^
You never assign the value of func, so you get whatever garbage happens to be in the return register.
The function you have isn't really what you want, either. You probably want one that computes and returns a scalar (single) value and hence there is no need for an array inside func.
A third problem is that func ignores its argument x (the n you pass in, which, contrary to convention, you have declared as a real).
If you want a loop in the main program, have the function compute and return a single result based on the argument passed to it. There is no need to pass both the loop index and nmax each time. Other options, slightly more advanced, would be to keep the array assignment in the main program but do away with the loop there and either have the function return an array or make the function ELEMENTAL. I will leave it as an exercise for you once you figure out what you really intend here.
Lastly, I would discourage you from using nonstandard syntax such as "real*8". Please learn about KIND specifiers and the SELECTED_REAL_KIND intrinsic function.
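To make the first suggestion concrete, here is one possible sketch, not necessarily what you intended: the loop stays in the main program and the function returns a single value for the index it is given. The module name poly_mod, the result name val, and the choice of an integer argument are my own assumptions; the formula is the one from your loop body.

```fortran
module poly_mod
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! Scalar function: the result variable val is assigned before returning.
  pure function func(i) result(val)
    integer, intent(in) :: i
    real(real64) :: val
    val = real(i, real64)**2 - 4.0_real64*real(i, real64) - 7.0_real64
  end function func
end module poly_mod

program main
  use poly_mod
  implicit none
  integer, parameter :: nmax = 5
  real(real64) :: f(nmax)
  integer :: i

  do i = 1, nmax
    f(i) = func(i)          ! subscripted assignment, one element per iteration
    write(*,*) f(i)
  end do
end program main
```

For i = 1 to 5 the formula i**2 - 4*i - 7 gives -10, -11, -10, -7 and -2. Declaring func as ELEMENTAL instead of PURE would also allow it to be applied to a whole array of indices at once.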

programming issue with openmp

I am having issues with openmp, described as follows:
I have the serial code like this
subroutine ...
...
do i=1,N
....
end do
end subroutine ...
and the openmp code is
subroutine ...
use omp_lib
...
call omp_set_num_threads(omp_get_num_procs())
!$omp parallel do
do i=1,N
....
end do
!$omp end parallel do
end subroutine ...
It compiles without problems, but when I run the program there are two major issues compared to the serial code:
The program runs even slower than the serial code (the do-loop is supposed to do matrix multiplications with matmul).
The numerical accuracy seems to have dropped compared to the serial code (I have a check for it).
Any ideas what might be going on?
Thanks,
Xiaoyu
When parallelizing with OpenMP, you can control how many threads your program uses. One way is the environment variable OMP_NUM_THREADS, e.g. calling your program by means of
OMP_NUM_THREADS=5 ./myprogram
to execute it using 5 threads.
Alternatively, you can set the number of threads at runtime by calling omp_set_num_threads (documentation).
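If you are unsure which setting actually took effect, a small check program can help. This is only a sketch, assuming OpenMP is enabled at compile time (e.g. gfortran -fopenmp); the program name is made up for illustration.

```fortran
program show_threads
  use omp_lib
  implicit none

  ! Values seen before any parallel region is entered.
  print *, 'processors available:', omp_get_num_procs()
  print *, 'max threads         :', omp_get_max_threads()

  !$omp parallel
  !$omp single
  ! Size of the team that was actually created.
  print *, 'team size           :', omp_get_num_threads()
  !$omp end single
  !$omp end parallel
end program show_threads
```

Running it as OMP_NUM_THREADS=5 ./show_threads and again without the variable shows how the environment setting changes the reported values.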
Side Notes
Don't forget to set private variables, if there are any within the loop!
Example:
!$omp parallel do private(prelimRes)
do i = 1, N
prelimRes = myFunction(i)
res(i) = prelimRes + someValue
end do
!$omp end parallel do
Note how the variable prelimRes is declared private so that every thread has its own workspace.
Depending on what you actually do within the loop (e.g. calling OpenBLAS), your results may indeed vary slightly, because the parallel version carries out the operations in a different order and floating-point arithmetic is not associative. For double precision variables the variations should be smaller than 1e-8.
If you are unsure about what is happening, you should check the CPU load using htop or a similar program while your program is running.
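On the note about results varying: the effect comes from the order of floating-point operations, which a parallel run is free to change. A deliberately extreme sketch, assuming IEEE double precision (the program name and the values are chosen only for illustration):

```fortran
program fp_order
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  real(real64) :: a, b, c

  a =  1.0e20_real64
  b = -1.0e20_real64
  c =  1.0_real64

  ! Same three numbers, different grouping.
  print *, (a + b) + c   ! a and b cancel first, then c is added
  print *, a + (b + c)   ! c is absorbed into b before a is added
end program fp_order
```

The first expression evaluates to 1 and the second to 0, because 1 is far below the spacing of representable numbers near 1e20. Real workloads show much smaller differences, but the mechanism is the same.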
Addendum: Setting the number of threads to automatically match the number of CPUs
If you would like to use the maximum number of useful threads, i.e. as many threads as there are CPUs, you can do so by using (just like you stated in your question):
subroutine ...
use omp_lib
...
call omp_set_num_threads(omp_get_num_procs())
!$omp parallel do
do i=1,N
....
end do
!$omp end parallel do
end subroutine ...

Fortran result variable not initialized

I ran into a surprising problem with local variable initialization.
I have the following function to calculate gammar:
function gammar(z) result(gz)
  implicit none
  real(8),intent(out)::gz
  real(8)::z,t,low,up
  real(8),parameter::increment=1.0
  real(8),parameter::lower_t=0.0,upper_t=10.0
  integer(4)::i,n
  !gz=0.0
  n=(upper_t-lower_t)/increment
  do i=1,n
    low=lower_t+(i-1)*increment
    up=lower_t+(i)*increment
    gz=gz+(f(z,low)+f(z,up))*increment/2.0
  end do
end function gammar
Then I call this function in main program like
df=9.0
t=0.0
write(*,*) gammar((df+1.0)/2.0)/sqrt(pi*df)/gammar(df/2.0)
I got a wrong answer: 0.126.
I found the reason: after gammar((df+1.0)/2.0) was calculated, the local variable gz was not reset to 0.
Hence, when gammar(df/2.0) was calculated, gz still held the old value 24. As a result, gammar(df/2.0) returned the wrong answer 34..
If I add gz=0.0 to the gammar function, the problem is fixed.
This is really surprising. Why isn't the local gz initialized to zero every time gammar is called?
Thanks a lot
Regards
Ke
Unless you have a statement to initialize a local variable in a procedure, such as the gz = 0 that you have commented out, those local variables are not initialized when the procedure is invoked. Their values are undefined. They could have a value left from a previous invocation, or some random value.
If you use full warning options of the compiler, it will likely tell you of this problem. gfortran warned of an uninitialized variable at compile time. ifort detected the problem at run time.
Another initialization method is with a declaration. That still doesn't repeat the initialization on subsequent invocations of the procedure. If you initialize a local variable in a procedure with a declaration, such as integer :: count = 0, that initialization is done only once, and the variable implicitly acquires the SAVE attribute: it remains defined and on the next invocation retains the value it had when the previous invocation exited.
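A small sketch of that behaviour; the module name counter_mod, the function next_count and the variable calls are made up for illustration:

```fortran
module counter_mod
  implicit none
contains
  function next_count() result(c)
    integer :: c
    integer :: calls = 0    ! initialized once; the initializer implies SAVE
    calls = calls + 1       ! keeps its value between invocations
    c = calls
  end function next_count
end module counter_mod

program save_demo
  use counter_mod
  implicit none
  integer :: k

  do k = 1, 3
    print *, next_count()
  end do
end program save_demo
```

This prints 1, 2 and 3, not 1 three times, which is exactly why declaration initialization is not a substitute for an assignment statement at the top of the procedure.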
P.S. real(8) is not a portable way to obtain double precision reals. The language standard does not specify particular numeric values for kinds; compilers are free to use whatever values they wish. Most compilers use the number of bytes, but some use other numbering schemes. It is better to use selected_real_kind or, from ISO_FORTRAN_ENV, real64 (for double precision). See also the question "quad precision in gfortran".
P.P.S. Trying this code with gfortran, that compiler points out another problem with gz:
function gammar(z) result(gz)
1
Error: Symbol at (1) is not a DUMMY variable
So remove the intent(out) in the declaration.
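Putting these points together, a sketch of how the function might look. The integrand f was not shown in the question, so the one below (t**(z-1)*exp(-t)) is purely an assumption, as are the module name gamma_sketch and the kind name dp.

```fortran
module gamma_sketch
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
contains
  function gammar(z) result(gz)
    real(dp), intent(in) :: z
    real(dp) :: gz                       ! result variable: no INTENT attribute
    real(dp), parameter :: increment = 1.0_dp
    real(dp), parameter :: lower_t = 0.0_dp, upper_t = 10.0_dp
    real(dp) :: low, up
    integer :: i, n

    gz = 0.0_dp                          ! executed on every invocation
    n = nint((upper_t - lower_t)/increment)
    do i = 1, n
      low = lower_t + (i - 1)*increment
      up  = lower_t + i*increment
      gz = gz + (f(z, low) + f(z, up))*increment/2.0_dp   ! trapezoid rule
    end do
  end function gammar

  ! Hypothetical integrand, assumed here because the original f was not posted.
  pure function f(z, t) result(val)
    real(dp), intent(in) :: z, t
    real(dp) :: val
    val = t**(z - 1.0_dp)*exp(-t)
  end function f
end module gamma_sketch

program main
  use gamma_sketch
  implicit none
  real(dp), parameter :: pi = acos(-1.0_dp)
  real(dp) :: df

  df = 9.0_dp
  write(*,*) gammar((df + 1.0_dp)/2.0_dp)/sqrt(pi*df)/gammar(df/2.0_dp)
end program main
```

Because gz is assigned at the top of the function, calling gammar twice with the same argument now returns the same value both times.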

OpenMP no threading in subroutine

I'm writing a matrix multiplication subroutine in Fortran. I'm using the Intel Fortran compiler. I've written a simple static scheduled parallel do-loop. Unfortunately, it's running on only one thread. Here's the code:
      SUBROUTINE MATMULT(A,B,C,L,M,N)
      REAL*8 A,B,C
      INTEGER NCORES, CHUNK, TID
      DIMENSION A(L,N),B(L,M),C(M,N)
      PARAMETER (NCORES=8)
      CHUNK=(L/(NCORES+1))+1
      TID=0
!$OMP PARALLELDO SHARED(A,B,C,L,M,N,CHUNK) PRIVATE(I,J,K,TID)
!$OMP+DEFAULT(NONE) SCHEDULE(STATIC,CHUNK)
      DO I=1,L
        TID = OMP_GET_THREAD_NUM()
        PRINT *, "THREAD ", TID, " ON I=", I
        DO K=1,N
          DO J=1,M
            A(I,K) = A(I,K) + B(I,J)*C(J,K)
          END DO
        END DO
      END DO
!$OMP END PARALLELDO
      RETURN
      END
Note:
There are no parallel directives in the main program that calls the routine
The arrays A,B,C are initialized serially in the main program. A is initialized to zeros
I am enforcing the Fortran fixed source form during compilation
I have confirmed the following:
Another example program works fine with 8 threads (so no hardware issue)
I have used the -openmp compiler argument
OMP_GET_NUM_PROCS() and OMP_GET_MAX_THREADS() both return 0
TID is 0 for every iteration over I (which shouldn't be the case)
I am unable to diagnose my mistake. I'd appreciate any inputs on this.
The identifier OMP_GET_THREAD_NUM is not explicitly declared. The default implicit typing rules mean it will be of type real. That's not consistent with the declaration in the OpenMP spec for the function of that name.
Adding USE OMP_LIB would fix that issue. Further, not using implicit typing (IMPLICIT NONE) would avoid this and a multitude of similar problems.
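For illustration, a free-form sketch of the routine with both suggestions applied. The module wrapper matmult_mod, the driver program, the explicit INTENTs and the plain SCHEDULE(STATIC) without a chunk size are my own choices, not taken from the question; it assumes OpenMP is enabled at compile time.

```fortran
module matmult_mod
  use omp_lib              ! declares omp_get_thread_num as an integer function
  implicit none            ! no implicit typing, so a missing declaration is an error
contains
  subroutine matmult(a, b, c, l, m, n)
    integer, intent(in) :: l, m, n
    double precision, intent(inout) :: a(l, n)
    double precision, intent(in) :: b(l, m), c(m, n)
    integer :: i, j, k, tid

    !$omp parallel do default(none) shared(a, b, c, l, m, n) &
    !$omp private(i, j, k, tid) schedule(static)
    do i = 1, l
      tid = omp_get_thread_num()
      print *, "THREAD ", tid, " ON I=", i
      do k = 1, n
        do j = 1, m
          a(i, k) = a(i, k) + b(i, j)*c(j, k)
        end do
      end do
    end do
    !$omp end parallel do
  end subroutine matmult
end module matmult_mod

program driver
  use matmult_mod
  implicit none
  integer, parameter :: l = 8, m = 3, n = 4
  double precision :: a(l, n), b(l, m), c(m, n)

  call random_number(b)
  call random_number(c)
  a = 0.0d0                ! A is initialized to zeros, as in the question
  call matmult(a, b, c, l, m, n)
  print *, "max deviation from MATMUL:", maxval(abs(a - matmul(b, c)))
end program driver
```

With the interface coming from omp_lib, the thread numbers printed inside the loop should differ between iterations whenever more than one thread is actually in use.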